Transformer (machine learning model)
[[wikipedia:Transformer (machine learning model)]]

* [[GPT-3]]: the architecture is a decoder-only [[transformer network]] with a 2048-token context and 175 billion [[parameters]], requiring 800GB to store.

== Related ==
* [[Attention is all you need (2017)]]
* [[GPT]]: [[GPT-4]], [[GPT-3]]
* [[Diffusion]]

== See also ==
* {{Transformer}}
* {{GPT}}
* {{ML}}
* {{OpenAI}}

[[Category:ML]]
[[Category:AI]]
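The parameter count and storage figure above can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate: the 4-bytes-per-weight (float32) assumption is mine, and the quoted 800GB presumably covers more than raw weights (e.g. optimizer state or checkpoint overhead).

```python
# Rough storage estimate for GPT-3's weights (assumption: float32 storage;
# the article's 800GB figure likely includes more than the raw parameters).
N_PARAMS = 175e9   # 175 billion parameters, per the article
BYTES_FP32 = 4     # bytes per float32 weight (my assumption)

raw_fp32_gb = N_PARAMS * BYTES_FP32 / 1e9
print(f"fp32 weights alone: {raw_fp32_gb:.0f} GB")  # 700 GB
```

At 4 bytes per parameter the weights alone come to roughly 700GB, which is consistent with the article's 800GB figure once non-weight data in a checkpoint is accounted for.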
Latest revision as of 15:23, 9 April 2023