Difference between revisions of "Transformer (machine learning model)"
(One intermediate revision by the same user not shown)
Line 1: | Line 1: | ||
[[wikipedia:Transformer (machine learning model)]] | [[wikipedia:Transformer (machine learning model)]] | ||
+ | * [[GPT-3]]: the architecture is a decoder-only [[transformer network]] with a 2048-token-long context and 175 billion [[parameters]], requiring 800GB to store. | ||
Line 6: | Line 7: | ||
* [[Attention is all you need (2017)]] | * [[Attention is all you need (2017)]] | ||
* [[GPT]]: [[GPT-4]], [[GPT-3]] | * [[GPT]]: [[GPT-4]], [[GPT-3]] | ||
− | + | * [[Diffusion]] | |
== See also == | == See also == |
Latest revision as of 15:23, 9 April 2023
wikipedia:Transformer (machine learning model)
- GPT-3: a decoder-only transformer network with a 2048-token context window and 175 billion parameters, requiring 800 GB of storage.
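As a rough sanity check on the storage figure, the raw size of the weights can be estimated as parameter count times bytes per parameter. The sketch below is only illustrative: the helper name `weight_storage_gb` and the choice of 2-byte (float16) and 4-byte (float32) widths are assumptions, not details given on this page, and the calculation ignores everything except the weights themselves, so it is not expected to reproduce the 800 GB figure exactly.

```python
def weight_storage_gb(num_params: int, bytes_per_param: int) -> float:
    """Raw size of the weights in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 10**9


# Parameter count stated above for GPT-3.
GPT3_PARAMS = 175 * 10**9

# Assumed numeric widths (hypothetical; the page does not state a precision).
for label, width in [("float16", 2), ("float32", 4)]:
    print(f"{label}: {weight_storage_gb(GPT3_PARAMS, width):.0f} GB")
```

Under these assumptions the weights alone come to 350 GB at 2 bytes per parameter and 700 GB at 4 bytes per parameter, the same order of magnitude as the figure quoted above.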
Related
See also
- Transformer, GPT, Transformer 8, Etched, Megatron-Core
- GPT, GPT-2, GPT-3, GPT-4, GPT-4o, Tiktoken, Bigram, Transformer, PaLM, ChatGPT
- Machine learning, Deep learning, AWS Sagemaker, PyTorch, Kubeflow, TensorFlow, Keras, Torch, Spark ML, Tinygrad, Apple Neural Engine, Scikit-learn, MNIST, MLOps, AutoML, ClearML, PostgresML, AWS Batch, Transformer, Diffusion, Backpropagation, JAX, Vector database, LLM, The Forrester Wave: AI/ML Platforms
- OpenAI, GitHub Copilot, ChatGPT, OpenAI Codex, GPT-3, GPT-4, Whisper, Sam Altman, Mira Murati, Greg Brockman, Ilya Sutskever, OpenAI board, John Schulman