Difference between revisions of "Generative Pre-trained Transformer"
(9 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
[[wikipedia:Generative Pre-trained Transformer]] | [[wikipedia:Generative Pre-trained Transformer]] | ||
+ | |||
+ | * [[GPT-4]] (Mar 2023) | ||
+ | * [[GPT-3]] (Jun 2020, beta): a decoder-only transformer network with a 2048-token context window and 175 billion parameters, requiring 800GB to store. | ||
+ | * [[GPT-2]] (Feb 2019) | ||
+ | |||
+ | |||
* [[Improving Language Understanding by Generative Pre-Training]] | * [[Improving Language Understanding by Generative Pre-Training]] | ||
+ | * [[Attention is all you need (2017)]] | ||
+ | |||
+ | == Related == | ||
+ | * [[/usr/lib/systemd/system-generators/systemd-gpt-auto-generator]] | ||
+ | * [[GUID]] | ||
+ | |||
+ | == See also == | ||
+ | * {{Transformer}} | ||
+ | * {{GPT}} | ||
− | + | [[Category:GPT]] |
Latest revision as of 15:20, 24 August 2023
wikipedia:Generative Pre-trained Transformer
- GPT-4 (Mar 2023)
- GPT-3 (Jun 2020, beta): a decoder-only transformer network with a 2048-token context window and 175 billion parameters, requiring 800GB to store.
- GPT-2 (Feb 2019)
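As a rough illustration of how parameter count relates to storage, the sketch below computes the raw size of 175 billion weights at a few common numeric precisions. The function name `weight_storage_gb` and the choice of precisions are assumptions made for illustration only; the sketch counts raw weights in decimal gigabytes and does not attempt to reproduce the 800GB figure quoted above.

```python
def weight_storage_gb(num_params: int, bytes_per_param: int) -> float:
    """Raw weight size in decimal gigabytes (1 GB = 10**9 bytes).

    Hypothetical helper: counts only the weights themselves, with no
    file-format overhead, optimizer state, or other stored data.
    """
    return num_params * bytes_per_param / 10**9


# 175 billion parameters, as stated in the GPT-3 entry above.
GPT3_PARAMS = 175_000_000_000

# Byte widths for a few common precisions (an assumption; the source
# does not say which precision its storage figure refers to).
PRECISIONS = [("float32", 4), ("float16", 2), ("int8", 1)]

for label, width in PRECISIONS:
    print(f"{label}: {weight_storage_gb(GPT3_PARAMS, width):.0f} GB")
```

Under these assumptions the raw weights come to 700 GB at float32, 350 GB at float16, and 175 GB at int8.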
Related
- /usr/lib/systemd/system-generators/systemd-gpt-auto-generator
- GUID
See also
- Transformer, GPT, Transformer 8, Etched, Megatron-Core
- GPT, GPT-2, GPT-3, GPT-4, GPT-4o, Tiktoken, Bigram, Transformer, PaLM, ChatGPT