Difference between revisions of "GPT-3"
Revision as of 17:50, 8 April 2023
wikipedia:Generative Pre-trained Transformer 3
The architecture is a decoder-only transformer network with a 2048-token context window and a then-unprecedented size of 175 billion parameters, requiring 800 GB to store.
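As a rough sanity check on the storage figure, the sketch below estimates the size of the raw weights alone from the parameter count. It assumes every parameter is stored at one fixed width (4 bytes for float32, 2 bytes for float16) and counts nothing else; the names `PARAMETERS`, `BYTES_PER_PARAMETER`, and `weight_storage_gb` are illustrative only and do not come from any GPT-3 release.

```python
# Back-of-the-envelope estimate of raw weight storage.
# Assumptions (not from the article): uniform width per parameter,
# decimal gigabytes (1 GB = 10**9 bytes), no optimizer state or metadata.
PARAMETERS = 175_000_000_000  # 175 billion, as stated in the article

BYTES_PER_PARAMETER = {"float32": 4, "float16": 2}


def weight_storage_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Return the raw weight size in decimal gigabytes."""
    return num_parameters * bytes_per_parameter / 10**9


for dtype, width in BYTES_PER_PARAMETER.items():
    print(f"{dtype}: {weight_storage_gb(PARAMETERS, width):.0f} GB")
```

Under these assumptions the float32 estimate comes to 700 GB and the float16 estimate to 350 GB. The float32 figure is of the same order as the 800 GB quoted above; the sketch does not try to account for the difference.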
See also