GPT-3
The architecture is a decoder-only transformer network with a 2048-token context window and a then-unprecedented 175 billion parameters, requiring 800 GB of storage.
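The relationship between parameter count and storage size is simple arithmetic: each weight occupies a fixed number of bytes depending on numeric precision. A minimal sketch (the ~800 GB figure cited above presumably reflects a particular checkpoint format with overhead beyond raw weights, which is an assumption here):

```python
# Rough storage estimate for a 175-billion-parameter model.
N_PARAMS = 175_000_000_000

def storage_gb(n_params: int, bytes_per_param: int) -> float:
    """Gigabytes (10**9 bytes) needed to store n_params weights."""
    return n_params * bytes_per_param / 1e9

print(f"fp32: {storage_gb(N_PARAMS, 4):.0f} GB")  # 700 GB at 4 bytes/param
print(f"fp16: {storage_gb(N_PARAMS, 2):.0f} GB")  # 350 GB at 2 bytes/param
```

At 4 bytes per parameter (fp32) the raw weights alone come to 700 GB, consistent with a full checkpoint on the order of 800 GB once optimizer state or metadata is included.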