GPT-3
wikipedia:GPT-3 (Jun 2020)
wikipedia:Generative Pre-trained Transformer 3
The architecture is a decoder-only transformer network with a 2048-token-long context and 175 billion parameters, requiring 800 GB to store.
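As a sanity check on those figures, here is a minimal back-of-the-envelope sketch using the standard parameter-count estimate for a decoder-only transformer, plugged with the published GPT-3 175B hyperparameters (96 layers, hidden size 12288, 96 heads, 2048-token context, ~50k-token vocabulary). The formulas are generic estimates, not OpenAI's code.

<syntaxhighlight lang="python">
# Rough parameter count and storage footprint for GPT-3 175B.
# Hyperparameters are the published values (Brown et al., 2020);
# the counting formulas are standard decoder-only estimates.

n_layers = 96       # transformer blocks
d_model = 12288     # hidden size
n_vocab = 50257     # BPE vocabulary size
n_ctx = 2048        # context window, in tokens

# Per block: ~4*d_model^2 for attention (Q, K, V, output projections)
# plus ~8*d_model^2 for the MLP (two d_model x 4*d_model matrices).
per_block = 12 * d_model ** 2

# Token embeddings plus learned position embeddings.
embeddings = n_vocab * d_model + n_ctx * d_model

total = n_layers * per_block + embeddings
print(f"~{total / 1e9:.0f}B parameters")    # ~175B

# At 4 bytes per parameter, fp32 weights alone come to ~700 GB;
# the cited 800 GB presumably reflects additional checkpoint overhead.
print(f"~{total * 4 / 1e9:.0f} GB (fp32 weights)")
</syntaxhighlight>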