Transformer (machine learning model)

From wikieduonline
 
[[wikipedia:Transformer (machine learning model)]]
 
* [[GPT-3]]: the architecture is a decoder-only [[transformer network]] with a 2048-token context window and 175 billion [[parameters]], requiring roughly 800 GB of storage.
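The defining feature of a decoder-only transformer like GPT-3 is causally masked self-attention: each token may attend only to itself and earlier tokens in the context window. A minimal single-head NumPy sketch (toy dimensions, not GPT-3's actual sizes of d_model=12288 and a 2048-token context; all weight names here are illustrative):

```python
import numpy as np

def causal_self_attention(x, W_q, W_k, W_v):
    """Single-head self-attention with a causal (decoder-only) mask.

    x: (seq_len, d_model) token representations.
    W_q, W_k, W_v: (d_model, d_model) projection matrices (toy, untrained).
    """
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    d_k = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_k)          # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                          # (seq_len, d_model)

rng = np.random.default_rng(0)
d_model, seq_len = 8, 5                         # toy sizes for illustration
x = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = causal_self_attention(x, W_q, W_k, W_v)
```

Because of the mask, perturbing a later token cannot change the output at any earlier position, which is what makes autoregressive, left-to-right generation possible.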
  
  

Latest revision as of 15:23, 9 April 2023
