Difference between revisions of "Transformer (machine learning model)"

Latest revision as of 15:23, 9 April 2023

GPT-3: the architecture is a decoder-only transformer network with a 2048-token-long context and 175 billion parameters, requiring 800GB to store.
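
As a rough sanity check on the storage figure, the raw size of the weights can be estimated as parameter count times bytes per parameter. The sketch below is illustrative only: the helper name `param_storage_gb` is hypothetical, and the 32-bit and 16-bit float widths are assumptions, since the text does not say which numeric format the 800GB figure refers to.

```python
def param_storage_gb(num_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 10**9


GPT3_PARAMS = 175_000_000_000  # parameter count quoted in the text

# Assumed storage formats; the source does not state which one applies.
ASSUMED_WIDTHS = {"float32": 4, "float16": 2}

for name, width in ASSUMED_WIDTHS.items():
    print(f"{name}: {param_storage_gb(GPT3_PARAMS, width):.0f} GB")
```

Under these assumptions the weights alone come to 700 GB at 32-bit precision and 350 GB at 16-bit precision, so the quoted 800GB is of the same order as the 32-bit estimate. The text does not explain the difference.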

@@ Line 1: / Line 1: @@
 [[wikipedia:Transformer (machine learning model)]]
+* [[GPT-3]]: the architecture is a decoder-only [[transformer network]] with a 2048-token-long context and 175 billion [[parameters]], requiring 800GB to store.
+== Related ==
 * [[Attention is all you need (2017)]]
-* [[GPT]]: [[GPT-3]]
+* [[GPT]]: [[GPT-4]], [[GPT-3]]
+* [[Diffusion]]
 == See also ==