Transformer (machine learning model)

From wikieduonline
[[wikipedia:Transformer (machine learning model)]]
  
* [[GPT-3]]: a decoder-only [[transformer network]] with a 2048-token context window and 175 billion [[parameters]], requiring about 800 GB to store.
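The ~800 GB storage figure can be sanity-checked with simple arithmetic (a rough sketch; the exact on-disk size depends on the checkpoint format and numeric precision used):

```python
# Rough lower-bound estimate of GPT-3's weight storage at 32-bit precision.
params = 175_000_000_000      # 175 billion parameters
bytes_per_param = 4           # float32
size_gb = params * bytes_per_param / 1e9
print(f"{size_gb:.0f} GB")    # 700 GB of raw weights at fp32
```

Checkpoint metadata and optimizer state push the practical figure higher, consistent with the ~800 GB cited above.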
== Related ==
* [[Attention is all you need (2017)]]
* [[GPT]]: [[GPT-4]], [[GPT-3]]
* [[Diffusion]]
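A decoder-only transformer, as used by the GPT models listed above, is built around causal self-attention: each token may attend only to itself and earlier tokens. A minimal single-head NumPy sketch (illustrative only; the function name and toy sizes are assumptions, not any particular implementation):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, d) input matrix:
    each position attends only to itself and earlier positions."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # (T, T) attention logits
    mask = np.triu(np.ones_like(scores), k=1)   # 1s strictly above the diagonal
    scores = np.where(mask == 1, -1e9, scores)  # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8                       # tiny toy sizes (GPT-3 uses a 2048-token context)
x = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) for _ in range(3)]
out = causal_self_attention(x, *W)
print(out.shape)                  # (4, 8)
```

Because of the causal mask, the first position can only attend to itself, so its output equals its own value vector.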
  
 
== See also ==
 
* {{Transformer}}
 
* {{GPT}}
 
* {{NLP}}
* {{ML}}
 
* {{OpenAI}}
 
[[Category:ML]]
[[Category:AI]]

Latest revision as of 15:23, 9 April 2023
