Difference between revisions of "GPT-3"

Latest revision as of 15:22, 9 April 2023

The architecture is a decoder-only transformer network with a 2048-token context window and 175 billion parameters, which require 800GB to store.
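
To give a feel for how parameter count relates to storage, here is a minimal sketch that estimates raw weight size at a few numeric precisions. The precision table, the helper name `weight_storage_gb`, and the use of decimal gigabytes are assumptions made for illustration; the text does not say how its 800GB figure was computed, and the sketch does not try to reproduce it.

```python
# Assumption: bytes per parameter for a few common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_storage_gb(num_params: int, precision: str) -> float:
    """Return raw weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 10**9


# 175 billion parameters, as stated in the text above.
GPT3_PARAMS = 175_000_000_000

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_storage_gb(GPT3_PARAMS, precision):.0f} GB")
# float32: 700 GB
# float16: 350 GB
# int8: 175 GB
```

These numbers cover only the weights themselves. Any file-format overhead, optimizer state, or other stored data would add to them, and none of that is modeled here.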

@@ Line 1: / Line 1: @@
-[[wikipedia:GPT-3]]
+[[wikipedia:GPT-3]] (Jun 2020)
 [[wikipedia:Generative Pre-trained Transformer 3]]
-The architecture is a decoder-only transformer network with a 2048-token-long context and then-unprecedented size of 175 billion [[parameters]], requiring 800GB to store.
+The architecture is a decoder-only [[transformer network]] with a 2048-token-long context and 175 billion [[parameters]], requiring 800GB to store.