Difference between revisions of "Generative Pre-trained Transformer"
* [[Improving Language Understanding by Generative Pre-Training]]
* [[Attention is all you need (2017)]]

== Related ==
* [[/usr/lib/systemd/system-generators/systemd-gpt-auto-generator]]
Revision as of 15:19, 24 August 2023
wikipedia:Generative Pre-trained Transformer
- GPT-4 (Mar 2023)
- GPT-3 (Jun 2020, beta): a decoder-only transformer with a 2048-token context window and 175 billion parameters, requiring 800 GB to store.
- GPT-2 (Feb 2019)
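The 800 GB storage figure for GPT-3 can be sanity-checked with quick arithmetic (a sketch; the source does not state the serialization format, so fp32 weights are assumed here):

```python
# Back-of-the-envelope check of the GPT-3 storage figure above:
# 175 billion parameters stored as 32-bit floats (4 bytes each).
# Raw fp32 weights alone come to ~700 GB; the ~800 GB quoted in the
# text presumably includes additional overhead beyond the raw weights.

N_PARAMS = 175_000_000_000   # 175 billion parameters
BYTES_PER_PARAM_FP32 = 4     # one 32-bit float per parameter

raw_weight_bytes = N_PARAMS * BYTES_PER_PARAM_FP32
raw_weight_gb = raw_weight_bytes / 1e9

print(f"fp32 weights alone: {raw_weight_gb:.0f} GB")  # 700 GB
```

The gap between 700 GB of raw fp32 weights and the 800 GB figure is consistent with checkpoint metadata or other per-tensor overhead, though the source does not break this down.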
== See also ==
- Transformer, GPT, Transformer 8, Etched, Megatron-Core
- GPT, GPT-2, GPT-3, GPT-4, GPT-4o, Tiktoken, Bigram, Transformer, PaLM, ChatGPT