wikipedia:Attention is all you need
- Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task
- Transformer, GPT, Transformer 8, Ethched
- Artificial neural networks, Neuronal network (NN), CNN, Micrograd, NPU, ConvNet, AlexNet, GoogLeNet, Apache MXNet, Neural architecture search, DAG, Feedforward neural network, NeurIPS, Feature Pyramid Network, TPU, NPU, Apple Neural Engine (ANE), LLM, TFLOPS