语言模型

因为大语言模型是一种“语言模型”，所以，我们首先理解语言模型。

语言模型是“预测语言中下一个 Token”的模型。这个 Token 可以是一个字母、一些字母的组合、一个字、一个词、或者一段代码。大语言模型也是一个语言模型：它在一个非常大的文本上训练。训练之后，给它输入一些文字，它就会接着往下写，比如写诗、列提纲。

因此，语言模型只是一个“预测”模型；它不断预测后面会出现什么文字。所以，它的“预测”并不一定是“事实”。

课程材料

语言模型是一个经典的 NLP 问题，几乎每个 NLP 的课程都会重点介绍它，所以，它的课程材料非常丰富，比如：

斯坦福 CS224n 课程中的 LM 部分。这里是其中一期的视频：N-Gram LM，神经元 LM, lecture notes
Mira Murati (CTO of OpenAI), Cristóbal Valenzuela (Creator of Runway) 给高中生讲解 LM 原理及其在 Chatbot 和 LLM 中的应用的视频：How Chatbots and Large Language Models Work, Code.org, Youtube 视频
DeepMind 和 Raspberry Pi 面向高中生的 Experience AI 中的Large Language Models (LLMs) 课程
斯坦福 CS324 2022 年 LLM 课程的 Introduction 章节
Yandex NLP 课程的 Language Modeling 讲座，PPT，课本网页

Karpathy，Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy，代码

Exploring the Limits of Language Modeling. R. Józefowicz, Oriol Vinyals, M. Schuster, Noam M. Shazeer, Yonghui Wu. 2016.
On the Opportunities and Risks of Foundation Models. Rishi Bommasani, et. al. 2021.
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell. FAccT 2021.
Ethical and social risks of harm from Language Models. Laura Weidinger, et al. 2021.
Revisiting Simple Neural Probabilistic Language Models
Prediction and Entropy of Printed English (the foundational paper by Shannon on language compression and uncertainty)
Blog post explaining perplexity