信任
信任,包括以下方面:1)要有记忆,不能忘事,说到做到;2)说实话,说真话,不骗人,不根据虚假的前提进行逻辑推理;3)保密,不会泄露秘密、隐私、训练数据;4)能够被检测出来,别被用来“代考”;5)行为得当。
显然,信任是一个老师、系统维护人员、医生首先需要和自己服务的对象建立的。为什么呢?有两个原因:1)只有信任,才能让他们敞开心扉,说出自己的情况。而这些情况,是“诊断”的前提;2)只有信任,才能让他们心悦诚服,坚持配合治疗。比如,一个老师,只有得到了学生的信任,学生才能真正展开自己,开始发自内心地学习;一个医生,只有得到了病人的信任,才能得到病人的配合,遵医嘱,不自作主张,中间停药,半途而废。
建立一个“信任”的 LLM,是 AI 能不能真正有用的核心。
论文
华盛顿大学课程推荐论文
6: Can we trust language models?
我们可以信任语言模型吗?
Are larger language models better or worse with conspiracy theories? How can we design benchmarks to test the trustworthiness of neural models? Can adversarial attacks reveal private text from neural language models?
更大的语言模型对于阴谋论来说是更好还是更差?我们如何设计基准来测试神经模型的可信度?对抗性攻击能否揭示神经语言模型中的私人文本?
- When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories (Mallen et al., 2022)
- Investigating Memorization of Conspiracy Theories in Text Generation (Levy et al., 2021)
- TruthfulQA: Measuring How Models Mimic Human Falsehoods (Lin et al., 2022)
- Recovering Private Text in Federated Learning of Language Models (Gupta et al, 2022)
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (Mitchell et al, 2023)
- Deduplicating Training Data Mitigates Privacy Risks in Language Models (Kandpal et al., 2022)
- Extracting Training Data from Large Language Models (Carlini et al., 2020)
- Large Language Models Can Be Easily Distracted by Irrelevant Context (Shi et al., 2023)
- Measuring Progress on Scalable Oversight for Large Language Models (Bowman et al., 2022)
- Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering (Kim et al., 2021)
- CREPE: Open-Domain Question Answering with False Presuppositions (Yu et al., 2022)
- Discovering Language Model Behaviors with Model-Written Evaluations (Perez et al., 2022)
普林斯顿大学课程推荐论文
Privacy(保密)
Extracting Training Data from Large Language Models
Refer: Quantifying Memorization Across Neural Language Models Deduplicating Training Data Mitigates Privacy Risks in Language Models Large Language Models Can Be Strong Differentially Private Learners Recovering Private Text in Federated Learning of Language Models
Index | Previous | Next |