Predicting Slow Judgments
In this project, we took a first step toward exploring whether machine learning algorithms can make well-calibrated predictions of human judgments in AI-complete domains.
What are slow judgments?
Imagine you read this statement in the news:
“When I was governor of Massachusetts, we didn’t just slow the rate of growth of our government, we actually cut it.” - Mitt Romney (Interview, 2012)
Is Mitt Romney’s statement true or false? You can make a quick guess about whether it is true or false, but to be confident you’d probably need to do some research.
We are using machine learning to predict slow judgments, that is, judgments that require time and resources. Some tasks can only be solved through a lengthy deliberation process involving thinking, research, and discussion with experts. Examples include judging whether a newspaper headline is truthful, whether a defendant is guilty, or how good a research paper is.
Machine learning does best with lots of labeled examples. However, by definition, collecting a dataset of slow judgments is extremely costly: each slow judgment could take 5 hours of deliberation and research. That means 5 hours to generate a single label!
To address this, we collect many quick judgments (which are cheap) and fewer slow judgments. An ML algorithm can use the quick judgments as noisy labels (or alternatively as a regularizer), while its objective remains predicting slow judgments.
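As a rough illustration of the noisy-label idea (a minimal sketch, not the method used in the project), the loss below trains against the slow judgment wherever one exists and otherwise falls back to the quick judgment with a reduced weight. The names `combined_loss` and `quick_weight`, the dictionary layout, and the choice of cross-entropy are all assumptions made for this example.

```python
import math

def log_loss(p, y):
    """Binary cross-entropy for a prediction p in (0, 1) and a target y in [0, 1]."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def combined_loss(examples, quick_weight=0.3):
    """Average training loss over a list of examples.

    Each example is a dict with (hypothetical) keys:
      'pred'  - the model's predicted probability that the statement is true
      'quick' - a quick human judgment (probability), always present
      'slow'  - a slow human judgment (probability), or None if not collected

    Slow judgments are the real target. Quick judgments are treated as
    noisy labels and down-weighted by quick_weight (an assumed hyperparameter).
    """
    total = 0.0
    for ex in examples:
        if ex["slow"] is not None:
            total += log_loss(ex["pred"], ex["slow"])
        else:
            total += quick_weight * log_loss(ex["pred"], ex["quick"])
    return total / len(examples)

examples = [
    {"pred": 0.5, "quick": 1.0, "slow": None},  # only a cheap label is available
    {"pred": 0.5, "quick": 0.0, "slow": 1.0},   # the slow label overrides the quick one
]
print(combined_loss(examples))
```

Setting `quick_weight` to 0 ignores quick judgments entirely, while 1 treats them as being as reliable as slow ones. A suitable value would have to be tuned on held-out slow judgments.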
Why predict slow judgments?
Our mission is to find scalable ways to leverage machine learning for deliberation. We view predicting slow judgments as a simplified test domain where we can explore some issues relevant to this mission and to AI alignment more generally.
Robust generalization for AI-complete tasks
We would like to see machine learning systems that produce well-calibrated predictions of human judgments for AI-complete tasks. These ML systems should remain well-calibrated (i.e., the system "knows what it knows") under distribution shift (see e.g. Amodei et al., 2016). We aim to create datasets that help develop such ML systems.
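One common way to quantify whether a system "knows what it knows" is the expected calibration error (ECE): predictions are grouped into confidence bins, and within each bin the average predicted probability is compared with the observed frequency of positive outcomes. The sketch below is illustrative only. It assumes binary outcomes and equal-width bins, and is not necessarily the calibration measure used in this project.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Expected calibration error for binary outcomes.

    probs    - predicted probabilities in [0, 1]
    outcomes - observed labels, 0 or 1, same length as probs
    n_bins   - number of equal-width confidence bins (an assumed default)
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_confidence = sum(p for p, _ in b) / len(b)
        observed_freq = sum(y for _, y in b) / len(b)
        # Weight each bin's gap by the share of predictions it contains.
        ece += (len(b) / n) * abs(avg_confidence - observed_freq)
    return ece

# Predictions of 0.8 that come true 4 times out of 5 are well calibrated.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))
# Predictions of 0.9 that come true 1 time out of 4 are overconfident.
print(expected_calibration_error([0.9] * 4, [1, 0, 0, 0]))
```

Comparing this score on in-distribution data with its value on shifted data (for example, statements on new topics) gives a simple check of whether calibration survives distribution shift.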
The tasks we have chosen are plausibly AI-complete. Solving novel Fermi problems requires general scientific reasoning. Deciding if a political statement is true requires extensive research and broad world knowledge. Predicting someone's preferences over a brand new ML paper requires understanding both the technical details of the paper and the person's preferences (e.g. do they prefer detailed mathematical proofs or verbal exposition?).
Why do we want robustly well-calibrated ML systems on AI-complete tasks? For an ML system to be reliably trustworthy, it must do well in situations that are distinct from anything experienced previously. The ML system cannot always take the best possible action in novel situations, but it should recognize their distinctiveness and act conservatively. For example, the system might ask a human for guidance or take an action known to be safe in all situations.
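To make the idea of acting conservatively concrete, here is a minimal sketch in which disagreement among an ensemble of models stands in for a novelty signal. When the ensemble members disagree too much, the system defers to a human instead of predicting. The function name `predict_or_defer`, the threshold `max_disagreement`, and the use of ensemble spread as a proxy for novelty are assumptions for this example, not something the project prescribes.

```python
from statistics import mean, pstdev

def predict_or_defer(ensemble_predictions, max_disagreement=0.15):
    """Return ('predict', p) when the ensemble agrees, else ('ask_human', None).

    ensemble_predictions - probabilities from several independently trained
                           models for the same input (hypothetical setup)
    max_disagreement     - assumed threshold on the standard deviation
    """
    spread = pstdev(ensemble_predictions)
    if spread > max_disagreement:
        # The models disagree: treat the input as novel and act conservatively.
        return ("ask_human", None)
    return ("predict", mean(ensemble_predictions))

print(predict_or_defer([0.70, 0.72, 0.68]))  # members agree, so predict
print(predict_or_defer([0.10, 0.90, 0.50]))  # members disagree, so defer
```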
Algorithms for distillation of human judgment
More specifically, we think that iterated distillation and amplification of human judgment could be an important step towards scalable automation of deliberation, and towards AI alignment in general. Distillation means training fast ML systems to predict (increasingly amplified) human judgments. Initially (when the amplification process is weak) the distillation step is similar to predicting slow judgments in AI-complete problems. Developing algorithms for our datasets may provide insights for robust distillation. (We are also exploring amplification in our project on factored cognition.)
- ThinkAgain (discontinued)
Our web app for data collection. Play games on Fermi estimation, political fact-checking, and evaluating machine learning papers. No signup required.
- Predicting Slow Judgments (pdf)
Slides for a presentation given at a NIPS 2017 workshop
- Predicting Human Deliberative Judgments with Machine Learning (pdf)
FHI tech report published in July 2018