Predicting Slow Judgments

In this project, we took a first step toward exploring whether machine learning algorithms can make well-calibrated predictions of human judgments in AI-complete domains.

What are slow judgments?

Imagine you read this statement in the news:

“When I was governor of Massachusetts, we didn’t just slow the rate of growth of our government, we actually cut it.” - Mitt Romney (Interview, 2012)

Is Mitt Romney’s statement true or false? You can make a quick guess, but to be confident you’d probably need to do some research.

We are using machine learning to predict slow judgments—judgments that require time and resources. Some tasks can only be solved through a lengthy deliberation process involving thinking, research, and discussion with experts. This includes judging whether a newspaper headline is truthful, whether a defendant is guilty, or whether a research paper is high quality.

Machine learning does best with lots of labeled examples. However, by definition, collecting a dataset of slow judgments is extremely costly: each slow judgment could take 5 hours of deliberation and research. That means 5 hours to generate a single label!

To address this, we collect many quick judgments (which are cheap) and fewer slow judgments. ML algorithms can use the quick judgments as noisy labels (or alternatively as a regularizer), while the algorithm’s objective is to predict slow judgments.
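As a minimal sketch of this setup (all data, weights, and hyperparameters below are illustrative assumptions, not the project's actual pipeline), one can train a single classifier on both label sources, down-weighting the plentiful-but-noisy quick judgments while treating the scarce slow judgments as the real target:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1000 cheap quick judgments and 20 costly
# slow judgments over 5 features (shapes are arbitrary assumptions).
n_quick, n_slow, d = 1000, 20, 5
w_true = rng.normal(size=d)
X_quick = rng.normal(size=(n_quick, d))
X_slow = rng.normal(size=(n_slow, d))

# Slow labels follow the true decision boundary; quick labels are a
# noisy version of it, with 15% of labels flipped.
y_slow = (X_slow @ w_true > 0).astype(float)
y_quick = (X_quick @ w_true > 0).astype(float)
flip = rng.random(n_quick) < 0.15
y_quick[flip] = 1 - y_quick[flip]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Logistic regression trained on both sources: the quick judgments act
# as down-weighted noisy labels (a crude regularizer), while the slow
# judgments carry full weight as the objective we actually care about.
w = np.zeros(d)
lr, quick_weight = 0.1, 0.2
for _ in range(500):
    grad_slow = X_slow.T @ (sigmoid(X_slow @ w) - y_slow) / n_slow
    grad_quick = X_quick.T @ (sigmoid(X_quick @ w) - y_quick) / n_quick
    w -= lr * (grad_slow + quick_weight * grad_quick)

# Evaluate against clean "slow" labels on held-out data.
X_test = rng.normal(size=(500, d))
y_test = (X_test @ w_true > 0).astype(float)
acc = np.mean((sigmoid(X_test @ w) > 0.5) == y_test)
print(f"held-out accuracy on slow-style labels: {acc:.2f}")
```

The point of the sketch is the loss structure: many noisy labels steer the model's representation cheaply, while a handful of expensive labels anchor it to the judgment we actually want to predict.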

Why predict slow judgments?

Our mission is to find scalable ways to leverage machine learning for deliberation. We view predicting slow judgments as a simplified test domain where we can explore some issues relevant to this mission and to AI alignment more generally.

Robust generalization for AI-complete tasks

We would like to see machine learning systems that produce well-calibrated predictions of human judgments for AI-complete tasks. These ML systems should remain well-calibrated (i.e. the system "knows what it knows") under distribution shift (see e.g. Amodei et al 2016). We aim to create datasets that help develop such ML systems.
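To make "well-calibrated" concrete, here is a small sketch of one standard calibration measure, expected calibration error (the function name and binning scheme are our illustrative choices, not something from this project): it bins predictions by confidence and compares each bin's average confidence to its empirical accuracy.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between predicted confidence and
    empirical frequency, computed over confidence bins."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight bin by its share of data
    return ece

labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # 60% positive

# A system that always says 0.9 but is right only 60% of the time is
# overconfident: it does not "know what it knows".
overconfident = expected_calibration_error([0.9] * 10, labels)

# One that says 0.6 and is right 60% of the time is well calibrated.
calibrated = expected_calibration_error([0.6] * 10, labels)
```

A robustly calibrated system keeps this gap small even when the inputs come from a different distribution than the training data.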

The tasks we have chosen are plausibly AI-complete. Solving novel Fermi problems requires general scientific reasoning. Deciding if a political statement is true requires extensive research and broad world knowledge. Predicting someone's preferences over a brand new ML paper requires understanding both the technical details of the paper and the person's preferences (e.g. do they prefer detailed mathematical proofs or verbal exposition?).

Why do we want robustly well-calibrated ML systems on AI-complete tasks? For an ML system to be reliably trustworthy, it must handle situations unlike anything it has encountered before. It cannot always take the best possible action in such situations, but it should recognize that they are novel and act conservatively: for example, by asking a human for guidance or taking an action known to be safe in all situations.

Algorithms for distillation of human judgment

More specifically, we think that iterated distillation and amplification of human judgment could be an important step towards scalable automation of deliberation, and towards AI alignment in general. Distillation means training fast ML systems to predict (increasingly amplified) human judgments. Initially (when the amplification process is weak) the distillation step is similar to predicting slow judgments in AI-complete problems. Developing algorithms for our datasets may provide insights for robust distillation. (We are also exploring amplification in our project on factored cognition.)



This is a joint project with Owain Evans and collaborators at FHI. Our team members include Tom McGrath, Zac Kenton, Chris Cundy, Ryan Carey, Andrew Schreiber, Neal Jean, and Girish Sastry.