Scalable mechanisms for solving cognitive tasks


Our goal is to find scalable mechanisms for solving cognitive tasks such as "Tell me how to invest $100k to achieve the most social good". Such mechanisms would produce increasingly helpful solutions as we supply more human work-hours and better ML algorithms.

In order to view the design of scalable mechanisms for human cognitive work as an algorithm design problem, we make a few simplifying assumptions:

  1. Human workers are well-motivated. There is no principal-agent problem.
  2. Each worker is only available for a short amount of time, say 15 minutes.
  3. Each worker has the same background knowledge.

We describe Iterated Distillation-Amplification, the only concrete candidate we know so far for automating deliberation using ML in a scalable way. This approach requires that we know how to scale with respect to human work under the assumptions above.

Cognitive tasks

Here are a few cognitive tasks:

  • Read this book and tell me why x did y.
  • Provide a detailed analysis of the pros and cons of these two products.
  • Tell me how to invest $100k to achieve the most social good.
  • Write a paper on NLP that substantially advances the state of the art.

We want to build scalable mechanisms for solving such tasks.

Scalable mechanisms

A mechanism is scalable if the solutions it generates get better as its resources increase.

Assuming that each task is posed by a principal, a solution is better if it is more aligned with the principal’s interests (because it is more well-reasoned, well-explained, or whatever else the principal cares about, including any preferences they may have about the process that generated the solution).

Resources include the number of human work hours and the quality of machine learning components. (I’ll say what I mean by “quality of ML components” below.)

For human work, scalability is desirable since it allows us to convert money into predictable progress on cognitive tasks. If we care about a question, we can purchase marginal improvements in the answer simply by adding more work hours. This is one of the building blocks for turning thinking into a commodity.

For machine learning, scalability is desirable since it means that we don't have to "keep up" with ML developments in order to reap the benefits for supporting deliberation. A scalable mechanism will automatically get more helpful as we plug in more advanced algorithms and models.

Organizing human work on cognitive tasks

As a starting point, imagine giving one of the tasks above to a single person and setting a deadline (two weeks, say). The person is motivated to help us out and works on the task more or less continuously, discusses it with relevant experts, writes down notes, consults the Internet, learns relevant background, asks follow-up questions, etc. When the deadline comes around, they send us a solution of a certain quality.

Any given person has a finite number of hours available, so this approach only scales up to that limit; and even before we reach that limit, continued progress depends on the person being sufficiently good at reasoning and learning. To get around individual limitations, we could instead hire a group of people, and scale by increasing the group size. This raises additional questions: How do they communicate? Who does what? How can we have confidence that adding more people will improve the quality of the solutions we get, and won’t stop making a difference (or even hurt) beyond some point?

Setting up incentives such that each of the participants is motivated to do their best would of course be a big challenge, and we are interested in making progress on it, but we will not address it here. (Simplification 1)

Even assuming well-motivated participants, an interesting question remains: How should we orchestrate human cognitive work such that the outputs keep improving as we add more work-hours? In other words, what does a scalable approach to crowdsourcing cognitive work look like if there is no principal-agent problem?

Short-term context-free work

This is still a very general problem, subsuming much of organization design.

To make the problem statement simpler and more concrete, we focus on the case of short-term contributions. That is, we would like to know what problems we can solve if we have many workers, but each is only available for a short time period, say 15 minutes, before they leave the pool of workers, never to return. (Simplification 2)

This simplification could be justified by thinking about what kinds of contributions would be most practical as an input into truly commoditized cognitive work, but the main reason we are making it is in preparation for how we will apply ML later.

In this setting, no single person can build up much context, and so none of them can individually do well at any of the tasks above.

Here are some tasks that each of our workers can easily do:

  • “Read these three sentences and tell us whether the first two imply the third one.”
  • “Look at these two proposed ideas and give us your quick impression of how similar they are.”
  • “Look at the first step in this plan towards that goal and see if you can come up with one way it could go wrong.”

Can we compose such “local” tasks in a way that lets us solve complex problems if we just use enough of them?

Coordination of short-term work as algorithm design

For concreteness, we use H to refer to a helpful human who has 15 minutes to help us out. H majored in computer science in college, but otherwise doesn’t have any special knowledge besides instructions and training we may provide ahead of time. H is motivated to help us to the best of her ability.

We interact with H using a computer terminal. We send a task string to H, and before her 15 minutes are up, H types an answer into the terminal (otherwise the terminal returns “timeout” to us). This lets us think about H as a function from strings to strings. By talking about a particular H, we factor out differences between humans from the question of interest. This is a useful abstraction for thinking about the problem, even if we have to implement H using humans that do differ. (Simplification 3)
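The terminal abstraction can be sketched as a wrapper that turns a worker into a stateless string-to-string function. Everything here is illustrative: `make_H`, the stand-in worker, and mapping failures to "timeout" are assumptions for the sketch, not part of the original setup; a real implementation would route tasks to a human at a terminal and enforce the 15-minute limit.

```python
from typing import Callable

TIMEOUT = "timeout"  # what the terminal returns if the 15 minutes run out

def make_H(worker: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a worker as a stateless function from task strings to answer
    strings. A real implementation would send the task to a terminal and
    enforce the time limit; here, any failure simply maps to TIMEOUT."""
    def H(task: str) -> str:
        try:
            return worker(task)
        except Exception:
            return TIMEOUT
    return H

# Stand-in worker, for demonstration only.
H = make_H(lambda task: f"(quick answer to: {task})")
print(H("Do these two ideas seem similar?"))
```

Treating H as a pure function is what makes the rest of the discussion an algorithm design problem: any two calls with the same task string are interchangeable.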

Using this terminology, our question is: If we can make many calls to this stateless function H, can we mechanically compose them to accomplish complex tasks that H cannot do?

This is a question about algorithm design. We are looking for an algorithm f that takes a task string x and a number n that controls how many calls to H to make. When executed, this algorithm interleaves arbitrary computation with up to n calls to H, and finally returns a solution that depends on what it queried H on, how she responded, and how f processed her responses.
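A minimal sketch of such an f, assuming a toy decomposition strategy (ask H for subtasks, solve each recursively, ask H to combine the answers). The strategy, the prompt strings, and the depth limit are all hypothetical illustrations; the only structural commitments from the text are the signature f(x, n) and the budget of at most n calls to H.

```python
from typing import Callable

def f(x: str, n: int, H: Callable[[str], str]) -> str:
    """Solve task x using at most n calls to the stateless helper H.
    The decomposition scheme (ask H for subtasks, solve each with a share
    of the budget, ask H to combine) is purely illustrative."""
    budget = [n]  # mutable call counter shared across the recursion

    def call(task: str) -> str:
        if budget[0] <= 0:
            return "timeout"
        budget[0] -= 1
        return H(task)

    def solve(task: str, depth: int) -> str:
        if depth == 0 or budget[0] <= 1:
            return call(task)  # answer directly, no further decomposition
        subtasks = call(f"List two subtasks for: {task}").split(";")
        answers = [solve(s.strip(), depth - 1) for s in subtasks]
        return call(f"Combine {answers} into an answer to: {task}")

    return solve(x, depth=2)

# Stand-in for H, for demonstration only.
demo_H = lambda t: "a; b" if t.startswith("List") else f"answer({t[:15]})"
print(f("Invest $100k to achieve the most social good", 20, demo_H))
```

The interesting question is not this particular recursion but whether any fixed f of this general shape can keep improving its answers as n grows.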

Matching the quality of any other approach to solving cognitive tasks

Our goal is to find approaches that scale, i.e. that get better as we increase the number of calls to H. It would be ideal to make this goal more precise. We could try to ask a question like this:

Is there a simple algorithm f such that, for all tasks x and all alternative solution methods g, there is a number n of sub-calls to H such that f(x, n) solves x at least as well as g?
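Schematically, with "quality" left as an informal, task-relative notion rather than a defined function, the quantifier structure of this question is:

```latex
\exists f \;\; \forall x \;\; \forall g \;\; \exists n : \quad
\mathrm{quality}\big(f(x, n)\big) \;\ge\; \mathrm{quality}\big(g(x)\big)
```

Making $\mathrm{quality}$ precise is exactly where this formalization breaks down.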

However, this is not a crisp technical question. Most critically, evaluating the quality of a solution is itself a difficult cognitive problem—for complex tasks, a human can’t look at two proposed solutions and decide which is better. In addition, there are concerns about the empirical content of f and g, which may include built-in solutions to some tasks. A potential path towards a crisp question could rely on evaluating solutions based on our subjective expectation (about what would happen if we thought longer, etc.), but that in turn depends on a process for idealized deliberation, which is more or less what we are trying to come up with in the first place.

At this point, it seems that the best we can do is ask an informal question:

Is it possible to compose short and mostly context-free tasks to solve any cognitive problem at arbitrarily high quality, if we just compose enough of them and in the right way?

If this were the case, it would imply that H’s capability is above a kind of universality threshold.

It is reasonable to expect that this isn’t the case. If it were, then whatever institutions, systems, and tools humanity might develop in the future, the simple process composed of H and f would encompass or match their ability. More prosaically, it would show that any task that involves human learning (perhaps over the course of years) can also be done without such learning.

Even if it turns out to be theoretically or practically infeasible to build systems that satisfy this strong notion of scalability, or if the notion itself is incoherent, aiming for it still seems useful: it can surface ideas that lead to systems that scale well in practice, even if those systems ultimately encounter bounds.

The other reason not to reject this goal immediately is that we’re happy to consider very large numbers of calls to H, so we can use strategies that would be very expensive to implement directly, including strategies that involve sophisticated cognitive systems built out of H-shaped pieces. Such strategies will become useful once we start to automate human labor using machine learning.

Applying machine learning to cognitive tasks

We want to apply machine learning to fuzzy cognitive tasks like the ones mentioned at the beginning. I have motivated machine learning for deliberation elsewhere, so I will focus on scalability here: We want the results to get better as the ML components we use (such as supervised learning or reinforcement learning) improve. As a special case, sufficiently advanced ML should lead to outputs that are more helpful than what any human could produce.

By “better ML” or “more advanced ML”, I refer to a cluster of properties that roughly factors into three parts:

  • Better priors: more flexible/abstract/hierarchical internal representations and inductive biases that allow learners to quickly build accurate models of relevant aspects of the world, and more task-relevant prior knowledge.
  • Better inference: algorithms for belief revision and planning that more closely approximate what an ideal reasoner would do, using exact Bayesian inference and Bayes-optimal planning, or perhaps some form of idealized logical induction.
  • Better training paradigms, as long as they don’t fundamentally change the learning problem; e.g., active learning, training on adversarial examples.

In practical implementations, these aspects will probably be intertwined, but this taxonomy is still useful for analyzing algorithms from the outside.

Approaches that don't scale

Here are two ways to apply ML to cognitive tasks in an “end-to-end” fashion, and why they wouldn’t scale. These are straw men insofar as I don’t expect sophisticated algorithms to be applied as part of such simplistic schemes, but they are still instructive:

  • We train supervised learning algorithms on (task, solution) pairs. This doesn’t scale because we don’t know how to generate the training data. We can generate a list of tasks, but we can’t generate solutions of arbitrarily high quality, and so our training data is limited to whatever quality humans can achieve.
  • Reinforcement learning algorithms receive a task as input, generate a solution, and then we generate a reward signal based on how good the solution seems. This doesn’t scale, since optimizing for how good something seems doesn’t optimize for actual goodness.

Of course, current algorithms wouldn’t learn to solve any interesting cognitive tasks in the first place, but it is notable that, if we followed the procedures above, even much more sophisticated algorithms wouldn’t be as helpful as we might hope. (This is in addition to more esoteric failure modes that sophisticated algorithms could run into.)

An approach that might scale

How else could we apply ML to cognitive tasks? It is likely that this will become clearer as the field develops new techniques, and indeed is probably contingent on what these new techniques will look like. In the meantime, it seems worthwhile to consider whether there are scalable ways to apply ML to cognitive tasks if future algorithms essentially look like today’s, only better along the dimensions outlined above. (See also: Prosaic AI Alignment)

If we could break down arbitrary problem-solving into small steps, such as the 15-minute tasks that our human H can do, we would be in a much better position. In that case, we could run the following procedure (Iterated Distillation-Amplification):

  1. Initialize fast ML agent A randomly.
  2. Repeat:
    a. Build a relatively slow system that involves H executing a single step, with the ability to make multiple calls to A during that step (Amplification)
    b. Retrain A to quickly replicate the behavior of the slow system, e.g. using imitation learning, RL, or IRL (Distillation)

By repeating 2a and 2b, we create a yet better slow system, train A to be a fast copy of that yet better system, and so on, at each iteration giving H access to the most advanced version of A. This way, we could hope to saturate the capability of any given ML system. Unlike the end-to-end approaches above, this procedure is scalable with respect to its ML components.
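A toy rendition of this loop, under loud assumptions: memoization over a fixed task list stands in for distillation, and a scripted function stands in for the human H. All names (`amplify`, `distill`, the `step(...)` format) are illustrative, not part of the original proposal.

```python
from typing import Callable, Dict, List

def amplify(H: Callable[[str, Callable[[str], str]], str],
            A: Dict[str, str]) -> Callable[[str], str]:
    """Slow system: H executes one step, consulting the current fast agent A.
    Here H is simulated; really it would be a human with terminal access to A."""
    def slow(task: str) -> str:
        return H(task, lambda sub: A.get(sub, ""))  # H may query A on subtasks
    return slow

def distill(slow: Callable[[str], str], tasks: List[str]) -> Dict[str, str]:
    """Fast agent: retrained to replicate the slow system. A lookup table
    stands in for imitation learning / RL / IRL."""
    return {t: slow(t) for t in tasks}

# Scripted stand-in for H: wraps its own answer around whatever hint A gives.
def H(task: str, ask_A: Callable[[str], str]) -> str:
    return f"step({task}|{ask_A(task)})"

A: Dict[str, str] = {}          # 1. initialize the fast agent
tasks = ["t1", "t2"]
for _ in range(3):              # 2. repeat:
    slow = amplify(H, A)        #    a. build the slow H+A system
    A = distill(slow, tasks)    #    b. retrain A to imitate it

print(A["t1"])
```

Each pass wraps one more layer of H's work around the previous agent's answer, which is the sense in which the H+A system keeps improving while A stays fast.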

However, this can only work if it is possible to break down long-term problem solving into small and mostly context-free steps, and it is only scalable if we can reach arbitrary solution quality by assembling sufficiently many of these steps. So, we are back to our question about organizing human cognitive labor, but with better justification for the three simplifications we made (no principal-agent problem, short tasks, no differences between humans): Is there a mechanism for composing local work that is scalable, such that we can reach arbitrary levels of capability if we just compose enough such work and in the right way?