AI Safety Needs Great Product Builders

In his AI Safety Needs Great Engineers post, Andy Jones explains how software engineers can reduce the risks of unfriendly artificial intelligence. Even without deep ML knowledge, these developers can work effectively on the challenges involved in building and understanding large language models.

I would broaden the claim: AI safety doesn’t need only great engineers – it needs great product builders.

This post will describe why, list some concrete projects for a few different roles, and show how they contribute to AI going better for everyone.


This post is aimed at anyone who has been involved with building software products: web developers, product managers, designers, founders, devops, generalist software engineers, … I’ll call these product builders.

Non-technical roles (e.g. operations, HR, finance) do exist in many organisations focussed on AI safety, but this post isn’t aimed at them.

But I thought I would need a PhD!

In the past, most technical AI safety work was done in academia or in research labs. This is changing because – among other things – we now have concrete ideas for how to construct AI in a safer manner.

However, it’s not enough for us to merely have ideas of what to build. We need teams of people to partner with these researchers and build real systems, in order to:

  • Test whether they work in the real world.
  • Demonstrate that they have the nice safety features we’re looking for.
  • Gather empirical data for future research.

This strand of AI safety work looks much more like product development, which is why you – as a product builder – can have a direct impact today.

Example projects, and why they’re important

To prove there are tangible ways that product builders can contribute to AI safety, I’ll give some current examples of work we’re doing at Ought.

For software engineers

In addition to working on our user-facing app, Elicit, we recently open-sourced our Interactive Composition Explorer (ICE).

ICE is a tool to help us and others better understand Factored Cognition. It consists of a software framework and an interactive visualiser:

ICE visualiser

On the back-end, we’re looking for better ways to instrument the cognition “recipes” such that our framework stays out of the user’s way as much as possible, while still giving a useful trace of the reasoning process. We’re using some meta-programming, and having good CS fundamentals would be helpful, but there’s no ML experience required. Plus, working on open-source projects is super fun!

If you are more of a front-end developer, you’ll appreciate that representing a complex deductive process is a UX challenge as much as anything else. These execution graphs can be very large, cyclic, oddly and unpredictably shaped, and each node can contain masses of information. How can we present this in a useful UI which captures the macro structure and still allows the user to dive into the minutiae?

This work is important for safety because AI systems that have a legible decision-making process are easier to reason about and more trustworthy. On a more technical level, Factored Cognition looks like it will be a linchpin of Iterated Distillation and Amplification – one of the few concrete suggestions for a safer way to build AI.

For product managers

At first, it might not be obvious how big an impact product managers can have on AI safety (the same goes for designers). However, interface design is an alignment problem – and it’s even more neglected than other areas of safety research.

You don’t need a super technical background, or to already be steeped in ML. The competing priorities we face every day in our product decisions will be fairly familiar to experienced PMs. Here are some example trade-offs that we regularly navigate for our app, Elicit:

  • Deliver terse, valuable insights to users vs. expose the inner workings of the system. Simple answers and compact summaries make our product feel magical and save users’ time. Revealing what is happening under the covers makes our product more trustworthy.
  • Follow familiar product paradigms vs. imagine radical new workflows. Users get more immediate value from interfaces which feel familiar. Language models offer opportunities for novel interaction styles which might be more valuable in the medium term.
  • Get users quick answers to their questions vs. help users carefully navigate a complex task. Quick answers let our users power through well-defined tasks. Perhaps we offer more lasting value by acting as a reasoning assistant – it certainly better fits our overall mission.
  • Compensate for limitations in today’s language models vs. build functionality which scales to powerful future models. Language models have various known limitations, and our current product is valuable to users when we work around those limitations. Limitations are going to retreat and change with every improvement to language models. We want our work to be super-charged by change, rather than deprecated by it.

Elicit product questions

In my opinion, the main thing which makes product management at Ought different from other places is that we are working on the frontier of new technology. We regularly dream up features which turn out to just not be possible – even using today’s most powerful models. Our product direction isn’t only informed by our strategy and our users, it’s also influenced by what can be realised on the cutting edge of machine learning.

Great product managers can have an enormous impact on AI safety by helping us find the right balance between:

  1. Proving our app is useful, and
  2. Proving our approach is safer

If we lean too much towards adding crowd-pleasing widgets to our product, we won’t make enough progress on our mission to prove out process-based systems. On the other hand, if we lean too much towards researching process-based systems, we will – at best – prove they’re theoretically interesting rather than actually useful.

For infrastructure engineers

Because of the nature of the research we’re doing and the design of the product we’re building, we’re hitting a bunch of different ML APIs to do different jobs in different places.

Some of the models we use for different tasks in Elicit

At the moment, we don’t have the infrastructure to record all of these interactions with 3rd party and internal models. We’d like to insert an abstraction layer between our code and the ML models to achieve a few things:

  • Fall back to alternatives when a model goes down or has increased latency (this is still quite common with 3rd party models).
  • Build up datasets of task/result pairs, which we can then use to train our own models in the future. These training data would, by their nature, be representative of the queries our users are interested in. We could even enrich the dataset with user feedback from the app: if someone marked one of our answers as “not good”, we could use that to improve the models on the next iteration.
  • Split traffic between a couple of different back-end models, so that we can compare them head-to-head. We’d like to understand how operational metrics like latency, throughput, and error rate compare between different APIs, using real traffic.

Implementing this abstraction layer would be challenging work. When running in our production environment, performance and reliability are critical. In contrast, for internal experiments we’d like the system to be highly flexible and easy to modify. There are some tools for tracking datasets and versioning, but it’s a relatively immature space compared to conventional infrastructure: there’s not an enormous toolchain for you to learn!

This is a key component of our AI safety work because building up these datasets of task/result pairs is one way to do the “Distillation” part of Iterated Distillation and Amplification. For example, imagine we have a complex and expensive Factored Cognition process running in our infrastructure. After 1,000, 10,000, or 100,000 examples we could hot-swap in a new model trained on the real data captured at your abstraction layer.
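To make the three goals above a little more concrete, here is a minimal Python sketch of what such an abstraction layer might look like. It is only an illustration under stated assumptions: the names `ModelRouter`, `CallRecord`, `complete`, and `dataset` are hypothetical and not taken from Ought’s code, and each “backend” is just a plain callable standing in for a real 3rd party or internal model API.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a backend is any callable that maps a task prompt to a result.
Backend = Callable[[str], str]


@dataclass
class CallRecord:
    """One logged interaction with a model (hypothetical schema)."""
    task: str
    result: Optional[str]
    backend: str
    latency_s: float
    error: Optional[str] = None
    feedback: Optional[str] = None  # e.g. "not good", set later from the app


@dataclass
class ModelRouter:
    """Routes calls to backends, with fallback, traffic splitting and logging."""
    backends: Dict[str, Backend]                 # insertion order = fallback order
    weights: Optional[Dict[str, float]] = None   # optional traffic split
    rng: random.Random = field(default_factory=random.Random)
    log: List[CallRecord] = field(default_factory=list)

    def _order(self) -> List[str]:
        """Pick which backend to try first; the rest remain as fallbacks."""
        names = list(self.backends)
        if self.weights:
            split = [self.weights.get(name, 0.0) for name in names]
            first = self.rng.choices(names, weights=split)[0]
            names.remove(first)
            names.insert(0, first)
        return names

    def complete(self, task: str) -> str:
        """Try backends in order, recording latency and errors for each try."""
        for name in self._order():
            start = time.perf_counter()
            try:
                result = self.backends[name](task)
            except Exception as exc:  # fall back on any backend failure
                elapsed = time.perf_counter() - start
                self.log.append(CallRecord(task, None, name, elapsed, error=repr(exc)))
                continue
            elapsed = time.perf_counter() - start
            self.log.append(CallRecord(task, result, name, elapsed))
            return result
        raise RuntimeError("all backends failed")

    def dataset(self) -> List[Tuple[str, str]]:
        """Task/result pairs from successful calls not marked "not good"."""
        return [
            (r.task, r.result)
            for r in self.log
            if r.error is None and r.result is not None and r.feedback != "not good"
        ]
```

In this sketch, the log doubles as the source for operational metrics (latency and error rate per backend) and for the training dataset. A production version would presumably persist records to durable storage and handle timeouts explicitly rather than relying on exceptions alone.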

Why would you do this?

Hopefully the examples above show that there is a wide range of product building work which can help with AI safety. But there are lots of exciting opportunities out there! What makes this area so special?

Here are some of the things that excite me the most about working on AI safety:

Impactful work

80,000 Hours rates AI safety as one of its highest priority causes because the risks are astronomical, there are tangible things we can do to help, and talent is currently the main constraint.

The people are fantastic

Because it’s a nascent field with an altruistic bent, your colleagues and peers will generally have strong prosocial motivations. They will be gifted, dedicated, interesting team mates who aren’t just doing it for a paycheck.

Talking of paychecks…

Many of the organisations working on AI safety (including Ought!) are well-funded. You can work on something important without needing to sacrifice your creature comforts, or priorities like Earning To Give.

Interesting, challenging work every day

Nowadays, a lot of product development work is highly commoditised. We have enough frameworks, methodologies, tools, and services to make software projects feel like building a Lego set, rather than a truly novel challenge. This is not yet true with AI. New models, architectures, and techniques are being developed all the time, and there’s a tight feedback loop between academia and industry.

In fact, if you fit the profile of the audience of this post, and the points I just made above resonate with you, you might find that AI safety is your Ikigai – I did!

What to do now?

If this post has resonated with you and you’d like to know more about Ought:

If you’re interested in working in AI safety more generally:

  • Preventing an AI-related catastrophe is a comprehensive and up-to-date overview of the cause area.
  • 80,000 Hours also offers 1-1 advice to people looking to move to a high-impact career.
  • We’re starting a reading group for the same sorts of people that this post is aimed at. You can register your interest here!

My thanks to Jungwon Byun, Andreas Stuhlmüller, Odette Brady, Eric Arellano, Jess Smith, and Maggie Appleton for their contributions to this post.

A Library and Tutorial for Factored Cognition with Language Models

We want to advance process-based supervision for language models. To make it easier for others to contribute to that goal, we've released code for writing compositional language model programs and a tutorial that explains how to get started:

We've been using ICE as part of our work on Elicit and have found it useful in practice.

Interactive Composition Explorer (ICE)

ICE is an open-source Python library for writing, debugging, and visualizing compositional language model programs. ICE makes it easy to:

  1. Run language model recipes in different modes: humans, human+LM, LM
  2. Inspect the execution traces in your browser for debugging
  3. Define and use new language model agents, e.g. chain-of-thought agents
  4. Run recipes quickly by parallelizing language model calls
  5. Reuse component recipes such as question-answering, ranking, and verification

ICE looks like this:

ICE Screenshot

Factored Cognition Primer

The Factored Cognition Primer is a tutorial that explains (among other things) how to:

  1. Implement basic versions of amplification and debate using ICE
  2. Reason about long texts by combining search and generation
  3. Run decompositions quickly by parallelizing language model calls
  4. Use verification of answers and reasoning steps to improve responses
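As a rough illustration of the ideas in this list, the sketch below shows one amplification-style step in plain Python: answer a set of subquestions in parallel, then compose the final answer from them. It deliberately does not use ICE’s API. The function `answer_with_subquestions` and the `stub_agent` stand-in for a language model (or human) are hypothetical names chosen for this example, and the subquestions are supplied by the caller rather than generated by a model.

```python
import asyncio
from typing import Awaitable, Callable, List

# Assumption: an agent is any async callable that maps a prompt to an answer.
Agent = Callable[[str], Awaitable[str]]


async def stub_agent(prompt: str) -> str:
    """Stand-in for a language model call; replace with a real agent."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<answer to: {prompt}>"


async def answer_with_subquestions(
    question: str, subquestions: List[str], agent: Agent
) -> str:
    """Answer subquestions concurrently, then answer the main question."""
    # The subquestion calls are independent, so they can run in parallel.
    sub_answers = await asyncio.gather(*(agent(q) for q in subquestions))
    # Each intermediate step stays visible as plain question/answer text.
    context = "\n".join(
        f"Q: {q}\nA: {a}" for q, a in zip(subquestions, sub_answers)
    )
    return await agent(f"{context}\n\nUsing the answers above, answer: {question}")
```

Calling `asyncio.run(answer_with_subquestions(question, subquestions, stub_agent))` runs the whole step. Because every intermediate question and answer is ordinary text, the reasoning process can be inspected, and each step could be supervised separately.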

The Primer looks like this:

Primer Screenshot

If you end up using either, consider joining our Slack. We think that factored cognition research parallelizes unusually well and would like to collaborate with others who are working on recipes for cognitive tasks.

To learn more about how we've been using ICE, watch our recent Factored Cognition lab meeting.

How to use Elicit responsibly

Elicit has gotten exciting coverage on Twitter over the last few days, leading to an influx of new users [1, 2, 3]. Welcome! We’re so excited to have you and grateful for your interest.

Alongside the overwhelmingly positive response, some people wisely pointed out the need for more transparency about who is building Elicit, how it works, and where it doesn’t. We’ll start the conversation with this note, but we expect this will be an ongoing dialogue.


The Plan for Elicit

Ought is an applied machine learning lab. We’re building Elicit, the AI research assistant. Our mission is to automate and scale open-ended reasoning. To get there, we train language models by supervising reasoning processes, not outcomes. This is better for reasoning capabilities in the short run and better for alignment in the long run.

In this post, we review the progress we’ve made over the last year and lay out our plan.

Progress in 2021:

  1. We built Elicit to support researchers because high-quality research is a bottleneck to important progress and because researchers care about good reasoning processes.
  2. We identified some building blocks of research (e.g. search, summarization, classification), operationalized them as language model tasks, and connected them in the Elicit literature review workflow.
  3. On the infrastructure side, we built a streaming task execution engine for running compositions of language model tasks. This engine is supporting the literature review workflow in production.
  4. About 1,500 people use Elicit every month.

Roadmap for 2022+:

  1. We expand literature review to digest the full text of papers, extract evidence, judge methodological robustness, and help researchers do deeper evaluations by decomposing questions like “What are the assumptions behind this experimental result?”
  2. After literature review, we add other research workflows, e.g. evaluating project directions, decomposing research questions, and augmented reading.
  3. To support these workflows, we refine the primitive tasks through verifier models and human feedback, and expand our infrastructure for running complex task pipelines, quickly adding new tasks, and efficiently gathering human data.
  4. Over time, Elicit becomes a general-purpose reasoning assistant, transforming any task involving evidence, arguments, plans and decisions.

Supervise Process, not Outcomes

We can think about machine learning systems on a spectrum from process-based to outcome-based:

  • Process-based systems are built on human-understandable task decompositions, with direct supervision of reasoning steps.
  • Outcome-based systems are built on end-to-end optimization, with supervision of final results.

This post explains why Ought is devoted to process-based systems. The argument is:

  1. In the short term, process-based ML systems have better differential capabilities: They help us apply ML to tasks where we don’t have access to outcomes. These tasks include long-range forecasting, policy decisions, and theoretical research.
  2. In the long term, process-based ML systems help avoid catastrophic outcomes from systems gaming outcome measures and are thus more aligned.
  3. Both process- and outcome-based evaluation are attractors to varying degrees: Once an architecture is entrenched, it’s hard to move away from it. This lock-in applies much more to outcome-based systems.
  4. Whether the most powerful ML systems will primarily be process-based or outcome-based is up in the air.
  5. So it’s crucial to push toward process-based training now.

There are almost no new ideas here. We’re reframing the well-known outer alignment difficulties for traditional deep learning architectures and contrasting them with compositional approaches. To the extent that there are new ideas, credit primarily goes to Paul Christiano and Jon Uesato.

We only describe our background worldview here. In a follow-up post, we’ll explain why we’re building Elicit, the AI research assistant.
