At Ought, our mission is to leverage machine learning to help people think. We want to support people in thinking through personal questions, such as “How can I find a boyfriend?” “How can I be happier?” or “How can I find a better job?” but also bigger issues, such as “How can we effectively prevent global warming?” “What should we do about risks from nuclear weapons?” or “When will we see human-level AI?”
Natural language is currently our best interface for communicating thoughts to other humans, and dialog is one of our most powerful tools for collaborative problem-solving. For this reason, we’re interested in building automated systems that use dialog to communicate with their users in order to help them think through issues they care about.
At the same time, we expect ML techniques to be limited in their capacity for true language understanding and deep reasoning for many years. Any system we build will critically depend on human support, at least in the near future. When human helpers support dialogs, we want them to get paid for their work. We view markets as a particularly efficient way to organize and incentivize work, including cognitive work, and so are planning to build a market around contributions to dialogs.
Given the goals above, we need to answer two questions to make the proposal concrete:
- How do we structure the interaction between ML systems, human helpers, and users? What exactly do we automate, and what form do human contributions take?
- What does the market look like? Who pays how much to whom?
The two questions correspond to our two core technologies:
- Automating dialogs by learning cognitive actions on a shared workspace
As a first stab at automation, one could train ML algorithms to predict responses based on recorded human dialogs. However, this training procedure provides only very remote supervision for dialogs that require serious thinking between any two messages. We therefore associate each dialog with a “workspace” that makes this thinking explicit. Workspaces are expressed in natural language and contain structured notes about the dialog so far, ongoing considerations, and things to discuss and investigate in the future. A diverse crowd of participants, including humans, ML algorithms, and domain-specific tools, can edit the workspaces. Data about operations on these workspaces then provides more fine-grained supervision for ML algorithms.
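As an illustration, a workspace and its edit log could be represented roughly as follows. This is a hypothetical sketch; the class names, section names, and author labels are invented for exposition, not an actual data model:

```python
# Hypothetical sketch of a workspace with an edit log. Each recorded edit
# pairs a workspace state with an operation on it, which is the kind of
# fine-grained supervision the text describes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Edit:
    author: str    # e.g. "human:alice" or "bot:summarizer-v1" (invented labels)
    section: str   # which part of the workspace was edited
    old_text: str
    new_text: str

@dataclass
class Workspace:
    dialog_id: str
    sections: dict = field(default_factory=lambda: {
        "notes": "",           # structured notes about the dialog so far
        "considerations": "",  # ongoing considerations
        "todo": "",            # things to discuss and investigate next
    })
    edit_log: List[Edit] = field(default_factory=list)

    def apply(self, edit: Edit) -> None:
        """Apply an edit and record it; the log becomes training data."""
        assert self.sections[edit.section] == edit.old_text
        self.sections[edit.section] = edit.new_text
        self.edit_log.append(edit)

ws = Workspace(dialog_id="d1")
ws.apply(Edit("human:alice", "todo", "", "Clarify what 'better job' means."))
# Each (workspace state, edit) pair is one fine-grained training example.
```

Humans and bots would share this interface: both kinds of participants emit the same edit operations, so the same log can supervise models that imitate human edits.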
- Markets for microtasks with uncertain rewards
To incentivize humans to contribute to the workspaces mentioned above, users can pledge rewards towards their questions. A market mechanism distributes these rewards to contributors based on how helpful their contributions are. This is challenging: The overhead of evaluating many small, diverse contributions can easily exceed the value of the contributions themselves, and it’s unclear how to assign value to contributions in the first place. To address this, we’ll only evaluate occasionally, but then in great depth, and will otherwise use ML to predict the value of contributions based on cheap features.
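A minimal sketch of this evaluation policy, assuming a fixed deep-evaluation rate and a linear model as a stand-in for any regressor trained on past deep evaluations (all function and field names here are invented):

```python
# Sketch: evaluate a small random fraction of contributions in depth,
# and predict the value of the rest from cheap features.
import random

def cheap_features(contribution):
    # Placeholder features; a real system would use richer, learned features.
    return [len(contribution["text"]), contribution["author_reputation"]]

def predict_value(model, contribution):
    # Linear model standing in for any trained value predictor.
    return sum(w * x for w, x in zip(model, cheap_features(contribution)))

def assign_reward(contribution, model, deep_eval, eval_rate=0.05, rng=random):
    if rng.random() < eval_rate:
        # Expensive, occasional: a full in-depth evaluation. In practice,
        # its result would also be added to the predictor's training set.
        return deep_eval(contribution)
    # Cheap, usual case: predicted value from cheap features.
    return predict_value(model, contribution)
```

The key trade-off is that the expensive evaluation happens rarely enough that its overhead stays below the value of the contributions, while still producing ground truth to keep the predictor calibrated.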
Taken together, this forms a Dialog Market — a mechanism for creating high-quality conversations that resolve vague questions. (This idea was first described in this tech report.)
There is synergy between the two technologies:
- Human contributions are training data for ML
Setting up a market that rewards human contributions solves the problem of generating training data for ML algorithms.
- Rewards incentivize automation
Indeed, the market will incentivize the creation of helpful algorithms as well, including bots based on ML and more domain-specific rule-based bots.
- Rewards enable reinforcement learning
A major advantage of the market setting is that monetary rewards can serve as reward signals for RL agents. In my discussion of dialog automation, I focused on imitating human behavior using supervised learning, but in the long run, optimizing reward directly is what enables superhuman performance.
- Expected rewards let bots decide when to abstain
By making a prediction about whether an action will receive positive or negative reward in expectation, bots can make principled decisions about whether or not to act.
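As a toy illustration, such an abstention rule could look like the following (the function and its parameters are hypothetical, not part of any described system):

```python
# Act only if the predicted reward, net of the cost of acting,
# is positive in expectation.
def should_act(p_positive, reward_if_positive, penalty_if_negative, cost=0.0):
    expected = (p_positive * reward_if_positive
                - (1 - p_positive) * penalty_if_negative)
    return expected - cost > 0

should_act(0.8, 1.0, 1.0)  # expectation 0.6 > 0, so act
should_act(0.3, 1.0, 1.0)  # expectation -0.4, so abstain
```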
- For deep evaluation, recurse
When we evaluate a contribution in depth to determine how much to pay, we can start a dialog that asks “How much should we pay for this contribution?” and use the entire machinery including crowdsourcing, automation, and reward assignment on a meta-level. The contents of such dialogs can support training the system that is responsible for predicting rewards when we don’t do deep evaluations.
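One way to picture this recursion is the toy sketch below, where `run_dialog` and `heuristic_value` are self-contained stand-ins for the real machinery (not actual APIs), and a depth bound ensures the meta-evaluation terminates:

```python
# Toy sketch of recursive evaluation: a deep evaluation is itself a dialog,
# handled by the same machinery, with recursion cut off at a fixed depth.
from dataclasses import dataclass, field

@dataclass
class Contribution:
    text: str
    reward: float = 0.0

@dataclass
class Dialog:
    answer: float
    contributions: list = field(default_factory=list)

def heuristic_value(c: Contribution) -> float:
    # Stand-in for the cheap reward predictor used without deep evaluation.
    return len(c.text) / 10

def run_dialog(question: str, about: Contribution) -> Dialog:
    # Stand-in: the meta-dialog "settles" on the heuristic price and
    # contains one contribution (a justification of that price).
    return Dialog(answer=heuristic_value(about),
                  contributions=[Contribution("justification: " + about.text)])

def evaluate(c: Contribution, depth=0, max_depth=2) -> float:
    if depth >= max_depth:
        return heuristic_value(c)  # stop recursing, fall back to predictor
    meta = run_dialog("How much should we pay for this contribution?", about=c)
    for mc in meta.contributions:  # meta-contributors get rewarded too
        mc.reward = evaluate(mc, depth + 1, max_depth)
    return meta.answer
```

The point of the sketch is only the shape of the recursion: rewarding contributions to the meta-dialog uses the same `evaluate` call, and the transcripts it produces can train the predictor used when no deep evaluation happens.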
We expect to have three kinds of users:
- People asking questions will mostly access Ought through a mobile app. From their perspective, they are simply having a chat with a (hopefully) surprisingly helpful partner. To incentivize more work on their particular conversations, they can pledge rewards which the market system then distributes over all actions that are taken on the corresponding workspaces.
- Human thinkers and experts can browse ongoing dialogs using a web interface and earn rewards by making helpful edits to the workspaces. They may be required to digitally sign an NDA to preserve the privacy of the people asking questions. Since loading a new dialog into memory imposes significant task-switching costs, we can expect the same contributors to make multiple sequential edits to the same workspace.
- Programmers can similarly earn rewards by writing bots that reliably make helpful contributions to the workspaces.
Building a Dialog Market is a long-term project. To achieve its potential, advances in several ML technologies will be required. Why start now?
- We want to be confident that — by the time such advances happen — they will help people solve their most challenging problems. Developing significant infrastructure takes time, so it’s good to start early.
- The pieces built along the way will have value before the full vision is realized. Dialog markets could probably function with only human participants, and can leverage even the sort of rudimentary automation available today.
In summary, we want to build a tool for thinking that becomes increasingly useful as ML advances, starting from today’s basic techniques all the way to human-level AI and beyond.
Learn more about dialog markets here:
Thanks to Paul Christiano, Noah Goodman, Owen Cotton-Barratt, Owain Evans, Daniel Hawthorne, Natalie Schaworonkow, Frauke Harms, and Long Ouyang for helpful comments on this sequence of posts.