Our goal is to automate and scale open-ended reasoning with language models—synthesizing evidence and arguments, designing research plans, and evaluating interventions.
We’re starting by automating literature reviews.
Today, Elicit users find academic papers, ask questions about them, and summarize their findings.
After literature review, we’ll expand to other research tasks (evaluating project directions, decomposing research questions, augmented reading), then beyond research (supporting organizational planning, individual decision-making).
Robust, well-reasoned research is the bottleneck for many impactful interventions and decisions. Language models can address this bottleneck by reading and evaluating more research, evidence, and reasoning steps than humanly possible.
Just as programming languages provide building blocks for exact computation, language models can provide the building blocks of cognitive work (e.g., search, extraction, classification, summarization). With Elicit, we plan to study researchers, identify and build out these blocks, and then surface them to users so they can string them together and, over time, automate their cognitive workflows.
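To make the building-block idea concrete, here is a minimal Python sketch, assuming each block is an ordinary function and a workflow is simply a list of blocks run in order. The `search`, `extract`, and `summarize` functions, the toy `CORPUS`, and the `Step`/`run_pipeline` helpers are all hypothetical illustrations, not Elicit's actual API. In a real system each block would presumably call a language model rather than doing keyword matching and string joining.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical toy corpus standing in for a database of academic papers.
CORPUS = [
    {"title": "Creatine and working memory", "finding": "small improvement in recall"},
    {"title": "Creatine in older adults", "finding": "no significant effect"},
    {"title": "Sleep and learning", "finding": "sleep aids consolidation"},
]


def search(query: str) -> list[dict]:
    """Hypothetical block: keyword search over titles (stand-in for a retrieval model)."""
    terms = query.lower().split()
    return [p for p in CORPUS if any(t in p["title"].lower() for t in terms)]


def extract(papers: list[dict]) -> list[str]:
    """Hypothetical block: pull the key finding out of each paper."""
    return [f"{p['title']}: {p['finding']}" for p in papers]


def summarize(findings: list[str]) -> str:
    """Hypothetical block: condense the findings into one line."""
    return f"{len(findings)} papers: " + "; ".join(findings)


@dataclass
class Step:
    """One named building block in a workflow."""
    name: str
    fn: Callable[[Any], Any]


def run_pipeline(steps: list[Step], data: Any) -> tuple[Any, list[tuple[str, Any, Any]]]:
    """Run blocks in order, feeding each output into the next block.

    Returns the final output plus a trace of (step name, input, output)
    so every intermediate step can be inspected on its own.
    """
    trace = []
    for step in steps:
        out = step.fn(data)
        trace.append((step.name, data, out))
        data = out
    return data, trace


# Usage: a user "strings together" three blocks into a tiny review workflow.
workflow = [Step("search", search), Step("extract", extract), Step("summarize", summarize)]
result, trace = run_pipeline(workflow, "creatine memory")
print(result)
```

Keeping the trace of every step is one design choice that fits the compositional idea: each intermediate result is small and independently meaningful, so it can be checked on its own rather than only judging the final summary.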
If we succeed, we will make researchers vastly more productive and accurate. We will also help non-experts apply good research and reasoning practices when discovering, consuming, and generating information.
Elicit's architecture is based on factored cognition: the composition of small, independently meaningful pieces of cognition. While we’re building this architecture in the context of a research assistant, we expect to learn how to make machine learning useful for open-ended questions more broadly. In the long run, this approach may also avoid some of the alignment risks posed by end-to-end optimization. First, end-to-end training doesn't work well for exceeding human capability on questions without easily measurable outcomes, such as "Does this plan have problematic long-term consequences?". If we want AI to be as helpful for such long-horizon tasks as it is for "Did this chat interaction persuade them to click 'buy'?", we need a paradigm that isn't based on end-to-end training.
Second, as AI becomes more powerful, AI systems trained end-to-end are incentivized to game their reward metrics. The compositional approach evaluates process instead of outcome, thus providing a more robust alternative.