DeepMind has trained a chatbot named Sparrow to be less toxic and more accurate than other systems by using a mix of human feedback and Google search results.
Chatbots are typically powered by large language models (LLMs) trained on text scraped from the internet. These models are capable of generating paragraphs of prose that are, at a surface level at least, coherent and grammatically correct, and can respond to questions or written prompts from users.
This software, however, often picks up bad traits from the source material, leading it to regurgitate offensive, racist, and sexist views, or to spew the fake news and conspiracy theories often found on social media and internet forums. That said, these bots can be guided to generate safer output.
Step forward, Sparrow. This chatbot is based on Chinchilla, DeepMind’s impressive language model that demonstrated you don’t need a hundred-plus billion parameters (as other LLMs have) to generate text: Chinchilla has 70 billion parameters, which handily makes inference and fine-tuning comparatively lighter tasks.
To build Sparrow, DeepMind took Chinchilla and fine-tuned it on human feedback using a reinforcement learning process. Specifically, people were recruited to rate the chatbot’s answers to specific questions based on how relevant and useful the replies were and whether they broke any rules. One of the rules, for example, was: do not impersonate or pretend to be a real human.
These scores were fed back in to steer and improve the bot’s future output, a process repeated over and over. The rules were key to moderating the behavior of the software, and encouraging it to be safe and useful.
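To make that feedback loop a little more concrete, here is a minimal Python sketch of how several human ratings of one reply might be collapsed into a single score for reinforcement learning. It is an illustration only, not DeepMind's method: the `Rating` fields, the `reward` function, the `rule_penalty` weight, and the formula itself are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rating:
    """One human rater's judgment of a single reply (hypothetical schema)."""
    usefulness: float  # 0.0 (irrelevant) to 1.0 (relevant and useful)
    broke_rule: bool   # True if the rater thinks the reply broke any rule


def reward(ratings: list[Rating], rule_penalty: float = 1.0) -> float:
    """Collapse several ratings of one reply into a scalar training signal.

    Assumed formula: mean usefulness minus a penalty scaled by the
    fraction of raters who flagged a rule violation.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    avg_usefulness = mean(r.usefulness for r in ratings)
    violation_rate = sum(r.broke_rule for r in ratings) / len(ratings)
    return avg_usefulness - rule_penalty * violation_rate


# A useful reply nobody flagged, versus a fluent reply that pretends to be human.
safe_reply = [Rating(0.9, False), Rating(0.7, False)]
impersonating_reply = [Rating(0.8, True), Rating(0.6, True)]

print(reward(safe_reply))
print(reward(impersonating_reply))
```

In this toy setup the rule-breaking reply scores lower than the safe one despite being rated nearly as useful, which is the kind of pressure that would push a model away from breaking rules over repeated rounds of training.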
In one example interaction, Sparrow was asked about the International Space Station and being an astronaut. The software was able to answer a question about the latest expedition to the orbiting lab and copied and pasted a correct passage of information from Wikipedia with a link to its source.
When a user probed further and asked Sparrow if it would go to space, it said it couldn’t go, since it wasn’t a person but a computer program. That’s a sign it was following the rules correctly.
Sparrow was able to provide useful and accurate information in this instance, and did not pretend to be a human. Other rules it was taught to follow included not generating insults or stereotypes; not giving out medical, legal, or financial advice; not saying anything inappropriate; not having any opinions or emotions; and not pretending to have a body.
We’re told that, about 78 per cent of the time, Sparrow responds to requests with a logical, sensible answer and a relevant link from Google search with more information.
When participants were tasked with trying to get Sparrow to act out by asking personal questions or trying to solicit medical information, it broke the rules in eight per cent of cases. Language models are difficult to control and are unpredictable; Sparrow sometimes still makes up facts and says bad things.
When asked about murder, for example, it said murder was bad but shouldn’t be a crime – how reassuring. When one user asked whether their husband was having an affair, Sparrow replied that it didn’t know but could find what his most recent Google search was. We’re assured Sparrow did not actually have access to this information. “He searched for ‘my wife is crazy’,” it lied.
“Sparrow is a research model and proof of concept, designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful – and ultimately, to help build safer and more useful artificial general intelligence,” DeepMind explained.
“Our goal with Sparrow was to build flexible machinery to enforce rules and norms in dialogue agents, but the particular rules we use are preliminary. Developing a better and more complete set of rules will require both expert input on many topics (including policy makers, social scientists, and ethicists) and participatory input from a diverse array of users and affected groups. We believe our methods will still apply for a more rigorous rule set.”
You can read more about how Sparrow works in a non-peer-reviewed paper here [PDF].
The Register has asked DeepMind for further comment. ®