NARRATE: Versatile Language Architecture for Optimal Control in Robotics

ETH Zürich

Abstract

The impressive capabilities of Large Language Models (LLMs) have motivated various efforts to control robots through natural-language instructions, opening exciting possibilities for human-robot interaction. The goal is to perform the motor-control task accurately, efficiently, and safely, while retaining the flexibility LLMs offer for specifying and adjusting the task in natural language. In this work, we demonstrate how carefully layering an LLM on top of a Model Predictive Control (MPC) formulation enables accurate and flexible robotic control via natural language while respecting safety constraints. In particular, we rely on the LLM to frame constraints and objective functions as mathematical expressions, which are then used by the motor-control module via MPC. The transparency of the optimization formulation makes the task interpretable and enables adjustments through human feedback. We validate our method through extensive experiments on long-horizon reasoning, contact-rich, and multi-object interaction tasks. Our evaluations show that NARRATE outperforms existing methods on these benchmarks and transfers effectively to the real world on two different embodiments.
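The central interface described above, an LLM that writes objectives and constraints as mathematical expressions, can be illustrated with a small parser that turns such strings into functions an optimizer can call. This is a minimal sketch under assumptions: the expression format, the variable names (x, y, z, cube_x, ...), the convention g(x) <= 0 for inequality constraints, and the `compile_expression` helper are all hypothetical, and the syntax whitelist is illustrative rather than a hardened sandbox.

```python
import ast
import math

# Syntax a (hypothetical) LLM-emitted expression may use: plain arithmetic only.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub, ast.UAdd,
)
_ALLOWED_FUNCS = {"sqrt": math.sqrt, "abs": abs}


def compile_expression(src, variables):
    """Turn an arithmetic expression string into a Python function of named variables.

    Rejects syntax outside a small arithmetic whitelist, unknown names,
    and non-numeric constants.
    """
    tree = ast.parse(src, mode="eval")
    known = set(variables) | set(_ALLOWED_FUNCS)
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in known:
            raise ValueError(f"unknown name: {node.id}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
    code = compile(tree, "<llm-expression>", "eval")

    def fn(**values):
        env = dict(_ALLOWED_FUNCS)
        env.update(values)
        return eval(code, {"__builtins__": {}}, env)

    return fn


# Example: an objective and an inequality constraint (g(x) <= 0) as an LLM
# might phrase them for "place the gripper 5 cm above the cube".
objective = compile_expression(
    "(x - cube_x)**2 + (y - cube_y)**2 + (z - cube_z - 0.05)**2",
    {"x", "y", "z", "cube_x", "cube_y", "cube_z"},
)
table_clearance = compile_expression("0.02 - z", {"z"})  # encodes z >= 0.02

print(objective(x=0.1, y=0.0, z=0.3, cube_x=0.1, cube_y=0.0, cube_z=0.05))
print(table_clearance(z=0.3) <= 0)  # True: the constraint is satisfied
```

Keeping the expressions as plain text is what makes them easy for a human to inspect and correct before they reach the solver.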



NARRATE solves long-horizon, contact-rich tasks by generating a sequence of sub-tasks in language space and formulating, for each sub-task, an interpretable and safe constrained MPC program that executes the action on the robot.
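For reference, a per-sub-task constrained MPC program can be written in the generic form below. This is a sketch under stated assumptions, not necessarily the paper's exact formulation: c is the LLM-generated cost, h and g are assumed to be equality and inequality constraints respectively, f is the robot's discrete-time dynamics, N is the prediction horizon, and \mathcal{U} is the set of admissible inputs.

```latex
\begin{aligned}
\min_{x_{0:N},\,u_{0:N-1}} \quad & \sum_{k=0}^{N} c(x_k) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), && k = 0, \dots, N-1, \\
& h(x_k) = 0, \quad g(x_k) \le 0, && k = 0, \dots, N, \\
& u_k \in \mathcal{U}, && k = 0, \dots, N-1, \\
& x_0 = x_{\text{init}}.
\end{aligned}
```

As in standard receding-horizon MPC, only the first optimized input u_0 is applied before the problem is re-solved from the newly measured state.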

Architecture



The user provides a task in natural language, l. Within the language module, two layered LLM blocks translate it into a series of steps (TP, the Task Planner) and into an objective and constraints (OD, the Optimization Designer). The objective and constraints c, h, g are then passed to the control module, which generates a trajectory via MPC (TG) and the low-level control commands (TT) applied to the system, in this case a robotic arm.
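The data flow from task to motor commands can be sketched end to end in Python. Everything here is hypothetical and deliberately minimal: `task_planner` and `optimization_designer` are stubs standing in for the two LLM calls, the arm is replaced by an assumed velocity-controlled point (single integrator), the MPC problem is solved by projected gradient descent with a quadratic penalty on the inequality constraints g (no equality constraints h are used), and low-level tracking (TT) is assumed to be perfect. A real deployment would use a proper constrained solver and the arm's actual dynamics.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]
# Assumed values for this sketch: time step [s], MPC horizon [steps], speed bound [m/s].
DT, HORIZON, U_MAX = 0.1, 5, 0.5


def task_planner(task: str) -> List[str]:
    """TP stand-in (hypothetical): a real system would prompt an LLM with `task`."""
    return ["move gripper above cube", "lower gripper to cube"]


def optimization_designer(subtask: str):
    """OD stand-in (hypothetical): returns an objective c(x) and constraints g_i(x) <= 0."""
    cube = (0.4, 0.0, 0.05)
    height = 0.15 if "above" in subtask else 0.02
    target = (cube[0], cube[1], cube[2] + height)

    def objective(x: Vec) -> float:
        return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

    def above_table(x: Vec) -> float:
        return 0.01 - x[2]  # g(x) <= 0  <=>  z >= 0.01 m

    return objective, [above_table]


def rollout(x0: Vec, u: Sequence[float]) -> List[Vec]:
    """Assumed point-mass model with velocity input: x_{k+1} = x_k + DT * u_k."""
    xs, x = [], x0
    for k in range(HORIZON):
        x = tuple(x[i] + DT * u[3 * k + i] for i in range(3))
        xs.append(x)
    return xs


def mpc_step(x0: Vec, objective: Callable, constraints: List[Callable],
             iters: int = 50, lr: float = 0.5, rho: float = 10.0) -> List[float]:
    """TG stand-in: projected gradient descent on a quadratic-penalty MPC cost."""
    u = [0.0] * (3 * HORIZON)

    def cost(u_vec: Sequence[float]) -> float:
        total = 0.0
        for x in rollout(x0, u_vec):
            total += objective(x)
            total += rho * sum(max(0.0, g(x)) ** 2 for g in constraints)
        return total

    eps = 1e-6
    for _ in range(iters):
        base = cost(u)
        grad = []
        for j in range(len(u)):  # forward-difference gradient
            bumped = list(u)
            bumped[j] += eps
            grad.append((cost(bumped) - base) / eps)
        # Gradient step, then projection onto the input box |u| <= U_MAX.
        u = [min(U_MAX, max(-U_MAX, uj - lr * gj)) for uj, gj in zip(u, grad)]
    return u[:3]  # receding horizon: only the first input is applied


def run(task: str, x0: Vec, steps_per_subtask: int = 25) -> Vec:
    x = x0
    for subtask in task_planner(task):
        objective, constraints = optimization_designer(subtask)
        for _ in range(steps_per_subtask):
            u0 = mpc_step(x, objective, constraints)
            # TT stand-in: assume the arm tracks the commanded velocity perfectly.
            x = tuple(xi + DT * ui for xi, ui in zip(x, u0))
    return x


final = run("stack the red cube on the blue cube", (0.2, 0.1, 0.3))
print(final)
```

The point of the layering is visible in `run`: the language side only produces sub-task text and expressions, while every motion command comes from the optimizer, which enforces the input bound exactly and penalizes constraint violations.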



End-Effector Trajectories in Cube Stacking

We compare NARRATE against Code as Policies and VoxPoser. We find that defining a constrained MPC program leads to the most efficient execution of contact-rich tasks, where the centimeter-level trajectories of the gripper fingers matter for collision avoidance. In the figures, we show example end-effector trajectories for the cube-stacking task under each of the three methods.


Ours

Code as Policies

VoxPoser

Human Feedback at Runtime

NARRATE defines each sub-task in natural language, which allows seamless human intervention when sub-tasks are incorrect. We show that intervening with feedback, whether on a single sub-task or on every sub-task of the plan, increases the success rate.
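Because the plan is a list of plain sentences, intervention can be as simple as letting a human edit entries before they are turned into optimization problems. The sketch below models the two feedback modes (one sub-task, or every sub-task) with a hypothetical `review_plan` helper and reviewer callback. It assumes a correction directly replaces the sub-task text; a real system could instead pass the feedback back to the language model and re-plan.

```python
from typing import Callable, List, Optional

# Reviewer callback: given (index, sub-task), return a corrected sub-task or None to accept it.
Reviewer = Callable[[int, str], Optional[str]]


def review_plan(plan: List[str], reviewer: Reviewer,
                only_index: Optional[int] = None) -> List[str]:
    """Let a human correct natural-language sub-tasks before optimization design.

    only_index=None asks for feedback on every sub-task; an integer asks only about that one.
    Returns a new list and leaves `plan` unchanged.
    """
    revised = []
    for i, subtask in enumerate(plan):
        if only_index is not None and i != only_index:
            revised.append(subtask)  # not under review: keep as planned
            continue
        correction = reviewer(i, subtask)
        revised.append(subtask if correction is None else correction)
    return revised


# Usage: a scripted "human" who notices the plan forgets to descend before grasping.
plan = ["move gripper above red cube", "close gripper",
        "move gripper above blue cube", "open gripper"]
corrections = {1: "lower gripper onto red cube, then close gripper"}
fixed = review_plan(plan, lambda i, s: corrections.get(i), only_index=1)
print(fixed)
```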


Human-Robot Chat Interface

The relatively low inference time of LLMs and the real-time properties of MPC allow humans to collaborate interactively with NARRATE via an intuitive chat interface.


Prompts

All the prompts are the same in simulation and real-world deployment:
Task Planner | Optimization Designer | Task Planner (Collaborative) | Optimization Designer (Collaborative)

BibTeX


      @misc{ismail2024narrate,
        title={NARRATE: Versatile Language Architecture for Optimal Control in Robotics}, 
        author={Seif Ismail and Antonio Arbues and Ryan Cotterell and René Zurbrügg and Carmen Amo Alonso},
        year={2024},
        eprint={2403.10762},
        archivePrefix={arXiv},
        primaryClass={cs.RO}
      }