Components of Autonomous Agents

LLMs can be more than just next-token predictors, and their applications extend beyond question answering or chat. An LLM can be assigned a task, such as making a reservation at a restaurant: it can come up with a plan (divide the task into steps/sub-tasks), execute each sub-task as an action, monitor the output of each action, reason about successes and failures, and adapt the plan based on that reasoning. Such systems are called Autonomous Agents.

Here is a pictorial illustration of such an Agent system:

From: Lilian Weng’s LLM powered Autonomous Agents blog
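
To complement the illustration, here is a minimal sketch of that plan-act-observe-adapt loop in Python. The `call_llm` and `execute_action` helpers are hypothetical placeholders for an LLM endpoint and a tool runner, not any particular library:

```python
# A minimal, hypothetical sketch of the plan-act-observe-adapt loop.
# `call_llm` and `execute_action` are placeholders, not a real API.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to any LLM completion endpoint."""
    raise NotImplementedError

def execute_action(action: str) -> str:
    """Placeholder for running a sub-task (e.g., a tool or API call)."""
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 10) -> str:
    # Plan: ask the LLM to break the task into sub-tasks.
    plan = call_llm(f"Break this task into numbered steps: {task}")
    history = [f"Task: {task}", f"Plan: {plan}"]
    for _ in range(max_steps):
        # Act: choose the next action given the plan and history so far.
        action = call_llm("\n".join(history) + "\nNext action:")
        # Monitor: capture the outcome of the action.
        observation = execute_action(action)
        history.append(f"Action: {action}\nObservation: {observation}")
        # Reason and adapt: decide whether to stop or revise course.
        verdict = call_llm("\n".join(history) + "\nIs the task complete? yes/no:")
        if verdict.strip().lower().startswith("yes"):
            break
    return "\n".join(history)
```

The papers below refine different parts of this loop: how the model reasons, how it acts, and how it learns from feedback.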

Autonomous Agents can be divided into two categories based on whether or not they incorporate feedback.

Agents without Feedback

  1. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    This is the first paper to demonstrate that LLMs can be made to plan by asking them to reason. The researchers investigate how generating a chain of thought, a series of intermediate reasoning steps, enhances the complex reasoning capabilities of large language models. They show that these reasoning abilities surface naturally in sufficiently large models through a straightforward method known as chain-of-thought prompting, which supplies a few worked chain-of-thought examples as exemplars in the prompt (a minimal sketch of such a prompt follows below).

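For illustration, here is a minimal few-shot chain-of-thought prompt using the paper's well-known tennis-ball exemplar. The `call_llm` helper is a hypothetical stand-in for any completion endpoint:

```python
# A minimal sketch of few-shot chain-of-thought prompting.
# `call_llm` is a hypothetical stand-in for any completion endpoint.

COT_EXEMPLAR = """Q: Roger has 5 tennis balls. He buys 2 more cans of
tennis balls. Each can has 3 tennis balls. How many tennis balls does
he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
6 tennis balls. 5 + 6 = 11. The answer is 11."""

def cot_prompt(question: str) -> str:
    # Prepend the worked exemplar so the model imitates step-by-step
    # reasoning before committing to a final answer.
    return f"{COT_EXEMPLAR}\n\nQ: {question}\nA:"

# answer = call_llm(cot_prompt("The cafeteria had 23 apples..."))
```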

  2. Tree of Thoughts: Deliberate Problem Solving with Large Language Models

    In this paper, the authors address a limitation of current language models: they make token-level, sequential decisions, so they can struggle with tasks requiring strategic foresight or where initial decisions are critical. To overcome this, they introduce a framework for language model inference named “Tree of Thoughts” (ToT). The framework generalizes the widely recognized “Chain of Thought” approach by exploring coherent units of text, or “thoughts,” as intermediate steps in problem solving. ToT lets a language model make deliberate decisions by exploring multiple reasoning paths, evaluating its choices to determine the next action, and looking ahead or backtracking as needed (a minimal search sketch follows below).

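To make the idea concrete, here is a minimal sketch of a ToT-style breadth-first search. The `call_llm` helper is a hypothetical placeholder, and the proposal and scoring prompts are illustrative, not the paper's exact prompts:

```python
# A minimal sketch of Tree-of-Thoughts style breadth-first search.
# `call_llm` is a hypothetical placeholder for any LLM endpoint;
# the proposal and scoring prompts are illustrative only.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def propose_thoughts(state: str, k: int = 3) -> list[str]:
    # Ask the model for k candidate next "thoughts" from the current state.
    out = call_llm(f"{state}\nPropose {k} distinct next steps, one per line:")
    return out.strip().splitlines()[:k]

def score_state(state: str) -> float:
    # LLM as evaluator: rate how promising a partial solution path is.
    out = call_llm(f"{state}\nRate this partial solution from 0 to 10:")
    try:
        return float(out.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0

def tot_bfs(question: str, depth: int = 3, beam: int = 2) -> list[str]:
    # Breadth-first search: keep the `beam` best partial paths per level.
    frontier = [question]
    for _ in range(depth):
        candidates = [f"{s}\n{t}" for s in frontier for t in propose_thoughts(s)]
        frontier = sorted(candidates, key=score_state, reverse=True)[:beam]
    return frontier
```

Unlike a single chain of thought, the beam retains several competing reasoning paths, which is what enables backtracking away from a bad initial decision.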

Agents with Feedback

  1. ReAct: Synergizing Reasoning and Acting in Language Models

    In their study, the authors investigate using large language models (LLMs) to generate reasoning traces and task-specific actions in an interleaved fashion, allowing the two to reinforce each other. Reasoning traces help the model formulate, monitor, and revise action plans and handle exceptions, while actions let it interface with external sources such as knowledge bases or environments for additional information. The proposed system, ReAct, addresses common issues in chain-of-thought reasoning, such as hallucination and error propagation, by grounding the model through a straightforward Wikipedia API, and it produces human-like task-solving trajectories that are more interpretable than baselines lacking reasoning traces. Moreover, ReAct outperforms imitation and reinforcement learning methods on two interactive decision-making benchmarks, ALFWorld and WebShop, while requiring only one or two examples for in-context prompting (a sketch of the Thought/Action/Observation loop follows below).

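Here is a minimal sketch of the ReAct Thought/Action/Observation loop. `call_llm` is a hypothetical placeholder, and the lookup uses the public MediaWiki search API as an assumed example tool; the paper's actual prompts and action space differ in detail:

```python
# A minimal sketch of the ReAct Thought/Action/Observation loop.
# `call_llm` is a hypothetical placeholder; the Wikipedia lookup below
# uses the public MediaWiki search API as an assumed example tool.

import json
import urllib.parse
import urllib.request

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # any LLM completion endpoint

def wiki_search(query: str) -> str:
    # Simple lookup against the public Wikipedia search API.
    url = ("https://en.wikipedia.org/w/api.php?action=query&list=search"
           f"&format=json&srsearch={urllib.parse.quote(query)}")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    hits = data["query"]["search"]
    return hits[0]["snippet"] if hits else "No result."

def react(question: str, max_steps: int = 5) -> str:
    trace = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model emits a reasoning step plus an action to take.
        step = call_llm(trace + "Thought + Action (Search[...] or Finish[...]):")
        trace += step + "\n"
        if "Finish[" in step:
            return step.split("Finish[", 1)[1].rstrip("]")
        if "Search[" in step:
            query = step.split("Search[", 1)[1].split("]", 1)[0]
            # Ground the next thought in an external observation.
            trace += f"Observation: {wiki_search(query)}\n"
    return trace
```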

  2. Reflexion: Language Agents with Verbal Reinforcement Learning

    Rather than relying on traditional weight updates, Reflexion uses linguistic feedback to improve agent performance. Reflexion agents verbally self-reflect on feedback signals received from tasks and store these reflective texts in an episodic memory buffer, which informs decisions in future trials. The framework is versatile: it can integrate feedback signals of various kinds (scalar values or free-form language) and from different sources (external inputs or internally simulated feedback). Reflexion shows notable gains over baseline agents across sequential decision-making, coding, and language-reasoning tasks (a sketch of the trial-reflect-retry loop follows below).

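Below is a minimal sketch of a Reflexion-style trial-reflect-retry loop. `call_llm` and `attempt_task` are hypothetical placeholders, and the reflection prompt is illustrative rather than the paper's:

```python
# A minimal sketch of a Reflexion-style loop: try a task, receive a
# feedback signal, verbally self-reflect, and retry with the stored
# reflections in context. All helpers are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # any LLM completion endpoint

def attempt_task(task: str, reflections: list[str]) -> tuple[str, bool]:
    """Placeholder: run the task (with past reflections in the prompt)
    and return (trajectory, success_flag)."""
    raise NotImplementedError

def reflexion(task: str, max_trials: int = 3) -> str:
    episodic_memory: list[str] = []  # stored verbal self-reflections
    trajectory = ""
    for _ in range(max_trials):
        trajectory, success = attempt_task(task, episodic_memory)
        if success:
            return trajectory
        # Convert the failure signal into free-form linguistic feedback
        # instead of a gradient update.
        reflection = call_llm(
            f"Task: {task}\nFailed attempt:\n{trajectory}\n"
            "In a few sentences, explain what went wrong and what to do "
            "differently next time:"
        )
        episodic_memory.append(reflection)
    return trajectory
```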

  3. Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models

    In this study, the authors present Language Agent Tree Search (LATS), a framework that unifies the planning, acting, and reasoning capabilities of Large Language Models (LLMs). Inspired by the Monte Carlo tree search used in model-based reinforcement learning, LATS uses LLMs in multiple roles: as agents, as value functions, and as optimizers, harnessing their latent abilities for better decision-making. A key feature of LATS is its integration with an environment that provides external feedback, enabling a more deliberate and adaptive approach to problem-solving than previous methods. Through experiments across programming, HotPotQA, and WebShop, the authors demonstrate the versatility and efficiency of LATS; notably, it achieves a 94.4% success rate on HumanEval programming tasks with GPT-4 and an average score of 75.9 on WebShop web-browsing tasks with GPT-3.5 (a minimal tree-search sketch follows below).

    Overview of the LATS system

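Here is a minimal sketch of LATS-style Monte Carlo tree search with the LLM serving as both policy (proposing actions) and value function (scoring states). All helpers are hypothetical placeholders, and environment feedback is only indicated by a comment:

```python
# A minimal sketch of LATS-style Monte Carlo tree search with an LLM as
# policy and value function. All helpers are hypothetical placeholders.

import math
import random

class Node:
    def __init__(self, state: str, parent=None):
        self.state, self.parent = state, parent
        self.children: list["Node"] = []
        self.visits, self.value = 0, 0.0

def ucb(node: Node, c: float = 1.4) -> float:
    # Upper confidence bound balances exploration and exploitation.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def llm_propose(state: str) -> list[str]:
    raise NotImplementedError  # LLM as policy: candidate next actions

def llm_evaluate(state: str) -> float:
    raise NotImplementedError  # LLM as value function: score in [0, 1]

def lats_search(root_state: str, iterations: int = 20) -> Node:
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion: LLM proposes actions; in full LATS, environment
        # feedback (observations) would be appended to each child state.
        node.children = [Node(node.state + "\n" + a, node)
                         for a in llm_propose(node.state)]
        # Evaluation: LLM-scored value in place of a random rollout.
        leaf = random.choice(node.children)
        reward = llm_evaluate(leaf.state)
        # Backpropagation: update statistics along the path to the root.
        while leaf:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits)
```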


Review papers to dive deeper into each module

  1. Holistic Review: A Survey on Large Language Model based Autonomous Agents

    In this paper, the authors offer a comprehensive survey and systematic review of Large Language Model (LLM)-based autonomous agents from a holistic perspective. They first explore how LLM-based autonomous agents are constructed, proposing a unified framework that encapsulates most prior work in the area. They then present an overview of the agents' varied applications across social science, natural science, and engineering, and discuss the evaluation strategies commonly used to assess such agents. Drawing on insights from previous research, the authors identify several challenges and propose future directions to guide further development and application of LLM-based autonomous agents.

  2. Reasoning Review: A Survey of Reasoning with Foundation Models

    In their paper, the authors introduce and discuss seminal foundation models that have been proposed or are adaptable for reasoning tasks, showcasing the latest developments across various reasoning tasks, methods, and benchmarks. They delve into the potential future directions for enhancing reasoning abilities within foundation models, suggesting areas for further exploration and innovation. Additionally, the paper examines the significance of multimodal learning, autonomous agents, and the concept of super alignment in relation to reasoning capabilities.