Recent developments have demonstrated that language agents, particularly those built on large language models (LLMs), have the potential to perform a wide array of intricate tasks in diverse environments using natural language. However, the primary focus of most language agent frameworks currently is on facilitating the construction of proof-of-concept language agents. This focus often comes with little to no attention to application-level designs and frequently neglects the accessibility of these agents to non-expert users.
To bridge the current limitations experienced by language agents, developers have come up with the OpenAgents framework, an open platform for hosting and deploying language agents in the wild, and across a host of everyday tasks. The OpenAgents framework is built around three agents
- Data Agent : Helps with Data Analysis using data tools, and query languages like SQL, or programming languages like Python.
- Plugin Agents : Helps by providing access to over 200+ API tools helpful for daily tasks.
- Web Agents : Helps in browsing the web while maintaining your anonymity.
The OpenAgents framework uses a web user interface optimized for common failures and swift responses in an attempt to allow general users to interact with the agent functionalities, while at the same time, offering researchers and developers a seamless deployment experience on their local setups. It would be safe to say that the OpenAgents framework is an attempt to provide a solid foundation for facilitating real-world evaluations, and crafting innovative, effective, and advanced language agents.
In today’s article, we will be taking a deeper dive into OpenAgents framework, and talk about the framework in greater detail. We will talk about the working and architecture of the framework, while also discussing the common challenges faced, and the results. So let’s get started.
Language agents, at their core, are derived from intelligent agents. These intelligent agents are conceptualized to possess autonomous problem-solving capabilities, along with the ability to sense their environment, make decisions, and act accordingly. With advancements in large language models, the global development community has leveraged the concept of intelligent agents and LLMs to create language agents. These agents utilize natural language programming (NLP) to perform a wide array of intricate tasks in diverse environments, and they have recently shown remarkable potential.
Current language agent frameworks, such as Gravitas and Chase, primarily provide a console interface tailored for developers, along with proof-of-concept implementations. However, they often restrict accessibility to a wider audience, particularly those not proficient in coding. Additionally, current agent benchmarks are constructed by developers with specific requirements for deterministic evaluation, especially in scenarios that require web browsing, coding, tool utilization, or a combination thereof.
In an effort to develop LLM-powered intelligent and language agents for a broader user base, established players like OpenAI and Microsoft have deployed a range of well-designed products, including Advanced Data Analysis, also known as Code Interpreter, and browser plugins. Although these agents are effective in their functions, they offer limited help to the development community. This limitation arises because the business logic code and model implementations have not been open-sourced, hindering the opportunities for developers and researchers to further explore them, as well as limiting free access for users.
In an attempt to tackle this problem, developers have come up with OpenAgents, an open-source platform for hosting and using agents, and it is currently built on a foundation of three internal agents
- Data Agent : Helps with Data Analysis using data tools, and query languages like SQL, or programming languages like Python.
- Plugin Agents : Helps by providing access to over 200+ API tools helpful for daily tasks.
- Web Agents : Helps in browsing the web while maintaining your anonymity.
The following figure demonstrates the OpenAgents platform for general users, developers and researchers.
- Instead of using a programmer-oriented package or consoles, general users can interact with the three agents in the OpenAgents framework using an online web interface.
- Developers can make use of the business logic and research codes provided by the OpenAgents framework to seamlessly deploy backend and frontend for further developments.
- Researchers have the flexibility of either building new language agents from scratch, or implement agent-related methods using the shared components & examples, and evaluate their performance using the web UI.
To sum it up, the OpenAgents framework is originally meant to be a holistic, and realistic platform for human-in-the-loop language agent evaluation that allows users to interact with these agents to complete a wide array of tasks, and these human-agent interactions along with the user feedback are stored & analyzed for further development & evaluation.
For those who are not aware, LLM prompting is a process that allows developers to craft instructions that safeguards against adversarial or wrong inputs, enhances output aesthetics, and caters to the backend logic. During the development phase, developers working on the OpenAgents framework use the LLM prompting technique to underscore the significance of specifying application requirements effectively. However, developers soon observed that buildup of these instructions or LLM prompts can be substantial at times that might affect the context handling abilities of LLM frameworks along with token limitations. The developers also observed that in order to deploy these agents effectively in the real world, the agent models should not only exhibit exceptional performance, but they should also be able to tackle a wide array of interactive scenarios in real-time. Although current agent frameworks have got the performance covered, they often ignore real-world considerations especially in real-time that often obfuscates the true potential of LLM frameworks by trading off responsiveness or accuracy.
In the following figure, we are comparing the OpenAgents framework directly with existing works on benchmarks on agent concept, and building prototypes.
OpenAgents : Platform Design and Implementation
The systematic design or architecture of the OpenAgents platform can be split into two primary components: User Interface, including both backend & frontend, and Language Agent, comprising tools, language models, and environments. The OpenAgents framework provides an interface for communication between the users and the agents. The flow of interaction in the framework is as follows.
The agents use tools available to them to plan and take the required actions in the environments once they have received inputs from the users. The architecture or systematic design of the framework is demonstrated in the following image.
User Interface
Developers of the OpenAgents framework have put a lot of thought and effort into developing not only a highly functional but also a user-friendly UI after tackling a load of host agents and reusable business logic. As a result, the OpenAgents framework boasts in providing support for a wide array of technical tasks including error handling, backend server operations, data streaming, and much more, with the primary goal being to make the OpenAgents framework user friendly, but highly effective & usable at the same time.
Language Agent
Within the OpenAgents framework, the language agent has three essential components: a tool interface, a language model, and the environment itself. The prompting method implemented in the OpenAgents framework creates a sequential process for the agents to follow that starts with Observation -> Deliberation -> Action. The framework also prompts the LLM to generate parsable text with enhanced efficiency, and the tool interface consists of parsers that can translate these parsable texts generated by LLMs into executable actions like making API calls or generating code. These actions are then executed by the framework within the boundaries of the corresponding environment.
OpenAgents’ Agents
At the core of OpenAgents, there are three distinct agents: Data Agent that helps with Data Analysis using data tools, and query languages like SQL, or programming languages like Python, Plugin Agents that helps by providing access to over 200+ API tools helpful for daily tasks, and Web Agents that helps in browsing the web while maintaining your anonymity. These agents have individual domain expertise similar to ChatGPT plugins, however unlike ChatGPT, the implementation on OpenAgents is based purely on top of open language Application Programming Interface or APIs.
Data Agent
The data agent in the OpenAgents framework has been designed and deployed in a way to deal with a wide array of data related tasks that the end users encounter on a regular basis. The data agents support code generation and execution in two programming languages namely SQL and Python, and the agent also has several data tools at its disposal including Data Profiling for providing basic data information, Kaggle Data Search for searching datasets, and ECharts Tool for plotting interactive ECharts. Furthermore, the OpenAgents framework prompts the data agent to use these tools proactively to effectively respond to the end users requests. Additionally, given the exhaustive coding requirements, the OpenAgents framework opts for embedded language models for the data agent, and rather than the agent generating the code, it’s the tools like Python, ECharts, and SQL that generate the code. With this approach, the framework is able to harness the programming prowess of language models completely, and thus reduces the strain on the data agent.
With the aid of these data tools, the data agent is capable of managing numerous data-centric requests, and performs data visualization, manipulation, and queries proficiently, thus exceeding the boundaries of code & text generation. The following figure highlights a data agent in action, and the tools available to common users.
Plugins Agent
The plugin agent in the OpenAgents framework has been designed by developers meticulously to cater to a user’s multifaceted requirements for daily tasks including searching the internet, online shopping, reading news, or creating websites & applications by providing access to over 200 plugins, with special attention being paid on function calling interface, API pings, and API response lengths. Some of the prominent plugins include
- Google Search
- Wolfram Alpha
- Zapier
- Klarna
- Coursera
- Show Me
- Speak
- AskYourPDF
- BizTok
- Klook
Based on their needs and requirements, users can choose the number of plugins they want the plugin agents to use, and the working is demonstrated in the figure below.
Furthermore, to aid users in situations where they are not sure what plugin will suit their requirements the best, the OpenAgents framework offers users a feature that automatically selects the plugins most relevant to their instructions.
Web Agents
The OpenAgents framework presents web agent as a specialized tool tasked to enhance the efficiency and capabilities of the chat agent. Although the chat agent still houses the main interaction interface, it seamlessly incorporates the web agent whenever necessary. The final response is then delivered to the end user by the web agent, and the process is illustrated in the figure below.
The design strategy implemented in these web agents prove to be of great benefit as the chat agent processes important parameters or initiates URLs systematically, before they are transferred to the web agent, thus ensuring a better alignment between the user’s requirements, and generated output, thus resulting in clear communication. Furthermore, the strategy also allows the web agents to accommodate layered & adaptable user queries by employing a dynamic multi-turn web navigation coupled with chat dialogues. Therefore, by demarcating the roles and responsibilities of chat and multi-browsing agents distinctly, the OpenAgents framework makes way for refinement & evolution of every individual module.
OpenAgents : Practical Applications and Real World Deployment
In this section, we will be talking about the trajectory of OpenAgents framework from theorization to deployment in real-world along with the challenges encountered, and learnings imbibed along with the evaluation complexities the developers tackled.
Using Prompts to Transform Large Language Models into Real-World Apps
When using LLM prompts for building real-world applications for end users, the OpenAgents framework uses prompt instructions to specify certain requirements. The aim of some of the instructions is to ensure the output is in alignment with a specific format, thus allowing the backend logic to process, whereas the aim of other instructions is to enhance the output’s aesthetic appeal, whereas the rest protect the framework against potential malicious attacks.
Uncontrollable Real-World Factors
When developers deployed the OpenAgents framework in the real world, they were welcomed by an array of uncontrollable real-world factors triggered by internet infrastructure, users, business logics, and more. These uncontrollable factors forced developers to reevaluate and overtune some assumptions on the basis of prior research, and they could ultimately lead to situations where the end users may not be satisfied by the response that the framework generates.
Evaluation Complexity
Although constructed agents aimed directly at applications might have a broader application, and facilitate better evaluation, it does add to the complexity of building LLM-powered applications which makes it difficult to analyze the performance of the applications. Furthermore, this approach also adds to the instability, and extends the system chain of the LLMs that makes it challenging for the framework to adapt to different components. It thus makes sense to refine the system design and operating logic of these agents to simplify the procedures, and ensure effective output.
Final Thoughts
In this article, we have talked about OpenAgents framework, an open platform for hosting and deploying language agents in the wild, and across a host of everyday tasks. The OpenAgents framework is built around three agents: Data Agent, helps with Data Analysis using data tools, and query languages like SQL, or programming languages like Python, Plugin Agents, helps by providing access to over 200+ API tools helpful for daily tasks, and Web Agents helps in browsing the web while maintaining your anonymity. The OpenAgents framework uses a web user interface optimized for common failures and swift responses in an attempt to allow general users to interact with the agent functionalities, while at the same time, offering researchers and developers a seamless deployment experience on their local setups. By providing a transparent, holistic, and a deployable platform, OpenAgents aims to make the potential of LLMs accessible to a wider range of users not limited to researchers and developers, but also end users with limited technical expertise.
Credit: Source link