In today’s digital landscape, automating interactions with web content remains a nuanced challenge. Many existing solutions are resource-intensive and tailored for narrowly defined tasks, which limits their broader applicability. Developers often face the dual challenge of balancing computational efficiency with the need for a model that can generalize well across diverse websites. Traditional systems, heavily reliant on prompt-prediction, often lack the reflective reasoning required for the unpredictable nature of web environments. Additionally, proprietary models typically restrict access to detailed inner workings, making it difficult for researchers and practitioners in the open-source community to build on state-of-the-art methods. These persistent issues underline the importance of developing an automation tool that is both efficient and accessible.
Convergence has introduced Proxy Lite: a mini, open-weights version of their well-regarded Proxy assistant. This 3B parameter Vision-Language Model is designed to extend sophisticated web automation capabilities to the open-source community. Rather than promising extraordinary feats, Proxy Lite aims to offer a balanced approach that marries efficiency with reliability. Its architecture builds on a solid foundation, allowing it to perform a variety of web-based tasks without imposing heavy computational demands.
What makes Proxy Lite notable is its transparent design and open-weights approach. This encourages the community to explore, modify, and improve upon its framework. With an integrated system for Vision-Language Model (VLM) and browser interactions, Proxy Lite allows for nuanced control over browser tasks. The model’s configuration supports practical applications ranging from routine data extraction to more complex navigational tasks, all while keeping resource usage in check.
Technical Aspects and Their Benefits
At its core, Proxy Lite leverages a 3B parameter model built on the Qwen2.5-VL-3B-Instruct foundation. This choice reflects a commitment to balancing performance with efficiency. The model employs a three-phase process to generate responses:
- Observation: The model first examines the current state of the web page—confirming, for instance, that an overlay or privacy banner has been dismissed.
- Thinking: It then methodically determines the next course of action, weighing the various possibilities based on the context.
- Tool Call: Finally, it issues a precise command to execute the selected action within the browser.
This structured approach not only improves task reliability but also facilitates the model’s ability to generalize across different types of web interactions. By mirroring human-like reasoning processes, Proxy Lite manages to strike a balance between simplicity and sophistication. Moreover, its design supports a straightforward integration into both command-line interfaces and Streamlit applications, making deployment accessible even for those with modest technical resources.
Performance Insights and Practical Evaluations
Proxy Lite has been carefully evaluated using the WebVoyager benchmark, a comprehensive set of tasks designed to test web automation capabilities. The model achieved an overall score of 72.4%, a strong performance indicator given its open-weights nature. Detailed performance statistics across various websites reveal its thoughtful design:
- Allrecipes: Achieving an 87.8% success rate with an average of 10.3 message exchanges, it demonstrates effectiveness in content-rich environments.
- Amazon: A 70.0% success rate here highlights the model’s ability to navigate more complex, dynamic e-commerce platforms.
- Notable High-Profile Sites: With success rates in the low 80s on platforms such as Apple and GitHub, Proxy Lite consistently shows reliable behavior on diverse sites.
- Google Services: While some areas, such as Google Flights, yield lower success metrics, the overall performance remains competitive considering the model’s scope.

These findings reflect a balanced performance, with Proxy Lite efficiently managing tasks without the overhead typically associated with larger, proprietary models. The comprehensive evaluation not only underscores its current utility but also points to potential enhancements through community-driven refinements.
Conclusion
Proxy Lite emerges as a thoughtfully designed tool in the field of web automation. By addressing key challenges—such as resource constraints, generalization, and transparency—it offers a practical solution for automating routine online tasks. Its open-weights approach and modular design invite collaboration and ongoing development, providing a valuable resource for both academic research and commercial projects.
Check out the Technical Details and Model here. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 80k+ ML SubReddit.
🚨 Recommended Read- LG AI Research Releases NEXUS: An Advanced System Integrating Agent AI System and Data Compliance Standards to Address Legal Concerns in AI Datasets

Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.
Credit: Source link