LLMWare.ai, a pioneer in deploying and fine-tuning Small Language Models (SLMs), today announced the launch of Model Depot on Hugging Face, one of the largest collections of SLMs optimized for Intel PCs. With over 100 models spanning use cases such as chat, coding, math, function calling, and embeddings, Model Depot aims to give the open-source AI community an unprecedented collection of the latest SLMs optimized for Intel-based PCs, packaged in both Intel’s OpenVINO and ONNX formats.
Using Model Depot together with LLMWare’s open-source library, which provides a complete toolkit for end-to-end development of AI-enabled workflows, developers can build Retrieval-Augmented Generation (RAG) and agent-based workflows with SLMs in OpenVINO format for Intel hardware users. OpenVINO is an open-source library for optimizing and deploying deep learning inference, including large and small language models. Specifically designed to reduce resource demands for efficient deployment on a range of platforms, including on-device and AI PCs, OpenVINO supports model inference on CPUs, GPUs, and Intel NPUs.
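As a concrete starting point, here is a minimal local RAG sketch using llmware’s Prompt interface; the model name, document path, and question are illustrative assumptions, not a prescribed setup.

```python
# Minimal RAG sketch with llmware's Prompt API.
# Assumptions: "bling-tiny-llama-ov" stands in for an OpenVINO-packaged SLM
# from Model Depot, and the document path/filename are placeholders.
from llmware.prompts import Prompt

prompter = Prompt().load_model("bling-tiny-llama-ov")

# Parse a local document and attach it as the retrieval source.
prompter.add_source_document("/path/to/docs", "contract.pdf", query="base salary")

# Ask a question grounded in the attached source, running entirely on-device.
responses = prompter.prompt_with_source("What is the executive's base salary?")
print(responses[0]["llm_response"])
```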
Similarly, ONNX is an open-source format for AI models, both deep learning and traditional ML, with a current focus on the capabilities needed for inference. ONNX is supported by many frameworks, tools, and hardware platforms, and aims to enable interoperability between them.
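To illustrate the interoperability point, here is a generic ONNX Runtime sketch (not LLMWare-specific); the model file and dummy input shape are placeholders.

```python
# Generic ONNX Runtime inference sketch: any framework that exports to
# ONNX can be served through the same runtime call pattern.
import numpy as np
import onnxruntime as ort

# "model.onnx" is a placeholder path to an exported model.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Feed a dummy input; real shapes and dtypes depend on the exported model.
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: np.zeros((1, 8), dtype=np.int64)})
print(outputs[0].shape)
```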
In a recent white paper, LLMWare found that deploying 4-bit quantized small language models (1B-9B parameters) in OpenVINO format maximizes inference performance on Intel AI PCs. When tested on a Dell laptop with an Intel Core Ultra 9 (Meteor Lake) processor, using the 1.1B-parameter BLING-Tiny-Llama model, the OpenVINO quantized format delivered inference speeds up to 7.6x faster than PyTorch and up to 7.5x faster than GGUF.
The comparison consistently uses LLMWare’s 21-question RAG test, with processing time measured as the total runtime for all 21 questions.
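A rough sketch of this kind of wall-clock comparison is shown below; the question list and model name are placeholders, not LLMWare’s actual test harness.

```python
# Hypothetical timing harness in the spirit of the 21-question RAG test.
import time
from llmware.models import ModelCatalog

# Placeholder questions; the real 21-question set is not reproduced here.
questions = [
    "What is the executive's base salary?",
    "What is the term of the agreement?",
]

# Swap "bling-tiny-llama-ov" for a GGUF or PyTorch packaging to compare formats.
model = ModelCatalog().load_model("bling-tiny-llama-ov")

start = time.perf_counter()
for q in questions:
    model.inference(q)
elapsed = time.perf_counter() - start
print(f"Total runtime for {len(questions)} questions: {elapsed:.1f}s")
```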
Detailed information about LLMWare’s testing methodology can be found in the white paper.
LLMWare’s goal is to provide a powerful abstraction layer over multiple inference backends. By supporting OpenVINO, ONNX, and Llama.cpp in a single platform, developers can choose the model format that is most performant on their intended users’ specific hardware. With Model Depot, developers targeting Intel PCs can access SLMs specifically optimized for inference on Intel hardware.
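In practice, that abstraction can look like loading different packagings of the same model through one call pattern, as in the hypothetical sketch below; the format-suffixed model names are assumptions modeled on Model Depot’s naming, not guaranteed identifiers.

```python
# One abstraction, several backends: the same load/inference pattern
# regardless of whether the weights are OpenVINO, ONNX, or GGUF.
from llmware.models import ModelCatalog

catalog = ModelCatalog()
for name in ("bling-tiny-llama-ov",      # OpenVINO, for Intel CPU/GPU/NPU
             "bling-tiny-llama-onnx",    # ONNX, portable across runtimes
             "bling-tiny-llama-gguf"):   # GGUF, served via llama.cpp
    model = catalog.load_model(name)
    print(name, "->", model.inference("What is a small language model?"))
```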
Providing OpenVINO and ONNX support for the most popular SLMs today, including Microsoft Phi-3, Mistral, Llama, Yi, and Qwen, as well as LLMWare’s specialized SLIM function-calling models designed for multi-step workflows and its RAG-specialized DRAGON and BLING model families, LLMWare gives developers the SLMs to easily and seamlessly build productivity-enhancing workflows that maximize the local capabilities of AI PCs.
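The SLIM models return structured output rather than free text, which is what makes them composable in multi-step agent workflows. The sketch below follows the pattern of llmware’s SLIM examples; the tool name and exact response shape are assumptions here.

```python
# Hedged sketch of a SLIM function-calling step.
from llmware.models import ModelCatalog

# "slim-sentiment-tool" follows SLIM tool naming; assumed for illustration.
slim = ModelCatalog().load_model("slim-sentiment-tool")

# function_call returns structured output (e.g., a dict), so the result can
# feed the next step of an agent workflow without free-text parsing.
result = slim.function_call("The quarterly results exceeded expectations.")
print(result["llm_response"])   # e.g., {"sentiment": ["positive"]}
```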
Equipped with powerful integrated GPUs and NPUs that provide the hardware capability to run AI apps on-device, AI PCs allow enterprises to deploy many lightweight AI apps locally without exposing sensitive data or copying it to external systems. This unlocks tremendous benefits in security, safety, and cost savings.
LLMWare also recently announced a strategic collaboration with Intel alongside the limited release of Model HQ for private preview. Designed specifically for AI PCs with Intel Core Ultra processors, Model HQ provides an out-of-the-box, no-code kit for running, creating, and deploying AI-enabled apps, with an integrated UI/UX and low-code agent workflows for easy app creation. With built-in chatbot and document search-and-analysis features, the app comes ready to use and can launch custom workflows directly on the device. Model HQ also ships with many enterprise-ready security and safety features, such as Model Vault for model security checks, Model Safety Monitor for toxicity and bias screening, a hallucination detector, AI explainability data, a compliance and auditing toolkit, privacy filters, and much more.
“At LLMWare, we believe strongly in lowering the center of gravity in AI to enable local, private, decentralized, self-hosted deployment – with high-quality models and data pipelines optimized for safe, controlled, cost-optimized roll-outs of lightweight, customized RAG, Agent and Chat apps for enterprises of all sizes. We are thrilled to launch the Model Depot collection in open source to expand access to OpenVINO and ONNX packaged models to support the AI PC roll-out over the coming months,” said Darren Oberst, Chief Technology Officer of LLMWare.
“The rise of Generative AI unlocks new application experiences that were not available with previous generations of data processing algorithms. The unique combination of a powerful AI PC platform and optimization software like OpenVINO is a way to get the best characteristics for local, privately owned LLM deployment without thinking about optimization details. LLMWare’s platform goes a step further by providing software building blocks and pretrained models to implement data processing within the final application, saving time to market. The combination of OpenVINO and LLMWare’s platform truly unlocks the best-performing Generative AI capabilities for applications at the edge,” said Yury Gorbachev, Intel Fellow and OpenVINO Architect at Intel.
Please visit LLMWare’s GitHub and Hugging Face pages for its comprehensive open-source library and collection of small language models, and llmware.ai for the latest white paper and blogs.
Thanks to AI Bloks for the thought leadership and educational article. AI Bloks has supported us in this content.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.