Cohere cracks near-lossless quantization and native citations with first fully Apache 2.0-licensed open model, Command A+

Canadian AI lab Cohere made waves recently by announcing a merger with German AI startup Aleph Alpha, but now it has even more in store for enterprise builders around the globe: today, the firm co-founded by former Googler and “Attention Is All You Need” co-author Aidan Gomez unveiled Command A+, a highly optimized, 218-billion-parameter language model engineered specifically for complex reasoning, multimodal document processing, and agentic workflows.

The most significant aspect of the release is not just the model’s capabilities; it is its accessibility.

By releasing the model weights for free on Hugging Face, the popular AI model-sharing platform, under the highly permissive Apache 2.0 open-source license (a first for the company, according to a post on X by Gomez, now Cohere’s CEO), Cohere is making a calculated bet on “sovereign AI”: the thesis that enterprises, governments, and developers should be able to run, control, and adapt frontier-grade AI entirely within their own secure environments, without sacrificing performance.

Sparse architecture with extreme quantization

At the architectural level, Command A+ represents a major evolution from Cohere’s previous dense models. It is a decoder-only Sparse Mixture-of-Experts (MoE) Transformer.

While the model houses a relatively modest 218 billion total parameters, only 25 billion (about 11%) are active during any given generation step. That makes for a much lighter footprint and requires far fewer compute resources at inference (serving the model in production environments to end users or via agents) than proprietary models from U.S. giants such as OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7, which third-party observers estimate to have trillions of parameters.

This sparse architecture is the key to the model’s efficiency. In plain terms, an MoE model routes incoming queries only to the specific “expert” neural networks best suited to handle them, leaving the rest of the model dormant.

This is a familiar formulation and one followed by most leading LLMs these days, allowing models to retain the vast knowledge base and nuanced reasoning capabilities of a giant, but at the faster speeds and reduced compute and energy requirements of a much smaller model, since only a fraction of parameters are ever activated at any time.
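To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is a generic illustration of sparse MoE routing, not Cohere's implementation: the expert count, the `top_k` value, the toy "experts" and the random router scores are all assumptions chosen for readability.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


def moe_layer(token, experts, router_scores, top_k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    gates = route_token(router_scores, top_k)
    output = 0.0
    for index, gate in gates.items():
        output += gate * experts[index](token)
    return output, sorted(gates)


# Hypothetical toy setup: 8 "experts" that each scale the input differently.
NUM_EXPERTS = 8
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

# In a real model the router scores come from a learned layer; here they are random.
random.seed(0)
scores = [random.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

value, active = moe_layer(1.0, experts, scores, top_k=2)
print(f"active experts: {active} of {NUM_EXPERTS}, output: {value:.3f}")
```

Only the two selected experts are ever called, which is the source of the compute savings: the other six stay in memory but do no work for this token.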

But where Cohere has taken an extra step beyond most for Command A+ is that it has focused heavily on hardware efficiency through quantization—a process that compresses the model’s memory footprint by reducing the precision of its parameters.
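As a rough illustration of what reducing precision means, the sketch below applies symmetric round-to-nearest quantization to a handful of made-up weights, mapping each to one of 15 integer levels that fit in 4 bits. This is a generic textbook scheme used here as an assumption; the article does not describe the exact numeric format Cohere uses.

```python
def quantize_int4(weights):
    """Symmetric round-to-nearest quantization to integer codes in [-7, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest weight onto the largest code.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate weights from the 4-bit codes and the shared scale."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -0.77, 0.30, 0.00, -0.12, 0.64]

codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))

print("4-bit codes:", codes)
print(f"scale: {scale:.4f}, worst-case error: {worst_error:.4f}")
```

Each weight now needs 4 bits instead of 16, at the price of a rounding error of at most half the scale. That rounding error is the raw material of the "quantization tax" discussed next.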

Command A+ is available in 16-bit (BF16), 8-bit (FP8), and a highly compressed 4-bit (W4A4) format.

The W4A4 quantization is the technical centerpiece of this release. Typically, reasoning models suffer an outsized “quantization tax,” where compressing the model leads to visible regressions in complex problem-solving.

Cohere mitigated this by only quantizing the MoE experts to 4-bit, while keeping the critical attention pathways at full precision, supplemented by a technique called Quantization-Aware Distillation.

The result is a nearly lossless compression that allows this massive model to run on a single NVIDIA Blackwell B200 GPU or just two NVIDIA H100 GPUs.
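A back-of-the-envelope calculation shows why the 4-bit format matters for hardware sizing. The sketch below counts memory for the weights only, ignoring the KV cache, activations and runtime overhead. The 95% expert share, the 16-bit width for the non-expert layers and the 80 GB H100 capacity are assumptions for illustration, not figures from Cohere.

```python
def weight_gigabytes(num_params, bits_per_param):
    """Memory needed to store the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def mixed_precision_gigabytes(num_params, expert_fraction,
                              expert_bits=4, other_bits=16):
    """Weights-only footprint when only the expert layers are quantized."""
    expert_params = num_params * expert_fraction
    other_params = num_params - expert_params
    return (weight_gigabytes(expert_params, expert_bits)
            + weight_gigabytes(other_params, other_bits))


TOTAL_PARAMS = 218e9  # total parameter count reported in the article

# Uniform precision across the whole model, for comparison.
for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_gigabytes(TOTAL_PARAMS, bits):.0f} GB")

# ASSUMPTION: the article does not say what share of parameters sits in experts.
mixed = mixed_precision_gigabytes(TOTAL_PARAMS, expert_fraction=0.95)

# ASSUMPTION: the 80 GB variant of the H100.
two_h100_gb = 2 * 80
print(f"mixed 4-bit experts: {mixed:.1f} GB vs {two_h100_gb} GB on two H100s")
```

Under these assumptions the 16-bit weights alone would need several times the memory of a two-GPU node, while the mixed 4-bit variant leaves headroom for the cache and activations.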

The speed gains are equally notable. According to performance data released by the company, the W4A4 quantization at low concurrency achieves 375 output tokens per second with a time-to-first-token (TTFT) latency of just 113 milliseconds, representing up to a 63% increase in output speed and a 17% reduction in latency compared to the previous Command A Reasoning model.

Furthermore, Cohere has overhauled the model’s tokenizer. Tokenizers break text down into the fragments that AI models process. The new tokenizer is highly optimized for global enterprise use, featuring native support for 48 languages.

More importantly, it dramatically improves tokenization efficiency for non-European languages, reducing the number of tokens required to generate responses in Arabic by 20%, Japanese by 18%, and Korean by 16%. Because inference costs are calculated per token, this translates directly to lower operational costs for global, multilingual or non-English deployments.
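The link between token counts and cost is simple proportionality, as the sketch below shows. The reduction percentages come from the figures above; the monthly token volume and the per-token price are hypothetical placeholders, not Cohere pricing.

```python
def output_cost(tokens, price_per_million_tokens):
    """Cost of generating a given number of tokens at a flat per-token price."""
    return tokens / 1_000_000 * price_per_million_tokens


def monthly_savings(baseline_tokens, reduction, price_per_million_tokens):
    """Return (cost_before, cost_after) when the same text needs fewer tokens."""
    new_tokens = baseline_tokens * (1 - reduction)
    before = output_cost(baseline_tokens, price_per_million_tokens)
    after = output_cost(new_tokens, price_per_million_tokens)
    return before, after


# Token reductions reported in the article for the new tokenizer.
TOKEN_REDUCTION = {"Arabic": 0.20, "Japanese": 0.18, "Korean": 0.16}

# HYPOTHETICAL workload and price, chosen only to make the arithmetic visible.
BASELINE_TOKENS = 1_000_000_000   # output tokens per month with the old tokenizer
PRICE_PER_MILLION = 2.00          # USD per million output tokens

for language, reduction in TOKEN_REDUCTION.items():
    before, after = monthly_savings(BASELINE_TOKENS, reduction, PRICE_PER_MILLION)
    print(f"{language:>8}: ${before:,.0f} -> ${after:,.0f} per month")
```

For self-hosted deployments there is no per-token bill, but the same proportionality applies to GPU time: fewer tokens for the same text means fewer generation steps.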

Agentic workflows and high benchmarks on math, specialized fields

While raw speed and size dictate deployment, a model’s utility is defined by its product capabilities. Command A+ was built specifically for “agentic” tasks — workflows where the AI operates autonomously or semi-autonomously, uses external tools, queries databases, and synthesizes information across multiple steps.

The benchmark leaps over the previous generation are stark.

Cohere Command A+ benchmark comparison charts. Credit: Cohere

On 𝜏²-Bench Telecom, which tests complex reasoning, the model jumped from a 37% score to 85%. On Terminal-Bench Hard, which measures agentic coding performance, it climbed from 3% to 25%. In complex mathematics, it scored 90% on AIME 25, up from 57%.

Command A+ punches above its weight class (25B active parameters) in pure reasoning and mathematics, competing directly with much larger models like DeepSeek V4 Pro on math benchmarks. However, on deep agentic coding and broad general-intelligence indexes, it currently trails the latest generations from Chinese open-source rivals such as DeepSeek, Z.ai (GLM), and MiniMax.

That said, comparing them directly ignores Cohere’s core value proposition: hardware efficiency.

Beyond the benchmarks, Command A+ introduces deep integrations for enterprise trust and verification. The model supports conversational tool use via standard chat templates, allowing developers to connect it seamlessly to internal APIs, search engines, or SQL databases.

Crucially, Command A+ features native citation generation. When Command A+ retrieves information from an external tool, it doesn’t just synthesize the answer; it generates explicit “grounding spans.” Using special tags embedded in the output, the model directly links every factual claim it makes to the specific source document or database row it pulled the information from.

For enterprises in heavily regulated industries like finance, healthcare, or legal, this traceability is the difference between an interesting prototype and a production-ready application. If a user asks for a daily sales report, the model will output the total sales amount and explicitly cite the database query result that provided that number, minimizing the risk of undetected hallucinations.
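To show how an application might consume such output, the sketch below parses cited spans out of a response and checks that every cited source actually exists among the tool results. The `<cite src="...">` tag format, the `sql_result_0` identifier and the sample response are hypothetical stand-ins; the article does not specify the real tag syntax, so consult Cohere's documentation before relying on any particular format.

```python
import re

# HYPOTHETICAL tag format, used only for illustration.
CITE_PATTERN = re.compile(r'<cite src="([^"]+)">(.*?)</cite>', re.DOTALL)


def extract_citations(tagged_text):
    """Strip citation tags and return (plain_text, citations with offsets)."""
    plain_parts = []
    citations = []
    cursor = 0        # position in the tagged input
    plain_length = 0  # position in the tag-free output
    for match in CITE_PATTERN.finditer(tagged_text):
        before = tagged_text[cursor:match.start()]
        plain_parts.append(before)
        plain_length += len(before)
        span_text = match.group(2)
        citations.append({
            "source": match.group(1),
            "start": plain_length,
            "end": plain_length + len(span_text),
            "text": span_text,
        })
        plain_parts.append(span_text)
        plain_length += len(span_text)
        cursor = match.end()
    plain_parts.append(tagged_text[cursor:])
    return "".join(plain_parts), citations


def unsupported_sources(citations, known_sources):
    """List cited source ids that do not match any tool result we supplied."""
    return [c["source"] for c in citations if c["source"] not in known_sources]


# Hypothetical tool result and model response for the daily sales example.
tool_results = {"sql_result_0": "total_sales=48210.50"}
model_output = 'Total sales today were <cite src="sql_result_0">$48,210.50</cite>.'

plain, cites = extract_citations(model_output)
print(plain)
for c in cites:
    print(f'  "{c["text"]}" [{c["start"]}:{c["end"]}] <- {c["source"]}')
print("unsupported sources:", unsupported_sources(cites, tool_results))
```

A check like `unsupported_sources` is where the audit value comes from: a claim that cites a source the application never provided can be flagged before it reaches the user.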

Additionally, Command A+ is fully multimodal, capable of processing both text and images natively within its massive 128K input context window, making it highly effective for complex document processing, such as analyzing scanned invoices, charts, or technical manuals.

The first fully Apache 2.0 licensed Cohere AI model

In the current AI landscape, “open source” has become a fraught term. Many leading AI companies release their model weights under restrictive commercial licenses or acceptable use policies that explicitly forbid large enterprises from using the models for commercial purposes, or prohibit the models from being used to train competing AI systems.

Indeed, Cohere’s prior models, including Command R and Command R+, were released under a CC-BY-NC 4.0 (Creative Commons NonCommercial) license. While their model weights were open for researchers and developers to download, tinker with, and evaluate, they were strictly prohibited from being used for commercial purposes without purchasing a separate enterprise license from Cohere or going through its application programming interface (API), similar to the arrangement many enterprises use for accessing AI models from OpenAI, Anthropic, Google and other leading labs.

Cohere has changed up its approach by releasing Command A+ under the Apache 2.0 license. This is a critical distinction for the developer community. Apache 2.0 is a true, OSI-approved open-source license. It allows anyone—from independent developers to Fortune 500 corporations—to use, modify, distribute, and commercialize the model without paying licensing fees or adhering to restrictive non-compete clauses.

As Gomez wrote on X, the decision was championed by fellow Cohere co-founder Nick Frosst, who posted a two-minute-long overview calling it “the best model we’ve ever put out.”

For the enterprise, this license means total vendor independence. A company can download the Command A+ weights, fine-tune them on highly classified internal data, and deploy them on their own private servers or air-gapped networks. They are not tethered to Cohere’s infrastructure, pricing changes, or API uptime. It is the ultimate realization of sovereign AI.

The release gained immediate traction across the AI developer ecosystem, driven heavily by its day-one integration with major open-source inference frameworks like Hugging Face Transformers and vLLM.

What’s next?

The release of Command A+ marks a maturing of the open-source AI ecosystem. By combining frontier-level reasoning, robust agentic tool use, and multimodal capabilities with an architecture specifically designed for hardware efficiency, Cohere is changing the calculus for enterprise AI adoption.

The requirement of massive, centralized compute clusters has long been a bottleneck for companies prioritizing data privacy and cost control. By democratizing access to a model of this caliber under a true open-source license, Cohere has provided the enterprise market with exactly what it has been asking for: the power of the cloud, capable of running securely in the server room down the hall.
