TradePoint.io
Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

May 20, 2026
in AI & Technology
Reading Time: 7 mins read

Canadian AI lab Cohere made waves recently by announcing a merger with German AI startup Aleph Alpha, but now it has even more in store for enterprise builders around the globe: today, the firm co-founded by former Googler and “Attention Is All You Need” co-author Aidan Gomez unveiled Command A+, a highly optimized, 218-billion-parameter language model engineered specifically for complex reasoning, multimodal document processing, and agentic workflows.

The most significant aspect of the release is not just the model’s capabilities; it is its accessibility.

By releasing the model weights for free on the popular AI code-sharing repository Hugging Face under the highly permissive Apache 2.0 open-source license (a first for the company, according to a post on X by Gomez, now Cohere’s CEO), Cohere is making a calculated bet on “sovereign AI”: the thesis that enterprises, governments, and developers should be able to run, control, and adapt frontier-grade AI entirely within their own secure environments, without sacrificing performance.

Sparse architecture with extreme quantization

At the architectural level, Command A+ represents a major evolution from Cohere’s previous dense models. It is a decoder-only Sparse Mixture-of-Experts (MoE) Transformer.

The model houses a relatively modest 218 billion total parameters, and only 25 billion of them are active during any given generation step. That makes for a much lighter footprint, requiring far fewer compute resources at inference time (serving the model in production environments to end users or agents) than proprietary models from U.S. giants, such as OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7, which third-party observers estimate to be in the trillions of parameters.

This sparse architecture is the key to the model’s efficiency. In plain terms, an MoE model routes incoming queries only to the specific “expert” neural networks best suited to handle them, leaving the rest of the model dormant.

This is a familiar formulation and one followed by most leading LLMs these days, allowing models to retain the vast knowledge base and nuanced reasoning capabilities of a giant, but at the faster speeds and reduced compute and energy requirements of a much smaller model, since only a fraction of parameters are ever activated at any time.
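To make the routing idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Cohere's implementation: the article does not say how many experts Command A+ has or how its router scores them, so the toy experts, the router scores, and `k=2` are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return {i: probs[i] / kept for i in top}

def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_top_k(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Hypothetical toy experts: each one simply scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, 0.3, 1.5]  # made-up router scores for one token

gates = route_top_k(router_logits, k=2)
output = moe_layer(10.0, experts, router_logits, k=2)
print(sorted(gates), output)
```

Only experts 1 and 3 are evaluated for this token; the other two never run, which is where the compute savings of a sparse model come from.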

Where Cohere has gone a step further than most with Command A+ is its heavy focus on hardware efficiency through quantization, a process that shrinks the model’s memory footprint by reducing the precision of its parameters.

Command A+ is available in 16-bit (BF16), 8-bit (FP8), and a highly compressed 4-bit (W4A4) format.

The W4A4 quantization is the technical centerpiece of this release. Typically, reasoning models suffer an outsized “quantization tax,” where compressing the model leads to visible regressions in complex problem-solving.

Cohere mitigated this by only quantizing the MoE experts to 4-bit, while keeping the critical attention pathways at full precision, supplemented by a technique called Quantization-Aware Distillation.
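The selective approach described above can be sketched with a generic round-to-nearest 4-bit quantizer that touches only expert weights. This is an assumption-laden illustration, not Cohere's pipeline: the layer names and weight values are hypothetical, activation quantization (the "A4" in W4A4) is not shown, and Quantization-Aware Distillation is a training-time technique that this snippet does not attempt to reproduce.

```python
def quantize_int4(weights):
    """Symmetric round-to-nearest quantization to signed 4-bit codes in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]

def compress(layers):
    """Quantize expert weights only; leave every other layer untouched."""
    restored = {}
    for name, weights in layers.items():
        if name.startswith("experts."):
            codes, scale = quantize_int4(weights)
            restored[name] = dequantize(codes, scale)
        else:
            restored[name] = list(weights)
    return restored

# Hypothetical layer names and weights, for illustration only.
layers = {
    "attention.q_proj": [0.42, -0.17, 0.88, -0.63],
    "experts.0.up_proj": [0.42, -0.17, 0.88, -0.63],
}

restored = compress(layers)
max_error = max(
    abs(a - b)
    for a, b in zip(layers["experts.0.up_proj"], restored["experts.0.up_proj"])
)
print(restored["attention.q_proj"] == layers["attention.q_proj"], max_error)
```

The attention weights come back bit-for-bit identical, while each expert weight is off by at most half a quantization step. That bounded rounding error is what the article's "quantization tax" refers to.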

The result is a nearly lossless compression that allows this massive model to run on a single NVIDIA Blackwell B200 GPU or just two NVIDIA H100 GPUs.
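A back-of-the-envelope calculation shows why the 4-bit format matters for fitting on so few GPUs. The sketch below assumes every one of the 218 billion parameters is stored at the stated bit width and counts weights only. Both are simplifications: the article says attention stays at higher precision, and real deployments also need memory for the KV cache and activations. The 80 GB per H100 figure is likewise an assumption about the specific card variant.

```python
TOTAL_PARAMS = 218e9  # total parameter count quoted in the article
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "W4A4": 4}

def weight_gigabytes(params, bits):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9

footprints = {
    name: weight_gigabytes(TOTAL_PARAMS, bits)
    for name, bits in BITS_PER_PARAM.items()
}

H100_PAIR_GB = 2 * 80  # assumption: two 80 GB H100 cards

for name, gigabytes in footprints.items():
    fits = gigabytes <= H100_PAIR_GB
    print(f"{name}: {gigabytes:.0f} GB of weights, fits on two H100s: {fits}")
```

Under these assumptions only the 4-bit weights, at roughly 109 GB, fit within 160 GB, which is consistent with the two-H100 claim.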

The speed gains are equally notable. According to performance data released by the company, the W4A4 quantization at low concurrency achieves 375 output tokens per second with a time-to-first-token (TTFT) latency of just 113 milliseconds, representing up to a 63% increase in output speed and a 17% reduction in latency compared to the previous Command A Reasoning model.
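To translate the two quoted figures into something a user would feel, the following sketch estimates end-to-end response time as TTFT plus decode time. It assumes a constant decode rate at the quoted low-concurrency numbers, which real serving stacks will not hold exactly; the function name and the 750-token example are illustrative.

```python
def response_seconds(output_tokens, ttft_ms=113.0, tokens_per_second=375.0):
    """Estimated wall-clock time: time to first token plus steady-state decoding."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second

# A hypothetical 750-token answer at the quoted W4A4 figures.
estimate = response_seconds(750)
print(f"{estimate:.3f} s")  # prints "2.113 s"
```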

Furthermore, Cohere has overhauled the model’s tokenizer. Tokenizers break text down into the fragments that AI models process. The new tokenizer is highly optimized for global enterprise use, featuring native support for 48 languages.

More importantly, it dramatically improves tokenization efficiency for non-European languages, reducing the number of tokens required to generate responses in Arabic by 20%, Japanese by 18%, and Korean by 16%. Because inference costs are calculated per token, this translates directly to lower operational costs for global, multilingual or non-English deployments.
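The cost effect of the tokenizer change is simple proportional arithmetic, sketched below. The reduction percentages come from the figures above; the monthly token volume and the price per million tokens are hypothetical placeholders, not Cohere pricing.

```python
# Token reductions quoted in the article for the new tokenizer.
TOKEN_REDUCTION = {"Arabic": 0.20, "Japanese": 0.18, "Korean": 0.16}

def tokens_after(language, old_tokens):
    """Tokens needed with the new tokenizer for the same text."""
    return old_tokens * (1 - TOKEN_REDUCTION[language])

def cost_usd(tokens, usd_per_million_tokens):
    """Per-token billing: cost scales linearly with token count."""
    return tokens / 1_000_000 * usd_per_million_tokens

OLD_TOKENS = 50_000_000  # hypothetical monthly Arabic output volume
PRICE = 2.00             # hypothetical USD per million tokens

old_cost = cost_usd(OLD_TOKENS, PRICE)
new_cost = cost_usd(tokens_after("Arabic", OLD_TOKENS), PRICE)
print(f"before: ${old_cost:.2f}, after: ${new_cost:.2f}")
```

Because billing is linear in tokens, a 20% token reduction is a 20% cost reduction at any price point.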

Agentic workflows and high benchmarks on math, specialized fields

While raw speed and size dictate deployment, a model’s utility is defined by its product capabilities. Command A+ was built specifically for “agentic” tasks — workflows where the AI operates autonomously or semi-autonomously, uses external tools, queries databases, and synthesizes information across multiple steps.

The benchmark leaps over the previous generation are stark.

Cohere Command A+ benchmark comparison charts. Credit: Cohere

On 𝜏²-Bench Telecom, which tests complex reasoning, the model jumped from a 37% score to 85%. On Terminal-Bench Hard, which measures agentic coding performance, it climbed from 3% to 25%. In complex mathematics, it scored 90% on AIME 25, up from 57%.

Command A+ punches above its weight class (25B active parameters) in pure reasoning and mathematics, competing directly with much larger models like DeepSeek V4 Pro on math benchmarks. However, on deep agentic coding and broad general-intelligence indexes, it currently trails the latest models from Chinese open source rivals like DeepSeek, Z.ai (GLM), and MiniMax.

That said, comparing them directly ignores Cohere’s core value proposition: hardware efficiency.

Beyond the benchmarks, Command A+ introduces deep integrations for enterprise trust and verification. The model supports conversational tool use via standard chat templates, allowing developers to connect it seamlessly to internal APIs, search engines, or SQL databases.

Crucially, Command A+ features native citation generation. When Command A+ retrieves information from an external tool, it doesn’t just synthesize the answer; it generates explicit “grounding spans.” Using special tags embedded in the output, the model directly links every factual claim it makes to the specific source document or database row it pulled the information from.

For enterprises in heavily regulated industries like finance, healthcare, or legal, this traceability is the difference between an interesting prototype and a production-ready application. If a user asks for a daily sales report, the model will output the total sales amount and explicitly cite the database query result that provided that number, minimizing the risk of undetected hallucinations.
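Downstream code can use grounding spans to check that every cited claim points at a source the application actually supplied. The sketch below shows the idea with a HYPOTHETICAL tag syntax, `<cite src="...">...</cite>`, invented purely for illustration. The article does not give the real tag format, which is defined by the model's own chat template. The sample sentence, sales figure, and source ID are also made up.

```python
import re

# HYPOTHETICAL tag syntax, invented for this example; the real Command A+
# grounding tags are not specified in the article.
SPAN = re.compile(r'<cite src="([^"]+)">(.*?)</cite>', re.DOTALL)

def extract_citations(text):
    """Return the tag-free text plus a list of (claim, source_id) pairs."""
    citations = [(m.group(2), m.group(1)) for m in SPAN.finditer(text)]
    clean = SPAN.sub(lambda m: m.group(2), text)
    return clean, citations

def unsupported(citations, known_sources):
    """Claims whose cited source was never provided to the model."""
    return [claim for claim, source in citations if source not in known_sources]

# Made-up model output for a daily sales report.
raw = 'Total sales yesterday were <cite src="sql_result_0">$48,200</cite>.'
clean, citations = extract_citations(raw)
problems = unsupported(citations, known_sources={"sql_result_0"})
print(clean, citations, problems)
```

An application could refuse to display any answer for which `unsupported` returns a non-empty list, turning the citations into an enforceable check.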

Additionally, Command A+ is fully multimodal, capable of processing both text and images natively within its massive 128K input context window, making it highly effective for complex document processing, such as analyzing scanned invoices, charts, or technical manuals.

The first fully Apache 2.0 licensed Cohere AI model

In the current AI landscape, “open source” has become a fraught term. Many leading AI companies release their model weights under restrictive commercial licenses or acceptable use policies that explicitly forbid large enterprises from using the models for commercial purposes, or prohibit the models from being used to train competing AI systems.

Indeed, Cohere’s prior models, including Command R and Command R+, were released under a CC-BY-NC 4.0 (Creative Commons Attribution-NonCommercial) license. Researchers and developers could download, tinker with, and evaluate the weights, but commercial use was strictly prohibited without purchasing a separate enterprise license from Cohere or going through its application programming interface (API), similar to the arrangement many enterprises use for accessing AI models from OpenAI, Anthropic, Google and other leading labs.

Cohere has changed up its approach by releasing Command A+ under the Apache 2.0 license. This is a critical distinction for the developer community. Apache 2.0 is a true, OSI-approved open-source license. It allows anyone—from independent developers to Fortune 500 corporations—to use, modify, distribute, and commercialize the model without paying licensing fees or adhering to restrictive non-compete clauses.

As Gomez wrote on X, the decision was championed by fellow Cohere co-founder Nick Frosst, who posted a two-minute-long overview calling it “the best model we’ve ever put out.”

For the enterprise, this license means total vendor independence. A company can download the Command A+ weights, fine-tune them on highly classified internal data, and deploy them on their own private servers or air-gapped networks. They are not tethered to Cohere’s infrastructure, pricing changes, or API uptime. It is the ultimate realization of sovereign AI.

The release was met with immediate traction across the AI developer ecosystem, driven heavily by its day-one integration with major open-source tooling from Hugging Face and the vLLM inference framework.

What’s next?

The release of Command A+ marks a maturing of the open-source AI ecosystem. By combining frontier-level reasoning, robust agentic tool use, and multimodal capabilities with an architecture specifically designed for hardware efficiency, Cohere is changing the calculus for enterprise AI adoption.

The requirement of massive, centralized compute clusters has long been a bottleneck for companies prioritizing data privacy and cost control. By democratizing access to a model of this caliber under a true open-source license, Cohere has provided the enterprise market with exactly what it has been asking for: the power of the cloud, capable of running securely in the server room down the hall.
