• bitcoinBitcoin(BTC)$77,541.000.18%
  • ethereumEthereum(ETH)$2,129.530.23%
  • tetherTether(USDT)$1.000.00%
  • binancecoinBNB(BNB)$656.211.28%
  • rippleXRP(XRP)$1.370.51%
  • usd-coinUSDC(USDC)$1.00-0.01%
  • solanaSolana(SOL)$87.111.39%
  • tronTRON(TRX)$0.3646621.54%
  • Figure HelocFigure Heloc(FIGR_HELOC)$1.03-0.97%
  • dogecoinDogecoin(DOGE)$0.1053031.88%
  • HyperliquidHyperliquid(HYPE)$58.307.68%
  • whitebitWhiteBIT Coin(WBT)$57.170.17%
  • zcashZcash(ZEC)$667.76-0.39%
  • USDSUSDS(USDS)$1.00-0.01%
  • cardanoCardano(ADA)$0.2505080.66%
  • leo-tokenLEO Token(LEO)$9.85-1.95%
  • bitcoin-cashBitcoin Cash(BCH)$379.942.24%
  • moneroMonero(XMR)$393.34-1.64%
  • chainlinkChainlink(LINK)$9.731.25%
  • CantonCanton(CC)$0.1590043.00%
  • the-open-networkToncoin(TON)$2.061.50%
  • stellarStellar(XLM)$0.1466642.35%
  • USD1USD1(USD1)$1.000.01%
  • suiSui(SUI)$1.124.49%
  • Ethena USDeEthena USDe(USDE)$1.00-0.04%
  • daiDai(DAI)$1.000.00%
  • litecoinLitecoin(LTC)$54.080.27%
  • avalanche-2Avalanche(AVAX)$9.451.92%
  • hedera-hashgraphHedera(HBAR)$0.0897481.10%
  • MemeCoreMemeCore(M)$2.85-7.73%
  • paypal-usdPayPal USD(PYUSD)$1.00-0.02%
  • RainRain(RAIN)$0.0075170.85%
  • shiba-inuShiba Inu(SHIB)$0.0000060.91%
  • crypto-com-chainCronos(CRO)$0.0696241.08%
  • Circle USYCCircle USYC(USYC)$1.120.00%
  • BittensorBittensor(TAO)$280.532.92%
  • tether-goldTether Gold(XAUT)$4,530.530.01%
  • Global DollarGlobal Dollar(USDG)$1.00-0.02%
  • BlackRock USD Institutional Digital Liquidity FundBlackRock USD Institutional Digital Liquidity Fund(BUIDL)$1.000.00%
  • nearNEAR Protocol(NEAR)$1.9314.19%
  • uniswapUniswap(UNI)$3.60-0.58%
  • mantleMantle(MNT)$0.682.54%
  • polkadotPolkadot(DOT)$1.293.73%
  • Ondo US Dollar YieldOndo US Dollar Yield(USDY)$1.13-0.14%
  • pax-goldPAX Gold(PAXG)$4,532.380.03%
  • OndoOndo(ONDO)$0.4179784.40%
  • World Liberty FinancialWorld Liberty Financial(WLFI)$0.062140-1.09%
  • HTX DAOHTX DAO(HTX)$0.0000020.83%
  • Falcon USDFalcon USD(USDF)$1.000.01%
  • AsterAster(ASTER)$0.69-0.57%
TradePoint.io
  • Main
  • AI & Technology
  • Stock Charts
  • Market & News
  • Business
  • Finance Tips
  • Trade Tube
  • Blog
  • Shop
No Result
View All Result
TradePoint.io
No Result
View All Result

Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs

May 21, 2026
in AI & Technology
Reading Time: 10 mins read
A A
Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs
ShareShareShareShareShare

Cohere just released Command A+, as an open-source model targeting enterprise agentic workflows. Available under an Apache 2.0 license, Command A+ is a mixture-of-experts (MoE) model built for high-performance agentic tasks with minimal compute overhead. The model is optimized for reasoning, agentic workflows, RAG, multilingual, and multimodal document processing. It unifies capabilities from four prior models — Command A, Command A Reasoning, Command A Vision, and Command A Translate — into a single scalable model.

Architecture

Command A+ is a decoder-only Sparse Mixture-of-Experts Transformer with 218B total parameters and 25B active parameters. It has 128 experts, of which 8 are active per token, and a single shared expert is applied to all tokens. In a MoE model, each token is routed through only a subset of expert sub-networks rather than the full parameter set, keeping active compute at 25B-parameter scale at inference time.

YOU MAY ALSO LIKE

A 0.12% parameter add-on gives AI agents the working memory RAG can’t

Meta Settles Closely Watched School District Lawsuit Weeks Ahead Of Trial

The attention layers interleave sliding-window attention layers with Rotational Positional Embeddings and global attention layers without positional embeddings in a 3:1 ratio. The sparse MoE layer is trained in a fully dropless manner and uses a token-choice router, with a normalized sigmoid over the top-k expert logits per token.

Input modalities are text, image, and tool use. Output modalities are text, reasoning, and tool use. The model supports a 128K input context length and a 64K max generation length.

Hardware Requirements and Quantization

Three quantization variants are available with minimum GPU requirements: BF16 (16-bit) requires 4× B200 or 8× H100 GPUs; FP8 (8-bit) requires 2× B200 or 4× H100 GPUs; W4A4 (4-bit) runs on a single B200 or 2× H100 GPUs. All three quantizations show negligible differences in benchmark quality. Cohere recommends W4A4 for most deployments.

W4A4 Quantization Methodology

Cohere applies NVFP4 W4A4 quantization, 4-bit weights and activations with two-level scaling, to the MoE experts only. The attention path, including Q/K/V/O projections, the KV cache, and attention compute, is kept at full precision.

To close residual quality gaps, Cohere uses Quantization-Aware Distillation (QAD) in the post-training phase: the quantized student model is trained to match the full-precision teacher’s output distribution, using fake quantization operators in the forward pass and straight-through estimators on the backward pass.

https://cohere.com/blog/command-a-plus

Performance vs. Prior Command A Models

On τ²-Bench Telecom, scores improved from 37% to 85% over Command A Reasoning, and Terminal-Bench Hard agentic coding performance reached 25% from 3%.

On internal North platform evaluations, all scored using LLM-as-a-judge techniques, Agentic Question Answering accuracy improved by 20% over Command A Reasoning. Agentic QA measures how well the model answers enterprise questions using MCP-connected cloud file systems. Spreadsheet analysis quality improved by 32%, and Memory Usage Quality — measuring how well an agent leverages information from a previous session to answer questions in a subsequent session — scored 54% with Command A+ compared to 39% with Command A Reasoning.

Command A+ is Cohere’s first multimodal reasoning model. It achieved 63% on MMMU Pro and 75.1% on MMMU, compared with 65.3% for Command A Vision on the latter. MathVista scores improved from 73.5% to 80.6%, and CharXiv reasoning improved from 46.9% to 52.7%.

Command A+ expands multilingual coverage from 23 to 48 languages, with gains in machine translation and multilingual reasoning.

Command A+ scored 37 on the Artificial Analysis Intelligence Index, outperforming other leading open models.

https://cohere.com/blog/command-a-plus
https://cohere.com/blog/command-a-plus

Speed and Latency

At the same quantization and concurrency levels, Command A+ delivers up to 63% higher Output Tokens per Second (TOPS) and reduces Time To First Token (TTFT) by up to 17% compared with Command A Reasoning. The W4A4 quantization contributes an additional 47% increase in speed and a 13% reduction in latency. Speculative decoding, optimized specifically for the MoE architecture, delivers an additional 1.5–1.6× inference speedup for both text and multimodal inputs.

Tokenizer

Command A+ is the first model to use Cohere’s latest tokenizer, reducing the number of tokens required to generate the same response. Tokenization efficiency improved by 20% for Arabic, 16% for Korean, and 18% for Japanese.

Getting Started

The model is supported by vLLM and Transformers. Tool use is handled through chat templates in Transformers using JSON schema for tool descriptions. When reasoning is enabled, the model generates thinking traces between <|START_THINKING|> and <|END_THINKING|> tags before producing a final answer.

The W4A4 variant requires vLLM ≥0.21.0 and cohere_melody>=0.9.0 for accurate response parsing. Cohere recommends the following sampling parameters: temperature=0.9, top_p=0.95, and repetition_penalty=1.04.

Key Takeaways

  • Command A+ has 218B total / 25B active parameters in a Sparse MoE architecture, released under Apache 2.0.
  • W4A4 applies NVFP4 quantization to MoE experts only with QAD post-training, running on 2× H100s.
  • τ²-Bench Telecom improved from 37% to 85%; Terminal-Bench Hard from 3% to 25% vs. Command A Reasoning.
  • TOPS increased up to 63% and TTFT reduced up to 17% vs. Command A Reasoning at matching quantization.
  • Command A+ is Cohere’s first multimodal reasoning model, expanding language support from 23 to 48 languages.

Check out the Model Weights and Technical details. Also, feel free to follow us on Twitter and don’t forget to join our 150k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

Credit: Source link

ShareTweetSendSharePin

Related Posts

A 0.12% parameter add-on gives AI agents the working memory RAG can’t
AI & Technology

A 0.12% parameter add-on gives AI agents the working memory RAG can’t

May 21, 2026
Meta Settles Closely Watched School District Lawsuit Weeks Ahead Of Trial
AI & Technology

Meta Settles Closely Watched School District Lawsuit Weeks Ahead Of Trial

May 21, 2026
Enterprise AI agents keep failing because they forget what they learned
AI & Technology

Enterprise AI agents keep failing because they forget what they learned

May 21, 2026
New York City Mayor Zohran Mamdani Is Launching A Twitch Show
AI & Technology

New York City Mayor Zohran Mamdani Is Launching A Twitch Show

May 21, 2026

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

No Result
View All Result
My Friends Say I’ve Made a Big Financial Mistake

My Friends Say I’ve Made a Big Financial Mistake

May 20, 2026
BEFORE YOU SELL EVERYTHING WATCH THIS…

BEFORE YOU SELL EVERYTHING WATCH THIS…

May 18, 2026
Airbnb Expands Into Hotel Bookings And Even Grocery Deliveries

Airbnb Expands Into Hotel Bookings And Even Grocery Deliveries

May 20, 2026

About

Learn more

Our Services

Legal

Privacy Policy

Terms of Use

Bloggers

Learn more

Article Links

Contact

Advertise

Ask us anything

©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.

No Result
View All Result
  • Main
  • AI & Technology
  • Stock Charts
  • Market & News
  • Business
  • Finance Tips
  • Trade Tube
  • Blog
  • Shop

© 2023 - TradePoint.io - All Rights Reserved!