TradePoint.io

Cohere open-sources a coding agent that runs on a single H100

June 9, 2026
in AI & Technology

Engineering teams building agentic coding pipelines now have a concrete open-source alternative to managed models like Claude Fable 5 — one that runs on a single H100. The tradeoff: Cohere’s North Mini Code, which launched Tuesday, generated three times the output tokens of comparable models in independent testing, a verbosity cost that compounds in high-volume production workloads.

The new open-source model is a 30 billion parameter mixture-of-experts (MoE) model with 3 billion parameters active per token, built for agentic software engineering including sub-agent orchestration, architecture mapping, code review and terminal work. The model supports a 256,000 token context window with a 64,000 token maximum generation length, and is available on Hugging Face under an Apache 2.0 license.

What North Mini Code can do

North Mini Code targets the full agentic coding stack. Here is what the model does and what it runs on.

Software engineering. Cohere built North Mini Code specifically for agentic software engineering rather than adapting it from a general-purpose base. It has integrated tool-use capabilities and supports interleaved thinking, which Cohere says improves performance across multi-step agentic work.

Architecture mapping and code review. North Mini Code can analyze and map systems architecture, surface dependencies and perform code review across large codebases. With a 256,000 token context window, it can hold substantial multi-file projects in a single context pass.

Terminal-based agentic tasks. The model is trained for terminal environments, handling shell interactions, package scripts and command-line tooling. Cohere benchmarked it on Terminal-Bench v2, which tests agents in real terminal environments rather than synthetic code generation tasks.

How it was built

North Mini Code is a sparse mixture-of-experts model with 128 experts, of which 8 activate per token. As a result, its compute requirement at inference time is closer to that of a 3 billion parameter model, despite its 30 billion total parameters. Nick Frosst, co-founder of Cohere, demoed it running on a Mac Studio via MLX using around 20 gigabytes of RAM, the same machine he uses for his own local coding work.
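
To make the "8 of 128 experts" figure concrete, here is a minimal sketch of top-k expert routing in plain Python. It is illustrative only: the article does not describe Cohere's router, so the function name `route_token`, the random logits, and the choice to softmax over only the selected experts are all assumptions.

```python
import math
import random

NUM_EXPERTS = 128   # from the article
TOP_K = 8           # experts activated per token, from the article


def route_token(router_logits, k=TOP_K):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax over the selected logits only. This is one
    common convention, assumed here; it is not confirmed for this model.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: v / total for i, v in exps.items()}


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
weights = route_token(logits)

print(len(weights))          # 8
print(TOP_K / NUM_EXPERTS)   # 0.0625, the share of experts run per token
print(3 / 30)                # 0.1, the share of parameters active per the article
```

The active-parameter share (10%) is larger than the active-expert share (6.25%). A plausible explanation, not stated in the article, is that non-expert layers such as attention and embeddings run for every token.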

Cohere trained the model through two stages of supervised fine-tuning followed by reinforcement learning with verifiable rewards across more than 70,000 verifiable tasks spanning approximately 5,000 repositories, deduplicated against SWE-Bench. 

Rather than optimizing against a single agent scaffold, Cohere trained across three. SWE-Agent uses a rich CLI with specialized commands. Mini-SWE-Agent uses a single bash tool with raw shell output. OpenCode uses individually typed tools returning structured JSON. Cohere reports a 10 percentage point gain on OpenCode evaluation from the multi-harness approach while maintaining SWE-Agent performance.
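
The difference between a single free-form shell tool and individually typed tools can be sketched as follows. Both functions are hypothetical illustrations written for this article; they are not the actual tool definitions used by Mini-SWE-Agent or OpenCode.

```python
import json
import subprocess


def bash_tool(command):
    """Single free-form tool: run a shell command, return raw text."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr


def read_file_tool(path, max_chars=4096):
    """Typed tool: named arguments in, structured JSON string out."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            content = fh.read(max_chars)
        return json.dumps({"ok": True, "path": path, "content": content})
    except OSError as err:
        return json.dumps({"ok": False, "path": path, "error": str(err)})


# The model must parse unstructured text in the first case...
print(bash_tool("echo hello"))

# ...and a fixed schema in the second, including explicit error fields.
print(read_file_tool("no_such_file.txt"))
```

A model trained against only one of these shapes may handle the other poorly, which is the robustness gap multi-harness training is meant to close.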

Where it fits

North Mini Code enters a market that now includes Mistral Devstral Small 2, GitHub Copilot, Cursor, and Claude Fable 5 — each with distinct cost and deployment tradeoffs.

Cohere’s primary benchmark comparison is against Mistral Devstral Small 2, a 24 billion parameter dense model. In vendor-reported internal tests under identical hardware configurations, Cohere claims 2.8x higher output throughput and a 30% inter-token latency advantage over Devstral Small 2. Cohere also claims, in its Hugging Face technical post, that North Mini Code outperforms open-source models up to four times its parameter count on its reported benchmarks, including models at 120 billion parameters.

Artificial Analysis independently ranks it eighth of 127 comparable open-weight models on output speed at 210 tokens per second, with a time to first token of 0.25 seconds against a class median of 1.95 seconds. It places 18th of 127 on the Artificial Analysis Intelligence Index. One flag from the same data: the model generated 75 million output tokens to complete the Intelligence Index against a class median of 25 million. In high-volume agentic pipelines, that verbosity compounds into inference cost and latency.
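
A quick back-of-the-envelope calculation shows how verbosity can eat into a speed advantage. The three input figures come from the numbers above; treating generation as one sequential stream with no batching is a simplifying assumption.

```python
MODEL_TOKENS = 75_000_000    # output tokens on the Intelligence Index, per the article
MEDIAN_TOKENS = 25_000_000   # class median, per the article
TOKENS_PER_SECOND = 210      # measured output speed, per the article

verbosity_ratio = MODEL_TOKENS / MEDIAN_TOKENS

# Wall-clock time to emit all tokens as a single stream (assumption: no batching).
generation_hours = MODEL_TOKENS / TOKENS_PER_SECOND / 3600

# Speed at which a median-verbosity model would finish in the same time.
break_even_speed = TOKENS_PER_SECOND / verbosity_ratio

print(verbosity_ratio)              # 3.0
print(round(generation_hours, 1))   # 99.2
print(break_even_speed)             # 70.0
```

Under these assumptions, a model emitting the median number of tokens at 70 tokens per second would finish the same workload in the same wall-clock time.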

“Suddenly people are thinking like hey, am I getting enough economic value out of the tokens from a model?” Frosst said during the launch video. “Local deployment is one way of empowering people and making AI really something that works for them.”

GitHub Copilot, Cursor and Claude Code operate on per-usage or subscription pricing with no on-premises option. Anthropic’s Claude Fable 5, now the most capable publicly available managed coding model, runs at $50 per million output tokens. For Frosst, the model is the polar opposite of Fable.

“Its small, cost effective, apache 2.0, and locally deployable. This is the way LLMs should go. small, open source, transparent and sovereign, vs large, expensive, proprietary and hegemonic,” Frosst wrote in a post on X.

What this means for enterprises

For teams building production agentic coding pipelines, North Mini Code’s release clarifies a set of decisions that have been forming for months.

Purpose-built agentic training is now a baseline to evaluate against. The distinction between models fine-tuned for code and models trained specifically for agentic workflows, with verified tool calls and multi-harness robustness, is now a material factor in pipeline decisions. Any model vendor claiming agentic coding capability should be able to answer whether its training used verifiable agentic tasks or was adapted from a general-purpose base.

Verbosity is a hidden pipeline cost that benchmarks do not surface. Artificial Analysis measured North Mini Code generating three times the output tokens of comparable models. That verbosity compounds across inference cost and latency in high-volume pipelines. Throughput testing against actual workload volume is the evaluation step the benchmark rankings skip.

The frontier pricing split is now a real architectural decision. Fable 5 at $50 per million output tokens and North Mini Code on a single H100 represent a genuine tradeoff between cost control and data residency on one side, and managed infrastructure overhead on the other. Teams running high-volume agentic coding pipelines should model both cost paths against their actual workload before committing to either.
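
One way to start modeling the two cost paths is sketched below. Only the $50 price comes from the article. The GPU hourly rate is a hypothetical placeholder, and reusing the 210 tokens per second and 3x verbosity figures as single-stream, per-task inputs is an assumption. Those figures were measured against a class median of open-weight models, not against Fable 5 or on a specific H100 setup, and the sketch ignores differences in output quality.

```python
FABLE_USD_PER_MILLION = 50.0   # managed price quoted in the article
GPU_USD_PER_HOUR = 3.0         # HYPOTHETICAL H100 rate; substitute your own
TOKENS_PER_SECOND = 210        # assumed single-stream throughput
VERBOSITY = 3.0                # assumed extra output per task


def self_hosted_usd_per_million(gpu_usd_per_hour, tokens_per_second):
    """GPU cost of generating one million output tokens on one stream."""
    hours = 1_000_000 / tokens_per_second / 3600
    return gpu_usd_per_hour * hours


raw = self_hosted_usd_per_million(GPU_USD_PER_HOUR, TOKENS_PER_SECOND)
adjusted = raw * VERBOSITY     # cost to do the work of 1M median-length tokens

print(round(raw, 2))                               # 3.97
print(round(adjusted, 2))                          # 11.9
print(round(FABLE_USD_PER_MILLION / adjusted, 1))  # 4.2
```

Batched serving would lower the self-hosted figure, while idle GPU time, engineering effort, and retries would raise it, so the inputs should be replaced with measurements from the actual workload.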



©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
