TradePoint.io

Poetiq’s Meta-System Automatically Builds a Model-Agnostic Harness That Improved Every LLM Tested on LiveCodeBench Pro Without Fine-Tuning

May 15, 2026
in AI & Technology

Poetiq has published results showing that its Meta-System reached a new state of the art on LiveCodeBench Pro (LCB Pro), a competitive coding benchmark, by automatically building and optimizing its own inference harness, without fine-tuning any underlying model or accessing model internals.

The result: GPT 5.5 High with Poetiq’s harness scores 93.9% on LCB Pro (25Q2), up from its baseline of 89.6%. Gemini 3.1 Pro, the model the harness was specifically optimized on, jumps from 78.6% to 90.9% — surpassing Google’s own Gemini 3 Deep Think (88.8%), a model that isn’t even accessible via API for external verification.

https://poetiq.ai/posts/recursive_self_improvement_coding/

What is LiveCodeBench Pro?

Before getting into the mechanics, it helps to understand why the benchmark matters. LiveCodeBench Pro (LCB Pro) is designed to test AI coding ability in a way that resists two common failure modes in benchmarks: data contamination and overfitting.

LCB Pro pulls problems from major competitive programming competitions and withholds public ground-truth code. Instead, solutions are validated against a comprehensive testing framework. Correct output alone isn’t enough — solutions must also satisfy specific memory and runtime constraints. The benchmark is also subject to continuous updates, which distinguishes it from many standard benchmarks that become stale.
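
To make the acceptance rule concrete, here is a minimal Python sketch of a judge verdict that requires both correct output and staying within time and memory limits. It is an illustration only, not LCB Pro's actual testing framework: the names RunResult, Limits, and verdict, the verdict strings, and the whitespace-tolerant output comparison are all assumptions borrowed from common competitive programming conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record of one execution of a submitted solution."""
    output: str
    seconds: float
    peak_mb: float


@dataclass(frozen=True)
class Limits:
    """Hypothetical per-problem resource limits."""
    seconds: float
    memory_mb: float


def verdict(result: RunResult, expected: str, limits: Limits) -> str:
    """Return a verdict for one test case; resource limits are checked first."""
    if result.seconds > limits.seconds:
        return "time_limit_exceeded"
    if result.peak_mb > limits.memory_mb:
        return "memory_limit_exceeded"
    # Compare token by token so trailing spaces or newlines do not matter
    # (an assumption; real judges may compare differently).
    if result.output.split() != expected.split():
        return "wrong_answer"
    return "accepted"


def solved(results, expecteds, limits: Limits) -> bool:
    """A problem counts as solved only if every test case is accepted."""
    return all(
        verdict(r, e, limits) == "accepted" for r, e in zip(results, expecteds)
    )
```

In this sketch, a solution that prints the right answer but runs past the time limit is rejected just like a wrong answer, which is the property the benchmark description emphasizes.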

The benchmark focuses on C++ challenges and emphasizes creative coding, testing a model’s capacity for complex problem-solving and high-quality, performant procedural logic. This distinguishes it from datasets like SWE-bench that evaluate tool usage or bug-fixing workflows. Problems are categorized by difficulty (Easy, Medium, and Hard) based on competitive human solve rates.

Poetiq’s Strategic Framing: Three LLM Task Categories

This is Poetiq’s third publicly reported benchmark, and the choice of LCB Pro was deliberate. The research team frames LLM performance around three distinct task categories: Reasoning challenges (ARC-AGI is their benchmark here), Retrieval challenges (Humanity’s Last Exam, or HLE), and Coding challenges — which, as the most pervasive commercial application for AI today, meld reasoning and retrieval with the generation of specialized procedural logic.

Their coding initiative had three specific, stated objectives: first, prove that an intelligent harness can boost efficacy without fine-tuning or special model access; second, validate the Meta-System’s capacity for recursive self-improvement in creating that harness automatically; and third, demonstrate that the resulting harness is model-agnostic and can be applied to any model without modification. According to their results, all three were satisfied.

What is a Harness, and Why Does It Matter?

In this context, a harness refers to the infrastructure wrapped around a language model to handle a specific task. Think of it as an orchestration layer — it controls how the model is prompted, how outputs are structured, how answers are assembled across multiple calls, and how solutions are evaluated.

Traditionally, these harnesses are hand-built by engineers. Poetiq’s claim is that their Meta-System builds and optimizes these harnesses automatically, through recursive self-improvement. Internally, the Meta-System works by developing better strategies for determining what to ask, refining sequential chains of questions, and devising new methods for assembling the answers. The system constantly incorporates learnings from previous and current tasks and datasets to create new, custom task-specific harnesses, as well as agents and orchestrators for other task types.
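
As a rough illustration of what "orchestration layer" means here, the following Python sketch shows a hand-written harness of the simplest kind: it prompts a model, evaluates the candidate, and feeds failure feedback into the next call. It is not Poetiq's harness, whose internals the article does not describe. The names run_harness and scripted_model, the prompt wording, and the retry loop are all hypothetical, and the model is assumed to be any function that maps a prompt string to a completion string.

```python
from typing import Callable, Optional

# Assumption: a model is any callable from prompt text to completion text,
# so the harness never depends on a specific provider or on model internals.
Model = Callable[[str], str]
# Assumption: a checker returns None when the candidate passes,
# otherwise a short feedback message describing the failure.
Checker = Callable[[str], Optional[str]]


def run_harness(
    model: Model, task: str, check: Checker, max_calls: int = 3
) -> Optional[str]:
    """Ask the model for a solution, retrying with feedback until it passes."""
    base_prompt = f"Solve the following problem in C++.\n\n{task}"
    prompt = base_prompt
    for _ in range(max_calls):
        candidate = model(prompt).strip()
        feedback = check(candidate)
        if feedback is None:
            return candidate  # first candidate that passes evaluation
        # Assemble the next question from the task, the failed attempt,
        # and the evaluator's feedback.
        prompt = (
            f"{base_prompt}\n\nPrevious attempt:\n{candidate}\n\n"
            f"It failed: {feedback}\nPlease fix it."
        )
    return None  # call budget exhausted without a passing candidate


def scripted_model(replies):
    """Stand-in model for demonstration: returns canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)
```

Because the model is just a callable in this sketch, the same loop could wrap any provider's API without changes, which is the sense in which a harness can be model-agnostic.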

How the Harness Was Built

Poetiq’s Meta-System was given the LCB Pro task and constructed a harness from scratch using only Gemini 3.1 Pro as the base model. The Meta-System accounted for all three dimensions LCB Pro tests: accuracy, runtime, and memory constraints. The system built on insights from its previous work on ARC-AGI and HLE when designing the harness. No fine-tuning of the underlying model was performed, and no access to internal model activations was required — only standard API access.

Once the harness was built and optimized for Gemini 3.1 Pro, it was then applied to a broad set of other models from different providers and generations — both open-weights and proprietary — without any additional optimization. Every model tested improved.

The Numbers

The benchmark results across difficulty tiers are worth looking at in detail. On Hard problems — the category where gaps between models are largest — Gemini 3.1 Pro with Poetiq’s harness scores 58.3%, up from its 7.7% baseline. GPT 5.5 High with the harness reaches 75.0% on Hard, up from 50.0%. Across Easy and Medium categories, the harness also outperforms all base models.

Some of the smaller model results are also notable. Gemini 3.0 Flash improves by 10 percentage points, going from 72.3% to 82.3% — overtaking Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.2 High, all larger and more expensive models. This mirrors a pattern Poetiq previously observed on ARC-AGI, where their optimization allowed a smaller, more economical model to surpass a bigger one. Kimi K2.6 sees the largest jump: from 50.0% to 79.9%, a roughly 30 percentage point improvement. Nemotron 3 Super 120B improves by 12.8%.
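
The gains quoted above are differences in percentage points, which is not the same as relative improvement over the baseline. The short Python sketch below computes both from the overall LCB Pro (25Q2) scores stated in this article. The helper names point_gain and relative_gain are hypothetical, and rounding to one decimal place is an assumption made for display.

```python
# Overall LCB Pro (25Q2) scores quoted in the article:
# model name -> (baseline %, with Poetiq's harness %).
SCORES = {
    "GPT 5.5 High": (89.6, 93.9),
    "Gemini 3.1 Pro": (78.6, 90.9),
    "Gemini 3.0 Flash": (72.3, 82.3),
    "Kimi K2.6": (50.0, 79.9),
}


def point_gain(baseline: float, harnessed: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(harnessed - baseline, 1)


def relative_gain(baseline: float, harnessed: float) -> float:
    """Gain as a percentage of the baseline score, rounded to one decimal."""
    return round((harnessed - baseline) / baseline * 100, 1)


if __name__ == "__main__":
    for model, (base, harnessed) in SCORES.items():
        print(
            f"{model}: +{point_gain(base, harnessed)} points, "
            f"+{relative_gain(base, harnessed)}% relative"
        )
```

For example, Kimi K2.6's 29.9-point gain from a 50.0% baseline is a relative improvement of about 59.8%, while GPT 5.5 High's 4.3-point gain from 89.6% is about 4.8% relative, so the two ways of reporting can differ widely.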

Accuracy numbers are reported directly from the LCB Pro leaderboard at livecodebenchpro.com (25Q2). For models not featured on the leaderboard, Poetiq conducted its own evaluations, cross-validating its experimental setup by replicating official leaderboard accuracies for baseline models.

Key Takeaways

  • Poetiq’s Meta-System automatically builds task-specific harnesses through recursive self-improvement, with no model fine-tuning or internal model access
  • GPT 5.5 High with the harness reaches 93.9% on LCB Pro (25Q2), up 4.3 percentage points from its 89.6% baseline; Gemini 3.1 Pro gains 12.3 percentage points (78.6% → 90.9%)
  • The harness is model-agnostic: optimized using only Gemini 3.1 Pro, it improved every other model tested — open-weights and proprietary — without modification
  • Gemini 3.0 Flash gains 10 percentage points with the harness (72.3% → 82.3%), surpassing Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.2 High despite being smaller and cheaper
  • Kimi K2.6 shows the largest gain at ~30 percentage points (50.0% → 79.9%); Nemotron 3 Super 120B improves by 12.8%


