Baidu Qianfan Team Releases Qianfan-OCR: A 4B-Parameter Unified Document Intelligence Model

March 18, 2026
in AI & Technology

The Baidu Qianfan Team introduced Qianfan-OCR, a 4B-parameter end-to-end model designed to unify document parsing, layout analysis, and document understanding within a single vision-language architecture. Unlike traditional multi-stage OCR pipelines that chain separate modules for layout detection and text recognition, Qianfan-OCR performs direct image-to-Markdown conversion and supports prompt-driven tasks like table extraction and document question answering.

https://arxiv.org/pdf/2603.13398

Architecture and Technical Specifications

Qianfan-OCR utilizes the multimodal bridging architecture from the Qianfan-VL framework. The system consists of three primary components:

  • Vision Encoder (Qianfan-ViT): Employs an Any-Resolution design that tiles images into 448 × 448 patches. It supports variable-resolution inputs up to 4K, producing up to 4,096 visual tokens per image to preserve spatial detail for small fonts and dense text.
  • Cross-Modal Adapter: A lightweight two-layer MLP with GELU activation that projects visual features into the language model’s embedding space.
  • Language Model Backbone (Qwen3-4B): A 4.0B-parameter model with 36 layers and a native 32K context window. It utilizes Grouped-Query Attention (GQA) to reduce KV cache memory usage by 4x.
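To make the visual-token budget concrete, here is a back-of-the-envelope sketch. The 448 × 448 tile size and the 4,096-token cap come from the article; the number of tokens contributed per tile is an assumption for illustration and is not stated in the source.

```python
import math

TILE = 448             # tile side in pixels (from the article)
MAX_TOKENS = 4096      # per-image visual-token cap (from the article)
TOKENS_PER_TILE = 256  # assumed tokens per tile; NOT stated in the article

def visual_tokens(width: int, height: int) -> int:
    """Estimate visual tokens for an image tiled into 448x448 patches,
    clipped to the model's per-image cap."""
    tiles = math.ceil(width / TILE) * math.ceil(height / TILE)
    return min(tiles * TOKENS_PER_TILE, MAX_TOKENS)

# A 4K page (3840x2160) needs ceil(3840/448) * ceil(2160/448) = 9 * 5 = 45 tiles,
# so the raw count (45 * 256 = 11,520) is clipped to the 4,096-token cap.
print(visual_tokens(3840, 2160))  # -> 4096
```

Under these assumptions, any page larger than roughly 16 tiles saturates the cap, which is consistent with the article's framing of 4,096 tokens as an upper bound rather than a fixed cost.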

‘Layout-as-Thought’ Mechanism

The model's distinguishing feature is Layout-as-Thought, an optional reasoning phase triggered by <think> tokens. During this phase, the model generates structured layout representations—including bounding boxes, element types, and reading order—before producing the final output.

  • Functional Utility: This process recovers explicit layout analysis capabilities (element localization and type classification) often lost in end-to-end paradigms.
  • Performance Characteristics: Evaluation on OmniDocBench v1.5 indicates that enabling the thinking phase provides a consistent advantage on documents with high “layout label entropy”—those containing heterogeneous elements like mixed text, formulas, and diagrams.
  • Efficiency: Bounding box coordinates are represented as dedicated special tokens (<COORD_0> to <COORD_999>), reducing thinking output length by approximately 50% compared to plain digit sequences.
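The coordinate-token trick can be illustrated with a minimal sketch. This assumes box coordinates are normalized to [0, 1] and quantized to 1,000 bins, one bin per special token; the exact quantization scheme is not spelled out in the article.

```python
def coord_token(v: float) -> str:
    """Quantize a normalized coordinate in [0, 1] to one of 1,000 special tokens."""
    bin_id = min(round(v * 1000), 999)
    return f"<COORD_{bin_id}>"

def box_tokens(x0: float, y0: float, x1: float, y1: float) -> list[str]:
    """Encode a bounding box as exactly four special tokens."""
    return [coord_token(v) for v in (x0, y0, x1, y1)]

# One token per coordinate: 4 tokens per box, versus roughly 8-12 tokens
# if each coordinate were spelled out as a plain digit sequence -- the
# source of the ~50% reduction in thinking-output length.
print(box_tokens(0.125, 0.5, 0.75, 0.9))
# -> ['<COORD_125>', '<COORD_500>', '<COORD_750>', '<COORD_900>']
```

Because each coordinate is a single vocabulary entry, the decoder also cannot emit a malformed number mid-coordinate, which is a common failure mode when coordinates are generated digit by digit.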

Empirical Performance and Benchmarks

Qianfan-OCR was evaluated against both specialized OCR systems and general vision-language models (VLMs).

Document Parsing and General OCR

The model ranks first among end-to-end models on several key benchmarks:

  • OmniDocBench v1.5: Achieved a score of 93.12, surpassing DeepSeek-OCR-v2 (91.09) and Gemini-3 Pro (90.33).
  • OlmOCR Bench: Scored 79.8, leading the end-to-end category.
  • OCRBench: Achieved a score of 880, ranking first among all tested models.

Key Information Extraction (KIE)

On public KIE benchmarks, Qianfan-OCR achieved the highest average score (87.9), outperforming significantly larger models.

Model                 Overall Mean (KIE)   OCRBench KIE   Nanonets KIE (F1)
Qianfan-OCR (4B)      87.9                 95.0           86.5
Qwen3-4B-VL           83.5                 89.0           83.3
Qwen3-VL-235B-A22B    84.2                 94.0           83.8
Gemini-3.1-Pro        79.2                 96.0           76.1

Document Understanding

Comparative testing revealed that two-stage OCR+LLM pipelines often fail on tasks requiring spatial reasoning. For instance, all tested two-stage systems scored 0.0 on CharXiv benchmarks, as the text extraction phase discards the visual context (axis relationships, data point positions) necessary for chart interpretation.

https://arxiv.org/pdf/2603.13398

Deployment and Inference

Inference efficiency was measured in Pages Per Second (PPS) using a single NVIDIA A100 GPU.

  • Quantization: With W8A8 (AWQ) quantization, Qianfan-OCR achieved 1.024 PPS, a 2x speedup over the W16A16 baseline with negligible accuracy loss.
  • Architecture Advantage: Unlike pipeline systems that rely on CPU-based layout analysis—which can become a bottleneck—Qianfan-OCR is GPU-centric. This avoids inter-stage processing delays and allows for efficient large-batch inference.
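The reported speedup is easy to sanity-check with throughput arithmetic. This sketch assumes PPS is simply pages processed divided by wall-clock seconds; the 0.512 PPS baseline below is implied by the stated 2x speedup, not quoted directly in the article.

```python
def pages_per_second(pages: int, seconds: float) -> float:
    """Throughput in pages per second over a wall-clock interval."""
    return pages / seconds

quantized_pps = 1.024                 # W8A8 figure reported in the article
baseline_pps = quantized_pps / 2      # W16A16 baseline implied by the 2x speedup
speedup = quantized_pps / baseline_pps

# Time to process a 100-page document at each rate:
print(100 / quantized_pps)  # ~97.7 s quantized
print(100 / baseline_pps)   # ~195.3 s at the implied baseline
```

At these rates, quantization saves about a minute and a half per 100 pages on a single A100, which is where the "negligible accuracy loss" trade-off matters for batch document workloads.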


The post Baidu Qianfan Team Releases Qianfan-OCR: A 4B-Parameter Unified Document Intelligence Model appeared first on MarkTechPost.

