Baidu Qianfan Team Releases Qianfan-OCR: A 4B-Parameter Unified Document Intelligence Model

March 18, 2026
in AI & Technology

The Baidu Qianfan Team introduced Qianfan-OCR, a 4B-parameter end-to-end model designed to unify document parsing, layout analysis, and document understanding within a single vision-language architecture. Unlike traditional multi-stage OCR pipelines that chain separate modules for layout detection and text recognition, Qianfan-OCR performs direct image-to-Markdown conversion and supports prompt-driven tasks like table extraction and document question answering.

https://arxiv.org/pdf/2603.13398

Architecture and Technical Specifications

Qianfan-OCR utilizes the multimodal bridging architecture from the Qianfan-VL framework. The system consists of three primary components:


  • Vision Encoder (Qianfan-ViT): Employs an Any Resolution design that tiles images into 448 x 448 patches. It supports variable-resolution inputs up to 4K, producing up to 4,096 visual tokens per image to maintain spatial resolution for small fonts and dense text.
  • Cross-Modal Adapter: A lightweight two-layer MLP with GELU activation that projects visual features into the language model’s embedding space.
  • Language Model Backbone (Qwen3-4B): A 4.0B-parameter model with 36 layers and a native 32K context window. It utilizes Grouped-Query Attention (GQA) to reduce KV cache memory usage by 4x.
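The Any Resolution tiling budget above can be sketched numerically. This is a minimal sketch under stated assumptions: the article caps visual tokens at 4,096 per image but does not give a per-tile token count, so `tokens_per_tile=256` below is a hypothetical value used only to illustrate how the cap kicks in on a 4K page.

```python
import math

TILE = 448              # tile side length from the Qianfan-ViT design
MAX_VISUAL_TOKENS = 4096  # stated per-image visual-token budget

def tile_grid(width: int, height: int) -> tuple:
    """Number of 448 x 448 tiles needed to cover the image along each axis."""
    return (math.ceil(width / TILE), math.ceil(height / TILE))

def visual_tokens(width: int, height: int, tokens_per_tile: int = 256) -> int:
    """Visual-token count for an image, clipped to the 4,096-token budget.

    tokens_per_tile is an assumption for illustration, not a figure from
    the article.
    """
    cols, rows = tile_grid(width, height)
    return min(cols * rows * tokens_per_tile, MAX_VISUAL_TOKENS)

# A 4K page (3840 x 2160) needs a 9 x 5 tile grid; at the assumed
# 256 tokens per tile it would exceed the budget, so it clips to 4,096.
print(tile_grid(3840, 2160))      # (9, 5)
print(visual_tokens(3840, 2160))  # 4096
```

The cap is what keeps dense, high-resolution pages affordable: small fonts stay legible at native tile resolution while the token count seen by the 4B backbone stays bounded.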

‘Layout-as-Thought’ Mechanism

The model's signature feature is Layout-as-Thought, an optional reasoning phase triggered by <think> tokens. During this phase, the model generates structured layout representations—including bounding boxes, element types, and reading order—before producing the final output.

  • Functional Utility: This process recovers explicit layout analysis capabilities (element localization and type classification) often lost in end-to-end paradigms.
  • Performance Characteristics: Evaluation on OmniDocBench v1.5 indicates that enabling the thinking phase provides a consistent advantage on documents with high “layout label entropy”—those containing heterogeneous elements like mixed text, formulas, and diagrams.
  • Efficiency: Bounding box coordinates are represented as dedicated special tokens (<COORD_0> to <COORD_999>), reducing thinking output length by approximately 50% compared to plain digit sequences.
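The coordinate-token savings can be sketched in a few lines. Assumptions here: coordinates are normalized to a 0–999 grid, each `<COORD_k>` special token costs one token, and a plain-digit rendering costs roughly one token per digit character (the exact tokenizer behavior is not specified in the article).

```python
def encode_bbox(x0: int, y0: int, x1: int, y1: int) -> str:
    """Render a bounding box as four dedicated coordinate special tokens,
    each drawn from the <COORD_0>..<COORD_999> vocabulary."""
    return "".join(f"<COORD_{v}>" for v in (x0, y0, x1, y1))

def digit_token_cost(x0: int, y0: int, x1: int, y1: int) -> int:
    """Approximate token cost if coordinates were emitted as plain digit
    strings, assuming one token per digit character."""
    return sum(len(str(v)) for v in (x0, y0, x1, y1))

box = (12, 34, 987, 654)
print(encode_bbox(*box))       # <COORD_12><COORD_34><COORD_987><COORD_654>
print(digit_token_cost(*box))  # 10 digit tokens vs. 4 special tokens
```

Under these assumptions a box costs a fixed 4 tokens instead of up to 12 digit tokens, which is consistent with the roughly 50% reduction in thinking-phase output length reported above.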

Empirical Performance and Benchmarks

Qianfan-OCR was evaluated against both specialized OCR systems and general vision-language models (VLMs).

Document Parsing and General OCR

The model ranks first among end-to-end models on several key benchmarks:

  • OmniDocBench v1.5: Achieved a score of 93.12, surpassing DeepSeek-OCR-v2 (91.09) and Gemini-3 Pro (90.33).
  • OlmOCR Bench: Scored 79.8, leading the end-to-end category.
  • OCRBench: Achieved a score of 880, ranking first among all tested models.

Key Information Extraction (KIE)

On public KIE benchmarks, Qianfan-OCR achieved the highest average score (87.9), outperforming significantly larger models.

| Model | Overall Mean (KIE) | OCRBench KIE | Nanonets KIE (F1) |
| --- | --- | --- | --- |
| Qianfan-OCR (4B) | 87.9 | 95.0 | 86.5 |
| Qwen3-4B-VL | 83.5 | 89.0 | 83.3 |
| Qwen3-VL-235B-A22B | 84.2 | 94.0 | 83.8 |
| Gemini-3.1-Pro | 79.2 | 96.0 | 76.1 |

Document Understanding

Comparative testing revealed that two-stage OCR+LLM pipelines often fail on tasks requiring spatial reasoning. For instance, all tested two-stage systems scored 0.0 on CharXiv benchmarks, as the text extraction phase discards the visual context (axis relationships, data point positions) necessary for chart interpretation.


Deployment and Inference

Inference efficiency was measured in Pages Per Second (PPS) using a single NVIDIA A100 GPU.

  • Quantization: With W8A8 (AWQ) quantization, Qianfan-OCR achieved 1.024 PPS, a 2x speedup over the W16A16 baseline with negligible accuracy loss.
  • Architecture Advantage: Unlike pipeline systems that rely on CPU-based layout analysis—which can become a bottleneck—Qianfan-OCR is GPU-centric. This avoids inter-stage processing delays and allows for efficient large-batch inference.
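A quick back-of-envelope check on the quantization numbers above. The W16A16 baseline throughput is not stated directly, so it is inferred here from the reported 2x speedup; the pages-per-hour figure is a derived illustration, not a benchmark from the article.

```python
W8A8_PPS = 1.024   # reported throughput with W8A8 (AWQ) on one A100
SPEEDUP = 2.0      # reported speedup over the W16A16 baseline

# Implied W16A16 baseline throughput.
baseline_pps = W8A8_PPS / SPEEDUP

def pages_per_hour(pps: float) -> float:
    """Convert pages-per-second into pages-per-hour."""
    return pps * 3600

print(round(baseline_pps, 3))              # 0.512
print(round(pages_per_hour(W8A8_PPS), 1))  # 3686.4
```

In other words, under these figures a single A100 running the quantized model processes roughly 3,700 pages per hour, versus about 1,800 for the unquantized baseline.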

Check out the Paper, Repo, and Model on Hugging Face.

The post Baidu Qianfan Team Releases Qianfan-OCR: A 4B-Parameter Unified Document Intelligence Model appeared first on MarkTechPost.
