How to Build a Parsing Pipeline with Docling Parse for Layout-Aware Document Intelligence

June 16, 2026
in AI & Technology

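import textwrap
from pathlib import Path

from PIL import Image, ImageDraw
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.platypus import Table, TableStyle

# Assumption: the setup code that defines these paths is missing from this
# copy of the article, so the file names below are illustrative placeholders.
DEMO_IMAGE_PATH = Path("demo_image.png")
PDF_PATH = Path("docling_parse_demo.pdf")


def create_demo_image(path):
   # Draw a small bitmap (frame, circle, diagonal line, caption) to embed in the PDF.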
   img = Image.new("RGB", (320, 180), "white")
   draw = ImageDraw.Draw(img)
   draw.rectangle([20, 20, 300, 160], outline="black", width=3)
   draw.ellipse([55, 45, 145, 135], outline="black", width=4)
   draw.line([180, 140, 285, 45], fill="black", width=4)
   draw.text((45, 145), "Embedded bitmap image", fill="black")
   img.save(path)
create_demo_image(DEMO_IMAGE_PATH)


def build_pdf(pdf_path):
   # Page 1: title, intro, two text columns, vector shapes, and a table.
   c = canvas.Canvas(str(pdf_path), pagesize=A4)
   width, height = A4
   c.setFont("Helvetica-Bold", 20)
   c.drawString(60, height - 70, "Docling Parse Advanced PDF Parsing Tutorial")
   c.setFont("Helvetica", 11)
   intro = (
       "This generated document is designed for testing text extraction, coordinate parsing, "
       "line grouping, vector path detection, bitmap resources, and layout-aware reconstruction."
   )
   text_obj = c.beginText(60, height - 105)
   text_obj.setLeading(15)
   for line in textwrap.wrap(intro, width=90):
       text_obj.textLine(line)
   c.drawText(text_obj)
   c.setFont("Helvetica-Bold", 14)
   c.drawString(60, height - 170, "1. Two-column text region")
   left_para = (
       "The left column contains compact explanatory text. A parser should expose words, "
       "characters, and line-level cells along with coordinates. These coordinates allow us "
       "to reconstruct reading order and inspect the spatial structure of a page."
   )
   right_para = (
       "The right column contains a separate paragraph. In document AI pipelines, layout "
       "features are useful for retrieval, table extraction, chunking, and downstream RAG "
       "applications where page position can matter."
   )
   y_start = height - 200
   left_text = c.beginText(60, y_start)
   left_text.setFont("Helvetica", 10)
   left_text.setLeading(13)
   for line in textwrap.wrap(left_para, width=42):
       left_text.textLine(line)
   c.drawText(left_text)
   right_text = c.beginText(325, y_start)
   right_text.setFont("Helvetica", 10)
   right_text.setLeading(13)
   for line in textwrap.wrap(right_para, width=42):
       right_text.textLine(line)
   c.drawText(right_text)
   c.setStrokeColor(colors.darkblue)
   c.setLineWidth(2)
   c.rect(55, height - 315, 225, 130, stroke=1, fill=0)
   c.rect(320, height - 315, 225, 130, stroke=1, fill=0)
   c.setStrokeColor(colors.darkgreen)
   c.setLineWidth(3)
   c.circle(140, height - 390, 40, stroke=1, fill=0)
   c.line(220, height - 430, 310, height - 355)
   c.setFont("Helvetica-Bold", 14)
   c.setFillColor(colors.black)
   c.drawString(60, height - 470, "2. Simple table-like structure")
   data = [
       ["Section", "Signal", "Expected parser behavior"],
       ["Text", "Words and lines", "Return text cells with coordinates"],
       ["Vector", "Boxes and lines", "Expose page path/vector resources"],
       ["Bitmap", "Embedded image", "Expose or render image resources"],
   ]
   table = Table(data, colWidths=[100, 130, 260])
   table.setStyle(TableStyle([
       ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
       ("GRID", (0, 0), (-1, -1), 0.7, colors.black),
       ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
       ("FONTSIZE", (0, 0), (-1, -1), 9),
       ("VALIGN", (0, 0), (-1, -1), "MIDDLE"),
   ]))
   table.wrapOn(c, width, height)
   table.drawOn(c, 60, height - 590)
   c.setFont("Helvetica", 9)
   c.drawString(60, 55, "Page 1: generated programmatic PDF with text, table-like layout, and vector paths.")
   # Finish page 1; everything drawn after this call lands on page 2.
   c.showPage()
   c.setFont("Helvetica-Bold", 18)
   c.drawString(60, height - 70, "Page 2: Bitmap, Dense Text, and Reading Order")
   c.setFont("Helvetica", 10)
   dense = (
       "This page includes an embedded bitmap image and several short blocks of text. "
       "We use it to test whether rendering works, whether the parser preserves page-level "
       "coordinates, and whether our own reconstruction logic can group words into lines."
   )
   y = height - 105
   for para_idx in range(4):
       tx = c.beginText(60, y)
       tx.setFont("Helvetica", 10)
       tx.setLeading(13)
       for line in textwrap.wrap(f"Block {para_idx + 1}: {dense}", width=92):
           tx.textLine(line)
       c.drawText(tx)
       y -= 70
   c.drawImage(str(DEMO_IMAGE_PATH), 110, height - 510, width=320, height=180, preserveAspectRatio=True)
   c.setStrokeColor(colors.red)
   c.setLineWidth(2)
   c.roundRect(95, height - 525, 350, 210, 10, stroke=1, fill=0)
   c.setFillColor(colors.black)
   c.setFont("Helvetica-Bold", 12)
   c.drawString(60, height - 570, "Coordinate-aware extraction lets us keep page, text, and position together.")
   c.setFont("Helvetica", 9)
   c.drawString(60, 55, "Page 2: embedded bitmap image and multiple text blocks.")
   c.save()


build_pdf(PDF_PATH)
print("Created PDF:", PDF_PATH)
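
The text on page 2 of the generated PDF mentions reconstruction logic that groups words into lines from their coordinates. The following is a minimal sketch of that idea in plain Python. It does not use Docling Parse's API: the `Word` class, the `group_words_into_lines` function, and the `y_tolerance` default are all hypothetical names chosen for illustration, and the sample boxes are made-up values in PDF points with a bottom-left origin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word cell: text plus a bounding box in PDF points."""
    text: str
    x0: float  # left edge
    y0: float  # bottom edge
    x1: float  # right edge
    y1: float  # top edge


def group_words_into_lines(words: List[Word], y_tolerance: float = 3.0) -> List[str]:
    """Cluster words with similar vertical centres, then order each line left to right."""
    lines: List[List[Word]] = []
    # With a bottom-left origin, a larger y is higher on the page, so sort descending.
    for word in sorted(words, key=lambda w: -(w.y0 + w.y1) / 2):
        centre = (word.y0 + word.y1) / 2
        if lines:
            last = lines[-1]
            last_centre = sum((w.y0 + w.y1) / 2 for w in last) / len(last)
            if abs(centre - last_centre) <= y_tolerance:
                last.append(word)
                continue
        lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x0)) for line in lines]


# Illustrative input only: four words on two visual lines, given out of order.
sample = [
    Word("structure", 120, 600, 165, 610),
    Word("Layout", 60, 700, 95, 710),
    Word("spatial", 60, 600.5, 110, 610.5),
    Word("aware", 100, 699, 130, 709),
]
print(group_words_into_lines(sample))
# prints ['Layout aware', 'spatial structure']
```

This sketch merges any words that share a baseline, so on a two-column region like the one on page 1 it would join the left and right columns into single lines. Handling that case would need an extra split on large horizontal gaps, which is left out here.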
