🔒 Browser only · PDF processed locally

Drop a PDF here or click to browse
Text is extracted locally; only the extracted text and your question are sent to OpenAI

Extracted Text
Extracted text will appear here after loading a PDF…
AI Chat · Strategy 1: PDF.js + GPT-4o-mini
Load a PDF above then ask questions about it.

Drop a PDF to extract structured fields
Invoice · Resume / CV · Contract

Schema Selection
Document type
Fields for selected schema
Extracted JSON
// Upload a PDF and click Extract
// Returns a typed JavaScript object
// No JSON.parse needed: guaranteed schema
Which strategy should I use?
You can select text in the PDF → Strategy 1.
Scanned or image-only PDF → Strategy 2.
You need typed fields (invoice, resume, contract) → Strategy 3.
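
The decision rule above can be sketched as a small function. This is a hypothetical helper, not part of the tool: the name chooseStrategy, the characters-per-page heuristic for "has selectable text", and the threshold of 20 are all assumptions made for illustration. It also assumes that a scanned PDF goes to Strategy 2 even when typed fields are wanted, because Strategy 3 starts from extracted text.

```javascript
// Hypothetical helper: picks a strategy number from simple document traits.
// The 20-characters-per-page threshold is an assumed heuristic, not a value
// taken from the tool.
function chooseStrategy({ extractedText = "", pageCount = 1, needsTypedFields = false }) {
  const charsPerPage = extractedText.trim().length / Math.max(pageCount, 1);
  const hasSelectableText = charsPerPage >= 20;

  if (!hasSelectableText) return 2; // scanned or image-only PDF
  if (needsTypedFields) return 3;   // invoice, resume, contract
  return 1;                         // digital PDF, open-ended Q&A
}
```

For example, a three-page PDF whose extraction yields an empty string would be routed to Strategy 2.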
1. PDF.js Text Extraction + Chat (~$0.0002/page)
Extract text with PDF.js → clean it → send to gpt-4o-mini as context → chat Q&A.
✅ Digital PDFs with selectable text
✅ Reports, articles, e-books (up to ~40 pages)
✅ Cheapest and fastest option
❌ Fails on scanned / image-only PDFs
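
A minimal sketch of the Strategy 1 pipeline, assuming PDF.js is already loaded and passed in as pdfjsLib. The function names extractPdfText and buildChatRequest and the system prompt wording are illustrative assumptions, not the tool's actual code. The request object is the JSON body for a POST to the OpenAI Chat Completions endpoint; sending it (and attaching the API key) is left out.

```javascript
// Extract plain text from every page. `pdfjsLib` is passed in so the sketch
// does not assume how PDF.js was loaded (script tag, bundler, ...).
async function extractPdfText(pdfjsLib, data) {
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Text items carry their string in `str`; other item kinds may not.
    pages.push(content.items.map((item) => item.str ?? "").join(" "));
  }
  return pages.join("\n\n");
}

// Build the request body for the chat call. The system prompt wording is an
// assumption, not the prompt used by the tool.
function buildChatRequest(pdfText, question) {
  return {
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Answer using only the document below.\n\n" + pdfText },
      { role: "user", content: question },
    ],
  };
}
```

Keeping the request builder separate from the extraction step makes it easy to insert a cleaning pass between the two.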
Token efficiency after cleaning
2. GPT-4o Vision (Page as Image) (~$0.01–0.02/page)
Render PDF pages to canvas → export as base64 PNG → send to GPT-4o Vision API.
✅ Scanned PDFs and image-only documents
✅ Complex layouts, charts, handwritten notes
✅ Works on any PDF type regardless of text encoding
❌ ~50–100× more expensive than Strategy 1
❌ Slow: must render and upload each page
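
The render-and-send flow for Strategy 2 might look like the sketch below. The names renderPageToDataUrl and buildVisionRequest and the render scale of 1.5 are assumptions chosen for illustration. The caller supplies the canvas (in a browser, document.createElement("canvas")), and the returned object is the JSON body for a Chat Completions request with image inputs.

```javascript
// Render one PDF.js page into a caller-supplied canvas and return a PNG data
// URL. The default scale of 1.5 is an assumed trade-off between legibility
// and upload size.
async function renderPageToDataUrl(page, canvas, scale = 1.5) {
  const viewport = page.getViewport({ scale });
  canvas.width = Math.floor(viewport.width);
  canvas.height = Math.floor(viewport.height);
  await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
  return canvas.toDataURL("image/png"); // "data:image/png;base64,..."
}

// Build a request body that sends the prompt plus one image part per page.
function buildVisionRequest(pageDataUrls, prompt) {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          ...pageDataUrls.map((url) => ({ type: "image_url", image_url: { url } })),
        ],
      },
    ],
  };
}
```

Each page becomes its own image part, which is why cost and latency grow with page count.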
Cost per page relative to Strategy 1
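
The per-page prices quoted on this page for the three strategies can be turned into a rough estimator. The table below copies those approximate figures; the function names are hypothetical, and real costs depend on page content and current API pricing.

```javascript
// Approximate per-page costs in USD, as quoted for each strategy.
const COST_PER_PAGE_USD = {
  1: { low: 0.0002, high: 0.0002 },
  2: { low: 0.01, high: 0.02 },
  3: { low: 0.0003, high: 0.0003 },
};

// Estimated cost range for a whole document.
function estimateCost(strategy, pageCount) {
  const { low, high } = COST_PER_PAGE_USD[strategy];
  return { low: low * pageCount, high: high * pageCount };
}

// Cost multiple relative to Strategy 1, rounded to one decimal place.
function relativeToStrategy1(strategy) {
  const base = COST_PER_PAGE_USD[1].low;
  const { low, high } = COST_PER_PAGE_USD[strategy];
  const round1 = (x) => Math.round(x * 10) / 10;
  return { low: round1(low / base), high: round1(high / base) };
}
```

With these figures, relativeToStrategy1(2) gives a range of 50 to 100, so Strategy 2 costs roughly 50 to 100 times as much per page as Strategy 1.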
3. Structured JSON Extraction (~$0.0003/page)
Extract text → define a JSON schema → AI fills in the fields → typed object returned. Uses response_format: json_object.
✅ Invoices, receipts, bills
✅ Resumes and CVs
✅ Contracts and agreements
✅ Any document with predictable fields
❌ Not for open-ended Q&A
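
A sketch of the Strategy 3 flow, under stated assumptions: the INVOICE_FIELDS schema, the function names, the prompt wording, and the choice of gpt-4o-mini are all illustrative. JSON mode (response_format of type json_object) ensures the reply is syntactically valid JSON, so this sketch still checks each field's type itself before returning the object.

```javascript
// Hypothetical invoice schema: field name -> expected `typeof` result.
const INVOICE_FIELDS = { invoiceNumber: "string", vendor: "string", total: "number" };

// Build the request body. JSON mode expects the word "JSON" to appear in the
// messages, which the system prompt below does.
function buildExtractionRequest(pdfText, fields) {
  const fieldList = Object.entries(fields)
    .map(([name, type]) => `${name} (${type})`)
    .join(", ");
  return {
    model: "gpt-4o-mini", // assumed model
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content: `Extract these fields and reply with a JSON object only: ${fieldList}. Use null for missing fields.`,
      },
      { role: "user", content: pdfText },
    ],
  };
}

// Parse the model's reply and keep only fields whose type matches the schema;
// anything missing or mistyped becomes null.
function parseExtraction(content, fields) {
  const raw = JSON.parse(content);
  const result = {};
  for (const [name, type] of Object.entries(fields)) {
    const value = raw[name];
    result[name] = typeof value === type ? value : null;
  }
  return result;
}
```

Extra keys in the reply are dropped, so downstream code only ever sees the fields named in the schema.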
Unique to this tutorial: not covered by competitors
Token waste warning: Raw PDF extraction wastes 40–60% of tokens on layout artifacts (page numbers, headers, footers, separator lines). Always run cleanPdfText() before sending to the AI; it typically cuts token count nearly in half with zero quality loss.
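
The tool's cleanPdfText() is not shown on this page, so the following is one possible implementation rather than the real one. It assumes the extracted text has one layout line per text line, and it uses assumed heuristics: lines that are only a page number, lines made of separator characters, and short lines repeated three or more times (treated as running headers or footers).

```javascript
// Hypothetical implementation of cleanPdfText(); the thresholds (3 repeats,
// 80 characters) are assumptions, not values from the tool.
function cleanPdfText(rawText) {
  const lines = rawText.split(/\r?\n/).map((line) => line.trim());

  // Count occurrences so repeated headers and footers can be detected.
  const counts = new Map();
  for (const line of lines) counts.set(line, (counts.get(line) ?? 0) + 1);

  const kept = lines.filter((line) => {
    if (line === "") return true; // keep blank lines as paragraph breaks
    if (/^(page\s+)?\d+(\s*(of|\/)\s*\d+)?$/i.test(line)) return false; // "3", "Page 3 of 10"
    if (/^[-_=*.·\s]{3,}$/.test(line)) return false; // separator lines
    if (line.length < 80 && counts.get(line) >= 3) return false; // repeated header/footer
    return true;
  });

  return kept
    .join("\n")
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .replace(/[ \t]{2,}/g, " ") // collapse runs of spaces
    .trim();
}
```

A line consisting only of digits is dropped as a page number, so a document that places meaningful bare numbers on their own lines would need a stricter rule.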