QA OCR Single-file HTML

Browser-based front end for Google Document AI Enterprise OCR on scanned PDFs. It runs synchronous OCR for smaller files and batch OCR via Cloud Storage for larger files, then rebuilds a searchable PDF with an invisible text layer.

Processor: 66f317431e9bb80c
Region: us
Modes: Sync / Batch
Languages: en / zh-Hant / zh-Hans
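
The choice between the two modes can be sketched as a size and page-count check. This is a minimal sketch, not the page's actual logic: the 20 MB and 15-page thresholds are assumptions based on commonly documented online-processing limits and should be checked against the current Document AI quotas, and `chooseMode` is a hypothetical helper name.

```typescript
// Assumed online (sync) limits; verify against current Document AI quotas.
const SYNC_MAX_BYTES = 20 * 1024 * 1024; // assumption: 20 MB per request
const SYNC_MAX_PAGES = 15;               // assumption: 15 pages per request

type OcrMode = "sync" | "batch";

// Hypothetical helper: pick sync when the file fits both assumed limits,
// otherwise fall back to batch processing via Cloud Storage.
function chooseMode(fileSizeBytes: number, pageCount: number): OcrMode {
  const fitsSync =
    fileSizeBytes <= SYNC_MAX_BYTES && pageCount <= SYNC_MAX_PAGES;
  return fitsSync ? "sync" : "batch";
}
```

For example, `chooseMode(file.size, detectedPages)` would feed the "Chosen mode" field once the page count is known.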

1) Configure and authenticate

This page uses the OAuth 2.0 redirect flow, so no popup window is opened. It requests an access token with the cloud-platform scope and calls Google APIs directly from the browser.
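
A redirect flow of this kind can be sketched as two small functions: one builds the authorization URL, the other reads the token from the URL fragment after Google redirects back. This sketch assumes the implicit grant (`response_type=token`), which the page does not confirm; `buildAuthUrl`, `parseTokenFromHash`, and the `state` handling are hypothetical names and choices, not the page's own code.

```typescript
const AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";
const SCOPE = "https://www.googleapis.com/auth/cloud-platform";

// Build the URL the browser is sent to. Assumes the implicit grant.
function buildAuthUrl(clientId: string, redirectUri: string, state: string): string {
  const params = new URLSearchParams({
    client_id: clientId,
    redirect_uri: redirectUri,
    response_type: "token", // assumption: token returned in the URL fragment
    scope: SCOPE,
    state, // random value used to match the response to this request
  });
  return `${AUTH_ENDPOINT}?${params.toString()}`;
}

interface TokenInfo {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

// Parse "#access_token=...&expires_in=...&state=..." after the redirect.
// Returns null when the token is missing, the state differs, or the
// expiry is not a positive number.
function parseTokenFromHash(hash: string, expectedState: string, nowMs: number): TokenInfo | null {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const accessToken = params.get("access_token");
  const expiresIn = Number(params.get("expires_in"));
  if (!accessToken || params.get("state") !== expectedState) return null;
  if (!Number.isFinite(expiresIn) || expiresIn <= 0) return null;
  return { accessToken, expiresAt: nowMs + expiresIn * 1000 };
}
```

In a browser, sign-in would call `window.location.assign(buildAuthUrl(...))`, and page load would call `parseTokenFromHash(window.location.hash, savedState, Date.now())` to fill in the auth state and token expiry.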

Auth state: Not signed in
Token expiry: -
Active scope: https://www.googleapis.com/auth/cloud-platform

2) Select file and OCR mode

Rotate pages visually in output: Experimental. Uses the page orientation reported by Document AI to rotate the reconstructed PDF page image and text layer together.
Enable native PDF parsing: Useful mainly for digital PDFs that already contain embedded text. Usually not needed for purely scanned documents.
Enable image-quality scores: Adds per-page quality diagnostics at the cost of extra latency.
Enable symbol-level OCR: Can improve fine-grained text placement, but increases response size.
Also download raw OCR JSON
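
The OCR options above correspond to fields in the Document AI `processOptions.ocrConfig` object of a synchronous `:process` request. The following is a minimal sketch of how such a request could be assembled; `OcrOptions`, `processUrl`, `buildProcessBody`, and the project ID in the usage note are hypothetical, and the page's own request code may differ.

```typescript
// Hypothetical shape for the checkbox state on this page.
interface OcrOptions {
  nativePdfParsing: boolean;
  imageQualityScores: boolean;
  symbolLevel: boolean;
  languageHints: string[]; // e.g. ["en", "zh-Hant", "zh-Hans"]
}

// Regional endpoint for the synchronous process call.
function processUrl(projectId: string, region: string, processorId: string): string {
  return (
    `https://${region}-documentai.googleapis.com/v1/projects/${projectId}` +
    `/locations/${region}/processors/${processorId}:process`
  );
}

// Request body: the PDF is sent inline as base64 in rawDocument.content.
function buildProcessBody(pdfBase64: string, opts: OcrOptions) {
  return {
    rawDocument: { content: pdfBase64, mimeType: "application/pdf" },
    processOptions: {
      ocrConfig: {
        enableNativePdfParsing: opts.nativePdfParsing,
        enableImageQualityScores: opts.imageQualityScores,
        enableSymbol: opts.symbolLevel,
        hints: { languageHints: opts.languageHints },
      },
    },
  };
}
```

A call would then POST `JSON.stringify(buildProcessBody(...))` to `processUrl("my-project", "us", "66f317431e9bb80c")` with an `Authorization: Bearer <token>` header, where `my-project` stands in for the real project ID.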

Status and output

Ready.
Detected pages: -
File size: -
Chosen mode: -
Last job ID: -
Searchable PDF: -
OCR JSON: -

Notes

Important: generating a searchable PDF with multilingual text in the browser requires a Unicode-capable font, because the standard built-in PDF fonts cover only Latin text. This page therefore loads a remote CJK font at runtime for the overlay text.
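
As a rough illustration of why the font matters, the check below flags OCR text that falls outside the Latin-1 range. It is a conservative approximation under the assumption that a standard PDF font can encode only characters up to U+00FF; `needsUnicodeFont` is a hypothetical helper, not part of this page.

```typescript
// Returns true when the text contains any character above U+00FF,
// which a standard (non-embedded) PDF font is assumed unable to encode.
function needsUnicodeFont(text: string): boolean {
  return /[^\u0000-\u00FF]/.test(text);
}
```

English-only text such as `"Invoice 42"` passes without the extra font, while any zh-Hant or zh-Hans text requires it.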

Bucket setup: batch mode requires a Cloud Storage CORS configuration that allows the origin https://qaocr2.pages.dev to send authenticated PUT, POST, and GET requests, including the browser's OPTIONS preflight.
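
The cross-origin requests that batch mode makes can be sketched as three pieces: an upload URL for the Cloud Storage JSON API, a `batchProcess` request body that points Document AI at the uploaded object, and a check on the returned long-running operation. This is a sketch under assumptions: the `input/` and `output/` prefixes and all function names are hypothetical, and the page may structure its requests differently.

```typescript
// Simple media upload endpoint of the Cloud Storage JSON API.
function uploadUrl(bucket: string, objectName: string): string {
  const params = new URLSearchParams({ uploadType: "media", name: objectName });
  return (
    `https://storage.googleapis.com/upload/storage/v1/b/` +
    `${encodeURIComponent(bucket)}/o?${params.toString()}`
  );
}

// Body for POST .../processors/{id}:batchProcess.
function buildBatchBody(bucket: string, inputObject: string, outputPrefix: string) {
  return {
    inputDocuments: {
      gcsDocuments: {
        documents: [
          { gcsUri: `gs://${bucket}/${inputObject}`, mimeType: "application/pdf" },
        ],
      },
    },
    documentOutputConfig: {
      gcsOutputConfig: { gcsUri: `gs://${bucket}/${outputPrefix}` },
    },
  };
}

// Inspect one polled operation: throw on error, report whether it finished.
function isOperationDone(op: { done?: boolean; error?: { message?: string } }): boolean {
  if (op.error) throw new Error(op.error.message || "batch operation failed");
  return op.done === true;
}
```

Each of these requests originates from the page's origin, which is why the bucket's CORS policy has to list it.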

Rotation: Enterprise OCR supports rotation correction for extraction. This page separately tries to rotate the final output visually using the page orientation metadata.
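
The visual rotation step can be sketched as a mapping from the Document AI page orientation enum to a clockwise correction angle, plus a transform for normalized coordinates so the text layer moves with the image. The angle mapping below is an assumption about how the enum should be interpreted and should be verified against sample pages; both function names are hypothetical.

```typescript
type Orientation = "PAGE_UP" | "PAGE_RIGHT" | "PAGE_DOWN" | "PAGE_LEFT";
type Quarter = 0 | 90 | 180 | 270;

// Clockwise degrees to rotate the page so that text reads upright.
// Assumption: PAGE_RIGHT means the content is turned 90 degrees clockwise,
// so it needs 270 degrees clockwise (90 counterclockwise) to correct.
function correctionDegrees(o: Orientation): Quarter {
  switch (o) {
    case "PAGE_RIGHT": return 270;
    case "PAGE_DOWN": return 180;
    case "PAGE_LEFT": return 90;
    default: return 0;
  }
}

// Rotate a normalized point (origin top-left, y pointing down, range 0..1)
// clockwise by the given angle, staying in the 0..1 range.
function rotatePoint(x: number, y: number, degrees: Quarter): [number, number] {
  switch (degrees) {
    case 90: return [1 - y, x];
    case 180: return [1 - x, 1 - y];
    case 270: return [y, 1 - x];
    default: return [x, y];
  }
}
```

Applying `rotatePoint` to every bounding-box vertex with the same angle used for the page image keeps the invisible text aligned with the rotated scan.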

PDF preview is not embedded in this build. After processing, use the generated download links under Status and output.

Required setup outside this HTML

  1. Add https://qaocr2.pages.dev to Authorized JavaScript origins for this OAuth web client.
  2. Keep https://qaocr2.pages.dev in Authorized redirect URIs.
  3. Enable both Document AI API and Cloud Storage JSON API for the same Google Cloud project.
  4. Ensure the signed-in user has permission to call Document AI and to read and write objects in the qaocr_ocrbuckets bucket.
  5. Set Cloud Storage CORS on the bucket so the browser can upload, poll, and download batch outputs.

Suggested bucket CORS example for direct SPA uploads/downloads:

[
  {
    "origin": ["https://qaocr2.pages.dev"],
    "method": ["GET", "HEAD", "PUT", "POST", "OPTIONS"],
    "responseHeader": [
      "Authorization",
      "Content-Type",
      "Content-Length",
      "Content-Range",
      "X-Upload-Content-Type",
      "X-Upload-Content-Length",
      "x-goog-resumable",
      "x-goog-meta-*"
    ],
    "maxAgeSeconds": 3600
  }
]