QA OCR Single-file HTML

Browser-based Google Document AI Enterprise OCR for scanned PDFs. Supports synchronous OCR for smaller files and batch OCR via Cloud Storage for larger files, then rebuilds a searchable PDF with an invisible text layer.

Processor: 66f317431e9bb80c

Region: us

Modes: Sync / Batch

Languages: en / zh-Hant / zh-Hans

1) Configure and authenticate

This page uses OAuth 2.0 redirect flow with no popup window. It requests a cloud-platform access token and calls Google APIs directly from the browser.
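The redirect flow described above can be sketched as follows. This is a minimal illustration that assumes the OAuth 2.0 implicit grant (`response_type=token`), where Google returns the access token in the URL fragment after the redirect. The names `OAuthConfig`, `buildAuthUrl`, and `parseTokenFragment` are hypothetical and not taken from this page's source.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface OAuthConfig {
  clientId: string;
  redirectUri: string;
}

interface TokenInfo {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

const SCOPE = "https://www.googleapis.com/auth/cloud-platform";

// Build the URL the browser navigates to (full-page redirect, no popup).
function buildAuthUrl(cfg: OAuthConfig, state: string): string {
  const params = new URLSearchParams({
    client_id: cfg.clientId,
    redirect_uri: cfg.redirectUri,
    response_type: "token", // assumption: implicit grant
    scope: SCOPE,
    state, // opaque value to verify when the redirect returns
  });
  return `https://accounts.google.com/o/oauth2/v2/auth?${params.toString()}`;
}

// Parse the fragment appended to the redirect URI, e.g. "#access_token=...&expires_in=3599".
// Returns null when no token is present (for example "#error=access_denied").
function parseTokenFragment(hash: string, now: number = Date.now()): TokenInfo | null {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const accessToken = params.get("access_token");
  if (!accessToken) return null;
  const expiresIn = Number(params.get("expires_in") ?? "0");
  return { accessToken, expiresAt: now + expiresIn * 1000 };
}
```

In a browser, the page would set `window.location.href = buildAuthUrl(...)` to sign in and call `parseTokenFragment(window.location.hash)` on load; the returned `expiresAt` is what a "Token expiry" field could display.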

OAuth client ID

Redirect URI

Google Cloud project ID

GCS bucket for batch mode

Processor ID

Processor location

Auth state

Not signed in

Token expiry

Active scope

https://www.googleapis.com/auth/cloud-platform

2) Select file and OCR mode

Scanned PDF input

Processing mode

Optional note

GCS input prefix (batch)

GCS output prefix (batch)
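The bucket and the two prefixes above feed a batch job roughly as sketched here: upload the PDF to the input prefix, start `batchProcess`, then poll the returned operation. The REST paths and request fields follow the public Document AI v1 and Cloud Storage JSON APIs; the helper names (`gcsUri`, `uploadUrl`, `buildBatchRequest`, `operationUrl`) and the example object name are hypothetical.

```typescript
// Join a bucket and path parts into a gs:// URI, trimming stray slashes.
function gcsUri(bucket: string, ...parts: string[]): string {
  const path = parts
    .map((p) => p.replace(/^\/+|\/+$/g, ""))
    .filter((p) => p.length > 0)
    .join("/");
  return `gs://${bucket}/${path}`;
}

// Cloud Storage JSON API simple ("media") upload endpoint for one object.
function uploadUrl(bucket: string, objectName: string): string {
  return (
    `https://storage.googleapis.com/upload/storage/v1/b/${encodeURIComponent(bucket)}/o` +
    `?uploadType=media&name=${encodeURIComponent(objectName)}`
  );
}

// Body for POST .../processors/{id}:batchProcess
function buildBatchRequest(inputUri: string, outputPrefixUri: string) {
  return {
    inputDocuments: {
      gcsDocuments: {
        documents: [{ gcsUri: inputUri, mimeType: "application/pdf" }],
      },
    },
    documentOutputConfig: {
      gcsOutputConfig: { gcsUri: outputPrefixUri },
    },
  };
}

// batchProcess returns a long-running operation; poll it with GET on this URL
// until the response contains `done: true`.
function operationUrl(location: string, operationName: string): string {
  return `https://${location}-documentai.googleapis.com/v1/${operationName}`;
}
```

For example, `buildBatchRequest(gcsUri("qaocr_ocrbuckets", "input/", "scan.pdf"), gcsUri("qaocr_ocrbuckets", "output") + "/")` targets the bucket named in the setup notes; the `input` and `output` prefixes and `scan.pdf` are placeholders.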

OCR language hints (comma-separated BCP-47)

Remote CJK font URL for searchable PDF

Rotate pages visually in output
Experimental. Uses Document AI orientation to rotate the reconstructed PDF page image and text layer together.

Enable native PDF parsing
Useful mainly for digital PDFs that already contain embedded text. Usually not needed for purely scanned documents.

Enable image-quality scores
Adds useful diagnostics but also extra latency.

Enable symbol-level OCR
Can improve fine-grained text placement, but increases response size.
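The language hints and the three toggles above map onto the `ocrConfig` object of a Document AI v1 `process` request. The sketch below shows one plausible way to assemble a synchronous request; `OcrOptions`, `parseLanguageHints`, `buildSyncRequest`, and `processUrl` are hypothetical names, and the PDF is assumed to be base64-encoded already.

```typescript
// Hypothetical form state; mirrors the options listed above.
interface OcrOptions {
  languageHints: string; // comma-separated BCP-47 codes, e.g. "en, zh-Hant"
  enableNativePdfParsing: boolean;
  enableImageQualityScores: boolean;
  enableSymbol: boolean;
}

// Split the comma-separated field into clean BCP-47 codes.
function parseLanguageHints(raw: string): string[] {
  return raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Regional endpoint for a synchronous process call.
function processUrl(projectId: string, location: string, processorId: string): string {
  return (
    `https://${location}-documentai.googleapis.com/v1/projects/${projectId}` +
    `/locations/${location}/processors/${processorId}:process`
  );
}

// Body for POST {processUrl}; sent with "Authorization: Bearer <access token>".
function buildSyncRequest(base64Pdf: string, opts: OcrOptions) {
  return {
    rawDocument: { content: base64Pdf, mimeType: "application/pdf" },
    processOptions: {
      ocrConfig: {
        hints: { languageHints: parseLanguageHints(opts.languageHints) },
        enableNativePdfParsing: opts.enableNativePdfParsing,
        enableImageQualityScores: opts.enableImageQualityScores,
        enableSymbol: opts.enableSymbol,
      },
    },
  };
}
```

Keeping the flags in one plain object makes it easy to reuse the same `ocrConfig` for batch requests, which accept `processOptions` in the same shape.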

Also download raw OCR JSON

Status and output

Ready.

Detected pages

File size

Chosen mode

Last job ID

Searchable PDF

OCR JSON

Notes

Important: generating a multilingual searchable PDF in the browser needs a font that covers every script in the OCR text, because the standard built-in PDF fonts cannot encode CJK characters. This page therefore loads a remote CJK font at runtime for the overlay text.

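Bucket setup: batch mode requires a Cloud Storage CORS configuration that allows the origin https://qaocr2.pages.dev to send authenticated GET, PUT, and POST requests, along with the OPTIONS preflight requests the browser issues before them.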

Rotation: Enterprise OCR supports rotation correction for extraction. This page separately tries to rotate the final output visually using the page orientation metadata.
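As a rough illustration of the visual rotation step, the sketch below converts the orientation value reported for a page into a clockwise correction angle. It assumes the REST response carries the enum name as a string (`PAGE_UP`, `PAGE_RIGHT`, `PAGE_DOWN`, `PAGE_LEFT`) and that `PAGE_RIGHT` means the text is turned 90 degrees clockwise from upright. The function names are hypothetical, and the sign convention should be checked against real samples and the PDF library in use.

```typescript
type Orientation =
  | "ORIENTATION_UNSPECIFIED"
  | "PAGE_UP"
  | "PAGE_RIGHT"
  | "PAGE_DOWN"
  | "PAGE_LEFT";

// Clockwise angle by which the detected text is turned relative to upright (assumed mapping).
const TEXT_ANGLE: Record<Orientation, number> = {
  ORIENTATION_UNSPECIFIED: 0,
  PAGE_UP: 0,
  PAGE_RIGHT: 90,
  PAGE_DOWN: 180,
  PAGE_LEFT: 270,
};

// Clockwise rotation to apply to the page so the text reads upright.
// Unknown or missing values fall back to no rotation.
function correctionDegrees(orientation: string | undefined): number {
  const angle = TEXT_ANGLE[(orientation ?? "ORIENTATION_UNSPECIFIED") as Orientation] ?? 0;
  return (360 - angle) % 360;
}

// Page dimensions after rotation: quarter turns swap width and height.
function rotatedPageSize(width: number, height: number, degrees: number) {
  return degrees % 180 === 0 ? { width, height } : { width: height, height: width };
}
```

Applying the same angle to both the page image and the text layer keeps the invisible text aligned with the glyphs it represents.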

PDF preview is not embedded in this build. Use the generated download links below after processing.

Required setup outside this HTML

Add https://qaocr2.pages.dev to Authorized JavaScript origins for this OAuth web client.
Keep https://qaocr2.pages.dev in Authorized redirect URIs.
Enable both the Document AI API and the Cloud Storage JSON API in the same Google Cloud project.
Ensure the signed-in user has permission to call Document AI and to read/write objects in qaocr_ocrbuckets.
Set Cloud Storage CORS on the bucket so the browser can upload, poll, and download batch outputs.

Suggested bucket CORS example for direct SPA uploads/downloads:

[
  {
    "origin": ["https://qaocr2.pages.dev"],
    "method": ["GET", "HEAD", "PUT", "POST", "OPTIONS"],
    "responseHeader": [
      "Authorization",
      "Content-Type",
      "Content-Length",
      "Content-Range",
      "X-Upload-Content-Type",
      "X-Upload-Content-Length",
      "x-goog-resumable",
      "x-goog-meta-*"
    ],
    "maxAgeSeconds": 3600
  }
]