OCR Studio SDK for Web Browser Integration

Overview

WebAssembly (Wasm) is a technology that allows running compiled code from various programming languages directly in the browser.

Our library is written in C++ and ported to WebAssembly using Emscripten.

To cover a wide range of devices, we provide four builds of the library. Load the one that matches the features the browser supports.

Build Descriptions

  • nosimd.nothreads: universal build, supported on most devices.
  • simd.nothreads: the fastest build.
  • simd.threads: build for continuous recognition tasks. Multithreading requires starting additional web workers, and for a single simple task this startup can take longer than running the whole task with simd.nothreads.
  • nosimd.threads: build for the rare browsers that support multithreading but not SIMD.

SIMD (Single Instruction, Multiple Data): instructions that process several values at once and significantly speed up computation.

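THREADS: multithreading via additional web workers that share memory through a SharedArrayBuffer. SharedArrayBuffer is only available on cross-origin isolated pages, so the server must send the COOP and COEP headers (Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp).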

Browser support for SIMD and multi-threading can be detected using the wasm-feature-detect library.
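A minimal sketch of choosing a build from the detected features. It assumes you have already obtained the booleans in the browser, for example by awaiting the simd() and threads() functions exported by wasm-feature-detect and reading the global crossOriginIsolated flag. The selection policy (use threads only for continuous recognition on an isolated page) is our reading of the build descriptions above, not a documented SDK rule; only the four build names come from this guide.

```javascript
// Pick one of the four OCR Studio builds from browser capabilities.
// Assumption: this policy is inferred from the build descriptions,
// it is not an official SDK rule.
//   simd                - Wasm SIMD supported (e.g. from wasm-feature-detect)
//   threads             - Wasm threads supported (e.g. from wasm-feature-detect)
//   crossOriginIsolated - page served with COOP/COEP, so SharedArrayBuffer exists
//   continuous          - the app runs continuous (multi-frame) recognition
function selectBuild({ simd, threads, crossOriginIsolated, continuous }) {
  // Threads need SharedArrayBuffer (isolated pages only) and only pay off
  // when the worker startup cost is spread over many frames
  const useThreads = threads && crossOriginIsolated && continuous
  const simdPart = simd ? "simd" : "nosimd"
  const threadsPart = useThreads ? "threads" : "nothreads"
  return `${simdPart}.${threadsPart}`
}

// In the browser (not executed here):
// import { simd, threads } from "wasm-feature-detect"
// const build = selectBuild({
//   simd: await simd(),
//   threads: await threads(),
//   crossOriginIsolated: self.crossOriginIsolated === true,
//   continuous: false,
// })
```

With these inputs a page without COOP/COEP headers always falls back to a nothreads build, even if the engine itself supports Wasm threads.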

Server Requirements

Content-Type

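The server must serve *.wasm files with the following header, since browsers refuse streaming compilation (WebAssembly.instantiateStreaming) for responses with any other MIME type: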

Content-Type: application/wasm
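As an illustration, here is a minimal Node.js sketch of the headers a static file server could attach. The extension map is an assumption that covers only the file types this guide mentions; the COOP/COEP values are the standard ones that make a page cross-origin isolated, which you need only when shipping a threads build.

```javascript
const path = require("path")

// Content types for the files used in this guide (assumed, not exhaustive)
const CONTENT_TYPES = {
  ".wasm": "application/wasm",
  ".js": "text/javascript",
  ".html": "text/html; charset=utf-8",
}

// Cross-origin isolation headers; they expose SharedArrayBuffer,
// which the threads builds rely on
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
}

// Build response headers for a file; isolation headers are added to every
// response when enabled, which is the simplest correct setup
function responseHeaders(filePath, { isolate = false } = {}) {
  const ext = path.extname(filePath).toLowerCase()
  const headers = {
    "Content-Type": CONTENT_TYPES[ext] || "application/octet-stream",
  }
  return isolate ? { ...headers, ...ISOLATION_HEADERS } : headers
}
```

In a Node http request handler the result can be passed straight to res.writeHead(200, responseHeaders(filePath, { isolate: true })).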

Compression

Enable compression (gzip or Brotli) for .wasm files: WebAssembly binaries compress well, which shortens download time.

Check the Content-Encoding response header in the browser's DevTools (Network tab) or with curl:

curl -H "Accept-Encoding: gzip" -I https://example.com/yourfile.wasm

We recommend serving pre-compressed files to avoid runtime compression overhead.
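A minimal Node.js sketch of that recommendation, using the zlib module that ships with Node: compress each .wasm file once with gzip and Brotli at build time, then choose a variant per request from the Accept-Encoding header. The function names are ours, and the header parsing is deliberately simplified (see the comment).

```javascript
const zlib = require("zlib")

// Compress a .wasm file's bytes once, at build/deploy time.
// Maximum quality is slow but costs nothing at request time.
function precompress(wasmBytes) {
  return {
    gzip: zlib.gzipSync(wasmBytes, { level: zlib.constants.Z_BEST_COMPRESSION }),
    br: zlib.brotliCompressSync(wasmBytes, {
      params: { [zlib.constants.BROTLI_PARAM_QUALITY]: zlib.constants.BROTLI_MAX_QUALITY },
    }),
  }
}

// Choose a pre-compressed variant from the Accept-Encoding request header.
// Simplification: q-values are ignored, except that "q=0" (refused) is honored.
function pickEncoding(acceptEncoding = "") {
  const accepted = new Set(
    acceptEncoding
      .split(",")
      .map((part) => part.trim().toLowerCase())
      .filter((part) => part && !/;\s*q=0(\.0*)?$/.test(part))
      .map((part) => part.split(";")[0].trim())
  )
  if (accepted.has("br")) return "br"
  if (accepted.has("gzip")) return "gzip"
  return null // serve the uncompressed file
}
```

When a variant is served, send Content-Encoding: br (or gzip), keep Content-Type: application/wasm, and add Vary: Accept-Encoding so caches keep the variants apart.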

Web Application Integration

WebAssembly Module

Include the compiled OCR Studio WebAssembly module in your project. No additional configuration is required — just place it in the project directory.

Web Worker

Recognition is computationally heavy, so the Wasm module runs in a web worker rather than on the main thread; this keeps the page responsive while a frame is being processed.

The worker acts as a mediator between client JavaScript and the Wasm module.
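The contents of worker.js are not shown in this guide. The following is a hypothetical sketch of the mediator pattern: a dispatcher that maps each requestType used later in this guide ("createSession", "frame", "reset") to a call on an engine object. The engine interface (createSession, process, reset) and the "error" reply are invented for illustration and do not reflect the actual OCR Studio API; the engine is injected so the logic can be exercised without the Wasm module.

```javascript
// Hypothetical mediator logic for worker.js. The `engine` methods are
// placeholders for whatever the real Wasm module exposes.
function createDispatcher(engine) {
  let session = null

  return function dispatch(request) {
    switch (request.requestType) {
      case "createSession":
        session = engine.createSession(request.docData)
        return null // nothing to report back
      case "frame":
        if (!session) {
          return { requestType: "error", message: "No session: send createSession first" }
        }
        // Reply in the shape the page expects: { requestType: "result", data }
        return { requestType: "result", data: engine.process(session, request.imageData) }
      case "reset":
        if (session) engine.reset(session)
        return null
      default:
        return { requestType: "error", message: `Unknown requestType: ${request.requestType}` }
    }
  }
}

// Wiring inside worker.js (browser only, not executed here):
// const dispatch = createDispatcher(engine)
// self.onmessage = (event) => {
//   const reply = dispatch(event.data)
//   if (reply) self.postMessage(reply)
// }
```

Because the page's onmessage handler only reacts to "result", any "error" replies from this sketch would be ignored unless you add a case for them.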

HTML Page

For real-time recognition, the page needs:

Video and canvas elements to display camera input:

<video id="video" class="video" playsinline muted autoplay></video>
<canvas id="canvas" class="canvas"></canvas>

Buttons for scanning documents and face matching:

<button id="scan-button" class="button">
  Scan Document
</button>

<button id="capture-face-button" class="button">
  Match Face
</button>

Result container to display recognition results:

<div id="result-wrapper" class="result-wrapper">
  <h2>Recognition Result</h2>
  <div id="output"></div>
</div>

Include the JavaScript file for client logic.

Camera Handling

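Request a video stream from the camera, preferring the rear ("environment") camera. getUserMedia returns a promise, so call it from an async function or a module script:

const stream = await navigator.mediaDevices.getUserMedia({ video: { facingMode: { ideal: "environment" } } })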

Assign it to the video element:

const videoEl = document.querySelector("#video")
videoEl.srcObject = stream
await videoEl.play()
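getUserMedia rejects with a DOMException when the camera cannot be opened, and navigator.mediaDevices is only available in secure contexts (HTTPS or localhost). A small helper, sketched here with message texts of our own wording, can turn the standard error names into something to show the user:

```javascript
// Map standard getUserMedia error names to user-facing messages.
// The texts are illustrative; adapt them to your UI.
function describeCameraError(err) {
  switch (err && err.name) {
    case "NotAllowedError":
      return "Camera access was denied. Allow it in the browser settings and reload."
    case "NotFoundError":
      return "No camera was found on this device."
    case "NotReadableError":
      return "The camera could not be started; it may be in use by another application."
    case "OverconstrainedError":
      return "No camera matches the requested constraints."
    case "SecurityError":
      return "Camera access is blocked on this page."
    default:
      return "Could not start the camera."
  }
}

// Usage in the page (browser only):
// try {
//   const stream = await navigator.mediaDevices.getUserMedia({ video: { facingMode: { ideal: "environment" } } })
//   // ...assign the stream to the video element as shown above
// } catch (err) {
//   document.querySelector("#output").textContent = describeCameraError(err)
// }
```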

Draw video frames on the canvas:

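const canvasEl = document.querySelector("#canvas")
// Match the canvas to the camera resolution; otherwise it keeps its default
// 300x150 size and frames are downscaled before they reach the recognizer
canvasEl.width = videoEl.videoWidth
canvasEl.height = videoEl.videoHeight
// willReadFrequently: the pixels are read back often with getImageData
const ctx = canvasEl.getContext("2d", { willReadFrequently: true })

// Copy the current video frame to the canvas on every repaint
const animate = function () {
  ctx.drawImage(videoEl, 0, 0, canvasEl.width, canvasEl.height)
  requestAnimationFrame(animate)
}
animate()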
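The drawing loop above runs for as long as the page is open. If you need to stop it, for example when leaving the scanning screen, one approach is to keep the ID returned by requestAnimationFrame and pass it to cancelAnimationFrame. A minimal sketch, where startRenderLoop is a hypothetical helper and the frame scheduler is injectable so the logic can be tested outside a browser:

```javascript
// Start a requestAnimationFrame loop and return a function that stops it.
// raf/caf default to the browser functions but can be injected for testing.
function startRenderLoop(step, raf = globalThis.requestAnimationFrame, caf = globalThis.cancelAnimationFrame) {
  let id = null
  let running = true
  const tick = () => {
    if (!running) return
    step()
    id = raf(tick) // schedule the next frame
  }
  id = raf(tick)
  return function stop() {
    running = false
    if (id !== null) caf(id) // drop the frame that is already scheduled
  }
}

// Usage in the page:
// const stopDrawing = startRenderLoop(() => ctx.drawImage(videoEl, 0, 0, canvasEl.width, canvasEl.height))
// ...later, also release the camera:
// stopDrawing()
// stream.getTracks().forEach((track) => track.stop())
```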

Web Worker Integration

Create the worker and start a recognition session, passing the document types to recognize:

const OCRStudioWorker = new Worker("./worker.js")

OCRStudioWorker.postMessage({
  requestType: "createSession",
  docData: "*", // document type(s) to recognize
})

On button click, capture a video frame and send it to the worker:

const scanButtonEl = document.querySelector("#scan-button")

scanButtonEl.addEventListener("click", () => {
  // Grab the pixels currently shown on the canvas and hand them to the worker
  const imageData = ctx.getImageData(0, 0, canvasEl.width, canvasEl.height)
  OCRStudioWorker.postMessage({
    requestType: "frame",
    imageData: imageData,
  })
})
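The click handler posts a frame every time it fires, even while the worker is still busy with the previous one; if frames are sent continuously (for example from the drawing loop), they can pile up in the worker's message queue. A common remedy is to keep at most one frame in flight. A minimal sketch, where FrameGate is a hypothetical helper and not part of the SDK:

```javascript
// Hypothetical helper: allow only one frame in flight at a time.
class FrameGate {
  constructor() {
    this.busy = false
  }

  // Calls send() and returns true if no frame is pending; otherwise drops the frame
  trySend(send) {
    if (this.busy) return false
    this.busy = true
    try {
      send()
    } catch (err) {
      this.busy = false // nothing was sent, so do not block later frames
      throw err
    }
    return true
  }

  // Call when the worker's reply for the pending frame arrives
  release() {
    this.busy = false
  }
}

// Usage in the page (browser only):
// const gate = new FrameGate()
// gate.trySend(() => OCRStudioWorker.postMessage({
//   requestType: "frame",
//   imageData: ctx.getImageData(0, 0, canvasEl.width, canvasEl.height),
// }))
// ...and call gate.release() in OCRStudioWorker.onmessage when a "result" arrives
```

Dropping frames rather than queueing them keeps recognition working on the most recent image instead of falling further behind the camera.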

Receiving Results

The worker also returns the coordinates of detected elements, so they can be highlighted over the video.

Receive and display results on the page:

// printResult, canvasHandler and canvasOverlayEl are application-defined (not shown in this guide)
OCRStudioWorker.onmessage = function (message) {
  switch (message.data.requestType) {
    case "result": {
      const result = message.data
      if (!result.data || Object.keys(result.data).length === 0) {
        console.log("Document not found")
      } else {
        printResult(result)
        canvasHandler.clear(canvasOverlayEl)
      }
      // Reset the session in both cases before the next scan
      OCRStudioWorker.postMessage({ requestType: "reset" })
      break
    }
  }
}
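This guide does not show how the coordinates are laid out in the result, nor the canvasHandler helper or the overlay canvas behind canvasOverlayEl. Assuming, hypothetically, that each detected element arrives as four corner points { x, y } in the pixel space of the submitted frame, highlighting it on an overlay canvas comes down to scaling the points to the overlay's size and stroking a closed path:

```javascript
// Scale points from frame pixels to overlay pixels.
// frame and overlay are any objects with width/height (e.g. canvas elements).
function scaleQuad(points, frame, overlay) {
  const sx = overlay.width / frame.width
  const sy = overlay.height / frame.height
  return points.map((p) => ({ x: p.x * sx, y: p.y * sy }))
}

// Stroke a closed polygon on a CanvasRenderingContext2D (or a compatible object).
function drawQuad(ctx, points, color = "lime") {
  if (points.length === 0) return
  ctx.strokeStyle = color
  ctx.lineWidth = 3
  ctx.beginPath()
  ctx.moveTo(points[0].x, points[0].y)
  for (const p of points.slice(1)) ctx.lineTo(p.x, p.y)
  ctx.closePath()
  ctx.stroke()
}

// Usage (browser only; `quad` is a hypothetical array of four corners):
// const octx = canvasOverlayEl.getContext("2d")
// octx.clearRect(0, 0, canvasOverlayEl.width, canvasOverlayEl.height)
// drawQuad(octx, scaleQuad(quad, canvasEl, canvasOverlayEl))
```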

A minimal web application with real-time image recognition is now ready.