WebAssembly (Wasm) is a technology that allows running compiled code from various programming languages directly in the browser.
Our library is written in C++ and ported to WebAssembly using Emscripten.
To cover a wide range of devices, we provide four library builds; load the one that matches the browser's feature support:

nosimd.nothreads — universal build, supported on most devices.
simd.nothreads — fastest build.
simd.threads — build for continuous recognition tasks. Multi-thread support requires initialization of web workers, which may take longer than executing a simple task with simd.nothreads.
nosimd.threads — build for rare use cases.

SIMD — a set of instructions that significantly improves computation speed.
THREADS — multi-threading support via additional web workers backed by a shared SharedArrayBuffer. Using it requires proper server-side configuration of the COOP and COEP headers.
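SharedArrayBuffer is only available on cross-origin isolated pages, which means the document that loads the threaded build must be served with these two response headers:

```http
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```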
Browser support for SIMD and multi-threading can be detected using the wasm-feature-detect
library.
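As a sketch of the selection logic: the wasm-feature-detect library exports `simd()` and `threads()` detectors that each resolve to a boolean, and the mapping below uses the build names listed above.

```javascript
// Maps detected Wasm features to one of the four build names above.
function pickBuild(hasSimd, hasThreads) {
  if (hasSimd && hasThreads) return "simd.threads"
  if (hasSimd) return "simd.nothreads"
  if (hasThreads) return "nosimd.threads"
  return "nosimd.nothreads"
}

// Usage with wasm-feature-detect (loaded as an ES module):
// import { simd, threads } from "wasm-feature-detect"
// const build = pickBuild(await simd(), await threads())
```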
The server must serve *.wasm files with the header:
Content-Type: application/wasm
The server should support compression for .wasm files. WebAssembly files compress well, reducing
delivery time.
Check the content-encoding header in devtools or using:
curl -H "Accept-Encoding: gzip" -I https://example.com/yourfile.wasm
We recommend serving pre-compressed files to avoid runtime compression overhead.
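A sketch of the corresponding server configuration, assuming nginx serves the static files (gzip_static requires the ngx_http_gzip_static_module):

```nginx
# Serve .wasm with the correct MIME type, and prefer a pre-compressed
# yourfile.wasm.gz on disk when the client sends "Accept-Encoding: gzip".
location ~ \.wasm$ {
    types { application/wasm wasm; }
    gzip_static on;
}
```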
Include the compiled OCR Studio WebAssembly module in your project. No additional configuration is required — just place it in the project directory.
To prevent blocking the main execution thread with heavy recognition tasks, a web worker is used to communicate with the Wasm module.
The worker acts as a mediator between client JavaScript and the Wasm module.
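The mediation can be sketched as a worker-side dispatcher. The OCR Studio module's own API is not shown in this guide, so the `engine` object below (`createSession` / `recognize` / `reset`) is a hypothetical placeholder for the real Wasm bindings; only the message protocol matches the client code shown later.

```javascript
// worker.js sketch: dispatches page messages to the (hypothetical) engine
// and posts recognition results back to the page.
function handleRequest(engine, data, post) {
  switch (data.requestType) {
    case "createSession":
      engine.createSession(data.docData) // hypothetical binding
      break
    case "frame":
      // Heavy recognition runs here, off the main thread;
      // the result is posted back to the page.
      post({ requestType: "result", data: engine.recognize(data.imageData) })
      break
    case "reset":
      engine.reset() // hypothetical binding
      break
  }
}

// In the worker itself, wire the handler to the message port:
// self.onmessage = (message) =>
//   handleRequest(engine, message.data, (m) => self.postMessage(m))
```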
For real-time recognition, the page needs:
Video and canvas elements to display camera input:
<video id="video" class="video" playsinline muted autoplay></video>
<canvas id="canvas" class="canvas"></canvas>
Buttons for scanning documents and face matching:
<button id="scan-button" class="button">
Scan Document
</button>
<button id="capture-face-button" class="button">
Match Face
</button>
Result container to display recognition results:
<div id="result-wrapper" class="result-wrapper">
<h2>Recognition Result</h2>
<div id="output"></div>
</div>
Include the JavaScript file for client logic.
Request video stream from the camera:
stream = await navigator.mediaDevices.getUserMedia({ video: { facingMode: { ideal: "environment" } } })
Assign it to the video element:
const videoEl = document.querySelector("#video")
videoEl.srcObject = stream
await videoEl.play()
Draw video frames on the canvas:
const canvasEl = document.querySelector("#canvas")
const ctx = canvasEl.getContext("2d", { willReadFrequently: true })
// Match the canvas resolution to the video stream before drawing,
// otherwise frames are scaled to the default 300x150 canvas
canvasEl.width = videoEl.videoWidth
canvasEl.height = videoEl.videoHeight
const animate = function () {
  ctx.drawImage(videoEl, 0, 0, canvasEl.width, canvasEl.height)
  requestAnimationFrame(animate)
}
animate()
Initialize the worker and send the document type for recognition:
OCRStudioWorker = new Worker("./worker.js")
OCRStudioWorker.postMessage({
  requestType: "createSession",
  docData: "*",
})
On button click, capture a video frame and send it to the worker:
const scanButtonEl = document.querySelector("#scan-button")
scanButtonEl.addEventListener("click", () => {
  OCRStudioWorker.postMessage({
    requestType: "frame",
    imageData: ctx.getImageData(0, 0, canvasEl.width, canvasEl.height),
  })
})
The worker returns the coordinates of detected elements so they can be highlighted over the video.
Receive and display results on the page:
OCRStudioWorker.onmessage = function (message) {
  switch (message.data.requestType) {
    case "result": {
      const result = message.data
      if (Object.keys(result.data).length === 0) {
        console.log("Document not found")
        OCRStudioWorker.postMessage({ requestType: "reset" })
        return
      }
      printResult(result)
      // canvasHandler and canvasOverlayEl are the overlay helpers used for
      // the detection highlights; they are defined elsewhere in the client code
      canvasHandler.clear(canvasOverlayEl)
      OCRStudioWorker.postMessage({ requestType: "reset" })
      break
    }
  }
}
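The printResult helper above can be sketched as follows. The exact result shape is an assumption here: result.data is treated as an object mapping field names to recognized values.

```javascript
// Turns the (assumed) field-name → value map into display lines.
function formatResult(data) {
  return Object.entries(data).map(([field, value]) => `${field}: ${value}`)
}

// Renders the lines into the #output container from the markup above.
function printResult(result) {
  const outputEl = document.querySelector("#output")
  outputEl.textContent = ""
  for (const line of formatResult(result.data)) {
    const row = document.createElement("div")
    row.textContent = line
    outputEl.appendChild(row)
  }
}
```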
A minimal web application with real-time image recognition is now ready.