March 31, 2026

Insights from ICMV 2025: How OCR Studio Made ID Detection Fast Enough for Mobile Devices

Ph.D., Chief Technology Officer

OCR Studio’s cutting-edge ID scanning solutions recognize identity documents issued by more than 250 countries and territories, running entirely on the end user’s device at near-instant speed. Even before you have fully pointed your camera at an ID card or driver’s license, the algorithms already know where the document is. Developing such an OCR system, however, is extremely challenging, because the technology is used mostly in real-world scenarios. Documents are rarely captured in perfect conditions: they can be tilted, blurred, or partially obscured. Lighting can be uneven, causing shadows or reflections, and the background is sometimes cluttered with objects. Nevertheless, all of this has to be processed quickly and directly on the mobile device to avoid the risks of transferring sensitive data.

Even though smartphones have strict limits on memory and computational resources, OCR solutions must remain accurate in any conditions. Many modern neural networks handle this task well, but they are simply too heavy for real-time use on a mobile device. To address this problem, we brought infinite impulse response (IIR) filters into deep learning: we made them trainable as part of a neural network and used them to build a compact yet highly effective model called IIRDoc-Net.
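To illustrate what “trainable” means for an IIR filter, here is a minimal, hypothetical sketch; it is not IIRDoc-Net’s actual layer. It uses a first-order filter y[n] = a·y[n-1] + (1 - a)·x[n] with a single coefficient `a`. Because the recurrence is differentiable in `a`, its derivative can be carried along with the output, which is what lets ordinary gradient descent tune the filter. The function names, the one-coefficient form, and the squared-error loss are all assumptions made for illustration.

```python
def iir_with_grad(x, a):
    """Hypothetical first-order IIR filter y[n] = a*y[n-1] + (1-a)*x[n].

    Returns the outputs y and their derivatives dy/da (forward-mode
    differentiation written out by hand).
    """
    y_prev, dy_prev = 0.0, 0.0
    ys, dys = [], []
    for v in x:
        y = a * y_prev + (1.0 - a) * v
        # d/da of a*y_prev is y_prev + a*dy_prev; d/da of (1-a)*v is -v
        dy = y_prev + a * dy_prev - v
        ys.append(y)
        dys.append(dy)
        y_prev, dy_prev = y, dy
    return ys, dys


def loss_and_grad(x, target, a):
    """Mean squared error between filter output and target, plus d(loss)/da."""
    ys, dys = iir_with_grad(x, a)
    n = len(x)
    loss = sum((y - t) ** 2 for y, t in zip(ys, target)) / n
    grad = sum(2.0 * (y - t) * d for y, t, d in zip(ys, target, dys)) / n
    return loss, grad


def fit(x, target, a=0.2, lr=0.5, steps=200):
    """Plain gradient descent on the single filter coefficient."""
    for _ in range(steps):
        _, g = loss_and_grad(x, target, a)
        a -= lr * g
    return a


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    # Target: impulse response of the same filter with a = 0.5
    target = [0.5, 0.25, 0.125, 0.0625, 0.03125]
    print(fit(impulse, target))  # prints a value very close to 0.5
```

In a real network the same idea is handled by an autodiff framework, and each channel would typically get its own learned coefficients.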

OCR Studio’s advanced neural network architecture locates IDs in an image using 32% fewer operations than existing state-of-the-art models. We presented it last year at the 18th International Conference on Machine Vision (ICMV 2025) in Paris, France, and we are now pleased to announce that the corresponding paper has been published in the electronic proceedings of the International Society for Optics and Photonics (SPIE). Read on to find out more about our breakthrough in document segmentation on mobile devices.

What Makes IIRDoc-Net Different

Most existing neural networks analyze document images piece by piece. This approach works well, but it severely limits how much global context the model can capture. We moved beyond this limitation by introducing learnable IIR filters into the network. In simple terms, they allow the model to “remember” information and propagate it across the image, so that it captures the document’s overall structure instead of focusing only on small local regions. As a result, the network can understand relationships between distant parts of the document.
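The “memory” described above shows up in a filter’s impulse response. In the minimal sketch below (illustrative only; the coefficient 0.8 and the 3-tap averaging kernel are arbitrary choices, not values from IIRDoc-Net), a convolution lets a single bright pixel influence only its immediate neighbors, while a recursive IIR filter carries that pixel’s influence along the entire row.

```python
def fir_row(x, kernel):
    """Causal convolution (FIR): y[n] = sum_k kernel[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if n - k >= 0:
                acc += w * x[n - k]
        y.append(acc)
    return y


def iir_row(x, a):
    """Causal first-order IIR: y[n] = a * y[n - 1] + (1 - a) * x[n]."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + (1.0 - a) * v  # feedback: each output reuses the previous one
        y.append(prev)
    return y


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9  # one "bright pixel" at the start of a 10-pixel row
    print(fir_row(impulse, [1 / 3, 1 / 3, 1 / 3]))  # nonzero only at positions 0-2
    print(iir_row(impulse, 0.8))  # 0.2 * 0.8**n: decays but stays nonzero along the whole row
```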

Because IIR filters propagate information across the image sequentially, pixel by pixel, we can control the direction in which that information flows. In our model, it flows in several directions (left to right, right to left, top to bottom, and bottom to top), and all these signals are then combined into a single, consistent prediction. You can think of it as looking at the document from different angles and merging all the observations together. At the same time, the model stays small and efficient: it contains only 36K parameters, which makes it fast enough for mobile use without sacrificing accuracy.
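The multi-directional scheme can be sketched as four passes of the same first-order recurrence over a 2D grid, one per direction, merged by averaging. This is a hypothetical illustration: the article does not specify how IIRDoc-Net combines the directional signals, so the plain average and the fixed coefficient `a` are assumptions. A real network would learn its coefficients and would likely fuse the passes with trained layers.

```python
def scan(row, a):
    """First-order IIR along one row: y[n] = a * y[n-1] + (1 - a) * x[n]."""
    out, prev = [], 0.0
    for v in row:
        prev = a * prev + (1.0 - a) * v
        out.append(prev)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def left_to_right(img, a):
    return [scan(r, a) for r in img]


def right_to_left(img, a):
    # Reverse each row, scan, then reverse back
    return [scan(r[::-1], a)[::-1] for r in img]


def top_to_bottom(img, a):
    return transpose(left_to_right(transpose(img), a))


def bottom_to_top(img, a):
    return transpose(right_to_left(transpose(img), a))


def four_way(img, a):
    """Run all four directional scans and average them (assumed merge rule)."""
    passes = [f(img, a) for f in (left_to_right, right_to_left, top_to_bottom, bottom_to_top)]
    h, w = len(img), len(img[0])
    return [[sum(p[i][j] for p in passes) / 4.0 for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    img = [[0.0] * 5 for _ in range(5)]
    img[2][2] = 1.0  # single bright pixel in the center
    for row in four_way(img, 0.5):
        print(["%.3f" % v for v in row])
```

In this sketch a single four-way pass spreads a pixel’s influence only along its own row and column; applying such a layer twice, or feeding the horizontal result into the vertical scans, lets information reach every position in the image.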

Why This Matters in Practice

The final model is extremely compact and differs from typical low-compute CNN (convolutional neural network) approaches in how it reduces the amount of computation required. It is suitable for real-time applications on mobile and embedded devices, where every millisecond matters. Efficiency alone is not enough, though: the model also needs to be reliable. We tested IIRDoc-Net on real datasets with difficult conditions such as glare, blur, and perspective distortion. Despite its compact size, the model matches larger networks in quality and proves even more reliable when conditions are challenging.
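A back-of-the-envelope count suggests why recursive filtering can save computation when wide context is needed. The sketch below is an illustrative assumption, not the accounting used in the paper: it compares, per pixel and per channel, a stack of stride-1 3×3 depthwise convolutions grown until its receptive field spans a given width against one first-order IIR scan in each of four directions. It deliberately ignores nonlinearities, channel mixing, and the fact that the convolution stack grows its field along both axes at once.

```python
import math


def conv_stack_macs_per_pixel(receptive_field, kernel=3):
    """MACs per pixel for stacked stride-1 k x k depthwise convolutions.

    Each layer enlarges the receptive field by (kernel - 1) pixels.
    Returns (macs, layers).
    """
    layers = math.ceil((receptive_field - 1) / (kernel - 1))
    return layers * kernel * kernel, layers


def iir_macs_per_pixel(directions=4):
    """MACs per pixel for first-order scans y = a*y_prev + b*x (2 multiplies each).

    The count does not depend on how far information travels.
    """
    return 2 * directions


if __name__ == "__main__":
    width = 257  # hypothetical receptive field spanning 257 pixels
    macs, layers = conv_stack_macs_per_pixel(width)
    print(f"3x3 stack: {layers} layers, {macs} MACs per pixel")  # 128 layers, 1152 MACs
    print(f"4-way IIR: {iir_macs_per_pixel()} MACs per pixel")  # 8 MACs
```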

What Happens If We Remove IIR Filters

To answer this question, we ran a simple experiment and compared IIRDoc-Net’s document segmentation results with and without IIR filters. Once we removed the filters, the model began to produce noisier results, especially near document borders. It struggled more in low-light conditions and with complex backgrounds. This showed that IIR filters are not just an extra detail – they are the core component that makes the model robust and reliable.

The goal is not to make neural networks bigger, but to make them more intelligent. By combining signal processing principles with modern deep learning, we developed a model that achieves a strong balance between accuracy and efficiency. This is where a scientific approach truly matters – instead of adding complexity, we reconsider the model’s core design. In real-world applications, especially on mobile devices, that balance is often the key to practical performance.

About OCR Studio

OCR Studio, a developer of optical character recognition solutions, remains committed to a science-driven approach to innovation. Each year, our researchers present cutting-edge systems for document recognition, ID authentication, and scanning of machine-readable objects at major international conferences. This continuous work helps us turn scientific ideas into practical, deployable solutions. Learn more about our technology for identity document scanning.

