In our previous blog post, we explained how MRZs in identity documents can be recognized directly on AR glasses. Today, we continue with another research topic from SenSys 2026 – the IIR Block, a new neural network component based on learnable infinite impulse response filters that helps compact models capture wider image context without becoming larger. This approach is especially important for document processing on mobile, embedded, and edge devices, where neural networks must remain lightweight while still understanding the full structure of the image. Learn more below.
Making Compact Neural Networks See the Whole Document
For OCR Studio, speed and accuracy are not enough on their own. Our document processing technologies often need to work directly on mobile, embedded, and edge devices, where large neural networks are not always an option. The model must be compact, but it still has to understand the image well. A document image is not only a set of separate pixels or small text fragments. To process it correctly, a neural network needs to catch the global context: where the text is, how the background changes, which parts are noise, and which details must be preserved. In other words, the model needs a wide view of the image.
The standard way to get this wider view is to make the network deeper or more complex. But this is not ideal for embedded devices. So we asked a different question: can a small neural network see more without becoming bigger?
Our answer is the IIR Block, a new neural network component based on learnable infinite impulse response (IIR) filters. It lets the model aggregate context from distant image regions at the cost of very few extra parameters.
What Makes the IIR Block Different
Most convolutional neural networks look at images step by step, through small local regions. This is useful for detecting details, but it can become a limitation when the model needs to understand the full structure of an image. To “see” more, networks usually need to become deeper or more complex.
To solve this, we propose the IIR Block, a compact neural network component that applies learnable recursive filtering along the horizontal and vertical directions. This expands the model’s receptive field without increasing network depth and, in theory, makes the receptive field unlimited. The idea comes from classical signal processing. IIR filters are recursive: each output depends not only on the current input, but also on the outputs computed before it. Put simply, they can “remember” information and propagate it further across the image. For a neural network, this means that even a compact model can use wider context while making predictions.
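To make the recursion concrete, here is a minimal NumPy sketch of a first-order recursive filter applied along the rows and columns of an image. It is an illustration only, not the implementation from the paper: the function name `iir_first_order`, the first-order form y[n] = b*x[n] + a*y[n-1], and the fixed coefficients are all assumptions (in a real block the coefficients would be learned).

```python
import numpy as np

def iir_first_order(x, a, b, axis):
    """Causal recursion y[n] = b*x[n] + a*y[n-1] along one axis (illustrative)."""
    x = np.moveaxis(np.asarray(x, dtype=np.float64), axis, 0)
    y = np.zeros_like(x)
    prev = np.zeros_like(x[0])       # filter state carried from step to step
    for n in range(x.shape[0]):
        prev = b * x[n] + a * prev   # current input plus "remembered" output
        y[n] = prev
    return np.moveaxis(y, 0, axis)

# A single bright pixel on an otherwise empty 5x8 image.
image = np.zeros((5, 8))
image[2, 1] = 1.0

# Assumed coefficients; a trainable block would learn a and b instead.
horizontal = iir_first_order(image, a=0.5, b=1.0, axis=1)  # along each row
vertical = iir_first_order(image, a=0.5, b=1.0, axis=0)    # along each column

print(horizontal[2])   # the pixel's influence decays but reaches the row's end
print(vertical[:, 1])  # same effect down the column
```

With only two coefficients per direction, the response to one pixel extends all the way to the image border, which is the "memory" effect described above.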
One IIR Block helps the model collect context along horizontal and vertical lines. When two IIR Blocks are used one after another, these directional views start to complement each other. The model can combine information not only along separate lines, but also across larger image regions. This gives it a more complete view of the document, helping it preserve small text details while also taking into account larger background patterns.
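The effect of stacking can be sketched with a toy example. The block below is hypothetical: `scan` and `toy_block` are illustrative names, and summing forward and backward passes along both axes with a fixed decay is an assumption about how directional outputs might be combined, not the architecture from the paper.

```python
import numpy as np

def scan(x, decay, axis, reverse=False):
    """Recursive pass y[n] = x[n] + decay*y[n-1] along `axis`, optionally reversed."""
    x = np.moveaxis(x, axis, 0)
    if reverse:
        x = x[::-1]
    y = np.empty_like(x)
    state = np.zeros_like(x[0])
    for n in range(len(x)):
        state = x[n] + decay * state
        y[n] = state
    if reverse:
        y = y[::-1]
    return np.moveaxis(y, 0, axis)

def toy_block(x, decay=0.5):
    """Hypothetical block: sum of forward/backward scans along rows and columns."""
    return sum(scan(x, decay, axis, rev)
               for axis in (0, 1) for rev in (False, True))

image = np.zeros((7, 7))
image[3, 3] = 1.0  # one active pixel in the centre

after_one = toy_block(image)      # influence spreads along one row and one column
after_two = toy_block(after_one)  # the second block fills in the rest

print(np.count_nonzero(after_one))  # 13: a cross through the centre
print(np.count_nonzero(after_two))  # 49: every pixel of the 7x7 image
```

After one block, the central pixel influences only its own row and column. After two blocks, it influences the whole image, which is the sense in which the directional views complement each other.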
Why This Matters for Document Images
We tested the approach on document image binarization. This task may sound simple: separate text from the background. In practice, it is much harder. Old paper, uneven lighting, background texture, scanning noise, and ink showing through from the other side all get in the way.
For good binarization, the model must understand two things at the same time. It needs to preserve small text contours, but it also needs to understand the larger background structure. A very local model may miss this wider context. A very large model may solve the task, but become too heavy for embedded use.
Our model with IIR Blocks contains only 49K parameters. It is more than 40 times smaller than the U-Net-bin baseline, yet it performs on par with much larger networks on the DIBCO 2017 and H-DIBCO 2018 benchmarks. It also achieves lower DRD (Distance Reciprocal Distortion) values than the baseline, which indicates better preservation of text structure and contours. In practical terms, a small model can still make intelligent decisions about the whole document image. It does not need to become large to understand the wider context.
What Happens Without IIR Filters
To check whether the IIR filters really make a difference, we removed them from the model and replaced them with standard convolutions. The result was clear: with the same number of parameters but without recursive filtering, the model lost part of its advantage. It became less accurate and less precise in preserving text contours.
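The intuition behind this result can be shown on a one-dimensional toy signal. This is not the ablation from the paper, only a sketch under assumed values: the kernel weights, the `gain` and `decay` coefficients, and the signal length are all made up for illustration.

```python
import numpy as np

signal = np.zeros(32)
signal[4] = 1.0  # a single impulse

# Three-tap convolution: three weights, response limited to the kernel width.
kernel = np.array([0.25, 0.5, 0.25])
conv_out = np.convolve(signal, kernel, mode="same")

# First-order recursion: two weights, response decays but never cuts off.
gain, decay = 0.5, 0.9
iir_out = np.zeros_like(signal)
state = 0.0
for n, value in enumerate(signal):
    state = gain * value + decay * state
    iir_out[n] = state

print(np.count_nonzero(conv_out))  # 3 samples touched by the convolution
print(np.count_nonzero(iir_out))   # 28 samples: everything from the impulse on
```

With a comparable number of weights, the convolution reaches only the impulse's immediate neighbours, while the recursion carries it to the end of the signal. A stack of small convolutions would need many layers to cover the same span.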
This confirmed that IIR filters are not just an extra layer in the architecture: they are the mechanism that helps the compact model look wider. That is what makes the approach practical. The network stays lightweight while still using the global context of the document image.
Smaller Networks, Smarter Information Flow
The main idea behind this work is simple: neural networks do not always need to become bigger to become better. Sometimes, the key is to rethink how they process information. By combining signal processing principles with modern deep learning, we can build compact models that are better suited for real-world deployment. The proposed IIR Block shows that recursive filtering can be a practical alternative to deeper architectures and attention-based modules when the goal is efficient receptive field expansion.
About OCR Studio
OCR Studio, a developer of optical character recognition solutions, remains committed to a science-driven approach to innovation. Every year, our researchers take part in leading international conferences, where they present our latest advances in document recognition, ID authenticity verification, and machine-readable objects scanning. This ongoing scientific work helps us transform cutting-edge research into practical technologies for real-world use.
Konstantin Bulatov is a scientist and Chief Technology Officer of OCR Studio, where he has led the development and implementation of advanced OCR technologies. He has designed a method for optimizing object recognition in video streams, which has improved the accuracy and efficiency of real-time OCR systems. Under his direction, OCR Studio develops secure on-device programming solutions that address diverse industry needs and contribute to advancements in the field.
Konstantin is an IEEE Senior Member. He has authored multiple patent applications and published his research in prominent academic conferences and journals. His work emphasizes innovative approaches to developing high-performance recognition systems, reinforcing OCR Studio’s position as a significant contributor to the global technology landscape.