Optical character recognition is still one of the most difficult disciplines in image processing and machine intelligence. The sheer variety of possible characters and of methods for applying them to different surfaces gives an idea of the challenges involved. Converting such complex visual data into clear, structured text is made difficult by dirt, reflections and shape errors caused by scratching, embossing or laser engraving on solid materials. In addition, overlapping or incomplete characters, as well as a generally low pixel resolution of the image data, can quickly make characters almost impossible to distinguish from each other. For example, an 8 is easily misread as a 3. The image processing market is constantly evolving to improve the accuracy and reliability of text recognition. But what are the decisive factors when choosing an OCR system?

Comprehensive database with reproducible accuracy

To be convincing, an OCR system must work from the outset and offer high reading performance. This requires a well-developed network architecture that has been pre-trained with many diverse training images. Here, situations from real applications are just as indispensable as synthetic data. Synthetic data not only allows many additional special cases and variations to be learned, but also ensures far more robust recognition of the relevant features. After all, nothing should be left to chance, especially in industrial automation.

This is where DENKnet, the AI vision solution for individual image analysis, steps in. In addition to leading AI technology, users have access to an extremely high-performance and constantly evolving OCR model. All development steps are strictly versioned so that application developers can fall back on defined versions, but also have the option of updating to a new improved version to ensure versatile and robust reading at all times. For quality assurance purposes, the performance and reproducibility of the trained networks can be tested and verified against sample data sets in a quality center before a production system is upgraded with new software.
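The idea of verifying a new model version against sample data sets before an upgrade can be sketched in a few lines. This is only an illustration under stated assumptions, not the DENKnet quality center itself: the names `Sample`, `exact_match_rate` and `approve_upgrade` are hypothetical, and the two "model versions" are simple stand-in functions that map an image identifier to a read string.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Sample:
    image_id: str   # hypothetical identifier of a reference image
    expected: str   # ground-truth text for that image


def exact_match_rate(read: Callable[[str], str], samples: Sequence[Sample]) -> float:
    """Fraction of reference samples whose complete string is read correctly."""
    if not samples:
        raise ValueError("reference set must not be empty")
    hits = sum(1 for s in samples if read(s.image_id) == s.expected)
    return hits / len(samples)


def approve_upgrade(current: Callable[[str], str],
                    candidate: Callable[[str], str],
                    samples: Sequence[Sample],
                    tolerance: float = 0.0) -> bool:
    """Approve the candidate only if it is not worse than the current version."""
    return exact_match_rate(candidate, samples) >= exact_match_rate(current, samples) - tolerance


# Stand-in readers (assumption): dictionaries instead of real networks.
reference = [Sample("tire_01", "DOT 1234"), Sample("cap_07", "L8X3")]
version_a = {"tire_01": "DOT 1234", "cap_07": "L3X3"}.get   # confuses 8 and 3
version_b = {"tire_01": "DOT 1234", "cap_07": "L8X3"}.get

print(approve_upgrade(version_a, version_b, reference))  # True
print(approve_upgrade(version_b, version_a, reference))  # False
```

Because both versions are evaluated on the same fixed reference set, the comparison is reproducible, which is the point of pinning defined model versions.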

Application examples for DENKnet OCR

DENKnet OCR reads very reliably in many applications, even without fine-tuning. Examples include low-contrast tire numbers, small and strongly deformed numbers on crown caps, and heavily overprinted information on separating discs, even against a strongly inhomogeneous background.

The production number on the crown cork of a bottle is unevenly distorted by the forming tool and therefore difficult to read by machine.
OCR of the DOT number on car tires
The DOT number on the tire shows hardly any contrast, but is still read with 91% confidence.
OCR of character strings on separating discs
Due to the poor printing, the texts on the separating disc are difficult to read even for humans.

Of transformers & large language models

Another positive feature of a good OCR model is its ability to recognize not only individual characters, but also the relationships between them in character sequences such as serial numbers or words, and to take this knowledge into account when recognizing characters. The better the OCR can predict subsequent characters and weight the reading result accordingly, the more robustly and precisely special applications can be solved. The generative and combinatorial characteristics of transformer networks or large language models (LLMs), such as those used in ChatGPT, can further improve such predictions and thus the reading quality.

However, these architectures are rather slow in execution and require a lot of system resources. This makes it all the more important to use such cutting-edge technologies to the right extent, so that they support the requirements of customer use cases as well as possible. In the automation sector in particular, image processing should take a few milliseconds rather than seconds. A trained neural network should therefore remain fast and lightweight so that it can run on "normal" hardware. If high recognition accuracy and speed in production use are only possible with almost unlimited computing power, applications are hardly economically viable.
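How knowledge about character sequences can tip an ambiguous reading is easiest to see in a small, classical example. The sketch below is not how DENKnet OCR or a transformer works internally; it assumes a per-position character classifier and an invented bigram table, and combines both with Viterbi decoding. The function name `decode_with_prior` and all probabilities are hypothetical.

```python
import math
from typing import Dict, List, Tuple

CharProbs = Dict[str, float]  # visual probability per candidate character


def decode_with_prior(frames: List[CharProbs],
                      bigram: Dict[Tuple[str, str], float],
                      prior_weight: float = 1.0,
                      floor: float = 1e-6) -> str:
    """Viterbi decoding: visual score per position plus a weighted bigram prior."""
    if not frames:
        return ""
    # best[c] = (log score of the best path ending in c, that path)
    best = {c: (math.log(max(p, floor)), c) for c, p in frames[0].items()}
    for frame in frames[1:]:
        nxt = {}
        for c, p in frame.items():
            visual = math.log(max(p, floor))
            score, path = max(
                (prev_score
                 + prior_weight * math.log(max(bigram.get((prev_c, c), floor), floor)),
                 prev_path)
                for prev_c, (prev_score, prev_path) in best.items()
            )
            nxt[c] = (score + visual, path + c)
        best = nxt
    return max(best.values())[1]


# Invented example: the last character looks slightly more like a 3 than an 8.
frames = [{"D": 0.9, "O": 0.1}, {"O": 0.8, "0": 0.2}, {"T": 1.0}, {"3": 0.55, "8": 0.45}]
# Invented sequence statistics: in this code format, T is usually followed by 8.
bigram = {("D", "O"): 0.9, ("O", "T"): 0.9, ("T", "8"): 0.8, ("T", "3"): 0.1}

print(decode_with_prior(frames, bigram, prior_weight=0.0))  # DOT3 (visual evidence only)
print(decode_with_prior(frames, bigram, prior_weight=1.0))  # DOT8 (sequence prior applied)
```

The `prior_weight` parameter makes the trade-off explicit: at 0 the reading relies on the image alone, while larger values let the expected sequence override weak visual evidence. A table lookup like this is cheap, which is why lightweight sequence knowledge is attractive when cycle times are in the millisecond range.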

"There is a trend towards making AI smaller and therefore faster and cheaper in execution."

— Daniel Routschka, Sales Manager Artificial Intelligence, IDS Imaging Development Systems GmbH —

Simple correction and retraining

If the OCR fails to read characters, whether because of an error or because of an unknown character, font or language, it is important that the user can correct the reading results or train new characters with little effort. However, this fine-tuning is not simply a matter of "continuing to train" the network. Imagine, for example, that the OCR model has already been trained with 2 million images and the user now wants to teach it something new with a few more images of their own. What weighting should this information be given in the model so that it makes a difference but does not change everything? This is precisely where the provider's expertise is required: the AI must be expanded in such a way that previously stable recognition is not negatively affected by the adjustment. An example: for some reason, an OCR has problems with numbers, and the user only annotates numbers during the training process, but never letters. The goal is to use intelligent "knowledge backup" to prevent the network from eventually being able to read only numbers because it has learned that it no longer needs to read letters.

The DENK Vision AI Hub therefore generates suitable synthetic data for all new images when fine-tuning the DENKnet OCR in order to retrain and weight the network to the right degree. This prevents the OCR from losing its previous abilities, no matter how long it continues to be trained. At the same time, "retraining" remains easy for the user of the Vision AI Hub and fast thanks to cloud-based training in the background. In the best-case scenario, the basic skills of the OCR are so good that users no longer need to retrain at all.
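A generic way to picture the weighting problem is rehearsal: every fine-tuning batch mixes a few new user examples with examples of what the model already reads well. The following sketch shows only that batching idea. It is not the Vision AI Hub's actual procedure, and `build_finetune_batches`, the `new_fraction` parameter and the example data are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str]  # (image reference, label text); both hypothetical


def build_finetune_batches(new_examples: List[Example],
                           replay_examples: List[Example],
                           batch_size: int = 8,
                           new_fraction: float = 0.25,
                           seed: int = 0) -> List[List[Example]]:
    """Mix user corrections with replayed examples so old skills keep being trained."""
    if not 0.0 < new_fraction < 1.0:
        raise ValueError("new_fraction must be between 0 and 1")
    if not replay_examples:
        raise ValueError("replay_examples must not be empty")
    rng = random.Random(seed)
    n_new = max(1, round(batch_size * new_fraction))  # new examples per batch
    n_old = batch_size - n_new                        # replayed examples per batch
    shuffled = new_examples[:]
    rng.shuffle(shuffled)
    batches = []
    for start in range(0, len(shuffled), n_new):
        fresh = shuffled[start:start + n_new]
        old = rng.choices(replay_examples, k=n_old)   # sampled with replacement
        batch = fresh + old
        rng.shuffle(batch)
        batches.append(batch)
    return batches


# The user annotates only digits; the replay set keeps letters in every batch.
digits = [(f"user_{i}.png", str(i)) for i in range(5)]
letters = [("synthetic_A.png", "A"), ("synthetic_B.png", "B")]
for batch in build_finetune_batches(digits, letters):
    print([label for _, label in batch])
```

With `new_fraction=0.25`, three quarters of each batch remind the network of letters even though the user never annotated any, which is the behaviour the number-versus-letter example calls for.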

Process of label correction and retraining the OCR model
Fine-tuning DENKnet OCR in the DENK Vision AI Hub requires little user interaction and very quickly improves reading quality

Cloud training advantage

All functions and services of the DENK Vision AI Hub are based entirely on cloud technology. This means that fine-tuning on your own image data always runs on a constantly updated and controlled software basis, not on an arbitrary software version on arbitrary local hardware. Thanks to continuous further development in the technical backend, the OCR model available there is becoming increasingly robust against difficulties that have already been solved. As a result, more and more customer applications can be used without major adjustments or additional training. "Press Play" effectively starts a job for DENKcloud, which trains a large number of suitable network models with different architectures in the background and ultimately provides the user with the best result.
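Selecting "the best result" from several trained candidates can be pictured as a simple filter-and-rank step. The sketch below is an assumption about what such a selection could look like, not DENKcloud's actual criteria: the `Candidate` fields, the latency budget and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str           # hypothetical architecture label
    accuracy: float     # validation accuracy between 0 and 1
    latency_ms: float   # inference time per image on the target hardware


def pick_best(candidates: Iterable[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that stays within the latency budget."""
    eligible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    # Prefer higher accuracy; on a tie, prefer the faster model.
    return max(eligible, key=lambda c: (c.accuracy, -c.latency_ms))


trained = [
    Candidate("small-cnn", accuracy=0.95, latency_ms=4.0),
    Candidate("mid-cnn", accuracy=0.97, latency_ms=9.0),
    Candidate("transformer", accuracy=0.99, latency_ms=120.0),
]
best = pick_best(trained, max_latency_ms=20.0)
print(best.name if best else "no candidate meets the budget")  # mid-cnn
```

Treating latency as a hard limit rather than one score among many reflects the requirement that inference in automation stays in the low millisecond range: the most accurate network is of little use if it cannot keep up with the machine cycle.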

The cloud solution also offers added value for the user in a support case. If there are difficulties with data in a use case, e.g. with unknown characters, technical support in the backend can quickly provide a remedy and positively influence recognition performance. Without having to export or import data, and without the risk of different build systems or software versions leading to different results, changes can be made to the network architecture, for example, or the generation of additional synthetic data can be optimized. This can be done directly in the customer use case without any loss of time. Because sensitive data does not have to be sent anywhere, the risk of unauthorized access is also reduced.

OCR simply and economically from a single source

There are many providers of OCR solutions in the AI vision environment and there is a veritable race for the best networks. For experienced users, there are also many open-source tools and public network architectures available that can be used to quickly gain initial experience and achieve results. However, without in-depth technical knowledge of how AI technology or cutting-edge networks and large vision models can be used and combined economically and efficiently, many OCR tasks remain unsolved.

This is not the case with industrial camera manufacturer IDS: together with the AI vision solution DENKnet, all image processing components for fast, reliable and economical OCR tasks can be supplied from a single source. Customers benefit because it works. And it costs nothing to try it out. "Just Press Play."

DENKnet OCR - What makes the difference

  • Synthetic data - Each time new images are uploaded, image variants are automatically generated to expand and strengthen model capabilities in a systematic way.
  • Ease of use + time savings - Intuitive tools such as "Autoprediction" and "1-Click Annotation" require no prior knowledge and reduce testing, preparation and maintenance time.
  • Cutting-edge technology - Knowledge of the latest network architectures, such as transformers or large language models, is continuously incorporated into the development of DENKnet OCR.
  • Smart architecture - Fully automated training independently selects the most suitable architecture for the task.
  • Cloud training - Always up to date with cutting-edge technology and continuous improvement of the network base.
  • Fast and economical local execution - The goal is an optimally accurate, lean and fast model for local execution in a closed application environment.