BIQE HTR Software

Handwritten Text Recognition

Difficulties of recognising handwritten documents with OCR

Recognising handwritten text with OCR is not easy. You cannot assign someone’s handwriting to a particular font such as Times New Roman, Calibri or Arial.

Every handwriting has unique characteristics that come from the person who took the pen in hand or, in the old days, dipped his or her quill into an inkwell.

Also, handwritten documents are usually not written on lined paper, so words that belong together do not sit at exactly the same height on the line. The difficulty here is segmenting the lines correctly: the OCR software does not know which words belong together, and in which order, on the same line, and it needs to know this to recognise a handwritten document correctly.

For this, a technique called segmentation is used. The challenge here is to make this segmentation as automatic as possible.
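As a point of comparison, the classical way to split a page into lines is a horizontal projection profile: count the ink pixels in every pixel row and cut wherever a row is empty. The sketch below is only an illustration and not the BIQE algorithm; the function name `segment_lines` and the 0/1 grid representation are assumptions. It works for neatly ruled text and breaks down in exactly the situation described above, when handwritten lines slope or overlap.

```python
def segment_lines(grid, min_ink=1):
    """Return (first_row, last_row) for each text line in a binary image.

    grid is a list of rows; each row is a list of 0 (paper) or 1 (ink).
    A row belongs to a text line when it holds at least min_ink ink pixels.
    """
    lines = []
    start = None
    for y, row in enumerate(grid):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # a line touching the bottom edge
        lines.append((start, len(grid) - 1))
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(segment_lines(page))   # [(1, 2), (4, 4)]
```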

Another problem with OCR of handwritten documents is that not every image has the same formatting or layout. Sometimes a page has a picture with some accompanying text, sometimes a page has text only, and sometimes it combines text and pictures.

For the best OCR result, it is also important that the handwritten pages are rotated so that the text is upright and legible. Usually this is done during scanning, but sometimes that is not possible.

BIQE HTR Software deals with all these issues.

Artificial Intelligence and Machine Learning

Artificial Intelligence (AI) makes computer systems do things that ‘normally’ require natural or human intelligence. It enables computer systems to ‘independently’ take actions that lead to the goal that the user or developer of the AI has in mind. AI applications are very diverse: Google Search, the well-known movie channel Netflix, the movies and music on YouTube, Siri and Google Assistant, and of course the well-known ChatGPT.

Machine Learning (ML) is a part of Artificial Intelligence that focuses on statistical algorithms that learn from data.

For example:
Suppose you have a book from the 1800s with 500 pages that you want to make searchable with OCR. You can train on part of that book using Machine Learning. How? By typing out the text of 50 of the page images in a programme. You then use ML to train on these 50 images together with the 50 pages of typed text. This training creates a language model. You can then apply that trained language model to the remaining 450 pages that were not typed out, and those 450 pages are OCR-ed automatically. So you train with ML, and the ML learns to draw general conclusions about similar, unseen data.
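The split in this example can be sketched as follows. The code only illustrates the bookkeeping, not the training itself; the function name `split_for_training` and the page file names are hypothetical.

```python
def split_for_training(pages, n_train):
    """Split page images into a set to transcribe by hand and the rest."""
    if not 0 < n_train <= len(pages):
        raise ValueError("n_train must be between 1 and the number of pages")
    return pages[:n_train], pages[n_train:]


pages = [f"page_{i:03d}.jpg" for i in range(1, 501)]   # 500 scanned pages
to_transcribe, to_recognise = split_for_training(pages, 50)

print(len(to_transcribe), len(to_recognise))            # 50 450
```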

There is quite a bit of confusion about whether Machine Learning is the same thing as data mining. The main difference is that data mining is used to extract laws or rules from large amounts of data, while Machine Learning teaches a computer how to learn, so that it better understands the given parameters. Data mining is thus a research method for determining a particular outcome from the data collected.

BIQE HTR uses both Artificial Intelligence and Machine Learning to solve the difficulties of OCR-ing handwritten text documents. Below we list some of the features we developed to achieve the best OCR results for all your handwritten documents.

Features of BIQE HTR Software

  • AUTOMATIC ROTATION

The first and most important step in document OCR is scanning. Scan at 300 dpi or higher, and preferably in colour, so that as much pixel data as possible is preserved for editing.
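To see how much pixel data the resolution preserves, the pixel size of a scan can be worked out from the paper size. The sketch below assumes an A4 page (210 x 297 mm); the function name `scan_size` is hypothetical.

```python
MM_PER_INCH = 25.4


def scan_size(width_mm, height_mm, dpi):
    """Return the (width, height) in pixels of a page scanned at dpi."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


w300, h300 = scan_size(210, 297, 300)
w150, h150 = scan_size(210, 297, 150)

print(w300, h300)                       # 2480 3508
print(w150, h150)                       # 1240 1754
print((w300 * h300) / (w150 * h150))    # 4.0
```

A 300 dpi scan therefore holds four times as many pixels as a 150 dpi scan of the same page.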

Sometimes the material has already been scanned by others in black and white at 150 dpi, or it has been scanned skewed, upside down, or rotated by 90 degrees or more. In that case we recommend using our BIQE PROduction or BIQE Archive to enhance the images. We cannot turn black-and-white scans into colour images, but we can improve almost everything else about your images with our 39 image filters.

The ultimate goal of the image filters is to improve the written text in such a way as to achieve the highest possible recognition rate.

Our software recognises whether your images need to be rotated. With our BIQE OCR Server or with our BIQE HTR, incorrectly rotated images are automatically rotated correctly. A properly rotated document will significantly improve the overall quality of OCR.
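One simple way to detect a page that was scanned on its side, shown purely as an illustration and not as the BIQE method, is to compare ink profiles. Upright text gives pixel rows that alternate between full and empty, so the row totals vary more than the column totals. The function names are hypothetical, and this check cannot tell an upright page from an upside-down one.

```python
from statistics import pvariance


def rotate90(grid):
    """Rotate a binary image (list of rows) 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]


def looks_sideways(grid):
    """Guess whether text lines run vertically instead of horizontally."""
    height, width = len(grid), len(grid[0])
    # Fraction of ink per row and per column, so page shape does not matter
    row_fill = [sum(row) / width for row in grid]
    col_fill = [sum(col) / height for col in zip(*grid)]
    return pvariance(col_fill) > pvariance(row_fill)


upright = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
sideways = rotate90(upright)

print(looks_sideways(upright))    # False
print(looks_sideways(sideways))   # True
```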

  • SEGMENTATION

With typed letters, you usually don’t have segmentation problems because all the words sit neatly on a straight line.

With typed text, a good OCR engine such as ABBYY segments your document properly in the background before it is OCR-ed. But this is very different and much trickier with handwritten documents (see image above).

For handwritten texts you will in many cases have to use a segmentation tool such as eScriptorium. You can then correct the page segmentation manually by drawing a segmentation line under, through or above the words of each line. This is a time-consuming task.

Often, you don’t have access to segmentation options at all, because the OCR engine already does that automatically for you. This is not a problem with typed text, but if you depend on the same engine for Handwritten Text Recognition as well, the segmentation, and thus the OCR result, will be quite disappointing.

BIQE HTR has a unique algorithm within a very high-performance architecture that solves the segmentation problem in almost any manuscript.

This segmentation technique runs automatically in the background, so you do not control it directly. With BIQE you still have a say in it, however, because we provide customisation even for segmentation. We therefore honestly believe that we’ve developed the best algorithm for Handwritten Text Recognition!

  • LANGUAGE-INDEPENDENT

Most OCR engines recognise one language per page and use that language’s dictionary. If a handwritten page contains multiple languages, for example Greek and Latin, the OCR of that page will be more prone to errors.

 

BIQE HTR Software is first and foremost language-independent. Through artificial intelligence (AI), the OCR software knows which language or languages are present in a document, even if there are several languages on a page! In the case of a multilingual document, such as Greek and Latin in our example, BIQE HTR software will automatically recognise the languages and select and apply the correct Greek and/or Latin dictionary in this page or document, in addition to the correct OCR language.
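A small part of this idea, telling Greek script from Latin script word by word, can be sketched with Python's standard `unicodedata` module. This is an illustration only and not the BIQE language detection; the function names are hypothetical, and a real system must also separate languages that share a script, such as Latin and Dutch.

```python
import unicodedata
from collections import Counter


def script_of(word):
    """Return 'GREEK', 'LATIN' or 'OTHER' based on the letters in word."""
    counts = Counter()
    for ch in word:
        if ch.isalpha():
            # Unicode names start with the script, e.g. 'GREEK SMALL LETTER ALPHA'
            name = unicodedata.name(ch, "")
            counts[name.split(" ")[0]] += 1
    if not counts:
        return "OTHER"
    script = counts.most_common(1)[0][0]
    return script if script in ("GREEK", "LATIN") else "OTHER"


def scripts_on_page(text):
    """Return the set of scripts found in the words of a page."""
    return {script_of(word) for word in text.split()} - {"OTHER"}


print(script_of("λόγος"))                            # GREEK
print(script_of("verbum"))                           # LATIN
print(sorted(scripts_on_page("λόγος καὶ verbum")))   # ['GREEK', 'LATIN']
```

Once the script of each word is known, a matching dictionary could be selected per word instead of per page.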

  • PARALLEL PROCESSING OR MULTI-THREADED SYSTEM

As the name suggests, parallel processing works on multiple processors or cores at the same time. These processors or cores/threads work independently to carry out (partial) tasks that need to be completed.

Please note that multithreading is not the same as parallel processing. One might think “the more threads, the faster the task will be completed”, but that’s not the case. To understand this issue, let’s look at multithreading for a single-core processor and for a multi-core processor.

Single-core processors

At first glance, multithreading on a single-core processor may seem counterintuitive. After all, how can one physical processor simultaneously perform multiple tasks?

In fact, multithreading on single-core processors is simulated through a technique called temporal multithreading, which relies on context switching.

How it works:

  1. Task queue: Programs divide their work into units called threads. The operating system places all these threads in a queue, waiting to be processed.
  2. Fast switching: The processor core quickly switches between threads, giving each of them short periods of time. During this time, the thread does its share of work and then gives way to the next one in the queue.
  3. Illusion of multitasking: By quickly switching between threads, it appears that the processor is processing multiple tasks simultaneously.
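The three steps above can be simulated in a few lines. This is a toy model of time slicing using Python generators, not how an operating system scheduler is implemented; the names `worker` and `round_robin` are hypothetical.

```python
from collections import deque


def worker(name, steps):
    """A 'thread' that needs `steps` time slices to finish."""
    for i in range(1, steps + 1):
        yield f"{name} step {i}"


def round_robin(threads):
    """Run each thread for one time slice in turn until all are done."""
    queue = deque(threads)          # 1. task queue
    log = []
    while queue:
        thread = queue.popleft()    # 2. switch to the next thread
        try:
            log.append(next(thread))
            queue.append(thread)    # not finished: back of the queue
        except StopIteration:
            pass                    # finished: drop it
    return log


for line in round_robin([worker("A", 2), worker("B", 3)]):
    print(line)
```

The output interleaves the steps of A and B, which is the illusion of multitasking from step 3, although only one step ever runs at a time.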

 

Technical benefits:

  • Improved responsiveness: By quickly switching between tasks, your computer feels more responsive, especially when running light-duty applications.
  • Efficient use of resources: While one thread is waiting, for example for data from memory or disk, the core can run another thread instead of standing idle.

It is important to understand that context switching between threads requires additional resources, which can slightly slow down the system.

Multi-core processors

Multi-core processors allow true parallelism, since the processor can distribute tasks across multiple cores. This keeps the system from “choking” and ensures smooth operation and quick transitions between tasks.

Additional cores increase the overall performance of the processor because several tasks can run at the same time. This is called parallel processing.

It is important to note that not all programs can effectively distribute the load across multiple cores. In such cases, the advantage of a multi-core processor may be less noticeable.

When designing programs, keep in mind that for compute-intensive work the maximum number of simultaneously executing tasks should not exceed the number of processor cores. Otherwise the performance of the program will not increase and will even decrease because of the additional context switches.
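In Python, for example, this rule can be applied by capping the worker count at the number of cores reported by `os.cpu_count()`. The helper name `worker_count` is hypothetical, and the rule is meant for compute-intensive work; tasks that mostly wait for disk or network can use more workers.

```python
import os


def worker_count(n_tasks, cores=None):
    """Number of workers for CPU-bound work: never more than the cores."""
    if cores is None:
        cores = os.cpu_count() or 1    # cpu_count() may return None
    return max(1, min(n_tasks, cores))


print(worker_count(100, cores=8))   # 8
print(worker_count(3, cores=8))     # 3
```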

Multithreading BIQE

Our BIQE products (BIQE HTR, BIQE PRO and other products) are designed and developed to make maximum use of all processor cores. We use modern principles and technologies to build programs that effectively use modern multi-core processors.

Thus, our products can simultaneously process several different documents or pages and export them to different formats (for example ALTO-XML, JP2 and TXT).

BIQE is therefore many times faster than software that does this work sequentially.
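As an illustration of processing several pages and export formats at once, the sketch below turns every combination of page and format into one job and hands the jobs to a pool. The function `export_page` is a hypothetical stand-in that only returns a file name, and the page names are made up. A thread pool keeps the example short; CPU-heavy work in Python would normally use `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

FORMATS = ("xml", "jp2", "txt")   # ALTO-XML, JP2 and TXT


def export_page(job):
    """Stand-in for real work: pretend to export one page in one format."""
    page, fmt = job
    return f"{page}.{fmt}"


def export_all(pages, max_workers=4):
    """Export every page in every format, several jobs at a time."""
    jobs = list(product(pages, FORMATS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in job order, whichever job finishes first
        return list(pool.map(export_page, jobs))


print(export_all(["page_001", "page_002"]))
```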

  • QUICK EXPORT AND SEARCH IN VIEWER

When handwritten, old or typed documents are OCR-ed, the aim is to make them searchable. For very large data files an ordinary viewer is often inadequate, which is why we developed our own very fast elastic search viewer.

Our viewer can be subdivided into folders and sub-folders as you see fit. This allows you to select more precisely which chapter of, for example, a book or document you want to search or, conversely, exclude from the search. Our viewer works with ALTO-XML files combined with JP2 images, which you can easily import via our CMS.
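To illustrate why ALTO-XML suits search, the sketch below reads the recognised words from an ALTO fragment and looks one up. In ALTO each word is a `String` element whose `CONTENT` attribute holds the text and whose `HPOS` and `VPOS` attributes hold its position on the page image. The sample document and the function names are hypothetical, and this is not how the BIQE viewer is implemented.

```python
import xml.etree.ElementTree as ET

ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Anno" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="30"/>
      <String CONTENT="1812" HPOS="190" VPOS="200" WIDTH="70" HEIGHT="30"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def read_words(alto_xml):
    """Return (text, hpos, vpos) for every String element in an ALTO file."""
    words = []
    for elem in ET.fromstring(alto_xml).iter():
        # Tags look like '{namespace}String'; compare the local name only
        if elem.tag.rsplit("}", 1)[-1] == "String":
            words.append((elem.get("CONTENT"),
                          float(elem.get("HPOS")),
                          float(elem.get("VPOS"))))
    return words


def find(words, query):
    """Case-insensitive search; returns the positions of matching words."""
    return [(h, v) for text, h, v in words if text.lower() == query.lower()]


words = read_words(ALTO_SAMPLE)
print(find(words, "anno"))   # [(100.0, 200.0)]
```

Because every hit comes with coordinates, a viewer can highlight the word directly on the JP2 page image.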

Would you like to learn more about our software?
Please contact us, we are happy to help you!
info@biqe.biz 

Postal address
Meerweg 17
8313 AK Rutten
Netherlands

BIQE HTR Software

  • Windows Software
  • OCR handwritten documents
  • Language-independent
  • Quality software

BIQE delivers. Unlimited!
Scanning - Optimization - OCR
We are your expert. Ask us!
