BIQE HTR Software
Handwritten Text Recognition
Difficulties of using OCR to recognise handwritten documents
About BIQE HTR
Recognising handwritten text with OCR is a challenging task, because you cannot assign someone’s handwriting to a particular font such as Times New Roman, Calibri or Arial.
Every handwriting has the unique characteristics of the person who took the pen in hand or—in the old days—dipped their quill into an inkwell.
Also, handwritten documents are usually not written on lined paper, so words that belong on the same line may sit at different heights. The difficulty lies in segmenting the lines correctly: the OCR software does not know which words belong together, in sequence, on the same line, yet it needs that information to recognise a handwritten document properly.
The technique of segmentation, used to separate and identify individual words in a handwritten document, presents a significant challenge. The goal is to automate this process as much as possible, a task that requires advanced OCR technology and a deep understanding of handwriting patterns.
Another problem with the OCR of handwritten documents is that not all pages share the same formatting or layout. Some pages contain a picture with accompanying text, some contain text only, and others combine text and photographs.
For the best OCR result, every handwritten page must be correctly oriented and legible. This is usually taken care of during scanning, but that is not always possible.
BIQE HTR Software deals with all these issues.
Artificial Intelligence and Machine Learning
Artificial Intelligence (AI) enables computer systems to do things that ‘normally’ require natural or human intelligence, and to take actions ‘independently’ that lead to the goal set by the user or developer. AI applications are diverse: from Google Search to the streaming service Netflix, the films and music on YouTube, and assistants such as Siri and Google Assistant. And, of course, the well-known ChatGPT.
Machine Learning (ML) is a branch of Artificial Intelligence that studies statistical algorithms which learn from data and can then generalise to new, unseen data.
For example:
Suppose you have a 500-page book from the 1800s that you want to make searchable (OCR). You can do this with Machine Learning. How? First, you transcribe 50 of its pages by typing their text into a dedicated program. You then use ML to train on these 50 page images together with their 50 typed transcriptions. This training produces a recognition model, which you then apply to the remaining 450 untranscribed pages, so that they are OCR-ed automatically. In short, ML learns from the examples you give it and draws general conclusions that it applies to similar, unseen data.
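To make the train-then-apply idea concrete, here is a deliberately tiny sketch of supervised learning: a 1-nearest-neighbour classifier that "learns" from a few hand-labelled 3×3 glyph bitmaps and then labels a glyph it has never seen. The bitmaps and labels are invented for illustration, and this is not BIQE's method; real HTR systems use far larger models trained on whole lines of text.

```python
# Toy supervised learning: 1-nearest-neighbour over tiny binary glyphs.
# The glyphs and labels below are invented for illustration only.

def hamming(a, b):
    """Number of pixels in which two equally sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))

def train(examples):
    """'Training' for 1-NN simply stores the labelled examples."""
    return list(examples)

def predict(model, glyph):
    """Label an unseen glyph with the label of its closest training example."""
    _, label = min(model, key=lambda ex: hamming(ex[0], glyph))
    return label

# 3x3 bitmaps flattened row by row (1 = ink), each paired with its transcription.
TRAINING = [
    ((0, 1, 0,
      0, 1, 0,
      0, 1, 0), "l"),
    ((1, 1, 1,
      1, 0, 1,
      1, 1, 1), "o"),
    ((1, 0, 1,
      0, 1, 0,
      1, 0, 1), "x"),
]

model = train(TRAINING)

# A slightly sloppy 'l' that is not in the training data (one extra ink pixel).
unseen = (0, 1, 0,
          0, 1, 1,
          0, 1, 0)
print(predict(model, unseen))  # → l
```

The transcribed pages in the example above play the role of TRAINING, and the 450 remaining pages play the role of unseen.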
There is quite a bit of confusion about whether Machine Learning is the same as Data Mining. The main difference is that Data Mining is used to discover patterns or rules in large amounts of data, while Machine Learning teaches a computer to learn from examples so that it can make predictions about new data. In other words, data mining is a research method that draws conclusions from data that has already been collected.
BIQE HTR uses artificial intelligence and machine learning to solve the difficulties of OCR-ing handwritten text documents. Below, we list some features we developed to achieve the best OCR results for all your handwritten documents.
Features of BIQE HTR Software
AUTOMATIC ROTATION
The first and crucial step in document OCR is scanning. Scan at a minimum of 300 dpi, preferably in colour, to retain as much pixel data as possible for image editing.
Sometimes the material has already been scanned by others in black and white (B&W) at 150 dpi, or the scans are skewed, upside down or rotated by 90 degrees or more. In that case, we recommend using our BIQE PROduction or BIQE Archive to enhance the images. We cannot turn B&W scans into colour images, but our 39 image filters can improve almost everything else.
The ultimate goal of the image filters is to improve the written text so as to achieve the highest possible recognition rate.
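The difference between a 300 dpi colour scan and a 150 dpi black-and-white scan is easy to quantify. The sketch below computes pixel dimensions and uncompressed size for an A4 page (210 × 297 mm, an assumed page size) in both settings. Real image files are compressed, so only the ratio between the two matters here.

```python
MM_PER_INCH = 25.4

def pixel_size(width_mm, height_mm, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))

def raw_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed image size in bytes."""
    return width_px * height_px * bits_per_pixel // 8

A4 = (210, 297)  # assumed page size in millimetres
colour_300 = pixel_size(*A4, dpi=300)
bw_150 = pixel_size(*A4, dpi=150)

print(colour_300, raw_bytes(*colour_300, bits_per_pixel=24))  # → (2480, 3508) 26099520
print(bw_150, raw_bytes(*bw_150, bits_per_pixel=1))           # → (1240, 1754) 271870
```

The 150 dpi scan has a quarter of the pixels, and each pixel holds 1 bit instead of 24, so the grey and colour detail of faint ink strokes is simply not in the file. That is why no filter can turn such a scan back into colour.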
Our software detects whether your images need to be rotated. With our BIQE OCR Server or BIQE HTR, incorrectly oriented images are automatically rotated to the correct position. A correctly oriented document significantly improves the overall quality of the OCR.
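A common textbook way to detect whether a page has been turned by 90 degrees is the projection-profile method: when text lines run horizontally, the ink count per row alternates sharply between lines and gaps, so its variance is high. The sketch below applies this to a binary image stored as a list of rows. It illustrates the general technique and is not BIQE's algorithm; it also cannot tell 0 from 180 degrees, so detecting upside-down pages and correcting small skew angles need additional steps.

```python
# Projection-profile orientation check for a binary page image.
# The image is a list of rows; 1 = ink, 0 = background.

def rotate90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def profile_variance(img):
    """Variance of the per-row ink counts (the horizontal projection profile)."""
    sums = [sum(row) for row in img]
    mean = sum(sums) / len(sums)
    return sum((s - mean) ** 2 for s in sums) / len(sums)

def detect_rotation(img):
    """Return 0 or 90: the clockwise rotation that makes the text lines horizontal."""
    candidates = {0: img, 90: rotate90(img)}
    return max(candidates, key=lambda angle: profile_variance(candidates[angle]))

# Two horizontal 'text lines' (rows 1-2 and 5-6) on an 8x8 page.
page = [[1 if y in (1, 2, 5, 6) else 0 for _ in range(8)] for y in range(8)]
sideways = rotate90(page)  # the same page, scanned sideways

print(detect_rotation(page))      # → 0
print(detect_rotation(sideways))  # → 90
```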
SEGMENTATION
With typewritten text, you usually don’t have segmentation problems, because all the words sit neatly on straight lines.
For typed text, a good OCR engine such as Abbyy will segment the document correctly before recognising it. With handwritten documents, however, this is very different and much trickier (see image above).
For many handwritten texts, you will have to use a segmentation tool such as eScriptorium, in which you correct the page segmentation manually by drawing a segmentation line under, through or along the top of the words on each line. This is a time-consuming task.
While OCR engines handle segmentation automatically for typed text, the same cannot be said for handwritten text. Relying on a standard OCR engine for Handwritten Text Recognition can therefore lead to disappointing segmentation and OCR results.
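Why typed text is easy to segment, and handwriting is not, can be shown with the simplest line-segmentation technique, again based on the horizontal projection profile: every run of rows that contains ink is treated as one text line. This is a generic sketch for illustration, not BIQE HTR's segmentation algorithm; min_ink is a hypothetical threshold for ignoring specks of noise.

```python
# Line segmentation by horizontal projection profile.
# The image is a list of rows; 1 = ink, 0 = background.

def segment_lines(img, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per run of inked rows."""
    lines = []
    start = None
    for y, row in enumerate(img):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                    # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))     # a blank row closes the current line
            start = None
    if start is not None:                # the last line runs to the bottom edge
        lines.append((start, len(img)))
    return lines

typed = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(segment_lines(typed))  # → [(1, 3), (4, 5)]

# Two sloping handwritten lines: every row contains some ink.
sloping = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(segment_lines(sloping))  # → [(0, 6)]
```

On neatly typed text, the blank rows between lines separate them cleanly. When handwritten lines slope, or ascenders and descenders reach into the neighbouring line, no blank row remains and two lines merge into a single segment.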
BIQE HTR has a unique algorithm within a high-performance architecture that solves the segmentation problem in almost any manuscript.
In most OCR engines you cannot control segmentation, because it runs automatically in the background. With BIQE you do have a say in it, because we offer customisation options for segmentation as well. That is why we believe we have developed the best algorithm for Handwritten Text Recognition!
LANGUAGE-INDEPENDENT
Most OCR engines assume one language per page and use that language’s dictionary. If a handwritten page contains several languages, such as Greek and Latin, the OCR of that page is more prone to errors.
BIQE HTR Software is first and foremost language-independent.
Using artificial intelligence (AI), the OCR software detects which language or languages a document contains, even if several languages appear on one page!
In the case of a multilingual document, such as Greek and Latin in our example, BIQE HTR software automatically recognises the languages and applies the correct Greek and/or Latin dictionary to that page or document, in addition to the correct OCR language.
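BIQE HTR identifies languages with AI. As a much simpler, rule-based stand-in, the sketch below shows how per-word script detection could route each word to a dictionary, using the Unicode character names from Python's standard unicodedata module. Note that script is not the same as language (Latin script is shared by Latin, Dutch, English and many others), and the dictionary identifiers are hypothetical placeholders.

```python
import unicodedata
from collections import Counter

# Hypothetical dictionary identifiers; a real system would load actual word lists.
DICTIONARIES = {"GREEK": "greek_dictionary", "LATIN": "latin_dictionary"}

def word_script(word):
    """Return the dominant Unicode script of a word's letters (e.g. 'GREEK'), or None."""
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]  # 'GREEK SMALL LETTER ALPHA' -> 'GREEK'
        for ch in word
        if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else None

def assign_dictionaries(line):
    """Pair every word in a recognised line with the dictionary for its script."""
    return [(word, DICTIONARIES.get(word_script(word))) for word in line.split()]

print(assign_dictionaries("verbum λόγος"))
# → [('verbum', 'latin_dictionary'), ('λόγος', 'greek_dictionary')]
```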
PARALLEL PROCESSING OR MULTI-THREAD SYSTEM
As the name suggests, parallel processing runs work on multiple processors or cores at the same time. Each processor or core works independently on its own (partial) task.
Multithreading, however, is not the same as parallel processing: a program can start many threads, but they only run truly in parallel when there are enough cores to execute them. One might think, “The more threads, the faster the task will be completed,” but that’s not the case.
To gain a deeper understanding, let’s explore the concept of multithreading in the context of both a single-core processor and a multi-core processor.
Single-core processors
At first glance, multithreading on a single-core processor may seem counterintuitive. After all, how can one physical processor simultaneously perform multiple tasks?
In fact, multithreading on a single-core processor is only simulated, using a technique called time-slicing: the processor rapidly performs context switches between threads.
How it works:
- Task Queue: Programs split their work into threads, and the operating system places the threads that are ready to run in a queue, where they wait to be processed.
- Fast switching: The processor core switches rapidly between threads, giving each one a short time slice. During that slice, the thread does its share of work and then makes way for the next thread in the queue.
- Illusion of multitasking: By quickly switching between threads, the processor appears to be processing multiple tasks simultaneously.
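The three steps above can be simulated in a few lines. In this sketch each "thread" is a Python generator, each yield ends one time slice, and a round-robin loop plays the role of the scheduler on a single core. Real operating systems preempt threads with timer interrupts rather than waiting for them to yield, so this is a cooperative simplification.

```python
from collections import deque

def thread(name, steps):
    """A simulated thread: each yield marks the end of one time slice of work."""
    for i in range(steps):
        yield f"{name}:{i}"

def round_robin(threads):
    """Give each queued thread one time slice in turn until all are finished."""
    queue = deque(threads)
    trace = []
    while queue:
        current = queue.popleft()        # take the next thread from the queue
        try:
            trace.append(next(current))  # let it run for one slice
        except StopIteration:
            continue                     # finished: do not requeue it
        queue.append(current)            # context switch: back of the queue
    return trace

print(round_robin([thread("A", 3), thread("B", 2)]))
# → ['A:0', 'B:0', 'A:1', 'B:1', 'A:2']
```

Only one slice runs at any moment, yet the trace interleaves A and B: that interleaving is the illusion of multitasking.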
Technological benefits:
- Improved responsiveness: Quickly switching between tasks makes your computer more responsive, especially when running light-duty applications.
- Efficient use of resources: While one thread is waiting, for example for data from disk or the network, the core can run another thread instead of sitting idle.
Note, however, that context switching between threads has its own overhead, which can slightly slow down the system.
Multi-core processors
Multi-core processors achieve true parallelism, because tasks can be distributed across several cores that run at the same time. This keeps the system from “choking” under load and ensures smooth operation and quick transitions between tasks.
Additional cores increase the processor’s overall performance, because tasks that are independent of each other can run in parallel. This is called parallel processing.
It is important to note that not all programs can effectively distribute the load across multiple cores. In such cases, the advantage of a multi-core processor may be less noticeable.
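A standard way to quantify this limitation is Amdahl’s law. If a fraction p of a program’s work can be spread across cores and the remaining 1 - p must run sequentially, the maximum speed-up on n cores is:

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad
S(8)\big|_{p = 0.9} = \frac{1}{0.1 + 0.1125} \approx 4.7, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

So a program that is 90% parallelisable runs at most about 4.7 times faster on eight cores, and never more than 10 times faster, however many cores are added.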
When designing programs for CPU-bound work, keep in mind that the number of simultaneously executing tasks should be no higher than the number of processor cores. Otherwise, performance will not improve and may even decline because of the extra context switches.
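One common way to follow this rule is to size a worker pool by the number of cores. The sketch below uses Python's standard concurrent.futures.ProcessPoolExecutor (processes rather than threads, because in standard CPython builds threads do not execute Python code on several cores at once). recognise_page is a hypothetical placeholder for the CPU-heavy per-page work, not a BIQE function.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def worker_count(num_tasks, cores=None):
    """At most one worker per core, never more workers than tasks, and at least one."""
    cores = cores or os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cores, num_tasks))

def recognise_page(page_id):
    """Hypothetical placeholder for CPU-bound work such as OCR and export of one page."""
    return f"page {page_id} processed"

def process_pages(page_ids):
    """Process pages in parallel on a pool sized to the available cores."""
    page_ids = list(page_ids)
    with ProcessPoolExecutor(max_workers=worker_count(len(page_ids))) as pool:
        return list(pool.map(recognise_page, page_ids))
```

On an 8-core machine, worker_count(500) returns 8: the remaining pages wait in the pool's queue instead of competing for the same cores. Call process_pages from code guarded by if __name__ == "__main__": so that the pool also works on platforms that start worker processes by spawning.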
Multithreading in BIQE
Our BIQE products (BIQE HTR, BIQE PRO and others) are designed and developed to make maximum use of all processor cores. We apply current principles and technologies to build programs that use modern multi-core processors effectively.
Thus, our products can simultaneously process several different documents or pages and export them to other formats (for example, ALTO-XML, JP2 and TXT).
Thanks to multithreading, BIQE now runs several times faster than when the same tasks are performed sequentially.
QUICK EXPORT AND SEARCH IN VIEWER
Handwritten, old or typed documents are usually OCR-ed so that they can be searched. For very large data sets, an ordinary viewer is often inadequate, so we developed our fast elastic search viewer.
In our viewer, documents can be organised into folders and sub-folders as you see fit. This lets you select precisely which chapters of, for example, a book or document you want to search or exclude from the search. The viewer works with ALTO-XML files combined with JP2 images, which you can easily import via our CMS.
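Folder-scoped search of this kind rests on an inverted index. The sketch below, which is not BIQE's viewer, reads the words from ALTO-XML (the CONTENT attribute of each String element) with Python's standard xml.etree.ElementTree and indexes them per page path, so that a search can be limited to, or exclude, particular folders. The folder paths and the alto_page helper are hypothetical, for the example only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def alto_words(xml_text):
    """Yield the CONTENT of every ALTO <String> element, whatever the namespace version."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "String":  # strip the '{namespace}' prefix
            content = elem.get("CONTENT")
            if content:
                yield content

class FolderIndex:
    """Inverted index: lower-cased word -> set of page paths containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, path, xml_text):
        for word in alto_words(xml_text):
            self.postings[word.lower()].add(path)

    def search(self, word, include=("",), exclude=()):
        """Pages containing word, under an included folder and outside every excluded one."""
        hits = self.postings.get(word.lower(), set())
        return sorted(
            path for path in hits
            if any(path.startswith(prefix) for prefix in include)
            and not any(path.startswith(prefix) for prefix in exclude)
        )

def alto_page(*words):
    """Build a minimal ALTO v4 document for the example."""
    strings = "".join(f'<String CONTENT="{w}"/>' for w in words)
    return ('<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#"><Layout><Page>'
            f"<PrintSpace><TextBlock><TextLine>{strings}</TextLine></TextBlock>"
            "</PrintSpace></Page></Layout></alto>")

index = FolderIndex()
index.add_page("diary/1812/p001", alto_page("Anno", "1812", "Amsterdam"))
index.add_page("diary/1813/p014", alto_page("Amsterdam", "regen"))

print(index.search("amsterdam"))                            # → ['diary/1812/p001', 'diary/1813/p014']
print(index.search("amsterdam", exclude=("diary/1813/",)))  # → ['diary/1812/p001']
```

End folder prefixes with a slash, as above, so that excluding "diary/1813/" does not also exclude a hypothetical "diary/18130/".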
Would you like to learn more about our software?
Please contact us; we are happy to help you!
info@biqe.biz
Postal address
Meerweg 17
8313 AK Rutten
Netherlands
BIQE HTR Software
- Windows Software
- OCR handwritten documents
- Language-independent
- Best segmentation feature
- Uses your full PC speed
- Quality Software
- Handwriting to searchable PDF
- No page editing is needed