Text recognition accuracy assessment
How the BIQE OCR Server achieves OCR accuracy
Recognition accuracy assessment as a tool for obtaining the best result
Information recognition and extraction are core functions of BIQE OCR server. To give our customers the best recognition quality, we use ABBYY FineReader Engine, one of the smartest recognition engines in the world. (On request, we can use any other recognition engine the client prefers.)
Our clients usually process large, sometimes huge, arrays of files (scans, images, PDFs). Summary information on file processing is therefore important to them: it lets them assess processing quality and detect and correct problems such as low-quality scans, blank pages, upside-down pages, and images containing a lot of garbage. BIQE OCR server helps our customers conveniently analyze and improve the processing result.
BIQE OCR server displays useful summary information about the results of processing each page. This information is available in tabular form. A separate column in this table shows the estimated recognition quality/accuracy for each page. The operator can sort the table by this column to group blank pages and pages with a low recognition quality score, easily find the problematic pages, and then correct them in the overall package of files.
Thus, BIQE OCR server gives our clients a convenient way to control processing quality and obtain the best target result.
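The review workflow described above can be sketched in a few lines of Python. This is only an illustration: the `PageSummary` record, its field names, the 0-100 score scale, and the threshold of 80 are assumptions made for the example, not the actual BIQE OCR server data model or interface.

```python
from dataclasses import dataclass


@dataclass
class PageSummary:
    """One row of a hypothetical per-page summary table."""
    file_name: str
    page_number: int
    accuracy_score: float  # assumed scale: 0 (worst) to 100 (best)
    is_blank: bool


def pages_needing_review(pages, threshold=80.0):
    """Return blank or low-scoring pages: blank pages first, then worst score first."""
    flagged = [p for p in pages if p.is_blank or p.accuracy_score < threshold]
    # Sort key: blank pages sort before non-blank ones, then ascending score.
    return sorted(flagged, key=lambda p: (not p.is_blank, p.accuracy_score))


if __name__ == "__main__":
    table = [
        PageSummary("a.pdf", 1, 97.5, False),
        PageSummary("a.pdf", 2, 0.0, True),
        PageSummary("b.pdf", 1, 42.0, False),
        PageSummary("b.pdf", 2, 78.9, False),
    ]
    for page in pages_needing_review(table):
        print(page.file_name, page.page_number, page.accuracy_score, page.is_blank)
```

Sorting by the score column, rather than scanning pages in file order, is what lets an operator reach the few problematic pages in a large batch without opening every file.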
Calculating Recognition Accuracy Score
The recognition accuracy score is the criterion on which the analysis of the target result, and consequently the final quality of file processing, is based. Therefore, the recognition accuracy estimate must be calculated reliably.
Typically, recognition engines, ABBYY's included, evaluate the confidence with which they recognize words and individual characters. The level of recognition confidence depends primarily on the quality of the scan, and also on some other factors (for example, semantic analysis).
The confidence level basically shows how well the recognition engine “recognized” the character or word. This criterion is useful for assessing how well the recognition engine is trained to recognize a particular text and font. On its own, however, it does not let you evaluate the accuracy of recognition.
For example, when recognizing a low-quality scan, the recognition result may be 100% correct and accurate, but the level of confidence that the engine provides can be very low. Therefore, this criterion is not an assessment of recognition accuracy.
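The difference between the two measures can be made concrete with a small Python sketch. It compares the mean per-character confidence with a character accuracy computed against a known correct text (one minus the edit distance divided by the length of the correct text). The function names and sample values are assumptions made for the illustration, and this is not the algorithm BIQE OCR server uses; in production no correct reference text is available, which is why accuracy has to be estimated.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(recognized: str, ground_truth: str) -> float:
    """Share of the reference text recognized correctly, in the range 0..1."""
    if not ground_truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - levenshtein(recognized, ground_truth) / len(ground_truth))


def mean_confidence(confidences: list[float]) -> float:
    """Average of the per-character confidence values reported by an engine."""
    return sum(confidences) / len(confidences) if confidences else 0.0


if __name__ == "__main__":
    truth = "Invoice 2041"
    # Low-quality scan: every character is correct, yet confidence is low.
    print(character_accuracy("Invoice 2041", truth), mean_confidence([0.41] * 12))
    # The opposite case: high confidence, but two characters are wrong.
    print(character_accuracy("lnvoice 2O41", truth), mean_confidence([0.95] * 12))
```

In the first case the accuracy is perfect while the confidence is low; in the second the confidence is high although two of twelve characters are wrong, so neither value can stand in for the other.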
To calculate the recognition accuracy score, BIQE OCR server uses its own complex algorithm. This algorithm specifically takes into account not only the confidence level indicator, but also other criteria.
BIQE OCR Server
- Unlimited Speed
- Unlimited MRC PDF compression
- Fully scalable according to available cores/threads
- Unique hotfolder processing