Tesseract for iOS
Introduction
Tesseract is an optical character recognition (OCR) engine for various operating systems. It is free software, released under the Apache License, and its development has been sponsored by Google since 2006. It is one of the most popular and most accurate OCR libraries available. Since version 4 it uses a neural-network (LSTM) engine to find and recognize text in images. It officially supports macOS, Windows, and Linux, but it can also be compiled for iOS and Android.
The source code repository is here:
https://github.com/tesseract-ocr/tesseract
Over 100 languages are supported; each language has its own trained model, distributed as a .traineddata file:
https://github.com/tesseract-ocr/tessdata
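Each model file is named after a three-letter language code (eng, jpn, vie, ...), and Tesseract resolves a model as <data path>/<language>.traineddata. The tiny helper below is our own illustration of that naming convention, not part of the Tesseract API:

```cpp
#include <string>

// Illustrative helper (not a Tesseract API): the file path Tesseract
// looks up when given a data directory and a language code such as
// "eng" or "jpn".
std::string trainedDataPath(std::string dataDir, const std::string& lang) {
    if (!dataDir.empty() && dataDir.back() != '/') dataDir += '/';
    return dataDir + lang + ".traineddata";
}
```

This is why the data folder you pass to Tesseract later in this tutorial must contain one .traineddata file per language you want to recognize.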
Now I would like to describe how to integrate Tesseract into an iOS app.
Development Environment
- macOS Catalina 10.15.2
- Xcode 11.3, Swift 5
- Tesseract 4.1.1
- Leptonica 1.79.0
- OpenCV 4.2.0
Download and include dependencies
First, create a new Xcode project using the Single View App template.
Download Tesseract 4.1.1 built for iOS
(This is the compiled version from https://github.com/tesseract-ocr/tesseract for iOS ONLY)
https://github.com/kang298/Tesseract-builds-for-iOS/tree/tesseract-4.1.1
After downloading and unzipping, you will have two folders, “include” and “lib”. Drag and drop both into your Xcode project.
Download the OpenCV iOS framework
https://opencv.org/releases/ then drag and drop it into the Xcode project.
Press Command + R to build and make sure there are no errors.
Download the languages’ trained model files
https://github.com/tesseract-ocr/tessdata
In this tutorial, we will test with three languages: English, Japanese, and Vietnamese. Download the following model files and save them in a folder named tessdata:
- eng.traineddata
- jpn.traineddata
- vie.traineddata
Then drag and drop that folder into the Xcode project. NOTE: choose “Create folder references” instead of “Create groups” when adding the folder, so that tessdata is copied into the app bundle as a real directory that Tesseract can read.
Coding
Because Tesseract is written in C++, the code that calls it directly must also be C++. Create a C++ file named tesseract_wrapper.cpp in the project as follows.
Remember to check “Also create a header file” so that Xcode creates a header file (tesseract_wrapper.hpp) for your C++ file.
tesseract_wrapper.hpp
//
//  tesseract_wrapper.hpp
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#ifndef tesseract_wrapper_hpp
#define tesseract_wrapper_hpp

#include "opencv2/imgproc.hpp"
#include "stdio.h"

using namespace cv;

String ocrUsingTesseractCPP(String image_path, String data_path, String language);

#endif /* tesseract_wrapper_hpp */
tesseract_wrapper.cpp
//
//  tesseract_wrapper.cpp
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#include "allheaders.h"
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include "baseapi.h"
#include "tesseract_wrapper.hpp"

using namespace cv;
using namespace tesseract;

/*
 matToPix(): convert an OpenCV image container to Leptonica's Pix struct
 Params:
    mat: OpenCV Mat image container (8-bit grayscale)
 Output:
    Leptonica's Pix struct
*/
Pix* matToPix(Mat *mat) {
    int image_depth = 8;
    // create a Leptonica Pix struct with the width and height of the OpenCV image
    Pix *pixd = pixCreate(mat->size().width, mat->size().height, image_depth);
    for (int y = 0; y < mat->rows; y++) {
        for (int x = 0; x < mat->cols; x++) {
            pixSetPixel(pixd, x, y, (l_uint32) mat->at<uchar>(y, x));
        }
    }
    return pixd;
}

/*
 ocrUsingTesseractCPP(): use the Tesseract engine to read text from an image
 Params:
    image_path: path to the image
    data_path: path to the folder containing the .traineddata files
    language: expected language to detect (eng, jpn, ...)
 Output:
    string recognized from the image
*/
String ocrUsingTesseractCPP(String image_path, String data_path, String language) {
    // load a Mat image container from the image's path in grayscale mode
    Mat image = imread(image_path, IMREAD_GRAYSCALE);
    TessBaseAPI *tessEngine = new TessBaseAPI();
    // Tesseract 4 adds a new neural-net (LSTM) based OCR engine focused on line
    // recognition, while still supporting the legacy Tesseract 3 engine, which
    // works by recognizing character patterns. In this tutorial we use LSTM only.
    OcrEngineMode mode = tesseract::OEM_LSTM_ONLY;
    // init the Tesseract engine
    tessEngine->Init(data_path.c_str(), language.c_str(), mode);
    // set the page layout analysis mode; all supported modes are listed at
    // https://tesseract.patagames.com/help/html/T_Patagames_Ocr_Enums_PageSegMode.htm
    PageSegMode pageSegMode = tesseract::PSM_SINGLE_BLOCK;
    tessEngine->SetPageSegMode(pageSegMode);
    // increase accuracy for Japanese
    if (language.compare("jpn") == 0) {
        tessEngine->SetVariable("chop_enable", "true");
        tessEngine->SetVariable("use_new_state_cost", "false");
        tessEngine->SetVariable("segment_segcost_rating", "false");
        tessEngine->SetVariable("enable_new_segsearch", "0");
        tessEngine->SetVariable("language_model_ngram_on", "0");
        tessEngine->SetVariable("textord_force_make_prop_words", "false");
        tessEngine->SetVariable("edges_max_children_per_outline", "40");
    }
    // convert the OpenCV image container to Leptonica's Pix struct
    Pix *pixImage = matToPix(&image);
    // hand the Pix struct to the Tesseract engine
    tessEngine->SetImage(pixImage);
    // get the recognized text in UTF-8 encoding
    char *text = tessEngine->GetUTF8Text();
    String result(text);
    // release the text buffer, the Tesseract engine, and the Pix image
    delete[] text;
    tessEngine->End();
    delete tessEngine;
    pixDestroy(&pixImage);
    return result;
}
Because Swift cannot call C++ functions directly, we will add an Objective-C wrapper to handle that. Create two files:
- TesseractWrapper.h
- TesseractWrapper.mm (.mm instead of .m, so the file is compiled as Objective-C++ and can include C++ headers)
Also import "TesseractWrapper.h" from the project's bridging header so that the wrapper class is visible to Swift (Xcode offers to create a bridging header when you add the first Objective-C file to a Swift project).
TesseractWrapper.h
//
//  TesseractWrapper.h
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#import "Foundation/Foundation.h"
#import "UIKit/UIKit.h"

@interface TesseractWrapper : NSObject

+ (NSString*)ocrUsingTesseractObjectiveC:(UIImage*)image language:(NSString*)language;

@end
TesseractWrapper.mm
//
//  TesseractWrapper.mm
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#import "TesseractWrapper.h"
#include "tesseract_wrapper.hpp"

@implementation TesseractWrapper

/*
 ocrUsingTesseractObjectiveC(): call ocrUsingTesseractCPP() to recognize text from an image
 Params:
    image: image to recognize text from
    language: eng/jpn/vie
 Output:
    recognized string
*/
+ (NSString*)ocrUsingTesseractObjectiveC:(UIImage*)image language:(NSString*)language {
    // get the path of the folder containing the .traineddata files
    NSString *data_path = [NSString stringWithFormat:@"%@/tessdata/", [[NSBundle mainBundle] bundlePath]];
    // save the image to the app's cache directory
    NSString *cache_dir = [NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) lastObject];
    NSString *image_path = [NSString stringWithFormat:@"%@/image.jpeg", cache_dir];
    NSData *data = UIImageJPEGRepresentation(image, 0.5);
    NSURL *url = [NSURL fileURLWithPath:image_path];
    [data writeToURL:url atomically:true];
    // get the text from the image using ocrUsingTesseractCPP() from tesseract_wrapper.hpp
    String str = ocrUsingTesseractCPP([image_path UTF8String], [data_path UTF8String], [language UTF8String]);
    NSString *result_string = [NSString stringWithCString:str.c_str() encoding:NSUTF8StringEncoding];
    // remove the cached image
    [[NSFileManager defaultManager] removeItemAtURL:url error:nil];
    return result_string;
}

@end
Create a simple screen with just a text view and a button in ViewController.swift. Note that the sample below also uses the third-party CropViewController library (available through CocoaPods) to let the user crop the captured photo before recognition.
ViewController.swift
//
//  ViewController.swift
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

import UIKit
import CropViewController

class ViewController: UIViewController {

    @IBOutlet weak var txt: UITextView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
    }

    @IBAction func ocr(_ sender: Any) {
        // if the camera is not supported
        if !UIImagePickerController.isSourceTypeAvailable(.camera) {
            return
        }
        // present the camera to take a picture
        let pickerController = UIImagePickerController()
        pickerController.delegate = self
        pickerController.sourceType = .camera
        self.present(pickerController, animated: true, completion: nil)
    }
}

extension ViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {
    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true) {
            guard let image = info[.originalImage] as? UIImage else { return }
            // present a crop frame so the user can focus on the text content
            let cropViewController = CropViewController(image: image)
            cropViewController.delegate = self
            self.present(cropViewController, animated: true, completion: nil)
        }
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true, completion: nil)
    }
}

extension ViewController: CropViewControllerDelegate {
    func cropViewController(_ cropViewController: CropViewController, didCropToImage image: UIImage, withRect cropRect: CGRect, angle: Int) {
        cropViewController.dismiss(animated: true) {
            // call the Objective-C wrapper with the expected language
            let str = TesseractWrapper.ocr(usingTesseract: image, language: "jpn")
            self.txt.text = str
        }
    }

    func cropViewController(_ cropViewController: CropViewController, didFinishCancelled cancelled: Bool) {
        cropViewController.dismiss(animated: true, completion: nil)
    }
}
Here is the test result with the Japanese language. You can test English and Vietnamese in the same way; Tesseract also accepts combined language strings such as "jpn+eng" as long as the corresponding .traineddata files are present.
Conclusion
Recognizing text in images is an achievable task, but there are some difficulties. The main problem is the quality (size, lighting, contrast) of the images, and every image has its own problems, so adding a filter tool that lets the user adjust the image manually is also an option. Refer to the link below for tips on improving image quality:
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
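As a taste of what such preprocessing involves, here is a self-contained sketch of Otsu's global thresholding, a classic binarization step often applied before OCR. OpenCV provides the same algorithm as cv::threshold with the THRESH_OTSU flag; the function name below is our own, written out in plain C++ only to show the idea:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Otsu's method: pick the threshold that maximizes the between-class
// variance of the background/foreground split of an 8-bit histogram.
int otsuThreshold(const std::vector<uint8_t>& pixels) {
    std::array<int, 256> hist{};
    for (uint8_t p : pixels) hist[p]++;

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; i++) sumAll += i * static_cast<double>(hist[i]);

    double sumB = 0.0, wB = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; t++) {
        wB += hist[t];                    // weight of the background class
        if (wB == 0) continue;
        double wF = total - wB;           // weight of the foreground class
        if (wF == 0) break;
        sumB += t * static_cast<double>(hist[t]);
        double mB = sumB / wB;            // background mean
        double mF = (sumAll - sumB) / wF; // foreground mean
        double between = wB * wF * (mB - mF) * (mB - mF);
        if (between > bestVar) { bestVar = between; bestT = t; }
    }
    return bestT;
}
```

Binarizing the cropped photo with such a threshold before handing it to Tesseract can noticeably help with uneven lighting and low contrast; in this project the natural place to apply it would be on the grayscale Mat inside ocrUsingTesseractCPP(), using the OpenCV built-in rather than a hand-rolled loop.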