Using Emgu.CV.OCR

Install Emgu CV:

Emgu CV – Browse /emgucv/4.1.1 at SourceForge.net


1. Download tesseract-ocr

http://digi.bib.uni-mannheim.de/tesseract/


2. Install


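After installation, set the environment variable:

TESSDATA_PREFIX = X:\Program Files\Tesseract-OCR\

Optionally, also add X:\Program Files\Tesseract-OCR to the Path environment variable. This is not needed for OCR from C#, but it makes the Tesseract command-line tools available, e.g. for training.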

Reference: Emgu.CV.OCR Unable to create ocr model using Path and language (jzdzhiyun's blog, CSDN)

Installing and using tesseract-ocr (褶皱的包子's blog, CSDN)
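A minimal sketch of how a program could locate the language packs from the variable set above. `ResolveTessdataDir` is a hypothetical helper, not an Emgu CV API. It assumes `TESSDATA_PREFIX` holds the install directory, as in the example value, and it returns the `tessdata` folder itself, which is what the usage code in section 4 passes to the `Tesseract` constructor. Some Tesseract versions expect the parent directory instead (see the `dataPath` documentation in section 5), so check this against your installed version.

```csharp
using System;
using System.IO;

// Hypothetical helper (not an Emgu CV API): locate the tessdata folder.
// prefix: value of TESSDATA_PREFIX, e.g. X:\Program Files\Tesseract-OCR\ (may be null).
// appDir: fallback directory, e.g. the program's run directory.
static string ResolveTessdataDir(string prefix, string appDir)
{
    string baseDir = string.IsNullOrWhiteSpace(prefix) ? appDir : prefix.Trim();
    // Take the last path segment, treating both '\' and '/' as separators on any OS.
    string trimmed = baseDir.TrimEnd('\\', '/');
    string lastName = trimmed.Substring(trimmed.LastIndexOfAny(new[] { '\\', '/' }) + 1);
    if (lastName.Equals("tessdata", StringComparison.OrdinalIgnoreCase))
    {
        return baseDir; // already points at the tessdata folder
    }
    return Path.Combine(baseDir, "tessdata");
}

string tessdataDir = ResolveTessdataDir(
    Environment.GetEnvironmentVariable("TESSDATA_PREFIX"),
    AppContext.BaseDirectory);
Console.WriteLine("tessdata folder: " + tessdataDir);
```

Falling back to the run directory matches the original code's comment that the tessdata folder can also live next to the executable.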

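3. Download additional language packs and put them in the tessdata folder. Alternatively, tick the language packs you need directly in the installer.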

https://github.com/tesseract-ocr/tessdata

GitHub – tesseract-ocr/tessdata_best: Best (most accurate) trained LSTM models.
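If a requested language pack is missing, creating the `Tesseract` engine fails (this is the "Unable to create ocr model" error in the reference above). As a minimal sketch, the hypothetical helper `MissingLanguagePacks` splits a language string such as `chi_sim+eng` on `+` and looks for `<lang>.traineddata` in a tessdata folder. Entries prefixed with `~` are skipped because they tell Tesseract not to load that language.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: list the .traineddata files that a language string
// such as "chi_sim+eng" needs but that are missing from tessdataDir.
// Entries prefixed with '~' mean "do not load", so they need no file.
static List<string> MissingLanguagePacks(string tessdataDir, string language)
{
    var missing = new List<string>();
    foreach (string part in language.Split('+', StringSplitOptions.RemoveEmptyEntries))
    {
        string code = part.Trim();
        if (code.Length == 0 || code.StartsWith("~"))
        {
            continue;
        }
        string fileName = code + ".traineddata";
        if (!File.Exists(Path.Combine(tessdataDir, fileName)))
        {
            missing.Add(fileName);
        }
    }
    return missing;
}

string tessdata = Path.Combine(AppContext.BaseDirectory, "tessdata");
foreach (string missingFile in MissingLanguagePacks(tessdata, "chi_sim+eng"))
{
    Console.WriteLine("Missing language pack: " + missingFile);
}
```

Running this check before constructing the engine lets the program name the missing file instead of showing only a generic exception.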

4. Usage

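using System;
using System.IO;
using System.Windows.Forms;
using Emgu.CV;
using Emgu.CV.OCR;
using Emgu.CV.Structure;

// The following members belong to the Form class.
private Tesseract _ocr; // Tesseract OCR engine

// Hypothetical method name; call it from, e.g., a button's Click handler.
// gray: the grayscale image to recognize, prepared earlier (not shown in the original).
private void RecognizeText(IInputArray gray)
{
    // Language-pack path: the tessdata folder in the program's run directory.
    string path = Path.Combine(Application.StartupPath, "tessdata");
    // Choose the language(s): checkBox1 = English, checkBox2 = Simplified Chinese.
    string language;
    if (checkBox1.Checked && checkBox2.Checked)
    {
        language = "chi_sim+eng";
    }
    else if (checkBox2.Checked)
    {
        language = "chi_sim";
    }
    else
    {
        // Only English, or nothing, selected: fall back to English.
        language = "eng";
        checkBox1.Checked = true;
    }
    try
    {
        // Language packs: https://github.com/tesseract-ocr/tessdata
        // To use the installed packs instead, pass e.g. @"E:\Program Files\Tesseract-OCR\tessdata".
        _ocr?.Dispose(); // release the engine from a previous run
        _ocr = new Tesseract(path, language, OcrEngineMode.Default);
        _ocr.PageSegMode = PageSegMode.SingleBlock;
        _ocr.SetImage(gray);
        int result = _ocr.Recognize(); // 0 means success
        if (result != 0)
        {
            MessageBox.Show("Recognition failed!");
            return;
        }
        Tesseract.Character[] characters = _ocr.GetCharacters(); // per-character results
        // Optional: outline each recognized character in blue
        // (needs using System.Drawing; `image` is the color source image, not shown).
        //Bgr drawColor = new Bgr(Color.Blue);
        //foreach (Tesseract.Character c in characters)
        //{
        //    image.Draw(c.Region, drawColor, 1);
        //}
        //imageBox1.Image = image; // show the image with the drawn boxes
        string text = _ocr.GetUTF8Text(); // recognized text
        richTextBox1.Text = text;
    }
    catch (Exception ex)
    {
        MessageBox.Show("Check that the language packs are in the run directory's tessdata folder.\n" + ex);
    }
}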
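The language choice is plain logic, so one option is to move it out of the form and test it without WinForms. Below is a minimal sketch with a hypothetical `SelectLanguage` helper that mirrors the `if`/`else` chain above. Here `english` stands for `checkBox1.Checked` and `chinese` for `checkBox2.Checked`. Ticking `checkBox1` on the English fallback would stay in the UI code.

```csharp
using System;

// Hypothetical helper mirroring the checkbox logic:
// english = checkBox1.Checked, chinese = checkBox2.Checked.
// If Chinese is not selected, the original code falls back to English.
static string SelectLanguage(bool english, bool chinese)
{
    if (english && chinese) return "chi_sim+eng";
    if (chinese) return "chi_sim";
    return "eng";
}

Console.WriteLine(SelectLanguage(english: true, chinese: true)); // chi_sim+eng
```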

5. Results

Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)

public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)

    Member of Emgu.CV.OCR.Tesseract

Summary:

Create a Tesseract OCR engine.

Parameters:

dataPath: The datapath must be the name of the parent directory of tessdata and must end in / . Any name after the last / will be stripped.

language: The language is (usually) an ISO 639-3 string or NULL will default to eng. It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier. The language may be a string of the form [~]<lang>[+[~]<lang>]* indicating that multiple languages are to be loaded. E.g. hin+eng will load Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. E.g. if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words.

mode: OCR engine mode

whiteList: This can be used to specify a white list for OCR. E.g. specify "1234567890" to recognize digits only. Note that the white list currently seems to only work with OcrEngineMode.OEM_TESSERACT_ONLY
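To make the `[~]<lang>[+[~]<lang>]*` syntax concrete, here is a minimal parser sketch. `ParseLanguageSpec` is hypothetical and is not part of Emgu CV or Tesseract, which parse the string internally. The sketch also treats an empty string like NULL and defaults it to `eng`. That is an assumption of this sketch, not documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser (not an Emgu CV or Tesseract API) for the documented
// language syntax [~]<lang>[+[~]<lang>]*, e.g. "hin+~eng".
// Returns each language code with a flag telling whether '~' excludes it.
static List<(string Code, bool Excluded)> ParseLanguageSpec(string spec)
{
    var result = new List<(string Code, bool Excluded)>();
    if (string.IsNullOrWhiteSpace(spec))
    {
        // NULL defaults to English per the docs; empty is treated the same here (assumption).
        result.Add(("eng", false));
        return result;
    }
    foreach (string raw in spec.Split('+', StringSplitOptions.RemoveEmptyEntries))
    {
        string item = raw.Trim();
        bool excluded = item.StartsWith("~");
        string code = excluded ? item.Substring(1) : item;
        if (code.Length > 0)
        {
            result.Add((code, excluded));
        }
    }
    return result;
}

foreach (var (code, excluded) in ParseLanguageSpec("hin+~eng"))
{
    Console.WriteLine(excluded ? $"{code}: do not load" : $"{code}: load");
}
// prints:
// hin: load
// eng: do not load
```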

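To restrict recognition to digits, you can also create the engine first and then set the whitelist through a Tesseract variable:

Tesseract tesseract = new Tesseract();
tesseract.Init(path, lang, OcrEngineMode.TesseractOnly); // path: language-pack path; lang: language (older Emgu CV: Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)
tesseract.SetVariable("tessedit_char_whitelist", "0123456789"); // recognize digits only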

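The whitelist note above says the whitelist may only work with the legacy engine. A different and cruder technique is to filter the returned text after recognition. The sketch below uses a hypothetical `FilterToWhitelist` helper. It is not Tesseract's whitelist and cannot influence what Tesseract recognizes: a `0` misread as the letter `O` is dropped, not corrected.

```csharp
using System;
using System.Linq;

// Hypothetical post-processing step (a swapped-in technique, not Tesseract's whitelist):
// keep only whitelisted characters in text that Tesseract has already returned.
// It cannot steer recognition; misread characters outside the list are simply removed.
static string FilterToWhitelist(string text, string whitelist)
{
    if (string.IsNullOrEmpty(text)) return string.Empty;
    return new string(text.Where(c => whitelist.IndexOf(c) >= 0).ToArray());
}

Console.WriteLine(FilterToWhitelist("Tel: 021-5555 1234", "0123456789")); // 02155551234
```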

References:

C# OpenCV6 - License plate recognition (_iorilan's blog, CSDN)

Tesseract.GetText, Emgu.CV.OCR C# (CSharp) Code Examples – HotExamples

OpenCVSharp beginner tutorial: introduction (小康师兄's blog, CSDN)

http://code.google.com/p/opencvsharp/w/list

OpenCVSharp learning (Zhihu, zhihu.com)

shimat/opencvsharp_samples (github.com)

Original: https://blog.csdn.net/cxyhjl/article/details/124162840
Author: 十年一梦实验室
Title: 使用Emgu.CV.OCR (Using Emgu.CV.OCR)

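Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/703915/

Reposted articles remain under the original author's copyright. When reposting, please credit the original author and source.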
