Using Emgu.CV.OCR

July 20, 2023, 12:44 AM • Artificial Intelligence • 62 views

First, install Emgu CV.

Emgu CV – Browse /emgucv/4.1.1 at SourceForge.net

1. Download tesseract-ocr

http://digi.bib.uni-mannheim.de/tesseract/

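2. Install

After installation, set the following environment variable:

TESSDATA_PREFIX X:\Program Files\Tesseract-OCR\

Optionally, add the Tesseract-OCR installation directory to the Path environment variable. This step can be skipped for recognition alone; with the directory on Path, the Tesseract tools can also be used for training.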
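To confirm that the variable is set the way your program expects, its value can be read and normalized from C#. The sketch below is not part of Emgu CV: the class `TessdataLocator` and its method are hypothetical names, and it assumes the variable may point either at the installation directory or directly at its tessdata subfolder, since that convention has differed between Tesseract versions.

```csharp
using System;

static class TessdataLocator
{
    // Hypothetical helper: turns a TESSDATA_PREFIX value into the tessdata directory.
    // Returns null when the value is missing or blank.
    public static string ResolveTessdataDir(string prefix)
    {
        if (string.IsNullOrWhiteSpace(prefix)) return null;

        // Drop trailing separators so "X:\Tesseract-OCR\" and "X:\Tesseract-OCR" behave the same.
        string trimmed = prefix.Trim().TrimEnd('\\', '/');
        if (trimmed.Length == 0) return null;

        // Take the last path segment, treating both '\' and '/' as separators.
        int cut = trimmed.LastIndexOfAny(new[] { '\\', '/' });
        string lastSegment = trimmed.Substring(cut + 1);

        // Assumption: the prefix may already point at the tessdata folder itself.
        if (lastSegment.Equals("tessdata", StringComparison.OrdinalIgnoreCase)) return trimmed;
        return trimmed + "\\tessdata";
    }
}
```

A typical call would be `TessdataLocator.ResolveTessdataDir(Environment.GetEnvironmentVariable("TESSDATA_PREFIX"))`; a null result means the variable is not visible to the process, which often just requires restarting the IDE after changing it.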

References: Emgu.CV.OCR Unable to create ocr model using Path and language (jzdzhiyun's blog, CSDN)

Installing and using tesseract-ocr (褶皱的包子's blog, CSDN)

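3. Download additional language packs and place them in the tessdata folder. Alternatively, select the language packs you need directly in the installer.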

https://github.com/tesseract-ocr/tessdata

GitHub – tesseract-ocr/tessdata_best: Best (most accurate) trained LSTM models.
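Each language pack is a file named `<lang>.traineddata` (for example `eng.traineddata` or `chi_sim.traineddata`). Before creating the OCR engine it can help to check that the files you need are actually in the tessdata folder. The following is a minimal sketch using only `System.IO`; the class name `LanguagePackCheck` and the method `FindMissing` are hypothetical helpers, not Emgu CV APIs.

```csharp
using System.Collections.Generic;
using System.IO;

static class LanguagePackCheck
{
    // Returns the names of the .traineddata files that are missing from tessdataDir.
    public static List<string> FindMissing(string tessdataDir, params string[] languages)
    {
        var missing = new List<string>();
        foreach (string lang in languages)
        {
            string fileName = lang + ".traineddata";
            if (!File.Exists(Path.Combine(tessdataDir, fileName)))
            {
                missing.Add(fileName);
            }
        }
        return missing;
    }
}
```

For example, `LanguagePackCheck.FindMissing(tessdataDir, "chi_sim", "eng")` returns an empty list when both packs are installed, so a non-empty result can be shown to the user before OCR is attempted.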

4. Usage

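using System;
using System.Windows.Forms;
using Emgu.CV;
using Emgu.CV.OCR;
using Emgu.CV.Structure;
private Tesseract _ocr; // Tesseract OCR engine (a field of the form class)

            // The code below runs inside a form method; `gray` is the grayscale image to recognize, prepared beforehand.
            string path = Application.StartupPath + "//"; // Alternative data path: the run directory, which must then contain the tessdata folder.
            string language; // Language(s) to recognize.
            // Choose the language: checkBox1 = English, checkBox2 = Simplified Chinese.
            if (checkBox1.Checked && checkBox2.Checked)
            {
                language = "chi_sim+eng";
            }
            else if (checkBox2.Checked)
            {
                language = "chi_sim";
            }
            else
            {
                language = "eng"; // Fall back to English when nothing is selected.
                checkBox1.Checked = true;
            }
            try
            {
                // Language packs: https://github.com/tesseract-ocr/tessdata
                // Alternative data path: Application.StartupPath + @"\tessdata"
                // Create the Tesseract engine with an explicit tessdata path.
                // If the path is empty, the tessdata folder must sit in the Debug output directory.
                _ocr = new Tesseract(@"E:\Program Files\Tesseract-OCR\tessdata", language, OcrEngineMode.Default);
                _ocr.PageSegMode = PageSegMode.SingleBlock; // Treat the image as a single uniform block of text.
                _ocr.SetImage(gray);
                int result = _ocr.Recognize(); // 0 means success.
                if (result != 0)
                {
                    MessageBox.Show("Recognition failed!");
                    return;
                }
                Tesseract.Character[] characters = _ocr.GetCharacters(); // Per-character results (text and region).
                //Bgr drawColor = new Bgr(Color.Blue); // Draw in blue.
                //foreach (Tesseract.Character c in characters) // Iterate over each recognized character.
                //{
                //    image.Draw(c.Region, drawColor, 1); // Outline the detected region.
                //}
                //imageBox1.Image = image; // Show the image with the outlined regions.
                string text = _ocr.GetUTF8Text(); // The recognized text.
                richTextBox1.Text = text; // Display the recognized text.
            }
            catch (Exception ex)
            {
                MessageBox.Show("Check that the language packs are present in the tessdata directory. " + ex.ToString());
            }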
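The checkbox branching above can be pulled out into a small pure function, which keeps the language choice separate from the UI and easy to check on its own. This is only a sketch of one way to structure it; `OcrLanguage.Build` is a hypothetical helper name, and the two flags stand for the English and Simplified Chinese checkboxes.

```csharp
static class OcrLanguage
{
    // Maps the two checkbox states to a Tesseract language string.
    // Same branches as the form code: English is the fallback when nothing is selected.
    public static string Build(bool english, bool chinese)
    {
        if (english && chinese) return "chi_sim+eng";
        if (chinese) return "chi_sim";
        return "eng";
    }
}
```

In the form, this would be called as `OcrLanguage.Build(checkBox1.Checked, checkBox2.Checked)`.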

5. Result

Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)

public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)

    Member of Emgu.CV.OCR.Tesseract

Summary:

Create a Tesseract OCR engine.

Parameters:

dataPath: The datapath must be the name of the parent directory of tessdata and must end in / . Any name after the last / will be stripped.

language: The language is (usually) an ISO 639-3 string or NULL will default to eng.  It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier.  The language may be a string of the form [~]<lang>[+[~]<lang>]* indicating that multiple languages are to be loaded. E.g. hin+eng will load Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. E.g. if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words.

mode: OCR engine mode

whiteList: This can be used to specify a white list for OCR. e.g. specify "1234567890" to recognize digits only. Note that the white list currently seems to only work with OcrEngineMode.OEM_TESSERACT_ONLY
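To make the language string format `[~]<lang>[+[~]<lang>]*` concrete, the sketch below splits such a string into the languages to load and the languages excluded with `~`. It only illustrates the format described in the parameter documentation; the actual parsing happens inside Tesseract, and `LanguageSpec.Parse` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

static class LanguageSpec
{
    // Splits a language string of the form [~]<lang>[+[~]<lang>]* into
    // languages to load and languages explicitly excluded with '~'.
    public static void Parse(string spec, List<string> load, List<string> exclude)
    {
        foreach (string part in spec.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string item = part.Trim();
            if (item.StartsWith("~", StringComparison.Ordinal))
            {
                // "~eng" means: do not load eng, even if another language asks for it.
                if (item.Length > 1) exclude.Add(item.Substring(1));
            }
            else if (item.Length > 0)
            {
                load.Add(item);
            }
        }
    }
}
```

With the documentation's own example, parsing `hin+~eng` yields `hin` in the load list and `eng` in the exclude list, while `hin+eng` puts both in the load list.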

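Tesseract tesseract = new Tesseract();

tesseract.Init(path, lang, Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY); // path: language pack directory; lang: language code. In newer Emgu CV releases this mode appears as OcrEngineMode.TesseractOnly.

tesseract.SetVariable("tessedit_char_whitelist", "0123456789"); // Restrict recognition to digits.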
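Because the whitelist reportedly only takes effect with the Tesseract-only engine mode, a simple fallback is to filter the recognized text after the fact. Note that this is a different technique from the engine-level whitelist: it is plain string post-processing, it cannot improve recognition itself, and `WhitelistFilter.Apply` is a hypothetical helper rather than anything provided by Emgu CV.

```csharp
using System.Text;

static class WhitelistFilter
{
    // Keeps only characters found in whitelist.
    // Line breaks are preserved so multi-line results stay readable.
    public static string Apply(string text, string whitelist)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == '\n' || whitelist.IndexOf(c) >= 0)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

It could be applied to the string returned by `GetUTF8Text()`, for example `WhitelistFilter.Apply(text, "0123456789")` to keep digits only.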


Original: https://blog.csdn.net/cxyhjl/article/details/124162840
Author: 十年一梦实验室
Title: 使用Emgu.CV.OCR (Using Emgu.CV.OCR)
