pytesseract
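# Environment setup
Offline image recognition in Python relies on the Tesseract OCR engine; the corresponding Python library is pytesseract. Tesseract itself must be installed before pytesseract can be used.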
Installation documentation | Build-from-source documentation
# Installing on Windows
# Installing on macOS
Build and install from source:
# Packages which are always needed.
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
# Packages required for training tools.
brew install pango
# Optional packages for extra features.
brew install libarchive
# Optional package for builds using g++.
brew install gcc
git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
mkdir build
cd build
# Optionally add CXX=g++-8 to the configure command if you really want to use a different compiler.
../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
make -j
# Optionally install Tesseract.
sudo make install
# Optionally build and install the training tools (only needed if you plan to train models).
make training
sudo make training-install
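Once the build is installed, it can help to confirm from Python that the `tesseract` binary is reachable before involving pytesseract. The following is a minimal sketch using only the standard library; the helper name `tesseract_version` is hypothetical, and it assumes the binary is on `PATH` under the name `tesseract`.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some releases write the version banner to stderr, so check both streams.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print(f"Found: {version}")
```

If this reports that the binary is missing, check that the install prefix (for example `/usr/local/bin`) is on `PATH`.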
After installation, set the environment variable:
export TESSDATA_PREFIX=/usr/local/share/tessdata
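Running the code at this point fails with the error: Error opening data file /usr/local/share/tessdata/eng.traineddata
The cause is that Tesseract tries to load the trained language data but cannot find it. The data can be downloaded from GitHub at https://github.com/tesseract-ocr/tessdata
Alternatively, download it directly with the command below, then move eng.traineddata into the TESSDATA_PREFIX directory: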
wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
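To check whether the language data is where Tesseract will look for it, a small standard-library sketch like the one below can be run before calling pytesseract. The function name `find_traineddata` is a hypothetical helper; it assumes the data file sits directly inside the directory named by `TESSDATA_PREFIX`, as in the error message above.

```python
import os
from pathlib import Path
from typing import Optional


def find_traineddata(lang: str = "eng", prefix: Optional[str] = None) -> Optional[Path]:
    """Return the path to <lang>.traineddata in the tessdata directory, or None."""
    # Fall back to the TESSDATA_PREFIX environment variable when no prefix is given.
    prefix = prefix or os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return None
    candidate = Path(prefix) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = find_traineddata("eng")
    if path is None:
        print("eng.traineddata not found; download it into the TESSDATA_PREFIX directory")
    else:
        print(f"Found language data: {path}")
```

The same check works for other languages by passing a different language code, provided the matching file has been downloaded.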
# Code example
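import pytesseract
from PIL import Image

# Open the image to recognize.
img = Image.open('./num.jpg')
# Print the image object's summary (mode and size) as a sanity check.
print(img)
# Run OCR and print the recognized text.
text = pytesseract.image_to_string(img)
print(text)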
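If the image contains only digits, as the file name num.jpg suggests (an assumption, since the image itself is not shown), recognition can often be improved by passing Tesseract options through the `config` argument of `image_to_string`. The sketch below is one possible approach: `build_config` and `read_digits` are hypothetical helpers, `--psm 7` treats the image as a single text line, and `tessedit_char_whitelist` restricts the output characters (its effect depends on the Tesseract version and engine in use).

```python
def build_config(psm: int = 7, whitelist: str = "0123456789") -> str:
    """Build a Tesseract config string (hypothetical helper, not part of pytesseract)."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_digits(image_path: str) -> str:
    """Run OCR on one image, restricted to digits, and return the stripped text."""
    # Imported lazily so build_config can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, lang="eng", config=build_config())
    return text.strip()


if __name__ == "__main__":
    print(build_config())
```

Calling `read_digits('./num.jpg')` would then return the recognized text; if the image spans several lines, a different page segmentation mode such as `--psm 6` may fit better.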
Last updated: 2024/03/02, 18:30:15