Using tess4j for OCR in Spring Boot (developing on macOS)

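Problem

I recently needed to implement OCR, which meant adding the tess4j library to a Spring Boot project. I was developing on an M1 Mac, and the application could not find its native dynamic libraries. Specifically, the following two were missing: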

  • darwin-aarch64/libtesseract.dylib
  • darwin-aarch64/libleptonica.dylib
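
The folder name darwin-aarch64 comes from the operating system and CPU architecture the JVM reports. The sketch below is a simplified approximation of that naming scheme, not JNA's actual code (JNA exposes the real value as com.sun.jna.Platform.RESOURCE_PREFIX). The class and method names are hypothetical, and only the common desktop platforms are covered.

```java
import java.util.Locale;

// Hypothetical helper: approximates the "<os>-<arch>" folder name used for bundled native libraries.
final class NativeFolderName {

    static String resourcePrefix(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        // Normalize the architecture names that differ between JVMs
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            arch = "x86-64";
        } else if (arch.equals("arm64")) {
            arch = "aarch64";
        }

        if (os.startsWith("mac") || os.startsWith("darwin")) {
            return "darwin-" + arch;
        }
        if (os.startsWith("windows")) {
            return "win32-" + arch;
        }
        if (os.startsWith("linux")) {
            return "linux-" + arch;
        }
        // Fallback for platforms this sketch does not model
        return os.replace(' ', '-') + "-" + arch;
    }

    // Folder name for the JVM this code is running on
    static String current() {
        return resourcePrefix(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

Printing NativeFolderName.current() on the development machine is a quick way to confirm which folder the libraries are expected in before touching the jar.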

Approach

Install three packages on macOS: tesseract, tesseract-lang, and leptonica. Once they are installed locally, manually add the two missing dynamic library files to the tess4j jar.

Installing tesseract

# The main tesseract program
brew install tesseract
# Tesseract language packs; by default this installs all languages
brew install tesseract-lang
# Test manually with an image that contains Chinese text
tesseract 0.png output -l chi_sim --psm 3 --oem 3

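The tesseract options used above need some explanation:

  • -l chi_sim: the recognition language, here Simplified Chinese;
  • --psm 3: the page segmentation mode (PSM). 3 is fully automatic segmentation, which is also the default. The full list: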
Page segmentation modes (PSM):
  0|osd_only                Orientation and script detection (OSD) only.
  1|auto_osd                Automatic page segmentation with OSD.
  2|auto_only               Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3|auto                    Fully automatic page segmentation, but no OSD. (Default)
  4|single_column           Assume a single column of text of variable sizes.
  5|single_block_vert_text  Assume a single uniform block of vertically aligned text.
  6|single_block            Assume a single uniform block of text.
  7|single_line             Treat the image as a single text line.
  8|single_word             Treat the image as a single word.
  9|circle_word             Treat the image as a single word in a circle.
 10|single_char             Treat the image as a single character.
 11|sparse_text             Sparse text. Find as much text as possible in no particular order.
 12|sparse_text_osd         Sparse text with OSD.
 13|raw_line                Raw line. Treat the image as a single text line,
                            bypassing hacks that are Tesseract-specific.
  • --oem 3: the OCR engine mode (OEM). 3 is the default, which picks an engine based on what is available. The full list:
OCR Engine modes (OEM):
  0|tesseract_only          Legacy engine only.
  1|lstm_only               Neural nets LSTM engine only.
  2|tesseract_lstm_combined Legacy + LSTM engines.
  3|default                 Default, based on what is available.
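
To run the same manual check from Java, the command line can be assembled with the value ranges from the two tables above enforced up front. This is only a sketch: the class name TesseractCommand is hypothetical, and it assumes the tesseract binary installed by Homebrew is on the PATH.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the argument list for the tesseract CLI call shown above.
final class TesseractCommand {

    static List<String> build(String image, String outputBase, String lang, int psm, int oem) {
        // PSM values run from 0 to 13, OEM values from 0 to 3 (see the tables above)
        if (psm < 0 || psm > 13) {
            throw new IllegalArgumentException("psm must be between 0 and 13, got " + psm);
        }
        if (oem < 0 || oem > 3) {
            throw new IllegalArgumentException("oem must be between 0 and 3, got " + oem);
        }
        return Arrays.asList(
                "tesseract", image, outputBase,
                "-l", lang,
                "--psm", String.valueOf(psm),
                "--oem", String.valueOf(oem));
    }
}
```

The list can be passed to new ProcessBuilder(TesseractCommand.build("0.png", "output", "chi_sim", 3, 3)).inheritIO().start() to launch the process.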

If the manual tesseract run above produces OCR output, the tesseract installation is fine and you can move on to the next step.

Installing leptonica

brew install leptonica

This is installed mainly to obtain one of the dynamic libraries needed later.

Using tess4j in Spring Boot

pom.xml

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.16.0</version>
</dependency>

TesseractConfig.java

package com.xxxx.config;

import net.sourceforge.tess4j.Tesseract;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

import java.io.File;
import java.io.IOException;

@Configuration
public class TesseractConfig {

    @Value("classpath:tessdata/")
    private Resource tessdataResource;

    @Bean
    Tesseract tesseract() throws IOException {
        Tesseract tesseract = new Tesseract();
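        // Note: getFile() only works while tessdata/ is a real directory on disk (e.g. when running from the IDE), not from inside a packaged jar
        File tessdataDir = tessdataResource.getFile();
        tesseract.setDatapath(tessdataDir.getPath()); // traineddata files for this example: https://github.com/tesseract-ocr/tessdata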
        tesseract.setLanguage("chi_sim");
        tesseract.setOcrEngineMode(3);
        tesseract.setPageSegMode(3);
        return tesseract;
    }

}

If you do not set the tessdata path in code, you need to set the TESSDATA_PREFIX environment variable instead.
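
The choice between the two can be made explicit in a small lookup: prefer a path configured in code, otherwise fall back to TESSDATA_PREFIX. The following is a sketch under that assumption. TessdataLocator is a hypothetical name, and the environment is passed in as a map so the logic can be exercised without changing real environment variables.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: decides which tessdata location to use.
final class TessdataLocator {

    static Optional<String> resolve(String configuredPath, Map<String, String> env) {
        // 1. A path set explicitly in code or configuration wins
        if (configuredPath != null && !configuredPath.trim().isEmpty()) {
            return Optional.of(configuredPath);
        }
        // 2. Otherwise fall back to the TESSDATA_PREFIX environment variable
        String fromEnv = env.get("TESSDATA_PREFIX");
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Optional.of(fromEnv);
        }
        // 3. Nothing configured: the caller should fail fast with a clear message
        return Optional.empty();
    }
}
```

In the configuration class this could be called as TessdataLocator.resolve(null, System.getenv()), with a startup error raised when the result is empty.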

ITesseractOCRService.java

package com.xxxx.service;

import org.springframework.web.multipart.MultipartFile;

/**
 * OCR recognition service
 */
public interface ITesseractOCRService {
    /**
     * Runs OCR on an uploaded file
     * @param multipartFile the uploaded file
     * @return the recognized text
     */
    String recognizeText(MultipartFile multipartFile);
}

TesseractOCRServiceImpl.java

package com.xxxx.service.impl;

import com.xxxx.exception.ServiceException;
import com.xxxx.service.ITesseractOCRService;
import lombok.extern.slf4j.Slf4j;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import javax.annotation.Resource;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;

@Service
@Slf4j
public class TesseractOCRServiceImpl implements ITesseractOCRService {
    @Resource
    private Tesseract tesseract;

    @Override
    public String recognizeText(MultipartFile multipartFile) {
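        try {
            BufferedImage image = ImageIO.read(multipartFile.getInputStream());
            if (image == null) {
                // ImageIO.read returns null when no registered reader can decode the upload
                throw new ServiceException("Unsupported or unreadable image file");
            }
            return tesseract.doOCR(image);
        } catch (TesseractException e) {
            log.error("OCR recognition failed", e);
            throw new ServiceException("OCR recognition failed");
        } catch (IOException e) {
            log.error("Failed to read the file for OCR", e);
            throw new ServiceException("Failed to read the file for OCR");
        }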
    }
}

OcrController.java

package com.xxxx.web.controller.system.api;

import com.xxxx.service.ITesseractOCRService;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import javax.annotation.Resource;

@RestController
@RequestMapping("/api/ocr")
public class OcrController {


    @Resource
    private ITesseractOCRService iTesseractOCRService;

    @PostMapping("/test")
    public String recognizeText(@RequestParam("file") MultipartFile file) {
        return iTesseractOCRService.recognizeText(file);
    }
}

Patching the tess4j jar

# Go to the dependency jar's directory in the local Maven repository
cd /Users/xxx/.m2/repository/net/sourceforge/tess4j/tess4j/5.16.0

# Create a folder named after the OS and architecture
mkdir darwin-aarch64
# Add the folder to the jar
jar uf tess4j-5.16.0.jar darwin-aarch64

# Copy libtesseract.dylib (adjust the version in the path to match your Homebrew install)
cp /opt/homebrew/Cellar/tesseract/5.5.1/lib/libtesseract.5.dylib darwin-aarch64/libtesseract.dylib
# Add libtesseract.dylib to the jar
jar uf tess4j-5.16.0.jar darwin-aarch64/libtesseract.dylib

# Copy libleptonica.dylib (adjust the version here as well)
cp /opt/homebrew/Cellar/leptonica/1.86.0/lib/libleptonica.6.dylib darwin-aarch64/libleptonica.dylib
# Add libleptonica.dylib to the jar
jar uf tess4j-5.16.0.jar darwin-aarch64/libleptonica.dylib

# List the jar contents to verify
jar tf tess4j-5.16.0.jar

The resulting jar contents:

META-INF/
META-INF/MANIFEST.MF
com/
com/recognition/
com/recognition/software/
com/recognition/software/jdeskew/
net/
net/sourceforge/
net/sourceforge/tess4j/
net/sourceforge/tess4j/util/
tessdata/
tessdata/configs/
win32-x86/
win32-x86-64/
META-INF/maven/
META-INF/maven/net.sourceforge.tess4j/
META-INF/maven/net.sourceforge.tess4j/tess4j/
com/recognition/software/jdeskew/ImageDeskew$HoughLine.class
com/recognition/software/jdeskew/ImageDeskew.class
com/recognition/software/jdeskew/ImageUtil.class
net/sourceforge/tess4j/ITessAPI$CANCEL_FUNC.class
net/sourceforge/tess4j/ITessAPI$EANYCODE_CHAR.class
net/sourceforge/tess4j/ITessAPI$ETEXT_DESC.class
net/sourceforge/tess4j/ITessAPI$TessBaseAPI.class
net/sourceforge/tess4j/ITessAPI$TessCancelFunc.class
net/sourceforge/tess4j/ITessAPI$TessChoiceIterator.class
net/sourceforge/tess4j/ITessAPI$TessMutableIterator.class
net/sourceforge/tess4j/ITessAPI$TessOcrEngineMode.class
net/sourceforge/tess4j/ITessAPI$TessOrientation.class
net/sourceforge/tess4j/ITessAPI$TessPageIterator.class
net/sourceforge/tess4j/ITessAPI$TessPageIteratorLevel.class
net/sourceforge/tess4j/ITessAPI$TessPageSegMode.class
net/sourceforge/tess4j/ITessAPI$TessParagraphJustification.class
net/sourceforge/tess4j/ITessAPI$TessPolyBlockType.class
net/sourceforge/tess4j/ITessAPI$TessProgressFunc.class
net/sourceforge/tess4j/ITessAPI$TessResultIterator.class
net/sourceforge/tess4j/ITessAPI$TessResultRenderer.class
net/sourceforge/tess4j/ITessAPI$TessTextlineOrder.class
net/sourceforge/tess4j/ITessAPI$TessWritingDirection.class
net/sourceforge/tess4j/ITessAPI$TimeVal.class
net/sourceforge/tess4j/ITessAPI.class
net/sourceforge/tess4j/ITesseract$RenderedFormat.class
net/sourceforge/tess4j/ITesseract.class
net/sourceforge/tess4j/OCRResult.class
net/sourceforge/tess4j/OSDResult.class
net/sourceforge/tess4j/TessAPI.class
net/sourceforge/tess4j/TessAPI1.class
net/sourceforge/tess4j/Tesseract$1.class
net/sourceforge/tess4j/Tesseract.class
net/sourceforge/tess4j/Tesseract1$1.class
net/sourceforge/tess4j/Tesseract1.class
net/sourceforge/tess4j/TesseractException.class
net/sourceforge/tess4j/util/Hocr2PdfParser$1.class
net/sourceforge/tess4j/util/Hocr2PdfParser.class
net/sourceforge/tess4j/util/ImageHelper.class
net/sourceforge/tess4j/util/ImageIOHelper.class
net/sourceforge/tess4j/util/LoadLibs.class
net/sourceforge/tess4j/util/LoggHelper.class
net/sourceforge/tess4j/util/PdfBoxUtilities$1.class
net/sourceforge/tess4j/util/PdfBoxUtilities$2.class
net/sourceforge/tess4j/util/PdfBoxUtilities.class
net/sourceforge/tess4j/util/PdfUtilities.class
net/sourceforge/tess4j/util/Utils.class
net/sourceforge/tess4j/Word.class
readme.html
tessdata/configs/alto
tessdata/configs/api_config
tessdata/configs/bazaar
tessdata/configs/digits
tessdata/configs/hocr
tessdata/configs/lstmbox
tessdata/configs/pdf
tessdata/configs/quiet
tessdata/configs/tsv
tessdata/configs/txt
tessdata/configs/unlv
tessdata/configs/wordstrbox
tessdata/eng.traineddata
tessdata/osd.traineddata
tessdata/pdf.ttf
versionchanges.txt
win32-x86/libtesseract551.dll
win32-x86-64/libtesseract551.dll
META-INF/maven/net.sourceforge.tess4j/tess4j/pom.xml
META-INF/maven/net.sourceforge.tess4j/tess4j/pom.properties
META-INF/INDEX.LIST
darwin-aarch64/
darwin-aarch64/libtesseract.dylib
darwin-aarch64/libleptonica.dylib
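
Instead of reading the listing by eye, the two added entries can be checked programmatically. This is a minimal sketch using only the JDK's zip support. JarEntryCheck is a hypothetical name, and the jar path is whatever location your local Maven repository uses. Keep in mind that the patch lives only in the local repository, so any other machine or build server needs the same steps.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

// Hypothetical helper: reports which of the required entries are absent from a jar.
final class JarEntryCheck {

    static List<String> missingEntries(Path jar, List<String> required) throws IOException {
        List<String> missing = new ArrayList<>();
        // A jar is a zip archive, so ZipFile can look entries up by name
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            for (String name : required) {
                if (zip.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }
}
```

Calling it with the patched jar and the names darwin-aarch64/libtesseract.dylib and darwin-aarch64/libleptonica.dylib should return an empty list once both files are in place.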

Testing the OCR result
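
One way to exercise the /api/ocr/test endpoint is a small client that posts an image as multipart/form-data with the field name file, matching the controller's @RequestParam. The sketch below assumes Java 11+ (for java.net.http), and the class name OcrEndpointClient is hypothetical. The endpoint URI is passed in by the caller because the host and port depend on your setup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test client for the OCR endpoint.
final class OcrEndpointClient {

    // Builds a multipart/form-data body containing a single file part
    static byte[] multipartBody(String boundary, String fieldName, String fileName, byte[] content)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(content);
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Posts the image and returns the response body (the recognized text)
    static String recognize(URI endpoint, Path image) throws IOException, InterruptedException {
        String boundary = "ocr-boundary-" + System.nanoTime();
        byte[] body = multipartBody(boundary, "file",
                image.getFileName().toString(), Files.readAllBytes(image));
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With the application running locally, a call might look like OcrEndpointClient.recognize(URI.create("http://localhost:8080/api/ocr/test"), Path.of("0.png")), where port 8080 is only the Spring Boot default and an assumption here.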

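Addendum

I ran into a case where white text on a transparent livestream background could not be recognized. It needs the following handling:

BufferedImage originalImage = ImageIO.read(multipartFile.getInputStream());
String result = tesseract.doOCR(originalImage);

// If nothing was recognized, preprocess the image and try again
if (!StringUtils.hasText(result)) {
    // 1. Invert colors (white text becomes black text)
    BufferedImage inverted = invert(originalImage);
    // 2. Denoise (a light blur that removes texture)
    BufferedImage denoised = gaussianBlur(inverted);

    // 3. Adaptive threshold (keeps thin strokes)
    BufferedImage binary = adaptiveThreshold(denoised, 15, 0.5);
    result = tesseract.doOCR(binary);
}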
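
If the uploaded image really carries an alpha channel, what the fully transparent pixels look like after decoding depends on the RGB values stored under them, which the steps above do not control. One possible extra step, which is not part of the original solution, is to composite the image onto a solid background first. The sketch below assumes a PNG with alpha, and AlphaFlatten is a hypothetical name.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper: composites an image with transparency onto a solid background.
final class AlphaFlatten {

    static BufferedImage flatten(BufferedImage src, Color background) {
        // TYPE_INT_RGB has no alpha channel, so the result is fully opaque
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Paint the background, then draw the source over it (transparent pixels leave the background visible)
            g.setColor(background);
            g.fillRect(0, 0, out.getWidth(), out.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

Flattening onto Color.BLACK turns white text on a transparent background into white text on black, which the invert step then converts into black text on white.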

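Image preprocessing

// ------------------ Image processing ---------------------
// Requires: import java.awt.Graphics2D; import java.awt.image.BufferedImage;

// Invert colors (the most important step!)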
public static BufferedImage invert(BufferedImage src) {
    int w = src.getWidth();
    int h = src.getHeight();
    BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int rgb = src.getRGB(x, y);

            int r = 255 - ((rgb >> 16) & 0xFF);
            int g = 255 - ((rgb >> 8) & 0xFF);
            int b = 255 - (rgb & 0xFF);

            out.setRGB(x, y, (r << 16) | (g << 8) | b);
        }
    }
    return out;
}

// Gaussian blur (needed for denoising)
public static BufferedImage gaussianBlur(BufferedImage src) {
    float[] kernel = {
            1 / 16f, 2 / 16f, 1 / 16f,
            2 / 16f, 4 / 16f, 2 / 16f,
            1 / 16f, 2 / 16f, 1 / 16f
    };
    return convolve(src, kernel, 3);
}

// Convolution
public static BufferedImage convolve(BufferedImage src, float[] kernel, int size) {
    int w = src.getWidth();
    int h = src.getHeight();
    BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);

    int half = size / 2;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {

            float rSum = 0, gSum = 0, bSum = 0;

            for (int ky = -half; ky <= half; ky++) {
                for (int kx = -half; kx <= half; kx++) {
                    int px = clamp(x + kx, 0, w - 1);
                    int py = clamp(y + ky, 0, h - 1);

                    int rgb = src.getRGB(px, py);
                    int r = (rgb >> 16) & 0xFF;
                    int g = (rgb >> 8) & 0xFF;
                    int b = (rgb & 0xFF);

                    float k = kernel[(ky + half) * size + (kx + half)];
                    rSum += r * k;
                    gSum += g * k;
                    bSum += b * k;
                }
            }

            int nr = clamp(Math.round(rSum), 0, 255);
            int ng = clamp(Math.round(gSum), 0, 255);
            int nb = clamp(Math.round(bSum), 0, 255);

            out.setRGB(x, y, (nr << 16) | (ng << 8) | nb);
        }
    }
    return out;
}

// Adaptive threshold (simplified Sauvola)
public static BufferedImage adaptiveThreshold(BufferedImage src, int window, double k) {
    int w = src.getWidth();
    int h = src.getHeight();

    // Convert to grayscale
    BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
    Graphics2D g = gray.createGraphics();
    g.drawImage(src, 0, 0, null);
    g.dispose();

    BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);

    int half = window / 2;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {

            // Local mean and variance
            int sum = 0, sumSq = 0, count = 0;

            for (int wy = -half; wy <= half; wy++) {
                for (int wx = -half; wx <= half; wx++) {
                    int px = clamp(x + wx, 0, w - 1);
                    int py = clamp(y + wy, 0, h - 1);
                    int grayVal = gray.getRaster().getSample(px, py, 0);

                    sum += grayVal;
                    sumSq += grayVal * grayVal;
                    count++;
                }
            }

            double mean = sum / (double) count;
            double variance = (sumSq / (double) count) - (mean * mean);
            double std = Math.sqrt(Math.max(0, variance));

            // Sauvola threshold
            double threshold = mean * (1 + k * ((std / 128) - 1));

            int pixel = gray.getRaster().getSample(x, y, 0);
            int bw = (pixel < threshold) ? 0 : 255;

            out.getRaster().setSample(x, y, 0, bw);
        }
    }
    return out;
}

// Helper function
public static int clamp(int v, int min, int max) {
    return Math.max(min, Math.min(max, v));
}
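
To get a feel for the threshold used in adaptiveThreshold, the formula mean * (1 + k * (std / 128 - 1)) can be evaluated in isolation. This is a small worked sketch: SauvolaExample is a hypothetical name, and the sample values are illustrative rather than measured.

```java
// Hypothetical helper: the per-pixel Sauvola threshold from adaptiveThreshold, isolated for experimentation.
final class SauvolaExample {

    // mean and std are the local window statistics, k the sensitivity, r the dynamic range of std (128 above)
    static double threshold(double mean, double std, double k, double r) {
        return mean * (1 + k * (std / r - 1));
    }
}
```

With k = 0.5 and r = 128, a flat region (mean 200, std 0) gets a threshold of 100, so uniform bright background stays white. A moderately textured region (std 64) gets 150, and at std 128 the threshold reaches the local mean of 200. Higher local contrast therefore pushes the threshold up toward the mean.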
