Programing

테서 랙트 OCR 정확도 향상을위한 이미지 처리

lottogame 2020. 7. 6. 07:47
반응형

테서 랙트 OCR 정확도 향상을위한 이미지 처리


tesseract를 사용하여 문서를 텍스트로 변환했습니다. 문서의 품질은 매우 다양하며 어떤 종류의 이미지 처리가 결과를 향상시킬 수 있는지에 대한 팁을 찾고 있습니다. 팩스 장치에서 생성 된 것과 같이 픽셀 화가 높은 텍스트는 특히 테서 랙트가 처리하기 어려운 것으로 나타났습니다. 아마도 문자의 모든 들쭉날쭉 한 가장자리가 모양 인식 알고리즘을 혼란스럽게 만듭니다.

어떤 종류의 이미지 처리 기술로 정확도가 향상됩니까? 가우시안 블러를 사용하여 픽셀 화 된 이미지를 부드럽게하고 약간의 개선이 있었지만 더 나은 결과를 얻을 수있는 더 구체적인 기술이 있기를 바랍니다. 흑백 이미지로 조정 된 필터를 사용하여 불규칙한 가장자리를 매끄럽게 한 다음 대비를 높여서 문자를 더 뚜렷하게 만듭니다.

이미지 처리에 초보자 인 사람을위한 일반적인 팁이 있습니까?


  1. DPI 수정 (필요한 경우) 300 DPI 이상
  2. 텍스트 크기 수정 (예 : 12pt는 괜찮아 야 함)
  3. 텍스트 줄을 수정하려고합니다 (텍스트를 줄이거 나 줄임)
  4. 이미지의 조명을 수정하려고합니다 (예 : 이미지의 어두운 부분 없음)
  5. 이진화 및 노이즈 제거 이미지

모든 경우에 맞는 범용 명령 줄이 없습니다 (때로는 이미지를 흐리게하고 선명하게해야 함). 그러나 Fred의 ImageMagick Scripts에서 TEXTCLEANER를 사용해 볼 수 있습니다 .

커맨드 라인을 좋아하지 않는다면 opensource scantailor.sourceforge.net 또는 상업용 bookrestorer를 사용해보십시오 .


나는 결코 OCR 전문가가 아닙니다. 그러나 이번 주에는 jpg에서 텍스트를 변환해야했습니다.

색상이 지정된 RGB 445x747 픽셀 jpg로 시작했습니다. 나는 즉시 이것에 대해 tesseract를 시도했으며 프로그램은 거의 아무것도 변환하지 않았습니다. 그런 다음 김프에 들어가서 다음을 수행했습니다. 이미지> 모드> 회색조 이미지> 스케일 이미지> 1191x2000 픽셀 필터> 향상> 반경 값 = 6.8, 양 = 2.69, 임계 값 = 0 인 언샵 마스크 100 % 품질로 새 jpg로 저장했습니다.

그런 다음 Tesseract는 모든 텍스트를 .txt 파일로 추출 할 수있었습니다

김프는 당신의 친구입니다.


이미지의 가독성을 높이기위한 세 가지 점 : 1) 가변 높이와 너비로 이미지 크기를 조정합니다 (이미지 높이와 너비에 0.5와 1, 2를 곱함). 2) 이미지를 그레이 스케일 형식 (흑백)으로 변환합니다. 3) 노이즈 픽셀을 제거하고 더 선명하게 만듭니다 (이미지 필터링).

아래 코드를 참조하십시오 :

//Resize
  public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
        {

                Bitmap temp = (Bitmap)bmp;

                Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);

                double nWidthFactor = (double)temp.Width / (double)newWidth;
                double nHeightFactor = (double)temp.Height / (double)newHeight;

                double fx, fy, nx, ny;
                int cx, cy, fr_x, fr_y;
                Color color1 = new Color();
                Color color2 = new Color();
                Color color3 = new Color();
                Color color4 = new Color();
                byte nRed, nGreen, nBlue;

                byte bp1, bp2;

                for (int x = 0; x < bmap.Width; ++x)
                {
                    for (int y = 0; y < bmap.Height; ++y)
                    {

                        fr_x = (int)Math.Floor(x * nWidthFactor);
                        fr_y = (int)Math.Floor(y * nHeightFactor);
                        cx = fr_x + 1;
                        if (cx >= temp.Width) cx = fr_x;
                        cy = fr_y + 1;
                        if (cy >= temp.Height) cy = fr_y;
                        fx = x * nWidthFactor - fr_x;
                        fy = y * nHeightFactor - fr_y;
                        nx = 1.0 - fx;
                        ny = 1.0 - fy;

                        color1 = temp.GetPixel(fr_x, fr_y);
                        color2 = temp.GetPixel(cx, fr_y);
                        color3 = temp.GetPixel(fr_x, cy);
                        color4 = temp.GetPixel(cx, cy);

                        // Blue
                        bp1 = (byte)(nx * color1.B + fx * color2.B);

                        bp2 = (byte)(nx * color3.B + fx * color4.B);

                        nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

                        // Green
                        bp1 = (byte)(nx * color1.G + fx * color2.G);

                        bp2 = (byte)(nx * color3.G + fx * color4.G);

                        nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

                        // Red
                        bp1 = (byte)(nx * color1.R + fx * color2.R);

                        bp2 = (byte)(nx * color3.R + fx * color4.R);

                        nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

                        bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
                (255, nRed, nGreen, nBlue));
                    }
                }



                bmap = SetGrayscale(bmap);
                bmap = RemoveNoise(bmap);

                return bmap;

        }


//SetGrayscale
  public Bitmap SetGrayscale(Bitmap img)
        {

            Bitmap temp = (Bitmap)img;
            Bitmap bmap = (Bitmap)temp.Clone();
            Color c;
            for (int i = 0; i < bmap.Width; i++)
            {
                for (int j = 0; j < bmap.Height; j++)
                {
                    c = bmap.GetPixel(i, j);
                    byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);

                    bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
                }
            }
            return (Bitmap)bmap.Clone();

        }
//RemoveNoise
   public Bitmap RemoveNoise(Bitmap bmap)
        {

            for (var x = 0; x < bmap.Width; x++)
            {
                for (var y = 0; y < bmap.Height; y++)
                {
                    var pixel = bmap.GetPixel(x, y);
                    if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
                        bmap.SetPixel(x, y, Color.Black);
                    else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
                        bmap.SetPixel(x, y, Color.White);
                }
            }

            return bmap;
        }

이미지 입력
이미지 입력

출력 이미지 출력 이미지


이것은 다소 오래되었지만 여전히 유용 할 수 있습니다.

My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.

Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.


What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project. http://sourceforge.net/projects/capture2text/files/Capture2Text/.

BTW: Kudos to it's author for sharing such a painstaking algorithm.

Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.

If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.

P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.


As a rule of thumb, I usually apply the following image pre-processing techniques using OpenCV library:

  1. Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi):

    img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
    
  2. Converting image to grayscale:

    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
  3. Applying dilation and erosion to remove the noise (you may play with the kernel size depending on your data set):

    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    
  4. Applying blur, which can be done by using one of the following lines (each of which has its pros and cons, however, median blur and bilateral filter usually perform better than gaussian blur.):

    cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    
    cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    
    cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    
    cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
    
    cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
    
    cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
    

I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.

In case you'd like to check them out, here I'm sharing the links with you:


Java version for Sathyaraj's code above:

// Resize
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
    Bitmap bmap = img.copy(img.getConfig(), true);

    double nWidthFactor = (double) img.getWidth() / (double) newWidth;
    double nHeightFactor = (double) img.getHeight() / (double) newHeight;

    double fx, fy, nx, ny;
    int cx, cy, fr_x, fr_y;
    int color1;
    int color2;
    int color3;
    int color4;
    byte nRed, nGreen, nBlue;

    byte bp1, bp2;

    for (int x = 0; x < bmap.getWidth(); ++x) {
        for (int y = 0; y < bmap.getHeight(); ++y) {

            fr_x = (int) Math.floor(x * nWidthFactor);
            fr_y = (int) Math.floor(y * nHeightFactor);
            cx = fr_x + 1;
            if (cx >= img.getWidth())
                cx = fr_x;
            cy = fr_y + 1;
            if (cy >= img.getHeight())
                cy = fr_y;
            fx = x * nWidthFactor - fr_x;
            fy = y * nHeightFactor - fr_y;
            nx = 1.0 - fx;
            ny = 1.0 - fy;

            color1 = img.getPixel(fr_x, fr_y);
            color2 = img.getPixel(cx, fr_y);
            color3 = img.getPixel(fr_x, cy);
            color4 = img.getPixel(cx, cy);

            // Blue
            bp1 = (byte) (nx * Color.blue(color1) + fx * Color.blue(color2));
            bp2 = (byte) (nx * Color.blue(color3) + fx * Color.blue(color4));
            nBlue = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            // Green
            bp1 = (byte) (nx * Color.green(color1) + fx * Color.green(color2));
            bp2 = (byte) (nx * Color.green(color3) + fx * Color.green(color4));
            nGreen = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            // Red
            bp1 = (byte) (nx * Color.red(color1) + fx * Color.red(color2));
            bp2 = (byte) (nx * Color.red(color3) + fx * Color.red(color4));
            nRed = (byte) (ny * (double) (bp1) + fy * (double) (bp2));

            bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
        }
    }

    bmap = setGrayscale(bmap);
    bmap = removeNoise(bmap);

    return bmap;
}

// SetGrayscale
private Bitmap setGrayscale(Bitmap img) {
    Bitmap bmap = img.copy(img.getConfig(), true);
    int c;
    for (int i = 0; i < bmap.getWidth(); i++) {
        for (int j = 0; j < bmap.getHeight(); j++) {
            c = bmap.getPixel(i, j);
            byte gray = (byte) (.299 * Color.red(c) + .587 * Color.green(c)
                    + .114 * Color.blue(c));

            bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
        }
    }
    return bmap;
}

// RemoveNoise
private Bitmap removeNoise(Bitmap bmap) {
    for (int x = 0; x < bmap.getWidth(); x++) {
        for (int y = 0; y < bmap.getHeight(); y++) {
            int pixel = bmap.getPixel(x, y);
            if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
                bmap.setPixel(x, y, Color.BLACK);
            }
        }
    }
    for (int x = 0; x < bmap.getWidth(); x++) {
        for (int y = 0; y < bmap.getHeight(); y++) {
            int pixel = bmap.getPixel(x, y);
            if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
                bmap.setPixel(x, y, Color.WHITE);
            }
        }
    }
    return bmap;
}

Adaptive thresholding is important if the lighting is uneven across the image. My preprocessing using GraphicsMagic is mentioned in this post: https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4

GraphicsMagic also has the -lat feature for Linear time Adaptive Threshold which I will try soon.

Another method of thresholding using OpenCV is described here: http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html


The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.

To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).

More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:

$ tesseract --oem 1 -l deu page.png result pdf

(this example selects the german language)

Thus, it makes sense to test first how far you get with the new Tesseract LSTM mode before applying some custom pre-processing image processing steps.

(as of late 2017, Tesseract 4 isn't released as stable yet, but the development version is usable)


I did these to get good results out of an image which has not very small text.

  1. Apply blur to the original image.
  2. Apply Adaptive Threshold.
  3. Apply Sharpening effect.

And if the still not getting good results, scale the image to 150% or 200%.


Reading text from image documents using any OCR engine have many issues in order get good accuracy. There is no fixed solution to all the cases but here are a few things which should be considered to improve OCR results.

1) Presence of noise due to poor image quality / unwanted elements/blobs in the background region. This requires some pre-processing operations like noise removal which can be easily done using gaussian filter or normal median filter methods. These are also available in OpenCV.

2) Wrong orientation of image: Because of wrong orientation OCR engine fails to segment the lines and words in image correctly which gives the worst accuracy.

3) Presence of lines: While doing word or line segmentation OCR engine sometimes also tries to merge the words and lines together and thus processing wrong content and hence giving wrong results. There are other issues also but these are the basic ones.

This post OCR application is an example case where some image pre-preocessing and post processing on OCR result can be applied to get better OCR accuracy.


Text Recognition depends on a variety of factors to produce a good quality output. OCR output highly depends on the quality of input image. This is why every OCR engine provides guidelines regarding the quality of input image and its size. These guidelines help OCR engine to produce accurate results.

I have written a detailed article on image processing in python. Kindly follow the link below for more explanation. Also added the python source code to implement those process.

Please write a comment if you have a suggestion or better idea on this topic to improve it.

https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033

참고URL : https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy

반응형