Extracting Text from an Image || Converting Image to Text

This article shows how to extract text from images in Java using the Tesseract API from net.sourceforge.tess4j (Tess4J).

STEP 1 : ADD THE net.sourceforge.tess4j DEPENDENCY TO YOUR pom.xml

 
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.2.1</version>
</dependency>



Image to extract text from

img

STEP 2 : DOWNLOAD THE LANGUAGE TRAINED DATA AND PUT IT IN THE tessdata FOLDER https://github.com/tesseract-ocr/tessdata

Suppose you download eng.traineddata from the above URL. Put the file in a tessdata folder at the project root: tessdata/eng.traineddata
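A missing or misnamed trained data file is an easy mistake to make at this step, so it can help to check for the file before running any OCR. The following is a minimal sketch using only the standard java.nio.file API. The class name TessdataCheck and its methods are hypothetical helpers for illustration and are not part of Tess4J. It assumes the layout described above: a tessdata folder at the project root holding a file named after the language code plus .traineddata.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Tess4J): checks the Step 2 file layout.
final class TessdataCheck {

    // Builds <projectRoot>/tessdata/<language>.traineddata.
    static Path trainedDataPath(Path projectRoot, String language) {
        return projectRoot.resolve("tessdata").resolve(language + ".traineddata");
    }

    // Returns true only if the trained data file exists as a regular file.
    static boolean hasTrainedData(Path projectRoot, String language) {
        return Files.isRegularFile(trainedDataPath(projectRoot, language));
    }

    public static void main(String[] args) {
        // Assumption: the program is started from the project root.
        Path root = Paths.get("").toAbsolutePath();
        if (hasTrainedData(root, "eng")) {
            System.out.println("Found " + trainedDataPath(root, "eng"));
        } else {
            System.out.println("Missing " + trainedDataPath(root, "eng"));
        }
    }
}
```

Running the main method from the project root reports whether tessdata/eng.traineddata is in place.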

STEP 3 : JAVA CODE TO READ TEXT FROM IMAGE IN ANY FORMAT

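package com.amudabadmus.awfa;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import java.io.File;

public class App {

    // Runs OCR on the image at imageLocation and returns the recognized text.
    public String getImgText(String imageLocation) {
        // Relies on the trained data placed in the tessdata folder in Step 2.
        ITesseract instance = new Tesseract();
        try
        {
            return instance.doOCR(new File(imageLocation));
        }
        catch (TesseractException e)
        {
            // Report the cause instead of silently discarding it.
            System.err.println(e.getMessage());
            return "Error while reading image";
        }
    }

    public static void main(String[] args)
    {
        App app = new App();
        System.out.println(app.getImgText("C:\\Users\\User\\Pictures\\img.png"));
    }
}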
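In practice, "any format" is limited to the formats that can actually be decoded. As a rough guard, the file extension can be checked against the image readers registered with the JVM before the path is handed to getImgText. This is a minimal sketch under the assumption that the image is decoded through Java's standard javax.imageio machinery. ImageFormatCheck and its methods are hypothetical names for illustration and are not part of Tess4J.

```java
import java.util.Locale;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of Tess4J): pre-checks an image file name.
final class ImageFormatCheck {

    // Returns the lower-cased extension of the file name, or "" if there is none.
    static String extensionOf(String fileName) {
        int separator = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        int dot = fileName.lastIndexOf('.');
        if (dot <= separator || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Returns true if the JVM has at least one ImageIO reader for the extension.
    static boolean hasReaderFor(String fileName) {
        String extension = extensionOf(fileName);
        return !extension.isEmpty() && ImageIO.getImageReadersBySuffix(extension).hasNext();
    }

    public static void main(String[] args) {
        String image = "C:\\Users\\User\\Pictures\\img.png";
        System.out.println(image + " readable by ImageIO: " + hasReaderFor(image));
    }
}
```

A check like this only looks at the file name, so a file with the right extension but corrupt content would still fail later during OCR.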




Output


2017-04-04 at 13-13-03

For more information, check out the source code and the demo.
