Building an Image -> Text web app using JavaEE

Years ago, extracting text from images seemed to be one of the greatest challenges for developers. With the arrival of great tools, reading and extracting text from images is now easy.

I will explain and show how to extract text from images in Java using the Tesseract API from net.sourceforge.tess4j.*

Extracting text from an image means, for example, considering flowchart imagery, which is processed to extract first the text components and then the geometrical shape components. We analyze the text and the various geometrical shapes present in the flowchart and carry out a variety of processes such as image segmentation, shape description, text and geometric component extraction, recognition and linking. We also proposed an auto directional transformation of contour chain method for shape description. The internal relationships between the components are set up by tracing the flow lines that connect them, so the flowchart is correctly extracted. The extracted components are output as machine-readable metadata (XML format), which can be archived, stored as a knowledge base or shared with others.

STEP 1 : ADD THE net.sourceforge.tess4j.* API TO YOUR pom.xml

<dependency> 
 <groupId>net.sourceforge.tess4j</groupId> 
 <artifactId>tess4j</artifactId> 
 <version>3.2.1</version> 
</dependency>

 

STEP 2 : DOWNLOAD THE TESSERACT LANGUAGE DATA AND PUT IT IN THE tessdata FOLDER https://github.com/tesseract-ocr/tessdata

Suppose you download eng.traineddata from the above URL. Put the file in a tessdata folder at the project root: tessdata/eng.traineddata
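A quick sanity check before running the OCR code can save some head-scratching. This is only a minimal sketch, assuming the tessdata folder sits at the project root as described above. TessdataCheck and hasLanguageData are names I made up for illustration, they are not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Tess4J): checks that the language file
// is where this post puts it, i.e. <projectRoot>/tessdata/<language>.traineddata
class TessdataCheck {

    // Returns true only if the trained data file exists as a regular file.
    static boolean hasLanguageData(Path projectRoot, String language) {
        Path trainedData = projectRoot
                .resolve("tessdata")
                .resolve(language + ".traineddata");
        return Files.isRegularFile(trainedData);
    }
}
```

For example, TessdataCheck.hasLanguageData(Paths.get(""), "eng") checks relative to the current working directory, which is usually the project root when you run from your IDE or from Maven.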

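import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class ImageExtractor
{
    // Runs OCR on the image at imageLocation and returns the recognized text.
    public String getImgText(String imageLocation) {
        ITesseract instance = new Tesseract();
        try {
            return instance.doOCR(new File(imageLocation));
        } catch (TesseractException e) {
            // Report the cause instead of silently discarding it.
            System.err.println("OCR failed: " + e.getMessage());
            return "Oops, can't read the image. Please try again...";
        }
    }
}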
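Since getImgText takes a file location, a web app first has to put the uploaded image on disk. Here is a rough sketch of that step using only the JDK. UploadStore and saveToTempFile are hypothetical names of mine, and the ".tmp" fallback suffix is an arbitrary choice. I keep the original extension on the assumption that the image reader may rely on the file name to detect the format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: stores uploaded image bytes in a temp file so that
// its path can be handed to ImageExtractor.getImgText(String).
class UploadStore {

    // Copies the stream to a new temp file. A short alphanumeric extension
    // such as ".png" is kept; anything else falls back to ".tmp".
    static Path saveToTempFile(InputStream upload, String originalName) throws IOException {
        String suffix = ".tmp";
        if (originalName != null && originalName.matches(".*\\.[A-Za-z0-9]{1,5}")) {
            suffix = originalName.substring(originalName.lastIndexOf('.'));
        }
        Path target = Files.createTempFile("ocr-upload-", suffix);
        // createTempFile already created an empty file, so overwrite it.
        Files.copy(upload, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

In a servlet, the stream and name would typically come from the uploaded Part (getInputStream() and getSubmittedFileName()), and the result goes into new ImageExtractor().getImgText(path.toString()). Remember to delete the temp file once the text has been extracted.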

 

Check out the source code and the demo.

Hope this makes your day better... 🙂

 
