Sunday, May 1, 2011

Extracting data from a pdf file

There are a few quite expensive commercial programs for extracting data from PDF files, but a few lines of Mathematica code may be able to do the job in most cases. For example, the following file with many pages
results in the following table
Note: This method works only for text-searchable PDF files. For scanned PDFs, this would be an image processing job.
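
As a rough sketch of the kind of code involved: import the text layer of the PDF, split it into lines and whitespace-separated cells, and convert the numeric cells to numbers. The file name `report.pdf`, the sample text, and the helper `toCell` are all hypothetical stand-ins, and the sketch assumes every row of the table sits on its own line with the same number of cells. A real file will usually need extra filtering for headers, footers and page breaks.

```mathematica
(* Hypothetical file name; the "Plaintext" element returns the text layer as one string *)
(* text = Import["report.pdf", "Plaintext"]; *)

(* Stand-in for the imported text, so the sketch runs without a file *)
text = "Station  2009  2010\nNorth  12.5  14.1\nSouth  8.3  9.7";

(* One row per line, one cell per whitespace-separated token *)
rows = StringSplit[#] & /@ StringSplit[text, "\n"];

(* Turn numeric-looking cells into numbers and leave everything else as strings *)
toCell[s_String] := If[StringMatchQ[s, NumberString], ToExpression[s], s];
table = Map[toCell, rows, {2}];

(* Display the result as a framed table *)
Grid[table, Frame -> All]
```

Splitting on whitespace is the simplest choice, but it breaks when a cell itself contains spaces. In that case a pattern-based extraction with `StringCases` is probably a better fit.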


  1. I'm sure the image processing could also be done in Mathematica.

  2. Yes, Mathematica can do image processing as well, but that would require more than a few lines of code. Some day I will do it.

  3. Hi all,

    PDF is an attractive option for businesses and individuals who want to send and store brochures and manuals electronically without compromising design or aesthetics. Unfortunately, the integrated nature of the files can make things difficult for those who want to extract text and information from them. A number of options exist to aid in this process. Thanks a lot.
