VectorLinux

Topic: pdf image to text (ocr software) SOLVED (Read 4167 times)

sledgehammer

  • Vectorian
  • Posts: 1459
pdf image to text (ocr software) SOLVED
« on: February 04, 2008, 11:21:54 am »

I can extract text from PDF files created by a word processor using pdfedit. However, I can't extract text from PDF files that were created by scanning on a copier. I believe I can convert the PDF images to TIFF or JPEG using GIMP, but I can't extract text from those either.
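The difference described here usually comes down to whether the PDF carries a text layer: a word-processor export does, a copier scan is just page images. A rough Python check, assuming the poppler `pdftotext` tool is installed (it is not mentioned in this thread) and using an arbitrary 20-character cutoff as a hypothetical threshold:

```python
import shutil
import subprocess


def extract_text(pdf: str) -> str:
    """Return the embedded text layer of a PDF using pdftotext ('-' writes to stdout)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext is not installed")
    result = subprocess.run(
        ["pdftotext", pdf, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def looks_scanned(text: str, min_chars: int = 20) -> bool:
    """Heuristic: a scanned PDF yields little or no extractable text, so it needs OCR."""
    visible = "".join(text.split())  # drop spaces, newlines and page-break form feeds
    return len(visible) < min_chars
```

Usage would be `looks_scanned(extract_text("scan.pdf"))`; a `True` result means the file has to go through OCR rather than plain text extraction.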

If anyone has had luck using character recognition software on VL 5.8, I would appreciate some tips.

John
« Last Edit: April 14, 2008, 05:22:58 pm by sledgehammer »
VL7.0 xfce4 Samsung RF511

bad_gui

  • Member
  • Posts: 61
Re: pdf image to text (ocr software)
« Reply #1 on: February 29, 2008, 07:06:30 pm »


sledgehammer

  • Vectorian
  • Posts: 1459
Re: pdf image to text (ocr software)
« Reply #2 on: March 03, 2008, 10:48:19 pm »

Though I can't get to the wiki yet for some reason, it looks like it's just what I need. Thanks. I will download it and try it soon.

John
VL7.0 xfce4 Samsung RF511

sledgehammer

  • Vectorian
  • Posts: 1459
Re: pdf image to text (ocr software)
« Reply #3 on: March 24, 2008, 11:01:24 pm »

So far I have been unable to install tesseract.  However, when time permits (or necessity requires) I am going to try the following site, which says it has tesseract already installed. 

http://www.abillionbillion.com/about/document-management-for-everyone

If anyone has tried it, I would appreciate a heads-up.
VL7.0 xfce4 Samsung RF511

sledgehammer

  • Vectorian
  • Posts: 1459
Re: pdf image to text (ocr software)
« Reply #4 on: April 05, 2008, 03:57:41 pm »

Thanks, bad_gui.

Thanks a million! I originally had trouble installing Tesseract (I didn't know make had to be run as root). Eventually I went to abillionbillion.com, followed the directions there, and it installed! Then I had to learn to import PDFs into GIMP as black and white, not color, and save them as TIFF (Tesseract only accepts TIFF files). Then it's simple: run tesseract from the prompt and it OCRs perfectly!
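The workflow above (PDF pages to black-and-white TIFF, then Tesseract on each TIFF) can also be scripted. A minimal Python sketch, assuming Ghostscript (`gs`) and Tesseract are both on the PATH. Rendering pages with Ghostscript's bilevel `tiffg4` device is a swapped-in alternative to the manual GIMP import, not what was done here; the 300 dpi default and the `page%03d.tif` file names are illustrative choices.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_to_tiff_cmd(pdf: str, out_pattern: str, dpi: int = 300) -> list[str]:
    """Ghostscript command that renders every PDF page to a 1-bit (black and white) TIFF."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiffg4",              # CCITT G4 TIFF: black and white only
        f"-r{dpi}",                     # render resolution in dots per inch
        f"-sOutputFile={out_pattern}",  # %03d -> page001.tif, page002.tif, ...
        pdf,
    ]


def tesseract_cmd(tiff: str, out_base: str) -> list[str]:
    """Tesseract command; the recognized text is written to <out_base>.txt."""
    return ["tesseract", tiff, out_base]


def ocr_pdf(pdf: str, workdir: str = ".") -> list[Path]:
    """Render the pages with gs, OCR each TIFF, and return the .txt files produced."""
    if not (shutil.which("gs") and shutil.which("tesseract")):
        raise RuntimeError("gs and tesseract must both be on the PATH")
    work = Path(workdir)
    subprocess.run(pdf_to_tiff_cmd(pdf, str(work / "page%03d.tif")), check=True)
    outputs = []
    for tiff in sorted(work.glob("page*.tif")):
        base = tiff.with_suffix("")  # page001.tif -> page001
        subprocess.run(tesseract_cmd(str(tiff), str(base)), check=True)
        outputs.append(base.with_suffix(".txt"))
    return outputs
```

Keeping the command builders separate from `ocr_pdf` makes the exact command lines easy to inspect or copy to the prompt; run `ocr_pdf` in an empty working directory so stray `page*.tif` files are not picked up.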

Thanks, thanks, thanks, thanks! 

John
VL7.0 xfce4 Samsung RF511

never_stop_learning

  • Vectorite
  • Posts: 263
Re: pdf image to text (ocr software)
« Reply #5 on: April 05, 2008, 04:16:02 pm »

I happen to be visiting sledgehammer, watching the UCLA-Memphis game, and can attest to his exuberance... We thought we were going to have to get the defibrillator out... ;) ;D
Laptop: IBM X60s (Centrino/Duo, 2gb ram, 80gb hd) VL 6.0 Std
Netbook: HP Mini (Intel Atom 1ghz, 2gb ram, 16gb SSD + 8gb flash ) VL 6.0 Std
Desktop: Dell Dimension 5150 (P4 3ghz, 2gb ram, 80gb hd) VL 6.0 Std
Wife's Desktop: Gateway (P4 2ghz, 1gb ram, 80gb hd) VL 6.0 Std

789

  • Member
  • Posts: 26
Re: pdf image to text (ocr software) SOLVED
« Reply #6 on: May 22, 2009, 10:16:13 am »

>>>>>Warning: this topic has not been posted in for at least 120 days.
>>>>>Unless you're sure you want to reply, please consider starting a new topic.
___________
Is there, somewhere, a compiled version of Tesseract available for download?

sledgehammer

  • Vectorian
  • Posts: 1459
Re: pdf image to text (ocr software) SOLVED
« Reply #7 on: May 22, 2009, 10:26:49 am »

VL7.0 xfce4 Samsung RF511

789

  • Member
  • Posts: 26
Re: pdf image to text (ocr software) SOLVED
« Reply #8 on: May 22, 2009, 11:01:15 am »

Me no compile... it is simpler to boot NT and use OmniPage.
Is there a compiled version available for download?