VectorLinux
Topic: pdf image to text (OCR software) SOLVED
sledgehammer
Vectorian
****
Posts: 1424



« on: February 04, 2008, 11:21:54 am »

I can extract text from pdf files created by a word processor using pdfedit.  However, I can't extract text from pdf files which were created by scanning on a copier.  I believe I can convert the pdf images to tif or jpeg using gimp, but I can't extract text from those either.

If anyone has had luck using character recognition software on VL 5.8, I would appreciate some tips.

John
« Last Edit: April 14, 2008, 04:22:58 pm by sledgehammer » Logged

VL7.0 xfce4 Samsung RF511
bad_gui
Member
*
Posts: 61


« Reply #1 on: February 29, 2008, 07:06:30 pm »

I haven't used this but it may work:

http://code.google.com/p/tesseract-ocr/

http://gentoo-wiki.com/HOWTO_do_OCR
Logged
sledgehammer
Vectorian
****
Posts: 1424



« Reply #2 on: March 03, 2008, 10:48:19 pm »

Though I can't yet get to the wiki for some reason, it looks like it's just what I need.  Thanks.  I will download it and try it soon.

John
Logged

VL7.0 xfce4 Samsung RF511
sledgehammer
Vectorian
****
Posts: 1424



« Reply #3 on: March 24, 2008, 10:01:24 pm »

So far I have been unable to install tesseract.  However, when time permits (or necessity requires) I am going to try the following site, which says it has tesseract already installed. 

http://www.abillionbillion.com/about/document-management-for-everyone

If anyone has tried it, I would appreciate a heads up.
Logged

VL7.0 xfce4 Samsung RF511
sledgehammer
Vectorian
****
Posts: 1424



« Reply #4 on: April 05, 2008, 02:57:41 pm »

Thanks bad_gui. 

Thanks a million!  I originally had trouble installing tesseract (I didn't know the make step had to be run as root).  Eventually I got on to abillionbillion.com, followed the directions there, and it installed!  Then I had to learn to import pdfs into gimp as black and white, not color, and save them as tiff (tesseract only reads tif files).  After that it's simple: run tesseract from the prompt and it OCR's perfectly!
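For anyone who wants to script those steps instead of clicking through gimp, here is a rough sketch in Python. It is not what I did by hand: it swaps gimp for Ghostscript's `tiffg4` device to get black-and-white tiffs, and it assumes that both `gs` and a reasonably current `tesseract` are on the PATH. The function names, the `page-%03d.tif` naming, and the 300 dpi default are made up for illustration.

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, out_dir, dpi=300):
    """Build a Ghostscript command that renders each PDF page to a 1-bit TIFF.

    Assumption: Ghostscript stands in for the manual gimp export here.
    """
    pattern = Path(out_dir) / "page-%03d.tif"  # hypothetical naming scheme
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=tiffg4",            # black-and-white (bilevel) TIFF output
        f"-r{dpi}",                   # scan resolution in dpi
        f"-sOutputFile={pattern}",
        str(pdf_path),
    ]


def tesseract_command(tiff_path, lang="eng"):
    """Build a tesseract command; the text is written to <output base>.txt."""
    tiff_path = Path(tiff_path)
    out_base = tiff_path.with_suffix("")  # tesseract appends .txt itself
    return ["tesseract", str(tiff_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, out_dir, dpi=300, lang="eng"):
    """Convert a scanned PDF to TIFF pages, OCR each page, return the text."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_command(pdf_path, out_dir, dpi), check=True)

    pages = []
    for tiff in sorted(out_dir.glob("page-*.tif")):
        subprocess.run(tesseract_command(tiff, lang), check=True)
        pages.append(tiff.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling `ocr_pdf("scan.pdf", "ocr_out")` should leave one tif and one txt file per page in `ocr_out` and return the combined text. Older tesseract builds may be pickier about tiff compression, so treat the `tiffg4` choice as a starting point rather than a guarantee.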

Thanks, thanks, thanks, thanks! 

John
Logged

VL7.0 xfce4 Samsung RF511
never_stop_learning
Vectorite
***
Posts: 263


WWW
« Reply #5 on: April 05, 2008, 03:16:02 pm »

I happen to be visiting sledgehammer watching the UCLA - Memphis game and can attest to his exuberance..... We thought we were going to have to get the defibrillator out.....
Logged

Laptop: IBM X60s (Centrino/Duo, 2gb ram, 80gb hd) VL 6.0 Std
Netbook: HP Mini (Intel Atom 1ghz, 2gb ram, 16gb SSD + 8gb flash ) VL 6.0 Std
Desktop: Dell Dimension 5150 (P4 3ghz, 2gb ram, 80gb hd) VL 6.0 Std
Wife's Desktop: Gateway (P4 2ghz, 1gb ram, 80gb hd) VL 6.0 Std
789
Member
*
Posts: 26


« Reply #6 on: May 22, 2009, 09:16:13 am »

Would there be, somewhere, a compiled version of Tesseract available for download?
Logged
sledgehammer
Vectorian
****
Posts: 1424



« Reply #7 on: May 22, 2009, 09:26:49 am »

http://code.google.com/p/tesseract-ocr/

Install from source.

John
Logged

VL7.0 xfce4 Samsung RF511
789
Member
*
Posts: 26


« Reply #8 on: May 22, 2009, 10:01:15 am »

Me no compile ... it is simpler to boot NT and use OmniPage
Is there a compiled version available for download?
Logged