VectorLinux



Topic: pdftotext (read 3218 times)

InTheWoods (Vectorite, 302 posts)
pdftotext
« on: November 16, 2009, 05:25:35 am »

I have a PDF file, originally created in AbiWord, that I would like to convert back to text for further editing.

I have tried "pdftotext", "pdftohtml", and opening the file in Acroread then saving as text.

All of these produce a new file that consists of nothing but symbols.

Is there any way to edit or convert this PDF file to an editable format?
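When a converter produces "nothing but symbols", it can help to check the output before trying yet another tool. The Python sketch below is a hypothetical heuristic, not part of pdftotext or any PDF tool. It reports whether most visible characters in the extracted text are letters or digits. The 0.6 threshold is an arbitrary assumption. Empty output suggests an image-only (scanned) PDF, and symbol-heavy output suggests the text could not be decoded. The check cannot catch letters that were mapped to the wrong letters.

```python
def looks_like_real_text(text: str, threshold: float = 0.6) -> bool:
    """Return True if most non-whitespace characters are letters or digits.

    Hypothetical heuristic: the threshold is an assumption, not a
    property of pdftotext or of the PDF format.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        # No visible text at all is typical of a scanned, image-only PDF.
        return False
    # str.isalnum() also accepts non-ASCII letters such as accented ones.
    alnum = sum(1 for ch in visible if ch.isalnum())
    return alnum / len(visible) >= threshold
```

For example, `looks_like_real_text("Hello, world.")` is true, while a string made only of punctuation is not.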

M0E-lnx (Administrator, Vectorian, 3232 posts)
Re: pdftotext
« Reply #1 on: November 16, 2009, 06:06:05 am »

Have you tried pdfedit from the repos?

Daniel (Packager, Vectorian, 704 posts; TuxToys - Packages for VectorLinux 6.0)
Re: pdftotext
« Reply #2 on: November 16, 2009, 08:53:53 pm »

Quote: I have tried "pdftotext", "pdftohtml", and opening the file in Acroread then saving as text.

Did you use the commands pdf2txt and pdf2html (I think those commands exist), or pdftotext and pdftohtml?
The following sentence is true. The previous sentence is false.

VL 6.0 SOHO KDE-Classic on 2.3 GHz Dual-core AMD with 3 Gigs of RAM

Hamzah (Member, 20 posts, "Wanna be hacker")
Re: pdftotext
« Reply #3 on: January 03, 2010, 06:45:15 pm »

I just tried the "pdftotext" command, and it worked.
Type pdftotext -h to get usage information.
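For scripting, pdftotext can be called from Python. The sketch below assumes poppler's pdftotext is installed and on PATH. It uses the real options `-layout` (keep the physical page layout), `-f`/`-l` (first and last page) and `-enc` (output encoding); passing `-` as the output file name makes pdftotext write to stdout. The helper names `pdftotext_command` and `extract_text` are hypothetical, chosen only for this example.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, out_path="-", layout=True, encoding="UTF-8",
                      first_page=None, last_page=None):
    """Build an argument list for pdftotext (options go before the files)."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the original physical layout
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    # "-" as the output file sends the text to stdout.
    cmd += ["-enc", encoding, pdf_path, out_path]
    return cmd


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    result = subprocess.run(pdftotext_command(pdf_path), capture_output=True,
                            encoding="utf-8", errors="replace", check=True)
    return result.stdout
```

The command is built as a list rather than a shell string so that file names containing spaces need no quoting.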
1001 0101 0010 0110 0011 0000 1010 0111 1101 0100 1001 1011 1000 0100 1111 1100

sledgehammer (Vectorian, 1464 posts)
Re: pdftotext
« Reply #4 on: January 03, 2010, 08:06:34 pm »

pdftotext should work on the PDF that was created by AbiWord. I call these electronic PDF files, but I am sure there is a better name. pdftotext will also work on other PDF files that were originally created by saving a file from a word processor. However, it won't work on PDF files that were scanned (perhaps the word is "flattened"). You first have to OCR the scanned PDF file with tesseract and then edit the result with a word processor.

John
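The scanned-PDF route described above can be sketched in Python, assuming poppler's pdftoppm and tesseract are both installed and on PATH. pdftoppm renders each page to an image (`-r` sets the resolution in DPI, `-png` selects PNG output), and `tesseract image outbase` writes the recognized text to `outbase.txt`. The 300 DPI default and the helper names are assumptions for illustration, not settings recommended in this thread. The sketch also assumes pdftoppm pads page numbers to a common width, so that sorting file names keeps page order.

```python
import glob
import os
import subprocess
import tempfile


def rasterize_command(pdf_path, prefix, dpi=300):
    """pdftoppm renders each page to <prefix>-<page number>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_command(image_path, out_base):
    """tesseract writes the recognized text to <out_base>.txt."""
    return ["tesseract", image_path, out_base]


def ocr_pdf(pdf_path, dpi=300):
    """OCR a scanned PDF page by page and return the combined text."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(rasterize_command(pdf_path, prefix, dpi), check=True)
        pages = []
        # Glob instead of guessing names, since pdftoppm may zero-pad
        # page numbers (e.g. page-01.png) in longer documents.
        for image in sorted(glob.glob(prefix + "-*.png")):
            out_base = image[:-len(".png")]
            subprocess.run(ocr_command(image, out_base), check=True)
            with open(out_base + ".txt", encoding="utf-8") as fh:
                pages.append(fh.read())
        return "\n".join(pages)
```

The OCR text will still need proofreading in a word processor, since recognition errors are common on low-quality scans.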

VL7.0 xfce4 Samsung RF511

sledgehammer (Vectorian, 1464 posts)
Re: pdftotext
« Reply #5 on: February 09, 2012, 11:23:20 pm »

InTheWoods,

zmcmay's post reminded me that for the past several months I have been using Google Docs to convert PDFs to text. Just upload the PDF to Google Docs (first check the "convert pdf" box).