scanned PDF files converted to Word files
Thread poster: Emilia Delibasheva
Emilia Delibasheva  Identity Verified
Local time: 02:31
Member (2005)
English to Bulgarian
Feb 2, 2013

Hello,

I have a large volume of PDF files and I have to edit them. I did some research on the Internet and found that there is software that converts PDF files to Word documents. However, mine are not text-based PDF files; they are scanned. Is it possible at all to perform this kind of conversion? Thank you.


 
Walter Landesman  Identity Verified
Uruguay
Local time: 21:31
English to Spanish
Nope Feb 2, 2013

No, I don't think so.

 
Natalie  Identity Verified
Poland
Local time: 01:31
Member (2002)
English to Russian

Moderator of this forum
SITE LOCALIZER
Of course, it is possible Feb 2, 2013

Please search this forum; you will find a number of previous threads on the same topic.
IMHO, the best software for this is FineReader, though many other programs do the same.


 
Triston Goodwin  Identity Verified
United States
Local time: 18:31
Spanish to English
You sure can! Feb 2, 2013

Natalie wrote:

Please search this forum; you will find a number of previous threads on the same topic.
IMHO, the best software for this is FineReader, though many other programs do the same.



Here's a link to FineReader, since it can be a little tricky to find sometimes: http://www.abbyy.com/

I haven't had much luck with these programs, since they don't catch accent marks, so I typically just translate directly, or use Dragon and read it over first.

If the image is clean and the program is set up correctly, you shouldn't have too much trouble, though.


 
Michel de Ruyter  Identity Verified
Finland
Local time: 02:31
Member (2011)
English to Dutch
Here, for example: Feb 2, 2013

http://www.proz.com/forum/wordfast_support/195890-wordfast_anywhere_announces_support_for_scanned_pdfs.html

 
Emilia Delibasheva  Identity Verified
Local time: 02:31
Member (2005)
English to Bulgarian
TOPIC STARTER
Thanks Feb 2, 2013

Thank you all very much!

 
finnword1
United States
Local time: 20:31
English to Finnish
OCR Feb 2, 2013

I use a separate OCR program. I can then make the necessary adjustments, depending on the quality of the scanned image.

 
Angelique Blommaert  Identity Verified
Netherlands
Local time: 01:31
Member (2012)
German to Dutch
Works for me Feb 2, 2013

FineReader is what I use.

 
Emilia Delibasheva  Identity Verified
Local time: 02:31
Member (2005)
English to Bulgarian
TOPIC STARTER
Thanks Feb 3, 2013

Thank you all.

 
Emma Goldsmith  Identity Verified
Spain
Local time: 01:31
Member (2004)
Spanish to English
Quality of scanned PDF Feb 3, 2013

Triston Goodwin wrote:

I haven't had that much luck with these programs, since they don't catch accent marks


If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.

Of course, much depends on the quality of the scanned PDF. If you have a lot of background noise (a vertical line crossing through all pages, stamps placed on top of text, etc.) then no program will be able to decipher what the text says. But real people might not be able to in that case, either!


 
FarkasAndras  Identity Verified
Local time: 01:31
English to Hungarian
They work Feb 3, 2013

Emma Goldsmith wrote:

Triston Goodwin wrote:

I haven't had that much luck with these programs, since they don't catch accent marks


If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.


Seconded. I used to think that OCR was pretty much unusable, especially for languages with accented characters. That might have been the case a decade ago, but it definitely isn't any more. These programs use very smart algorithms to determine what each character is most likely to be, and they do a somewhat decent job of formatting. For example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű, or an o instead of an ö, in one case out of a thousand. When you look it up in the source text, you're likely to find that the image quality was abysmal at that spot.

That said, for translation it's generally better to use a setting that does not preserve much of the formatting, and to format the output text at the end. Otherwise, you end up with text boxes all over the place, misrecognized headers and so on.

ABBYY FineReader recognizes Hungarian text pretty much perfectly, even if the image quality leaves a lot to be desired. I'm impressed.

[Edited at 2013-02-03 09:57 GMT]


 
Rolf Keller
Germany
Local time: 01:31
English to German
Catch false/missing accents etc. automatically Feb 3, 2013

FarkasAndras wrote:

As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand.


Just get a good spellchecker for that language and run it over the OCR'ed text.
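The idea of catching dropped or swapped accents with a word list can be sketched in a few lines of Python. This is not how any particular spellchecker or ABBYY works; it assumes you have a plain word list for the target language (the `toy_dictionary` below is a hypothetical four-word stand-in) and a hand-written table of letters OCR might confuse (`CONFUSABLE_GROUPS`, an assumption based on the Hungarian u/ü/ű and o/ö examples above). For every OCR'ed word not in the list, it proposes list entries that differ from it only in accents.

```python
import unicodedata
from itertools import product

# Assumption: letters that OCR may confuse, grouped by base vowel.
# Hungarian vowels are used here to match the examples in this thread.
CONFUSABLE_GROUPS = ["aá", "eé", "ií", "oóöő", "uúüű"]
CONFUSABLE = {ch: group for group in CONFUSABLE_GROUPS for ch in group}


def accent_variants(word):
    """Yield every spelling of `word` obtained by swapping confusable letters."""
    options = [CONFUSABLE.get(ch, ch) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def suggest(word, known):
    """Return None if `word` is known, else the known words differing only in accents."""
    if word in known:
        return None
    return sorted(v for v in accent_variants(word) if v in known)


def check_text(text, dictionary):
    """Map each unknown word in OCR output to its accent-only corrections."""
    # NFC turns decomposed sequences such as "u" + combining diaeresis into "ü".
    known = {unicodedata.normalize("NFC", w).lower() for w in dictionary}
    report = {}
    for token in unicodedata.normalize("NFC", text).split():
        word = token.strip(".,;:!?\"'()").lower()
        if not word or word in report:
            continue
        suggestions = suggest(word, known)
        if suggestions is not None:
            report[word] = suggestions
    return report


toy_dictionary = {"a", "és", "tűz", "fűz"}  # hypothetical stand-in for a real word list
print(check_text("A tűz es a fuz.", toy_dictionary))
# {'es': ['és'], 'fuz': ['fűz']}
```

Like any dictionary lookup, this cannot flag an error that produces another valid word (Hungarian kor "age" vs. kör "circle"), and the number of variants grows exponentially with the number of vowels in a word.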


 

