Extraction of text from Web site
Thread poster: luka
luka
Spain
English to Spanish
Jan 16, 2007

I have been asked to translate a big web site for a company. They do not have the source text for the site. I am looking for a tool that can go through all the pages and strip out the code, leaving me with only the text to translate. I would also welcome any other suggestions on ways to do this. Thanks.

 
Rod Darby (X)
Ghana
German to English
possible solution Jan 16, 2007

luka,
there's a shareware program called Trellian which I believe will download the code of a site for you. I haven't tried it, but you might have a look.
Rod


[Edited at 2007-01-16 11:26]


 
Jerónimo Fernández
English to Spanish
WinHTTrack Jan 16, 2007

Hello.

I use WinHTTrack (http://www.httrack.com). It's free and it works wonders. It mirrors the website you want to work with on your hard drive.

Good luck,
Jerónimo


 
Marc P (X)
German to English
Extraction of text from Web site Jan 16, 2007

Why strip out the code from the pages? The customer will then have the job of putting it all back in again.

Tools are available with which you can download entire web sites, retaining the directory structure. wget is an example: www.gnu.org/software/wget

Once you have downloaded the site, you can translate the pages in a CAT tool which is capable of handling HTML. OmegaT, for example, will present you with the text for translation whilst keeping the entire web site structure - directories, images, the works - intact.

It is possible, however, that your customer's web pages are created dynamically with data from a database. In this case, you will probably have to get the customer to deliver the data to you.

Marc
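
For anyone who only wants a quick look at the text on a downloaded page (for a quote, say, rather than for the translation itself), here is a minimal Python sketch using only the standard library. It is not how OmegaT or any other CAT tool works internally; the names `TextExtractor`, `extract_text` and the sample page are made up for illustration. Note that inline tags such as `<b>` split a sentence into several segments, which is one more reason to translate the HTML itself in a CAT tool.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text segments, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and ignore empty chunks.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.segments.append(text)


def extract_text(html):
    """Return the list of text segments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


# Hypothetical sample page, standing in for one downloaded file.
sample = """<html><head><title>Our company</title>
<style>p { color: red; }</style></head>
<body><h1>Welcome</h1><p>We sell <b>widgets</b>.</p>
<script>var x = 1;</script></body></html>"""

for segment in extract_text(sample):
    print(segment)
```

This only sees text that is present in the downloaded HTML. Text inside images, or content pulled from a database when the page is served, will not show up.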


 
franksf
Chinese to English
try webstripper to download the site you need Jan 17, 2007

http://webstripper.net/reghelp.html

 
Michael Bastin
Spain
English to French
big website Jan 19, 2007

If the site is that big, chances are it is database-driven. Using software to download the pages may mean waiting ages for the download to complete.

In any case, you should use the downloaded pages only for quoting purposes. The customer should send you the pages they would like to have translated, or an export of the database if the content is generated that way.

My 2 cents


 
luka
Spain
English to Spanish
TOPIC STARTER
Thank you very much Jan 19, 2007

I want to thank all of you for your help.
In the end I gave up because the site is huge. I have told the client that I cannot work out the number of words and that they should try to find the source files.

Have a great weekend


 
Talent Success
United States
Member (2006)
English to Spanish
extract web text and get summary? Jan 23, 2007

hi all:

i just posted this question yesterday, and someone kindly pointed me to this discussion. i'm in the process of trying out the software recommended, so thank you. however, has anyone found a program that can then generate a similar report to the following: http://www.apex-translations.com/en/cost_estimate/website_summary.html?

thank you.
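
One possible way to get a rough per-page word summary from a site that has already been mirrored to disk is sketched below in Python (standard library only). I have not checked what the linked report contains, so this is not a reproduction of it. The function names, the file extensions and the demo folder are assumptions, and the count is simply "runs of word characters", which will differ somewhat from the figures a CAT tool reports.

```python
import os
import re
import tempfile
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Gather text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def count_words(html):
    """Count runs of word characters in the visible text of one page."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(re.findall(r"\w+", " ".join(parser.parts)))


def site_summary(root):
    """Map each .htm/.html file under root (relative path) to its word count."""
    summary = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".htm", ".html")):
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    summary[os.path.relpath(path, root)] = count_words(handle.read())
    return summary


def print_report(summary):
    """Print one line per page, then a total."""
    for page in sorted(summary):
        print(f"{summary[page]:>8}  {page}")
    print(f"{sum(summary.values()):>8}  TOTAL")


# Demo on a throwaway folder standing in for a mirrored site.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "index.html"), "w", encoding="utf-8") as handle:
        handle.write("<h1>Welcome</h1><p>We sell widgets.</p>")
    with open(os.path.join(root, "about.htm"), "w", encoding="utf-8") as handle:
        handle.write("<p>About us</p><script>var hidden = 1;</script>")
    demo = site_summary(root)
    print_report(demo)
```

To try it on a real mirror, call `site_summary` with the folder your download tool created. Repeated text such as menus and footers is counted on every page, so the total will overstate the unique volume.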


 
Paul Betts
French to English
Prior declarations or content management set-ups... Apr 20, 2007

I have found that if a client is considering a relatively small static-html web site translation, they often prefer a complete price for the site (including graphical elements).

My quotes carry the attached condition that all URLs of web pages requiring translation are declared in advance, and I list in the quote the pages I have already seen.

On much larger dynamic (database-driven) sites, the clients often have domestic-language web programmers/developers on their team. Well, this is the ideal anyway.

If this is the case, I find it makes sense to ask that their developer add a column to their database for the language that I offer (as they will have to eventually) and create a simple content administration page which I can access with a password. This way I can see the text to be translated and, below it, a blank field in which to input, save and revise the equivalent new-language version.

When all is done, it takes no time for the developer to change the content reference variable in their web page template to the new language variable. It's all rather simple really.
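
To make the idea concrete, here is a rough Python/SQLite sketch of the extra-column approach. It is purely illustrative: the table `content`, the columns `text_en` and `text_es`, and both helper functions are invented for the example, and a real client's database and admin page will look different.

```python
import sqlite3

# Hypothetical content table as the client's site might already have it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, text_en TEXT)")
conn.executemany(
    "INSERT INTO content (text_en) VALUES (?)",
    [("Welcome",), ("Contact us",)],
)

# Developer's step: add a column for the new language.
conn.execute("ALTER TABLE content ADD COLUMN text_es TEXT")


def save_translation(conn, row_id, text):
    """What the admin page would do when the translator saves a field."""
    conn.execute("UPDATE content SET text_es = ? WHERE id = ?", (text, row_id))


def page_text(conn, lang):
    """What the page template would do: pick the column for the language."""
    columns = {"en": "text_en", "es": "text_es"}  # whitelist, never raw input
    column = columns[lang]
    rows = conn.execute(f"SELECT {column} FROM content ORDER BY id")
    return [row[0] for row in rows]


# Translator's step: fill in the new-language version row by row.
save_translation(conn, 1, "Bienvenido")
save_translation(conn, 2, "Contacto")

print(page_text(conn, "en"))
print(page_text(conn, "es"))
```

Switching the site to the new language then comes down to the template asking for `"es"` instead of `"en"`, which is the "content reference variable" change described above.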

[Edited at 2007-04-20 14:59]


 

