TMX Editor complains "File size too big"
Thread poster: Walter Blaser
Walter Blaser
Switzerland
Feb 6, 2015

Hi

When I open a large TMX file, Heartsome TMX Editor stops with the error "Other error: file size too big" (>= 1GB).

What do I need to do? I read that this editor can handle very large TMX files, so why do I get this error?

I have already increased the "-Xmx" setting in the INI file to "-Xmx4096m".

I have tried to read the article on your blog about opening large files, but the website doesn't respond (I always get a timeout).

Walter


 
Michael Beijer
United Kingdom
The Heartsome TMX Editor has a 1GB file size limit Feb 6, 2015

From a recent email conversation with Heartsome support (before they went out of business):

--------------------------
Hi Michael,

Thank you for your request.

By default, TMX Editor can use only 512 MB of RAM, which is enough to open TMX files of up to about 200~300 MB. Please try the following steps to increase the RAM allocation:
1. Open the installation folder of Heartsome TMX Editor.
2. Open the file "Heartsome TMX Editor Beta.ini" with a text editor (e.g. EmEditor or Notepad++; the default Notepad that comes with Windows is not recommended).
3. Find the value "-Xmx512m" in line 8 and change it to a larger value, for example "-Xmx1536m", which means TMX Editor can use at most 1.5 GB of RAM. Please note that the value cannot be greater than the free RAM on your computer.
4. Save the change and restart TMX Editor.
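
Step 3 can also be scripted. Here is a minimal Python sketch, assuming the INI file has one line that starts with -Xmx and ends in m, as in the example above. The helper name set_max_heap is made up for illustration and is not part of TMX Editor.

```python
import re


def set_max_heap(ini_text: str, megabytes: int) -> str:
    """Return ini_text with its -Xmx<N>m line set to the new size in MB."""
    # (?m) makes ^ match at the start of every line; -Xms lines are left alone.
    new_text, count = re.subn(r"(?m)^-Xmx\d+[mM]\b", f"-Xmx{megabytes}m", ini_text)
    if count == 0:
        raise ValueError("no -Xmx line found")
    return new_text


# Hypothetical INI content, only for demonstration.
sample = "-vmargs\n-Xms256m\n-Xmx512m\n"
updated = set_max_heap(sample, 1536)
print(updated)
```

Read the INI file into a string, pass it through the function, and write the result back. Keep a copy of the original file first.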

Please let us know if it works for you or not.

Kind regards,
Felix
--------------------------
Hi Felix,

I just tried to open a TMX of 1.5 GB, but the editor told me 1GB was the maximum. Is there any way to change this?

Michael
--------------------------
Hi Michael,

This limitation is inherited from our underlying XML parser, VTD-XML. Please refer to the following page for more information:
http://vtd-xml.sourceforge.net/userGuide/0.html (Upper limits of various fields, 3)

So now the software can use 1.5 GB RAM as you configured, but the size of the TMX file you wanted to open exceeds the above limit. Please try splitting your TMX file into smaller ones (less than 1 GB) manually by the following steps:
1. Back up your TMX file.
2. Open the TMX file with a text editor; EmEditor is recommended.
3. Split the file between < /tu> and < tu>. Copy everything from the start of the file up to < body> (including < body> itself), and then paste it at the very beginning of all the other split files.
4. Add < /body>< /tmx> to the very end of all split files except the last one.
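
The manual split described above could also be scripted along these lines. This is only a sketch under assumptions not stated in the email: the file is UTF-8, the body and closing tu tags are spelled exactly `<body>` and `</tu>`, and the limit is counted in characters rather than bytes (so leave some headroom below 1 GB). The function names are invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, TextIO

BODY_OPEN = "<body>"   # assumed to appear exactly like this, once
TU_CLOSE = "</tu>"     # assumed lowercase, without internal whitespace
FOOTER = "\n</body>\n</tmx>\n"


def read_units(src: TextIO, chunk_size: int = 1 << 20) -> Iterator[str]:
    """Yield the header (up to and including <body>), then one string per TU."""
    buf = ""
    marker = BODY_OPEN
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        while (pos := buf.find(marker)) >= 0:
            end = pos + len(marker)
            yield buf[:end]
            buf = buf[end:]
            marker = TU_CLOSE
    # What is left in buf is the original closing </body></tmx>; it is dropped
    # because every part gets its own footer.


def split_tmx(src_path: Path, out_dir: Path, max_chars: int) -> List[Path]:
    """Write part files of roughly max_chars characters each; return their paths."""
    parts: List[Path] = []
    out = None
    written = 0
    with open(src_path, encoding="utf-8") as src:
        units = read_units(src)
        header = next(units, None)
        if header is None:
            raise ValueError("no <body> element found")
        for tu in units:
            if out is None or written + len(tu) > max_chars:
                if out is not None:
                    out.write(FOOTER)
                    out.close()
                parts.append(out_dir / f"{src_path.stem}_part{len(parts) + 1}.tmx")
                out = open(parts[-1], "w", encoding="utf-8")
                out.write(header)      # every part starts with the copied header
                written = len(header)
            out.write(tu)
            written += len(tu)
    if out is not None:
        out.write(FOOTER)
        out.close()
    return parts


# Tiny demonstration file with four TUs, written to a temporary folder.
demo_dir = Path(tempfile.mkdtemp())
demo = demo_dir / "demo.tmx"
tus = "".join(
    f'\n<tu><tuv xml:lang="en"><seg>Segment {i}</seg></tuv></tu>' for i in range(4)
)
demo.write_text(
    '<?xml version="1.0"?>\n<tmx version="1.4"><header/>\n<body>'
    + tus + "\n</body>\n</tmx>\n",
    encoding="utf-8",
)
parts = split_tmx(demo, demo_dir, max_chars=200)
print([p.name for p in parts])
```

The file is read in chunks, so the whole TMX never has to fit in memory. Work on a backup copy, as in step 1.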

Kind regards,
Felix
--------------------------
Thanks Felix.

Are there any plans to increase the maximum size of TMXs that the tool can open? 1 GB is quite small; Olifant, e.g., can easily open files several GB in size. One of the reasons I bought this tool is to edit very large files. For example, I would like to use it to remove duplicates from very large TMXs or collections of TMXs. A max size of only 1 GB would mean I can't really do that anymore.

Michael
--------------------------
Hi Michael,

Sorry, there is currently no plan to change the XML parser. It seems that the VTD-XML team already knows about this limitation, so I'm not sure when they will improve it.

A workaround is to import your TMX files into a server-based translation memory (e.g. MySQL) and then edit the TM directly, because a (structured) database handles huge amounts of data far more efficiently than a single text-based TMX file.

Kind regards,
Felix
--------------------------
Hi Michael,

OK. There is a MySQL tutorial on our website: http://www.heartsome.net/EN/support.html

Kind regards,
Felix
--------------------------
PS: added a few extra spaces to make the HTML above display better
PPS: The built-in TMX editor in CafeTran does not have this limit.


 
Walter Blaser
Switzerland
TOPIC STARTER
Thanks for this info Feb 7, 2015

Hi Michael

Thanks for this useful, although disappointing info.

Walter


 
Walter Blaser
Switzerland
TOPIC STARTER
What about CafeTran? Feb 7, 2015

Hi Michael

It's me again. I just noticed that you are familiar with CafeTran. Can you confirm that CafeTran can open very large files? My TMX has about 2.2 million TUs and is about 2 GB on disk.

If it does, I could use CafeTran for some of the TM cleanup tasks, such as spellchecking.

Walter


 
Michael Beijer
United Kingdom
TMX editing in CafeTran Feb 7, 2015

Hi Walter,

Yes, as long as you assign enough RAM to CafeTran (under Edit > Options > Memory: Java memory size (MB)), it can open (and edit) very large files. I can't remember exactly, but I think I have opened and edited TMXs of up to around 2 GB in the past. In CT you can open a TMX and edit it just like any other file, so all the program's tools can be used on it (look under the Task menu).

A number of TMX tools are also available from the "Filter" button when a TMX is open.

Once open in CT, you can also run all the QA routines on the TMX.

Igor recently added quite a few new TMX editing/cleaning routines to CafeTran, so not all of them will have been documented yet in the (volunteer-authored) help website (by Hans Lenting) @ http://cafetranhelp.com/

See also:

http://www.proz.com/forum/cat_tools_technical_help/254891-best_way_to_perform_qa_of_translation_memory_file_tmx.html
http://cafetran.wikidot.com/editing-a-tmx-translation-memory
http://cafetran.wikidot.com/tmx
http://cafetran.wikidot.com/the-tmx-memory-dialog
http://cafetran.wikidot.com/performing-qa-checks (once open in CT, you can also run all the QA routines on the TMX)
http://cafetran.wikidot.com/the-task-menu (once open in CT, you can also run all the items in the Task menu on the TMX)

If you have any questions people are always ready to help over at the CT Google Group: https://groups.google.com/forum/#!forum/cafetranslators

Michael

PS: Another tool that can open very big TMXs is of course Xbench (http://www.xbench.net/ ), which has spellchecking as far as I am aware.
PPS: Please excuse the state of this post. It was written in haste, as I am supposed to be working on this fine Saturday in Hastings

[Edited at 2015-02-07 13:08 GMT]


 
2nl (X)
Netherlands
Collect all words with unknown spelling Feb 7, 2015

Open your TM in TMX edit mode via the Dashboard. Dump all unknown words in a list, e.g. to check it with Word's spell checker.

Tested with 2.4 GB.


 
Michael Beijer
United Kingdom
Task > List words with unknown spelling Feb 7, 2015

2nl wrote:

Open your TM in TMX edit mode via the Dashboard. Dump all unknown words in a list, e.g. to check it with Word's spell checker.

Tested with 2.4 GB.

Yes, this is one of Igor's latest inventions, and it can be found under Task > List words with unknown spelling.

Basically, start CafeTran and select "Edit TMX memory" in the New project dropdown in the Dashboard.

Once the TMX is open in the Grid, click Task > List words with unknown spelling, which opens a tab in the Tabbed pane listing all words not known by the spellchecker (CT uses either Hunspell or OpenOffice/LibreOffice). This is pretty cool and I don't think it has been done yet in any other CAT tool.

More info here: http://cafetran.wikidot.com/listing-words-with-unknown-spelling
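
For anyone curious about the general idea, here is a rough Python sketch of "list every word the spellchecker does not know". It is not CafeTran's implementation: a plain word list stands in for Hunspell, and the function name and sample data are made up.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by an apostrophe or hyphen.
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def unknown_words(segments, known_words):
    """Count every word in the segments that is not in known_words."""
    known = {w.lower() for w in known_words}
    counts = Counter()
    for seg in segments:
        for word in WORD.findall(seg):
            if word.lower() not in known:
                counts[word] += 1
    return counts


# Hypothetical segments and dictionary, only for demonstration.
segments = ["The translaton memory is large.", "A large memory needs more RAM."]
known = ["the", "translation", "memory", "is", "large", "a", "needs", "more", "ram"]
print(unknown_words(segments, known).most_common())
```

The resulting list can then be checked by eye or pasted into Word, as 2nl suggested.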


 
Emma Goldsmith
Spain
Xbench Feb 7, 2015

Michael Beijer wrote:

This is pretty cool and I don't think it has been done yet in any other CAT tool.


The Xbench spellcheck works like this. Default behaviour is to flag misspellings from the list, but you can reverse this and flag false alarms instead. Then, if you want, you can export the list to Word or elsewhere.


 
Michael Beijer
United Kingdom
Tools > Abbreviations > Scan Project for abbreviations Feb 7, 2015

Emma Goldsmith wrote:

Michael Beijer wrote:

This is pretty cool and I don't think it has been done yet in any other CAT tool.


The Xbench spellcheck works like this. Default behaviour is to flag misspellings from the list, but you can reverse this and flag false alarms instead. Then, if you want, you can export the list to Word or elsewhere.


Indeed. However, Xbench isn't a CAT tool.

Incidentally, the same trick can now also be applied to find all the unknown abbreviations in the text, which might otherwise trigger unwanted segmentation (splitting of segments). memoQ now has something similar.

See:

2015-01-21 New feature: Tools > Abbreviations > Scan Project for abbreviations. CT will list abbreviation candidates in the tabbed pane. Pressing button in list of candidates adds the relevant abbr. to the Abbreviations file (see 2015-01-20.1). Any one, two or three letter string + . is a candidate for the abbrs. list. See: http://cafetran.wikidot.com/abbreviations
2015-01-20.2 "Dynamic segmentation": CafeTran now auto-joins segments at any next occurrence of an abbreviation added via Tools > Abbreviations > Add selection to abbreviations (i.e., whole project is not re-segmented).

(src: http://cafetranhelp.com/changelog )
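
The candidate rule quoted in the changelog ("any one, two or three letter string + .") can be illustrated with a short regular expression. This is just a sketch of that rule as worded, not CafeTran's actual code; the function name and sample text are invented.

```python
import re

# One to three letters, starting at a word boundary, followed by a full stop.
CANDIDATE = re.compile(r"\b[^\W\d_]{1,3}\.")


def abbreviation_candidates(text):
    """Return the sorted, de-duplicated abbreviation candidates in text."""
    return sorted(set(CANDIDATE.findall(text)))


# Hypothetical sample text, only for demonstration.
text = "See fig. 3 and tab. 2 for Dr. Smith. The rest is listed in it."
print(abbreviation_candidates(text))
```

Note that short words at the end of a sentence (such as "it." here) also match, which is why they are only candidates that a human still has to confirm.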


 
2nl (X)
Netherlands
I've answered this in the CafeTran forum Feb 8, 2015

Hello Walter,

Please hop over to the CafeTran forum:

http://www.proz.com/forum/cafetran_support/281536-spell_checking_a_large_tmx_file.html

Cheers,

Hans


 
A Wei
China
heartsome terminology tools 8.0 help! Mar 18, 2015

Could you share Heartsome terminology tools 8.0 with me? Thank you!


 
Michael Beijer
United Kingdom
on GitHub Mar 19, 2015

A Wei wrote:

Could you share Heartsome terminology tools 8.0 with me? Thank you!


All the Heartsome tools are now open source and can be downloaded from GitHub:

https://github.com/heartsome


 

