TMX Editor complains "File size too big" Thread poster: Walter Blaser
Walter Blaser, Switzerland, French to German + ...
Hi,

When I open a large TMX file (>= 1 GB), Heartsome TMX Editor stops with the error "Other error: file size too big". What do I need to do? I read that this editor can handle very large TMX files, so why do I get this error? I have already increased the "-Xmx" INI setting to "-Xmx4096m".

I tried to read your blog article about opening large files, but the website doesn't respond (I always get a timeout).

Walter

Michael Beijer, United Kingdom, Member (2009), Dutch to English + ...
The Heartsome TMX Editor has a 1GB file size limit | Feb 6, 2015
From a recent email conversation with Heartsome support (before they went out of business):

--------------------------
Hi Michael,

Thank you for your request. By default, TMX Editor can use only 512 MB of RAM, which is enough to open TMX files of up to 200 to 300 MB. Please try the following steps to increase the RAM allocation:

1. Open the installation folder of Heartsome TMX Editor.
2. Open the file "Heartsome TMX Editor Beta.ini" with a text editor (e.g. EmEditor or Notepad++; the Notepad that comes with Windows is not recommended).
3. Find the value "-Xmx512m" in line 8 and change it to a larger value. For example, "-Xmx1536m" means TMX Editor can use at most 1.5 GB of RAM. Please note that the value cannot be greater than the free RAM on your computer.
4. Save the change and restart TMX Editor.

Please let us know whether it works for you.

Kind regards,
Felix
--------------------------
Hi Felix,

I just tried to open a 1.5 GB TMX, but the editor told me 1 GB was the maximum. Is there any way to change this?

Michael
--------------------------
Hi Michael,

This limitation is inherited from our underlying XML parser, VTD-XML. Please refer to the following page for more information: http://vtd-xml.sourceforge.net/userGuide/0.html (Upper limits of various fields, 3)

So the software can now use 1.5 GB of RAM as you configured, but the TMX file you want to open exceeds the above limit. Please try splitting your TMX file manually into smaller files (less than 1 GB each) as follows:

1. Back up your TMX file.
2. Open the TMX file with a text editor; EmEditor is recommended.
3. Split the file between a </tu> and the following <tu>. Copy everything from the beginning of the file up to and including <body>, and paste it at the very beginning of all the other split files.
4. Add </body></tmx> at the end of all split files except the last one.

Kind regards,
Felix
--------------------------
Thanks Felix. Are there any plans to increase the maximum size of TMXs that the tool can open?
1 GB is quite small; Olifant, for example, can easily open files several GB in size. One of the reasons I bought this tool was to edit very large files. For example, I would like to use it to remove duplicates from very large TMXs or collections of TMXs. A maximum size of only 1 GB would mean I can't really do that anymore.

Michael
--------------------------
Hi Michael,

Sorry, there are currently no plans to change the XML parser. The VTD-XML team seems to be aware of this limitation already, so I'm not sure when they will improve it. A workaround could be to import your TMX files into a server-based translation memory (e.g. MySQL) and then edit the TM directly, because a (structured) database handles huge amounts of data far more efficiently than a single text-based TMX file.

Kind regards,
Felix
--------------------------
Hi Michael,

OK. There is a MySQL tutorial on our website: http://www.heartsome.net/EN/support.html

Kind regards,
Felix
--------------------------

PS: The built-in TMX editor in CafeTran does not have this limit.

Walter Blaser, Switzerland, French to German + ..., TOPIC STARTER
Thanks for this info | Feb 7, 2015
Hi Michael,

Thanks for this useful, although disappointing, info.

Walter

Walter Blaser, Switzerland, French to German + ..., TOPIC STARTER
What about CafeTran? | Feb 7, 2015
Hi Michael,

It's me again. I just noticed that you are familiar with CafeTran. Can you confirm that CafeTran can open very large files? My TMX has about 2.2 million TUs and is about 2 GB in size. If it can, I could use CafeTran for some of the TM cleanup tasks, such as spellchecking.

Walter
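For anyone who has to stay with Heartsome, Felix's manual splitting procedure earlier in the thread (keep the header up to and including <body>, cut between </tu> and <tu>, close every part with </body></tmx>) can also be scripted, which avoids loading a multi-gigabyte file into a text editor. The following is only a minimal Python sketch, not a Heartsome or CafeTran feature. It assumes a UTF-8, pretty-printed TMX in which the <body> start tag and every </tu> end tag finish their own line, and that no comment or CDATA section contains a literal "</tu>" or "</body>"; the function names and the ".partN.tmx" output naming are hypothetical.

```python
from typing import Iterable, Iterator, List

# Closing tags appended to every part (the original closing tags are dropped).
FOOTER = "</body>\n</tmx>\n"


def split_tmx_lines(lines: Iterable[str], max_tus: int) -> Iterator[str]:
    """Yield complete TMX documents holding at most max_tus <tu> elements each.

    Assumes a pretty-printed TMX where the <body> start tag and every </tu>
    end tag finish a line.
    """
    if max_tus < 1:
        raise ValueError("max_tus must be at least 1")
    it = iter(lines)

    # 1. Everything up to and including the <body> line is the shared header.
    header: List[str] = []
    for line in it:
        header.append(line)
        if "<body" in line:
            break
    else:
        raise ValueError("no <body> element found")
    head = "".join(header)

    # 2. Collect TU lines and cut after every max_tus-th </tu>.
    chunk: List[str] = []
    count = 0
    for line in it:
        if "</body>" in line:
            break  # the original closing tags are replaced by FOOTER
        chunk.append(line)
        if "</tu>" in line:
            count += 1
            if count == max_tus:
                yield head + "".join(chunk) + FOOTER
                chunk, count = [], 0

    # 3. Flush the remaining TUs, if any.
    if count:
        yield head + "".join(chunk) + FOOTER


def split_tmx_file(path: str, max_tus: int) -> int:
    """Write path.part1.tmx, path.part2.tmx, ... and return the number of parts.

    The input is read line by line; at most one part is held in memory at a time.
    """
    parts = 0
    with open(path, encoding="utf-8") as src:
        for parts, doc in enumerate(split_tmx_lines(src, max_tus), start=1):
            with open(f"{path}.part{parts}.tmx", "w", encoding="utf-8") as out:
                out.write(doc)
    return parts
```

For example, split_tmx_file("memory.tmx", 800_000) would write memory.tmx.part1.tmx, memory.tmx.part2.tmx and so on. With Walter's figures (about 2.2 million TUs in about 2 GB, so roughly 0.9 KB per TU on average), 800,000 TUs per part would come to roughly 0.7 GB per part, under the 1 GB limit; TU sizes vary, though, so check the part sizes before opening them.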
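Felix's database workaround from earlier in the thread also fits Michael's duplicate-removal use case. As a rough sketch of the idea, the Python code below swaps in SQLite (bundled with Python as the sqlite3 module, file-based rather than server-based) for MySQL, streams the TMX with xml.etree.ElementTree.iterparse so the whole file is never built in memory at once, and keeps only the first TU for each distinct set of (language, segment text) pairs. Treating two TUs as duplicates only when all their languages and texts match is an assumption, inline markup such as <bpt> or <ph> is flattened to its text, and the function names are hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Union

# ElementTree's name for the xml:lang attribute (TMX 1.4); TMX 1.1 used "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_key(tu: ET.Element) -> str:
    """Build a duplicate-detection key from every (language, segment text) pair.

    Inline markup inside <seg> is flattened to its text (an assumption).
    Sorting makes the key independent of the <tuv> order.
    """
    pairs = []
    for tuv in tu.iter("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        pairs.append(lang + "\x1f" + text)
    return "\x1e".join(sorted(pairs))


def unique_tus(source: Union[str, BinaryIO], conn: sqlite3.Connection) -> Iterator[str]:
    """Stream a TMX file and yield each first-seen <tu> serialized as XML."""
    conn.execute("CREATE TABLE IF NOT EXISTS seen (k TEXT PRIMARY KEY)")
    body = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "body":
            body = elem  # keep a handle so processed TUs can be detached
        elif event == "end" and elem.tag == "tu":
            cur = conn.execute(
                "INSERT OR IGNORE INTO seen (k) VALUES (?)", (tu_key(elem),)
            )
            if cur.rowcount == 1:  # 1 = new key inserted, 0 = duplicate ignored
                yield ET.tostring(elem, encoding="unicode")
            if body is not None:
                body.clear()  # drop processed TUs so memory stays bounded
    conn.commit()


if __name__ == "__main__":
    sample = b"""<tmx version="1.4"><header srclang="en"/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="nl"><seg>Hallo</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="nl"><seg>Hallo</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Bye</seg></tuv><tuv xml:lang="nl"><seg>Dag</seg></tuv></tu>
</body></tmx>"""
    db = sqlite3.connect(":memory:")
    for tu in unique_tus(io.BytesIO(sample), db):
        print(tu.strip())
```

Writing the surviving TUs back out (header, <body>, the yielded TUs, </body></tmx>) is left out to keep the sketch short; for multi-gigabyte inputs, pointing sqlite3.connect at a file instead of ":memory:" keeps the set of seen keys off the heap.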
Michael Beijer, United Kingdom, Member (2009), Dutch to English + ...
TMX editing in CafeTran | Feb 7, 2015
Hi Walter,

Yes. As long as you assign enough RAM to CafeTran (under Edit > Options > Memory: Java memory size (MB)), it can open (and edit) very large files. I can't remember exactly, but I think I have opened and edited TMXs of up to around 2 GB in the past.

In CT you can open a TMX and edit it just like any other file, so all the program's tools can be used on it (look under the Task menu). A number of TMX tools are also available from the "Filter" button when a TMX is open. Once a TMX is open in CT, you can also run all the QA routines on it.

Igor recently added quite a few new TMX editing/cleaning routines to CafeTran, so not all of them will have been documented yet on the (volunteer-authored) help website by Hans Lenting: http://cafetranhelp.com/

See also:
http://www.proz.com/forum/cat_tools_technical_help/254891-best_way_to_perform_qa_of_translation_memory_file_tmx.html
http://cafetran.wikidot.com/editing-a-tmx-translation-memory
http://cafetran.wikidot.com/tmx
http://cafetran.wikidot.com/the-tmx-memory-dialog
http://cafetran.wikidot.com/performing-qa-checks (once open in CT, you can also run all the QA routines on the TMX)
http://cafetran.wikidot.com/the-task-menu (once open in CT, you can also run all the items in the Task menu on the TMX)

If you have any questions, people are always ready to help over at the CT Google Group: https://groups.google.com/forum/#!forum/cafetranslators

Michael

PS: Another tool that can open very big TMXs is of course Xbench (http://www.xbench.net/), which has spellchecking as far as I am aware.
PPS: Please excuse the state of this post. It was written in haste, as I am supposed to be working on this fine Saturday in Hastings.
[Edited at 2015-02-07 13:08 GMT]

2nl (X), Netherlands
Collect all words with unknown spelling | Feb 7, 2015
Open your TM in TMX edit mode via the Dashboard. Dump all unknown words into a list, e.g. to check it with Word's spell checker. Tested with 2.4 GB.

Michael Beijer, United Kingdom, Member (2009), Dutch to English + ...
Task > List words with unknown spelling | Feb 7, 2015
2nl wrote:
"Open your TM in TMX edit mode via the Dashboard. Dump all unknown words in a list, e.g. to check it with Word's spell checker. Tested with 2.4 GB."

Yes, this is one of Igor's latest inventions, and can be found under Task > List words with unknown spelling.

Basically, start CafeTran and select "Edit TMX memory" in the New project dropdown in the Dashboard. Once the TMX is open in the Grid, click Task > List words with unknown spelling, which will open a tab in the Tabbed pane listing all words not known by the spellchecker (CT uses either Hunspell or OpenOffice/LibreOffice). This is pretty cool, and I don't think it has been done yet in any other CAT tool.

More info here: http://cafetran.wikidot.com/listing-words-with-unknown-spelling

Emma Goldsmith, Spain, Member (2004), Spanish to English
Michael Beijer wrote:
"This is pretty cool and I don't think it has been done yet in any other CAT tool."

The Xbench spellcheck works like this. The default behaviour is to flag misspellings from the list, but you can reverse this and flag false alarms instead. Then, if you want, you can export the list to Word or elsewhere.
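Both features described above (CafeTran's Task > List words with unknown spelling and the Xbench spellcheck export) come down to collecting every word the dictionary doesn't recognize, counted once per form, so the list can be reviewed or pasted into Word. A toy Python sketch of that idea, not how either tool implements it: a plain set of lowercase words stands in for a Hunspell dictionary, the word pattern (runs of letters joined by an apostrophe or hyphen) is a simplification, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Runs of letters, optionally joined by an apostrophe or hyphen ("don't", "e-mail").
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def unknown_words(segments: Iterable[str], known: Set[str]) -> Counter:
    """Count words (case preserved) whose lowercase form is not in `known`."""
    counts: Counter = Counter()
    for seg in segments:
        for word in WORD.findall(seg):
            if word.lower() not in known:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    dictionary = {"the", "cat", "sat", "on", "mat"}
    report = unknown_words(["The cat sat on the mat.", "Teh cat sat."], dictionary)
    for word, n in report.most_common():
        print(f"{word}\t{n}")  # one line per unknown word, most frequent first
```

With a real spellchecker, the `word.lower() not in known` test would be replaced by a dictionary lookup that also handles affixes and capitalization rules.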
Michael Beijer, United Kingdom, Member (2009), Dutch to English + ...
Tools > Abbreviations > Scan Project for abbreviations | Feb 7, 2015
Emma Goldsmith wrote:
"Michael Beijer wrote: This is pretty cool and I don't think it has been done yet in any other CAT tool.
The Xbench spellcheck works like this. Default behaviour is to flag misspellings from the list, but you can reverse this and flag false alarms instead. Then, if you want, you can export the list to Word or elsewhere."

Indeed. However, Xbench isn't a CAT tool.

Incidentally, the same trick can now also be applied to find all the unknown abbreviations in the text, which might otherwise trigger unwanted segmentation (splitting of segments). memoQ now has something similar. See:

2015-01-21 New feature: Tools > Abbreviations > Scan Project for abbreviations. CT will list abbreviation candidates in the tabbed pane. Pressing the button in the list of candidates adds the relevant abbr. to the Abbreviations file (see 2015-01-20.1). Any one-, two- or three-letter string + . is a candidate for the abbrs. list. See: http://cafetran.wikidot.com/abbreviations

2015-01-20.2 "Dynamic segmentation": CafeTran now auto-joins segments at any next occurrence of an abbreviation added via Tools > Abbreviations > Add selection to abbreviations (i.e., the whole project is not re-segmented).

(src: http://cafetranhelp.com/changelog)

A Wei, China
heartsome terminology tools 8.0 help! | Mar 18, 2015
Walter Blaser wrote:
"When opening a large TMX file, Heartsome TM Editor stops with the error "Other error: file size too big" >= 1GB. [...]"

Could you share heartsome terminology tools 8.0 with me? Thank you!

Michael Beijer, United Kingdom, Member (2009), Dutch to English + ...
A Wei wrote:
"Walter Blaser wrote: [...] Could you share heartsome terminology tools 8.0 with me? Thank you!"

All the Heartsome tools are now open source and can be downloaded from GitHub: https://github.com/heartsome
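The CafeTran changelog entries Michael quotes in his abbreviations post describe two simple rules: any one-, two- or three-letter string followed by a period is an abbreviation candidate, and once an abbreviation has been added, a segment break right after it is undone ("dynamic segmentation"). The Python sketch below only illustrates those two rules as stated; it is not CafeTran's code, it treats chains such as "e.g." as a single candidate (an assumption), and both function names are hypothetical. Ordinary short words at the end of a sentence ("the end.") also match, which is why the results are only candidates to be confirmed.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# One to three letters followed by ".", repeated for chains such as "e.g.".
ABBR = re.compile(r"\b(?:[^\W\d_]{1,3}\.)+")


def abbreviation_candidates(segments: Iterable[str]) -> Counter:
    """Count every 1-3 letter string (or chain of them) that ends with a period."""
    found: Counter = Counter()
    for seg in segments:
        found.update(ABBR.findall(seg))
    return found


def join_at_abbreviations(segments: Iterable[str], abbreviations: Set[str]) -> List[str]:
    """Merge a segment into the previous one when that one ends with a known abbreviation."""
    joined: List[str] = []
    for seg in segments:
        last_tokens = joined[-1].split() if joined else []
        if last_tokens and last_tokens[-1] in abbreviations:
            # Undo the unwanted segment break after the abbreviation.
            joined[-1] = joined[-1].rstrip() + " " + seg.lstrip()
        else:
            joined.append(seg)
    return joined


if __name__ == "__main__":
    text = ["See e.g. Fig.", "3 and approx. 5 cm.", "Done."]
    print(abbreviation_candidates(text))
    print(join_at_abbreviations(text, {"Fig."}))
```

Here "approx." is not a candidate because it has more than three letters, while "cm." is, even though it ends its segment; a user would then decide which candidates really belong in the abbreviations file.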