Conversion of a large terminological db from tbx to multiterm db
Thread poster: Rossano Rossi
My system is Windows 10 (64-bit, 8 GB RAM, Intel Core i3-3240 CPU at 3.40 GHz). I am trying to convert a tbx file with 5268366 entries to MultiTerm format. However, after several minutes SDL MultiTerm 2015 Convert aborts with an exception of type "System.OutOfMemoryException". I extracted 10000 entries from the db (and rebuilt a TBX-compliant file), and that subset went through without errors (conversion to .mtf.xml, generation of the .xdt, and finally import into a MultiTerm termbase). Are there limits on the size (number of entries) of a tbx file that MultiTerm Convert can handle? And are there limits on the number of entries a MultiTerm SQLite db can contain? TIA, Rossano
[Edited at 2019-03-15 08:13 GMT]

DZiW (X), Ukraine, English to Russian + ... | SQLITE_MAX_LENGTH = 1'000'000'000 | Mar 14, 2019
Hello Rossano, it's not so much about hard limits as about the database abstraction layer and its implementation. In practice the limit is set by the hardware and architecture: a single file works without performance issues up to somewhere between 50 and 300 GB. As a rule of thumb, the database should take up less than 60% of the storage partition. How big is your file, and can you make sure it isn't corrupted? If you're a DBA or otherwise technical, check memory usage and the system log to see what else could be triggering the error.

Michael Beijer, United Kingdom, Local time: 19:49, Member (2009), Dutch to English + ...
Rossano Rossi wrote:
My system is Windows 10 (64-bit system with 8 GB RAM and Intel Core i3-3240 CPU [3.40 GHz]). I am trying to convert a tmx file with 5268366 entries to Multiterm format. However SDL MultiTerm 2015 Convert interrupts, after several minutes, the execution with an exception of type "SystemOutOfMemoryException". I have extracted 10000 entries from the db (and reconstructed a tmx-compliant file) and the process was completed without errors (conversion to .mtf.xml, generation of .xdt and finally import into a Multiterm termbase). Are there limits on the size (number of entries) of a tmx file that Multiterm Convert can convert? Also, are there limits on number of entries a Multiterm sqlite db can contain? TIA, Rossano

What's the structure of the data in the TMX? How big is the TMX? How about splitting the big TMX into smaller chunks and trying those? If there is no metadata, or you don't mind if it gets mangled, Xbench will import the TMX, and you can then export all the entries to, e.g., .xlsx. Open that, divide it into smaller pieces, and try importing those.

Michael
[Edited at 2019-03-14 21:18 GMT]

DZiW (X), Ukraine, English to Russian + ...
Michael, Excel is not a real database: XLS (Excel 97-2003) is limited to 65'536 rows by 256 columns, whereas XLSX (Excel 2007+) allows 1'048'576 rows by 16'384 columns, with up to 32'767 characters per cell. A quick search turned up recommendations to keep an SQL database file under 5-25 GB on Windows x64 and up to 200 GB on *nix.
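Since the thread leans on SQLite limits, one caveat and a quick check may help. SQLITE_MAX_LENGTH (default 1'000'000'000 bytes) caps the size of a single string or BLOB, not of the whole database; the ceiling on the database file is page_size × max_page_count. The sketch below uses Python's standard sqlite3 module to read both pragmas from whichever SQLite build Python links against. It assumes, as the question does, that the termbase is SQLite-backed; it says nothing about MultiTerm's own limits, and the function name is only illustrative.

```python
import sqlite3


def max_db_size_bytes(path: str = ":memory:") -> int:
    """Return page_size * max_page_count, the largest the database may grow.

    `path` defaults to a throwaway in-memory database. Note that
    sqlite3.connect() creates an empty file if `path` does not exist.
    """
    conn = sqlite3.connect(path)
    try:
        (page_size,) = conn.execute("PRAGMA page_size").fetchone()
        (max_pages,) = conn.execute("PRAGMA max_page_count").fetchone()
    finally:
        conn.close()
    return page_size * max_pages


if __name__ == "__main__":
    size = max_db_size_bytes()
    print(f"SQLite library version: {sqlite3.sqlite_version}")
    print(f"Maximum database size: {size / 2**40:.1f} TiB")
```

The exact figure depends on the SQLite version and compile-time options, which is why the sketch queries it instead of hard-coding it.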
Rossano Rossi, Local time: 20:49, English to Italian + ... | TOPIC STARTER | Format of db | Mar 15, 2019
Correction to my post above: "tmx" should be "tbx". I have edited my post so as not to mislead other readers. This is the header and one entry of my cleaned-up tbx file:
---------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">
<martif type="TBX-Default" xml:lang="en">
  <martifHeader>
    <fileDesc>
      <sourceDesc>
        <p>This is a TBX file downloaded from the IATE website. Address any enquiries to [email protected].</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p type="XCSURI">TBXXCS.xcs</p>
    </encodingDesc>
  </martifHeader>
  <text>
    <body>
      <termEntry id="IATE-84">
        <langSet xml:lang="de">
          <tig>
            <term>Zuständigkeit der Mitgliedstaaten</term>
          </tig>
        </langSet>
        <langSet xml:lang="en">
          <tig>
            <term>competence of the Member States</term>
          </tig>
        </langSet>
      </termEntry>
--------------------------------------------------------------------------------
There are 6701542 lines and 488245 entries (termEntry elements). The file is 168 MB. SDL MultiTerm 2015 Convert cannot convert it to .mtf.xml (it fails with "System.OutOfMemoryException"), but it can convert a subset of 10000 entries.
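Figures like the entry count above can be checked without loading the whole 168 MB file into memory. A minimal sketch, assuming Python 3 and the element names shown in the excerpt: it streams the file with xml.etree.ElementTree.iterparse, counts termEntry elements and langSet elements per language, and clears finished entries from body so memory use stays roughly flat. The function name count_tbx is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree expands the xml:lang attribute to this fully qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_tbx(path):
    """Stream a TBX file; return (number of termEntry elements, Counter of langSet languages)."""
    entries = 0
    langs = Counter()
    body = None
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start" and elem.tag == "body":
            body = elem  # parent of every termEntry in the excerpt's layout
        elif event == "end" and elem.tag == "termEntry":
            entries += 1
            for lang_set in elem.iter("langSet"):
                langs[lang_set.get(XML_LANG)] += 1
            if body is not None:
                body.clear()  # drop finished entries so memory stays roughly flat
    return entries, langs


if __name__ == "__main__" and len(sys.argv) > 1:
    n, per_lang = count_tbx(sys.argv[1])
    print(f"{n} termEntry elements")
    for lang, k in per_lang.most_common():
        print(f"  {lang}: {k} langSet elements")
```

Run as `python count_tbx.py file.tbx` (script name hypothetical); the per-language counts also show at a glance whether some entries lack a language pair.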
[Edited at 2019-03-15 08:14 GMT]
[Edited at 2019-03-15 08:15 GMT]

DZiW (X), Ukraine, English to Russian + ... | https://gateway.sdl.com/CommunityHome | Mar 15, 2019
While SDL generally recommends using the latest version to avoid glitches and memory-leak issues, they say this error may also be caused by ampersands, even escaped ones (&amp;), in some fields. Little wonder that even machines with 32+ GB of RAM are no guarantee.

Rossano Rossi, Local time: 20:49, English to Italian + ... | TOPIC STARTER | TBX to multiterm DB | Mar 15, 2019
I have upped the ante to a subset of 100,000 entries. MultiTerm Convert can handle a chunk of that size, and MultiTerm can import it. This seems a reasonable solution for the full set of 488245 entries (i.e., five chunks of at most 100,000 entries each). Thanks for your support. Rossano
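The chunking can be scripted instead of done by hand. The sketch below is one way to do it, assuming the TBX layout of the excerpt posted earlier in the thread (a martifHeader followed by termEntry elements inside text/body) and reusing that excerpt's DOCTYPE line verbatim: it streams the source file and writes each block of chunk_size entries to its own file, each with a copy of the header. split_tbx and the chunk_NNN.tbx naming are hypothetical, and the output has not been tried with MultiTerm Convert.

```python
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XML_NS = "{http://www.w3.org/XML/1998/namespace}"
# Assumption: every chunk reuses the DOCTYPE line from the excerpt in the thread.
DOCTYPE = '<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">'


def _start_tag(elem):
    """Rebuild an element's start tag, writing the xml: namespace back as a prefix."""
    attrs = "".join(
        f" {name.replace(XML_NS, 'xml:')}={quoteattr(value)}"
        for name, value in elem.attrib.items()
    )
    return f"<{elem.tag}{attrs}>"


def split_tbx(path, chunk_size=100_000, prefix="chunk"):
    """Copy every `chunk_size` termEntry elements of `path` into prefix_NNN.tbx files.

    Streams the input, so memory use depends on chunk_size rather than on
    the size of the whole file. Returns the list of file names written.
    """
    written, out, in_chunk = [], None, 0
    root_tag, header, body = "<martif>", "", None

    def close_chunk():
        out.write("</body>\n</text>\n</martif>\n")
        out.close()

    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start" and elem.tag == "martif":
            root_tag = _start_tag(elem)
        elif event == "start" and elem.tag == "body":
            body = elem
        elif event == "end" and elem.tag == "martifHeader":
            elem.tail = "\n"
            header = ET.tostring(elem, encoding="unicode")
        elif event == "end" and elem.tag == "termEntry":
            if out is None:  # start a new chunk file with the shared header
                name = f"{prefix}_{len(written) + 1:03d}.tbx"
                out = open(name, "w", encoding="utf-8")
                out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                out.write(f"{DOCTYPE}\n{root_tag}\n{header}<text>\n<body>\n")
                written.append(name)
            elem.tail = "\n"
            out.write(ET.tostring(elem, encoding="unicode"))
            in_chunk += 1
            if body is not None:
                body.clear()  # entries already written are no longer needed
            if in_chunk == chunk_size:
                close_chunk()
                out, in_chunk = None, 0
    if out is not None:
        close_chunk()
    return written


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in split_tbx(sys.argv[1]):
        print(name)
```

With the figures from this thread, split_tbx("iate.tbx") (file name hypothetical) would write five files: four with 100,000 entries and one with 88,245.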
[Edited at 2019-03-15 15:55 GMT]