Software that lets you join TUs at a given character?
Thread poster: Hans Lenting

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
Feb 14

Is there a tool (TMX editor, CAT tool) that lets you join segments at a given character (e.g. $), so that I can fix a TMX that has been split incorrectly at abbreviations?

This is an ex$
of a feature.

Dies ist ein Beispiel
einer Funktion.

Becomes:

This is an ex$ of a feature.

Dies ist ein Beispiel einer Funktion.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 00:38
Member (2006)
English to Afrikaans
@Hans Feb 14

Hans Lenting wrote:
This is an ex$
of a feature.


Just how certain are you that all instances that end in $ must be joined with the next segment? And how sure are you that the TM hasn't been sorted (e.g. alphabetically) in the meantime? And... what OS are you on?


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Flexible Feb 14

Samuel Murray wrote:

Hans Lenting wrote:
This is an ex$
of a feature.


Just how certain are you that all instances that end in $ must be joined with the next segment? And how sure are you that the TM hasn't been sorted (e.g. alphabetically) in the meantime? And... what OS are you on?


I replace the full stop after abbreviations with a dollar sign. So I specify exactly where the joining has to take place.

No sorting.

Mac and Win

I feel an AutoIt macro coming...


 

Dan Lucas  Identity Verified
United Kingdom
Local time: 23:38
Member (2014)
Japanese to English
Search and replace Feb 14

Hans Lenting wrote:
I feel an AutoIt macro coming...

If you're dealing with what are basically XML files, wouldn't some kind of grep utility be both quicker and more reliable? I guess the issue is ensuring that you only replace text inside segments, so you'd need to be quite confident that you understood the structure of the file... Perhaps an XPath editor would be useful.
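
A minimal sketch of the XML route in Python, using only the standard library. It assumes a TMX whose `<tu>` elements sit directly under `<body>`, whose `<seg>` elements hold plain text (no inline tags), and whose units are still in document order. The function name `join_tus` and the sample data are made up for illustration; the `$` marker is the one from the original question and is left in place so it can be replaced afterwards.

```python
import xml.etree.ElementTree as ET

MARKER = "$"  # assumption: every wrongly split source segment ends with this


def join_tus(body, marker=MARKER):
    """Merge each <tu> whose first <seg> ends with `marker` into the next <tu>.

    Assumes plain-text segments and the same <tuv> order in every <tu>.
    """
    tus = list(body.findall("tu"))
    i = 0
    while i < len(tus) - 1:
        first, second = tus[i], tus[i + 1]
        source_seg = first.find("tuv/seg")
        source_text = (source_seg.text or "") if source_seg is not None else ""
        if source_text.rstrip().endswith(marker):
            # Join source with source and target with target, pairwise.
            for seg_a, seg_b in zip(list(first.iter("seg")), list(second.iter("seg"))):
                left = (seg_a.text or "").rstrip()
                right = (seg_b.text or "").lstrip()
                seg_a.text = left + " " + right
            body.remove(second)
            del tus[i + 1]
            # Stay on the same unit: the merged text may end with the marker again.
        else:
            i += 1
    return body


SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>This is an ex$</seg></tuv>
<tuv xml:lang="de"><seg>Dies ist ein Beispiel</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>of a feature.</seg></tuv>
<tuv xml:lang="de"><seg>einer Funktion.</seg></tuv></tu>
</body></tmx>"""

root = ET.fromstring(SAMPLE)
body = join_tus(root.find("body"))
for seg in body.iter("seg"):
    print(seg.text)
```

Working on parsed elements rather than on raw text means only segment content is touched, which is the concern raised above. Real TMX files with inline tags inside `<seg>` would need extra handling that this sketch does not attempt.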

Dan


 

Stepan Konev  Identity Verified
Russian Federation
Local time: 01:38
English to Russian
With Heartsome TMX Editor Feb 14

you can convert your TMX file to MS Word as a simple table, join cells where necessary, and convert the edited file back to TMX.
I also have an AHK script that finds $ and merges the relevant source and target cells. However, you mentioned that you use AutoIt.

[Edited at 2021-02-15 00:43 GMT]


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Via Excel Feb 15

Here is how I am planning to solve this task:

  • At the source side, I append a unique sign (e.g. ¶) to all abbreviations that caused incorrect segmentation.
  • Then I export the project to a table for external review.
  • I paste the content of the table to a spreadsheet in Excel.
  • I insert a simple formula in every cell of columns B and E that pulls in the second part of each incorrectly split segment.
  • I copy everything to a text editor and filter on all lines containing a ¶.
  • I make the necessary replacements to get the correctly segmented, 2-column version.
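
For comparison, the same end result can be sketched in a few lines of Python once the table is available as (source, target) pairs. This is only an illustration under assumptions: the name `merge_rows` is hypothetical, the marker is taken to be the very last character(s) of the source cell, and the marker is dropped when the rows are joined.

```python
def merge_rows(rows, marker="¶"):
    """Join every row whose source ends with `marker` to the row after it."""
    merged = []
    carry = None  # a marked row waiting for its continuation
    for source, target in rows:
        if carry is not None:
            source = carry[0] + " " + source
            target = carry[1] + " " + target
            carry = None
        if source.endswith(marker):
            # Drop the marker and hold the row until the next one arrives.
            carry = (source[: -len(marker)], target)
        else:
            merged.append((source, target))
    if carry is not None:
        merged.append(carry)  # marked last row with nothing left to join
    return merged


rows = [
    ("This is an ex.¶", "Dies ist ein Beispiel"),
    ("of a feature.", "einer Funktion."),
    ("Next segment.", "Nächstes Segment."),
]
for source, target in merge_rows(rows):
    print(source, "|", target)
```

Unlike the spreadsheet route, this leaves no leftover rows to filter out, because the continuation row is consumed as part of the merge.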


[Screenshots 1, 2 and 3]


 

Dan Lucas  Identity Verified
United Kingdom
Local time: 23:38
Member (2014)
Japanese to English
Sounds like a plan Feb 15

Hans Lenting wrote:
Here is how I am planning to solve this task:
[Screenshot 1]

Looks sort of reasonable. Won't that leave you with "extra" rows (like row 2 in the above) to deal with? I guess you just filter in Excel to show rows equal to "0" in column B, and delete them.

Also, you want something absolutely unique for your symbol/s - maybe something like "^^^" - to be sure. I doubt you'd find a pilcrow symbol in the text (as opposed to an actual new line) but I can't help feeling you'd be tempting fate by using something that could possibly be misinterpreted!
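
One way to avoid tempting fate is to test candidate markers against the exported text before choosing one. A small sketch in Python; the function name `unused_markers` and the candidate list are assumptions for illustration only.

```python
def unused_markers(text, candidates=("¶", "$", "^^^")):
    """Return the candidate markers that do not occur anywhere in `text`."""
    return [marker for marker in candidates if marker not in text]


sample = "The price is $5. See the note marked ¶ below."
print(unused_markers(sample))
```

Any marker the function returns is safe to use for that particular export; an empty list means none of the candidates is unique and another one is needed.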

Regards,
Dan


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
BBEdit Feb 15

Dan Lucas wrote:

Hans Lenting wrote:
Here is how I am planning to solve this task:
[Screenshot 1]

Looks sort of reasonable. Won't that leave you with "extra" rows (like row 2 in the above) to deal with? I guess you just filter in Excel to show rows equal to "0" in column B, and delete them.


I'll copy everything to BBEdit and use its "Process Lines Containing" feature there.

Screenshot 2021-02-15 at 08.52.27


 

Stepan Konev  Identity Verified
Russian Federation
Local time: 01:38
English to Russian
Heartsome again Feb 15

I forgot to mention that Heartsome can also convert TMX files to Excel spreadsheets and back again.
Just in case...

[Edited at 2021-02-15 11:47 GMT]


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Another approach in Excel Feb 15

With these two formulas, you can merge cells that contain a pilcrow:

Source segment:

=IF(ISNUMBER(SEARCH("¶";A1));(LEFT(A1;LEN(A1)-1))&" "&A2;"")

Target segment:

=IF(ISNUMBER(SEARCH("¶";A1));B1&" "&B2;"")
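
To make explicit what the two formulas return row by row, here is a rough Python mirror of them. It is a sketch, not a replacement: `helper_columns` is a made-up name, and the rows are assumed to be (source, target) pairs in sheet order. Like the formulas, it looks for the pilcrow anywhere in the source cell but always cuts off the last character, so it only gives the intended result when the pilcrow is the final character of the cell.

```python
def helper_columns(rows, marker="¶"):
    """Return one (source, target) helper value per row, like the formulas."""
    results = []
    for i, (source, target) in enumerate(rows):
        if marker in source:  # SEARCH(...) matches anywhere in the cell
            # An empty cell below behaves like an empty string in Excel.
            next_source, next_target = rows[i + 1] if i + 1 < len(rows) else ("", "")
            # LEFT(A1;LEN(A1)-1) drops the final character of the source.
            results.append((source[:-1] + " " + next_source, target + " " + next_target))
        else:
            results.append(("", ""))  # unmarked rows give empty helper cells
    return results


rows = [
    ("This is an ex.¶", "Dies ist ein Beispiel"),
    ("of a feature.", "einer Funktion."),
]
for source, target in helper_columns(rows):
    print(repr(source), repr(target))
```

The empty results for unmarked rows correspond to the blank helper cells that still have to be filtered out in the spreadsheet.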

Screenshot 2021-02-15 at 17.44.35

Note that in order to make this work, you'll have to replace TAB characters inside segments with placeholders (e.g. ¬). (You can put the TAB characters back afterwards.)
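
The TAB swap can also be scripted. A minimal sketch in Python, assuming ¬ as the placeholder; the function names are hypothetical, and the check simply refuses to proceed when the placeholder already occurs in a segment, since restoring would then turn it into a TAB by mistake.

```python
TAB_PLACEHOLDER = "¬"  # assumption: this character occurs nowhere in the segments


def protect_tabs(segment, placeholder=TAB_PLACEHOLDER):
    """Replace TAB characters with the placeholder before pasting into Excel."""
    if placeholder in segment:
        raise ValueError("placeholder already occurs in segment: " + repr(segment))
    return segment.replace("\t", placeholder)


def restore_tabs(segment, placeholder=TAB_PLACEHOLDER):
    """Put the TAB characters back after the merge."""
    return segment.replace(placeholder, "\t")


original = "Name\tValue"
protected = protect_tabs(original)
print(protected)
print(restore_tabs(protected) == original)
```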

Propagating the formulas downwards in Excel:

If the formula is in the first cell of a column:

  • Select the entire column by clicking the column header or selecting any cell in the column and pressing CTRL+SPACE.
  • Fill down by pressing CTRL+D.



[Edited at 2021-02-16 13:05 GMT]


 

