CAT for translating srt files.
Thread poster: Nasriddin Klichev

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
CafeTran Espresso 10 Croissant supports the SRT format May 27, 2019

Nasriddin Klichev wrote:

Hi all,

What CAT tools are best for translating SRT files? So far I have tried MATECAT, which did not help: it translated the caption groups separately, out of context. SMARTCAT was almost there, merging several caption groups into one segment, BUT one has to insert the line break tags into the target manually, which creates a lot of routine work. In short, I need something like SMARTCAT but with automatic tag insertion.


CafeTran Espresso 10 Croissant offers:

  • Support for subtitles (.itt and .srt file formats)
  • Video resource preview for translation of subtitles



Jan Truper  Identity Verified
Germany
Local time: 10:41
Member (2016)
English to German
MT (OT) May 27, 2019

Juan Jacob wrote:

DeepL once just for fun on an old movie I translated years ago into English... well, aside she/he/it normal mistakes and some very local expressions, not too bad. Let's say 30% less work. But only 150 subtitles a time.


The usefulness of MT for subtitle translation will vary greatly depending on the target language.

German, for example,
- has a sort of reversed sentence structure when compared to English (i.e., MT is useless for sentences that are spread over several subtitles),
- has formal/informal addressing with grammatical consequences for the remainder of the sentence (i.e., not even taking other factors into account, MT will just do a crapshoot and screw up 50% of all sentences containing addressing)
- has sentences and words that are on average about 40% longer than English (i.e., MT is absolutely useless because the main challenge for German subtitle translators is staying within reading speed restrictions, and in order to manage this, most sentences need to be rephrased).

MT can be useful for short titles like "What?", "Thank you very much" or "I'm sorry", but generally its benefits are outweighed by its drawbacks: it takes time and brain power to gauge whether an MT suggestion needs to be overwritten whole or in parts, and it takes time, brain power and work to do so.

[Edited at 2019-05-27 06:36 GMT]



Jan Truper  Identity Verified
Germany
Local time: 10:41
Member (2016)
English to German
... May 27, 2019

Some of the problems I mentioned above make CAT tools pretty much useless for subtitle translation, as well.

 

Samuel Murray  Identity Verified
Netherlands
Local time: 10:41
Member (2006)
English to Afrikaans
+ ...
Re: Smartcat May 27, 2019

Nasriddin Klichev wrote:
Samuel Murray wrote:
I know of no CAT tool that can do this.

As I wrote earlier, Smartcat is perfect for this except one has to insert all the line break tags manually.


I did a test in Smartcat, thanks. To confirm (for the benefit of others here):

Smartcat does not change the time codes and does not check if the text length suits the amount of time that the text is on-screen. Smartcat assumes that the translation will use the exact same time codes as the original. This also means that if a sentence was split over five caption units in the source text, you need to split the target text into five units as well (or: split it into fewer units, but then Smartcat won't recalculate the time codes, but will instead show no subtitles during some of the time units). If a translation is too long for the number of available caption units, then the translator has to shorten the translation (you can't create additional caption units to fit all the text).
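
The length-versus-time check that Smartcat does not do can be approximated outside the tool. Here is a minimal Python sketch, assuming a simple characters-per-second measure; the function names and the limit of 17 are illustrative assumptions only (use whatever your client's style guide prescribes).

```python
import re

TIME_RE = re.compile(r"(\d+):(\d+):(\d+),(\d+)")

MAX_CPS = 17  # assumed limit for illustration; not taken from any tool


def to_seconds(stamp: str) -> float:
    """Convert an SRT time stamp such as 00:04:48,080 to seconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def chars_per_second(timing_line: str, text: str) -> float:
    """Reading speed of one caption unit; line breaks are not counted."""
    start, end = timing_line.split("-->")
    duration = to_seconds(end) - to_seconds(start)
    visible = len(text.replace("\n", ""))
    return visible / duration


cps = chars_per_second(
    "00:04:48,080 --> 00:04:50,080",
    "The rain in Spain. It falls mainly\non the plains. The quick brown fox",
)
print(f"{cps:.1f} cps, too fast: {cps > MAX_CPS}")
```

The caption used here has 68 characters (spaces included) on screen for 2 seconds, so it comes out at about 34 characters per second.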

Smartcat does not show caption unit boundaries, so you may need to keep the source text open in a separate window while you translate (or fix resultant problems afterwards). For example, these three caption units:

1
00:04:48,080 --> 00:04:50,080
The rain in Spain. It falls mainly
on the plains. The quick brown fox

2
00:04:50,081 --> 00:04:52,081
jumps over the lazy dog. The rain
in Spain still falls mainly on the

3
00:04:52,082 --> 00:04:54,282
plains in summer. The quick brown
fox is sleeping soundly.


will be displayed in Smartcat as five segments (with no indication of caption unit boundary):

1. The rain in Spain.
2. It falls mainly{LF} on the plains.
3. The quick brown fox{LF} jumps over the lazy dog.
4. The rain{LF} in Spain still falls mainly on the{LF} plains in summer.
5. The quick brown{LF} fox is sleeping soundly.
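
The segmentation shown above can be imitated with a rough Python sketch. This is only an approximation of the observed behaviour, not Smartcat's actual algorithm; the function names are made up, and the `{LF}` placeholder is simply the tag as displayed in the list above.

```python
import re

SRT = """\
1
00:04:48,080 --> 00:04:50,080
The rain in Spain. It falls mainly
on the plains. The quick brown fox

2
00:04:50,081 --> 00:04:52,081
jumps over the lazy dog. The rain
in Spain still falls mainly on the

3
00:04:52,082 --> 00:04:54,282
plains in summer. The quick brown
fox is sleeping soundly.
"""


def caption_texts(srt: str) -> list[str]:
    """Return the text of each caption unit, without index and time code."""
    units = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        # lines[0] is the unit number, lines[1] is the time code
        units.append("\n".join(lines[2:]))
    return units


def sentence_segments(srt: str) -> list[str]:
    """Join all units, turn every line or unit break into {LF},
    then split after sentence-final punctuation."""
    joined = "\n".join(caption_texts(srt))
    tagged = joined.replace("\n", "{LF} ")
    parts = re.split(r"(?<=[.!?])\s+", tagged)
    return [p.strip() for p in parts if p.strip()]


for number, segment in enumerate(sentence_segments(SRT), start=1):
    print(f"{number}. {segment}")
```

Note that line breaks and caption unit boundaries both end up as the same `{LF}` tag, which is why the unit boundaries are no longer visible. A sentence that ends exactly at a line break is not split by this simple pattern.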


The translator has to add line breaks manually. The shortcut for doing so is Ctrl+Q. I don't think of this as a disadvantage, because the translator should be in control of where the line break takes place. If you rely on the software to do it for you, the software will likely just use averages, which will not always be the easiest to read.

One can split segments in Smartcat (e.g. if the source text lacks proper end-of-segment punctuation): position the cursor where you want to split, and press Shift+Ctrl+S. It takes Smartcat about 2 seconds to split a segment, so it's better to edit the source text beforehand and make sure all sentences end with an end-of-segment punctuation mark (e.g. a full stop or exclamation mark).

In Smartcat you have to confirm every segment by clicking on the check mark in order for that translation to go into the final file. If you don't have enough line break tags in a segment, you'll get a "critical" error flag.

You can export the file from Smartcat as XLIFF, in which the line breaks are simple tags, to place as you will with whatever tool you're editing the XLIFF file in (e.g. OmegaT). To import an XLIFF, you have to click the arrow next to the "upload" button and select "update". If a segment in the XLIFF file has too many or too few line break tags in it, no line break tags will be imported for that segment (and it'll get a red flag), and you'll have to add line breaks for it manually.
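
Since a wrong number of line break tags costs you all the tags in that segment, it may be worth counting them before re-importing. A minimal Python sketch, assuming the tags are written as `{LF}` as in the display above (the real markup in the XLIFF file will look different, and the function name is made up):

```python
LF_TAG = "{LF}"  # tag as displayed in the post; real XLIFF markup differs


def line_break_status(source: str, target: str) -> str:
    """Compare the number of line break tags in source and target."""
    expected = source.count(LF_TAG)
    found = target.count(LF_TAG)
    if found == expected:
        return "ok"
    return "too few" if found < expected else "too many"


pairs = [
    ("It falls mainly{LF} on the plains.", "Dit val meestal{LF} op die vlaktes."),
    ("The quick brown{LF} fox is sleeping soundly.", "Die vinnige bruin jakkals slaap vas."),
]
for source, target in pairs:
    print(line_break_status(source, target), "|", target)
```

Any segment that does not report "ok" is one where the tags would presumably be dropped on import.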

You can also export the file as bilingual DOCX, but you can't import that file again (it's just for checking).

[Edited at 2019-05-27 07:27 GMT]


 

Samuel Murray  Identity Verified
Netherlands
Local time: 10:41
Member (2006)
English to Afrikaans
+ ...
Re: CafeTran May 27, 2019

Hans Lenting wrote:
CafeTran Espresso 10 Croissant offers support for subtitles (.itt and .srt file formats).


Yes, but CafeTran segments SRT files by caption unit, not by sentence. In other words, it treats caption unit boundaries as segment boundaries, which is exactly what the OP does not want.

For example, these three caption units:

1
00:04:48,080 --> 00:04:50,080
The rain in Spain. It falls mainly
on the plains. The quick brown fox

2
00:04:50,081 --> 00:04:52,081
jumps over the lazy dog. The rain
in Spain still falls mainly on the

3
00:04:52,082 --> 00:04:54,282
plains in summer. The quick brown
fox is sleeping soundly.


will be displayed in CafeTran as three segments:

The rain in Spain. It falls mainly
on the plains. The quick brown fox

jumps over the lazy dog. The rain
in Spain still falls mainly on the

plains in summer. The quick brown
fox is sleeping soundly.


Sure, you can merge and split segments in CafeTran (and it's near instantaneous to do so), but you have to do it manually on a case-by-case basis.

What is potentially nice about CafeTran is that it shows the time codes, and hence also the caption unit boundaries (oddly, it shows the time code *below* the segment). However, when you merge segments, CafeTran shows only the first caption unit's time code, even though the segment contains text from more than one caption unit. This isn't necessarily a big problem, but it is something to be aware of.

In CafeTran, you also have to enter line breaks manually, but you can just press Enter to do so (no tags required). The caption unit boundary in merged segments is a tag, however. If you neglect to insert the unit boundary tag, then the final file will put all the text in the one caption unit, and the next caption unit will be empty.
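
What happens with a missing unit boundary tag can be pictured with a small Python sketch. This only models the behaviour described above; `{UNIT}` is a hypothetical placeholder (not CafeTran's real tag), and the function name is made up.

```python
UNIT_TAG = "{UNIT}"  # hypothetical placeholder, not CafeTran's real tag


def fill_units(target: str, unit_count: int) -> list[str]:
    """Distribute a merged target segment over its original caption units."""
    parts = [part.strip() for part in target.split(UNIT_TAG)]
    if len(parts) > unit_count:
        raise ValueError("more unit boundary tags than caption units")
    # Units without a preceding boundary tag stay empty
    return parts + [""] * (unit_count - len(parts))


print(fill_units("The quick brown fox{UNIT}jumps over the lazy dog.", 2))
print(fill_units("The quick brown fox jumps over the lazy dog.", 2))
```

With the tag, each caption unit gets its share of the text; without it, the first unit gets everything and the second one is empty.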


[Edited at 2019-05-27 07:27 GMT]

[Edited at 2019-05-27 07:28 GMT]


 

Nasriddin Klichev
Uzbekistan
Local time: 13:41
English to Russian
+ ...
TOPIC STARTER
I think it's not the case with Russian. May 28, 2019

Jan Truper wrote:

The usefulness of MT for subtitle translation will vary greatly depending on the target language.

German, for example,
- has a sort of reversed sentence structure when compared to English (i.e., MT is useless for sentences that are spread over several subtitles),
- has formal/informal addressing with grammatical consequences for the remainder of the sentence (i.e., not even taking other factors into account, MT will just do a crapshoot and screw up 50% of all sentences containing addressing)
- has sentences and words that are on average about 40% longer than English (i.e., MT is absolutely useless because the main challenge for German subtitle translators is staying within reading speed restrictions, and in order to manage this, most sentences need to be rephrased).

MT can be useful for short titles like "What?", "Thank you very much" or "I'm sorry", but generally its benefits are outweighed by its drawbacks: it takes time and brain power to gauge whether an MT suggestion needs to be overwritten whole or in parts, and it takes time, brain power and work to do so.

[Edited at 2019-05-27 06:36 GMT]
I don't think that's the case with Russian. Its sentence structure matches English in most cases, and although Russian text is about 10% longer, I don't think that is a big deal.

@Samuel Murray
Exactly, you got my point.
Moreover, you conducted a very thorough test. Thank you! I think many people will benefit.

But
Samuel Murray wrote:
The translator has to add line breaks manually. The shortcut for doing so is Ctrl+Q. I don't think of this as a disadvantage, because the translator should be in control of where the line break takes place. If you rely on the software to do it for you, the software will likely just use averages, which will not always be the easiest to read.
As I wrote above, English and Russian sentence structures frequently match. I noticed this even more after inserting the line break tags manually, as the tag locations matched. So I think it should be fairly predictable for an MT engine to insert the tags.
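
To illustrate what "predictable" could mean here, below is a naive Python sketch that copies the tags to the same relative word position in the target. It is only a proportional heuristic (the "averages" approach mentioned in the quote), not how any MT engine actually places tags; the `{LF}` placeholder and the function name are assumptions.

```python
LF_TAG = "{LF}"  # tag as displayed in Smartcat's segment list


def project_line_breaks(source: str, target: str) -> str:
    """Insert {LF} into an untagged target at the word boundaries that
    correspond to the relative tag positions in the source."""
    pieces = [piece.split() for piece in source.split(LF_TAG)]
    total = sum(len(piece) for piece in pieces)
    if total == 0:
        return target
    words = target.split()
    out, seen, prev = [], 0, 0
    for piece in pieces[:-1]:
        seen += len(piece)
        # Same fraction of the words before the tag as in the source
        cut = round(seen / total * len(words))
        cut = max(prev, min(cut, len(words)))
        out.append(" ".join(words[prev:cut]))
        prev = cut
    out.append(" ".join(words[prev:]))
    return (LF_TAG + " ").join(out)


print(project_line_breaks("It falls mainly{LF} on the plains.",
                          "Es fällt hauptsächlich auf die Ebenen."))
```

This works only as long as source and target word order run roughly in parallel, and the translator would still want to review where each break lands.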

By the way, I just contacted their support team about this issue, and they answered:
"Unfortunately, not all MT engines support tag auto-insertion. You may use Google Neural Machine Translation if you want this function applied for now. Be sure that we are working to implement auto-insertion further with other MT."
So I am going to use another MT.

[Edited at 2019-05-28 10:57 GMT]

[Edited at 2019-05-28 11:08 GMT]


 

Nasriddin Klichev
Uzbekistan
Local time: 13:41
English to Russian
+ ...
TOPIC STARTER
Solved. May 29, 2019

The problem has been solved. As the support team advised, I just needed to change the MT.

Thank you all for your attention!


 

Claudio Porcellana  Identity Verified
Italy
Local time: 10:41
Member (2004)
English to Italian
+ ...
CAT for translating srt files Nov 26, 2020

Hi there

it's cool to see how the memoQ developers were able to find a smart workaround in about one year...


don't believe it?
try the new memoQ video preview: it's the bomb IMHO

I don't do many subtitle jobs, let's say one per year, but today I started one, and thanks to the memoQ video preview I can check in real time (the video is looping on my 2nd monitor) that the translated text flows well in the available space/time

moreover, I can check that each line of text doesn't wrap onto multiple lines, possibly overlapping the background picture or other elements, which is very important

with this real-time feature, the text "max length" setting is useless now, largely surpassed by the ability to choose words properly so that the sentence is easy to read and the viewer doesn't have to go back and replay

my 2 cents


 