It’s not just ctrl+f anymore — Lessons in Regex & Trados

Overview

For this project, I wanted to experiment with some uses for regular expressions (regex) in Trados. Specifically, I wanted to write expressions that could be used to QA an English-to-Japanese translation. I wrote two kinds: ones that simply find common errors, and ones that both find and fix them. I wrote four expressions in total, tested them first on https://regex101.com/, and then took screenshots of them in use in Trados.

To test the expressions, I wrote a document with the following four lines. I worked on the assumption that these lines had been translated from US English.

はじめまして!

アーロンです。

01/15/1997に生まれました。

いつも言っていることわざは”you only live once”です。

If you’re familiar with Japanese, you will have noticed that the date and the quotation marks both look incorrect. My hope was that my regex could point these errors out and fix them. I also wanted warnings to pop up for the English text and for the level of politeness.

Regex Expressions

1. Dates

Dates in Japanese are commonly written differently than in US English. For example, the date above, 01/15/1997, would instead be written as 1997年01月15日. With this in mind, I wrote the following regex, which finds any dates written in the mm/dd/yyyy format and rearranges them into the Japanese order.

Find: (\d+)/(\d+)/(\d+)

Replace: $3年$1月$2日
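Outside Trados, the same find-and-replace can be tried in Python as a quick sanity check. This is only a minimal sketch: the function name `to_japanese_date` is made up for illustration, and Python writes backreferences as `\3` where the Trados replace field uses `$3`.

```python
import re

# Same pattern as the "Find" field: three runs of digits separated by slashes.
DATE_PATTERN = re.compile(r"(\d+)/(\d+)/(\d+)")

def to_japanese_date(text: str) -> str:
    """Reorder mm/dd/yyyy dates as yyyy年mm月dd日 (hypothetical helper)."""
    # \3 = year, \1 = month, \2 = day, mirroring $3年$1月$2日.
    return DATE_PATTERN.sub(r"\3年\1月\2日", text)

print(to_japanese_date("01/15/1997に生まれました。"))
# → 1997年01月15日に生まれました。
```

Note that `\d+` accepts any run of digits, so the pattern would also reorder things that are not dates, such as a version number written 1/2/3.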

[Screenshots: the date expression in Trados, before and after executing the “replace” command]

This worked as intended; however, in the future I might want to write a more complex expression that accounts for other date formats, such as those using hyphens as separators or those written dd/mm/yyyy.
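One way such a more tolerant expression could look is sketched below in Python. Everything here is an assumption rather than something tested in Trados: the name `convert_dates`, the `day_first` flag, and the choice to accept only 1-2 digit months and days with a 4-digit year. A regex alone cannot tell whether 03/04/1997 is March 4 or April 3, so the sketch leaves that decision to the caller.

```python
import re

# Assumed format: 1-2 digit month and day, 4-digit year, separated by "/" or "-".
# The \2 backreference requires the same separator in both positions, and the
# lookarounds keep the pattern from matching inside a longer run of digits.
FLEXIBLE_DATE = re.compile(r"(?<!\d)(\d{1,2})([/-])(\d{1,2})\2(\d{4})(?!\d)")

def convert_dates(text: str, day_first: bool = False) -> str:
    """Rewrite numeric dates as yyyy年mm月dd日 (hypothetical helper)."""
    def reorder(match: re.Match) -> str:
        first, second, year = match.group(1), match.group(3), match.group(4)
        # The source convention (mm/dd or dd/mm) must be stated by the caller.
        month, day = (second, first) if day_first else (first, second)
        return f"{year}年{month}月{day}日"
    return FLEXIBLE_DATE.sub(reorder, text)

print(convert_dates("01-15-1997に生まれました。"))
print(convert_dates("15/01/1997", day_first=True))
```

Both calls produce 1997年01月15日. Lookarounds are used instead of `\b` because Python treats Japanese characters as word characters, so there is no word boundary between 1997 and に.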

2. Quotes

Quotes have weird rules in every language I’ve learned. Often, when I’m translating, I overlook them or my eyes just glaze past them. Knowing my own weakness here, I felt it appropriate to write regex to find any overlooked quotation marks and replace them with the appropriate Japanese symbols. I tried to do this with a single find-and-replace pair; however, it proved more complicated than I had anticipated, so I simply used two pairs, one for each side of the quote.

Find: “(\S)

Replace:「$1

Find: (\S)”

Replace: $1」
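The two pairs can be reproduced in Python to check that they behave as described. This is a rough sketch with a made-up function name, `fix_quotes`. It assumes the sample text uses a distinct opening mark (“) and closing mark (”), as the two patterns above do; Python's `\1` stands in for `$1`.

```python
import re

def fix_quotes(text: str) -> str:
    """Apply both find-and-replace pairs in sequence (hypothetical helper)."""
    # Pair 1: an opening mark followed by a non-space character.
    text = re.sub(r"“(\S)", r"「\1", text)
    # Pair 2: a non-space character followed by a closing mark.
    text = re.sub(r"(\S)”", r"\1」", text)
    return text

print(fix_quotes("いつも言っていることわざは“you only live once”です。"))
# → いつも言っていることわざは「you only live once」です。
```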

[Screenshots: each quote expression in Trados, before and after executing the “replace” command]

This all worked well; however, as mentioned, it might be nice to have a single command that checks and fixes both beginning and ending quotations.
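One possible shape for such a single command is to capture everything between an opening and a closing mark and wrap it in 「」. The Python sketch below shows the idea; the name `fix_quote_pairs` and the decision to also accept straight quotes (") are assumptions, and none of this has been checked in Trados.

```python
import re

# One pattern for the whole quotation: an opening mark, the quoted text
# (anything that is not itself a quotation mark), then a closing mark.
# Accepting straight quotes (") as well is an assumption, not from the post.
QUOTED = re.compile(r'[“"]([^“”"]+)[”"]')

def fix_quote_pairs(text: str) -> str:
    """Replace a matched pair of quotation marks with 「 and 」 (hypothetical helper)."""
    return QUOTED.sub(r"「\1」", text)

print(fix_quote_pairs("いつも言っていることわざは“you only live once”です。"))
# → いつも言っていることわざは「you only live once」です。
```

The trade-off is that a lone quotation mark with no partner is left untouched, whereas the two separate pairs would still convert it.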

3. English Check

With English crawling its way into all languages, especially in the realms of media and entertainment, it has become increasingly important for translators to know when to leave English as-is or translate it into the target language. Because these decisions often require a lot of thought, I figured it would be useful for Japanese translators to have a check that highlights any English found in the Trados translation and marks it with a warning. This way, the translator has to double-check any English to make sure they wanted to include it.

[a-zA-Z]
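As written, the expression matches one letter at a time, which is enough to trigger a warning. To see which English phrases would be flagged, the letters can be grouped into runs, as in the Python sketch below. The names `has_english` and `find_english` are hypothetical, and joining words across spaces, apostrophes, and hyphens is an assumption about what counts as one phrase.

```python
import re

# The check from the post: any single ASCII letter.
ANY_LETTER = re.compile(r"[a-zA-Z]")

# Assumed extension: words joined by a space, apostrophe, or hyphen
# are reported together as one English phrase.
ENGLISH_RUN = re.compile(r"[a-zA-Z]+(?:[ '-][a-zA-Z]+)*")

def has_english(text: str) -> bool:
    """True if the segment contains at least one ASCII letter."""
    return ANY_LETTER.search(text) is not None

def find_english(text: str) -> list[str]:
    """Return each run of English words found in the segment."""
    return [m.group(0) for m in ENGLISH_RUN.finditer(text)]

print(find_english("いつも言っていることわざは「you only live once」です。"))
# → ['you only live once']
```

Neither pattern matches full-width Latin letters such as ＡＢＣ, which sometimes appear in Japanese text.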

[Screenshot: the English check in Trados]

In the future, I might try to set up regex for specific kinds of English, though I must admit I am lacking in ideas right now. Let me know if you think of anything!

4. Politeness Checks

For those of you unfamiliar with Japanese, there are several different politeness levels of speech depending on the circumstances. The level of politeness changes many things about each sentence, but these changes are most prominently seen at the ends of sentences. Knowing this, I wrote a regex expression that would check the ends of sentences for certain politeness indicators and show a warning sign if it found any.

です。|でした。|ます。|ました。
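To see which of the four test lines this expression would flag, here is a small Python sketch. The function name `polite_lines` is invented for the example, and the grouped pattern `(?:…)。` is just a compact way of writing the same four alternatives.

```python
import re

# Equivalent to です。|でした。|ます。|ました。 with the shared 。 factored out.
POLITE_ENDING = re.compile(r"(?:です|でした|ます|ました)。")

def polite_lines(lines: list[str]) -> list[str]:
    """Return the lines that contain a polite sentence ending (hypothetical helper)."""
    return [line for line in lines if POLITE_ENDING.search(line)]

sample = [
    "はじめまして!",
    "アーロンです。",
    "01/15/1997に生まれました。",
    "いつも言っていることわざは”you only live once”です。",
]

for line in polite_lines(sample):
    print(line)
```

Three of the four lines are flagged; はじめまして! is not, because it does not end in one of the listed forms followed by 。.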

[Screenshot: the politeness check in Trados]

Because the endings this expression searches for are found primarily in polite speech, this kind of check would be useful when translating literature, which uses different endings. Expressions that search for other endings might be more useful for other tones of writing.

Reflection

This small project demonstrated the usefulness of regex to me. I only experimented with it on a personal scale; in other words, I thought about the issues I could run into while translating English to Japanese and wrote the expressions based on that. Nevertheless, I could see these expressions being used for QA on a company-wide basis. It could be incredibly useful if all linguists working in a given language came together to write the expressions and revise them regularly, so that QA stays consistent and reliable across their systems.
