Monday, June 23, 2014

from: John R.
date: Sat, Jun 21, 2014 at 5:03 PM
subject: Tattoo Submission

Been trying to get my brother to submit this for a while and he finally gave me the go-ahead. It is supposed to say "pure man" but one time someone told him it meant more properly something like 10/9ths Rice Man... and that maybe that was a colloquial expression for purity. Sounds fishy! Can you help us out please?
Thank you!

粋男 = "chic man"?

There is a Japanese variety show called 粋男流儀, hosted by Peter Frankl and 矢吹春奈 (Yabuki Haruna).

Sunday, June 15, 2014

Shameless Self Promotion:

This humble little blog was mentioned in The Economist:

Computer-aided translation
Johnson: Rise of the machine translators
Jun 4th 2014, 16:45 by R.L.G. | DUBLIN

THOSE passingly familiar with machine translation (MT) may well have reacted in the following ways at some point. “Great!” would be one such, on plugging something into the best-known public and free version, Google Translate, and watching the translation appear milliseconds later. “Wait a second…” might be the next, from those who know both languages. Google Translate, like all MT systems, can make mistakes, from the subtle to the hilarious.

The internet is filled (here for example) with signs badly machine translated from Chinese into English. What monolingual English-speakers don't realise is just how many funny mistakes get made in translating the other way. Take, for example, the Occupy Wall Street protester in 2011 who seems to have plugged “No more corruption” into a computer translator and made a sign with the resulting Chinese output. It read: “There is no corruption”.

MT is hard. It has occupied the minds of a lot of smart people for decades, which is why it is still known by a 1950s-style moniker rather than “computer translation”. Older models tended to try to break down the grammar or meaning of the source text, and reconstruct it in the target language. This was so difficult, though, that in retrospect it is unsurprising that this approach started running into intractable problems. But now, in an early application of “big data” (before the phrase became vogue), MT systems typically work statistically. If you feed a lot of high-quality human-translated texts into a translation model in both target and source languages, the model can learn the likelihood that "X" in language A will be translated as "Y" in language B. (And how often, and in what contexts, "X" is more likely to be translated as "Z" instead.) The more data you feed in, the better the model's statistical guesses get. This is why Google (which has nothing if not lots of data) has got rather decent at MT.
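As a rough illustration of the statistical idea described above, the following minimal sketch estimates how likely a source word is to be translated as each target word, using simple relative frequency. It assumes the word pairs are already aligned, which real systems have to work out for themselves, and the tiny English-to-German corpus and the function name `translation_probabilities` are made up for the example.

```python
from collections import Counter, defaultdict


def translation_probabilities(aligned_pairs):
    """Estimate P(target | source) by relative frequency over aligned word pairs."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }


# Hypothetical, hand-made alignments (not real training data).
pairs = [
    ("die", "sterben"), ("die", "sterben"), ("die", "sterben"),
    ("die", "Matrize"),
    ("free", "frei"), ("free", "kostenlos"),
]

probs = translation_probabilities(pairs)
# The most probable translation of "die" under this toy model.
best = max(probs["die"], key=probs["die"].get)
print(probs["die"], best)
```

With more pairs the estimates become less noisy, which is the sense in which more data gives better statistical guesses.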

If you "round-trip" the preceding paragraph in Google Translate, rendering it into German and then translating that output once again into English, the errors and infelicities multiply:

Machine translation is very good in the translation of single words, where all she has to do, is to act as an online dictionary. It is also good at common rates, as these chunks, which translates many times and so easily represented in the target language. It's not bad, simple sentences with a clear structure enough, though, once you start sentences plugging in, you'll start to see some sluggishness in the output. And all the lyrics begin, in fact, look very disjointed.

MT struggles in particular with surprising input that the training model has not taught it to expect. Hanzi Smatter, a blog, received a picture of a biker who got a computer-translated “Ride Hard Die Free” tattooed in huge Chinese characters down his torso. The only problem was that he got "die" in the sense of a “tool used for stamping or shaping metal” permanently inked on his body, probably because nothing like “die free” was in the translator’s training texts. (It also translated “free” as “free of charge”.) Perhaps lots of industrial or commercial materials were part of the training, explaining why the rather less common “tool” meaning of “die” was chosen over the more common “ring-down-the-curtain-and-join-the-choir-invisible” meaning.

To rely on raw MT output is almost as bad an idea as getting a full-body tattoo in a language you don’t speak. But it would also be a mistake to dismiss MT, a steadily improving tool that is best used with human post-editing. This week in Dublin, TAUS, an idea shop and resource-sharing platform for MT users, gathered originators and users of MT to talk about how to get users to share more of their data. The more everyone shares, the more everyone wins, but many companies consider their translation models proprietary assets.

The reason companies have proprietary systems is because MT’s quality is quickly improved by specific training for a restricted domain. For example, an industrial company would train its model to translate "die" with the “metal tool” meaning, a toy-maker would prefer the “cube with dots on each side” meaning, and a pet shop would prefer the “pushing-up-the-daisies” meaning. Such domain restriction increases the accuracy of translation quite a lot. It has the down-side of making a single engine less useful for broader applications. But this problem is diminishing, since new such engines can increasingly be crafted quickly, as needed, for a given language pairing and domain (as long as enough training text is available, which is why TAUS is trying to get companies to share).
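The effect of domain-specific training can be sketched in a few lines. This is only a toy model under stated assumptions: each "engine" simply picks the translation it saw most often in its own training pairs, and the three corpora, the German target words, and the helper names `train` and `translate` are invented for illustration.

```python
from collections import Counter


def train(domain_pairs):
    """Count how often each source word was translated as each target word."""
    model = {}
    for source, target in domain_pairs:
        model.setdefault(source, Counter())[target] += 1
    return model


def translate(model, word):
    """Return the most frequent translation seen in training, or the word itself."""
    if word not in model:
        return word
    return model[word].most_common(1)[0][0]


# Hypothetical per-domain training pairs (not real data).
corpora = {
    "industrial": [("die", "Stanzwerkzeug"), ("die", "Stanzwerkzeug"), ("die", "sterben")],
    "toys": [("die", "Würfel"), ("die", "Würfel"), ("die", "sterben")],
    "pets": [("die", "sterben"), ("die", "sterben"), ("die", "Würfel")],
}

engines = {domain: train(pairs) for domain, pairs in corpora.items()}
choices = {domain: translate(engine, "die") for domain, engine in engines.items()}
print(choices)
```

Each engine resolves the same ambiguous word differently because its training text is skewed toward one sense, which is also why a single engine trained this way is less useful outside its domain.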

This makes MT a lot more than a quick “good-enough” translator or an aid to tourists. Wayne Bourland of Dell, a computer-maker, says that using MT, plus post-editing, has cut translation time by 40% for his company, which localises its website in 28 languages. More importantly, MT saves money: it has saved Dell 40% of its translating cost since 2011. He calculates the return on Dell’s investment for MT at 900%—numbers, in other words, to die for.

So will MT replace human translators entirely at some point? Or perhaps even replace the need for learning foreign languages in the long run? That will be the subject of the next column.

from: Meredith P.
date: Wed, Jun 11, 2014 at 8:37 PM
subject: Possibly worst Japanese tattoo ever

Seriously, I cannot think of anything worse.

Reddit discussion:

According to the Reddit discussion thread, the tattoo was intended to be:

"I stand for many yet walk alone."

However, it is a pretty horrendous translation fail.

My fellow linguaphile, Alan Siegrist, adds:

The peculiar thing is that the characters are all correctly and properly written in a nice font, but it is completely gibberish in meaning. Could it possibly have been machine-translated into Japanese from English or some other language and the MT output tattooed as-is onto the unsuspecting tattooee’s foot? You would think that maybe the untranslated lowercase letter “i” in the supposedly Japanese MT output would have clued someone in to the fact that the MT had failed, leaving just gibberish.

It reads:

のiスタンドの多く No i-sutando no ooku

まだ私は独りで歩ける。 Mada watashi ha hitori de arukeru.

The first line makes no sense but might mean something along the lines of:

“Many of i-stand of”

The second line is not so bad, translating to:

“I can still walk alone.”

This reminds me of Green Day's Boulevard of Broken Dreams.

A reader of Hanzi Smatter recently sent in a poster for the American television show Arrow, in which the lead character, Oliver Queen, has four Chinese characters tattooed on his abdomen.

The characters are 鼠姜姚猪, which literally translate as "rat/rodent", "ginger", "handsome/good-looking", and "pig/swine".

There is speculation about the significance of this tattooed phrase. Since I have never watched the series, I can only rely on interviews posted on YouTube.

from: Natalie O.
to: Tian
date: Fri, Apr 11, 2014 at 11:47 AM
subject: Re: Help read my tattoo

I got this about 10 years ago and am curious to know what it actually says.

Thanks for your help.

滾蛋 = get out of here! / beat it!