What language is that, anyways?
From LibraryWiki
Imagine you've come across some text in an unfamiliar script, say, a caption to this picture:
What language is it? For a novice, this is a non-trivial question. Does the text look like this?
- 八十七年以前,我们的祖先在这大陆上建立了一个新的国家,它孕育于自由,并且献身给一种理念,即所有人都是生来平等的。
Or this?
- 4世代と7年前に私たちの祖先たちはこの大陸に、自由の理念から生まれ、全ての人が平等に創られているという命題に捧げられた一つの新しい国を生み出しました。
Or this?
- 87년 전 우리의 조상들은 자유의 깃발 아래 만인은 태어나면서부터 평등하다는 전제에 헌신하기 위해서 이 대륙에 새로운 나라를 건설했습니다.
For an experienced Asian Studies scholar it should be clear that, the samples' content aside, the writing system used in the first sample is Chinese, that of the second is Japanese, and the third sample is written in Korean. But even the average American novice might be able to draw some conclusions about the content of the samples above with an examination of the first few symbols in the last two samples and a few additional contextual clues:
A reader well-versed in American History will have guessed what the source text of the samples is by now. But the problem of "What language is that?" is still impenetrable to the uninitiated: Even knowing the content of the text in question doesn't help. Is there a website with a "search window" into which you can copy and paste text to have it analyzed? Or do you need a book like this:
Unfortunately, I don't know of any online tool that will help. Barring access to an encyclopedia like the one cited above, this may be a case where you have to ask someone who knows.
Alternatively, you can spend some time perusing the three samples above and learn how to distinguish between the writing systems that each language employs by picking out salient aspects of each.
- Chinese is written with an inventory of thousands of characters (hànzì). Of the three languages, its text has the highest proportion of complex characters, the more complex of which are often built up from combinations of simpler elements:
- Hànzì:一丁丂七丄丅万丈三上下丌与丐丑丒且丕世丗丘丙丞丟両丣两並丨个丫中丯丰丱串丵丶丸丹主丼丿乀乁乂乃乄久乇之乍乎乏乑乕乖乗乘乙乚乜九乞也乢乣乩乱乳乴乵乹乾乿亀亂亅了争亊事二亍于互五井亖亗亘亙些亜亝亞亟亠亡亢交亥亦亨享京亭亯亰亳亶亹人什仁仂仃仄仆仇今介仍从仐仔仕他仗仙 and thousands more.
- Japanese is written with a mixture of Chinese characters and two syllabaries:
- Hiragana: いろはにほへとちりぬるをわかよたれそつねならむういのおくやまけふこえてあさきゆめみじゑひもせず
- Katakana: イロハニホヘトチリヌルヲワカヨタレソツネナラムウイノオクヤマケフコエテアサキユメミジヱヒモセズ
- Korean is mainly written in Hangul, an alphabet whose letters are grouped into syllable blocks. Each block consists of at most three phonetic elements (an initial consonant, a vowel, and an optional final consonant), arranged either side by side or one over the other. So for each block, you can draw either a vertical line or a horizontal line through the whole block without bisecting any element:
- Hangul: 나랏 말싸미 듕국에 달아 문짜와로 서로 사맛디 아니할쌔 이런 전차로 어린 백셩이 이르고져 할 배 이셔도 마참내 제 뜻을 시러펴지 못할 노미 하니라 내 이랄 위하야 어엿비 너겨 새로 스물여덜 자를 맹가노니 사람마다 해여 수비 니겨 날로 쑤메 편한케 하고져 할 따라미니라
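The "at most three elements" structure of Hangul can also be seen in how Unicode encodes it. The Python sketch below is an illustration only and not part of the original article. It assumes the standard library's `unicodedata` module and relies on canonical decomposition (NFD), which splits a precomposed Hangul syllable into a leading consonant, a vowel, and an optional trailing consonant. The function name `hangul_elements` is made up for this example.

```python
import unicodedata


def hangul_elements(block: str) -> list:
    """Return the Unicode names of the elements that make up one Hangul block.

    `block` must be a single precomposed Hangul syllable (U+AC00..U+D7A3).
    """
    if len(block) != 1 or not ("\uac00" <= block <= "\ud7a3"):
        raise ValueError("expected exactly one precomposed Hangul syllable")
    # NFD breaks the syllable into conjoining jamo: a leading consonant
    # (choseong), a vowel (jungseong), and possibly a trailing consonant
    # (jongseong).
    decomposed = unicodedata.normalize("NFD", block)
    return [unicodedata.name(jamo) for jamo in decomposed]


if __name__ == "__main__":
    for syllable in "나랏":
        print(syllable, hangul_elements(syllable))
```

Under this decomposition, a block such as 한 yields three elements and a block such as 나 yields two; no modern block yields more than three.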
Next time, you will know.
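For readers who would rather let a program do the looking, the rules of thumb above can be roughly sketched in code. The Python function below is a hypothetical helper, not an existing tool: it inspects Unicode code point ranges and guesses Korean if any Hangul is present, Japanese if any hiragana or katakana is present, and Chinese if only Chinese characters are found. It is a heuristic under those assumptions. A Japanese passage written entirely in Chinese characters would be reported as Chinese, and rarer characters outside the listed ranges are ignored.

```python
def guess_cjk_language(text: str) -> str:
    """Guess "Chinese", "Japanese", "Korean", or "unknown" from the scripts in text."""
    has_han = has_kana = has_hangul = False
    for ch in text:
        cp = ord(ch)
        if (0xAC00 <= cp <= 0xD7A3      # Hangul syllable blocks
                or 0x1100 <= cp <= 0x11FF   # Hangul conjoining jamo
                or 0x3130 <= cp <= 0x318F): # Hangul compatibility jamo
            has_hangul = True
        elif (0x3041 <= cp <= 0x309F        # hiragana
                or 0x30A1 <= cp <= 0x30FA): # katakana (letters only)
            has_kana = True
        elif (0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
                or 0x3400 <= cp <= 0x4DBF): # CJK Extension A
            has_han = True
    # Order matters: Korean and Japanese text may also contain Chinese
    # characters, so the more distinctive scripts are checked first.
    if has_hangul:
        return "Korean"
    if has_kana:
        return "Japanese"
    if has_han:
        return "Chinese"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "八十七年以前,我们的祖先在这大陆上建立了一个新的国家",
        "4世代と7年前に私たちの祖先たちはこの大陸に",
        "87년 전 우리의 조상들은 자유의 깃발 아래",
    ]
    for sample in samples:
        print(guess_cjk_language(sample), ":", sample)
```

Checking Hangul and kana before Chinese characters mirrors the visual approach: the distinctive scripts settle the question, and Chinese is what remains when neither appears.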
Incidentally, for those of you who aren't well-versed in American history, the content of the caption to the first image (rendered in three different languages) is the first line of Lincoln's Gettysburg Address:
- Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.
The first image itself is the only known photograph of Lincoln at Gettysburg. He's the one centered in the lower third of the photograph, hatless, looking down. (For more information about the photograph, see the Library of Congress's web site: http://www.loc.gov/exhibits/gadd/gaphot.html)
