primeideal wrote 2023-09-01 04:30 pm

Kingdom of Characters, by Jing Tsu

Nonfiction book about the changes in writing, standardizing, and digitizing Mandarin Chinese over the 20th century. Occasionally dry, but it covers a lot of stuff I didn't know.

Excerpt from an 1875 report on telegraph wires (which were mostly foreign-owned and operated at that point): "The foreigners bury their telegraphic wires in the ground, burrowing through and forcing their way across in all four directions until the earth's veins are all but severed, making the burial sites vulnerable to wind and flood. How can this sit well on our conscience?"

Morse code was designed for alphabetic writing systems; a letter in the Latin alphabet (without diacritics) can be represented as one to four dots and dashes, and an Arabic numeral gets a five-symbol code. To represent a character-based language like Chinese, telegraphers had to look up a character in a table that would represent it as four to six numerals, then reverse the process on the other end. That made Chinese messages much more laborious and expensive to send and receive.
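To make the cost gap concrete, here is a rough Python sketch that just counts dots and dashes. The patterns are the standard International Morse ones; the four-digit code `0001` is a made-up placeholder, not an entry from any real Chinese telegraph table.

```python
# International Morse patterns for the ten digits and a few Latin letters.
MORSE = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
    "E": ".", "H": "....", "L": ".-..", "O": "---",
}


def symbol_count(text):
    """Total number of dots and dashes needed to send `text`."""
    return sum(len(MORSE[ch]) for ch in text)


# Hypothetical four-digit code standing in for ONE Chinese character.
one_character = "0001"

print(symbol_count("HELLO"))        # 16: a whole five-letter English word
print(symbol_count(one_character))  # 20: a single character sent as numerals
```

So under these assumptions, one character costs more key presses than an entire short English word, before you even count the table lookups at both ends.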

Western-language telegraphy also involved a lot of codes and ciphers, both for classified military purposes and to get around telegraph pay-by-the-symbol pricing. An example of a historically known but very weak cipher would be the Caesar shift (ROT-n). Alternatively, if both parties have a matching codebook beforehand, then you can just send a short numerical code to express a precomposed sentence from the book. 214: "Composed and entirely resigned to the will of God." 7571: "You will rue the day if you do."
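Both ideas fit in a few lines of Python. This is only a sketch: the two codebook entries are the ones quoted above, while the function names and the dictionary layout are my own illustration, not how any real commercial codebook was organized.

```python
import string


def caesar_shift(text, n):
    """Shift each Latin letter n places (wrapping around); leave the rest alone."""
    upper = string.ascii_uppercase
    k = n % 26
    shifted = upper[k:] + upper[:k]
    return text.upper().translate(str.maketrans(upper, shifted))


# Two entries from the codebook quoted in the post; a real book had thousands.
CODEBOOK = {
    214: "Composed and entirely resigned to the will of God.",
    7571: "You will rue the day if you do.",
}


def expand(code):
    """Receiver's side: turn a short numeric code back into its sentence."""
    return CODEBOOK[code]


print(caesar_shift("HELLO", 3))    # KHOOR
print(caesar_shift("KHOOR", -3))   # HELLO
print(expand(7571))
```

The shift hides the message (badly: there are only 25 useful keys to try), while the codebook mostly saves money, since a handful of digits stands in for a whole sentence.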

In 1875, Zhang Deyi became the first Chinese person to design a telegraphic table for Chinese, with a codebook consisting of 80 ten-by-ten grids, encoding 8000 characters. This allowed for quick Caesar-type encryption; you can shift every character by a predetermined amount.
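Here is one way the shift trick could work on a numeric table, as a Python sketch. The zero-based numbering (codes 0 to 7999), the grid/row/column layout, and the wraparound at the end of the table are all my assumptions for illustration; the book's description of Zhang's actual scheme may differ.

```python
TABLE_SIZE = 80 * 10 * 10  # 80 grids of 10 x 10 cells = 8000 characters


def locate(code):
    """Split a code into (grid, row, column), assuming zero-based numbering."""
    grid, cell = divmod(code, 100)
    row, col = divmod(cell, 10)
    return grid, row, col


def shift_codes(codes, key):
    """Caesar-style shift: add `key` to every code, wrapping past the last slot."""
    return [(c + key) % TABLE_SIZE for c in codes]


message = [1234, 7990, 42]            # hypothetical character codes
secret = shift_codes(message, 25)     # sender adds the agreed key
plain = shift_codes(secret, -25)      # receiver subtracts it again

print(locate(1234))   # (12, 3, 4)
print(secret)         # [1259, 15, 67]
print(plain)          # [1234, 7990, 42]
```

Because every character is already a number, encryption here is just arithmetic, with no extra lookup step beyond the one the telegrapher was doing anyway.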

In 1912, the international telegraphic community introduced delayed telegrams, as an alternative to the sneaky shortcuts people were taking to get around the pay-by-the-word rates. Delayed telegrams were sent up to 48 hours late, but at half price. However, users were required to use "plain language" rather than any codes or secret text. This created a problem for Chinese-language users, who were required to use numeric symbols, already costly and slow, to communicate. At an International Telegraph Union conference in the 1920s, Wang Jingchun successfully advocated to make an exception in this case.

Where there's no alphabetical system, there's also no such thing as alphabetical order, so library science was also subject to subjectivity. "Bibliographic classification in China properly began in the first century B.C.E., and it was based on a perceived moral order. A Confucian scholar devised an elaborate scheme of seven main subject divisions--with thirty-eight subdivisions--prioritizing the Confucian classics first, with science and medicine--astronomy, geomancy, pharmacology, sexology, etc.--occupying the last two categories." (Compare Borges' Celestial Emporium of Benevolent Knowledge.)

The Wade-Giles Romanization system, which has mostly been replaced by modern pinyin (at least in mainland China), was based in part on a system used by a British officer named...Thomas Wade. (If you've read "Death's End," part 3 of the Three-Body Problem trilogy, you know why this is funny.) This brings us to the adventures of the famous Lion-Eating Poet in the Stone Den.

In jail during the Cultural Revolution, Zhi Bingyi started experimenting with different methods of character input in a computerized context; he had to write on the lid of his teacup because he didn't even have toilet paper. Hardcore.

Because there are so many characters, early computers couldn't store them all as raster/bitmap images; that would have taken too much memory. So the characters needed to be saved as vector images, and compressed/decompressed; Wang Xuan worked on this problem. Then there's the issue of standardizing character sets; ASCII only took seven bits (later extended to eight), but was useful for American English and not much else (it's the telegraphy problem all over again!). So then Unicode came along, and rapidly became a big deal...
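You can see the ASCII limit directly in Python. A small sketch: UTF-8 is just one of several Unicode encodings, and I picked it (and the character 中) purely as an example; the book excerpt above doesn't single out either.

```python
def try_ascii(ch):
    """Return the ASCII bytes for `ch`, or None if ASCII has no code for it."""
    try:
        return ch.encode("ascii")
    except UnicodeEncodeError:
        return None


latin = "A"
hanzi = "中"  # U+4E2D, chosen only as an example character

print(ord(latin) < 128)            # True: fits in seven bits
print(try_ascii(hanzi))            # None: ASCII simply has no slot for it
print(hex(ord(hanzi)))             # 0x4e2d: its Unicode code point
print(len(hanzi.encode("utf-8")))  # 3: bytes needed in UTF-8
```

Seven bits gives you 128 slots, which is plenty for unaccented Latin letters, digits, and punctuation, and hopeless for a script with thousands of characters.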

If you enjoy technology and/or linguistics, I think you'll get a lot out of this!

ursula 2023-09-01 10:48 pm (UTC)
This sounds really cool!

isis 2023-09-01 10:57 pm (UTC)
This sounds right up my alley!