What is mojibake?

Mojibake (文字化け), literally "changed characters", is the garbled text that appears when the computer encodings of characters, particularly the encodings used for Japanese, get fouled up somewhere along the way, such as in email or Usenet postings.

Mojibake may happen for several reasons:

An incorrect description of the encoding, for example a web page which claims to be in Shift-JIS but is actually in UTF-8 (see Encodings of Japanese).
Removal of the eighth bit from each byte of the encoded text. This was formerly a common problem in Japanese email.
The user's computer does not recognize the encoding.
Server-side software processes multibyte text as if it were single-byte encoded.
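Three of these causes can be reproduced in a few lines of Python, which makes it easier to see what each one does to the bytes. This is a minimal sketch, not a diagnostic tool: the helper names mislabeled, strip_eighth_bit and truncate_bytes are hypothetical, the codec names are those of Python's standard library, and the server-side case is modeled, as an assumption, by cutting a UTF-8 byte string at a fixed byte count without regard to character boundaries.

```python
# Hypothetical helpers reproducing three causes of mojibake.

def mislabeled(text: str) -> str:
    """UTF-8 bytes read back by software that was told they are Shift-JIS."""
    return text.encode("utf-8").decode("shift_jis", errors="replace")


def strip_eighth_bit(text: str, encoding: str = "euc_jp") -> bytes:
    """Clear the high bit of every byte, as a 7-bit-only mail path might."""
    return bytes(b & 0x7F for b in text.encode(encoding))


def truncate_bytes(text: str, limit: int) -> str:
    """Cut the UTF-8 bytes at a byte count, treating each byte as a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "日本語"
    print(mislabeled(sample))         # garbled text, not 日本語
    print(strip_eighth_bit(sample))   # b'F|K\\8l': only 7-bit bytes remain
    print(truncate_bytes(sample, 4))  # '日' followed by U+FFFD (本 was cut in half)
```

Decoding with errors="replace" keeps the sketch from raising exceptions, loosely mimicking a program that shows something rather than nothing. With EUC-JP input, stripping the eighth bit leaves the bare seven-bit JIS codes, so 日本語 comes out as F|K\8l.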

Types of mojibake and diagnostics

Mojibake may appear in several forms:

A line of question marks such as ???????. This happens when software that cannot handle eight-bit encodings displays each character it cannot represent as a question mark. This was common in Usenet posts from Outlook Express. In most cases the original characters have actually been replaced by question marks, making them impossible to recover.
A line of accented letters like æ¥æ¬èª. This happens when the computer misinterprets eight-bit Japanese text as text in a European encoding, such as those used for French or Swedish, which use the eighth bit for accented letters.
Streams of unreadable kanji. This happens when:
- bytes are lost from the original text
- eight-bit encodings such as UTF-8 are misinterpreted as other encodings such as EUC-JP (see Encodings of Japanese)
Text such as ?$B%1!<%?%$(B. This happens when the escape character (ESC) is removed from ISO-2022-JP encoded Japanese (see Encodings of Japanese), so that the rest of each escape sequence, such as $B and (B, shows up as ordinary text alongside the seven-bit character codes. Here the first escape character has been turned into a question mark.
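Each of these four forms can be produced deliberately, which helps in recognizing them when they turn up. The sketch below assumes Python 3 and its standard codecs; the sample strings are arbitrary choices, and Latin-1 (ISO-8859-1) stands in for "a European encoding".

```python
# Reproduce the four kinds of mojibake with Python's standard codecs.

text = "日本語"
utf8 = text.encode("utf-8")

# 1. Question marks: an ASCII-only path replaces every character it cannot encode with "?".
qmarks = text.encode("ascii", errors="replace").decode("ascii")

# 2. Accented letters: UTF-8 bytes decoded as Latin-1. Bytes 0x80-0x9F become
#    C1 control characters (usually invisible); the others become letters and symbols.
accented = utf8.decode("latin-1")

# 3. Unreadable kanji: UTF-8 bytes decoded as EUC-JP, so bytes pair up into wrong codes.
sentence = "文字化けの例です"
kanji_soup = sentence.encode("utf-8").decode("euc_jp", errors="replace")

# 4. ISO-2022-JP with the ESC bytes stripped: the rest of each escape sequence
#    ($B and (B) and the 7-bit JIS codes show up as plain ASCII.
jis = "ケータイ".encode("iso2022_jp")
no_escape = jis.replace(b"\x1b", b"").decode("ascii")

for label, value in [("question marks", qmarks), ("accented", accented),
                     ("kanji soup", kanji_soup), ("no ESC", no_escape)]:
    print(f"{label}: {value!r}")

# Latin-1 decoding maps each byte to one character, so form 2 can be reversed
# exactly if nothing was dropped. Form 1 cannot: every character became "?".
print(accented.encode("latin-1").decode("utf-8"))
```

Decoding the ISO-2022-JP example this way shows that ?$B%1!<%?%$(B is the katakana word ケータイ: %1, !<, %? and %$ are the JIS codes for ケ, ー, タ and イ. The Latin-1 round trip only works if the invisible control characters survived; if they have been dropped along the way, the original bytes are incomplete and the reversal fails.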

