What is mojibake?
Mojibake (文字化け), literally "changed characters", is garbled text that
appears when the computer encoding of characters, particularly the
encodings used for Japanese, gets fouled up somewhere along the way, for
example in email or Usenet postings.
Mojibake may happen for several reasons:
- Incorrect description of the encoding, for example a web page which
claims to be in Shift-JIS but is actually in UTF-8 (see Encodings of
Japanese).
- Removal of the eighth bit of the encoding. This was formerly a
common problem in Japanese email.
- The user's computer does not recognize the encoding.
- Server-side software processes multibyte text as if it were in a
single-byte encoding.
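Two of these causes are easy to imitate in Python. The sketch below is an illustration, not code from this FAQ: `strip_eighth_bit` and `truncate_bytes` are hypothetical helpers standing in for a seven-bit mail path and for byte-oriented server software, and the codec names are Python's own.

```python
# A sketch of two causes of mojibake: losing the eighth bit, and
# handling multibyte text one byte at a time.

def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as a seven-bit mail relay might."""
    return bytes(b & 0x7F for b in data)


def truncate_bytes(data: bytes, limit: int) -> bytes:
    """Cut to `limit` bytes the way byte-oriented code would,
    ignoring where characters begin and end."""
    return data[:limit]


text = "日本語"

# EUC-JP sets the eighth bit in both bytes of every kanji, so clearing it
# turns the text into unrelated ASCII letters and punctuation.
euc = text.encode("euc_jp")        # bytes c6 fc cb dc b8 ec
stripped = strip_eighth_bit(euc)   # b'F|K\\8l'
print(euc.hex(), stripped)

# UTF-8 uses three bytes per kanji here, so a four-byte limit splits the
# second character; the orphaned lead byte decodes to U+FFFD.
cut = truncate_bytes(text.encode("utf-8"), 4).decode("utf-8", errors="replace")
print(cut)
```

The stripped bytes are exactly the seven-bit JIS X 0208 codes for 日本語. ISO-2022-JP sends those same codes between escape sequences, which is why it survives a seven-bit mail path intact.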
Types of mojibake and diagnostics
Mojibake may appear in several forms:
- A line of question marks such as ???????. This happens when characters
from eight-bit encodings are shown as question marks. This was common in
Usenet posts from Outlook Express. In most cases the characters have
actually been replaced by question marks, not just displayed that way, so
they cannot be recovered.
- A line of accented letters like æ¥æ¬èª[. This happens
when the computer misinterprets eight-bit Japanese text as text in a
European encoding, such as those used for French or Swedish, which use
the eighth bit for accented letters.
- Streams of unreadable kanji. This happens when:
  - bytes are lost from the original text, or
  - eight-bit encodings such as UTF-8 are misinterpreted as other
    encodings such as EUC-JP (see Encodings of Japanese).
- Text such as ?$B%1!<%?%$(B. This happens when the escape characters
(ESC, byte 0x1B) are removed from ISO-2022-JP encoded Japanese (see
Encodings of Japanese) or turned into question marks. The rest of each
escape sequence, such as $B and (B, and the seven-bit character codes
between them are then left visible as ordinary ASCII text.
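The forms above can be reproduced with Python's built-in codecs. This is a sketch rather than a diagnostic tool: the sample strings are arbitrary, Shift-JIS stands in for the "other encodings" in the kanji case, and `repair_iso2022jp` is a hypothetical helper whose guess about where the escape characters belonged can misfire, for example on text containing ぢ, whose JIS code is also `$B`.

```python
def repair_iso2022jp(s: str) -> str:
    """Hypothetical heuristic: put ESC (0x1B) back in front of the
    ISO-2022-JP escape sequences "$B" and "(B", dropping a "?" that
    replaced the ESC, then decode. It misfires if "$B" or "(B" also
    occurs inside the JIS-coded text."""
    s = s.replace("?$B", "$B").replace("?(B", "(B")
    s = s.replace("$B", "\x1b$B").replace("(B", "\x1b(B")
    return s.encode("ascii").decode("iso2022_jp")


text = "日本語"

# 1. Question marks: the characters were replaced before sending,
#    so nothing is left to recover.
lost = text.encode("ascii", errors="replace").decode("ascii")
print(lost)                                          # ???

# 2. Accented letters: UTF-8 bytes read as Latin-1 (ISO-8859-1).
#    Bytes 0x97, 0x9C and 0x9E become invisible C1 control characters.
accented = text.encode("utf-8").decode("latin-1")
visible = "".join(ch for ch in accented if ch.isprintable())
print(visible)                                       # æ¥æ¬èª
# Latin-1 maps every byte to one character, so the damage can be undone.
print(accented.encode("latin-1").decode("utf-8"))    # 日本語

# 3. Unreadable kanji: UTF-8 read as Shift-JIS.
kanji = "こんにちは".encode("utf-8").decode("shift_jis")
print(kanji)                                         # 縺薙ｓ縺ｫ縺｡縺ｯ
print(kanji.encode("shift_jis").decode("utf-8"))     # こんにちは

# 4. ISO-2022-JP with its escape characters lost.
print(repair_iso2022jp("?$B%1!<%?%$(B"))             # ケータイ
```

Only the first form destroys information. In the other three the original bytes are still there, so reversing the wrong decoding, or restoring the lost escape characters, can often recover the text.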
Copyright © 1994-2025 Ben Bullock
If you have questions, corrections, or comments, please contact
Ben Bullock
or use the discussion forum.