Rules for Breaking Lines in Asian Languages
Line-breaking and word-wrapping algorithms are important to text parsing as well as to text display. The rules for Asian languages, however, are quite different from the rules for Western languages. For example, unlike most Western written languages, Chinese, Japanese, Korean, and Thai do not necessarily use spaces to mark the boundaries between words. Thai also uses very little punctuation. For these languages, software applications cannot simply base line-breaking and word-wrapping algorithms on the space character or on standard hyphenation rules. They must follow different guidelines.
Because the Win32 full-text search engine for Microsoft WinHelp recognizes that word wrapping is more complex for some languages than for others, it supports the IWordBreak OLE interface. That way, if a third-party developer creates a superior word-wrapping algorithm for any language, the WinHelp engine can take advantage of it through OLE.
Dividing Lines of Text in Japanese
Japanese line breaking is based on the kinsoku rule—you can break lines between any two characters, with several exceptions. The first exception is that a line of text cannot end with any leading characters, which are listed below. (Characters are shown with their hexadecimal code points for Shift-JIS.)
The second exception is that a line of text cannot begin with any following characters. Following characters are listed below.
The third exception is that certain overflow characters are allowed to extend past the right or bottom margin. These characters are listed below.
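The three exceptions above can be sketched in code. The following is a minimal illustration, not a complete implementation: the character sets are small, assumed subsets chosen for the example (they are not the full tables this section refers to), the names `can_break` and `wrap` are hypothetical, and the width is counted in characters rather than display cells.

```python
# Illustrative subsets only (assumptions), not the complete kinsoku tables.
LEADING = set("「『（［｛〈《")            # must not end a line
FOLLOWING = set("」』）］｝〉》、。！？")  # must not begin a line
OVERFLOW = set("、。")                    # may hang past the margin


def can_break(prev_ch, next_ch):
    """Return True if a line may be broken between prev_ch and next_ch."""
    return prev_ch not in LEADING and next_ch not in FOLLOWING


def wrap(text, width):
    """Split text into lines of at most `width` characters, kinsoku-style."""
    lines, start = [], 0
    while len(text) - start > width:
        end = start + width
        hang = end + 1
        # An overflow character may extend one position past the margin,
        # provided the break that follows it is itself legal.
        if text[end] in OVERFLOW and (
            hang == len(text) or can_break(text[end], text[hang])
        ):
            end = hang
        else:
            # Otherwise back up until the break is legal. If no legal
            # break exists, fall back to a one-character line so that
            # the loop always makes progress.
            while end > start + 1 and not can_break(text[end - 1], text[end]):
                end -= 1
        lines.append(text[start:end])
        start = end
    if start < len(text):
        lines.append(text[start:])
    return lines


print(wrap("彼は「はい」と言った。", 3))
# ['彼は', '「は', 'い」と', '言った。']
```

In the sample, the opening bracket is pushed down so that it does not end the first line, the closing bracket is kept off the start of a line, and the final full stop hangs past the three-character margin.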
Dividing Lines of Text in Chinese
The Chinese language uses the same line-breaking rule as the Japanese language does. A line cannot end with any leading characters, which are listed below.
Traditional Chinese (characters are shown with their hexadecimal code point for the Chinese-BIG5 CharSet)
A line cannot begin with any following characters, which are listed below.
Traditional Chinese (characters are shown with their hexadecimal code point for the Chinese-BIG5 CharSet)
Simplified Chinese (characters are shown with their hexadecimal code point for the GB-2312 CharSet)
The following overflow characters are allowed to extend past the right margin:
Traditional Chinese and Simplified Chinese (characters are shown with their hexadecimal code points, which are the same for the Chinese-BIG5 CharSet and the GB-2312 CharSet)
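One way to apply these rules is to validate lines that have already been wrapped. The sketch below checks each line against the leading, following, and overflow rules. The character sets are assumed, illustrative subsets rather than the full BIG5 or GB-2312 tables, `check_lines` is a hypothetical helper name, and the width is counted in characters.

```python
# Illustrative subsets only (assumptions), not the complete tables.
LEADING = set("（「『《〈【")               # must not end a line
FOLLOWING = set("）」』》〉】，。、；：！？")  # must not begin a line
OVERFLOW = set("，。")                      # may extend past the right margin


def check_lines(lines, width):
    """Return (line_number, problem) pairs for lines that break the rules."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line:
            continue
        if line[-1] in LEADING:
            problems.append((number, "ends with a leading character"))
        if line[0] in FOLLOWING:
            problems.append((number, "begins with a following character"))
        # A line may exceed the margin by exactly one overflow character.
        hangs = len(line) == width + 1 and line[-1] in OVERFLOW
        if len(line) > width and not hangs:
            problems.append((number, "exceeds the margin"))
    return problems


print(check_lines(["请看《", "红楼梦", "》。"], 3))
# [(1, 'ends with a leading character'), (3, 'begins with a following character')]
print(check_lines(["请看", "《红", "楼梦》。"], 3))
# []
```

The second call passes even though its last line has four characters, because the extra character is a full stop that is allowed to overflow the margin.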
Dividing Lines of Text in Korean
Korean words expressed in hangul are separated by spaces, as they are in Western languages. Some Korean-language applications allow the user to choose whether or not to break lines between hangul characters.
This example breaks lines only between words.
The example below breaks lines between individual hangul characters.
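The difference between the two modes can also be shown in code. This is a minimal sketch: the function names and the sample sentence are assumptions made for illustration, the width is counted in characters rather than display cells, and the character mode deliberately ignores the exceptions for leading and following characters.

```python
import textwrap


def wrap_between_words(text, width):
    """Break lines only at spaces, as in Western languages."""
    return textwrap.wrap(text, width=width, break_long_words=False)


def wrap_between_characters(text, width):
    """Break lines between any two characters, ignoring word boundaries."""
    lines, line = [], ""
    for ch in text:
        if len(line) == width:
            lines.append(line.rstrip())
            line = ""
        if ch == " " and not line:
            continue  # do not start a line with a space
        line += ch
    if line:
        lines.append(line.rstrip())
    return lines


sample = "한국어 문장을 여러 줄로 나눕니다"
print(wrap_between_words(sample, 6))
# ['한국어', '문장을 여러', '줄로', '나눕니다']
print(wrap_between_characters(sample, 6))
# ['한국어 문장', '을 여러 줄', '로 나눕니다']
```

Breaking between characters fills each line more evenly, at the cost of splitting words across lines.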
The standard rule for breaking lines between hangul characters, called geumchik, is very similar to the Japanese kinsoku rule—you can break lines between any two characters, with the following exceptions. A line of text cannot end with any leading characters. (Characters are shown with their hexadecimal code point for Korean standard code, KSC 5601.)
A line of text cannot begin with any following characters, listed below:
The geumchik rule defines three methods for dealing with following characters. The first method, the JalLaNaeGi method, breaks the line before the first character to the left of the following character, as shown below:
The MilEoNuGi method breaks the line after the following character and compresses the text that falls before it, as shown below:
The GeuNyangDuGi method extends the right margin slightly to accommodate the following character, as shown below:
This method can also extend the bottom margin.
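The three methods can be compared with a small sketch that splits off a single line. Everything here is an assumption made for illustration: the `Method` enumeration, the `break_line` helper, and the `FOLLOWING` set, which is a tiny subset rather than the KSC 5601 table. The sketch handles only one following character at the margin, requires a width of at least 2, and reports compression or overhang as a layout hint instead of rendering it.

```python
from enum import Enum


class Method(Enum):
    JAL_LA_NAE_GI = 1    # move the preceding character to the next line
    MIL_EO_NU_GI = 2     # keep the following character and compress the line
    GEU_NYANG_DU_GI = 3  # let the following character extend past the margin


FOLLOWING = set(".,!?)]}")  # illustrative subset (assumption)


def break_line(text, width, method):
    """Return (line, rest, layout) for the first line of `text`."""
    if len(text) <= width or text[width] not in FOLLOWING:
        return text[:width], text[width:], "normal"
    if method is Method.JAL_LA_NAE_GI:
        # Break one character earlier so the following character
        # is not the first character on the next line.
        cut = width - 1
        return text[:cut], text[cut:], "normal"
    # Both remaining methods keep the following character on this line;
    # they differ only in how the extra character is accommodated.
    cut = width + 1
    layout = "compressed" if method is Method.MIL_EO_NU_GI else "overhang"
    return text[:cut], text[cut:], layout


text = "안녕하세요. 반갑습니다"
for method in Method:
    print(method.name, break_line(text, 5, method))
# JAL_LA_NAE_GI ('안녕하세', '요. 반갑습니다', 'normal')
# MIL_EO_NU_GI ('안녕하세요.', ' 반갑습니다', 'compressed')
# GEU_NYANG_DU_GI ('안녕하세요.', ' 반갑습니다', 'overhang')
```

MilEoNuGi and GeuNyangDuGi place the same characters on the line. The difference lies in rendering, which is why the sketch returns a layout hint alongside the text.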
There is no special category for overflow characters in Korean.
Dividing Lines of Text in Thai
Thai editions of Windows come with a fairly sophisticated line-breaking algorithm. If you are writing a Thai-language application, take advantage of what the system provides rather than trying to come up with your own line-breaking code. To give you an idea of what would be involved, try to decipher the following line:
Translation: Imagine that this is a string to be word-wrapped. The only way to do so in English would be to identify the individual words and then determine the best place to break the line.
The line-breaking algorithm provided by the system solves these problems for you.