% language=us

\usemodule[fnt-24]

\startcomponent mk-cjk

\environment mk-environment

\definefontfallback [FullTyping] [adobemyungjostd-medium] [0x3000-0xFFFF] [check=yes,force=no]
\definefontfallback [FullTyping] [adobesongstd-light]     [0x3000-0xFFFF] [check=yes,force=no]

\definefontsynonym  [MyTyping]  [lmmono10-regular] [fallbacks=FullTyping]
\definefont[MyTypingFont][MyTyping sa 1]

\nonknuthmode

\chapter{Chinese, Japanese and Korean, aka CJK}

\start \setuptyping[style=\MyTypingFont] % begin of typing hackery

{\em This aspect of \MKIV\ is under construction. We use non-realistic examples.
We need to reimplement Chinese numbering in \LUA, etc.\ etc.}

{\em todo: There is no need for checking the width if the halfwidth feature is turned on.}

\subject{introduction}

In \CONTEXT\ \MKII\ we support \CJK\ languages. Intercharacter spacing as
well as linebreaks are taken care of. Chinese numbering is dealt with and
labels and other language specific aspects are supported too. The implementation
uses active characters and some special encoding subsystem. Although it works
quite okay, in \MKIV\ we follow a different route.

The current implementation is an intermediate one and is used to explore the
possibilities and identify needs. One handicap in implementing \CJK\ support is
that the wishlist of features and behaviour is somewhat dependent on who you talk
to. This means that the implementation will have some default behaviour but can be
tuned to specific needs. The current implementation uses the script related
analyser and is triggered by fonts but at some point I may decide to provide
analysing independent of fonts.

As with all things \TEX, we need to find a proper font to get our document typeset
and because \CJK\ fonts are normally quite large they are not always available on
your system by default.

\subject{scripts and languages}

I'm no expert on \CJK\ and will never be one so don't expect much insight into the
scripts and languages here. Here we only look at the way a sequence of characters
in the input turns into a typeset paragraph. For that it is important to keep in
mind that in a Korean or Japanese text we might find Chinese characters and that
the spacing rules become somewhat blurred by that. For instance Korean has spaces
between words and words can be broken at any point, while Chinese has no spaces.

Officially Chinese runs from top to bottom but here we focus on the horizontal
variant. When turned into glyphs the characters normally are of equal width
and in principle we could expect them all to be vertically aligned. However, a
font can have characters that take half that space: so called halfwidth
characters. And, of course, in practice a font might have shapes that fall into
this category but happen to have their own width which deviates from this.

This means that a mechanism that deals with \CJK\ has to take care of a few
things:

\startitemize[packed]
\item Spaces at the end of the line (or actually anywhere in the input stream)
      need to be removed but only for Chinese.
\item Opening and closing symbols as well as punctuation need special treatment
      especially when they are halfwidth.
\item Korean uses proportionally spaced punctuation and mixes with other Latin fonts,
      while Chinese often uses built-in Latin shapes.
\item We may break anywhere but not after an opening symbol like~( and not
      before a closing symbol like~).
\item We need to deal with mixed Chinese and Korean spacing rules.
\stopitemize

Let's start with showing some Korean. We use one of the fonts shipped
by Adobe as part of Acrobat but first we define a Korean featureset and
a font.

\startbuffer
\definefontfeature
  [korean]
  [script=hang,language=kor,mode=node,analyze=yes]

\definefont[KoreanSample][adobemyungjostd-medium*korean]
\stopbuffer

\typebuffer \getbuffer

Korean looks like this:

\startbuffer
\KoreanSample \setscript[hangul]

모든 인간은 태어날 때부터 자유로우며  존엄과 권리에 있어 동등하다.
인간은 천부적으로 이성과 양심을 부여받았으며 서로 형제애의 정신으로
행동하여야 한다.
\stopbuffer

\typebuffer \start \getbuffer \stop

The Korean script reflects syllables and is very structured.
Although modern fonts contain prebuilt syllables one can also use
the jamo alphabet to build them from components. The following
example is provided by Dohyun Kim:

\startbuffer
\definefontfeature [medievalkorean] [mode=node,script=hang,lang=kor,ccmp=yes,ljmo=yes,vjmo=yes,tjmo=yes]
\definefontfeature [modernkorean]   [mode=node,script=hang,lang=kor]

\enabletrackers[scripts.analyzing]
\setscript[hangul]
\definedfont [UnBatang*medievalkorean at 20pt] ᄒᆞᆫ글 \ruledhbox{ᄒᆞᆫ글} \ruledhbox{ᄒᆞᆫ} \ruledhbox{글}\blank
\definedfont [UnBatang*modernkorean   at 20pt] ᄒᆞᆫ글 \ruledhbox{ᄒᆞᆫ글} \ruledhbox{ᄒᆞᆫ} \ruledhbox{글}\blank
\disabletrackers[scripts.analyzing]
\stopbuffer

\typebuffer \start \getbuffer \stop

There are subtle differences between the medieval and modern
shapes. It was this example that led to more advanced \type
{tounicode} support in \MKIV\ so that copy and paste works out
well now for such input.
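
As an aside, the relation between prebuilt syllables and jamo can be made
concrete. The formula below is the composition rule that the Unicode standard
defines for modern Hangul; it is not taken from the example above and the
symbols $S$, $L$, $V$ and $T$ are only our notation here. With $L$ the index of
the leading consonant (0--18), $V$ the index of the vowel (0--20) and $T$ the
index of the optional trailing consonant (0--27, where 0 means none), the slot
$S$ of the prebuilt syllable is:

\startformula
S = 44032 + (21 L + V) \times 28 + T
\stopformula

For instance hieuh ($L = 18$), a ($V = 0$) and nieun ($T = 4$) give $44032 + 378
\times 28 + 4 = 54620$, which is \type {U+D55C}, the first syllable of the word
hangul in its modern spelling. The old vowel araea that is used in the sample
has no index in this scheme, so such a syllable has no prebuilt slot and can
only be built from jamo.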

For Chinese we define a couple of features:

\startbuffer
\definefontfeature
  [chinese-traditional]
  [mode=node,script=hang,lang=zht]
\definefontfeature
  [chinese-simple]
  [mode=node,script=hang,lang=zhs]
\definefontfeature
  [chinese-traditional-hw]
  [mode=node,script=hang,lang=zht,hwid=yes]
\definefontfeature
  [chinese-simple-hw]
  [mode=node,script=hang,lang=zhs,hwid=yes]
\stopbuffer

\typebuffer \getbuffer

\startbuffer
\definefont[ChineseSampleFW][adobesongstd-light*chinese-traditional]
\definefont[ChineseSampleHW][adobesongstd-light*chinese-traditional-hw]
\setscript[hanzi]

\ChineseSampleFW
兡也包因沘氓侷柵苗孫孫財崧淫設弼琶跑愍窟榜蒸奭稽
霄瓢館縲擻鼕〈孃魔釁〉佉沎岠狋垚柛胅娭涘罞偟惈牻荺
傒焱菏酡廅滘絺赩塴榗箂踃嬁澕蓴醊獧螗餟燱螬駸礑鎞
瀧鄿瀯騬醹躕鱕。

\ChineseSampleHW
兡也包因沘氓侷柵苗孫孫財崧淫設弼琶跑愍窟榜蒸奭稽
霄瓢館縲擻鼕〈孃魔釁〉佉沎岠狋垚柛胅娭涘罞偟惈牻荺
傒焱菏酡廅滘絺赩塴榗箂踃嬁澕蓴醊獧螗餟燱螬駸礑鎞
瀧鄿瀯騬醹躕鱕。
\stopbuffer

\typebuffer \start \getbuffer \stop

A few more samples:

\startbuffer
\definefont[ChFntAT][name:adobesongstd-light*chinese-traditional-hw at 16pt]
\definefont[ChFntBT][name:songti*chinese-traditional                at 16pt]
\definefont[ChFntCT][name:fangsong*chinese-traditional              at 16pt]

\definefont[ChFntAS][name:adobesongstd-light*chinese-simple-hw      at 16pt]
\definefont[ChFntBS][name:songti*chinese-simple                     at 16pt]
\definefont[ChFntCS][name:fangsong*chinese-simple                   at 16pt]
\stopbuffer

\typebuffer \getbuffer

In these fonts traditional comes out as follows:

\start \setscript[hanzi]
\startlines
\ChFntAT 我〈能吞下玻璃而不傷身〉體。
\ChFntBT 我〈能吞下玻璃而不傷身〉體。
\ChFntCT 我〈能吞下玻璃而不傷身〉體。
\stoplines
\stop

And simplified comes out as:

\start \setscript[hanzi]
\startlines
\ChFntAS 我〈能吞下玻璃而不伤身〉体。
\ChFntBS 我〈能吞下玻璃而不伤身〉体。
\ChFntCS 我〈能吞下玻璃而不伤身〉体。
\stoplines
\stop
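
The breaking rules listed earlier (no break after an opening symbol, no break
before a closing one) can be tried out in narrow boxes. What follows is only a
sketch: \type {\BreakSample} is a helper made up for this illustration, the
widths are arbitrary, and where the lines really break depends on the font and
on the current state of the implementation.

\startbuffer
\setscript[hanzi]

% hypothetical helper: typeset #2 in a ruled box that is #1 wide
\def\BreakSample#1#2{\ruledvtop{\ChFntBS \hsize#1\relax #2}}

\dontleavehmode
\BreakSample{4.25em}{吞吞吞〉吞吞吞吞。}\quad
\BreakSample{3.50em}{吞吞吞〉吞吞吞吞。}\quad
\BreakSample{3.50em}{吞吞〈吞吞吞吞吞。}
\stopbuffer

\typebuffer \start \getbuffer \stop

The font is selected before \type {\hsize} is set so that the \type {em} widths
relate to the Chinese font and not to the bodyfont.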

\subject {tracing}

As usual in \CONTEXT, we have some tracing built in. When you say

\starttyping
\enabletrackers[scripts.analyzing]
\stoptyping

you will get the output colored according to the category that the
analyser put the characters in. When you say

\starttyping
\enabletrackers[scripts.injections]
\stoptyping

some rudimentary information will be written to the log about what gets
inserted in the node list.
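
Both trackers can be enabled together and are best switched off again after the
fragment that you want to inspect. A minimal sketch, which assumes the \type
{\KoreanSample} font defined earlier; the sample sentence is arbitrary:

\starttyping
% enable both trackers for one fragment only
\enabletrackers[scripts.analyzing,scripts.injections]

\start \KoreanSample \setscript[hangul]
모든 인간은 태어날 때부터 자유로우며 존엄과 권리에 있어 동등하다.
\stop

\disabletrackers[scripts.analyzing,scripts.injections]
\stoptyping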

Analyzed input looks like:

\startbuffer
아아, 나는 이제야 () 알았도다. 마음이 어두운 자는 이목이
() 되지 않는다. 이목만을 믿는 자는 보고 듣는 것이
더욱 밝혀져서 병이 되는 것이다. 이제  마부가 발을 말굽에
밟혀서 뒷차에 실리었으므로, 나는 드디어 혼자 고삐를 늦추어
강에 띄우고, 무릎을 구부려 발을 모으고 안장 위에 앉았다.
한번 떨어지면 강이나 물로 땅을 삼고, 물로 옷을 삼으며,
물로 몸을 삼고, 물로 성정을 삼을 것이다. 이제야  마음은
한번 떨어질 것을 판단한 터이므로,  귓속에 강물 소리가 없어졌다.
무릇 아홉  건너는데도 걱정이 없어 의자 위에서 좌와(坐臥)하고
기거(起居)하는  같았다.
\stopbuffer

\typebuffer \start \enabletrackers[scripts.analyzing] \KoreanSample \setscript[hangul] \getbuffer \disabletrackers[scripts.analyzing] \stop

For developers (and those who provide them with input) we have another tracing option:

\startbuffer
\definedfont[arialuni*korean at 10pt] \setscript[hangul] \ShowCombinationsKorean
\stopbuffer

\typebuffer

We need to use a font that supports Chinese as well as Korean. This gives quite some output.

\start \getbuffer \stop

% 안녕하세요? (Hello)
% 감사합니다. (Thank you)

\page \stop % end of typing hackery

\stopcomponent

% \font\JapaneseFontA=name:kozminprovi-regular
%
% \startlines
% Hankaku          : {\JapaneseFontA アイウエオカキクケコサシスセソタチツテ}
% Romanj digits    : {\JapaneseFontA 0123456789}
% Romanj lowercase : {\JapaneseFontA abcdefghi}
% Romanj uppercase : {\JapaneseFontA ABCDEFGHI}
% \stoplines
%
% \enabletrackers[scripts.analyzing]
%
% \start \raggedright \dontleavehmode
%     \ruledhbox\bgroup \ChFntBS ,\egroup  \quad
%     \ruledhbox\bgroup \ChFntBS 〉\egroup \quad
%     \ruledhbox\bgroup \ChFntBS 〈\egroup \par
% \stop
%
% \def\DoChineseSample#1#2#3%
%   {\ruledvtop{#1\hsize#2\relax#3}}
%
% \def\ChineseSampleA#1#2{%
%     \blank
%     \subsubject{hsize #2, fullwidth}
%     \dontleavehmode
%         \DoChineseSample{#1}{#2}{吞吞吞,吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{吞吞吞,,吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{吞吞吞〉吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{吞吞吞〉,吞吞吞吞。}
%     \blank[small]
%     \dontleavehmode
%         \DoChineseSample{#1}{#2}{吞吞吞〉〉吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{吞吞吞〉〉吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{〈吞吞吞吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{〈〈吞吞吞吞吞吞吞。}
%     \blank[small]
%     \dontleavehmode
%         \DoChineseSample{#1}{#2}{吞吞吞…吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{吞吞吞……吞吞吞吞。}
%     \dontleavehmode
%     \blank
% }
%
% \ChineseSampleA\ChFntBS{4.25em}
% \ChineseSampleA\ChFntBS{4.00em}
% \ChineseSampleA\ChFntBS{3.75em}
% \ChineseSampleA\ChFntBS{3.50em}
% \ChineseSampleA\ChFntBS{3.25em}
% \ChineseSampleA\ChFntBS{3.00em}
%
% \def\ChineseSampleB#1#2{%
%     \blank
%     \subsubject{hsize #2, halfwidth}
%     \dontleavehmode
%         \DoChineseSample{#1}{#2}{吞吞吞,吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{吞吞吞‘吞吞吞吞。}\quad
%         \DoChineseSample{#1}{#2}{吞吞吞’吞吞吞吞。}\quad
%     \blank
% }
%
% \ChineseSampleB\ChFntBS{4.25em}
% \ChineseSampleB\ChFntBS{4.00em}
% \ChineseSampleB\ChFntBS{3.75em}
% \ChineseSampleB\ChFntBS{3.50em}
% \ChineseSampleB\ChFntBS{3.25em}
% \ChineseSampleB\ChFntBS{3.00em}
%
% \disabletrackers[scripts.analyzing]