%D \module %D [ file=lang-mis, %D version=1997.03.20, % used to be supp-lan.tex %D title=\CONTEXT\ Language Macros, %D subtitle=Compounds, %D author=Hans Hagen, %D date=\currentdate, %D copyright={PRAGMA ADE \& \CONTEXT\ Development Team}] %C %C This module is part of the \CONTEXT\ macro||package and is %C therefore copyrighted by \PRAGMA. See mreadme.pdf for %C details. \writestatus{loading}{ConTeXt Language Macros / Compounds} %D More or less replaced. %D \gdef\starttest#1\stoptest{\starttabulate[|l|l|p|]#1\stoptabulate} %D \gdef\test #1{\NC\detokenize{#1}\NC\hyphenatedword{#1}\NC#1\NC\NR} \unprotect %D One of \TEX's strong points in building paragraphs is the way hyphenations are %D handled. Although for real good hyphenation of non||english languages some %D extensions to the program are needed, fairly good results can be reached with the %D standard mechanisms and an additional macro, at least in Dutch. %D %D \CONTEXT\ originates in the wish to typeset educational materials, especially in %D a technical environment. In production oriented environments, a lot of compound %D words are used. Because the Dutch language poses no limits on combining words, we %D often favor putting dashes between those words, because it facilitates reading, %D at least for those who are not that accustomed to it. %D %D In \TEX\ compound words, separated by a hyphen, are not hyphenated at all. In %D spite of the multiple pass paragraph typesetting this can lead to parts of words %D sticking into the margin. The solution lays in saying \type %D {spoelwater||terugwinunit} instead of \type {spoelwater-terugwinunit}. By using a %D one character command like \type {|}, delimited by the same character \type {|}, %D we get ourselves both a decent visualization (in \TEXEDIT\ and colored verbatim %D we color these commands yellow) and an efficient way of combining words. %D %D The sequence \type{||} simply leads to two words connected by a hyphen. Because %D we want to distinguish such a hyphen from the one inserted when \TEX\ hyphenates %D a word, we use a bit longer one. %D %D \hyphenation {spoel-wa-ter te-rug-win-unit} %D %D \starttest %D \test {spoelwater||terugwinunit} %D \stoptest %D %D As we already said, the \type{|} is a command. This commands accepts an optional %D argument before it's delimiter, which is also a \type{|}. %D %D \hyphenation {po-ly-meer che-mie} %D %D \starttest %D \test {polymeer|*|chemie} %D \stoptest %D %D Arguments like \type{*} are not interpreted and inserted directly, in contrary to %D arguments like: %D %D \starttest %D \test {polymeer|~|chemie} %D \test {|(|polymeer|)|chemie} %D \test {polymeer|(|chemie|)| } %D \stoptest %D %D Although such situations seldom occur |<|we typeset thousands of pages before we %D encountered one that forced us to enhance this mechanism|>| we also have to take %D care of comma's. %D %D \hyphenation {uit-stel-len} %D %D \starttest %D \test {op||, in|| en uitstellen} %D \stoptest %D %D The next special case (concerning quotes) was brought to my attention by Piet %D Tutelaers, one of the driving forces behind rebuilding hyphenation patterns for %D the dutch language.\footnote{In 1996 the spelling of the dutch language has been %D slightly reformed which made this topic actual again.} We'll also take care of %D this case. %D %D \starttest %D \test {AOW|'|er} %D \test {cd|'|tje} %D \test {ex|-|PTT|'|er} %D \test {rock|-|'n|-|roller} %D \stoptest %D %D Tobias Burnus pointed out that I should also support something like %D %D \starttest %D \test {well|_|known} %D \stoptest %D %D to stress the compoundness of hyphenated words. %D %D Of course we also have to take care of the special case: %D %D \starttest %D \test {text||color and ||font} %D \stoptest %D \macros %D {installdiscretionaries} %D %D The mechanism described here is one of the older inner parts of \CONTEXT. The %D most recent extensions concerns some special cases as well as the possibility to %D install other characters as delimiters. The prefered way of specifying compound %D words is using \type{||}, which is installed by: %D %D \starttyping %D \installdiscretionary | - %D \stoptyping %D %D Some alternative definitions are: %D %D \startbuffer %D \installdiscretionary * - %D \installdiscretionary + - %D \installdiscretionary / - %D \installdiscretionary ~ - %D \stopbuffer %D %D \typebuffer %D %D after which we can say: %D %D \start \getbuffer %D \starttest %D \test {test**test**test} %D \test {test++test++test} %D \test {test//test//test} %D \test {test~~test~~test} %D \stoptest %D \stop %D \macros %D {compoundhyphen} %D %D Now let's go to the macros. First we define some variables. In the main \CONTEXT\ %D modules these can be tuned by a setup command. Watch the (maybe) better looking %D compound hyphen. % hm why ex \ifx\compoundhyphen \undefined \unexpanded\def\compoundhyphen {\hbox{-\kern-.10775\emwidth-}} % .25\exheight \fi %D The last two variables are needed for subsentences |<|like this one|>| which we %D did not yet mention. We want to enable breaking but at the same time don't want %D compound characters like |-| or || to be separated from the words. \TEX\ hackers %D will recognise the next two macro's: \ifx\prewordbreak \undefined \unexpanded\def\prewordbreak {\penalty\plustenthousand\hskip\zeropoint\relax} \fi \ifx\postwordbreak\undefined \unexpanded\def\postwordbreak {\penalty\zerocount \hskip\zeropoint\relax} \fi \ifx\hspaceamount \undefined \def\hspaceamount#1#2{.16667\emwidth} \fi % language specific %unexpanded\def\permithyphenation{\ifhmode\prewordbreak\fi} % doesn't remove spaces \unexpanded\def\permithyphenation{\ifhmode\wordboundary\fi} % doesn't remove spaces %D \macros %D {beginofsubsentence,endofsubsentence, %D beginofsubsentencespacing,endofsubsentencespacing} %D %D In the previous macros we provided two hooks which can be used to support nested %D sub||sentences. In \CONTEXT\ these hooks are used to insert a small space when %D needed. % \ifx\beginofsubsentence \undefined \unexpanded\def\beginofsubsentence{\hbox{\emdash}} \fi % \ifx\endofsubsentence \undefined \unexpanded\def\endofsubsentence {\hbox{\emdash}} \fi % \ifx\beginofsubsentencespacing\undefined \let\beginofsubsentencespacing\relax \fi % \ifx\endofsubsentencespacing \undefined \let\endofsubsentencespacing \relax \fi %D The following piece of code is a torture test compound handling. The \type %D {\relax} before the \type {\ifmmode} is needed because of the alignment scanner %D (in \ETEX\ this problem is not present because there a protected macro is not %D expanded. Thanks to Tobias Burnus for providing this example. %D %D \startformula %D \left|f(x_n)-{1\over2}\right| = %D {\cases{|{1\over2}-x_n| &for $0\le x_n < {1\over2}$\cr %D |x_n-{1\over2}| &for ${1\over2}2xxx \def\lang_discretionaries_hyphen_like#1#2% {\ifconditional\spaceafterdiscretionary %prewordbreak\hbox{#1}\relax \wordboundary\hbox{#1}\relax \else\ifconditional\punctafterdiscretionary %prewordbreak\hbox{#1}\relax \wordboundary\hbox{#1}\relax \else %\prewordbreak#2\postwordbreak % was prewordbreak \wordboundary#2\wordboundary \fi\fi} \definetextmodediscretionary {} {\lang_discretionaries_hyphen_like\textmodehyphen\textmodehyphendiscretionary} \definetextmodediscretionary - {\lang_discretionaries_hyphen_like\normalhyphen\normalhyphendiscretionary} \definetextmodediscretionary _ {\lang_discretionaries_hyphen_like\composedhyphen\composedhyphendiscretionary} \definetextmodediscretionary ) {\lang_discretionaries_hyphen_like{)}{\discretionary{-)}{}{)}}} \definetextmodediscretionary ( {\ifdim\lastskip>\zeropoint %(\prewordbreak (\wordboundary \else %\prewordbreak\discretionary{}{(-}{(}\prewordbreak \wordboundary\discretionary{}{(-}{(}\wordboundary \fi} \definetextmodediscretionary ~ %{\prewordbreak\discretionary{-}{}{\thinspace}\postwordbreak} {\wordboundary\discretionary{-}{}{\thinspace}\wordboundary} \definetextmodediscretionary ' %{\prewordbreak\discretionary{-}{}{'}\postwordbreak} {\wordboundary\discretionary{-}{}{'}\wordboundary} \definetextmodediscretionary ^ %{\prewordbreak\discretionary{\hbox{\normalstartimath|\normalstopimath}}{}{\hbox{\normalstartimath|\normalstopimath}}% % \postwordbreak} % bugged {\wordboundary\discretionary{\hbox{\normalstartimath|\normalstopimath}}{}{\hbox{\normalstartimath|\normalstopimath}}% \wordboundary} % bugged \definetextmodediscretionary < %{\beginofsubsentence\prewordbreak\beginofsubsentencespacing {\beginofsubsentence\wordboundary\beginofsubsentencespacing \aftergroup\ignorespaces} % tricky, we need to go over the \nextnextnext \definetextmodediscretionary > {\removeunwantedspaces %\endofsubsentencespacing\prewordbreak\endofsubsentence} \endofsubsentencespacing\wordboundary\endofsubsentence} \definetextmodediscretionary = {\removeunwantedspaces %\prewordbreak\midsentence\prewordbreak \wordboundary\midsentence\wordboundary \aftergroup\ignorespaces} % french %definetextmodediscretionary : {\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{:}:} %definetextmodediscretionary ; {\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{;};} %definetextmodediscretionary ? {\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{?}?} %definetextmodediscretionary ! {\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{!}!} \definetextmodediscretionary : {\removeunwantedspaces\wordboundary\kern\hspaceamount\empty{:}:} \definetextmodediscretionary ; {\removeunwantedspaces\wordboundary\kern\hspaceamount\empty{;};} \definetextmodediscretionary ? {\removeunwantedspaces\wordboundary\kern\hspaceamount\empty{?}?} \definetextmodediscretionary ! {\removeunwantedspaces\wordboundary\kern\hspaceamount\empty{!}!} %definetextmodediscretionary * {\prewordbreak\discretionary{-}{}{\kern.05\emwidth}\prewordbreak} \definetextmodediscretionary * {\wordboundary\discretionary{-}{}{\kern.05\emwidth}\wordboundary} % spanish %definetextmodediscretionary ?? {\prewordbreak\questiondown} %definetextmodediscretionary !! {\prewordbreak\exclamdown} \definetextmodediscretionary ?? {\wordboundary\questiondown} \definetextmodediscretionary !! {\wordboundary\exclamdown} %D \installdiscretionary | + %D \installdiscretionary + = \def\defaultdiscretionaryhyphen{\compoundhyphen} \installdiscretionary | \defaultdiscretionaryhyphen % installs in ctx and prt will fall back on it %D \macros %D {fakecompoundhyphen} %D %D In headers and footers as well as in active pieces of text we need a dirty hack. %D Try to imagine what is needed to savely break the next text across a line and at %D the same time make the words interactive. %D %D \starttyping %D \goto{Some||Long||Word} %D \stoptyping \unexpanded\def\fakecompoundhyphen {\def\|{\mathortext\vert\lang_compounds_fake_hyphen}} \def\lang_compounds_fake_hyphen {\def##1|% {\doifelsenothing{##1}\compoundhyphen{##1}% \kern\compoundbreakpoint\allowbreak}} %D \macros %D {midworddiscretionary} %D %D If needed, one can add a discretionary hyphen using \type %D {\midworddiscretionary}. This macro does the same as \PLAIN\ \TEX's \type {\-}, %D but, like the ones implemented earlier, this one also looks ahead for spaces and %D grouping tokens. \unexpanded\def\midworddiscretionary {\futurelet\nexttoken\lang_discretionaries_mid_word} \def\lang_discretionaries_mid_word {\ifx\nexttoken\blankspace\else \ifx\nexttoken\bgroup \else \ifx\nexttoken\egroup \else \discretionary{-}{}{}% \fi\fi\fi} % As this is rather mkii-ish we keep the mkiv definition around for educational % purposes but consider this feature obsolete in post mkii versions. % %D \macros % %D {installcompoundcharacter} % %D % %D When Tobias Burnus started translating the dutch manual of \PPCHTEX\ into german, % %D he suggested to let \CONTEXT\ support the \type{german.sty} method of handling % %D compound characters, especially the umlaut. This package is meant for use with % %D \PLAIN\ \TEX\ as well as \LATEX. % %D % %D I decided to implement compound character support as versatile as possible. As a % %D result one can define his own compound character support, like: % %D % %D \starttyping % %D \installcompoundcharacter "a {\"a} % %D \installcompoundcharacter "e {\"e} % %D \installcompoundcharacter "i {\"i} % %D \installcompoundcharacter "u {\"u} % %D \installcompoundcharacter "o {\"o} % %D \installcompoundcharacter "s {\SS} % %D \stoptyping % %D % %D or even % %D % %D \starttyping % %D \installcompoundcharacter "ck {\discretionary {k-}{k}{ck}} % %D \installcompoundcharacter "ff {\discretionary{ff-}{f}{ff}} % %D \stoptyping % %D % %D The support is not limited to alphabetic characters, so the next definition is % %D also valid. % %D % %D \starttyping % %D \installcompoundcharacter ". {.\doifnextcharelse{\spacetoken}{}{\kern.125em}} % %D \stoptyping % %D % %D The implementation looks familiar and uses the same tricks as mentioned earlier % %D in this module. We take care of two arguments, which complicates things a bit. % % \installcorenamespace{compoundnormal} % \installcorenamespace{compoundsingle} % \installcorenamespace{compoundmultiple} % \installcorenamespace{compounddefinition} % % %D When I started working on \MKIV\ code, we needed a different approach for % %D defining the active character itself. In \MKII\ as well as in \MKIV\ we now use % %D the catcode vectors. % % \setnewconstant\compoundcharactermode\plusone % % \newcount\c_lang_compounds_character % % \def\installcompoundcharacter #1#2#3 #4% {#4} no grouping % {\ifcase\compoundcharactermode % % ignore mode % \else % \chardef\c_lang_compounds_character`#1% % \expandafter\chardef\csname\??compoundnormal\string#1\endcsname\c_lang_compounds_character % \def\!!stringa{#3}% % \expandafter\def\csname\ifx\!!stringa\empty\??compoundsingle\else\??compoundmultiple\fi\detokenize{#1#2#3}\endcsname{#4}% % \setevalue{\??compounddefinition\detokenize{#1}}{\noexpand\lang_compounds_handle_character{\detokenize{#1}}}% beter nr's % \expandafter\letcatcodecommand\expandafter\ctxcatcodes\expandafter\c_lang_compounds_character\csname\??compounddefinition\detokenize{#1}\endcsname % \fi} % % %D We can also ignore definitions (needed in for instance \XML). Beware, this macro % %D is supposed to be used grouped! % % \def\ignorecompoundcharacter % {\compoundcharactermode\zerocount} \let\ignorecompoundcharacter\relax % %D In handling the compound characters we have to take care of \type {\bgroup} and % %D \type {\egroup} tokens, so we end up with a multi||step interpretation macro. We % %D look ahead for a \type {\bgroup}, \type {\egroup} or \type {\blankspace}. Being % %D no user of this mechanism, the credits for testing them goes to Tobias Burnus, % %D the first german user of \CONTEXT. % %D % %D We need to look into the future with \type{\futurelet} to prevent spaces from % %D disappearing. % % \def\lang_compounds_handle_character#1% % {\def\lang_compounds_handle_character_finish{\lang_compounds_handle_character_finish_indeed{#1}}% % \futurelet\nexttoken\xhandlecompoundcharacter} % % \def\lang_compounds_handle_character_finish_indeed % {\ifx\nexttoken\bgroup % %\expandafter\lang_compounds_handle_character_pickup % handle "{ee} -> \"ee % %\expandafter\gobbleoneargument % forget "{ee} -> ee % \expandafter\lang_compounds_handle_character_one % ignore "{ee} -> "ee % \else\ifx\nexttoken\egroup % \doubleexpandafter\lang_compounds_handle_character_normal % \else\ifx\nexttoken\blankspace % \tripleexpandafter\lang_compounds_handle_character_normal % \else % \tripleexpandafter\lang_compounds_handle_character_pickup % \fi\fi\fi} % % \def\lang_compounds_handle_character_normal#1% % {\csname\??compoundnormal\string#1\endcsname} % % \def\lang_compounds_handle_character_pickup#1#2% preserve space % {\def\lang_compounds_handle_character_finish{\lang_compounds_handle_character_finish_indeed#1#2}% % \futurelet\nexttoken\lang_compounds_handle_character_finish} % % \def\lang_compounds_handle_character_finish_indeed % {\ifx\nexttoken\bgroup % \expandafter\lang_compounds_handle_character_one % \else\ifx\nexttoken\egroup % \doubleexpandafter\lang_compounds_handle_character_one % \else\ifx\nexttoken\blankspace % \tripleexpandafter\lang_compounds_handle_character_one % \else % \tripleexpandafter\lang_compounds_handle_character_two % \fi\fi\fi} % % %D Besides taken care of the grouping and space tokens, we have to deal with three % %D situations. First we look if the next character equals the first one, if so, then % %D we just insert the original. Next we look if indeed a compound character is % %D defined. We either execute the compound character or just insert the first. So we % %D have % %D % %D \starttyping % %D % %D \stoptyping % %D % %D In later modules we will see how these commands are used. % % \def\lang_compounds_handle_character_one#1#2% % {\if\string#1\string#2% was: \ifx#1#2% % %\expandafter\let\expandafter\next\csname\??compoundnormal\string#1\endcsname % \csname\??compoundnormal\string#1\expandafter\endcsname % \else\ifcsname\??compoundsingle\string#1\string#2\endcsname % %\expandafter\let\expandafter\next\lastnamedcs % \expandafter\lastnamedcs % \else % %\expandafter\let\expandafter\next\lastnamedcs % \expandafter\lastnamedcs % \fi\fi} % % \def\lang_compounds_handle_character_two#1#2#3% % {\if\string#1\string#2% % \def\next{\csname\??compoundnormal\string#1\endcsname#3}% % \else\ifcsname\??compoundmultiple\string#1\string#2\string#3\endcsname % \expandafter\let\expandafter\next\lastnamedcs % \else\ifcsname\??compoundsingle\string#1\string#2\endcsname % \expandafter\let\expandafter\next\lastnamedcs % \else % \expandafter\let\expandafter\next\lastnamedcs % \fi\fi\fi % \next} % % %D For very obscure applications (see for an application \type {lang-sla.tex}) we % %D provide: % % % \def\simplifiedcompoundcharacter#1#2% % % {\ifcsname\??compoundsingle\string#1\string#2\endcsname % % \doubleexpandafter\firstofoneargument\csname\??compoundsingle\string#1\string#2\endcsname % % \else % % #2% % % \fi} % % \def\simplifiedcompoundcharacter#1#2% % {\ifcsname\??compoundsingle\string#1\string#2\endcsname % \doubleexpandafter\firstofoneargument\lastnamedcs % \else % #2% % \fi} %D \macros %D {disablediscretionaries,disablecompoundcharacter} %D %D Occasionally we need to disable this mechanism. For the moment we assume that %D \type {|} is used. \let\disablediscretionaries \ignorediscretionaries \let\disablecompoundcharacters\ignorecompoundcharacter %D \macros %D {normalcompound} %D %D Handy in for instance XML. (Kind of obsolete) \ifdefined\normalcompound \else \let\normalcompound=| \fi %D \macros %D {compound} %D %D We will overload the already active \type {|} so we have to save its meaning in %D order to be able to use this handy macro. %D %D \starttyping %D so test\compound{}test can be used instead of test||test %D \stoptyping \bgroup \catcode\barasciicode\activecatcode \unexpanded\gdef\compound#1{|#1|} \doglobal \appendtoks \def|#1|{\ifx#1\empty\empty-\else#1\fi}% \to \everysimplifycommands \egroup %D Here we hook some code into the clean up mechanism needed for verbatim data. \appendtoks %\disablecompoundcharacters \disablediscretionaries \to \everycleanupfeatures \protect \endinput