--------------------------------------------------------------------------------
welcome
--------------------------------------------------------------------------------

There is not much information here. I normally keep track of developments in
articles or chapters in the history documents. These can (sometimes with a delay
when it's an article) be found in the ConTeXt distribution. The history and
development of LuaTeX is also documented there, often with examples or usage.

The ideas behind this project are discussed in documents in the regular ConTeXt
distribution. A short summary is: in order to make sure ConTeXt will work as
intended, we distribute an engine with it. That way we can control stability,
performance and features. It also permits experiments without the danger of
interference with the engines used in other macro packages. Also, we don't want
dependencies on large subsystems so we have a well defined set of libraries: we
want to stay lean and mean. Eventually the same applies as to original TeX: we
fix bugs and don't add all kinds of stuff we don't (want or) need. Just that.

--------------------------------------------------------------------------------
codebase
--------------------------------------------------------------------------------

This codebase is a follow up on LuaTeX. It all started with a merge of files
that came from the Pascal to C converter (CWEB) plus some C libraries. That code
base evolved over time and there were the usual side effects of the translation
and merge of (also other engine) code, plus successive extensions as well as Lua
interfaces. In LuaMetaTeX I tried to smooth things a bit. The idea was to stay
close to the original (which in the end is TeX itself) so that is why many
variables, functions etc are named the way they are. Of course I reshuffled, and
renamed but I also tried to stay close to the original naming. More work needs
to be done to get it all right but it happens stepwise as I don't want to
introduce bugs. In the meantime the LuaTeX and LuaMetaTeX code bases differ
substantially but apart from some new features and stripping away backend and
font code, the core should work the same.

tex etex pdftex aleph:

Of course the main body of code comes from its ancestors. We started with pdfTeX
which has its frontend taken from standard TeX, later extended with the eTeX
additions. Some additional features from pdfTeX were rewritten to become core
functionality. We also took some from Aleph (Omega) but only some (in the
meantime adapted) r2l code is left (so we're not compatible).

mp:

The maintenance of MetaPost was delegated to the same people who do LuaTeX and
as a step in development a library was written. This library is used in
LuaMetaTeX but has been adapted a bit for it. In principle some of the additions
can be backported, but that is yet undecided.

lua:

This is the third major component of LuaMetaTeX. In LuaTeX a slightly patched
version has been used but here we use an unchanged version. For a while the
version number of the bytecode blob was adapted so that we could use intermediate
versions of Lua 5.4 that expect different bytecode without crashing on existing
bytecode. This trick has been dropped but I hope at some point Lua will add a
define for this.

For the record: when we started with LuaTeX I'd gone through Pascal, Modula 2,
Perl and Ruby with respect to the management helpers for ConTeXt, like dealing
with indexes, managing MetaPost subruns, and all kinds of goodies that evolved
over time. I ran into Lua in the SciTE editor and liked the language and the
concept of a small and efficient embedded language. The language originates in
academia and is not under the influence of (company and commercial driven)
marketing. A lot of effort goes
into stepwise evolution. The authors are clear about the way they work on the
language:

	http://lua-users.org/lists/lua-l/2008-06/msg00407.html

which fits nicely in our philosophy. Just in case one wonders if other scripting
languages were considered the answer is: no, they were not. The alternatives all
are large and growing and come with large ecosystems (read: dependencies) and some
had (seemingly) drastic changes in the design over time. Of course Lua also evolves
but that is easy to deal with. And in the meantime also the performance of Lua made
it clear that it was the right choice.

avl:

This library has been in use in the backend code of LuaTeX but is currently only
used in the MP library. I'm not sure to what extent this (originally meant for
Python) module has been adapted for pdfTeX/LuaTeX but afaik it has been stable
for a long time. It won't be updated but I might adapt it for instance wrt error
messages so that it fits in.

decnumber:

This is used in one of the additional number models that the mp library supports.
In LuaMetaTeX there is no support for the binary model. No one uses it and it
would add quite a bit to the codebase.

hnj:

This GPL licensed module is used in the hyphenation machinery. It has been
slightly adapted so that error messages and such fit in. I don't expect it to
change much in the future.

pplib:

This library is made for Lua(Meta)TeX and provides an efficient PDF parser in
pure C. In LuaTeX it was introduced as a replacement for a larger library that
was overkill for our purpose, depended on C++ and kept changing. This library
itself uses libraries but that code is shipped with it. We use some of that
for additional Lua modules (like md5, sha2 and decoding).

lz4 | lzo | zstd:

For years this library was in the code base and even interfaced but not enabled
by default. When I played with zstd support as optional library I decided that
these two should move out of the code base and also be done the optional way. The
amount of code was not that large, but the binary grew by some 10%. I also played
with the foreign module and zstd and there is no real difference in performance. The
optionals are actually always enabled, but foreign is controlled by the command
line option that enables loading libraries, and it also depends on libffi.

zlib | miniz:

I started with the code taken from LuaTeX, which itself was a copy that saw some
adaptations over time (iirc there were border case issues, like dealing with zero
length streams and such). It doesn't change so in due time I might strip away some
unused code. For a while libdeflate was used but because pplib also depends on
zlib and because libdeflate doesn't do streams that was abandoned (it might come
back as it is very nice and clean code). One issue with (de)compression libraries
is that they use tricks that can be architecture dependent. I try to stay away
from those and prefer to let the compiler sort things out.

Early 2021 we switched to miniz. One reason is that the codebase is smaller because
it doesn't deal with very old or rare platforms and architectures. Its performance
is comparable, definitely for our purpose, and sometimes even a bit better. I looked
at other alternatives but as soon as processor specific tricks are used, we end up
with architecture specific header files and code so it's a no-go for a presumed
long term stable and easy to compile program like luametatex. There is no gain in it
anyway.

complex:

There is a complex number interface inspired by the complex number lua module by
lhf. It also wraps libcerf usage.

lfs:

In LuaTeX we use a patched version of this library. In LuaMetaTeX I rewrote the
code because too many patches were needed to deal with mswindows properly.

socket:

The core library is used. The library is seldom adapted but I keep an eye on it.
We used to have a patched version in LuaTeX, but here we stay closer. I might
eventually do some rewrite (less code) or decide to make it an external library.
The related Lua code is not in the binary and context uses its own (derived)
variant so that it uses our helpers as well as fits in the reporting system. I
need to keep an eye on improvements upstream. We also need to keep an eye on
copas as we use that code in context.

luasec:

This is not used but here as a reference for a possible future use (maybe as
library).

curl, ghostscript, graphicsmagick, zint, mujs, mysql, postgres, sqlite, ...:

The optional module mechanism supports some external libraries but we don't keep
their code in the luametatex codebase. We might come up with a separate source
tree for that, but only for some smaller ones. The large ones, those depending
on other libraries, or c++, or whatever resources, will just be taken from the
system.

libcerf:

This library might become external but is now in use as a plug into the complex
number support that itself is meant for MetaPost use. The code here has been
adapted to support the Microsoft compiler. I will keep an eye on what happens
upstream and reconsider matters later. (There is no real need to bloat the
LuaMetaTeX binary with something that is rarely used.)

kpse:

There is optional library support for the KPSE library used in WEB2C. Although
it does provide the methods that make sense, it is not meant for usage in
ConTeXt, but more as a toolkit to identify issues and conflicts with parallel
installations like TeXLive.

posit: 

This is an experiment with compact high resolution 32 bit floating point
numbers. We have a Lua interface but also a number model in MetaPost.

potrace: 

This library permits us to play with bitmaps turned into outlines, especially in
MetaPost. It's little code for much fun.

hb:

I have a module that works the same as the ffi variant from a couple of years
ago and I might add it when it's needed (for oriental tex font development
checking purposes, but then I also need to cleanup and add some test styles for
that purpose). Looking at the many LuaTeX subversion checkins it looks a bit
like a moving target. It's also written in C++ which we don't (want to) use in
LuaMetaTeX. But the library comes with other programs so it's likely you can
find it on your system someplace.

general:

It's really nice to see all these libraries popping up on the web but in the
perspective of something like TeX one should be careful. Quite often what is hip
today is old fashioned tomorrow. And quite often the selling point of the new
thing comes with bashing the old, which can be a sign of something being a
temporary thing or itself something to be superseded soon. Now, get me right:
TeX in itself is great, and so are successors. In that sense LuaMetaTeX is just
a follow up with no claims made for it being better. It just makes things easier
for ConTeXt. You can kick in libraries but be aware of the fact that they can
change, so if you have long running projects, make sure you save them. Or run a
virtual machine that can last forever. TeX systems can run for ages that way. We
might eventually add support for generating libs to the compile farm. The older
a library gets, the bigger the chance that its api is stable. Compression
libraries are great examples, while libraries that deal with images, conversion
and rendering are more moving (and have way more dependencies too). Actually,
for the latter category, in ConTeXt we prefer to call the command line variants
instead of using libraries, also because it seldom influences performance.

licenses:

Most files contain some notice about the license and most are quite liberal.
I had to add some (notes) that were missing from LuaTeX. There is an occasional
readme file that tells a bit more.

explanations:

The TeX derived source code contains many comments that came with the code when
it was moved from "Pascal Web" to "C Web" (with web2c) to "C plus comments" (by
Taco). These comments are mostly from Don Knuth as they were part of TeX The
Program. However, some comments were added (or changed) in the perspective of
eTeX, pdfTeX, Aleph, etc. We also added some in LuaTeX and LuaMetaTeX. So, in
the meantime it's a mix. It is us who made the mess, not Don! In due time I hope
to go over all the comments and let them fit the (extended) code.

dependencies:

Often the files here include more h files than needed but given the speed of
compilation that is no problem. It also helps to identify potential name clashes
and such.

legacy:

Occasionally there is a file texlegacy.c that has some older (maybe reworked)
code but I move it to another place when it gets too large and its code can no
longer be retrofitted. For me it shows a bit what got done in the (many)
intermediate steps.

--------------------------------------------------------------------------------
documentation
--------------------------------------------------------------------------------

The code will be stepwise cleaned up a bit (removing the web2c side effects),
making the many branches stand out etc so that some aspects can be documented
a bit better (in due time). All this will take time (and already quite some time
went into it). The official interface of LuaMetaTeX is described in the manual
and examples of usage can be seen in ConTeXt. Of course TeX behaves as such.

The organization of files, names of functions can change as we progress but when
possible the knuthian naming is followed so that the documentation of "TeX The
Program" still (mostly) applies. Some of the improvements in LuaMetaTeX can
eventually trickle back into LuaTeX although we need to guard stability. The
files here can *not* be dropped into the LuaTeX source tree!

--------------------------------------------------------------------------------
reboot
--------------------------------------------------------------------------------

I'll experiment with a reboot engine option but for sure that also interferes
with a macro package initialization so it's a long term experiment. Quite
certainly it will not pay off anyway so it might never happen. But there are
some pending ideas so ...

--------------------------------------------------------------------------------
libraries | ffi | luajit
--------------------------------------------------------------------------------

We use optional libraries instead of ffi which is not supported because it is
cpu and platform bound and the project that the code was taken from seems to
be orphaned. Also luajit is not supported as that project is stalled and uses
an old Lua.

--------------------------------------------------------------------------------
cmake
--------------------------------------------------------------------------------

We (Mojca and Hans) try to make the build as simple as possible with a minimum
of dependencies. There are some differences with respect to unix and windows (we
support msvc, crosscompiled mingw and clang). The code of libraries that we use
is included, apart from optional libraries. It can only get better.

We really try to make all compilers happy and minimize the number of messages,
even if that makes the code a bit less nice. It's a bit unfortunate that over
time the demands and defaults change a bit (what was needed before triggers a
warning later).

--------------------------------------------------------------------------------
experiments
--------------------------------------------------------------------------------

I've done quite some experiments but those that in the end didn't make sense, or
complicated the code, or were nice but not that useful after all were simply
deleted so that no traces are left that can clutter the codebase. I'll probably
forget (and for sure already have forgotten) about most of them so maybe some
day they will show up as (different) experiments. We'll see how that goes.

-- miniz    : smaller pdf files, less code, similar performance
-- mimalloc : especially faster for the lua subsystem

--------------------------------------------------------------------------------
performance
--------------------------------------------------------------------------------

By now the codebase is different from the LuaTeX one and as a consequence the
performance can also differ. But it's hard to measure in ConTeXt because much
more has to be done in Lua and that comes at a price. The native LuaTeX backend
is for instance much faster (when last measured the penalty could be up to 20%).
On the Internet one can run into complaints about performance of LuaTeX with
other macro packages, so one might wonder why we made this move but speed is
not everything. On the average ConTeXt has not become less efficient, or
at least I don't see its users complain much about it, so we just moved on.

The memory footprint at the engine end is somewhat smaller but of course that
gets compensated by memory consumption at the Lua end. We also sacrifice the
significant gain of the faster LuaJIT virtual machine (although at some point
supporting that variant makes not much sense any more as it lacks some Lua
features). Because, contrary to other TeXs, the Lua(Meta)TeX frontend code
is split up in separate units, compilers can probably do less optimization,
although we use large compilation units that are mostly independent of each
other.

Eventually, in a next stage, I might be able to compensate for it but don't expect
miracles: I already explored all kinds of variations. Buying a faster machine is
always an option. Multiple cores don't help, faster memory and caching of files
do. Already early in the LuaTeX development we found that a CPU cache matters
but (definitely on a server with many virtual machines) there LuaMetaTeX has to
compete.

So, at this point my objective is not so much to make LuaMetaTeX run faster but
more to make sure that it keeps the same performance, even if more functionality
gets added to the TeX, MetaPost and/or Lua parts. Also keep in mind that in the
end inefficient macros and styles play a bigger role than the already pretty
fast engine.

--------------------------------------------------------------------------------
rapid development cycle
--------------------------------------------------------------------------------

Because I don't want to divert too much (and fast) from the way traditional TeX
is coded, the transition is a stepwise process. This also means that much code
that first has been abstracted and cleaned up, later goes. The extra work that
is involved, combined with a fast test cycle with the help of ConTeXt users
ensures that we keep a working ConTeXt although there occasionally are periods
with issues, especially when fundamentals change or are extended. However, the
number of temporary bugs is small compared to the number of changes and
extensions and worth the risk. The alternative is to have long periods where we
don't update the engine, but that makes testing the related changes in ConTeXt
rather cumbersome. After all, the engine targets ConTeXt. But of course it is
kind of a pity that no one sees what steps were used to get there.

--------------------------------------------------------------------------------
api
--------------------------------------------------------------------------------

Although some symbols can be visible due to the fact that we make them extern as
part of a code split-up, there is no api at all. Don't expect the names of the
functions and variables that this applies to to remain the same. Blame yourself
for abusing this partial exposure. The abstraction is in the Lua interface and
when possible that one stays the same. Adding more and more access (callbacks)
won't happen because it has an impact on performance.

Because we want to stay close to original TeX in many aspects, the names of
functions try to match those in ttp. However, because we're now in pure C, we
have more functions (and fewer macros). The compiler will inline many of them,
but plenty will show up in the symbols table, when exposed. For that reason we
prefix all functions in categories so that they at least show up in groups. It
is also the reason why in for instance the optional modules code we collect all
visible locals in structs. It's all a stepwise process.

The split in tex* modules is mostly for convenience. The original program is
monolithic (you can get an idea when you look at mp.c) so in a sense they should
all be seen as a whole. As a consequence we have tex_run_* as externals as well
as locals. It's just an on-purpose side effect, not a matter of inconsistency:
there is no tex api.

--------------------------------------------------------------------------------
todo (ongoing)
--------------------------------------------------------------------------------

-  All errors and warnings (lua|tex|fatal) have to be checked; what is critical
   and what not.
-  I need to figure out why filetime differs between msvc and mingw (daylight
   correction probably).
-  Nested runtime measurement is currently not working on unix (but it works ok
   on microsoft windows).
-  I will check the manual for obsolete, removed and added functionality. This
   is an ongoing effort.
-  Eventually I might do some more cleanup of the mp*.w code. For now we keep
   w files, but who knows ...
-  A bit more reshuffling of functions to functional units is possible but that
   happens stepwise as it's easy to introduce bug(let)s. I will occasionally go
   over all code.
-  I might turn some more macros into functions (needs some reshuffling too)
   because it's nicer wrt tracing issues. When we started with LuaTeX macros
   made more sense but compilers got better. In the meantime whole program
   optimization works okay, but we cannot do that when one also wants to load
   modules.
-  A side track of the lack of stripping (see previous note) is that we need to
   namespace locals more aggressively ... most is done.
-  We can clean up the dependency chain i.e. header files and such but this is
   a long term activity. It's also not that important.
-  Maybe nodememoryword vs tokenmemoryword so that the compiler can warn for a
   mixup.
-  Remove some more (also cosmetic) side effects of mp library conversion.
-  Replace some more of the print* chains by the more compact print_format call
   (no hurry with that one).
-  The naming between modules (token, tex, node) of functions is (historically)
   a bit inconsistent (getfoo, get_foo etc) so I might make that better. It does
   have some impact on compatibility but one can alias (we can provide a file).
-  Some more interface related code might get abstracted (much already done).
-  I don't mention other (either or not already rejected) ideas and experiments
   here (like pushing/popping pagebuilder states which is messy and also demands
   too much from the macro package end.)
-  Stepwise I'll make the complete split of command codes (chr) and subtypes.
   This is mostly done but there are some leftovers. It also means that we no
   longer are completely in sync with the internal original TeX naming but I'll
   try to remain close.
-  The glyph and math scale features do not yet check for overflow of maxdimen
   but I'll add some more checks and/or impose some limitations on the scale
   values. We have to keep in mind that TeX itself also happily accepts some
   wrap around because it doesn't really crash the engine; it just can have side
   effects.
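
The overflow check mentioned in the last item could look like the sketch below.
It is written in Python for illustration only and is not the engine's code. It
assumes that dimensions are integers in scaled points, that a scale of 1000
means 1.0, and that an out of range result is reported instead of being allowed
to wrap around. The names MAX_DIMEN and scale_dimension are hypothetical; the
limit itself is TeX's largest dimension, 2^30 - 1 scaled points.

```python
# Hypothetical sketch: scale a dimension and refuse results beyond maxdimen.

MAX_DIMEN = 2**30 - 1  # TeX's largest dimension, in scaled points


def scale_dimension(dimen_sp: int, scale: int) -> int:
    """Return dimen_sp * scale / 1000, rounded, or raise on overflow.

    Assumption: scale is an integer in thousandths (1000 means 1.0).
    """
    product = dimen_sp * scale
    # Round half away from zero, using integer arithmetic only.
    magnitude = (abs(product) + 500) // 1000
    result = magnitude if product >= 0 else -magnitude
    if abs(result) > MAX_DIMEN:
        raise OverflowError(
            f"scaled dimension {result}sp exceeds maxdimen ({MAX_DIMEN}sp)"
        )
    return result


if __name__ == "__main__":
    ten_points = 10 * 65536  # 10pt expressed in scaled points
    print(scale_dimension(ten_points, 1500))
    try:
        scale_dimension(MAX_DIMEN, 2000)
    except OverflowError as error:
        print("rejected:", error)
```

Raising an error is just one policy. Clamping to the limit, or restricting the
permitted scale values up front, are alternatives that fit the same check.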

--------------------------------------------------------------------------------
todo (second phase)
--------------------------------------------------------------------------------

Ideally we'd like to see more local variables (like some cur_val and such) but
it's kind of tricky because these globals are all over the place and sometimes
get saved and restored (so that needs careful checking), and sometimes such a
variable is expected to be set in a nested call. It also spoils the (still
mostly original) documentation. So, some will happen, some won't. I actually
tested some rather drastic localization and even with triple checking there
were side effects, so I reverted that. (We probably end up with a mix that
shows the intention.)

Anyway, there are (and will be) some changes (return values instead of accessing
global) that give a bit less code on the one hand (and therefore look somewhat
cleaner) but are not always more efficient. It's all a matter of taste.

I'm on and off looking at the files and their internal documentation and in the
process rename some variables, do some extra checking, and remove unused code.
This is a bit random activity that I started doing pending the first official
release.

Now that the math engine has been partly redone the question is: should we keep
the font related control options? They might go away at some point and even
support for traditional eight bit fonts might be dropped. We'll see about that.

That is: we saw about it. End 2021 and beginning of 2022 Mikael Sundqvist and I
spent quite a few months on playing around with new features: more classes, inter
atom spacing, inter atom penalties, atom rules, a few more FontParameters, a bit
more control on top of what we already had, etc. In the end some of the control
already present became standardized in a way that now prefers OpenType fonts.
Persistent issues with fonts are now dealt with on a per font basis in ConTeXt
using existing as well as new tweaking features. We started talking micro math
typography. Old fonts are still supported but one has to configure the engine
with respect to the used technology. Another side effect is that we now store
math character specifications in nodes instead of a number.

It makes sense to simplify delimiters (just make them a mathchar) and get rid of
the large family and char. These next in size and extensibles are to be related
anyway so one can always make a (runtime) virtual font. The main problem is that
we then need to refactor some tex (format) code too because we no longer have
delimiters there either.

--------------------------------------------------------------------------------
dependencies
--------------------------------------------------------------------------------

There are no dependencies on code outside this tree and we keep it that way. If you
follow the TeXLive (LuaTeX) source update you'll notice that there are quite
often updates of libraries and sometimes they give (initial) issues when being
compiled, also because there can be further dependencies on compilers as well as
libraries specific to a (version of) an operating system. This is not something
that users should be bothered with.

Optional libraries are really optional and although an API can change we will
not include related code in the formal LuaMetaTeX code base. We might offer some
in the build farm (for building libraries) but that is not a formal dependency.
We will of course adapt code to changes in API's but also never provide more
than a minimal interface: use Lua when more is needed.

We keep in sync with Lua development, also because we consider LuaMetaTeX to be
a nice test case. We never really have issues with Lua anyway. Maybe at some
point I will replace the socket related code. The mimalloc library that we use
gives a performance boost but we could do without. The built-in cerf library might
be replaced by an optional but it also depends on the complex datatype being more
mature: there is now a fundamental difference between compilers so we have a
patched version; the code doesn't change anyway, so maybe it can be stripped.

In practice there have been hardly any updates to the libraries that we do use:
most changes are in auxiliary programs and make files anyway. When there is an
update (most are on github) this is what happens:

-- check out code
-- compare used subset (like /src) with working copy
-- merge with working copy if it makes sense (otherwise delay)
-- test for a while (local compilation etc.)
-- compare used subset again, this time with local repository
-- merge with local repository
-- push update to the build farm

So, each change is checked twice which in practice doesn't take much time but
gives a good idea of the kind of changes. So far we never had to roll back.
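
The two "compare used subset" steps in the list above can be scripted. The
sketch below is in Python and only illustrates the idea; it is not the tool that
is actually used here. The function names (list_files, compare_trees) and the
shape of the report are assumptions. Files are compared by content, not by
timestamp, so a fresh checkout does not show up as changed.

```python
# Hypothetical sketch: report how an upstream subset differs from a local copy.
import filecmp
import os


def list_files(root):
    """Return the set of file paths below root, relative to root."""
    found = set()
    for directory, _, names in os.walk(root):
        for name in names:
            full = os.path.join(directory, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_trees(upstream, working):
    """Compare two directory trees and return a dictionary of sorted lists."""
    upstream_files = list_files(upstream)
    working_files = list_files(working)
    changed = sorted(
        path
        for path in upstream_files & working_files
        # shallow=False forces a byte by byte comparison of the contents.
        if not filecmp.cmp(
            os.path.join(upstream, path),
            os.path.join(working, path),
            shallow=False,
        )
    )
    return {
        "only_upstream": sorted(upstream_files - working_files),
        "only_working": sorted(working_files - upstream_files),
        "changed": changed,
    }
```

One would call compare_trees with the checked out subset (for instance its src
directory) and the working copy, look at the report, merge what makes sense, and
later run the same comparison against the local repository.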

We still use CWEB formatting for MetaPost which then involves a conversion to C
code but the C code is included. This removes a dependency on the WEB toolchain.
The Lua based converter that is part of this source tree works quite well for
our purpose (and also gives nicer code).

We don't do any architecture (CPU) or operating system specific optimizations,
simply because there is no real gain for LuaMetaTeX. It would only introduce
issues, a more complex build, dependencies on assembly generators, etc. which
is a no-go.

--------------------------------------------------------------------------------
on the agenda
--------------------------------------------------------------------------------

I will stepwise adapt some code to C23, like the native boolean types as well as
decimal64 instead of the decimal library. This can only happen in detail when
all compilers that we support provide these features. This is mostly a "when I
see it or when I am in the mood I will do it" kind of activity.

--------------------------------------------------------------------------------
team / responsibilities
--------------------------------------------------------------------------------

The LuaMetaTeX code base is part of the ConTeXt code base. That way we can guarantee
its working with the ConTeXt macro package and also experiment as much as we
like without harming this package. The ConTeXt code is maintained by Hans Hagen
and Wolfgang Schuster with of course help and input from others (those who are
on the mailing list will have no problem identifying who). Because we see the
LuaMetaTeX code as part of that effort, starting with its more or less official
release (version 2.05, early 2020), Hans and Wolfgang will be responsible for
the code (knowing that we can always fall back on Taco) and explore further
possibilities. Mojca Miklavec handles the compile farm, coordinates the
distributions, deals with integration in TeXLive, etc. Alan Braslau is the first
line tester so that in an early stage we can identify issues with TeX,
MetaPost, Lua and compilation on the different platforms that users have.

Math, one of the core building blocks of the TeX engine has been significantly 
upgraded (starting 2021+). Mikael Sundqvist and I spent a lot of time on this 
project which also involves (tweaking) fonts. We also have a MetaPost agenda 
(2024+) and diverge in other fundamental improvements (2022+), like the extended
par builder. 

If you run into problems with LuaMetaTeX, the ConTeXt mailing list is the place
to go to: ntg-context@ntg.nl. Of course you can also communicate LuaTeX problems
there, especially when you suspect that both engines share it, but for specific
LuaTeX issues there is dev-luatex@ntg.nl where the LuaTeX team can help you
further.

This (mid 2018 - begin 2020) is the first stage of the development. Before we
move on, we (read: users) will first test the current implementation more
extensively over a longer period of time, something that is really needed because
there are lots of accumulated changes, and I would not be surprised if subtle
issues have been introduced. In the meantime we will discuss how to follow up.

The version in the distribution is always tested with the ConTeXt test suite,
which hopefully uncovers issues before users notice.

Stay tuned!
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
ConTeXt websites : http://contextgarden.net http://www.pragma-ade.nl
Development list : dev-context@ntg.nl
Support list     : context@ntg.nl
User groups      : http://ntg.nl http://tug.org etc
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Hans Hagen       : j.hagen@xs4all.nl
--------------------------------------------------------------------------------