
\startcomponent ontargetgreen

\environment ontargetstyle

\startchapter[title={Running green}]

There are a few contradicting developments going on: energy prices skyrocket
while Intel and AMD are competing for the fastest \CPU{}s, where saving energy
seems mostly related to making sure that the many cores running at the same time
don't burn the machine. However, \TEX\ is a single core consumer so throwing lots
of cores into the game is not helping much. You're better served with one very
fast core than with many slower ones that add up to a lot of horsepower. The
latter makes sense when you process video or play games, but that's not what
\TEX\ is about, although it is fun to play with. Of course multiple cores often
come in handy, for instance in the build farm that is used to compile
\LUAMETATEX\ and intermediate \TEXLIVE\ releases: when that gets compiled and we
also trigger a \LUAMETATEX\ build, two times 10 \LINUX\ virtual machines are
compiling, as well as one Windows machine that runs four compile jobs at the same
time.

The server that runs the farm is a Dell 710 server with dual 5630 Xeon
processors, 6 SAS drives each 2GB in (hardware) raid 10, 72 GB memory, redundant
power supplies and 6 network ports. It sits idle most of the time and consumes
between 250 and 400W. It is part of a redundant setup: dual switches, dual
routers, multiple UPSs, air conditioning, two backup QNAP NASs, a few low power
machines for distributed continuous incremental backups, etc. The server itself
is a refurbished one, so not the most expensive, but with the Dutch energy prices
of 2022 tied to gas prices, we quickly realized that there was no way we could
keep it up and running. Because we have three such servers (one is turned off and
used as fallback) we started wondering if we could go for a different solution.

As we recently upgraded the 2013 laptops to refurbished 2018 ones (the latest
models that could use the docking stations that we have), we decided to buy a few
more and test these as replacements for the servers. Of course one has to pimp
these machines a bit: a professional 2TB NVMe SSD plus a proper 2.5in SSD as
backup, 64 GB of memory, and a few extra USB3 network cards. The \CPU{}s are fast
mobile Xeons. We use Proxmox as virtual host and that runs fine in such a
configuration.

Surprisingly, after moving the farm to that setup, which basically boils down to
moving virtual machines, we found that the performance of those parallel
compilations was quite okay. And the nice thing was that these machines idle
much lower, at some 20--30W. The saving is therefore quite noticeable and we
decided to check some more; after all it would be nice if we could bring the
average power consumption of 1750W down by at least half so that it would match
the output of a few solar panels. Of course it means that one has to ditch
perfectly well working machines, which in itself is not that environmentally
friendly, but there is not much to choose here.

The second machine to be replaced was the one that also runs quite a few virtual
machines: the main file server, the mail server, an ftp server, the website, an
rsync host, the squeezebox server that also serves as update test, and various
project related rendering services. All run in their own (OpenSuse) virtual
machine. After installing a similar laptop those were also moved.

As a side effect, the two backup NASs were replaced by a single laptop (my 2013
Dell Precision workhorse) running one backup file server, and for an extra
incremental backup (rsnapshot running hourly, daily, weekly and monthly backups
is our friend) a 2013 MacBook was turned into a \LINUX\ machine (15W idle with an
internal reused SSD\footnote {For a change that Apple machine was easy to update,
and we could even get a new clone battery replacement.} and an external 4GB
disk). Two managed switches became one (after all we had fewer network cables due
to lost redundancy) and only one backup power supply remains (that will be
replaced by a nicer alternative when it breaks down; after all, by using laptops
we get power backup for free). The total consumption went down by at least 1000W.
Of course there is an investment involved and we need to reconfigure the server
rack, but the expectation is that by investing now we get less trouble later
(less gambling on energy).\footnote {We hope to save some 9000 kWh, which means
that we save at least some 2500\euro\ per year, and more when the government
reinstates its energy tax policy and/or prices go further up, which seems to be
the case. Even before the crisis, 5ct/kWh in the Netherlands effectively became
five times that amount when connection, transportation, energy tax and value
added tax get added.}

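The 9000 kWh mentioned in the footnote can be traced back to the drop in
consumption. As a back of the envelope check, and assuming (as a simplification)
that the 1000W reduction holds around the clock for a whole year:

\startformula
1000\,\text{W} \times 24\,\text{h} \times 365 = 8760\,\text{kWh} \approx 9000\,\text{kWh}
\stopformula

Under that same assumption the 2500\euro\ per year corresponds to an average
price of roughly 0.28\euro\ per kWh.
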
But there is still the pending question of what the impact is on the services
that we run. The most demanding ones are Math4all and Math4mbo: these produce
large files, need many resources (\XML\ and images), and we didn't want to burn
ourselves too much. Now, here is an interesting observation: these services run
twice as fast on the new infrastructure. But it is hard to explain why. The file
server is on a different machine (so no fast internal network), the \CPU\ is a
bit faster but not that much, the virtual machine is on \SSD, but files are saved
on the file server, which is a two disk \USB3 enclosure connected directly to a
virtual machine that does software raid. The most important difference is that
main memory is much faster and \TEX\ is a memory intensive process. Ever since we
started with \LUATEX\ we have known that memory bandwidth and \CPU\ caches make a
difference. Maybe the faster floating point handling of the more modern Xeon also
helps here.

And that brings me to the following: how do we actually benchmark \TEX ? When you
go on the internet and compare \CPU{}s, most tests are not that comparable to a
\TEX\ run on a single core. One can think of a set of test files, but the problem
there is that when the engine evolves and details in the macro package coding
change, one loses the comparison with older tests. This is why, when we do such
tests, we always run the same test on the different platforms. Although this
often shows that the gain on newer hardware is seldom what one expects from the
more general benchmarks, one can still be surprised. When we moved to five year
newer laptops the gain was some 30\% for me and 50\% for my colleague. The
difference between his laptop and the slightly more beefed up virtual machine can
be neglected.

We monitor the power consumption with a youless device connected to the power
meter. When I process the \LUAMETATEX\ manual I see the phase that the machine
sits on go up 20W for a run that takes some 9 seconds. Let's say that we use
180Ws or 0.00005kWh (20,000 runs per kWh). So, compared to the idle power usage
of a server, a single \TEX\ run can be neglected, simply because it is so fast.
So, what is actually the most efficient hardware for a \TEX\ service? I get the
feeling that a decent Intel Atom C3955 16 core driven machine is quite okay for
that, but I don't have one at hand and last time I checked one could not order
anything anyway. And with prices of hardware going up it's also not something you
try for fun. As a comparison to what we have now, testing \TEX\ on an Intel
NUC11ATKC2 could also be interesting (it has an N4505 \CPU). There was a time
when I considered a bunch of Raspberry Pis but they are no longer that cheap,
assuming that you can get them at all, and adding a case and proper disc
enclosure also adds up. When wrapped in a nice package the Pi will probably be a
couple of times slower but it then probably also uses less power. These fitlets
are also interesting but again, one can't get them.

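The estimate for a single run is a simple unit conversion. As a rough check,
taking the measured 20W and 9 seconds at face value:

\startformula
20\,\text{W} \times 9\,\text{s} = 180\,\text{Ws}
\stopformula

\startformula
\frac{1\,\text{kWh}}{180\,\text{Ws}} = \frac{3\,600\,000\,\text{Ws}}{180\,\text{Ws}} = 20\,000
\stopformula

So one kWh covers some 20,000 of such runs, which is why the idle consumption of
the hardware matters so much more than the runs themselves.
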
It is kind of fun to play with optimizations that don't really impact the clarity
of the code. One can argue that spending a day on something that saves 0.005
seconds on a specific run is a waste of time, but of course one has to multiply
that number by the number of runs. Personally I will never gain from it but
nevertheless it can save some energy: imagine a batch of 15000 documents every
day. We then save $15000 \times 0.005 \times 365 = 27375$ seconds or about 8
hours of runtime per year. This can still be neglected but what if this is not
the only optimization?

An example of such an optimization is this:

\starttyping
\advance\somecounter \plusone
\advance\somecounter by \plusone
\stoptyping

The second one runs faster because the keyword is found right away: there is no
push back of the token that was looked at, which is the side effect of a lacking
keyword. So how about adding this to the engine?

\starttyping
\advanceby \somecounter \plusone
\advancebyone\somecounter
\stoptyping

Given the way \LUAMETATEX\ is coded, it only needs a few lines! In this case it
extends the repertoire of primitives so it is visible, but we have many other
(similarly small) optimizations that contribute. Again, the average user will not
notice a drop in runtime from 1.5 seconds to 1.45 but when 8 hours become 80
hours or 800 hours it does become interesting. In energy sensitive 2022 these 800
hours not only save some \texteuro 400 but also contribute to a lower carbon
footprint! And now imagine how much could be saved on these extensive runs when
we make sure that the style used is optimal. Of course, when we need two runs per
document it starts adding up more.

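For those who want to see the effect of the keyword on their own machine, here is
a minimal sketch of a timing test. It assumes a \CONTEXT\ run (the last two cases
need \LUAMETATEX) and uses the \type {\testfeatureonce} and \type {\elapsedtime}
helpers. The counter is declared just for this test and the loop count is an
arbitrary choice; the numbers that come out depend on the machine and only make
sense relative to each other.

\starttyping
\starttext

\newcount\somecounter % only declared for this test

% each case runs the snippet a million times and
% \elapsedtime then reports the time used in seconds

\testfeatureonce{1000000}{\advance\somecounter \plusone}
no keyword: \elapsedtime \par

\testfeatureonce{1000000}{\advance\somecounter by \plusone}
with keyword: \elapsedtime \par

\testfeatureonce{1000000}{\advanceby \somecounter \plusone}
advanceby: \elapsedtime \par

\testfeatureonce{1000000}{\advancebyone\somecounter}
advancebyone: \elapsedtime \par

\stoptext
\stoptyping
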
Some experiments with a demanding file showed a one percent gain (on a 2.7 second
run) using the alternative integers, dimensions and advance primitives. However,
using \CONTEXT's compact font mode brought down the runtime to 2.0 seconds! So, in
the end it's all very relative. It is worth noticing that the .7 seconds saved on
fonts is sort of constant, which means that accumulated gains elsewhere make
that .7 seconds more significant as we progress.

\stopchapter

\stopcomponent