% language=us runpath=texruns:manuals/ontarget

\startcomponent ontarget-green

\environment ontarget-style

\startchapter[title={Running green}]

There are a few contradicting developments going on: energy prices sky|-|rocket
while Intel and AMD compete for the fastest \CPU's, where saving energy seems
mostly related to making sure that the many cores running at the same time don't
burn the machine. However, \TEX\ is a single core consumer so throwing lots of
cores into the game does not help much. You're better served with one very fast
core than with many slower ones that add up to a lot of horsepower. The latter
makes sense when you process video or play games, but that's not what \TEX\ is
about, although it is fun to play with. Of course multiple cores often come in
handy, for instance in the build farm that is used to compile \LUAMETATEX\ and
intermediate \TEXLIVE\ releases: when that gets compiled and we also trigger a
\LUAMETATEX\ build, two times 10 \LINUX\ virtual machines are compiling, plus
one Windows machine that runs four compile jobs at the same time.

The server that runs the farm is a Dell 710 server with dual 5630 Xeon
processors, 6 SAS drives of 2GB each in (hardware) raid 10, 72 GB memory,
redundant power supplies and 6 network ports. It sits idle most of the time and
consumes between 250 and 400W. It is part of a redundant setup: dual switches,
dual routers, multiple UPS's, air conditioning, two backup QNAP NAS's, a few low
power machines for distributed continuous incremental backups, etc. The server
itself is a refurbished one, so not the most expensive, but with the Dutch energy
prices of 2022 bound to gas prices, we quickly realized that there was no way we
could keep it up and running. Because we have three such servers (one is turned
off and used as fallback) we started wondering if we could go for a different
solution.

As we recently upgraded the 2013 laptops to refurbished 2018 ones (the latest
models that could use the docking stations that we have), we decided to buy a few
more and test these as replacements for the servers. Of course one has to pimp
these machines a bit: a professional 2TB nvme SSD plus a proper 2.5in SSD as
backup, 64 GB of memory, and a few extra USB3 network cards. The \CPU's are fast
mobile Xeons. We use Proxmox as virtual host and that runs fine in such a
configuration.

Surprisingly, after moving the farm to that setup, which basically boils down to
moving virtual machines, we found that the performance of those parallel
compilations was quite okay. And the nice thing was that these machines idle
much lower, at some 20--30W. The saving is therefore quite noticeable and we
decided to check some more; after all it would be nice if we could bring the
average power consumption of 1750W down by at least half so that it would match
the output of a few solar panels. Of course it means that one has to ditch
perfectly well working machines, which in itself is not that environmentally
friendly, but there is not much to choose here.

The second machine to be replaced was the one that also runs quite a few virtual
machines: the main file server, the mail server, an ftp server, the website, an
rsync host, the squeezebox server that also serves as update test, and various
project related rendering services. All run in their own (OpenSuse) virtual
machine. After installing a similar laptop those were also moved.

As a side effect, the two backup NAS's were replaced by a single laptop (my 2013
Dell Precision workhorse) running one backup file server. For an extra
incremental backup (rsnapshot running hourly, daily, weekly and monthly backups
is our friend) a 2013 MacBook was turned into a \LINUX\ machine (15W idle with an
internal reused SSD\footnote {For a change that Apple machine was easy to update,
and we could even get a new clone battery replacement.} and an external 4GB
disk). Two managed switches became one (after all we have fewer network cables
due to lost redundancy) and only one backup power supply remains (it will be
replaced by a nicer alternative when it breaks down; after all, by using laptops
we get power backup for free). The total consumption went down by at least 1000W.
Of course there is an investment involved and we need to reconfigure the server
rack, but the expectation is that by investing now we get less trouble later
(less gambling on energy).\footnote {We hope to save some 9000 kWh which means
that we save at least some 2500\euro\ per year, and more when the government
reinstates its energy tax policy and/or prices go further up, which seems to be
the case. Even before the crisis a price of 5ct/kWh in the Netherlands
effectively became five times that amount once connection, transportation,
energy tax and value added tax were added.}
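
As a rough check of the numbers in the footnote, and assuming (this is not
measured) that the reduction of 1000W holds around the clock for a whole year,
the saved energy indeed comes close to the quoted 9000 kWh. The quoted
2500\euro\ then corresponds to an average price of roughly 28ct/kWh:

\startformula
1000\,\hbox{W} \times 24 \times 365\,\hbox{h} = 8760\,\hbox{kWh} \approx 9000\,\hbox{kWh}
\stopformula

\startformula
\frac{2500\,\hbox{\euro}}{9000\,\hbox{kWh}} \approx 0.28\,\hbox{\euro/kWh}
\stopformula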

But there is still the pending question of what the impact is on the services
that we run. The most demanding ones are Math4all and Math4mbo: these produce
large files and need many resources (\XML\ and images), and we didn't want to
get our fingers burned. Now, here is an interesting observation: this service
runs twice as fast on the new infrastructure. But it is hard to explain why. The
file server is on a different machine (so no fast internal network), the \CPU\
is a bit faster but not that much, the virtual machine is on \SSD, but files are
saved on the file server, which is a two disk \USB3 enclosure connected directly
to a virtual machine that does software raid. The most important difference is
that main memory is much faster and \TEX\ is a memory intensive process. Ever
since we started with \LUATEX\ we have known that memory bandwidth and \CPU\
caches make a difference. Maybe the faster floating point handling of the more
modern Xeon also helps here.

And that brings me to the following: how do we actually benchmark \TEX? When you
go on the internet and compare \CPU's, most tests are not that comparable to a
\TEX\ run on a single core. One can think of a set of test files, but the problem
there is that when the engine evolves and details in the macro package coding
change, one loses the comparison with older tests. This is why, when we do such
tests, we always run the same test on the different platforms. Although this
often shows that the gain on newer hardware is seldom what one expects from the
more general benchmarks, one can still be surprised. When we moved to laptops
that were five years newer the gain was some 30\% for me and 50\% for my
colleague. The difference between his laptop and the slightly more beefed up
virtual machine can be neglected.

We monitor the power consumption with a YouLess device connected to the power
meter. When I process the \LUAMETATEX\ manual I see the phase that the machine
sits on go up by 20W for a run that takes some 9 seconds. Let's say that we use
180Ws or 0.00005kWh (20000 runs per kWh). So, compared to the idle power usage
of a server, a single \TEX\ run can be neglected, simply because it is so fast.
So, what is actually the most efficient hardware for a \TEX\ service? I get the
feeling that a decent Intel Atom C3955 16-Core driven machine is quite okay for
that, but I don't have one at hand and last time I checked one could not order
anything anyway. And with prices of hardware going up it's also not something you
try for fun. As a comparison with what we have now, testing \TEX\ on an Intel
NUC11ATKC2 could also be interesting (it has an N4505 \CPU). There was a time
when I considered a bunch of Raspberry Pi's but they are no longer that cheap,
assuming that you can get them at all, and adding a case and proper disk
enclosure also adds up. When wrapped in a nice package the Pi will probably be a
couple of times slower but it then probably also uses less power. Fitlets are
also interesting but again, one can't get them.
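
The energy estimate for a single run of the manual, mentioned earlier in this
section, can be worked out as follows. This is only a sketch: it assumes that the
extra 20W is drawn during the whole 9 seconds and that nothing else changes on
that phase.

\startformula
20\,\hbox{W} \times 9\,\hbox{s} = 180\,\hbox{Ws}
    = \frac{180}{3600 \times 1000}\,\hbox{kWh} = 0.00005\,\hbox{kWh}
\stopformula

\startformula
\frac{1\,\hbox{kWh}}{0.00005\,\hbox{kWh}} = 20000\ \hbox{runs per kWh}
\stopformula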

It is kind of fun to play with optimizations that don't really impact the clarity
of the code. One can argue that spending a day on something that saves 0.005
seconds on a specific run is a waste of time, but of course one has to multiply
that number by the number of runs. Personally I will never gain from it but
nevertheless it can save some energy: imagine a batch of 15000 documents every
day. We then save $15000 \times 0.005 \times 365 = 27375$ seconds per year, or
about 8 hours of runtime. This can still be neglected but what if this is not the
only optimization?

An example of such an optimization is this:

\starttyping
\advance\somecounter    \plusone
\advance\somecounter by \plusone
\stoptyping

The second one runs faster because it avoids the push back that the first one
triggers as a side effect of the missing keyword, so how about adding this to
the engine?

\starttyping
\advanceby   \somecounter \plusone
\advancebyone\somecounter
\stoptyping
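
A minimal sketch of how one could compare these variants, assuming the
\type {\testfeatureonce} and \type {\elapsedtime} helpers that come with
\CONTEXT\ and an engine that provides the two new primitives. The iteration
count is an arbitrary choice and the timings depend on the machine, so no
numbers are claimed here.

\starttyping
\starttext

\newcount\somecounter

% each line runs the snippet a million times and typesets the elapsed seconds

\testfeatureonce{1000000}{\advance\somecounter    \plusone} \elapsedtime \par
\testfeatureonce{1000000}{\advance\somecounter by \plusone} \elapsedtime \par
\testfeatureonce{1000000}{\advanceby   \somecounter \plusone} \elapsedtime \par
\testfeatureonce{1000000}{\advancebyone\somecounter}          \elapsedtime \par

\stoptext
\stoptyping

Running each variant many times in one job keeps the startup and font loading
overhead out of the comparison, which is what matters for such tiny differences.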

Given the way \LUAMETATEX\ is coded, it only needs a few lines! In this case it
extends the repertoire of primitives so it is visible, but we have many other
(similarly small) optimizations that contribute. Again, the average user will not
notice a drop in runtime from 1.5 seconds to 1.45, but when 8 hours become 80
hours or 800 hours it does become interesting. In energy sensitive 2022 these 800
hours not only save some \texteuro 400 but also contribute to a lower carbon
footprint! And imagine how much could be saved on these extensive runs when we
make sure that the style used is optimal. Of course, when we need two runs per
document it adds up even more.

Some experiments with a demanding file showed a one percent gain (on a 2.7 second
run) using the alternative integers, dimensions and advance primitives. However,
using \CONTEXT's compact font mode brought down runtime to 2.0 seconds! So, in
the end it's all very relative. It is worth noting that the .7 seconds saved on
fonts is sort of constant, which means that accumulated gains elsewhere make
that .7 seconds more significant as we progress.
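
To illustrate that last remark: on the 2.7 second run the .7 seconds saved on
fonts is about a quarter of the runtime. If other optimizations were to take,
say, half a second off the rest of the run (a made up number, only meant as
illustration), the same .7 seconds would become nearly a third:

\startformula
\frac{0.7}{2.7} \approx 0.26
\qquad
\frac{0.7}{2.7 - 0.5} \approx 0.32
\stopformula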

\stopchapter

% 4 * 7520 precision with Xeons
% 1 * 7600 precision with extreme i7
% 1 * 2200W dell ups
% 2 * 4K monitor (+ 3 monitors turned off)
% 1 * imac server room xubuntu (turned off)
% 2 * pfsense router (6 port 8 core atom appliances)
% 1 * dell 48 port switch (1 turned off, 1 24 port switch reserve)
% 3 * 720 dell server (turned off)
% 2 * dell 16 port switch
% 2 * dell 8 port switch (1 turned off)
% 1 * raspberry pi farm
% 2 * office printer
% 4 * standby automatic lights
% 2 * tv + cable box (+ 1 turned off) (on ups)
% 2 * hue hub
% 1 * evohome + pump heating system
% 1 * airconditioner (idle blow, 28 degrees threshold)
% 1 * fitlet (serves hue and heating)
% 1 * fritzbox (7590) + 3 repeaters
% 1 * cable modem
% 3 * small UPS
% 3 * distributed backup (old macbook, hp laptop, hp micro server)
% - - some standby things (squeeze boxes etc)
% 32  hue light bulbs
% 6 * cordless phones
% 1 * alarm panel
% 2 * warm water boiler (standby + upping)
% 1 * coffee machine (standby + upping)
% 1 * freezer (standby + upping)
% 1 * refrigerator (standby + upping)
%
% 800-1000W (from 1750-2000)
%
% (upcoming new monitors will save 100W)
%
% aim: 750 during the day, 500 after midnight
%
% 4 solar panels on shed

% not mentioned: washing machine, dryer, dish washer, several audio sets,
% battery loaders (ebike etc),

\stopcomponent