Posts on bramp.net

3D Printing a Lightsaber

Sat, 02 Apr 2022 20:03:52 -0700

I came across these cool 3D printable lightsabers by 3dprintingworld, but I couldn’t get the blade to print well. So here is a write-up of my experience, and the modifications I made.

Lightsaber

Lightsaber with blade

Lightsaber in parts

Hilt

The hilt printed well. I used Prusa Silver PLA, and nothing special needed to be done. There are many to pick from online:

Darth Vader

thingiverse | thangs.com

Return of the Jedi

thingiverse | thangs.com

Leia's

thangs.com

There are slightly different versions of these files on thingiverse and thangs.com. The thingiverse versions seem to be the easiest to work with.

Blade


The collapsing blade is where I had problems. It is a set of concentric telescoping tubes that taper inwards, allowing them to fit within each other but not slide out all the way. 3dprintingworld offered two techniques: print-in-place and vase printing. The former prints all of the tubes at the same time, layer by layer, whereas vase mode prints each tube individually in one continuous motion from the bottom up. The vase technique produced nicer looking blades that were thinner yet strong. However, I couldn’t get the provided vase mode models to work, so I made my own in Fusion 360 (STL and Fusion files available here TODO).

The end result is ~110cm long, with five separate tubes. These are printed with a 0.65mm extrusion width, and no top or bottom layers. I printed them all with the wider end of the tube as the base, except for the thinnest one, where I found that printing it upside down made for a cleaner end. I used Overture Purple PETG, which gave a very nice Samuel L. Jackson lightsaber.

Vase mode settings for the blade

Extrusion width settings for the blade

Blade Cover

The blade fitted well, but I wanted to stop it falling out, so I printed a snug-fitting cover. Again this was printed in vase mode, but with a 0.55mm extrusion width and 17 solid bottom layers (to fill up to the thin tube part). Again the STL and Fusion files are available.

Cover Model

Printed Cover

Extrusion width settings for the cap

Finished

Hilt - Silver PLA - Normal Settings
- LIGHTSABER-CAP.stl and LIGHTSABER-HILT.stl.
Blade Cover - Silver PLA - Vase Mode
- LightSaber_Cap_v12.stl
Blade - Purple PETG - Vase Mode
- LightSaber_Blade_v4_1.stl to LightSaber_Blade_v4_5.stl


Lightsaber in action

Compress and Backup

Sun, 12 Sep 2021 13:45:51 -0700

In my last article I discussed recovering an old RAID-5 disk array. Here I’m going to quickly list what I did to back up what I recovered.

# Create a zstd compressed tar file
$ tar -c -v -I"zstd -19 -T0" -f raid5-my-projects.tar.zstd My\ Projects

# Create a text based index for the tar
$ tar -t -f raid5-my-projects.tar.zstd > raid5-my-projects.index

# Backup to Google Cloud
$ gsutil cp raid5* gs://backup.bramp.net/

Maybe I should be using a proper backup solution, but this was quick and easy. I used Zstandard to compress the tar file since it is modern, fast, and gives impressive compression results.

I uploaded the results to an Archive bucket on Google’s Cloud Storage.

Recovering a RAID-5 Intel Storage Matrix on Linux (without the hardware)

Sun, 12 Sep 2021 13:09:07 -0700

I recently found hard drives from an old RAID array I stopped using over a decade ago. I wanted to recover the data from these disks, and that turned out to be more challenging than expected. This post outlines the steps, and hopefully helps someone else in future.

This was a four-disk RAID-5 array (750GB per disk) using Intel Storage Matrix “fake-raid” (now called Intel Rapid Storage Technology). This is a RAID solution that uses a mix of software and hardware. I no longer have this Intel hardware, and in fact I no longer have a machine that would accept four drives. Luckily mdadm seems to have a pure software implementation of Intel Storage Matrix, so I hatched a plan. I would:

1. Create disk images for each of the four drives,
2. Mount the images locally as block devices,
3. Use mdadm to construct an array,
4. Copy the data into my backups.

1. Create disk images

I have a USB SATA adapter, and connected one drive at a time to my PC. This computer has a single local 12 TB drive, on which I would store the disk images. I started creating the disk images using:

$ sudo dd if=/dev/sdc of=1.raw

This worked great for the first disk, but the 2nd disk failed around the 600GB point. It seems this drive had developed bad blocks, but I kept my fingers crossed that it was still recoverable since this was RAID-5 after all. I switched to using ddrescue.

$ sudo ddrescue /dev/sdc 2.raw 2.log --try-again --force --verbose

This worked great, and was able to create a full 750GB image, slowly retrying the failed blocks and recovering as much as possible. After about a week of copying I had four disk images, 1.raw, 2.raw, 3.raw, 4.raw, with only the 2nd disk having problems.

I then ran chmod -w *.raw to remove write permissions from the images, to help prevent a later step from accidentally altering them.

2. Mounting the images

To mount the images I use losetup (roughly following instructions here), specifically:

$ sudo losetup -r /dev/loop31 1.raw
$ sudo losetup -r /dev/loop32 2.raw
$ sudo losetup -r /dev/loop33 3.raw
$ sudo losetup -r /dev/loop34 4.raw

Later I would use sudo losetup -d /dev/loop3[1234] to detach these images. I then decided to inspect these drives, to see what partitions were on them:

$ sudo fdisk -l /dev/loop31

Disk /dev/loop31: 698.65 GiB, 750156374016 bytes, 1465149168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xd204616a

Device        Boot Start        End    Sectors Size Id Type
/dev/loop31p1          1 4294967295 4294967295   2T ee GPT

$ sudo fdisk -l /dev/loop32

Disk /dev/loop32: 698.65 GiB, 750156374016 bytes, 1465149168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

$ sudo fdisk -l /dev/loop33

Disk /dev/loop33: 698.65 GiB, 750156374016 bytes, 1465149168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x899c1289

Device        Boot      Start        End    Sectors   Size Id Type
/dev/loop33p1        33488921 4294836216 4261347296     2T ee GPT
/dev/loop33p2        35651584   35651584          0     0B  0 Empty
/dev/loop33p3               0    1377535    1377536 672.6M 12 Compaq diagnostics
/dev/loop33p4      3071040408 3104693987   33653580    16G 64 Novell Netware 286

Disk 1 had a single partition, disks 2 and 4 had no partitions, and the 3rd disk had four! Those partitions looked a little weird, and I wondered for a minute if I had mixed up my drives, or reformatted them at some point. I tried to mount them without success, so I just assumed the RAID added something that looked like a real partition table. So I moved on to the next step.

3. Use mdadm to construct an array

This is where it got difficult, due to limitations of mounting local disks, and the Intel Storage Matrix support.

I started by asking mdadm to examine the images (telling it to use imsm):

$ sudo mdadm --examine -e imsm /dev/loop31
mdadm: /dev/loop31 is not attached to Intel(R) RAID controller.
mdadm: Failed to retrieve serial for /dev/loop31
mdadm: Failed to load all information sections on /dev/loop31

Well, that’s not a great start. If I understand the error /dev/loop31 is not attached to Intel(R) RAID controller correctly, it implies I need to connect my drive (or in this case the loopback disk image) via a real RAID controller. That would defeat my whole plan. After some googling, I found this Stack Overflow post pointing out that there is an IMSM_NO_PLATFORM=1 environment variable I could set. The message is not attached to Intel(R) RAID controller was really a warning, and had no actual bearing on the problem.

$ sudo IMSM_NO_PLATFORM=1 mdadm --examine -e imsm \
  /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34
mdadm: no recogniseable superblock on /dev/loop34
mdadm: Cannot assemble mbr metadata on /dev/loop33
mdadm: no recogniseable superblock on /dev/loop32
mdadm: Cannot assemble mbr metadata on /dev/loop31

A new set of errors, and they did not look promising. After more head scratching, I hit a bit of a dead end. I now wondered if the drives were corrupt, making the superblocks unreadable. I decided to read the source code for mdadm to try and understand the superblock format, and see what was wrong.

The source indicated that the superblock (the data structure containing information about the array) was two sectors from the end of the disk, starting with the string Intel Raid ISM Cfg Sig. .

Guessing that a sector is 512 bytes long, I did the following:

$ tail -c 1024 3.raw  | hd

00000000  49 6e 74 65 6c 20 52 61  69 64 20 49 53 4d 20 43  |Intel Raid ISM C|
00000010  66 67 20 53 69 67 2e 20  31 2e 33 2e 30 30 00 00  |fg Sig. 1.3.00..|
00000020  cc c0 3d de 48 02 00 00  40 d5 11 d4 09 ae 19 00  |..=.H...@.......|
00000030  f8 11 00 00 10 00 00 a0  04 01 02 00 00 00 00 00  |................|
00000040  40 d5 11 d4 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000d0  00 00 00 00 00 00 00 00  53 31 33 55 4a 31 4b 51  |........S13UJ1KQ|
000000e0  34 30 33 33 33 37 00 00  f0 66 54 57 00 00 01 00  |403337...fTW....|
000000f0  3a 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |:...............|
00000100  00 00 00 00 00 00 00 00  53 31 33 55 4a 44 57 51  |........S13UJDWQ|
00000110  33 34 36 34 35 37 00 00  f0 66 54 57 00 00 02 00  |346457...fTW....|
00000120  3a 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |:...............|
00000130  00 00 00 00 00 00 00 00  53 31 33 55 4a 44 57 51  |........S13UJDWQ|
00000140  33 34 36 36 36 38 00 00  f0 66 54 57 00 00 03 00  |346668...fTW....|
00000150  3a 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |:...............|
00000160  00 00 00 00 00 00 00 00  53 31 33 55 4a 31 4b 51  |........S13UJ1KQ|
00000170  34 30 33 33 32 34 3a 30  00 66 54 57 ff ff ff ff  |403324:0.fTW....|
00000180  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  52 41 49 44 00 00 00 00  |........RAID....|
000001a0  00 00 00 00 00 00 00 00  00 f8 fc 05 01 00 00 00  |................|
000001b0  8c 10 00 00 00 00 00 00  00 00 01 00 00 00 00 00  |................|
000001c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001e0  00 00 00 00 00 00 00 00  a6 a8 ae 00 00 00 00 00  |................|
000001f0  00 02 00 ff 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400

Boom, the superblock was there. It started with a valid header, and even had other fields that looked correct (e.g. S13UJ1KQ being the start of a drive’s serial number).

Ok, so now I was confused about what was wrong, and I wondered if this was a bug in mdadm. Going back, I remembered that the first error I got contained Failed to retrieve serial, and I noticed the serial numbers were in the superblock (e.g. S13UJ1KQ). It then occurred to me that once I imaged the hard drives, the images no longer carried the drives’ serial numbers!

Inspecting the code some more, I saw it would fail with that error if it was unable to read the drive’s serial number. The loopback device doesn’t support serial numbers, so this started to make sense. I did however find an undocumented environment variable, IMSM_DEVNAME_AS_SERIAL, which, instead of reading the serial number from the hardware, just uses the name of the device as the serial (e.g. /dev/loop31). This feature seems explicitly designed to help with testing the mdadm codebase.

$ sudo IMSM_DEVNAME_AS_SERIAL=1 IMSM_NO_PLATFORM=1 mdadm --examine -e imsm /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34
…
/dev/loop31:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : d411d540
         Family : d411d540
     Generation : 0019ae09
     Attributes : All supported
           UUID : ff44bc31:56060902:afb34379:b0faf183
       Checksum : de3dc0cc correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

[RAID]:
           UUID : 676c222f:760eaf46:97bd30b8:989d2470
     RAID Level : 5
        Members : 4
          Slots : [UUU_]
    Failed disk : 3
      This Slot : ?
    Sector Size : 512
     Array Size : 4395431936 (2095.91 GiB 2250.46 GB)
   Per Dev Size : 1465144328 (698.64 GiB 750.15 GB)
  Sector Offset : 0
    Num Stripes : 11446438
     Chunk Size : 64 KiB
       Reserved : 0
  Migrate State : idle
      Map State : degraded
    Dirty State : clean
     RWH Policy : off

  Disk00 Serial : S13UJ1KQ403337
          State : active
             Id : 00010000
    Usable Size : 1465138766 (698.63 GiB 750.15 GB)

  Disk01 Serial : S13UJDWQ346457
          State : active
             Id : 00020000
    Usable Size : 1465138766 (698.63 GiB 750.15 GB)

  Disk02 Serial : S13UJDWQ346668
          State : active
             Id : 00030000
    Usable Size : 1465138766 (698.63 GiB 750.15 GB)

  Disk03 Serial : S13UJ1KQ403324:0
          State : active
             Id : ffffffff
    Usable Size : 1465138526 (698.63 GiB 750.15 GB)

Ok, slowly making progress! Now it lists all the superblock information, and I was happy to see Checksum : de3dc0cc correct, etc. However, it listed Failed disk : 3, and This Slot : ?. It made me think without the valid serial numbers, it didn’t know which drive was which, and thus couldn’t assemble the array.

This made me ponder that if I was ever going to create a RAID array implementation, I would not make it depend on information from the hardware. How do folks re-image disks? What is wrong with some GUID in the superblock to identify the disk? Ok digression aside.

To move forward, I needed to trick mdadm into thinking that /dev/loop31 had the serial number of the real hardware. I went back to my drives, and visually inspected them to check the serial numbers.

  Disk00 Serial : S13UJ1KQ403337   1.raw
  Disk01 Serial : S13UJDWQ346457   2.raw
  Disk02 Serial : S13UJDWQ346668   4.raw
  Disk03 Serial : S13UJ1KQ403324   3.raw

At this point, I realised I had accidentally swapped drives 3 and 4. Quickly renaming them got them into the correct order.

Since I had already looked over the mdadm source code, it seemed a simple clean codebase, so I decided to change it to accept serial numbers. After a little while I did the hackiest thing possible:

diff --git a/super-intel.c b/super-intel.c
index da376251..d466d911 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3994,6 +3994,20 @@ static int nvme_get_serial(int fd, void *buf, size_t buf_len)
        if (!name)
                return 1;

+       if (strcmp(name, "loop31") == 0) {
+               strcpy((char *)buf, "S13UJ1KQ403337");
+               return 0;
+       } else if (strcmp(name, "loop32") == 0) {
+               strcpy((char *)buf, "S13UJDWQ346457");
+               return 0;
+       } else if (strcmp(name, "loop33") == 0) {
+               strcpy((char *)buf, "S13UJDWQ346668");
+               return 0;
+       } else if (strcmp(name, "loop34") == 0) {
+               strcpy((char *)buf, "S13UJ1KQ403324");
+               return 0;
+       }
+
        if (strncmp(name, "nvme", 4) != 0)
                return 1;

The nvme_get_serial function now had hard-coded serial numbers when reading loop3[1234]. This obviously isn’t a generalised solution, but it worked for me. Go open source!

$ make mdadm
…

$ sudo IMSM_NO_PLATFORM=1 ./mdadm --examine -e imsm /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34

Examine looked good, so the moment of truth, let’s assemble.

$ sudo IMSM_NO_PLATFORM=1 ./mdadm --assemble --readonly -e imsm /dev/md0 /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34
mdadm: Container /dev/md0 has been assembled with 3 drives

Ok mixed success, it says 3 drives, but I would expect 4… But let’s keep going

$ sudo ./mdadm --assemble --scan
mdadm: Started /dev/md/RAID_0 with 3 devices

W00t! It Started without errors!

I now have /dev/md0, /dev/md127 and /dev/md127p1 devices.

$ sudo mount -o ro /dev/md127p1 /mnt/raid5

$ ls /mnt/raid5
… lots of old files...

YAY. Finished!

Ok, I’m not sure why it says three drives not four.

$ sudo ./mdadm --detail /dev/md127
/dev/md127:
         Container : /dev/md0, member 0
        Raid Level : raid5
        Array Size : 2197715968 (2.05 TiB 2.25 TB)
     Used Dev Size : 732572032 (698.64 GiB 750.15 GB)
      Raid Devices : 4
     Total Devices : 3

             State : clean, degraded
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0

            Layout : left-asymmetric
        Chunk Size : 64K

Consistency Policy : resync


              UUID : 676c222f:760eaf46:97bd30b8:989d2470
    Number   Major   Minor   RaidDevice State
       2       7       31        0      active sync   /dev/loop31
       1       7       32        1      active sync   /dev/loop32
       0       7       33        2      active sync   /dev/loop33
       -       0        0        3      removed

This does seem to imply a drive is missing. Maybe it doesn’t matter, as it mounted successfully, and I can copy all my data off the array.
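A degraded RAID-5 array is still readable because each stripe stores one parity chunk that is the XOR of its data chunks, so any single missing chunk can be recomputed from the rest. The following is a minimal sketch of that idea in Go. The chunk contents and the xorChunks helper are invented for illustration, and the sketch ignores how md rotates parity across members (the left-asymmetric layout above) and the real 64K chunk size.

```go
package main

import "fmt"

// xorChunks returns the byte-wise XOR of the given chunks.
// It assumes at least one chunk and that all chunks have equal length.
func xorChunks(chunks ...[]byte) []byte {
	out := make([]byte, len(chunks[0]))
	for _, c := range chunks {
		for i := range out {
			out[i] ^= c[i]
		}
	}
	return out
}

func main() {
	// Three data chunks of one stripe, plus their parity chunk.
	d0 := []byte("old ")
	d1 := []byte("raid")
	d2 := []byte("data")
	parity := xorChunks(d0, d1, d2)

	// Pretend the member holding d2 is missing: rebuild it from the others.
	rebuilt := xorChunks(d0, d1, parity)
	fmt.Printf("%s\n", rebuilt) // prints "data"
}
```

With only three of four members present there is no redundancy left, so copying the data off promptly is the sensible next step.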

Conclusion

This did not seem the easiest task, and there were a few road bumps along the way. Hopefully the hacks in here will help someone else out in a similar situation.

To finally clean up, you can run this:

$ sudo umount /mnt/raid5
$ sudo mdadm --stop /dev/md127
$ sudo mdadm --stop /dev/md0
$ sudo losetup -d /dev/loop3[1234]

Alternative Milks

Sat, 03 Apr 2021 12:44:51 -0700

I’ve not been getting out as much during Covid, so I’ve tried to reduce my caloric intake. One way I tried to do this was to switch the kind of milk I drink. This also had the side benefit of reducing my environmental impact. However, while researching the various milks, I couldn’t find one source that put both the nutritional information and the environmental impact of the various milks in one place. This article does just that.

During my research, I found there are minor differences in the US and UK in how they represent this data. For example, a UK serving size is 200ml, whereas in the US it’s 1 cup (or ~240ml). I’ve normalised all values to the UK serving size.
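The normalisation is a simple linear rescale. As a sketch in Go, assuming a US cup of ~240ml as stated above; the perUKServing helper and the 144 kcal figure are hypothetical, not taken from any particular label:

```go
package main

import "fmt"

// The UK serving size that all values are normalised to.
const ukServingML = 200.0

// perUKServing rescales a value quoted per servingML millilitres
// to the equivalent value per 200ml, assuming it scales linearly.
func perUKServing(value, servingML float64) float64 {
	return value * ukServingML / servingML
}

func main() {
	// Hypothetical US label: 144 kcal per cup (~240ml).
	fmt.Printf("%.0f kcal per 200ml\n", perUKServing(144, 240)) // prints "120 kcal per 200ml"
}
```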

Type	Emissions (kg)	Land Usage (m²)	Water Usage (L)	Variant	Calories (kcal)	Calcium (mg)	Fat (g)	Sat Fat (g)	Sugar (g)	Protein (g)
Dairy (Cow's) Milk	0.630	1.790	125.64	Whole	120	246	6.40	3.72	9.62	6.56	(source)
Dairy (Cow's) Milk	0.630	1.790	125.64	Whole (Lactose free)	120	246	6.40	3.72	9.62	6.56	(source)
Dairy (Cow's) Milk	0.630	1.790	125.64	Reduced fat (2%)	100	252	3.80	2.22	9.78	6.70	(source)
Dairy (Cow's) Milk	0.630	1.790	125.64	Fat free (skim)	68	264	0.16	0.10	10.10	6.86	(source)
Rice Milk	0.236	0.067	53.96	Rice Milk	94	236	1.94	0.00	10.56	0.56	(source)
Soy Milk	0.196	0.132	5.56	Soy Milk	86	246	2.94	0.41	7.30	5.20	(source)
Oat Milk	0.181	0.152	9.65	OATLY! (Brand)	100	292	4.16	0.42	5.84	2.50	(source)
Almond Milk	0.140	0.099	74.29	Sweetened	60	354	1.86	0.15	9.54	0.76	(source)
Almond Milk	0.140	0.099	74.29	Unsweetened	30	368	1.92	0.16	1.62	0.80	(source)
Goat’s Milk	?	?	?	Whole	138	268	8.28	5.33	8.90	7.12	(source)
Coconut Milk	?	?	?	Coconut Milk	62	376	4.16	4.17	5.00	0.42	(source)
Flax Hemp Milk	?	?	?	Flax Hemp Milk	38	24	2.50	0.00	0.84	1.66	(source)
Human Milk	?	?	?	Human Milk	140	64	8.76	0.64	13.78	2.06	(source)
Chocolate Milk	?	?	?	Chocolate Milk	98	200	0.80	0.02	17.36	1.28	(source)
Environment Impact per 200ml (source)				Nutritional Information per 200g (which is approx 200ml)

I’m not going to provide any kind of editorial; I just wanted all the data in one place. The data is sourced from the following locations.

General Comparison:

Environment:

Nutritional:

Local HTTPS Server for development

Sun, 27 Dec 2020 09:14:22 -0800

I regularly do web development on localhost, running a simple HTTP server to serve my site. Recently I came across a problem where some of the newer web APIs (such as DeviceMotionEvent) do not work unless the site is served via SSL. So I went about setting up a local SSL server and certificate.

Many of the instructions out there create a self-signed certificate that you install to be trusted locally. I wanted my development server to be accessible from other devices on my network, and I didn’t want the hassle of installing this self-signed cert on each of them. Instead I wanted an SSL certificate that uses a real/trusted CA.

Enter Let’s Encrypt, a free service that provides SSL certificates, provided you can prove you own the domain. To go about this, I did the following on my MacBook:

Install Certbot (to generate the cert)

brew install certbot

There are a few ways to prove you own a domain, the HTTP based ones require a public web server. Since my development server is only on my local network, I’m going to use a DNS based proof. Since I use Cloudflare for my DNS, I’ll be using their plugin.

pip3 install certbot-dns-cloudflare

Setup the domain (local.bramp.net)

I use Cloudflare to host the DNS for my domain, so I set up a new domain, local.bramp.net, that points to an internal IP address (192.168.0.123). This domain won’t actually be used via the Internet, but will happily work for any devices on my local network.

Setup DNS record for local.bramp.net

You’ll also need an API key from Cloudflare. They allow you to scope the key to only access this test domain. For example:

Create an API token

That will give you a token, that is a long string of letters and numbers.

Configure Certbot

# Create a place to store your secrets, that only you can access

mkdir ~/.secrets
cat <<EOF > ~/.secrets/cloudflare.ini
dns_cloudflare_api_token = **your_key**
EOF

chmod 0700 ~/.secrets/
chmod 0400 ~/.secrets/cloudflare.ini

Generate the Certificate

certbot certonly \
  --config-dir ~/.secrets/ \
  --work-dir ~/.secrets/ \
  --logs-dir ~/.secrets/ \
  --dns-cloudflare \
  --dns-cloudflare-credentials ~/.secrets/cloudflare.ini \
  -d local.bramp.net

and voila:

 - Congratulations! Your certificate and chain have been saved at:
   /Users/bramp/.secrets/live/local.bramp.net/fullchain.pem
   Your key file has been saved at:
   /Users/bramp/.secrets/live/local.bramp.net/privkey.pem

The privkey.pem is important to keep secret. Normally certbot runs as root, but here we run it as your user for convenience.

If you want this to renew automatically, run the following to add a cron job that attempts a renewal twice daily, at a random minute past midnight and noon.

# List your current crontab, and append certbot renewal

(crontab -l ; echo "$(( RANDOM % 60 )) 0,12 * * * $(which certbot) renew -q --config-dir ~/.secrets/ --work-dir ~/.secrets/ --logs-dir ~/.secrets/") | crontab -

Or you can renew (all certificates) on demand with a simple:

certbot renew \
  --config-dir ~/.secrets/ \
  --work-dir ~/.secrets/ \
  --logs-dir ~/.secrets/

Install a simple HTTPS web server

I use http-server, “a simple, zero-configuration command-line http server”. It supports many useful features, including SSL.

brew install http-server

Running the HTTPS web server

http-server -S \
  -C ~/.secrets/live/local.bramp.net/fullchain.pem \
  -K ~/.secrets/live/local.bramp.net/privkey.pem

You may wish to alias this to something shorter, for example:

alias https="http-server -S \
  -C ~/.secrets/live/local.bramp.net/fullchain.pem \
  -K ~/.secrets/live/local.bramp.net/privkey.pem"

Now you can run https from any directory and it’ll be served over SSL.

Additional Reading

Apache Beam and Google Dataflow in Go

Sat, 05 Jan 2019 07:59:08 -0800

Originally published as part of the Go Advent 2018 series

Overview

Apache Beam (batch and stream) is a powerful tool for handling embarrassingly parallel workloads. It is an evolution of Google’s Flume, which provides batch and streaming data processing based on the MapReduce concepts. One of the novel features of Beam is that it’s agnostic to the platform that runs the code. For example, a pipeline can be written once, and run locally, across Flink or Spark clusters, or on Google Cloud Dataflow.

An experimental Go SDK was created for Beam, and while it is still immature compared to Beam for Python and Java, it is able to do some impressive things. The remainder of this article will briefly recap a simple example from the Apache Beam site, and then work through a more complex example running on Dataflow. Consider this a more advanced version of the official getting started guide on the Apache Beam site.

Before we begin, it’s worth pointing out that if you can do your analysis on a single machine, it is likely to be faster and more cost effective. Beam is more suitable when your data processing needs are large enough that they must run in a distributed fashion.

Concepts
Shakespeare (simple example)
- Running the pipeline
Art history (more complex example)
Gotchas
Conclusion

Concepts

Beam already has good documentation, that explains all the main concepts. We will cover some of the basics.

Pipeline stages

A pipeline is made up of multiple steps that take some input, operate on that data, and finally produce output. The steps that operate on the data are called PTransforms (parallel transforms), and the data is always stored in PCollections (parallel collections). A PTransform takes one item at a time from the PCollection and operates on it. PTransforms are assumed to be hermetic, using no global state, thus ensuring they will always produce the same output for a given input. These properties allow the data to be sharded into multiple smaller datasets and processed in any order across multiple machines. The code you write ends up being very simple, but is able to seamlessly split across 100s of machines.

Shakespeare (simple example)

A classic example is counting the words in Shakespeare. In brief, the pipeline counts the number of times each word appears across Shakespeare’s works, and outputs a simple key-value list of word to word-count. There is an example provided with the Beam SDK, along with a great walkthrough. I suggest you read that before continuing. I will however dive into some of the Go specifics, and add additional context.

The example begins with textio.Read, which reads all the files under the shakespeare directory stored on Google Cloud Storage (GCS). The files are stored on GCS, so when this pipeline runs across a cluster of machines, they will all have access. textio.Read always returns a PCollection<string>, which contains one element for every line in the given files.

lines := textio.Read(s, "gs://apache-beam-samples/shakespeare/*")

The lines PCollection is then processed by a ParDo (Parallel Do), a type of PTransform. Most transforms are built with a beam.ParDo. It will execute a supplied function in parallel on the source PCollection. In this example, the function is defined inline and very simply splits the input lines into words with a regexp. Each word is then emitted to another PCollection named words. Note how for every line, zero or more words may be emitted, making this new collection a different size to the original.

splitFunc := func(line string, emit func(string)) {
    for _, word := range wordRE.FindAllString(line, -1) {
        emit(word)
    }
}
words := beam.ParDo(s, splitFunc, lines)
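The splitting function can be exercised on its own, without Beam, by passing it an ordinary closure as emit. In this standalone sketch the wordRE pattern is an assumption (letters with an optional apostrophe suffix), since the article does not show its definition.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed pattern: the definition of wordRE is not shown in the article.
var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)

// splitFunc has the same shape as the function handed to beam.ParDo:
// one input element in, zero or more elements out via emit.
func splitFunc(line string, emit func(string)) {
	for _, word := range wordRE.FindAllString(line, -1) {
		emit(word)
	}
}

func main() {
	var words []string
	collect := func(w string) { words = append(words, w) }

	splitFunc("To be, or not to be:", collect)
	splitFunc("", collect) // an empty line emits nothing

	fmt.Println(len(words), words) // prints "6 [To be or not to be]"
}
```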

An interesting trick used by the Apache Beam Go API is passing functions as an interface{}, and using reflection to infer the types. Specifically, since lines is a PCollection<string>, it is expected that the first argument of splitFunc is a string type. The second argument to splitFunc will allow Beam to infer the type of the words output PCollection. In this example it is a function with a single string argument. Thus the output type will be PCollection<string>. If emit was defined as func(int) then the return type would be a PCollection<int>, and the next PTransform would be expected to handle ints.

The next step uses one of the library’s higher level constructs.

counted := stats.Count(s, words)

stats.Count takes a PCollection<X>, counts each unique element, and outputs a key-value pair of (X, int) as a PCollection<KV<X, int>>. In this specific example, the input is a PCollection<string>, thus the output is PCollection<KV<string, int>>.

Internally stats.Count is made up of multiple ParDos and a beam.GroupByKey, but it hides that to make it easier to use.
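Conceptually, those stages are: pair every element with a 1, group the pairs by key, then sum each group. The following sketch mimics this on a plain slice using only the standard library. It illustrates the idea and is not how the Beam SDK implements stats.Count; the countElements helper is invented here.

```go
package main

import (
	"fmt"
	"sort"
)

// countElements mimics the conceptual stages on a plain slice.
func countElements(words []string) map[string]int {
	// Emit (word, 1) for each element and group the ones by word.
	grouped := make(map[string][]int)
	for _, w := range words {
		grouped[w] = append(grouped[w], 1)
	}
	// Sum each group to get the final (word, count) pairs.
	counts := make(map[string]int, len(grouped))
	for w, ones := range grouped {
		for _, n := range ones {
			counts[w] += n
		}
	}
	return counts
}

func main() {
	counts := countElements([]string{"to", "be", "or", "not", "to", "be"})

	// Sort the keys so the output is deterministic.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %v\n", k, counts[k])
	}
}
```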

At this point, the counts of each word have been calculated, and the results are stored to a simple text file. To do this the PCollection<KV<string, int>> is converted to a PCollection<string>, containing one element for each line to be written out.

formatFunc := func(w string, c int) string {
    return fmt.Sprintf("%s: %v", w, c)
}
formatted := beam.ParDo(s, formatFunc, counted)

Again a beam.ParDo is used, but you’ll notice the formatFunc is slightly different to the splitFunc above. The formatFunc takes two arguments, a string (the key), and an int (the value). These are the pairs in the PCollection<KV<string, int>>. However, the formatFunc does not take an emit func(...); instead it simply returns a string.

Since the PTransform outputs a single line for each input element, a simpler form of the function can be specified, one where the output element is just returned from the function. The emit func(...) is useful when the number of output elements differs from the number of input elements. If it’s a 1:1 mapping, a return makes the function easier to read. As above, this is all inferred at runtime with reflection when the pipeline is being constructed.

Multiple return arguments can also be used. For example, if the output was expected to be PCollection<KV<float64, bool>>, the return type could be func(...) (float64, bool).

textio.Write(s, "wordcounts.txt", formatted)

Finally textio.Write takes the formatted PCollection and writes it to a file named “wordcounts.txt” with one line per element.

Running the pipeline

To test the pipeline it can easily be run locally like so:

go get github.com/apache/beam/sdks/go/examples/wordcount
cd $GOPATH/src/github.com/apache/beam/sdks/go/examples/wordcount
go run wordcount.go --runner=direct

To run in a more realistic way, it can be run on GCP Dataflow. Before you do so, you need to create a GCP project, create a GCS bucket, enable the Cloud Dataflow APIs, and create a service account. This is documented in the Python quickstart guide, under “Before you begin”.

export GOOGLE_APPLICATION_CREDENTIALS=$PWD/your-gcp-project.json
export BUCKET=your-gcs-bucket
export PROJECT=your-gcp-project

cd $GOPATH/src/github.com/apache/beam/sdks/go/examples/wordcount
go run wordcount.go \
    --runner dataflow \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output gs://${BUCKET?}/counts \
    --project ${PROJECT?} \
    --temp_location gs://${BUCKET?}/tmp/ \
    --staging_location gs://${BUCKET?}/binaries/ \
    --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515

If this works correctly you’ll see something similar to the following printed:

Cross-compiling .../wordcount.go as .../worker-1-1544590905654809000
Staging worker binary:  .../worker-1-1544590905654809000
Submitted job: 2018-12-11_21_02_29
Console: https://console.cloud.google.com/dataflow/job/2018-12-11...
Logs: https://console.cloud.google.com/logs/viewer?job_id%2F2018-12-11...
Job state: JOB_STATE_PENDING …
Job still running …
Job still running …
...
Job succeeded!

Let’s take a moment to explain what’s going on, starting with the various flags. The --runner dataflow flag tells the Apache Beam SDK to run this on GCP Dataflow, including executing all the steps required to make that happen. This includes, compiling the code and uploading it to the --staging_location. Later the staged binary will be run by Dataflow under the --project project. As this will be running “in the cloud”, the pipeline will not be able to access local files. Thus for both the --input and --output flags are set to paths on GCS, as this is a convenient place to store files. Finally the --worker_harness_container_image flag specifies the docker image that Dataflow will use to host the workcount.go binary that was uploaded to the --staging_location.

Once wordcount.go is running, it prints out helpful information, such as links to the Dataflow console. The console displays current progress as well as a visualization of the pipeline as a directed graph. The local wordcount.go continues to run only to display status updates. It can be interrupted at any time, but the pipeline will continue to run on Dataflow until it either succeeds or fails. Once that occurs, the logs link can provide useful information.

Art history (more complex example)

Now we’ll construct a more complex pipeline that demonstrates some other features of Beam and Dataflow. In this pipeline we will be taking 100,000 paintings from the last 600 years and processing them to extract information about their color palettes. Specifically the question we aim to answer is, “Have the color palettes of paintings changed over the decades?”. This may not be a pipeline we run repeatedly, but it was a fun example, and demonstrates many advanced topics.

We will skip over the details of the color extraction algorithm, and provide that in a later article. Here we’ll focus on how to create a pipeline to accomplish this task.

We start by reading a csv file that contains metadata for each painting, such as the artist, the year it was painted, and a GCS path to a jpg of the painting. The paintings will then be grouped by the decade they were painted, and then the color palette for each group will be determined. Each palette will be saved to a png file (DrawColorPalette), and all the palettes will be saved to a single large json file (WriteIndex). To finish it off, the pipeline will be productionised, so it is easier to debug and re-run. The full source code is available here.

To start with, the main function for the pipeline looks like this:

import (
...
	"github.com/apache/beam/sdks/go/pkg/beam"
...
)

func main() {
	// If beamx or Go flags are used, flags must be parsed first.
	flag.Parse()

	// beam.Init() is an initialization hook that must be called on startup. On
	// distributed runners, it is used to intercept control.
	beam.Init()

	p := beam.NewPipeline()
	s := p.Root()

	buildPipeline(s)

	ctx := context.Background()
	if err := beamx.Run(ctx, p); err != nil {
		log.Fatalf(ctx, "Failed to execute job: %v", err)
	}
}

That is the standard boilerplate for a Beam pipeline: it parses the flags, initialises Beam, delegates the pipeline construction to the buildPipeline function, and finally runs the pipeline.

The interesting code begins in the buildPipeline function, which constructs the pipeline by passing PCollections from one function to the next, building up the tree we see in the above diagram.

func buildPipeline(s beam.Scope) {
	// nothing -> PCollection<Painting>
	paintings := csvio.Read(s, *index, reflect.TypeOf(Painting{}))

	// PCollection<Painting> -> PCollection<CoGBK<string, Painting>>
	paintingsByGroup := GroupByDecade(s, paintings)

	// PCollection<CoGBK<string, Painting>> ->
	//   (PCollection of histograms, PCollection<KV<string, string>> of errors)
	histograms, errors1 := ExtractHistogram(s, paintingsByGroup)

	// Calculate the color palette for the combined histograms.
	// PCollection of histograms ->
	//   (PCollection of palettes, PCollection<KV<string, string>> of errors)
	palettes, errors2 := CalculateColorPalette(s, histograms)

	// PCollection of palettes -> PCollection<KV<string, string>> of errors
	errors3 := DrawColorPalette(s, *outputPrefix, palettes)

	// PCollection of palettes -> nothing
	WriteIndex(s, morebeam.Join(*outputPrefix, "index.json"), palettes)

	// PCollection<KV<string, string>> -> nothing
	WriteErrorLog(s, "errors.log", errors1, errors2, errors3)
}

To make it easy to follow, each function describes the step, and is annotated with a comment that explains what kind of PCollection is accepted and returned. Let’s highlight some interesting steps.

var (
	index = flag.String("index", "art.csv", "Index of the art.")
)

// Painting represents a single painting in the dataset.
type Painting struct {
	Artist string `csv:"artist"`
	Title  string `csv:"title"`
	Date   string `csv:"date"`
	Genre  string `csv:"genre"`
	Style  string `csv:"style"`

	Filename string `csv:"new_filename"`
...
}

...
func buildPipeline(s beam.Scope) {
	// nothing -> PCollection<Painting>
	paintings := csvio.Read(s, *index, reflect.TypeOf(Painting{}))
...

The very first step uses csvio.Read to read the CSV file specified by the --index flag, and returns a PCollection of Painting structs. In all the examples we’ve seen before, the PCollections only contained basic types, e.g. strings, ints, etc. More complex types, such as slices and structs, are allowed (but not maps and interfaces). This makes it easier to pass rich information between the PTransforms. The only caveat is that the type must be JSON-serialisable. This is because in a distributed pipeline, the PTransforms could be processed on different machines, and the PCollection needs to be marshalled to be passed between them.

For Beam to successfully unmarshal your data, the types must also be registered. This is typically done within the init() function, by calling beam.RegisterType.

func init() {
	beam.RegisterType(reflect.TypeOf(Painting{}))
}

If you forget to register the type, an error will occur at runtime, for example:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -224: execute failed: panic: reflect: Call using main.Painting as type struct { Artist string; Title string; ... } goroutine 70 [running]:

This can be a little frustrating, as when running the pipeline locally with the direct runner, it does not marshal your data, so errors like this aren’t exposed until running on Dataflow.

Now we have a collection of Paintings, we group them by decade:

// GroupByDecade takes a PCollection<Painting> and returns a
// PCollection<CoGBK<string, Painting>> of the paintings grouped by decade.
func GroupByDecade(s beam.Scope, paintings beam.PCollection) beam.PCollection {
	s = s.Scope("GroupBy Decade")

	// PCollection<Painting> -> PCollection<KV<string, Painting>>
	paintingsWithKey := morebeam.AddKey(s, func(art Painting) string {
		return art.Decade()
	}, paintings)

	// PCollection<KV<string, Painting>> -> PCollection<CoGBK<string, Painting>>
	return beam.GroupByKey(s, paintingsWithKey)
}

The first line in this function, s.Scope("GroupBy Decade"), allows us to name this step, and group multiple sub-steps. For example, in the above diagram “GroupBy Decade” is a single step, which can be expanded to show an AddKey and a GroupByKey step.

GroupByDecade returns a PCollection<CoGBK<string, Painting>>. CoGBK is short for CoGroupByKey. It is a special collection, where (as you’ll see later) each element is a tuple of a key and an iterable collection of elements. The key in this case is the decade the painting was painted. The PCollection<Painting> is transformed into a PCollection<KV<string, Painting>> by the morebeam.AddKey step, adding a key to each value. Then the GroupByKey will use that key to produce the final PCollection.
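To make the shape of the grouped collection concrete, here is a minimal in-memory sketch of what the AddKey and GroupByKey steps achieve. It does not use Beam at all; the cut-down Painting struct, the decade helper, and the sample titles are assumptions made for illustration, and the real Decade() method may format its key differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Painting is a cut-down stand-in for the struct in the article, with the
// year already parsed into an int (an assumption for this sketch).
type Painting struct {
	Title string
	Year  int
}

// decade is a hypothetical key function, e.g. 1503 -> "1500s".
func decade(p Painting) string {
	return fmt.Sprintf("%ds", p.Year/10*10)
}

// groupByDecade does in memory what AddKey followed by GroupByKey do across
// workers: attach a key to every element, then collect the elements per key.
func groupByDecade(paintings []Painting) map[string][]Painting {
	groups := make(map[string][]Painting)
	for _, p := range paintings {
		key := decade(p)
		groups[key] = append(groups[key], p)
	}
	return groups
}

func main() {
	groups := groupByDecade([]Painting{
		{Title: "Painting A", Year: 1503},
		{Title: "Painting B", Year: 1508},
		{Title: "Painting C", Year: 1889},
	})

	// Map iteration order is random, so sort the keys before printing.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, len(groups[k]))
	}
}
```

Each map entry here plays the role of one element of the grouped collection: a key plus everything that shares it.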

Next up is ExtractHistogram, which takes the PCollection<CoGBK<string, Painting>> and returns two PCollections. The first contains a color histogram for every decade of paintings. The second is related to error handling, and will be explained later.

The ExtractHistogram function demonstrates three new concepts, “Stateful functions”, “Data enrichment”, and “Error handling”.

Stateful functions

var (
	artPrefix = flag.String("art", "gs://mybucket/art", "Path to where the art is kept.")
)

func init() {
	beam.RegisterType(reflect.TypeOf((*extractHistogramFn)(nil)).Elem())
}

type extractHistogramFn struct {
	ArtPrefix string `json:"art_prefix"`

	fs filesystem.Interface
}

// ExtractHistogram calculates the color histograms for all the Paintings in
// the CoGBK.
func ExtractHistogram(s beam.Scope, files beam.PCollection) (beam.PCollection, beam.PCollection) {
	s = s.Scope("ExtractHistogram")
	return beam.ParDo2(s, &extractHistogramFn{
		ArtPrefix: *artPrefix,
	}, files)
}

Instead of passing a simple function to beam.ParDo, a struct containing two fields is passed. The exported field, ArtPrefix is the path to where the painting jpgs are stored, and the unexported field, fs, is a filesystem client for reading these jpgs.

When the pipeline runs, no global variables are allowed, including the command line flag variables. For example, when running this pipeline we may start it like so:

go run main.go \
  --art gs://${BUCKET?}/art/ \
  --runner dataflow \
  ...

When the code actually runs on the Dataflow workers, the --art flag is not specified. Thus the *artPrefix value will use the default value. To pass the real value to the Dataflow workers, it must be part of the DoFn struct that is passed to beam.ParDo. So in this example, we create an extractHistogramFn struct, with the exported ArtPrefix field set to the value of the --art flag. This extractHistogramFn is then marshalled and passed to the workers. As with the unmarshalled PCollection values, the extractHistogramFn must also be registered with Beam during init.

When the pipeline executes this step it calls the extractHistogramFn’s ProcessElement method. This method works in a similar way to simple DoFn functions. The arguments and return value are reflected at runtime and mapped to the PCollections being processed and returned.

Iterating over a CoGBK

func (fn *extractHistogramFn) ProcessElement(
		ctx context.Context,
		key string, values func(*Painting) bool,
		errors func(string, string)) HistogramResult {

	log.Infof(ctx, "%q: ExtractHistogram started", key)
	var art Painting
	for values(&art) {
		filename := morebeam.Join(fn.ArtPrefix, art.Filename)
		h, err := fn.extractHistogram(ctx, key, filename)
		if err != nil {
			…
		}
		
		result.Histogram = result.Histogram.Combine(h)
	}

	return result
}

ProcessElement is called once for every unique group in the PCollection. The key string argument will be the key for that group, and values func(*Painting) bool is used to iterate over all values within the group. The contract is that values is passed a pointer to a Painting struct, which is populated on each iteration. As long as there are more paintings to process in the group, the values function returns true. Once it returns false, the group has been fully processed. This iterator pattern is unique to the CoGBK and makes it convenient to apply an operation to every element in the group.
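The iterator contract is easier to see outside of Beam. Below is a minimal sketch of a function with the same func(*Painting) bool shape, backed by a plain slice. The newIterator and titles helpers are hypothetical names for this illustration only; in the real pipeline Beam supplies the values function.

```go
package main

import "fmt"

// Painting is a cut-down stand-in for the struct in the article.
type Painting struct {
	Title string
}

// newIterator returns a function with the same shape as the values argument:
// it copies the next element into *p and reports whether one was available.
func newIterator(paintings []Painting) func(*Painting) bool {
	i := 0
	return func(p *Painting) bool {
		if i >= len(paintings) {
			return false
		}
		*p = paintings[i]
		i++
		return true
	}
}

// titles drains the iterator with the same loop ProcessElement uses.
func titles(values func(*Painting) bool) []string {
	var out []string
	var art Painting
	for values(&art) {
		out = append(out, art.Title)
	}
	return out
}

func main() {
	values := newIterator([]Painting{{Title: "A"}, {Title: "B"}})
	fmt.Println(titles(values)) // [A B]
}
```

Note that the same Painting variable is reused on every iteration, so anything that must outlive the loop body should be copied out of it.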

In this case, extractHistogram is called for each Painting, fetches a jpg of the artwork, and extracts a histogram of colors (https://en.wikipedia.org/wiki/Color_histogram). The histograms from all paintings in that group are combined, and finally one result per group is returned.

Data enrichment

Reading the paintings from an external service (such as GCS) demonstrates a data enrichment step. This is where an external service is used to “enrich” the dataset the pipeline is processing. You could imagine a user service being called when processing log entries, or a product service when processing purchases. It should be noted, that any external action should be idempotent. If a worker fails, it is possible the same element is retried, and thus processed multiple times. Dataflow keeps track of failures and ensures the final result only has each element processed once.

When calling a remote service, typically some kind of client is needed to make the request. In this pipeline we read the images from GCS, thus setting up a GCS client at startup is useful. Since we are using a struct based DoFn, there are some additional methods that can be defined.

func (fn *extractHistogramFn) Setup(ctx context.Context) error {
	var err error
	fn.fs, err = filesystem.New(ctx, fn.ArtPrefix)
	if err != nil {
		return fmt.Errorf("filesystem.New(%q) failed: %s", fn.ArtPrefix, err)
	}
	return nil
}

func (fn *extractHistogramFn) Teardown() error {
	return fn.fs.Close()
}

When the DoFn is initialized on the worker, the Setup method is called. Here a new Filesystem client is created and stored in the struct’s fs field. Later, when the DoFn is no longer needed, the Teardown method is called, giving us an opportunity to clean up the client. As with all things distributed, don’t count on Teardown ever being called.

There are also some simple best practices around error handling that should be followed when calling an external service.

func (fn *extractHistogramFn) extractHistogram(ctx context.Context,
key, filename string) (palette.Histogram, error) {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	fd, err := fn.fs.OpenRead(ctx, filename)
	if err != nil {
		return nil, fmt.Errorf("fs.OpenRead(%q) failed: %s", filename, err)
	}
	defer fd.Close()

	img, _, err := image.Decode(fd)
	if err != nil {
		return nil, fmt.Errorf("image.Decode(%q) failed: %s", filename, err)
	}

	return palette.NewColorHistogram(img), nil
}

The function begins by using context.WithTimeout. This ensures that if the external service does not respond in a timely manner, the context will be cancelled and an error returned. If this timeout wasn’t set, the external call might never end, and the pipeline would never terminate.

Since the pipeline could be running across 100s of machines, it could generate significant load on a remote service. It is wise to implement appropriate backoff and retry logic. In some cases it is even worth rate limiting your pipeline’s execution, or tagging your pipeline’s traffic at a lower QoS so it can be easily shed.

The external service may also return permanent errors. Thus a more robust error handling pattern is needed.

Error handling and dead letters

When Beam processes a PCollection, it bundles up multiple elements and processes one bundle at a time. If the PTransform returns an error, panics, or otherwise fails (such as running out of memory), the full bundle is retried. With Dataflow, bundles are retried up to four times, after which the entire pipeline is aborted. This can be inconvenient, so where appropriate, instead of returning an error we use a dead letter queue. This is a new PCollection that collects processing errors. These errors can then be persisted at the end of the pipeline, manually inspected, and processed again later.

return beam.ParDo2(s, &extractHistogramFn{
	ArtPrefix: *artPrefix,
}, files)

A keen observer would have noticed that beam.ParDo2 was used by ExtractHistogram, instead of beam.ParDo. This function works the same, but returns two PCollections. In our case, the first is the normal output, and the second is a PCollection<KV<string, string>>. This second collection is keyed on the unique identifier of the painting having an issue, and the value is the error message.

Since returning an error is optional, the errors PCollection was passed to extractHistogramFn’s ProcessElement as an errors func(string, string).
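To show the dead letter idea in isolation, here is a minimal sketch without Beam. A step reports bad input through an emitter with the same func(string, string) shape, rather than returning an error. The parseYear step, the deadLetters map, and the sample file names are hypothetical; in the real pipeline Beam provides the emitter and collects its output into the second PCollection.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseYear is a stand-in for a step that can fail on bad input. Instead of
// returning an error (which would make the runner retry the whole bundle),
// it reports the problem through emitErr and carries on.
func parseYear(key, date string, emitErr func(string, string)) (int, bool) {
	year, err := strconv.Atoi(date)
	if err != nil {
		emitErr(key, fmt.Sprintf("bad date %q: %v", date, err))
		return 0, false
	}
	return year, true
}

func main() {
	// In Beam the emitter feeds a PCollection; here a map stands in for it.
	deadLetters := map[string]string{}
	emitErr := func(key, msg string) { deadLetters[key] = msg }

	dates := map[string]string{"a.jpg": "1503", "b.jpg": "c. 15th century"}
	for key, date := range dates {
		if year, ok := parseYear(key, date, emitErr); ok {
			fmt.Println(key, year)
		}
	}
	fmt.Println(len(deadLetters), "element(s) sent to the dead letter queue")
}
```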

Throughout we use this kind of error PCollections from every stage, and at the end of the pipeline they are collected together and output to a single errors log file:

// WriteErrorLog takes multiple PCollection<KV<string, string>>s, combines them,
// and writes them to the given filename.
func WriteErrorLog(s beam.Scope, filename string, errors ...beam.PCollection) {
	s = s.Scope(fmt.Sprintf("Write %q", filename))

	c := beam.Flatten(s, errors...)
	c = beam.ParDo(s, func(key, value string) string {
		return fmt.Sprintf("%s,%s", key, value)
	}, c)

	textio.Write(s, morebeam.Join(*outputPrefix, filename), c)
}

Since the output is key, comma, value, the file can easily be re-read to retry just the failed keys.

The rest of the pipeline is much the same, and thus won’t be explained in detail. CalculateColorPalette takes the color histograms and runs a K-Means clustering algorithm to extract the color palettes for those paintings. Those palettes are written out to png files by DrawColorPalette, and finally all the palettes are written out to a JSON file by WriteIndex.

Gotchas

Marshalling

Always remember to register the types that will be transmitted between workers. This is anything that’s inside a PCollection, as well as any DoFn. Not all types are allowed, but slices, structs, and primitives are. For other types, custom JSON marshalling can be used.

Remember too that global state is not allowed. Flags and other global variables will not always be populated when running on a remote worker. Also, examples like this may catch you out:

prefix := "X"
s = s.Scope("Prefix " + prefix)
c = beam.ParDo(s, func(value string) string {
	return prefix + value
}, c)

This simple example appears to add “X” to the beginning of each element; however, it will prefix nothing. This is because the simple anonymous function is marshalled, and unmarshalled on the worker. When it is then invoked on the worker, it does not have the closure, and thus has not captured the value of prefix. Instead prefix is the zero value. For this example to work, prefix must be defined inside the anonymous function, or a DoFn struct must be used which contains the prefix as a marshalled field.

Errors

Since the pipeline could be running across 100s of workers, errors are to be expected. Extensively using log.Infof, log.Debugf, etc. will make your life better. They can make it very easy to debug why the pipeline got stuck, or mysteriously failed.

While debugging this pipeline, it would occasionally fail due to exceeding the memory limits of the Dataflow workers. Standard Go infrastructure can be used to help debug this, such as pprof.

import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	...
	go func() {
		// HTTP Server for pprof (and other debugging)
		log.Info(ctx, http.ListenAndServe("localhost:8080", nil))
	}()
	...
}

This configures a webserver which can export useful stats, and can be used for grabbing pprof profiling data.

Difference between direct and dataflow runners

Running the pipeline locally is a quick way to validate that the pipeline is set up and that it runs as expected. However, running locally won’t run the pipeline in parallel, and it is obviously constrained to a single machine. There are some other differences, mostly around marshalling data. It’s always a good idea to test on Dataflow, perhaps with a smaller or sampled dataset as input that can be used as a smoke test.

Conclusion

This article has covered the basics of creating an Apache Beam pipeline with the Go SDK, while also covering some more advanced topics. The results of the specific pipeline will be revealed in a later article, until then the code is available here.

While the Beam Go SDK is still experimental, there are many great tutorials and examples using the more mature Java and Python Beam SDKs [1, 2]. Google themselves even published a series of generic articles [part 1, part 2] explaining common use cases.

Certbot: Unexpected Error

Sat, 26 May 2018 11:28:44 -0700

I got a nice warning email from Let’s Encrypt that my cert was going to expire soon, and hadn’t been renewed. I found in /var/log/letsencrypt/letsencrypt.log the following error:

Renewal configuration file /etc/letsencrypt/renewal/mydomain.bramp.net.conf (cert: mydomain.bramp.net) produced an unexpected error: 'Namespace' object has no attribute 'dns_cloudflare_credentials'. Skipping.

I manually ran certbot in dry-run mode and it worked fine:

$ sudo certbot renew --dry-run

So this error only occurs when certbot is running as a cron job. Looking at /etc/cron.d/certbot I see the job runs as root, so I tried certbot renew --dry-run again, but this time as the root user:

$ sudo su
root@:~$ sudo certbot renew --dry-run

and bam, the same error. This error is somehow related to the certbot-dns-cloudflare plugin, which proves ownership of the domain with a DNS-01 challenge via Cloudflare’s DNS. I use this form of challenge because the domain in question is internal and not available on the Internet.

I had forgotten how I installed the plugin, but searching Google, it seems to be via pip3. Clearly something was different between my root and normal user w/ sudo environments. So I did the following:

$ sudo pip3 list | grep certbot
certbot (0.23.0)
certbot-apache (0.23.0)
certbot-dns-cloudflare (0.24.0)

$ sudo su
root@:~$ pip3 list | grep certbot
certbot (0.23.0)
certbot-apache (0.23.0)

Aha, no certbot-dns-cloudflare when running as root. Clearly I hadn’t installed this correctly. Running pip3 install certbot-dns-cloudflare as root fixed the problem, and voila, certbot correctly fetches new certs via a regular cron.

Marvel Cinematic Universe Timeline

Sun, 08 Apr 2018 21:46:21 -0700

In the run up to the third Avengers movie, I was wondering which characters have appeared together, and when. With dozens of characters across eighteen Marvel Cinematic Universe (MCU) movies, this will make Avengers Infinity War one huge mashup. In the style of the XKCD narrative diagrams, I plotted out the journey each character has taken across the numerous movies. I also got carried away and created a similar (yet smaller) diagram for the Netflix Marvel shows.

Marvel Cinematic Universe - Iron Man (2008) through Avengers: Infinity War (2018)

click to open an interactive version

Marvel Cinematic Universe - Netflix Shows

click to open an interactive version

I would like to thank the contributors to marvel-movies.wikia.com, where I got all the information, as well as Simon Elvery, who created the d3-layout-narrative module for d3.js that made these diagrams easier to create. Check back in the future, when I'll write up an article on how I created these diagrams. As always, I welcome feedback; you may contact me at @TheBramp.

Google Font Features

Sun, 21 Jan 2018 16:03:36 -0800

tl;dr Google Fonts doesn’t supply fonts with OpenType features (such as old-style figures, or small-caps), but you can build and host the fonts yourself to support everything you need.

I recently posted an article which contained lots of numbers. While I was proofreading the article, I didn’t quite like how the numbers looked; sometimes the digits were below the baseline, for example:

Oldstyle figures

Where I would have expected the top and bottom of each digit to be aligned:

Lining figures

This made me flashback to all the typography I learnt when working with LaTeX. These two styles of figures are called old-style, and lining (or sometimes lowercase and uppercase numbers). The theory is that old-style numbers flow better when mixed with text. Recall, letters like q, j and p, all drop below the baseline, which makes the text nicer to read:

Example with characters below the baseline

However, my article had many numbers on the page, sometimes within tables, where old-style just made the numbers look odd. I looked for a way to force the lining style throughout. I quickly found the CSS styling:

body {
           font-variant-numeric: lining-nums; 
  -webkit-font-feature-settings: "lnum" on;
     -moz-font-feature-settings: "lnum" on;
      -ms-font-feature-settings: "lnum" on;
          font-feature-settings: "lnum" on;
}

Sadly when I applied this to my site, it did nothing. I wondered if perhaps the font did not support lining figures. A quick search led me to Stack Overflow that implied both the font I was using, Raleway, and Google Fonts (which hosted the font) did in fact support lining.

So I went deeper down the rabbit hole to figure out what was going wrong. I wanted to confirm for myself that the font supported lining figures. I searched for a while for a simple CLI that would inspect the WOFF/TTF files and tell me what they contained. Sadly, the best I could find was FontForge, a GUI. That worked, and confirmed the fonts being served by Google did not contain the lining feature, or in fact any feature other than basic ligatures.

Later I found this GitHub issue which confirmed all features were stripped from the font. So I sought out a way to rebuild the Google font to keep the lining figures.

Before that, I started to shave another yak, and decided to create a CLI tool that would easily display the font features. I came across a Go library, SFNT that can parse OpenType fonts. Sadly it didn’t implement the parsing of the features. A few hours later, I read the OpenType spec and sent them a pull request to add this functionality. Now I can easily confirm from the command line what features are supported.

$ font features raleway-v12-latin-ext_latin-regular.woff
Glyph Substitution Table (GSUB):
	Script "latn" (Latin):
		Default Language:
			Feature "liga" (Standard Ligatures)

I decided to play around with the Google Fonts API, and then eventually the unofficial (but awesome) google-webfonts-helper (a hassle-free way to self-host Google Fonts). However, no combination of options would make the font contain the lining figures.

Since the Google Fonts are open source, I downloaded the source TTF of the font, and double-checked it did indeed contain the feature:

$ font features Raleway-Regular.ttf 
Glyph Substitution Table (GSUB):
  Script "latn" (Latin):
    Default Language:
      Feature "aalt" (Access All Alternates)
      Feature "dlig" (Discretionary Ligatures)
      Feature "liga" (Standard Ligatures)
      Feature "lnum" (Lining Figures)
      Feature "onum" (Oldstyle Figures)
      Feature "salt" (Stylistic Alternates)
      Feature "smcp" (Small Capitals)
      Feature "ss01" (Stylistic Set 1)
      Feature "ss02" (Stylistic Set 2)

So my next idea was to take the original Raleway-Regular.ttf, convert it to WOFF and WOFF2, and strip out the bits I don’t need, just as Google Fonts does, to ensure the resulting files are lean and performant.

I couldn’t find the pipeline Google Fonts uses to process the files, so I instead took it upon myself to figure this out. I started by using pyftsubset (part of FontTools) to remove unneeded character sets, features, and other parts from the original TTF file.

$ pip install fonttools
$ pyftsubset Raleway-Regular.ttf --layout-features='*' --unicodes="U+0000-00FF, U+0100-024F, U+0131, U+0152-0153, U+02DA, U+02DC, U+02BB-02BC, U+02C6, U+0259, U+0370-03FF, U+1E00-1EFF, U+2000-206F, U+2070-209F, U+2074, U+20A0-20CF, U+2122, U+2150-218F, U+2200-22FF, U+2C60-2C7F, U+A720-A7FF" --output-file=Raleway-Regular.subset.ttf

Now I had a TTF file with all the features, but only the subset of characters I use on my site. Next I needed to convert this file to all the recommended font formats, so my site would look nice in IE, Chrome, Android and iOS. The resulting CSS would look like this:

@font-face {
  font-family: 'Raleway';
  src: url('raleway-regular.subset.eot');                           /* IE9 Compat Modes */
  src: local('Raleway'), local('Raleway-Regular'),
       url('raleway-regular.subset.eot?#iefix') format('embedded-opentype'), /* IE6-IE8 */
       url('raleway-regular.subset.woff2') format('woff2'),    /* Super Modern Browsers */
       url('raleway-regular.subset.woff') format('woff'),     /* Pretty Modern Browsers */
       url('raleway-regular.subset.ttf') format('truetype'),    /* Safari, Android, iOS */
       url('raleway-regular.subset.svg#ralewayregular') format('svg');    /* Legacy iOS */
  font-style: normal;
  font-weight: 400;
}

I again tried to use pyftsubset to save the files in the required formats. This worked well for TTF, WOFF, and WOFF2, but it didn’t support EOT or SVG fonts:

$ pip install zopfli
$ pip install brotli
$ pyftsubset ... --flavor=woff --with-zopfli --output-file=Raleway-Regular.subset.woff
$ pyftsubset ... --flavor=woff2 --output-file=Raleway-Regular.subset.woff2

So instead I searched for an all-in-one solution for converting fonts. I found numerous websites that offered to do it; the one I settled on was fontsquirrel.com. Here I used the expert feature, to control exactly what was in the font, and to produce compressed versions in all file formats. I originally tried to use the subsetting feature on fontsquirrel, but I couldn’t get it to maintain all the features I needed, so I used pyftsubset locally instead.

After fontsquirrel.com produced the fonts, I checked they contained the features, and compared the resulting file sizes:

$ ls -ltr

# Google Fonts
 96K  raleway-v12-latin-ext_latin-regular.ttf
 40K  raleway-v12-latin-ext_latin-regular.woff
 31K  raleway-v12-latin-ext_latin-regular.woff2

# My versions
140K raleway-regular.subset-webfont.ttf
 61K raleway-regular.subset-webfont.woff
 46K raleway-regular.subset-webfont.woff2

$ font features raleway-regular.subset-webfont.woff
Glyph Substitution Table (GSUB):
  Script "latn" (Latin):
    Default Language:
      Feature "aalt" (Access All Alternates)
      Feature "dlig" (Discretionary Ligatures)
      Feature "liga" (Standard Ligatures)
      Feature "lnum" (Lining Figures)
      Feature "onum" (Oldstyle Figures)
      Feature "salt" (Stylistic Alternates)
      Feature "smcp" (Small Capitals)
      Feature "ss01" (Stylistic Set 1)
      Feature "ss02" (Stylistic Set 2)

The file size didn’t vary too much, and thus it was a simple matter of uploading the fonts to my blog, and updating the CSS.

1234567890 vs 1234567890

Measuring Percentile Latency

Tue, 16 Jan 2018 08:07:00 -0800

In many applications it is common to measure the time it takes to handle some event. Web applications pay close attention to this, to ensure each user’s request is replied to in a timely manner. To view this in aggregate, many would just measure the mean response time, which is easily calculated by summing the time taken to handle all requests and dividing by the number of requests. This average latency metric, however, can be very misleading as it does not show the worst case behaviour. For example, the majority of users may see requests handled quickly, but a few users may experience long delays. Thus to capture the worst behaviour it is better to look at percentile latency.

This article will discuss how to calculate percentiles, how to collect and aggregate them in a distributed way, and even how to efficiently store them as time series data.

Percentiles

Let’s start with some basics. The 99th percentile is defined as the value that 99 out of 100 samples fall below. Thus 99 users out of 100 observe a latency less than this value, and 1 in every 100 observes a latency equal to or greater. We choose the 99%tile because it represents the tail of the latency distribution (that is, the worst cases).

The simplest way to calculate the 99th percentile is to sort all the values and take the value 99% of the way through the sorted list. For example, if you had 1,000 latency values, place them into an array, sort them, then take the value at the 990th index. That’ll be the 99%tile, which represents the latency value that 99% of the values are less than. Easy.
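As a minimal sketch of this sort-and-index approach (the percentile function name and the made-up sample values are my own, for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the value at percentile p (0 < p < 100) by sorting a
// copy of the samples and indexing into it. It returns 0 for no samples.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	idx := int(float64(len(sorted)) * p / 100)
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}
	return sorted[idx]
}

func main() {
	// 1,000 fake latencies: 1ms, 2ms, ..., 1000ms, deliberately unsorted.
	samples := make([]float64, 0, 1000)
	for i := 1000; i >= 1; i-- {
		samples = append(samples, float64(i))
	}
	// With 1,000 samples this reads index 990 of the sorted slice.
	fmt.Println(percentile(samples, 99)) // 991
}
```

The cost is what matters here: every sample has to be kept around until the percentile is calculated.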

Throughout this article I’ll use a dataset of 10,000 randomly generated values from a log-normal distribution with parameters (μ = 0, σ = 1). Most of the values will be small (<2s), but there will be a long tail, which will simulate worst case latencies returned by a server.

Above we have an empirical cumulative distribution function (eCDF), which visually demonstrates this technique. On the red line there are 10,000 latency values, in sorted order. If we take the 9,900th point, we see a value of 10.97 seconds. This is the 99%tile latency for our dataset.

We could use this simple approach to calculate the distribution on our servers. However, let’s assume our servers receive 100 queries per second (qps), and we want to calculate the 99%tile every 60 seconds. That’ll require us to store 6,000 latency values for every minute, which is doable, but unbounded. If we extend this to a dozen servers, all storing 6,000 numbers every minute, and we want to aggregate this metric across all of them, this could very quickly get out of control, especially if we are capturing multiple different dimensions of this metric (e.g. percentiles for successful vs failed requests). Perhaps there is a way to approximate this, bounding the amount of RAM, while keeping a level of accuracy.

Histogram Approximation

Instead of storing each number, we could bin them into groups, in the same way a histogram would. For example, we know that latency values will be in the range 0 to 60,000ms. That’s because, it is impossible to handle the request in zero seconds, and hopefully the application will timeout after 60 seconds (otherwise the chances are the user isn’t waiting anymore).

So we can use histogram bins that double in size from 1ms, to ~64,000ms, for example (0-1ms], (1-2ms], (2-4ms], (4-8ms], (8-16ms], (16-32ms], (32-64ms], (64-128ms], (128-256ms], (256-512ms], (512-1024ms], etc. Extending to 65,536ms (2^16), would give us 18 bins. Each bin will record the count of values that land within its range. Thus we only need to store 18 counts, instead of the unbounded 6,000 latency values.
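A minimal sketch of recording latencies into such doubling bins might look like this. The binIndex and record names are hypothetical, latencies are assumed to be whole milliseconds, and clamping oversized values into the final bin is my own assumption since overflow handling isn’t discussed here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// binIndex maps a latency in whole milliseconds to a power-of-two bin:
// bin 0 is (0-1ms], bin 1 is (1-2ms], bin 2 is (2-4ms], and so on.
func binIndex(ms uint64) int {
	if ms <= 1 {
		return 0
	}
	// bits.Len64(ms-1) is the smallest k such that ms <= 2^k.
	return bits.Len64(ms - 1)
}

// record increments the count of the bin that ms falls into. Values beyond
// the last boundary are clamped into the final bin (an assumption).
func record(counts []uint64, ms uint64) {
	i := binIndex(ms)
	if i >= len(counts) {
		i = len(counts) - 1
	}
	counts[i]++
}

func main() {
	counts := make([]uint64, 18)
	for _, ms := range []uint64{1, 3, 4, 700, 1500, 200000} {
		record(counts, ms)
	}
	fmt.Println(counts)
}
```

However many requests arrive, the memory used stays fixed at one counter per bin.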

But how well does this approximate? Let’s look at our random dataset from earlier.

Bin Range (ms)		Count	Running total	eCDF(x)
... 12 rows cut ...
1,024	2,048	2,842	7,490	74.90%
2,048	4,096	1,657	9,147	91.47%
4,096	8,192	648	9,795	97.95%
8,192	16,384	172	9,967	99.67%
16,384	32,768	29	9,996	99.96%

In this table, the first two columns represent the range of the bin, and the third column is the count of values within that bin. The running total column is the sum of the current bin and all previous bins. Finally, the eCDF(x) is the empirical cumulative distribution function, or simply put, the running total divided by the sum of all counts (which in this case is 10,000 as there are exactly 10,000 samples).

The bins can accurately determine the percentiles at the edges, so for example, the 97.95%tile is 8,192ms, and the 99.67%tile is 16,384ms. However, we wanted the 99%tile, which lies somewhere between these two values. We can use linear approximation to find the position in the bin (which is somewhere above 8192ms, but less than 16,384ms).

Thus we can determine the 99%tile is 13.192 seconds. If we compare this to the non-approximate value from earlier, 10.970s, we seem to be off by ~20%. To make this approximation more precise, we can increase the number of bins. Instead of doubling the bin boundaries, we can increase each boundary by a factor of √2 (square root of 2). This would double the number of bins (from 18 to 36), but increase the precision greatly. If we use these new bins, the linear approximation gets us a value of 11.042s (at the 99%tile) which is only off by 0.66%. This seems a good trade-off of space and accuracy.
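The linear approximation can be sketched in Go as follows. The bin type and percentile function are hypothetical names. The counts come from the table above, with two stated assumptions: the 12 elided rows are lumped into a single (0-1,024ms] bin of 4,648 samples (7,490 minus 2,842), and the 4 samples above 32,768ms are placed in the next doubling bin.

```go
package main

import "fmt"

// bin is one histogram bucket covering (lower, upper] milliseconds.
type bin struct {
	lower, upper float64
	count        uint64
}

// percentile estimates the p-th percentile (0 < p <= 1) by finding the bin
// that contains the target rank, then interpolating linearly inside it.
func percentile(bins []bin, p float64) float64 {
	var total uint64
	for _, b := range bins {
		total += b.count
	}
	if total == 0 {
		return 0
	}
	target := p * float64(total) // rank of the sample we are looking for
	running := 0.0
	for _, b := range bins {
		next := running + float64(b.count)
		if b.count > 0 && target <= next {
			// Fraction of the way through this bin, assuming the samples
			// are spread evenly across it.
			frac := (target - running) / float64(b.count)
			return b.lower + frac*(b.upper-b.lower)
		}
		running = next
	}
	return bins[len(bins)-1].upper
}

func main() {
	bins := []bin{
		{0, 1024, 4648}, // assumption: the 12 elided rows lumped together
		{1024, 2048, 2842},
		{2048, 4096, 1657},
		{4096, 8192, 648},
		{8192, 16384, 172},
		{16384, 32768, 29},
		{32768, 65536, 4}, // assumption: the remaining 4 samples
	}
	fmt.Printf("99%%tile is roughly %.0f ms\n", percentile(bins, 0.99))
}
```

For the 99%tile the target rank is 9,900, which is 105 samples into the 172-sample (8,192-16,384ms] bin, giving roughly 8,192 + (105 / 172) × 8,192 ms, in line with the ~13.19 seconds quoted above.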

Just to double-check, calculating the 99.9%tile (one additional 9) exactly gives 23.105s, and the √2 bins estimate is 23.170s. This is only off by 0.28%, so again seems reasonable. Obviously, the shape of the distribution, and the actual values will affect the error. Empirically √2 bins works well enough, but your experience may vary.

Aggregation

Now that we can calculate the percentiles, how would we extend this so we can aggregate the percentiles from multiple servers? A naive approach may be to ask each server to calculate its own 99%tile, and for us to calculate a mean of these. An average of percentiles doesn’t seem ideal, especially if one server is particularly bad, as an average may just hide the outliers again. A better approach is to collect the histogram (set of bins) from each server, and simply add them together. This works easily if every server is using the same bin ranges.

Bin Range (ms)		Server A Count	Server B Count	Total Count
...
1,024	2,048	2,842	2,811	5,653
2,048	4,096	1,657	1,660	3,317
4,096	8,192	648	634	1,282
8,192	16,384	172	155	327

So in this example, Server A and Server B have 2,842 and 2,811 samples respectively between 1.024s and 2.048s. Meaning across both these servers, there were 5,653 requests that took between 1 and 2 seconds. Using the same linear approximation techniques on this combined histogram allows us to calculate the aggregated percentiles.
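Adding histograms together is just element-wise addition of the counts. A minimal Go sketch, where the merge function is a hypothetical helper and the counts are the four rows shown in the table above:

```go
package main

import (
	"errors"
	"fmt"
)

// merge adds the per-bin counts of several histograms. It assumes every
// histogram uses the same bin ranges in the same order, and reports an
// error if the number of bins differs.
func merge(histograms ...[]uint64) ([]uint64, error) {
	if len(histograms) == 0 {
		return nil, nil
	}
	total := make([]uint64, len(histograms[0]))
	for _, h := range histograms {
		if len(h) != len(total) {
			return nil, errors.New("histograms use different bins")
		}
		for i, count := range h {
			total[i] += count
		}
	}
	return total, nil
}

func main() {
	// Counts for the bins from 1,024ms to 16,384ms, taken from the table.
	serverA := []uint64{2842, 1657, 648, 172}
	serverB := []uint64{2811, 1660, 634, 155}

	total, err := merge(serverA, serverB)
	if err != nil {
		fmt.Println("merge failed:", err)
		return
	}
	fmt.Println(total) // prints [5653 3317 1282 327]
}
```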

This kind of aggregation works well, and is lightweight enough to collect across even a large fleet of servers. Then in a centralised location (perhaps the machine doing the monitoring) the aggregate percentiles can be calculated. If needed, per-server percentiles can be drilled down into, as that data is retained. This is a lot simpler than maintaining the full set of (10,000) values from each server.

Time

Typically, these percentiles want to be measured over time. For example, we want to know the 99%tile aggregated across all the servers for every minute, or hour of the day. To achieve this we need to store the histogram at fixed intervals, say every minute. There is again a naive approach, where every minute we reset the histogram counts to zero, allowing each server to only count the values in the last minute. Conceptually this is easy to reason about, but it introduces subtle synchronisation issues. What happens if each server has a slightly different definition of when a minute starts? Or if collection is delayed and histograms are not aggregated before being reset?

A more robust way is to never reset the histogram, but to always keep increasing counts. Then to calculate the value for a particular interval (say the last minute), you subtract the previous minute’s histogram from the most recent histogram. This is a little bit more work, but a lot more flexible.

To explore this concept, let’s begin with a simpler (non-histogram) example, say calculating requests per second. If we store a running counter of requests, then if you recall your calculus, the rate per second is the derivative. That is, the delta between two values, divided by the time between them.

Time (s)	Running Count	Delta (per minute)
0	0
60	95	95
120	205	110
180	310	105
240	395	85
300	450	55
360	480	30
420	500	20
480	500	0
540	590	90
600	700	110

Taking the example above, we can say the average requests per second between time 120s and 180s is 1.75. Because at time 180s there were 310 total requests, and at time 120s there were only 205. Thus a delta of 105 requests per minute, or 1.75 requests per second.

This has the nice property that we can easily calculate the rate over any arbitrary interval. For example, subtracting the value at time 0s from the value at time 600s calculates the average rate over the last 10 minutes. This is a lot simpler than keeping track of the per second rate every minute, and calculating the average of them. This property is especially useful when plotting on a graph where each pixel may represent a wide interval (such as a full hour). Having a quick way to calculate the rate in that hour is a real performance win. Even though this example was a simple rate per second, this works exactly the same for the histograms. Thus, storing the running total, across all servers, at periodic intervals, we can easily calculate an approximate percentile over any arbitrary interval.
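The same subtraction applied to histograms can be sketched in Go as below. The snapshot type and delta function are hypothetical names. The totals (205 requests at 120s, 310 at 180s) come from the table above, but the split across three bins is made up for illustration.

```go
package main

import "fmt"

// snapshot is a cumulative (never reset) histogram captured at a point in time.
type snapshot struct {
	timeSec uint64
	counts  []uint64
}

// delta returns the per-bin counts recorded between two snapshots, and the
// number of seconds that elapsed. It assumes both snapshots use the same
// bins and that the counters were not reset in between (a restarted server
// would need extra handling, which this sketch leaves out).
func delta(older, newer snapshot) ([]uint64, uint64) {
	if len(older.counts) != len(newer.counts) {
		return nil, 0
	}
	out := make([]uint64, len(newer.counts))
	for i := range newer.counts {
		out[i] = newer.counts[i] - older.counts[i]
	}
	return out, newer.timeSec - older.timeSec
}

func main() {
	// Hypothetical three-bin split of the running counts from the table.
	at120 := snapshot{timeSec: 120, counts: []uint64{150, 50, 5}}  // 205 requests
	at180 := snapshot{timeSec: 180, counts: []uint64{225, 75, 10}} // 310 requests

	counts, elapsed := delta(at120, at180)
	var requests uint64
	for _, c := range counts {
		requests += c
	}
	fmt.Println(counts, float64(requests)/float64(elapsed)) // prints [75 25 5] 1.75
}
```

The resulting per-bin counts form an ordinary histogram for just that interval, so the usual percentile estimate can be run on it.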

Conclusion

To truly understand latency, the distribution of it must be examined. This can be achieved by looking at various percentiles. These percentiles can be scalably and efficiently calculated by using histograms with fixed bins, which keep track of a running count of latency values.

A quick word of warning, all monitoring lies to you in subtle ways, and it is your responsibility to understand it. If you have fewer than 100 values, does a 99%tile metric make sense? Perhaps extend the collection interval over a longer time period, or instead use the 90%tile. A single percentile also doesn’t show the full picture, it may always be worth exporting the 50%, 90%, 99%tile, etc. Or perhaps, a percentile doesn’t capture your monitoring requirements, and instead simply taking the max value would be better.

Finally, you may not wish to calculate all this yourself, and instead use an off-the-shelf library, such as HdrHistogram, or a monitoring solution such as Prometheus.

Running Java in Production: A SRE’s Perspective

Sat, 13 Jan 2018 12:50:31 -0800

Originally published as part of the Java Advent 2017 series

As a Site Reliability Engineer (SRE) I make sure our production services are efficient, scalable, and reliable. A typical SRE is a master of production, and has to have a good understanding of the wider architecture, and be well versed in many of the finer details.

It is common for SREs to be polyglot programmers, expected to understand multiple different languages. For example, C++ may be hard to write, test and get right, but has high performance, perfect for backend systems such as databases. Whereas Python is easy to write, and great for quick scripting, useful for automation. Java is somewhere in the middle: it is a compiled language that provides type safety, performance, and many other advantages that make it a good choice for writing web infrastructure.

Even though many of the best practices that SREs adopt can be generalised to any language, there are some unique challenges with Java web applications. This article highlights some of these challenges and talks about what we can do to address them.

Deployment

A typical Java application consists of 100s of class files, either written by your team, or from common libraries that the application depends on. To keep the number of class files under control, and to provide better versioning, and compartmentalisation, they are typically bundled up into JAR or WAR files.

There are many ways to host a Java application, one popular method is using a Java Servlet Container such as Tomcat, or JBoss. These provide some common web infrastructure, and libraries to make it, in theory, easier to deploy and manage the Java application. Take Tomcat, a Java program that provides the actual webserver and loads the application (bundled as a WAR file) on your behalf. This may work well in some situations, but actually adds additional complexity. For example, you now need to keep track of the version of the JRE, the version of Tomcat, and the version of your application. Testing for incompatibility, and ensuring everyone is using the same versions of the full stack can be problematic, and lead to subtle problems. Tomcat also brings along its own bespoke configuration, which is yet another thing to learn.

A good tenet to follow is to “keep it simple”, but in the Servlet Container approach, you have to keep track of a few dozen Tomcat files, plus one or more WAR files that make up the application, plus all the Tomcat configuration that goes along with it.

Thus there are some frameworks that attempt to reduce this overhead: instead of being hosted within a full application server, they embed their own web server. There is still a JVM but it invokes a single JAR file that contains everything needed to run the application. Popular frameworks that enable these standalone apps are Dropwizard and Spring Boot. To deploy a new version of the application, only a single file needs to be changed, and the JVM restarted. This is also useful when developing and testing the application, because everyone is using the same version of the stack. It is also especially useful for rollbacks (one of SRE’s core tools), as only a single file has to be changed (which can be as quick as a symlink change).

One thing to note with a Tomcat style WAR file, the file would contain the application class files, as well as all the libraries the application depends on as JAR files. In the standalone approach, all the dependencies are merged into a single, Fat JAR. A single JAR file that contains the class files for the entire application. These Fat or Uber JARs, not only are easier to version and copy around (because it is a single immutable file), but can actually be smaller than an equivalent WAR file due to pruning of unused classes in the dependencies.

This can even be taken further, by not requiring separate JVM and JAR files. Tools like capsule.io, can actually bundle up the JAR file, JVM, and all configuration into a single executable file. Now we can really ensure the full stack is using the same versions, and the deployment is agnostic to what may already be installed on the server.

Keep it simple, and make the application as quick and easy to version as possible, using a single Fat JAR, or executable where possible.

Startup

Even though Java is a compiled language, it is not compiled to machine code, it is instead compiled to bytecode. At runtime the Java Virtual Machine (JVM) interprets the bytecode, and executes it in the most efficient way. For example, just-in-time (JIT) compilation allows the JVM to watch how the application is used, and on the fly compile the bytecode into optimal machine code. Over the long run this may be advantageous for the application, but during startup can make the application perform suboptimally for tens of minutes, or longer. This is something to be aware of, as it has implications on load balancing, monitoring, capacity planning, etc.

In a multi-server deployment, it is best practice to slowly ramp up traffic to a newly started task, giving it time to warm up, and to not harm the overall performance of the service. You may be tempted to warm up new tasks by sending them artificial traffic, before they are placed into the user-serving path. Artificial traffic can be problematic if it does not approximate normal user traffic. In fact, this fake traffic may trigger the JIT to optimise for cases that don’t normally occur, thus leaving the application in a sub-optimal or worse state than not being JIT’d.

Slow starts should also be considered when capacity planning. Don’t expect cold tasks to handle the same load as warm tasks. This is important when rolling out a new version of the application, as the capacity of the system will drop until the tasks warm up. If this is not taken into account, too many tasks may be reloaded concurrently, causing a capacity based cascading outage.

Expect cold starts, and try to warm the application up with real traffic.

Monitoring

This advice is generic monitoring advice, but it is worth repeating for Java. Make sure the most important and useful metrics are exported from the Java application, collected, and easily graphed. There are many tools and frameworks for exporting metrics, and even more for collecting, aggregating, and displaying.

When something breaks, troubleshooting the issue should be possible from only the metrics being collected. You should not be depending on log files, or looking at code, to deal with an outage.

Most outages are caused by change. That is, a new version of the application, a config change, a new source of traffic, a hardware failure, or a backend dependency behaving differently. The metrics exported by the application should include ways to identify the version of Java, application, and configuration in use. They should break down sources of traffic, mix, error counts, etc. They should also track the health, latency, error rates, etc. of backend dependencies. Most of the time, this is enough to diagnose an outage quickly.

Specific to Java, there are metrics that can be helpful to understand the health, and performance of the application. Guiding future decisions on how to scale and optimise the application. Garbage collection time, heap size, thread count, JIT time are all important and Java specific.

Finally, a note about measuring response times, or latency. That is, the time it takes the application to handle a request. Many make the mistake of looking at average latency, in part because it can be easily calculated. Averages can be misleading, because they don’t show the shape of the distribution. The majority of requests may be handled quickly, but there may be a long tail of requests that are rare but take a while. This is especially troubling for JVM applications, because during garbage collection there is a stop the world (STW) phase, where the application must pause, to allow the garbage collection to finish. In this pause, no requests will be responded to, and users may wait multiple seconds.

It is better to collect either the max, or the 99th (or higher) percentile latency. The 99th percentile says that for every 100 requests, 99 are served quicker than this number. Looking at the worst case latency is more meaningful, and more reflective of the user perceived performance.

Measure metrics that matter, and that you can later depend on.

Memory Management

A good investment of your time is to learn about the various JVM garbage collection algorithms. The current state of the art are the concurrent collectors, either G1, or CMS. You can decide on what may be best for your application, but for now G1 is the likely winner. There are many great articles that explain how they work, but I’ll cover some key topics.

When starting up, the Java Virtual Machine (JVM) reserves a large chunk of OS memory and splits it into heap and non-heap. The non-heap contains areas such as Metaspace (formerly called Permgen), and stack space. Metaspace is for class definitions, and stack space is for each thread’s stacks. The heap is used for the objects that are created, which normally takes up the majority of the memory usage. Unlike a typical executable, the JVM has the -Xms and -Xmx flags that control the minimum and maximum size of the heap. These limits constrain the maximum amount of RAM the JVM will use, which can make the memory demands on your servers predictable. It is common to set both these flags to the same value, provisioning them to fill up the available RAM on your server. There are also best practices around this when sizing Docker containers.

Garbage collection (GC) is the process of managing this heap, by finding Java objects that are no longer in use (i.e. no longer referred to), and can be reclaimed. In most cases the JVM scans the full graph of objects, marking those it finds. At the end, any that weren’t visited are deleted. To ensure there aren’t race conditions, the GC typically has to stop the world (STW), which pauses the application for a short while, while it finishes up.

The GC is a source of (perhaps unwarranted) resentment because it is blamed for many performance problems. Typically this boils down to not understanding how the GC works. For example, if the heap is sized too small, the JVM can aggressively garbage collect, trying futilely to free up space. The application can then get stuck in this “GC thrashing” cycle, making very little progress freeing up space, and spending a larger and larger proportion of time in GC, instead of running the application code.

Two common cases where this can happen, are memory leaks, or resource exhaustion. Garbage collected languages shouldn’t allow what is conventionally called memory leaks, however, they can occur. Take for example, maintaining a cache of objects that never expire. This cache will grow forever, and even though the objects in the cache may never be used again, they are still referenced, thus ineligible to be garbage collected.

Another common case is unbounded queues. If your application places incoming requests on an unbounded queue, this queue could grow forever. If there is a spike of requests, objects retained on the queue could increase the heap usage, causing the application to spend more and more time in GC. Thus the application will have less time to process requests from the queue, causing the backlog to grow. This spirals out of control as the GC struggles to find any objects to free, until the application can make no forward progress.

The garbage collector algorithms have many optimisations to try and reduce total GC time. One important observation, the weak generational hypothesis, is that objects either exist for a short time (for example, related to the handling of a request), or last a long time (such as global objects that manage long lived resources).

Because of this, the heap is further divided into young and old space. The GC algorithm that runs across the young space assumes the object will be freed, and if not, the GC promotes the object into old space. The algorithm for old space makes the opposite assumption, that the object won’t be freed. The size of the young/old spaces may thus also be tuned, and depending on G1 or CMS the approach will be different. But, if the young space is too small, objects that should only exist for a short time end up getting promoted to old space. This breaks some of the assumptions the old GC algorithms make, causing GC to run less efficiently, and causing secondary issues such as memory fragmentation.

As mentioned earlier, GC is a source of long tail latency, so should be monitored closely. The time taken for each phase of the GC should be recorded, as well as the fullness of heap space (broken down by young/old/etc) before and after GC runs. This provides all the hints needed to either tune, or improve the application to get GC under control.

Make GC your friend. Careful attention should be paid to the heap, and garbage collector, and it should be tuned (even coarsely) to ensure there is enough heap space even in the fully loaded/worst case.

Other tips

Debugging

Java has many rich tools for debugging during development and in production. For example, it is possible to capture live stack traces, and heap dumps from the running application. This can be useful to understand memory leaks, or deadlocks. However, you must ensure the application is started to allow these features, and that the typical tools, jmap, jcmd, etc are actually available on the server. Running the application inside a Docker container, or non-standard environment, may make this more difficult, so test and write a playbook on how to do this now.

Many frameworks also expose much of this information via web services, for easier debugging, for example the Dropwizard /threads resource, or the Spring Boot production endpoints.

Don’t wait until you have a production issue, test now how to grab heap dumps and stack traces.

Fewer but larger tasks

There are many features of the JVM that have a fixed cost per running JVM, such as JIT and garbage collection. Your application may also have fixed overheads, such as resource pooling (backend database connections), etc. If you run fewer, but larger (in terms of CPU and RAM) instances, you can reduce this fixed cost, getting an economy of scale. I’ve seen doubling the amount of CPU and RAM a Java application had allow it to handle 4x the requests per second (with no impact to latency). This however makes some assumptions about the application’s ability to scale in a multi-threaded way, but generally scaling vertically is easier than horizontally.

Make your JVM as large as possible.

32-bit vs. 64-bit Java

It used to be common practice to run a 32-bit JVM if your application didn’t use more than 4GiB of RAM. This was because 32-bit pointers are half the size of 64-bit, which reduced the overhead of each Java object. However, modern CPUs are 64-bit, typically with 64-bit specific performance improvements, and RAM is cheap, which makes 64-bit JVMs the clear winner.

Use 64-bit JVMs.

Load Shedding

Again general advice, but important for Java. To avoid overload caused by GC thrashing, or cold tasks, the application should aggressively load shed. That is, beyond some threshold, the application should reject new requests. It may seem bad to reject some requests early, but it is better than allowing the application to become unrecoverably unhealthy and fail all requests. There are many ways to avoid overload, but common approaches are to ensure queues are bounded, and that thread pools are sized correctly. Additionally, outbound requests should have appropriate deadlines, to ensure a slow backend doesn’t cause problems for your application.
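The idea of a bounded queue that rejects work when full is not specific to Java. As a minimal sketch, here it is in Go, where a buffered channel stands in for the bounded request queue. The shedder type and its capacity are hypothetical, and a real service would also have workers draining the queue.

```go
package main

import (
	"errors"
	"fmt"
)

// errOverloaded is returned when a request is shed instead of queued.
var errOverloaded = errors.New("overloaded: request rejected")

// shedder holds a bounded queue of pending requests.
type shedder struct {
	queue chan string
}

// newShedder creates a shedder that holds at most capacity pending requests.
func newShedder(capacity int) *shedder {
	return &shedder{queue: make(chan string, capacity)}
}

// submit enqueues a request, or rejects it immediately when the queue is
// full, rather than letting the backlog (and the heap) grow without bound.
func (s *shedder) submit(req string) error {
	select {
	case s.queue <- req:
		return nil
	default:
		return errOverloaded
	}
}

func main() {
	s := newShedder(2) // nothing drains the queue in this demo
	for _, req := range []string{"a", "b", "c"} {
		if err := s.submit(req); err != nil {
			fmt.Println(req, "->", err)
			continue
		}
		fmt.Println(req, "-> queued")
	}
}
```

With a capacity of two and no consumer, the third request is rejected straight away, which is the behaviour we want under overload.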

Handle as many requests as you can, and no more.

Conclusion

Hopefully this article has made you think about your Java production environment. While not being prescriptive, it highlights some areas to focus on. The links throughout should guide you in the right direction.

Parsing with Antlr4 and Go

Sat, 16 Dec 2017 12:50:31 -0800

Originally published as part of the Go Advent 2017 series

What is ANTLR?

ANTLR (ANother Tool for Language Recognition) is an ALL(*) parser generator. In layman’s terms, ANTLR creates parsers in a number of languages (Go, Java, C, C#, JavaScript) that can process text or binary input. The generated parser provides a callback interface to parse the input in an event-driven manner, which can be used as-is, or used to build parse trees (a data structure representing the input).

ANTLR is used by a number of popular projects, e.g. Hive and Pig use it to parse Hadoop queries, Oracle and NetBeans use it for their IDEs, and Twitter even uses it to understand search queries. Support was recently added so that ANTLR 4 can be used to generate parsers in pure Go. This article will explain some of the benefits of ANTLR, and walk us through a simple example.

Why use it?

It is possible to hand write a parser, but this process can be complex, error prone, and hard to change. Instead there are many [parser generators](https://en.wikipedia.org/wiki/Comparison_of_parser_generators) that take a grammar expressed in a domain-specific way, and generate code to parse that language. Popular parser generators include bison and yacc. In fact, there is a version of yacc, goyacc, which is written in Go and was part of the main go repo until it was moved to golang.org/x/tools last year.

So why use ANTLR over these?

ANTLR has a suite of tools, and GUIs, that make writing and debugging grammars easy.
It uses a simple EBNF syntax to define the grammar, instead of a bespoke configuration language.
ANTLR is an Adaptive LL(*) parser, ALL(*) for short, whereas most other parser generators (e.g. Bison and Yacc) are LALR. The difference between LL(*) and LALR is out of scope for this article, but simply LALR works bottom-up, and LL(*) works top-down. This has a bearing on how the grammar is written, making some languages easier or harder to express.
The generated code for an LL(*) parser is more understandable than that of a LALR parser. This is because LALR parsers are commonly table driven, whereas LL(*) parsers encode the logic in their control flow, making it more comprehensible.
Finally ANTLR is agnostic to the target language. A single grammar can be used to generate parsers in Java, Go, C, etc. Unlike Bison/Yacc which typically embeds target language code into the grammar, making it harder to port.
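To give a feel for what “top-down” and “logic in the control flow” mean, here is a small hand-written recursive descent parser in Go that evaluates expressions such as 1 + 2 * 3. This is only an illustrative sketch, not what ANTLR generates. The parser type and evaluate function are hypothetical, and operator precedence is handled by splitting the grammar into separate expression and term rules.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// parser is a hypothetical hand-written top-down parser: every grammar rule
// is a method, so the call stack mirrors the structure of the input.
type parser struct {
	input []rune
	pos   int
	err   error // first error encountered, if any
}

// fail records an error unless an earlier one is already stored.
func (p *parser) fail(format string, args ...interface{}) {
	if p.err == nil {
		p.err = fmt.Errorf(format, args...)
	}
}

// peek skips whitespace and returns the next rune, or 0 at end of input.
func (p *parser) peek() rune {
	for p.pos < len(p.input) && unicode.IsSpace(p.input[p.pos]) {
		p.pos++
	}
	if p.pos >= len(p.input) {
		return 0
	}
	return p.input[p.pos]
}

// expression := term (('+' | '-') term)*
func (p *parser) expression() int {
	left := p.term()
	for op := p.peek(); op == '+' || op == '-'; op = p.peek() {
		p.pos++ // consume the operator
		if right := p.term(); op == '+' {
			left += right
		} else {
			left -= right
		}
	}
	return left
}

// term := number (('*' | '/') number)*
func (p *parser) term() int {
	left := p.number()
	for op := p.peek(); op == '*' || op == '/'; op = p.peek() {
		p.pos++ // consume the operator
		right := p.number()
		switch {
		case op == '*':
			left *= right
		case right != 0:
			left /= right
		default:
			p.fail("division by zero")
		}
	}
	return left
}

// number := [0-9]+
func (p *parser) number() int {
	p.peek() // skip leading whitespace
	start := p.pos
	for p.pos < len(p.input) && p.input[p.pos] >= '0' && p.input[p.pos] <= '9' {
		p.pos++
	}
	n, err := strconv.Atoi(string(p.input[start:p.pos]))
	if err != nil {
		p.fail("expected a number at offset %d", start)
	}
	return n
}

// evaluate parses and evaluates a whole expression, rejecting trailing input.
func evaluate(s string) (int, error) {
	p := &parser{input: []rune(s)}
	v := p.expression()
	p.peek() // skip trailing whitespace
	if p.pos < len(p.input) {
		p.fail("unexpected %q at offset %d", p.input[p.pos], p.pos)
	}
	if p.err != nil {
		return 0, p.err
	}
	return v, nil
}

func main() {
	fmt.Println(evaluate("1 + 2 * 3")) // prints 7 <nil>
}
```

Even for a grammar this small, the hand-written version mixes the language definition with the evaluation logic, which is the kind of coupling a parser generator helps avoid.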

Installing ANTLR v4

ANTLR is a Java 1.7 application, that generates the Go code needed to parse your language. During development Java is needed, but once the parser is built only Go and the ANTLR runtime library are required. The ANTLR site has [documentation](https://github.com/antlr/antlr4/blob/master/doc/getting-started.md) on how to install this on multiple platforms, but in brief, you can do the following:

$ wget http://www.antlr.org/download/antlr-4.7-complete.jar
$ alias antlr="java -jar $PWD/antlr-4.7-complete.jar"

The antlr command is now available in your shell. If you prefer, the .jar file can be placed into a ~/bin directory, and the alias can be stored in your ~/.bash_profile.

Classic calculator example

Let’s start with the “hello world” for parsers, the calculator example. We want to build a parser that handles simple mathematical expressions such as 1 + 2 * 3. The focus of this article is on how to use Go with ANTLR, so the syntax of the ANTLR language won’t be explained in detail, but the ANTLR site has [comprehensive documentation](https://github.com/antlr/antlr4/blob/master/doc/grammars.md).

The source for all the examples is available as we go along.

// Calc.g4
grammar Calc;

// Tokens
MUL: '*';
DIV: '/';
ADD: '+';
SUB: '-';
NUMBER: [0-9]+;
WHITESPACE: [ \r\n\t]+ -> skip;

// Rules
start : expression EOF;

expression
   : expression op=('*'|'/') expression # MulDiv
   | expression op=('+'|'-') expression # AddSub
   | NUMBER                             # Number
   ;

The above is a simple grammar split into two sections, tokens, and rules. The tokens are terminal symbols in the grammar, that is, they are made up of nothing but literal characters. Whereas rules are non-terminal states made up of tokens and/or other rules.

By convention this grammar must be saved with a filename that matches the name of the grammar, in this case “Calc.g4”. To process this file, and generate the Go parser, we run the antlr command like so:

$ antlr -Dlanguage=Go -o parser Calc.g4

This will generate a set of Go files in the “parser” package and subdirectory. It is possible to place the generated code in a different package by using the -package argument. This is useful if your project has multiple parsers, or you just want a more descriptive package name for the parser. The generated files will look like the following:

$ tree
├── Calc.g4
└── parser
    ├── calc_lexer.go
    ├── calc_parser.go
    ├── calc_base_listener.go
    └── calc_listener.go

The generated files consist of three main components, the Lexer, Parser, and Listener.

The Lexer takes arbitrary input and returns a stream of tokens. For input such as 1 + 2 * 3, the Lexer would return the following tokens: NUMBER (1), ADD (+), NUMBER (2), MUL (*), NUMBER (3), EOF.

The Parser uses the Lexer’s output and applies the Grammar’s rules. Building higher level constructs, such as expressions that can be used to calculate the result.

The Listener then allows us to make use of the parsed input. As mentioned earlier, yacc requires language specific code to be embedded with the grammar. However, ANTLR separates this concern, allowing the grammar to be agnostic to the target programming language. It does this through use of listeners, which effectively allow hooks to be placed before and after every rule is encountered in the parsed input.

Using the Lexer

Let’s move onto an example of using this generated code, starting with the Lexer.

// example1.go
package main

import (
	"fmt"
	"github.com/antlr/antlr4/runtime/Go/antlr"

	"./parser"
)

func main() {
	// Setup the input
	is := antlr.NewInputStream("1 + 2 * 3")

	// Create the Lexer
	lexer := parser.NewCalcLexer(is)

	// Read all tokens
	for {
		t := lexer.NextToken()
		if t.GetTokenType() == antlr.TokenEOF {
			break
		}
		fmt.Printf("%s (%q)\n",
			lexer.SymbolicNames[t.GetTokenType()], t.GetText())
	}
}

To begin with, the generated parser is imported from the local subdirectory import "./parser". Next the Lexer is created with some input:

	// Setup the input
	is := antlr.NewInputStream("1 + 2 * 3")

	// Create the Lexer
	lexer := parser.NewCalcLexer(is)

In this example the input is a simple string, "1 + 2 * 3", but there are other [antlr.InputStream](https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#InputStream)s, for example, the antlr.FileStream type can read directly from a file. The InputStream is then passed to a newly created Lexer. Note the name of the Lexer is CalcLexer which matches the grammar’s name defined in the Calc.g4.

The lexer is then used to consume all the tokens from the input, printing them one by one. This wouldn’t normally be necessary but we do this for demonstrative purposes.

	for {
		t := lexer.NextToken()
		if t.GetTokenType() == antlr.TokenEOF {
			break
		}
		fmt.Printf("%s (%q)\n",
			lexer.SymbolicNames[t.GetTokenType()], t.GetText())
	}

Each token has two main components, the TokenType, and the Text. The TokenType is a simple integer representing the type of token, while the Text is literally the text that made up this token. All the TokenTypes are defined at the end of calc_lexer.go, with their string names stored in the SymbolicNames slice:

// calc_lexer.go
const (
	CalcLexerMUL        = 1
	CalcLexerDIV        = 2
	CalcLexerADD        = 3
	CalcLexerSUB        = 4
	CalcLexerNUMBER     = 5
	CalcLexerWHITESPACE = 6
)

You may also note, that the Whitespace token is not printed, even though the input clearly had whitespace. This is because the grammar was designed to skip (i.e. discard) the whitespace WHITESPACE: [ \r\n\t]+ -> skip;.

Using the Parser

The Lexer on its own is not very useful, so the example can be modified to also use the Parser and Listener:

// example2.go
package main

import (
	"./parser"
	"github.com/antlr/antlr4/runtime/Go/antlr"
)

type calcListener struct {
	*parser.BaseCalcListener
}

func main() {
	// Setup the input
	is := antlr.NewInputStream("1 + 2 * 3")

	// Create the Lexer
	lexer := parser.NewCalcLexer(is)
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

	// Create the Parser
	p := parser.NewCalcParser(stream)

	// Finally parse the expression
	antlr.ParseTreeWalkerDefault.Walk(&calcListener{}, p.Start())
}

This is very similar to before, but instead of manually iterating over the tokens, the lexer is used to create a [CommonTokenStream](https:// godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#CommonTokenStream), which in turn is used to create a new CalcParser. This CalcParser is then “walked”, which is ANTLR’s event-driven API for receiving the results of parsing the rules.

Note, the [Walk](https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#ParseTreeWalker.Walk) function does not return anything. Some may have expected a parsed form of the expression to be returned, such as some kind of AST (abstract syntax tree), but instead the Listener receives events as the parsing occurs. This is similar in concept to SAX style parsers for XML. Event-based parsing can sometimes be harder to use, but it has many advantages. For example, the parser can be very memory efficient as previously parsed rules can be discarded once they are no longer needed. The parser can also be aborted early if the programmer wishes to.
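For readers who have not met the SAX style before, the sketch below shows the same idea using Go's standard encoding/xml package rather than ANTLR. The countElements function and the sample document are made up for illustration. The decoder here is pulled in a loop rather than calling back into a listener, but the principle is the same: the caller sees one event at a time and never holds the whole document as a tree.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// countElements streams through the document and counts start-element
// events by name. Only the counts are kept in memory, not the document.
func countElements(doc string) (map[string]int, error) {
	counts := make(map[string]int)
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return counts, nil // end of input
		}
		if err != nil {
			return nil, err
		}
		// React only to the events we care about, ignore the rest.
		if se, ok := tok.(xml.StartElement); ok {
			counts[se.Name.Local]++
		}
	}
}

func main() {
	counts, err := countElements(`<sum><n>1</n><n>2</n></sum>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["n"], counts["sum"])
}
```

Returning early from the loop is also how such a parse would be aborted once the caller has seen enough.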

But so far, this example doesn’t do anything beyond ensuring the input can be parsed without error. To add logic, we must extend the calcListener type. The calcListener has an embedded BaseCalcListener, which is a helper type that provides empty methods for all those defined in the CalcListener interface. That interface looks like:

// parser/calc_listener.go
// CalcListener is a complete listener for a parse tree produced by CalcParser.
type CalcListener interface {
	antlr.ParseTreeListener

	// EnterStart is called when entering the start production.
	EnterStart(c *StartContext)

	// EnterNumber is called when entering the Number production.
	EnterNumber(c *NumberContext)

	// EnterMulDiv is called when entering the MulDiv production.
	EnterMulDiv(c *MulDivContext)

	// EnterAddSub is called when entering the AddSub production.
	EnterAddSub(c *AddSubContext)

	// ExitStart is called when exiting the start production.
	ExitStart(c *StartContext)

	// ExitNumber is called when exiting the Number production.
	ExitNumber(c *NumberContext)

	// ExitMulDiv is called when exiting the MulDiv production.
	ExitMulDiv(c *MulDivContext)

	// ExitAddSub is called when exiting the AddSub production.
	ExitAddSub(c *AddSubContext)
}

There is an Enter and Exit function for each rule found in the grammar. As the input is walked, the Parser calls the appropriate function on the listener, to indicate when the rule starts and finishes being evaluated.
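The embedding trick is worth seeing in isolation. The sketch below uses a hypothetical, cut-down Listener interface (two methods taking plain strings instead of ANTLR context types) to show how a base type with empty methods lets a listener override only the events it cares about. The walk function is likewise a made-up stand-in for the parser driving the listener.

```go
package main

import "fmt"

// Listener is a hypothetical, cut-down listener interface.
type Listener interface {
	EnterNumber(text string)
	ExitNumber(text string)
}

// BaseListener provides empty implementations of every Listener method.
type BaseListener struct{}

func (*BaseListener) EnterNumber(text string) {}
func (*BaseListener) ExitNumber(text string)  {}

// recordingListener embeds BaseListener, so it satisfies Listener,
// and overrides only ExitNumber.
type recordingListener struct {
	*BaseListener

	seen []string
}

func (l *recordingListener) ExitNumber(text string) {
	l.seen = append(l.seen, text)
}

// walk stands in for the parser: it fires an enter and an exit event
// for each number it is given.
func walk(l Listener, numbers []string) {
	for _, n := range numbers {
		l.EnterNumber(n) // handled by the empty BaseListener method
		l.ExitNumber(n)  // handled by the override
	}
}

func main() {
	l := &recordingListener{}
	walk(l, []string{"1", "2", "3"})
	fmt.Println(l.seen)
}
```

Note that the embedded pointer is left nil, just as in &calcListener{} above. That is safe here because the empty base methods never dereference their receiver.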

Adding the logic

A simple calculator can be constructed from this event-driven parser by using a stack of values. Every time a number is found, it is added to a stack. Every time an expression (add/multiply/etc.) is found, the last two numbers on the stack are popped, and the appropriate operation is carried out. The result is then placed back on the stack.

Take the expression 1 + 2 * 3: the result could be either (1 + 2) * 3 = 9, or 1 + (2 * 3) = 7. Those who recall the order of operations will know that multiplication should always be carried out before addition, thus the correct result is 7. However, without the parentheses there could be some ambiguity on how this should be parsed. Luckily the ambiguity is resolved by the grammar. The precedence of multiplication over addition was subtly implied within Calc.g4, by placing the MulDiv expression before the AddSub expression.

The code for a listener that implements this stack-of-values approach is relatively simple:

type calcListener struct {
	*parser.BaseCalcListener

	stack []int
}

func (l *calcListener) push(i int) {
	l.stack = append(l.stack, i)
}

func (l *calcListener) pop() int {
	if len(l.stack) < 1 {
		panic("stack is empty unable to pop")
	}

	// Get the last value from the stack.
	result := l.stack[len(l.stack)-1]

	// Remove the last element from the stack.
	l.stack = l.stack[:len(l.stack)-1]

	return result
}

func (l *calcListener) ExitMulDiv(c *parser.MulDivContext) {
	right, left := l.pop(), l.pop()

	switch c.GetOp().GetTokenType() {
	case parser.CalcParserMUL:
		l.push(left * right)
	case parser.CalcParserDIV:
		l.push(left / right)
	default:
		panic(fmt.Sprintf("unexpected op: %s", c.GetOp().GetText()))
	}
}

func (l *calcListener) ExitAddSub(c *parser.AddSubContext) {
	right, left := l.pop(), l.pop()

	switch c.GetOp().GetTokenType() {
	case parser.CalcParserADD:
		l.push(left + right)
	case parser.CalcParserSUB:
		l.push(left - right)
	default:
		panic(fmt.Sprintf("unexpected op: %s", c.GetOp().GetText()))
	}
}

func (l *calcListener) ExitNumber(c *parser.NumberContext) {
	i, err := strconv.Atoi(c.GetText())
	if err != nil {
		panic(err.Error())
	}

	l.push(i)
}

Finally this listener would be used like so:

// calc takes a string expression and returns the evaluated result.
func calc(input string) int {
	// Setup the input
	is := antlr.NewInputStream(input)

	// Create the Lexer
	lexer := parser.NewCalcLexer(is)
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

	// Create the Parser
	p := parser.NewCalcParser(stream)

	// Finally parse the expression (by walking the tree)
	var listener calcListener
	antlr.ParseTreeWalkerDefault.Walk(&listener, p.Start())

	return listener.pop()
}

Following the algorithm, the parsing of 1 + 2 * 3 would work like so.

The number 1 would be visited first, and pushed onto the stack.
Then the numbers 2 and 3 would be visited (and placed on the stack).
Then the MulDiv expression would be exited, popping the values 3 and 2, multiplying them, and placing the result, 6, back on the stack.
Finally AddSub would be exited, popping the 6 and the 1 from the stack, and placing the result, 7, back.

The order the rules are visited is completely driven by the Parser, and thus the grammar.
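The stack algorithm can also be tried out without ANTLR. The sketch below assumes a hypothetical, simplified node type in place of the real parse tree, and an evaluate function that visits children before their parent, applying the same push and pop steps as the listener. Building the two possible trees by hand shows how the tree shape alone decides between 7 and 9.

```go
package main

import "fmt"

// node is a hypothetical, simplified parse tree node: a number when
// Op is zero, otherwise a binary operation on Left and Right.
type node struct {
	Op          byte
	Value       int
	Left, Right *node
}

func num(v int) *node               { return &node{Value: v} }
func bin(op byte, l, r *node) *node { return &node{Op: op, Left: l, Right: r} }

// evaluate visits children before their parent, left to right, and
// applies the stack algorithm at each node.
func evaluate(root *node) int {
	var stack []int
	var walk func(n *node)
	walk = func(n *node) {
		if n.Op == 0 {
			stack = append(stack, n.Value) // like ExitNumber
			return
		}
		walk(n.Left)
		walk(n.Right)

		// Like ExitMulDiv and ExitAddSub: pop right, then left.
		right, left := stack[len(stack)-1], stack[len(stack)-2]
		stack = stack[:len(stack)-2]

		var result int
		switch n.Op {
		case '*':
			result = left * right
		case '/':
			result = left / right
		case '+':
			result = left + right
		case '-':
			result = left - right
		default:
			panic("unexpected op")
		}
		stack = append(stack, result)
	}
	walk(root)
	return stack[len(stack)-1]
}

func main() {
	// 1 + (2 * 3), the tree the grammar's precedence produces.
	fmt.Println(evaluate(bin('+', num(1), bin('*', num(2), num(3)))))
	// (1 + 2) * 3, the tree that would result without precedence.
	fmt.Println(evaluate(bin('*', bin('+', num(1), num(2)), num(3))))
}
```

Popping right before left matters for subtraction and division, where the operand order changes the result.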

More grammars

Learning how to write a grammar may be daunting, but there are many resources for help. The author of ANTLR, Terence Parr, has published a book, with some of the content freely available on antlr.org.

If you don’t want to write your own grammar, there are many pre-written grammars available, including grammars for CSS, HTML, SQL, etc., as well as many popular programming languages. To make it easier, I have generated parsers for all those available grammars, so they can be used simply by importing them.

A quick example of using one of the pre-generated grammars:

import (
	"bramp.net/antlr4/json" // The parser

	"github.com/antlr/antlr4/runtime/Go/antlr"
)

type exampleListener struct {
	// https://godoc.org/bramp.net/antlr4/json#BaseJSONListener
	*json.BaseJSONListener
}

func main() {
	// Setup the input
	is := antlr.NewInputStream(`
		{
			"example": "json",
			"with": ["an", "array"]
		}`)


	// Create the JSON Lexer
	lexer := json.NewJSONLexer(is)
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

	// Create the JSON Parser
	p := json.NewJSONParser(stream)

	// Finally walk the tree
	antlr.ParseTreeWalkerDefault.Walk(&exampleListener{}, p.Json())
}

Conclusion

Hopefully this article has given you a taste of how to use ANTLR with Go. The examples for this article are found here, and the godoc for the ANTLR library is here which explains the various InputStream, Lexer, Parser, etc interfaces.

Vanity Go Import Paths

Mon, 02 Oct 2017 07:48:23 -0700

When using third-party packages in Go, they are imported by a path that represents how to download that package from the Internet. For example, to use the popular structured logging library, Logrus, it would be imported at the top of the Go program like so:

import (
  "github.com/sirupsen/logrus"
)

When go get is then executed, it fetches the Logrus source code from GitHub and places the code in the $GOPATH/src directory. Take a look for yourself:

$ tree $GOPATH/src
...
├── github.com
│   ├── Sirupsen
│   │   └── logrus
...

An astute reader may wonder, how exactly does go get know that github.com/sirupsen/logrus is a Git repository, and that it can be fetched via the git protocol from that URL. The go get binary could have some smarts in it, that knows about GitHub, and does the right thing. But that seems inflexible, and problematic if new sites want to be supported. Instead the Go developers built a layer of indirection that allows the go get tool to discover the correct source repo.

As outlined in the Remote Import Paths docs, the go get binary will make a normal HTTP request to https://github.com/sirupsen/logrus (falling back to http if needed) and look at the returned HTML for a <meta> tag. This meta tag can then redirect the go get binary to the correct source code repository for the package.


This meta tag can be seen with curl:

$ curl https://github.com/sirupsen/logrus | grep meta | grep go-import
<meta name="go-import"
  content="github.com/sirupsen/logrus git https://github.com/sirupsen/logrus.git">

That tag says the package rooted at github.com/sirupsen/logrus can be fetched with git, at the URL https://github.com/sirupsen/logrus.git. The meta tag can express other source control systems, e.g. Mercurial, Bazaar, Subversion.
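To illustrate the discovery step, here is a minimal sketch of how a client could pull the three fields out of a go-import meta tag using only the standard library. The findGoImport function and goImport type are hypothetical names, and this is not the go tool's actual implementation. It assumes the page has already been fetched, and it leans on encoding/xml in non-strict mode to cope with HTML that is not well-formed XML.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// goImport holds the three space-separated fields of a go-import tag.
type goImport struct {
	Prefix, VCS, RepoRoot string
}

// findGoImport scans the HTML for the first
// <meta name="go-import" content="prefix vcs repo"> tag.
func findGoImport(html string) (goImport, bool) {
	d := xml.NewDecoder(strings.NewReader(html))
	d.Strict = false // tolerate unclosed tags such as <meta>
	for {
		// RawToken does not check that start and end tags match.
		tok, err := d.RawToken()
		if err != nil {
			return goImport{}, false // end of input, or unparsable
		}
		se, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(se.Name.Local, "meta") {
			continue
		}
		var name, content string
		for _, a := range se.Attr {
			switch strings.ToLower(a.Name.Local) {
			case "name":
				name = a.Value
			case "content":
				content = a.Value
			}
		}
		if name != "go-import" {
			continue
		}
		if f := strings.Fields(content); len(f) == 3 {
			return goImport{Prefix: f[0], VCS: f[1], RepoRoot: f[2]}, true
		}
	}
}

func main() {
	page := `<html><head>
<meta name="go-import"
  content="github.com/sirupsen/logrus git https://github.com/sirupsen/logrus.git">
</head><body></body></html>`

	if imp, ok := findGoImport(page); ok {
		fmt.Printf("%s %s %s\n", imp.Prefix, imp.VCS, imp.RepoRoot)
	}
}
```

A real client would also check that the prefix matches the import path it asked for before trusting the repository URL.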
GitHub is a very convenient place to host source code, but the GitHub URL is generic. Instead it is possible to use the <meta> tag to create vanity domains to host projects. For example, the package hosted at github.com/bramp/goredirects could instead be imported as bramp.net/goredirects. All that is needed is a static HTML page at bramp.net/goredirects, containing the following <meta> tag pointing at GitHub.
<meta name=go-import
  content="bramp.net/goredirects git https://github.com/bramp/goredirects.git">
In case a user attempts to visit that page directly with their web browser, it is worthwhile placing more information about the project on the page, or simply making the page redirect.
To help make these redirect pages, I wrote a simple go tool, goredirects, that inspects all local repositories under a vanity domain directory in the local $GOPATH/src/ and outputs static HTML pages that can be hosted on that domain.
For example, create your new project on GitHub, but check out the project under $GOPATH/src/example.com/project. Then run the tool:
$ go install bramp.net/goredirects
$ goredirects example.com outputdir
The directory outputdir will now contain multiple directories and HTML files, one for each project under $GOPATH/src/example.com. These HTML files contain the appropriate go-import meta tag to redirect the download of source code from the vanity name to GitHub. Just upload these files to your website, and voilà, you are done. Examples of these vanity redirect files can be found on bramp.net, e.g. bramp.net/goredirects/index.html. This tool even works for packages with sub-packages under the main root.
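For a sense of what such a generated page might contain, here is a small sketch that builds one. It is not how goredirects itself is implemented: the metaTag and redirectPage functions are hypothetical, and the page layout (a go-import tag plus a browser refresh to a project homepage) is just one reasonable choice.

```go
package main

import (
	"fmt"
	"html"
)

// metaTag builds the go-import tag for a vanity import path that is
// hosted in a git repository at repoURL.
func metaTag(importPath, repoURL string) string {
	return fmt.Sprintf(`<meta name="go-import" content="%s git %s">`,
		html.EscapeString(importPath), html.EscapeString(repoURL))
}

// redirectPage builds a static HTML page: the go-import tag for the go
// tool, and a refresh to homeURL for people visiting with a browser.
func redirectPage(importPath, repoURL, homeURL string) string {
	home := html.EscapeString(homeURL)
	return fmt.Sprintf(`<!DOCTYPE html>
<html>
<head>
%s
<meta http-equiv="refresh" content="0; url=%s">
</head>
<body>
Redirecting to <a href="%s">%s</a>.
</body>
</html>
`, metaTag(importPath, repoURL), home, home, home)
}

func main() {
	fmt.Print(redirectPage(
		"bramp.net/goredirects",
		"https://github.com/bramp/goredirects.git",
		"https://github.com/bramp/goredirects",
	))
}
```

Writing the returned string to outputdir/goredirects/index.html would give a page the go tool and a browser can both make use of.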
Finally, it is possible to ensure that if someone finds your project via GitHub, go get will always place it under your vanity domain. This can be achieved with an import comment. Within the source code, ensure that at least one of the files in your package has a comment like so:
package project // import "example.com/project"
Then go get will enforce the correct/vanity URL to use, instead of the true location.
More helpful links on the topic:

golang.org/cmd/go/#hdr-Import_path_checking
golang.org/cmd/go/#hdr-Remote_import_paths
golang.org/doc/go1.4#canonicalimports
godoc.org/golang.org/x/tools/cmd/fiximports
texlution.com/post/golang-canonical-import-paths/



Teaching Binary to 8th Graders
Sat, 15 Jul 2017 12:23:18 -0700
This summer, as part of GoogleServe, I volunteered in a local school to teach kids about the importance of mathematics. This was part of a larger program organised by the Silicon Valley Education Foundation (SVEF).
I had 90 minutes, to introduce myself, talk a little about Google, and then spend the majority of the time teaching a topic of my choosing. Not knowing anything about teaching 8th graders I went to the Internet to find some material.
I quickly found cse4k12.org, and the excellent YouTube series by csunplugged.org. I decided I would teach about counting in binary. The csunplugged videos showed how to introduce this material in a way that seemed fun and got the kids to work out the concepts on their own. I decided to mix the teaching with worksheets from cse4k12.org (to reinforce what the kids just learnt). Since I only had ~90 minutes to cover a lot, I took what I found on cse4k12.org and simplified their activities. I went ahead and created new worksheets, and am providing them here today for others to use. The rough schedule I used was:


10min Intro to counting in binary, with kids holding up bits (similar to this video).


15min Work on this “Counting In Binary” worksheet.


10min Representing text (again similar to this video)


15min Using “Encoding Table” and “Encoding Message” worksheets to write some secret messages to each other.


10min Representing images with binary.


15min Using the “Bitmaps” worksheets to encode their own images, and if time allows swapping encoded images with each other to decode.


All in all, this worked quite well. I learnt a lot, and was happy to see the class engaged! I will certainly be taking part in activities like this again.
P.S I found printing all sheets double sided worked really well. Oh and no computers needed! Put those laptops away.


Maven Plugins on Java 8
Sat, 01 Apr 2017 15:21:27 -0700
As part of my standard Maven configuration, I like to use two plugins backed by Google technologies, the first to help keep my code formatted correctly, and the second to check for compile time errors. However, Google recently moved to require JDK 1.8, which broke anyone trying to compile my projects with an older JDK. In this article I’ll quickly explain how to configure Maven to work around this problem.
Specifically I use the following two plugins:


coveo/fmt-maven-plugin (which uses google-java-format). This follows Google’s Java Style Guide, and reformats the code to ensure it stays consistent. This is great when accepting external contributions, as it keeps the code base uniform, and avoids style discussion on pull requests.


plexus-compiler-javac-errorprone (which uses Google’s errorprone). This is a static code analysis tool, that checks for simple errors at compile time, and fails the build if they are found. Again, this helps improve the quality of the code.


Even though my projects typically target 1.7, these plugins must run under JDK 1.8. Really I’d prefer to bump all my projects to target 1.8+, but since a few of my projects are libraries (which other people include into their projects), that is easier said than done. To deal with this, I changed my Maven configuration to only run these two plugins when run under a sufficiently recent JDK. This means those using an older JDK don’t get the benefits, but since locally I use JDK 8, and all my open source projects use Travis CI, eventually these issues will be identified.
So if you get an error like
java.lang.UnsupportedClassVersionError: com/google/googlejavaformat/java/FormatterException : Unsupported major.minor version 52.0
or
An API incompatibility was encountered while executing org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile: java.lang.UnsupportedClassVersionError: javax/tools/DiagnosticListener : Unsupported major.minor version 52.0
Please update to JDK 1.8, or update your Maven configuration to restrict these plugins to when run on a modern JDK:

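...
    <profiles>
        <profile>
            <id>java18</id>
            <activation>
                <jdk>1.8</jdk>
            </activation>
            <build>
                <plugins>
                    <plugin>
                        <groupId>com.coveo</groupId>
                        <artifactId>fmt-maven-plugin</artifactId>
                        <executions>
                            <execution>
                                <goals>
                                    <goal>format</goal>
                                </goals>
                            </execution>
                        </executions>
                    </plugin>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-compiler-plugin</artifactId>
                        <configuration>
                            <compilerId>javac-with-errorprone</compilerId>
                            <forceJavacCompilerUse>true</forceJavacCompilerUse>
                            <showWarnings>true</showWarnings>
                            <compilerArgs>
                                <arg>-Xlint:all</arg>
                            </compilerArgs>
                        </configuration>
                        <dependencies>
                            <dependency>
                                <groupId>org.codehaus.plexus</groupId>
                                <artifactId>plexus-compiler-javac-errorprone</artifactId>
                                <version>2.8.1</version>
                            </dependency>
                            <dependency>
                                <groupId>com.google.errorprone</groupId>
                                <artifactId>error_prone_core</artifactId>
                                <version>2.0.19</version>
                            </dependency>
                        </dependencies>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>
...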

This defines a new profile that is only “activated” under Java 1.8. When activated, the <build> section has the two additional plugins added.
Ensure that these plugins are no longer mentioned in the regular <build> section, and only in the <profiles> section.
An example of this change can be found in a recent commit.

Time (s)	Running Count	Delta (per minute)
0	0
60	95	95
120	205	110
180	310	105
240	395	85
300	450	55
360	480	30
420	500	20
480	500	0
540	590	90
600	700	110
