<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Posts on bramp.net</title>
    <link>https://blog.bramp.net/</link>
    <description>Recent content in Posts on bramp.net</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-GB</language>
    <lastBuildDate>Sat, 02 Apr 2022 20:03:52 -0700</lastBuildDate>
    <atom:link href="https://blog.bramp.net/post/" rel="self" type="application/rss+xml" />
    
    <item>
      <title>3D Printing a Lightsaber</title>
      <link>https://blog.bramp.net/post/2022/04/02/3d-printing-a-lightsaber/</link>
      <pubDate>Sat, 02 Apr 2022 20:03:52 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2022/04/02/3d-printing-a-lightsaber/</guid>
      <description><p>I came across these cool 3D-printable lightsabers by <a href="https://thangs.com/designer/3dprintingworld">3dprintingworld</a>, but I couldn&rsquo;t get the blade to print well. So here is a write-up of my experience and the modifications I made.</p>
<!--
  convert saber-orig.jpg -resize 720x saber.jpg
  convert saber-on-orig.jpg -resize 720x saber-on.jpg
  convert saber-parts-orig.jpg -resize 720x saber-parts.jpg
-->
<p><figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/saber.jpg" width="720"><figcaption>
      <h4>Lightsaber</h4>
    </figcaption>
</figure>

<figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/saber-on.jpg" width="720"><figcaption>
      <h4>Lightsaber with blade</h4>
    </figcaption>
</figure>

<figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/saber-parts.jpg" width="720"><figcaption>
      <h4>Lightsaber in parts</h4>
    </figcaption>
</figure>
</p>
<h1 id="hilt">Hilt</h1>
<p>The hilt printed well in <a href="https://www.prusa3d.com/product/silver-pla-filament-1kg/">Prusa Silver PLA</a>, and nothing special needed to be done. There are many hilt designs to pick from online:</p>
<!-- Move this kind of thing into the main css -->
<style>
  .image-grid {
    display: flex;
    align-items: center;
    justify-content: center;
  }
  .background-1 {
    background: linear-gradient(180deg, rgba(228, 196, 215, .5) 0%, rgba(196, 196, 196, 0) 100%);
    margin:  10px;
  }
  .background-2 {
    background: linear-gradient(180deg, rgba(197, 196, 228, .5) 0%, rgba(196, 196, 196, 0) 100%);
    margin:  10px;
  }
  .background-3 {
    background: linear-gradient(180deg, rgba(196, 228, 201, .5) 0%, rgba(196, 196, 196, 0) 100%);
    margin:  10px;
  }
</style>
<!--
  convert saber-1-orig.webp -resize 200x saber-1.png
  convert saber-2-orig.webp -resize 200x saber-2.png
  convert saber-3-orig.webp -resize 200x saber-3.png
-->
<div class="image-grid">
  <figure class="background-2"><img src="saber-2.png" width="200">
    <figcaption><h4>Darth Vader</h4>
        <a href="https://www.thingiverse.com/thing:3668138">thingiverse</a> | 
        <a href="https://thangs.com/3dprintingworld/Collapsing-Sith-Lightsaber-23598">thangs.com</a>
      </figcaption>
  </figure>
  <figure class="background-1"><img src="saber-1.png" width="200">
    <figcaption><h4>Return of the Jedi</h4>
        <a href="https://www.thingiverse.com/thing:3606120">thingiverse</a> | 
        <a href="https://thangs.com/designer/3dprintingworld/3d-model/Collapsing-LightsaberPNP-23596">thangs.com</a>
    </figcaption>
  </figure>
  <figure class="background-3"><img src="saber-3.png" width="200">
    <figcaption><h4>Leia's</h4>
        <a href="https://thangs.com/designer/3dprintingworld/3d-model/Leia's%20Dual%20Extrusion%20Collapsing%20Lightsaber%20-25509">thangs.com</a>
    </figcaption>
  </figure>
</div>
<p>There are slightly different versions of these files on Thingiverse and thangs.com. The Thingiverse versions seem to be the easiest to work with.</p>
<h1 id="blade">Blade</h1>
<!--
ffmpeg -i "blade-extending-orig.mp4" -filter:v "crop=1920:360:0:360,scale=720:136" "blade-extending.mp4"
ffmpeg -i "blade-extending-orig.mp4" -filter:v "crop=1920:360:0:360,scale=720:136" "blade-extending.webm"
-->
<video width="720" height="136" loop muted autoplay>
  <source src="blade-extending.mp4" type="video/mp4" />
  <source src="blade-extending.webm" type="video/webm" />
  Your browser does not support the video tag.
</video>
<p>The collapsing blade is where I had problems. It is a set of concentric telescoping tubes that taper inwards, allowing them to fit within each other but not slide all the way out. 3dprintingworld offered two techniques: <a href="https://thangs.com/designer/3dprintingworld/3d-model/SwordSaber-Test-Print-23601">print-in-place</a> and vase printing. Print-in-place prints all the tubes at the same time, layer by layer, whereas vase mode prints each tube individually as one continuous spiral from bottom to top. The vase technique produced nicer-looking blades that were thinner yet strong. However, I couldn&rsquo;t get the <a href="https://www.thingiverse.com/thing:3606120">provided vase mode models</a> to work, so I made my own in Fusion 360 (STL and Fusion files available here TODO).</p>
<p>The end result is ~110cm long, with five separate tubes. These are printed with a 0.65mm extrusion width and no top or bottom layers. I printed each tube with its wider end as the base, except the thinnest one, which came out with a cleaner end when printed upside down. I used <a href="https://overture3d.com/products/overture-petg-3d-printer-filament-1?variant=42224802922750">Overture Purple PETG</a>, which gave a very nice <a href="https://www.giantfreakinrobot.com/scifi/samuel-jackson-claims-bad-motherfcker-star-wars-lightsaber.html">Samuel L. Jackson lightsaber</a>.</p>
<!--
convert blade-settings-1-orig.png -resize 720x blade-settings-1.png
convert blade-settings-2-orig.png -resize 528x blade-settings-2.png
-->
<figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/blade-settings-1.png" width="720"><figcaption>
      <h4>Vase mode settings for the blade</h4>
    </figcaption>
</figure>

<figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/blade-settings-2.png" width="528"><figcaption>
      <h4>Extrusion width settings for the blade</h4>
    </figcaption>
</figure>

<h1 id="blade-cover">Blade Cover</h1>
<p>The blade fitted well, but I wanted to stop it falling out, so I printed a snug-fitting cover. This was also printed in vase mode, but with a 0.55mm extrusion width and 17 solid bottom layers (to fill up to the thin tube part). Again, the STL and Fusion files are available.</p>
<!--
convert cap-orig.png -resize 300x cap.png
convert cap-orig.jpg -resize 300x cap.jpg
-->
<div class="image-grid">
<figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/cap.png" width="300"><figcaption>
      <h4>Cover Model</h4>
    </figcaption>
</figure>

<figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/cap.jpg" width="300"><figcaption>
      <h4>Printed Cover</h4>
    </figcaption>
</figure>

</div>
<!--
convert cap-settings-orig.png -resize 710x cap-settings.png
-->
<figure><img src="/post/2022/04/02/3d-printing-a-lightsaber/cap-settings.png" width="710"><figcaption>
      <h4>Extrusion width settings for the cap</h4>
    </figcaption>
</figure>

<h1 id="finished">Finished</h1>
<ul>
<li><b>Hilt</b> - Silver PLA - Normal Settings
<ul>
<li><a href="https://www.thingiverse.com/thing:3606120/files">LIGHTSABER-CAP.stl</a> and <a href="https://www.thingiverse.com/thing:3606120/files">LIGHTSABER-HILT.stl</a>.</li>
</ul>
</li>
<li><b>Blade Cover</b> - Silver PLA - Vase Mode
<ul>
<li><a href="https://www.printables.com/model/161150/files">LightSaber_Cap_v12.stl</a></li>
</ul>
</li>
<li><b>Blade</b> - Purple PETG - Vase Mode
<ul>
<li><a href="https://www.printables.com/model/161151/files">LightSaber_Blade_v4_1.stl - LightSaber_Blade_v4_5.stl</a>
}</li>
</ul>
</li>
</ul>
<!--
ffmpeg -i "saber-fun-orig.mp4" -filter:v "scale=720:406" "saber-fun.mp4"
ffmpeg -i "saber-fun-orig.mp4" -filter:v "scale=720:406" "saber-fun.webm"
-->
<figure>
<video width="720" height="406" loop muted autoplay>
  <source src="saber-fun.mp4" type="video/mp4" />
  <source src="saber-fun.webm" type="video/webm" />
  Your browser does not support the video tag.
</video>
 <figcaption><h4>Lightsaber in action</h4></figcaption>
</figure>
</description>
    </item>
    
    <item>
      <title>Compress and Backup</title>
      <link>https://blog.bramp.net/post/2021/09/12/compress-and-backup/</link>
      <pubDate>Sun, 12 Sep 2021 13:45:51 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2021/09/12/compress-and-backup/</guid>
      <description><p>In my <a href="https://blog.bramp.net/post/2021/09/12/recovering-a-raid-5-intel-storage-matrix-on-linux-without-the-hardware/">last article</a> I discussed recovering an old RAID-5 disk array. Here I&rsquo;m going to quickly list what I did to back up what I recovered.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="c1"># Create a zstd compressed tar file</span>
</span></span><span class="line"><span class="cl">$ tar -c -v -I<span class="s2">&#34;zstd -19 -T0&#34;</span> -f raid5-my-projects.tar.zstd My<span class="se">\ </span>Projects
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Create a text based index for the tar</span>
</span></span><span class="line"><span class="cl">$ tar -t -f raid5-my-projects.tar.zstd &gt; raid5-my-projects.index
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Backup to Google Cloud</span>
</span></span><span class="line"><span class="cl">$ gsutil cp raid5* gs://backup.bramp.net/
</span></span></code></pre></div><p>Maybe I should be using a proper backup solution, but this was quick and easy. I used <a href="http://facebook.github.io/zstd/">Zstandard</a> to compress the tar file since it is modern and offers <a href="https://linuxreviews.org/Comparison_of_Compression_Algorithms">impressive compression ratios</a> and speed.</p>
<p>I uploaded the results to an <a href="https://cloud.google.com/storage/docs/storage-classes">Archive bucket</a> on Google Cloud Storage.</p>
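<p>A possible extra step, sketched here rather than something used above, is to confirm the archive is readable and still matches its index before relying on it. The <code>verify_backup</code> function is a hypothetical helper; it assumes GNU tar (for <code>-I</code>) and the <code>zstd</code> command line tool are installed.</p>

```shell
# Hypothetical helper, not part of the original workflow: check that a
# zstd-compressed tar archive can be fully decoded and still matches the
# text index that was saved next to it.
# Assumes GNU tar (for -I) and the zstd CLI are installed.
verify_backup() {
  local archive="$1" index="$2"
  # Test-decompress the whole archive without writing any output.
  zstd -q -t "$archive" || return 1
  # Re-list the archive (decompressing with zstd) and compare the listing
  # byte-for-byte with the saved index.
  tar -t -I zstd -f "$archive" | cmp -s - "$index"
}
```

<p>For example, <code>verify_backup raid5-my-projects.tar.zstd raid5-my-projects.index</code> exits with a non-zero status if the archive fails to decompress or its listing differs from the saved index.</p>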
</description>
    </item>
    
    <item>
      <title>Recovering a RAID-5 Intel Storage Matrix on Linux (without the hardware)</title>
      <link>https://blog.bramp.net/post/2021/09/12/recovering-a-raid-5-intel-storage-matrix-on-linux-without-the-hardware/</link>
      <pubDate>Sun, 12 Sep 2021 13:09:07 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2021/09/12/recovering-a-raid-5-intel-storage-matrix-on-linux-without-the-hardware/</guid>
      <description><p>I recently found hard drives from an old RAID array I had stopped using over a decade ago. I wanted to <a href="https://raid.wiki.kernel.org/index.php/RAID_Recovery">recover the data</a> from these disks, and that turned out to be more challenging than expected. This post outlines the steps, in the hope that it helps someone else in the future.</p>
<p>This was a RAID-5 array of four 750GB disks using Intel Storage Matrix &ldquo;fake-raid&rdquo; (now called <a href="https://en.wikipedia.org/wiki/Intel_Rapid_Storage_Technology">Intel Rapid Storage Technology</a>), a RAID solution that uses a mix of software and hardware. I no longer have this Intel hardware, and in fact I no longer have a machine that would accept four drives. Luckily <code>mdadm</code> seems to have a pure software implementation of Intel Storage Matrix, so I hatched a plan. I would:</p>
<ol>
<li>Create disk images for each of the four drives,</li>
<li>Mount the images locally as block devices,</li>
<li>Use <code>mdadm</code> to construct an array,</li>
<li>Copy the data into my backups.</li>
</ol>
<h1 id="1-create-disk-images">1. Create disk images</h1>
<p>I have a <a href="https://www.google.com/search?q=usb+sata+adapter">USB SATA adapter</a>, so I connected one drive at a time to my PC. This computer has a single local 12 TB drive, where I would store the disk images. I started creating the disk images using:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo dd <span class="k">if</span><span class="o">=</span>/dev/sdc <span class="nv">of</span><span class="o">=</span>1.raw
</span></span></code></pre></div><p>This worked great for the first disk, but reading the 2nd disk failed at around the 600GB mark. It seemed this drive had developed bad blocks, but I kept my fingers crossed that the data was still recoverable, since this was RAID-5 after all. I switched to using <a href="https://www.gnu.org/software/ddrescue/"><code>ddrescue</code></a>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo ddrescue /dev/sdc 2.raw 2.log --try-again --force --verbose
</span></span></code></pre></div><p>This worked, producing a full 750GB image by slowly retrying the failed blocks and recovering as much as possible. After about a week of copying I had four disk images, <code>1.raw</code>, <code>2.raw</code>, <code>3.raw</code>, <code>4.raw</code>, with only the 2nd disk having problems.</p>
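<p>To see how much of a disk <code>ddrescue</code> could not read, its mapfile (<code>2.log</code> above) can be inspected. GNU ddrescue ships <code>ddrescuelog</code> for this; as a minimal illustration, the hypothetical <code>unrescued_bytes</code> function below sums every block not marked as finished. It assumes the documented mapfile layout: <code>#</code> comment lines, one current-position status line, then <code>pos size status</code> lines with hexadecimal positions and sizes, where <code>+</code> means finished and <code>-</code>, <code>?</code>, <code>*</code> and <code>/</code> mark bad, untried, untrimmed and unscraped areas.</p>

```shell
# Hypothetical helper: total bytes that GNU ddrescue has NOT marked as
# finished ('+') in a mapfile such as 2.log.
# Assumes the mapfile layout from the ddrescue manual: '#' comments, one
# status line, then "pos size status" lines with hexadecimal pos/size.
unrescued_bytes() {
  grep -v '^#' "$1" | {
    read -r _status_line  # skip "current_pos current_status [current_pass]"
    total=0
    while read -r _pos size status; do
      [ -n "$size" ] || continue  # ignore blank lines
      if [ "$status" != "+" ]; then
        # Shell arithmetic accepts the 0x-prefixed hex sizes directly.
        total=$(( total + $size ))
      fi
    done
    echo "$total"
  }
}
```

<p>Running <code>unrescued_bytes 2.log</code> would print the number of bytes still missing from <code>2.raw</code>; a result of <code>0</code> would mean the image is complete.</p>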
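<p>Next, I ran <code>chmod -w *.raw</code> to remove write permission from the images, to help prevent a later step from accidentally altering them (root ignores these permission bits, so the read-only <code>-r</code> flag to <code>losetup</code> below matters too).</p>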
<h1 id="2-mounting-the-images">2. Mounting the images</h1>
<p>To attach the images as read-only loop devices I used <code>losetup</code> (roughly following the instructions <a href="https://askubuntu.com/questions/663027/create-raid-array-of-image-files">here</a>):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo losetup -r /dev/loop31 1.raw
</span></span><span class="line"><span class="cl">$ sudo losetup -r /dev/loop32 2.raw
</span></span><span class="line"><span class="cl">$ sudo losetup -r /dev/loop33 3.raw
</span></span><span class="line"><span class="cl">$ sudo losetup -r /dev/loop34 4.raw
</span></span></code></pre></div><p>Later I would use <code>sudo losetup -d /dev/loop3[1234]</code> to detach these images. I then inspected the images to see what partitions were on them:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo fdisk -l /dev/loop31
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Disk /dev/loop31: 698.65 GiB, <span class="m">750156374016</span> bytes, <span class="m">1465149168</span> sectors
</span></span><span class="line"><span class="cl">Units: sectors of <span class="m">1</span> * <span class="nv">512</span> <span class="o">=</span> <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">Sector size <span class="o">(</span>logical/physical<span class="o">)</span>: <span class="m">512</span> bytes / <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">I/O size <span class="o">(</span>minimum/optimal<span class="o">)</span>: <span class="m">512</span> bytes / <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">Disklabel type: dos
</span></span><span class="line"><span class="cl">Disk identifier: 0xd204616a
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Device        Boot Start        End    Sectors Size Id Type
</span></span><span class="line"><span class="cl">/dev/loop31p1          <span class="m">1</span> <span class="m">4294967295</span> <span class="m">4294967295</span>   2T ee GPT
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo fdisk -l /dev/loop32
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Disk /dev/loop32: 698.65 GiB, <span class="m">750156374016</span> bytes, <span class="m">1465149168</span> sectors
</span></span><span class="line"><span class="cl">Units: sectors of <span class="m">1</span> * <span class="nv">512</span> <span class="o">=</span> <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">Sector size <span class="o">(</span>logical/physical<span class="o">)</span>: <span class="m">512</span> bytes / <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">I/O size <span class="o">(</span>minimum/optimal<span class="o">)</span>: <span class="m">512</span> bytes / <span class="m">512</span> bytes
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo fdisk -l /dev/loop33
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Disk /dev/loop33: 698.65 GiB, <span class="m">750156374016</span> bytes, <span class="m">1465149168</span> sectors
</span></span><span class="line"><span class="cl">Units: sectors of <span class="m">1</span> * <span class="nv">512</span> <span class="o">=</span> <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">Sector size <span class="o">(</span>logical/physical<span class="o">)</span>: <span class="m">512</span> bytes / <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">I/O size <span class="o">(</span>minimum/optimal<span class="o">)</span>: <span class="m">512</span> bytes / <span class="m">512</span> bytes
</span></span><span class="line"><span class="cl">Disklabel type: dos
</span></span><span class="line"><span class="cl">Disk identifier: 0x899c1289
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Device        Boot      Start        End    Sectors   Size Id Type
</span></span><span class="line"><span class="cl">/dev/loop33p1        <span class="m">33488921</span> <span class="m">4294836216</span> <span class="m">4261347296</span>     2T ee GPT
</span></span><span class="line"><span class="cl">/dev/loop33p2        <span class="m">35651584</span>   <span class="m">35651584</span>          <span class="m">0</span>     0B  <span class="m">0</span> Empty
</span></span><span class="line"><span class="cl">/dev/loop33p3               <span class="m">0</span>    <span class="m">1377535</span>    <span class="m">1377536</span> 672.6M <span class="m">12</span> Compaq diagnostics
</span></span><span class="line"><span class="cl">/dev/loop33p4      <span class="m">3071040408</span> <span class="m">3104693987</span>   <span class="m">33653580</span>    16G <span class="m">64</span> Novell Netware <span class="m">286</span>
</span></span></code></pre></div><p>Disk 1 had a single partition, disks 2 and 4 had no partitions, and the 3rd disk had four! Those partitions looked a little weird, and for a minute I wondered if I had mixed up my drives, or reformatted them at some point. I tried to mount them without success, so I assumed the RAID had written something that merely looked like a real partition table, and moved on to the next step.</p>
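<p>To look at this more directly, one can check whether each image&rsquo;s first sector ends with the boot signature <code>55 aa</code> at offset 510, the marker an MBR partition table carries. The <code>mbr_signature</code> function below is a hypothetical helper that prints those two bytes as hex, assuming 512-byte sectors:</p>

```shell
# Hypothetical helper: print the two bytes at offsets 510-511 of an image,
# as hex. A DOS/MBR partition table ends its first 512-byte sector with 55 aa.
mbr_signature() {
  # tail -c +511 starts output at the 511th byte (offset 510, 1-based counting)
  tail -c +511 "$1" | head -c 2 | od -An -tx1 | tr -d ' \n'
}
```

<p>For example, <code>for f in *.raw; do echo "$f $(mbr_signature "$f")"; done</code> prints <code>55aa</code> next to each image whose first sector ends like an MBR.</p>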
<h1 id="3-use-mdadm-to-construct-an-array">3. Use <code>mdadm</code> to construct an array.</h1>
<p>This is where things got difficult, due to limitations of using local disk images with the Intel Storage Matrix support in <code>mdadm</code>.</p>
<p>I started by asking <code>mdadm</code> to examine the images (telling it to expect <code>imsm</code> metadata):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo mdadm --examine -e imsm /dev/loop31
</span></span><span class="line"><span class="cl">mdadm: /dev/loop31 is not attached to Intel<span class="o">(</span>R<span class="o">)</span> RAID controller.
</span></span><span class="line"><span class="cl">mdadm: Failed to retrieve serial <span class="k">for</span> /dev/loop31
</span></span><span class="line"><span class="cl">mdadm: Failed to load all information sections on /dev/loop31
</span></span></code></pre></div><p>Well, that’s not a great start. Taken at face value, the error <code>/dev/loop31 is not attached to Intel(R) RAID controller</code> implies I need to connect my drive (or in this case, a loopback disk image) via a real RAID controller, which defeats my whole plan. After some googling, I found this <a href="https://askubuntu.com/questions/1239082/reassemble-intel-rst-raid-on-another-mainboard">Ask Ubuntu post</a> pointing out there is an <code>IMSM_NO_PLATFORM=1</code> environment variable I could set. The message <code>is not attached to Intel(R) RAID controller</code> was really just a warning, and had no actual bearing on the problem.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo <span class="nv">IMSM_NO_PLATFORM</span><span class="o">=</span><span class="m">1</span> mdadm --examine -e imsm <span class="se">\
</span></span></span><span class="line"><span class="cl">  /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34
</span></span><span class="line"><span class="cl">mdadm: no recogniseable superblock on /dev/loop34
</span></span><span class="line"><span class="cl">mdadm: Cannot assemble mbr metadata on /dev/loop33
</span></span><span class="line"><span class="cl">mdadm: no recogniseable superblock on /dev/loop32
</span></span><span class="line"><span class="cl">mdadm: Cannot assemble mbr metadata on /dev/loop31
</span></span></code></pre></div><p>A new set of errors, and they did not look promising. After more head scratching, I hit a bit of a dead end. I began to wonder whether the drives were corrupt, making the superblocks unreadable, so I started reading the source code for <code>mdadm</code> to try to understand the superblock format and see what was wrong.</p>
<p>The source showed that the <a href="https://github.com/neilbrown/mdadm/blob/5f4184557a98bb641a7889e280265109c73e2f43/super-intel.c#L242">superblock</a> (the data structure containing information about the array) is stored two sectors from the end of the disk, and starts with the string <code>Intel Raid ISM Cfg Sig. </code>.</p>
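<p>That check can be expressed as a small shell function. The hypothetical <code>imsm_signature</code> helper below is a sketch that assumes 512-byte sectors: it reads the 24 bytes starting 1024 bytes before the end of an image, compares them with the signature, and prints the version string that follows.</p>

```shell
# Hypothetical helper: check a raw disk image for the IMSM superblock
# signature. Assumes 512-byte sectors, with the superblock starting two
# sectors (1024 bytes) before the end of the image.
imsm_signature() {
  local image="$1" sig version
  # Strip NUL bytes so the shell does not warn when the area is zeroed.
  sig=$(tail -c 1024 "$image" | head -c 24 | tr -d '\000')
  if [ "$sig" != "Intel Raid ISM Cfg Sig. " ]; then
    echo "no IMSM signature: $image"
    return 1
  fi
  # The version string (e.g. 1.3.00) follows the signature, NUL padded.
  version=$(tail -c 1000 "$image" | head -c 8 | tr -d '\000')
  echo "IMSM $version: $image"
}
```

<p>For example, <code>imsm_signature 3.raw</code> prints the version (such as <code>1.3.00</code>) when the signature is present, and returns a non-zero status otherwise.</p>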
<p>Assuming a 512-byte sector (as <code>fdisk</code> reported above), I dumped the last two sectors of the third image:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ tail -c <span class="m">1024</span> 3.raw  <span class="p">|</span> hd
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="m">00000000</span>  <span class="m">49</span> 6e <span class="m">74</span> <span class="m">65</span> 6c <span class="m">20</span> <span class="m">52</span> <span class="m">61</span>  <span class="m">69</span> <span class="m">64</span> <span class="m">20</span> <span class="m">49</span> <span class="m">53</span> 4d <span class="m">20</span> <span class="m">43</span>  <span class="p">|</span>Intel Raid ISM C<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000010</span>  <span class="m">66</span> <span class="m">67</span> <span class="m">20</span> <span class="m">53</span> <span class="m">69</span> <span class="m">67</span> 2e <span class="m">20</span>  <span class="m">31</span> 2e <span class="m">33</span> 2e <span class="m">30</span> <span class="m">30</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span><span class="nb">fg</span> Sig. 1.3.00..<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000020</span>  cc c0 3d de <span class="m">48</span> <span class="m">02</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">40</span> d5 <span class="m">11</span> d4 <span class="m">09</span> ae <span class="m">19</span> <span class="m">00</span>  <span class="p">|</span>..<span class="o">=</span>.H...@.......<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000030</span>  f8 <span class="m">11</span> <span class="m">00</span> <span class="m">00</span> <span class="m">10</span> <span class="m">00</span> <span class="m">00</span> a0  <span class="m">04</span> <span class="m">01</span> <span class="m">02</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000040</span>  <span class="m">40</span> d5 <span class="m">11</span> d4 <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>@...............<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000050</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl">*
</span></span><span class="line"><span class="cl">000000d0  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">53</span> <span class="m">31</span> <span class="m">33</span> <span class="m">55</span> 4a <span class="m">31</span> 4b <span class="m">51</span>  <span class="p">|</span>........S13UJ1KQ<span class="p">|</span>
</span></span><span class="line"><span class="cl">000000e0  <span class="m">34</span> <span class="m">30</span> <span class="m">33</span> <span class="m">33</span> <span class="m">33</span> <span class="m">37</span> <span class="m">00</span> <span class="m">00</span>  f0 <span class="m">66</span> <span class="m">54</span> <span class="m">57</span> <span class="m">00</span> <span class="m">00</span> <span class="m">01</span> <span class="m">00</span>  <span class="p">|</span>403337...fTW....<span class="p">|</span>
</span></span><span class="line"><span class="cl">000000f0  3a <span class="m">01</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>:...............<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000100</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">53</span> <span class="m">31</span> <span class="m">33</span> <span class="m">55</span> 4a <span class="m">44</span> <span class="m">57</span> <span class="m">51</span>  <span class="p">|</span>........S13UJDWQ<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000110</span>  <span class="m">33</span> <span class="m">34</span> <span class="m">36</span> <span class="m">34</span> <span class="m">35</span> <span class="m">37</span> <span class="m">00</span> <span class="m">00</span>  f0 <span class="m">66</span> <span class="m">54</span> <span class="m">57</span> <span class="m">00</span> <span class="m">00</span> <span class="m">02</span> <span class="m">00</span>  <span class="p">|</span>346457...fTW....<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000120</span>  3a <span class="m">01</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>:...............<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000130</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">53</span> <span class="m">31</span> <span class="m">33</span> <span class="m">55</span> 4a <span class="m">44</span> <span class="m">57</span> <span class="m">51</span>  <span class="p">|</span>........S13UJDWQ<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000140</span>  <span class="m">33</span> <span class="m">34</span> <span class="m">36</span> <span class="m">36</span> <span class="m">36</span> <span class="m">38</span> <span class="m">00</span> <span class="m">00</span>  f0 <span class="m">66</span> <span class="m">54</span> <span class="m">57</span> <span class="m">00</span> <span class="m">00</span> <span class="m">03</span> <span class="m">00</span>  <span class="p">|</span>346668...fTW....<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000150</span>  3a <span class="m">01</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>:...............<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000160</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">53</span> <span class="m">31</span> <span class="m">33</span> <span class="m">55</span> 4a <span class="m">31</span> 4b <span class="m">51</span>  <span class="p">|</span>........S13UJ1KQ<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000170</span>  <span class="m">34</span> <span class="m">30</span> <span class="m">33</span> <span class="m">33</span> <span class="m">32</span> <span class="m">34</span> 3a <span class="m">30</span>  <span class="m">00</span> <span class="m">66</span> <span class="m">54</span> <span class="m">57</span> ff ff ff ff  <span class="p">|</span>403324:0.fTW....<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000180</span>  <span class="m">02</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000190</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">52</span> <span class="m">41</span> <span class="m">49</span> <span class="m">44</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>........RAID....<span class="p">|</span>
</span></span><span class="line"><span class="cl">000001a0  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> f8 <span class="nb">fc</span> <span class="m">05</span> <span class="m">01</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl">000001b0  8c <span class="m">10</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">01</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl">000001c0  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl">*
</span></span><span class="line"><span class="cl">000001e0  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  a6 a8 ae <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl">000001f0  <span class="m">00</span> <span class="m">02</span> <span class="m">00</span> ff <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl"><span class="m">00000200</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span> <span class="m">00</span>  <span class="p">|</span>................<span class="p">|</span>
</span></span><span class="line"><span class="cl">*
</span></span><span class="line"><span class="cl"><span class="m">00000400</span>
</span></span></code></pre></div><p>Boom, the superblock was there, starting with a valid header, and even the other fields looked correct (e.g. S13UJ1KQ403324 is the serial number of one of the drives).</p>
<p>Ok, so now I was confused about what was wrong, and wondered if this was a bug in <code>mdadm</code>. Going back, I remembered that the first error I got contained <code>Failed to retrieve serial</code>, and that the serial numbers were stored in the superblock (e.g. S13UJ1KQ403324). It then occurred to me that once I had imaged the hard drives, the images no longer carried the drives’ hardware serial numbers!</p>
<p>Reading the code some more, I found that it fails with that error when it is unable to read the drive’s serial number. Loop devices don’t report serial numbers, so this started to make sense. I did, however, find an undocumented environment variable, <code>IMSM_DEVNAME_AS_SERIAL</code>, which makes <code>mdadm</code> use the device name (e.g. <code>/dev/loop31</code>) as the serial instead of reading it from the hardware. This feature seems to be explicitly designed to help with testing the <code>mdadm</code> codebase.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo <span class="nv">IMSM_DEVNAME_AS_SERIAL</span><span class="o">=</span><span class="m">1</span> <span class="nv">IMSM_NO_PLATFORM</span><span class="o">=</span><span class="m">1</span> mdadm --examine -e imsm /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34
</span></span><span class="line"><span class="cl">…
</span></span><span class="line"><span class="cl">/dev/loop31:
</span></span><span class="line"><span class="cl">          Magic : Intel Raid ISM Cfg Sig.
</span></span><span class="line"><span class="cl">        Version : 1.3.00
</span></span><span class="line"><span class="cl">    Orig Family : d411d540
</span></span><span class="line"><span class="cl">         Family : d411d540
</span></span><span class="line"><span class="cl">     Generation : 0019ae09
</span></span><span class="line"><span class="cl">     Attributes : All supported
</span></span><span class="line"><span class="cl">           UUID : ff44bc31:56060902:afb34379:b0faf183
</span></span><span class="line"><span class="cl">       Checksum : de3dc0cc correct
</span></span><span class="line"><span class="cl">    MPB Sectors : <span class="m">2</span>
</span></span><span class="line"><span class="cl">          Disks : <span class="m">4</span>
</span></span><span class="line"><span class="cl">   RAID Devices : <span class="m">1</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="o">[</span>RAID<span class="o">]</span>:
</span></span><span class="line"><span class="cl">           UUID : 676c222f:760eaf46:97bd30b8:989d2470
</span></span><span class="line"><span class="cl">     RAID Level : <span class="m">5</span>
</span></span><span class="line"><span class="cl">        Members : <span class="m">4</span>
</span></span><span class="line"><span class="cl">          Slots : <span class="o">[</span>UUU_<span class="o">]</span>
</span></span><span class="line"><span class="cl">    Failed disk : <span class="m">3</span>
</span></span><span class="line"><span class="cl">      This Slot : ?
</span></span><span class="line"><span class="cl">    Sector Size : <span class="m">512</span>
</span></span><span class="line"><span class="cl">     Array Size : <span class="m">4395431936</span> <span class="o">(</span>2095.91 GiB 2250.46 GB<span class="o">)</span>
</span></span><span class="line"><span class="cl">   Per Dev Size : <span class="m">1465144328</span> <span class="o">(</span>698.64 GiB 750.15 GB<span class="o">)</span>
</span></span><span class="line"><span class="cl">  Sector Offset : <span class="m">0</span>
</span></span><span class="line"><span class="cl">    Num Stripes : <span class="m">11446438</span>
</span></span><span class="line"><span class="cl">     Chunk Size : <span class="m">64</span> KiB
</span></span><span class="line"><span class="cl">       Reserved : <span class="m">0</span>
</span></span><span class="line"><span class="cl">  Migrate State : idle
</span></span><span class="line"><span class="cl">      Map State : degraded
</span></span><span class="line"><span class="cl">    Dirty State : clean
</span></span><span class="line"><span class="cl">     RWH Policy : off
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  Disk00 Serial : S13UJ1KQ403337
</span></span><span class="line"><span class="cl">          State : active
</span></span><span class="line"><span class="cl">             Id : <span class="m">00010000</span>
</span></span><span class="line"><span class="cl">    Usable Size : <span class="m">1465138766</span> <span class="o">(</span>698.63 GiB 750.15 GB<span class="o">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  Disk01 Serial : S13UJDWQ346457
</span></span><span class="line"><span class="cl">          State : active
</span></span><span class="line"><span class="cl">             Id : <span class="m">00020000</span>
</span></span><span class="line"><span class="cl">    Usable Size : <span class="m">1465138766</span> <span class="o">(</span>698.63 GiB 750.15 GB<span class="o">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  Disk02 Serial : S13UJDWQ346668
</span></span><span class="line"><span class="cl">          State : active
</span></span><span class="line"><span class="cl">             Id : <span class="m">00030000</span>
</span></span><span class="line"><span class="cl">    Usable Size : <span class="m">1465138766</span> <span class="o">(</span>698.63 GiB 750.15 GB<span class="o">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  Disk03 Serial : S13UJ1KQ403324:0
</span></span><span class="line"><span class="cl">          State : active
</span></span><span class="line"><span class="cl">             Id : ffffffff
</span></span><span class="line"><span class="cl">    Usable Size : <span class="m">1465138526</span> <span class="o">(</span>698.63 GiB 750.15 GB<span class="o">)</span>
</span></span></code></pre></div><p>Ok, slowly making progress! Now it lists all the superblock information, and I was happy to see <code>Checksum : de3dc0cc correct</code>. However, it also listed <code>Failed disk : 3</code> and <code>This Slot : ?</code>. It made me think that, without valid serial numbers, it didn’t know which drive was which, and thus couldn’t assemble the array.</p>
<blockquote>
<p>This got me pondering: if I were ever to write a RAID implementation, I would not make it depend on information from the hardware. How are folks supposed to re-image disks? What is wrong with a GUID in the superblock to identify each disk? Ok, digression aside.</p>
</blockquote>
<p>To move forward, I needed to trick <code>mdadm</code> into thinking that the loop devices (e.g. <code>/dev/loop31</code>) were the real hardware, with their original serial numbers. I went back to my drives and visually inspected them to check the serial numbers.</p>
<pre tabindex="0"><code>  Disk00 Serial : S13UJ1KQ403337   1.raw
  Disk01 Serial : S13UJDWQ346457   2.raw
  Disk02 Serial : S13UJDWQ346668   4.raw
  Disk03 Serial : S13UJ1KQ403324   3.raw
</code></pre><p>At this point, I realised I had accidentally swapped drives 3 and 4. Quickly renaming the image files put them back in the correct order.</p>
<p>Having already looked over the <code>mdadm</code> source code, which seemed simple and clean, I decided to change it to return the original serial numbers for the loop devices. After a little while, I did the hackiest thing possible:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-diff" data-lang="diff"><span class="line"><span class="cl"><span class="gh">diff --git a/super-intel.c b/super-intel.c
</span></span></span><span class="line"><span class="cl"><span class="gh">index da376251..d466d911 100644
</span></span></span><span class="line"><span class="cl"><span class="gd">--- a/super-intel.c
</span></span></span><span class="line"><span class="cl"><span class="gi">+++ b/super-intel.c
</span></span></span><span class="line"><span class="cl"><span class="gu">@@ -3994,6 +3994,20 @@ static int nvme_get_serial(int fd, void *buf, size_t buf_len)
</span></span></span><span class="line"><span class="cl">        if (!name)
</span></span><span class="line"><span class="cl">                return 1;
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="gi">+       if (strcmp(name, &#34;loop31&#34;) == 0) {
</span></span></span><span class="line"><span class="cl"><span class="gi">+               strcpy((char *)buf, &#34;S13UJ1KQ403337&#34;);
</span></span></span><span class="line"><span class="cl"><span class="gi">+               return 0;
</span></span></span><span class="line"><span class="cl"><span class="gi">+       } else if (strcmp(name, &#34;loop32&#34;) == 0) {
</span></span></span><span class="line"><span class="cl"><span class="gi">+               strcpy((char *)buf, &#34;S13UJDWQ346457&#34;);
</span></span></span><span class="line"><span class="cl"><span class="gi">+               return 0;
</span></span></span><span class="line"><span class="cl"><span class="gi">+       } else if (strcmp(name, &#34;loop33&#34;) == 0) {
</span></span></span><span class="line"><span class="cl"><span class="gi">+               strcpy((char *)buf, &#34;S13UJDWQ346668&#34;);
</span></span></span><span class="line"><span class="cl"><span class="gi">+               return 0;
</span></span></span><span class="line"><span class="cl"><span class="gi">+       } else if (strcmp(name, &#34;loop34&#34;) == 0) {
</span></span></span><span class="line"><span class="cl"><span class="gi">+               strcpy((char *)buf, &#34;S13UJ1KQ403324&#34;);
</span></span></span><span class="line"><span class="cl"><span class="gi">+               return 0;
</span></span></span><span class="line"><span class="cl"><span class="gi">+       }
</span></span></span><span class="line"><span class="cl"><span class="gi">+
</span></span></span><span class="line"><span class="cl">        if (strncmp(name, &#34;nvme&#34;, 4) != 0)
</span></span><span class="line"><span class="cl">                return 1;
</span></span></code></pre></div><p>The <code>nvme_get_serial</code> function now returns hard-coded serial numbers for <code>loop31</code> to <code>loop34</code>. This obviously isn’t a general solution, but it worked for me. Go open source!</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ make mdadm
</span></span><span class="line"><span class="cl">…
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">$ sudo <span class="nv">IMSM_NO_PLATFORM</span><span class="o">=</span><span class="m">1</span> ./mdadm --examine -e imsm /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34
</span></span></code></pre></div><p>The examine output looked good, so, moment of truth: let’s assemble.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo <span class="nv">IMSM_NO_PLATFORM</span><span class="o">=</span><span class="m">1</span> ./mdadm --assemble --readonly -e imsm /dev/md0 /dev/loop31 /dev/loop32 /dev/loop33 /dev/loop34
</span></span><span class="line"><span class="cl">mdadm: Container /dev/md0 has been assembled with <span class="m">3</span> drives
</span></span></code></pre></div><p>Ok, mixed success: it says 3 drives, but I expected 4. Let’s keep going anyway.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo ./mdadm --assemble --scan
</span></span><span class="line"><span class="cl">mdadm: Started /dev/md/RAID_0 with <span class="m">3</span> devices
</span></span></code></pre></div><p>W00t! It started without errors!</p>
<p>I now have <code>/dev/md0</code> (the IMSM container), <code>/dev/md127</code> (the RAID 5 volume inside it) and <code>/dev/md127p1</code> (its first partition).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo mount -o ro /dev/md127p1 /mnt/raid5
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">$ ls /mnt/raid5
</span></span><span class="line"><span class="cl">… lots of old files...
</span></span></code></pre></div><p>YAY. Finished!</p>
<p>However, I’m still not sure why it reports three drives rather than four.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo ./mdadm --detail /dev/md127
</span></span><span class="line"><span class="cl">/dev/md127:
</span></span><span class="line"><span class="cl">         Container : /dev/md0, member <span class="m">0</span>
</span></span><span class="line"><span class="cl">        Raid Level : raid5
</span></span><span class="line"><span class="cl">        Array Size : <span class="m">2197715968</span> <span class="o">(</span>2.05 TiB 2.25 TB<span class="o">)</span>
</span></span><span class="line"><span class="cl">     Used Dev Size : <span class="m">732572032</span> <span class="o">(</span>698.64 GiB 750.15 GB<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Raid Devices : <span class="m">4</span>
</span></span><span class="line"><span class="cl">     Total Devices : <span class="m">3</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">             State : clean, degraded
</span></span><span class="line"><span class="cl">    Active Devices : <span class="m">3</span>
</span></span><span class="line"><span class="cl">   Working Devices : <span class="m">3</span>
</span></span><span class="line"><span class="cl">    Failed Devices : <span class="m">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            Layout : left-asymmetric
</span></span><span class="line"><span class="cl">        Chunk Size : 64K
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Consistency Policy : resync
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">              UUID : 676c222f:760eaf46:97bd30b8:989d2470
</span></span><span class="line"><span class="cl">    Number   Major   Minor   RaidDevice State
</span></span><span class="line"><span class="cl">       <span class="m">2</span>       <span class="m">7</span>       <span class="m">31</span>        <span class="m">0</span>      active sync   /dev/loop31
</span></span><span class="line"><span class="cl">       <span class="m">1</span>       <span class="m">7</span>       <span class="m">32</span>        <span class="m">1</span>      active sync   /dev/loop32
</span></span><span class="line"><span class="cl">       <span class="m">0</span>       <span class="m">7</span>       <span class="m">33</span>        <span class="m">2</span>      active sync   /dev/loop33
</span></span><span class="line"><span class="cl">       -       <span class="m">0</span>        <span class="m">0</span>        <span class="m">3</span>      removed
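</span></span></code></pre></div><p>This does seem to imply a drive is missing, which would be consistent with the earlier <code>--examine</code> output: the metadata already reported <code>Slots : [UUU_]</code>, <code>Failed disk : 3</code> and <code>Map State : degraded</code>. A four-disk RAID 5 can survive one missing member, so maybe it doesn’t matter: the array mounted successfully, and I can copy all my data off it.</p>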
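<p>Why can a four-disk RAID 5 still be read with one member missing? Each stripe stores one parity chunk that is the XOR of its data chunks, so any single missing chunk is simply the XOR of the chunks that survive. The C sketch below illustrates only that arithmetic on one toy stripe; the 4-byte chunks, the fixed parity slot and the helper names <code>compute_parity</code> and <code>rebuild_missing</code> are illustrative assumptions, whereas a real md array rotates parity between members according to its layout (left-asymmetric here) and uses the 64K chunk size shown above.</p>

```c
/* Toy RAID 5 stripe: MEMBERS chunks of CHUNK bytes, one of which is parity.
 * Illustrative only; md rotates the parity chunk between members. */
#define MEMBERS 4 /* members in the array, as in this RAID 5 */
#define CHUNK 4   /* toy chunk size in bytes (the real array uses 64 KiB) */

/* Store the XOR of every other chunk into the parity chunk. */
static void compute_parity(unsigned char stripe[MEMBERS][CHUNK], int parity)
{
	for (int i = 0; i != CHUNK; i++) {
		unsigned char p = 0;
		for (int d = 0; d != MEMBERS; d++)
			if (d != parity)
				p ^= stripe[d][i];
		stripe[parity][i] = p;
	}
}

/* Rebuild the chunk of the missing member by XORing all surviving chunks,
 * data and parity alike. */
static void rebuild_missing(unsigned char stripe[MEMBERS][CHUNK], int missing,
			    unsigned char out[CHUNK])
{
	for (int i = 0; i != CHUNK; i++) {
		unsigned char x = 0;
		for (int d = 0; d != MEMBERS; d++)
			if (d != missing)
				x ^= stripe[d][i];
		out[i] = x;
	}
}
```

<p>Roughly speaking, with slot 3 gone md has to do this reconstruction on the fly for every stripe whose chunk lived on that disk, and the array has no redundancy left until a fourth member is added back.</p>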
<h1 id="conclusion">Conclusion</h1>
<p>This was not the easiest task, and there were a few bumps along the way. Hopefully the hacks here will help someone else out in a similar situation.</p>
<p>Finally, to clean up, you can run this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo umount /mnt/raid5
</span></span><span class="line"><span class="cl">$ sudo mdadm --stop /dev/md127
</span></span><span class="line"><span class="cl">$ sudo mdadm --stop /dev/md0
</span></span><span class="line"><span class="cl">$ sudo losetup -d /dev/loop3<span class="o">[</span>1234<span class="o">]</span>
</span></span></code></pre></div></description>
    </item>
    
    <item>
      <title>Alternative Milks</title>
      <link>https://blog.bramp.net/post/2021/04/03/alternative-milks/</link>
      <pubDate>Sat, 03 Apr 2021 12:44:51 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2021/04/03/alternative-milks/</guid>
      <description><p>I&rsquo;ve not been getting out as much during Covid, so I&rsquo;ve tried to reduce my caloric intake. One way I tried to do this was to switch the kind of milk I drink, which had the secondary benefit of reducing my environmental impact. However, while researching the various milks, I couldn&rsquo;t find a single source that put both the nutritional information and the environmental impact of the various milks in one place. This article does just that.</p>
<p>During my research, I found there are minor differences in the US and UK in how they represent this data. For example, a UK serving size is 200ml, whereas in the US it&rsquo;s 1 cup (or ~240ml). I&rsquo;ve normalised all values to the UK serving size.</p>
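<p>The normalisation is a simple proportional rescale. Here is a minimal C sketch, assuming nutrient amounts scale linearly with volume and taking the ~240ml cup figure above at face value; <code>to_uk_serving</code> is a hypothetical helper name, not part of any of the data sources.</p>

```c
/* Normalise nutrient amounts to the UK serving size used in the table.
 * Assumes amounts scale linearly with volume. */
#define UK_SERVING_ML 200.0 /* UK serving size */
#define US_CUP_ML 240.0     /* approximate US cup */

/* Rescale an amount quoted per from_ml millilitres to a 200ml serving. */
static double to_uk_serving(double amount, double from_ml)
{
	return amount * (UK_SERVING_ML / from_ml);
}
```

<p>For example, a hypothetical 150 kcal per cup becomes <code>to_uk_serving(150.0, US_CUP_ML)</code>, i.e. 125 kcal per 200ml. Values quoted per 100g can be handled the same way with <code>from_ml = 100.0</code>, if you accept the table footer’s approximation that 200g is roughly 200ml.</p>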
<!-- Generated at https://www.tablesgenerator.com/html_tables# -->
<style type="text/css">
.tg  {
	text-align: center;
}
.tg .tg-jkyp {
	white-space:nowrap;
	text-align:right;
}
.tg .tg-pb0m {
	white-space:nowrap;
	text-align: center;
	font-weight: bold;
}

</style>
<div class="overflow-center">
<table class="tg table table-striped table-hover table-condensed" data-sortable>
<thead>
  <tr>
    <th class="tg-jkyp">Type</th>
    <th class="tg-pb0m" data-sorted="true" data-sorted-direction="descending">Emissions<br/>(kg)</th>
    <th class="tg-pb0m">Land Usage<br/>(m<sup>2</sup>)</th>
    <th class="tg-pb0m">Water Usage<br/>(L)</th>
    <th class="tg-jkyp">Variant</th>
    <th class="tg-pb0m">Calories<br/>(kcals)</th>
    <th class="tg-pb0m">Calcium<br/>(mg)</th>
    <th class="tg-pb0m">Fat<br/>(g)</th>
    <th class="tg-pb0m">Sat Fat<br/>(g)</th>
    <th class="tg-pb0m">Sugar<br/>(g)</th>
    <th class="tg-pb0m">Protein<br/>(g)</th>
    <th></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-jkyp">Dairy (Cow's) Milk</td>
    <td>0.630</td>
    <td>1.790</td>
    <td>125.64</td>
    <td class="tg-jkyp">Whole</td>
    <td>120</td>
    <td>246</td>
    <td>6.40</td>
    <td>3.72</td>
    <td>9.62</td>
    <td>6.56</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097512/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Dairy (Cow's) Milk</td>
    <td>0.630</td>
    <td>1.790</td>
    <td>125.64</td>
    <td class="tg-jkyp">Whole (Lactose free)</td>
    <td>120</td>
    <td>246</td>
    <td>6.40</td>
    <td>3.72</td>
    <td>9.62</td>
    <td>6.56</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097525/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Dairy (Cow's) Milk</td>
    <td>0.630</td>
    <td>1.790</td>
    <td>125.64</td>
    <td class="tg-jkyp">Reduced fat (2%)</td>
    <td>100</td>
    <td>252</td>
    <td>3.80</td>
    <td>2.22</td>
    <td>9.78</td>
    <td>6.70</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097517/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Dairy (Cow's) Milk</td>
    <td>0.630</td>
    <td>1.790</td>
    <td>125.64</td>
    <td class="tg-jkyp">Fat free (skim)</td>
    <td>68</td>
    <td>264</td>
    <td>0.16</td>
    <td>0.10</td>
    <td>10.10</td>
    <td>6.86</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097521/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Rice Milk</td>
    <td class="tg-em69">0.236</td>
    <td class="tg-j1aw">0.067</td>
    <td class="tg-i9py">53.96</td>
    <td class="tg-jkyp">Rice Milk</td>
    <td>94</td>
    <td>236</td>
    <td>1.94</td>
    <td>0.00</td>
    <td>10.56</td>
    <td>0.56</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097552/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Soy Milk</td>
    <td class="tg-wl0g">0.196</td>
    <td class="tg-aj4d">0.132</td>
    <td class="tg-j1aw">5.56</td>
    <td class="tg-jkyp">Soy Milk</td>
    <td>86</td>
    <td>246</td>
    <td>2.94</td>
    <td>0.41</td>
    <td>7.30</td>
    <td>5.20</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097542/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Oat Milk</td>
    <td class="tg-a46i">0.181</td>
    <td class="tg-em69">0.152</td>
    <td class="tg-vnq0">9.65</td>
    <td class="tg-jkyp">OATLY! (Brand)</td>
    <td>100</td>
    <td>292</td>
    <td>4.16</td>
    <td>0.42</td>
    <td>5.84</td>
    <td>2.50</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/719016/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Almond Milk</td>
    <td class="tg-j1aw">0.140</td>
    <td class="tg-zomh">0.099</td>
    <td class="tg-em69">74.29</td>
    <td class="tg-jkyp">Sweetened</td>
    <td>60</td>
    <td>354</td>
    <td>1.86</td>
    <td>0.15</td>
    <td>9.54</td>
    <td>0.76</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097548/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Almond Milk</td>
    <td class="tg-j1aw">0.140</td>
    <td class="tg-zomh">0.099</td>
    <td class="tg-em69">74.29</td>
    <td class="tg-jkyp">Unsweetened</td>
    <td>30</td>
    <td>368</td>
    <td>1.92</td>
    <td>0.16</td>
    <td>1.62</td>
    <td>0.80</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097550/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Goat’s Milk</td>
    <td>?</td>
    <td>?</td>
    <td>?</td>
    <td class="tg-jkyp">Whole</td>
    <td>138</td>
    <td>268</td>
    <td>8.28</td>
    <td>5.33</td>
    <td>8.90</td>
    <td>7.12</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097531/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Coconut Milk</td>
    <td>?</td>
    <td>?</td>
    <td>?</td>
    <td class="tg-jkyp">Coconut Milk</td>
    <td>62</td>
    <td>376</td>
    <td>4.16</td>
    <td>4.17</td>
    <td>5.00</td>
    <td>0.42</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097553/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Flax Hemp Milk</td>
    <td>?</td>
    <td>?</td>
    <td>?</td>
    <td class="tg-jkyp">Flax Hemp Milk</td>
    <td>38</td>
    <td>24</td>
    <td>2.50</td>
    <td>0.00</td>
    <td>0.84</td>
    <td>1.66</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/468978/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Human Milk</td>
    <td>?</td>
    <td>?</td>
    <td>?</td>
    <td class="tg-jkyp">Human Milk</td>
    <td>140</td>
    <td>64</td>
    <td>8.76</td>
    <td>0.64</td>
    <td>13.78</td>
    <td>2.06</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097510/nutrients">(source)</a></td>
  </tr>
  <tr>
    <td class="tg-jkyp">Chocolate Milk</td>
    <td>?</td>
    <td>?</td>
    <td>?</td>
    <td class="tg-jkyp">Chocolate Milk</td>
    <td>98</td>
    <td>200</td>
    <td>0.80</td>
    <td>0.02</td>
    <td>17.36</td>
    <td>1.28</td>
    <td><a href="https://fdc.nal.usda.gov/fdc-app.html#/food-details/1097699/nutrients">(source)</a></td>
  </tr>
</tbody>
<tfoot>
  <tr>
    <td class="tg-pb0m" colspan="4">Environmental Impact per 200ml <a href="https://www.bbc.com/news/science-environment-46654042">(source)</a></td>
    <td class="tg-pb0m" colspan="8">Nutritional Information per 200g (which is approx 200ml)</td>
  </tr>
</tfoot>
</table>
</div>
<p>I’m not going to provide any kind of editorial; I just wanted all the data in one place. The data is sourced from the following locations.</p>
<p>General Comparison:</p>
<ul>
<li><a href="https://www.healthline.com/health/milk-almond-cow-soy-rice">Healthline: Comparing Milks: Almond, Dairy, Soy, Rice, and Coconut</a></li>
<li><a href="https://www.bbcgoodfood.com/howto/guide/which-milk-right-you">BBC Good Food: Which milk is right for you?</a></li>
</ul>
<p>Environment:</p>
<ul>
<li><a href="https://www.theguardian.com/environment/2020/jan/28/what-plant-milk-should-i-drink-almond-killing-bees-aoe">The Guardian: Almonds are out. Dairy is a disaster. So what milk should we drink?</a></li>
<li><a href="https://www.ediblebrooklyn.com/2020/plant-milks-sustainability/">Edible Brooklyn: Which Plant-Based Milk Is Best for the Environment?</a></li>
<li><a href="https://www.bbc.com/news/science-environment-46654042">BBC: Climate change: Which vegan milk is best?</a></li>
<li><a href="https://ora.ox.ac.uk/objects/uuid:b0b53649-5e93-4415-bf07-6b0b1227172f">Poore, J., &amp; Nemecek, T. (2018). Reducing food’s environmental impacts through producers and consumers. Science, 360(6392), 987–992.</a></li>
</ul>
<p>Nutritional:</p>
<ul>
<li><a href="https://www.gov.uk/government/publications/composition-of-foods-integrated-dataset-cofid">2021 McCance and Widdowson&rsquo;s Composition of Foods</a> (<a href="https://docs.google.com/spreadsheets/d/1aA2y5vMbS8_J8e9pUep9cPVVjrPz4NAO92nncYdoN6Y/edit#gid=1197879062">Integrated Dataset</a> | <a href="https://docs.google.com/spreadsheets/d/1j46zZ39cPrn0wT8IGujw-USaYeTQArhLyejQcxFXsKs/edit#gid=279090816">Old Foods</a> )</li>
<li><a href="https://fdc.nal.usda.gov/">U.S. Department of Agriculture, Agricultural Research Service. FoodData Central, 2021. fdc.nal.usda.gov.</a></li>
</ul>
</description>
    </item>
    
    <item>
      <title>Local HTTPS Server for development</title>
      <link>https://blog.bramp.net/post/2020/12/27/local-https-server-for-development/</link>
      <pubDate>Sun, 27 Dec 2020 09:14:22 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2020/12/27/local-https-server-for-development/</guid>
      <description><p>I regularly do web development on localhost, running a simple HTTP server to serve my site. Recently I ran into a problem: some of the newer web APIs (such as DeviceMotionEvent) do not work unless the site is served via SSL. So I set about creating a local SSL server and certificate.</p>
<p>Many of the <a href="https://matthewhoelter.com/2019/10/21/how-to-setup-https-on-your-local-development-environment-localhost-in-minutes.html">instructions out there</a> create a self-signed certificate, which you then install so it is trusted locally. I wanted my development server to be accessible from other devices on my network, and I didn&rsquo;t want the hassle of installing this self-signed cert everywhere. Instead I wanted an SSL certificate issued by a real, trusted CA.</p>
<p>Enter <a href="https://letsencrypt.org/">Let&rsquo;s Encrypt</a>, a free service that provides SSL certificates, provided you can prove you own the domain. To go about this, I did the following on my MacBook:</p>
<h1 id="install-certbot-to-generate-the-cert">Install Certbot (to generate the cert)</h1>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">brew install certbot
</span></span></code></pre></div><p>There are a few ways to prove you own a domain; the HTTP-based ones require a publicly reachable web server. Since my development server is only on my local network, I&rsquo;m going to use a DNS-based proof. As I use Cloudflare for my DNS, I&rsquo;ll be using its plugin.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">pip3 install certbot-dns-cloudflare
</span></span></code></pre></div><h1 id="setup-the-domain-localbrampnet">Set up the domain (local.bramp.net)</h1>
<p>I use Cloudflare to host the DNS for my domain, so I set up a new subdomain, local.bramp.net, that points to an internal IP address (192.168.0.123). This domain won&rsquo;t actually be used via the Internet, but it will happily work for any device on my local network.</p>
<figure><img src="/post/2020/12/27/local-https-server-for-development/setup-dns.png" width="720" height="208"><figcaption>
      <h4>Setup DNS record for local.bramp.net</h4>
    </figcaption>
</figure>

<p>You&rsquo;ll also need an <a href="https://support.cloudflare.com/hc/en-us/articles/200167836-Where-do-I-find-my-Cloudflare-API-key-">API token from Cloudflare</a>. Cloudflare lets you scope the token so it can only access this domain; the certbot plugin needs it to have permission to edit the zone&rsquo;s DNS records (Zone:DNS:Edit). For example:</p>
<figure><img src="/post/2020/12/27/local-https-server-for-development/create-token.png" width="720" height="304"><figcaption>
      <h4>Create an API token</h4>
    </figcaption>
</figure>

<p>That will give you a token: a long string of letters and numbers.</p>
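<p>For context, this token is what lets the Cloudflare plugin create a temporary TXT record at <code>_acme-challenge.local.bramp.net</code>, which Let&rsquo;s Encrypt then looks up to confirm you control the domain (the ACME DNS-01 challenge, RFC 8555 section 8.4). The record&rsquo;s value is the unpadded base64url encoding of the SHA-256 digest of the &ldquo;key authorization&rdquo;. The Go sketch below only illustrates that computation; the key authorization passed in is a made-up placeholder, and certbot does all of this for you.</p>

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// dns01Record returns the TXT record name and value for an ACME DNS-01
// challenge (RFC 8555, section 8.4). keyAuth is the key authorization,
// i.e. token + "." + account key thumbprint, which the ACME flow supplies.
func dns01Record(domain, keyAuth string) (name, value string) {
	digest := sha256.Sum256([]byte(keyAuth))
	return "_acme-challenge." + domain, base64.RawURLEncoding.EncodeToString(digest[:])
}

func main() {
	// Placeholder key authorization; a real one comes from the ACME server and your account key.
	name, value := dns01Record("local.bramp.net", "example-token.example-thumbprint")
	fmt.Printf("%s TXT %q\n", name, value)
}
```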
<h1 id="configure-certbot">Configure Certbot</h1>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="c1"># Create a place to store your secrets that only you can access</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">mkdir -p ~/.secrets
</span></span><span class="line"><span class="cl">chmod <span class="m">0700</span> ~/.secrets/
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">cat <span class="s">&lt;&lt;EOF &gt; ~/.secrets/cloudflare.ini
</span></span></span><span class="line"><span class="cl"><span class="s">dns_cloudflare_api_token = **your_key**
</span></span></span><span class="line"><span class="cl"><span class="s">EOF</span>
</span></span><span class="line"><span class="cl">chmod <span class="m">0400</span> ~/.secrets/cloudflare.ini
</span></span></code></pre></div><h1 id="generate-the-certificate">Generate the Certificate</h1>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">certbot certonly <span class="se">\
</span></span></span><span class="line"><span class="cl">  --config-dir ~/.secrets/ <span class="se">\
</span></span></span><span class="line"><span class="cl">  --work-dir ~/.secrets/ <span class="se">\
</span></span></span><span class="line"><span class="cl">  --logs-dir ~/.secrets/ <span class="se">\
</span></span></span><span class="line"><span class="cl">  --dns-cloudflare <span class="se">\
</span></span></span><span class="line"><span class="cl">  --dns-cloudflare-credentials ~/.secrets/cloudflare.ini <span class="se">\
</span></span></span><span class="line"><span class="cl">  -d local.bramp.net
</span></span></code></pre></div><p>and voila:</p>
<pre tabindex="0"><code> - Congratulations! Your certificate and chain have been saved at:
   /Users/bramp/.secrets/live/local.bramp.net/fullchain.pem
   Your key file has been saved at:
   /Users/bramp/.secrets/live/local.bramp.net/privkey.pem
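</code></pre><p>It is important to keep <code>privkey.pem</code> secret. Certbot normally runs as root and keeps its files under <code>/etc/letsencrypt</code>; here it runs as your own user for convenience, which is why the <code>--config-dir</code>, <code>--work-dir</code> and <code>--logs-dir</code> flags all point at <code>~/.secrets/</code>.</p>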
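<p>If you want the certificate to renew automatically, run the following to add a crontab entry that runs twice a day, at a minute past midnight and noon that is chosen at random when the entry is added. <code>certbot renew</code> only renews certificates that are close to expiry, so running it this often is harmless.</p>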
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="c1"># List your current crontab, and append certbot renewal</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="o">(</span>crontab -l <span class="p">;</span> <span class="nb">echo</span> <span class="s2">&#34;</span><span class="k">$((</span> RANDOM <span class="o">%</span> <span class="m">60</span> <span class="k">))</span><span class="s2"> 0,12 * * * </span><span class="k">$(</span>which certbot<span class="k">)</span><span class="s2"> renew -q --config-dir ~/.secrets/ --work-dir ~/.secrets/ --logs-dir ~/.secrets/&#34;</span><span class="o">)</span> <span class="p">|</span> crontab -
</span></span></code></pre></div><p>Alternatively, you can renew all certificates on demand with:</p>
<pre tabindex="0"><code>certbot renew \
  --config-dir ~/.secrets/ \
  --work-dir ~/.secrets/ \
  --logs-dir ~/.secrets/
</code></pre><h1 id="install-a-simple-https-web-server">Install a simple HTTPS web server</h1>
<p>I use <a href="https://github.com/http-party/http-server">http-server</a>, &ldquo;a simple, zero-configuration command-line http server&rdquo;. It supports many useful features, including SSL.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">brew install http-server
</span></span></code></pre></div><h1 id="running-the-https-web-server">Running the HTTPS web server</h1>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">http-server -S <span class="se">\
</span></span></span><span class="line"><span class="cl">  -C ~/.secrets/live/local.bramp.net/fullchain.pem <span class="se">\
</span></span></span><span class="line"><span class="cl">  -K ~/.secrets/live/local.bramp.net/privkey.pem
</span></span></code></pre></div><p>You may wish to alias this to something shorter, for example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="nb">alias</span> <span class="nv">https</span><span class="o">=</span><span class="s2">&#34;http-server -S \
</span></span></span><span class="line"><span class="cl"><span class="s2">  -C ~/.secrets/live/local.bramp.net/fullchain.pem \
</span></span></span><span class="line"><span class="cl"><span class="s2">  -K ~/.secrets/live/local.bramp.net/privkey.pem&#34;</span>
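</span></span></code></pre></div><p>Now you can run <code>https</code> from any directory, and its contents will be served over SSL. Add the alias to your shell&rsquo;s startup file (for example <code>~/.zshrc</code>) to keep it across sessions.</p>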
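<p>If you would rather not install http-server, Go&rsquo;s standard library can serve a directory over HTTPS with the same certificate files. This is a minimal sketch, assuming the certbot paths above; the port (8443) is an arbitrary choice and <code>newServer</code> is a hypothetical helper. Like the <code>https</code> alias, it serves the current directory. Whichever server you use, the certificate only matches when the site is accessed via <code>local.bramp.net</code>, not <code>localhost</code> or the raw IP address.</p>

```go
package main

import (
	"log"
	"net/http"
	"os"
	"path/filepath"
)

// newServer returns a server that serves the files in dir on addr.
func newServer(addr, dir string) *http.Server {
	return &http.Server{
		Addr:    addr,
		Handler: http.FileServer(http.Dir(dir)),
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		log.Fatal(err)
	}
	live := filepath.Join(home, ".secrets", "live", "local.bramp.net")

	srv := newServer(":8443", ".")
	log.Printf("serving the current directory on https://local.bramp.net%s/", srv.Addr)
	// fullchain.pem (leaf plus intermediates) is the certificate; privkey.pem is the key.
	log.Fatal(srv.ListenAndServeTLS(
		filepath.Join(live, "fullchain.pem"),
		filepath.Join(live, "privkey.pem"),
	))
}
```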
<h1 id="additional-reading">Additional Reading</h1>
<ul>
<li><a href="https://support.cloudflare.com/hc/en-us/articles/200167836-Where-do-I-find-my-Cloudflare-API-key-">Cloudflare - Managing API Tokens and Keys</a></li>
<li><a href="https://certbot.eff.org/lets-encrypt/osx-other">certbot instructions</a></li>
<li><a href="https://mangolassi.it/topic/18355/setup-letsencrypt-certbot-with-cloudflare-dns-authentication-ubuntu/2">Setup LetsEncrypt Certbot with Cloudflare DNS authentication (Ubuntu)</a></li>
<li><a href="https://matthewhoelter.com/2019/10/21/how-to-setup-https-on-your-local-development-environment-localhost-in-minutes.html">How to setup HTTPS (SSL) on your local development environment (localhost) in minutes</a></li>
<li><a href="https://certbot-dns-cloudflare.readthedocs.io/en/stable/">certbot-dns-cloudflare documentation</a></li>
</ul>
</description>
    </item>
    
    <item>
      <title>Apache Beam and Google Dataflow in Go</title>
      <link>https://blog.bramp.net/post/2019/01/05/apache-beam-and-google-dataflow-in-go/</link>
      <pubDate>Sat, 05 Jan 2019 07:59:08 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2019/01/05/apache-beam-and-google-dataflow-in-go/</guid>
      <description><p><em>Originally <a href="https://blog.gopheracademy.com/advent-2018/apache-beam/">published</a> as part of the Go Advent 2018 series</em></p>
<h1 id="overview">Overview</h1>
<p><a href="https://beam.apache.org/">Apache Beam</a> (<strong>b</strong>atch and str<strong>eam</strong>) is a powerful tool for handling <a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a> workloads. It is an evolution of <a href="https://ai.google/research/pubs/pub35650">Google’s Flume</a>, which provides batch and streaming data processing based on <a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce</a> concepts. One of the novel features of Beam is that it’s agnostic to the platform that runs the code. For example, a pipeline can be written once, and run locally, across <a href="https://flink.apache.org/">Flink</a> or <a href="https://spark.apache.org/">Spark</a> clusters, or on <a href="https://cloud.google.com/dataflow/">Google Cloud Dataflow</a>.</p>
<p>An experimental <a href="https://beam.apache.org/documentation/sdks/go/">Go SDK</a> was created for Beam, and while it is still immature compared to Beam for <a href="https://beam.apache.org/documentation/sdks/python/">Python</a> and <a href="https://beam.apache.org/documentation/sdks/java/">Java</a>, it is able to do some impressive things. The remainder of this article will briefly recap a simple example from the Apache Beam site, and then work through a more complex example running on Dataflow. Consider this a more advanced version of the <a href="https://beam.apache.org/get-started/">official getting started guide</a> on the Apache Beam site.</p>
<p>Before we begin, it’s worth pointing out that if your analysis can run on a single machine, doing so is likely to be faster and more cost effective. Beam is better suited to data processing needs that are large enough to require running in a distributed fashion.</p>
<h2 id="table-of-contents">Table of Contents</h2>
<ul>
<li><a href="#concepts">Concepts</a></li>
<li><a href="#shakespeare-simple-example">Shakespeare (simple example)</a>
<ul>
<li><a href="#running-the-pipeline">Running the pipeline</a></li>
</ul>
</li>
<li><a href="#art-history-more-complex-example">Art history (more complex example)</a>
<ul>
<li><a href="#stateful-functions">Stateful functions</a></li>
<li><a href="#iterating-over-a-cogbk">Iterating over a CoGBK</a></li>
<li><a href="#data-enrichment">Data enrichment</a></li>
<li><a href="#error-handling-and-dead-letters">Error handling and dead letters</a></li>
</ul>
</li>
<li><a href="#gotchas">Gotchas</a>
<ul>
<li><a href="#marshing">Marshing</a></li>
<li><a href="#errors">Errors</a></li>
<li><a href="#difference-between-direct-and-dataflow-runners">Difference between direct and dataflow runners</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
<h1 id="concepts">Concepts</h1>
<p>Beam already has good documentation that explains all the <a href="https://beam.apache.org/documentation/programming-guide/">main concepts</a>. We will cover some of the basics here.</p>
<figure><img src="/post/2019/01/05/apache-beam-and-google-dataflow-in-go/design-your-pipeline-linear.png" width="720" height="175"><figcaption>
      <h4>Pipeline stages</h4>
    </figcaption>
</figure>

<p>A pipeline is made up of multiple steps that take some input, operate on that data, and finally produce output. The steps that operate on the data are called PTransforms (parallel transforms), and the data is always stored in PCollections (parallel collections). A PTransform takes one element at a time from a PCollection and operates on it. PTransforms are assumed to be hermetic, using no global state, which ensures they always produce the same output for a given input. These properties allow the data to be sharded into multiple smaller datasets and processed in any order across multiple machines. The code you write ends up being very simple, yet it can be seamlessly split across hundreds of machines.</p>
<h1 id="shakespeare-simple-example">Shakespeare (simple example)</h1>
<div style="float: right; width: 200px">
	<img src="word-count.png" width=200 height=436>
</div>
<p>A classic example is counting the words in Shakespeare. In brief, the pipeline counts the number of times each word appears across Shakespeare’s works, and outputs a simple key-value list of word to word count. There is an <a href="https://github.com/apache/beam/blob/master/sdks/go/examples/minimal_wordcount/minimal_wordcount.go">example</a> provided with the Beam SDK, along with a great <a href="https://beam.apache.org/get-started/wordcount-example/">walkthrough</a>. I suggest you read that before continuing. I will, however, dive into some of the Go specifics and add some additional context.</p>
<p>The example begins with <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio#Read"><code>textio.Read</code></a>, which reads all the files under the shakespeare directory stored on <a href="https://cloud.google.com/storage/">Google Cloud Storage</a> (GCS). Because the files are stored on GCS, every machine in the cluster will have access to them when the pipeline runs. <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio#Read"><code>textio.Read</code></a> always returns a <code>PCollection&lt;string&gt;</code> which contains one element for every line in the given files.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">lines</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">textio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;gs://apache-beam-samples/shakespeare/*&#34;</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>The <code>lines</code> PCollection is then processed by a ParDo (<strong>Par</strong>allel <strong>Do</strong>), a type of PTransform. Most transforms are built with a <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a>, which executes a supplied function in parallel on the source PCollection. In this example, the function is defined inline and very simply splits the input lines into words with a regexp. Each word is then emitted to another <code>PCollection&lt;string&gt;</code> named <code>words</code>. Note how for every line, zero or more words may be emitted, making this new collection a different size from the original.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">splitFunc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">line</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">emit</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="kt">string</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">word</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">wordRE</span><span class="p">.</span><span class="nf">FindAllString</span><span class="p">(</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nf">emit</span><span class="p">(</span><span class="nx">word</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">words</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">splitFunc</span><span class="p">,</span><span class="w"> </span><span class="nx">lines</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>An interesting trick used by the Apache Beam Go API is passing functions as an <code>interface{}</code>, and using reflection to infer the types. Specifically, since <code>lines</code> is a <code>PCollection&lt;string&gt;</code>, the first argument of <code>splitFunc</code> is expected to be a string. The second argument to <code>splitFunc</code> allows Beam to infer the type of the <code>words</code> output PCollection. In this example it is a function with a single string argument, so the output type will be <code>PCollection&lt;string&gt;</code>. If <code>emit</code> were defined as <code>func(int)</code>, the output would be a <code>PCollection&lt;int&gt;</code>, and the next PTransform would be expected to handle ints.</p>
<p>The next step uses one of the library’s higher-level constructs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">counted</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stats</span><span class="p">.</span><span class="nf">Count</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">words</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p><a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/transforms/stats#Count"><code>stats.Count</code></a> takes a <code>PCollection&lt;X&gt;</code>, counts each unique element, and outputs a key-value pair of (X, int) for each, as a <code>PCollection&lt;KV&lt;X, int&gt;&gt;</code>. In this specific example, the input is a <code>PCollection&lt;string&gt;</code>, so the output is a <code>PCollection&lt;KV&lt;string, int&gt;&gt;</code>.</p>
<p>Internally, <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/transforms/stats#Count"><code>stats.Count</code></a> is made up of multiple ParDos and a <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey"><code>beam.GroupByKey</code></a>, but it hides that detail to make it easier to use.</p>
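<p>What <code>stats.Count</code> computes can be sketched in plain Go as three conceptual steps: pair each element with a count of 1, group the pairs by key, then sum each group. This is a single-machine illustration only, with a hypothetical <code>KV</code> type standing in for Beam&rsquo;s key-value pairs; in Beam the grouping happens across workers.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// KV is a hypothetical stand-in for a Beam key-value pair.
type KV struct {
	Key   string
	Value int
}

func count(words []string) []KV {
	// Step 1 (like a ParDo): map each element to (element, 1).
	pairs := make([]KV, 0, len(words))
	for _, w := range words {
		pairs = append(pairs, KV{w, 1})
	}

	// Step 2 (like a GroupByKey): collect the values for each key.
	groups := map[string][]int{}
	for _, p := range pairs {
		groups[p.Key] = append(groups[p.Key], p.Value)
	}

	// Step 3 (like another ParDo): sum each group, sorted for stable output.
	out := make([]KV, 0, len(groups))
	for k, vs := range groups {
		sum := 0
		for _, v := range vs {
			sum += v
		}
		out = append(out, KV{k, sum})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	fmt.Println(count([]string{"to", "be", "or", "not", "to", "be"}))
	// prints [{be 2} {not 1} {or 1} {to 2}]
}
```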
<p>At this point, the count of each word has been calculated, and the results are written to a simple text file. To do this, the <code>PCollection&lt;KV&lt;string, int&gt;&gt;</code> is converted to a <code>PCollection&lt;string&gt;</code>, containing one element for each line to be written out.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">formatFunc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;%s: %v&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">formatted</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">formatFunc</span><span class="p">,</span><span class="w"> </span><span class="nx">counted</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>Again a <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a> is used, but you’ll notice the <code>formatFunc</code> is slightly different from the <code>splitFunc</code> above. The <code>formatFunc</code> takes two arguments, a string (the key) and an int (the value). These are the pairs in the <code>PCollection&lt;KV&lt;string, int&gt;&gt;</code>. However, the <code>formatFunc</code> does not take an <code>emit func(...)</code>; instead it simply returns a string.</p>
<p>Since the PTransform outputs a single line for each input element, a simpler form of the function can be specified, one where the output element is just returned from the function. The <code>emit func(...)</code> form is useful when the number of output elements differs from the number of input elements. If it&rsquo;s a 1:1 mapping, a return makes the function easier to read. As above, this is all inferred at runtime with reflection when the pipeline is being constructed.</p>
<p>Multiple return values can also be used. For example, if the output was expected to be <code>PCollection&lt;KV&lt;float64, bool&gt;&gt;</code>, the return type could be <code>func(...) (float64, bool)</code>.</p>
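<p>One way to think about the simpler form is that a 1:1 function is shorthand for an emit-style function that always emits exactly once. The following plain Go sketch makes that equivalence explicit with a hypothetical <code>asEmitter</code> adapter; it is an illustration, not how Beam wires functions internally.</p>

```go
package main

import "fmt"

func formatFunc(w string, c int) string {
	return fmt.Sprintf("%s: %v", w, c)
}

// asEmitter converts a 1:1 function into the emit-style shape, emitting
// exactly one output per input. Hypothetical helper for illustration.
func asEmitter(fn func(string, int) string) func(string, int, func(string)) {
	return func(w string, c int, emit func(string)) {
		emit(fn(w, c))
	}
}

func main() {
	var out []string
	collect := func(s string) { out = append(out, s) }

	emitFormat := asEmitter(formatFunc)
	emitFormat("be", 2, collect)
	emitFormat("to", 2, collect)
	fmt.Println(out) // prints [be: 2 to: 2]
}
```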
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">textio</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;wordcounts.txt&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">formatted</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>Finally <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio#Write"><code>textio.Write</code></a> takes the formatted <code>PCollection&lt;string&gt;</code> and writes it to a file named “wordcounts.txt” with one line per element.</p>
<h2 id="running-the-pipeline">Running the pipeline</h2>
<p>To test the pipeline it can easily be run locally like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">go get github.com/apache/beam/sdks/go/examples/wordcount
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> <span class="nv">$GOPATH</span>/src/github.com/apache/beam/sdks/go/examples/wordcount
</span></span><span class="line"><span class="cl">go run wordcount.go --runner<span class="o">=</span>direct
</span></span></code></pre></div><p>To run in a more realistic way, it can be run on <a href="https://cloud.google.com/dataflow/">GCP Dataflow</a>. Before you do so, you need to create a GCP project, create a GCS bucket, enable the Cloud Dataflow APIs, and create a service account. This is documented on the <a href="https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python">Python quickstart guide</a>, under “Before you begin”.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">GOOGLE_APPLICATION_CREDENTIALS</span><span class="o">=</span><span class="nv">$PWD</span>/your-gcp-project.json
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">BUCKET</span><span class="o">=</span>your-gcs-bucket
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">PROJECT</span><span class="o">=</span>your-gcp-project
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> <span class="nv">$GOPATH</span>/src/github.com/apache/beam/sdks/go/examples/wordcount
</span></span><span class="line"><span class="cl">go run wordcount.go <span class="se">\
</span></span></span><span class="line"><span class="cl">    --runner dataflow <span class="se">\
</span></span></span><span class="line"><span class="cl">    --input gs://dataflow-samples/shakespeare/kinglear.txt <span class="se">\
</span></span></span><span class="line"><span class="cl">    --output gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/counts <span class="se">\
</span></span></span><span class="line"><span class="cl">    --project <span class="si">${</span><span class="nv">PROJECT</span><span class="p">?</span><span class="si">}</span> <span class="se">\
</span></span></span><span class="line"><span class="cl">    --temp_location gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/tmp/ <span class="se">\
</span></span></span><span class="line"><span class="cl">    --staging_location gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/binaries/ <span class="se">\
</span></span></span><span class="line"><span class="cl">    --worker_harness_container_image<span class="o">=</span>apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
</span></span></code></pre></div><p>If this works correctly you’ll see something similar to the following printed:</p>
<pre tabindex="0"><code>Cross-compiling .../wordcount.go as .../worker-1-1544590905654809000
Staging worker binary:  .../worker-1-1544590905654809000
Submitted job: 2018-12-11_21_02_29
Console: https://console.cloud.google.com/dataflow/job/2018-12-11...
Logs: https://console.cloud.google.com/logs/viewer?job_id%2F2018-12-11...
Job state: JOB_STATE_PENDING …
Job still running …
Job still running …
...
Job succeeded!
</code></pre><p>Let&rsquo;s take a moment to explain what’s going on, starting with the various flags. The <code>--runner dataflow</code> flag tells the Apache Beam SDK to run this on GCP Dataflow, including executing all the steps required to make that happen. This includes cross-compiling the code and uploading it to the <code>--staging_location</code>. Later the staged binary will be run by Dataflow under the <code>--project</code> project. As this will be running “in the cloud”, the pipeline will not be able to access local files. Thus both the <code>--input</code> and <code>--output</code> flags are set to paths on GCS, as this is a convenient place to store files. Finally, the <code>--worker_harness_container_image</code> flag specifies the Docker image that Dataflow will use to host the wordcount.go binary that was uploaded to the <code>--staging_location</code>.</p>
<p>Once wordcount.go is running, it prints out helpful information, such as links to the Dataflow console. The console displays current progress as well as a visualization of the pipeline as a directed graph. The local wordcount.go continues to run only to display status updates. It can be interrupted at any time, but the pipeline will continue to run on Dataflow until it either succeeds or fails. Once that occurs, the logs link can provide useful information.</p>
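<p>In Go Beam programs, pipeline-specific flags such as <code>--input</code> and <code>--output</code> are typically declared with Go&rsquo;s standard <code>flag</code> package. Since Dataflow workers cannot see the submitting machine&rsquo;s disk, a program could validate those paths up front. The sketch below is hypothetical: <code>config</code> and <code>parseFlags</code> are illustrative names, the defaults are assumptions, and because it only defines three flags it would, unlike the real example, reject the other flags used in the command line above.</p>

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// config holds the flags this sketch cares about.
type config struct {
	runner, input, output string
}

// parseFlags parses args with its own FlagSet so it can be tested in isolation.
// Go's flag package accepts both "-name value" and "--name value".
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("wordcount", flag.ContinueOnError)
	fs.StringVar(&c.runner, "runner", "direct", "pipeline runner")
	fs.StringVar(&c.input, "input", "", "input file pattern")
	fs.StringVar(&c.output, "output", "", "output file prefix")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Dataflow workers cannot read the submitting machine's local files.
	if c.runner == "dataflow" {
		for _, p := range []string{c.input, c.output} {
			if !strings.HasPrefix(p, "gs://") {
				return c, fmt.Errorf("with --runner dataflow, %q must be a gs:// path", p)
			}
		}
	}
	return c, nil
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```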
<h1 id="art-history-more-complex-example">Art history (more complex example)</h1>
<div style="float: right; width: 300px">
	<img src="palette.png" width=300 height=411>
</div>
<p>Now we’ll construct a more complex pipeline that demonstrates some other features of Beam and Dataflow. In this pipeline we will take 100,000 paintings from the last 600 years and process them to extract information about their color palettes. Specifically, the question we aim to answer is, “Have the color palettes of paintings changed over the decades?” This may not be a pipeline we run repeatedly, but it was a fun example, and it demonstrates many advanced topics.</p>
<p>We will skip over the details of the color extraction algorithm, and cover that in a later article. Here we’ll focus on how to create a pipeline to accomplish this task.</p>
<p>We start by reading a CSV file that contains metadata for each painting, such as the artist, the year it was painted, and a GCS path to a JPEG of the painting. The paintings will then be grouped by the decade they were painted in, and the color palette for each group will be determined. Each palette will be saved to a PNG file (DrawColorPalette), and all the palettes will also be saved to a single large JSON file (WriteIndex). To finish it off, the pipeline will be productionised, so it is easier to debug and re-run. The full source code is <a href="https://github.com/bramp/dataflow-art">available here</a>.</p>
<p>To start with, the main function for the pipeline looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/apache/beam/sdks/go/pkg/beam&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// If beamx or Go flags are used, flags must be parsed first.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">flag</span><span class="p">.</span><span class="nf">Parse</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// beam.Init() is an initialization hook that must be called on startup. On</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// distributed runners, it is used to intercept control.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">beam</span><span class="p">.</span><span class="nf">Init</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">NewPipeline</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Root</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">buildPipeline</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beamx</span><span class="p">.</span><span class="nf">Run</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">p</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">log</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;Failed to execute job: %v&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>That is the standard boilerplate for a Beam pipeline: it parses the flags, initialises Beam, delegates pipeline construction to the <code>buildPipeline</code> function, and finally runs the pipeline.</p>
<p>The interesting code begins in the <code>buildPipeline</code> function, which constructs the pipeline by passing PCollections from one function to the next, building up the tree shown in the diagram above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">buildPipeline</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// nothing -&gt; PCollection&lt;Painting&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintings</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">csvio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">(</span><span class="nx">Painting</span><span class="p">{}))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;Painting&gt; -&gt; PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintingsByGroup</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">GroupByDecade</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">paintings</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;CoGBK&lt;string, Painting&gt;&gt; -&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">//   (PCollection&lt;KV&lt;string, Histogram&gt;&gt;, PCollection&lt;KV&lt;string, string&gt;&gt;)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">histograms</span><span class="p">,</span><span class="w"> </span><span class="nx">errors1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">ExtractHistogram</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">paintingsByGroup</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Calculate the color palette for the combined histograms.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, Histogram&gt;&gt; -&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">//   (PCollection&lt;KV&lt;string, []color.RGBA&gt;&gt;, PCollection&lt;KV&lt;string, string&gt;&gt;)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">palettes</span><span class="p">,</span><span class="w"> </span><span class="nx">errors2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">CalculateColorPalette</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">histograms</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, []color.RGBA&gt;&gt; -&gt; PCollection&lt;KV&lt;string, string&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">errors3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">DrawColorPalette</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">outputPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">palettes</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, []color.RGBA&gt;&gt; -&gt; nothing</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">WriteIndex</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">Join</span><span class="p">(</span><span class="o">*</span><span class="nx">outputPrefix</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;index.json&#34;</span><span class="p">),</span><span class="w"> </span><span class="nx">palettes</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, string&gt;&gt; -&gt; nothing</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">WriteErrorLog</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;errors.log&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">errors1</span><span class="p">,</span><span class="w"> </span><span class="nx">errors2</span><span class="p">,</span><span class="w"> </span><span class="nx">errors3</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>To make it easy to follow, each step is its own function, annotated with a comment describing the kind of PCollection it accepts and returns. Let&rsquo;s highlight some of the more interesting steps.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">var</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">flag</span><span class="p">.</span><span class="nf">String</span><span class="p">(</span><span class="s">&#34;index&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;art.csv&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;Index of the art.&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// Painting represents a single painting in the dataset.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">Painting</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Artist</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;artist&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Title</span><span class="w">  </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;title&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Date</span><span class="w">   </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;date&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Genre</span><span class="w">  </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;genre&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Style</span><span class="w">  </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;style&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Filename</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;new_filename&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">buildPipeline</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// nothing -&gt; PCollection&lt;Painting&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintings</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">csvio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">(</span><span class="nx">Painting</span><span class="p">{}))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span></code></pre></div><p>The very first step uses <a href="https://godoc.org/github.com/bramp/morebeam/csvio#Read"><code>csvio.Read</code></a> to read the CSV file specified by the <code>--index</code> flag, and returns a PCollection of Painting structs. In all the examples we’ve seen so far, the PCollections contained only basic types, e.g. strings and ints. More complex types, such as slices and structs, are also allowed (but not maps or interfaces), which makes it easier to pass rich information between PTransforms. The only caveat is that the type must be JSON-serialisable. This is because in a distributed pipeline the PTransforms could be processed on different machines, so the elements of a PCollection need to be marshalled to be passed between them.</p>
<p>For Beam to successfully unmarshal your data, the types must also be registered. This is typically done within the <code>init()</code> function, by calling <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#RegisterType"><code>beam.RegisterType</code></a>.</p>
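<p>To make “JSON-serialisable” concrete, here is a minimal sketch using only the standard library’s <code>encoding/json</code>. It simplifies what happens when elements move between machines and is not Beam’s own encoding code. The <code>roundTrip</code> helper and the <code>withChannel</code> type are hypothetical. A struct of plain fields survives the trip unchanged. A field the encoder cannot represent, such as a channel, produces an error instead.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"encoding/json"
	"fmt"
)

// Painting is a trimmed copy of the post's struct. encoding/json ignores the
// csv tags and uses the Go field names as JSON keys.
type Painting struct {
	Artist   string `csv:"artist"`
	Title    string `csv:"title"`
	Date     string `csv:"date"`
	Filename string `csv:"new_filename"`
}

// withChannel is a hypothetical type that encoding/json cannot represent.
type withChannel struct {
	Done chan bool
}

// roundTrip encodes v as JSON and decodes the bytes into out, a simplified
// model of an element being shipped from one worker to another.
func roundTrip(v, out interface{}) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	return json.Unmarshal(b, out)
}

func main() {
	p := Painting{Artist: "Artist A", Title: "Untitled", Date: "1889", Filename: "a.jpg"}
	got := new(Painting)
	err := roundTrip(p, got)
	fmt.Println(err == nil, *got == p) // true true

	err = roundTrip(withChannel{Done: make(chan bool)}, new(withChannel))
	fmt.Println(err) // json: unsupported type: chan bool
}
</code></pre></div>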
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">beam</span><span class="p">.</span><span class="nf">RegisterType</span><span class="p">(</span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">(</span><span class="nx">Painting</span><span class="p">{}))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>If you forget to register the type, an error will occur at runtime, for example:</p>
<pre tabindex="0"><code>java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -224: execute failed: panic: reflect: Call using main.Painting as type struct { Artist string; Title string; ... } goroutine 70 [running]:
</code></pre><p>This can be a little frustrating: the local <code>direct</code> runner does not marshal your data, so errors like this aren’t exposed until the pipeline runs on Dataflow.</p>
<p>Now that we have a collection of Paintings, we group them by decade:</p>
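<p>Why does registration matter? One way to picture it is a table from type names to <code>reflect.Type</code> values. This is a conceptual sketch, not Beam’s actual implementation. The receiving side has only a name and some bytes, so it needs the table to know which concrete type to allocate and fill in. The <code>registry</code>, <code>registerType</code> and <code>decode</code> names below are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Painting is a trimmed-down version of the post's struct.
type Painting struct {
	Artist string
	Title  string
}

// registry is a hypothetical stand-in for a runtime type registry: it maps
// a type's name to its reflect.Type so values can be rebuilt from bytes.
var registry = map[string]reflect.Type{}

// registerType records t under its name, e.g. "main.Painting".
func registerType(t reflect.Type) {
	registry[t.String()] = t
}

func init() {
	registerType(reflect.TypeOf(Painting{}))
}

// decode rebuilds a value of the named type from JSON. It fails for any type
// that was never registered, because it cannot know what to allocate.
func decode(typeName string, data []byte) (interface{}, error) {
	t, ok := registry[typeName]
	if !ok {
		return nil, fmt.Errorf("type %q is not registered", typeName)
	}
	ptr := reflect.New(t) // a *T, ready to be filled in
	if err := json.Unmarshal(data, ptr.Interface()); err != nil {
		return nil, err
	}
	return ptr.Elem().Interface(), nil
}

func main() {
	v, err := decode("main.Painting", []byte(`{"Artist":"a","Title":"t"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", v) // {Artist:a Title:t}

	_, err = decode("main.Unknown", []byte(`{}`))
	fmt.Println(err) // type "main.Unknown" is not registered
}
</code></pre></div>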
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// GroupByDecade takes a PCollection&lt;Painting&gt; and returns a </span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// PCollection&lt;CoGBK&lt;string, Painting&gt;&gt; of the paintings grouped by decade.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">GroupByDecade</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">,</span><span class="w"> </span><span class="nx">paintings</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="s">&#34;GroupBy Decade&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;Painting&gt; -&gt; PCollection&lt;KV&lt;string, Painting&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintingsWithKey</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">AddKey</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">art</span><span class="w"> </span><span class="nx">Painting</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="nx">art</span><span class="p">.</span><span class="nf">Decade</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">},</span><span class="w"> </span><span class="nx">paintings</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, Painting&gt;&gt; -&gt; PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">GroupByKey</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">paintingsWithKey</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>The first line in this function, <code>s.Scope(&quot;GroupBy Decade&quot;)</code>, names this step and groups its sub-steps together. For example, in the diagram above “GroupBy Decade” is a single step, which can be expanded to show the <a href="https://godoc.org/github.com/bramp/morebeam#AddKey"><code>AddKey</code></a> and <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey"><code>GroupByKey</code></a> steps.</p>
<p><code>GroupByDecade</code> returns a <code>PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</code>. CoGBK is short for <strong>Co</strong><strong>G</strong>roup<strong>B</strong>y<strong>K</strong>ey. It is a special collection where (as you’ll see later) each element is a key paired with an iterable collection of values. The key in this case is the decade in which the painting was painted. The <code>PCollection&lt;Painting&gt;</code> is first transformed into a <code>PCollection&lt;KV&lt;string, Painting&gt;&gt;</code> by the <a href="https://godoc.org/github.com/bramp/morebeam#AddKey"><code>morebeam.AddKey</code></a> step, which adds a key to each value. <code>GroupByKey</code> then uses that key to produce the final PCollection.</p>
<p>Next up is <code>ExtractHistogram</code>, which takes the <code>PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</code> and returns two PCollections. The first is a <code>PCollection&lt;KV&lt;string, Histogram&gt;&gt;</code>, containing a <a href="https://en.wikipedia.org/wiki/Color_histogram">color histogram</a> for each decade of paintings. The second is related to error handling, and will be explained later.</p>
<p>The <code>ExtractHistogram</code> function demonstrates three new concepts: “Stateful functions”, “Data enrichment”, and “Error handling”.</p>
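<p>Outside of Beam, the <code>AddKey</code> plus <code>GroupByKey</code> combination can be pictured as the in-memory sketch below. It is only an analogy. <code>addKey</code>, <code>groupByKey</code> and <code>KV</code> are hypothetical stand-ins. The post does not show <code>Painting.Decade</code>, so the version here is an assumption that only understands plain four-digit years.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

type Painting struct {
	Title string
	Date  string
}

// Decade is a hypothetical implementation (the post does not show the real
// one): it maps a plain four-digit year such as "1889" to "1880s".
func (p Painting) Decade() string {
	year, err := strconv.Atoi(strings.TrimSpace(p.Date))
	if err != nil {
		return "unknown"
	}
	return strconv.Itoa(year/10*10) + "s"
}

// KV mirrors the shape of the elements AddKey produces.
type KV struct {
	Key   string
	Value Painting
}

// addKey is an in-memory analogue of morebeam.AddKey.
func addKey(keyFn func(Painting) string, in []Painting) []KV {
	out := make([]KV, 0, len(in))
	for _, p := range in {
		out = append(out, KV{Key: keyFn(p), Value: p})
	}
	return out
}

// groupByKey is an in-memory analogue of beam.GroupByKey: each key ends up
// with every value that shared it.
func groupByKey(in []KV) map[string][]Painting {
	groups := map[string][]Painting{}
	for _, kv := range in {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	return groups
}

func main() {
	paintings := []Painting{
		{Title: "A", Date: "1889"},
		{Title: "B", Date: "1885"},
		{Title: "C", Date: "1907"},
	}
	groups := groupByKey(addKey(Painting.Decade, paintings))

	// Map order is random, so sort the keys for stable output.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, len(groups[k]))
	}
	// 1880s 2
	// 1900s 1
}
</code></pre></div>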
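<p>As a rough illustration of what a color histogram is, the sketch below counts pixels into coarse RGB buckets using the standard <code>image</code> packages. The post does not show its <code>Histogram</code> type, so the 4-bins-per-channel layout and the <code>bucket</code>, <code>extract</code> and <code>add</code> helpers are assumptions. Because histograms are just counts, the histograms of all the paintings in a decade can be combined by adding them bucket by bucket.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"image"
	"image/color"
)

// bins is an assumed resolution: each 8-bit channel is split into 4 ranges,
// giving 4*4*4 = 64 buckets.
const bins = 4

// Histogram counts pixels per quantised RGB bucket (hypothetical layout).
type Histogram [bins * bins * bins]int

// bucket maps an 8-bit RGB colour to one of the 64 buckets.
func bucket(r, g, b uint8) int {
	ri, gi, bi := int(r)*bins/256, int(g)*bins/256, int(b)*bins/256
	return (ri*bins+gi)*bins + bi
}

// extract builds the histogram of img by visiting every pixel once.
func extract(img image.Image) Histogram {
	var h Histogram
	bounds := img.Bounds()
	for y := bounds.Min.Y; y != bounds.Max.Y; y++ {
		for x := bounds.Min.X; x != bounds.Max.X; x++ {
			c := color.RGBAModel.Convert(img.At(x, y)).(color.RGBA)
			h[bucket(c.R, c.G, c.B)]++
		}
	}
	return h
}

// add returns the bucket-by-bucket sum of two histograms.
func (h Histogram) add(o Histogram) Histogram {
	for i := range h {
		h[i] += o[i]
	}
	return h
}

func main() {
	img := image.NewRGBA(image.Rect(0, 0, 2, 1))
	img.Set(0, 0, color.RGBA{R: 255, A: 255}) // red
	img.Set(1, 0, color.RGBA{B: 255, A: 255}) // blue
	h := extract(img).add(extract(img))
	fmt.Println(h[bucket(255, 0, 0)], h[bucket(0, 0, 255)]) // 2 2
}
</code></pre></div>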
<h2 id="stateful-functions">Stateful functions</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">var</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">artPrefix</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">flag</span><span class="p">.</span><span class="nf">String</span><span class="p">(</span><span class="s">&#34;art&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;gs://mybucket/art&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;Path to where the art is kept.&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">beam</span><span class="p">.</span><span class="nf">RegisterType</span><span class="p">(</span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">((</span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)(</span><span class="kc">nil</span><span class="p">)).</span><span class="nf">Elem</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">extractHistogramFn</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ArtPrefix</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`json:&#34;art_prefix&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">fs</span><span class="w"> </span><span class="nx">filesystem</span><span class="p">.</span><span class="nx">Interface</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// ExtractHistogram calculates the color histograms for all the Paintings in</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// the CoGBK.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">ExtractHistogram</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">,</span><span class="w"> </span><span class="nx">files</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">,</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="s">&#34;ExtractHistogram&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo2</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">extractHistogramFn</span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">ArtPrefix</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">artPrefix</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">},</span><span class="w"> </span><span class="nx">files</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Instead of passing a simple function to <code>beam.ParDo2</code>, we pass a struct containing two fields. The exported field, <code>ArtPrefix</code>, is the path where the painting JPEGs are stored, and the unexported field, <code>fs</code>, is a filesystem client for reading them.</p>
<p>When the pipeline runs on remote workers, global variables (including command-line flag variables) do not carry over the values they had on the machine that launched it. For example, we may start this pipeline like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">go run main.go <span class="se">\
</span></span></span><span class="line"><span class="cl">  --art gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/art/ <span class="se">\
</span></span></span><span class="line"><span class="cl">  --runner dataflow <span class="se">\
</span></span></span><span class="line"><span class="cl">  ...
</span></span></code></pre></div><p>When the code actually runs on the Dataflow workers, the <code>--art</code> flag is not specified, so <code>*artPrefix</code> holds its default value. To pass the value to the Dataflow workers, it must be part of the DoFn struct that is passed to <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a>. So in this example, we create an <code>extractHistogramFn</code> struct with the exported <code>ArtPrefix</code> field set to the value of the <code>--art</code> flag. This <code>extractHistogramFn</code> is then marshalled and passed to the workers. As with the PCollection element types, <code>extractHistogramFn</code> must also be registered with Beam during <code>init</code>.</p>
<p>When the pipeline executes this step, it calls the <code>extractHistogramFn</code>’s <code>ProcessElement</code> method. This method works in the same way as a simple DoFn function: its arguments and return value are inspected via reflection at runtime and mapped to the PCollections being consumed and produced.</p>
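<p>The effect of marshalling the DoFn can be reproduced with <code>encoding/json</code> alone. This sketch is a simplification, not Beam’s own serialisation code. The hypothetical <code>shipToWorker</code> encodes the struct on the “launcher” and decodes it on the “worker”. The exported <code>ArtPrefix</code> survives, while the unexported <code>fs</code> field comes out <code>nil</code>, so a client like that has to be re-created on the worker. Beam’s Go SDK lets a DoFn define a <code>Setup</code> method for this kind of initialisation, although the excerpt above does not show how <code>fs</code> is set.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"encoding/json"
	"fmt"
)

// fakeClient is a hypothetical stand-in for the post's filesystem client.
type fakeClient struct{}

// extractHistogramFn has the same two kinds of field as the post's DoFn.
type extractHistogramFn struct {
	ArtPrefix string `json:"art_prefix"`

	fs *fakeClient
}

// shipToWorker models the DoFn being encoded on the launching machine and
// decoded on a worker. It returns the decoded copy and the encoded bytes.
func shipToWorker(fn extractHistogramFn) (*extractHistogramFn, []byte, error) {
	wire, err := json.Marshal(fn)
	if err != nil {
		return nil, nil, err
	}
	worker := new(extractHistogramFn)
	if err := json.Unmarshal(wire, worker); err != nil {
		return nil, nil, err
	}
	return worker, wire, nil
}

func main() {
	launcher := extractHistogramFn{ArtPrefix: "gs://mybucket/art", fs: new(fakeClient)}
	worker, wire, err := shipToWorker(launcher)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(wire))                       // {"art_prefix":"gs://mybucket/art"}
	fmt.Println(worker.ArtPrefix, worker.fs == nil) // gs://mybucket/art true
}
</code></pre></div>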
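<p>To see how those arguments line up, here is a hypothetical in-memory harness. It is not how Beam’s runner is implemented. <code>iterate</code> turns a slice into the <code>func(*Painting) bool</code> iterator used for the CoGBK values, and a closure stands in for the <code>errors</code> emitter. The <code>HistogramResult</code> here is a placeholder that merely counts paintings. The loop reuses a single <code>Painting</code>, which each call to <code>values</code> overwrites with the next element until the iterator returns false.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"context"
	"fmt"
)

type Painting struct {
	Title    string
	Filename string
}

// HistogramResult is a hypothetical placeholder that just counts paintings.
type HistogramResult struct {
	Key   string
	Count int
}

type extractHistogramFn struct {
	ArtPrefix string
}

// ProcessElement has the same signature shape as the post's method; the body
// is a placeholder that counts paintings and reports those without a file.
func (fn *extractHistogramFn) ProcessElement(ctx context.Context,
	key string, values func(*Painting) bool,
	errors func(string, string)) HistogramResult {

	result := HistogramResult{Key: key}
	art := new(Painting)
	for values(art) {
		if art.Filename == "" {
			errors(key, "missing filename for "+art.Title)
			continue
		}
		result.Count++
	}
	return result
}

// iterate builds an iterator of the kind used for CoGBK values: each call
// fills *p with the next element and reports whether one was available.
func iterate(ps []Painting) func(*Painting) bool {
	i := 0
	return func(p *Painting) bool {
		if i == len(ps) {
			return false
		}
		*p = ps[i]
		i++
		return true
	}
}

func main() {
	var errs []string
	emit := func(key, msg string) { errs = append(errs, key+": "+msg) }

	fn := extractHistogramFn{ArtPrefix: "gs://mybucket/art"}
	got := fn.ProcessElement(context.Background(), "1880s",
		iterate([]Painting{{Title: "A", Filename: "a.jpg"}, {Title: "B"}}), emit)
	fmt.Println(got.Count, errs) // 1 [1880s: missing filename for B]
}
</code></pre></div>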
<h2 id="iterating-over-a-cogbk">Iterating over a CoGBK</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">ProcessElement</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">key</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="o">*</span><span class="nx">Painting</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">errors</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">))</span><span class="w"> </span><span class="nx">HistogramResult</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">log</span><span class="p">.</span><span class="nf">Infof</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;%q: ExtractHistogram started&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">var</span><span class="w"> </span><span class="nx">art</span><span class="w"> </span><span class="nx">Painting</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">for</span><span class="w"> </span><span class="nf">values</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">art</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">Join</span><span class="p">(</span><span class="nx">fn</span><span class="p">.</span><span class="nx">ArtPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">art</span><span class="p">.</span><span class="nx">Filename</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">h</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nf">extractHistogram</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="err">…</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">result</span><span class="p">.</span><span class="nx">Histogram</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">result</span><span class="p">.</span><span class="nx">Histogram</span><span class="p">.</span><span class="nf">Combine</span><span class="p">(</span><span class="nx">h</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p><code>ProcessElement</code> is called once for every unique group in the <code>PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</code>. The <code>key string</code> argument is the key for that group, and <code>values func(*Painting) bool</code> is used to iterate over all values within the group. The contract is that <code>values</code> is passed a pointer to a <code>Painting</code> struct, which it populates with the next element on each call. While there are more paintings in the group, <code>values</code> fills in the struct and returns true; once it returns false, the group has been fully processed. This iterator pattern is specific to grouped inputs such as <code>CoGBK</code>, and makes it convenient to apply an operation to every element in the group.</p>
<p>In this case, <code>extractHistogram</code> is called for each Painting; it fetches a JPEG of the artwork and extracts a <a href="https://en.wikipedia.org/wiki/Color_histogram">histogram of colors</a>. The histograms from all paintings in the group are combined, and finally one result per group is returned.</p>
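<p>The <code>func(*Painting) bool</code> contract can be illustrated without Beam at all. In this minimal sketch, the iterator is built from an in-memory slice by a hypothetical <code>sliceIter</code> helper (in a real pipeline the runner supplies it), and <code>Painting</code> is a stand-in with only a <code>Filename</code> field. The hypothetical <code>filenames</code> function drains the iterator the same way <code>ProcessElement</code> does above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

// Painting is a stand-in for the pipeline's element type.
type Painting struct {
	Filename string
}

// sliceIter returns an iterator with the same contract as a grouped-value
// iterator: each call copies the next element into *dst and returns true,
// or returns false once every element has been consumed.
func sliceIter(items []Painting) func(*Painting) bool {
	i := 0
	return func(dst *Painting) bool {
		if i == len(items) {
			return false
		}
		*dst = items[i]
		i++
		return true
	}
}

// filenames drains an iterator the same way ProcessElement does above,
// reusing a single Painting value for every element.
func filenames(values func(*Painting) bool) []string {
	var names []string
	art := new(Painting)
	for values(art) {
		names = append(names, art.Filename)
	}
	return names
}
</code></pre></div><p>Because the same <code>Painting</code> is overwritten on every call, anything that must outlive the current iteration should be copied out, as <code>filenames</code> does with each (immutable) string.</p>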
<h2 id="data-enrichment">Data enrichment</h2>
<p>Reading the paintings from an external service (such as <a href="https://cloud.google.com/storage/">GCS</a>) demonstrates a data enrichment step, where an external service is used to “enrich” the dataset the pipeline is processing. You could imagine a user service being called when processing log entries, or a product service when processing purchases. Note that any external action should be <a href="https://en.wikipedia.org/wiki/Idempotence">idempotent</a>. If a worker fails, the same element may be retried, and thus processed multiple times. Dataflow keeps track of failures and ensures each element is counted only once in the final result, but side effects on external systems can still happen more than once.</p>
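<p>One common way to make such a side effect idempotent is to derive the output name only from the element's key and to write it atomically, so that a retry replaces the earlier attempt rather than adding a duplicate or leaving a partial file. This sketch assumes a local filesystem and a hypothetical <code>writeIdempotent</code> helper with a made-up <code>.png</code> naming scheme; the same idea applies to object stores, where writing to an existing object name replaces the old object.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"os"
	"path/filepath"
)

// writeIdempotent stores data under a name derived only from key, so
// processing the same element twice still leaves a single, complete file.
// It writes to a temporary file first and renames it into place, so a
// crash mid-write never leaves a truncated output behind.
func writeIdempotent(dir, key string, data []byte) error {
	tmp, err := os.CreateTemp(dir, key+".*.tmp")
	if err != nil {
		return err
	}
	// After a successful rename this Remove fails harmlessly; on any
	// earlier error it cleans up the temporary file.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// Rename replaces any output left by a previous attempt.
	return os.Rename(tmp.Name(), filepath.Join(dir, key+".png"))
}
</code></pre></div><p>The key is used directly in the file name, so it must not contain path separators; <code>os.CreateTemp</code> returns an error if it does.</p>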
<p>When calling a remote service, some kind of client is typically needed to make the request. This pipeline reads the images from GCS, so it is useful to set up a GCS client once at startup. Since we are using a struct-based DoFn, we can define some additional lifecycle methods for this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">Setup</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">fn</span><span class="p">.</span><span class="nx">fs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">filesystem</span><span class="p">.</span><span class="nf">New</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">ArtPrefix</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;filesystem.New(%q) failed: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">ArtPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">Teardown</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">fs</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>When the DoFn is initialized on the worker, the <code>Setup</code> method is called. Here a new <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/filesystem">Filesystem client</a> is created and stored in the struct’s <code>fs</code> field. Later, when the DoFn is no longer needed, the <code>Teardown</code> method is called, giving us the opportunity to clean up the client. As with all things distributed, don’t rely on <code>Teardown</code> ever being called.</p>
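<p>The split between configuration that travels with the DoFn and a client built on the worker can be sketched with the standard library. This assumes, as with struct DoFns in the Beam Go SDK, that only exported fields are serialized (encoded as JSON) and shipped to workers, so an unexported client field has to be recreated in <code>Setup</code>. <code>fakeClient</code>, <code>histogramFn</code>, and <code>serialized</code> are hypothetical stand-ins, not Beam or GCS APIs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"context"
	"encoding/json"
	"errors"
)

// fakeClient stands in for a remote-service client, such as the Beam
// filesystem used above.
type fakeClient struct {
	prefix string
	closed bool
}

// Close releases the client's resources.
func (c *fakeClient) Close() error {
	c.closed = true
	return nil
}

// histogramFn mirrors the shape of extractHistogramFn.
type histogramFn struct {
	ArtPrefix string      // exported: serialized and shipped to each worker
	fs        *fakeClient // unexported: never serialized, so built in Setup
}

// Setup runs on the worker before any elements are processed.
func (fn *histogramFn) Setup(ctx context.Context) error {
	if fn.ArtPrefix == "" {
		return errors.New("ArtPrefix must be set")
	}
	fn.fs = new(fakeClient)
	fn.fs.prefix = fn.ArtPrefix
	return nil
}

// Teardown is best effort and may never run, so nothing must depend on it.
// The nil check makes it safe to call even if Setup did not succeed.
func (fn *histogramFn) Teardown() error {
	if fn.fs == nil {
		return nil
	}
	return fn.fs.Close()
}

// serialized approximates what is sent to workers: only exported fields.
func (fn *histogramFn) serialized() ([]byte, error) {
	return json.Marshal(fn)
}
</code></pre></div><p>Serializing a <code>histogramFn</code> keeps only <code>ArtPrefix</code>, which is why the client is created in <code>Setup</code> rather than when the DoFn is constructed.</p>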
<p>There are also some simple error-handling best practices that should be followed when calling external services.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">extractHistogram</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">palette</span><span class="p">.</span><span class="nx">Histogram</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">cancel</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nf">WithTimeout</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="mi">30</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">defer</span><span class="w"> </span><span class="nf">cancel</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">fd</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">fs</span><span class="p">.</span><span class="nf">OpenRead</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;fs.OpenRead(%q) failed: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">defer</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">img</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">image</span><span class="p">.</span><span class="nf">Decode</span><span class="p">(</span><span class="nx">fd</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;image.Decode(%q) failed: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">palette</span><span class="p">.</span><span class="nf">NewColorHistogram</span><span class="p">(</span><span class="nx">img</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>The function begins by calling <a href="https://golang.org/pkg/context/#WithTimeout"><code>context.WithTimeout</code></a>. This ensures that if the external service does not respond in a timely manner, the context is cancelled and an error is returned. Without this timeout, the external call might never return, and the pipeline would never terminate.</p>
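<p>The effect of <code>context.WithTimeout</code> can be seen with a simulated slow service. <code>slowFetch</code> is a hypothetical stand-in for a remote call such as <code>fs.OpenRead</code>; like a well-behaved client, it checks the context and gives up once the context is done. <code>fetchWithTimeout</code> and <code>isTimeout</code> are likewise illustrative, and the timeout is a parameter so the sketch can be exercised with much shorter values than the 30 seconds used above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowFetch simulates a remote call that takes latency to complete, but
// checks ctx between steps and returns ctx.Err() once it is done.
func slowFetch(ctx context.Context, latency time.Duration) (string, error) {
	deadline := time.Now().Add(latency)
	for time.Now().Before(deadline) {
		if err := ctx.Err(); err != nil {
			return "", err
		}
		time.Sleep(time.Millisecond)
	}
	return "data", nil
}

// fetchWithTimeout bounds the call the same way extractHistogram does.
func fetchWithTimeout(ctx context.Context, latency, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel() // release the timer even if the call finishes early

	data, err := slowFetch(ctx, latency)
	if err != nil {
		return "", fmt.Errorf("slowFetch failed: %w", err)
	}
	return data, nil
}

// isTimeout reports whether err was caused by the deadline expiring, which
// is often worth retrying, unlike a permanent error.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}
</code></pre></div><p>For example, a call with one second of latency and a 20 millisecond timeout fails with an error for which <code>isTimeout</code> reports true, while a call that finishes within its timeout returns the data. Wrapping with <code>%w</code> keeps the original cause visible to <code>errors.Is</code>.</p>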
<p>Since the pipeline could be running across hundreds of machines, it could generate significant load on a remote service, so it is wise to implement appropriate <a href="https://cloud.google.com/storage/docs/exponential-backoff">backoff and retry logic</a>. In some cases it may even be worth <a href="https://cloud.google.com/service-infrastructure/docs/rate-limiting">rate limiting</a> your pipeline’s execution, or tagging your pipeline’s traffic at a <a href="https://www.usenix.org/conference/srecon17asia/program/presentation/sheerin">lower QoS</a> so it can be easily shed.</p>
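<p>A minimal sketch of exponential backoff with jitter, using only the standard library. <code>retryWithBackoff</code> and <code>errPermanent</code> are hypothetical names, and the attempt limit and delays are placeholders to tune against the remote service's own guidance. For brevity it sleeps and then checks the context, rather than waking early when the context is cancelled.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errPermanent marks failures that retrying cannot fix, such as a corrupt
// image. Wrap it with fmt.Errorf("...: %w", errPermanent) to stop retries.
var errPermanent = errors.New("permanent error")

// retryWithBackoff calls op until it succeeds, returns a permanent error, or
// maxAttempts calls have been made. The wait doubles after each failure, is
// capped at maxDelay, and gets up to 50% random jitter so that many workers
// do not retry in lockstep. baseDelay is assumed to be non-negative.
func retryWithBackoff(ctx context.Context, maxAttempts int, baseDelay, maxDelay time.Duration,
	op func(context.Context) error) error {
	delay := baseDelay
	for attempt := 1; ; attempt++ {
		err := op(ctx)
		if err == nil {
			return nil
		}
		if errors.Is(err, errPermanent) || attempt >= maxAttempts {
			return fmt.Errorf("giving up after %d attempt(s): %w", attempt, err)
		}

		jitter := time.Duration(rand.Int63n(int64(delay)/2 + 1))
		time.Sleep(delay + jitter)
		if err := ctx.Err(); err != nil {
			return err // the caller's deadline or cancellation wins
		}

		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}
</code></pre></div><p>Jitter spreads the retries from many workers over time, and the <code>errPermanent</code> check ends the loop immediately for failures that another attempt cannot fix.</p>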
<p>The external service may also return permanent errors, which no amount of retrying will fix, so a more robust error handling pattern is needed.</p>
<h2 id="error-handling-and-dead-letters">Error handling and dead letters</h2>
<p>When Beam processes a PCollection, it splits the elements into bundles and processes them bundle by bundle. If the PTransform returns an error, panics, or otherwise fails (such as by running out of memory), the whole bundle is retried. With Dataflow, bundles are <a href="https://cloud.google.com/dataflow/docs/resources/faq#how-are-java-exceptions-handled-in-cloud-dataflow">retried up to four times</a>, after which the entire pipeline is aborted. A single bad element can therefore fail the whole job, so where appropriate, instead of returning an error we use a <a href="https://en.wikipedia.org/wiki/Dead_letter_queue">dead letter queue</a>. This is a separate PCollection that collects processing errors. These errors can then be persisted at the end of the pipeline, manually inspected, and processed again later.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="k">return</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo2</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">extractHistogramFn</span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ArtPrefix</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">artPrefix</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">},</span><span class="w"> </span><span class="nx">files</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>A keen observer would have noticed that <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo2"><code>beam.ParDo2</code></a> was used by ExtractHistogram, instead of <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a>. This function works the same way, but returns two PCollections. In our case, the first is the normal output, and the second is a <code>PCollection&lt;KV&lt;string, string&gt;&gt;</code>. This second collection is keyed by the unique identifier of the painting that had an issue, and the value is the error message.</p>
<p>Since not every element produces an error, this second output is passed to <code>extractHistogramFn</code>’s <code>ProcessElement</code> as an emitter function, <code>errors func(string, string)</code>, which only needs to be called when something goes wrong.</p>
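<p>The dead-letter pattern itself does not depend on Beam, so it can be sketched with plain callbacks standing in for the emitters Beam passes to <code>ProcessElement</code>. <code>processAll</code> is hypothetical, and it parses integers purely to have an operation that can fail; in this pipeline the failing operation is fetching and decoding an image.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strconv"
)

// processAll has the same shape as a two-output DoFn: successful results go
// to emit, and failures go to deadLetter keyed by the element's ID. Nothing
// is returned as an error, so one bad element cannot fail, and retry, the
// whole bundle.
func processAll(inputs map[string]string, emit func(string, int), deadLetter func(string, string)) {
	for id, raw := range inputs {
		n, err := strconv.Atoi(raw)
		if err != nil {
			// Record why this element failed and move on to the next one.
			deadLetter(id, fmt.Sprintf("strconv.Atoi(%q) failed: %s", raw, err))
			continue
		}
		emit(id, n)
	}
}
</code></pre></div><p>Every input ends up in exactly one of the two outputs, so nothing is silently dropped and a single bad element never causes its bundle to be retried.</p>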
<p>Every stage of the pipeline produces an error PCollection like this, and at the end of the pipeline they are collected together and written to a single error log file:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// WriteErrorLog takes multiple PCollection&lt;KV&lt;string,string&gt;&gt;s, combines them,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// and writes them to the given filename.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">WriteErrorLog</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="w"> </span><span class="o">...</span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;Write %q&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">Flatten</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="o">...</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;%s,%s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">},</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">textio</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">Join</span><span class="p">(</span><span class="o">*</span><span class="nx">outputPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">),</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Since each line is the key, a comma, and then the error message, the file can easily be re-read to retry just the failed keys.</p>
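<p>A minimal sketch of that re-read step, assuming the error log has already been copied somewhere readable; <code>failedKeys</code> is a hypothetical helper. Error messages can contain commas themselves, so only the first comma on each line is treated as the separator.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"bufio"
	"io"
	"strings"
)

// failedKeys reads an error log whose lines look like "key,message" and
// returns the keys, so that only those elements need to be reprocessed.
// Only the first comma separates the fields, so messages may contain commas
// but keys must not.
func failedKeys(r io.Reader) ([]string, error) {
	var keys []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines
		}
		fields := strings.SplitN(line, ",", 2)
		keys = append(keys, fields[0])
	}
	return keys, sc.Err()
}
</code></pre></div><p>Note that <code>bufio.Scanner</code> rejects lines longer than 64 KiB by default, so unusually long error messages would need a larger buffer via <code>Scanner.Buffer</code>.</p>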
<p>The rest of the pipeline is much the same, and thus won’t be explained in detail. <code>CalculateColorPalette</code> takes the color histograms and runs a K-Means clustering algorithm to extract the color palettes for those paintings. Those palettes are written out to PNG files by <code>DrawColorPalette</code>, and finally all the palettes are written out to a JSON file in <code>WriteIndex</code>.</p>
<h2 id="gotchas">Gotchas</h2>
<h3 id="marshing">Marshalling</h3>
<p>Always remember to register the types that will be transmitted between workers. This is anything that’s inside a PCollection, as well as any DoFn. Not all types are allowed, but slices, structs, and primitives are. For other types, custom JSON marshalling can be used.</p>
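<p>As an example of when custom marshalling is needed: <code>encoding/json</code> only accepts map keys that are strings, integers, or implement <code>encoding.TextMarshaler</code>, so a color histogram naturally stored as a map keyed by <code>color.RGBA</code> cannot be marshalled as-is. This sketch assumes the type is encoded with <code>encoding/json</code>; <code>Histogram</code> and <code>histogramEntry</code> are hypothetical and are not the types from the palette package used above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"encoding/json"
	"image/color"
)

// Histogram counts pixels per color. A map keyed by a struct such as
// color.RGBA fails to marshal with encoding/json unless it gets help.
type Histogram map[color.RGBA]int

// histogramEntry is the JSON-friendly form of a single map entry.
type histogramEntry struct {
	R, G, B, A uint8
	Count      int
}

// MarshalJSON encodes the histogram as a list of entries.
func (h Histogram) MarshalJSON() ([]byte, error) {
	entries := make([]histogramEntry, 0, len(h))
	for c, n := range h {
		entries = append(entries, histogramEntry{R: c.R, G: c.G, B: c.B, A: c.A, Count: n})
	}
	return json.Marshal(entries)
}

// UnmarshalJSON rebuilds the map from the list produced by MarshalJSON.
func (h *Histogram) UnmarshalJSON(data []byte) error {
	entries := new([]histogramEntry)
	if err := json.Unmarshal(data, entries); err != nil {
		return err
	}
	m := make(Histogram, len(*entries))
	for _, e := range *entries {
		m[color.RGBA{R: e.R, G: e.G, B: e.B, A: e.A}] = e.Count
	}
	*h = m
	return nil
}
</code></pre></div><p>Encoding the map as a list of entries keeps the JSON simple and the round trip lossless; the type would still need registering like any other type carried in a PCollection.</p>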
<p>Remember too that global state is not supported. Flags and other global variables will not always be populated when running on a remote worker. Closures can also catch you out, as in this example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">prefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&#34;X&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="s">&#34;Prefix &#34;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">prefix</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">value</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">},</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>This simple example appears to add “X” to the beginning of each element, but it actually prefixes nothing. This is because the anonymous function is marshalled and then unmarshalled on the worker, but its closure is not. When the function is invoked on the worker, it has not captured the value of <code>prefix</code>, so <code>prefix</code> is the zero value (an empty string). For this example to work, <code>prefix</code> must be defined inside the anonymous function, or a DoFn struct should be used that holds the prefix in a marshalled field.</p>
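<p>The DoFn-struct fix can be sketched with the standard library, using a JSON round trip as a rough stand-in for shipping a DoFn to a worker (an approximation, not the SDK's actual code path). <code>addPrefixFn</code> and <code>roundTrip</code> are hypothetical names; in a pipeline the struct would be registered and passed to <code>beam.ParDo</code> in place of the anonymous function.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import "encoding/json"

// addPrefixFn carries the prefix as an exported field, so it is serialized
// along with the DoFn instead of being lost like a captured variable.
type addPrefixFn struct {
	Prefix string
}

// ProcessElement prepends the configured prefix to each element.
func (fn *addPrefixFn) ProcessElement(value string) string {
	return fn.Prefix + value
}

// roundTrip approximates shipping a DoFn to a worker: encode it, then
// decode it into a fresh value, as a remote process would.
func roundTrip(fn *addPrefixFn) (*addPrefixFn, error) {
	data, err := json.Marshal(fn)
	if err != nil {
		return nil, err
	}
	out := new(addPrefixFn)
	if err := json.Unmarshal(data, out); err != nil {
		return nil, err
	}
	return out, nil
}
</code></pre></div><p>Unlike the captured variable, <code>Prefix</code> is an exported field, so its value survives the trip and the decoded copy still returns <code>"Xa"</code> for the input <code>"a"</code>.</p>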
<h3 id="errors">Errors</h3>
<p>Since the pipeline could be running across hundreds of workers, errors are to be expected. Extensive use of <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/log#Infof"><code>log.Infof</code></a>, <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/log#Debugf"><code>log.Debugf</code></a>, and friends will make your life easier, as good logs make it much simpler to debug why the pipeline got stuck or mysteriously failed.</p>
<p>While debugging this pipeline, it would occasionally fail due to exceeding the memory limits of the Dataflow workers. Standard Go tooling such as <a href="https://golang.org/pkg/net/http/pprof/">pprof</a> can help debug this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;net/http&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">_</span><span class="w"> </span><span class="s">&#34;net/http/pprof&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="c1">// HTTP Server for pprof (and other debugging)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">log</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nf">ListenAndServe</span><span class="p">(</span><span class="s">&#34;localhost:8080&#34;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This starts a web server that exports useful stats and can be used to grab pprof profiling data.</p>
<h3 id="difference-between-direct-and-dataflow-runners">Difference between direct and dataflow runners</h3>
<p>Running the pipeline locally is a quick way to validate that the pipeline is set up correctly and runs as expected. However, running locally won’t run the pipeline in parallel, and it is obviously constrained to a single machine. There are some other differences, mostly around marshalling data. It’s always a good idea to test on Dataflow too, perhaps with a smaller or sampled input dataset that can serve as a smoke test.</p>
<h1 id="conclusion">Conclusion</h1>
<p>This article has covered the basics of creating an Apache Beam pipeline with the Go SDK, along with some more advanced topics. The results of this pipeline will be revealed in a later article; until then, the <a href="https://github.com/bramp/dataflow-art">code is available here</a>.</p>
<p>While the Beam Go SDK is still experimental, there are many great tutorials and examples using the more mature Java and Python Beam SDKs [<a href="https://medium.com/google-cloud/popular-java-projects-on-github-that-could-use-some-help-analyzed-using-bigquery-and-dataflow-dbd5753827f4">1</a>, <a href="https://medium.com/@vallerylancey/error-handling-elements-in-apache-beam-pipelines-fffdea91af2a">2</a>]. Google has even published a series of generic articles [<a href="https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1">part 1</a>, <a href="https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-2">part 2</a>] explaining common use cases.</p>
</description>
    </item>
    
    <item>
      <title>Certbot: Unexpected Error</title>
      <link>https://blog.bramp.net/post/2018/05/26/certbot-unexpected-error/</link>
      <pubDate>Sat, 26 May 2018 11:28:44 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2018/05/26/certbot-unexpected-error/</guid>
      <description><p>I got a <a href="https://letsencrypt.org/docs/expiration-emails/">nice warning email</a> from <a href="https://letsencrypt.org/">Let&rsquo;s Encrypt</a> that my cert was going to expire soon, and hadn&rsquo;t been renewed. In <code>/var/log/letsencrypt/letsencrypt.log</code> I found the following error:</p>
<pre tabindex="0"><code>Renewal configuration file /etc/letsencrypt/renewal/mydomain.bramp.net.conf (cert: mydomain.bramp.net) produced an unexpected error: &#39;Namespace&#39; object has no attribute &#39;dns_cloudflare_credentials&#39;. Skipping.
</code></pre><p>I manually ran certbot in dry-run mode and it worked fine:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo certbot renew --dry-run
</span></span></code></pre></div><p>So this error only occurs when certbot runs as a cron job. Looking at <code>/etc/cron.d/certbot</code>, I saw that the job runs as root, so I tried <code>certbot renew --dry-run</code> again, but this time as the root user:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo su
</span></span><span class="line"><span class="cl">root@:~$ sudo certbot renew --dry-run
</span></span></code></pre></div><p>and bam, the same error. The error is somehow related to the <a href="https://certbot-dns-cloudflare.readthedocs.io/en/latest/">certbot-dns-cloudflare plugin</a>, which proves ownership of the domain with a <a href="https://acme-python.readthedocs.io/en/latest/api/challenges.html#acme.challenges.DNS01">DNS01 challenge</a> via Cloudflare&rsquo;s DNS. I use this form of challenge because the domain in question is internal and not reachable from the Internet.</p>
<p>I had forgotten how I installed the plugin, but a quick search suggested it was via <code>pip3</code>. Clearly something was different between my root environment and my normal user&rsquo;s <code>sudo</code> environment, so I compared the installed packages:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ sudo pip3 list <span class="p">|</span> grep certbot
</span></span><span class="line"><span class="cl">certbot <span class="o">(</span>0.23.0<span class="o">)</span>
</span></span><span class="line"><span class="cl">certbot-apache <span class="o">(</span>0.23.0<span class="o">)</span>
</span></span><span class="line"><span class="cl">certbot-dns-cloudflare <span class="o">(</span>0.24.0<span class="o">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">$ sudo su
</span></span><span class="line"><span class="cl">root@:~$ pip3 list <span class="p">|</span> grep certbot
</span></span><span class="line"><span class="cl">certbot <span class="o">(</span>0.23.0<span class="o">)</span>
</span></span><span class="line"><span class="cl">certbot-apache <span class="o">(</span>0.23.0<span class="o">)</span>
</span></span></code></pre></div><p>Aha, no certbot-dns-cloudflare when running as root. Clearly I hadn&rsquo;t installed it correctly. Running <code>pip3 install certbot-dns-cloudflare</code> as root fixed the problem, and voila, certbot now correctly fetches new certs from its regular cron job.</p>
</description>
    </item>
    
    <item>
      <title>Marvel Cinematic Universe Timeline</title>
      <link>https://blog.bramp.net/post/2018/04/08/the-mcu/</link>
      <pubDate>Sun, 08 Apr 2018 21:46:21 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2018/04/08/the-mcu/</guid>
      <description><p>In the run up to the <a href="https://www.imdb.com/title/tt4154756/">third Avengers movie</a>, I was wondering which characters have appeared together, and when. With dozens of characters across eighteen <a href="https://en.wikipedia.org/wiki/Marvel_Cinematic_Universe">Marvel Cinematic Universe</a> (MCU) movies, Avengers: Infinity War will be one huge mashup. In the style of the <a href="https://xkcd.com/657/">XKCD narrative diagrams</a>, I plotted out the journey each character has taken across the numerous movies. I also got carried away and created a similar (yet smaller) diagram for the <a href="https://en.wikipedia.org/wiki/List_of_Marvel_Cinematic_Universe_television_series#Netflix_series">Netflix Marvel shows</a>.</p>
<br />
<p><strong>Marvel Cinematic Universe - Iron Man (2008) through Avengers: Infinity War (2018)</strong></p>
<div class="text-center">
<a href="https://projects.bramp.net/mcu/film.html">
  <img src="film.png" width=770 height=344 alt="Marvel Cinematic Universe - Character Timeline"></img>
  <p>click to open an interactive version</p>
</a>
</div>
<br/>
<p><strong>Marvel Cinematic Universe - Netflix Shows</strong></p>
<div class="text-center">
<a href="https://projects.bramp.net/mcu/netflix.html">
  <img src="netflix.png" width=770 height=411 alt="Marvel Netflix Shows - Character Timeline"></img>
  <p>click to open an interactive version</p>
</a>
</div>
<p>I would like to thank the contributors to <a href="http://marvel-movies.wikia.com/">marvel-movies.wikia.com</a>, where I got all the information, as well as <a href="https://twitter.com/drzax">Simon Elvery</a>, who created the <a href="https://github.com/abcnews/d3-layout-narrative">d3-layout-narrative</a> module for <a href="https://d3js.org/">d3.js</a> that made these diagrams easier to create. Check back in the future for an article on how I created these diagrams. As always, I welcome feedback; you can contact me at <a href="https://twitter.com/TheBramp">@TheBramp</a>.</p>
</description>
    </item>
    
    <item>
      <title>Google Font Features</title>
      <link>https://blog.bramp.net/post/2018/01/21/google-font-features/</link>
      <pubDate>Sun, 21 Jan 2018 16:03:36 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2018/01/21/google-font-features/</guid>
      <description><blockquote>
<p><strong>tl;dr Google Fonts doesn&rsquo;t supply fonts with OpenType features (such as old-style figures, or small-caps), but you can build and host the fonts yourself to support everything you need.</strong></p>
</blockquote>
<p>I recently posted an <a href="https://blog.bramp.net/post/2018/01/16/measuring-percentile-latency/">article which contained lots of numbers</a>. While proofreading it, I didn’t quite like how the numbers looked; sometimes the digits dropped below the baseline, for example:</p>
<figure><img src="/post/2018/01/21/google-font-features/oldstyle.png" width="760" height="157"><figcaption>
      <h4>Oldstyle figures</h4>
    </figcaption>
</figure>

<p>Where I would have expected the top and bottom of each digit to be aligned:</p>
<figure><img src="/post/2018/01/21/google-font-features/lining.png" width="760" height="152"><figcaption>
      <h4>Lining figures</h4>
    </figcaption>
</figure>

<p>This took me back to all the typography I learnt when <a href="https://github.com/bramp/publication">working with LaTeX</a>. These two styles of figures are called old-style and lining (or sometimes lowercase and uppercase numbers). The theory is that old-style numbers flow better when mixed with text. Recall that letters like q, j and p all drop below the baseline, which makes the text nicer to read:</p>
<figure><img src="/post/2018/01/21/google-font-features/quickbrownfox.png" width="760" height="100"><figcaption>
      <h4>Example with characters below the baseline</h4>
    </figcaption>
</figure>

<p>However, my article had many numbers on the page, sometimes within tables, where old-style just made the numbers look odd. I looked for a way to force the lining style throughout. I quickly found the CSS styling:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-css" data-lang="css"><span class="line"><span class="cl"><span class="nt">body</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">           <span class="k">font-variant-numeric</span><span class="p">:</span> <span class="n">lining-nums</span><span class="p">;</span> 
</span></span><span class="line"><span class="cl">  <span class="kp">-webkit-</span><span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">     <span class="kp">-moz-</span><span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">      <span class="kp">-ms-</span><span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">          <span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Sadly, when I applied this to my site, it did nothing. I wondered if perhaps the font did not support lining figures. A quick search led me to <a href="https://stackoverflow.com/questions/28098992/google-fonts-lining-numbers">Stack Overflow</a>, which implied that both the font I was using, <a href="https://fonts.google.com/specimen/Raleway">Raleway</a>, and Google Fonts (which hosted the font) did in fact support lining figures.</p>
<p>So I went deeper down the rabbit hole to figure out what was going wrong. I wanted to confirm for myself that the font supported lining figures. I searched for a while for a simple CLI that would inspect the <a href="https://en.wikipedia.org/wiki/Web_Open_Font_Format">WOFF</a>/<a href="https://en.wikipedia.org/wiki/TrueType">TTF</a> files and tell me what they contained. Sadly, the best I could find was <a href="https://fontforge.github.io/">FontForge</a>, a GUI. That worked, and confirmed the fonts being served by Google did not contain the lining feature, or in fact any feature other than basic ligatures.</p>
<p>Later I found this <a href="https://github.com/google/fonts/issues/1335">GitHub issue</a> which confirmed all features were stripped from the font. So I sought out a way to rebuild the Google font to keep the lining figures.</p>
<p>Before that, I started to <a href="http://sethgodin.typepad.com/seths_blog/2005/03/dont_shave_that.html">shave another yak</a>, and decided to create a CLI tool that would easily display the font features. I came across a Go library, <a href="https://github.com/ConradIrwin/font">SFNT</a>, that can parse OpenType fonts. Sadly, it didn’t implement parsing of the features. After a few hours reading the <a href="http://www.adobe.com/devnet/opentype/afdko/topic_feature_file_syntax.html">OpenType spec</a>, I sent them a <a href="https://github.com/ConradIrwin/font/pull/3">pull request</a> to add this functionality. Now I can easily confirm from the command line which features are supported.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ font features raleway-v12-latin-ext_latin-regular.woff
</span></span><span class="line"><span class="cl">Glyph Substitution Table <span class="o">(</span>GSUB<span class="o">)</span>:
</span></span><span class="line"><span class="cl">	Script <span class="s2">&#34;latn&#34;</span> <span class="o">(</span>Latin<span class="o">)</span>:
</span></span><span class="line"><span class="cl">		Default Language:
</span></span><span class="line"><span class="cl">			Feature <span class="s2">&#34;liga&#34;</span> <span class="o">(</span>Standard Ligatures<span class="o">)</span>
</span></span></code></pre></div><p>I decided to play around with the <a href="https://developers.google.com/fonts/docs/developer_api">Google Fonts API</a>, and then eventually the unofficial (but awesome) <a href="https://google-webfonts-helper.herokuapp.com/fonts/raleway">google-webfonts-helper</a> (a hassle-free way to self-host Google Fonts). However, no combination of options would make the font contain the lining figures.</p>
<p>Since the Google Fonts are open source, I downloaded the <a href="https://github.com/google/fonts/tree/master/ofl/raleway">source TTF of the font</a>, and double-checked it did indeed contain the feature:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ font features Raleway-Regular.ttf 
</span></span><span class="line"><span class="cl">Glyph Substitution Table <span class="o">(</span>GSUB<span class="o">)</span>:
</span></span><span class="line"><span class="cl">  Script <span class="s2">&#34;latn&#34;</span> <span class="o">(</span>Latin<span class="o">)</span>:
</span></span><span class="line"><span class="cl">    Default Language:
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;aalt&#34;</span> <span class="o">(</span>Access All Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;dlig&#34;</span> <span class="o">(</span>Discretionary Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;liga&#34;</span> <span class="o">(</span>Standard Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;lnum&#34;</span> <span class="o">(</span>Lining Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;onum&#34;</span> <span class="o">(</span>Oldstyle Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;salt&#34;</span> <span class="o">(</span>Stylistic Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;smcp&#34;</span> <span class="o">(</span>Small Capitals<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss01&#34;</span> <span class="o">(</span>Stylistic Set 1<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss02&#34;</span> <span class="o">(</span>Stylistic Set 2<span class="o">)</span>
</span></span></code></pre></div><p>So my next idea was to take the original Raleway-Regular.ttf, convert it to <a href="https://en.wikipedia.org/wiki/Web_Open_Font_Format">WOFF</a> and <a href="https://www.w3.org/TR/WOFF2/">WOFF2</a>, and strip out the bits I don’t need, just as Google Fonts does, to ensure the resulting files are lean and performant.</p>
<p>I couldn’t find the pipeline Google Fonts uses to process the files, so I instead took it upon myself to figure this out. I started by using <code>pyftsubset</code> (part of <a href="https://github.com/fonttools/fonttools">FontTools</a>) to remove unneeded character sets, features, and other parts from the original TTF file.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ pip install fonttools
</span></span><span class="line"><span class="cl">$ pyftsubset Raleway-Regular.ttf --layout-features<span class="o">=</span><span class="s1">&#39;*&#39;</span> --unicodes<span class="o">=</span><span class="s2">&#34;U+0000-00FF, U+0100-024F, U+0131, U+0152-0153, U+02DA, U+02DC, U+02BB-02BC, U+02C6, U+0259, U+0370-03FF, U+1E00-1EFF, U+2000-206F, U+2070-209F, U+2074, U+20A0-20CF, U+2122, U+2150-218F, U+2200-22FF, U+2C60-2C7F, U+A720-A7FF&#34;</span> --output-file<span class="o">=</span>Raleway-Regular.subset.ttf
</span></span></code></pre></div><p>Now I had a TTF file with all the features, but only the subset of characters I use on my site. Next I needed to convert this file to all the <a href="https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/webfont-optimization">recommended font formats</a>, so my site would look nice in IE, Chrome, Android and iOS. The resulting CSS would look like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-css" data-lang="css"><span class="line"><span class="cl"><span class="p">@</span><span class="k">font-face</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">font-family</span><span class="o">:</span> <span class="s1">&#39;Raleway&#39;</span><span class="o">;</span>
</span></span><span class="line"><span class="cl">  <span class="nt">src</span><span class="o">:</span> <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.eot&#39;</span><span class="o">);</span>                           <span class="c">/* IE9 Compat Modes */</span>
</span></span><span class="line"><span class="cl">  <span class="nt">src</span><span class="o">:</span> <span class="nt">local</span><span class="o">(</span><span class="s1">&#39;Raleway&#39;</span><span class="o">),</span> <span class="nt">local</span><span class="o">(</span><span class="s1">&#39;Raleway-Regular&#39;</span><span class="o">),</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.eot?#iefix&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;embedded-opentype&#39;</span><span class="o">),</span> <span class="c">/* IE6-IE8 */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.woff2&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;woff2&#39;</span><span class="o">),</span>    <span class="c">/* Super Modern Browsers */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.woff&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;woff&#39;</span><span class="o">),</span>     <span class="c">/* Pretty Modern Browsers */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.ttf&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;truetype&#39;</span><span class="o">),</span>    <span class="c">/* Safari, Android, iOS */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.svg#ralewayregular&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;svg&#39;</span><span class="o">);</span>    <span class="c">/* Legacy iOS */</span>
</span></span><span class="line"><span class="cl">  <span class="nt">font-style</span><span class="o">:</span> <span class="nt">normal</span><span class="o">;</span>
</span></span><span class="line"><span class="cl">  <span class="nt">font-weight</span><span class="o">:</span> <span class="nt">400</span><span class="o">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>I again used <code>pyftsubset</code> to save the files in the required formats. This worked well for TTF, WOFF, and WOFF2, but it doesn’t support <a href="https://en.wikipedia.org/wiki/Embedded_OpenType">EOT</a> or <a href="http://caniuse.com/svg-fonts">SVG</a> fonts:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ pip install zopfli
</span></span><span class="line"><span class="cl">$ pip install brotli
</span></span><span class="line"><span class="cl">$ pyftsubset ... --flavor<span class="o">=</span>woff --with-zopfli --output-file<span class="o">=</span>Raleway-Regular.subset.woff
</span></span><span class="line"><span class="cl">$ pyftsubset ... --flavor<span class="o">=</span>woff2 --output-file<span class="o">=</span>Raleway-Regular.subset.woff2
</span></span></code></pre></div><p>So instead I searched for an all-in-one solution for converting fonts. I found numerous websites that offered to do it; the one I settled on was <a href="https://www.fontsquirrel.com/tools/webfont-generator">fontsquirrel.com</a>. There I used the expert option to control exactly what went into the font, and to produce compressed versions in all file formats. I originally tried to use fontsquirrel&rsquo;s subsetting feature, but I couldn’t get it to keep all the features I needed, so I used <code>pyftsubset</code> locally instead.</p>
<p>After fontsquirrel.com produced the fonts, I checked that they contained the features, and compared the resulting file sizes:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ ls -ltr
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Google Fonts</span>
</span></span><span class="line"><span class="cl"> 96K  raleway-v12-latin-ext_latin-regular.ttf
</span></span><span class="line"><span class="cl"> 40K  raleway-v12-latin-ext_latin-regular.woff
</span></span><span class="line"><span class="cl"> 31K  raleway-v12-latin-ext_latin-regular.woff2
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># My versions</span>
</span></span><span class="line"><span class="cl">140K raleway-regular.subset-webfont.ttf
</span></span><span class="line"><span class="cl"> 61K raleway-regular.subset-webfont.woff
</span></span><span class="line"><span class="cl"> 46K raleway-regular.subset-webfont.woff2
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">$ font features raleway-regular.subset-webfont.woff
</span></span><span class="line"><span class="cl">Glyph Substitution Table <span class="o">(</span>GSUB<span class="o">)</span>:
</span></span><span class="line"><span class="cl">  Script <span class="s2">&#34;latn&#34;</span> <span class="o">(</span>Latin<span class="o">)</span>:
</span></span><span class="line"><span class="cl">    Default Language:
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;aalt&#34;</span> <span class="o">(</span>Access All Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;dlig&#34;</span> <span class="o">(</span>Discretionary Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;liga&#34;</span> <span class="o">(</span>Standard Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;lnum&#34;</span> <span class="o">(</span>Lining Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;onum&#34;</span> <span class="o">(</span>Oldstyle Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;salt&#34;</span> <span class="o">(</span>Stylistic Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;smcp&#34;</span> <span class="o">(</span>Small Capitals<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss01&#34;</span> <span class="o">(</span>Stylistic Set 1<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss02&#34;</span> <span class="o">(</span>Stylistic Set 2<span class="o">)</span>
</span></span></code></pre></div><p>My files were larger than Google&rsquo;s (by roughly 50%), but not prohibitively so, and thus it was a simple matter of <a href="https://blog.bramp.net/fonts/raleway-regular.subset-webfont.woff2">uploading the fonts</a> to my blog, and updating the CSS.</p>
<p class="text-center">
<span class="onum" style="text-decoration: red underline overline; font-size: 3.5em">1234567890</span> &nbsp;vs&nbsp; <span class="lnum" style="text-decoration: red underline overline; font-size: 3.5em">1234567890</span>
</p></description>
    </item>
    
    <item>
      <title>Measuring Percentile Latency</title>
      <link>https://blog.bramp.net/post/2018/01/16/measuring-percentile-latency/</link>
      <pubDate>Tue, 16 Jan 2018 08:07:00 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2018/01/16/measuring-percentile-latency/</guid>
      <description><p>In many applications it is common to measure the time it takes to handle some event. Web applications pay close attention to this, to ensure each user’s request is replied to in a timely manner. To view this in aggregate, many would just measure the mean response time, which is easily calculated by summing up the total time taken to handle all requests and dividing by the number of requests. This average latency metric, however, can be very <a href="https://www.elastic.co/blog/averages-can-dangerous-use-percentile">misleading</a>, as it does not show the worst-case behaviour. For example, the majority of users may see requests handled quickly, but a few users may experience long delays. Thus, to capture the worst behaviour, it is better to look at percentile latency.</p>
<p>This article will discuss how to calculate percentiles, how to collect and aggregate them in a distributed way, and even how to efficiently store them as time series data.</p>
<h1 id="percentiles">Percentiles</h1>
<p>Let&rsquo;s start with some basics: the 99th percentile is defined as the value that 99 out of 100 samples fall below. Thus 99 users out of 100 observe a latency less than this value, and 1 in every 100 observes a latency equal to or greater than it. We choose the 99%tile because it represents the <a href="https://en.wikipedia.org/wiki/Long_tail">tail of the latency distribution</a> (that is, the worst cases).</p>
<p>The simplest way to calculate the 99th percentile is to sort all the values, and take the value 99/100<sup>ths</sup> of the way through the sorted list. For example, if you had 1,000 latency values, place them into an array, sort them, then take the 990th value. That’ll be the 99%tile, which represents the latency value that 99% of the values are less than. Easy.</p>
<p>Throughout this article I’ll use a dataset of <a href="https://docs.google.com/spreadsheets/d/1s7-DbMeHTyzEZdeBaM5rrKp5o0V9O11VFX9Y47or150/edit#gid=884072054">10,000 randomly generated values</a> from a <a href="https://en.wikipedia.org/wiki/Log-normal_distribution">log-normal distribution</a> with parameters (μ = 0, σ = 1). Most of the values will be small (&lt;2s), but there will be a long tail, which will simulate worst case latencies returned by a server.</p>
<div class="text-center">
  <object data="1st.svg" type="image/svg+xml" width=720 height=360 alt="eCDF of example dataset">
    <img src="1st.png" width=720 height=360 />
  </object>
</div>
<p>Above we have an <a href="https://en.wikipedia.org/wiki/Empirical_distribution_function">empirical cumulative distribution function</a> (eCDF), which visually demonstrates this technique. On the red line there are 10,000 latency values, in sorted order. If we take the 9,900th point, we see a value of 10.97 seconds. This is the 99%tile latency for our dataset.</p>
<p>We could use this simple approach to calculate the distribution on our servers. However, let&rsquo;s assume our servers receive 100 queries per second (qps), and we want to calculate the 99%tile every 60 seconds. That’ll require us to store 6,000 latency values every minute, which is doable, but unbounded, as it grows with traffic. If we extend this to a dozen servers, all storing 6,000 numbers every minute, and we wanted to aggregate this metric across all of them, this could very quickly get out of control, especially if we are capturing multiple different dimensions of this metric (e.g. percentiles for successful vs failed requests). Perhaps there is a way to approximate this, bounding the amount of RAM while keeping a level of accuracy.</p>
<h1 id="histogram-approximation">Histogram Approximation</h1>
<p>Instead of storing each number, we could bin them into groups, in the same way a histogram would. For example, we know that latency values will be in the range 0 to 60,000ms. That’s because it is impossible to handle a request in zero time, and hopefully the application will time out after 60 seconds (otherwise, chances are the user isn’t waiting anymore).</p>
<p>So we can use histogram bins that double in size from 1ms to ~64,000ms, for example (0-1ms], (1-2ms], (2-4ms], (4-8ms], (8-16ms], (16-32ms], (32-64ms], (64-128ms], (128-256ms], (256-512ms], (512-1024ms], etc. Extending to 65,536ms (2<sup>16</sup>) would give us 18 bins. Each bin records the count of values that land within its range. Thus we only need to store 18 counts, instead of 6,000 (or more) latency values.</p>
<p>But how well does this approximate the true value? Let&rsquo;s look at our random dataset from earlier.</p>
<table class="table">
<thead>
<tr>
<th class="text-center" colspan=2>Bin Range (ms)</th>
<th class="text-right">Count</th>
<th class="text-right">Running total</th>
<th class="text-right">eCDF(x)</th>
</tr>
</thead>
<tbody>
<tr>
<td class="text-center" colspan=5> ... 12 rows cut ... </td>
</tr>
<tr>
<td class="text-right">1,024</td>
<td class="text-right">2,048</td>
<td class="text-right">2,842</td>
<td class="text-right">7,490</td>
<td class="text-right">74.90%</td>
</tr>
<tr>
<td class="text-right">2,048</td>
<td class="text-right">4,096</td>
<td class="text-right">1,657</td>
<td class="text-right">9,147</td>
<td class="text-right">91.47%</td>
</tr>
<tr>
<td class="text-right">4,096</td>
<td class="text-right">8,192</td>
<td class="text-right">648</td>
<td class="text-right">9,795</td>
<td class="text-right">97.95%</td>
</tr>
<tr>
<td class="text-right">8,192</td>
<td class="text-right">16,384</td>
<td class="text-right">172</td>
<td class="text-right">9,967</td>
<td class="text-right">99.67%</td>
</tr>
<tr>
<td class="text-right">16,384</td>
<td class="text-right">32,768</td>
<td class="text-right">29</td>
<td class="text-right">9,996</td>
<td class="text-right">99.96%</td>
</tr>
</tbody>
</table>
<p>In this table, the first two columns represent the range of the bin, and the third column is the count of values within that bin. The running total column is the sum of the current bin and all previous bins. Finally, eCDF(x) is the empirical cumulative distribution function, or simply put, the running total divided by the sum of all counts (which in this case is 10,000, as there are exactly 10,000 samples).</p>
<p>The bins can accurately determine the percentiles at their edges; for example, the 97.95%tile is 8,192ms, and the 99.67%tile is 16,384ms. However, we wanted the 99%tile, which lies somewhere between these two values. We can use <a href="https://en.wikipedia.org/wiki/Linear_interpolation">linear interpolation</a> to find the position within the bin (which is somewhere above 8,192ms, but less than 16,384ms).</p>
<!-- https://www.mathcha.io/editor -->
<div class="text-center">
  <!-- x_{0} + (x_{1} - x_{0}) \frac{y - y_{0}}{y_{1} - y_{0}} = x -->
  <img src="linear-approx-1.png" width=660 alt="Linear Approximation Equation"></img>
  <br/>
</div>
<br/>
<div class="text-center">
  <!-- 8,192 + (16,384 - 8,192) \frac{99\% - 97.95\%}{99.67\% - 97.95\%} \approx 13,192 ms -->
  <img src="linear-approx-2.png" width=660 alt="Linear Approximation Example"></img>
</div>
<br/>
<p>Thus we can determine that the 99%tile is approximately 13.192 seconds. Compared with the exact value from earlier, 10.970s, this is off by ~20%. To make the approximation more precise, we can increase the number of bins. Instead of doubling the bin boundaries, we can increase each boundary by a factor of √2 (the square root of 2). This doubles the number of bins (from 18 to 36), but greatly increases the precision. With these new bins, linear approximation gives 11.042s for the 99%tile, which is off by only 0.66%. This seems a good trade-off between space and accuracy.</p>
<div class="text-center">
  <object data="2nd.svg" type="image/svg+xml" width=720 height=360 alt="eCDF of example dataset with linear approximation">
    <img src="2nd.png" width=720 height=360/>
  </object>
</div>
<p>Just to double-check: the exact 99.9%tile (one additional 9) is 23.105s, and the √2-bin estimate is 23.170s. This is off by only 0.28%, so again seems reasonable. Obviously, the shape of the distribution and the actual values will affect the error. Empirically, √2 bins work well enough, but your experience may vary.</p>
<h1 id="aggregation">Aggregation</h1>
<p>Now that we can calculate percentiles, how would we extend this to aggregate the percentiles from multiple servers? A naive approach may be to ask each server to calculate its own 99%tile, and then take the mean of these. An average of percentiles isn't ideal: if one server is particularly bad, the average may simply hide the outliers again. A better approach is to collect the histogram (set of bins) from each server, and simply add them together. This works easily if every server uses the same bin ranges.</p>
<table class="table">
<thead>
<tr>
<th class="text-center" colspan=2>Bin Range (ms)</th>
<th class="text-right">Server A<br>Count</th>
<th class="text-right">Server B<br>Count</th>
<th class="text-right">Total<br>Count</th>
</tr>
</thead>
<tbody>
<tr>
<td class="text-center" colspan="5"> ... </td>
</tr>
<tr>
<td class="text-right">1,024</td>
<td class="text-right">2,048</td>
<td class="text-right">2,842</td>
<td class="text-right">2,811</td>
<td class="text-right">5,653</td>
</tr>
<tr>
<td class="text-right">2,048</td>
<td class="text-right">4,096</td>
<td class="text-right">1,657</td>
<td class="text-right">1,660</td>
<td class="text-right">3,317</td>
</tr>
<tr>
<td class="text-right">4,096</td>
<td class="text-right">8,192</td>
<td class="text-right">648</td>
<td class="text-right">634</td>
<td class="text-right">1,282</td>
</tr>
<tr>
<td class="text-right">8,192</td>
<td class="text-right">16,384</td>
<td class="text-right">172</td>
<td class="text-right">155</td>
<td class="text-right">327</td>
</tr>
</tbody>
</table>
<p>So in this example, Server A and Server B have 2,842 and 2,811 samples respectively between 1.024s and 2.048s. This means that across both servers, 5,653 requests took between roughly 1 and 2 seconds. Applying the same linear approximation technique to this combined histogram gives us the aggregated percentiles.</p>
<p>This kind of aggregation works well, and is lightweight enough to collect across even a large fleet of servers. The aggregate percentiles can then be calculated in a centralised location (perhaps the machine doing the monitoring). If needed, per-server percentiles can still be drilled into, as that data is retained. This is a lot simpler than maintaining the full set of (10,000) values from each server.</p>
<h1 id="time">Time</h1>
<p>Typically, these percentiles need to be measured over time. For example, we want to know the 99%tile aggregated across all the servers for every minute, or every hour of the day. To achieve this we need to store the histogram at fixed intervals, say every minute. There is again a naive approach, where every minute we reset the histogram counts to zero, so each server only counts the values from the last minute. Conceptually this is easy to reason about, but it introduces subtle synchronisation issues. What happens if each server has a slightly different definition of when a minute starts? Or if collection is delayed, and histograms are reset before they are aggregated?</p>
<p>A more robust way is to never reset the histogram, and instead let the counts keep increasing. Then, to calculate the value for a particular interval (say the last minute), you subtract the previous minute’s histogram from the most recent histogram. This is a little more work, but a lot more flexible.</p>
<p>To explore this concept, let’s begin with a simpler (non-histogram) example: calculating requests per second. If we store a running counter of requests then, if you recall your calculus, the rate is the derivative of that counter. With discrete samples, that is simply the delta between two values divided by the time between them.</p>
<table class="table">
  <thead>
      <tr>
          <th class="text-right">Time (s)</th>
          <th class="text-right">Running Count</th>
          <th class="text-right">Delta (per minute)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td class="text-right">0</td>
          <td class="text-right">0</td>
          <td class="text-right"></td>
      </tr>
      <tr>
          <td class="text-right">60</td>
          <td class="text-right">95</td>
          <td class="text-right">95</td>
      </tr>
      <tr>
          <td class="text-right">120</td>
          <td class="text-right">205</td>
          <td class="text-right">110</td>
      </tr>
      <tr>
          <td class="text-right">180</td>
          <td class="text-right">310</td>
          <td class="text-right">105</td>
      </tr>
      <tr>
          <td class="text-right">240</td>
          <td class="text-right">395</td>
          <td class="text-right">85</td>
      </tr>
      <tr>
          <td class="text-right">300</td>
          <td class="text-right">450</td>
          <td class="text-right">55</td>
      </tr>
      <tr>
          <td class="text-right">360</td>
          <td class="text-right">480</td>
          <td class="text-right">30</td>
      </tr>
      <tr>
          <td class="text-right">420</td>
          <td class="text-right">500</td>
          <td class="text-right">20</td>
      </tr>
      <tr>
          <td class="text-right">480</td>
          <td class="text-right">500</td>
          <td class="text-right">0</td>
      </tr>
      <tr>
          <td class="text-right">540</td>
          <td class="text-right">590</td>
          <td class="text-right">90</td>
      </tr>
      <tr>
          <td class="text-right">600</td>
          <td class="text-right">700</td>
          <td class="text-right">110</td>
      </tr>
  </tbody>
</table>

<p>Taking the example above, the average rate between time 120s and 180s is 1.75 requests per second. At time 180s there were 310 total requests, and at time 120s there were only 205. That is a delta of 105 requests per minute, or 1.75 requests per second.</p>
<p>This has the nice property that we can easily calculate the rate over any arbitrary interval. For example, subtracting the value at time 0s from the value at time 600s, and dividing by 600 seconds, gives the average rate over the last 10 minutes. This is a lot simpler than recording the per-second rate every minute and averaging those values. This property is especially useful when plotting a graph where each pixel may represent a wide interval (such as a full hour). Having a quick way to calculate the rate over that hour is a real performance win.</p>
<p>Even though this example was a simple rate per second, the same approach works for histograms. By storing the running totals, across all servers, at periodic intervals, we can easily calculate an approximate percentile over any arbitrary interval.</p>
<h1 id="conclusion">Conclusion</h1>
<p>To truly understand latency, the distribution of it must be examined. This can be achieved by looking at various percentiles. These percentiles can be scalably and efficiently calculated by using histograms with fixed bins, which keep track of a running count of latency values.</p>
<p>A quick word of warning: all monitoring lies to you in subtle ways, and it is your responsibility to understand how. If you have fewer than 100 values, does a 99%tile metric make sense? Perhaps extend the collection interval over a longer time period, or use the 90%tile instead. A single percentile also doesn’t show the full picture, so it may be worth always exporting the 50%, 90%, and 99%tiles, etc. Or perhaps a percentile doesn’t capture your monitoring requirements, and simply taking the max value would be better.</p>
<p>Finally, you may not wish to calculate all this yourself, and could instead use an off-the-shelf library, such as <a href="https://hdrhistogram.github.io/HdrHistogram/">HdrHistogram</a>, or a monitoring solution such as <a href="https://prometheus.io">Prometheus</a>.</p>
</description>
    </item>
    
    <item>
      <title>Running Java in Production: A SRE’s Perspective</title>
      <link>https://blog.bramp.net/post/2018/01/13/running-java-in-production/</link>
      <pubDate>Sat, 13 Jan 2018 12:50:31 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2018/01/13/running-java-in-production/</guid>
      <description><p><em>Originally <a href="https://www.javaadvent.com/2017/12/running-java-in-production.html">published</a> as part of the Java Advent 2017 series</em></p>
<p>As a <a href="https://landing.google.com/sre/">Site Reliability Engineer</a> (SRE), I make sure our production services are efficient, scalable, and reliable. A typical SRE is a master of production, with a good understanding of the wider architecture, and is well versed in many of the finer details.</p>
<p>It is common for SREs to be polyglot programmers, expected to understand multiple languages. For example, C++ may be hard to write, test, and get right, but its high performance makes it perfect for backend systems such as databases. Python, on the other hand, is easy to write and great for quick scripting, which is useful for automation. Java sits somewhere in the middle: it is a compiled language that provides type safety, performance, and many other advantages that make it a good choice for writing web infrastructure.</p>
<p>Even though many of the <a href="https://landing.google.com/sre/book.html">best practices that SREs adopt</a> can be generalised to any language, there are some unique challenges with Java web applications. This article highlights some of these challenges and discusses what we can do to address them.</p>
<h1 id="deployment">Deployment</h1>
<p>A typical Java application consists of hundreds of class files, either written by your team or from common libraries that the application depends on. To keep the number of class files under control, and to provide better versioning and compartmentalisation, they are typically bundled into <a href="https://en.wikipedia.org/wiki/JAR_(file_format)">JAR</a> or <a href="https://en.wikipedia.org/wiki/WAR_(file_format)">WAR</a> files.</p>
<p>There are many ways to host a Java application; one popular method is using a <a href="https://en.wikipedia.org/wiki/Web_container">Java Servlet Container</a> such as <a href="https://tomcat.apache.org/">Tomcat</a> or <a href="https://www.jboss.org/">JBoss</a>. These provide common web infrastructure and libraries that, in theory, make it easier to deploy and manage the Java application. Take Tomcat: a Java program that provides the actual webserver and loads the application (bundled as a WAR file) on your behalf. This may work well in some situations, but it adds complexity. For example, you now need to keep track of the version of the JRE, the version of Tomcat, and the version of your application. Testing for incompatibilities, and ensuring everyone is using the same versions of the full stack, can be problematic and lead to subtle bugs. Tomcat also brings along its own bespoke configuration, which is yet another thing to learn.</p>
<p>A good tenet to follow is to “<a href="https://landing.google.com/sre/book/chapters/simplicity.html">keep it simple</a>”, but with the Servlet Container approach you have to keep track of a few dozen Tomcat files, plus one or more WAR files that make up the application, plus all the Tomcat configuration that goes along with it.</p>
<p>Some frameworks attempt to reduce this overhead by embedding their own web server, instead of being hosted within a full application server. There is still a JVM, but it runs a single JAR file that contains everything needed to run the application. Popular frameworks that enable these standalone apps are <a href="http://www.dropwizard.io/">Dropwizard</a> and <a href="https://projects.spring.io/spring-boot/">Spring Boot</a>. To deploy a new version of the application, only a single file needs to be changed, and the JVM restarted. This is also useful when developing and testing the application, because everyone is using the same version of the stack. It is especially useful for rollbacks (one of SRE’s core tools), as only a single file has to be changed (which can be as quick as a symlink change).</p>
<p>Note that a Tomcat-style WAR file contains the application class files, as well as all the libraries the application depends on, as JAR files. In the standalone approach, all the dependencies are merged into a single <a href="https://stackoverflow.com/questions/19150811/what-is-a-fat-jar">Fat JAR</a>: one JAR file that contains the class files for the entire application. These Fat (or Uber) JARs are not only easier to version and copy around (because each is a single immutable file), but can actually be smaller than an equivalent WAR file, because unused classes in the dependencies can be pruned.</p>
<p>This can be taken even further, by not requiring separate JVM and JAR files. Tools like <a href="http://www.capsule.io/">capsule.io</a> can bundle the JAR file, JVM, and all configuration into a single executable file. Now we can really ensure the full stack is using the same versions, and the deployment is agnostic to what may already be installed on the server.</p>
<blockquote>
<p>Keep it simple, and make the application quick and easy to version by using a single Fat JAR, or executable, where possible.</p>
</blockquote>
<h1 id="startup">Startup</h1>
<p>Even though Java is a compiled language, it is compiled to bytecode rather than machine code. At runtime, the Java Virtual Machine (JVM) interprets the bytecode and tries to execute it as efficiently as possible. For example, <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">just-in-time</a> (JIT) compilation allows the JVM to watch how the application is used, and compile the bytecode into optimised machine code on the fly. Over the long run this may benefit the application, but during startup it can make the application perform suboptimally for tens of minutes, or longer. This is something to be aware of, as it has implications for load balancing, monitoring, capacity planning, etc.</p>
<p>In a multi-server deployment, it is best practice to slowly ramp up traffic to a newly started task, giving it time to warm up without harming the overall performance of the service. You may be tempted to warm up new tasks by sending them artificial traffic before they are placed into the user-serving path. Artificial traffic can be problematic if it does not approximate normal user traffic. In fact, this fake traffic may trigger the JIT to optimise for cases that don’t normally occur, leaving the application in a state that is suboptimal, or even worse than if it had not been JIT-compiled at all.</p>
<p>Slow starts should also be considered when capacity planning. Don’t expect cold tasks to handle the same load as warm tasks. This is important when rolling out a new version of the application, as the capacity of the system will drop until the tasks warm up. If this is not taken into account, too many tasks may be restarted concurrently, causing a capacity-based cascading outage.</p>
<blockquote>
<p>Expect cold starts, and try to warm the application up with real traffic.</p>
</blockquote>
<h1 id="monitoring">Monitoring</h1>
<p>This is generic <a href="https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html">monitoring advice</a>, but it is worth repeating for Java. Make sure the most important and useful metrics are exported from the Java application, collected, and easily graphed. There are many tools and frameworks for exporting metrics, and even more for collecting, aggregating, and displaying them.</p>
<p>When something breaks, <a href="https://landing.google.com/sre/book/chapters/effective-troubleshooting.html">troubleshooting</a> the issue should be possible using only the metrics being collected. You should not have to depend on log files, or on reading code, to deal with an outage.</p>
<p>Most outages are caused by change: a new version of the application, a config change, a new source of traffic, a hardware failure, or a backend dependency behaving differently. The metrics exported by the application should include ways to identify the version of Java, the application, and the configuration in use. They should break down traffic by source and mix, error counts, etc. They should also track the health, latency, error rates, etc. of backend dependencies. Most of the time, this is enough to diagnose an outage quickly.</p>
<p>Specific to Java, there are metrics that help you understand the health and performance of the application, and guide future decisions on how to scale and optimise it. Garbage collection time, heap size, thread count, and JIT time are all important, Java-specific metrics.</p>
<p>Finally, a note about measuring response times, or latency: the time it takes the application to handle a request. Many make the mistake of looking at average latency, in part because it is easy to calculate. <a href="https://www.elastic.co/blog/averages-can-dangerous-use-percentile">Averages can be misleading</a>, because they don’t show the <a href="https://en.wikipedia.org/wiki/Percentile_rank">shape of the distribution</a>. The majority of requests may be handled quickly, but there may be a long tail of requests that are rare but slow. This is especially troubling for JVM applications, because during garbage collection there is a <a href="https://www.cubrid.org/blog/understanding-java-garbage-collection">stop the world</a> (STW) phase, where the application must pause to allow the garbage collection to finish. During this pause no requests will be responded to, and users may wait multiple seconds.</p>
<p>It is better to collect either the max, or the 99th (or higher) percentile latency. The 99th percentile says that, for every 100 requests, 99 are served faster than this number. Looking at the worst-case latency is more meaningful, and more reflective of user-perceived performance.</p>
<blockquote>
<p>Measure metrics that matter, and that you can later depend on.</p>
</blockquote>
<h1 id="memory-management">Memory Management</h1>
<p>A good investment of your time is to learn about the various <a href="https://plumbr.io/handbook/garbage-collection-algorithms-implementations">JVM garbage collection algorithms</a>. The current state of the art is the concurrent collectors: either <a href="https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector">G1</a> or <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html">CMS</a>. You can decide what may be best for your application, but for now G1 is the likely winner. There are many great articles that explain how they work, but I’ll cover some key topics.</p>
<p>When starting up, the Java Virtual Machine (JVM) reserves a large chunk of OS memory and splits it into heap and non-heap. The non-heap contains areas such as <a href="https://blogs.oracle.com/poonam/about-g1-garbage-collector,-permanent-generation-and-metaspace">Metaspace</a> (<a href="https://dzone.com/articles/java-8-permgen-metaspace">formerly called PermGen</a>), and stack space. Metaspace holds class definitions, and stack space holds each thread&rsquo;s stack. The heap is used for the objects that are created, and normally accounts for the majority of memory usage. Unlike a typical executable, the JVM has the <a href="https://alvinalexander.com/blog/post/java/java-xmx-xms-memory-heap-size-control"><code>-Xms</code> and <code>-Xmx</code> flags</a> that control the minimum and maximum size of the heap. These limits constrain how much memory the JVM will use for the heap, which makes the memory demands on your servers more predictable. It is common to set both flags to the same value, sized to fill up the available RAM on your server while leaving room for the non-heap areas. There are also best practices around this when <a href="https://developers.redhat.com/blog/2017/03/14/java-inside-docker/">sizing Docker containers</a>.</p>
<p>Garbage collection (GC) is the process of managing this heap, by finding Java objects that are no longer in use (i.e. no longer referenced) so that their memory can be reclaimed. In most cases the JVM traverses the graph of objects, starting from a set of roots, and marks every object it reaches. At the end, any objects that weren’t visited are reclaimed. To avoid race conditions, the GC typically has to stop the world (STW), pausing the application for a short while as it finishes up.</p>
<p>The GC is a source of (perhaps unwarranted) resentment, because it is blamed for many performance problems. Typically this boils down to not understanding how the GC works. For example, if the heap is sized too small, the JVM can garbage collect aggressively, futilely trying to free up space. The application can then get stuck in a “<a href="http://javaagile.blogspot.com/2013/07/the-thrashing-of-garbage-collector.html">GC thrashing</a>” cycle, making very little progress freeing up space, and spending a larger and larger proportion of its time in GC instead of running application code.</p>
<p>Two common causes of this are <a href="https://plumbr.io/blog/memory-leaks/what-is-a-memory-leak">memory leaks</a> and <a href="http://www.oracle.com/technetwork/articles/java/trywithresources-401775.html">resource exhaustion</a>. Garbage-collected languages shouldn’t allow what are conventionally called memory leaks; however, they can still occur. Take, for example, a cache of objects that never expire. This cache will grow forever, and even though the objects in it may never be used again, they are still referenced, and thus ineligible for garbage collection.</p>
<p>Another common case is <a href="https://blog.bramp.net/post/2015/12/17/the-importance-of-tuning-your-thread-pools/">unbounded queues</a>. If your application places incoming requests on an unbounded queue, this queue could grow forever. If there is a spike of requests, the objects retained on the queue increase heap usage, causing the application to spend more and more time in GC. The application then has less time to process requests from the queue, causing the backlog to grow. This spirals out of control as the GC struggles to find any objects to free, until the application can make no forward progress.</p>
<p>The garbage collection algorithms have many optimisations to try to reduce total GC time. One important observation, the <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/generations.html">weak generational hypothesis</a>, is that objects either exist for a short time (for example, those related to handling a request), or last a long time (such as global objects that manage long-lived resources).</p>
<p>Because of this, the heap is further divided into young and old space. The GC algorithm that runs across the young space assumes objects will be freed, and if they are not, the GC promotes them into old space. The algorithm for old space makes the opposite assumption: that objects won’t be freed. The <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html">size of the young/old spaces may thus also be tuned</a>, and the approach differs between G1 and CMS. But if the young space is too small, objects that should only exist for a short time end up getting promoted to old space. This breaks some of the assumptions the old-space GC algorithm makes, causing GC to run less efficiently, and causing secondary issues such as memory fragmentation.</p>
<p>As mentioned earlier, GC is a source of <a href="https://www.weave.works/blog/the-long-tail-tools-to-investigate-high-long-tail-latency/">long tail latency</a>, so it should be monitored closely. The time taken for each phase of the GC should be recorded, as well as the fullness of the heap (broken down by young/old/etc.) before and after GC runs. This provides all the hints needed to either tune the GC, or improve the application, to get GC under control.</p>
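<p>A bounded queue turns such a spike into fast rejections rather than heap growth. A minimal Go sketch with the hypothetical type <code>BoundedQueue</code> (Go 1.18+ for generics); in Java, <code>java.util.concurrent.ArrayBlockingQueue</code> provides similar behaviour, with <code>offer</code> returning <code>false</code> when the queue is full.</p>

```go
package main

import "fmt"

// BoundedQueue is a minimal FIFO queue with a fixed capacity. Offer refuses
// new items once the queue is full, so a spike of requests is rejected up
// front instead of piling up on the heap.
type BoundedQueue[T any] struct {
	items []T
	max   int
}

// Offer enqueues item, returning false if the queue is already full.
func (q *BoundedQueue[T]) Offer(item T) bool {
	if len(q.items) >= q.max {
		return false
	}
	q.items = append(q.items, item)
	return true
}

// Poll dequeues the oldest item, returning false if the queue is empty.
func (q *BoundedQueue[T]) Poll() (T, bool) {
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	item := q.items[0]
	q.items[0] = zero // drop the reference so the item can be garbage collected
	q.items = q.items[1:]
	return item, true
}

func main() {
	q := BoundedQueue[string]{max: 2}
	fmt.Println(q.Offer("req1"), q.Offer("req2"), q.Offer("req3")) // true true false
	item, _ := q.Poll()
	fmt.Println(item, q.Offer("req3")) // req1 true
}
```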
<blockquote>
<p>Make GC your friend. Careful attention should be paid to the heap, and garbage collector, and it should be tuned (even coarsely) to ensure there is enough heap space even in the fully loaded/worst case.</p>
</blockquote>
<h1 id="other-tips">Other tips</h1>
<h2 id="debugging">Debugging</h2>
<p>Java has many rich tools for debugging, both during development and in production. For example, it is possible to capture live stack traces and heap dumps from the running application. This can be useful for understanding memory leaks or deadlocks. However, you must ensure the application is started in a way that allows these features, and that the typical tools, such as <a href="https://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html">jmap</a>, <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html">jcmd</a>, etc., are actually available on the server. Running the application inside a <a href="http://trustmeiamadeveloper.com/2016/03/18/where-is-my-memory-java/">Docker container</a>, or another non-standard environment, may make this more difficult, so test it now and write a playbook on how to do it.</p>
<p>Many frameworks also expose much of this information via web endpoints for easier debugging, for example the Dropwizard /threads resource, or the <a href="https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-endpoints.html">Spring Boot production endpoints</a>.</p>
<blockquote>
<p>Don’t wait until you have a production issue; test now how to grab heap dumps and stack traces.</p>
</blockquote>
<h2 id="fewer-but-larger-tasks">Fewer but larger tasks</h2>
<p>There are many features of the JVM that have a fixed cost per running JVM, such as JIT and garbage collection. Your application may also have fixed overheads, such as resource pooling (e.g. backend database connections). If you run fewer, but larger (in terms of CPU and RAM) instances, you can reduce this fixed cost, getting an economy of scale. I’ve seen a Java application handle 4x the requests per second (with no impact to latency) after doubling the amount of CPU and RAM it had. This does, however, make some assumptions about the application’s ability to scale in a multi-threaded way, but generally scaling vertically is easier than scaling horizontally.</p>
<blockquote>
<p>Make your JVM as large as possible.</p>
</blockquote>
<h2 id="32-bit-vs-64-bit-java">32-bit vs. 64-bit Java</h2>
<p>It used to be common practice to run a 32-bit JVM if your application didn’t use more than 4GiB of RAM. This was because 32-bit pointers are half the size of 64-bit ones, which reduced the overhead of each Java object. However, modern CPUs are 64-bit, typically with 64-bit specific performance improvements, and RAM is cheap, which makes 64-bit JVMs the clear winner.</p>
<blockquote>
<p>Use 64-bit JVMs.</p>
</blockquote>
<h2 id="load-shedding">Load Shedding</h2>
<p>Again, this is general advice, but important for Java. To <a href="https://landing.google.com/sre/book/chapters/handling-overload.html">avoid overload</a> caused by GC thrashing or cold tasks, the application should aggressively load shed. That is, beyond some threshold, the application should reject new requests. It may seem bad to reject some requests early, but it is better than allowing the application to become unrecoverably unhealthy and fail all requests. There are many ways to avoid overload, but common approaches are to ensure queues are bounded, and that <a href="https://blog.bramp.net/post/2015/12/17/the-importance-of-tuning-your-thread-pools/">thread pools are sized correctly</a>. Additionally, outbound requests should have <a href="https://www.datawire.io/guide/traffic/deadlines-distributed-timeouts-microservices/">appropriate deadlines</a>, to ensure a slow backend doesn’t cause problems for your application.</p>
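<p>One simple form of load shedding is to cap the number of requests in flight and reject anything beyond that immediately. A minimal Go sketch (Go 1.19+ for <code>atomic.Int64</code>); <code>Shedder</code> and the limit of 2 are hypothetical, and a real limit would come from load testing:</p>

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Shedder admits at most limit concurrent requests and rejects the rest
// immediately, rather than letting work pile up.
type Shedder struct {
	limit    int64
	inflight atomic.Int64
}

// TryAcquire reserves a slot, returning false if the request should be shed.
func (s *Shedder) TryAcquire() bool {
	if s.inflight.Add(1) > s.limit {
		s.inflight.Add(-1) // undo the reservation
		return false
	}
	return true
}

// Release frees a slot once a request finishes.
func (s *Shedder) Release() { s.inflight.Add(-1) }

func main() {
	s := Shedder{limit: 2}
	fmt.Println(s.TryAcquire(), s.TryAcquire(), s.TryAcquire()) // true true false
	s.Release()
	fmt.Println(s.TryAcquire()) // true: a slot was freed
}
```

<p>A server would call <code>TryAcquire</code> before starting work, respond with an error such as HTTP 503 when it returns <code>false</code>, and call <code>Release</code> when the request completes.</p>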
<blockquote>
<p>Handle as many requests as you can, and no more.</p>
</blockquote>
<h1 id="conclusion">Conclusion</h1>
<p>Hopefully this article has made you think about your Java production environment. While not prescriptive, it highlights some areas to focus on. The links throughout should guide you in the right direction.</p>
</description>
    </item>
    
    <item>
      <title>Parsing with Antlr4 and Go</title>
      <link>https://blog.bramp.net/post/2017/12/16/parsing-with-antlr4-and-go/</link>
      <pubDate>Sat, 16 Dec 2017 12:50:31 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2017/12/16/parsing-with-antlr4-and-go/</guid>
      <description><p><em>Originally <a href="https://blog.gopheracademy.com/advent-2017/parsing-with-antlr4-and-go/">published</a> as part of the Go Advent 2017 series</em></p>
<h2 id="what-is-antlr">What is ANTLR?</h2>
<p><a href="http://www.antlr.org">ANTLR</a> (ANother Tool for Language Recognition),
is an <a href="http://www.antlr.org/papers/allstar-techreport.pdf">ALL(*)</a>
<a href="https://en.wikipedia.org/wiki/Parser_generator">parser generator</a>. In
layman&rsquo;s terms, ANTLR creates parsers in a number of languages (Go,
Java, C, C#, JavaScript) that can process text or binary input. The
generated parser provides a callback interface to parse the input in an
event-driven manner, which can be used as-is, or used to build parse
trees (a data structure representing the input).</p>
<p>ANTLR is used by a number of popular projects: e.g. Hive and Pig use it
to parse Hadoop queries, Oracle and NetBeans use it for their IDEs, and
Twitter even uses it to understand search queries. Support was recently
added so that ANTLR 4 can be used to generate parsers in pure Go. This
article will explain some of the benefits of ANTLR, and walk us through
a simple example.</p>
<h2 id="why-use-it">Why use it?</h2>
<p>It is possible to <a href="https://blog.gopheracademy.com/advent-2014/parsers-lexers/">hand write a
parser</a>, but
this process can be complex, error-prone, and hard to change. Instead,
there are many <a href="https://en.wikipedia.org/wiki/Comparison_of_parser_generators">parser generators</a>
that take a grammar expressed in a domain-specific way, and generate code
to parse that language. Popular parser generators include
<a href="https://www.gnu.org/software/bison/">bison</a> and
<a href="http://dinosaur.compilertools.net/yacc/">yacc</a>. In fact, there is a
version of yacc, goyacc, which is written in Go and was part of the main
Go repo until it was moved to
<a href="https://godoc.org/golang.org/x/tools/cmd/goyacc">golang.org/x/tools</a>
last year.</p>
<h3 id="so-why-use-antlr-over-these">So why use ANTLR over these?</h3>
<ul>
<li>
<p>ANTLR has a <a href="http://www.antlr.org/tools.html">suite of tools</a>, and
<a href="http://tunnelvisionlabs.com/products/demo/antlrworks">GUIs</a> that
make writing and debugging grammars easy.</p>
</li>
<li>
<p>It uses a simple <a href="https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form">EBNF</a>
syntax to define the grammar, instead of a bespoke configuration
language.</p>
</li>
<li>
<p>ANTLR is an <a href="http://www.antlr.org/papers/allstar-techreport.pdf">Adaptive</a>
<a href="https://en.wikipedia.org/wiki/LL_parser">LL(*) parser</a>, ALL(*) for short,
whereas most other parser generators (e.g. Bison and Yacc) are
<a href="https://en.wikipedia.org/wiki/LALR_parser">LALR</a>. The difference
between LL(*) and LALR is out of scope for this article, but in short,
LALR parsers work bottom-up, while LL(*) parsers work top-down. This
has a bearing on how the grammar is written, making some languages
easier or harder to express.</p>
</li>
<li>
<p>The generated code for an LL(*) parser is more understandable than that of
an LALR parser. This is because LALR parsers are commonly table driven,
whereas LL(*) parsers encode the logic in their control flow, making
them more comprehensible.</p>
</li>
<li>
<p>Finally, ANTLR is agnostic to the target language. A single grammar
can be used to generate parsers in Java, Go, C++, etc. This is unlike
Bison/Yacc, which typically embed target language code into the
grammar, making it harder to port.</p>
</li>
</ul>
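<p>To make the top-down idea concrete, here is a sketch of a hand-written recursive-descent (LL) evaluator for the same kind of expressions. It is not the code ANTLR generates, and the <code>rdParser</code> type and its methods are hypothetical; it only shows how a top-down parser turns each grammar rule into a function, so the grammar’s structure (including operator precedence) is visible in its control flow.</p>

```go
package main

import (
	"fmt"
	"strconv"
)

// rdParser is a hypothetical hand-written top-down parser for the calculator
// language. Each grammar rule becomes a method, so the grammar's structure is
// visible directly in the control flow.
type rdParser struct {
	input string
	pos   int
}

// peek skips whitespace and returns the next byte, or 0 at end of input.
func (p *rdParser) peek() byte {
	for p.pos < len(p.input) && (p.input[p.pos] == ' ' || p.input[p.pos] == '\t' ||
		p.input[p.pos] == '\r' || p.input[p.pos] == '\n') {
		p.pos++
	}
	if p.pos >= len(p.input) {
		return 0
	}
	return p.input[p.pos]
}

// expr := term (('+' | '-') term)*
func (p *rdParser) expr() (int, error) {
	left, err := p.term()
	if err != nil {
		return 0, err
	}
	for op := p.peek(); op == '+' || op == '-'; op = p.peek() {
		p.pos++ // consume the operator
		right, err := p.term()
		if err != nil {
			return 0, err
		}
		if op == '+' {
			left += right
		} else {
			left -= right
		}
	}
	return left, nil
}

// term := number (('*' | '/') number)*
// expr calls term, so '*' and '/' bind more tightly than '+' and '-'.
func (p *rdParser) term() (int, error) {
	left, err := p.number()
	if err != nil {
		return 0, err
	}
	for op := p.peek(); op == '*' || op == '/'; op = p.peek() {
		p.pos++
		right, err := p.number()
		if err != nil {
			return 0, err
		}
		if op == '*' {
			left *= right
		} else if right == 0 {
			return 0, fmt.Errorf("division by zero")
		} else {
			left /= right
		}
	}
	return left, nil
}

// number := [0-9]+
func (p *rdParser) number() (int, error) {
	p.peek() // skip leading whitespace
	start := p.pos
	for p.pos < len(p.input) && p.input[p.pos] >= '0' && p.input[p.pos] <= '9' {
		p.pos++
	}
	if start == p.pos {
		return 0, fmt.Errorf("expected a number at offset %d", start)
	}
	return strconv.Atoi(p.input[start:p.pos])
}

// eval parses the whole input; like "start : expression EOF", trailing input is an error.
func eval(input string) (int, error) {
	p := &rdParser{input: input}
	v, err := p.expr()
	if err != nil {
		return 0, err
	}
	if p.peek() != 0 {
		return 0, fmt.Errorf("unexpected %q at offset %d", p.input[p.pos], p.pos)
	}
	return v, nil
}

func main() {
	v, err := eval("1 + 2 * 3")
	fmt.Println(v, err) // 7 <nil>
}
```

<p>Note that the sketch rewrites the grammar without left recursion (<code>expr</code> loops over <code>term</code>), since a naive recursive-descent parser would otherwise recurse forever; ANTLR 4 accepts directly left-recursive rules, like <code>expression</code> in Calc.g4, and handles this for you.</p>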
<h2 id="installing-antlr-v4">Installing ANTLR v4</h2>
<p>ANTLR is a Java 1.7 application that generates the Go code needed to
parse your language. During development Java is needed, but once the
parser is built only Go and the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr">ANTLR runtime
library</a> are
required. The ANTLR site has
<a href="https://github.com/antlr/antlr4/blob/master/doc/getting-started.md">documentation</a>
on how to install this on multiple platforms, but in brief,
you can do the following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ wget http://www.antlr.org/download/antlr-4.7-complete.jar
</span></span><span class="line"><span class="cl">$ <span class="nb">alias</span> <span class="nv">antlr</span><span class="o">=</span><span class="s1">&#39;java -jar $PWD/antlr-4.7-complete.jar&#39;</span>
</span></span></code></pre></div><p>The <code>antlr</code> command is now available in your shell. If you prefer, the
.jar file can be placed into a <code>~/bin</code> directory, and the alias can be
stored in your <code>~/.bash_profile</code>.</p>
<h2 id="classic-calculator-example">Classic calculator example</h2>
<p>Let&rsquo;s start with the “hello world” for parsers, the calculator example.
We want to build a parser that handles simple mathematical expressions
such as <code>1 + 2 * 3</code>. The focus of this article is on how to use Go with
ANTLR, so the syntax of the ANTLR language won’t be explained in
detail, but the ANTLR site has <a href="https://github.com/antlr/antlr4/blob/master/doc/grammars.md">comprehensive documentation</a>.</p>
<p>The <a href="https://github.com/bramp/goadvent-antlr">source for all the
examples</a> is available, so you can follow along.</p>
<pre tabindex="0"><code>// Calc.g4
grammar Calc;

// Tokens
MUL: &#39;*&#39;;
DIV: &#39;/&#39;;
ADD: &#39;+&#39;;
SUB: &#39;-&#39;;
NUMBER: [0-9]+;
WHITESPACE: [ \r\n\t]+ -&gt; skip;

// Rules
start : expression EOF;

expression
   : expression op=(&#39;*&#39;|&#39;/&#39;) expression # MulDiv
   | expression op=(&#39;+&#39;|&#39;-&#39;) expression # AddSub
   | NUMBER                             # Number
   ;
</code></pre><p>The above is a simple grammar split into two sections: <em>tokens</em> and
<em>rules</em>. The tokens are terminal symbols in the grammar; that is, they
are made up of nothing but literal characters. Rules, in contrast, are
non-terminal symbols made up of tokens and/or other rules.</p>
<p>ANTLR requires the grammar to be saved in a file whose name matches
the name of the grammar, in this case “Calc.g4”. To process this file
and generate the Go parser, we run the <code>antlr</code> command like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ antlr -Dlanguage<span class="o">=</span>Go -o parser Calc.g4 
</span></span></code></pre></div><p>This will generate a set of Go files in the “parser” package and
subdirectory. It is possible to place the generated code in a different
package by using the <code>-package &lt;name&gt;</code> argument. This is useful if your
project has multiple parsers, or you just want a more descriptive
package name for the parser. The generated files will look like the
following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ tree
</span></span><span class="line"><span class="cl">├── Calc.g4
</span></span><span class="line"><span class="cl">└── parser
</span></span><span class="line"><span class="cl">    ├── calc_lexer.go
</span></span><span class="line"><span class="cl">    ├── calc_parser.go
</span></span><span class="line"><span class="cl">    ├── calc_base_listener.go
</span></span><span class="line"><span class="cl">    └── calc_listener.go
</span></span></code></pre></div><p>The generated files consist of three main components: the Lexer, the Parser,
and the Listener.</p>
<p>The Lexer takes arbitrary input and returns a stream of tokens. For
input such as <code>1 + 2 * 3</code>, the Lexer would return the following tokens:
<code>NUMBER (1), ADD (+), NUMBER (2), MUL (*), NUMBER (3), EOF</code>.</p>
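<p>To illustrate what the Lexer does, the following hand-written tokenizer in plain Go (no ANTLR involved) emits the same token sequence for <code>1 + 2 * 3</code>. The <code>Token</code> type and <code>tokenize</code> function are hypothetical; the generated <code>CalcLexer</code> is built from the grammar and works quite differently internally.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token: its type name and the text it matched.
type Token struct {
	Type string
	Text string
}

// operators maps each single-character operator to the token name used in Calc.g4.
var operators = map[byte]string{'*': "MUL", '/': "DIV", '+': "ADD", '-': "SUB"}

// tokenize splits input into NUMBER and operator tokens, discards whitespace
// (as the grammar's WHITESPACE -> skip rule does) and appends a final EOF token.
func tokenize(input string) ([]Token, error) {
	var tokens []Token
	for i := 0; i < len(input); {
		c := input[i]
		switch {
		case c == ' ' || c == '\r' || c == '\n' || c == '\t':
			i++ // whitespace is skipped, never emitted
		case c >= '0' && c <= '9':
			start := i
			for i < len(input) && input[i] >= '0' && input[i] <= '9' {
				i++
			}
			tokens = append(tokens, Token{"NUMBER", input[start:i]})
		default:
			name, ok := operators[c]
			if !ok {
				return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
			}
			tokens = append(tokens, Token{name, string(c)})
			i++
		}
	}
	return append(tokens, Token{"EOF", ""}), nil
}

func main() {
	tokens, err := tokenize("1 + 2 * 3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	parts := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if t.Type == "EOF" {
			parts = append(parts, t.Type)
			continue
		}
		parts = append(parts, fmt.Sprintf("%s (%s)", t.Type, t.Text))
	}
	fmt.Println(strings.Join(parts, ", "))
	// Prints: NUMBER (1), ADD (+), NUMBER (2), MUL (*), NUMBER (3), EOF
}
```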
<p>The Parser uses the Lexer’s output and applies the grammar’s rules,
building higher-level constructs, such as expressions, that can be used
to calculate the result.</p>
<p>The Listener then allows us to make use of the parsed input. As
mentioned earlier, yacc requires language-specific code to be embedded
within the grammar. ANTLR instead separates this concern, allowing the
grammar to be agnostic to the target programming language. It does this
through listeners, which effectively allow hooks to be placed
before and after every rule is encountered in the parsed input.</p>
<h2 id="using-the-lexer">Using the Lexer</h2>
<p>Let&rsquo;s move onto an example of using this generated code, starting with
the Lexer.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// example1.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;fmt&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/antlr/antlr4/runtime/Go/antlr&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;./parser&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">&#34;1 + 2 * 3&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Read all tokens</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">for</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lexer</span><span class="p">.</span><span class="nf">NextToken</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenEOF</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="k">break</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;%s (%q)\n&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="nx">lexer</span><span class="p">.</span><span class="nx">SymbolicNames</span><span class="p">[</span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()],</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetText</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>To begin with, the generated parser is imported from the local
subdirectory <code>import &quot;./parser&quot;</code>. Next the Lexer is created with some
input:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">&#34;1 + 2 * 3&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>In this example the input is a simple string, <code>&quot;1 + 2 * 3&quot;</code>, but there
are other kinds of <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#InputStream"><code>antlr.InputStream</code></a>;
for example, the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#FileStream"><code>antlr.FileStream</code></a>
type can read directly from a file. The <code>InputStream</code> is then passed to
a newly created Lexer. Note that the name of the Lexer is <code>CalcLexer</code>, which
matches the grammar’s name defined in Calc.g4.</p>
<p>The lexer is then used to consume all the tokens from the input,
printing them one by one. This wouldn’t normally be necessary, but it is
done here to show what the Lexer produces.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="w"> 	</span><span class="k">for</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lexer</span><span class="p">.</span><span class="nf">NextToken</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenEOF</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="k">break</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;%s (%q)\n&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="nx">lexer</span><span class="p">.</span><span class="nx">SymbolicNames</span><span class="p">[</span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()],</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetText</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Each token has two main components: the TokenType and the Text. The
TokenType is a simple integer representing the type of token, while the
Text is literally the text that made up this token. All the TokenTypes
are defined at the end of calc_lexer.go, with their string names stored
in the SymbolicNames slice:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// calc_lexer.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">const</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerMUL</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerDIV</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerADD</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerSUB</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerNUMBER</span><span class="w">     </span><span class="p">=</span><span class="w"> </span><span class="mi">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerWHITESPACE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">6</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>You may also notice that the <code>WHITESPACE</code> token is not printed, even though
the input clearly had whitespace. This is because the grammar was
designed to skip (i.e. discard) the whitespace: <code>WHITESPACE: [ \r\n\t]+ -&gt; skip;</code>.</p>
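<p>The <code>SymbolicNames</code> lookup in the loop works because token types are small integers that index a slice of names. Below is a minimal sketch of the same idea, using hypothetical copies of the constants above and a hand-written <code>symbolicNames</code> slice (index 0 is left unused because these token types start at 1). The hypothetical <code>tokenName</code> helper adds a bounds check, so that a type outside the table, such as ANTLR’s EOF token type, produces a fallback name instead of an index-out-of-range panic.</p>

```go
package main

import "fmt"

// Hypothetical copies of the generated token type constants shown above.
const (
	CalcLexerMUL        = 1
	CalcLexerDIV        = 2
	CalcLexerADD        = 3
	CalcLexerSUB        = 4
	CalcLexerNUMBER     = 5
	CalcLexerWHITESPACE = 6
)

// symbolicNames is indexed by token type; index 0 is an unused placeholder.
var symbolicNames = []string{"", "MUL", "DIV", "ADD", "SUB", "NUMBER", "WHITESPACE"}

// tokenName returns the symbolic name for a token type, or a fallback string
// for any type that has no entry in the table.
func tokenName(tokenType int) string {
	if tokenType > 0 && tokenType < len(symbolicNames) {
		return symbolicNames[tokenType]
	}
	return fmt.Sprintf("UNKNOWN(%d)", tokenType)
}

func main() {
	fmt.Println(tokenName(CalcLexerADD), tokenName(-1)) // ADD UNKNOWN(-1)
}
```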
<h2 id="using-the-parser">Using the Parser</h2>
<p>The Lexer on its own is not very useful, so the example can be modified
to also use the Parser and Listener:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// example2.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;./parser&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/antlr/antlr4/runtime/Go/antlr&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">calcListener</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">BaseCalcListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">&#34;1 + 2 * 3&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stream</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewCommonTokenStream</span><span class="p">(</span><span class="nx">lexer</span><span class="p">,</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenDefaultChannel</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcParser</span><span class="p">(</span><span class="nx">stream</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Finally parse the expression</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeWalkerDefault</span><span class="p">.</span><span class="nf">Walk</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">calcListener</span><span class="p">{},</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Start</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This is very similar to before, but instead of manually iterating over
the tokens, the lexer is used to create a <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#CommonTokenStream"><code>CommonTokenStream</code></a>,
which in turn is used to create a new <code>CalcParser</code>. The tree returned by
the parser’s <code>Start</code> method is then “walked”, which is ANTLR&rsquo;s event-driven API for receiving the
results of parsing the rules.</p>
<p>Note that the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#ParseTreeWalker.Walk"><code>Walk</code></a>
function does not return anything. Some may
have expected a parsed form of the expression to be returned, such as
some kind of <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a>
(abstract syntax tree), but instead the Listener receives events as each
rule is visited. This is similar in concept to
<a href="https://en.wikipedia.org/wiki/Simple_API_for_XML">SAX</a> style parsers
for XML. Event-based parsing can sometimes be harder to use, but it has
many advantages. For example, the parser can be very memory efficient, as
previously parsed rules can be discarded once they are no longer needed.
The parser can also be aborted early if desired.</p>
<p>But so far, this example doesn’t do anything beyond ensuring the input
can be parsed without error. To add logic, we must extend the
<code>calcListener</code> type. The <code>calcListener</code> has an embedded
<code>BaseCalcListener</code>, a helper type that provides empty methods
for all those defined in the <code>CalcListener</code> interface. That interface
looks like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// parser/calc_listener.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// CalcListener is a complete listener for a parse tree produced by CalcParser.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">CalcListener</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterStart is called when entering the start production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterStart</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">StartContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterNumber is called when entering the Number production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterNumber</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">NumberContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterMulDiv is called when entering the MulDiv production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterMulDiv</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">MulDivContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterAddSub is called when entering the AddSub production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterAddSub</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">AddSubContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitStart is called when exiting the start production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitStart</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">StartContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitNumber is called when exiting the Number production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitNumber</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">NumberContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitMulDiv is called when exiting the MulDiv production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitMulDiv</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">MulDivContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitAddSub is called when exiting the AddSub production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitAddSub</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">AddSubContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>There is an Enter and Exit function for each rule, or labeled alternative (such as <code># MulDiv</code>), found in the grammar.
As the parse tree is walked, the walker calls the appropriate function on the
listener to indicate when each rule starts and finishes being evaluated.</p>
<h2 id="adding-the-logic">Adding the logic</h2>
<p>A simple calculator can be constructed from this event-driven parser by
using a stack of values. Every time a number is found, it is pushed onto the
stack. Every time an expression (add, multiply, etc.) is exited, that is, once both
of its operands have been evaluated, the last two numbers on the stack are popped,
and the appropriate operation is carried out. The result is then pushed back onto the stack.</p>
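<p>The stack technique can be checked on its own before wiring it into a listener. This sketch replays, by hand, the order in which a walker would exit the rules for <code>1 + 2 * 3</code> (the three numbers, then <code>MulDiv</code>, then <code>AddSub</code>) and applies each exit event to a stack. The <code>event</code> type and <code>evaluate</code> function are hypothetical and not part of the generated code.</p>

```go
package main

import "fmt"

// event is a hypothetical exit event: either a number or an operator.
type event struct {
	op  byte // 0 for a number, otherwise '+', '-', '*' or '/'
	num int
}

// evaluate applies exit events to a stack: numbers are pushed, operators pop
// two operands and push the result.
func evaluate(events []event) (int, error) {
	var stack []int
	for _, e := range events {
		if e.op == 0 {
			stack = append(stack, e.num)
			continue
		}
		if len(stack) < 2 {
			return 0, fmt.Errorf("stack underflow at %q", e.op)
		}
		// The right operand was pushed last, so it is on top of the stack.
		right, left := stack[len(stack)-1], stack[len(stack)-2]
		stack = stack[:len(stack)-2]
		var result int
		switch e.op {
		case '+':
			result = left + right
		case '-':
			result = left - right
		case '*':
			result = left * right
		case '/':
			if right == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			result = left / right
		default:
			return 0, fmt.Errorf("unknown operator %q", e.op)
		}
		stack = append(stack, result)
	}
	if len(stack) != 1 {
		return 0, fmt.Errorf("expected one value on the stack, got %d", len(stack))
	}
	return stack[0], nil
}

func main() {
	// Exit order for the parse tree of "1 + 2 * 3".
	events := []event{{num: 1}, {num: 2}, {num: 3}, {op: '*'}, {op: '+'}}
	v, err := evaluate(events)
	fmt.Println(v, err) // 7 <nil>
}
```

<p>Taking the right operand from the top of the stack matters for the non-commutative operators: for <code>8 - 2</code> the stack holds <code>[8 2]</code>, so <code>2</code> must be the right-hand side.</p>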
<p>Take the expression <code>1 + 2 * 3</code>: the result could be either <code>(1 + 2) * 3 = 9</code>, or <code>1 + (2 * 3) = 7</code>. Those who recall the <a href="https://en.wikipedia.org/wiki/Order_of_operations">order of
operations</a> will
know that multiplication should always be carried out before addition,
thus the correct result is 7. However, without the parentheses there
could be some ambiguity on how this should be parsed. Luckily the
ambiguity is resolved by the grammar. The precedence of multiplication
over addition was subtly implied within Calc.g4, by placing the <code>MulDiv</code>
alternative before the <code>AddSub</code> alternative.</p>
<div class="text-center">
	<img src="parse-tree.svg">
</div>
<p>The code for a listener that implements this stack-of-values
approach is relatively simple:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">calcListener</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">BaseCalcListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stack</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">push</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">pop</span><span class="p">()</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="s">&#34;stack is empty unable to pop&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Get the last value from the stack.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Remove the last element from the stack.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">ExitMulDiv</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">MulDivContext</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">right</span><span class="p">,</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">(),</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserMUL</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserDIV</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">default</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;unexpected op: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetText</span><span class="p">()))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">ExitAddSub</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">AddSubContext</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">right</span><span class="p">,</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">(),</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserADD</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserSUB</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">default</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;unexpected op: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetText</span><span class="p">()))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">ExitNumber</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">NumberContext</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nf">Atoi</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nf">GetText</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nf">Error</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Finally, this listener would be used like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// calc takes a string expression and returns the evaluated result.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">calc</span><span class="p">(</span><span class="nx">input</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stream</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewCommonTokenStream</span><span class="p">(</span><span class="nx">lexer</span><span class="p">,</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenDefaultChannel</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcParser</span><span class="p">(</span><span class="nx">stream</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Finally parse the expression (by walking the tree)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">var</span><span class="w"> </span><span class="nx">listener</span><span class="w"> </span><span class="nx">calcListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeWalkerDefault</span><span class="p">.</span><span class="nf">Walk</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">listener</span><span class="p">,</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Start</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">listener</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Following the algorithm, the parsing of <code>1 + 2 * 3</code> would work like so.</p>
<ol>
<li>The number 1 is visited first, and pushed onto the stack.</li>
<li>Then the MulDiv expression is entered, and the numbers 2 and 3 are
visited and pushed onto the stack.</li>
<li>When the MulDiv expression is exited, the values 3 and 2 are popped,
multiplied, and the result, 6, is pushed back onto the stack.</li>
<li>Finally AddSub is exited, popping the 6 and the 1 from the
stack, and pushing the result, 7, back.</li>
</ol>
<p>The order in which the rules are visited is driven entirely by the
parse tree the Parser builds, and thus by the grammar.</p>
<h2 id="more-grammars">More grammars</h2>
<p>Learning how to write a grammar may be daunting, but there are many
resources to help. The author of ANTLR, <a href="http://parrt.cs.usfca.edu/">Terence
Parr</a>, has <a href="https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference">published a
book</a>,
with some of the content freely available on <a href="http://antlr.org">antlr.org</a>.</p>
<p>If you don’t want to write your own grammar, there are many <a href="https://github.com/antlr/grammars-v4">pre-written
grammars available</a>, including
grammars for CSS, HTML, SQL, etc., as well as many popular programming
languages. To make it easier, I have <a href="https://github.com/bramp/antlr4-grammars">generated
parsers</a> for all of those
available grammars, so they can be used just by importing them.</p>
<p>A quick example of using one of the pre-generated grammars:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;bramp.net/antlr4/json&#34;</span><span class="w"> </span><span class="c1">// The parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/antlr/antlr4/runtime/Go/antlr&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">exampleListener</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// https://godoc.org/bramp.net/antlr4/json#BaseJSONListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">*</span><span class="nx">json</span><span class="p">.</span><span class="nx">BaseJSONListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">`
</span></span></span><span class="line"><span class="cl"><span class="s">		{
</span></span></span><span class="line"><span class="cl"><span class="s">			&#34;example&#34;: &#34;json&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s">			&#34;with&#34;: [&#34;an&#34;, &#34;array&#34;]
</span></span></span><span class="line"><span class="cl"><span class="s">		}`</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the JSON Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nf">NewJSONLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stream</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewCommonTokenStream</span><span class="p">(</span><span class="nx">lexer</span><span class="p">,</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenDefaultChannel</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the JSON Parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nf">NewJSONParser</span><span class="p">(</span><span class="nx">stream</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Finally walk the tree</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeWalkerDefault</span><span class="p">.</span><span class="nf">Walk</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">exampleListener</span><span class="p">{},</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Json</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>Hopefully this article has given you a taste of how to use ANTLR with Go.
The examples for this article are <a href="https://github.com/bramp/goadvent-antlr">found here</a>,
and the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr">godoc for the ANTLR library is here</a>,
which explains the various InputStream, Lexer, Parser, etc. interfaces.</p>
</description>
    </item>
    
    <item>
      <title>Vanity Go Import Paths</title>
      <link>https://blog.bramp.net/post/2017/10/02/vanity-go-import-paths/</link>
      <pubDate>Mon, 02 Oct 2017 07:48:23 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2017/10/02/vanity-go-import-paths/</guid>
      <description><p>When using third-party packages in Go, they are imported by a path that represents
how to download that package from the Internet. For example, to use the
popular structured logging library, <a href="https://github.com/sirupsen/logrus">Logrus</a>, it would be imported at the top of the Go program like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="s">&#34;github.com/sirupsen/logrus&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>When <code>go get</code> is then executed, it fetches the Logrus source code from GitHub
and places the code in the <code>$GOPATH/src</code> directory. Take a look for yourself:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ tree <span class="nv">$GOPATH</span>/src
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl">├── github.com
</span></span><span class="line"><span class="cl">│   ├── sirupsen
</span></span><span class="line"><span class="cl">│   │   └── logrus
</span></span><span class="line"><span class="cl">...
</span></span></code></pre></div><p>An astute reader may wonder how exactly <code>go get</code> knows that <code>github.com/sirupsen/logrus</code> is a Git repository, and that it can be fetched with git from that URL. The <code>go get</code> binary does have some built-in smarts for a few well-known hosts such as GitHub, but hard-coding every site would be inflexible, and problematic whenever a new site needs to be supported. Instead, the Go developers built a layer of indirection that allows the <code>go get</code> tool to discover the correct source repository for any other host.</p>
<p>As outlined in the <a href="https://golang.org/cmd/go/#hdr-Remote_import_paths">Remote Import Paths</a> docs, for such hosts the <code>go get</code> binary makes a normal HTTP request to the import path with <code>?go-get=1</code> appended (falling back to http if needed), and looks at the returned HTML for a <code>&lt;meta name=&quot;go-import&quot;&gt;</code> tag. This meta tag can then redirect the <code>go get</code> binary to the correct source code repository for the package.</p>
<p>GitHub serves this meta tag as well, which can be seen with <code>curl</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ curl https://github.com/sirupsen/logrus <span class="p">|</span> grep meta <span class="p">|</span> grep go-import
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-html" data-lang="html"><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">&#34;go-import&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="na">content</span><span class="o">=</span><span class="s">&#34;github.com/sirupsen/logrus git https://github.com/sirupsen/logrus.git&#34;</span><span class="p">&gt;</span>
</span></span></code></pre></div><p>That tag says that the package rooted at <code>github.com/sirupsen/logrus</code> can be fetched with git, at the
URL <code>https://github.com/sirupsen/logrus.git</code>. The meta tag can also express other version control systems, e.g. Mercurial (<code>hg</code>), Bazaar (<code>bzr</code>), or Subversion (<code>svn</code>).</p>
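<p>The <code>content</code> attribute is three space-separated fields: the import prefix, the version control system, and the repository root. As a rough illustration, the Go helpers below split that attribute and build the discovery URL that <code>go get</code> requests. The Remote Import Paths docs describe that URL as the import path with <code>?go-get=1</code> appended. The names <code>goImport</code>, <code>parseGoImport</code>, and <code>discoveryURL</code> are hypothetical and are not part of the <code>go</code> tool. The prefix check reflects the documented rule that the import prefix must match, or be a prefix of, the package being fetched.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// goImport holds the three fields of a go-import meta tag's content
// attribute: the import prefix, the VCS, and the repository root URL.
type goImport struct {
	Prefix, VCS, RepoRoot string
}

// parseGoImport splits a go-import content attribute into its fields.
func parseGoImport(content string) (goImport, error) {
	f := strings.Fields(content)
	if len(f) != 3 {
		return goImport{}, fmt.Errorf("want 3 fields, got %d in %q", len(f), content)
	}
	return goImport{Prefix: f[0], VCS: f[1], RepoRoot: f[2]}, nil
}

// matches reports whether importPath is the prefix itself or a package below it.
func (g goImport) matches(importPath string) bool {
	return importPath == g.Prefix || strings.HasPrefix(importPath, g.Prefix+"/")
}

// discoveryURL returns the https page to fetch for an import path.
func discoveryURL(importPath string) string {
	return "https://" + importPath + "?go-get=1"
}

func main() {
	g, err := parseGoImport("github.com/sirupsen/logrus git https://github.com/sirupsen/logrus.git")
	if err != nil {
		panic(err)
	}
	pkg := "github.com/sirupsen/logrus/hooks/syslog"
	fmt.Println(discoveryURL(pkg))
	fmt.Println(g.matches(pkg), g.VCS, g.RepoRoot)
}
```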
<p>GitHub is a very convenient place to host source code, but a <code>github.com</code> import path is generic and ties the package to GitHub. Instead, the <code>&lt;meta&gt;</code> tag can be used to give projects a vanity import path on your own domain. For example, the package hosted at <a href="https://github.com/bramp/goredirects">github.com/bramp/goredirects</a> can instead be imported as <code>bramp.net/goredirects</code>. All that is needed is a static HTML page at <code>bramp.net/goredirects</code>, containing the following <code>&lt;meta&gt;</code> tag pointing at GitHub.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-html" data-lang="html"><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">go-import</span>
</span></span><span class="line"><span class="cl">  <span class="na">content</span><span class="o">=</span><span class="s">&#34;bramp.net/goredirects git https://github.com/bramp/goredirects.git&#34;</span><span class="p">&gt;</span>
</span></span></code></pre></div><p>In case a user visits that page directly with a web browser, it is worth
placing more information about the project on the page, or simply making the page redirect.</p>
<p>To help make these redirect pages, I wrote a simple Go tool, <a href="https://github.com/bramp/goredirects"><code>goredirects</code></a>, that inspects all local repositories under a vanity domain directory in <code>$GOPATH/src/</code> and outputs static HTML pages that can be hosted on that domain.</p>
<p>For example, create your new project on GitHub, but check out the project under <code>$GOPATH/src/example.com/project</code>. Then run the tool:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ go install bramp.net/goredirects
</span></span><span class="line"><span class="cl">$ goredirects example.com outputdir
</span></span></code></pre></div><p>The directory <code>outputdir</code> will now contain a directory and HTML file for each project under <code>$GOPATH/src/example.com</code>. These HTML files contain the appropriate <code>go-import</code> meta tag to redirect the download of source code from the vanity name to GitHub. Upload these files to your website and voilà, you are done. Examples of these vanity redirect files can be found on bramp.net, e.g. <a href="https://bramp.net/goredirects/index.html">bramp.net/goredirects/index.html</a>. The tool also works for packages with sub-packages under the main root.</p>
<p>Finally, it is possible to ensure that someone who finds your project via GitHub still imports it under your vanity domain. This can be achieved with an <a href="https://golang.org/cmd/go/#hdr-Import_path_checking">import comment</a>. Within the source code, ensure that at least one of the files in your package has a comment like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span><span class="w"> </span><span class="nx">project</span><span class="w"> </span><span class="c1">// import &#34;example.com/project&#34;</span><span class="w">
</span></span></span></code></pre></div><p>Then <code>go get</code> will refuse to install the package under any other import path (such as its GitHub location), enforcing the vanity import path instead.</p>
<p>More helpful links on the topic:</p>
<ul>
<li><a href="https://golang.org/cmd/go/#hdr-Import_path_checking">golang.org/cmd/go/#hdr-Import_path_checking</a></li>
<li><a href="https://golang.org/cmd/go/#hdr-Remote_import_paths">golang.org/cmd/go/#hdr-Remote_import_paths</a></li>
<li><a href="https://golang.org/doc/go1.4#canonicalimports">golang.org/doc/go1.4#canonicalimports</a></li>
<li><a href="https://godoc.org/golang.org/x/tools/cmd/fiximports">godoc.org/golang.org/x/tools/cmd/fiximports</a></li>
<li><a href="https://texlution.com/post/golang-canonical-import-paths/">texlution.com/post/golang-canonical-import-paths/</a></li>
</ul>
</description>
    </item>
    
    <item>
      <title>Teaching Binary to 8th Graders</title>
      <link>https://blog.bramp.net/post/2017/07/15/teaching-binary-to-8th-graders/</link>
      <pubDate>Sat, 15 Jul 2017 12:23:18 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2017/07/15/teaching-binary-to-8th-graders/</guid>
      <description><p>This summer, as part of <a href="https://www.google.com/intl/en/giving/people.html#google-serve">GoogleServe</a>, I volunteered in a local school to teach kids about the importance of mathematics. This was part of a larger program organised by the <a href="http://svefoundation.org">Silicon Valley Education Foundation</a> (SVEF).</p>
<p>I had 90 minutes to introduce myself, talk a little about Google, and then spend the majority of the time teaching a topic of my choosing. Not knowing anything about teaching 8th graders, I went to the Internet to find some material.</p>
<p>I quickly found <a href="http://cse4k12.org">cse4k12.org</a>, and the excellent <a href="https://www.youtube.com/user/csunplugged">YouTube series</a> by <a href="http://csunplugged.org">csunplugged.org</a>. I decided I would teach counting in binary. The <a href="http://csunplugged.org/binary-numbers/">csunplugged videos</a> showed how to introduce this material in a way that seemed fun and got the kids to work out the concepts on their own. I decided to mix the teaching with worksheets from <a href="http://cse4k12.org">cse4k12.org</a> (to reinforce what the kids had just learnt). Since I only had ~90 minutes to cover a lot, I took what I found on cse4k12.org and simplified their activities. I went ahead and created new worksheets, and am providing them here today for others to use. The rough schedule I used was:</p>
<ul>
<li>
<p>10min Intro to counting in binary, with kids holding up <a href="https://docs.google.com/document/d/1dCmZc3_v16eIqP230j0DyLXIsS2SVY1Ia8TmGo4b0vQ/edit?usp=sharing">bits</a> (similar to <a href="https://youtu.be/Pz7dLWvi2w0">this video</a>).</p>
</li>
<li>
<p>15min Work on this “<a href="https://docs.google.com/document/d/1u3ITluIYx7K4x7yqy9101DB813TWI9_TYnUu8VxHBgs/edit#bookmark=id.rsbu3f59m3ry">Counting In Binary</a>” worksheet.</p>
</li>
<li>
<p>10min Representing text (again <a href="https://youtu.be/xc0stfTVE_8">similar to this video</a>)</p>
</li>
<li>
<p>15min Using “<a href="https://docs.google.com/document/d/1u3ITluIYx7K4x7yqy9101DB813TWI9_TYnUu8VxHBgs/edit#bookmark=id.gpxpgtxg156z">Encoding Table</a>” and “<a href="https://docs.google.com/document/d/1u3ITluIYx7K4x7yqy9101DB813TWI9_TYnUu8VxHBgs/edit#bookmark=id.1r2yly29a01u">Encoding Message</a>” worksheets to write some secret messages to each other.</p>
</li>
<li>
<p>10min Representing images with binary.</p>
</li>
<li>
<p>15min Using the “<a href="https://docs.google.com/document/d/1u3ITluIYx7K4x7yqy9101DB813TWI9_TYnUu8VxHBgs/edit#bookmark=id.1gznllvnujnk">Bitmaps</a>” worksheets to encode their own images, and, if time allows, swapping encoded images with each other to decode.</p>
</li>
</ul>
<p>All in all, this worked quite well. I learnt a lot, and was happy to see the class engaged! I will certainly be taking part in activities like this again.</p>
<p>P.S. I found printing all sheets double-sided worked really well. Oh, and no computers needed! Put those laptops away.</p>
</description>
    </item>
    
    <item>
      <title>Maven Plugins on Java 8</title>
      <link>https://blog.bramp.net/post/2017/04/01/maven-plugins-on-java-8/</link>
      <pubDate>Sat, 01 Apr 2017 15:21:27 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2017/04/01/maven-plugins-on-java-8/</guid>
      <description><p>As part of my standard Maven configuration, I like to use two plugins backed by Google technologies: the first helps keep my code formatted correctly, and the second checks for compile-time errors. However, Google recently moved to require JDK 1.8, which broke anyone trying to compile my projects with an older JDK. In this article I&rsquo;ll quickly explain how to configure Maven to work around this problem.</p>
<p>Specifically I use the following two plugins:</p>
<ul>
<li>
<p><a href="https://github.com/coveo/fmt-maven-plugin">coveo/fmt-maven-plugin</a> (which uses <a href="https://github.com/google/google-java-format">google-java-format</a>). This follows <a href="https://google.github.io/styleguide/javaguide.html">Google&rsquo;s Java Style</a> guide, and reformats the code to keep it consistent. This is great when accepting external contributions, as it keeps the code base uniform and avoids style discussions on pull requests.</p>
</li>
<li>
<p><a href="https://github.com/codehaus-plexus/plexus-compiler">plexus-compiler-javac-errorprone</a> (which uses Google&rsquo;s <a href="https://github.com/google/error-prone">errorprone</a>). This is a static code analysis tool that checks for simple errors at compile time, and fails the build if any are found. Again, this helps improve the quality of the code.</p>
</li>
</ul>
<p>Even though my projects typically target 1.7, these plugins must run under JDK 1.8. I&rsquo;d prefer to bump all my projects to target 1.8+, but since a few of them are libraries (which other people include in their projects), that is easier said than done. To deal with this, I changed my Maven configuration to only run these two plugins when running under a sufficiently recent JDK. This means those using an older JDK don&rsquo;t get the benefits, but since I use JDK 8 locally, and all my open source projects build on <a href="https://travis-ci.org">Travis CI</a>, any issues will still be caught eventually.</p>
<p>So if you get an error like</p>
<pre tabindex="0"><code>java.lang.UnsupportedClassVersionError: com/google/googlejavaformat/java/FormatterException : Unsupported major.minor version 52.0
</code></pre><p>or</p>
<pre tabindex="0"><code>An API incompatibility was encountered while executing org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile: java.lang.UnsupportedClassVersionError: javax/tools/DiagnosticListener : Unsupported major.minor version 52.0
</code></pre><p>Please update to JDK 1.8, or change your Maven configuration to only run these plugins on a modern JDK:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;project&gt;</span>
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;profiles&gt;</span>
</span></span><span class="line"><span class="cl">        <span class="nt">&lt;profile&gt;</span>
</span></span><span class="line"><span class="cl">            <span class="nt">&lt;id&gt;</span>java18<span class="nt">&lt;/id&gt;</span>
</span></span><span class="line"><span class="cl">            <span class="nt">&lt;activation&gt;</span>
</span></span><span class="line"><span class="cl">                <span class="nt">&lt;jdk&gt;</span>1.8<span class="nt">&lt;/jdk&gt;</span>
</span></span><span class="line"><span class="cl">            <span class="nt">&lt;/activation&gt;</span>
</span></span><span class="line"><span class="cl">            <span class="nt">&lt;build&gt;</span>
</span></span><span class="line"><span class="cl">                <span class="nt">&lt;plugins&gt;</span>
</span></span><span class="line"><span class="cl">                    <span class="nt">&lt;plugin&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;groupId&gt;</span>com.coveo<span class="nt">&lt;/groupId&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;artifactId&gt;</span>fmt-maven-plugin<span class="nt">&lt;/artifactId&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;executions&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;execution&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;goals&gt;</span>
</span></span><span class="line"><span class="cl">                                    <span class="nt">&lt;goal&gt;</span>format<span class="nt">&lt;/goal&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;/goals&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;/execution&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;/executions&gt;</span>
</span></span><span class="line"><span class="cl">                    <span class="nt">&lt;/plugin&gt;</span>
</span></span><span class="line"><span class="cl">                    <span class="nt">&lt;plugin&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;groupId&gt;</span>org.apache.maven.plugins<span class="nt">&lt;/groupId&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;artifactId&gt;</span>maven-compiler-plugin<span class="nt">&lt;/artifactId&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;configuration&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;compilerId&gt;</span>javac-with-errorprone<span class="nt">&lt;/compilerId&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;forceJavacCompilerUse&gt;</span>true<span class="nt">&lt;/forceJavacCompilerUse&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;showWarnings&gt;</span>true<span class="nt">&lt;/showWarnings&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;compilerArgs&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;arg&gt;</span>-Xlint:all<span class="nt">&lt;/arg&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;/compilerArgs&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;/configuration&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;dependencies&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;dependency&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;groupId&gt;</span>org.codehaus.plexus<span class="nt">&lt;/groupId&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;artifactId&gt;</span>plexus-compiler-javac-errorprone<span class="nt">&lt;/artifactId&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;version&gt;</span>2.8.1<span class="nt">&lt;/version&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;/dependency&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="c">&lt;!-- override plexus-compiler-javac-errorprone&#39;s dependency on
</span></span></span><span class="line"><span class="cl"><span class="c">                                 Error Prone with the latest version --&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;dependency&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;groupId&gt;</span>com.google.errorprone<span class="nt">&lt;/groupId&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;artifactId&gt;</span>error_prone_core<span class="nt">&lt;/artifactId&gt;</span>
</span></span><span class="line"><span class="cl">                                <span class="nt">&lt;version&gt;</span>2.0.19<span class="nt">&lt;/version&gt;</span>
</span></span><span class="line"><span class="cl">                            <span class="nt">&lt;/dependency&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nt">&lt;/dependencies&gt;</span>
</span></span><span class="line"><span class="cl">                    <span class="nt">&lt;/plugin&gt;</span>
</span></span><span class="line"><span class="cl">                <span class="nt">&lt;/plugins&gt;</span>
</span></span><span class="line"><span class="cl">            <span class="nt">&lt;/build&gt;</span>
</span></span><span class="line"><span class="cl">        <span class="nt">&lt;/profile&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;/profiles&gt;</span>
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/project&gt;</span>
</span></span></code></pre></div><p>This defines a new profile that is only &ldquo;activated&rdquo; under Java 1.8. When activated, the two plugins are added to the <code>&lt;build&gt;</code> section.
Ensure that these plugins are no longer mentioned in the regular <code>&lt;build&gt;</code> section, only in the <code>&lt;profile&gt;&lt;build&gt;</code> section.</p>
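<p>One subtlety: a plain <code>&lt;jdk&gt;1.8&lt;/jdk&gt;</code> is a prefix match against the running JDK&rsquo;s version, so this profile does not activate on JDK 9 or later (whose versions look like <code>11.0.2</code>). Maven also accepts a version range, such as <code>&lt;jdk&gt;[1.8,)&lt;/jdk&gt;</code> for &ldquo;1.8 or newer&rdquo;. The following Go sketch is a simplified, hypothetical model of just those two forms, not Maven&rsquo;s implementation; <code>activates</code> and its helpers are illustrative names.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// numericParts splits a Java version such as "1.8.0_292" or "11.0.2" into its
// numeric components, treating any non-digit character as a separator.
func numericParts(v string) []int {
	var parts []int
	for _, f := range strings.FieldsFunc(v, func(r rune) bool { return r < '0' || r > '9' }) {
		n, _ := strconv.Atoi(f)
		parts = append(parts, n)
	}
	return parts
}

// compareVersions returns -1, 0 or 1; missing components count as zero.
func compareVersions(a, b string) int {
	pa, pb := numericParts(a), numericParts(b)
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var x, y int
		if i < len(pa) {
			x = pa[i]
		}
		if i < len(pb) {
			y = pb[i]
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// activates models <activation><jdk>spec</jdk>: "[lower,)" means lower or
// newer; any other spec is treated as a plain prefix match.
func activates(spec, javaVersion string) bool {
	if strings.HasPrefix(spec, "[") && strings.HasSuffix(spec, ",)") {
		lower := strings.TrimSuffix(strings.TrimPrefix(spec, "["), ",)")
		return compareVersions(javaVersion, lower) >= 0
	}
	return strings.HasPrefix(javaVersion, spec)
}

func main() {
	for _, v := range []string{"1.7.0_80", "1.8.0_292", "11.0.2"} {
		fmt.Printf("%-10s 1.8: %v  [1.8,): %v\n", v, activates("1.8", v), activates("[1.8,)", v))
	}
}
```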
<p>An example of this change can be found in a <a href="https://github.com/bramp/ffmpeg-cli-wrapper/commit/4985ba3ab3ef84839bc0f4ca8b63573b77e33c67">recent commit</a>.</p>
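<p>For context, the <code>major.minor version 52.0</code> in the errors above refers to the class-file format version: Java 8 compilers emit major version 52 (Java 7 uses 51), so a Java 7 JVM refuses to load the plugins&rsquo; classes. A minimal Go sketch that reads this header from a <code>.class</code> file follows; <code>classVersion</code> and <code>javaRelease</code> are hypothetical helpers, and the &ldquo;major minus 44&rdquo; mapping is only claimed for Java 5 (major version 49) onwards.</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

// classVersion reads the header of a compiled .class file: the 0xCAFEBABE
// magic number, then the minor and major versions as big-endian uint16s.
func classVersion(data []byte) (major, minor uint16, err error) {
	if len(data) < 8 || binary.BigEndian.Uint32(data[0:4]) != 0xCAFEBABE {
		return 0, 0, errors.New("not a class file")
	}
	minor = binary.BigEndian.Uint16(data[4:6])
	major = binary.BigEndian.Uint16(data[6:8])
	return major, minor, nil
}

// javaRelease maps a class-file major version to its Java release:
// 49 is Java 5, 50 is Java 6, 51 is Java 7, 52 is Java 8, and so on.
func javaRelease(major uint16) int {
	return int(major) - 44
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: classversion File.class")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	major, minor, err := classVersion(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("major.minor %d.%d: needs Java %d or newer\n", major, minor, javaRelease(major))
}
```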
</description>
    </item>
    
  </channel>
</rss>
