<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Go on bramp.net</title>
    <link>https://blog.bramp.net/</link>
    <description>Recent content in Go on bramp.net</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-GB</language>
    <lastBuildDate>Sat, 05 Jan 2019 07:59:08 -0800</lastBuildDate>
    <atom:link href="https://blog.bramp.net/tags/go/" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Apache Beam and Google Dataflow in Go</title>
      <link>https://blog.bramp.net/post/2019/01/05/apache-beam-and-google-dataflow-in-go/</link>
      <pubDate>Sat, 05 Jan 2019 07:59:08 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2019/01/05/apache-beam-and-google-dataflow-in-go/</guid>
      <description><p><em>Originally <a href="https://blog.gopheracademy.com/advent-2018/apache-beam/">published</a> as part of the Go Advent 2018 series</em></p>
<h1 id="overview">Overview</h1>
<p><a href="https://beam.apache.org/">Apache Beam</a> (<strong>b</strong>atch and str<strong>eam</strong>) is a powerful tool for handling <a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a> workloads. It is an evolution of <a href="https://ai.google/research/pubs/pub35650">Google’s Flume</a>, which provides batch and streaming data processing based on <a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce</a> concepts. One of the novel features of Beam is that it’s agnostic to the platform that runs the code. For example, a pipeline can be written once and run locally, across <a href="https://flink.apache.org/">Flink</a> or <a href="https://spark.apache.org/">Spark</a> clusters, or on <a href="https://cloud.google.com/dataflow/">Google Cloud Dataflow</a>.</p>
<p>An experimental <a href="https://beam.apache.org/documentation/sdks/go/">Go SDK</a> was created for Beam, and while it is still immature compared to Beam for <a href="https://beam.apache.org/documentation/sdks/python/">Python</a> and <a href="https://beam.apache.org/documentation/sdks/java/">Java</a>, it is able to do some impressive things. The remainder of this article will briefly recap a simple example from the Apache Beam site, and then work through a more complex example running on Dataflow. Consider this a more advanced version of the <a href="https://beam.apache.org/get-started/">official getting started guide</a> on the Apache Beam site.</p>
<p>Before we begin, it’s worth pointing out that if your analysis can run on a single machine, doing so is likely to be faster and more cost effective. Beam is more suitable when your data processing needs are large enough that they must run in a distributed fashion.</p>
<h2 id="table-of-contents">Table of Contents</h2>
<ul>
<li><a href="#concepts">Concepts</a></li>
<li><a href="#shakespeare-simple-example">Shakespeare (simple example)</a>
<ul>
<li><a href="#running-the-pipeline">Running the pipeline</a></li>
</ul>
</li>
<li><a href="#art-history-more-complex-example">Art history (more complex example)</a>
<ul>
<li><a href="#stateful-functions">Stateful functions</a></li>
<li><a href="#iterating-over-a-cogbk">Iterating over a CoGBK</a></li>
<li><a href="#data-enrichment">Data enrichment</a></li>
<li><a href="#error-handling-and-dead-letters">Error handling and dead letters</a></li>
</ul>
</li>
<li><a href="#gotchas">Gotchas</a>
<ul>
<li><a href="#marshing">Marshalling</a></li>
<li><a href="#errors">Errors</a></li>
<li><a href="#difference-between-direct-and-dataflow-runners">Difference between direct and dataflow runners</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
<h1 id="concepts">Concepts</h1>
<p>Beam already has good documentation that explains all the <a href="https://beam.apache.org/documentation/programming-guide/">main concepts</a>. We will cover some of the basics here.</p>
<figure><img src="/post/2019/01/05/apache-beam-and-google-dataflow-in-go/design-your-pipeline-linear.png" width="720" height="175"><figcaption>
      <h4>Pipeline stages</h4>
    </figcaption>
</figure>

<p>A pipeline is made up of multiple steps that take some input, operate on that data, and finally produce output. The steps that operate on the data are called PTransforms (parallel transforms), and the data is always stored in PCollections (parallel collections). A PTransform takes one item at a time from a PCollection and operates on it. PTransforms are assumed to be hermetic, using no global state, which ensures they always produce the same output for a given input. These properties allow the data to be sharded into many smaller datasets and processed in any order across multiple machines. The code you write ends up being very simple, yet it can be split seamlessly across hundreds of machines.</p>
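<p>To make the hermeticity argument concrete, here is a plain-Go sketch that does not use Beam at all. The names <code>upper</code> and <code>processSharded</code> are invented for this illustration, and goroutines stand in for separate machines. Because <code>upper</code> depends only on its input, splitting the data into any number of shards and processing them concurrently yields the same collection of results.</p>

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// upper is a hermetic transform: its output depends only on its input,
// and it reads or writes no shared state.
func upper(s string) string { return strings.ToUpper(s) }

// processSharded splits input into n shards, runs fn over each shard in its
// own goroutine (standing in for a separate machine), and merges the results.
func processSharded(input []string, n int, fn func(string) string) []string {
	results := make([][]string, n)
	var wg sync.WaitGroup
	for shard := 0; shard < n; shard++ {
		wg.Add(1)
		go func(shard int) {
			defer wg.Done()
			// Each shard takes every n-th element, starting at its own index,
			// and writes only to its own slot in results.
			for i := shard; i < len(input); i += n {
				results[shard] = append(results[shard], fn(input[i]))
			}
		}(shard)
	}
	wg.Wait()

	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	// Sharding changes the order of the output, so sort before returning:
	// the contents are the same for any shard count.
	sort.Strings(merged)
	return merged
}

func main() {
	words := []string{"to", "be", "or", "not", "to", "be"}
	fmt.Println(processSharded(words, 1, upper)) // [BE BE NOT OR TO TO]
	fmt.Println(processSharded(words, 3, upper)) // [BE BE NOT OR TO TO]
}
```

<p>Only the contents match: the order changes with the shard count, which is why the sketch sorts before returning. Beam similarly makes no promise about the order of elements within a PCollection.</p>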
<h1 id="shakespeare-simple-example">Shakespeare (simple example)</h1>
<div style="float: right; width: 200px">
	<img src="word-count.png" width=200 height=436>
</div>
<p>A classic example is counting the words in Shakespeare. In brief, the pipeline counts the number of times each word appears across Shakespeare’s works, and outputs a simple key-value list of word to word-count. There is an <a href="https://github.com/apache/beam/blob/master/sdks/go/examples/minimal_wordcount/minimal_wordcount.go">example</a> provided with the Beam SDK, along with a great <a href="https://beam.apache.org/get-started/wordcount-example/">walkthrough</a>. I suggest you read that before continuing. I will, however, dive into some of the Go specifics and add additional context.</p>
<p>The example begins with <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio#Read"><code>textio.Read</code></a>, which reads all the files under the shakespeare directory stored on <a href="https://cloud.google.com/storage/">Google Cloud Storage</a> (GCS). Because the files are stored on GCS, every machine in the cluster can access them when this pipeline runs. <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio#Read"><code>textio.Read</code></a> always returns a <code>PCollection&lt;string&gt;</code>, which contains one element for every line in the given files.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">lines</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">textio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;gs://apache-beam-samples/shakespeare/*&#34;</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>The <code>lines</code> PCollection is then processed by a ParDo (<strong>Par</strong>allel <strong>Do</strong>), a type of PTransform. Most transforms are built with a <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a>. It executes a supplied function in parallel on the source PCollection. In this example, the function is defined inline and simply splits each input line into words with a regexp. Each word is then emitted to another <code>PCollection&lt;string&gt;</code> named <code>words</code>. Note that zero or more words may be emitted for every line, so the new collection can differ in size from the original.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">splitFunc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">line</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">emit</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="kt">string</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">word</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">wordRE</span><span class="p">.</span><span class="nf">FindAllString</span><span class="p">(</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nf">emit</span><span class="p">(</span><span class="nx">word</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">words</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">splitFunc</span><span class="p">,</span><span class="w"> </span><span class="nx">lines</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>An interesting trick used by the Apache Beam Go API is passing functions as an <code>interface{}</code> and using reflection to infer the types. Specifically, since <code>lines</code> is a <code>PCollection&lt;string&gt;</code>, the first argument of <code>splitFunc</code> is expected to be a string. The second argument to <code>splitFunc</code> allows Beam to infer the type of the <code>words</code> output PCollection. In this example it is a function with a single string argument, so the output type will be <code>PCollection&lt;string&gt;</code>. If <code>emit</code> were defined as <code>func(int)</code>, the output would be a <code>PCollection&lt;int&gt;</code>, and the next PTransform would be expected to handle ints.</p>
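<p>Beam’s real type inference handles many more cases, but the core idea can be sketched with Go’s standard <code>reflect</code> package. The toy <code>parDo</code> below is hypothetical and is neither Beam’s API nor its implementation: it reads the type of the <code>emit</code> parameter to decide what kind of slice to return, and builds a matching <code>emit</code> callback with <code>reflect.MakeFunc</code>. The <code>wordRE</code> pattern is an assumption, since the snippet above does not show its definition.</p>

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
)

// wordRE is assumed here; the article's snippet does not show its definition.
var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)

// splitFunc has the same shape as the article's: one input element plus an
// emit callback that may be called zero or more times.
func splitFunc(line string, emit func(string)) {
	for _, word := range wordRE.FindAllString(line, -1) {
		emit(word)
	}
}

// parDo is a toy, single-process stand-in for beam.ParDo, not Beam's
// implementation. fn must look like func(In, func(Out)); the type of the
// returned slice ([]Out) is inferred from the emit parameter via reflection.
func parDo(fn interface{}, input interface{}) interface{} {
	fv := reflect.ValueOf(fn)
	ft := fv.Type()
	if ft.Kind() != reflect.Func || ft.NumIn() != 2 ||
		ft.In(1).Kind() != reflect.Func || ft.In(1).NumIn() != 1 {
		panic("fn must have the form func(In, func(Out))")
	}
	emitType := ft.In(1)      // e.g. func(string)
	outType := emitType.In(0) // e.g. string
	out := reflect.MakeSlice(reflect.SliceOf(outType), 0, 0)

	// Build an emit function of exactly emitType that collects each value.
	emit := reflect.MakeFunc(emitType, func(args []reflect.Value) []reflect.Value {
		out = reflect.Append(out, args[0])
		return nil
	})

	in := reflect.ValueOf(input) // must be a slice whose elements match In
	for i := 0; i < in.Len(); i++ {
		fv.Call([]reflect.Value{in.Index(i), emit})
	}
	return out.Interface()
}

func main() {
	lines := []string{"To be, or not to be:", ""}
	words := parDo(splitFunc, lines).([]string)
	fmt.Println(words) // [To be or not to be]

	// Changing emit to func(int) changes the inferred output type to []int.
	lengths := parDo(func(w string, emit func(int)) { emit(len(w)) }, words).([]int)
	fmt.Println(lengths) // [2 2 2 3 2 2]
}
```

<p>Passing a function whose <code>emit</code> takes an <code>int</code> produces an <code>[]int</code> without any change to <code>parDo</code>, mirroring the <code>PCollection&lt;int&gt;</code> case described above. In this toy version, a mismatched input type only surfaces as a panic when the function is called.</p>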
<p>The next step uses one of the library’s higher-level constructs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">counted</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stats</span><span class="p">.</span><span class="nf">Count</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">words</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p><a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/transforms/stats#Count"><code>stats.Count</code></a> takes a <code>PCollection&lt;X&gt;</code>, counts each unique element, and outputs a key-value pair of (X, int) as a <code>PCollection&lt;KV&lt;X, int&gt;&gt;</code>. In this specific example, the input is a <code>PCollection&lt;string&gt;</code>; thus the output is <code>PCollection&lt;KV&lt;string, int&gt;&gt;</code>.</p>
<p>Internally, <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/transforms/stats#Count"><code>stats.Count</code></a> is made up of multiple ParDos and a <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey"><code>beam.GroupByKey</code></a>, but it hides that detail to make it easier to use.</p>
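<p>The general shape of such a count (pair each element with 1, group the pairs by key, then sum each group) can be sketched on a single machine in plain Go. This is only an illustration of the idea; it is not the code behind <code>stats.Count</code>, and the <code>KV</code> type and <code>count</code> function are made up for the example.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// KV mirrors one element of a PCollection<KV<string, int>>.
type KV struct {
	Key   string
	Value int
}

// count sketches the map, group, reduce shape of a count on one machine.
func count(words []string) []KV {
	// Step 1 (ParDo-like): word -> (word, 1).
	pairs := make([]KV, 0, len(words))
	for _, w := range words {
		pairs = append(pairs, KV{w, 1})
	}

	// Step 2 (GroupByKey-like): collect every value for each key.
	groups := make(map[string][]int)
	for _, p := range pairs {
		groups[p.Key] = append(groups[p.Key], p.Value)
	}

	// Step 3 (ParDo-like): sum the values in each group.
	out := make([]KV, 0, len(groups))
	for k, vs := range groups {
		total := 0
		for _, v := range vs {
			total += v
		}
		out = append(out, KV{k, total})
	}
	// Map iteration order is random; sort by key for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	fmt.Println(count([]string{"to", "be", "or", "not", "to", "be"}))
	// [{be 2} {not 1} {or 1} {to 2}]
}
```

<p>Steps 1 and 3 are independent per element or per key, so they parallelise freely; step 2 is the only point where data with the same key has to be brought together.</p>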
<p>At this point, the count of each word has been calculated, and the results can be written to a simple text file. To do this, the <code>PCollection&lt;KV&lt;string, int&gt;&gt;</code> is converted to a <code>PCollection&lt;string&gt;</code> containing one element for each line to be written out.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">formatFunc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;%s: %v&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">formatted</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">formatFunc</span><span class="p">,</span><span class="w"> </span><span class="nx">counted</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>Again a <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a> is used, but you’ll notice the <code>formatFunc</code> is slightly different from the <code>splitFunc</code> above. The <code>formatFunc</code> takes two arguments, a string (the key) and an int (the value). These are the pairs in the <code>PCollection&lt;KV&lt;string, int&gt;&gt;</code>. However, the <code>formatFunc</code> does not take an <code>emit func(...)</code>; instead it simply returns a string.</p>
<p>Since the PTransform outputs a single line for each input element, a simpler form of the function can be specified, one where the output element is just returned from the function. The <code>emit func(...)</code> form is useful when the number of output elements differs from the number of input elements. If it’s a 1:1 mapping, a return value makes the function easier to read. As above, this is all inferred at runtime, using reflection, when the pipeline is being constructed.</p>
<p>Multiple return arguments can also be used. For example, if the output was expected to be <code>PCollection&lt;KV&lt;float64, bool&gt;&gt;</code>, the return type could be <code>func(...) (float64, bool)</code>.</p>
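<p>The two return styles can be compared in plain Go. In this sketch <code>formatFunc</code> matches the function above, while <code>ratioFunc</code> and <code>describe</code> are hypothetical: <code>describe</code> only illustrates, with the standard <code>reflect</code> package, how an output type could be read from a signature, and does not reproduce Beam’s actual rules.</p>

```go
package main

import (
	"fmt"
	"reflect"
)

// formatFunc is the 1:1 form used in the article: one (key, value) pair in,
// exactly one string out, so a plain return value is enough.
func formatFunc(w string, c int) string {
	return fmt.Sprintf("%s: %v", w, c)
}

// ratioFunc is a hypothetical example of the multiple-return form: two
// results, which a runner could treat as the key and value of a KV pair.
func ratioFunc(w string, c int) (float64, bool) {
	r := float64(c) / float64(len(w))
	return r, r >= 1
}

// describe is a toy illustration (not Beam's actual logic) of reading an
// output type from a function signature: one result yields PCollection<T>,
// two results yield PCollection<KV<K, V>>.
func describe(fn interface{}) string {
	t := reflect.TypeOf(fn)
	switch t.NumOut() {
	case 1:
		return fmt.Sprintf("PCollection<%v>", t.Out(0))
	case 2:
		return fmt.Sprintf("PCollection<KV<%v, %v>>", t.Out(0), t.Out(1))
	default:
		return "unsupported"
	}
}

func main() {
	fmt.Println(formatFunc("king", 4)) // king: 4
	fmt.Println(describe(formatFunc))  // PCollection<string>
	fmt.Println(describe(ratioFunc))   // PCollection<KV<float64, bool>>
}
```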
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">textio</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;wordcounts.txt&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">formatted</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>Finally, <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio#Write"><code>textio.Write</code></a> takes the formatted <code>PCollection&lt;string&gt;</code> and writes it to a file named “wordcounts.txt” with one line per element.</p>
<h2 id="running-the-pipeline">Running the pipeline</h2>
<p>To test the pipeline, it can easily be run locally like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">go get github.com/apache/beam/sdks/go/examples/wordcount
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> <span class="nv">$GOPATH</span>/src/github.com/apache/beam/sdks/go/examples/wordcount
</span></span><span class="line"><span class="cl">go run wordcount.go --runner<span class="o">=</span>direct
</span></span></code></pre></div><p>For a more realistic run, the pipeline can be executed on <a href="https://cloud.google.com/dataflow/">GCP Dataflow</a>. Before doing so, you need to create a GCP project, create a GCS bucket, enable the Cloud Dataflow APIs, and create a service account. These steps are documented in the <a href="https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python">Python quickstart guide</a>, under “Before you begin”.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">GOOGLE_APPLICATION_CREDENTIALS</span><span class="o">=</span><span class="nv">$PWD</span>/your-gcp-project.json
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">BUCKET</span><span class="o">=</span>your-gcs-bucket
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">PROJECT</span><span class="o">=</span>your-gcp-project
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> <span class="nv">$GOPATH</span>/src/github.com/apache/beam/sdks/go/examples/wordcount
</span></span><span class="line"><span class="cl">go run wordcount.go <span class="se">\
</span></span></span><span class="line"><span class="cl">    --runner dataflow <span class="se">\
</span></span></span><span class="line"><span class="cl">    --input gs://dataflow-samples/shakespeare/kinglear.txt <span class="se">\
</span></span></span><span class="line"><span class="cl">    --output gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/counts <span class="se">\
</span></span></span><span class="line"><span class="cl">    --project <span class="si">${</span><span class="nv">PROJECT</span><span class="p">?</span><span class="si">}</span> <span class="se">\
</span></span></span><span class="line"><span class="cl">    --temp_location gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/tmp/ <span class="se">\
</span></span></span><span class="line"><span class="cl">    --staging_location gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/binaries/ <span class="se">\
</span></span></span><span class="line"><span class="cl">    --worker_harness_container_image<span class="o">=</span>apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
</span></span></code></pre></div><p>If this works correctly you’ll see something similar to the following printed:</p>
<pre tabindex="0"><code>Cross-compiling .../wordcount.go as .../worker-1-1544590905654809000
Staging worker binary:  .../worker-1-1544590905654809000
Submitted job: 2018-12-11_21_02_29
Console: https://console.cloud.google.com/dataflow/job/2018-12-11...
Logs: https://console.cloud.google.com/logs/viewer?job_id%2F2018-12-11...
Job state: JOB_STATE_PENDING …
Job still running …
Job still running …
...
Job succeeded!
</code></pre><p>Let&rsquo;s take a moment to explain what’s going on, starting with the various flags. The <code>--runner dataflow</code> flag tells the Apache Beam SDK to run this on GCP Dataflow, including executing all the steps required to make that happen. This includes compiling the code and uploading it to the <code>--staging_location</code>. Later, the staged binary will be run by Dataflow under the <code>--project</code> project. As this will be running “in the cloud”, the pipeline will not be able to access local files. Thus both the <code>--input</code> and <code>--output</code> flags are set to paths on GCS, as this is a convenient place to store files. Finally, the <code>--worker_harness_container_image</code> flag specifies the Docker image that Dataflow will use to host the wordcount.go binary that was uploaded to the <code>--staging_location</code>.</p>
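<p>Since a Dataflow job cannot see local files, a pipeline could check its paths before submitting. The sketch below is a hypothetical guard, not something the wordcount example or Beam provides; it defines its own <code>--runner</code>, <code>--input</code> and <code>--output</code> flags on a private <code>flag.FlagSet</code> purely so that it runs on its own, reusing the flag names from the command above.</p>

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// checkRemotePaths is a hypothetical guard, not part of Beam: Dataflow
// workers cannot read files on the machine that submitted the job, so with
// the dataflow runner every input and output path should be on GCS.
func checkRemotePaths(runner string, paths ...string) error {
	if runner != "dataflow" {
		return nil // e.g. the direct runner can read and write local files
	}
	for _, p := range paths {
		if !strings.HasPrefix(p, "gs://") {
			return fmt.Errorf("%q is not a gs:// path; Dataflow workers cannot access local files", p)
		}
	}
	return nil
}

func main() {
	// A separate FlagSet keeps this sketch independent of any flags that
	// other packages register on the default flag set.
	fs := flag.NewFlagSet("wordcount", flag.ContinueOnError)
	runner := fs.String("runner", "direct", "pipeline runner")
	input := fs.String("input", "", "file(s) to read")
	output := fs.String("output", "", "output file")

	args := []string{
		"--runner", "dataflow",
		"--input", "gs://dataflow-samples/shakespeare/kinglear.txt",
		"--output", "counts.txt", // a local path: rejected below
	}
	if err := fs.Parse(args); err != nil {
		panic(err)
	}
	if err := checkRemotePaths(*runner, *input, *output); err != nil {
		fmt.Println("refusing to submit:", err)
	}
}
```

<p>Failing fast like this avoids waiting for a job to be staged and started only to discover that its workers cannot find the input.</p>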
<p>Once wordcount.go is running, it prints out helpful information, such as links to the Dataflow console. The console displays current progress as well as a visualization of the pipeline as a directed graph. The local wordcount.go continues to run only to display status updates. It can be interrupted at any time, but the pipeline will continue to run on Dataflow until it either succeeds or fails. Once that occurs, the logs link can provide useful information.</p>
<h1 id="art-history-more-complex-example">Art history (more complex example)</h1>
<div style="float: right; width: 300px">
	<img src="palette.png" width=300 height=411>
</div>
<p>Now we’ll construct a more complex pipeline that demonstrates some other features of Beam and Dataflow. In this pipeline we will take 100,000 paintings from the last 600 years and process them to extract information about their color palettes. Specifically, the question we aim to answer is, “Have the color palettes of paintings changed over the decades?” This may not be a pipeline we run repeatedly, but it is a fun example, and it demonstrates many advanced topics.</p>
<p>We will skip over the details of the color extraction algorithm and cover them in a later article. Here we’ll focus on how to create a pipeline to accomplish this task.</p>
<p>We start by reading a CSV file that contains metadata for each painting, such as the artist, the year it was painted, and a GCS path to a JPEG of the painting. The paintings will then be grouped by the decade they were painted in, and the color palette for each group will be determined. Each palette will be saved to a PNG file (<code>DrawColorPalette</code>), and all the palettes will also be saved to a single large JSON file (<code>WriteIndex</code>). To finish it off, the pipeline will be productionised, so it is easier to debug and re-run. The full source code is <a href="https://github.com/bramp/dataflow-art">available here</a>.</p>
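<p>The grouping step can be pictured with a small plain-Go sketch. The <code>Painting</code> fields and the “1880s” key format are assumptions made for this illustration; the <code>GroupByDecade</code> function in the linked repository may compute its key differently. In a Beam pipeline this would typically be a ParDo that keys each painting by decade, followed by a grouping transform such as <code>beam.GroupByKey</code>.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Painting is a hypothetical subset of the CSV metadata described above;
// the field names in the real pipeline may differ.
type Painting struct {
	Artist string
	Year   int
	Image  string // GCS path to the painting's JPEG
}

// decadeKey turns a year into a string key such as "1880s".
// It assumes a non-negative year.
func decadeKey(year int) string {
	return fmt.Sprintf("%ds", year/10*10)
}

// groupByDecade keys each painting by its decade and groups paintings that
// share a key: a single-machine analogue of a key-then-group step.
func groupByDecade(ps []Painting) map[string][]Painting {
	groups := make(map[string][]Painting)
	for _, p := range ps {
		k := decadeKey(p.Year)
		groups[k] = append(groups[k], p)
	}
	return groups
}

func main() {
	ps := []Painting{
		{Artist: "artist-a", Year: 1889, Image: "gs://your-bucket/a.jpg"},
		{Artist: "artist-b", Year: 1881, Image: "gs://your-bucket/b.jpg"},
		{Artist: "artist-c", Year: 1665, Image: "gs://your-bucket/c.jpg"},
	}
	groups := groupByDecade(ps)

	// Map iteration order is random, so print the keys in sorted order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, len(groups[k]))
	}
	// 1660s 1
	// 1880s 2
}
```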
<p>To start with, the main function for the pipeline looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/apache/beam/sdks/go/pkg/beam&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// If beamx or Go flags are used, flags must be parsed first.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">flag</span><span class="p">.</span><span class="nf">Parse</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// beam.Init() is an initialization hook that must be called on startup. On</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// distributed runners, it is used to intercept control.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">beam</span><span class="p">.</span><span class="nf">Init</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">NewPipeline</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Root</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">buildPipeline</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nf">Background</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beamx</span><span class="p">.</span><span class="nf">Run</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">p</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">log</span><span class="p">.</span><span class="nf">Fatalf</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;Failed to execute job: %v&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This is the standard boilerplate for a Beam pipeline: it parses the flags, initialises Beam, delegates the pipeline construction to the <code>buildPipeline</code> function, and finally runs the pipeline.</p>
<p>The interesting code begins in the <code>buildPipeline</code> function, which constructs the pipeline by passing PCollections from one function to the next, building up the tree seen in the diagram above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">buildPipeline</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// nothing -&gt; PCollection&lt;Painting&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintings</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">csvio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">(</span><span class="nx">Painting</span><span class="p">{}))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;Painting&gt; -&gt; PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintingsByGroup</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">GroupByDecade</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">paintings</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;CoGBK&lt;string, Painting&gt;&gt; -&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">//   (PCollection&lt;KV&lt;string, Histogram&gt;&gt;, PCollection&lt;KV&lt;string, string&gt;&gt;)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">histograms</span><span class="p">,</span><span class="w"> </span><span class="nx">errors1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">ExtractHistogram</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">paintingsByGroup</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Calculate the color palette for the combined histograms.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, Histogram&gt;&gt; -&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">//   (PCollection&lt;KV&lt;string, []color.RGBA&gt;&gt;, PCollection&lt;KV&lt;string, string&gt;&gt;)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">palettes</span><span class="p">,</span><span class="w"> </span><span class="nx">errors2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">CalculateColorPalette</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">histograms</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, []color.RGBA&gt;&gt; -&gt; PCollection&lt;KV&lt;string, string&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">errors3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">DrawColorPalette</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">outputPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">palettes</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, []color.RGBA&gt;&gt; -&gt; nothing</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">WriteIndex</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">Join</span><span class="p">(</span><span class="o">*</span><span class="nx">outputPrefix</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;index.json&#34;</span><span class="p">),</span><span class="w"> </span><span class="nx">palettes</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, string&gt;&gt; -&gt; nothing</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">WriteErrorLog</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;errors.log&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">errors1</span><span class="p">,</span><span class="w"> </span><span class="nx">errors2</span><span class="p">,</span><span class="w"> </span><span class="nx">errors3</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>To make it easy to follow, each step is its own function, annotated with a comment that states the type of PCollection it accepts and returns. Let&rsquo;s highlight some interesting steps.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">var</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">flag</span><span class="p">.</span><span class="nf">String</span><span class="p">(</span><span class="s">&#34;index&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;art.csv&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;Index of the art.&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// Painting represents a single painting in the dataset.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">Painting</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Artist</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;artist&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Title</span><span class="w">  </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;title&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Date</span><span class="w">   </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;date&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Genre</span><span class="w">  </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;genre&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Style</span><span class="w">  </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;style&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">Filename</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`csv:&#34;new_filename&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">buildPipeline</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// nothing -&gt; PCollection&lt;Painting&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintings</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">csvio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">(</span><span class="nx">Painting</span><span class="p">{}))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="o">...</span><span class="w">
</span></span></span></code></pre></div><p>The very first step uses <a href="https://godoc.org/github.com/bramp/morebeam/csvio#Read"><code>csvio.Read</code></a> to read the CSV file specified by the <code>--index</code> flag, and returns a PCollection of Painting structs. In all the examples we’ve seen before, the PCollections only contained basic types, e.g. strings, ints, etc. More complex types, such as slices and structs, are allowed (but not maps or interfaces). This makes it easier to pass rich information between the PTransforms. The only caveat is that the type must be JSON-serialisable. This is because in a distributed pipeline, the PTransforms could be processed on different machines, and the PCollection elements need to be marshalled to be passed between them.</p>
<p>For Beam to successfully unmarshal your data, the types must also be registered. This is typically done within the <code>init()</code> function, by calling <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#RegisterType"><code>beam.RegisterType</code></a>.</p>
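<p>To see what “JSON-serialisable” means in practice, here is a minimal, stdlib-only sketch that round-trips a <code>Painting</code> through <code>encoding/json</code>, roughly what has to happen when an element moves between workers. It only illustrates the constraint and does not use Beam's own coders; the <code>roundTrip</code> helper and the sample values are hypothetical.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Painting mirrors the article's struct (elided fields omitted).
// encoding/json only encodes exported fields; the csv tags are ignored by it.
type Painting struct {
	Artist   string `csv:"artist"`
	Title    string `csv:"title"`
	Date     string `csv:"date"`
	Genre    string `csv:"genre"`
	Style    string `csv:"style"`
	Filename string `csv:"new_filename"`
}

// roundTrip is a hypothetical helper: it marshals a value to JSON and
// unmarshals it back, simulating an element crossing a machine boundary.
func roundTrip(p Painting) (Painting, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return Painting{}, err
	}
	var out Painting
	if err := json.Unmarshal(b, &out); err != nil {
		return Painting{}, err
	}
	return out, nil
}

func main() {
	in := Painting{Artist: "Vincent van Gogh", Title: "The Starry Night", Date: "1889", Filename: "12345.jpg"}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out == in) // true: every exported field survives the trip
}
```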
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">beam</span><span class="p">.</span><span class="nf">RegisterType</span><span class="p">(</span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">(</span><span class="nx">Painting</span><span class="p">{}))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>If you forget to register the type, an error will occur at runtime, for example:</p>
<pre tabindex="0"><code>java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -224: execute failed: panic: reflect: Call using main.Painting as type struct { Artist string; Title string; ... } goroutine 70 [running]:
</code></pre><p>This can be a little frustrating: the <code>direct</code> runner, used when running the pipeline locally, does not marshal your data, so errors like this aren’t exposed until the pipeline runs on Dataflow.</p>
<p>Now that we have a collection of Paintings, we group them by decade:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// GroupByDecade takes a PCollection&lt;Painting&gt; and returns a</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// PCollection&lt;CoGBK&lt;string, Painting&gt;&gt; of the paintings grouped by decade.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">GroupByDecade</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">,</span><span class="w"> </span><span class="nx">paintings</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="s">&#34;GroupBy Decade&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;Painting&gt; -&gt; PCollection&lt;KV&lt;string, Painting&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">paintingsWithKey</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">AddKey</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">art</span><span class="w"> </span><span class="nx">Painting</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="nx">art</span><span class="p">.</span><span class="nf">Decade</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">},</span><span class="w"> </span><span class="nx">paintings</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// PCollection&lt;KV&lt;string, Painting&gt;&gt; -&gt; PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">GroupByKey</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">paintingsWithKey</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>The first line in this function, <code>s.Scope(&quot;GroupBy Decade&quot;)</code>, allows us to name this step and group multiple sub-steps. For example, in the above diagram “GroupBy Decade” is a single step, which can be expanded to show an <a href="https://godoc.org/github.com/bramp/morebeam#AddKey"><code>AddKey</code></a> and a <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey"><code>GroupByKey</code></a> step.</p>
<p><code>GroupByDecade</code> returns a <code>PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</code>. CoGBK is short for <strong>Co</strong>-<strong>G</strong>roup <strong>B</strong>y <strong>K</strong>ey. It is a special collection where (as you’ll see later) each element is a tuple of a key and an iterable collection of values. The key in this case is the decade the painting was painted in. The <code>PCollection&lt;Painting&gt;</code> is transformed into a <code>PCollection&lt;KV&lt;string, Painting&gt;&gt;</code> by the <a href="https://godoc.org/github.com/bramp/morebeam#AddKey"><code>morebeam.AddKey</code></a> step, which adds a key to each value. Then <code>GroupByKey</code> uses that key to produce the final PCollection.</p>
<p>Next up is <code>ExtractHistogram</code>, which takes the <code>PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</code> and returns two PCollections. The first is a <code>PCollection&lt;KV&lt;string, Histogram&gt;&gt;</code>, which contains a <a href="https://en.wikipedia.org/wiki/Color_histogram">color histogram</a> for every decade of paintings. The second is related to error handling, and will be explained later.</p>
<p>The <code>ExtractHistogram</code> function demonstrates three new concepts: “Stateful functions”, “Data enrichment”, and “Error handling”.</p>
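<p>The shape of the data as it flows through <code>AddKey</code> and <code>GroupByKey</code> can be sketched with plain Go maps and slices. This is an in-memory analogy only, not Beam code: <code>KV</code> is a local type, and <code>decadeOf</code> is a hypothetical stand-in for <code>Painting.Decade()</code> (whose implementation isn't shown), assuming <code>Date</code> starts with a four-digit year.</p>

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

type Painting struct {
	Title string
	Date  string
}

// KV is a local stand-in for a Beam key/value pair.
type KV struct {
	Key   string
	Value Painting
}

// decadeOf is hypothetical: it assumes Date begins with a four-digit year
// and returns e.g. "1880s". Anything else maps to "unknown".
func decadeOf(date string) string {
	if len(date) < 4 {
		return "unknown"
	}
	year, err := strconv.Atoi(date[:4])
	if err != nil {
		return "unknown"
	}
	return strconv.Itoa(year/10*10) + "s"
}

// addKey mirrors AddKey: PCollection<Painting> -> PCollection<KV<string, Painting>>.
func addKey(ps []Painting) []KV {
	out := make([]KV, 0, len(ps))
	for _, p := range ps {
		out = append(out, KV{Key: decadeOf(p.Date), Value: p})
	}
	return out
}

// groupByKey mirrors GroupByKey: each key ends up with all of its values.
func groupByKey(kvs []KV) map[string][]Painting {
	groups := make(map[string][]Painting)
	for _, kv := range kvs {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	return groups
}

func main() {
	paintings := []Painting{
		{Title: "The Starry Night", Date: "1889"},
		{Title: "Sunflowers", Date: "1888"},
		{Title: "Water Lilies", Date: "1906"},
	}
	groups := groupByKey(addKey(paintings))

	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, len(groups[k]))
	}
	// Output:
	// 1880s 2
	// 1900s 1
}
```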
<h2 id="stateful-functions">Stateful functions</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">var</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">artPrefix</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">flag</span><span class="p">.</span><span class="nf">String</span><span class="p">(</span><span class="s">&#34;art&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;gs://mybucket/art&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;Path to where the art is kept.&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">beam</span><span class="p">.</span><span class="nf">RegisterType</span><span class="p">(</span><span class="nx">reflect</span><span class="p">.</span><span class="nf">TypeOf</span><span class="p">((</span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)(</span><span class="kc">nil</span><span class="p">)).</span><span class="nf">Elem</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">extractHistogramFn</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ArtPrefix</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`json:&#34;art_prefix&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">fs</span><span class="w"> </span><span class="nx">filesystem</span><span class="p">.</span><span class="nx">Interface</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// ExtractHistogram calculates the color histograms for all the Paintings in</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// the CoGBK.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">ExtractHistogram</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">,</span><span class="w"> </span><span class="nx">files</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">,</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="s">&#34;ExtractHistogram&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo2</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">extractHistogramFn</span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">ArtPrefix</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">artPrefix</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">},</span><span class="w"> </span><span class="nx">files</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Instead of passing a simple function to <code>beam.ParDo2</code>, a struct containing two fields is passed. The exported field, <code>ArtPrefix</code>, is the path to where the painting jpgs are stored, and the unexported field, <code>fs</code>, is a filesystem client for reading these jpgs.</p>
<p>Global variables, including command line flag variables, are not carried over from the machine that launches the pipeline to the workers that run it. For example, we may start this pipeline like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">go run main.go <span class="se">\
</span></span></span><span class="line"><span class="cl">  --art gs://<span class="si">${</span><span class="nv">BUCKET</span><span class="p">?</span><span class="si">}</span>/art/ <span class="se">\
</span></span></span><span class="line"><span class="cl">  --runner dataflow <span class="se">\
</span></span></span><span class="line"><span class="cl">  ...
</span></span></code></pre></div><p>When the code actually runs on the Dataflow workers, the <code>--art</code> flag is not specified, so <code>*artPrefix</code> holds its default value. To pass the flag's value to the Dataflow workers, it must be part of the DoFn struct that is passed to <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a>. So in this example, we create an <code>extractHistogramFn</code> struct, with the exported <code>ArtPrefix</code> field set to the value of the <code>--art</code> flag. This <code>extractHistogramFn</code> is then marshalled and passed to the workers. As with the types carried in PCollections, <code>extractHistogramFn</code> must also be registered with Beam during <code>init</code>.</p>
<p>When the pipeline executes this step, it calls the <code>extractHistogramFn</code>’s <code>ProcessElement</code> method. This method works in a similar way to a simple DoFn function: its arguments and return value are inspected via reflection at runtime and mapped to the PCollections being processed and returned.</p>
<h2 id="iterating-over-a-cogbk">Iterating over a CoGBK</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">ProcessElement</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">key</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="o">*</span><span class="nx">Painting</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">errors</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">))</span><span class="w"> </span><span class="nx">HistogramResult</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">log</span><span class="p">.</span><span class="nf">Infof</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;%q: ExtractHistogram started&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">)</span><span class="w">
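</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// result accumulates the combined histogram for every painting in the group.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">var</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="nx">HistogramResult</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">var</span><span class="w"> </span><span class="nx">art</span><span class="w"> </span><span class="nx">Painting</span><span class="w">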
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">for</span><span class="w"> </span><span class="nf">values</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">art</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">Join</span><span class="p">(</span><span class="nx">fn</span><span class="p">.</span><span class="nx">ArtPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">art</span><span class="p">.</span><span class="nx">Filename</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">h</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nf">extractHistogram</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="err">…</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">result</span><span class="p">.</span><span class="nx">Histogram</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">result</span><span class="p">.</span><span class="nx">Histogram</span><span class="p">.</span><span class="nf">Combine</span><span class="p">(</span><span class="nx">h</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p><code>ProcessElement</code> is called once for every unique group in the <code>PCollection&lt;CoGBK&lt;string, Painting&gt;&gt;</code>. The <code>key string</code> argument is the key for that group, and <code>values func(*Painting) bool</code> is used to iterate over all values within the group. The contract is that <code>values</code> is passed a pointer to a <code>Painting</code> struct, which is populated on each iteration. As long as there are more paintings to process in the group, <code>values</code> returns true; once it returns false, the group has been fully processed. This iterator pattern is unique to the <code>CoGBK</code> and makes it convenient to apply an operation to every element in the group.</p>
<p>In this case, <code>extractHistogram</code> is called for each Painting; it fetches a jpg of the artwork and extracts a <a href="https://en.wikipedia.org/wiki/Color_histogram">histogram of colors</a>. The histograms from all paintings in the group are combined, and finally one result per group is returned.</p>
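<p>The <code>values func(*Painting) bool</code> contract can be mimicked in plain Go to see how it behaves. In the sketch below, <code>iterate</code> and <code>titles</code> are hypothetical helpers that build and drain such a function over an in-memory slice; in a real pipeline Beam supplies the iterator, and the values are not necessarily held in a slice.</p>

```go
package main

import "fmt"

type Painting struct{ Title string }

// iterate returns a function with the same contract as a CoGBK values
// iterator: each call copies the next element into *p and returns true,
// and it returns false once the group is exhausted.
func iterate(group []Painting) func(*Painting) bool {
	i := 0
	return func(p *Painting) bool {
		if i >= len(group) {
			return false
		}
		*p = group[i]
		i++
		return true
	}
}

// titles drains an iterator in the same style as ProcessElement does,
// reusing a single Painting variable that is overwritten on each call.
func titles(values func(*Painting) bool) []string {
	var out []string
	var art Painting
	for values(&art) {
		out = append(out, art.Title)
	}
	return out
}

func main() {
	values := iterate([]Painting{{Title: "Sunflowers"}, {Title: "The Starry Night"}})
	fmt.Println(titles(values)) // [Sunflowers The Starry Night]
}
```

<p>Because <code>art</code> is overwritten on every call, anything that must outlive the current iteration (here, the title string) has to be copied out, as <code>titles</code> does.</p>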
<h2 id="data-enrichment">Data enrichment</h2>
<p>Reading the paintings from an external service (such as <a href="https://cloud.google.com/storage/">GCS</a>) demonstrates a data enrichment step. This is where an external service is used to “enrich” the dataset the pipeline is processing. You could imagine a user service being called when processing log entries, or a product service when processing purchases. Note that any external action should be <a href="https://en.wikipedia.org/wiki/Idempotence">idempotent</a>: if a worker fails, the same element may be retried, and thus processed multiple times. Dataflow keeps track of failures and ensures that each element is counted only once in the final result, but side effects that a failed attempt made against an external service are not rolled back.</p>
<p>When calling a remote service, some kind of client is typically needed to make the request. In this pipeline we read the images from GCS, so it is useful to set up a GCS client once when the worker starts. Since we are using a struct-based DoFn, we can define some additional methods, <code>Setup</code> and <code>Teardown</code>, for this purpose.</p>
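<p>One common way to make a side effect safe to retry is to derive everything it writes from the element itself, so a second attempt overwrites the first instead of adding to it. The sketch below contrasts an append-style write with a keyed, overwrite-style write against a hypothetical in-memory <code>store</code> (a stand-in for something like a bucket of output files); the method names and the <code>.png</code> naming scheme are illustrative only.</p>

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for an external service.
type store struct {
	log   []string          // append-only: retries create duplicates
	files map[string]string // keyed: retries overwrite the same entry
}

// appendResult is NOT idempotent: calling it twice records the result twice.
func (s *store) appendResult(key, value string) {
	s.log = append(s.log, key+"="+value)
}

// putResult is idempotent: the output name depends only on the element's
// key, so repeating the call leaves the store in the same state.
func (s *store) putResult(key, value string) {
	s.files[key+".png"] = value
}

func main() {
	s := &store{files: map[string]string{}}

	// Simulate a worker processing the same element twice after a retry.
	for attempt := 0; attempt < 2; attempt++ {
		s.appendResult("1880s", "palette")
		s.putResult("1880s", "palette")
	}
	fmt.Println(len(s.log), len(s.files)) // 2 1
}
```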
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">Setup</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">fn</span><span class="p">.</span><span class="nx">fs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">filesystem</span><span class="p">.</span><span class="nf">New</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">ArtPrefix</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;filesystem.New(%q) failed: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">ArtPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">Teardown</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">fs</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>When the DoFn is initialized on the worker, the <code>Setup</code> method is called. Here a new <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/filesystem">Filesystem client</a> is created and stored in the struct’s <code>fs</code> field. Later, when the DoFn is no longer needed, the <code>Teardown</code> method is called, giving us an opportunity to clean up the client. As with all things distributed, don’t rely on <code>Teardown</code> ever being called.</p>
<p>There are also some simple best practices around error handling that should be followed when calling external services.</p>
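<p>The lifecycle can be illustrated without Beam. In this stdlib-only sketch, <code>fakeClient</code>, <code>upperFn</code> and <code>runInstance</code> are hypothetical. The toy runner calls <code>Setup</code> once per DoFn instance, <code>ProcessElement</code> for every element, and then <code>Teardown</code>. This shows why an expensive client belongs in an unexported field initialized in <code>Setup</code> rather than being created per element.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strings"
)

// fakeClient is a hypothetical stand-in for an expensive client, such as
// the filesystem client created in Setup above.
type fakeClient struct {
	closed bool
}

func (c *fakeClient) Read(name string) string {
	return strings.ToUpper(name)
}

func (c *fakeClient) Close() error {
	c.closed = true
	return nil
}

// upperFn has the same shape as a struct DoFn: exported fields are
// configuration sent to workers, unexported fields are per-worker state.
type upperFn struct {
	Prefix string      // configuration, marshalled with the DoFn
	client *fakeClient // created in Setup, never marshalled
	setups int         // counts Setup calls, for illustration only
}

func (fn *upperFn) Setup() error {
	fn.client = new(fakeClient)
	fn.setups++
	return nil
}

func (fn *upperFn) ProcessElement(name string) string {
	return fn.Prefix + fn.client.Read(name)
}

func (fn *upperFn) Teardown() error {
	return fn.client.Close()
}

// runInstance is a toy runner, not Beam's: Setup once per DoFn instance,
// ProcessElement for every element, then Teardown. A real distributed
// runner does not guarantee that Teardown is ever reached.
func runInstance(fn *upperFn, elems []string) ([]string, error) {
	if err := fn.Setup(); err != nil {
		return nil, err
	}
	var out []string
	for _, e := range elems {
		out = append(out, fn.ProcessElement(e))
	}
	return out, fn.Teardown()
}

func main() {
	fn := new(upperFn)
	fn.Prefix = "art/"
	out, err := runInstance(fn, []string{"a.jpg", "b.jpg"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out, fn.setups) // prints "[art/A.JPG art/B.JPG] 1"
}
</code></pre></div>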
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="o">*</span><span class="nx">extractHistogramFn</span><span class="p">)</span><span class="w"> </span><span class="nf">extractHistogram</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">palette</span><span class="p">.</span><span class="nx">Histogram</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">cancel</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nf">WithTimeout</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="mi">30</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">defer</span><span class="w"> </span><span class="nf">cancel</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">fd</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">fs</span><span class="p">.</span><span class="nf">OpenRead</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;fs.OpenRead(%q) failed: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">defer</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">img</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">image</span><span class="p">.</span><span class="nf">Decode</span><span class="p">(</span><span class="nx">fd</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Errorf</span><span class="p">(</span><span class="s">&#34;image.Decode(%q) failed: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">palette</span><span class="p">.</span><span class="nf">NewColorHistogram</span><span class="p">(</span><span class="nx">img</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>The function begins by calling <a href="https://golang.org/pkg/context/#WithTimeout"><code>context.WithTimeout</code></a>. This ensures that if the external service does not respond in a timely manner, the context is cancelled and an error is returned. Without this timeout, a hung external call could block forever, and the pipeline would never terminate.</p>
<p>Since the pipeline could be running across 100s of machines, it could generate significant load on a remote service. It is wise to implement appropriate <a href="https://cloud.google.com/storage/docs/exponential-backoff">backoff and retry logic</a>. In some cases it may even be worth <a href="https://cloud.google.com/service-infrastructure/docs/rate-limiting">rate limiting</a> your pipeline’s execution, or tagging your pipeline’s traffic at a <a href="https://www.usenix.org/conference/srecon17asia/program/presentation/sheerin">lower QoS</a> so it can be easily shed.</p>
<p>The external service may also return permanent errors (for example, a missing file or an image that cannot be decoded), which no amount of retrying will fix. These need a more robust error handling pattern.</p>
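<p>The stdlib-only sketch below shows the effect of that timeout. <code>slowService</code> is a hypothetical stand-in for an external call that takes a given amount of time but, like a well-behaved client, checks its context as it goes. Here the error is wrapped with <code>%w</code> (the snippet above uses <code>%s</code>). That way callers can still test for <code>context.DeadlineExceeded</code> with <code>errors.Is</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowService is a hypothetical external call that needs `work` time to
// finish, but checks the context between small chunks of work.
func slowService(ctx context.Context, work time.Duration) error {
	deadline := time.Now().Add(work)
	for time.Now().Before(deadline) {
		if err := ctx.Err(); err != nil {
			return err // cancelled, or the deadline was exceeded
		}
		time.Sleep(5 * time.Millisecond)
	}
	return nil
}

// callWithTimeout bounds the call the same way extractHistogram does.
func callWithTimeout(parent context.Context, timeout, work time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	if err := slowService(ctx, work); err != nil {
		return fmt.Errorf("slowService failed: %w", err)
	}
	return nil
}

func main() {
	err := callWithTimeout(context.Background(), 50*time.Millisecond, time.Second)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // prints "true"
}
</code></pre></div>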
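<p>Here is a minimal, stdlib-only sketch of such retry logic. It assumes that errors can be classified as transient or permanent. The <code>permanentError</code> type, the <code>retry</code> helper and the 30 second cap are illustrative choices, not part of Beam or the GCS client. Transient failures are retried with exponential backoff and random jitter, so hundreds of workers don’t retry in lockstep. Permanent failures are returned straight away.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// permanentError marks failures that retrying cannot fix, such as an image
// that fails to decode. It is a hypothetical type, not part of Beam or GCS.
type permanentError struct {
	err error
}

func (p permanentError) Error() string { return "permanent: " + p.err.Error() }
func (p permanentError) Unwrap() error { return p.err }

// maxDelay caps the backoff so the delay cannot grow without bound.
const maxDelay = 30 * time.Second

// retry calls op up to maxAttempts times. Between attempts it sleeps for a
// random duration up to the current delay ("full jitter"), then doubles the
// delay. Permanent errors are returned immediately, without retrying.
func retry(maxAttempts int, base time.Duration, op func() error) error {
	delay := base
	for attempt := 1; ; attempt++ {
		err := op()
		if err == nil {
			return nil
		}
		if errors.As(err, new(permanentError)) {
			return err // retrying will not help; record it as a dead letter
		}
		if attempt >= maxAttempts {
			return fmt.Errorf("giving up after %d attempts: %w", attempt, err)
		}
		if delay > 0 {
			time.Sleep(time.Duration(rand.Int63n(int64(delay))))
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

func main() {
	calls := 0
	err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls == 3 {
			return nil // the service recovers on the third attempt
		}
		return errors.New("503 service unavailable")
	})
	fmt.Println(calls, err == nil) // prints "3 true"
}
</code></pre></div><p>An error that is still failing after the last attempt, or a permanent one, is a good candidate for the dead letter pattern described next.</p>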
<h2 id="error-handling-and-dead-letters">Error handling and dead letters</h2>
<p>When Beam processes a PCollection, it bundles up multiple elements and processes one bundle at a time. If the PTransform returns an error, panics, or otherwise fails (such as by running out of memory), the full bundle is retried. With Dataflow, bundles are <a href="https://cloud.google.com/dataflow/docs/resources/faq#how-are-java-exceptions-handled-in-cloud-dataflow">retried up to four times</a>, after which the entire pipeline is aborted. This can be inconvenient, so where appropriate, instead of returning an error we use a <a href="https://en.wikipedia.org/wiki/Dead_letter_queue">dead letter queue</a>. This is a new PCollection that collects processing errors. These errors can then be persisted at the end of the pipeline, manually inspected, and processed again later.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="k">return</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo2</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">extractHistogramFn</span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">ArtPrefix</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">artPrefix</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">},</span><span class="w"> </span><span class="nx">files</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>A keen observer would have noticed that <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo2"><code>beam.ParDo2</code></a> was used by ExtractHistogram, instead of <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo"><code>beam.ParDo</code></a>. This function works the same way, but returns two PCollections. In our case, the first is the normal output, and the second is a <code>PCollection&lt;KV&lt;string, string&gt;&gt;</code>. This second collection is keyed on the unique identifier of the painting that had an issue, and the value is the error message.</p>
<p>Since most elements will not produce an error, the errors PCollection is passed to <code>extractHistogramFn</code>’s <code>ProcessElement</code> as an emitter function, <code>errors func(string, string)</code>, which is only called when something goes wrong.</p>
<p>Every stage of the pipeline produces this kind of error PCollection. At the end of the pipeline they are combined and written out to a single error log file:</p>
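<p>The same pattern can be shown without a runner. This stdlib-only sketch uses hypothetical <code>histogramFn</code>, <code>kv</code> and <code>run</code> names, and a <code>.jpg</code> suffix check stands in for real image decoding. The point is the <code>ProcessElement</code> shape: one emitter for the normal output and one for the dead letters.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strings"
)

// kv is a simple key/value pair used to feed the toy runner below.
type kv struct {
	Key, Value string
}

// histogramFn is a hypothetical, simplified DoFn. In Beam, the runner would
// supply emit and errors, each backed by one of the two PCollections that
// beam.ParDo2 returns; here the run function plays that role.
type histogramFn struct{}

// ProcessElement sends bad inputs to the errors emitter instead of returning
// an error, so one bad painting cannot fail (and retry) a whole bundle.
func (fn *histogramFn) ProcessElement(key, filename string,
	emit func(string, int), errors func(string, string)) {
	if !strings.HasSuffix(filename, ".jpg") {
		errors(key, fmt.Sprintf("unsupported file %q", filename))
		return
	}
	emit(key, len(filename)) // stand-in for a real color histogram
}

// run drives the DoFn over inputs, collecting both outputs.
func run(inputs []kv) (map[string]int, []string) {
	fn := new(histogramFn)
	good := map[string]int{}
	var deadLetters []string
	for _, in := range inputs {
		fn.ProcessElement(in.Key, in.Value,
			func(k string, v int) { good[k] = v },
			func(k, msg string) { deadLetters = append(deadLetters, k+","+msg) })
	}
	return good, deadLetters
}

func main() {
	good, dead := run([]kv{{"p1", "mona.jpg"}, {"p2", "scream.gif"}})
	fmt.Println(good)
	fmt.Println(dead)
}
</code></pre></div><p>In a real pipeline, each emitter parameter of <code>ProcessElement</code> corresponds to one of the output PCollections. This is why the DoFn is applied with <code>beam.ParDo2</code> rather than <code>beam.ParDo</code>.</p>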
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// WriteErrorLog takes multiple PCollection&lt;KV&lt;string,string&gt;&gt;s combines them</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// and writes them to the given filename.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">WriteErrorLog</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nx">Scope</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="w"> </span><span class="o">...</span><span class="nx">beam</span><span class="p">.</span><span class="nx">PCollection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;Write %q&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">Flatten</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="o">...</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;%s,%s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">},</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">textio</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">morebeam</span><span class="p">.</span><span class="nf">Join</span><span class="p">(</span><span class="o">*</span><span class="nx">outputPrefix</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">),</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Since each line is the key, a comma, then the error message, the file can easily be re-read to retry just the failed keys.</p>
<p>The rest of the pipeline is much the same, and thus won’t be explained in detail. <code>CalculateColorPalette</code> takes the color histograms and runs a K-Means clustering algorithm to extract the color palettes for those paintings. Those palettes are written out to PNG files by <code>DrawColorPalette</code>, and finally all the palettes are written out to a JSON file in <code>WriteIndex</code>.</p>
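<p>As a sketch of that re-read step, <code>failedKeys</code> (a hypothetical helper) parses such a log and returns the distinct keys. These could then feed a rerun over just those paintings. Each line is split on the first comma only, since error messages can themselves contain commas; this assumes the keys never do.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// failedKeys reads "key,message" lines and returns the distinct keys in the
// order they were first seen. Lines without a comma are skipped.
func failedKeys(r io.Reader) ([]string, error) {
	seen := map[string]bool{}
	var keys []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		// Split on the first comma only: messages may contain commas.
		key, _, ok := strings.Cut(sc.Text(), ",")
		if !ok || key == "" {
			continue
		}
		if !seen[key] {
			seen[key] = true
			keys = append(keys, key)
		}
	}
	return keys, sc.Err()
}

func main() {
	errorLog := `p2,image.Decode("scream.gif") failed: image: unknown format
p7,fs.OpenRead("x.jpg") failed: not found, giving up
p2,image.Decode("scream.gif") failed: image: unknown format
`
	keys, err := failedKeys(strings.NewReader(errorLog))
	if err != nil {
		panic(err)
	}
	fmt.Println(keys) // prints "[p2 p7]"
}
</code></pre></div><p><code>strings.Cut</code> requires Go 1.18 or later; on older versions, <code>strings.SplitN(line, ",", 2)</code> does the same job.</p>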
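<p>For readers unfamiliar with K-Means, here is a minimal sketch of Lloyd’s algorithm over RGB colors. It is not the pipeline’s <code>CalculateColorPalette</code>. It clusters raw colors rather than weighted histogram bins, and it seeds the centroids with the first k colors for determinism; real implementations commonly use k-means++ seeding instead.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"math"
)

// rgb is a color with float channels in the range 0 to 255.
type rgb [3]float64

// dist2 is the squared Euclidean distance between two colors.
func dist2(a, b rgb) float64 {
	var d float64
	for i := range a {
		diff := a[i] - b[i]
		d += diff * diff
	}
	return d
}

// kMeans clusters colors into k groups (k must be non-negative) and returns
// the centroids, which serve as the palette. Centroids are seeded with the
// first k colors.
func kMeans(colors []rgb, k, iterations int) []rgb {
	if k > len(colors) {
		k = len(colors)
	}
	if k == 0 {
		return nil
	}
	centroids := make([]rgb, k)
	copy(centroids, colors[:k])
	for remaining := iterations; remaining > 0; remaining-- {
		sums := make([]rgb, k)
		counts := make([]int, k)
		for _, c := range colors {
			// Assignment step: find the nearest centroid.
			best, bestDist := 0, math.Inf(1)
			for j, m := range centroids {
				if d := dist2(c, m); bestDist > d {
					best, bestDist = j, d
				}
			}
			for ch := range c {
				sums[best][ch] += c[ch]
			}
			counts[best]++
		}
		// Update step: move each centroid to the mean of its colors.
		for j := range centroids {
			if counts[j] == 0 {
				continue // leave an empty cluster where it is
			}
			for ch := range sums[j] {
				centroids[j][ch] = sums[j][ch] / float64(counts[j])
			}
		}
	}
	return centroids
}

func main() {
	colors := []rgb{{250, 10, 10}, {10, 10, 250}, {240, 20, 5}, {5, 30, 240}}
	fmt.Println(kMeans(colors, 2, 10))
}
</code></pre></div><p>To cluster a histogram instead, each color’s contribution to <code>sums</code> and <code>counts</code> would be weighted by its pixel count.</p>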
<h2 id="gotchas">Gotchas</h2>
<h3 id="marshing">Marshalling</h3>
<p>Always remember to register the types that will be transmitted between workers (for example with <code>beam.RegisterType</code>). This is anything that’s inside a PCollection, as well as any DoFn. Not all types are allowed, but slices, structs, and primitives are. For other types, custom JSON marshalling can be used.</p>
<p>Remember too that global state is not allowed. Flags and other global variables will not always be populated when running on a remote worker. Examples like this may also catch you out:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">prefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&#34;X&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Scope</span><span class="p">(</span><span class="s">&#34;Prefix &#34;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">prefix</span><span class="p">)</span><span class="w">
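<p>As an illustration of custom JSON marshalling, the sketch below gives a hypothetical <code>palette</code> type its own <code>MarshalJSON</code> and <code>UnmarshalJSON</code> methods; its state lives in an unexported field. It only uses <code>encoding/json</code>, and assumes, as described above, that such values are marshalled as JSON when moved between workers. Without these methods, the unexported field would be silently dropped.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"encoding/json"
	"fmt"
)

// palette keeps its colors in an unexported field, which encoding/json
// would silently drop. The type is hypothetical.
type palette struct {
	colors []string
}

// wirePalette is the exported shape used on the wire.
type wirePalette struct {
	Colors []string
}

// MarshalJSON exposes the unexported state via wirePalette.
func (p palette) MarshalJSON() ([]byte, error) {
	return json.Marshal(wirePalette{Colors: p.colors})
}

// UnmarshalJSON restores the unexported state on the receiving side.
func (p *palette) UnmarshalJSON(data []byte) error {
	w := new(wirePalette)
	if err := json.Unmarshal(data, w); err != nil {
		return err
	}
	p.colors = w.Colors
	return nil
}

func main() {
	data, err := json.Marshal(palette{colors: []string{"#ff0000", "#0000ff"}})
	if err != nil {
		panic(err)
	}
	out := new(palette)
	if err := json.Unmarshal(data, out); err != nil {
		panic(err)
	}
	fmt.Println(string(data), out.colors)
}
</code></pre></div><p>In the pipeline itself, the type would also be registered, typically with <code>beam.RegisterType(reflect.TypeOf((*palette)(nil)).Elem())</code>.</p>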
</span></span></span><span class="line"><span class="cl"><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">beam</span><span class="p">.</span><span class="nf">ParDo</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">value</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">},</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>This simple example appears to add “X” to the beginning of each element; however, it will prefix nothing. This is because the anonymous function is marshalled, then unmarshalled on the worker. When it is invoked there, its closure does not come with it, so the captured value of <code>prefix</code> is lost and <code>prefix</code> is instead the zero value (an empty string). For this example to work, <code>prefix</code> must be defined inside the anonymous function. Alternatively, use a struct DoFn that holds the prefix in an exported (and therefore marshalled) field.</p>
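<p>The struct DoFn approach can be sketched with just the standard library. Here a JSON round trip stands in for shipping the DoFn to a worker; this is an analogy, not Beam’s actual encoding. The exported <code>Prefix</code> field survives the trip, whereas a closure’s captured variable would not. The <code>prefixFn</code> and <code>shipToWorker</code> names are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"encoding/json"
	"fmt"
)

// prefixFn carries its configuration in an exported field, so it survives
// being marshalled on the launcher and unmarshalled on a worker.
type prefixFn struct {
	Prefix string
}

func (fn *prefixFn) ProcessElement(value string) string {
	return fn.Prefix + value
}

// shipToWorker simulates sending a DoFn to a remote worker by round-tripping
// it through JSON. This is an analogy; Beam's own encoding differs in detail.
func shipToWorker(fn *prefixFn) (*prefixFn, error) {
	data, err := json.Marshal(fn)
	if err != nil {
		return nil, err
	}
	remote := new(prefixFn)
	if err := json.Unmarshal(data, remote); err != nil {
		return nil, err
	}
	return remote, nil
}

func main() {
	local := new(prefixFn)
	local.Prefix = "X"
	remote, err := shipToWorker(local)
	if err != nil {
		panic(err)
	}
	fmt.Println(remote.ProcessElement("painting")) // prints "Xpainting"
}
</code></pre></div><p>In the pipeline, the anonymous function would be replaced by something like <code>beam.ParDo(s, &amp;prefixFn{Prefix: prefix}, c)</code>, with <code>prefixFn</code> registered as described above.</p>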
<h3 id="errors">Errors</h3>
<p>Since the pipeline could be running across 100s of workers, errors are to be expected. Extensive use of <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/log#Infof"><code>log.Infof</code></a>, <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/log#Debugf"><code>log.Debugf</code></a>, etc. will make your life better. Good logging makes it much easier to debug why the pipeline got stuck, or mysteriously failed.</p>
<p>While debugging this pipeline, it would occasionally fail due to exceeding the memory limits of the Dataflow workers. Standard Go tooling, such as <a href="https://golang.org/pkg/net/http/pprof/">pprof</a>, can be used to help debug this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;net/http&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">_</span><span class="w"> </span><span class="s">&#34;net/http/pprof&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="c1">// HTTP Server for pprof (and other debugging)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">log</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nf">ListenAndServe</span><span class="p">(</span><span class="s">&#34;localhost:8080&#34;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This starts a web server that exports useful runtime stats and can be used to grab pprof profiling data, such as heap profiles from <code>/debug/pprof/heap</code>.</p>
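<p>Alongside pprof, periodically logging a few runtime memory statistics can show whether a worker’s heap is growing towards its limit before the worker is killed. This is a stdlib-only sketch. <code>heapSummary</code> and <code>startHeapLogger</code> are hypothetical helpers, and in a pipeline <code>logf</code> might wrap Beam’s <code>log.Info</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"runtime"
	"time"
)

// mib is the number of bytes in a mebibyte.
const mib = 1024 * 1024

// heapSummary formats a few runtime.MemStats fields that help spot a worker
// heading towards its memory limit.
func heapSummary() string {
	m := new(runtime.MemStats)
	runtime.ReadMemStats(m)
	return fmt.Sprintf("heap_alloc=%dMiB heap_sys=%dMiB num_gc=%d goroutines=%d",
		m.HeapAlloc/mib, m.HeapSys/mib, m.NumGC, runtime.NumGoroutine())
}

// startHeapLogger logs heapSummary every interval, for the life of the
// process. logf is whatever logger the program uses.
func startHeapLogger(interval time.Duration, logf func(string)) {
	ticker := time.NewTicker(interval)
	go func() {
		for range ticker.C {
			logf(heapSummary())
		}
	}()
}

func main() {
	startHeapLogger(100*time.Millisecond, func(s string) { fmt.Println(s) })
	time.Sleep(350 * time.Millisecond) // let a few summaries print
}
</code></pre></div><p><code>runtime.ReadMemStats</code> briefly stops the world, so on a real worker a coarse interval (tens of seconds) is more appropriate than the short one used in this demo.</p>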
<h3 id="difference-between-direct-and-dataflow-runners">Difference between direct and dataflow runners</h3>
<p>Running the pipeline locally is a quick way to validate that the pipeline is set up correctly and runs as expected. However, running locally won’t run the pipeline in parallel, and it is obviously constrained to a single machine. There are some other differences, mostly around marshalling data. It’s always a good idea to also test on Dataflow, perhaps with a smaller or sampled dataset as input, as a smoke test.</p>
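<p>One way to build such a sampled input is sketched below; the <code>keep</code> helper is hypothetical and not part of the Beam SDK. It hashes each element’s key (here with FNV-1a from <code>hash/fnv</code>) rather than using <code>math/rand</code>. As a result, the same subset is selected on every run and on every worker, so smoke tests stay repeatable.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"hash/fnv"
)

// keep reports whether key falls in a deterministic sample of roughly
// percent% of all keys (percent between 0 and 100).
func keep(key string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(key)) // writes to a hash.Hash never return an error
	return percent > h.Sum32()%100
}

func main() {
	kept := 0
	for i := 0; i != 10000; i++ {
		if keep(fmt.Sprintf("painting-%d", i), 10) {
			kept++
		}
	}
	// About 10% of the keys, varying slightly with the hash.
	fmt.Println(kept, "of 10000 keys kept")
}
</code></pre></div><p>In a pipeline, <code>keep</code> could be applied in a DoFn that drops elements for which it returns false. Per the closure gotcha above, the percentage should live in an exported field of a struct DoFn rather than be captured from <code>main</code>.</p>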
<h1 id="conclusion">Conclusion</h1>
<p>This article has covered the basics of creating an Apache Beam pipeline with the Go SDK, along with some more advanced topics. The results of this specific pipeline will be revealed in a later article; until then, the <a href="https://github.com/bramp/dataflow-art">code is available here</a>.</p>
<p>While the Beam Go SDK is still experimental, there are many great tutorials and examples using the more mature Java and Python Beam SDKs [<a href="https://medium.com/google-cloud/popular-java-projects-on-github-that-could-use-some-help-analyzed-using-bigquery-and-dataflow-dbd5753827f4">1</a>, <a href="https://medium.com/@vallerylancey/error-handling-elements-in-apache-beam-pipelines-fffdea91af2a">2</a>]. Google themselves even published a series of generic articles [<a href="https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1">part 1</a>, <a href="https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-2">part 2</a>] explaining common use cases.</p>
</description>
    </item>
    
    <item>
      <title>Google Font Features</title>
      <link>https://blog.bramp.net/post/2018/01/21/google-font-features/</link>
      <pubDate>Sun, 21 Jan 2018 16:03:36 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2018/01/21/google-font-features/</guid>
      <description><blockquote>
<p><strong>tl;dr Google Fonts doesn&rsquo;t supply fonts with OpenType features (such as old-style figures, or small-caps), but you can build and host the fonts yourself to support everything you need.</strong></p>
</blockquote>
<p>I recently posted an <a href="https://blog.bramp.net/post/2018/01/16/measuring-percentile-latency/">article which contained lots of numbers</a>. While I was proofreading the article, I didn’t quite like how the numbers looked; sometimes the digits dropped below the baseline, for example:</p>
<figure><img src="/post/2018/01/21/google-font-features/oldstyle.png" width="760" height="157"><figcaption>
      <h4>Oldstyle figures</h4>
    </figcaption>
</figure>

<p>Whereas I would have expected the top and bottom of each digit to be aligned:</p>
<figure><img src="/post/2018/01/21/google-font-features/lining.png" width="760" height="152"><figcaption>
      <h4>Lining figures</h4>
    </figcaption>
</figure>

<p>This made me flash back to all the typography I learnt when <a href="https://github.com/bramp/publication">working with LaTeX</a>. These two styles of figures are called old-style and lining (or sometimes lowercase and uppercase numbers). The theory is that old-style numbers flow better when mixed with text. Recall that letters like q, j and p all drop below the baseline, which makes the text nicer to read:</p>
<figure><img src="/post/2018/01/21/google-font-features/quickbrownfox.png" width="760" height="100"><figcaption>
      <h4>Example with characters below the baseline</h4>
    </figcaption>
</figure>

<p>However, my article had many numbers on the page, sometimes within tables, where old-style just made the numbers look odd. I looked for a way to force the lining style throughout. I quickly found the CSS styling:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-css" data-lang="css"><span class="line"><span class="cl"><span class="nt">body</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">           <span class="k">font-variant-numeric</span><span class="p">:</span> <span class="n">lining-nums</span><span class="p">;</span> 
</span></span><span class="line"><span class="cl">  <span class="kp">-webkit-</span><span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">     <span class="kp">-moz-</span><span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">      <span class="kp">-ms-</span><span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">          <span class="k">font-feature-settings</span><span class="p">:</span> <span class="s2">&#34;lnum&#34;</span> <span class="kc">on</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Sadly, when I applied this to my site, it did nothing. I wondered if perhaps the font did not support lining figures. A quick search led me to a question on <a href="https://stackoverflow.com/questions/28098992/google-fonts-lining-numbers">Stack Overflow</a> that implied both the font I was using, <a href="https://fonts.google.com/specimen/Raleway">Raleway</a>, and Google Fonts (which hosted the font) did in fact support lining figures.</p>
<p>So I went deeper down the rabbit hole to figure out what was going wrong. I wanted to confirm for myself that the font supported lining figures. I searched for a while for a simple CLI that would inspect the <a href="https://en.wikipedia.org/wiki/Web_Open_Font_Format">WOFF</a>/<a href="https://en.wikipedia.org/wiki/TrueType">TTF</a> files and tell me what they contained. Sadly, the best I could find was <a href="https://fontforge.github.io/">FontForge</a>, a GUI. That worked, and confirmed the fonts being served by Google did not contain the lining feature, or in fact any feature other than basic ligatures.</p>
<p>Later I found this <a href="https://github.com/google/fonts/issues/1335">GitHub issue</a> which confirmed all features were stripped from the font. So I sought out a way to rebuild the Google font to keep the lining figures.</p>
<p>Before that, I started to <a href="http://sethgodin.typepad.com/seths_blog/2005/03/dont_shave_that.html">shave another yak</a>, and decided to create a CLI tool that would easily display the font features. I came across a Go library, <a href="https://github.com/ConradIrwin/font">SFNT</a>, that can parse OpenType fonts. Sadly it didn’t yet parse the font features. A few hours later, having read the <a href="http://www.adobe.com/devnet/opentype/afdko/topic_feature_file_syntax.html">OpenType spec</a>, I sent them a <a href="https://github.com/ConradIrwin/font/pull/3">pull request</a> to add this functionality. Now I can easily confirm from the command line what features are supported.</p>
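<p>The core of listing features is small. This stdlib-only sketch is not the <code>font</code> tool’s code. It follows the OpenType table directory to the GSUB table and reads the tag of each FeatureRecord in its FeatureList. It assumes an uncompressed TTF/OTF file, since WOFF data would need converting first. Unlike the tool, it merges the tags for all scripts and languages. The program name <code>gsubtags</code> is hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go">package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
	"sort"
)

// gsubFeatureTags returns the distinct GSUB feature tags (such as "liga" or
// "lnum") of an uncompressed TrueType/OpenType font.
func gsubFeatureTags(font []byte) ([]string, error) {
	be := binary.BigEndian
	if 12 > len(font) {
		return nil, errors.New("too short for an sfnt header")
	}
	// Table directory: 12-byte header, then 16-byte records of
	// tag, checksum, offset, length.
	numTables := int(be.Uint16(font[4:6]))
	var gsub []byte
	for i := 0; i != numTables; i++ {
		rec := 12 + 16*i
		if rec+16 > len(font) {
			return nil, errors.New("truncated table directory")
		}
		if string(font[rec:rec+4]) == "GSUB" {
			off := int(be.Uint32(font[rec+8 : rec+12]))
			n := int(be.Uint32(font[rec+12 : rec+16]))
			if off+n > len(font) {
				return nil, errors.New("GSUB table out of range")
			}
			gsub = font[off : off+n]
			break
		}
	}
	if gsub == nil {
		return nil, errors.New("no GSUB table")
	}
	// GSUB header: major, minor, scriptList, featureList, lookupList offsets.
	if 10 > len(gsub) {
		return nil, errors.New("truncated GSUB header")
	}
	fl := int(be.Uint16(gsub[6:8]))
	if fl+2 > len(gsub) {
		return nil, errors.New("FeatureList out of range")
	}
	seen := map[string]bool{}
	count := int(be.Uint16(gsub[fl : fl+2]))
	for i := 0; i != count; i++ {
		// Each FeatureRecord is a 4-byte tag followed by a uint16 offset.
		rec := fl + 2 + 6*i
		if rec+6 > len(gsub) {
			return nil, errors.New("truncated FeatureList")
		}
		seen[string(gsub[rec:rec+4])] = true
	}
	tags := make([]string, 0, len(seen))
	for t := range seen {
		tags = append(tags, t)
	}
	sort.Strings(tags)
	return tags, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: gsubtags FONT.ttf")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err == nil {
		var tags []string
		if tags, err = gsubFeatureTags(data); err == nil {
			for _, t := range tags {
				fmt.Println(t)
			}
			return
		}
	}
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}
</code></pre></div><p>The WOFF output below, by contrast, comes from the full <code>font</code> tool, which also reads WOFF files.</p>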
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ font features raleway-v12-latin-ext_latin-regular.woff
</span></span><span class="line"><span class="cl">Glyph Substitution Table <span class="o">(</span>GSUB<span class="o">)</span>:
</span></span><span class="line"><span class="cl">	Script <span class="s2">&#34;latn&#34;</span> <span class="o">(</span>Latin<span class="o">)</span>:
</span></span><span class="line"><span class="cl">		Default Language:
</span></span><span class="line"><span class="cl">			Feature <span class="s2">&#34;liga&#34;</span> <span class="o">(</span>Standard Ligatures<span class="o">)</span>
</span></span></code></pre></div><p>I decided to play around with the <a href="https://developers.google.com/fonts/docs/developer_api">Google Font API</a>, and then eventually the unofficial (but awesome) <a href="https://google-webfonts-helper.herokuapp.com/fonts/raleway">google-webfonts-helper</a> (a hassle-free way to self-host Google Fonts). However, no combination of options would make the font contain the lining figures.</p>
<p>Since the Google Fonts are open source, I downloaded the <a href="https://github.com/google/fonts/tree/master/ofl/raleway">source TTF of the font</a>, and double-checked it did indeed contain the feature:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ font features Raleway-Regular.ttf 
</span></span><span class="line"><span class="cl">Glyph Substitution Table <span class="o">(</span>GSUB<span class="o">)</span>:
</span></span><span class="line"><span class="cl">  Script <span class="s2">&#34;latn&#34;</span> <span class="o">(</span>Latin<span class="o">)</span>:
</span></span><span class="line"><span class="cl">    Default Language:
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;aalt&#34;</span> <span class="o">(</span>Access All Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;dlig&#34;</span> <span class="o">(</span>Discretionary Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;liga&#34;</span> <span class="o">(</span>Standard Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;lnum&#34;</span> <span class="o">(</span>Lining Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;onum&#34;</span> <span class="o">(</span>Oldstyle Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;salt&#34;</span> <span class="o">(</span>Stylistic Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;smcp&#34;</span> <span class="o">(</span>Small Capitals<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss01&#34;</span> <span class="o">(</span>Stylistic Set 1<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss02&#34;</span> <span class="o">(</span>Stylistic Set 2<span class="o">)</span>
</span></span></code></pre></div><p>So my next idea was to take the original Raleway-Regular.ttf, convert it to <a href="https://en.wikipedia.org/wiki/Web_Open_Font_Format">WOFF</a> and <a href="https://www.w3.org/TR/WOFF2/">WOFF2</a>, and strip out the bits I don’t need, just as Google Fonts does, to ensure the resulting files are lean and performant.</p>
<p>I couldn’t find the pipeline Google Fonts uses to process the files, so I set out to reproduce it myself. I started by using <code>pyftsubset</code> (part of <a href="https://github.com/fonttools/fonttools">FontTools</a>) to remove unneeded characters and other parts from the original TTF file, while keeping every layout feature (<code>--layout-features=&#39;*&#39;</code>).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ pip install fonttools
</span></span><span class="line"><span class="cl">$ pyftsubset Raleway-Regular.ttf --layout-features<span class="o">=</span><span class="s1">&#39;*&#39;</span> --unicodes<span class="o">=</span><span class="s2">&#34;U+0000-00FF, U+0100-024F, U+0131, U+0152-0153, U+02DA, U+02DC, U+02BB-02BC, U+02C6, U+0259, U+0370-03FF, U+1E00-1EFF, U+2000-206F, U+2070-209F, U+2074, U+20A0-20CF, U+2122, U+2150-218F, U+2200-22FF, U+2C60-2C7F, U+A720-A7FF&#34;</span> --output-file<span class="o">=</span>Raleway-Regular.subset.ttf
</span></span></code></pre></div><p>Now I had a TTF file with all the features, but only the subset of characters I use on my site. Next I needed to convert this file to all the <a href="https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/webfont-optimization">recommended font formats</a>, so my site would look nice in IE, Chrome, Android and iOS. The resulting CSS would look like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-css" data-lang="css"><span class="line"><span class="cl"><span class="p">@</span><span class="k">font-face</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">font-family</span><span class="o">:</span> <span class="s1">&#39;Raleway&#39;</span><span class="o">;</span>
</span></span><span class="line"><span class="cl">  <span class="nt">src</span><span class="o">:</span> <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.eot&#39;</span><span class="o">);</span>                           <span class="c">/* IE9 Compat Modes */</span>
</span></span><span class="line"><span class="cl">  <span class="nt">src</span><span class="o">:</span> <span class="nt">local</span><span class="o">(</span><span class="s1">&#39;Raleway&#39;</span><span class="o">),</span> <span class="nt">local</span><span class="o">(</span><span class="s1">&#39;Raleway-Regular&#39;</span><span class="o">),</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.eot?#iefix&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;embedded-opentype&#39;</span><span class="o">),</span> <span class="c">/* IE6-IE8 */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.woff2&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;woff2&#39;</span><span class="o">),</span>    <span class="c">/* Super Modern Browsers */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.woff&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;woff&#39;</span><span class="o">),</span>     <span class="c">/* Pretty Modern Browsers */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.ttf&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;truetype&#39;</span><span class="o">),</span>    <span class="c">/* Safari, Android, iOS */</span>
</span></span><span class="line"><span class="cl">       <span class="nt">url</span><span class="o">(</span><span class="s1">&#39;raleway-regular.subset.svg#ralewayregular&#39;</span><span class="o">)</span> <span class="nt">format</span><span class="o">(</span><span class="s1">&#39;svg&#39;</span><span class="o">);</span>    <span class="c">/* Legacy iOS */</span>
</span></span><span class="line"><span class="cl">  <span class="nt">font-style</span><span class="o">:</span> <span class="nt">normal</span><span class="o">;</span>
</span></span><span class="line"><span class="cl">  <span class="nt">font-weight</span><span class="o">:</span> <span class="nt">400</span><span class="o">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>I again tried to use <code>pyftsubset</code> to save the files in the required formats. This worked well for TTF, WOFF, and WOFF2, but it didn’t support <a href="https://en.wikipedia.org/wiki/Embedded_OpenType">EOT</a> or <a href="http://caniuse.com/svg-fonts">SVG</a> fonts:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ pip install zopfli
</span></span><span class="line"><span class="cl">$ pip install brotli
</span></span><span class="line"><span class="cl">$ pyftsubset ... --flavor<span class="o">=</span>woff --with-zopfli --output-file<span class="o">=</span>Raleway-Regular.subset.woff
</span></span><span class="line"><span class="cl">$ pyftsubset ... --flavor<span class="o">=</span>woff2 --output-file<span class="o">=</span>Raleway-Regular.subset.woff2
</span></span></code></pre></div><p>So instead I searched for an all-in-one solution for converting fonts. I found numerous websites that offered to do it; the one I settled on was <a href="https://www.fontsquirrel.com/tools/webfont-generator">fontsquirrel.com</a>. There I used the expert settings to control exactly what went into the font, and to produce compressed versions in all file formats. I originally tried fontsquirrel’s subsetting feature, but I couldn’t get it to keep all the features I needed, so I used <code>pyftsubset</code> locally instead.</p>
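<p>To sanity-check which characters survive the subsetting step, the <code>--unicodes</code> list passed to <code>pyftsubset</code> earlier can be parsed and queried. The following is a minimal Go sketch that assumes only the <code>U+XXXX</code> and <code>U+XXXX-YYYY</code> forms used in that command; <code>parseUnicodes</code> and <code>covers</code> are hypothetical helpers, not part of FontTools or the <code>font</code> CLI.</p>
<pre tabindex="0"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strconv"
	"strings"
)

// runeRange is an inclusive range of code points, e.g. U+0100-024F.
type runeRange struct {
	lo, hi rune
}

// parseUnicodes parses a comma-separated list such as "U+0000-00FF, U+0131".
// Only the "U+XXXX" and "U+XXXX-YYYY" forms are handled in this sketch.
func parseUnicodes(s string) ([]runeRange, error) {
	var ranges []runeRange
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimPrefix(strings.TrimSpace(part), "U+")
		bounds := strings.SplitN(part, "-", 2)
		lo, err := strconv.ParseUint(bounds[0], 16, 32)
		if err != nil {
			return nil, fmt.Errorf("bad code point %q: %v", part, err)
		}
		hi := lo // a single code point is a range of one
		if len(bounds) == 2 {
			if hi, err = strconv.ParseUint(bounds[1], 16, 32); err != nil {
				return nil, fmt.Errorf("bad code point %q: %v", part, err)
			}
		}
		ranges = append(ranges, runeRange{rune(lo), rune(hi)})
	}
	return ranges, nil
}

// covers reports whether r falls inside any of the inclusive ranges.
func covers(ranges []runeRange, r rune) bool {
	for _, rr := range ranges {
		if r >= rr.lo {
			if rr.hi >= r {
				return true
			}
		}
	}
	return false
}

func main() {
	// The --unicodes value from the pyftsubset command above.
	const unicodes = "U+0000-00FF, U+0100-024F, U+0131, U+0152-0153, U+02DA, U+02DC, " +
		"U+02BB-02BC, U+02C6, U+0259, U+0370-03FF, U+1E00-1EFF, U+2000-206F, " +
		"U+2070-209F, U+2074, U+20A0-20CF, U+2122, U+2150-218F, U+2200-22FF, " +
		"U+2C60-2C7F, U+A720-A7FF"
	ranges, err := parseUnicodes(unicodes)
	if err != nil {
		panic(err)
	}
	for _, r := range []rune{'\u00e9', '\u20ac', '\u0416'} {
		fmt.Printf("%U kept: %v\n", r, covers(ranges, r))
	}
}</code></pre>
<p>With the full list, U+00E9 (é) and U+20AC (€) are kept, while U+0416 (Cyrillic Ж) is not, since no Cyrillic range was requested.</p>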
<p>After fontsquirrel.com produced the fonts, I checked that they contained the features, and compared the resulting file sizes:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ ls -ltr
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Google Fonts</span>
</span></span><span class="line"><span class="cl"> 96K  raleway-v12-latin-ext_latin-regular.ttf
</span></span><span class="line"><span class="cl"> 40K  raleway-v12-latin-ext_latin-regular.woff
</span></span><span class="line"><span class="cl"> 31K  raleway-v12-latin-ext_latin-regular.woff2
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># My versions</span>
</span></span><span class="line"><span class="cl">140K raleway-regular.subset-webfont.ttf
</span></span><span class="line"><span class="cl"> 61K raleway-regular.subset-webfont.woff
</span></span><span class="line"><span class="cl"> 46K raleway-regular.subset-webfont.woff2
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">$ font features raleway-regular.subset-webfont.woff
</span></span><span class="line"><span class="cl">Glyph Substitution Table <span class="o">(</span>GSUB<span class="o">)</span>:
</span></span><span class="line"><span class="cl">  Script <span class="s2">&#34;latn&#34;</span> <span class="o">(</span>Latin<span class="o">)</span>:
</span></span><span class="line"><span class="cl">    Default Language:
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;aalt&#34;</span> <span class="o">(</span>Access All Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;dlig&#34;</span> <span class="o">(</span>Discretionary Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;liga&#34;</span> <span class="o">(</span>Standard Ligatures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;lnum&#34;</span> <span class="o">(</span>Lining Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;onum&#34;</span> <span class="o">(</span>Oldstyle Figures<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;salt&#34;</span> <span class="o">(</span>Stylistic Alternates<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;smcp&#34;</span> <span class="o">(</span>Small Capitals<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss01&#34;</span> <span class="o">(</span>Stylistic Set 1<span class="o">)</span>
</span></span><span class="line"><span class="cl">      Feature <span class="s2">&#34;ss02&#34;</span> <span class="o">(</span>Stylistic Set 2<span class="o">)</span>
</span></span></code></pre></div><p>My versions were roughly 50% larger than Google&rsquo;s, but still reasonably small, so it was a simple matter of <a href="https://blog.bramp.net/fonts/raleway-regular.subset-webfont.woff2">uploading the fonts</a> to my blog and updating the CSS.</p>
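<p>To put a number on the size difference in the listing above, here is a small Go sketch that computes how much larger each self-built file is than its Google Fonts counterpart. It uses the rounded sizes from the listing, so the percentages are approximate.</p>
<pre tabindex="0"><code class="language-go" data-lang="go">package main

import "fmt"

// fontSize pairs a Google Fonts file size with the self-built equivalent,
// using the rounded sizes (in K) from the listing above.
type fontSize struct {
	format      string
	google, own float64
}

// growth returns how much larger own is than google, as a percentage.
func growth(google, own float64) float64 {
	return (own - google) / google * 100
}

func main() {
	sizes := []fontSize{
		{"ttf", 96, 140},
		{"woff", 40, 61},
		{"woff2", 31, 46},
	}
	for _, s := range sizes {
		fmt.Printf("%-5s %3.0fK vs %3.0fK: +%.1f%%\n",
			s.format, s.google, s.own, growth(s.google, s.own))
	}
}</code></pre>
<p>This works out to about 45.8% (TTF), 52.5% (WOFF), and 48.4% (WOFF2) larger.</p>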
<p class="text-center">
<span class="onum" style="text-decoration: red underline overline; font-size: 3.5em">1234567890</span> &nbsp;vs&nbsp; <span class="lnum" style="text-decoration: red underline overline; font-size: 3.5em">1234567890</span>
</p></description>
    </item>
    
    <item>
      <title>Parsing with Antlr4 and Go</title>
      <link>https://blog.bramp.net/post/2017/12/16/parsing-with-antlr4-and-go/</link>
      <pubDate>Sat, 16 Dec 2017 12:50:31 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2017/12/16/parsing-with-antlr4-and-go/</guid>
      <description><p><em>Originally <a href="https://blog.gopheracademy.com/advent-2017/parsing-with-antlr4-and-go/">published</a> as part of the Go Advent 2017 series</em></p>
<h2 id="what-is-antlr">What is ANTLR?</h2>
<p><a href="http://www.antlr.org">ANTLR</a> (ANother Tool for Language Recognition)
is an <a href="http://www.antlr.org/papers/allstar-techreport.pdf">ALL(*)</a>
<a href="https://en.wikipedia.org/wiki/Parser_generator">parser generator</a>. In
layman&rsquo;s terms, ANTLR creates parsers in a number of languages (Go,
Java, C++, C#, JavaScript) that can process text or binary input. The
generated parser provides a callback interface to parse the input in an
event-driven manner, which can be used as-is, or used to build parse
trees (a data structure representing the input).</p>
<p>ANTLR is used by a number of popular projects: e.g. Hive and Pig use it
to parse Hadoop queries, Oracle and NetBeans use it for their IDEs, and
Twitter even uses it to understand search queries. Support was recently
added so that ANTLR 4 can be used to generate parsers in pure Go. This
article will explain some of the benefits of ANTLR, and walk us through
a simple example.</p>
<h2 id="why-use-it">Why use it?</h2>
<p>It is possible to <a href="https://blog.gopheracademy.com/advent-2014/parsers-lexers/">hand write a
parser</a>, but
this process can be complex, error prone, and hard to change. Instead,
there are many <a href="https://en.wikipedia.org/wiki/Comparison_of_parser_generators">parser generators</a>
that take a grammar expressed in a domain-specific way, and generate code
to parse that language. Popular parser
generators include <a href="https://www.gnu.org/software/bison/">bison</a> and
<a href="http://dinosaur.compilertools.net/yacc/">yacc</a>. In fact, there is a
version of yacc, goyacc, which is written in Go and was part of the main
Go repo until it was moved to
<a href="https://godoc.org/golang.org/x/tools/cmd/goyacc">golang.org/x/tools</a>
last year.</p>
<h3 id="so-why-use-antlr-over-these">So why use ANTLR over these?</h3>
<ul>
<li>
<p>ANTLR has a <a href="http://www.antlr.org/tools.html">suite of tools</a> and
<a href="http://tunnelvisionlabs.com/products/demo/antlrworks">GUIs</a> that
make writing and debugging grammars easy.</p>
</li>
<li>
<p>It uses a simple <a href="https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form">EBNF</a>
syntax to define the grammar, instead of a bespoke configuration
language.</p>
</li>
<li>
<p>ANTLR generates <a href="http://www.antlr.org/papers/allstar-techreport.pdf">Adaptive</a>
<a href="https://en.wikipedia.org/wiki/LL_parser">LL(*) parsers</a>, ALL(*) for short,
whereas most other parser generators (e.g. Bison and Yacc) produce
<a href="https://en.wikipedia.org/wiki/LALR_parser">LALR</a> parsers. The difference
between LL(*) and LALR is out of scope for this article, but put
simply, LALR parsers work bottom-up and LL(*) parsers work top-down. This
has a bearing on how the grammar is written, making some languages
easier or harder to express.</p>
</li>
<li>
<p>The generated code for an LL(*) parser is more understandable than for an
LALR parser. This is because LALR parsers are commonly table driven,
whereas LL(*) parsers encode the logic in their control flow, making
it more comprehensible.</p>
</li>
<li>
<p>Finally, ANTLR is agnostic to the target language. A single grammar
can be used to generate parsers in Java, Go, C++, etc. This is unlike
Bison/Yacc, which typically embed target-language code in the
grammar, making it harder to port.</p>
</li>
</ul>
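<p>To make the point about logic living in the control flow concrete, here is a minimal hand-written top-down (recursive descent) parser in Go for expressions such as <code>1 + 2 * 3</code>. It is a sketch for illustration, not what ANTLR generates: each rule of a hypothetical grammar becomes a method, left recursion is replaced by loops (as a plain recursive descent parser requires), and for simplicity the input is split on whitespace and evaluated with integer arithmetic as it is parsed, panicking on malformed input.</p>
<pre tabindex="0"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parser is a hand-written recursive descent parser. Each rule of this
// hypothetical grammar is a method, so the grammar lives in the control flow:
//
//	expression : term (('+' | '-') term)* ;
//	term       : NUMBER (('*' | '/') NUMBER)* ;
type parser struct {
	tokens []string // whitespace-separated tokens, for simplicity
	pos    int
}

// peek returns the current token, or "" once the input is exhausted.
func (p *parser) peek() string {
	if p.pos >= len(p.tokens) {
		return ""
	}
	return p.tokens[p.pos]
}

// next consumes the current token and returns it.
func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// expression handles + and -, looping instead of left-recursing.
func (p *parser) expression() int {
	v := p.term()
	for p.peek() == "+" || p.peek() == "-" {
		if p.next() == "+" {
			v += p.term()
		} else {
			v -= p.term()
		}
	}
	return v
}

// term handles * and /, which therefore bind tighter than + and -.
func (p *parser) term() int {
	v := p.number()
	for p.peek() == "*" || p.peek() == "/" {
		if p.next() == "*" {
			v *= p.number()
		} else {
			v /= p.number()
		}
	}
	return v
}

// number parses a NUMBER token, panicking on anything else.
func (p *parser) number() int {
	n, err := strconv.Atoi(p.next())
	if err != nil {
		panic(fmt.Sprintf("expected a number: %v", err))
	}
	return n
}

// eval parses and evaluates input, panicking on trailing tokens.
func eval(input string) int {
	p := parser{tokens: strings.Fields(input)}
	v := p.expression()
	if p.peek() != "" {
		panic("unexpected token " + p.peek())
	}
	return v
}

func main() {
	fmt.Println(eval("1 + 2 * 3")) // 7
}</code></pre>
<p>Precedence falls out of the call structure: <code>expression</code> calls <code>term</code>, so multiplication and division are grouped before addition and subtraction are applied, and the loops keep operators of equal precedence left-associative.</p>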
<h2 id="installing-antlr-v4">Installing ANTLR v4</h2>
<p>ANTLR is a Java 1.7 application that generates the Go code needed to
parse your language. Java is needed during development, but once the
parser is built only Go and the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr">ANTLR runtime
library</a> are
required. The ANTLR site has
<a href="https://github.com/antlr/antlr4/blob/master/doc/getting-started.md">documentation</a>
on how to install it on multiple platforms, but in brief,
you can do the following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ wget http://www.antlr.org/download/antlr-4.7-complete.jar
</span></span><span class="line"><span class="cl">$ <span class="nb">alias</span> <span class="nv">antlr</span><span class="o">=</span><span class="s1">&#39;java -jar $PWD/antlr-4.7-complete.jar&#39;</span>
</span></span></code></pre></div><p>The <code>antlr</code> command is now available in your shell. If you prefer, the
.jar file can be placed into a <code>~/bin</code> directory, and the alias can be
stored in your <code>~/.bash_profile</code>.</p>
<h2 id="classic-calculator-example">Classic calculator example</h2>
<p>Let&rsquo;s start with the “hello world” for parsers, the calculator example.
We want to build a parser that handles simple mathematical expressions
such as <code>1 + 2 * 3</code>. The focus of this article is on how to use Go with
ANTLR, so the syntax of the ANTLR language won’t be explained in
detail, but the ANTLR site has <a href="https://github.com/antlr/antlr4/blob/master/doc/grammars.md">comprehensive documentation</a>.</p>
<p>The <a href="https://github.com/bramp/goadvent-antlr">source for all the
examples</a> is available on GitHub.</p>
<pre tabindex="0"><code>// Calc.g4
grammar Calc;

// Tokens
MUL: &#39;*&#39;;
DIV: &#39;/&#39;;
ADD: &#39;+&#39;;
SUB: &#39;-&#39;;
NUMBER: [0-9]+;
WHITESPACE: [ \r\n\t]+ -&gt; skip;

// Rules
start : expression EOF;

expression
   : expression op=(&#39;*&#39;|&#39;/&#39;) expression # MulDiv
   | expression op=(&#39;+&#39;|&#39;-&#39;) expression # AddSub
   | NUMBER                             # Number
   ;
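</code></pre><p>The above is a simple grammar split into two sections, <em>tokens</em> and
<em>rules</em>. The tokens are terminal symbols in the grammar; that is, they
are made up of nothing but literal characters. Rules, by contrast, are
non-terminal symbols made up of tokens and/or other rules. The order of the
<code>expression</code> alternatives matters: ANTLR 4 resolves ambiguities in favour
of the alternative given first, so <code>MulDiv</code> binds more tightly than
<code>AddSub</code>.</p>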
<p>The grammar must be saved in a file whose name matches
the name of the grammar, in this case “Calc.g4”. To process this file
and generate the Go parser, we run the <code>antlr</code> command like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ antlr -Dlanguage<span class="o">=</span>Go -o parser Calc.g4 
</span></span></code></pre></div><p>This will generate a set of Go files in the “parser” package and
subdirectory. It is possible to place the generated code in a different
package by using the <code>-package &lt;name&gt;</code> argument. This is useful if your
project has multiple parsers, or you just want a more descriptive
package name for the parser. The generated files will look like the
following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">$ tree
</span></span><span class="line"><span class="cl">├── Calc.g4
</span></span><span class="line"><span class="cl">└── parser
</span></span><span class="line"><span class="cl">    ├── calc_lexer.go
</span></span><span class="line"><span class="cl">    ├── calc_parser.go
</span></span><span class="line"><span class="cl">    ├── calc_base_listener.go
</span></span><span class="line"><span class="cl">    └── calc_listener.go
</span></span></code></pre></div><p>The generated files consist of three main components: the Lexer, the Parser,
and the Listener.</p>
<p>The Lexer takes arbitrary input and returns a stream of tokens. For
input such as <code>1 + 2 * 3</code>, the Lexer would return the following tokens:
<code>NUMBER (1), ADD (+), NUMBER (2), MUL (*), NUMBER (3), EOF</code>.</p>
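<p>As a rough illustration of what the Lexer does, here is a hand-written Go lexer for the same tokens. It is a sketch, not the generated <code>CalcLexer</code>: the token constants, the <code>symbolicNames</code> slice, and the <code>lex</code> function are hypothetical stand-ins, with whitespace skipped in the same way as the <code>-&gt; skip</code> rule in Calc.g4.</p>
<pre tabindex="0"><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strings"
)

// Hypothetical token types mirroring the tokens in Calc.g4. They start at 1,
// leaving index 0 of symbolicNames unused.
const (
	MUL = iota + 1
	DIV
	ADD
	SUB
	NUMBER
)

// EOF marks the end of the input (ANTLR uses -1 for this too).
const EOF = -1

// symbolicNames maps a token type to its name.
var symbolicNames = []string{"", "MUL", "DIV", "ADD", "SUB", "NUMBER"}

// operators maps single-character operators to their token types.
var operators = map[byte]int{'*': MUL, '/': DIV, '+': ADD, '-': SUB}

const digits = "0123456789"

type token struct {
	typ  int
	text string
}

// name returns the symbolic name of a token type.
func name(typ int) string {
	if typ == EOF {
		return "EOF"
	}
	return symbolicNames[typ]
}

// lex converts the input into a slice of tokens ending with EOF.
func lex(input string) ([]token, error) {
	var tokens []token
	for input != "" {
		c := input[0]
		switch {
		case strings.IndexByte(" \r\n\t", c) != -1:
			// WHITESPACE: [ \r\n\t]+ -> skip
			input = input[1:]
		case strings.IndexByte(digits, c) != -1:
			// NUMBER: [0-9]+ matches greedily, so take every leading digit.
			end := strings.IndexFunc(input, func(r rune) bool {
				return !strings.ContainsRune(digits, r)
			})
			if end == -1 {
				end = len(input)
			}
			tokens = append(tokens, token{NUMBER, input[:end]})
			input = input[end:]
		default:
			typ, ok := operators[c]
			if !ok {
				return nil, fmt.Errorf("unexpected character %q", c)
			}
			tokens = append(tokens, token{typ, input[:1]})
			input = input[1:]
		}
	}
	return append(tokens, token{EOF, ""}), nil
}

func main() {
	tokens, err := lex("1 + 2 * 3")
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		fmt.Printf("%s (%q)\n", name(t.typ), t.text)
	}
}</code></pre>
<p>For <code>1 + 2 * 3</code> this yields NUMBER, ADD, NUMBER, MUL, NUMBER, EOF, matching the sequence above. Because <code>NUMBER</code> is matched greedily, <code>12</code> becomes a single token rather than two.</p>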
<p>The Parser consumes the Lexer’s output and applies the grammar’s rules,
building higher-level constructs, such as expressions, that can be used
to calculate the result.</p>
<p>The Listener then allows us to make use of the parsed input. As
mentioned earlier, yacc requires language-specific code to be embedded
in the grammar. ANTLR, however, separates this concern, allowing the
grammar to be agnostic to the target programming language. It does this
through listeners, which effectively allow hooks to be placed
before and after every rule encountered in the parsed input.</p>
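<p>The listener idea can be sketched without ANTLR. The Go code below uses a hypothetical, hand-built parse tree for <code>1 + 2 * 3</code>, a <code>listener</code> interface with Enter/Exit hooks, a <code>baseListener</code> whose hooks do nothing (roughly the role calc_base_listener.go plays for the interface in calc_listener.go), and a walker that fires Enter before a node’s children and Exit after them. All names are illustrative, not the generated API, which has one Enter/Exit pair per labelled alternative such as <code>MulDiv</code> and <code>AddSub</code>.</p>
<pre tabindex="0"><code class="language-go" data-lang="go">package main

import "fmt"

// node is a parse tree node: either a number or a binary operation.
type node interface{}

type number struct{ value int }

// binary is an operator node; op is one of '*', '/', '+', '-'.
type binary struct {
	op          byte
	left, right node
}

// listener has an Enter and Exit hook for each kind of node.
type listener interface {
	EnterNumber(n number)
	ExitNumber(n number)
	EnterBinary(b binary)
	ExitBinary(b binary)
}

// baseListener implements every hook as a no-op, so concrete listeners
// can embed it and override only the hooks they need.
type baseListener struct{}

func (baseListener) EnterNumber(number) {}
func (baseListener) ExitNumber(number)  {}
func (baseListener) EnterBinary(binary) {}
func (baseListener) ExitBinary(binary)  {}

// walk visits the tree depth-first, firing Enter before a node's
// children and Exit after them.
func walk(n node, l listener) {
	switch n := n.(type) {
	case number:
		l.EnterNumber(n)
		l.ExitNumber(n)
	case binary:
		l.EnterBinary(n)
		walk(n.left, l)
		walk(n.right, l)
		l.ExitBinary(n)
	}
}

// calcListener evaluates the tree with a stack: numbers are pushed on
// exit, and each operator pops its two operands and pushes the result.
type calcListener struct {
	baseListener
	stack []int
}

func (c *calcListener) ExitNumber(n number) {
	c.stack = append(c.stack, n.value)
}

func (c *calcListener) ExitBinary(b binary) {
	top := len(c.stack)
	left, right := c.stack[top-2], c.stack[top-1]
	c.stack = c.stack[:top-2]
	var result int
	switch b.op {
	case '*':
		result = left * right
	case '/':
		result = left / right
	case '+':
		result = left + right
	case '-':
		result = left - right
	}
	c.stack = append(c.stack, result)
}

func main() {
	// The tree for "1 + 2 * 3": the multiplication sits below the addition.
	tree := binary{'+', number{1}, binary{'*', number{2}, number{3}}}

	calc := new(calcListener)
	walk(tree, calc)
	fmt.Println(calc.stack[0]) // 7
}</code></pre>
<p>Because <code>calcListener</code> embeds <code>baseListener</code>, it only overrides the two Exit hooks it uses, and evaluating on exit guarantees both operands are already on the stack when an operator is reached.</p>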
<h2 id="using-the-lexer">Using the Lexer</h2>
<p>Let&rsquo;s move onto an example of using this generated code, starting with
the Lexer.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// example1.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;fmt&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/antlr/antlr4/runtime/Go/antlr&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;./parser&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">&#34;1 + 2 * 3&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Read all tokens</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">for</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lexer</span><span class="p">.</span><span class="nf">NextToken</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenEOF</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="k">break</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;%s (%q)\n&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="nx">lexer</span><span class="p">.</span><span class="nx">SymbolicNames</span><span class="p">[</span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()],</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetText</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>To begin with, the generated parser is imported from the local
subdirectory <code>import &quot;./parser&quot;</code>. Next the Lexer is created with some
input:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">&#34;1 + 2 * 3&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>In this example the input is a simple string, <code>&quot;1 + 2 * 3&quot;</code>, but other
input sources are available besides <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#InputStream"><code>antlr.InputStream</code></a>;
for example, the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#FileStream"><code>antlr.FileStream</code></a>
type can read directly from a file. The <code>InputStream</code> is then passed to
a newly created Lexer. Note that the Lexer is named <code>CalcLexer</code>, which
matches the grammar’s name defined in Calc.g4.</p>
<p>The lexer is then used to consume all the tokens from the input,
printing them one by one. This wouldn’t normally be necessary, but it is
done here for demonstration purposes.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="w">	</span><span class="k">for</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lexer</span><span class="p">.</span><span class="nf">NextToken</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenEOF</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="k">break</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;%s (%q)\n&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="nx">lexer</span><span class="p">.</span><span class="nx">SymbolicNames</span><span class="p">[</span><span class="nx">t</span><span class="p">.</span><span class="nf">GetTokenType</span><span class="p">()],</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nf">GetText</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Each token has two main components: the TokenType and the Text. The
TokenType is a simple integer representing the type of token, while the
Text is literally the text that made up this token. All the TokenTypes
are defined at the end of calc_lexer.go, with their string names stored
in the SymbolicNames slice:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// calc_lexer.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">const</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerMUL</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerDIV</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerADD</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerSUB</span><span class="w">        </span><span class="p">=</span><span class="w"> </span><span class="mi">4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerNUMBER</span><span class="w">     </span><span class="p">=</span><span class="w"> </span><span class="mi">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">CalcLexerWHITESPACE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">6</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>You may also notice that no WHITESPACE token is printed, even though
the input clearly contains whitespace. This is because the grammar
tells the lexer to skip (i.e. discard) it: <code>WHITESPACE: [ \r\n\t]+ -&gt; skip;</code>.</p>
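<p>To make the token stream and the <code>-&gt; skip</code> action concrete, here is a small hand-written lexer sketch. It is not ANTLR’s generated code: the <code>tokenType</code> constants are assumed to mirror the values in calc_lexer.go shown above, and <code>tokenize</code>, <code>symbolicNames</code> and <code>operators</code> are hypothetical names. Whitespace is consumed but never emitted, so, like the generated lexer, it prints only the NUMBER, ADD and MUL tokens:</p>

```go
package main

import "fmt"

// tokenType mirrors the CalcLexer constants shown above (values assumed to
// match calc_lexer.go).
type tokenType int

const (
	MUL tokenType = iota + 1
	DIV
	ADD
	SUB
	NUMBER
	WHITESPACE
)

// symbolicNames plays the role of lexer.SymbolicNames: it is indexed by token type.
var symbolicNames = []string{"", "MUL", "DIV", "ADD", "SUB", "NUMBER", "WHITESPACE"}

// operators maps each single-character operator to its token type.
var operators = map[byte]tokenType{'*': MUL, '/': DIV, '+': ADD, '-': SUB}

type token struct {
	Type tokenType
	Text string
}

// tokenize is a hypothetical stand-in for the generated CalcLexer. Whitespace
// is consumed but never emitted, which is what "-> skip" does in the grammar.
func tokenize(input string) ([]token, error) {
	var tokens []token
	for i := 0; i < len(input); {
		c := input[i]
		switch {
		case c == ' ' || c == '\r' || c == '\n' || c == '\t':
			i++ // skipped: no token is produced
		case c >= '0' && c <= '9':
			// NUMBER: consume a run of ASCII digits.
			start := i
			for i < len(input) && input[i] >= '0' && input[i] <= '9' {
				i++
			}
			tokens = append(tokens, token{NUMBER, input[start:i]})
		default:
			t, ok := operators[c]
			if !ok {
				return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
			}
			tokens = append(tokens, token{t, string(c)})
			i++
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := tokenize("1 + 2 * 3")
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		fmt.Printf("%s (%q)\n", symbolicNames[t.Type], t.Text)
	}
}
```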
<h2 id="using-the-parser">Using the Parser</h2>
<p>The Lexer on its own is not very useful, so the example can be modified
to also use the Parser and Listener:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// example2.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;./parser&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/antlr/antlr4/runtime/Go/antlr&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">calcListener</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">BaseCalcListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Set up the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">&#34;1 + 2 * 3&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stream</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewCommonTokenStream</span><span class="p">(</span><span class="nx">lexer</span><span class="p">,</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenDefaultChannel</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcParser</span><span class="p">(</span><span class="nx">stream</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Finally parse the expression</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeWalkerDefault</span><span class="p">.</span><span class="nf">Walk</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">calcListener</span><span class="p">{},</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Start</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This is very similar to before, but instead of manually iterating over
the tokens, the lexer is used to create a <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#CommonTokenStream"><code>CommonTokenStream</code></a>,
which in turn is used to create a new <code>CalcParser</code>. The parse tree returned by
<code>p.Start()</code> is then “walked”, which is ANTLR&rsquo;s event-driven API for receiving the
results of parsing each rule.</p>
<p>Note that the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr#ParseTreeWalker.Walk"><code>Walk</code></a>
function does not return anything. Some may
have expected a parsed form of the expression to be returned, such as
some kind of <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a>
(abstract syntax tree), but instead the listener receives events as the
tree is walked. This is similar in concept to
<a href="https://en.wikipedia.org/wiki/Simple_API_for_XML">SAX</a>-style parsers
for XML. Event-based parsing can sometimes be harder to use, but it has
advantages. For example, an event-based parser can be very memory efficient, as
previously parsed rules can be discarded once they are no longer needed,
and parsing can be aborted early if the programmer wishes.</p>
<p>So far, though, this example doesn’t do anything beyond checking that the input
can be parsed. To add logic, we must extend the
<code>calcListener</code> type. The <code>calcListener</code> embeds
<code>BaseCalcListener</code>, a helper type that provides empty methods
for everything defined in the <code>CalcListener</code> interface. That interface
looks like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// parser/calc_listener.go</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// CalcListener is a complete listener for a parse tree produced by CalcParser.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">CalcListener</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterStart is called when entering the start production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterStart</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">StartContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterNumber is called when entering the Number production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterNumber</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">NumberContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterMulDiv is called when entering the MulDiv production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterMulDiv</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">MulDivContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// EnterAddSub is called when entering the AddSub production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">EnterAddSub</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">AddSubContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitStart is called when exiting the start production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitStart</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">StartContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitNumber is called when exiting the Number production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitNumber</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">NumberContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitMulDiv is called when exiting the MulDiv production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitMulDiv</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">MulDivContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// ExitAddSub is called when exiting the AddSub production.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nf">ExitAddSub</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">AddSubContext</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>There is an Enter and Exit method for each rule (or labeled alternative, such as <code>MulDiv</code>) in the grammar.
As the parse tree is walked, the walker calls the appropriate method on the
listener to indicate when each rule starts and finishes being evaluated.</p>
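<p>The reason <code>calcListener</code> only needs to implement the methods it cares about is Go’s struct embedding: the no-op methods of the embedded base type are promoted and satisfy the rest of the interface. A minimal sketch of that pattern follows; <code>exprListener</code>, <code>baseExprListener</code> and <code>numberCounter</code> are hypothetical names, not generated by ANTLR. As in the article, the embedded pointer is left nil, which is safe here because the no-op methods never dereference their receiver:</p>

```go
package main

import "fmt"

// exprListener is a cut-down, hypothetical analogue of CalcListener.
type exprListener interface {
	EnterNumber(text string)
	ExitNumber(text string)
	ExitAddSub(op string)
}

// baseExprListener plays the role of BaseCalcListener: every method is a no-op.
type baseExprListener struct{}

func (*baseExprListener) EnterNumber(string) {}
func (*baseExprListener) ExitNumber(string)  {}
func (*baseExprListener) ExitAddSub(string)  {}

// numberCounter overrides only ExitNumber; EnterNumber and ExitAddSub are
// promoted from the embedded *baseExprListener.
type numberCounter struct {
	*baseExprListener
	count int
}

func (n *numberCounter) ExitNumber(string) { n.count++ }

// Compile-time check that *numberCounter satisfies the interface.
var _ exprListener = (*numberCounter)(nil)

func main() {
	var l exprListener = &numberCounter{}
	// Simulate the events a walker would fire for "1 + 2".
	l.EnterNumber("1")
	l.ExitNumber("1")
	l.EnterNumber("2")
	l.ExitNumber("2")
	l.ExitAddSub("+")
	fmt.Println(l.(*numberCounter).count) // 2
}
```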
<h2 id="adding-the-logic">Adding the logic</h2>
<p>A simple calculator can be constructed from this event-driven parser by
using a stack of values. Every time a number is found, it is pushed onto the
stack. Every time an expression (add, multiply, etc.) is exited, the last two
values on the stack are popped, and the appropriate operation is
carried out. The result is then pushed back onto the stack.</p>
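<p>Stripped of the parser, this is ordinary postfix (Reverse Polish) evaluation, because Exit events arrive in post-order: both operands of an expression are exited before the expression itself. Below is a hypothetical, self-contained sketch in which a hand-written postfix token slice stands in for the sequence of Exit events; <code>evalPostfix</code> is an invented name:</p>

```go
package main

import (
	"fmt"
	"strconv"
)

// evalPostfix evaluates tokens in postfix order using a stack of ints,
// mirroring what the listener does in its Exit methods.
func evalPostfix(tokens []string) (int, error) {
	var stack []int
	for _, tok := range tokens {
		switch tok {
		case "+", "-", "*", "/":
			if len(stack) < 2 {
				return 0, fmt.Errorf("not enough operands for %q", tok)
			}
			// The right operand is on top of the stack, the left one below it.
			right, left := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			var result int
			switch tok {
			case "+":
				result = left + right
			case "-":
				result = left - right
			case "*":
				result = left * right
			case "/":
				if right == 0 {
					return 0, fmt.Errorf("division by zero")
				}
				result = left / right
			}
			stack = append(stack, result)
		default:
			// Anything else must be a number: push it.
			n, err := strconv.Atoi(tok)
			if err != nil {
				return 0, err
			}
			stack = append(stack, n)
		}
	}
	if len(stack) != 1 {
		return 0, fmt.Errorf("malformed expression: %d values left on stack", len(stack))
	}
	return stack[0], nil
}

func main() {
	// Exit events for 1 + 2 * 3 arrive as: 1, 2, 3, MulDiv(*), AddSub(+).
	v, err := evalPostfix([]string{"1", "2", "3", "*", "+"})
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 7
}
```

<p>The popping order matters for <code>-</code> and <code>/</code>: the first value popped is the right operand, which is why the listener later in this post writes <code>right, left := l.pop(), l.pop()</code>.</p>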
<p>Take the expression <code>1 + 2 * 3</code>: the result could be either <code>(1 + 2) * 3 = 9</code> or <code>1 + (2 * 3) = 7</code>. Those who recall the <a href="https://en.wikipedia.org/wiki/Order_of_operations">order of
operations</a> will
know that multiplication is carried out before addition,
so the correct result is 7. However, without parentheses there
could be some ambiguity in how this should be parsed. Luckily, the
ambiguity is resolved by the grammar. The precedence of multiplication
over addition is subtly implied within Calc.g4 by placing the <code>MulDiv</code>
alternative before the <code>AddSub</code> alternative.</p>
<div class="text-center">
	<img src="parse-tree.svg">
</div>
<p>The code for a listener that implements this stack-of-values
approach is relatively simple:</p>
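<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;./parser&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;fmt&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;strconv&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">calcListener</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">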
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">BaseCalcListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stack</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">push</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">pop</span><span class="p">()</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="s">&#34;stack is empty unable to pop&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Get the last value from the stack.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Remove the last element from the stack.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">ExitMulDiv</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">MulDivContext</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">right</span><span class="p">,</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">(),</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserMUL</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserDIV</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">default</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;unexpected op: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetText</span><span class="p">()))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">ExitAddSub</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">AddSubContext</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">right</span><span class="p">,</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">(),</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetTokenType</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserADD</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">case</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">CalcParserSUB</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">left</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">right</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">default</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;unexpected op: %s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">GetOp</span><span class="p">().</span><span class="nf">GetText</span><span class="p">()))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="o">*</span><span class="nx">calcListener</span><span class="p">)</span><span class="w"> </span><span class="nf">ExitNumber</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">parser</span><span class="p">.</span><span class="nx">NumberContext</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nf">Atoi</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nf">GetText</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nf">Error</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">l</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Finally, this listener would be used like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// calc takes a string expression and returns the evaluated result.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">calc</span><span class="p">(</span><span class="nx">input</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Set up the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stream</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewCommonTokenStream</span><span class="p">(</span><span class="nx">lexer</span><span class="p">,</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenDefaultChannel</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the Parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nf">NewCalcParser</span><span class="p">(</span><span class="nx">stream</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Finally parse the expression (by walking the tree)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">var</span><span class="w"> </span><span class="nx">listener</span><span class="w"> </span><span class="nx">calcListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeWalkerDefault</span><span class="p">.</span><span class="nf">Walk</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">listener</span><span class="p">,</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Start</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="nx">listener</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Following the algorithm, the parsing of <code>1 + 2 * 3</code> would work like so.</p>
<ol>
<li>The number 1 would be visited first, and pushed onto the stack.</li>
<li>Then the numbers 2 and 3 would be visited, and also pushed onto the stack.</li>
<li>Next, the MulDiv expression would be visited, popping the 2 and
3, multiplying them, and pushing the result, 6, back onto the stack.</li>
<li>Finally, AddSub would be visited, popping the 6 and the 1 from the
stack, and pushing the result, 7, back.</li>
</ol>
<p>The order in which the rules are visited is driven entirely by the Parser, and
thus by the grammar.</p>
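<p>To make the stack-based evaluation concrete, here is a small sketch that does not use ANTLR at all. The <code>node</code> tree, the <code>evaluator</code> type and the <code>walk</code> function are hypothetical stand-ins for the generated parse tree, the <code>calcListener</code> and <code>antlr.ParseTreeWalkerDefault</code>. As with a listener's Exit methods, <code>exit</code> runs only after a node's children have been walked left to right, so both operands are already on the stack when their operator pops them.</p>

```go
package main

import "fmt"

// node is a hypothetical, hand-built stand-in for an ANTLR parse tree:
// either a number leaf (op == 0) or a binary operator with two children.
type node struct {
	op          byte // 0 for a number leaf, otherwise '+', '-', '*' or '/'
	value       int
	left, right *node
}

// evaluator plays the role of the listener: it keeps a stack of ints.
type evaluator struct{ stack []int }

func (e *evaluator) push(v int) { e.stack = append(e.stack, v) }

func (e *evaluator) pop() int {
	v := e.stack[len(e.stack)-1]
	e.stack = e.stack[:len(e.stack)-1]
	return v
}

// exit is called once both children have been walked (post-order), which is
// when a tree walker would call a listener's Exit method.
func (e *evaluator) exit(n *node) {
	if n.op == 0 {
		e.push(n.value)
		return
	}
	right, left := e.pop(), e.pop() // the right operand is on top of the stack
	switch n.op {
	case '+':
		e.push(left + right)
	case '-':
		e.push(left - right)
	case '*':
		e.push(left * right)
	case '/':
		e.push(left / right)
	}
}

// walk visits the children left to right, then exits the node itself.
func walk(e *evaluator, n *node) {
	if n.left != nil {
		walk(e, n.left)
	}
	if n.right != nil {
		walk(e, n.right)
	}
	e.exit(n)
}

func main() {
	// The tree a precedence-aware grammar would build for "1 + 2 * 3".
	tree := &node{op: '+',
		left:  &node{value: 1},
		right: &node{op: '*', left: &node{value: 2}, right: &node{value: 3}},
	}
	var e evaluator
	walk(&e, tree)
	fmt.Println(e.pop()) // 7
}
```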
<h2 id="more-grammars">More grammars</h2>
<p>Learning how to write a grammar may be daunting, but there are many
resources for help. The author of ANTLR, <a href="http://parrt.cs.usfca.edu/">Terence
Parr</a>, has <a href="https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference">published a
book</a>,
with some of the content freely available on <a href="http://antlr.org">antlr.org</a>.</p>
<p>If you don’t want to write your own grammar, there are many <a href="https://github.com/antlr/grammars-v4">pre-written
grammars available</a>, including
grammars for CSS, HTML, SQL, etc., as well as many popular programming
languages. To make it easier, I have <a href="https://github.com/bramp/antlr4-grammars">generated
parsers</a> for all those
available grammars, so using one is as easy as importing a package.</p>
<p>A quick example of using one of the pre-generated grammars:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;bramp.net/antlr4/json&#34;</span><span class="w"> </span><span class="c1">// The parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="s">&#34;github.com/antlr/antlr4/runtime/Go/antlr&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">exampleListener</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// https://godoc.org/bramp.net/antlr4/json#BaseJSONListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="o">*</span><span class="nx">json</span><span class="p">.</span><span class="nx">BaseJSONListener</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Setup the input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">is</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewInputStream</span><span class="p">(</span><span class="s">`
</span></span></span><span class="line"><span class="cl"><span class="s">		{
</span></span></span><span class="line"><span class="cl"><span class="s">			&#34;example&#34;: &#34;json&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s">			&#34;with&#34;: [&#34;an&#34;, &#34;array&#34;]
</span></span></span><span class="line"><span class="cl"><span class="s">		}`</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the JSON Lexer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">lexer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nf">NewJSONLexer</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">stream</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nf">NewCommonTokenStream</span><span class="p">(</span><span class="nx">lexer</span><span class="p">,</span><span class="w"> </span><span class="nx">antlr</span><span class="p">.</span><span class="nx">TokenDefaultChannel</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Create the JSON Parser</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nf">NewJSONParser</span><span class="p">(</span><span class="nx">stream</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Finally walk the tree</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="nx">antlr</span><span class="p">.</span><span class="nx">ParseTreeWalkerDefault</span><span class="p">.</span><span class="nf">Walk</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">exampleListener</span><span class="p">{},</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nf">Json</span><span class="p">())</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>Hopefully this article has given you a taste of how to use ANTLR with Go.
The examples for this article are <a href="https://github.com/bramp/goadvent-antlr">found here</a>,
and the <a href="https://godoc.org/github.com/antlr/antlr4/runtime/Go/antlr">godoc for the ANTLR library is here</a>,
which documents the various InputStream, Lexer, Parser, etc. interfaces.</p>
</description>
    </item>
    
    <item>
      <title>Vanity Go Import Paths</title>
      <link>https://blog.bramp.net/post/2017/10/02/vanity-go-import-paths/</link>
      <pubDate>Mon, 02 Oct 2017 07:48:23 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2017/10/02/vanity-go-import-paths/</guid>
      <description><p>When using third-party packages in Go, they are imported by a path that represents
how to download that package from the Internet. For example, to use the
popular structured logging library, <a href="https://github.com/sirupsen/logrus">Logrus</a>, it would be imported at the top of the Go program like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="s">&#34;github.com/sirupsen/logrus&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>When <code>go get</code> is then executed, it fetches the Logrus source code from GitHub
and places the code in the <code>$GOPATH/src</code> directory. Take a look for yourself:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ tree <span class="nv">$GOPATH</span>/src
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl">├── github.com
</span></span><span class="line"><span class="cl">│   ├── sirupsen
</span></span><span class="line"><span class="cl">│   │   └── logrus
</span></span><span class="line"><span class="cl">...
</span></span></code></pre></div><p>An astute reader may wonder how exactly <code>go get</code> knows that <code>github.com/sirupsen/logrus</code> is a Git repository, and that it can be fetched via the git protocol from that URL. The <code>go get</code> binary could have some smarts built in that know about GitHub and do the right thing (and for a handful of popular hosts, GitHub included, it does), but hard-coding every site is inflexible, and problematic when new sites want to be supported. Instead, the Go developers built a layer of indirection that allows the <code>go get</code> tool to discover the correct source repository for other import paths.</p>
<p>As outlined in the <a href="https://golang.org/cmd/go/#hdr-Remote_import_paths">Remote Import Paths</a> docs, the <code>go get</code> binary makes a normal HTTPS request for the import path with <code>?go-get=1</code> appended (falling back to HTTP if needed), and looks at the returned HTML for a <code>&lt;meta name=&quot;go-import&quot;&gt;</code> tag. This meta tag tells the <code>go get</code> binary where the source code repository for the package actually lives.</p>
<p>This meta tag can be seen with <code>curl</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ curl https://github.com/sirupsen/logrus <span class="p">|</span> grep meta <span class="p">|</span> grep go-import
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-html" data-lang="html"><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">&#34;go-import&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="na">content</span><span class="o">=</span><span class="s">&#34;github.com/sirupsen/logrus git https://github.com/sirupsen/logrus.git&#34;</span><span class="p">&gt;</span>
</span></span></code></pre></div><p>That tag says the package rooted at <code>github.com/sirupsen/logrus</code> can be fetched with git, at the
URL <code>https://github.com/sirupsen/logrus.git</code>. The meta tag can also express other version control systems, e.g. Mercurial, Bazaar, or Subversion.</p>
<p>GitHub is a very convenient place to host source code, but a GitHub URL is generic. Instead, the <code>&lt;meta&gt;</code> tag can be used to serve projects from a vanity domain. For example, the package hosted at <a href="https://github.com/bramp/goredirects">github.com/bramp/goredirects</a> could instead be imported as <code>bramp.net/goredirects</code>. All that is needed is a static HTML page at <code>bramp.net/goredirects</code> containing the following <code>&lt;meta&gt;</code> tag pointing at GitHub.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-html" data-lang="html"><span class="line"><span class="cl"><span class="p">&lt;</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">go-import</span>
</span></span><span class="line"><span class="cl">  <span class="na">content</span><span class="o">=</span><span class="s">&#34;bramp.net/goredirects git https://github.com/bramp/goredirects.git&#34;</span><span class="p">&gt;</span>
</span></span></code></pre></div><p>In case a user visits that page directly with their web browser, it is worthwhile
placing more information about the project on the page, or simply making the page redirect.</p>
<p>To help make these redirect pages, I wrote a simple Go tool, <a href="https://github.com/bramp/goredirects"><code>goredirects</code></a>, that inspects all local repositories under a vanity domain directory in the local <code>$GOPATH/src/</code> and outputs static HTML pages that can be hosted on that domain.</p>
<p>For example, create your new project on GitHub, but check out the project under <code>$GOPATH/src/example.com/project</code>. Then run the tool:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ go install bramp.net/goredirects
</span></span><span class="line"><span class="cl">$ goredirects example.com outputdir
</span></span></code></pre></div><p>The directory <code>outputdir</code> will now contain multiple directories and HTML files, one for each project under <code>$GOPATH/src/example.com</code>. These HTML files contain the appropriate <code>go-import</code> meta tag to redirect the download of source code from the vanity name to GitHub. Just upload these files to your website and, voilà, you are done. Examples of these vanity redirect files can be found on bramp.net, e.g. <a href="https://bramp.net/goredirects/index.html">bramp.net/goredirects/index.html</a>. The tool even works for packages with sub-packages under the main root.</p>
<p>Finally, it is possible to ensure that someone who finds your project via GitHub still imports it by your vanity path. This can be achieved with an <a href="https://golang.org/cmd/go/#hdr-Import_path_checking">import comment</a>. Within the source code, ensure that at least one of the files in your package has a comment like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span><span class="w"> </span><span class="nx">project</span><span class="w"> </span><span class="c1">// import &#34;example.com/project&#34;</span><span class="w">
</span></span></span></code></pre></div><p>Then the <code>go</code> command will refuse to install the package unless it is referred to by that vanity import path, rather than by its true location.</p>
<p>More helpful links on the topic:</p>
<ul>
<li><a href="https://golang.org/cmd/go/#hdr-Import_path_checking">golang.org/cmd/go/#hdr-Import_path_checking</a></li>
<li><a href="https://golang.org/cmd/go/#hdr-Remote_import_paths">golang.org/cmd/go/#hdr-Remote_import_paths</a></li>
<li><a href="https://golang.org/doc/go1.4#canonicalimports">golang.org/doc/go1.4#canonicalimports</a></li>
<li><a href="https://godoc.org/golang.org/x/tools/cmd/fiximports">godoc.org/golang.org/x/tools/cmd/fiximports</a></li>
<li><a href="https://texlution.com/post/golang-canonical-import-paths/">texlution.com/post/golang-canonical-import-paths/</a></li>
</ul>
</description>
    </item>
    
    <item>
      <title>Building a better “What&#39;s My IP?” site</title>
      <link>https://blog.bramp.net/post/2017/02/20/building-a-better-whats-my-ip-site/</link>
      <pubDate>Mon, 20 Feb 2017 12:50:31 -0800</pubDate>
      
      <guid>https://blog.bramp.net/post/2017/02/20/building-a-better-whats-my-ip-site/</guid>
<description><p>Occasionally I’m curious to know what network my device is using, if it has an IPv6 address, and who owns the address space. For example, in a coffee shop I’m curious to know their ISP, and when roaming internationally I’m always curious to understand which mobile operator’s IP address gets assigned to my device.</p>
<p>Most &ldquo;<a href="https://www.google.com/search?q=What%E2%80%99s+my+IP+address">What’s my IP address</a>&rdquo; sites only show you one of your IPv4 or IPv6 addresses. They won’t do a reverse DNS lookup, and they rarely do a WHOIS lookup. Doing all these things shouldn’t be too hard, so I figured I could hack together a site to do this in a weekend.</p>
<p>This blog post explains the creation of <a href="http://ip.bramp.net">ip.bramp.net</a>.</p>
<div class="text-center">
  <img src="screenshot.png" alt="Screenshot of ip.bramp.net in action"></img>
</div>
<h2 id="how-to-get-both-ipv4-and-ipv6-address">How to get both IPv4 and IPv6 address?</h2>
<p>When navigating to a website, your browser makes a connection, normally over either IPv4 or IPv6. Which one is used depends on the DNS records available for the website’s domain, and on the preferences of your OS and browser. Thus your web server sees a single incoming connection, with a single remote address. This address is typically stored in a variable named REMOTE_ADDR. Most &ldquo;What’s my IP&rdquo; sites then display this variable back to the user as their IP address. However, as I’d like to see both IPv4 and IPv6 addresses, I need to somehow force the browser to make two requests, one over each.</p>
<p>There is no API to tell a browser to use IPv6 over IPv4; instead, I use a trick with two domain names, ip4.bramp.net and ip6.bramp.net. Both resolve to the same web server, but the former only has <a href="https://tools.ietf.org/html/rfc1035">DNS A records</a>, and the latter only <a href="https://tools.ietf.org/html/rfc3596">DNS AAAA records</a>. For example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ dig ip4.bramp.net
</span></span><span class="line"><span class="cl">ip4.bramp.net.		300	IN	A	216.239.32.21
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">$ dig AAAA ip6.bramp.net
</span></span><span class="line"><span class="cl">ip6.bramp.net.		291	IN	AAAA	2001:4860:4802:32::15
</span></span></code></pre></div><p>This forces each connection to be made over IPv4 or IPv6 respectively. If a browser doesn’t support IPv6, then the connection to ip6.bramp.net is never made, and an error is returned. As an interesting side note, some browsers use a technique called <a href="https://en.wikipedia.org/wiki/Happy_Eyeballs">Happy Eyeballs</a>, which tries to connect over both concurrently, and abandons the slower or worse-behaving of the two connections.</p>
<h2 id="how-do-you-make-two-requests-from-one-page">How do you make two requests from one page?</h2>
<p>To make the browser send requests to both of these domains, the page issues two <a href="https://en.wikipedia.org/wiki/Ajax_(programming)">AJAX</a> queries. The typical flow looks like:</p>
<div class="text-center">
  <object data="diagram.svg" type="image/svg+xml" height="364" width="583" alt="diagram of AJAX calls">
    <img src="diagram.png" />
  </object>
</div>
<!--
```
user->ip.bramp.net: GET /
ip.bramp.net->user: <html...>
user->ip4.bramp.net: GET ip4.bramp.net/json
ip4.bramp.net->user: {address: 1.2.3.4}
user->ip6.bramp.net: GET ip6.bramp.net/json
ip6.bramp.net->user: {address: 2001:db8::1}
```
-->
<p>These AJAX queries return a simple JSON object, containing information about the requesting user. In my application an example response may look like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;RemoteAddrFamily&#34;</span><span class="p">:</span> <span class="s2">&#34;IPv6&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;RemoteAddr&#34;</span><span class="p">:</span>       <span class="s2">&#34;2601:646:c200:b466:b446:ff32:b227:a53c&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>This response can then be used to update the page, to display the appropriate address.</p>
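<p>As an illustration only, a Go handler producing a response of this shape could look like the sketch below. The <code>Response</code> field names are taken from the example JSON above; everything else (<code>describe</code>, <code>jsonHandler</code>, the sample address) is hypothetical and not the site's actual code. Go exposes the connecting address as <code>http.Request.RemoteAddr</code> in <code>host:port</code> form, so the sketch splits off the port and classifies the family with <code>net.ParseIP</code>.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// Response mirrors the fields of the example JSON response.
type Response struct {
	RemoteAddrFamily string
	RemoteAddr       string
}

// describe turns a RemoteAddr such as "[2001:db8::1]:54321" into a Response.
func describe(remoteAddr string) Response {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present
	}
	family := "unknown"
	if ip := net.ParseIP(host); ip != nil {
		if ip.To4() != nil {
			family = "IPv4"
		} else {
			family = "IPv6"
		}
	}
	return Response{RemoteAddrFamily: family, RemoteAddr: host}
}

// jsonHandler serves the /json endpoint.
func jsonHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(describe(r.RemoteAddr)); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func main() {
	// Simulate a request arriving over IPv6; a real server would register
	// jsonHandler with http.HandleFunc("/json", jsonHandler).
	req := httptest.NewRequest("GET", "/json", nil)
	req.RemoteAddr = "[2001:db8::1]:54321"
	rec := httptest.NewRecorder()
	jsonHandler(rec, req)
	fmt.Print(rec.Body.String())
}
```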
<p>An experienced reader may be aware of the security issues with making AJAX requests to a different domain: there are subtle ways in which a malicious site could abuse your AJAX endpoints, so browsers restrict cross-origin requests by default. Cross-origin resource sharing (<a href="https://en.wikipedia.org/wiki/Cross-origin_resource_sharing">CORS</a>) headers let a server declare exactly which origins may use its responses. In particular, ip4.bramp.net and ip6.bramp.net return the following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ curl -v ip4.bramp.net/json
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&lt; HTTP/1.1 <span class="m">200</span> OK
</span></span><span class="line"><span class="cl">&lt; Content-Type: application/json
</span></span><span class="line"><span class="cl">&lt; Access-Control-Allow-Origin: http://ip.bramp.net
</span></span></code></pre></div><p>The last header explicitly allows pages served from ip.bramp.net, and only those, to read the response; pages on other sites are still refused. Without this header, the AJAX request would be issued, but its response would then be rejected by the browser.</p>
<h2 id="what-about-the-reverse-dns-and-whois">What about the reverse DNS and WHOIS?</h2>
<p>I noted I wanted to display both the reverse DNS and WHOIS records. This is something the browser can’t do, but the server side can. Thus, as part of processing the /json AJAX request, the application makes various additional requests to remote DNS and WHOIS servers.</p>
<p>To look up the reverse DNS name of an IP address, you need to issue a <a href="https://tools.ietf.org/html/rfc1035">PTR DNS request</a>. This is a special DNS request which requires the IP address to be formatted as an in-addr.arpa or ip6.arpa name. For example, the IP address 1.2.3.4 becomes 4.3.2.1.in-addr.arpa. When a PTR request is sent for that in-addr.arpa. name, the reverse DNS name is returned.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ dig PTR 4.3.2.1.in-addr.arpa.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">4.3.2.1.in-addr.arpa.	74312	IN	PTR	c-1-2-3-4.example.com.
</span></span></code></pre></div><p>The in-addr.arpa transformation and lookup are commonly provided by the <a href="https://linux.die.net/man/3/getnameinfo">getnameinfo(3)</a> function, which makes this easy to do.</p>
<p>The WHOIS lookup is a little bit more complex. For a domain name, the WHOIS server to query can be determined from the TLD or ccTLD of the domain. For an IP address, however, you must identify the Regional Internet Registry (<a href="https://en.wikipedia.org/wiki/Regional_Internet_registry">RIR</a>) that owns the IP space. Sadly there is no trivial mapping, so instead I issue a WHOIS query to the Internet Assigned Numbers Authority (<a href="https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority">IANA</a>), which replies with the RIR that owns the IP address. From there, I can query the correct registry directly, typically one of <a href="https://en.wikipedia.org/wiki/AFRINIC">AFRINIC</a>, <a href="https://en.wikipedia.org/wiki/American_Registry_for_Internet_Numbers">ARIN</a>, <a href="https://en.wikipedia.org/wiki/Asia-Pacific_Network_Information_Centre">APNIC</a>, <a href="https://en.wikipedia.org/wiki/Latin_America_and_Caribbean_Network_Information_Centre">LACNIC</a>, or <a href="https://en.wikipedia.org/wiki/R%C3%A9seaux_IP_Europ%C3%A9ens_Network_Coordination_Centre">RIPE NCC</a>. Thus a typical WHOIS request looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ whois -h whois.iana.org 1.2.3.4
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">% IANA WHOIS server
</span></span><span class="line"><span class="cl">% <span class="k">for</span> more information on IANA, visit http://www.iana.org
</span></span><span class="line"><span class="cl">% This query returned <span class="m">1</span> object
</span></span><span class="line"><span class="cl">refer:        whois.apnic.net
</span></span><span class="line"><span class="cl">inetnum:      1.0.0.0 - 1.255.255.255
</span></span><span class="line"><span class="cl">organisation: APNIC
</span></span><span class="line"><span class="cl">status:       ALLOCATED
</span></span><span class="line"><span class="cl">whois:        whois.apnic.net
</span></span><span class="line"><span class="cl">changed:      2010-01
</span></span><span class="line"><span class="cl">source:       IANA
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ whois -h whois.apnic.net 1.2.3.4
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">% <span class="o">[</span>whois.apnic.net<span class="o">]</span>
</span></span><span class="line"><span class="cl">% Whois data copyright terms    http://www.apnic.net/db/dbcopyright.html
</span></span><span class="line"><span class="cl">% Information related to <span class="s1">&#39;1.2.3.0 - 1.2.3.255&#39;</span>
</span></span><span class="line"><span class="cl">inetnum:        1.2.3.0 - 1.2.3.255
</span></span><span class="line"><span class="cl">netname:        Example-prefix
</span></span><span class="line"><span class="cl">descr:          APNIC Example Project
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">...
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">% This query was served by the APNIC Whois Service version 1.69.1-APNICv1r0
</span></span></code></pre></div><p>Both the reverse DNS and WHOIS responses are returned as part of the AJAX JSON response.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;RemoteAddrFamily&#34;</span><span class="p">:</span>  <span class="s2">&#34;IPv4&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;RemoteAddr&#34;</span><span class="p">:</span>        <span class="s2">&#34;1.2.3.4&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;RemoteAddrReverse&#34;</span><span class="p">:</span> <span class="s2">&#34;c-1-2-3-4.example.com.&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;RemoteAddrWhois&#34;</span><span class="p">:</span>   <span class="s2">&#34;
</span></span></span><span class="line"><span class="cl"><span class="s2">    % IANA WHOIS server
</span></span></span><span class="line"><span class="cl"><span class="s2">    % for more information on IANA, visit http://www.iana.org
</span></span></span><span class="line"><span class="cl"><span class="s2">    % This query returned 1 object
</span></span></span><span class="line"><span class="cl"><span class="s2">    refer:        whois.apnic.net
</span></span></span><span class="line"><span class="cl"><span class="s2">    inetnum:      1.0.0.0 - 1.255.255.255
</span></span></span><span class="line"><span class="cl"><span class="s2">    organisation: APNIC
</span></span></span><span class="line"><span class="cl"><span class="s2">    status:       ALLOCATED
</span></span></span><span class="line"><span class="cl"><span class="s2">    ...
</span></span></span><span class="line"><span class="cl"><span class="s2">  &#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><h2 id="what-about-proxies">What about proxies?</h2>
<p>Many users are behind proxies, which connect to the web server on the user’s behalf. Thus REMOTE_ADDR is the address of the proxy, not of the actual user. Some proxies work around this by placing the user’s real IP address in the <a href="https://en.wikipedia.org/wiki/X-Forwarded-For">X-Forwarded-For</a> (XFF), or newer <a href="https://tools.ietf.org/html/rfc7239">Forwarded</a>, HTTP header. However, these headers are easily set by the user, so they cannot be trusted. For the moment I ignore these headers, and instead display the proxy’s IP address. It is conceivable to create a whitelist of proxies whose XFF header I would trust, but for now I didn’t want that headache. This matters all the more because the server side issues requests to external hosts: if I trusted the XFF header, an abusive user could use my site as a proxy, or even as a relay to <a href="https://en.wikipedia.org/wiki/Denial-of-service_attack">denial of service</a> these remote servers.</p>
<h2 id="tying-this-all-together">Tying this all together</h2>
<p>Server side I use <a href="https://cloud.google.com/appengine/">App Engine</a> and <a href="https://golang.org/">Go</a>. Why? Because I wanted to play with App Engine, and I’m a fan of Go right now. On the client side I use <a href="http://getbootstrap.com">Bootstrap</a> to make the page look nice, and <a href="https://angularjs.org">AngularJS</a>: AngularJS because I’m familiar with it, and because it makes it really easy to issue AJAX requests and transform the results into a web page.</p>
<p>I like App Engine because of its <a href="https://en.wikipedia.org/wiki/Platform_as_a_service">PaaS</a> model. It keeps my costs down, and I don’t need to set up a virtual machine or create Docker images. Instead, I just write a single binary and upload it. However, App Engine does place some restrictions on what I can do, in particular limiting outbound connections to ones made via its own library. Thus I had to jump through a few hoops to make the reverse DNS and WHOIS requests. Instead of using <a href="https://linux.die.net/man/3/getnameinfo">getnameinfo(3)</a>, I had to issue DNS requests myself, sending my own UDP packets to port 53 via App Engine’s socket library. Luckily the Go DNS library, <a href="https://github.com/miekg/dns">miekg/dns</a>, makes this relatively easy. Similarly, I had to implement the WHOIS lookup by hand, but again aided by a library, this time <a href="https://github.com/domainr/whois">domainr/whois</a>.</p>
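<p>The WHOIS protocol itself (RFC 3912) is simple enough to sketch with only the standard library: open TCP port 43, send the query followed by CRLF, and read until the server closes the connection. The sketch below is not the site's code. It assumes an environment that allows plain <code>net.DialTimeout</code> outbound connections (which App Engine did not at the time), and the helpers <code>whoisQuery</code>, <code>referral</code> and <code>lookup</code> are hypothetical. It follows the two-step flow described earlier: ask whois.iana.org, then follow the <code>refer:</code> line to the owning RIR.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"os"
	"strings"
	"time"
)

// whoisQuery sends one WHOIS query to server on TCP port 43 and returns the reply.
func whoisQuery(server, query string) (string, error) {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(server, "43"), 10*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(10 * time.Second))
	if _, err := io.WriteString(conn, query+"\r\n"); err != nil {
		return "", err
	}
	b, err := io.ReadAll(conn) // the server closes the connection when done
	return string(b), err
}

// referral returns the server named on a "refer:" line, or "" if there is none.
func referral(resp string) string {
	sc := bufio.NewScanner(strings.NewReader(resp))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "refer:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "refer:"))
		}
	}
	return ""
}

// lookup asks IANA which registry owns ip, then queries that registry.
func lookup(ip string) (string, error) {
	iana, err := whoisQuery("whois.iana.org", ip)
	if err != nil {
		return "", err
	}
	server := referral(iana)
	if server == "" {
		return iana, nil // no referral: return IANA's answer as-is
	}
	rir, err := whoisQuery(server, ip)
	if err != nil {
		return "", err
	}
	return iana + "\n" + rir, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: whois <ip address>")
		return
	}
	out, err := lookup(os.Args[1])
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Print(out)
}
```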
<p>In conclusion, through the use of multiple domain names, some AJAX queries, and server-side support, I was able to make &ldquo;(a better) What&rsquo;s My IP Address?&rdquo; site in under a weekend.</p>
<p>Check out the <a href="https://github.com/bramp/myip">full source on GitHub</a>, or view the site at <a href="http://ip.bramp.net/">ip.bramp.net</a>.</p>
</description>
    </item>
    
    <item>
      <title>Peano Curves</title>
      <link>https://blog.bramp.net/post/2016/08/08/peano-curves/</link>
      <pubDate>Mon, 08 Aug 2016 21:35:35 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2016/08/08/peano-curves/</guid>
      <description><p>My latest addition to the <a href="https://github.com/bramp/hilbert">hilbert Go library</a> is <a href="https://en.wikipedia.org/wiki/Peano_curve">Peano curves</a>: the original space-filling curve, similar to the Hilbert curve but a little more complex.</p>
<figure><img src="/post/2016/08/08/peano-curves/peano_animation.gif"><figcaption>
      <h4>Animation of Peano curve with N in the range 1..5</h4>
    </figcaption>
</figure>
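<p>For the curious, here is a small Go sketch of one way to map a distance <code>t</code> to an <code>(x, y)</code> cell on a 3<sup>n</sup> by 3<sup>n</sup> Peano curve. It follows Peano’s original digit-based definition: write <code>t</code> as 2n base-3 digits, alternately assign them to <code>x</code> and <code>y</code>, and replace a digit <code>d</code> by <code>2 - d</code> whenever the relevant earlier digits sum to an odd number. This is not the hilbert library’s implementation, its orientation may differ from the animation above, and <code>peanoMap</code> is a hypothetical name.</p>

```go
package main

import "fmt"

// peanoMap (a hypothetical helper) maps t in [0, 9^n) to a cell (x, y)
// on a 3^n by 3^n grid using Peano's original ternary-digit construction.
func peanoMap(n, t int) (x, y int) {
	// Split t into 2n base-3 digits, most significant first: a1 a2 ... a2n.
	digits := make([]int, 2*n)
	for i := 2*n - 1; i >= 0; i-- {
		digits[i] = t % 3
		t /= 3
	}
	// Odd-position digits (a1, a3, ...) drive x; even-position (a2, a4, ...) drive y.
	sumX, sumY := 0, 0 // running sums of the odd- and even-position digits
	for i := 0; i < n; i++ {
		a, b := digits[2*i], digits[2*i+1]

		// An x digit is complemented (d -> 2-d) if the earlier
		// even-position digits sum to an odd number.
		xd := a
		if sumY%2 == 1 {
			xd = 2 - xd
		}
		sumX += a

		// A y digit is complemented if the odd-position digits so far,
		// including a, sum to an odd number.
		yd := b
		if sumX%2 == 1 {
			yd = 2 - yd
		}
		sumY += b

		x = x*3 + xd
		y = y*3 + yd
	}
	return x, y
}

func main() {
	// Print the visiting order of the 3x3 (n = 1) curve.
	for t := 0; t < 9; t++ {
		x, y := peanoMap(1, t)
		fmt.Printf("t=%d -> (%d, %d)\n", t, x, y)
	}
}
```

<p>The complement step is what makes each 3 by 3 block a mirrored copy of its neighbour, so consecutive values of <code>t</code> always land on adjacent cells.</p>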

</description>
    </item>
    
    <item>
      <title>Introducing Hilbert. A Go library to map values onto a Hilbert curve.</title>
      <link>https://blog.bramp.net/post/2015/08/07/introducing-hilbert/</link>
      <pubDate>Fri, 07 Aug 2015 20:53:41 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2015/08/07/introducing-hilbert/</guid>
      <description><p>A <a href="https://en.wikipedia.org/wiki/Hilbert_curve">Hilbert curve</a> is a space-filling (snakey) curve through a 2D space:</p>
<figure><img src="/post/2015/08/07/introducing-hilbert/hilbert.png"><figcaption>
      <h4>Image of 8 by 8 Hilbert curve</h4>
    </figcaption>
</figure>

<p>This can be very useful for mapping a 1D value into a 2D space. For example, it is commonly used to <a href="https://xkcd.com/195/">map IP addresses into a 2D space</a>.</p>
<p>I recently created a library for <a href="https://golang.org/">Go</a> that can map to and from a curve. The project is <a href="http://github.com/bramp/hilbert">hosted on GitHub</a>, and can be used like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">import</span><span class="w"> </span><span class="s">&#34;github.com/bramp/hilbert&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// Create a Hilbert curve for mapping to and from a 16 by 16 space.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hilbert</span><span class="p">.</span><span class="nf">New</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// Now map one dimension numbers in the range [0, N*N-1], to an x,y</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// coordinate on the curve where both x and y are in the range [0, N-1].</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">42</span><span class="w"> </span><span class="c1">// Any value in the range [0, N*N-1].</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">x</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">Map</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// Also map back from (x,y) to t.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">MapInverse</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="p">)</span><span class="w">
</span></span></span></code></pre></div><p>The project contains some demos, such as this cool animation:</p>
<figure><img src="/post/2015/08/07/introducing-hilbert/hilbert_animation.gif"><figcaption>
      <h4>Hilbert curve animation</h4>
    </figcaption>
</figure>
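<p>To give a feel for what mapping to and from the curve involves, below is a self-contained Go sketch of the widely published iterative algorithm (the C version given in Wikipedia’s Hilbert curve article), which converts between a distance <code>d</code> and <code>(x, y)</code> on an n by n grid, where n is assumed to be a power of two. It is not necessarily how the library implements <code>Map</code> and <code>MapInverse</code>, its orientation may differ, and the function names are hypothetical.</p>

```go
package main

import "fmt"

// hilbertRotate flips and/or transposes a quadrant so that the sub-curve
// inside it has the right orientation.
func hilbertRotate(n int, x, y *int, rx, ry int) {
	if ry == 0 {
		if rx == 1 {
			*x = n - 1 - *x
			*y = n - 1 - *y
		}
		*x, *y = *y, *x
	}
}

// hilbertD2XY (hypothetical) maps d in [0, n*n-1] to (x, y) on an n by n
// curve, where n is a power of two. It builds the point up from the
// smallest quadrant size s = 1 to the largest.
func hilbertD2XY(n, d int) (x, y int) {
	t := d
	for s := 1; s < n; s *= 2 {
		rx := 1 & (t / 2)
		ry := 1 & (t ^ rx)
		hilbertRotate(s, &x, &y, rx, ry)
		x += s * rx
		y += s * ry
		t /= 4
	}
	return x, y
}

// hilbertXY2D (hypothetical) is the inverse of hilbertD2XY. It walks from
// the largest quadrant down, adding the distance covered by skipped quadrants.
func hilbertXY2D(n, x, y int) int {
	d := 0
	for s := n / 2; s > 0; s /= 2 {
		rx, ry := 0, 0
		if x&s > 0 {
			rx = 1
		}
		if y&s > 0 {
			ry = 1
		}
		d += s * s * ((3 * rx) ^ ry)
		hilbertRotate(n, &x, &y, rx, ry)
	}
	return d
}

func main() {
	const n = 16
	for d := 0; d < 4; d++ {
		x, y := hilbertD2XY(n, d)
		fmt.Println(d, x, y, hilbertXY2D(n, x, y))
	}
}
```

<p>Both directions run in O(log n) steps, one per level of the recursive quadrant structure, so no lookup table for the whole grid is needed.</p>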

</description>
    </item>
    
  </channel>
</rss>
