<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Unsafe on bramp.net</title>
    <link>https://blog.bramp.net/</link>
    <description>Recent content in Unsafe on bramp.net</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-GB</language>
    <lastBuildDate>Wed, 09 Sep 2015 20:29:04 -0700</lastBuildDate>
    <atom:link href="https://blog.bramp.net/tags/unsafe/" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Unrolling loops at runtime with Byte Buddy</title>
      <link>https://blog.bramp.net/post/2015/09/09/unrolling-loops-at-runtime-with-byte-buddy/</link>
      <pubDate>Wed, 09 Sep 2015 20:29:04 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2015/09/09/unrolling-loops-at-runtime-with-byte-buddy/</guid>
      <description><p>While creating the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a>, I encountered a problem that I felt I could optimise. The <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> copies objects into off-heap memory, whereas a normal <code>ArrayList</code> stores references to the objects in an array on the heap. For example, an <code>UnsafeArrayList&lt;FourLongs&gt;</code> holds instances of <a href="https://github.com/bramp/unsafe/blob/master/unsafe-tests/src/main/java/net/bramp/unsafe/examples/FourLongs.java">FourLongs</a>, whose fields consume a total of 32 bytes (4×8 bytes). By design, whenever <code>set()</code> or <code>get()</code> is called, the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> copies these 32 bytes into or out of a contiguous segment of memory.</p>
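<p>To make the 32-byte figure concrete, here is a minimal sketch that adds up the widths of a class's primitive instance fields using reflection. <code>FieldPayload</code>, <code>payloadBytes</code>, and the stand-in <code>FourLongs</code> and <code>Point</code> classes are hypothetical names for illustration only. The sketch ignores the object header, alignment padding, superclass fields and reference fields, so it is a rough payload count, not a replacement for the library's <code>UnsafeHelper.sizeOf</code>.</p>

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper: sums the widths of the primitive instance fields
// declared directly on a class. It ignores the object header, padding,
// superclass fields, and rejects reference fields outright.
final class FieldPayload {

    static long primitiveWidth(Class type) {
        if (type == long.class || type == double.class) return 8;
        if (type == int.class || type == float.class) return 4;
        if (type == short.class || type == char.class) return 2;
        if (type == byte.class || type == boolean.class) return 1;
        throw new IllegalArgumentException("not a primitive: " + type);
    }

    static long payloadBytes(Class clazz) {
        long total = 0;
        for (Field f : clazz.getDeclaredFields()) {
            // Static fields are not stored per instance, so skip them.
            if (Modifier.isStatic(f.getModifiers())) continue;
            total += primitiveWidth(f.getType());
        }
        return total;
    }

    // Hypothetical stand-ins shaped like the article's example classes.
    static final class FourLongs { long a, b, c, d; }
    static final class Point { int x, y; }
}
```

<p>For these stand-ins, <code>payloadBytes</code> gives 32 for <code>FourLongs</code> (four 8-byte longs) and 8 for <code>Point</code> (two 4-byte ints).</p>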
<p>To perform the copy, <code>sun.misc.Unsafe</code>’s <a href="http://www.docjar.com/docs/api/sun/misc/Unsafe.html#putLong(long,+long)">putLong()</a> is called repeatedly, moving 8 bytes at a time. For example, this simple loop copies a long’s worth of memory from <code>src</code> into <code>dest</code> on each iteration:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">8</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">getUnsafe</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">copy</span><span class="p">(</span><span class="n">Object</span><span class="w"> </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">src</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kt">long</span><span class="w"> </span><span class="n">destOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kt">long</span><span class="w"> </span><span class="n">destEnd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">sizeOf</span><span class="p">(</span><span class="n">dest</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">destOffset</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">destEnd</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Note that we use <code>putLong</code> not because the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> is storing objects made up of longs, but because it is the <code>Unsafe</code> method that can copy the most in one go. <code>putLong</code> is thus the building block for a more complex, looping copy method. This works well when the memory is aligned on an 8-byte boundary and the total copy is a multiple of 8 bytes. For the sake of this article, we assume both are always true.</p>
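<p>The same stride-of-8 loop can be tried without <code>sun.misc.Unsafe</code>. The sketch below is an illustration, not the library's code: it swaps in heap <code>ByteBuffer</code>s and their absolute <code>getLong(int)</code> and <code>putLong(int, long)</code> methods in place of <code>Unsafe</code>'s raw addresses, and it checks the multiple-of-8 assumption explicitly instead of taking it on trust. The class name <code>StrideCopy</code> is hypothetical.</p>

```java
import java.nio.ByteBuffer;

// Sketch of the looping copy, with heap ByteBuffers standing in for
// sun.misc.Unsafe. Each iteration moves one long (8 bytes).
final class StrideCopy {
    static final int COPY_STRIDE = 8;

    // Copies the first "length" bytes of src into dest, 8 bytes at a time.
    // ByteBuffer does not care about alignment, so only the
    // "multiple of 8" half of the article's assumption is checked here.
    static void copy(ByteBuffer dest, ByteBuffer src, int length) {
        if (length % COPY_STRIDE != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        int offset = 0;
        while (length > offset) {
            // Absolute get/put: the buffers' positions are left untouched.
            dest.putLong(offset, src.getLong(offset));
            offset += COPY_STRIDE;
        }
    }
}
```

<p>Copying 32 bytes this way runs the loop body four times, exactly as in the <code>FourLongs</code> case; a 12-byte request is rejected rather than silently truncated.</p>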
<p>In the <code>FourLongs</code> case, the copy method iterates four times. This is predictable, and happens every time <code>get()</code> is called on an <code>UnsafeArrayList&lt;FourLongs&gt;</code> instance. Since the copy loop runs on every call to <code>get()</code>, it is worth seeing whether it can be made faster. A common optimisation is for the developer to manually <a href="https://en.wikipedia.org/wiki/Loop_unrolling">unroll the loop</a>, avoiding the loop counter and potentially producing quicker code<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. Here, however, unrolling by hand is not possible, because the parameterised type could be any size. For example, an <code>UnsafeArrayList&lt;Point&gt;</code> would only need to copy 8 bytes (two 4-byte ints). You would hope that the JIT would notice that the loop always iterates the same number of times (for a particular list) and remove the loop. Sadly, it does not seem to do this, perhaps because the JVM does not know what side effects <code>unsafe.{get,put}Long</code> have. To measure the cost of the loop, we compare the previous code to this hand-unrolled version for <code>FourLongs</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">8</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">getUnsafe</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">copy</span><span class="p">(</span><span class="n">Object</span><span class="w"> </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">src</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">assert</span><span class="p">(</span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">sizeOf</span><span class="p">(</span><span class="n">dest</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">4</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kt">long</span><span class="w"> </span><span class="n">destOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>When benchmarked, this manually unrolled code runs twice as fast as the loop! This got me thinking: since a particular <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> instance always copies objects of the same size, again and again and again, it could generate bytecode at creation time that unrolls the loop.</p>
<h2 id="enter-byte-buddy">Enter Byte Buddy</h2>
<p>This led me to investigate <a href="http://bytebuddy.net/">Byte Buddy</a>, a library designed for generating bytecode at runtime. The rest of this article explains how I used Byte Buddy to achieve this goal.</p>
<p>To start, I used IntelliJ IDEA’s “<a href="https://plugins.jetbrains.com/plugin/5918">Show Bytecode</a>” option to inspect the bytecode generated for my hand-unrolled code. The annotated listing below covers the initialisation and a single copy and increment:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="c1">; Initialisation</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; long destOffset = 0;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LCONST_0</span>  <span class="c1">; Load the long zero</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LSTORE</span> <span class="mi">4</span>  <span class="c1">; Store it in “destOffset”</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">; Copy</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; unsafe.putLong(dest, destOffset, unsafe.getLong(src));</span>
</span></span><span class="line"><span class="cl">  <span class="nf">ALOAD</span> <span class="mi">0</span>  <span class="c1">; Load “this”</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; Get the “unsafe” member from this.</span>
</span></span><span class="line"><span class="cl">  <span class="nf">GETFIELD</span> <span class="nv">net</span><span class="o">/</span><span class="nv">bramp</span><span class="o">/</span><span class="nv">unsafe</span><span class="o">/</span><span class="nv">Test.unsafe</span> <span class="p">:</span> <span class="nv">Lsun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe</span><span class="c1">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="nf">ALOAD</span> <span class="mi">1</span>  <span class="c1">; Load dest</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">4</span>  <span class="c1">; Load destOffset</span>
</span></span><span class="line"><span class="cl">  <span class="nf">ALOAD</span> <span class="mi">0</span>  <span class="c1">; Load this</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; Get the “unsafe” member from this.</span>
</span></span><span class="line"><span class="cl">  <span class="nf">GETFIELD</span> <span class="nv">net</span><span class="o">/</span><span class="nv">bramp</span><span class="o">/</span><span class="nv">unsafe</span><span class="o">/</span><span class="nv">Test.unsafe</span> <span class="p">:</span> <span class="nv">Lsun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe</span><span class="c1">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">2</span>  <span class="c1">; Load src</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; unsafe.getLong(src), storing result on stack.</span>
</span></span><span class="line"><span class="cl">  <span class="nf">INVOKEVIRTUAL</span> <span class="nv">sun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe.getLong</span> <span class="p">(</span><span class="nv">J</span><span class="p">)</span><span class="nv">J</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; unsafe.putLong(dest, destOffset, {stack result})</span>
</span></span><span class="line"><span class="cl">  <span class="nf">INVOKEVIRTUAL</span> <span class="nv">sun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe.putLong</span> <span class="p">(</span><span class="nv">Ljava</span><span class="o">/</span><span class="nv">lang</span><span class="o">/</span><span class="nv">Object</span><span class="c1">;JJ)V</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">; Increment</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; destOffset += 8;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">4</span>   <span class="c1">; Load destOffset</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LDC</span> <span class="mi">8</span>     <span class="c1">; Load 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LADD</span>      <span class="c1">; Add destOffset and 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LSTORE</span> <span class="mi">4</span>  <span class="c1">; Store result to destOffset</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="c1">; src += 8;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">2</span>   <span class="c1">; Load src</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LDC</span> <span class="mi">8</span>     <span class="c1">; Load 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LADD</span>      <span class="c1">; Add src and 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LSTORE</span> <span class="mi">2</span>  <span class="c1">; Store result to src</span>
</span></span></code></pre></div><p>After reading a <a href="http://download.forge.objectweb.org/asm/asm4-guide.pdf">primer on bytecode</a>, I found this generated bytecode quite simple. It can be broken up into three steps: initialisation, copy, and increment. At runtime, Byte Buddy can be used to generate an unrolled equivalent with 1 initialisation step, N copy steps, and N-1 increment steps, where N is the size of the object the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> plans to copy, divided by the 8-byte stride.</p>
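<p>The 1 + N + (N-1) shape is easy to check with a short sketch that, instead of emitting bytecode, emits the equivalent unrolled Java statements as text for a given length. <code>UnrollPlan</code> and <code>unrolledBody</code> are hypothetical names; the statements mirror the hand-unrolled example above, and each increment step is the pair of additions to <code>destOffset</code> and <code>src</code>.</p>

```java
// Hypothetical generator that writes out the unrolled copy as Java source
// text: one initialisation, N copies and N-1 increments, N = length / 8.
final class UnrollPlan {
    static final long COPY_STRIDE = 8;

    static String unrolledBody(long length) {
        if (length % COPY_STRIDE != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        long n = length / COPY_STRIDE;
        StringBuilder sb = new StringBuilder();
        // Initialisation step (emitted once).
        sb.append("long destOffset = 0;\n");
        for (long i = 0; n > i; i++) {
            if (i > 0) {
                // Increment step: only between copies, so N-1 of them.
                sb.append("destOffset += COPY_STRIDE;\n");
                sb.append("src += COPY_STRIDE;\n");
            }
            // Copy step (emitted N times).
            sb.append("unsafe.putLong(dest, destOffset, unsafe.getLong(src));\n");
        }
        return sb.toString();
    }
}
```

<p>For a 32-byte <code>FourLongs</code>, this produces the initialisation, four copies and three increments, matching the hand-unrolled <code>copy</code> method.</p>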
<p>From reading through the Byte Buddy API, the best way to achieve this seemed to be to create an abstract class that forms the base of the generated class, and then, at runtime, create a subclass of it specialised with the unrolled copy bytecode.</p>
<p>For example, the base class would look like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kd">abstract</span><span class="w"> </span><span class="kd">class</span> <span class="nc">UnsafeCopier</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">protected</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">public</span><span class="w"> </span><span class="nf">UnsafeCopier</span><span class="p">(</span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">this</span><span class="p">.</span><span class="na">unsafe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">checkNotNull</span><span class="p">(</span><span class="n">unsafe</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">public</span><span class="w"> </span><span class="kd">abstract</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">copy</span><span class="p">(</span><span class="n">Object</span><span class="w"> </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">src</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This leaves the <code>copy(…)</code> method to be implemented optimally for the size of the object being copied.</p>
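<p>To see the shape of this design without any bytecode generation, the sketch below hand-writes what Byte Buddy is asked to produce: an abstract base class, plus a subclass chosen once for a given object size. It is an analogue under stated assumptions, not the library's code: <code>BufferCopier</code> and <code>forLength</code> are hypothetical, heap <code>ByteBuffer</code>s stand in for <code>Unsafe</code>, and only the 32-byte case is unrolled by hand, with every other size falling back to the stride loop. Generating the subclass at runtime is what removes that one-size limitation.</p>

```java
import java.nio.ByteBuffer;

// Plain-Java analogue of the UnsafeCopier design: an abstract base class
// whose copy method is supplied by a subclass picked once per object size.
// ByteBuffer stands in for sun.misc.Unsafe; all names are hypothetical.
abstract class BufferCopier {
    static final int COPY_STRIDE = 8;

    public abstract void copy(ByteBuffer dest, ByteBuffer src);

    // Choose an implementation once, when the length is known.
    static BufferCopier forLength(final int length) {
        if (length % COPY_STRIDE != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        if (length == 32) {
            // Hand-unrolled for 32 bytes: no loop counter, no bound check.
            return new BufferCopier() {
                @Override
                public void copy(ByteBuffer dest, ByteBuffer src) {
                    dest.putLong(0, src.getLong(0));
                    dest.putLong(8, src.getLong(8));
                    dest.putLong(16, src.getLong(16));
                    dest.putLong(24, src.getLong(24));
                }
            };
        }
        // Fallback for every other size: the generic stride loop.
        return new BufferCopier() {
            @Override
            public void copy(ByteBuffer dest, ByteBuffer src) {
                for (int offset = 0; length > offset; offset += COPY_STRIDE) {
                    dest.putLong(offset, src.getLong(offset));
                }
            }
        };
    }
}
```

<p>Callers only ever see <code>BufferCopier</code>, so swapping the hand-written subclasses for generated ones would not change how the copier is used.</p>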
<p>Using the <a href="https://en.wikipedia.org/wiki/Builder_pattern">Builder pattern</a>, I created the <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnrolledUnsafeCopierBuilder.html"><code>UnrolledUnsafeCopierBuilder</code></a> class. Its <code>build()</code> method calculates the size of the class being copied, uses Byte Buddy to generate the copy implementation, and returns a specialised instance of <code>UnsafeCopier</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="n">UnsafeCopier</span><span class="w"> </span><span class="nf">build</span><span class="p">(</span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="p">)</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">ReflectiveOperationException</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">sizeOf</span><span class="p">(</span><span class="n">clazz</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">Class</span><span class="o">&lt;?&gt;</span><span class="w"> </span><span class="n">dynamicType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ByteBuddy</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">subclass</span><span class="p">(</span><span class="n">UnsafeCopier</span><span class="p">.</span><span class="na">class</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">method</span><span class="p">(</span><span class="n">named</span><span class="p">(</span><span class="s">&#34;copy&#34;</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">intercept</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">CopierImplementation</span><span class="p">(</span><span class="n">length</span><span class="p">)).</span><span class="na">make</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">load</span><span class="p">(</span><span class="n">getClass</span><span class="p">().</span><span class="na">getClassLoader</span><span class="p">(),</span><span class="w"> </span><span class="n">ClassLoadingStrategy</span><span class="p">.</span><span class="na">Default</span><span class="p">.</span><span class="na">WRAPPER</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">getLoaded</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="n">UnsafeCopier</span><span class="p">)</span><span class="w"> </span><span class="n">dynamicType</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">getDeclaredConstructor</span><span class="p">(</span><span class="n">Unsafe</span><span class="p">.</span><span class="na">class</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">newInstance</span><span class="p">(</span><span class="n">unsafe</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This begins by calculating the size of the class. Then, using a <a href="http://bytebuddy.net/javadoc/0.7-rc1/index.html?net/bytebuddy/ByteBuddy.html">ByteBuddy</a> instance, it creates a new dynamic type that extends <code>UnsafeCopier</code> and whose <code>copy</code> method is implemented by the bytecode that <code>CopierImplementation(length)</code> generates. Finally, the new type is loaded, and its <code>Unsafe</code> constructor is used to create an instance of the copier, now specialised for copying instances of <code>clazz</code>.</p>
<p>The real meat of the code is in <code>CopierImplementation</code>, which can be explained in pieces:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">class</span> <span class="nc">CopierImplementation</span><span class="w"> </span><span class="kd">implements</span><span class="w"> </span><span class="n">ByteCodeAppender</span><span class="p">,</span><span class="w"> </span><span class="n">Implementation</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">8</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">length</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">public</span><span class="w"> </span><span class="nf">CopierImplementation</span><span class="p">(</span><span class="kt">long</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">this</span><span class="p">.</span><span class="na">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">private</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="nf">buildStack</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">setupStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">copyStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">incrementStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">iterations</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">length</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="o">[]</span><span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="o">[</span><span class="n">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">iterations</span><span class="o">]</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">stack</span><span class="o">[</span><span class="n">0</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">setupStack</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">iterations</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="n">stack</span><span class="o">[</span><span class="n">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">1</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copyStack</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="n">stack</span><span class="o">[</span><span class="n">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">2</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">incrementStack</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="c1">// Override the last incrementStack with a &#34;return&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">stack</span><span class="o">[</span><span class="n">stack</span><span class="p">.</span><span class="na">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">1</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MethodReturn</span><span class="p">.</span><span class="na">VOID</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Compound</span><span class="p">(</span><span class="n">stack</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Byte Buddy uses <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> objects to define what bytecode to generate. These objects can be nested hierarchically, and together they hold every bytecode instruction to emit. We define a separate <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> object for each step. The <code>buildStack()</code> method then combines those steps, repeated as many times as needed, into one array. With N = <code>length / COPY_STRIDE</code>, this array contains one setup step, N copy steps and N-1 increment steps, followed by a <code>return</code> instruction.</p>
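<p>The indexing in <code>buildStack()</code> is easy to get wrong, so here is a plain-Java sketch that reproduces it. Strings stand in for the <code>StackManipulation</code> objects, and the class name and strings are illustrative only. It prints the layout for a 32-byte class such as FourLongs.</p>

```java
import java.util.Arrays;

/** Illustrative sketch: same array layout as buildStack(), with strings instead of StackManipulation objects. */
class UnrolledLayoutSketch {
    static final long COPY_STRIDE = 8;

    static String[] layout(long length) {
        final int iterations = (int) (length / COPY_STRIDE);
        final String[] stack = new String[1 + 2 * iterations];

        stack[0] = "setup";
        for (int i = 0; iterations > i; i++) {
            stack[i * 2 + 1] = "copy";
            stack[i * 2 + 2] = "increment";
        }

        // As in buildStack(), the final increment is overwritten with a return.
        stack[stack.length - 1] = "return";
        return stack;
    }

    public static void main(String[] args) {
        // 32 bytes, e.g. the four long fields of FourLongs.
        System.out.println(Arrays.toString(layout(32)));
    }
}
```

<p>For 32 bytes this gives setup, then copy and increment alternating, and ends with copy followed by return. That matches one setup step, four copy steps, three increment steps and a return.</p>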
<p>Recall from the earlier bytecode listing that the initialisation was two bytecode operations, LCONST_0 and LSTORE. In Byte Buddy, we can therefore write the following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">setupStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Compound</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">LongConstant</span><span class="p">.</span><span class="na">ZERO</span><span class="p">,</span><span class="w">                       </span><span class="c1">// LCONST_0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableStore</span><span class="p">.</span><span class="na">LONG</span><span class="p">.</span><span class="na">storeOffset</span><span class="p">(</span><span class="n">4</span><span class="p">)</span><span class="w">  </span><span class="c1">// LSTORE 4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>Byte Buddy provides ready-made primitives for most bytecode instructions, and these can be combined into <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> arrays like the one above. Some instructions are missing, however, for example LADD (needed by the increment step). It is simple enough to create one from scratch, as <a href="https://github.com/bramp/unsafe/tree/master/unsafe-unroller/src/main/java/net/bramp/unsafe/bytebuddy">shown in the project's source</a>.</p>
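<p>To show what LADD contributes, here is a toy interpreter for the increment step. The exact instructions of that step are not listed in this article. This sketch assumes <code>destOffset += 8</code> compiles to LLOAD 4, LDC2_W 8, LADD, LSTORE 4. The <code>LaddSketch</code> class is hypothetical, and it stores each long in one array entry, whereas the JVM uses two slots per long.</p>

```java
/**
 * Toy interpreter for an assumed increment step: LLOAD 4, LDC2_W 8, LADD, LSTORE 4.
 * Simplification: each long occupies one entry here; the JVM uses two slots.
 */
class LaddSketch {
    final long[] locals = new long[6]; // index 4 plays the role of destOffset
    final long[] stack = new long[4];  // operand stack
    int depth = 0;

    void lload(int index)  { stack[depth++] = locals[index]; }
    void ldc2w(long value) { stack[depth++] = value; }
    void lstore(int index) { locals[index] = stack[--depth]; }

    // LADD: pop two longs, push their sum.
    void ladd() {
        long b = stack[--depth];
        long a = stack[--depth];
        stack[depth++] = a + b;
    }

    void incrementStep() {
        lload(4);
        ldc2w(8L);
        ladd();
        lstore(4);
    }

    public static void main(String[] args) {
        LaddSketch vm = new LaddSketch();
        vm.incrementStep();
        vm.incrementStep();
        System.out.println(vm.locals[4]); // 16
    }
}
```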
<p>Next, the copy step is defined. It has a few more instructions than the increment step, but is still relatively simple:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Field</span><span class="w"> </span><span class="n">unsafeField</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeCopier</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getDeclaredField</span><span class="p">(</span><span class="s">&#34;unsafe&#34;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Method</span><span class="w"> </span><span class="n">getLongMethod</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Unsafe</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getMethod</span><span class="p">(</span><span class="s">&#34;getLong&#34;</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="p">.</span><span class="na">class</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Method</span><span class="w"> </span><span class="n">putLongMethod</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Unsafe</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getMethod</span><span class="p">(</span><span class="s">&#34;putLong&#34;</span><span class="p">,</span><span class="w"> </span><span class="n">Object</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="p">.</span><span class="na">class</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">copyStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Compound</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// unsafe.putLong(dest, destOffset, unsafe.getLong(src));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">REFERENCE</span><span class="p">.</span><span class="na">loadOffset</span><span class="p">(</span><span class="n">0</span><span class="p">)</span><span class="p">,</span><span class="w"> </span><span class="c1">// ALOAD 0 this</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">FieldAccess</span><span class="p">.</span><span class="na">forField</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">FieldDescription</span><span class="p">.</span><span class="na">ForLoadedField</span><span class="p">(</span><span class="n">unsafeField</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	                                   </span><span class="p">.</span><span class="na">getter</span><span class="p">(),</span><span class="w"> </span><span class="c1">// GETFIELD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">REFERENCE</span><span class="p">.</span><span class="na">loadOffset</span><span class="p">(</span><span class="n">1</span><span class="p">)</span><span class="p">,</span><span class="w"> </span><span class="c1">// ALOAD 1 dest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">LONG</span><span class="p">.</span><span class="na">loadOffset</span><span class="p">(</span><span class="n">4</span><span class="p">)</span><span class="p">,</span><span class="w">      </span><span class="c1">// LLOAD 4 destOffset</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">REFERENCE</span><span class="p">.</span><span class="na">loadOffset</span><span class="p">(</span><span class="n">0</span><span class="p">)</span><span class="p">,</span><span class="w"> </span><span class="c1">// ALOAD 0 this</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">FieldAccess</span><span class="p">.</span><span class="na">forField</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">FieldDescription</span><span class="p">.</span><span class="na">ForLoadedField</span><span class="p">(</span><span class="n">unsafeField</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	                                   </span><span class="p">.</span><span class="na">getter</span><span class="p">(),</span><span class="w"> </span><span class="c1">// GETFIELD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">LONG</span><span class="p">.</span><span class="na">loadOffset</span><span class="p">(</span><span class="n">2</span><span class="p">)</span><span class="p">,</span><span class="w">      </span><span class="c1">// LLOAD 2 src</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodInvocation</span><span class="p">.</span><span class="na">invoke</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">MethodDescription</span><span class="p">.</span><span class="na">ForLoadedMethod</span><span class="p">(</span><span class="n">getLongMethod</span><span class="p">)),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodInvocation</span><span class="p">.</span><span class="na">invoke</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">MethodDescription</span><span class="p">.</span><span class="na">ForLoadedMethod</span><span class="p">(</span><span class="n">putLongMethod</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>Again, the bytecode instructions are created as a sequence of <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> objects, replicating the bytecode the Java compiler generated earlier. This example introduces a couple of new <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> classes. <code>FieldAccess</code> reads the <code>unsafe</code> field described by a <code>FieldDescription</code>, and <code>MethodInvocation</code> calls the <code>Unsafe</code> methods described by each <code>MethodDescription</code>.</p>
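<p>Each <code>MethodDescription.ForLoadedMethod</code> above tells Byte Buddy which method descriptor an INVOKEVIRTUAL instruction refers to. The sketch below derives those descriptors from the same reflective lookups, using <code>java.lang.invoke.MethodType</code>. It loads <code>sun.misc.Unsafe</code> by name and only inspects signatures, so it never calls Unsafe. The class and helper names are hypothetical.</p>

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

/** Hypothetical helper: shows the JVM descriptors the copy step's two invocations use. */
class UnsafeDescriptors {
    /** Looks up a public sun.misc.Unsafe method by name and parameter types, without calling it. */
    static Method unsafeMethod(String name, Class... params) {
        try {
            return Class.forName("sun.misc.Unsafe").getMethod(name, params);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe method not found: " + name, e);
        }
    }

    /** JVM method descriptor, e.g. "(J)J" for a method taking and returning a long. */
    static String descriptor(Method m) {
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        Method getLong = unsafeMethod("getLong", long.class);
        Method putLong = unsafeMethod("putLong", Object.class, long.class, long.class);
        System.out.println("getLong " + descriptor(getLong));
        System.out.println("putLong " + descriptor(putLong));
    }
}
```

<p>On a JDK that still ships <code>sun.misc.Unsafe</code>, this should report <code>(J)J</code> for <code>getLong</code> and <code>(Ljava/lang/Object;JJ)V</code> for <code>putLong</code>.</p>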
<p>The final piece is the increment step, which isn’t covered in detail here; for the interested reader, <a href="https://github.com/bramp/unsafe/blob/ff8f463bf60661ff63133e8a3beada7fd65c7c45/unsafe-unroller/src/main/java/net/bramp/unsafe/CopierImplementation.java#L86">the source can be found here</a>.</p>
<p>One last piece of information Byte Buddy needs is how much space the <code>copy()</code> method requires: the maximum depth of its operand stack, and the number of slots its local variables occupy. The <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> comes in handy here, as it can infer the operand stack size from the bytecode it represents. In particular, the following code calculates both sizes:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="n">Size</span><span class="w"> </span><span class="nf">apply</span><span class="p">(</span><span class="n">MethodVisitor</span><span class="w"> </span><span class="n">methodVisitor</span><span class="p">,</span><span class="w"> </span><span class="n">Implementation</span><span class="p">.</span><span class="na">Context</span><span class="w"> </span><span class="n">implementationContext</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">   </span><span class="n">MethodDescription</span><span class="w"> </span><span class="n">instrumentedMethod</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Call buildStack() (from above) to generate the bytecode</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">buildStack</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Emit the bytecode, and get the operand stack size it requires</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Size</span><span class="w"> </span><span class="n">finalStackSize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">.</span><span class="na">apply</span><span class="p">(</span><span class="n">methodVisitor</span><span class="p">,</span><span class="w"> </span><span class="n">implementationContext</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Return the maximum operand stack size, and the local variable size, which is</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// the parameter slots plus two for the long destOffset variable.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Size</span><span class="p">(</span><span class="n">finalStackSize</span><span class="p">.</span><span class="na">getMaximalSize</span><span class="p">(),</span><span class="w"> </span><span class="n">instrumentedMethod</span><span class="p">.</span><span class="na">getStackSize</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">2</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
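</span></span></span></code></pre></div><p>An important part here is the <code>+2</code>, which makes room for the <code>long destOffset</code> local variable. Without it, the method’s declared local variable table would be too small to hold <code>destOffset</code>. The JVM’s bytecode verifier would most likely reject the generated class. If verification were disabled, writing <code>destOffset</code> could overwrite other data in the stack frame and most likely crash the JVM.</p>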
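<p>Both numbers can be checked by hand. The sketch below is a hypothetical plain-Java model, not Byte Buddy’s API. It combines per-instruction pairs of net stack change and maximal depth, in the same spirit as the impact and maximal values tracked by <code>StackManipulation.Size</code>. A reference counts as one slot and a long as two. The increment step is assumed to be LLOAD 4, LDC2_W 8, LADD, LSTORE 4, since its source is not listed here.</p>

```java
/**
 * Hypothetical bookkeeping model: each piece of bytecode reports
 * {net change in stack depth, maximal depth reached}, in JVM slots
 * (reference = 1 slot, long = 2 slots).
 */
class StackSizeSketch {
    /** A single instruction: its peak is its own growth, or 0 if it shrinks the stack. */
    static int[] insn(int impact) {
        return new int[] {impact, Math.max(0, impact)};
    }

    /** Runs b after a: b's peak is measured from the depth a leaves behind. */
    static int[] aggregate(int[] a, int[] b) {
        return new int[] {a[0] + b[0], Math.max(a[1], a[0] + b[1])};
    }

    static int[] compound(int[]... parts) {
        int[] total = {0, 0};
        for (int[] part : parts) {
            total = aggregate(total, part);
        }
        return total;
    }

    static final int[] SETUP = compound(insn(2), insn(-2)); // LCONST_0, LSTORE 4
    static final int[] COPY = compound(
            insn(1), insn(0),  // ALOAD 0, GETFIELD unsafe
            insn(1), insn(2),  // ALOAD 1 dest, LLOAD 4 destOffset
            insn(1), insn(0),  // ALOAD 0, GETFIELD unsafe
            insn(2),           // LLOAD 2 src
            insn(-1),          // INVOKEVIRTUAL getLong (J)J: pops 3 slots, pushes 2
            insn(-6));         // INVOKEVIRTUAL putLong (Ljava/lang/Object;JJ)V: pops 6 slots
    // Assumed increment step: LLOAD 4, LDC2_W 8, LADD, LSTORE 4
    static final int[] INCREMENT = compound(insn(2), insn(2), insn(-2), insn(-2));
    static final int[] RETURN = insn(0);

    /** Maximum operand stack for a copier unrolled the given number of times. */
    static int maxStack(int iterations) {
        int[] total = SETUP;
        for (int i = 0; iterations > i; i++) {
            total = aggregate(total, COPY);
            if (i != iterations - 1) {
                total = aggregate(total, INCREMENT);
            }
        }
        return aggregate(total, RETURN)[1];
    }

    /** Local variable slots: this (1) + dest (1) + src (2), plus 2 for the long destOffset. */
    static int maxLocals() {
        int parameterSlots = 1 + 1 + 2;
        return parameterSlots + 2;
    }

    public static void main(String[] args) {
        System.out.println("max stack  = " + maxStack(4));
        System.out.println("max locals = " + maxLocals());
    }
}
```

<p>For a FourLongs copier, the model gives a maximum operand stack of 7 slots. That peak is reached inside the copy step, just before <code>getLong</code> is invoked. It also gives 6 local variable slots, and the last two of those are the <code>+2</code> discussed above.</p>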
<p>Now at runtime the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a>&rsquo;s constructor can use the <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnrolledUnsafeCopierBuilder.html"><code>UnrolledUnsafeCopierBuilder</code></a> to generate a specialised <code>UnsafeCopier</code> designed for the exact class the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> is storing.</p>
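<p>A specialised copier needs to know how many 8-byte copy steps to unroll for a given class. Below is a simplified, hypothetical way to derive that count from the class’s primitive instance fields. <code>UnrollCountSketch</code> and its methods are illustrative only. The sketch ignores object headers, padding and superclass fields, which a real implementation would have to account for.</p>

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical helper: sums primitive instance field sizes and derives an unroll count. */
class UnrollCountSketch {
    static final long COPY_STRIDE = 8;

    static long fieldBytes(Class type) {
        long total = 0;
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue; // only instance state is copied
            Class t = f.getType();
            if (t == long.class || t == double.class) total += 8;
            else if (t == int.class || t == float.class) total += 4;
            else if (t == short.class || t == char.class) total += 2;
            else if (t == byte.class || t == boolean.class) total += 1;
            else throw new IllegalArgumentException("non-primitive field: " + f.getName());
        }
        return total;
    }

    static int iterations(Class type) {
        return (int) (fieldBytes(type) / COPY_STRIDE);
    }

    // Stand-in for the FourLongs example class: four long fields, 32 bytes.
    static class FourLongs { long a, b, c, d; }

    public static void main(String[] args) {
        System.out.println(fieldBytes(FourLongs.class) + " bytes, "
                + iterations(FourLongs.class) + " copy steps");
    }
}
```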
<h2 id="results">Results</h2>
<p>Now that we have most of what we need, it is worth benchmarking this code. Using <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a>, we can write three microbenchmarks: one using the original looping code, one using the hand-unrolled code, and one using the Byte Buddy unrolled code. The <a href="https://github.com/bramp/unsafe/blob/master/unsafe-benchmark/src/main/java/net/bramp/unsafe/copier/UnrolledCopierBenchmark.java">code for the benchmarks</a> is on GitHub, and follows a similar methodology to that in a <a href="https://blog.bramp.net/post/2015/08/27/unsafe-part-3-benchmarking-a-java-unsafearraylist/">previous article</a>.</p>
<p>The results are as you may expect:</p>
<table class="table">
  <thead>
      <tr>
          <th>Benchmark</th>
          <th>Mode</th>
          <th>Count</th>
          <th>Score</th>
          <th>Error</th>
          <th>Units</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Loop</td>
          <td>thrpt</td>
          <td>25</td>
          <td>218.056</td>
          <td>± 11.123</td>
          <td>ops/µs</td>
      </tr>
      <tr>
          <td>Hand Unrolled</td>
          <td>thrpt</td>
          <td>25</td>
          <td>430.376</td>
          <td>± 27.448</td>
          <td>ops/µs</td>
      </tr>
      <tr>
          <td>Byte Buddy Unrolled</td>
          <td>thrpt</td>
          <td>25</td>
          <td>437.139</td>
          <td>± 22.811</td>
          <td>ops/µs</td>
      </tr>
  </tbody>
</table>

<p>The looping code executes ~218 operations per microsecond. The Byte Buddy and hand-unrolled versions perform almost identically, at ~430-437 operations per microsecond, and their difference is well within the error bounds. That is roughly twice as fast. Not measured here is the one-off cost of generating the unrolled code. This technique is therefore only worthwhile when the generated code is long-lived; otherwise the setup cost undoes any per-call savings.</p>
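<p>The trade-off above can be made concrete with some back-of-the-envelope arithmetic. The sketch converts the JMH scores into nanoseconds per operation and computes how many calls would repay a one-off generation cost. That cost was not measured. The 1 ms value below is a made-up placeholder, and the class name is hypothetical.</p>

```java
/** Back-of-the-envelope arithmetic on the JMH scores; GENERATION_COST_NS is a placeholder, not a measurement. */
class BreakEvenSketch {
    static final double LOOP_OPS_PER_US = 218.056;
    static final double BYTE_BUDDY_OPS_PER_US = 437.139;
    static final double GENERATION_COST_NS = 1_000_000; // hypothetical: 1 ms to generate the class

    static double nsPerOp(double opsPerMicrosecond) {
        return 1000.0 / opsPerMicrosecond;
    }

    static double speedup() {
        return BYTE_BUDDY_OPS_PER_US / LOOP_OPS_PER_US;
    }

    /** Number of calls before the per-call saving repays the generation cost. */
    static double breakEvenCalls(double generationCostNs) {
        double savedPerCallNs = nsPerOp(LOOP_OPS_PER_US) - nsPerOp(BYTE_BUDDY_OPS_PER_US);
        return generationCostNs / savedPerCallNs;
    }

    public static void main(String[] args) {
        System.out.printf("loop %.2f ns/op, unrolled %.2f ns/op, speedup %.2fx%n",
                nsPerOp(LOOP_OPS_PER_US), nsPerOp(BYTE_BUDDY_OPS_PER_US), speedup());
        System.out.printf("break-even after about %.0f calls%n", breakEvenCalls(GENERATION_COST_NS));
    }
}
```

<p>With these inputs, each call saves roughly 2.3 ns. A hypothetical 1 ms generation cost would therefore take several hundred thousand calls to pay back.</p>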
<h2 id="conclusion">Conclusion</h2>
<p>In summary, we unrolled a loop at runtime by generating bytecode on demand for that specific purpose. This was possible by inspecting the bytecode generated by the Java compiler. We then used Byte Buddy to generate equivalent bytecode at runtime, customised with the correct number of unrolled iterations.</p>
<p>This technique may seem completely crazy, and I don’t suggest using it unless you know what you are doing. That includes actually measuring that you have a performance problem this would fix. It also includes confirming that you cannot rely on the JVM’s own JIT compiler to do this optimisation for you.</p>
<p><em>Helpful Links:</em> <a href="https://github.com/bramp/unsafe/">GitHub Home</a> | <a href="https://github.com/bramp/unsafe/tree/master/unsafe-unroller/src/main/java/net/bramp/unsafe">GitHub Code</a> | <a href="https://bramp.github.io/unsafe/">JavaDoc</a></p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Unrolled code is not always faster, as larger code may not fit in the CPU’s instruction cache.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
</description>
    </item>
    
    <item>
      <title>Unsafe Part 3: Benchmarking a java UnsafeArrayList</title>
      <link>https://blog.bramp.net/post/2015/08/27/unsafe-part-3-benchmarking-a-java-unsafearraylist/</link>
      <pubDate>Thu, 27 Aug 2015 20:39:04 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2015/08/27/unsafe-part-3-benchmarking-a-java-unsafearraylist/</guid>
      <description><p>Previously we introduced the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/">UnsafeArrayList</a>, an ArrayList-style collection that does not store references to objects. Instead, it uses <a href="http://www.docjar.com/docs/api/sun/misc/Unsafe.html">sun.misc.Unsafe</a> and <a href="https://blog.bramp.net/post/2015/08/24/unsafe-part-1-sun.misc.unsafe-helper-classes/">UnsafeHelper</a> to copy the objects into off-heap memory. This has the unique property of keeping all objects contiguous in memory and avoiding a pointer indirection, at the cost of copying values in and out. This article benchmarks the list to understand its characteristics.</p>
<h2 id="methodology">Methodology</h2>
<p>To test the performance of this new style of list, a series of benchmarks was devised. The new <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH benchmark framework</a> was used, and the final benchmark code is <a href="https://github.com/bramp/unsafe/tree/master/unsafe-benchmark">available here</a>.</p>
<p>Multiple iterations were run and, unless otherwise stated, results are reported with a 99% confidence interval. A couple of warmup iterations were always run and discarded. All tests were run on an Ubuntu desktop (Linux kernel 3.19.0-22) with a 64-bit Intel® Core™ i3-2125 CPU @ 3.30GHz and 16 GiB of 1333 MHz DDR3 RAM. The JVM was OpenJDK (version 1.8.0_45-internal).</p>
<p>For each benchmark, new ArrayLists and UnsafeArrayLists were constructed and populated with newly created objects. The size of the lists was varied, up to the maximum that could be held in memory without swapping to disk. Two artificial workloads were created:</p>
<ol>
<li>Reading items from the lists start to finish, and</li>
<li>Processing the elements in a random order.</li>
</ol>
<p>The first was simulated by reading the first field of every element of the list in order. The second was simulated by sorting the list on the objects’ fields with a simple quicksort.</p>
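<p>For readers unfamiliar with the sort workload, here is a minimal in-place quicksort using the Lomuto partition. It works on an array of TwoLongs-like objects. This is a sketch of the kind of sort described, not the benchmark’s own code. The field names and the ordering (first field, then second) are assumptions.</p>

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch of a simple quicksort over TwoLongs-like objects; not the benchmark's own code. */
class QuicksortSketch {
    static class TwoLongs {
        final long a, b;
        TwoLongs(long a, long b) { this.a = a; this.b = b; }
        @Override public String toString() { return "(" + a + ", " + b + ")"; }
    }

    /** Orders by the first field, then the second (an assumed ordering). */
    static int compare(TwoLongs x, TwoLongs y) {
        int c = Long.compare(x.a, y.a);
        return c != 0 ? c : Long.compare(x.b, y.b);
    }

    /** Sorts items[lo..hi] (inclusive) in place, using the last element as the pivot. */
    static void sort(TwoLongs[] items, int lo, int hi) {
        if (lo >= hi) return;
        TwoLongs pivot = items[hi];
        int store = lo;
        for (int i = lo; hi > i; i++) {
            if (0 > compare(items[i], pivot)) {
                swap(items, i, store);
                store++;
            }
        }
        swap(items, store, hi); // the pivot moves to its final position
        sort(items, lo, store - 1);
        sort(items, store + 1, hi);
    }

    static void swap(TwoLongs[] items, int i, int j) {
        TwoLongs tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }

    public static void main(String[] args) {
        Random random = new Random(1);
        TwoLongs[] items = new TwoLongs[8];
        for (int i = 0; items.length > i; i++) {
            items[i] = new TwoLongs(random.nextInt(100), random.nextInt(100));
        }
        sort(items, 0, items.length - 1);
        System.out.println(Arrays.toString(items));
    }
}
```

<p>With an UnsafeArrayList, each swap and comparison copies field values rather than moving references. That copying is one reason the sort benchmark is a useful contrast to plain iteration.</p>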
<p>Three test classes of different sizes were created to be stored in the lists: one with two long fields, one with four, and one with eight. Named TwoLongs, FourLongs and EightLongs, their fields require 16, 32 and 64 bytes respectively. In each iteration, these objects were created with random values in their fields.</p>
<h2 id="the-results">The Results</h2>
<table class="table table-hover table-striped table-condensed">
	<thead>
		<tr><th>Benchmark</th><th>List</th><th>Type</th><th>Size</th><th class="text-center">Mean Time (s)</th></tr>
	</thead>
	<tbody>
		<tr><td>Iterate</td><td>ArrayList</td><td>TwoLongs</td><td>80,000,000</td><td class="text-center">2.266 ± 0.229</td></tr>
		<tr><td>Iterate</td><td>UnsafeArrayList</td><td>TwoLongs</td><td>80,000,000</td><td class="text-center">1.79 ± 0.03</td></tr>
		<tr><td>IterateInPlace</td><td>UnsafeArrayList</td><td>TwoLongs</td><td>80,000,000</td><td class="text-center">0.442 ± 0.023</td></tr>
		<tr><td></td><td></td><td></td><td></td><td></td></tr>
		<tr><td>Iterate</td><td>ArrayList</td><td>FourLongs</td><td>80,000,000</td><td class="text-center">2.277 ± 0.211</td></tr>
		<tr><td>Iterate</td><td>UnsafeArrayList</td><td>FourLongs</td><td>80,000,000</td><td class="text-center">2.126 ± 0.019</td></tr>
		<tr><td>IterateInPlace</td><td>UnsafeArrayList</td><td>FourLongs</td><td>80,000,000</td><td class="text-center">0.648 ± 0.019</td></tr>
		<tr><td></td><td></td><td></td><td></td><td></td></tr>
		<tr><td>Iterate</td><td>ArrayList</td><td>EightLongs</td><td>80,000,000</td><td class="text-center">2.792 ± 0.072</td></tr>
		<tr><td>Iterate</td><td>UnsafeArrayList</td><td>EightLongs</td><td>80,000,000</td><td class="text-center">2.672 ± 0.322</td></tr>
		<tr><td>IterateInPlace</td><td>UnsafeArrayList</td><td>EightLongs</td><td>80,000,000</td><td class="text-center">0.941 ± 0.032</td></tr>
		<tr><td></td><td></td><td></td><td></td><td></td></tr>
		<tr><td>Sort</td><td>ArrayList</td><td>TwoLongs</td><td>80,000,000</td><td class="text-center">70.31 ± 3.939</td></tr>
		<tr><td>Sort</td><td>ArrayList</td><td>FourLongs</td><td>80,000,000</td><td class="text-center">79.673 ± 6.119</td></tr>
		<tr><td>Sort</td><td>ArrayList</td><td>EightLongs</td><td>80,000,000</td><td class="text-center">97.687 ± 4.86</td></tr>
		<tr><td></td><td></td><td></td><td></td><td></td></tr>
		<tr><td>Sort</td><td>UnsafeArrayList</td><td>TwoLongs</td><td>80,000,000</td><td class="text-center">18.69 ± 3.158</td></tr>
		<tr><td>Sort</td><td>UnsafeArrayList</td><td>FourLongs</td><td>80,000,000</td><td class="text-center">24.822 ± 0.79</td></tr>
		<tr><td>Sort</td><td>UnsafeArrayList</td><td>EightLongs</td><td>80,000,000</td><td class="text-center">40.697 ± 0.743</td></tr>
	</tbody>
</table>
<h3 id="iterate">Iterate</h3>
<p>Starting with the smallest test object, TwoLongs, reading the first field of all 80 million elements of an ArrayList took on average 2.266 ± 0.229 seconds. Doing the same with the UnsafeArrayList (which doesn’t store objects, and instead copies elements in and out) took on average 1.79 ± 0.03 seconds, roughly 21% less time.</p>
<p>Recall from the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/">previous article</a> that UnsafeArrayList has two methods for retrieving an element: <code>T get(int index)</code> and <code>T get(T dest, int index)</code>. The former creates a new object and copies the fields into it. The latter copies the fields into a given destination object, so a single temporary object can be reused and no new objects are created. It is labelled &ldquo;InPlace&rdquo; in the results above.</p>
<p>It is therefore surprising that the UnsafeArrayList iterates in about 21% less time than an ArrayList. It has the extra overhead of creating an object and copying fields into it, whereas the ArrayList only reads existing objects.</p>
<p>Some theory is needed to understand what might be happening here. A modern CISC CPU can execute an instruction in a few clock cycles, let&rsquo;s say ~0.5 nanoseconds, however, reading from RAM takes ~10 nanoseconds. While the CPU is waiting for the response from RAM it is effectively blocked. To compensate the CPU deploys a few tricks, two of which could be helping here. Firstly, the CPU tries to predicting and prefetch the next memory request. Secondly, the CPU will execute instructions out of order, thus not waiting for the memory if a later instruction does not depend on the read.</p>
<p>In the ArrayList case, the array of reference is stored in contiguous memory. However, the actual objects (that the references point to) could be anywhere in RAM. As the program loops through it is making reads from effectively random locations in memory, that can’t be predicted, and thus stalls the CPU.</p>
<p>There is no doubt in the UnsafeArrayList the CPU is prefetching the next elements before it is needed. Additionally the cost of creating these short lived objects is most likely very small because they live and die in eden space and are thus simple to create and garbage collect. I also would not be surprised if the CPU or the JIT compiler was able to do <a href="https://en.wikipedia.org/wiki/Automatic_vectorization">some kind of vectorising</a> on the input. That is, concurrently operating on multiple entries at the same time.</p>
<p>If we then test the <code>T get(T dest, int index)</code> method (labelled IterateInPlace), it can iterate through the array in an impressive 0.442 ±0.023 seconds. That’s 5 times faster than the ArrayList, and 4 times faster than the <code>T get(int index)</code>. This is certainly because the objects are not created for each get.</p>
<p>It was not measured here, but it is possible to confirm what the CPU is doing, by using <a href="https://en.wikipedia.org/wiki/Hardware_performance_counter">hardware based performance counters</a>. These are special registers within the CPU that can be configured to measure cache hit/miss rates, prefetches, instructions per cycle, and many other metrics. These can be invaluable to understand what’s truly going on, as in most cases humans are bad at understanding performance bottlenecks through intuition alone. Tools such as <a href="http://oprofile.sourceforge.net/">oprofile</a>, <a href="https://perf.wiki.kernel.org/index.php/Main_Page">perf</a>, <a href="https://en.wikipedia.org/wiki/DTrace">dtrace</a> and <a href="https://sourceware.org/systemtap/">systemtap</a> can be used for this.</p>
<p>As a quick sanity check: in the ArrayList case it takes an average of 28.325 nanoseconds per element. <a href="https://en.wikipedia.org/wiki/CAS_latency">According to Wikipedia</a>, it takes between 9.00 and 18.75 nanoseconds to read from DDR3-1333 memory. This number is therefore not unexpected, as the ArrayList has to issue two memory reads per element: first a sequential read from the array of references, and then a read from the object itself (which is at an unpredictable address).</p>
<p>With the UnsafeArrayList in-place test, it takes an average of 5.53 nanoseconds per element. As the fields are stored contiguously in memory, the CPU can efficiently pipeline the requests, amortizing the 9-18 ns memory read cost. Here the speed is most likely limited by either the memory’s bandwidth or the CPU’s instruction throughput. Reading 80 million elements in 0.442 seconds requires about 181 million reads per second and, assuming each object is two longs (16 bytes), ~2.7 GiB/s of throughput. Neither value approaches the upper limit of what DDR3 is capable of, so I suspect the time is a combination of memory access and the CPU instructions executed per element.</p>
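<p>The arithmetic in the last two paragraphs can be reproduced with a few lines of Java. This is only a sketch: the class and method names are hypothetical, the inputs are the measured times quoted above, and &ldquo;GiB&rdquo; is taken as 2^30 bytes.</p>

```java
/** Back-of-the-envelope check of the iteration figures (hypothetical helper). */
final class IterationMath {
    static final double GIB = 1024.0 * 1024.0 * 1024.0;   // 2^30 bytes

    /** Average time spent per element, in nanoseconds. */
    static double nanosPerElement(double seconds, long elements) {
        return seconds * 1e9 / elements;
    }

    /** Number of elements read per second. */
    static double elementsPerSecond(double seconds, long elements) {
        return elements / seconds;
    }

    /** Bytes moved per second, expressed in GiB/s. */
    static double gibPerSecond(double seconds, long elements, int bytesPerElement) {
        return (double) elements * bytesPerElement / seconds / GIB;
    }

    public static void main(String[] args) {
        long n = 80_000_000L;
        System.out.printf("ArrayList:       %.3f ns/element%n", nanosPerElement(2.266, n));
        System.out.printf("UnsafeArrayList: %.3f ns/element%n", nanosPerElement(0.442, n));
        System.out.printf("Reads/second:    %.0f million%n", elementsPerSecond(0.442, n) / 1e6);
        // TwoLongs: two 8-byte longs per element.
        System.out.printf("Throughput:      %.2f GiB/s%n", gibPerSecond(0.442, n, 16));
    }
}
```

<p>Running it should print values close to the figures above: roughly 28.3 ns and 5.5 ns per element, about 181 million reads per second, and about 2.7 GiB/s.</p>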
<h3 id="sorting">Sorting</h3>
<p>The second benchmark measured how quickly the lists could be read and written in a less sequential pattern, in particular when being sorted. This should cause less predictable reads from memory. Sorting 80 million elements in the ArrayList took 70.31 ± 3.939 seconds, and only 18.69 ± 3.158 seconds for the UnsafeArrayList using the in-place get. The relative speedup is not as impressive as in the previous test, but the UnsafeArrayList is still ~3.7 times as quick. I’m unsure exactly why the UnsafeArrayList is faster, but I suspect it is related to the fewer memory indirections, and to the prefetching effect that copying the fields has.</p>
<p>It’s also worth noting that the performance gain becomes less pronounced as the size of the stored class increases. For FourLongs the difference between ArrayList and UnsafeArrayList is 3.2x, and for EightLongs it is 2.4x. This is likely explained by the increasing cost of copying the fields in and out of the list. Even so, I would argue that the copy cost is partly hidden, as it effectively prefetches the object’s fields into the CPU cache, saving a memory load when a field is actually used (most likely shortly after it is pulled from the list).</p>
<h3 id="other-observations">Other observations</h3>
<p>Easily overlooked is the smaller memory requirement of the UnsafeArrayList. A TwoLongs instance is 16 bytes of data, plus 16 bytes of JVM object header (including alignment padding). Thus an ArrayList of 80 million instances takes 2.4 GiB of RAM (32 bytes × 80M), plus an additional 305 MiB for an array of 80 million references (assuming <a href="https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#compressedOop">compressed object pointers</a>, which take 4 bytes each). That totals 2.68 GiB, whereas the UnsafeArrayList takes 16 bytes per entry, totalling only 1.2 GiB (roughly half the size!).</p>
<p>Of course, if the list holds larger classes (such as EightLongs), the per-object overhead is proportionally smaller; in that case it is 6.25 GiB vs 4.76 GiB, roughly 75% of the size.</p>
<p>One last observation of interest is the confidence intervals of the results. A larger error implies more variability between test runs. For example, if the garbage collector ran during some of the runs and slowed them down, it would increase this error. In all the tests using the UnsafeArrayList in-place methods, the confidence interval is smaller, implying more consistency and predictability. This can be important in certain situations, such as real-time systems.</p>
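<p>The memory figures above can be checked the same way. This sketch simply encodes the assumptions stated in the text (a 16-byte header including padding, 4-byte compressed references) plus one of its own (no spare capacity in the ArrayList’s backing array); the names are hypothetical, and a real JVM’s layout may differ.</p>

```java
/** Rough footprint estimate for the two layouts (hypothetical helper). */
final class FootprintEstimate {
    static final double GIB = 1024.0 * 1024.0 * 1024.0;
    static final int HEADER_BYTES = 16;     // assumption: object header plus alignment padding
    static final int REFERENCE_BYTES = 4;   // assumption: compressed object pointers

    /** ArrayList: one heap object per element plus one reference per slot. */
    static long arrayListBytes(long elements, int fieldBytes) {
        long perObject = HEADER_BYTES + fieldBytes;
        return elements * perObject + elements * REFERENCE_BYTES;
    }

    /** UnsafeArrayList: only the raw field bytes, packed back to back. */
    static long unsafeArrayListBytes(long elements, int fieldBytes) {
        return elements * fieldBytes;
    }

    public static void main(String[] args) {
        long n = 80_000_000L;
        String[] names = {"TwoLongs", "EightLongs"};
        int[] fieldBytes = {16, 64};        // 2 and 8 longs of 8 bytes each
        for (int i = 0; i != names.length; i++) {
            double onHeap = arrayListBytes(n, fieldBytes[i]) / GIB;
            double packed = unsafeArrayListBytes(n, fieldBytes[i]) / GIB;
            System.out.printf("%-10s ArrayList %.2f GiB, UnsafeArrayList %.2f GiB (%.0f%%)%n",
                    names[i], onHeap, packed, 100 * packed / onHeap);
        }
    }
}
```

<p>For TwoLongs this gives about 2.68 GiB versus 1.19 GiB (44%), and for EightLongs about 6.26 GiB versus 4.77 GiB (76%).</p>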
<h2 id="conclusion">Conclusion</h2>
<p>We benchmarked the <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnsafeArrayList.html">UnsafeArrayList</a> against a normal ArrayList in two artificial workloads. In the start-to-finish iteration, the UnsafeArrayList (using the in-place get) was up to ~5x faster than its counterpart, and when sorting it was 2.4-3.7x faster, depending on the element size. This result is interesting when designing high performance data structures; however, the use of <a href="http://www.docjar.com/docs/api/sun/misc/Unsafe.html">sun.misc.Unsafe</a> is considered dangerous, so the performance comes with many caveats and risks. In fact, it was recently announced that the <a href="http://blog.dripstat.com/removal-of-sun-misc-unsafe-a-disaster-in-the-making/">Unsafe class is being deprecated and hidden in Java 9</a>. So instead, this was just an insightful journey into how the CPU can optimise particular workloads, and how Java can be pushed to extreme speeds.</p>
<p>Your results may vary, and as always you should benchmark your exact workload instead of a hypothetical one, but this was still an interesting experiment.</p>
</description>
    </item>
    
    <item>
      <title>Unsafe Part 2: Using sun.misc.Unsafe to create a contiguous array of objects</title>
      <link>https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/</link>
      <pubDate>Wed, 26 Aug 2015 17:51:02 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/</guid>
<description><p>I recently came across an article from the <a href="http://mechanical-sympathy.blogspot.com/2012/10/compact-off-heap-structurestuples-in.html">Mechanical Sympathy blog</a> that used the <a href="https://en.wikipedia.org/wiki/Flyweight_pattern">flyweight pattern</a> to build a “compact off-heap” array of objects. They allocated an area of memory large enough to store N copies of their object, then used a single instance of a proxy object to pack and unpack fields into this memory. For example, let&rsquo;s say we needed to store an array of <a href="https://docs.oracle.com/javase/7/docs/api/java/awt/Point.html">Point</a> objects. We could construct a simple array like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="n">Point</span><span class="o">[]</span><span class="w"> </span><span class="n">points</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Point</span><span class="o">[</span><span class="n">N</span><span class="o">]</span><span class="p">;</span><span class="w">
</span></span></span></code></pre></div><p>The inefficiency here is that each instance of a Point requires 12-16 bytes of overhead to store metadata about the object (such as class, GC state, etc), and each additional instance adds to the cost of garbage collection. Additionally, the array actually contains references to Point objects stored elsewhere in RAM. These references require a memory indirection when accessing the actual instances.</p>
<p>In the <a href="http://mechanical-sympathy.blogspot.com/2012/10/compact-off-heap-structurestuples-in.html">Mechanical Sympathy</a> article, they instead packed all the fields of the instances into a contiguous array. For simplification I changed their example, but it was something like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kt">int</span><span class="o">[]</span><span class="w"> </span><span class="n">memory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="kt">int</span><span class="o">[</span><span class="n">N</span><span class="o">*</span><span class="n">2</span><span class="o">]</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">class</span> <span class="nc">ProxyPoint</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kd">private</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">setIndex</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">this</span><span class="p">.</span><span class="na">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kd">public</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">getX</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">memory</span><span class="o">[</span><span class="n">index</span><span class="o">*</span><span class="n">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">0</span><span class="o">]</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kd">public</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">getY</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">memory</span><span class="o">[</span><span class="n">index</span><span class="o">*</span><span class="n">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">1</span><span class="o">]</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>With this approach there is no per-object overhead for each Point (as there is only ever one ProxyPoint and one array). This also has the interesting property that the fields of all the Points are stored in the same contiguous region of memory, which leads to some great cache/CPU benefits. For example, if you read all the points sequentially, adjacent points share the same CPU cache line, and the CPU can predictably prefetch the next point. This would not be possible with an array of references to Points, as each Point could potentially be stored anywhere in RAM.</p>
<p>With this primer in mind, it would be interesting to have a normal Java <a href="https://docs.oracle.com/javase/8/docs/api/java/util/List.html">List</a> that stores fields packed together like this. The above solution only works if you write a proxy class ahead of time for the specific class you will be storing. Using the recently released <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnsafeHelper.html">UnsafeHelper class</a> (<a href="https://blog.bramp.net/post/2015/08/24/unsafe-part-1-sun.misc.unsafe-helper-classes/">discussed previously</a>), I set out to build something that looks like a standard generic ArrayList and can store any type, but with the benefit of storing all elements in a contiguous region of memory.</p>
<p>The final solution is <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnsafeArrayList.html">UnsafeArrayList.java</a>. This implements the Java List interface, but instead of storing references to objects, it copies each object’s fields into a contiguous region of memory. If you are a C++ programmer, you can think of this as a <code>std::vector&lt;Point&gt;</code> instead of a <code>std::vector&lt;Point*&gt;</code>. This minor change comes with its own pros and cons, outlined later.</p>
<p>To begin with, the list is constructed like so: <code>new UnsafeArrayList&lt;Point&gt;(Point.class)</code>. <code>Point.class</code> is passed in so that the list knows what kind of objects it will be storing. This is required due to a limitation in Java’s implementation of generics (type erasure), which makes it <a href="http://stackoverflow.com/q/182636/88646">impossible for a class to know its own generic type</a> at runtime.</p>
<p>The constructor uses UnsafeHelper to calculate the offset to the first field within an instance, and the size of each element (the instance size minus that offset):</p>
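<p>Because the snippet above leaves <code>memory</code> as a free variable, here is a self-contained version of the same flyweight idea that compiles and runs. It is an illustrative sketch rather than code from the Mechanical Sympathy article: <code>PackedPoints</code> and its inner <code>Proxy</code> class are hypothetical names, and the proxy is an inner class so that it can see the backing <code>int[]</code>.</p>

```java
/** Flyweight sketch: all x/y values packed into one int[]. */
final class PackedPoints {
    private static final int FIELDS = 2;   // x and y
    private final int[] memory;

    PackedPoints(int n) {
        memory = new int[n * FIELDS];
    }

    int size() {
        return memory.length / FIELDS;
    }

    void set(int index, int x, int y) {
        memory[index * FIELDS] = x;
        memory[index * FIELDS + 1] = y;
    }

    /** A single reusable proxy; it only remembers which element it points at. */
    final class Proxy {
        private int index;

        Proxy at(int index) {
            this.index = index;
            return this;
        }

        int getX() { return memory[index * FIELDS]; }
        int getY() { return memory[index * FIELDS + 1]; }
    }

    /** Sequential walk: consecutive reads touch adjacent memory. */
    long sumOfX() {
        Proxy p = new Proxy();
        long sum = 0;
        for (int i = 0; i != size(); i++) {
            sum += p.at(i).getX();
        }
        return sum;
    }

    public static void main(String[] args) {
        PackedPoints points = new PackedPoints(4);
        for (int i = 0; i != points.size(); i++) {
            points.set(i, i, i * 10);
        }
        System.out.println(points.sumOfX());                  // prints 6
        System.out.println(points.new Proxy().at(3).getY());  // prints 30
    }
}
```

<p>Only one <code>Proxy</code> object is created for the whole walk, which is what removes the per-element object overhead.</p>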
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="nf">UnsafeArrayList</span><span class="p">(</span><span class="n">Class</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">capacity</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">this</span><span class="p">.</span><span class="na">firstFieldOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">firstFieldOffset</span><span class="p">(</span><span class="n">type</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">this</span><span class="p">.</span><span class="na">elementSize</span><span class="w">      </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">sizeOf</span><span class="p">(</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">firstFieldOffset</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">this</span><span class="p">.</span><span class="na">unsafe</span><span class="w">           </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">getUnsafe</span><span class="p">();</span><span class="w">
</span></span></span></code></pre></div><p>An area of memory is then allocated, like so:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="w">    </span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">allocateMemory</span><span class="p">(</span><span class="n">elementSize</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">capacity</span><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>This base variable holds the address of the beginning of the memory, and can only be used via the Unsafe class. The memory is large enough to hold <code>capacity</code> objects of <code>elementSize</code> bytes.</p>
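<p>To make the raw-memory model concrete, here is a small standalone sketch (not taken from UnsafeArrayList) that allocates an off-heap block, addresses elements with the same base + index × size arithmetic that UnsafeArrayList uses, and frees the block explicitly. The class name and the fixed 8-byte layout of two <code>int</code>s are assumptions for illustration. It also assumes <code>sun.misc.Unsafe</code> can be obtained reflectively (true on JDK 8 and, via the <code>jdk.unsupported</code> module, on later JDKs, where these methods are increasingly deprecated), and it needs Java 9+ for <code>Objects.checkIndex</code>.</p>

```java
import java.lang.reflect.Field;
import java.util.Objects;
import sun.misc.Unsafe;

/** Hypothetical off-heap array of (x, y) int pairs. */
final class OffHeapPoints implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final long ELEMENT_SIZE = 2 * Integer.BYTES;   // x then y: 8 bytes

    private final long base;       // raw address of element 0; only usable via Unsafe
    private final int capacity;

    OffHeapPoints(int capacity) {
        this.capacity = capacity;
        this.base = UNSAFE.allocateMemory(ELEMENT_SIZE * capacity);
        // allocateMemory does not zero the block, so clear it explicitly.
        UNSAFE.setMemory(base, ELEMENT_SIZE * capacity, (byte) 0);
    }

    /** base + index * elementSize, computed in long arithmetic. */
    private long offset(int index) {
        Objects.checkIndex(index, capacity);   // Unsafe itself performs no bounds checks
        return base + index * ELEMENT_SIZE;
    }

    void set(int index, int x, int y) {
        long address = offset(index);
        UNSAFE.putInt(address, x);
        UNSAFE.putInt(address + Integer.BYTES, y);
    }

    int getX(int index) {
        return UNSAFE.getInt(offset(index));
    }

    int getY(int index) {
        return UNSAFE.getInt(offset(index) + Integer.BYTES);
    }

    /** The garbage collector never reclaims this memory, so free it by hand. */
    @Override
    public void close() {
        UNSAFE.freeMemory(base);
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        try (OffHeapPoints points = new OffHeapPoints(3)) {
            points.set(2, 7, 9);
            System.out.println(points.getX(2) + "," + points.getY(2));   // prints 7,9
            System.out.println(points.getX(0));                           // prints 0
        }
    }
}
```

<p>Implementing <code>AutoCloseable</code> is one way to make the manual <code>freeMemory</code> call hard to forget; a long-lived list would need some equivalent.</p>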
<p>Unlike a Java reference, this base address allows <a href="https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/pointer.html">pointer arithmetic</a>, and thus to access a particular element we have a simple method to calculate the memory offset:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="w">    </span><span class="kd">private</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="nf">offset</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">elementSize</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Then to <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeArrayList.html#set-int-T-">set</a> an element within this List, we copy its fields into the allocated memory:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="w">    </span><span class="nd">@Override</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kd">public</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="nf">set</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="n">element</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">unsafe</span><span class="p">.</span><span class="na">copyMemory</span><span class="p">(</span><span class="n">element</span><span class="p">,</span><span class="w"> </span><span class="n">firstFieldOffset</span><span class="p">,</span><span class="w"> </span><span class="c1">// src, src_offset</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                          </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="p">(</span><span class="n">index</span><span class="p">),</span><span class="w">       </span><span class="c1">// dst, dst_offset</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                          </span><span class="n">elementSize</span><span class="p">);</span><span class="w">              </span><span class="c1">// size</span><span class="w">
</span></span></span></code></pre></div><p>This copies from object <code>element</code>, starting at offset <code>firstFieldOffset</code>, into the raw memory address determined by <code>offset(index)</code>.</p>
<p>The <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeArrayList.html#get-int-">get</a> method is a little more problematic, as the List interface expects get to return an instance of the object. Since we aren’t actually storing references to the objects (only copies of their fields), we need to construct an instance and populate it. This is quite costly, and defeats the point of the UnsafeArrayList. Instead, an additional <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeArrayList.html#get-T-int-">get</a> method is provided that accepts an object, whose fields will be overwritten.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="w">    </span><span class="kd">public</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="nf">get</span><span class="p">(</span><span class="n">T</span><span class="w"> </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">unsafe</span><span class="p">.</span><span class="na">copyMemory</span><span class="p">(</span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="p">(</span><span class="n">index</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                          </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">firstFieldOffset</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                          </span><span class="n">elementSize</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">dest</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>For completeness, a standard <code>get(int index)</code> method is also provided, which creates a new instance of the object (using <code>unsafe.allocateInstance()</code> instead of <code>new Type()</code>).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="w">    </span><span class="kd">public</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="nf">get</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">get</span><span class="p">((</span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">allocateInstance</span><span class="p">(</span><span class="n">type</span><span class="p">),</span><span class="w"> </span><span class="n">index</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>You can inspect the rest of the <a href="https://github.com/bramp/unsafe/blob/master/unsafe-collection/src/main/java/net/bramp/unsafe/UnsafeArrayList.java">code via GitHub</a>, but these are the main parts.</p>
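<p>Since the <code>set()</code> and <code>get()</code> snippets above are excerpts, here is a compact, self-contained sketch of the same copy-in/copy-out idea, specialised to a single hypothetical <code>TwoLongs</code> element type to keep it short. It differs from the excerpts in one deliberate way: it copies 8 bytes at a time with <code>getLong</code>/<code>putLong</code> rather than <code>copyMemory</code>, because on recent JDKs (9 and later, as far as I know) <code>copyMemory</code> rejects an ordinary, non-array heap object as source or destination. It also assumes every field is a primitive <code>long</code>, so the field block is a whole number of 8-byte words.</p>

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;
import sun.misc.Unsafe;

/** Hypothetical, non-generic cut-down UnsafeArrayList for one element type. */
final class TwoLongsList implements AutoCloseable {
    /** Example element type: two longs, 16 bytes of field data. */
    static final class TwoLongs {
        long a;
        long b;
    }

    private static final Unsafe UNSAFE = loadUnsafe();
    private static final long FIRST_FIELD_OFFSET;   // where the fields start inside an instance
    private static final long ELEMENT_SIZE;         // bytes of field data per element

    static {
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (Field f : TwoLongs.class.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            long off = UNSAFE.objectFieldOffset(f);
            first = Math.min(first, off);
            last = Math.max(last, off);
        }
        FIRST_FIELD_OFFSET = first;
        ELEMENT_SIZE = last + Long.BYTES - first;   // valid only because every field is a long
    }

    private final long base;
    private final int capacity;

    TwoLongsList(int capacity) {
        this.capacity = capacity;
        this.base = UNSAFE.allocateMemory(ELEMENT_SIZE * capacity);
    }

    private long offset(int index) {
        Objects.checkIndex(index, capacity);
        return base + index * ELEMENT_SIZE;
    }

    /** Copy the element's fields into slot index, one 8-byte word at a time. */
    void set(int index, TwoLongs element) {
        long address = offset(index);
        for (long i = 0; i != ELEMENT_SIZE; i += Long.BYTES) {
            UNSAFE.putLong(address + i, UNSAFE.getLong(element, FIRST_FIELD_OFFSET + i));
        }
    }

    /** In-place get: overwrite dest's fields without allocating. */
    TwoLongs get(TwoLongs dest, int index) {
        long address = offset(index);
        for (long i = 0; i != ELEMENT_SIZE; i += Long.BYTES) {
            UNSAFE.putLong(dest, FIRST_FIELD_OFFSET + i, UNSAFE.getLong(address + i));
        }
        return dest;
    }

    /** Allocating get: allocateInstance skips constructors, then fields are filled in. */
    TwoLongs get(int index) {
        try {
            return get((TwoLongs) UNSAFE.allocateInstance(TwoLongs.class), index);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void close() {
        UNSAFE.freeMemory(base);   // off-heap memory is not garbage collected
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        try (TwoLongsList list = new TwoLongsList(2)) {
            TwoLongs in = new TwoLongs();
            in.a = 1;
            in.b = 2;
            list.set(1, in);
            TwoLongs out = list.get(1);
            System.out.println(out.a + " " + out.b);   // prints 1 2
        }
    }
}
```

<p>Reference fields are deliberately left out of this sketch: copying them off-heap hides them from the garbage collector, as noted in the cons below.</p>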
<p>In conclusion, this approach has some pros and cons, but was mostly created for fun.</p>
<ul>
<li>
<p>Pros</p>
<ul>
<li>
<p>A List&lt;&gt; interface that stores objects in contiguous memory</p>
</li>
<li>
<p>Better cache locality and CPU performance</p>
</li>
<li>
<p>Minimal memory overhead</p>
</li>
</ul>
</li>
<li>
<p>Cons</p>
<ul>
<li>
<p>Uses sun.misc.Unsafe</p>
</li>
<li>
<p>Additional CPU cycles are needed to copy objects into and out of the array</p>
</li>
<li>
<p>Copies the object’s fields out of the garbage collector’s view, so if a stored object holds the only references to other objects, the garbage collector will not know they are still in use (and, since the collector may also move objects, any reference copied this way can become stale).</p>
</li>
</ul>
</li>
</ul>
<p>In the <a href="https://blog.bramp.net/post/2015/08/27/unsafe-part-3-benchmarking-a-java-unsafearraylist/">next article</a>, we&rsquo;ll benchmark this UnsafeArrayList, and investigate the performance impact of the cache locality, and other overheads.</p>
</description>
    </item>
    
    <item>
      <title>Unsafe Part 1: sun.misc.Unsafe Helper Classes</title>
      <link>https://blog.bramp.net/post/2015/08/24/unsafe-part-1-sun.misc.unsafe-helper-classes/</link>
      <pubDate>Mon, 24 Aug 2015 20:13:58 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2015/08/24/unsafe-part-1-sun.misc.unsafe-helper-classes/</guid>
<description><p>I recently came across the <a href="http://www.docjar.com/docs/api/sun/misc/Unsafe.html">sun.misc.Unsafe class</a>, a poorly documented internal API that gives your Java program direct access to the JVM’s memory. Of course, accessing the JVM’s memory directly is unsafe, but it allows for some exciting opportunities.</p>
<p>You can use Unsafe to inspect and manipulate the layout of your objects in RAM, allocate memory off the heap, do interesting things with threads, or even <a href="http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/">hack in multiple inheritance</a>. Multiple people have <a href="https://dzone.com/articles/understanding-sunmiscunsafe">written about Unsafe</a> before, and there are some really <a href="http://mydailyjava.blogspot.com/2013/12/sunmiscunsafe.html">good articles</a>, so we won’t cover it here.</p>
<p>Using Unsafe is not too difficult, but I found I needed a few helper methods, so I created a collection of classes wrapping the Unsafe code, starting with <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnsafeHelper.html">UnsafeHelper</a>. The main methods of interest are <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#getUnsafe--">getUnsafe()</a>, <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#sizeOf-java.lang.Object-">sizeOf()</a>, <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#firstFieldOffset-java.lang.Class-">firstFieldOffset()</a>, <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#toByteArray-java.lang.Object-">toByteArray()</a> and <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#hexDump-java.io.PrintStream-java.lang.Object-">hexDump()</a>. The <a href="https://bramp.github.io/unsafe/">javadoc</a> is the best place to look for documentation; however, I’ll quickly explain their use.</p>
<p>To get a sun.misc.Unsafe instance, you have to extract it from a private static field within the sun.misc.Unsafe class. For ease, the <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#getUnsafe--">UnsafeHelper.getUnsafe()</a> method does that.</p>
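<p>A minimal sketch of what such a helper can look like follows. It is not the UnsafeHelper source, just the commonly used reflection approach; it relies on the field being named <code>theUnsafe</code>, which is an implementation detail of OpenJDK/HotSpot rather than a documented contract. (Calling <code>Unsafe.getUnsafe()</code> directly from application code is rejected with a <code>SecurityException</code>, at least on JDK 8.)</p>

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical stand-in for UnsafeHelper.getUnsafe(). */
final class UnsafeAccess {
    private UnsafeAccess() {}

    static Unsafe getUnsafe() {
        try {
            // The singleton lives in a private static field, named "theUnsafe" in OpenJDK.
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);         // bypass the private modifier
            return (Unsafe) field.get(null);   // static field, so no receiver object
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        Unsafe unsafe = getUnsafe();
        System.out.println("address size: " + unsafe.addressSize() + " bytes");
        System.out.println("page size: " + unsafe.pageSize() + " bytes");
    }
}
```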
<p>When accessing an object, you typically need to know the size of the object (in bytes), and be able to find the offset to individual fields. If you <a href="http://www.codeinstructions.com/2008/12/java-objects-memory-structure.html">understand the memory layout</a> the JVM uses, you’ll know there is a header in front of the Object’s fields. Typically it looks like this, but varies based on CPU architecture, platform, etc:</p>
<table class="table table-bordered" style="margin-bottom: 0px">
  <tr>
    <th class="text-center">0</th>
    <th class="text-center">1</th>
    <th class="text-center">2</th>
    <th class="text-center">3</th>
    <th class="text-center">4</th>
    <th class="text-center">5</th>
    <th class="text-center">6</th>
    <th class="text-center">7</th>
    <th class="text-center">8</th>
    <th class="text-center">9</th>
    <th class="text-center">10</th>
    <th class="text-center">11</th>
    <th class="text-center">12</th>
    <th class="text-center">13</th>
    <th class="text-center">14</th>
    <th class="text-center">15</th>
  </tr>
  <tr>
    <td class="text-center" colspan="8">mark word(8)</td>
    <td class="text-center" colspan="4">klass pointer(4)</td>
    <td class="text-center" colspan="4">padding</td>
  </tr>
</table>
<div class="text-right">More information [here][6] and [here][7].</div>
<p>To hide some of the details, <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#headerSize-java.lang.Object-">headerSize()</a> returns the size of the header, and <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#sizeOf-java.lang.Object-">sizeOf()</a> return the total size an object including the header in bytes. <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#firstFieldOffset-java.lang.Class-">firstFieldOffset()</a> is then useful as it provides the the offset to the first field. Note that <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#headerSize-java.lang.Object-">headerSize()</a> and <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#firstFieldOffset-java.lang.Class-">firstFieldOffset()</a> do not always return identical results, as padding (not part of the header) may be used to correctly align the first field.</p>
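<p>One way to approximate <code>firstFieldOffset()</code> is to ask Unsafe for the offset of every instance field and take the smallest. The sketch below does that; it is an assumption about the approach rather than UnsafeHelper’s actual implementation, the class names are hypothetical, and for brevity it only looks at fields declared by the object’s own class (not its superclasses). On a typical 64-bit HotSpot JVM with compressed class pointers one would expect an <code>int</code> field to start at offset 12 and a <code>long</code> field at 16 (because of 8-byte alignment), but the exact values depend on the JVM and its flags.</p>

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import sun.misc.Unsafe;

/** Hypothetical helper: smallest instance-field offset of an object's own class. */
final class FieldOffsets {
    private static final Unsafe UNSAFE = loadUnsafe();

    static class Class4 { int i = 0x01234567; }
    static class Class8 { long l = 0x0123456789ABCDEFL; }

    /** Returns Long.MAX_VALUE if the class declares no instance fields. */
    static long firstFieldOffset(Object instance) {
        long first = Long.MAX_VALUE;
        // Only this class's own fields; superclass fields are not walked here.
        for (Field f : instance.getClass().getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                first = Math.min(first, UNSAFE.objectFieldOffset(f));
            }
        }
        return first;
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Class4: first field at offset " + firstFieldOffset(new Class4()));
        System.out.println("Class8: first field at offset " + firstFieldOffset(new Class8()));
    }
}
```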
<p>Next, <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#toByteArray-java.lang.Object-">toByteArray()</a> takes an object and copies it (including its header) into a byte array, which is useful for inspecting or serialising the object. Finally, <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#hexDump-java.io.PrintStream-java.lang.Object-">hexDump()</a> uses <a href="https://bramp.github.io/unsafe/net/bramp/unsafe/UnsafeHelper.html#toByteArray-java.lang.Object-">toByteArray()</a> to grab an object’s bytes and prints a hex representation of its memory, for example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="cm">/**
</span></span></span><span class="line"><span class="cl"><span class="cm"> * hexDump(new Class4()) prints:
</span></span></span><span class="line"><span class="cl"><span class="cm"> * 0x00000000: 01 00 00 00 00 00 00 00  8A BF 62 DF 67 45 23 01
</span></span></span><span class="line"><span class="cl"><span class="cm"> */</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">static</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Class4</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0x01234567</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="cm">/**
</span></span></span><span class="line"><span class="cl"><span class="cm"> * Longs are 8-byte aligned, so 4 bytes of padding follow the 12-byte header
</span></span></span><span class="line"><span class="cl"><span class="cm"> * hexDump(new Class8()) prints:
</span></span></span><span class="line"><span class="cl"><span class="cm"> * 0x00000000: 01 00 00 00 00 00 00 00  9B 81 61 DF 00 00 00 00
</span></span></span><span class="line"><span class="cl"><span class="cm"> * 0x00000010: EF CD AB 89 67 45 23 01
</span></span></span><span class="line"><span class="cl"><span class="cm"> */</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">static</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Class8</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kt">long</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0x0123456789ABCDEFL</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>In the first example, Class4, a simple class with a single int field, takes up 16 bytes of memory: the first 8 bytes are the mark word (used by the JVM for locking, identity hash codes and garbage collection), the next 4 bytes are the class pointer (essentially how the object knows which class it is an instance of), and the last 4 bytes hold the value of the field, stored little-endian, hence <code>67 45 23 01</code>. The second example, Class8, has the same 12-byte header, but bytes 12-15 are padding, so that the long field starts at offset 16 and is 8-byte aligned, bringing the object to 24 bytes.</p>
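<p>The <code>toByteArray()</code> and <code>hexDump()</code> behaviour described above can be approximated with a few lines of <code>sun.misc.Unsafe</code>. This is a hypothetical sketch, not the UnsafeHelper source: <code>HexDumpSketch</code> is an illustrative name, and whereas the linked <code>toByteArray()</code> takes only the object, this version needs the caller to pass the object size (for example from a <code>sizeOf()</code>-style calculation). The output format mirrors the dumps shown in this post.</p>

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical toByteArray()/hexDump()-style helpers (not the UnsafeHelper implementation). */
public class HexDumpSketch {
    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /**
     * Copies the first size bytes of obj, starting at its header (offset 0).
     * size must not exceed the real object size, or adjacent heap memory is read.
     */
    static byte[] toByteArray(Object obj, int size) {
        byte[] bytes = new byte[size];
        for (int i = 0; i < size; i++) {
            // Offsets are relative to obj, so this stays valid even if the GC moves it.
            bytes[i] = UNSAFE.getByte(obj, (long) i);
        }
        return bytes;
    }

    /** Formats bytes 16 per line, with an extra space after the 8th byte of each line. */
    static String hexDump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < bytes.length; row += 16) {
            sb.append(String.format("0x%08X:", row));
            for (int i = row; i < Math.min(row + 16, bytes.length); i++) {
                sb.append(i - row == 8 ? "  " : " ");
                sb.append(String.format("%02X", bytes[i] & 0xFF));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    static class Class4 { int i = 0x01234567; }

    public static void main(String[] args) {
        // 16 bytes assumes a 12-byte header plus the 4-byte int field.
        System.out.print(hexDump(toByteArray(new Class4(), 16)));
    }
}
```

<p>On a JVM laid out like the table above, <code>main</code> should print a single line shaped like the Class4 dump; the mark word and class pointer bytes depend on the JVM and are likely to differ from the values shown here.</p>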
<p>These helper methods are available in a <a href="https://github.com/bramp/unsafe">new project on GitHub</a>, and can be downloaded via Maven. Just <a href="https://oss.sonatype.org/service/local/repositories/releases/content/net/bramp/unsafe/unsafe-helper/1.0/unsafe-helper-1.0.jar">download the jar file</a>, or include the Maven dependency below, and <code>import net.bramp.unsafe.UnsafeHelper</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-xml" data-lang="xml"><span class="line"><span class="cl"><span class="nt">&lt;dependency&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;groupId&gt;</span>net.bramp.unsafe<span class="nt">&lt;/groupId&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;artifactId&gt;</span>unsafe-helper<span class="nt">&lt;/artifactId&gt;</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&lt;version&gt;</span>1.0<span class="nt">&lt;/version&gt;</span>
</span></span><span class="line"><span class="cl"><span class="nt">&lt;/dependency&gt;</span>
</span></span></code></pre></div><p>In the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/">next article</a>, we&rsquo;ll use this new UnsafeHelper to build a special List that copies objects instead of storing references.</p>
</description>
    </item>
    
  </channel>
</rss>
