<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Byte-Buddy on bramp.net</title>
    <link>https://blog.bramp.net/</link>
    <description>Recent content in Byte-Buddy on bramp.net</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-GB</language>
    <lastBuildDate>Wed, 09 Sep 2015 20:29:04 -0700</lastBuildDate>
    <atom:link href="https://blog.bramp.net/tags/byte-buddy/" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Unrolling loops at runtime with Byte Buddy</title>
      <link>https://blog.bramp.net/post/2015/09/09/unrolling-loops-at-runtime-with-byte-buddy/</link>
      <pubDate>Wed, 09 Sep 2015 20:29:04 -0700</pubDate>
      
      <guid>https://blog.bramp.net/post/2015/09/09/unrolling-loops-at-runtime-with-byte-buddy/</guid>
      <description><p>While creating the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a>, I encountered a problem that I felt I could optimise. The <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> copies objects into off-heap memory, instead of what a normal <code>ArrayList</code> would do, which is to store references to the objects in an array on the heap. For example, an <code>UnsafeArrayList&lt;FourLongs&gt;</code> holds instances of <a href="https://github.com/bramp/unsafe/blob/master/unsafe-tests/src/main/java/net/bramp/unsafe/examples/FourLongs.java">FourLongs</a>, whose fields consume a total of 32 bytes (4×8 bytes) of memory. By design, when <code>set()</code> or <code>get()</code> is called, the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> copies these 32 bytes into or out of a contiguous segment of memory.</p>
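<p>To make the idea of copying 32 bytes in and out of contiguous memory concrete, here is a minimal sketch. It is not the real <code>UnsafeArrayList</code>: it assumes a direct <code>java.nio.ByteBuffer</code> as a stand-in for <code>sun.misc.Unsafe</code>, and the class name <code>OffHeapFourLongs</code> is hypothetical.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for UnsafeArrayList: each element is four longs
// (32 bytes) stored back to back in one direct (off-heap) buffer.
final class OffHeapFourLongs {
    static final int ELEMENT_BYTES = 4 * Long.BYTES; // 32 bytes per element

    private final ByteBuffer memory;

    OffHeapFourLongs(int capacity) {
        memory = ByteBuffer.allocateDirect(capacity * ELEMENT_BYTES);
    }

    // Copies the four fields into the buffer, 8 bytes at a time.
    void set(int index, long a, long b, long c, long d) {
        int base = index * ELEMENT_BYTES;
        memory.putLong(base, a);
        memory.putLong(base + 8, b);
        memory.putLong(base + 16, c);
        memory.putLong(base + 24, d);
    }

    // Copies the 32 bytes back out into a fresh array.
    long[] get(int index) {
        int base = index * ELEMENT_BYTES;
        return new long[] {
            memory.getLong(base),
            memory.getLong(base + 8),
            memory.getLong(base + 16),
            memory.getLong(base + 24)
        };
    }
}
```

<p>Every <code>get</code> and <code>set</code> moves a fixed number of bytes, which is what makes the copy a candidate for specialisation.</p>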
<p>To achieve the copying, <code>sun.misc.Unsafe</code>’s <a href="http://www.docjar.com/docs/api/sun/misc/Unsafe.html#putLong(long,+long)">putLong()</a> is repeatedly called, moving 8 bytes at a time. For example, this simple loop will copy a long’s worth of memory each iteration, from src, into dest:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">8</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">getUnsafe</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">copy</span><span class="p">(</span><span class="n">Object</span><span class="w"> </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">src</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kt">long</span><span class="w"> </span><span class="n">destOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kt">long</span><span class="w"> </span><span class="n">destEnd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">sizeOf</span><span class="p">(</span><span class="n">dest</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">destOffset</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">destEnd</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>We use <code>putLong</code> not because the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> is storing objects made up of longs, but because this is the <code>Unsafe</code> method that can copy the most in one go. The <code>putLong</code> method is thus the building block for a more complex looping copy method. This only works when the memory is aligned on an 8-byte boundary and the total copy length is a multiple of 8 bytes. For the sake of this article, we assume that this is always true.</p>
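<p>The alignment assumption above could be checked explicitly before choosing the long-at-a-time copy. The following is only a sketch: the helper class <code>CopyPreconditions</code> and its method are hypothetical and not part of the library.</p>

```java
// Hypothetical helper, not part of the original library.
final class CopyPreconditions {
    static final long COPY_STRIDE = 8;

    // True when both the start address and the length are multiples of 8,
    // which is the assumption the long-at-a-time copy relies on.
    static boolean isLongCopyable(long address, long length) {
        if (address % COPY_STRIDE != 0) {
            return false;
        }
        return length % COPY_STRIDE == 0;
    }
}
```

<p>For example, a 32-byte object at address 64 passes the check, while a 12-byte object does not.</p>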
<p>In the <code>FourLongs</code> case, the copy method iterates four times. This is predictable, and occurs every time we call <code>get()</code> on an <code>UnsafeArrayList&lt;FourLongs&gt;</code> instance. Since this copy loop will be executed every time <code>get()</code> is called, it is worth seeing if we can make it execute faster. A common optimisation is for the developer to manually <a href="https://en.wikipedia.org/wiki/Loop_unrolling">unroll the loop</a>, avoiding the loop counter, and producing potentially quicker code<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. In this case, manually unrolling the code is not possible because the parameterised type could be any size. For example, an <code>UnsafeArrayList&lt;Point&gt;</code> would only need to copy 8 bytes (two 4-byte ints). You would hope that the JIT would notice the loop always iterates the same number of times (for a particular list), and be able to remove the loop. Sadly, it does not seem to do this, perhaps because the JVM does not know what side effects <code>unsafe.{get,put}Long</code> have. To measure the cost of the looping we compare the previous code to this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">8</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">getUnsafe</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">copy</span><span class="p">(</span><span class="n">Object</span><span class="w"> </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">src</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">assert</span><span class="p">(</span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">sizeOf</span><span class="p">(</span><span class="n">dest</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">4</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kt">long</span><span class="w"> </span><span class="n">destOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">destOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">src</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">unsafe</span><span class="p">.</span><span class="na">putLong</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="n">destOffset</span><span class="p">,</span><span class="w"> </span><span class="n">unsafe</span><span class="p">.</span><span class="na">getLong</span><span class="p">(</span><span class="n">src</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>When benchmarked, this manually unrolled code runs twice as fast! This got me thinking: since a particular <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> instance always copies objects of the same size, again and again, it could perhaps generate bytecode at creation time that unrolls the loop.</p>
<h2 id="enter-byte-buddy">Enter Byte Buddy</h2>
<p>So I began investigating <a href="http://bytebuddy.net/">Byte Buddy</a>, a library designed for generating bytecode at runtime. The rest of this article explains how to use Byte Buddy for this goal.</p>
<p>To start, I used IntelliJ IDEA’s “<a href="https://plugins.jetbrains.com/plugin/5918">Show Bytecode</a>” option to inspect the bytecode generated from my hand-unrolled code.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="c1">; Initialisation</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; long destOffset = 0;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LCONST_0</span>  <span class="c1">; Load the long zero</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LSTORE</span> <span class="mi">4</span>  <span class="c1">; Store it in “destOffset”</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">; Copy</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; unsafe.putLong(dest, destOffset, unsafe.getLong(src));</span>
</span></span><span class="line"><span class="cl">  <span class="nf">ALOAD</span> <span class="mi">0</span>  <span class="c1">; Load “this”</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; Get the “unsafe” member from this.</span>
</span></span><span class="line"><span class="cl">  <span class="nf">GETFIELD</span> <span class="nv">net</span><span class="o">/</span><span class="nv">bramp</span><span class="o">/</span><span class="nv">unsafe</span><span class="o">/</span><span class="nv">Test.unsafe</span> <span class="p">:</span> <span class="nv">Lsun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe</span><span class="c1">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="nf">ALOAD</span> <span class="mi">1</span>  <span class="c1">; Load dest</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">4</span>  <span class="c1">; Load destOffset</span>
</span></span><span class="line"><span class="cl">  <span class="nf">ALOAD</span> <span class="mi">0</span>  <span class="c1">; Load “this”</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; Get the “unsafe” member from this.</span>
</span></span><span class="line"><span class="cl">  <span class="nf">GETFIELD</span> <span class="nv">net</span><span class="o">/</span><span class="nv">bramp</span><span class="o">/</span><span class="nv">unsafe</span><span class="o">/</span><span class="nv">Test.unsafe</span> <span class="p">:</span> <span class="nv">Lsun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe</span><span class="c1">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">2</span>  <span class="c1">; Load src</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; unsafe.getLong(src), storing result on stack.</span>
</span></span><span class="line"><span class="cl">  <span class="nf">INVOKEVIRTUAL</span> <span class="nv">sun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe.getLong</span> <span class="p">(</span><span class="nv">J</span><span class="p">)</span><span class="nv">J</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; unsafe.putLong(dest, destOffset, {stack result})</span>
</span></span><span class="line"><span class="cl">  <span class="nf">INVOKEVIRTUAL</span> <span class="nv">sun</span><span class="o">/</span><span class="nv">misc</span><span class="o">/</span><span class="nv">Unsafe.putLong</span> <span class="p">(</span><span class="nv">Ljava</span><span class="o">/</span><span class="nv">lang</span><span class="o">/</span><span class="nv">Object</span><span class="c1">;JJ)V</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">;; Increment</span>
</span></span><span class="line"><span class="cl">  <span class="c1">; destOffset += 8;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">4</span>   <span class="c1">; Load destOffset</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LDC</span> <span class="mi">8</span>     <span class="c1">; Load 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LADD</span>      <span class="c1">; Add destOffset and 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LSTORE</span> <span class="mi">4</span>  <span class="c1">; Store result to destOffset</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">  <span class="c1">; src += 8;</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LLOAD</span> <span class="mi">2</span>   <span class="c1">; Load src</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LDC</span> <span class="mi">8</span>     <span class="c1">; Load 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LADD</span>      <span class="c1">; Add src and 8</span>
</span></span><span class="line"><span class="cl">  <span class="nf">LSTORE</span> <span class="mi">2</span>  <span class="c1">; Store result to src</span>
</span></span></code></pre></div><p>After reading a <a href="http://download.forge.objectweb.org/asm/asm4-guide.pdf">primer on bytecode</a>, this generated bytecode looked quite simple. It can be broken up into three steps: initialisation, copy, and increment. At runtime, Byte Buddy can be used to generate bytecode that is an unrolled equivalent, such that there is 1 initialisation step, N copy steps, and N-1 increment steps, where N is based on the size of the object the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> plans to copy.</p>
<p>Reading through the Byte Buddy API, it seems the best way to achieve this is to create an abstract class, which will form the base of a generated class. Then, at runtime, generate a concrete subclass of this abstract class, specialised with the unrolled copy bytecode.</p>
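<p>The step counts described above (1 initialisation, N copies, N-1 increments) can be sketched without any bytecode at all. The class <code>UnrollPlan</code> below is a hypothetical illustration that only lists the steps by name; it is not how the library builds them.</p>

```java
// Hypothetical illustration: lists the steps an unrolled copy would contain.
final class UnrollPlan {
    static final long COPY_STRIDE = 8;

    // Returns one "init", then N "copy" steps separated by N-1 "increment" steps,
    // where N is the object length divided by the 8-byte stride.
    static String[] steps(long length) {
        int copies = (int) (length / COPY_STRIDE);
        if (copies == 0) {
            return new String[] {"init"};
        }
        String[] plan = new String[2 * copies]; // 1 + N + (N - 1) entries
        int next = 0;
        plan[next++] = "init";
        for (int i = 0; i != copies; i++) {
            if (i != 0) {
                plan[next++] = "increment";
            }
            plan[next++] = "copy";
        }
        return plan;
    }
}
```

<p>For a 32-byte object this yields eight steps: init, then four copies with three increments between them.</p>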
<p>For example, the base class would look like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kd">abstract</span><span class="w"> </span><span class="kd">class</span> <span class="nc">UnsafeCopier</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">protected</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">public</span><span class="w"> </span><span class="nf">UnsafeCopier</span><span class="p">(</span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">this</span><span class="p">.</span><span class="na">unsafe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">checkNotNull</span><span class="p">(</span><span class="n">unsafe</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">abstract</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">copy</span><span class="p">(</span><span class="n">Object</span><span class="w"> </span><span class="n">dest</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">src</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This leaves us to implement the <code>copy(…)</code> method optimally for the size of the object being copied.</p>
<p>Using the <a href="https://en.wikipedia.org/wiki/Builder_pattern">Builder pattern</a> I created the <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnrolledUnsafeCopierBuilder.html"><code>UnrolledUnsafeCopierBuilder</code></a> class. The <code>build()</code> method calculates the size of the class being copied, uses Byte Buddy to generate the copy implementation, and returns a specialised instance of <code>UnsafeCopier</code>.</p>
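<p>As a rough picture of what a specialised subclass amounts to, here is a hand-written analogue. It is a sketch under a simplifying assumption: it copies between <code>long[]</code> arrays instead of using <code>sun.misc.Unsafe</code>, and the names <code>ArrayCopier</code> and <code>FourLongArrayCopier</code> are hypothetical.</p>

```java
// Simplified analogue of UnsafeCopier that copies between long arrays,
// so it runs without sun.misc.Unsafe. All names here are hypothetical.
abstract class ArrayCopier {
    abstract void copy(long[] dest, long[] src, int srcIndex);
}

// Roughly what a generated subclass for a 32-byte object amounts to:
// four straight-line copies and no loop.
final class FourLongArrayCopier extends ArrayCopier {
    @Override
    void copy(long[] dest, long[] src, int srcIndex) {
        dest[0] = src[srcIndex];
        dest[1] = src[srcIndex + 1];
        dest[2] = src[srcIndex + 2];
        dest[3] = src[srcIndex + 3];
    }
}
```

<p>Writing one such class per object size by hand does not scale, which is why the subclass is generated at runtime instead.</p>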
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="n">UnsafeCopier</span><span class="w"> </span><span class="nf">build</span><span class="p">(</span><span class="n">Unsafe</span><span class="w"> </span><span class="n">unsafe</span><span class="p">)</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">ReflectiveOperationException</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeHelper</span><span class="p">.</span><span class="na">sizeOf</span><span class="p">(</span><span class="n">clazz</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">Class</span><span class="o">&lt;?&gt;</span><span class="w"> </span><span class="n">dynamicType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ByteBuddy</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">subclass</span><span class="p">(</span><span class="n">UnsafeCopier</span><span class="p">.</span><span class="na">class</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">method</span><span class="p">(</span><span class="n">named</span><span class="p">(</span><span class="s">&#34;copy&#34;</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">intercept</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">CopierImplementation</span><span class="p">(</span><span class="n">length</span><span class="p">)).</span><span class="na">make</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">load</span><span class="p">(</span><span class="n">getClass</span><span class="p">().</span><span class="na">getClassLoader</span><span class="p">(),</span><span class="w"> </span><span class="n">ClassLoadingStrategy</span><span class="p">.</span><span class="na">Default</span><span class="p">.</span><span class="na">WRAPPER</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">getLoaded</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="n">UnsafeCopier</span><span class="p">)</span><span class="w"> </span><span class="n">dynamicType</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">getDeclaredConstructor</span><span class="p">(</span><span class="n">Unsafe</span><span class="p">.</span><span class="na">class</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">.</span><span class="na">newInstance</span><span class="p">(</span><span class="n">unsafe</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This begins by calculating the size of the class. Then, using a <a href="http://bytebuddy.net/javadoc/0.7-rc1/index.html?net/bytebuddy/ByteBuddy.html">ByteBuddy</a> instance, it creates a new <code>dynamicType</code>, which extends <code>UnsafeCopier</code>. The subclass’s <code>copy</code> method is implemented with the bytecode generated by <code>CopierImplementation(length)</code>. Finally, this new <code>dynamicType</code> is used to create an instance of the copier, which is now specialised for copying instances of <code>clazz</code>.</p>
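<p>The final step, creating an instance through the subclass’s one-argument constructor, is plain Java reflection. The sketch below isolates that step with hypothetical stand-ins: <code>Base</code> plays the role of <code>UnsafeCopier</code>, <code>Concrete</code> plays the role of the generated class, and a <code>String</code> replaces the <code>Unsafe</code> argument.</p>

```java
// Hypothetical stand-ins: Base plays the role of UnsafeCopier and Concrete
// plays the role of the class Byte Buddy would generate.
abstract class Base {
    protected final String dependency;

    Base(String dependency) {
        this.dependency = dependency;
    }

    abstract String describe();
}

final class Concrete extends Base {
    Concrete(String dependency) {
        super(dependency);
    }

    @Override
    String describe() {
        return "copier using " + dependency;
    }
}

final class ReflectiveFactory {
    // Looks up the one-argument constructor and invokes it reflectively.
    static Base create(String dependency) {
        try {
            return Concrete.class
                .getDeclaredConstructor(String.class)
                .newInstance(dependency);
        } catch (ReflectiveOperationException e) {
            // Wrapped so callers need not handle checked reflection exceptions.
            throw new IllegalStateException(e);
        }
    }
}
```

<p>In the real builder the class is only known at runtime, so reflection is the only way to call its constructor.</p>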
<p>The real meat of the code is in <code>CopierImplementation</code>, which can be explained in pieces:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">class</span> <span class="nc">CopierImplementation</span><span class="w"> </span><span class="kd">implements</span><span class="w"> </span><span class="n">ByteCodeAppender</span><span class="p">,</span><span class="w"> </span><span class="n">Implementation</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">8</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">final</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">length</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">public</span><span class="w"> </span><span class="nf">CopierImplementation</span><span class="p">(</span><span class="kt">long</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">this</span><span class="p">.</span><span class="na">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="kd">private</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="nf">buildStack</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">setupStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">copyStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">incrementStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">iterations</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">length</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">COPY_STRIDE</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="o">[]</span><span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="o">[</span><span class="n">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">iterations</span><span class="o">]</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">stack</span><span class="o">[</span><span class="n">0</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">setupStack</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">iterations</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="n">stack</span><span class="o">[</span><span class="n">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">1</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copyStack</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">			</span><span class="n">stack</span><span class="o">[</span><span class="n">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">2</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">incrementStack</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="c1">// Override the last incrementStack with a &#34;return&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="n">stack</span><span class="o">[</span><span class="n">stack</span><span class="p">.</span><span class="na">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">1</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MethodReturn</span><span class="p">.</span><span class="na">VOID</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">		</span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Compound</span><span class="p">(</span><span class="n">stack</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Byte Buddy uses <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> objects to define what bytecode to generate. These <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> objects can be built up hierarchically and contain all the bytecode instructions to execute. We define a separate <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> object for each step, and in the <code>buildStack()</code> method combine the steps multiple times into one array. In particular, this stack array contains one initialise step, N copy steps, and N-1 increment steps, with a <code>return</code> instruction on the end.</p>
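<p>As a rough illustration of that layout, the following standalone sketch uses plain strings in place of <code>StackManipulation</code> objects and applies the same indexing as <code>buildStack()</code>. It is not the article's Byte Buddy code: the class name <code>UnrollLayoutSketch</code> and the step labels are hypothetical, and it assumes the length is a positive multiple of 8.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">import java.util.Arrays;

public class UnrollLayoutSketch {
    static final long COPY_STRIDE = 8;

    // Mirrors the indexing in buildStack(), with labels standing in for bytecode steps.
    static String[] layout(long length) {
        // Assumption: only whole strides are supported, as in the article's example.
        if (length &lt;= 0 || length % COPY_STRIDE != 0) {
            throw new IllegalArgumentException(&#34;length must be a positive multiple of 8&#34;);
        }
        int iterations = (int) (length / COPY_STRIDE);
        String[] stack = new String[1 + 2 * iterations];

        stack[0] = &#34;setup&#34;;
        for (int i = 0; i &lt; iterations; i++) {
            stack[i * 2 + 1] = &#34;copy&#34;;
            stack[i * 2 + 2] = &#34;increment&#34;;
        }
        // The last increment is not needed, so it is replaced by the return.
        stack[stack.length - 1] = &#34;return&#34;;
        return stack;
    }

    public static void main(String[] args) {
        // 32 bytes is the size of the FourLongs example: 4 copies, 3 increments.
        // Prints [setup, copy, increment, copy, increment, copy, increment, copy, return]
        System.out.println(Arrays.toString(layout(32)));
    }
}
</code></pre></div>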
<p>Recall from the earlier bytecode listing that the initialisation took two bytecode operations, an LCONST and an LSTORE. In Byte Buddy, we can thus do the following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">setupStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Compound</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">LongConstant</span><span class="p">.</span><span class="na">ZERO</span><span class="p">,</span><span class="w">                       </span><span class="c1">// LCONST_0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableStore</span><span class="p">.</span><span class="na">LONG</span><span class="p">.</span><span class="na">storeOffset</span><span class="o">[</span><span class="n">4</span><span class="o">]</span><span class="w">  </span><span class="c1">// LSTORE 4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>Byte Buddy provides primitives for most bytecode instructions, and these can be combined in <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> arrays. However, some instructions are missing, for example LADD (needed by the increment step). It is simple enough to create one from scratch, as <a href="https://github.com/bramp/unsafe/tree/master/unsafe-unroller/src/main/java/net/bramp/unsafe/bytebuddy">shown outside of this article</a>.</p>
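<p>To make the role of LADD concrete, here is a minimal standalone sketch that simulates the operand stack and local variable slots for an increment such as <code>destOffset += 8</code>. It does not use Byte Buddy: the class <code>LaddSketch</code> and its helper methods are hypothetical, and the four-instruction sequence (LLOAD, LDC2_W, LADD, LSTORE) is an assumption about what such an increment compiles to.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">import java.util.ArrayDeque;
import java.util.Deque;

public class LaddSketch {
    // Local variable slots; for simplicity a long is modelled as one array entry.
    final long[] locals = new long[6];
    final Deque&lt;Long&gt; operands = new ArrayDeque&lt;&gt;();

    void lload(int slot)  { operands.push(locals[slot]); }
    void ldc(long value)  { operands.push(value); }
    void lstore(int slot) { locals[slot] = operands.pop(); }

    // LADD pops two longs and pushes their sum.
    void ladd() {
        long right = operands.pop();
        long left = operands.pop();
        operands.push(left + right);
    }

    // destOffset += 8, with destOffset held in local slot 4 (an assumption).
    void increment() {
        lload(4);
        ldc(8L);
        ladd();
        lstore(4);
    }

    public static void main(String[] args) {
        LaddSketch vm = new LaddSketch();
        vm.increment();
        vm.increment();
        System.out.println(vm.locals[4]); // prints 16
    }
}
</code></pre></div>
<p>A real implementation would presumably be a small <code>StackManipulation</code> that emits the LADD opcode and reports that it pops two longs and pushes one.</p>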
<p>Next, the copy step is defined. It needs a few more instructions than the increment step, but is still relatively simple:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Field</span><span class="w"> </span><span class="n">unsafeField</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UnsafeCopier</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getDeclaredField</span><span class="p">(</span><span class="s">&#34;unsafe&#34;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Method</span><span class="w"> </span><span class="n">getLongMethod</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Unsafe</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getMethod</span><span class="p">(</span><span class="s">&#34;getLong&#34;</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="p">.</span><span class="na">class</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">Method</span><span class="w"> </span><span class="n">putLongMethod</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Unsafe</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getMethod</span><span class="p">(</span><span class="s">&#34;putLong&#34;</span><span class="p">,</span><span class="n">Object</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="p">.</span><span class="na">class</span><span class="p">,</span><span class="w"> </span><span class="kt">long</span><span class="p">.</span><span class="na">class</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">final</span><span class="w"> </span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">copyStack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Compound</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// unsafe.putLong(dest, destOffset, unsafe.getLong(src));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">REFERENCE</span><span class="p">.</span><span class="na">loadOffset</span><span class="o">[</span><span class="n">0</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="c1">// ALOAD 0 this</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">FieldAccess</span><span class="p">.</span><span class="na">forField</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">FieldDescription</span><span class="p">.</span><span class="na">ForLoadedField</span><span class="p">(</span><span class="n">unsafeField</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	                                   </span><span class="p">.</span><span class="na">getter</span><span class="p">(),</span><span class="w"> </span><span class="c1">// GETFIELD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">REFERENCE</span><span class="p">.</span><span class="na">loadOffset</span><span class="o">[</span><span class="n">1</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="c1">// ALOAD 1 dest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">LONG</span><span class="p">.</span><span class="na">loadOffset</span><span class="o">[</span><span class="n">4</span><span class="o">]</span><span class="p">,</span><span class="w">      </span><span class="c1">// LLOAD 4 destOffset</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">REFERENCE</span><span class="p">.</span><span class="na">loadOffset</span><span class="o">[</span><span class="n">0</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="c1">// ALOAD 0 this</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">FieldAccess</span><span class="p">.</span><span class="na">forField</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">FieldDescription</span><span class="p">.</span><span class="na">ForLoadedField</span><span class="p">(</span><span class="n">unsafeField</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	                                   </span><span class="p">.</span><span class="na">getter</span><span class="p">(),</span><span class="w"> </span><span class="c1">// GETFIELD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodVariableAccess</span><span class="p">.</span><span class="na">LONG</span><span class="p">.</span><span class="na">loadOffset</span><span class="o">[</span><span class="n">2</span><span class="o">]</span><span class="p">,</span><span class="w">      </span><span class="c1">// LLOAD 2 src</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodInvocation</span><span class="p">.</span><span class="na">invoke</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">MethodDescription</span><span class="p">.</span><span class="na">ForLoadedMethod</span><span class="p">(</span><span class="n">getLongMethod</span><span class="p">)),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">MethodInvocation</span><span class="p">.</span><span class="na">invoke</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">MethodDescription</span><span class="p">.</span><span class="na">ForLoadedMethod</span><span class="p">(</span><span class="n">putLongMethod</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>Again, the bytecode instructions are created as a sequence of <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> objects, replicating the bytecode the Java compiler had generated earlier. This example contains a couple of new <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> classes, in particular ones built from field and method descriptions.</p>
<p>The final step is the increment step, which won’t be explained here, but the interested reader can find <a href="https://github.com/bramp/unsafe/blob/ff8f463bf60661ff63133e8a3beada7fd65c7c45/unsafe-unroller/src/main/java/net/bramp/unsafe/CopierImplementation.java#L86">the source on GitHub</a>.</p>
<p>One last piece of information Byte Buddy needs is the size of the stack needed for the <code>copy()</code> method, including any space local variables may need. The <a href="http://bytebuddy.net/javadoc/0.7-rc1/net/bytebuddy/implementation/bytecode/StackManipulation.html"><code>StackManipulation</code></a> comes in handy here, as it is able to infer some of these details from the bytecode it represents. In particular, the following code calculates the stack size:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="n">Size</span><span class="w"> </span><span class="nf">apply</span><span class="p">(</span><span class="n">MethodVisitor</span><span class="w"> </span><span class="n">methodVisitor</span><span class="p">,</span><span class="w"> </span><span class="n">Implementation</span><span class="p">.</span><span class="na">Context</span><span class="w"> </span><span class="n">implementationContext</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">   </span><span class="n">MethodDescription</span><span class="w"> </span><span class="n">instrumentedMethod</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Call buildStack() (from above) to generate the bytecode</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">StackManipulation</span><span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">buildStack</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Calculate the size of this bytecode</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="n">StackManipulation</span><span class="p">.</span><span class="na">Size</span><span class="w"> </span><span class="n">finalStackSize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">.</span><span class="na">apply</span><span class="p">(</span><span class="n">methodVisitor</span><span class="p">,</span><span class="w"> </span><span class="n">implementationContext</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// Now return the size of this bytecode, plus two, which is the size of the local</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="c1">// destOffset variable.</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">	</span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Size</span><span class="p">(</span><span class="n">finalStackSize</span><span class="p">.</span><span class="na">getMaximalSize</span><span class="p">(),</span><span class="w"> </span><span class="n">instrumentedMethod</span><span class="p">.</span><span class="na">getStackSize</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">2</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>An important part here is the <code>+2</code>, which makes room for the <code>long destOffset</code> variable. If that were missing, the generated bytecode would incorrectly write over other data on the stack, and most likely crash the JVM.</p>
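<p>The bookkeeping behind those two numbers can be sketched by hand. The standalone example below (a hypothetical class <code>SizeSketch</code>, with no Byte Buddy involved) walks the copy step's instructions and tracks how many operand-stack slots each one adds or removes, where a long takes two slots and a reference takes one. It assumes <code>copy()</code> is an instance method taking <code>(Object dest, long src)</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">public class SizeSketch {
    // Net operand-stack slot change of each instruction in the copy step.
    static final int[] COPY_STEP_DELTAS = {
        +1, // ALOAD 0 (this)
         0, // GETFIELD unsafe: pops this, pushes the field value
        +1, // ALOAD 1 (dest)
        +2, // LLOAD 4 (destOffset)
        +1, // ALOAD 0 (this)
         0, // GETFIELD unsafe
        +2, // LLOAD 2 (src)
        -1, // INVOKEVIRTUAL getLong(long): pops ref and long, pushes long
        -6, // INVOKEVIRTUAL putLong(Object, long, long): pops ref, ref, long, long
    };

    // Highest depth the operand stack reaches while applying the deltas in order.
    static int maxDepth(int[] deltas) {
        int depth = 0;
        int max = 0;
        for (int delta : deltas) {
            depth += delta;
            max = Math.max(max, depth);
        }
        return max;
    }

    // Local slots: this (1) + dest (1) + src (2), plus 2 for the long destOffset.
    static int localSlots() {
        int parameterSlots = 1 + 1 + 2;
        return parameterSlots + 2;
    }

    public static void main(String[] args) {
        System.out.println(&#34;max stack = &#34; + maxDepth(COPY_STEP_DELTAS)); // max stack = 7
        System.out.println(&#34;max locals = &#34; + localSlots());               // max locals = 6
    }
}
</code></pre></div>
<p>Under these assumptions the parameters alone account for four local slots, so without the extra two there would be no room for slots 4 and 5, which is where <code>destOffset</code> lives.</p>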
<p>Now at runtime the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a>&rsquo;s constructor can use the <a href="https://bramp.github.io/unsafe/index.html?net/bramp/unsafe/UnrolledUnsafeCopierBuilder.html"><code>UnrolledUnsafeCopierBuilder</code></a> to generate a specialised <code>UnsafeCopier</code> designed for the exact class the <a href="https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/"><code>UnsafeArrayList</code></a> is storing.</p>
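<p>To give a feel for what a specialised copier does compared with the generic loop, here is a standalone sketch that uses <code>ByteBuffer</code> as a stand-in for <code>sun.misc.Unsafe</code> (a deliberate substitution so that it runs anywhere). The <code>Copier</code> interface and both implementations are hypothetical names, not the library's <code>UnsafeCopier</code> API.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">import java.nio.ByteBuffer;

public class UnrolledCopySketch {
    interface Copier {
        void copy(ByteBuffer dest, ByteBuffer src);
    }

    // Generic version: one long per loop iteration, for any multiple of 8 bytes.
    static Copier looping(final int length) {
        return (dest, src) -&gt; {
            for (int offset = 0; offset &lt; length; offset += 8) {
                dest.putLong(offset, src.getLong(offset));
            }
        };
    }

    // What an unrolled copier for a 32-byte object amounts to: no loop, fixed offsets.
    static final Copier UNROLLED_32 = (dest, src) -&gt; {
        dest.putLong(0, src.getLong(0));
        dest.putLong(8, src.getLong(8));
        dest.putLong(16, src.getLong(16));
        dest.putLong(24, src.getLong(24));
    };

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocate(32);
        for (int i = 0; i &lt; 4; i++) {
            src.putLong(i * 8, 100L + i);
        }
        ByteBuffer viaLoop = ByteBuffer.allocate(32);
        ByteBuffer viaUnrolled = ByteBuffer.allocate(32);
        looping(32).copy(viaLoop, src);
        UNROLLED_32.copy(viaUnrolled, src);
        System.out.println(viaLoop.equals(viaUnrolled)); // prints true
    }
}
</code></pre></div>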
<h2 id="results">Results</h2>
<p>Now that we have most of what we need, it is worth benchmarking this code. Using <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a>, we can write three microbenchmarks: one using the original looping code, one using the hand-unrolled code, and one using the Byte Buddy unrolled code. The <a href="https://github.com/bramp/unsafe/blob/master/unsafe-benchmark/src/main/java/net/bramp/unsafe/copier/UnrolledCopierBenchmark.java">code for the benchmarks</a> is on GitHub, and follows a similar methodology to that in a <a href="https://blog.bramp.net/post/2015/08/27/unsafe-part-3-benchmarking-a-java-unsafearraylist/">previous article</a>.</p>
<p>The results are as you may expect:</p>

<table class="table">
  <thead>
      <tr>
          <th>Benchmark</th>
          <th>Mode</th>
          <th>Cnt</th>
          <th>Score</th>
          <th>Error</th>
          <th>Units</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Loop</td>
          <td>thrpt</td>
          <td>25</td>
          <td>218.056</td>
          <td>± 11.123</td>
          <td>ops/us</td>
      </tr>
      <tr>
          <td>Hand Unrolled</td>
          <td>thrpt</td>
          <td>25</td>
          <td>430.376</td>
          <td>± 27.448</td>
          <td>ops/us</td>
      </tr>
      <tr>
          <td>Byte Buddy Unrolled</td>
          <td>thrpt</td>
          <td>25</td>
          <td>437.139</td>
          <td>± 22.811</td>
          <td>ops/us</td>
      </tr>
  </tbody>
</table>

<p>The loop code executes ~218 times per microsecond, whereas the Byte Buddy and hand-unrolled code had near-identical performance of ~430-437 operations per microsecond, roughly twice as fast. Not measured here, of course, is the startup cost of generating the unrolled code. This technique is assumed to be used only when the generated code will exist for a long time; otherwise the setup cost undoes any per-execution savings.</p>
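<p>The trade-off can be put into rough numbers. The sketch below converts the two throughput figures from the table into a per-copy saving (about 2.3 ns) and divides a one-off generation cost by it. The 50 ms generation cost is purely a hypothetical placeholder, since that cost was not measured, and <code>BreakEvenSketch</code> is an illustrative name.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">public class BreakEvenSketch {
    // Throughput figures from the results table, in operations per microsecond.
    static final double LOOP_OPS_PER_US = 218.056;
    static final double UNROLLED_OPS_PER_US = 437.139;

    // Nanoseconds saved per copy by using the unrolled version.
    static double savingPerCallNanos() {
        double loopNanos = 1000.0 / LOOP_OPS_PER_US;
        double unrolledNanos = 1000.0 / UNROLLED_OPS_PER_US;
        return loopNanos - unrolledNanos;
    }

    // Number of copies needed before a one-off generation cost is paid back.
    static long breakEvenCalls(double generationCostNanos) {
        return (long) Math.ceil(generationCostNanos / savingPerCallNanos());
    }

    public static void main(String[] args) {
        // HYPOTHETICAL: 50 ms to generate and load the class; this was not measured.
        double generationCostNanos = 50_000_000.0;
        System.out.printf(&#34;saving per call: %.2f ns%n&#34;, savingPerCallNanos());
        System.out.println(&#34;break-even calls: &#34; + breakEvenCalls(generationCostNanos));
    }
}
</code></pre></div>
<p>With that placeholder cost, the break-even point lands in the region of 20 million copies, which illustrates why the technique only pays off for long-lived, heavily used copiers.</p>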
<h2 id="conclusion">Conclusion</h2>
<p>In summary, we managed to unroll a loop at runtime by generating bytecode on demand for that specific purpose. This was possible by inspecting machine-generated bytecode, and using Byte Buddy to generate equivalent bytecode at runtime, customised with the correct number of unrolled iterations.</p>
<p>This technique may seem completely crazy, and I don’t suggest using it unless you know what you are doing. That includes actually measuring that you have a performance problem which this could fix, and confirming that you cannot depend on the JVM’s own JIT to do this optimisation for you.</p>
<p><em>Helpful Links:</em> <a href="https://github.com/bramp/unsafe/">GitHub Home</a> | <a href="https://github.com/bramp/unsafe/tree/master/unsafe-unroller/src/main/java/net/bramp/unsafe">GitHub Code</a> | <a href="https://bramp.github.io/unsafe/">JavaDoc</a></p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Unrolled code is not always faster, as larger code may not fit into the CPU instruction cache.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
</description>
    </item>
    
  </channel>
</rss>
