
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>YBlog - Trees; Pragmatism and Formalism</title>
        <meta name="keywords" content="XML, Perl, programming, tree, theory, mathematics, regexp, script" />

        <link rel="shortcut icon" type="image/x-icon" href="../../../../Scratch/img/favicon.ico" />
        <link rel="stylesheet" type="text/css" href="/css/y.css" />
        <link rel="stylesheet" type="text/css" href="/css/legacy.css" />
        <link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <link rel="apple-touch-icon" href="../../../../Scratch/img/about/FlatAvatar@2x.png" />
        <!--[if lt IE 9]>
        <script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
        <![endif]-->
        <!-- IndieAuth -->
        <link href="https://twitter.com/yogsototh" rel="me">
        <link href="https://github.com/yogsototh" rel="me">
        <link href="mailto:yann.esposito@gmail.com" rel="me">
        <link rel="pgpkey" href="../../../../pubkey.txt">
    </head>
    <body lang="en" class="article">
        <div id="content">
	        			<div id="header">
			    <div id="choix">
        	    <span id="choixlang">
                  <a href="../../../../Scratch/fr/blog/2010-05-24-Trees--Pragmatism-and-Formalism/">French</a> 
        	    </span>
              <span class="tomenu"><a href="#navigation">↓ Menu ↓</a></span>
        	    <span class="flush"></span>
        	</div>
			</div>

			<div id="titre">
				<h1>Trees; Pragmatism and Formalism</h1>
				<h2>When theory is more efficient than practice</h2>

			</div>
			<div class="flush"></div>
      <div id="afterheader" class="article">
          <div class="corps">
              <div class="intro">
<p><span class="sc"><abbr title="Too long; didn't read">tl;dr</abbr>: </span></p>
<ul>
<li>I tried to program a simple filter</li>
<li>Was stuck for 2 days</li>
<li>Then stopped working like an engineer monkey</li>
<li>Used a pen and a sheet of paper</li>
<li>Did some math</li>
<li>Crushed the problem in 10 minutes</li>
<li>Conclusion: pragmatism shouldn’t mean “never use theory”.</li>
</ul>
</div>
<h2 id="abstract-longer-than-tldr">Abstract (longer than <span class="sc"><abbr title="Too long; didn't read">tl;dr</abbr></span>)</h2>
<p>For my job, I needed to solve a problem. At first it did not seem too hard, so I started working directly on my program. I entered the <em>infernal</em> <em>try &amp; repair loop</em>. Each step went like this:</p>
<blockquote>
<p>– Just this thing to repair and that should be done.<br />
– OK, now that should just work.<br />
– Yeah!!!<br />
– Oops! I forgot that…<br />
<code>repeat until death</code></p>
</blockquote>
<p>After two days of this <a href="http://fr.wikipedia.org/wiki/Sisyphe">Sisyphean</a> work, I finally stopped to rethink the problem. I took a pen and a sheet of paper. I simplified the problem and recalled what I had learned about trees during my Ph.D. In the end, the problem was crushed in less than 20 minutes.</p>
<p>I believe the important lesson is that the most efficient methodology for solving this <em>pragmatic</em> problem was the <em>theoretical</em> one. Therefore, arguments that oppose science and theory to pragmatism and efficiency are fallacies.</p>
<hr />
<h1 id="first-my-experience">First: my experience</h1>
<p>Apparently, 90% of programmers are unable to program a binary search without a bug. The algorithm is well known and easy to understand; however, it is difficult to program it without any flaw. I took part in <a href="http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/">this contest</a>, and you can see the <a href="http://reprog.wordpress.com/2010/04/21/binary-search-redux-part-1/">results here</a><a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a>. I had to face a problem of the same kind at my job. The problem seemed simple at first: transform an <sc>xml</sc> document from one format to another.</p>
<p>The source <sc>xml</sc> was in the following general format:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb1-1" title="1"><span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-2" title="2">    <span class="kw">&lt;contenu&gt;</span></a>
<a class="sourceLine" id="cb1-3" title="3">        <span class="kw">&lt;tag1&gt;</span>value1<span class="kw">&lt;/tag1&gt;</span></a>
<a class="sourceLine" id="cb1-4" title="4">        <span class="kw">&lt;tag2&gt;</span>value2<span class="kw">&lt;/tag2&gt;</span></a>
<a class="sourceLine" id="cb1-5" title="5">        ...</a>
<a class="sourceLine" id="cb1-6" title="6">    <span class="kw">&lt;/contenu&gt;</span></a>
<a class="sourceLine" id="cb1-7" title="7">    <span class="kw">&lt;enfant&gt;</span></a>
<a class="sourceLine" id="cb1-8" title="8">        <span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-9" title="9">            ...</a>
<a class="sourceLine" id="cb1-10" title="10">        <span class="kw">&lt;/rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-11" title="11">        ...</a>
<a class="sourceLine" id="cb1-12" title="12">        <span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-13" title="13">            ...</a>
<a class="sourceLine" id="cb1-14" title="14">        <span class="kw">&lt;/rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-15" title="15">    <span class="kw">&lt;/enfant&gt;</span></a>
<a class="sourceLine" id="cb1-16" title="16"><span class="kw">&lt;/rubrique&gt;</span></a></code></pre></div>
<p>and the destination format was in the following general format:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb2-1" title="1"><span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;Menu0&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-2" title="2">    <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-3" title="3">        <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;menu&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-4" title="4">            <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-5" title="5">                <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;tag1&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-6" title="6">                    <span class="kw">&lt;value&gt;</span>value1<span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-7" title="7">                <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-8" title="8">                <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;tag2&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-9" title="9">                    <span class="kw">&lt;value&gt;</span>value2<span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-10" title="10">                <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-11" title="11">                ...</a>
<a class="sourceLine" id="cb2-12" title="12">                <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;menu&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-13" title="13">                    <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-14" title="14">                        ...</a>
<a class="sourceLine" id="cb2-15" title="15">                    <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-16" title="16">                    <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-17" title="17">                        ...</a>
<a class="sourceLine" id="cb2-18" title="18">                    <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-19" title="19">                <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-20" title="20">            <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-21" title="21">        <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-22" title="22">    <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-23" title="23"><span class="kw">&lt;/item&gt;</span></a></code></pre></div>
<p>At first sight, I believed it would be easy. I was so certain of it that I set myself the following rules:</p>
<ol type="1">
<li>do not use <sc>xslt</sc></li>
<li>avoid the use of an <sc>xml</sc> parser</li>
<li>solve the problem using a simple Perl script</li>
</ol>
<p>You can try it if you want. If you attack the problem by directly opening an editor, I assure you, it will certainly not be that simple. I can tell, because that is what I did. I must say I lost almost a complete day at work trying to solve this. There were also many small surrounding problems that made me lose more than two days on it.</p>
<p>Why, after two days, was I unable to solve a problem that seemed so simple?</p>
<p>What was my behaviour (workflow)?</p>
<ol type="1">
<li>Think</li>
<li>Write the program</li>
<li>Try the program</li>
<li>Verify the result</li>
<li>Find a bug</li>
<li>Fix the bug</li>
<li>Go to step 3.</li>
</ol>
<p>This is a <em>standard</em> workflow for a computer engineer. The flaw came from the first step. I thought about how to solve the problem, but with the eyes of a <em>pragmatic engineer</em>. I was saying:</p>
<blockquote>
<p>That should be a simple Perl search and replace program.<br />
Let’s start writing code.</p>
</blockquote>
<p>It is the second sentence that was plainly wrong. I started in the wrong direction, and the workflow could not work from this entry point.</p>
<h2 id="think">Think</h2>
<p>After some time, I just stopped working and told myself <em>“that’s enough, now I must finish it!”</em>. I took a sheet of paper and a pen and began to draw some trees.</p>
<p>I began by removing most of the verbosity. For example, I renamed <code>&lt;item name="Menu"&gt;</code> to the simpler name <code>M</code>. I obtained something like:</p>
<p><img src="code/The_source_tree.png" alt="The source tree" /></p>
<p>and</p>
<p><img src="code/The_destination_tree.png" alt="The destination tree" /></p>
<p>Then I made the following observation:</p>
<p>In terms of tree edit distance, each atomic transformation of a tree corresponds to a simple search and replace on my <sc>xml</sc> source<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a>. We consider three atomic transformations on trees:</p>
<ul>
<li><em>substitution</em>: renaming a node</li>
<li><em>insertion</em>: adding a node</li>
<li><em>deletion</em>: removing a node</li>
</ul>
<p>One particularity of atomic transformations on trees is that if you remove a node, all of its children become children of its parent.</p>
<p>An example:</p>
<pre class="twilight">
r - x - a
  \   \
   \    b
    y - c   
</pre>
<p>If you delete the <code>x</code> node, you obtain</p>
<pre class="twilight">
    a
  /
r - b
  \
    y - c   
</pre>
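<p>The deletion rule can also be sketched on an in-memory tree. This is a minimal Python illustration, not code from the original script: trees are assumed to be <code>(label, children)</code> tuples, and the helper name <code>delete_label</code> is hypothetical.</p>

```python
from typing import List, Tuple

# Assumed representation: a tree is a (label, children) pair; leaves have no children.
Tree = Tuple[str, List["Tree"]]


def delete_label(tree: Tree, label: str) -> List[Tree]:
    """Delete every node named `label` (hypothetical helper).

    Returns a list because a deleted node is replaced by its own
    (already processed) children, spliced into the parent in order.
    """
    name, children = tree
    new_children: List[Tree] = []
    for child in children:
        new_children.extend(delete_label(child, label))
    if name == label:
        return new_children  # the node disappears, its children move up
    return [(name, new_children)]


# The example tree: r has children x and y; x has a and b; y has c.
r = ("r", [("x", [("a", []), ("b", [])]), ("y", [("c", [])])])

print(delete_label(r, "x"))
# [('r', [('a', []), ('b', []), ('y', [('c', [])])])]
```

<p>The result is the second tree above: <code>a</code> and <code>b</code> are now children of <code>r</code>, next to <code>y</code>.</p>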
<p>And look at what it implies when you write it in <sc>xml</sc>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb3-1" title="1"><span class="kw">&lt;r&gt;</span></a>
<a class="sourceLine" id="cb3-2" title="2">  <span class="kw">&lt;x&gt;</span></a>
<a class="sourceLine" id="cb3-3" title="3">    <span class="kw">&lt;a&gt;</span>value for a<span class="kw">&lt;/a&gt;</span></a>
<a class="sourceLine" id="cb3-4" title="4">    <span class="kw">&lt;b&gt;</span>value for b<span class="kw">&lt;/b&gt;</span></a>
<a class="sourceLine" id="cb3-5" title="5">  <span class="kw">&lt;/x&gt;</span></a>
<a class="sourceLine" id="cb3-6" title="6">  <span class="kw">&lt;y&gt;</span></a>
<a class="sourceLine" id="cb3-7" title="7">    <span class="kw">&lt;c&gt;</span>value for c<span class="kw">&lt;/c&gt;</span></a>
<a class="sourceLine" id="cb3-8" title="8">  <span class="kw">&lt;/y&gt;</span></a>
<a class="sourceLine" id="cb3-9" title="9"><span class="kw">&lt;/r&gt;</span></a></code></pre></div>
<p>Then deleting all <code>x</code> nodes is equivalent to passing the <sc>xml</sc> through the following search and replace script:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb4-1" title="1"><span class="kw">s/</span><span class="ot">&lt;\/</span><span class="ch">?</span><span class="ot">x&gt;</span><span class="kw">//g</span></a></code></pre></div>
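<p>For readers without Perl at hand, the same directive can be sketched with Python’s <code>re.sub</code>, which replaces every match just as Perl’s <code>/g</code> flag does. Like the one-liner, it assumes the <code>x</code> tags carry no attributes; the variable names are illustrative.</p>

```python
import re
import xml.etree.ElementTree as ET

source = """<r>
  <x>
    <a>value for a</a>
    <b>value for b</b>
  </x>
  <y>
    <c>value for c</c>
  </y>
</r>"""

# Same pattern as the Perl one-liner s/<\/?x>//g: the optional "/" matches
# both <x> and </x>, and re.sub replaces every occurrence.
without_x = re.sub(r"</?x>", "", source)

# Parsing the result shows that a and b are now direct children of r.
root = ET.fromstring(without_x)
print([child.tag for child in root])  # ['a', 'b', 'y']
```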
<p>Therefore, if there exists a one-state deterministic transducer that transforms my trees, I can transform the <sc>xml</sc> from one format to another with just a simple list of search and replace directives.</p>
<h1 id="solution">Solution</h1>
<p>Transform this tree:</p>
<pre class="twilight">
R - C - tag1
  \   \
   \    tag2
    E -- R - C - tag1
      \   \    \
       \   \     tag2
        \    E ...
         R - C - tag1 
           \    \
            \     tag2
             E ...
</pre>
<p>to this tree:</p>
<pre class="twilight">
                tag1
              /
M - V - M - V - tag2      tag1
              \         / 
                M --- V - tag2
                  \     \ 
                   \      M
                    \     tag1
                     \  / 
                      V - tag2
                        \ 
                          M
</pre>
<p>can be done using the following one-state deterministic tree transducer:</p>
<blockquote>
<p>C -&gt; ε<br />
E -&gt; M<br />
R -&gt; V</p>
</blockquote>
<p>Which can be translated into the following simple search and replace directives:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb5-1" title="1"><span class="kw">s/</span><span class="ot">C</span><span class="kw">//g</span></a>
<a class="sourceLine" id="cb5-2" title="2"><span class="kw">s/</span><span class="ot">E</span><span class="kw">/</span><span class="st">M</span><span class="kw">/g</span></a>
<a class="sourceLine" id="cb5-3" title="3"><span class="kw">s/</span><span class="ot">R</span><span class="kw">/</span><span class="st">V</span><span class="kw">/g</span></a></code></pre></div>
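<p>Before adapting the directives to <sc>xml</sc>, the transducer can be checked on a small instance of the abstract source tree. This is a minimal Python sketch under stated assumptions: trees are <code>(label, children)</code> tuples, <code>None</code> stands for ε (the node is deleted and its children are spliced into its parent), and the names <code>RULES</code> and <code>transduce</code> are hypothetical. Since there is only one state, each node is rewritten from its own label alone, which is why a plain search and replace is enough.</p>

```python
from typing import Dict, List, Optional, Tuple

Tree = Tuple[str, List["Tree"]]

# One-state transducer: C -> ε, E -> M, R -> V (None plays the role of ε).
RULES: Dict[str, Optional[str]] = {"C": None, "E": "M", "R": "V"}


def transduce(tree: Tree, rules: Dict[str, Optional[str]]) -> List[Tree]:
    """Rewrite every label independently of its context (hypothetical helper)."""
    label, children = tree
    new_children: List[Tree] = []
    for child in children:
        new_children.extend(transduce(child, rules))
    new_label = rules.get(label, label)  # labels without a rule are kept
    if new_label is None:
        return new_children  # ε: drop the node, keep its children
    return [(new_label, new_children)]


def leaf(name: str) -> Tree:
    return (name, [])


# Source shape: R(C(tag1, tag2), E(R(C(tag1, tag2)), R(C(tag1, tag2))))
content = ("C", [leaf("tag1"), leaf("tag2")])
source = ("R", [content, ("E", [("R", [content]), ("R", [content])])])

print(transduce(source, RULES))
```

<p>The output is <code>V(tag1, tag2, M(V(tag1, tag2), V(tag1, tag2)))</code>, which corresponds to the destination tree from its second <code>V</code> node down; the outer <code>M - V - M</code> wrapper is not produced by these three rules.</p>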
<p>Once adapted to <sc>xml</sc>, it becomes:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb6-1" title="1"><span class="kw">s%</span><span class="ot">&lt;/</span><span class="ch">?</span><span class="ot">contenu&gt;</span><span class="kw">%%g</span></a>
<a class="sourceLine" id="cb6-2" title="2"><span class="kw">s%</span><span class="ot">&lt;enfant&gt;</span><span class="kw">%</span><span class="st">&lt;item name=&quot;menu&quot;&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-3" title="3"><span class="kw">s%</span><span class="ot">&lt;/enfant&gt;</span><span class="kw">%</span><span class="st">&lt;/item&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-4" title="4"><span class="kw">s%</span><span class="ot">&lt;rubrique&gt;</span><span class="kw">%</span><span class="st">&lt;value&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-5" title="5"><span class="kw">s%</span><span class="ot">&lt;/rubrique&gt;</span><span class="kw">%</span><span class="st">&lt;/value&gt;</span><span class="kw">%g</span></a></code></pre></div>
<p>That is all.</p>
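<p>To see these five directives at work on a concrete document, here they are sketched with Python’s <code>re.sub</code>, applied in the same order to a small, made-up instance of the source format. The sample input and the names <code>DIRECTIVES</code> and <code>convert</code> are assumptions for illustration.</p>

```python
import re

# The five Perl directives above, as (pattern, replacement) pairs.
DIRECTIVES = [
    (r"</?contenu>", ""),                 # C -> ε
    (r"<enfant>", '<item name="menu">'),  # E -> M (opening tag)
    (r"</enfant>", "</item>"),            # E -> M (closing tag)
    (r"<rubrique>", "<value>"),           # R -> V (opening tag)
    (r"</rubrique>", "</value>"),         # R -> V (closing tag)
]


def convert(text: str) -> str:
    """Apply every directive globally, in order, like a chain of s///g."""
    for pattern, replacement in DIRECTIVES:
        text = re.sub(pattern, replacement, text)
    return text


sample = (
    "<rubrique><contenu><tag1>value1</tag1></contenu>"
    "<enfant><rubrique><contenu><tag1>value2</tag1></contenu></rubrique></enfant>"
    "</rubrique>"
)

print(convert(sample))
# <value><tag1>value1</tag1><item name="menu"><value><tag1>value2</tag1></value></item></value>
```

<p>Leaf tags such as <code>tag1</code> are left untouched, just as the leaves are in the tree transducer.</p>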
<h1 id="conclusion">Conclusion</h1>
<p>It may seem a bit paradoxical, but sometimes the most efficient approach to a pragmatic problem is the theoretical one.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Fortunately, I am among the 10% who gave a bug-free implementation.<a href="#fnref1" class="footnote-back">↩</a></p></li>
<li id="fn2"><p>I wrote a program that automatically generates, from data, a matrix of weights for each edit distance operation.<a href="#fnref2" class="footnote-back">↩</a></p></li>
</ol>
</section>
          </div>
          <div id="afterarticle">
              <div id="navigation">
                  <a href="../../../../">Home</a>
                  <span class="sep">¦</span>
                  <a href="../../../../Scratch/en/blog">Blog</a>
                  <span class="sep">¦</span>
                  <a href="../../../../Scratch/en/softwares">Softwares</a>
                  <span class="sep">¦</span>
                  <a href="../../../../Scratch/en/about">About</a>
              </div>
              <div id="totop"><a href="#header">↑ Top ↑</a></div>
              <div id="bottom">
                  <div>
                      Published on 2010-05-24
                  </div>
                  <div>
                      <a href="https://twitter.com/yogsototh">Follow @yogsototh</a>
                  </div>
                  <div>
                      <a rel="license" href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Yann Esposito©</a>
                  </div>

                  <div>
                      Done with
                      <a href="http://www.vim.org" target="_blank" rel="noopener noreferrer nofollow"><strike>Vim</strike></a>
                      <a href="http://spacemacs.org" target="_blank" rel="noopener noreferrer nofollow">spacemacs</a>
                      <span class="pala">&amp;</span>
                      <a href="http://nanoc.ws" target="_blank" rel="noopener noreferrer nofollow"><strike>nanoc</strike></a>
                      <a href="http://jaspervdj.be/hakyll" target="_blank" rel="noopener noreferrer nofollow">Hakyll</a>
                  </div>
                  <hr />
                  <div style="max-width: 100%">
                      <a href="https://cardanohub.org">
                          <img src="../../../../Scratch/img/ada-logo.png" class="simple" style="height: 16px;
                                    border-radius: 50%;
                                    vertical-align:middle;
                                    display:inline-block;" />
                          ADA:
                      </a>
                          <code style="display:inline-block;
                                       word-wrap:break-word;
                                       text-align: left;
                                       vertical-align: top;
                                       max-width: 85%;">
                              DdzFFzCqrhtAvdkmATx5Fm8NPJViDy85ZBw13p4XcNzVzvQg8e3vWLXq23JQWFxPEXK6Kvhaxxe7oJt4VMYHxpA2vtCFiP8fziohN6Yp
                          </code>
                  </div>
              </div>
          </div>
      </div>

        </div>
    </body>
</html>