You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 
 

295 lines
20 KiB

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>YBlog - Trees; Pragmatism and Formalism</title>
<meta name="keywords" content="XML, Perl, programming, tree, theory, mathematics, regexp, script" />
<link rel="shortcut icon" type="image/x-icon" href="../../../../Scratch/img/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/css/y.css" />
<link rel="stylesheet" type="text/css" href="/css/legacy.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="apple-touch-icon" href="../../../../Scratch/img/about/FlatAvatar@2x.png" />
<!--[if lt IE 9]>
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
<![endif]-->
<!-- IndieAuth -->
<link href="https://twitter.com/yogsototh" rel="me">
<link href="https://github.com/yogsototh" rel="me">
<link href="mailto:yann.esposito@gmail.com" rel="me">
<link rel="pgpkey" href="../../../../pubkey.txt">
</head>
<body lang="en" class="article">
<div id="content">
<div id="header">
<div id="choix">
<span id="choixlang">
<a href="../../../../Scratch/fr/blog/2010-05-24-Trees--Pragmatism-and-Formalism/">French</a>
</span>
<span class="tomenu"><a href="#navigation">↓ Menu ↓</a></span>
<span class="flush"></span>
</div>
</div>
<div id="titre">
<h1>Trees; Pragmatism and Formalism</h1>
<h2>When theory is more efficient than practice</h2>
</div>
<div class="flush"></div>
<div id="afterheader" class="article">
<div class="corps">
<div class="intro">
<p><span class="sc"><abbr title="Too long; didn't read">tl;dr</abbr>: </span>:</p>
<ul>
<li>I tried to program a simple filter</li>
<li>Was blocked 2 days</li>
<li>Then stopped working like an engineer monkey</li>
<li>Used a pen and a sheet of paper</li>
<li>Made some math.</li>
<li>Crushed the problem in 10 minutes</li>
<li>Conclusion: The pragmatism shouldn’t mean “never use theory”.</li>
</ul>
</div>
<h2 id="abstract-longer-than-tldr">Abstract (longer than <span class="sc"><abbr title="Too long; didn't read">tl;dr</abbr>: </span>)</h2>
<p>For my job, I needed to resolve a problem. It first seems not too hard. Then I started working directly on my program. I entered in the <em>infernal</em>: <em>try &amp; repair loop</em>. Each step was like:</p>
<blockquote>
<p>– Just this thing to repair and that should be done.<br />
– OK, now that should just work.<br />
– Yeah!!!<br />
– Oops! I forgotten that…<br />
<code>repeat until death</code></p>
</blockquote>
<p>After two days of this <a href="http://fr.wikipedia.org/wiki/Sisyphe">Sisyphus</a> work, I finally just stopped to rethink the problem. I took a pen, a sheet of paper. I simplified the problem, reminded what I learned during my Ph.D. about trees. Finally, the problem was crushed in less than 20 minutes.</p>
<p>I believe the important lesson is to remember that the most efficient methodology to resolve this <em>pragmatic</em> problem was the <em>theoretical</em> one. And therefore, argues opposing science, theory to pragmatism and efficiency are fallacies.</p>
<hr />
<h1 id="first-my-experience">First: my experience</h1>
<p>Apparently 90% of programmer are unable to program a binary search without bug. The algorithm is well known and easy to understand. However it is difficult to program it without any flaw. I participated to <a href="http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/">this contest</a>. And you can see the <a href="http://reprog.wordpress.com/2010/04/21/binary-search-redux-part-1/">results here</a><a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a>. I had to face a problem of the same kind at my job. The problem was simple to the start. Simply transform an <sc>xml</sc> from one format to another.</p>
<p>The source <sc>xml</sc> was in the following general format:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb1-1" title="1"><span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-2" title="2"> <span class="kw">&lt;contenu&gt;</span></a>
<a class="sourceLine" id="cb1-3" title="3"> <span class="kw">&lt;tag1&gt;</span>value1<span class="kw">&lt;/tag1&gt;</span></a>
<a class="sourceLine" id="cb1-4" title="4"> <span class="kw">&lt;tag2&gt;</span>value2<span class="kw">&lt;/tag2&gt;</span></a>
<a class="sourceLine" id="cb1-5" title="5"> ...</a>
<a class="sourceLine" id="cb1-6" title="6"> <span class="kw">&lt;/contenu&gt;</span></a>
<a class="sourceLine" id="cb1-7" title="7"> <span class="kw">&lt;enfant&gt;</span></a>
<a class="sourceLine" id="cb1-8" title="8"> <span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-9" title="9"> ...</a>
<a class="sourceLine" id="cb1-10" title="10"> <span class="kw">&lt;/rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-11" title="11"> ...</a>
<a class="sourceLine" id="cb1-12" title="12"> <span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-13" title="13"> ...</a>
<a class="sourceLine" id="cb1-14" title="14"> <span class="kw">&lt;/rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-15" title="15"> <span class="kw">&lt;/enfant&gt;</span></a>
<a class="sourceLine" id="cb1-16" title="16"><span class="kw">&lt;/menu&gt;</span></a></code></pre></div>
<p>and the destination format was in the following general format:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb2-1" title="1"><span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;Menu0&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-2" title="2"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-3" title="3"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;menu&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-4" title="4"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-5" title="5"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;tag1&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-6" title="6"> <span class="kw">&lt;value&gt;</span>value1<span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-7" title="7"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-8" title="8"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;tag2&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-9" title="9"> <span class="kw">&lt;value&gt;</span>value2<span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-10" title="10"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-11" title="11"> ...</a>
<a class="sourceLine" id="cb2-12" title="12"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;menu&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-13" title="13"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-14" title="14"> ...</a>
<a class="sourceLine" id="cb2-15" title="15"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-16" title="16"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-17" title="17"> ...</a>
<a class="sourceLine" id="cb2-18" title="18"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-19" title="19"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-20" title="20"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-21" title="21"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-22" title="22"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-23" title="23"><span class="kw">&lt;/item&gt;</span></a></code></pre></div>
<p>At first sight I believed it will be easy. I was so certain it will be easy that I fixed to myself the following rules:</p>
<ol type="1">
<li>do not use <sc>xslt</sc></li>
<li>avoid the use of an <sc>xml</sc> parser</li>
<li>resolve the problem using a simple perl script[^2]</li>
</ol>
<p>You can try if you want. If you attack the problem directly opening an editor, I assure you, it will certainly be not so simple. I can tell that, because it’s what I’ve done. And I must say I lost almost a complete day at work trying to resolve this. There was also, many small problems around that make me lose more than two days for this problem.</p>
<p>Why after two days did I was unable to resolve this problem which seems so simple?</p>
<p>What was my behaviour (workflow)?</p>
<ol type="1">
<li>Think</li>
<li>Write the program</li>
<li>Try the program</li>
<li>Verify the result</li>
<li>Found a bug</li>
<li>Resolve the bug</li>
<li>Go to step 3.</li>
</ol>
<p>This was a <em>standard</em> workflow for computer engineer. The flaw came from the first step. I thought about how to resolve the problem but with the eyes of a <em>pragmatic engineer</em>. I was saying:</p>
<blockquote>
<p>That should be a simple perl search and replace program.<br />
Let’s begin to write code</p>
</blockquote>
<p>This is the second sentence that was plainly wrong. I started in the wrong direction. And the workflow did not work from this entry point.</p>
<h2 id="think">Think</h2>
<p>After some times, I just stopped to work. Tell myself <em>“it is enough, now, I must finish it!”</em>. I took a sheet of paper, a pen and began to draw some trees.</p>
<p>I began by make by removing most of the verbosity. I first renamed <code>&lt;item name="Menu"&gt;</code> by simpler name <code>M</code> for example. I obtained something like:</p>
<p><img src="code/The_source_tree.png" alt="The source tree" /></p>
<p>and</p>
<p><img src="code/The_destination_tree.png" alt="The destination tree" /></p>
<p>Then I made myself the following reflexion:</p>
<p>Considering Tree Edit Distance, each unitary transformation of tree correspond to a simple search and replace on my <sc>xml</sc> source<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a>. We consider three atomic transformations on trees:</p>
<ul>
<li><em>substitution</em>: renaming a node</li>
<li><em>insertion</em>: adding a node</li>
<li><em>deletion</em>: remove a node</li>
</ul>
<p>One of the particularity of atomic transformations on trees, is ; if you remove a node, all children of this node, became children of its father.</p>
<p>An example:</p>
<pre class="twilight">
r - x - a
\ \
\ b
y - c
</pre>
<p>If you delete the <code>x</code> node, you obtain</p>
<pre class="twilight">
a
/
r - b
\
y - c
</pre>
<p>And look at what it implies when you write it in <sc>xml</sc>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb3-1" title="1"><span class="kw">&lt;r&gt;</span></a>
<a class="sourceLine" id="cb3-2" title="2"> <span class="kw">&lt;x&gt;</span></a>
<a class="sourceLine" id="cb3-3" title="3"> <span class="kw">&lt;a&gt;</span>value for a<span class="kw">&lt;/a&gt;</span></a>
<a class="sourceLine" id="cb3-4" title="4"> <span class="kw">&lt;b&gt;</span>value for b<span class="kw">&lt;/b&gt;</span></a>
<a class="sourceLine" id="cb3-5" title="5"> <span class="kw">&lt;/x&gt;</span></a>
<a class="sourceLine" id="cb3-6" title="6"> <span class="kw">&lt;y&gt;</span></a>
<a class="sourceLine" id="cb3-7" title="7"> <span class="kw">&lt;c&gt;</span>value for c<span class="kw">&lt;/c&gt;</span></a>
<a class="sourceLine" id="cb3-8" title="8"> <span class="kw">&lt;/y&gt;</span></a>
<a class="sourceLine" id="cb3-9" title="9"><span class="kw">&lt;/r&gt;</span></a></code></pre></div>
<p>Then deleting all <code>x</code> nodes is equivalent to pass the <sc>xml</sc> via the following search and replace script:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb4-1" title="1"><span class="kw">s/</span><span class="ot">&lt;\/</span><span class="ch">?</span><span class="ot">x&gt;</span><span class="kw">//g</span></a></code></pre></div>
<p>Therefore, if there exists a one state deterministic transducer which transform my trees ; I can transform the <sc>xml</sc> from one format to another with just a simple list of search and replace directives.</p>
<h1 id="solution">Solution</h1>
<p>Transform this tree:</p>
<pre class="twilight">
R - C - tag1
\ \
\ tag2
E -- R - C - tag1
\ \ \
\ \ tag2
\ E ...
R - C - tag1
\ \
\ tag2
E ...
</pre>
<p>to this tree:</p>
<pre class="twilight">
tag1
/
M - V - M - V - tag2 tag1
\ /
M --- V - tag2
\ \
\ M
\ tag1
\ /
V - tag2
\
M
</pre>
<p>can be done using the following one state deterministic tree transducer:</p>
<blockquote>
<p>C -&gt; ε<br />
E -&gt; M<br />
R -&gt; V</p>
</blockquote>
<p>Wich can be traduced by the following simple search and replace directives:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb5-1" title="1"><span class="kw">s/</span><span class="ot">C</span><span class="kw">//g</span></a>
<a class="sourceLine" id="cb5-2" title="2"><span class="kw">s/</span><span class="ot">E</span><span class="kw">/</span><span class="st">M</span><span class="kw">/g</span></a>
<a class="sourceLine" id="cb5-3" title="3"><span class="kw">s/</span><span class="ot">R</span><span class="kw">/</span><span class="st">V</span><span class="kw">/g</span></a></code></pre></div>
<p>Once adapted to <sc>xml</sc> it becomes:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb6-1" title="1"><span class="kw">s%</span><span class="ot">&lt;/</span><span class="ch">?</span><span class="ot">contenu&gt;</span><span class="kw">%%g</span></a>
<a class="sourceLine" id="cb6-2" title="2"><span class="kw">s%</span><span class="ot">&lt;enfant&gt;</span><span class="kw">%</span><span class="st">&lt;item name=&quot;menu&quot;&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-3" title="3"><span class="kw">s%</span><span class="ot">&lt;/enfant&gt;</span><span class="kw">%</span><span class="st">&lt;/item&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-4" title="4"><span class="kw">s%</span><span class="ot">&lt;rubrique&gt;</span><span class="kw">%</span><span class="st">&lt;value&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-5" title="5"><span class="kw">s%</span><span class="ot">&lt;/rubrique&gt;</span><span class="kw">%</span><span class="st">&lt;/value&gt;</span><span class="kw">%g</span></a></code></pre></div>
<p>That is all.</p>
<h1 id="conclusion">Conclusion</h1>
<p>It should seems a bit paradoxal, but sometimes the most efficient approach to a pragmatic problem is to use the theoretical methodology.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Hopefully I am in the 10% who had given a bug free implementation.<a href="#fnref1" class="footnote-back"></a></p></li>
<li id="fn2"><p>I did a program which generate automatically the weight in a matrix of each edit distance from data.<a href="#fnref2" class="footnote-back"></a></p></li>
</ol>
</section>
</div>
<div id="afterarticle">
<div id="social">
<a href="/rss.xml" target="_blank" rel="noopener noreferrer nofollow" class="social">RSS</a>
·
<a href="https://twitter.com/home?status=http%3A%2F%2Fyannesposito.com/Scratch/en/blog/2010-05-24-Trees--Pragmatism-and-Formalism/%20via%20@yogsototh" target="_blank" rel="noopener noreferrer nofollow" class="social">Tweet</a>
·
<a href="http://www.facebook.com/sharer/sharer.php?u=http%3A%2F%2Fyannesposito.com/Scratch/en/blog/2010-05-24-Trees--Pragmatism-and-Formalism/" target="_blank" rel="noopener noreferrer nofollow" class="social">FB</a>
<br />
<a class="message" href="../../../../Scratch/en/blog/Social-link-the-right-way/">These social sharing links preserve your privacy</a>
</div>
<div id="navigation">
<a href="../../../../">Home</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/blog">Blog</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/softwares">Softwares</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/about">About</a>
</div>
<div id="totop"><a href="#header">↑ Top ↑</a></div>
<div id="bottom">
<div>
Published on 2010-05-24
</div>
<div>
<a href="https://twitter.com/yogsototh">Follow @yogsototh</a>
</div>
<div>
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Yann Esposito©</a>
</div>
<div>
Done with
<a href="http://www.vim.org" target="_blank" rel="noopener noreferrer nofollow"><strike>Vim</strike></a>
<a href="http://spacemacs.org" target="_blank" rel="noopener noreferrer nofollow">spacemacs</a>
<span class="pala">&amp;</span>
<a href="http://nanoc.ws" target="_blank" rel="noopener noreferrer nofollow"><strike>nanoc</strike></a>
<a href="http://jaspervdj.be/hakyll" target="_blank" rel="noopener noreferrer nofollow">Hakyll</a>
</div>
<hr />
<div style="max-width: 100%">
<a href="https://cardanohub.org">
<img src="../../../../Scratch/img/ada-logo.png" class="simple" style="height: 16px;
border-radius: 50%;
vertical-align:middle;
display:inline-block;" />
ADA:
</a>
<code style="display:inline-block;
word-wrap:break-word;
text-align: left;
vertical-align: top;
max-width: 85%;">
DdzFFzCqrhtAvdkmATx5Fm8NPJViDy85ZBw13p4XcNzVzvQg8e3vWLXq23JQWFxPEXK6Kvhaxxe7oJt4VMYHxpA2vtCFiP8fziohN6Yp
</code>
</div>
</div>
</div>
</div>
</div>
</body>
</html>