<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>YBlog - Trees; Pragmatism and Formalism</title>
<meta name="keywords" content="XML, Perl, programming, tree, theory, mathematics, regexp, script" />
<link rel="shortcut icon" type="image/x-icon" href="../../../../Scratch/img/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/css/y.css" />
<link rel="stylesheet" type="text/css" href="/css/legacy.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="apple-touch-icon" href="../../../../Scratch/img/about/FlatAvatar@2x.png" />
<!--[if lt IE 9]>
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
<![endif]-->
<!-- IndieAuth -->
<link href="https://twitter.com/yogsototh" rel="me">
<link href="https://github.com/yogsototh" rel="me">
<link href="mailto:yann.esposito@gmail.com" rel="me">
<link rel="pgpkey" href="../../../../pubkey.txt">
</head>
<body lang="en" class="article">
<div id="content">
<div id="header">
<div id="choix">
<span id="choixlang">
<a href="../../../../Scratch/fr/blog/2010-05-24-Trees--Pragmatism-and-Formalism/">French</a>
</span>
<span class="tomenu"><a href="#navigation">↓ Menu ↓</a></span>
<span class="flush"></span>
</div>
</div>
<div id="titre">
<h1>Trees; Pragmatism and Formalism</h1>
<h2>When theory is more efficient than practice</h2>
</div>
<div class="flush"></div>
<div id="afterheader" class="article">
<div class="corps">
<div class="intro">
<p><span class="sc"><abbr title="Too long; didn't read">tl;dr</abbr>: </span></p>
<ul>
<li>I tried to program a simple filter</li>
<li>Was stuck for two days</li>
<li>Then stopped working like an engineer monkey</li>
<li>Used a pen and a sheet of paper</li>
<li>Did some math</li>
<li>Crushed the problem in 10 minutes</li>
<li>Conclusion: pragmatism shouldn't mean “never use theory”.</li>
</ul>
</div>
<h2 id="abstract-longer-than-tldr">Abstract (longer than <span class="sc"><abbr title="Too long; didn't read">tl;dr</abbr></span>)</h2>
<p>For my job, I needed to solve a problem. At first it did not seem too hard, so I started working directly on my program. I entered the <em>infernal</em> <em>try &amp; repair loop</em>. Each step was like:</p>
<blockquote>
<p>Just this thing to fix and it should be done.<br />
OK, now that should just work.<br />
Yeah!!!<br />
Oops! I forgot that…<br />
<code>repeat until death</code></p>
</blockquote>
<p>After two days of this <a href="http://fr.wikipedia.org/wiki/Sisyphe">Sisyphean</a> work, I finally stopped to rethink the problem. I took a pen and a sheet of paper. I simplified the problem and recalled what I had learned about trees during my Ph.D. Finally, the problem was crushed in less than 20 minutes.</p>
<p>I believe the important lesson is that the most efficient way to solve this <em>pragmatic</em> problem was the <em>theoretical</em> one. Therefore, arguments that oppose science and theory to pragmatism and efficiency are fallacies.</p>
<hr />
<h1 id="first-my-experience">First: my experience</h1>
<p>Apparently, 90% of programmers are unable to write a binary search without bugs. The algorithm is well known and easy to understand, yet it is difficult to implement without any flaw. I participated in <a href="http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/">this contest</a>, and you can see the <a href="http://reprog.wordpress.com/2010/04/21/binary-search-redux-part-1/">results here</a><a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a>. I had to face a problem of the same kind at my job. The problem looked simple at the start: transform an <sc>xml</sc> document from one format to another.</p>
<p>The source <sc>xml</sc> was in the following general format:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb1-1" title="1"><span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-2" title="2"> <span class="kw">&lt;contenu&gt;</span></a>
<a class="sourceLine" id="cb1-3" title="3"> <span class="kw">&lt;tag1&gt;</span>value1<span class="kw">&lt;/tag1&gt;</span></a>
<a class="sourceLine" id="cb1-4" title="4"> <span class="kw">&lt;tag2&gt;</span>value2<span class="kw">&lt;/tag2&gt;</span></a>
<a class="sourceLine" id="cb1-5" title="5"> ...</a>
<a class="sourceLine" id="cb1-6" title="6"> <span class="kw">&lt;/contenu&gt;</span></a>
<a class="sourceLine" id="cb1-7" title="7"> <span class="kw">&lt;enfant&gt;</span></a>
<a class="sourceLine" id="cb1-8" title="8"> <span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-9" title="9"> ...</a>
<a class="sourceLine" id="cb1-10" title="10"> <span class="kw">&lt;/rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-11" title="11"> ...</a>
<a class="sourceLine" id="cb1-12" title="12"> <span class="kw">&lt;rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-13" title="13"> ...</a>
<a class="sourceLine" id="cb1-14" title="14"> <span class="kw">&lt;/rubrique&gt;</span></a>
<a class="sourceLine" id="cb1-15" title="15"> <span class="kw">&lt;/enfant&gt;</span></a>
<a class="sourceLine" id="cb1-16" title="16"><span class="kw">&lt;/rubrique&gt;</span></a></code></pre></div>
<p>and the destination was in the following general format:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb2-1" title="1"><span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;Menu0&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-2" title="2"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-3" title="3"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;menu&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-4" title="4"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-5" title="5"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;tag1&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-6" title="6"> <span class="kw">&lt;value&gt;</span>value1<span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-7" title="7"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-8" title="8"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;tag2&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-9" title="9"> <span class="kw">&lt;value&gt;</span>value2<span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-10" title="10"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-11" title="11"> ...</a>
<a class="sourceLine" id="cb2-12" title="12"> <span class="kw">&lt;item</span><span class="ot"> name=</span><span class="st">&quot;menu&quot;</span><span class="kw">&gt;</span></a>
<a class="sourceLine" id="cb2-13" title="13"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-14" title="14"> ...</a>
<a class="sourceLine" id="cb2-15" title="15"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-16" title="16"> <span class="kw">&lt;value&gt;</span></a>
<a class="sourceLine" id="cb2-17" title="17"> ...</a>
<a class="sourceLine" id="cb2-18" title="18"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-19" title="19"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-20" title="20"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-21" title="21"> <span class="kw">&lt;/item&gt;</span></a>
<a class="sourceLine" id="cb2-22" title="22"> <span class="kw">&lt;/value&gt;</span></a>
<a class="sourceLine" id="cb2-23" title="23"><span class="kw">&lt;/item&gt;</span></a></code></pre></div>
<p>At first sight I believed it would be easy. I was so certain of it that I set myself the following rules:</p>
<ol type="1">
<li>do not use <sc>xslt</sc></li>
<li>avoid the use of an <sc>xml</sc> parser</li>
<li>solve the problem using a simple Perl script</li>
</ol>
<p>You can try if you want. If you attack the problem directly by opening an editor, I assure you it will not be so simple. I can tell, because that is what I did. I must say I lost almost a complete day at work trying to solve it. There were also many small side problems, which made me lose more than two days on this problem.</p>
<p>Why, after two days, was I unable to solve a problem that seemed so simple?</p>
<p>What was my workflow?</p>
<ol type="1">
<li>Think</li>
<li>Write the program</li>
<li>Try the program</li>
<li>Verify the result</li>
<li>Find a bug</li>
<li>Fix the bug</li>
<li>Go to step 3.</li>
</ol>
<p>This is a <em>standard</em> workflow for a computer engineer. The flaw came from the first step. I thought about how to solve the problem, but with the eyes of a <em>pragmatic engineer</em>. I was saying:</p>
<blockquote>
<p>That should be a simple Perl search and replace program.<br />
Let's begin to write code</p>
</blockquote>
<p>It is the second sentence that was plainly wrong. I started in the wrong direction, and the workflow did not work from this entry point.</p>
<h2 id="think">Think</h2>
<p>After some time, I just stopped working and told myself: <em>“That is enough. Now I must finish it!”</em> I took a sheet of paper and a pen, and began to draw some trees.</p>
<p>I began by removing most of the verbosity. For example, I first renamed <code>&lt;item name="Menu"&gt;</code> to the simpler name <code>M</code>. I obtained something like:</p>
<p><img src="code/The_source_tree.png" alt="The source tree" /></p>
<p>and</p>
<p><img src="code/The_destination_tree.png" alt="The destination tree" /></p>
<p>Then I made the following reflection:</p>
<p>Considering Tree Edit Distance, each unitary transformation of a tree corresponds to a simple search and replace on my <sc>xml</sc> source<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a>. We consider three atomic transformations on trees:</p>
<ul>
<li><em>substitution</em>: renaming a node</li>
<li><em>insertion</em>: adding a node</li>
<li><em>deletion</em>: removing a node</li>
</ul>
<p>One particularity of atomic transformations on trees is that if you remove a node, all children of this node become children of its parent.</p>
<p>An example:</p>
<pre class="twilight">
r - x - a
 \   \
  \   b
   y - c
</pre>
<p>If you delete the <code>x</code> node, you obtain</p>
<pre class="twilight">
    a
   /
r - b
 \
  y - c
</pre>
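<p>As an illustration only, the deletion rule can be sketched in Perl on an in-memory tree. This is a minimal sketch, not the code used at work: the node layout (a hash with <code>label</code> and <code>children</code>) and the helper names <code>node</code>, <code>delete_label</code> and <code>show</code> are assumptions made for this example.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use strict;
use warnings;

# A node is a hash reference: { label =&gt; 'r', children =&gt; [ ... ] }.
# node() is a hypothetical helper, used only to build the example.
sub node {
    my ($label, @children) = @_;
    return { label =&gt; $label, children =&gt; [@children] };
}

# Delete every node labelled $label below $tree.
# The children of a deleted node take its place in the parent's child list.
# The root itself is never deleted in this sketch.
sub delete_label {
    my ($tree, $label) = @_;
    my @kept;
    for my $child (@{ $tree-&gt;{children} }) {
        delete_label($child, $label);                 # clean the subtree first
        if ($child-&gt;{label} eq $label) {
            push @kept, @{ $child-&gt;{children} };      # promote the grandchildren
        } else {
            push @kept, $child;
        }
    }
    $tree-&gt;{children} = \@kept;
    return $tree;
}

# Render a tree as nested parentheses, e.g. r(x(a b) y(c)).
sub show {
    my ($tree) = @_;
    my @kids = map { show($_) } @{ $tree-&gt;{children} };
    return $tree-&gt;{label} . (@kids ? '(' . join(' ', @kids) . ')' : '');
}

my $tree = node('r', node('x', node('a'), node('b')), node('y', node('c')));
print show($tree), "\n";                      # r(x(a b) y(c))
print show(delete_label($tree, 'x')), "\n";   # r(a b y(c))
</code></pre></div>
<p>The nodes <code>a</code> and <code>b</code> end up directly under <code>r</code>, in the position <code>x</code> used to occupy, which is the behaviour drawn in the two trees above.</p>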
<p>And look at what it implies when you write it in <sc>xml</sc>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode xml"><code class="sourceCode xml"><a class="sourceLine" id="cb3-1" title="1"><span class="kw">&lt;r&gt;</span></a>
<a class="sourceLine" id="cb3-2" title="2"> <span class="kw">&lt;x&gt;</span></a>
<a class="sourceLine" id="cb3-3" title="3"> <span class="kw">&lt;a&gt;</span>value for a<span class="kw">&lt;/a&gt;</span></a>
<a class="sourceLine" id="cb3-4" title="4"> <span class="kw">&lt;b&gt;</span>value for b<span class="kw">&lt;/b&gt;</span></a>
<a class="sourceLine" id="cb3-5" title="5"> <span class="kw">&lt;/x&gt;</span></a>
<a class="sourceLine" id="cb3-6" title="6"> <span class="kw">&lt;y&gt;</span></a>
<a class="sourceLine" id="cb3-7" title="7"> <span class="kw">&lt;c&gt;</span>value for c<span class="kw">&lt;/c&gt;</span></a>
<a class="sourceLine" id="cb3-8" title="8"> <span class="kw">&lt;/y&gt;</span></a>
<a class="sourceLine" id="cb3-9" title="9"><span class="kw">&lt;/r&gt;</span></a></code></pre></div>
<p>Then deleting all <code>x</code> nodes is equivalent to passing the <sc>xml</sc> through the following search and replace script:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb4-1" title="1"><span class="kw">s/</span><span class="ot">&lt;\/</span><span class="ch">?</span><span class="ot">x&gt;</span><span class="kw">//g</span></a></code></pre></div>
<p>Therefore, if there exists a one-state deterministic transducer that transforms my trees, I can transform the <sc>xml</sc> from one format to another with just a simple list of search and replace directives.</p>
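<p>One way to picture this correspondence is a small rule table turned into substitutions. The following is a hedged sketch under simplifying assumptions: tags carry no attributes, the rules are applied one after another (so no new name may also appear as an old name), and <code>apply_rules</code> is a name invented for this example.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use strict;
use warnings;

# Apply a one-state relabelling to XML text.
# %$rules maps a tag name to its new name, or to undef to delete the node.
# Assumption: tags have no attributes, and no new name is also an old name.
sub apply_rules {
    my ($xml, $rules) = @_;
    for my $old (sort keys %$rules) {
        my $new = $rules-&gt;{$old};
        if (defined $new) {
            $xml =~ s{&lt;(/?)\Q$old\E&gt;}{&lt;$1$new&gt;}g;   # rename opening and closing tags
        } else {
            $xml =~ s{&lt;/?\Q$old\E&gt;}{}g;             # drop the tags, keep the children
        }
    }
    return $xml;
}

my $xml = '&lt;r&gt;&lt;x&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;/x&gt;&lt;y&gt;&lt;c&gt;3&lt;/c&gt;&lt;/y&gt;&lt;/r&gt;';
print apply_rules($xml, { x =&gt; undef, y =&gt; 'z' }), "\n";
# &lt;r&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;z&gt;&lt;c&gt;3&lt;/c&gt;&lt;/z&gt;&lt;/r&gt;
</code></pre></div>
<p>Because the transducer has a single state, each rule depends only on the label of the node and not on where the node sits in the tree. That is why a plain textual substitution is enough.</p>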
<h1 id="solution">Solution</h1>
<p>Transforming this tree:</p>
<pre class="twilight">
R - C - tag1
 \   \
  \   tag2
   E -- R - C - tag1
    \    \   \
     \    \   tag2
      \    E ...
       R - C - tag1
        \   \
         \   tag2
          E ...
</pre>
<p>to this tree:</p>
<pre class="twilight">
                 tag1
                /
M - V - M - V - tag2     tag1
             \         /
              M --- V - tag2
               \     \
                \     M
                 \      tag1
                  \   /
                   V - tag2
                    \
                     M
</pre>
<p>can be done using the following one-state deterministic tree transducer:</p>
<blockquote>
<p>C -&gt; ε<br />
E -&gt; M<br />
R -&gt; V</p>
</blockquote>
<p>Which can be translated into the following simple search and replace directives:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb5-1" title="1"><span class="kw">s/</span><span class="ot">C</span><span class="kw">//g</span></a>
<a class="sourceLine" id="cb5-2" title="2"><span class="kw">s/</span><span class="ot">E</span><span class="kw">/</span><span class="st">M</span><span class="kw">/g</span></a>
<a class="sourceLine" id="cb5-3" title="3"><span class="kw">s/</span><span class="ot">R</span><span class="kw">/</span><span class="st">V</span><span class="kw">/g</span></a></code></pre></div>
<p>Once adapted to <sc>xml</sc> it becomes:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><a class="sourceLine" id="cb6-1" title="1"><span class="kw">s%</span><span class="ot">&lt;/</span><span class="ch">?</span><span class="ot">contenu&gt;</span><span class="kw">%%g</span></a>
<a class="sourceLine" id="cb6-2" title="2"><span class="kw">s%</span><span class="ot">&lt;enfant&gt;</span><span class="kw">%</span><span class="st">&lt;item name=&quot;menu&quot;&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-3" title="3"><span class="kw">s%</span><span class="ot">&lt;/enfant&gt;</span><span class="kw">%</span><span class="st">&lt;/item&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-4" title="4"><span class="kw">s%</span><span class="ot">&lt;rubrique&gt;</span><span class="kw">%</span><span class="st">&lt;value&gt;</span><span class="kw">%g</span></a>
<a class="sourceLine" id="cb6-5" title="5"><span class="kw">s%</span><span class="ot">&lt;/rubrique&gt;</span><span class="kw">%</span><span class="st">&lt;/value&gt;</span><span class="kw">%g</span></a></code></pre></div>
<p>That is all.</p>
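<p>For readers who want to try the five directives, here is a minimal runnable sketch that wraps them in a function and applies them to a one-line sample. The function name <code>transform</code> and the sample document are assumptions made for this example. Leaf tags such as <code>tag1</code> are left untouched by these five directives.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use strict;
use warnings;

# Apply the five search and replace directives to a string of XML.
sub transform {
    my ($xml) = @_;
    $xml =~ s%&lt;/?contenu&gt;%%g;                     # C -&gt; epsilon: delete the node
    $xml =~ s%&lt;enfant&gt;%&lt;item name="menu"&gt;%g;      # E -&gt; M (opening tag)
    $xml =~ s%&lt;/enfant&gt;%&lt;/item&gt;%g;                # E -&gt; M (closing tag)
    $xml =~ s%&lt;rubrique&gt;%&lt;value&gt;%g;               # R -&gt; V (opening tag)
    $xml =~ s%&lt;/rubrique&gt;%&lt;/value&gt;%g;             # R -&gt; V (closing tag)
    return $xml;
}

# Hypothetical sample in the source format, written on one line.
my $sample = '&lt;rubrique&gt;&lt;contenu&gt;&lt;tag1&gt;value1&lt;/tag1&gt;&lt;/contenu&gt;'
           . '&lt;enfant&gt;&lt;rubrique&gt;&lt;contenu&gt;&lt;tag1&gt;value2&lt;/tag1&gt;&lt;/contenu&gt;&lt;/rubrique&gt;&lt;/enfant&gt;'
           . '&lt;/rubrique&gt;';

print transform($sample), "\n";
# &lt;value&gt;&lt;tag1&gt;value1&lt;/tag1&gt;&lt;item name="menu"&gt;&lt;value&gt;&lt;tag1&gt;value2&lt;/tag1&gt;&lt;/value&gt;&lt;/item&gt;&lt;/value&gt;
</code></pre></div>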
<h1 id="conclusion">Conclusion</h1>
<p>It may seem a bit paradoxical, but sometimes the most efficient approach to a pragmatic problem is to use the theoretical methodology.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Fortunately, I am among the 10% who submitted a bug-free implementation.<a href="#fnref1" class="footnote-back"></a></p></li>
<li id="fn2"><p>I wrote a program that automatically generates, from data, the weight matrix for each edit distance.<a href="#fnref2" class="footnote-back"></a></p></li>
</ol>
</section>
</div>
<div id="afterarticle">
<div id="navigation">
<a href="../../../../">Home</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/blog">Blog</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/softwares">Softwares</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/about">About</a>
</div>
<div id="totop"><a href="#header">↑ Top ↑</a></div>
<div id="bottom">
<div>
Published on 2010-05-24
</div>
<div>
<a href="https://twitter.com/yogsototh">Follow @yogsototh</a>
</div>
<div>
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Yann Esposito©</a>
</div>
<div>
Done with
<a href="http://www.vim.org" target="_blank" rel="noopener noreferrer nofollow"><strike>Vim</strike></a>
<a href="http://spacemacs.org" target="_blank" rel="noopener noreferrer nofollow">spacemacs</a>
<span class="pala">&amp;</span>
<a href="http://nanoc.ws" target="_blank" rel="noopener noreferrer nofollow"><strike>nanoc</strike></a>
<a href="http://jaspervdj.be/hakyll" target="_blank" rel="noopener noreferrer nofollow">Hakyll</a>
</div>
</div>
</div>
</div>
</div>
</body>
</html>