<title>YBlog - When regexp is not the best solution</title>
<h1>When regexp is not the best solution</h1>
<p>Regular expressions are really useful. Unfortunately, they are not always the best tool, particularly when the transformation you want to make is simple.</p>
<p>I wanted to find the fastest way to get a file extension from a filename. There are three natural ways of doing this:</p>
<pre><code class="ruby"># regexp
str.match(/[^.]*$/)
ext=$&amp;

# split
ext=str.split('.')[-1]

# File module
ext=File.extname(str)
</code></pre>
<p>At first I expected the regexp to be faster than split, because a filename can contain many <code>.</code> characters. In practice, most filenames contain only one dot, so I realized split would be faster. But split is not the fastest way either: the <code>File</code> module has a function dedicated to exactly this job.</p>
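<p>To see what each approach actually returns, here is a small sketch that wraps the three one-liners in helper methods. The method names <code>ext_regexp</code>, <code>ext_split</code> and <code>ext_file</code> are hypothetical, chosen only for this illustration. Note that the regexp and split versions return the extension without its dot, while <code>File.extname</code> keeps the leading dot.</p>

```ruby
# Hypothetical helper names: each one wraps one of the three approaches.

# Regexp: the run of non-dot characters at the end of the string.
# [^.]*$ can always match (possibly empty), so match never returns nil here.
def ext_regexp(str)
  str.match(/[^.]*$/)[0]
end

# split: whatever follows the last '.'.
def ext_split(str)
  str.split('.')[-1]
end

# File module: the dedicated function, which keeps the leading dot.
def ext_file(str)
  File.extname(str)
end

['/accounts/user.json', '/user/titi/blog/toto.xml'].each do |path|
  puts [path, ext_regexp(path), ext_split(path), ext_file(path)].join('  ')
end
```

<p>On <code>/accounts/user.json</code> the three helpers return <code>"json"</code>, <code>"json"</code> and <code>".json"</code>, so code that switches from one approach to another has to account for the dot.</p>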
<p>Here is the Ruby benchmark code:</p>
<pre><code class="ruby" file="regex_benchmark_ext.rb">#!/usr/bin/env ruby
require 'benchmark'

n=80000
tab=[ '/accounts/user.json',
      '/accounts/user.xml',
      '/user/titi/blog/toto.json',
      '/user/titi/blog/toto.xml' ]

puts "Get extname"
# Benchmark.bm(7): width 7 keeps the labels aligned with the timing columns
Benchmark.bm(7) do |x|
  x.report("regexp:") { n.times do str=tab[rand(4)]; str.match(/[^.]*$/); ext=$&amp;; end }
  x.report("split:")  { n.times do str=tab[rand(4)]; ext=str.split('.')[-1]; end }
  x.report("File:")   { n.times do str=tab[rand(4)]; ext=File.extname(str); end }
end
</code></pre>
<p>And here are the results:</p>
<pre class="twilight">
Get extname
              user     system      total        real
regexp:   2.550000   0.020000   2.570000 (  2.693407)
split:    1.080000   0.050000   1.130000 (  1.190408)
File:     0.640000   0.030000   0.670000 (  0.717748)
</pre>
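<p>Conclusion of this benchmark: dedicated functions are usually better than hand-rolled solutions. Here <code>File.extname</code> needs about a quarter of the regexp's CPU time (0.67 s against 2.57 s) and about 60% of split's (1.13 s).</p>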
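<p>Speed is not the only argument for the dedicated function: it also handles inputs that the "take what follows the last dot" rule gets wrong. The sketch below uses made-up paths (not from the benchmark) and a hypothetical helper <code>naive_ext</code> that is just the split version; the regexp version behaves the same way on these inputs.</p>

```ruby
# Hypothetical helper: the split approach from the benchmark,
# i.e. keep whatever follows the last '.'.
def naive_ext(path)
  path.split('.')[-1]
end

# Made-up paths chosen to trip up the naive rule.
paths = [
  '/accounts/user',       # no extension at all
  '/home/user.d/README',  # the only dot is in a directory name
  '/home/user/.bashrc'    # hidden file: the leading dot is not an extension
]

paths.each do |path|
  puts "#{path}: split=#{naive_ext(path).inspect} File.extname=#{File.extname(path).inspect}"
end
```

<p>The split version returns <code>"/accounts/user"</code>, <code>"d/README"</code> and <code>"bashrc"</code>, while <code>File.extname</code> returns an empty string in all three cases, because it only looks at the last path component and ignores a leading dot.</p>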
<h2 id="file-path-without-the-extension.">File path without the extension</h2>
<pre><code class="ruby" file="regex_benchmark_strip.rb">#!/usr/bin/env ruby
require 'benchmark'

n=80000
tab=[ '/accounts/user.json',
      '/accounts/user.xml',
      '/user/titi/blog/toto.json',
      '/user/titi/blog/toto.xml' ]

puts "remove extension"
Benchmark.bm(7) do |x|
  # Caveat: str is absolute, so File.expand_path ignores its second
  # argument and returns str unchanged (extension included).
  x.report("File:")  { n.times do str=tab[rand(4)]; path=File.expand_path(str,File.basename(str,File.extname(str))); end }
  x.report("chomp:") { n.times do str=tab[rand(4)]; ext=File.extname(str); path=str.chomp(ext); end }
end
</code></pre>
<p>And here are the results:</p>
<pre class="twilight">
remove extension
             user     system      total        real
File:    0.970000   0.060000   1.030000 (  1.081398)
chomp:   0.820000   0.040000   0.860000 (  0.947432)
</pre>
<p>Conclusion of the second benchmark: one simple <code>chomp</code> (after a single <code>File.extname</code> call) is faster than chaining three <code>File</code> functions. No surprise, but it is good to know.</p>
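<p>One caveat about the <code>File</code> line in this benchmark: <code>File.expand_path</code> ignores its second argument when the first one is already absolute, so on these inputs the extension is never removed. A version built only from <code>File</code> functions that really strips the extension could look like the sketch below. The helper names <code>strip_ext_file</code> and <code>strip_ext_chomp</code> are hypothetical, and <code>strip_ext_file</code> has not been benchmarked here.</p>

```ruby
# Hypothetical helper: strip the extension using only File module functions.
def strip_ext_file(str)
  dir  = File.dirname(str)                      # "/user/titi/blog"
  base = File.basename(str, File.extname(str))  # "toto"
  File.join(dir, base)                          # "/user/titi/blog/toto"
end

# Hypothetical helper: the chomp approach from the benchmark.
def strip_ext_chomp(str)
  str.chomp(File.extname(str))
end

path = '/user/titi/blog/toto.json'
p File.expand_path(path, File.basename(path, File.extname(path)))  # still ends in .json
p strip_ext_file(path)
p strip_ext_chomp(path)
```

<p>Both helpers give <code>"/user/titi/blog/toto"</code> for this path. They differ on a bare filename such as <code>user.json</code>: <code>File.dirname</code> returns <code>"."</code>, so <code>strip_ext_file</code> gives <code>"./user"</code> where <code>chomp</code> gives <code>"user"</code>.</p>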
Published on 2010-02-23
