her.esy.fun/src/Scratch/en/blog/2010-02-23-When-regexp-is-n.../index.html

143 lines
8.1 KiB
HTML
Raw Normal View History

2021-04-18 10:23:24 +00:00
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>YBlog - When regexp is not the best solution</title>
<meta name="keywords" content="programming, regexp, regular expression, extension, file" />
<link rel="shortcut icon" type="image/x-icon" href="../../../../Scratch/img/favicon.ico" />
2021-05-25 20:25:47 +00:00
<link rel="stylesheet" type="text/css" href="/css/y.css" />
<link rel="stylesheet" type="text/css" href="/css/legacy.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />
2021-04-18 10:23:24 +00:00
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="apple-touch-icon" href="../../../../Scratch/img/about/FlatAvatar@2x.png" />
<!--[if lt IE 9]>
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
<![endif]-->
<!-- IndieAuth -->
<link href="https://twitter.com/yogsototh" rel="me">
<link href="https://github.com/yogsototh" rel="me">
<link href="mailto:yann.esposito@gmail.com" rel="me">
<link rel="pgpkey" href="../../../../pubkey.txt">
</head>
<body lang="en" class="article">
<div id="content">
<div id="header">
<div id="choix">
<span id="choixlang">
<a href="../../../../Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/">French</a>
</span>
<span class="tomenu"><a href="#navigation">↓ Menu ↓</a></span>
<span class="flush"></span>
</div>
</div>
<div id="titre">
<h1>When regexp is not the best solution</h1>
</div>
<div class="flush"></div>
<div id="afterheader" class="article">
<div class="corps">
<p>Regular expression are really useful. Unfortunately, they are not always the best way of doing things. Particularly when transformations you want to make are easy.</p>
<p>I wanted to know how to get file extension from filename the fastest way possible. There is 3 natural way of doing this:</p>
<div>
<p><code class="ruby"> # regexp str.match(/[^.]*<span class="math inline">/);<em>e</em><em>x</em><em>t</em>=</span>&amp;</p>
<h1 id="split">split</h1>
<p>ext=str.split(.)[-1]</p>
<h1 id="file-module">File module</h1>
ext=File.extname(str) </code>
</div>
<p>At first sight I believed that the regexp should be faster than the split because it could be many <code>.</code> in a filename. But in reality, most of time there is only one dot and I realized the split will be faster. But not the fastest way. There is a function dedicated to this work in the <code>File</code> module.</p>
<p>Here is the Benchmark ruby code:</p>
<div>
<p><code class="ruby" file="regex_benchmark_ext.rb"> #!/usr/bin/env ruby require benchmark n=80000 tab=[ /accounts/user.json, /accounts/user.xml, /user/titi/blog/toto.json, /user/titi/blog/toto.xml ]</p>
puts “Get extname” Benchmark.bm do |x| x.report(“regexp:”) { n.times do str=tab[rand(4)]; str.match(/[^.]*<span class="math inline">/);<em>e</em><em>x</em><em>t</em>=</span>&amp;; end } x.report(" split:“) { n.times do str=tab[rand(4)]; ext=str.split(.)[-1] ; end } x.report(” File:") { n.times do str=tab[rand(4)]; ext=File.extname(str); end } end </code>
</div>
<p>And here is the result</p>
<pre class="twilight">
Get extname
user system total real
regexp: 2.550000 0.020000 2.570000 ( 2.693407)
split: 1.080000 0.050000 1.130000 ( 1.190408)
File: 0.640000 0.030000 0.670000 ( 0.717748)
</pre>
<p>Conclusion of this benchmark, dedicated function are better than your way of doing stuff (most of time).</p>
<h2 id="file-path-without-the-extension.">file path without the extension.</h2>
<div>
<p><code class="ruby" file="regex_benchmark_strip.rb"> #!/usr/bin/env ruby require benchmark n=80000 tab=[ /accounts/user.json, /accounts/user.xml, /user/titi/blog/toto.json, /user/titi/blog/toto.xml ]</p>
puts “remove extension” Benchmark.bm do |x| x.report(" File:“) { n.times do str=tab[rand(4)]; path=File.expand_path(str,File.basename(str,File.extname(str))); end } x.report(”chomp:") { n.times do str=tab[rand(4)]; ext=File.extname(str); path=str.chomp(ext); end } end </code>
</div>
<p>and here is the result:</p>
<pre class="twilight">
remove extension
user system total real
File: 0.970000 0.060000 1.030000 ( 1.081398)
chomp: 0.820000 0.040000 0.860000 ( 0.947432)
</pre>
<p>Conclusion of the second benchmark. One simple function is better than three dedicated functions. No surprise, but it is good to know.</p>
</div>
<div id="afterarticle">
<div id="social">
2021-05-25 20:25:47 +00:00
<a href="/rss.xml" target="_blank" rel="noopener noreferrer nofollow" class="social">RSS</a>
2021-04-18 10:23:24 +00:00
·
<a href="https://twitter.com/home?status=http%3A%2F%2Fyannesposito.com/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/%20via%20@yogsototh" target="_blank" rel="noopener noreferrer nofollow" class="social">Tweet</a>
·
<a href="http://www.facebook.com/sharer/sharer.php?u=http%3A%2F%2Fyannesposito.com/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/" target="_blank" rel="noopener noreferrer nofollow" class="social">FB</a>
<br />
<a class="message" href="../../../../Scratch/en/blog/Social-link-the-right-way/">These social sharing links preserve your privacy</a>
</div>
<div id="navigation">
<a href="../../../../">Home</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/blog">Blog</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/softwares">Softwares</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/about">About</a>
</div>
<div id="totop"><a href="#header">↑ Top ↑</a></div>
<div id="bottom">
<div>
Published on 2010-02-23
</div>
<div>
<a href="https://twitter.com/yogsototh">Follow @yogsototh</a>
</div>
<div>
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Yann Esposito©</a>
</div>
<div>
Done with
<a href="http://www.vim.org" target="_blank" rel="noopener noreferrer nofollow"><strike>Vim</strike></a>
<a href="http://spacemacs.org" target="_blank" rel="noopener noreferrer nofollow">spacemacs</a>
<span class="pala">&amp;</span>
<a href="http://nanoc.ws" target="_blank" rel="noopener noreferrer nofollow"><strike>nanoc</strike></a>
<a href="http://jaspervdj.be/hakyll" target="_blank" rel="noopener noreferrer nofollow">Hakyll</a>
</div>
<hr />
<div style="max-width: 100%">
<a href="https://cardanohub.org">
<img src="../../../../Scratch/img/ada-logo.png" class="simple" style="height: 16px;
border-radius: 50%;
vertical-align:middle;
display:inline-block;" />
ADA:
</a>
<code style="display:inline-block;
word-wrap:break-word;
text-align: left;
vertical-align: top;
max-width: 85%;">
DdzFFzCqrhtAvdkmATx5Fm8NPJViDy85ZBw13p4XcNzVzvQg8e3vWLXq23JQWFxPEXK6Kvhaxxe7oJt4VMYHxpA2vtCFiP8fziohN6Yp
</code>
</div>
</div>
</div>
</div>
</div>
</body>
</html>