143 lines
8.1 KiB
HTML
143 lines
8.1 KiB
HTML
<!DOCTYPE html>
|
||
<html lang="en">
|
||
<head>
|
||
<meta charset="utf-8">
|
||
<title>YBlog - When regexp is not the best solution</title>
|
||
<meta name="keywords" content="programming, regexp, regular expression, extension, file" />
|
||
|
||
<link rel="shortcut icon" type="image/x-icon" href="../../../../Scratch/img/favicon.ico" />
|
||
<link rel="stylesheet" type="text/css" href="/css/y.css" />
|
||
<link rel="stylesheet" type="text/css" href="/css/legacy.css" />
|
||
<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||
<link rel="apple-touch-icon" href="../../../../Scratch/img/about/FlatAvatar@2x.png" />
|
||
<!--[if lt IE 9]>
|
||
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
|
||
<![endif]-->
|
||
<!-- IndieAuth -->
|
||
<link href="https://twitter.com/yogsototh" rel="me">
|
||
<link href="https://github.com/yogsototh" rel="me">
|
||
<link href="mailto:yann.esposito@gmail.com" rel="me">
|
||
<link rel="pgpkey" href="../../../../pubkey.txt">
|
||
</head>
|
||
<body lang="en" class="article">
|
||
<div id="content">
|
||
<div id="header">
|
||
<div id="choix">
|
||
<span id="choixlang">
|
||
<a href="../../../../Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/">French</a>
|
||
</span>
|
||
<span class="tomenu"><a href="#navigation">↓ Menu ↓</a></span>
|
||
<span class="flush"></span>
|
||
</div>
|
||
</div>
|
||
|
||
<div id="titre">
|
||
<h1>When regexp is not the best solution</h1>
|
||
|
||
</div>
|
||
<div class="flush"></div>
|
||
<div id="afterheader" class="article">
|
||
<div class="corps">
|
||
<p>Regular expression are really useful. Unfortunately, they are not always the best way of doing things. Particularly when transformations you want to make are easy.</p>
|
||
<p>I wanted to know how to get file extension from filename the fastest way possible. There is 3 natural way of doing this:</p>
|
||
<div>
|
||
<p><code class="ruby"> # regexp str.match(/[^.]*<span class="math inline">/); <em>e</em><em>x</em><em>t</em>=</span>&</p>
|
||
<h1 id="split">split</h1>
|
||
<p>ext=str.split(‘.’)[-1]</p>
|
||
<h1 id="file-module">File module</h1>
|
||
ext=File.extname(str) </code>
|
||
</div>
|
||
<p>At first sight I believed that the regexp should be faster than the split because it could be many <code>.</code> in a filename. But in reality, most of time there is only one dot and I realized the split will be faster. But not the fastest way. There is a function dedicated to this work in the <code>File</code> module.</p>
|
||
<p>Here is the Benchmark ruby code:</p>
|
||
<div>
|
||
<p><code class="ruby" file="regex_benchmark_ext.rb"> #!/usr/bin/env ruby require ‘benchmark’ n=80000 tab=[ ‘/accounts/user.json’, ‘/accounts/user.xml’, ‘/user/titi/blog/toto.json’, ‘/user/titi/blog/toto.xml’ ]</p>
|
||
puts “Get extname” Benchmark.bm do |x| x.report(“regexp:”) { n.times do str=tab[rand(4)]; str.match(/[^.]*<span class="math inline">/); <em>e</em><em>x</em><em>t</em>=</span>&; end } x.report(" split:“) { n.times do str=tab[rand(4)]; ext=str.split(‘.’)[-1] ; end } x.report(” File:") { n.times do str=tab[rand(4)]; ext=File.extname(str); end } end </code>
|
||
</div>
|
||
<p>And here is the result</p>
|
||
<pre class="twilight">
|
||
Get extname
|
||
user system total real
|
||
regexp: 2.550000 0.020000 2.570000 ( 2.693407)
|
||
split: 1.080000 0.050000 1.130000 ( 1.190408)
|
||
File: 0.640000 0.030000 0.670000 ( 0.717748)
|
||
</pre>
|
||
<p>Conclusion of this benchmark, dedicated function are better than your way of doing stuff (most of time).</p>
|
||
<h2 id="file-path-without-the-extension.">file path without the extension.</h2>
|
||
<div>
|
||
<p><code class="ruby" file="regex_benchmark_strip.rb"> #!/usr/bin/env ruby require ‘benchmark’ n=80000 tab=[ ‘/accounts/user.json’, ‘/accounts/user.xml’, ‘/user/titi/blog/toto.json’, ‘/user/titi/blog/toto.xml’ ]</p>
|
||
puts “remove extension” Benchmark.bm do |x| x.report(" File:“) { n.times do str=tab[rand(4)]; path=File.expand_path(str,File.basename(str,File.extname(str))); end } x.report(”chomp:") { n.times do str=tab[rand(4)]; ext=File.extname(str); path=str.chomp(ext); end } end </code>
|
||
</div>
|
||
<p>and here is the result:</p>
|
||
<pre class="twilight">
|
||
remove extension
|
||
user system total real
|
||
File: 0.970000 0.060000 1.030000 ( 1.081398)
|
||
chomp: 0.820000 0.040000 0.860000 ( 0.947432)
|
||
</pre>
|
||
<p>Conclusion of the second benchmark. One simple function is better than three dedicated functions. No surprise, but it is good to know.</p>
|
||
</div>
|
||
<div id="afterarticle">
|
||
<div id="social">
|
||
<a href="/rss.xml" target="_blank" rel="noopener noreferrer nofollow" class="social">RSS</a>
|
||
·
|
||
<a href="https://twitter.com/home?status=http%3A%2F%2Fyannesposito.com/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/%20via%20@yogsototh" target="_blank" rel="noopener noreferrer nofollow" class="social">Tweet</a>
|
||
·
|
||
<a href="http://www.facebook.com/sharer/sharer.php?u=http%3A%2F%2Fyannesposito.com/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/" target="_blank" rel="noopener noreferrer nofollow" class="social">FB</a>
|
||
<br />
|
||
<a class="message" href="../../../../Scratch/en/blog/Social-link-the-right-way/">These social sharing links preserve your privacy</a>
|
||
</div>
|
||
<div id="navigation">
|
||
<a href="../../../../">Home</a>
|
||
<span class="sep">¦</span>
|
||
<a href="../../../../Scratch/en/blog">Blog</a>
|
||
<span class="sep">¦</span>
|
||
<a href="../../../../Scratch/en/softwares">Softwares</a>
|
||
<span class="sep">¦</span>
|
||
<a href="../../../../Scratch/en/about">About</a>
|
||
</div>
|
||
<div id="totop"><a href="#header">↑ Top ↑</a></div>
|
||
<div id="bottom">
|
||
<div>
|
||
Published on 2010-02-23
|
||
</div>
|
||
<div>
|
||
<a href="https://twitter.com/yogsototh">Follow @yogsototh</a>
|
||
</div>
|
||
<div>
|
||
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Yann Esposito©</a>
|
||
</div>
|
||
|
||
<div>
|
||
Done with
|
||
<a href="http://www.vim.org" target="_blank" rel="noopener noreferrer nofollow"><strike>Vim</strike></a>
|
||
<a href="http://spacemacs.org" target="_blank" rel="noopener noreferrer nofollow">spacemacs</a>
|
||
<span class="pala">&</span>
|
||
<a href="http://nanoc.ws" target="_blank" rel="noopener noreferrer nofollow"><strike>nanoc</strike></a>
|
||
<a href="http://jaspervdj.be/hakyll" target="_blank" rel="noopener noreferrer nofollow">Hakyll</a>
|
||
</div>
|
||
<hr />
|
||
<div style="max-width: 100%">
|
||
<a href="https://cardanohub.org">
|
||
<img src="../../../../Scratch/img/ada-logo.png" class="simple" style="height: 16px;
|
||
border-radius: 50%;
|
||
vertical-align:middle;
|
||
display:inline-block;" />
|
||
ADA:
|
||
</a>
|
||
<code style="display:inline-block;
|
||
word-wrap:break-word;
|
||
text-align: left;
|
||
vertical-align: top;
|
||
max-width: 85%;">
|
||
DdzFFzCqrhtAvdkmATx5Fm8NPJViDy85ZBw13p4XcNzVzvQg8e3vWLXq23JQWFxPEXK6Kvhaxxe7oJt4VMYHxpA2vtCFiP8fziohN6Yp
|
||
</code>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
</div>
|
||
</body>
|
||
</html>
|