<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>YBlog - Pragmatic Regular Expression Exclude</title>
<meta name="keywords" content="regex, regexp, regular expression, negate" />
<link rel="shortcut icon" type="image/x-icon" href="../../../../Scratch/img/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/css/y.css" />
<link rel="stylesheet" type="text/css" href="/css/legacy.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="apple-touch-icon" href="../../../../Scratch/img/about/FlatAvatar@2x.png" />
<!-- IndieAuth -->
<link href="" rel="me">
<link href="" rel="me">
<link href="" rel="me">
<link rel="pgpkey" href="../../../../pubkey.txt">
<body lang="en" class="article">
<div id="content">
<div id="header">
<div id="choix">
<span id="choixlang">
<a href="../../../../Scratch/fr/blog/2010-02-15-All-but-something-regexp/">French</a>
<span class="tomenu"><a href="#navigation">↓ Menu ↓</a></span>
<span class="flush"></span>
<div id="titre">
<h1>Pragmatic Regular Expression Exclude</h1>
<div class="flush"></div>
<div id="afterheader" class="article">
<div class="corps">
<p>Sometimes you cannot simply write:</p>
<code class="ruby">if str.match(regexp) and not str.match(other_regexp)
  do_something
end</code>
<p>and you have to get this behaviour with a single regular expression. There is a major problem: although the complement of a regular language is itself regular, the expression describing it can be much larger than the original and is rarely obvious to write by hand. For some expressions, negating them by hand is simply not practical.</p>
<p>But for some simple regular expressions it is possible<sup><a href="#note1"></a></sup>. Say you want to match everything containing some word, say <code>bull</code>, but don’t want to match <code>bullshit</code>. Here is a nice way to do that:</p>
<code class="ruby"># match all strings containing 'bull' ('bullshit' included)
/bull/

# match all strings containing 'bull' except 'bullshit'
/bull([^s]|$)|bulls([^h]|$)|bullsh([^i]|$)|bullshi([^t]|$)/

# another way to write it would be
/bull([^s]|$|s([^h]|$)|sh([^i]|$)|shi([^t]|$))/</code>
<p>Let’s look closer. The first alternative of the expression is <code>bull([^s]|$)</code>. Why is the <code>$</code> needed? Because without it, a string ending with <code>bull</code> would no longer be matched. This expression means:</p>
<p>The string ends with <code>bull</code><br />
or,<br />
contains <code>bull</code> followed by a character different from <code>s</code>.</p>
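<p>To make the role of <code>$</code> concrete, here is a small sketch that runs the compact pattern against a few sample strings. The constant names, the sample list and the <code>matching</code> helper are assumptions made for this illustration. The negative lookahead <code>/bull(?!shit)/</code> is a different technique, included only as a point of comparison.</p>

```ruby
# Hand-written exclusion pattern from the text: 'bull' not followed by 'shit'.
EXCLUDE = /bull([^s]|$|s([^h]|$)|sh([^i]|$)|shi([^t]|$))/

# Same pattern with every `$` branch removed, to show why `$` matters.
NO_DOLLAR = /bull([^s]|s[^h]|sh[^i]|shi[^t])/

# Assumed alternative for comparison: a negative lookahead.
LOOKAHEAD = /bull(?!shit)/

# Hypothetical sample strings chosen for this illustration.
SAMPLES = ["bull", "a bull", "bulldog", "bulls", "bullsh", "bullshi",
           "bullshit", "no bullshit here", "bullshot", "cow"]

# Return the strings matched by the given regular expression.
def matching(regexp, strings)
  strings.select { |s| s.match?(regexp) }
end

EXCLUDE_HITS   = matching(EXCLUDE, SAMPLES)
NO_DOLLAR_HITS = matching(NO_DOLLAR, SAMPLES)
LOOKAHEAD_HITS = matching(LOOKAHEAD, SAMPLES)

puts "with $:    #{EXCLUDE_HITS.inspect}"
puts "without $: #{NO_DOLLAR_HITS.inspect}"
puts "lookahead: #{LOOKAHEAD_HITS.inspect}"
```

<p>On these samples the version without <code>$</code> loses every string that stops at <code>bull</code>, <code>bulls</code>, <code>bullsh</code> or <code>bullshi</code>, while the full pattern and the lookahead select the same strings.</p>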
<p>And that is it. I hope this helps.</p>
<p>Note that this method is not always the best. For example, try to write a regular expression equivalent to the following conditional expression:</p>
<code class="ruby"># Begin with 'a': ^a
# End with 'c': c$
# Contain 'b': .*b.*
# But isn't 'axbxc'
if str.match(/^a.*b.*c$/) and not str.match(/^axbxc$/)
  do_something
end</code>
<p>A nice solution is:</p>
<code class="ruby">/^(
  abc|                # length 3
  a.bc|               # length 4
  ab.c|
  a[^x]b.c|           # length 5, 'b' in the middle
  a.b[^x]c|
  a...*b.*c|          # length 5 or more
  a.*b...*c
)$/x</code>
<p>This solution relies on the length of the string that must not be matched. There are certainly many other methods. But the important lesson is that it is not straightforward to exclude something from a regular expression.</p>
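<p>Because such patterns are easy to get subtly wrong, it can help to compare a candidate against the two-regexp conditional by brute force. The sketch below is one way to do that. The candidate is an anchored variant written for this illustration, and the helper names, the four-letter alphabet and the length bound of 6 are all assumptions.</p>

```ruby
# Reference behaviour: the two-regexp conditional from the text.
def wanted?(str)
  str.match?(/^a.*b.*c$/) and not str.match?(/^axbxc$/)
end

# Candidate single pattern (an anchored variant written for this sketch).
CANDIDATE = /^(abc|a.bc|ab.c|a[^x]b.c|a.b[^x]c|a...*b.*c|a.*b...*c)$/

# Every string over the given letters with length 1..max_len.
def all_strings(letters, max_len)
  (1..max_len).flat_map do |n|
    letters.repeated_permutation(n).map { |chars| chars.join }
  end
end

# Strings on which the candidate and the reference disagree.
def disagreements(candidate, letters, max_len)
  all_strings(letters, max_len).reject do |s|
    s.match?(candidate) == wanted?(s)
  end
end

MISMATCHES = disagreements(CANDIDATE, %w[a b c x], 6)
puts "disagreements: #{MISMATCHES.inspect}"
```

<p>An empty list only shows agreement on the strings that were enumerated. It is a sanity check, not a proof that the two formulations are equivalent.</p>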
<hr />
<p><small><a name="note1"></a> It can be proved that any regular set minus a finite set is also regular. </small></p>
<div id="afterarticle">
<div id="social">
<a href="/rss.xml" target="_blank" rel="noopener noreferrer nofollow" class="social">RSS</a>
<div id="navigation">
<a href="../../../../">Home</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/blog">Blog</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/softwares">Softwares</a>
<span class="sep">¦</span>
<a href="../../../../Scratch/en/about">About</a>
<div id="totop"><a href="#header">↑ Top ↑</a></div>
<div id="bottom">
Published on 2010-02-15
<a href="">Follow @yogsototh</a>
<a rel="license" href="">Yann Esposito©</a>
Done with
<a href="" target="_blank" rel="noopener noreferrer nofollow"><strike>Vim</strike></a>
<a href="" target="_blank" rel="noopener noreferrer nofollow">spacemacs</a>
<span class="pala">&amp;</span>
<a href="" target="_blank" rel="noopener noreferrer nofollow"><strike>nanoc</strike></a>
<a href="" target="_blank" rel="noopener noreferrer nofollow">Hakyll</a>
<hr />
<div style="max-width: 100%">
<a href="">
<img src="../../../../Scratch/img/ada-logo.png" class="simple" style="height: 16px;
border-radius: 50%;
display:inline-block;" />
<code style="display:inline-block;
text-align: left;
vertical-align: top;
max-width: 85%;">