Review of Mastering Regular Expressions

My first thought on reading the title of this book was, "why didn't I think of that?" In retrospect, such a work has been needed for years; yet this is the first of its type, so far as I am aware. A regular expression (or "regexp") is a powerful, concise means of describing patterns in text. A dry definition, this, which captures little of the real power of regexps. Examples are the only way to see just how useful they can be. Here, for instance, is a pair of regular expressions which together select the header of a typical e-mail message on my system:

                          /^From: .*/,/^$/

This describes all the lines in the message from a line beginning with 'From: ' up to and including the first blank line after it. The syntax used here is that of the Unix stream editor, sed, which brings up an important point. One of the main motifs in Friedl's book is that, while the theory of regular expression engines divides them neatly into two classes, "Non-Deterministic Finite Automata" (NFAs) and "Deterministic Finite Automata" (DFAs), in practice every implementation goes its own way, and syntax varies from tool to tool. In our example, the /.../,/.../ range form belongs to the tool rather than to the regexps themselves: sed and awk accept it, but grep, for instance, does not.
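To make the range idea concrete, here is a minimal sketch of the same selection in Python, assuming the usual sed behaviour that the end pattern is only tested on lines after the one that opened the range. The function name header_lines and the sample message are my own inventions for illustration, not anything from the book.

```python
import re

def header_lines(message):
    """Roughly emulate: sed -n '/^From: .*/,/^$/p' (hypothetical helper)."""
    start = re.compile(r"^From: .*")   # first regexp of the range
    end = re.compile(r"^$")            # second regexp: an empty line
    selected = []
    in_range = False
    for line in message.splitlines():
        if not in_range:
            if start.search(line):
                # Open the range; the end pattern is not tested on this line.
                in_range = True
                selected.append(line)
            continue
        # Inside the range: keep the line, and close on a blank line.
        selected.append(line)
        if end.search(line):
            in_range = False
    return selected

# Made-up sample message, purely for illustration.
sample = "Received: by example\nFrom: someone@example.org\nSubject: hello\n\nBody text\n"
print(header_lines(sample))
```

Note how much of the work is done by the loop rather than by the two patterns; that loop is what sed supplies for free.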

As I have said, there are two types of regexp engine, NFA and DFA, and umpteen different implementations. In fact, just to make matters more "interesting", the NFA type is further divided into traditional and POSIX-compliant flavours. Friedl puts order into this confusion, explaining in great detail how the two types differ and giving the advantages and disadvantages of each. The book gives lots of practical information on the subtle differences between regexp engines of either type. Surely we must all have spent five minutes, or ten minutes, or a quarter of an hour, struggling in vain to use a regexp character in sed that was only valid in egrep or awk? I know I have! Friedl's book certainly throws a lot of light on these niggling little differences. It would be nice if the current mix of tools could converge on an agreed standard for regexp syntax; but then, the differences reflect, in part at least, real differences in the power and scope of the two types of regexp engine.
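One place where the flavours visibly part company is alternation. The sketch below uses Python's re module, a backtracking engine of the traditional NFA kind, which takes the first alternative that succeeds; a POSIX-style engine is expected to report the longest match at the leftmost position instead. The helper leftmost_longest is a hypothetical stand-in that only imitates that rule for plain literal alternatives; it is not a real DFA, and the sample text is simply an illustrative string.

```python
import re

text = "three tournaments won"
alternatives = ["tour", "to", "tournament"]

# Traditional NFA behaviour: alternatives are tried left to right,
# and the first one that matches wins.
first_alternative = re.search("|".join(alternatives), text).group(0)

def leftmost_longest(alts, subject):
    """Imitate the POSIX leftmost-longest rule for literal alternatives."""
    first = re.search("|".join(alts), subject)
    if first is None:
        return None
    start = first.start()  # leftmost position where any alternative matches
    # Try every alternative at that position and keep the longest result.
    candidates = [re.compile(alt).match(subject, start) for alt in alts]
    return max((m.group(0) for m in candidates if m), key=len)

print(first_alternative)                     # what the backtracking engine reports
print(leftmost_longest(alternatives, text))  # what leftmost-longest would report
```

The same pattern and the same text give two different answers, which is exactly the kind of difference that costs that quarter of an hour.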

The book would be worth buying for its extended treatment of Perl regexps alone. Over one hundred pages are devoted to the Perl-specific chapter, and there is no padding there. Reading "Mastering Regular Expressions" has made me much more aware that one of the strengths of Perl is its comprehensive support for regexps, and has added to my frustration with the "all slightly different" regexp flavours in awk, sed, grep, egrep, etc.

As a finale, the book presents the final version of a solution that it has developed with increasing sophistication over several chapters: a regexp expressing, in full generality, the RFC 822 syntax for an e-mail address. I would like to give it here, but even the bare-bones version is 6,578 bytes long, which is rather more than this review!

So has this masterpiece any flaws at all? Well, the over-use of motor-car engine analogies quickly becomes irritating, and I found it an obstacle, not an aid, to following the argument. Surely I am not the only one to whom the innards of engines are even more obscure than the arcane details of regular expression parsing? Other than that, well, to paraphrase the little girl in the story, I might say that this book told me more about regular expressions than I wanted to know! It's a good line, but in truth barely a line of the book is redundant; it might indeed have told me more than I NEED to know, but there is no harm in that.

In conclusion: a very good, and much-needed, work. Regular expressions are a powerful tool, but writing them is something of an art; anyone who hopes to master the art should read this book. Recommended for almost everyone!

Title: Mastering Regular Expressions
Author: Jeffrey E. F. Friedl
Publisher: O'Reilly
ISBN: 1-56592-257-3
Price: $29.95
Pages: 342
Date: January 1997

Paul Dunne 1997


Last changed: Sun Mar 3 11:54:02 CET 2019