Using the m4 Macro Processor

Introduction

m4 is one of the unsung heroes of Linux and Unix. Unsung? Well, for instance, that great book Unix Power Tools makes not a single mention of it, though m4 has been a standard part of Unix since V7. So, what is it about m4 that makes it so useful, and yet so overlooked? m4 is a "macro processor": a dry name that disguises a great facility. A macro processor is basically a program that scans text looking for defined symbols, which it replaces by other text, or by other symbols. Thus, it is a powerful general-purpose utility that can be used to automate many tasks people often end up doing in sed, awk, perl, or even their favourite text editor. Granted, it is not obvious that a "macro processor" is that big a deal. Then again, Unix developers already have a macro processor built into their compiler, in the form of the C pre-processor. Perhaps it is this that accounts for m4's relative neglect. Whatever the reason, this article hopes to show Linux users the power and usefulness of this software tool.

What Is m4?

So, what is macro processing, and what is it good for? Kernighan and Plauger, in their seminal work "Software Tools", have a succinct definition:

"Macros are used to extend some underlying language — to perform a translation from one language to another."

Thus, symbolic constants may be defined so that subsequent occurrences of the name are replaced by the defining string of characters, regardless of the contents of the definition or its context. Such a definition is called a macro, the replacement process is called macro expansion, and the program for doing it is called a macro processor. So, the basic facility provided by any macro processor is the replacement of text by other text. A macro is either defined by the m4 program (a "built-in") or by the user. As well as doing macro expansion, m4 has functions that include other files, do integer arithmetic, manipulate text, and so forth. It is, that is to say, a perfect example of the power of the Unix filter concept.

The contemporary implementation of m4 on a Linux system is GNU m4, which follows System V Release 3 m4, with extensions. I am aware of no other version of m4 that has been ported to Linux. m4 implementations on BSD may differ slightly; but by and large, m4 is m4, and this article should be useful for other Unix users too. At the time of writing, the latest version is 1.4, which was released in October 1994.

Overview

The Scanning Process

As m4 reads its input, it separates it into "tokens". A token is either a previously-defined name, a string, or any single character that is not a part of either a name or a string. The input is then scanned for recognised macros. This scanning process is recursive: scanning continues until no more macros are recognised. The input thus transformed is written to the output. Macros can be built-in, or user-defined. A list of built-in macros follows later.

Defining Macros

The most important of the built-in macros is define, which allows the user to define his own macros. For example, a definition can give a name to the string "Paul Dunne", so that any occurrence of the name is expanded to that string. m4 expands macro names into their defining text as soon as it possibly can.
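As a minimal sketch of such a definition: the macro name author below is an assumption made purely for illustration, and dnl (described later) is used only to swallow the newline that follows the definition.

```m4
dnl The macro name "author" is an assumption made for this sketch.
define(author, Paul Dunne)dnl
This page was written by author.
```

With GNU m4 this prints "This page was written by Paul Dunne."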

Quoting

The m4 quote characters are ` (backquote) and ' (apostrophe). For example, `this is quoted'. It is often best to quote both macro name and substitution text in a definition. This avoids any unwanted side-effects, such as too-early expansion of another macro name. Also, since m4 uses commas as argument separators, any definition that contains commas must be quoted.
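The effect of quoting is easiest to see by trying it. In this small sketch the names site and greeting are hypothetical. The first definition needs its quotes because the text contains a comma; the second shows that a quoted macro name is not expanded until the macro is used.

```m4
dnl "site" and "greeting" are hypothetical names for this sketch.
define(`site', `Linux, Unix')dnl
define(`greeting', `Welcome to site')dnl
greeting
`greeting'
```

With GNU m4 the first call prints "Welcome to Linux, Unix", because the expansion of greeting is rescanned and site is expanded in turn. The quoted `greeting' loses its quotes and is printed as the plain word greeting.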

Arguments

As we have said, arguments to macros are delimited by commas. They are also, as we've seen, enclosed in parentheses. A macro may also be called with no arguments. This is common where we simply wish to replace one string with another, as in the "Paul Dunne" example above.
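Inside a definition, the arguments are referred to as $1, $2 and so on. The sketch below uses a hypothetical macro called anchor to show the idea; note that the opening parenthesis must follow the macro name immediately, with no space in between.

```m4
dnl "anchor" is a hypothetical macro; $1 and $2 stand for its two arguments.
define(`anchor', `<A HREF="$1">$2</A>')dnl
anchor(`linux.html', `Linux')
```

With GNU m4 this prints <A HREF="linux.html">Linux</A>.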

Built-in Functions

m4 provides a small set of useful built-in functions. We may group them under the following headings:

Flow Control Functions

m4 provides the classic "if-then" programming construct, in two related forms.

ifdef(a,b)

expands to b if a is defined (an optional third argument supplies the text to use when it is not), and

ifelse(a,b,c,d)

compares the strings a and b. If they match, string c is returned as the function value; if not, string d. Actually, ifelse is not limited to four arguments; it can take any greater number, and thus provides a limited multi-way decision capability. For example,

ifelse(a,b,c,d,e,f,g)

means that if a matches b, then c; else if d matches e, then f; else g.
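Here is a short sketch of both forms in use. The macro names _linux and kind are chosen only for illustration; kind uses the seven-argument form of ifelse as a small multi-way switch.

```m4
dnl "_linux" and "kind" are hypothetical names for this sketch.
define(`_linux')dnl
ifdef(`_linux', `on the Linux page', `somewhere else')
ifelse(`a', `a', `same', `different')
define(`kind', `ifelse(`$1', `sh', `shell', `$1', `c', `C source', `unknown')')dnl
kind(`c')
kind(`txt')
```

With GNU m4 the four output lines are "on the Linux page", "same", "C source" and "unknown".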

Arithmetic Functions

There are three arithmetic built-ins.

incr

which increments its numeric argument by one.

decr

which decrements its numeric argument by one.

eval

which performs arbitrary integer arithmetic. Its operators are:

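unary + and -
** or ^ exponentiation
* / % multiplication, division, modulus
+ -
== != < <= > >= equal, not equal, less than, less than or equal to, greater than, greater than or equal to
! not
& or && logical and
| or || logical or

This is the traditional operator set. In current releases of GNU m4, ^, & and | are bitwise operators (exclusive-or, and, or), so it is safest to write ** for exponentiation and && and || for the logical operations.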
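A few calls show the arithmetic built-ins at work. This sketch sticks to operators that behave the same way in traditional and GNU m4.

```m4
incr(41)
decr(43)
eval(2 ** 5)
eval((7 + 3) * 4 % 6)
eval(3 < 5 && 5 < 8)
```

The output is 42, 42, 32, 4 and 1, one per line. Comparisons and logical operators yield 1 for true and 0 for false.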

String Functions

len(a)

Returns the length of the string "a".

substr(s, m, n)

Returns a substring from the string "s", starting at position m (counting from zero), and continuing for n characters. If n is omitted, the rest of the string is returned.

As a more complicated example than those we've had so far, consider this combination of ifelse, eval and substr.

define(len,`ifelse($1,,0,`eval(1+len(substr($1,1)))')')

Well now, what does that do? It is an implementation of the m4 built-in len in terms of other m4 built-ins! Note the two layers of quotes. The outer layer prevents all initial evaluation. We want "len" defined as exactly what is in the second argument. The inner layer protects the eval builtin from being evaluated while the arguments for the ifelse are collected.

translit(s, f, t)

Returns the string "s" with all occurrences of the character(s) listed in "f" replaced by those listed in "t". It functions as a simpler version of the Linux command tr. For example, translit(s,abcdefghijklmnopqrstuvwxyz, nopqrstuvwxyzabcdefghijklm) is the well-known rot13 or Caesar cipher.
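The three string functions can be tried directly. In this sketch the sample strings are arbitrary; remember that substr counts positions from zero.

```m4
len(`macro')
substr(`macro processor', 6, 4)
translit(`hello', `abcdefghijklmnopqrstuvwxyz', `nopqrstuvwxyzabcdefghijklm')
```

With GNU m4 this prints 5, proc and uryyb, one per line.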

File Functions

include(filename)

includes the contents of "filename" at the point in the input stream at which it occurs. Useful if we have a central collection of standard m4 macros, which we can then use in another file simply with an appropriate include macro.

divert(n)

This is used to divert text from the input stream to an internal file number. File number -1 is equivalent to discarding the text, file number 0 is the normal output stream, and file numbers 1 to 9 are usually used for temporary storage.

divert(-1) is most commonly used to get rid of the extraneous white space that is often generated by m4. For example,

divert(-1)
...
definitions
...
divert

ensures that no output is performed while the various definitions between the ellipses are performed (the ellipses are not part of m4 syntax!). Otherwise, we would end up with a pack of newlines in our output.
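To see the numbered diversions in action as well, here is a small sketch. The macro name title is hypothetical. Text sent to diversion 1 is held back and only appears at the end of the input, after everything that went to the normal output stream.

```m4
divert(-1)
Everything here is discarded, including the newlines
that follow each definition.
define(`title', `Using m4')
divert(1)dnl
This line is held in diversion 1.
divert(0)dnl
title comes out first.
```

With GNU m4 the output is "Using m4 comes out first." followed by "This line is held in diversion 1.", since pending diversions are flushed in numerical order when the input ends.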

dnl

Hard to categorise this one, so I've put it here. dnl is "delete to newline". It was used as the comment mechanism in the original m4: as the name suggests, all characters up to and including the next newline are discarded. GNU m4 also allows use of # as a comment character, with the difference that such comments are passed to the output stream. Any macro calls or definitions after the # are, however, ignored: the input is passed to the output exactly as is.
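The difference between the two kinds of comment shows up in a three-line sketch; the macro called name is hypothetical.

```m4
define(`name', `m4')dnl this note and the newline after it vanish
# name is not expanded inside a comment
name is expanded here
```

With GNU m4 the output is the # line, copied through unchanged, followed by "m4 is expanded here". Nothing from the first line reaches the output.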

System Functions

esyscmd

Passes a command to the system interpreter (usually the Unix shell) for execution, and returns the output of that command. For example, esyscmd(date) returns today's date.
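A sketch with a more predictable command than date, assuming a shell with echo is available. The command is quoted so that m4 does not try to expand any of its words first, and the trailing dnl removes the extra newline that would otherwise follow the one echo already produces.

```m4
esyscmd(`echo hello from the shell')dnl
```

This prints the single line "hello from the shell".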

There are also some miscellaneous functions that have been added to the original m4 function set.

changecom

Used to change the m4 comment character (normally #).

traceon, traceoff

Turn tracing on and off. This is useful for debugging.
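As a sketch of tracing a single macro in GNU m4: the macro called double exists only to give us something to trace. The trace messages go to standard error, so they do not disturb the normal output.

```m4
dnl "double" is a hypothetical macro used only to have something to trace.
define(`double', `eval($1 * 2)')dnl
traceon(`double')dnl
double(21)
traceoff(`double')dnl
```

Standard output receives 42, while standard error shows one trace line for the call to double.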

Usage

A full summary of m4 usage is available through typing m4 --help. This gives:

Usage: m4 [OPTION]... [FILE]...
Mandatory or optional arguments to long options are mandatory or optional
for short options too.

Operation modes:
      --help                   display this help and exit
      --version                output version information and exit
  -e, --interactive            unbuffer output, ignore interrupts
  -E, --fatal-warnings         stop execution after first warning
  -Q, --quiet, --silent        suppress some warnings for builtins
  -P, --prefix-builtins        force a m4_ prefix to all builtins

Preprocessor features:
  -I, --include=DIRECTORY      search this directory second for includes
  -D, --define=NAME[=VALUE]    enter NAME has having VALUE, or empty
  -U, --undefine=NAME          delete builtin NAME
  -s, --synclines              generate #line NO "FILE" lines

Limits control:
  -G, --traditional            suppress all GNU extensions
  -H, --hashsize=PRIME         set symbol lookup hash table size
  -L, --nesting-limit=NUMBER   change artificial nesting limit

Frozen state files:
  -F, --freeze-state=FILE      produce a frozen state on FILE at end
  -R, --reload-state=FILE      reload a frozen state from FILE at start

Debugging:
  -d, --debug=[FLAGS]          set debug level (no FLAGS implies aeq)
  -t, --trace=NAME             trace NAME when it will be defined
  -l, --arglength=NUM          restrict macro tracing size
  -o, --error-output=FILE      redirect debug and trace output

FLAGS is any of:
  t   trace for all macro calls, not only traceon'ed
  a   show actual arguments
  e   show expansion
  q   quote values as necessary, with a or e flag
  c   show before collect, after collect and after call
  x   add a unique macro call id, useful with c flag
  f   say current input file name
  l   say current input line number
  p   show results of path searches
  i   show changes in input files
  V   shorthand for all of the above flags

If no FILE or if FILE is -, standard input is read.

Well, that's a formidable list of options. But we need only use a few. In fact, most often m4 is run as just 'm4', with perhaps the -P flag to specify that built-ins are preceded by 'm4_', e.g. m4_include rather than include. For example, here's a line I use in a makefile to generate my html pages:

cat $*.m4 | htmlize | m4 -P > $*.html

In Use

Example: generating HTML

I use m4, among other Linux software tools, to maintain my web pages. Rather than mark each page up in HTML, a tiresome chore, I have written a set of definitions that translates m4 macros into HTML. As well as being easier on the eye and easier to write than HTML, this has other advantages. For example, an often-seen feature on web sites is the navigational "button bar", which has links to the main parts of a site. Obviously, it is nicer not to have a link from the button bar to our Linux page if that's where we already are, for example. This can be automated using m4, so that the right HTML code is generated. The definition I use is this:



define(
 ,
<HR>
<P ALIGN="center">
ifdef(_index,[Home],_link(index.html, [Home]))
ifdef(_linux,[Linux],_link(linux.html, [Linux]))
ifdef(_writing,[Writing],_link(writing.html, [Writing]))
ifdef(_bookshop,[Bookshop],_link(bookshop/index.html, [Bookstore]))
</P>
<HR>
)

Then, in the file linux.html, the macro _linux is defined, and so the button bar code generated has no link to the Linux page — the Linux link is "grayed out".
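The idea can be tried out in miniature. The sketch below is not the definition from my site: it assumes that square brackets have been made the quote characters with changequote, and the macros _link and _nav are hypothetical stand-ins written just for this example.

```m4
dnl Assumptions: bracket quotes, and hypothetical helper macros _link and _nav.
changequote([,])dnl
define([_link], [<A HREF="$1">$2</A>])dnl
define([_nav], [ifdef([$1], [$2], [_link([$3], [$2])])])dnl
define([_linux])dnl
_nav([_linux], [Linux], [linux.html])
_nav([_writing], [Writing], [writing.html])
```

With GNU m4 the first call prints the bare word Linux, because _linux is defined, and the second prints <A HREF="writing.html">Writing</A>.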

Again, we can define our email address in the master file. Then, if that changes, there is no need to do a global search-and-replace through all the files that constitute the site. A simple "make" updates everything — but that's the subject of another article.

Example: a Linux key-map

An interesting use of m4 is for the maintenance of Linux keymap files. I don't do this myself, since hacking an existing file was simplest for me, but it is a good example of the imaginative uses that m4 can be put to. We don't have the space to examine the file in any depth here; take a look at /usr/lib/kbd/keymaps/i386/qwerty/hypermap.m4 on your Linux system.

Example: sendmail config

Perhaps the best-known of m4 applications is its use to tame the fearsome complexity of sendmail configuration files. The sendmail source distribution comes with m4 macros that are sufficient to generate a sendmail.cf for almost any site; at most, a little tweaking of the resulting sendmail.cf file (whose syntax has been memorably and justly compared to line noise) may be required. For anyone who has tried to write a sendmail config file from scratch, in the days before the m4 macros, this is a godsend.

Differences Between m4 Versions

Inevitably, there are different versions of m4. This is not an issue for the Linux user, as they will invariably be using GNU m4.

The main difference is that SysV m4 supports multiple arguments to defn. Since the usefulness of this is unclear to GNU m4's maintainer (and indeed to me), this feature is not in GNU m4.

There are several other incompatibilities (which shouldn't surprise anyone who's tried to use GNU make and then BSD's pmake, or vice versa). None are important, so those interested can read the relevant info page (alas, no man page provided). Again, since this article is about m4 rather than GNU m4, I won't mention the various extensions implemented in the GNU version.

Things to Watch

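Quoting can be cantankerous on occasion. Quoting problems can usually be solved by changequote, which changes the quote characters. For example, if we want to include one of the quote characters in a macro definition, we can use changequote to switch to a different pair of quote characters; a definition written with the new pair will then keep the original quote characters in the macro definition. Note that ` and ' can't be escaped, so we have to do it this way.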
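A sketch of the technique, using square brackets as the temporary quote characters (an arbitrary choice) and a hypothetical macro called tip:

```m4
dnl Square brackets as quotes are an arbitrary choice for this sketch.
changequote([,])dnl
define([tip], [Write `word' to keep m4 from expanding word.])dnl
tip
```

With GNU m4 this prints the sentence with its ` and ' intact. The macro has to be used while the brackets are still the quote characters: if we switched back to the default quotes first, the expansion would be rescanned and the ` and ' would be stripped again.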

Another thing to watch for is that the names of m4 builtins occurring in your text will be taken for calls to the function, and expanded. This can be avoided by quoting, but that is inconvenient. GNU m4 offers us a better way. The -P command-line switch allows us to preface all builtins by the string m4_, rather as in the C preprocessor the # character is used.
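A short sketch of a file meant to be run as m4 -P; the macro name author is hypothetical. Only names carrying the m4_ prefix are treated as builtins, so ordinary prose that happens to mention a builtin is left alone.

```m4
m4_dnl Run as: m4 -P file.m4   ("author" is a hypothetical macro name)
m4_define(`author', `Paul Dunne')m4_dnl
The words divert and define(x) pass through untouched.
Written by author.
```

Run with -P, the first line of text comes out exactly as written and the second becomes "Written by Paul Dunne." Without -P, GNU m4 would silently swallow both divert and define(x).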

Limitations

Sadly, there is no man page. There is an info page, however.

m4 is a useful tool, but it can be overstrained. Although it can be made to do most things with ingenuity, m4 is at its best when used for straightforward text substitution, as with our HTML example.

Kernighan and Plauger sum it up nicely in Software Tools:

"The main thing is to ensure that any operation — macro call, definition, other built-in — can occur in the middle of any other one. If this is possible, then in principle the macro processor is capable of doing any computation, although it may well be hard to express."

but

"In principle, macro [i.e. m4] is capable of performing any computing task, but it is all too easy to write incomprehensible macros."

Resources

The manual for the GNU version of m4 is on-line at http://www.gnu.org/manual/m4.

The classic book Software Tools devotes a chapter to the development of a macro processor based on m4.

The original paper documenting the first Bell Labs m4 is available at the Bell Labs site, as part of the V7 Unix doc. collection; see http://plan9.bell-labs.com/7thEdMan/index.html.

Bob Hepple has an interesting page about using m4 to generate HTML (in fact, I got the idea from his article) — see http://www.bit.net.au/%7Ebhepple.

Paul Dunne 2001

This page was brought to you by ksh, vi, m4, sed & make, courtesy of openbsd.
Last changed: Sun Mar 3 11:54:02 CET 2019