A PCRE module for Lua 4.0.1

  — Wim Couwenberg, May 14, 2003

This module offers a simple Lua wrapper for the regular expression matching functionality provided by the PCRE library. It was written and tested with PCRE versions 3.9 and 4.1, but it should be fairly easy to adapt or extend for other versions. The PCRE library is distributed separately from this module. To compile the pcre4lua module, the original PCRE header and library files must already be installed on your system.

To activate the library for a Lua state L, call lua_pcrelibopen(L). This installs the pcre4lua interface in a global table called "pcre". By default, lua_pcrelibopen pushes nothing and returns 0. Its implementation contains a conditional section that, when enabled, changes this behavior: the interface table is left on top of the stack, no global is assigned, and the call returns 1.

Version info

The major, minor and date fields of pcre hold the version information of the PCRE library. They are taken from PCRE_MAJOR, PCRE_MINOR and PCRE_DATE respectively, as defined in pcre.h.
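As a small illustration of the version fields, the sketch below builds a one-line version string. The helper name pcre_version_string is hypothetical and not part of the module. The sketch does not assume whether the fields are numbers or strings, so it converts each with tostring. The demo at the end only runs if the global "pcre" table has been installed.

```lua
-- Hypothetical helper (not part of pcre4lua): formats the version fields
-- of an interface table such as the global "pcre".
function pcre_version_string(lib)
  -- tostring is used because this sketch makes no assumption about
  -- the types of the major, minor and date fields.
  return "PCRE " .. tostring(lib.major) .. "." .. tostring(lib.minor)
         .. " (" .. tostring(lib.date) .. ")"
end

-- Only runs when lua_pcrelibopen has installed the global table.
if pcre then
  print(pcre_version_string(pcre))
end
```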

Options

Each option PCRE_OPTION from pcre.h is available as OPTION in the pcre interface. For PCRE version 3.9 these are CASELESS, MULTILINE, DOTALL, EXTENDED, ANCHORED, DOLLAR_ENDONLY, EXTRA, NOTBOL, NOTEOL, UNGREEDY, NOTEMPTY, UTF8 (for what it's worth). Version 4.0 added NO_AUTO_CAPTURE.

Compiling a pattern

The pcre interface contains a single function, compile(pattern [, options]). It takes a pattern string and an optional numeric options parameter. If given, options should be the sum of any number of distinct pcre options listed above. On success, compile returns a compiled pattern object. On failure, it returns three values: nil, an error string and an error offset.
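The following sketch shows one way to call compile with combined options and to handle its three failure values. The wrapper compile_checked is a hypothetical name introduced for illustration. It also assumes that an options value of 0 is equivalent to passing no options, which the text above does not state.

```lua
-- Hypothetical wrapper (not part of pcre4lua): compiles a pattern and
-- folds the error string and error offset into a single message.
function compile_checked(pattern, options)
  -- Assumption: 0 behaves like "no options".
  local pat, err, pos = pcre.compile(pattern, options or 0)
  if not pat then
    return nil, "pattern error at offset " .. tostring(pos) .. ": " .. tostring(err)
  end
  return pat
end

-- Only runs when the global "pcre" table is installed.
if pcre then
  -- Options are combined by adding distinct option values.
  local pat, msg = compile_checked("(unclosed", pcre.CASELESS + pcre.MULTILINE)
  if not pat then
    print(msg)
  end
end
```

Adding option values only works when each option appears at most once in the sum, which is why the text asks for different options.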

Compiled pattern interface

A compiled pattern offers two methods: exec and match. The exec method for a compiled pattern pat is called as pat:exec(text [, offset [, options]]). It scans text for the pattern pat. If offset is provided then the scan starts at this offset. Valid options are any combination of ANCHORED, NOTBOL, NOTEOL and NOTEMPTY.

If no match is found, exec returns nil. If a match is found, the return values are arranged like those of Lua's strfind, i.e. first offset, last offset [, capture, ...]. For each processed capture of the match, the matching text is returned. However, if a capture is not present in the match, it is reported as nil. Note that this is not the same as a capture containing an empty string.
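To make the difference between an absent capture and an empty capture concrete, here is a minimal sketch. The helper describe_capture is hypothetical, and the pattern and subject text are chosen only for illustration. The demo assumes the global "pcre" table is installed and that the pattern compiles.

```lua
-- Hypothetical helper (not part of pcre4lua): labels one capture value
-- as returned by exec.
function describe_capture(c)
  if c == nil then
    return "(not present)"   -- the group took no part in the match
  elseif c == "" then
    return "(empty string)"  -- the group matched, but matched nothing
  else
    return c
  end
end

-- Only runs when the global "pcre" table is installed.
if pcre then
  local pat = pcre.compile("(a)?(b*)")
  local first, last, c1, c2 = pat:exec("bbb")
  if first then
    print("match from " .. first .. " to " .. last)
    print("capture 1: " .. describe_capture(c1))
    print("capture 2: " .. describe_capture(c2))
  else
    print("no match")
  end
end
```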

The match method of a compiled pattern is identical to exec, except that when a match is found it returns a single match object instead of offsets and captures. The interface of a match object is described below.

Match object interface

The number of processed captures in a match object m is available as the property m.n. If the matched pattern did not define any captures then m.n will be 0. If i is an integral index between 1 and m.n then m[i] is the text of capture number i or nil if the capture was not present. The complete matched substring is available as m[0] or, equivalently, as the property m.match.
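The fields described above can be walked with an ordinary numeric loop. The sketch below uses a hypothetical helper, summarize_match, which relies only on m.n, m[0] and m[i] as documented. The pattern in the demo is an arbitrary example.

```lua
-- Hypothetical helper (not part of pcre4lua): renders a match object as
-- one line, marking captures that are not present.
function summarize_match(m)
  local s = "match=[" .. m[0] .. "]"
  for i = 1, m.n do
    local c = m[i]
    if c == nil then
      s = s .. " " .. i .. "=<nil>"
    else
      s = s .. " " .. i .. "=[" .. c .. "]"
    end
  end
  return s
end

-- Only runs when the global "pcre" table is installed.
if pcre then
  local pat = pcre.compile("(a)?(b)")
  local m = pat:match("b")
  if m then
    print(summarize_match(m))
  end
end
```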

For integral i between 1 and m.n, the call m:interval(i) returns the bounds (first offset, last offset) of capture i, or nil if the capture is not present in the match. Both m:interval() and m:interval(0) return the bounds of the complete matched substring.
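A short sketch of interval follows. The helper capture_bounds is a hypothetical name. It only reports the two values that interval returns and makes no assumption about how the offsets are counted.

```lua
-- Hypothetical helper (not part of pcre4lua): describes the bounds of
-- capture i, or notes that the capture is not present.
function capture_bounds(m, i)
  local first, last = m:interval(i)
  if first == nil then
    return "capture " .. i .. ": not present"
  end
  return "capture " .. i .. ": " .. first .. " to " .. last
end

-- Only runs when the global "pcre" table is installed.
if pcre then
  local pat = pcre.compile("(a)?(b)")
  local m = pat:match("xxb")
  if m then
    print(capture_bounds(m, 0))  -- bounds of the complete match
    for i = 1, m.n do
      print(capture_bounds(m, i))
    end
  end
end
```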