Perl FAQ 4.26: What does it mean that regexps are greedy? How can I get around it?

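Regular expression quantifiers such as ``*'' and ``+'' are greedy. They match as much text as they can while still letting the rest of the pattern match. When a pattern is meant to stop at the first occurrence of some delimiter, this can produce results that look incorrect or strange.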

For example, I recently came across something like this:

    $_="this (is) an (example) of multiple parens";
    while ( m#\((.*)\)#g ) {
	print "$1\n";
    }

This code was supposed to print the text inside each pair of parentheses. The expected output was:

is
example

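However, the loop printed only one line, "is) an (example", which is clearly not what was intended. The greedy ``.*'' ran to the end of the string and then backed up only as far as the last ``)'', so the captured text ($1) spanned both pairs of parentheses.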
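
The behaviour can be reproduced with a small self-contained script. This is a sketch that collects the captures into a list (the names $text and @found are illustrative only) so the single greedy match is easy to inspect:

```perl
use strict;
use warnings;

my $text = "this (is) an (example) of multiple parens";
my @found;

# Greedy .* first runs to the end of the string, then backtracks only
# as far as the LAST ")" so that the closing \) can still match.
while ( $text =~ m#\((.*)\)#g ) {
    push @found, $1;
}

print scalar(@found), " match(es)\n";    # prints "1 match(es)"
print "$found[0]\n";                     # prints "is) an (example"
```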

In perl4, the way to prevent this is to use a negated character class, which cannot match the closing parenthesis. If the match in the above example is rewritten as follows, the results are correct:

    while ( m#\(([^)]*)\)#g ) {
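
A complete version of the negated-class loop might look like the sketch below. It uses strict and lexical variables, so it assumes a modern perl rather than perl4, but the pattern itself is the one shown above:

```perl
use strict;
use warnings;

my $text = "this (is) an (example) of multiple parens";
my @found;

# [^)]* matches any run of characters that are NOT ")", so each
# capture stops at the nearest closing parenthesis.
while ( $text =~ m#\(([^)]*)\)#g ) {
    push @found, $1;
}

print "$_\n" for @found;    # prints "is", then "example"
```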

Perl 5 adds minimal (non-greedy) matching. A ``?'' placed after a quantifier makes it match as little as possible instead of as much as possible, giving ``*?'', ``+?'', or even ``??''. The example would now be written in the following style:

    while ( m#\((.*?)\)#g ) {
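
The sketch below runs the minimal version as a complete loop and then contrasts each greedy quantifier with its minimal counterpart on the string "aaa". The test string and the %matched hash are illustrative choices, not part of the original example:

```perl
use strict;
use warnings;

my $text = "this (is) an (example) of multiple parens";
my @found;

# .*? takes as few characters as possible, growing only until the
# following \) can match, i.e. up to the nearest ")".
while ( $text =~ m#\((.*?)\)#g ) {
    push @found, $1;
}
print "$_\n" for @found;    # prints "is", then "example"

# Greedy vs. minimal forms of each quantifier, anchored at the start.
my $s = "aaa";
my %matched;
for my $q ('a*', 'a*?', 'a+', 'a+?', 'a?', 'a??') {
    my ($m) = $s =~ /^($q)/;
    $matched{$q} = $m;
    printf "%-4s matched '%s'\n", $q, $m;
}
# a* -> 'aaa', a*? -> '', a+ -> 'aaa', a+? -> 'a', a? -> 'a', a?? -> ''
```

Each minimal form settles on the shortest match that lets the pattern succeed, which for ``*?'' and ``??'' at the start of a string is the empty string.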

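Hint: Minimal matching leads to a compact way of stripping comments from C code. The s modifier lets ``.'' match newlines, so comments that span several lines are removed too. Note that this simple version does not recognize ``/*'' or ``*/'' inside string literals: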

    s:/\*.*?\*/::gs
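
Here is a runnable sketch of that substitution applied to a small, made-up C fragment. For contrast it also shows the greedy form, which deletes everything from the first ``/*'' to the last ``*/'', including real code:

```perl
use strict;
use warnings;

# Hypothetical C source with one single-line and one multi-line comment.
my $c_code = <<'END_C';
int x = 1; /* first */
int y = 2; /* second
              spans lines */
int z = 3;
END_C

# Minimal: each /* ... */ comment is removed separately.
(my $minimal = $c_code) =~ s:/\*.*?\*/::gs;

# Greedy, for contrast: .* runs from the first /* to the LAST */,
# so the declaration of y is deleted along with the comments.
(my $greedy = $c_code) =~ s:/\*.*\*/::gs;

print $minimal;
print "----\n";
print $greedy;
```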

