Regular expression quantifiers such as ``*'' and ``+'' are greedy. They match as much text as they can while still letting the rest of the pattern match, and this sometimes produces incorrect or surprising results.
For example, I recently came across something like this:
$_ = "this (is) an (example) of multiple parens";
while ( m#\((.*)\)#g ) {
    print "$1\n";
}
This code was supposed to print the text inside each pair of parentheses. The expected output was:
is
example
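However, the loop ran only once, and the capture variable $1 contained "is) an (example". The greedy .* ran from the first "(" all the way to the last ")" in the string, which is clearly not what was intended.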
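The following is a small self-contained check of that behaviour. It uses m//g in list context, which returns every $1 the pattern captures, so you can see that the greedy pattern finds only a single match.

```perl
use strict;
use warnings;

my $str = "this (is) an (example) of multiple parens";

# In list context, m//g returns the capture from every match.
# The greedy .* runs from the first "(" to the LAST ")" in the
# string, so there is only one match.
my @greedy = $str =~ m#\((.*)\)#g;

print scalar(@greedy), " match(es)\n";
print "$_\n" for @greedy;
```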
In perl4, the way to prevent this is to use a negated character class, which cannot match past a ``)''. If the loop is rewritten as follows, the results are correct:
while ( m#\(([^)]*)\)#g ) { print "$1\n"; }
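As a quick sanity check, here is the negated-class version run against the same string. The list-context form is used only so the result can be inspected in one variable. The loop above prints the same two values.

```perl
use strict;
use warnings;

my $str = "this (is) an (example) of multiple parens";

# [^)]* can never consume a ")", so each match has to stop
# at the first closing parenthesis after its "(".
my @negated = $str =~ m#\(([^)]*)\)#g;

print "$_\n" for @negated;
```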
perl5 adds minimal (non-greedy) matching. Appending ``?'' to a quantifier makes it match as little as possible instead of as much as possible, giving ``*?'', ``+?'', and even ``??''. The example would now be written like this:
while ( m#\((.*?)\)#g ) { print "$1\n"; }
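The sketch below runs the minimal-matching version on the same string. It then applies the other minimal quantifiers to the made-up string "aaaa" so you can see that each one takes the smallest amount it is allowed to. The ``{n,m}?'' form is not mentioned above but follows the same rule.

```perl
use strict;
use warnings;

my $str = "this (is) an (example) of multiple parens";

# .*? matches as few characters as possible, so each match
# ends at the nearest ")".
my @minimal = $str =~ m#\((.*?)\)#g;
print "$_\n" for @minimal;

# The other minimal quantifiers, applied to a sample string.
my $s = "aaaa";
my ($plus)  = $s =~ /(a+?)/;      # at least one, so "a"
my ($opt)   = $s =~ /(a??)/;      # zero if possible, so ""
my ($range) = $s =~ /(a{2,3}?)/;  # low end of the range, so "aa"
printf "+? => '%s', ?? => '%s', {2,3}? => '%s'\n", $plus, $opt, $range;
```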
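Hint: minimal matching also gives a concise way to strip comments from C code. The /s modifier lets ``.'' match newlines, so comments spanning several lines are removed too. Note that this naive version will also remove ``/* ... */'' sequences that appear inside string literals.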
s:/\*.*?\*/::gs
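Here is a small demonstration on a made-up C fragment. It applies the substitution above to a copy of the text and contrasts it with a greedy ``.*'' version. The greedy version deletes everything from the first ``/*'' to the last ``*/'', including the real code in between.

```perl
use strict;
use warnings;

# Hypothetical C source used only for illustration.
my $c = <<'END_C';
int x = 1; /* a one-line comment */
/* a comment
   spanning two lines */
int y = 2; /* another */
END_C

# Minimal match: each comment ends at its own "*/".
(my $stripped = $c) =~ s:/\*.*?\*/::gs;

# Greedy match: one "comment" from the first "/*" to the last "*/".
(my $greedy = $c) =~ s:/\*.*\*/::gs;

print "minimal:\n$stripped";
print "greedy:\n$greedy";
```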