Regular Expressions and String Manipulation

These days, there are many good reasons to program in Perl. Chief among them is its natural facility with strings and, in particular, regular expressions.

The two binding operators, =~ (match) and !~ (no match), are among the most basic. In scalar context, =~ returns true if the regular expression matches somewhere in the supplied string and false if it does not, so it is normally used as a true/false test. It does not count matches by itself; to get a count, add the /g modifier, evaluate the match in list context, and count the results. The "not in" operator !~ returns true if no match is found.

The general forms are as follows:


    $matched = ($somestring =~ /regular expression/);
    $notin = ($somestring !~ /regular expression/);
    $nummatches = () = ($somestring =~ /regular expression/g);
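
To make the difference between testing for a match and counting matches concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration; the counting line relies on the common "assign through an empty list" idiom, which forces list context.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for this illustration.
my $somestring = "one fish, two fish, red fish";

# Scalar context: the match is simply true or false.
my $matched = ($somestring =~ /fish/) ? 1 : 0;
my $notin   = ($somestring !~ /bird/) ? 1 : 0;

# List context with /g returns every match; assigning that list
# through "() =" leaves the number of matches in the scalar.
my $nummatches = () = $somestring =~ /fish/g;

print "matched=$matched notin=$notin nummatches=$nummatches\n";
```

Run as written, this should print matched=1 notin=1 nummatches=3.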
  

If you group parts of a regular expression within parentheses, and the regular expression matches, the text matched by each parenthesized group is saved into a special variable, much as with, for example, sed. These special variables are $1, $2, etc. Careful! Everyone wants to believe that these variables represent command-line arguments, as they do in shell. They do not; in Perl, command-line arguments live in @ARGV. It is also worth noting that Perl accepts the \1, \2, \3, etc. notation common in many other programs: inside the pattern itself it is the way to refer back to an earlier group, while in the replacement half of a substitution it works but $1, $2, etc. are preferred. Regardless, here's a quick example:

  if ( $somestring =~ /^([0-9]+)[a-zA-Z]*([0-9]+)$/) {
    # $1 is the number at the beginning of the line
    # $2 is the number at the end of the line
  } else {
    # $1 and $2 are unchanged (they keep the values from the last successful match)
  }
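
A runnable version of the same idea might look like the sketch below. The input strings are invented for the example, and the pattern is anchored with ^ and $ so that the two numbers really are at the start and end of the line. The second half shows \1 used inside a pattern to find a repeated word.

```perl
use strict;
use warnings;

# Hypothetical sample input: digits, optional letters, digits.
my $somestring = "2024abc17";
my ($first, $last) = ("", "");

if ($somestring =~ /^([0-9]+)[a-zA-Z]*([0-9]+)$/) {
    # Copy the captures right away; a later successful match overwrites them.
    ($first, $last) = ($1, $2);
}
print "first=$first last=$last\n";

# Inside the pattern itself, \1 refers back to whatever group 1 matched.
my $doubled  = "hello hello world";
my $repeated = ($doubled =~ /\b(\w+) \1\b/) ? $1 : "";
print "repeated word: $repeated\n";
```

Copying $1 and $2 into ordinary variables immediately is a defensive habit, since the capture variables only reflect the most recent successful match.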
  

Perl also has a special variable, $_, which represents the default string. Several important operators act on this string by default. For example, Perl can do sed-style searching and replacing. When a substitution is written without binding it to a variable using =~, it acts upon $_:


  $_ = "This is an example string: Hello World";

  $changes = s/World/WORLD/g;

  print "$_\n"; # "World" is now WORLD 

  print "$changes\n"; # The number of substitutions made; in this case, 1
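
The substitution does not have to act on $_. As a small sketch, with a made-up variable name and string, the same operator can be bound to any scalar with =~:

```perl
use strict;
use warnings;

# Hypothetical string with two occurrences of the search text.
my $greeting = "Hello World, goodbye World";

# Bind s/// to $greeting with =~ instead of relying on $_.
# With /g, the return value is the number of substitutions made.
my $changes = ($greeting =~ s/World/WORLD/g);

print "$greeting\n";   # Hello WORLD, goodbye WORLD
print "$changes\n";    # 2
```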
  

The tr operator is also very powerful. It acts much like the Unix tr command. It allows the user to define a mapping of character-for-character substitutions and, by default, applies them to $_. Each character in the first field is replaced by the corresponding character in the second field. As with the s operator above, it returns the number of substitutions made:


  $changes = tr/abc/123/; # a becomes 1, b becomes 2, c becomes 3
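
Like s, tr can be bound to a named variable with =~, and it understands character ranges. The following is only a sketch with invented strings; the last part uses an empty second field, which leaves the string alone and just counts the characters that matched.

```perl
use strict;
use warnings;

# Hypothetical sample text for the a/b/c mapping shown above.
my $text = "a cab backs up";
my $changes = ($text =~ tr/abc/123/);   # a becomes 1, b becomes 2, c becomes 3
print "$text ($changes changes)\n";     # 1 312 213ks up (7 changes)

# Ranges work as they do in the tr command.
my $word = "banana";
my $upper_count = ($word =~ tr/a-z/A-Z/);
print "$word ($upper_count changes)\n"; # BANANA (6 changes)

# An empty second field changes nothing and only counts matches.
my $fruit = "banana";
my $num_a = ($fruit =~ tr/a//);
print "$fruit has $num_a a's\n";        # banana has 3 a's
```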
  

Please note: In the examples above, there are no quotes around the tr and s expressions. This is important. If the expressions are quoted, they are treated as ordinary strings and simply assigned, rather than being carried out as substitution or translation operations.
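
The difference is easy to see side by side. In this sketch (variable names are made up for the illustration), the quoted form leaves $_ untouched, while the unquoted form performs the substitution:

```perl
use strict;
use warnings;

$_ = "Hello World";

# Quoted: this is just a string assignment; nothing happens to $_.
my $quoted       = "s/World/WORLD/g";
my $after_quoted = $_;

# Unquoted: the substitution is actually performed on $_.
my $changes       = s/World/WORLD/g;
my $after_unquoted = $_;

print "quoted value:   $quoted\n";
print "after quoted:   $after_quoted\n";     # Hello World
print "after unquoted: $after_unquoted\n";   # Hello WORLD
print "changes:        $changes\n";          # 1
```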