
15-112 Lecture 25 (June 24, 2014)

Regular Expressions

Today we chatted a bit about the relationship between regular expressions, regular languages, and finite state machines (FSMs). You'll get that discussion with a lot more rigor in 15-251, so I don't want to emphasize it here. It is often said that, to teach, you should "Tell them what you are going to tell them. Tell them. And then tell them what you've told them." Think of it this way: we just did step #1, and we'll leave steps #2 and #3 for 15-251.
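As a small, informal illustration of that relationship (not the rigorous 15-251 treatment), here is a hand-coded deterministic finite state machine that should accept exactly the strings matched in full by the regular expression [0-9]+/[0-9]+. The state names and the helpers symbol_class and dfa_accepts are hypothetical, invented for this sketch. The comparison against Python's re module uses re.fullmatch, which assumes Python 3.4 or later.

```python
import re

# Transitions of a DFA intended to accept exactly the strings that the
# regular expression [0-9]+/[0-9]+ matches in full. A missing entry
# means "no transition", i.e. the machine rejects.
TRANSITIONS = {
    ("start", "digit"): "num1",   # first digit of the first number
    ("num1", "digit"): "num1",    # more digits of the first number
    ("num1", "slash"): "slash",   # the separating slash
    ("slash", "digit"): "num2",   # first digit of the second number
    ("num2", "digit"): "num2",    # more digits of the second number
}
ACCEPTING = {"num2"}


def symbol_class(ch):
    """Classify a character as an ASCII digit, a slash, or anything else."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == "/":
        return "slash"
    return "other"


def dfa_accepts(s):
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = "start"
    for ch in s:
        state = TRANSITIONS.get((state, symbol_class(ch)))
        if state is None:
            return False  # fell off the machine: reject
    return state in ACCEPTING


pattern = re.compile(r"[0-9]+/[0-9]+")
for s in ["01/01", "1/2", "12/", "/5", "3//4", "ab/12", ""]:
    # The hand-built machine and the regex should agree on every input.
    print(repr(s), dfa_accepts(s), pattern.fullmatch(s) is not None)
```

Each line prints the input, the machine's verdict, and the regex's verdict; the two verdicts should always match. A regex engine effectively builds (or simulates) a machine like this for you.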

We then discussed how to use regular expressions in Python. The resource you want as a reference is the Python Regular Expression HOWTO. It is excellent. We emphasized the following:
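One distinction the HOWTO draws early is between functions that look for a single match and functions that find all of them. The following minimal sketch (Python 3; the sample string and pattern are made up for illustration) puts compile(), match(), search(), findall(), and finditer() side by side:

```python
import re

text = "id 42, id 7, id 1000"

# Compile once, then reuse the pattern object.
p = re.compile(r"id (\d+)")

# match() only tries at the very start of the string.
m = p.match(text)
print(m.group(0), m.group(1))           # whole match, then captured group 1

# match() fails if the pattern doesn't start at position 0 ...
print(p.match("see id 42"))             # None
# ... whereas search() scans forward for the first match anywhere.
print(p.search("see id 42").group(1))   # 42

# findall() returns a list; with one capturing group it lists that group only.
print(p.findall(text))                  # ['42', '7', '1000']

# finditer() yields match objects, which also carry positions.
for m in p.finditer(text):
    print(m.group(1), m.start(), m.end())
```

Use findall() when you only need the strings, and finditer() when you also want positions or several groups per match.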

One interesting example we did in class involved the need to escape the \-slash when using it in a backreference such as \1, and the need to use ?: to avoid capturing a group:


#!/usr/bin/env python3

import re

text = "01/01/2013 some other text 09/09/2013"

# Let's find all dates
p = re.compile("[0-9]+/[0-9]+/(19|20)[0-9]{2}")
matches = p.finditer(text)
for match in matches:
    print(match.group())

print()
print("Capturing")
print()


# Two (2) things to notice below:
# 1. The ?: makes the () group its contents, like () in math,
#    without capturing that text as a numbered group you can get via group().
# 2. \1 is a backreference to the first captured group (the (?:...) group
#    is not counted). We had to write it as "\\1" because, in an ordinary
#    Python string literal, "\1" is an octal escape for chr(1), a control
#    character, so the \-slash would never reach re.compile(). A raw
#    string, r"...(\1)", avoids having to double the \-slash.

text = "01/01/2013 some other text 09/09/2013 and the date again: 01/01/2013"
p = re.compile("([0-9]+/[0-9]+/(?:19|20)[0-9]{2}).*(\\1)")
matches = p.finditer(text)
print("Notice that we only print the repeated date")
for match in matches:
    print(match.group(1), match.group(2))
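Both comments in the example above can be checked directly. This Python 3 sketch (not from the lecture itself) shows (1) that findall() returns only the capturing group's text when the pattern contains one group, which is exactly where (?:...) matters, and (2) that "\1" in an ordinary string is chr(1), while "\\1" and the raw string r"\1" both hand re.compile() a real backslash.

```python
import re

text = "01/01/2013 some other text 09/09/2013"

# 1. A capturing group changes what findall() returns: with exactly one
#    group in the pattern, findall() lists only that group's text.
capturing = re.compile(r"[0-9]+/[0-9]+/(19|20)[0-9]{2}")
non_capturing = re.compile(r"[0-9]+/[0-9]+/(?:19|20)[0-9]{2}")
print(capturing.findall(text))       # ['20', '20']
print(non_capturing.findall(text))   # ['01/01/2013', '09/09/2013']

# 2. In an ordinary string literal, "\1" is an octal escape for chr(1),
#    so the regex engine never sees a backslash. "\\1" and r"\1" are
#    both the two characters backslash and 1.
print("\1" == chr(1))                # True
print("\\1" == r"\1")                # True
print(len(r"\1"))                    # 2

# The repeated-date pattern from the lecture, written as a raw string.
text = "01/01/2013 some other text 09/09/2013 and the date again: 01/01/2013"
p = re.compile(r"([0-9]+/[0-9]+/(?:19|20)[0-9]{2}).*(\1)")
m = p.search(text)
print(m.group(1), m.group(2))        # 01/01/2013 01/01/2013
```

Because (\1) is itself wrapped in parentheses, it becomes group 2; the (?:19|20) group never gets a number.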
  

Last year's TAs produced this handout (PDF), which may be helpful as a quick summary of the regular expression language.