[Gardeners] Potting Soil: Regular Expressions in Lisp
Larry Clapp
larry at theclapp.org
Sun Jun 25 21:54:45 CDT 2006
A short discussion of the basics of Edi Weitz's Portable Perl
Compatible Regular Expression library. Assumes you're familiar with
regular expressions in general.
I haven't used Edi's cl-ppcre package before, so this article was
partly just a learning experience for me. Its documentation is quite
complete, so I could just say "Go read http://www.weitz.de/cl-ppcre/"
and be done. :) All the examples are copied from that documentation.
Where to get it:
- naked: http://weitz.de/files/cl-ppcre.tar.gz
- Debian: apt-get install cl-ppcre
- The doc says "There's also a port for Gentoo Linux thanks to
Matthew Kennedy and a FreeBSD port thanks to Henrik Motakef.
Installation via asdf-install should as well be possible."
Where to read the docs: http://www.weitz.de/cl-ppcre/
How to load it:
- naked: (load "load.lisp") will compile and load everything.
- Debian: (clc:clc-require :cl-ppcre)
Interesting points:
- many of the functions take a keyword argument, sharedp, which tells
the function that the substrings it returns may share structure
with the string matched against. So you could do multiple megabytes
of matching, but only actually allocate a few displaced arrays
instead of copying every substring. Nifty.
- "CL-PPCRE uses a compiler macro and LOAD-TIME-VALUE to make sure
that the scanner is only built once if the first argument to SCAN,
SCAN-TO-STRINGS, SPLIT, REGEX-REPLACE, or REGEX-REPLACE-ALL is a
constant form." So if you pass it a constant regex, it's smart
enough to realize that and build the scanner once, at load time,
instead of on every call. Also very nifty.
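A quick sketch of both points, in case it helps. It assumes cl-ppcre is
already loaded (see "How to load it" above). The names *target* and
count-matching-lines are made up for illustration. create-scanner is the
library's documented way to build a scanner by hand, which is what you'd
want when the regex is not a constant form and the compiler macro can't
help.

```lisp
;; Assumes CL-PPCRE is already loaded; *TARGET* and COUNT-MATCHING-LINES
;; are hypothetical names used only for this sketch.

(defparameter *target* "xaaabd")

;; With :SHAREDP T the results may share structure with *TARGET*.
;; ARRAY-DISPLACEMENT shows what (if anything) the match is displaced
;; onto, and at which offset.
(multiple-value-bind (match regs)
    (cl-ppcre:scan-to-strings "(a*)b" *target* :sharedp t)
  (list match
        regs
        (multiple-value-list (array-displacement match))))

;; When the regex only arrives at run time, build the scanner once
;; yourself and reuse it for every line.
(defun count-matching-lines (regex lines)
  (let ((scanner (cl-ppcre:create-scanner regex)))
    (count-if (lambda (line) (cl-ppcre:scan scanner line)) lines)))

(count-matching-lines "fo+" '("foo bar" "baz" "frob" "xfoo"))
;; => 2
```

Only share structure if you aren't going to modify the target string
afterwards, since the shared substrings would change along with it.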
Basics:
- scan regex target-string &key start end
=> match-start, match-end, reg-starts, reg-ends
Search a string for a regular expression. Returns the start of the
match, the end of the match, and two arrays denoting the beginnings
and ends of register matches. On failure returns NIL.
(cl-ppcre:scan "a*b" "xaaabd") ; no register matches
=> 1, 5, #(), #()
(cl-ppcre:scan "(a)*b" "xaaabd") ; 1 register match
=> 1, 5, #(3), #(4)
(subseq "xaaabd" 1 5)
=> "aaab"
(subseq "xaaabd" 3 4)
=> "a"
(cl-ppcre:scan "(a*)b" "xaaabd") ; 1 register match, in different place
=> 1, 5, #(1), #(4)
(subseq "xaaabd" 1 4)
=> "aaa"
- scan-to-strings regex target-string &key start end sharedp
=> match, regs
Like SCAN but returns substrings of target-string instead of
positions.
(cl-ppcre:scan-to-strings "(([^b])*)b" "aaabd")
=> "aaab", #("aaa" "a")
- split regex target-string
&key start end limit with-registers-p omit-unmatched-p sharedp => list
Matches regex against target-string as often as possible and returns a
list of substrings between the matches.
(cl-ppcre:split "\\s+" "foo bar baz frob")
=> ("foo" "bar" "baz" "frob")
(cl-ppcre:split "\\s+" "foo bar baz
frob")
=> ("foo" "bar" "baz" "frob")
(cl-ppcre:split "\\s*" "foo bar baz
frob")
=> ("f" "o" "o" "b" "a" "r" "b" "a" "z" "f" "r" "o" "b")
- regex-replace regex target-string replacement
&key start end preserve-case simple-calls
=> string
Try to match target-string between start and end against regex and
replace the first match with replacement.
(cl-ppcre:regex-replace "fo+" "foo bar" "frob")
=> "frob bar"
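Here is a small sketch that puts the functions above to work, again
assuming cl-ppcre is already loaded. It uses only scan, split (with its
limit keyword) and regex-replace-all, which is named in the doc quote
earlier. The inputs are chosen just for illustration.

```lisp
;; Assumes CL-PPCRE is already loaded.

;; Pull the whole match and the first register out of SCAN's four
;; return values, instead of calling SUBSEQ by hand afterwards.
(multiple-value-bind (start end reg-starts reg-ends)
    (cl-ppcre:scan "(a*)b" "xaaabd")
  (when start                           ; START is NIL if nothing matched
    (list (subseq "xaaabd" start end)
          (subseq "xaaabd" (aref reg-starts 0) (aref reg-ends 0)))))
;; => ("aaab" "aaa")

;; :LIMIT caps the number of pieces SPLIT returns.
(cl-ppcre:split "\\s+" "foo bar baz frob" :limit 2)
;; => ("foo" "bar baz frob")

;; REGEX-REPLACE-ALL replaces every match, not just the first one.
(cl-ppcre:regex-replace-all "fo+" "foo fooo bar" "frob")
;; => "frob frob bar"
```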
See the documentation for other functions and other examples.
-- Larry