Perl-compatible regular expression optimizer
0. Contents
This is the documentation of regex-opt-1.2.4.
1. Purpose
Optimizes perl-compatible regular expressions.
2. Usage
The general syntax for running the program is:
regex-opt <regexp>
Example:
regex-opt 'xaz|xbz|xcz'
x[a-c]z
Try running regex-opt on
Abigail's 7 kilobyte
URL regexp.
The result should be about 5 kilobytes long.
3. Supported syntax
- * (repeat 0-inf)
- + (repeat 1-inf)
- ? (repeat 0-1)
- {n} (repeat n)
- {n,} (repeat n-inf)
- {n,m} (repeat n-m)
- . (accept any char except \n)
- [a-z] (character sets)
- [^a-z] (inverse character sets)
- [[:alpha:]] (character classes)
- \s (and other character classes and escapes)
- x|y (alternatives)
- (?:x|y) (non-capturing grouping)
- *? (non-greedy repeat)
4. Unsupported syntax
- ^ (match string-begin)
- $ (match string-end)
- () (capturing is converted to noncapturing)
- Any (? -command that is not mentioned in supported syntax
- Unicode-specific markup
5. Optimizations performed
- Character set optimization: [A-Zabcdefgh-yz0-9%] becomes [[:alnum:]%]
- Alternate characters: y|[yp]|[zx] becomes [px-z]
- Counting: aaa* and aa+ become a{2,} and (a?){3} becomes a{0,3}
- Combining: abcde|xycde becomes (?:ab|xy)cde
- Parenthesis reduction: ((abc)) becomes abc, (xx|yy)|zz becomes xx|yy|zz
- Compression: xyzyzxyzyz becomes (?:x(?:yz){2}){2}
- This might not be always a good thing.
- Choice counting: a+|aa+ becomes a+, (b|) becomes b?, dxxxxb|dxxxb|dxxb|dxb becomes dx{1,4}b
6. Optimizations not performed
- Combining counts:
- a?|b? should become (?:a|b)?, now becomes a?|b?
- Redundancy removal (removal of alternatives that are subsets of other alternatives):
- xfooy|x[a-q]+y should become x[a-q]+y, now becomes x(?:foo|[a-q]+)y
Help in solving these shortcomings would be welcome.
7. Copying
8. Requirements
For compiling you need the following GNU tools: g++, make.
9. Downloading
Generated from
progdesc.php (last updated: Mon, 13 Nov 2017 21:53:59 +0200)
with docmaker.php (last updated: Mon, 13 Nov 2017 21:53:59 +0200)
at Mon, 13 Nov 2017 21:53:59 +0200