Uncommon Regular Expressions That Should Be
Posted by Jason in Web Development, Weblog on March 20th, 2010
I was recently surprised to find that some regexes that should be common were nowhere to be found on the interweb. So here they are.
This is a regex for an IP address:
^(?:(?:25[0-5]|2[0-4]\d|[1]\d\d?|[1-9]\d?|[0])\.){3}(?:25[0-5]|2[0-4]\d|[1]\d\d?|[1-9]\d?|[0])$
Notice that it uses those fancy extensions, the non-capturing groups written as (?:...), which have a wonderful description in the Python documentation for the regular expression module, re. The pattern allows only IP addresses that would be valid in a BIND zone file (my reason for needing this regex). That is, it disallows any number higher than 255 in any octet and does not allow leading zeros in any octet, something many IP address regexes on the web overlook.
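Here is a minimal sketch of how the pattern might be used from Python. The names IPV4_RE and is_valid_ipv4 are my own for illustration. Using fullmatch together with the re.ASCII flag is also an assumption on my part: in Python 3, \d would otherwise accept non-ASCII digits, and $ alone would tolerate a trailing newline.

```python
import re

# The IP address pattern from above, split across two lines for readability.
# re.ASCII restricts \d to the characters 0-9.
IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[1]\d\d?|[1-9]\d?|[0])\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[1]\d\d?|[1-9]\d?|[0])$",
    re.ASCII,
)


def is_valid_ipv4(text):
    """Return True if text is a dotted quad with octets 0-255 and no leading zeros."""
    # fullmatch insists that the whole string is consumed, so a trailing
    # newline is rejected as well.
    return IPV4_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["192.168.0.1", "256.1.1.1", "192.168.01.1", "1.2.3"]:
        print(candidate, is_valid_ipv4(candidate))
```

Only the first candidate should be accepted: the second has an octet above 255, the third has a leading zero, and the last has only three octets.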
And here is a regex for a fully qualified domain name:
^(?:(?!-)[a-zA-Z0-9]+(?:-+[a-zA-Z0-9]+)*\.)+[a-zA-Z]{2,4}\.$
This is for fully qualified domain names based on RFC 952 and RFC 1123, meaning that it does not allow underscores anywhere in the domain name, nor any of the other relaxed rules that were introduced in RFC 2181. Also, the gTLDs change often enough (and it is getting easier to create new ones) that it is futile to try to limit the top-level domain to a fixed list with a regex, so the pattern only asks for two to four letters there. If you really want to know whether the domain name is valid, resolve it yourself; I just want to know that it is in a valid format. For more information on valid names check out this link.
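A short sketch of the domain name check in Python follows. It assumes the pattern is meant to end with a literal trailing dot directly before $, as fully qualified names in a zone file do. The names FQDN_RE and is_valid_fqdn are hypothetical, chosen only for this example.

```python
import re

# The fully qualified domain name pattern from above; the name must end
# with the root dot, as it would in a zone file.
FQDN_RE = re.compile(
    r"^(?:(?!-)[a-zA-Z0-9]+(?:-+[a-zA-Z0-9]+)*\.)+[a-zA-Z]{2,4}\.$"
)


def is_valid_fqdn(name):
    """Return True if name looks like a fully qualified domain name."""
    # This is a format check only; it does not resolve the name.
    return FQDN_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["www.example.com.", "example.com", "my_host.example.com."]:
        print(candidate, is_valid_fqdn(candidate))
```

Only the first candidate should pass: the second lacks the trailing dot and the third contains an underscore. Note that the pattern, as written, also rejects top-level domains longer than four letters (such as museum), and it does not check the length limits on labels or on the whole name.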
I have used these regular expressions in Python, JavaScript and PHP, but they should work pretty much anywhere as long as the extensions are supported.