PowerShell with a Purpose Blog

Regular Expressions Intro

Have you experienced the power of regular expressions? If you have, then you already know what a useful tool they can be - I myself was put off for a long time by the nutso syntax. I mean, deciphering "^\\\\\w+\\\w+(\\\w+)+" isn't exactly easy-peasy. But once you dig into them, regular expressions have a lot of utility. PowerShell happens to have great regex support, making it easier to use regular expressions that you've written or copied - ahem, "repurposed" - from the Intertubes.

Essentially, a regex is a way of describing a text pattern to a computer. I've used them to extract IP addresses from IIS server logs and firewall logs, to validate user input to make sure something looks like a UNC path, to extract hyperlink tags from an HTML document, and to check e-mail addresses for conformity to a corporate standard. The regex syntax is simply a very specialized mini-programming language for describing those patterns. PowerShell can then tell you whether a piece of data matches a given regex, or it can use a regex to locate and extract information from a larger body of text.
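To make the "test" versus "extract" distinction concrete, here is a minimal sketch. The log line and the loose IPv4 pattern are made up for illustration (the pattern checks the dotted shape only, not the 0-255 range), so treat it as a starting point rather than a production parser.

```powershell
# Hypothetical sample log line; the field layout is invented for this example.
$line = '2024-01-15 10:32:07 192.168.1.20 GET /index.html 200 10.0.0.5'

# Loose IPv4 shape: four dot-separated groups of 1-3 digits (no range check).
$ipPattern = '\b(?:\d{1,3}\.){3}\d{1,3}\b'

# Test: does the line contain something shaped like an IP address?
$hasIp = $line -match $ipPattern

# After a successful -match on a single string, the automatic $Matches
# variable holds the first match at index 0.
$firstIp = $Matches[0]

# Extract: collect every match in the line using the .NET Regex class.
$allIps = [regex]::Matches($line, $ipPattern) | ForEach-Object { $_.Value }

$hasIp
$firstIp
$allIps
```

The -match operator answers the yes/no question and exposes only the first hit; [regex]::Matches is one way to get all of them.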

My two favorite regex Web sites are RegExLib and RegExTester. The former is a vast, free library of user-contributed regular expressions for a variety of tasks, and the latter is a free, Web-based tester for regexes. Need to validate the pattern for an Italian address?

^[a-zA-Z0-9ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÿ\.\,\-\/\']+[a-zA-Z0-9ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÿ\.\,\-\/\' ]+$

Wow. Need to detect potentially malicious HTML code in user input? This one matches only input that contains none of the listed characters:

^[^`~!/@\#}$%:;)(_^{&*=|'+]+$

Thanks, RegExLib! Putting these to use in PowerShell usually comes down to one of two methods: the -match operator or the Select-String cmdlet (other language features, such as the switch statement, also accept regular expressions). Need to see if a user has entered a valid YYYY-MM-DD formatted date? Get the user's input into a variable, such as $userdate, and do this:

$userdate -match "^[0-9]{4}-(((0[13578]|(10|12))-(0[1-9]|[1-2][0-9]|3[0-1]))|(02-(0[1-9]|[1-2][0-9]))|((0[469]|11)-(0[1-9]|[1-2][0-9]|30)))$"
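As a quick sanity check, the same pattern can be run against a handful of values. The sample inputs below are hypothetical, and the pattern is stored in a single-quoted string so PowerShell does no expansion inside it. Note that it checks the shape of the date only: it allows February 29 in any year.

```powershell
# The date pattern from the post, kept in a single-quoted (literal) string.
$datePattern = '^[0-9]{4}-(((0[13578]|(10|12))-(0[1-9]|[1-2][0-9]|3[0-1]))|(02-(0[1-9]|[1-2][0-9]))|((0[469]|11)-(0[1-9]|[1-2][0-9]|30)))$'

# Hypothetical sample inputs, chosen to hit different branches of the pattern.
$samples = '2024-03-31', '2024-04-31', '2024-13-01', '24-03-31', '2023-02-29'

# Build one result object per input so the outcomes are easy to read.
$results = foreach ($userdate in $samples) {
    [pscustomobject]@{
        Input   = $userdate
        IsValid = ($userdate -match $datePattern)
    }
}

$results
```

Of these, only 2024-03-31 and 2023-02-29 pass. The second is not a real calendar date, so a follow-up check (for example, trying to parse the value as a date) is worth adding if leap years matter.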

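PowerShell returns True if the text matches and False if it doesn't. The trick is to remember that the text data goes before the -match operator, and the regex comes after. Heck, there's a third way: The -replace operator can replace regex matches with whatever you want. One catch: the RegExLib pattern above is a validator. It matches only input made up entirely of characters outside its list, so feeding it to -replace as-is would blank out clean input and leave suspicious input untouched. To strip the offending characters instead, drop the anchors and the negation so the pattern matches the characters themselves, then replace each one with an empty string. Use single quotes, because inside double quotes PowerShell treats the backtick as its escape character and silently drops it from the pattern (the apostrophe is doubled to escape it):

$user_input -replace '[`~!/@#}$%:;)(_^{&*=|''+]', ''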
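The other two routes mentioned earlier, Select-String and the switch statement, can be sketched as follows. The sample lines, the loose IP pattern, and the Get-InputKind function name are all hypothetical, chosen only to illustrate the mechanics.

```powershell
# Hypothetical sample lines standing in for a log file.
$lines = @(
    'Connection from 10.0.0.5 accepted'
    'Service started'
    'Connection from 192.168.1.20 refused'
)

# Select-String filters pipeline input by regex and returns MatchInfo objects.
$hits = $lines | Select-String -Pattern '\b(?:\d{1,3}\.){3}\d{1,3}\b'

# Each MatchInfo carries the regex matches found on its line.
$ips = $hits | ForEach-Object { $_.Matches[0].Value }

# switch -Regex tests the input against each pattern in order.
# Get-InputKind is a made-up helper name for this example.
function Get-InputKind {
    param([string]$Text)
    switch -Regex ($Text) {
        '^\d{4}-\d{2}-\d{2}$' { 'date'; break }   # looks like YYYY-MM-DD
        '^\\\\\w+\\\w+'       { 'unc'; break }    # looks like \\server\share
        default               { 'other' }
    }
}

$ips
Get-InputKind '2024-01-15'
Get-InputKind '\\server\share'
Get-InputKind 'hello'
```

Select-String is handy when you are filtering many lines (it also reads files directly via its -Path parameter), while switch -Regex suits branching on which of several patterns a single value matches.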

Both -match and -replace are case-insensitive; use -cmatch and -creplace if you need a case-sensitive version. 
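A small sketch of the difference, using a made-up sample string:

```powershell
$text = 'PowerShell'

# Default operators ignore case.
$insensitiveMatch = $text -match 'powershell'       # True

# The c-prefixed operators compare case-sensitively.
$sensitiveMatch = $text -cmatch 'powershell'        # False

# Same idea for replacement: -replace finds 'Shell' despite the case difference.
$replaced = $text -replace 'SHELL', 'Point'         # PowerPoint

# -creplace finds no match, so the string comes back unchanged.
$notReplaced = $text -creplace 'SHELL', 'Point'     # PowerShell

$insensitiveMatch
$sensitiveMatch
$replaced
$notReplaced
```

Select-String and switch follow the same default and each offer a -CaseSensitive switch for the strict behavior.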

Are regular expressions something you'd find useful in PowerShell? Say the word and I'll write up a more detailed syntax tutorial!