Create No-Sweat Regular Expressions

Validate data input with this powerful — but tricky — pattern-matching language.


Languages: VB | C# | JScript

Technologies: Regular Expressions | Validation




By Steven A. Smith


Regular expressions can provide powerful string-searching, formatting, and validating functionality. These expressions can be quite daunting to understand at first, but after you learn the basics, they are not nearly as scary to look at. Until .NET, Microsoft's support for regular expressions was weak compared to many UNIX implementations. With the .NET Framework, however, you now have a powerful implementation of regular expressions at your disposal (see the sidebar, "What are Regular Expressions?").


Regular expressions are great for searching text and finding particular patterns or substrings. They're also useful for data validation, ensuring a piece of data matches a predetermined format. Finally, you can use regular expressions to format strings by matching and replacing certain patterns (such as changing a date format from mm/dd/yyyy to dd-mm-yyyy).
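
As a rough illustration of the date-reformatting case, here is a minimal sketch. It uses Python's re module as a stand-in for the .NET classes covered later in the article; the function name reformat_date and the pattern itself are assumptions made for this example, not code from the article's download.

```python
import re

# Assumed pattern: two-digit month, two-digit day, four-digit year, slash-separated.
DATE_PATTERN = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

def reformat_date(text: str) -> str:
    """Rewrite every mm/dd/yyyy date found in text as dd-mm-yyyy."""
    # Group 1 is the month, group 2 the day, group 3 the year; swap the first two.
    return DATE_PATTERN.sub(r"\2-\1-\3", text)

print(reformat_date("Shipped 07/04/2002, received 07/15/2002."))
# Shipped 04-07-2002, received 15-07-2002.
```

The parentheses capture each part of the date so the replacement string can reorder them.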


The .NET Framework provides a great deal of regular expression support in many of its classes. The System.Text.RegularExpressions namespace provides support for the evaluation of regular expressions. In addition, the ASP.NET validation controls support regular expressions via the RegularExpressionValidator control.


In this article, you will learn what kinds of tasks regular expressions are designed to perform, how to write expressions yourself, and how to take advantage of the regular expression classes provided in the .NET Framework. If you'd like to try out the examples yourself, you can find them online.


Regular Expressions to the Rescue

Regular expressions provide a powerful way to search, replace, match, and validate strings by using a process called pattern matching. Using the regular expression syntax you will learn in the next section, you can create an expression that represents a particular pattern, such as a ZIP code, e-mail address, or telephone number. Consider the fairly common data-validation task of verifying an e-mail address. E-mail addresses must follow a well-known pattern: some characters, followed by an "@" (at) sign, and then some more characters that include at least one period. The exact requirements for the format of an e-mail address are described in section 3.4.1 of RFC 2822.


Validating an e-mail address using ASP classic string manipulation is a daunting task and makes a good counterexample to using regular expressions. Without regular expressions, you would have to write about 100 lines of code to validate e-mail addresses, using If-Then statements and string comparisons. First, you'd have to check for an @ symbol. Next, you'd have to make sure a period followed the @ symbol and that at least one character separated the two. You'd then need to make sure at least two characters follow the period. Finally, you'd need to scan the user input to make sure all characters are alphanumeric, with the allowance for a period, an @ symbol, and underscore characters. That's a lot of work for a little bit of validation.
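
To make the contrast concrete, the sketch below writes out the checks from the preceding paragraph by hand and then shows a single pattern covering roughly the same ground. It is written in Python rather than ASP classic, both function names are hypothetical, and the pattern is a deliberately simplified assumption. It is not the RFC 2822 expression the article refers to.

```python
import re

ALLOWED_EXTRA = set(".@_")

def looks_like_email_by_hand(value: str) -> bool:
    """Apply the checks described above, one string test at a time."""
    at = value.find("@")
    if at < 1:                       # need an @ with something before it
        return False
    dot = value.rfind(".")
    if dot < at + 2:                 # a period after the @, with a character between
        return False
    if len(value) - dot - 1 < 2:     # at least two characters after the period
        return False
    # Only letters, digits, periods, @ signs, and underscores are allowed.
    return all(ch.isalnum() or ch in ALLOWED_EXTRA for ch in value)

# Assumed, simplified pattern: name, @, domain, period, two or more word characters.
SIMPLE_EMAIL = re.compile(r"[\w.]+@[\w.]+\.\w{2,}")

def looks_like_email_by_regex(value: str) -> bool:
    """Test the same rough shape with one expression."""
    return SIMPLE_EMAIL.fullmatch(value) is not None

for candidate in ["user@example.com", "user@example", "user@example.c", "@example.com"]:
    print(candidate, looks_like_email_by_hand(candidate), looks_like_email_by_regex(candidate))
```

The two functions agree on these four samples, but they are not guaranteed to agree on every input.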


Using a regular expression, however, you can eliminate this complicated series of If-Then statements. In fact, using ASP.NET's RegularExpressionValidator control, you can perform the same validation that required 100 lines of code without writing any code at all (see Figure 1). What's more, you get both client-side and server-side validation; with the ASP classic solution, you'd need an additional 100 or so lines of code to include client-side checking of the address. In Figure 1, note the ValidationExpression property of the RegularExpressionValidator control. This is the expression used to validate an e-mail address according to the RFC (I found it online). You'll learn how to build expressions like this one by the end of this article. (For more information about validation controls, see "Master ASP.NET Validator Controls," asp.netPRO July 2002.)


<%@ Page language="vb" %>






ErrorMessage="Invalid Email Address" Display="None"

ControlToValidate="txtEmail" />

Figure 1. You can validate an e-mail address using ASP.NET and regular expressions.


Master the Basics

The simplest regular expressions are string literals. If, for example, you need to find the string "wood" in the phrase "How much wood could a woodchuck chuck if a woodchuck could chuck wood?" you would simply use the literal regular expression "wood". This search would find four matches in the phrase. Most characters used in a regular expression match themselves. There is a group of metacharacters, however, that have special meaning within regular expressions:


$ ^ . { [ ( | ) ] } * + ? \


These characters are used to create the powerful searching and matching capabilities of regular expressions.
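
The literal "wood" search described above can be checked with a few lines of code. This sketch uses Python's re module as a stand-in for the .NET classes; the variable names are illustrative only.

```python
import re

phrase = "How much wood could a woodchuck chuck if a woodchuck could chuck wood?"

# A literal pattern matches itself wherever it occurs, including inside longer words.
matches = re.findall("wood", phrase)
print(len(matches))  # 4

# The "?" is a metacharacter, so it needs a backslash to be matched literally.
literal_question = re.findall(r"wood\?", phrase)
print(literal_question)  # ['wood?']
```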


One group of these special characters is known collectively as "quantifiers" because they affect the number of times a particular expression must repeat. The "*" (asterisk), "+" (plus), and "?" (question mark) characters make up the quantifiers group. For example, if you place the * character immediately after an expression, it means that expression might occur any number of times (including zero). Similarly, the + character requires an expression to occur one or more times, and the ? character requires an expression to occur exactly once or not at all. The expression that precedes the *, +, or ? normally is a single character, but you can use parentheses like this "(...)" to designate a longer expression. Figure 2 demonstrates how to use these three quantifiers. (Note: You can test the expressions in these examples using the Windows Forms application I have created for this article, which you can download. Alternatively, you can use the regular expression tester described in the sidebar, "Research Regular Expression Development.")







Quantifier   Meaning
(none)       Literal match only.
*            Zero or more matches.
+            One or more matches.
?            Zero or one match.

Figure 2. This table shows you how to use the *, +, and ? metacharacters, known collectively as "quantifiers."
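
The following sketch exercises the three quantifiers against a few sample strings. It uses Python's re module as a stand-in for the .NET classes, and the patterns and samples are assumptions chosen for this example rather than the ones from the original figure.

```python
import re

SAMPLES = ["ac", "abc", "abbc"]

def matching_samples(pattern: str) -> list[str]:
    """Return the samples that pattern matches in their entirety."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]

print(matching_samples("abc"))   # ['abc']
print(matching_samples("ab*c"))  # ['ac', 'abc', 'abbc']
print(matching_samples("ab+c"))  # ['abc', 'abbc']
print(matching_samples("ab?c"))  # ['ac', 'abc']

# Parentheses make the quantifier apply to a longer expression.
print(bool(re.fullmatch("(ab)+", "ababab")))  # True
print(bool(re.fullmatch("ab+", "ababab")))    # False
```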


Another quantifier expression you can use to specify an exact number of matches (or range of numbers) is implemented using the curly braces like this {...}. Following any expression with curly braces holding a number (such as "{n}") specifies that the expression must occur exactly that many times. Specifying two numbers separated by a comma (such as "{n,m}") designates a range of occurrences (from the first number to the second) that the expression must match. You also can specify a single number followed by a comma to specify no upper limit (such as "{n,}"). Figure 3 illustrates how curly braces can be used to specify a designated number of matches.







Quantifier   Meaning
{2}          Match exactly twice.
{0,}         Zero or more matches. {0,} is the same as *.
{1,}         One or more matches. {1,} is the same as +.
{0,1}        Zero or one match. {0,1} is the same as ?.
{2,3}        Between two and three matches.

Figure 3. You can use {n}, {n,}, and {n,m} to specify a specific number of matches.
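
A quick way to see the curly-brace forms in action is to test a repeated character at several lengths. This is a minimal sketch in Python (standing in for the .NET classes); the helper name repeat_counts is hypothetical.

```python
import re

def repeat_counts(pattern: str, max_length: int = 4) -> list[int]:
    """Return each n from 0 to max_length for which pattern matches 'a' repeated n times."""
    return [n for n in range(max_length + 1) if re.fullmatch(pattern, "a" * n)]

print(repeat_counts(r"a{2}"))    # [2]
print(repeat_counts(r"a{2,}"))   # [2, 3, 4]
print(repeat_counts(r"a{2,3}"))  # [2, 3]
print(repeat_counts(r"a{0,1}"))  # [0, 1]
```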


Match Patterns

Another way to use metacharacters in regular expressions is to match certain kinds of patterns, which is more powerful than matching simple literals. The simplest of these metacharacters is the "." (period) character, which matches any one character. Combining . with the quantifier expressions, for example, makes it easy to specify a certain string length. Another useful set of metacharacters is the hard brackets, "[...]", which specify a set or range of characters to match. Any one of the characters listed inside the brackets represents a match. To designate a range, use the "-" (hyphen) character. To match a literal hyphen within hard brackets, place it first in the list so it is not read as a range. You also can specify multiple ranges in a single set of brackets. Finally, you can reverse the behavior so the expression matches any character except those listed in the brackets by preceding the list of characters with the "^" (caret) character. Figure 4 illustrates the use of hard brackets and the . metacharacter in regular expressions.










Matches any single character except a new line (\n).
Matches any combination of any number of characters. Anything will match. Sample: This is a sentence.
Matches any string of characters between four and eight characters long. Sample: 123 abc
Matches any number of the characters a, b, or c in any combination.
Matches any length string consisting of only alpha-numeric characters.
Matches any combination of hyphens, commas, and periods. Sample: , . , . - , . , .
Matches anything so long as it doesn't contain a numeric digit (0 through 9). Sample: A string with no numbers.

Figure 4. This table shows the range of characters and how to use the . metacharacter.
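
The period and hard-bracket forms can be tried out with a short script. The patterns below are assumptions written to fit the descriptions in the text (they are not copied from the original figure), and Python's re module stands in for the .NET classes.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """True when pattern matches the entire text, not just part of it."""
    return re.fullmatch(pattern, text) is not None

print(whole(r".{4,8}", "123 abc"))       # True: seven characters
print(whole(r".{4,8}", "abc"))           # False: only three characters
print(whole(r"[abc]*", "abcabcba"))      # True
print(whole(r"[abc]*", "cabbage"))       # False: g and e are not in the set
print(whole(r"[a-zA-Z0-9]*", "abc123"))  # True: two letter ranges and a digit range
print(whole(r"[-,.]*", ",.,.-,.,."))     # True: the hyphen comes first, so it is literal
print(whole(r"[^0-9]*", "A string with no numbers."))  # True
print(whole(r"[^0-9]*", "Route 66"))     # False: contains digits
```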


In all the preceding examples, the regular expressions would match any occurrence of their matching strings within a larger string. For instance, the "." character would match every character in the string abc - in fact, there would be three matches: one for a, one for b, and one for c. To restrict a regular expression so it matches only the exact string pattern in isolation, or to match a particular pattern that appears at the beginning or end of a larger string, you can use two more metacharacters: The caret ("^") and dollar sign ("$") match the beginning and end of the search string, respectively.


The RegularExpressionValidator control restricts its matches automatically, whether or not you use the ^ and $ metacharacters. Using the characters is optional and does not change the behavior of the RegularExpressionValidator control.
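
The difference between unanchored and anchored matching is easy to see in a short test. This sketch uses Python's re module as a stand-in; the five-digit pattern and the sample inputs are assumptions for illustration. Python's fullmatch is only a rough analogue of a whole-input check like the one the validator control performs.

```python
import re

# Unanchored: "." matches each character of "abc" separately.
dot_matches = re.findall(r".", "abc")
print(dot_matches)  # ['a', 'b', 'c']

zip_unanchored = re.compile(r"\d{5}")
zip_anchored = re.compile(r"^\d{5}$")

# Five digits appear somewhere inside the input, so the unanchored search succeeds.
print(zip_unanchored.search("ZIP: 441234") is not None)  # True
# With ^ and $, the whole input must be exactly five digits.
print(zip_anchored.search("ZIP: 441234") is not None)    # False
print(zip_anchored.search("44123") is not None)          # True

# fullmatch applies the unanchored pattern to the entire input.
print(zip_unanchored.fullmatch("441234") is not None)    # False
```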


Using hard brackets, you can designate any combination of characters to match (or not match). But because many situations would require listing many possible matching characters, you can use several metacharacters as shortcuts to represent various sets of characters in a regular expression. The most commonly used expressions match numeric digits, white space, or word characters. Figure 5 shows some of these metacharacters and their equivalent hard-bracket implementation.







Metacharacter  Equivalent       Description
\d             [0-9]            Any numeric character.
\D             [^0-9]           Any non-numeric character.
\w             [a-zA-Z0-9_]     Any letter, number, or underscore ("_"). .NET uses the Unicode equivalents of these values by default.
\W             [^a-zA-Z0-9_]    Anything but a letter, number, or underscore ("_"). .NET uses the Unicode equivalents of these values by default.
\s             [ \f\n\r\t\v]    Matches any white-space character.
\S             [^ \f\n\r\t\v]   Matches any non-white-space character.
\u000C                          Matches a Unicode character using exactly four hexadecimal digits. In this case, 000C is a form feed.
\x41                            Matches an ASCII character using exactly two hexadecimal digits. In this case, \x41 is A.
\f                              Matches a form feed.
\n                              Matches a new line.
\r                              Matches a carriage return.
\t                              Matches a tab.
\v                              Matches a vertical tab.
\?                              Matches the literal metacharacter following the \. (In this case, it's a question mark.)
\b                              Matches a backspace if within [] (hard brackets), otherwise matches a word boundary (between the \w and \W characters).

Figure 5. You can use these metacharacters as shortcuts to represent sets of characters in a regular expression.
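
Here is a brief sketch of the shortcut metacharacters at work. It is written in Python, whose re module supports the same shortcuts shown here, and the sample strings are invented for this example.

```python
import re

sample = "Order 66 shipped\tto Unit_7 on 2002-07-04"

digits = re.findall(r"\d+", sample)
print(digits)  # ['66', '7', '2002', '07', '04']

words = re.findall(r"\w+", sample)
print(words)   # ['Order', '66', 'shipped', 'to', 'Unit_7', 'on', '2002', '07', '04']

# \s matches the tab as well as the spaces.
pieces = re.split(r"\s+", sample)
print(pieces)  # ['Order', '66', 'shipped', 'to', 'Unit_7', 'on', '2002-07-04']

# \b keeps "to" from matching inside "into" or "town".
standalone = re.findall(r"\bto\b", "to and fro, into town")
print(standalone)  # ['to']

# Hexadecimal escapes: \x41 is A and \u0042 is B.
escaped = re.findall(r"\x41|\u0042", "ABC")
print(escaped)  # ['A', 'B']
```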


In Managed C++ or C#, you must escape the "\" (backslash) character in regular expression strings with a second \ so the C++ or C# compiler doesn't interpret it. In C#, you also can avoid the problem by preceding the string with @, which creates a verbatim string literal. For example, to match \d in Managed C++, you would have to code the string as "\\d"; in C#, you could use this same technique, or you could code the string as @"\d".


When several different expressions can be used for a match, you use the "|" (vertical bar) metacharacter to separate the different matching expressions. Figure 6 shows how to use pattern shortcut characters, quantifiers, character ranges, and vertical bars to match real-world string patterns.








Matches exactly five digits, as in a U.S. ZIP Code.
Matches exactly five digits or five digits, a hyphen, and four more digits. Used for U.S. ZIP or ZIP+4 codes.
Same as the previous expression, but specified using the ? metacharacter instead of |.
Matches any positive or negative real number.
Same as the previous expression, plus matches an empty string.
Matches a zero-padded 24-hour time string with no separators (i.e., it will not match 730, but will match 0730).
Matches the contents of any comments in C#, C, C++, JavaScript, CSS, etc., using the /* ... */ syntax. Note that [\d\D] matches anything. Sample: /* Comment */

Figure 6. Use these expressions to match real-world regular-expression patterns.
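
The sketch below shows how a few of these real-world patterns might be written and tested. Every pattern here is an assumption written to fit the descriptions in the text (they are not copied from the original figure), and Python's re module stands in for the .NET classes.

```python
import re

# Assumed patterns, built from the descriptions above.
ZIP_ALTERNATION = r"\d{5}|\d{5}-\d{4}"   # five digits, or five digits plus -dddd
ZIP_OPTIONAL = r"\d{5}(-\d{4})?"         # same idea, using ? on a group
TIME_24H = r"([01]\d|2[0-3])[0-5]\d"     # zero-padded hhmm, 0000 through 2359
C_COMMENT = r"/\*[\d\D]*?\*/"            # [\d\D] matches anything, including new lines

def whole(pattern: str, text: str) -> bool:
    """True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for candidate in ["44123", "44123-4567", "4412", "44123-45"]:
    print(candidate, whole(ZIP_ALTERNATION, candidate), whole(ZIP_OPTIONAL, candidate))

print(whole(TIME_24H, "0730"))  # True
print(whole(TIME_24H, "730"))   # False: not zero-padded
print(whole(TIME_24H, "2460"))  # False: no such hour

source = "int x; /* first */ x = 1; /* second\nline */"
comments = re.findall(C_COMMENT, source)
print(comments)  # ['/* first */', '/* second\nline */']
```

The *? in the comment pattern is a lazy quantifier, which stops at the first closing */ so two comments are not merged into one match.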


Use Regular Expressions in the .NET Framework

The .NET Framework's support for regular expressions is located primarily in the System.Text.RegularExpressions namespace, although other classes also use regular expressions. Using these classes, you can plug into a regular expression's behavior as it is being applied to a string. The main class used to implement regular expressions in the .NET Framework is the Regex class. This class has a rich set of methods you can use to apply regular expressions to input strings in order to find and/or replace matching substrings.


When used to find matches, the Regex class can return either a single Match object or a MatchCollection. The Regex class also supports both instance methods and shared (static) methods, making it very flexible. Figure 7 demonstrates how to use the Regex class to validate whether an input string matches a particular expression, extract a collection of matches from an input, and replace all matching substrings with a new string. Figure 7 also demonstrates how to use the Regex class' shared methods (note that no instance of Regex is ever instantiated). Figure 8 shows the output of that code.


<%@ Page language="vb" %>

<%@ Import Namespace="System.Text.RegularExpressions" %>

Figure 7. This sample page performs several common Regex tasks.


Figure 8. Here is a sample output for Common.aspx, demonstrating some common uses of regular expressions.
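
For readers without the sample page at hand, the three tasks (test for a match, collect all matches, replace matches) can be sketched as follows. This is Python, not the article's VB page: re.search, re.finditer, and re.sub are used here as rough counterparts of Regex.IsMatch, Regex.Matches, and Regex.Replace, and the phone-number pattern and input text are invented for the example.

```python
import re

text = "Call 555-1234 or 555-9876 before noon."
phone = r"\d{3}-\d{4}"  # assumed sample pattern

# Module-level functions, comparable in spirit to the shared (static) Regex methods:
# the caller never creates a pattern object.
has_phone = re.search(phone, text) is not None
numbers = [m.group() for m in re.finditer(phone, text)]
masked = re.sub(phone, "XXX-XXXX", text)

print(has_phone)  # True
print(numbers)    # ['555-1234', '555-9876']
print(masked)     # Call XXX-XXXX or XXX-XXXX before noon.

# A compiled pattern object, comparable in spirit to a Regex instance.
compiled = re.compile(phone)
first = compiled.search(text)
print(first.group(), first.start())  # 555-1234 5
```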


Another part of the .NET Framework that uses regular expressions is the ASP.NET validation controls. Using the RegularExpressionValidator, it is possible to validate any form input to verify that it matches a particular regular expression. This can be a powerful tool to ensure data being entered by users is valid (or at least well formed). Figure 9 demonstrates how to use a RegularExpressionValidator to ensure a U.S. ZIP code field (not ZIP+4) is entered correctly.


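<%@ Page language="vb" %>
<%-- The form and TextBox tags are assumed; only the validator attributes below are original. --%>
<form runat="server">
ZIP Code:
<asp:TextBox ID="txtZip" runat="server" />
<asp:RegularExpressionValidator runat="server"
 ID="RegExZip" ValidationExpression="\d{5}"
 ErrorMessage="Invalid ZIP Code"
 ControlToValidate="txtZip" />
</form>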

Figure 9. Use the RegularExpressionValidator on an ASPX page to validate whether any form input matches a particular regular expression.


Explore Your Options

You can specify several options that determine how some of the metacharacters in a regular expression behave. See Figure 10 for the most useful options, and consult the .NET Framework SDK for more information about these and other options you can use. To use these options in your .NET code, specify the individual options. You can specify multiple options by combining them with the OR operator. Note that you can set both the Multiline and Singleline options.





Option                   Description
Multiline                Causes the ^ and $ metacharacters to refer to the beginning and end of each line, rather than the entire string.
IgnoreCase               Causes all character matches to ignore case.
IgnorePatternWhitespace  Allows the regular-expression pattern to be created with as much white space as desired. Also supports in-pattern comments, which can be useful for documenting exceptionally complicated patterns.
Singleline               The pattern will treat the whole search string as a single line, even if it is multiline. Causes the . character to match any character, whereas when Singleline is not set, the . character will match any character except a new line (\n).

Figure 10. You can control the behavior of some regular-expression metacharacters by using these language options.
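
The effect of each option is easiest to see side by side. The sketch below uses Python flags as rough counterparts of the .NET options: re.MULTILINE for Multiline, re.IGNORECASE for IgnoreCase, re.DOTALL for Singleline, and re.VERBOSE for IgnorePatternWhitespace. The sample text and patterns are assumptions for illustration.

```python
import re

text = "first line\nSecond LINE"

# Multiline: ^ applies at the start of every line, not just the start of the string.
print(re.findall(r"^\w+", text))                # ['first']
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'Second']

# IgnoreCase: letters match regardless of case.
print(re.findall(r"line", text, re.IGNORECASE))  # ['line', 'LINE']

# Singleline (DOTALL in Python): "." also matches the new line.
print(re.search(r"first.*LINE", text) is not None)             # False
print(re.search(r"first.*LINE", text, re.DOTALL) is not None)  # True

# Options are combined with the bitwise OR operator.
combined = re.MULTILINE | re.IGNORECASE
print(re.findall(r"^second", text, combined))  # ['Second']

# IgnorePatternWhitespace (VERBOSE in Python): white space and comments inside the pattern.
zip_plus4 = re.compile(r"""
    \d{5}        # five-digit ZIP
    (-\d{4})?    # optional +4 suffix
""", re.VERBOSE)
print(zip_plus4.fullmatch("44123-4567") is not None)  # True
```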


I've created a simple ASP.NET page to test regular expression options. This page will let you test different regular expressions and see how setting different options changes how the expressions apply to the input string. You can download the source code for this page along with this article's download files.


The files referenced in this article are available for download.


Steven A. Smith is the president of, and head trainer at, a company that provides .NET training. He is a co-author of ASP.NET By Example and a speaker at several .NET conferences each year.




What are Regular Expressions?

Regular expressions are a language developed to describe and match patterns in strings of characters. American mathematician Stephen Kleene developed regular expressions in the 1950s and characterized them as "the algebra of regular sets." Regular expressions use many special characters to describe different kinds of patterns, which causes all but the simplest of expressions to resemble a jumble of random symbols. As with most languages, however, regular expressions contain common metacharacters that, once known, make even the novice user capable of understanding most practical expressions.


Research Regular Expression Development

Regular expressions are not new, so a variety of resources are available to help you learn more about how to use them.

The Regular Expression Library, which I helped create, has a database of more than 100 regular expressions that you can search using keywords or expression substrings. It also features a regular expression tester that can be helpful when developing new expressions.

Got questions? Check out the two regular expression discussion lists, which are full of experts willing to help you out. The documentation for the .NET Framework, available in the SDK, is also useful. I also recommend Mastering Regular Expressions by Jeffrey E. Friedl and Andy Oram (O'Reilly Nutshell).

And don't forget to download the simple WinForms regular expression tester I developed for this article.
