Understanding VBScript: Real-World Uses of Regular Expressions



Last month, I introduced you to VBScript's regular expressions. This month, I present handy code that you can use to enhance your real-world VBScript applications. Specifically, I show you how to use regular expressions to rewrite two commonly used VBScript functions: InputBox and Replace. You can use the rewritten InputBox function to create dialog boxes that accept only data that matches a regular expression you specify. You can use the rewritten Replace function to perform powerful search-and-replace operations.

Rewriting the InputBox Function
In the Web-exclusive sidebar "Building a Better InputBox Function" in my October 1999 column, I created an improved version of the InputBox function called InputBoxEx. This user-defined function not only produced a dialog box but also validated the data users entered in it. You can use parts of the InputBoxEx code to create a function called InputBoxRegEx. This new user-defined function produces a dialog box, then determines whether the entered text matches the specified regular expression pattern.

Listing 1 contains the InputBoxRegEx code. The code begins with the definition of the InputBoxRegEx function. InputBoxRegEx takes the same arguments as InputBox (i.e., prompt, title, and default) plus a fourth argument, re. The prompt and title arguments specify the dialog box's message and title, respectively. The default argument specifies the default response that appears in the edit box (i.e., the box in which users enter their input). The re argument specifies the regular expression pattern you want to test the entered data against.

After the function initializes the InputBoxRegEx variable by setting it to an empty string, the function calls InputBox. InputBox displays a dialog box and returns the text that users enter, which the function stores in the str variable. If users cancel the dialog box, InputBox returns an empty string.

If the string isn't empty, InputBoxRegEx uses VBScript's CreateObject function to create an instance of the RegExp object. The function uses RegExp's Pattern property to set the regular expression pattern (i.e., the pattern in the re argument), then sets the IgnoreCase property to True for a case-insensitive search.

Finally, InputBoxRegEx applies RegExp's Test method to determine whether the text in the str variable matches the regular expression pattern. If a match occurs, the function assigns the text to the InputBoxRegEx variable, which becomes the function's return value. If no match occurs, the function returns an empty string.
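The following is a minimal sketch of InputBoxRegEx reconstructed from the description above, not the code in Listing 1. It assumes the VBScript.RegExp ProgID for CreateObject and moves the matching logic into a hypothetical helper, MatchOrEmpty, so that the logic can be exercised without displaying a dialog box. The default argument is named defaultText here.

```vbscript
' Hypothetical helper: returns str if it matches the pattern re
' (case-insensitively); otherwise returns an empty string.
Function MatchOrEmpty(str, re)
    Dim rx
    MatchOrEmpty = ""
    ' An empty string means the user canceled or entered nothing.
    If str = "" Then Exit Function
    Set rx = CreateObject("VBScript.RegExp")
    rx.Pattern = re
    rx.IgnoreCase = True
    If rx.Test(str) Then MatchOrEmpty = str
End Function

' Displays a dialog box and returns the entered text only if it
' matches re; returns "" on a mismatch or if the user cancels.
Function InputBoxRegEx(prompt, title, defaultText, re)
    Dim str
    str = InputBox(prompt, title, defaultText)
    InputBoxRegEx = MatchOrEmpty(str, re)
End Function
```

For example, InputBoxRegEx("Enter a ZIP code:", "ZIP", "", "^\d{5}$") returns the entered text only if it consists of exactly five digits.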

How might you use InputBoxRegEx in your job? Suppose you must ask users to enter an email address. With the code in Listing 2, you can make sure users enter a valid email address. If the email address matches the pattern you specify, a message box displays the entered email address. If the email address doesn't match the pattern, the message "Not a valid email address" appears.
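Listing 2's pattern isn't reproduced here, so the sketch below uses a deliberately simple, hypothetical pattern (EMAIL_PATTERN) that accepts addresses such as jane.doe@example.com; it is a rough check, not a complete email validator. IsEmailAddress and AskForEmail are illustrative names, and the sketch calls InputBox directly so that it stands alone.

```vbscript
' Hypothetical pattern: name@domain with at least one dotted suffix.
Const EMAIL_PATTERN = "^[\w.-]+@[\w-]+(\.[\w-]+)+$"

' Returns True if str looks like an email address.
Function IsEmailAddress(str)
    Dim rx
    Set rx = New RegExp
    rx.Pattern = EMAIL_PATTERN
    rx.IgnoreCase = True
    IsEmailAddress = rx.Test(str)
End Function

' Interactive use in the spirit of Listing 2 (not called here).
Sub AskForEmail()
    Dim addr
    addr = InputBox("Enter your email address:", "Email")
    If IsEmailAddress(addr) Then
        MsgBox addr
    Else
        MsgBox "Not a valid email address"
    End If
End Sub
```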

Modifying InputBoxRegEx
You can easily modify InputBoxRegEx to meet your needs. For example, in its current form, InputBoxRegEx returns only one piece of information: either the dialog box's content (if a match occurs) or an empty string (if a match doesn't occur). You can modify this function so that it also returns a Boolean value specifying the result of the match. To make this modification, InputBoxRegEx needs to return an array of values. There are many ways to do this, including the approach that the InputBoxRegExArray code in Listing 3 illustrates.

In this approach, you adapt the InputBoxRegEx function in Listing 1 in four areas:

  1. Add the RESULT_TEXT and RESULT_TEST constants that callout A in Listing 3 shows. These user-defined constants provide an elegant way to identify the exact item in the array that contains the specific chunk of information you need. Because RESULT_TEXT has the value 0, the code a(RESULT_TEXT) refers to the first item in the array. Similarly, because RESULT_TEST has the value 1, the code a(RESULT_TEST) refers to the second item.

  2. Replace the line
    InputBoxRegEx = ""
    with the code at callout B in Listing 3. The Array function creates the array.

  3. Replace the lines
    If regexp.Test(str) Then
    InputBoxRegEx = str
    with the code at callout C in Listing 3. This code fills the array. The first item in the array is the content of the dialog box; the second item is the Boolean result of the match.

  4. Add the code at callout D in Listing 3. This code returns the array by assigning it to the InputBoxRegEx variable.

After you make these modifications, you can use InputBoxRegExArray with code like that in Listing 4. This code calls InputBoxRegExArray, then displays its results.
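A minimal sketch of the array-returning variant, written from the four steps above rather than copied from Listings 3 and 4. It defines the RESULT_TEXT and RESULT_TEST constants as callout A describes and uses a hypothetical helper, MatchResult, for the non-interactive part. It returns Array(text, matched) in every case, so callers can always index the result with the two constants.

```vbscript
' Indexes into the returned array (see callout A).
Const RESULT_TEXT = 0
Const RESULT_TEST = 1

' Hypothetical helper: returns Array(text, matched), where text is
' str on a match or "" otherwise, and matched is a Boolean.
Function MatchResult(str, re)
    Dim rx, a
    a = Array("", False)
    If str <> "" Then
        Set rx = New RegExp
        rx.Pattern = re
        rx.IgnoreCase = True
        If rx.Test(str) Then
            a(RESULT_TEXT) = str
            a(RESULT_TEST) = True
        End If
    End If
    MatchResult = a
End Function

' Displays a dialog box and returns the two-item result array.
Function InputBoxRegExArray(prompt, title, defaultText, re)
    InputBoxRegExArray = MatchResult(InputBox(prompt, title, defaultText), re)
End Function
```

A caller might write r = InputBoxRegExArray("Enter a ZIP code:", "ZIP", "", "^\d{5}$"), then check r(RESULT_TEST) before using r(RESULT_TEXT).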

This example is only one of many ways you can modify InputBoxRegEx. However, if you're a novice scriptwriter, I recommend that you keep InputBoxRegEx's prompt, title, and default arguments in the same roles and positions as their counterparts in InputBox.

Rewriting the Replace Function
In VBScript, the Replace function replaces each occurrence of a given substring with new text. This function's syntax is

    Replace(expression, find, replaceWith[, start[, count[, compare]]])

The function scans the text you specify in the expression argument and searches for all the occurrences of the substring you specify in the find argument. The function then replaces those substrings with the text you specify in the replaceWith argument, following the restrictions you set in the start, count, and compare arguments. (For more information about the Replace function's syntax, see my November 1999 column.) This function is only a simple pattern-matching mechanism: it identifies and replaces text that matches one fixed substring.
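The following lines illustrate the built-in Replace function's arguments with literal substrings. Note the third call in particular: when start is greater than 1, the returned string begins at start, a detail that comes up again with ReplaceEx1.

```vbscript
Dim s1, s2, s3
' Replace every "-" with "+".
s1 = Replace("one-two-three", "-", "+")
' -> "one+two+three"

' count = -1 means all occurrences; vbTextCompare ignores case.
s2 = Replace("aXbxc", "x", "_", 1, -1, vbTextCompare)
' -> "a_b_c"

' With start > 1, the text before start is dropped from the result.
s3 = Replace("one-two-three", "-", "+", 5, 1)
' -> "two+three"
```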

JScript's counterpart to the Replace function is the replace method. This method's syntax is

    stringObj.replace(rgExp, replaceText)

The method replaces the substrings that match the regular expression you specify in the rgExp argument with the text you specify in the replaceText argument. Because a regular expression is a more powerful way to identify text to replace than a fixed substring, JScript's replace method is a more powerful search-and-replace tool than VBScript's Replace function. However, because VBScript now supports regular expressions, you can rewrite the Replace function so that it uses regular expressions to identify text to replace. This new user-defined function is called ReplaceEx.

Listing 5 contains the ReplaceEx code. The code begins with the definition of the ReplaceEx function, which has three arguments:

  • strOrig (specifies the text you're searching)
  • re (specifies the regular expression pattern)
  • replaceWith (specifies the replacement text)

After initializing the ReplaceEx variable, the function uses the New keyword to create an instance of the RegExp object. (As I discussed last month, you can use either the New keyword or VBScript's CreateObject function to create an instance of RegExp.) ReplaceEx then sets the IgnoreCase property to True for a case-insensitive search, sets the Global property to True to search for all occurrences of the pattern, and uses RegExp's Pattern property to set the regular expression pattern.

Next, ReplaceEx calls RegExp's Replace method to perform the search-and-replace operation and stores the result in the buf variable. Finally, the function assigns the contents of buf to the ReplaceEx variable, which becomes the function's return value.
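A minimal sketch of ReplaceEx that follows the steps just described; it is a reconstruction, not the text of Listing 5. Like the description, it creates the RegExp object with the New keyword.

```vbscript
' Replaces every case-insensitive match of the pattern re in
' strOrig with replaceWith.
Function ReplaceEx(strOrig, re, replaceWith)
    Dim rx, buf
    ReplaceEx = ""
    Set rx = New RegExp
    rx.IgnoreCase = True   ' case-insensitive search
    rx.Global = True       ' replace all matches, not just the first
    rx.Pattern = re
    buf = rx.Replace(strOrig, replaceWith)
    ReplaceEx = buf
End Function
```

For instance, ReplaceEx("Cat, cat, CAT", "cat", "dog") returns "dog, dog, dog", because IgnoreCase and Global are both True.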

The code in Listing 6 shows how to use this function. Run that code, click OK in each dialog box that appears, and see what happens.

In ReplaceEx or any code that uses regular expressions, you can enclose a regular expression pattern in parentheses () to mark it as a subexpression. For example, you might use the code

    (\www\.\w+\.\w+)

which means that you want to consider \www\.\w+\.\w+ a subexpression. The RegExp object remembers the text that each subexpression matches, so you can reuse that text in the replacement string for every match.

Another useful tool is the $ qualifier, which you use in the replacement string rather than in the pattern. $1 stands for the text that the first subexpression matched in the current match, $2 for the text that the second subexpression matched, and so on. By putting $1 in the replacement string, you tell the RegExp object to build each replacement from the text of the current match. For example, suppose you want to search a document for URLs and prefix them all with http://. You can use the subexpression (\www\.\w+\.\w+) to identify the URLs, then use a replacement string with a qualifier, such as http://$1.

Although useful, subexpressions and qualifiers can be limiting. For example, using the (\www\.\w+\.\w+) subexpression with the http://$1 qualifier works fine as long as you have URLs only in the form www.server.com. What if your text contains URLs in different forms, such as www.server.com and http://www.server.com? In this case, if you process the latter URL, you end up with a URL such as http://http://www.server.com.
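The sketch below applies the (\www\.\w+\.\w+) subexpression and the http://$1 replacement string through RegExp's Replace method, and reproduces the double-prefix problem just described. AddHttpPrefix and the sample sentences are hypothetical.

```vbscript
' Prefixes http:// to every www.x.y URL in text.
Function AddHttpPrefix(text)
    Dim rx
    Set rx = New RegExp
    rx.IgnoreCase = True
    rx.Global = True
    ' The parentheses make the URL a subexpression; $1 in the
    ' replacement string stands for the text it matched.
    rx.Pattern = "(\www\.\w+\.\w+)"
    AddHttpPrefix = rx.Replace(text, "http://$1")
End Function

Dim plain, alreadyPrefixed
plain = AddHttpPrefix("See www.server.com for details.")
' -> "See http://www.server.com for details."
alreadyPrefixed = AddHttpPrefix("See http://www.server.com for details.")
' -> "See http://http://www.server.com for details."
```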

To correct this situation, you need to distinguish among the various URL forms in the list. As I explained in my August 2000 column, you can take advantage of the Matches collection object and the Match object. With these objects, you can skip the URLs that already include the http:// prefix and replace only those that lack it. Similarly, you can modify ReplaceEx so that it accommodates variations.

Modifying ReplaceEx
The ReplaceEx1 function in Listing 7 uses the Matches and Match objects that I covered in my August 2000 column and the code-evaluation techniques that I covered in my July 2000 column to accommodate variations in the text you're replacing. ReplaceEx1 works a bit differently from ReplaceEx. In the ReplaceEx function, you use the replaceWith argument to pass in the string that will replace the matching text. In the ReplaceEx1 function, you don't pass in a replacement string but rather a user-defined function. As callout A in Listing 7 shows, you use the callback argument to pass in this function. ReplaceEx1 executes the user-defined callback function at runtime. The callback function, in turn, returns the replacement text.

After you define the ReplaceEx1 function and initialize the ReplaceEx1 variable, you create an instance of the RegExp object and set its IgnoreCase, Global, and Pattern properties. You then call RegExp's Execute method, passing it the text you're searching; Execute returns a Matches collection that contains one Match object for each match.

As callout B in Listing 7 shows, you next walk through the Matches collection and execute the callback function for each Match object. After the callback function returns the replacement text, ReplaceEx1 uses the Replace function to replace the matching text. When you pass Replace a start position greater than 1, the string it returns begins at that position, and the text before it is dropped. You therefore use a temporary variable (i.e., temp) to hold the portion that the Replace function will truncate.
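A minimal sketch of a callback-driven ReplaceEx1, written from the description above rather than copied from Listing 7. Two choices differ from the column and are assumptions of this sketch: it resolves the callback name with VBScript's GetRef function instead of the code-evaluation techniques Listing 7 uses, and it walks the Matches collection from last to first so that a replacement whose length differs from the match doesn't shift the positions of matches not yet processed. BracketCallback is a hypothetical callback included only for testing.

```vbscript
' callback is the name of a function taking (strOrig, m) and
' returning the replacement text for the Match object m.
Function ReplaceEx1(strOrig, re, callback)
    Dim rx, matches, m, i, fn, buf, temp, newText
    ReplaceEx1 = strOrig
    Set rx = New RegExp
    rx.IgnoreCase = True
    rx.Global = True
    rx.Pattern = re
    Set matches = rx.Execute(strOrig)
    ' GetRef turns the function name into a callable reference.
    Set fn = GetRef(callback)
    buf = strOrig
    ' Last match first: text before m.FirstIndex is still unchanged.
    For i = matches.Count - 1 To 0 Step -1
        Set m = matches(i)
        If m.Length > 0 Then
            newText = fn(strOrig, m)
            ' Replace drops everything before start, so save it in temp.
            temp = Left(buf, m.FirstIndex)
            buf = temp & Replace(buf, m.Value, newText, m.FirstIndex + 1, 1)
        End If
    Next
    ReplaceEx1 = buf
End Function

' Hypothetical callback: wraps the uppercased match in brackets.
Function BracketCallback(strOrig, m)
    BracketCallback = "[" & UCase(m.Value) & "]"
End Function
```

With these definitions, ReplaceEx1("a cat and a dog", "cat|dog", "BracketCallback") returns "a [CAT] and a [DOG]".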

To use ReplaceEx1, you need to write the callback function, which will be unique for each application. For example, Listing 8 shows a callback function called AddHttpCallback that adds the prefix http:// to all URL matches that don't already contain it. AddHttpCallback receives two arguments: the original string (i.e., the strOrig argument) and the Match object (i.e., the m argument).
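Listing 8 isn't reproduced here; the following is a hypothetical version of AddHttpCallback with the two-argument signature just described. It assumes a pattern such as (http://)?\www\.\w+\.\w+, which matches URLs with or without the prefix, so the callback itself can decide whether to add http://. You would pass that pattern as strTextToReplace in the ReplaceEx1 call shown next.

```vbscript
' Returns the matched URL unchanged if it already starts with
' http://; otherwise returns it with http:// prefixed.
' strOrig (the full original text) is not needed here.
Function AddHttpCallback(strOrig, m)
    If LCase(Left(m.Value, 7)) = "http://" Then
        AddHttpCallback = m.Value
    Else
        AddHttpCallback = "http://" & m.Value
    End If
End Function
```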

To run ReplaceEx1 with AddHttpCallback and display the results, you can use code such as

    MsgBox ReplaceEx1(strText, _
        strTextToReplace, "AddHttpCallback")

You must include the code for the callback function in the same VBScript file as the rest of the application (which will contain the calling module and ReplaceEx1), unless you use the Include function I presented in my July 2000 column.

An Important Tool
Regular expressions are a powerful tool. Once you learn how to use the RegExp object, it can underpin many scripting solutions.

Next month, I'll launch into a new topic: the Windows shell. I'll discuss its object model, which lets you extend file processing and I/O operations to nonstandard folders such as Printers, My Dial-up Connections, or My Network Places.
