Rename Files by Using Regular Expressions


Most users are familiar with performing file-management tasks by using GUI tools such as Windows Explorer. Although Windows Explorer is a great tool for displaying a structured view of the file system, some file-management tasks are better suited to a command-line tool. I recently needed to rename a group of files and folders en masse, and Windows Explorer isn't well suited to such a task. To perform this task, at first I wrote a simple script that replaced space characters with underscores, but I quickly realized that I could generalize the script so that it used regular expressions to rename files. The result is Renamer.js.

Using Renamer.js
To use Renamer.js, you need Windows 2000 with Windows Script Host (WSH) 5.6 (you'll meet this prerequisite if you're running Microsoft Internet Explorer (IE) 6.0) or Windows XP or later. Renamer.js uses the following command-line syntax:

[cscript] Renamer.js directory
   [...] [/f:str] [/r:str]
   [/o:d|f|df] [/c] [/p] [/s] [/t]

Directory (a required parameter) is the directory containing the files and folders you want to rename. You can specify more than one directory on the command line. Renamer.js requires the CScript host; you can omit the cscript keyword at the beginning of the command line only if CScript is your default script host. To configure CScript as your default script host (I recommend that you do so), type the following at a command prompt:

cscript //h:cscript //nologo //s

There are seven additional, optional parameters for Renamer.js. The /f parameter specifies the string (a regular expression) you want to replace in filenames or folder names. If this string contains spaces, enclose it in quotes. This parameter is optional; the default find string is a space character (i.e., /f:" ").

The /r parameter specifies the replacement string. The replacement string can contain regular-expression backreferences (I'll explain more about backreferences in the next section). /r is also optional; if you don't specify it, the default replacement string is an underscore character (i.e., /r:"_").

The /o parameter specifies whether Renamer.js should rename folders (directories), files, or both. You can specify d, f, or df as the option to this parameter. If you specify /o:d, Renamer.js will rename only folders (directories), /o:f renames only files, and /o:df tells Renamer.js to rename both files and folders. If you don't specify /o, the default is /o:f (i.e., the script will rename only files).

The /c parameter tells Renamer.js that it shouldn't ignore case when processing filenames and folder names (i.e., file and folder names are treated as case-sensitive). Without /c, Renamer.js ignores case. The /p parameter specifies that Renamer.js should prompt the user before renaming each file or folder. The /s parameter lets you rename files and/or folders in subfolders. Finally, the /t parameter makes Renamer.js operate in test mode, in which it only displays the names of the files and/or folders it would rename and doesn't actually rename anything. If you specify both /p and /t, /t takes precedence.

It almost goes without saying that you need to be very careful with this script. If you inadvertently rename files in a system or an application's folder, doing so could cause disastrous results. I recommend that you always use test mode (/t) before renaming files or folders on a production system.

A Crash Course in Regular Expressions
Before I discuss the Renamer.js script's structure and how it works, I need to explain a bit about regular expressions and why Renamer.js uses them. A regular expression (sometimes abbreviated as RE, regexp, or regex) is a sequence of ordinary characters and special characters (sometimes called metacharacters or metasequences) that specifies a generic pattern you want to match. (You can find more information about regular expressions and their scripting syntax on MSDN at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/2380d458-3366-402b-996c-9363906a7353.asp.)

Table 1 shows some of the more common metacharacters you might use in regular expressions. For example, consider the pattern ".+\.do[ct]$" (without the quotes). Let's break down this pattern: .+ matches any single character one or more times, \. matches a literal period, and do[ct]$ matches the string "doc" or "dot" at the end of the string. In other words, this pattern matches any filename that ends with the string ".doc" or ".dot".
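As a quick check of this pattern, here's a minimal sketch that tests it against a few filenames. The sample names are made up for illustration; they aren't taken from the article's figures.

```javascript
// Pattern from the text: one or more characters, a literal period,
// then "doc" or "dot" at the end of the string.
var pattern = /.+\.do[ct]$/;

// Hypothetical sample filenames, for illustration only.
var names = ["report.doc", "letter.dot", "notes.docx", ".doc", "readme.txt"];

var matches = [];
for (var i = 0; i < names.length; i++) {
  if (pattern.test(names[i])) {
    matches.push(names[i]);
  }
}
// matches now holds "report.doc" and "letter.dot".
// ".doc" fails because .+ needs at least one character before the period.
```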

You can use parentheses to group portions of a regular expression's pattern. For example, the expression (abc)* matches the string "abc" zero or more times. Without the parentheses, the * would only match the final character; that is, abc* would match "ab" followed by zero or more occurrences of the letter c.
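The difference between the grouped and ungrouped forms is easy to see in a short sketch. I've added the ^ and $ anchors (my assumption, not part of the patterns above) so that each pattern must account for the whole test string.

```javascript
// Anchored versions of the two patterns from the text.
var grouped = /^(abc)*$/;  // the whole string is "abc" repeated zero or more times
var ungrouped = /^abc*$/;  // "ab" followed by zero or more "c" characters

var r1 = grouped.test("abcabc");    // true:  "abc" twice
var r2 = ungrouped.test("abcabc");  // false: a second "ab" isn't allowed
var r3 = ungrouped.test("abccc");   // true:  "ab" plus three "c" characters
var r4 = grouped.test("abccc");     // false: "cc" isn't a full "abc"
```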

Regular-expression patterns are case-sensitive unless you specify otherwise. Because Windows filenames are case-retentive but not case-sensitive, Renamer.js ignores case by default. If you don't want Renamer.js to ignore case, use the /c parameter on the Renamer.js command line.

Parentheses also let you refer back to each grouped expression by number in the replacement string. $1 refers back to the first expression in parentheses, $2 refers back to the second expression, and so forth. You can refer back to a maximum of nine grouped expressions. The $1 through $9 sequences are usually called backreferences because they refer back to grouped expressions in the search string.

For example, consider the list of files in Figure 1. These filenames have two digits followed by a space, then the word Data. Suppose you want to place the digits at the end of the filename, rather than at the beginning, preceded by an underscore (e.g., you want to rename 00 Data to Data_00). To do so, you can use this find pattern:

([0-9]{2}) (Data)

(exactly two digits, a space, and the word Data). The replacement pattern would be

$2_$1

The string $2 will be replaced by the second sequence in parentheses from the find expression (the word Data), and $1 will be replaced with the first sequence in parentheses (the two digits).
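Here's a minimal sketch of the same find and replacement strings applied with the JScript replace method. The 00 Data name comes from the example above; the second name is hypothetical, and the gi flags reflect the script's defaults as described later in the article.

```javascript
// Find pattern: exactly two digits, a space, and the word Data.
var find = /([0-9]{2}) (Data)/gi;
var replacement = "$2_$1"; // second group, an underscore, then the first group

var first = "00 Data".replace(find, replacement);       // "Data_00"
var second = "17 Data.txt".replace(find, replacement);  // "Data_17.txt"
```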

Let's consider another example that shows how regular expressions can help you rename files. Suppose you have a large number of image files that don't have filename extensions, but the file's type is indicated in the filename itself. The filenames consist of a set of four hexadecimal digits, a dash, and the file type. For example, the file 002F-JPG is a JPEG file, and 01A6-TIF is a TIFF file. You'd like to rename these files so that they have extensions. For this example, the find pattern would be

([0-9A-F]{4})-(.+)

Let's break down this expression. ([0-9A-F]{4}) matches exactly four characters, each in the range 0 through 9 or A through F, and the enclosing parentheses create a group that a backreference can refer to (that is, we can use $1 in the replacement string to refer to it). The - character matches itself, and (.+) matches any character one or more times. The second pair of parentheses creates a second group for $2. The replacement string would be

$1.$2

$1 refers back to the first grouped expression (the four hex digits), and $2 refers back to the second grouped expression (the file's type), separated by a period.
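The two sample filenames from this example can be run through the pattern as a quick sanity check. This is only a sketch; the variable names are mine.

```javascript
// Four hex digits, a dash, then the file type.
var find = /([0-9A-F]{4})-(.+)/gi;
var replacement = "$1.$2"; // hex digits, a period, then the file type

var jpeg = "002F-JPG".replace(find, replacement); // "002F.JPG"
var tiff = "01A6-TIF".replace(find, replacement); // "01A6.TIF"
```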

Inside Renamer.js
At the beginning of the script, which Listing 1 shows, Renamer.js declares a set of global variables and executes the main function by calling it as a parameter of the WScript object's Quit method. That is, the main function's return value is the script's return value.
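The pattern of passing main's return value to the Quit method looks roughly like the following sketch. The body of main here is a placeholder, not the code from Listing 1, and the typeof guard is my addition so that the sketch can also load outside WSH.

```javascript
// Placeholder for the real main function: it returns the script's exit code.
function main() {
  // ... argument checking and renaming would happen here ...
  return 0; // zero signals success
}

// The WScript object exists only under Windows Script Host, so guard the
// call to keep this sketch loadable in other JavaScript hosts.
if (typeof WScript !== "undefined") {
  WScript.Quit(main());
}
```

Because function declarations are hoisted, the call can appear at the top of the script even when main is defined further down.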

The main function declares its own set of variables and checks the command line. If the command lacks at least one unnamed argument (i.e., an argument that doesn't start with a forward slash), or if the /? argument exists on the command line, the main function calls the usage function, which displays a usage message and ends the script.

Next, the main function uses the scripthost function to determine the script host that's executing the current script. If CScript isn't executing the script, the main function echoes an error message and returns with a nonzero exit code.

The main function's next task is to create an object with a set of properties and assign it to the options variable. Callout B in Listing 1 shows how the main function creates the object and assigns default values to its properties. This object will be passed as a parameter to the process function, which I'll describe later.

The main function's next task is to validate the script's command-line arguments. First, the script determines whether the /f parameter exists on the command line. If it does, the main function sets the options object's oldstr property to the /f parameter's argument. If the /f parameter's argument is an empty string, the main function outputs an error message and returns with a nonzero exit code. If the /f parameter doesn't exist on the command line, the main function sets the options object's oldstr property to a space character.

The main function processes the /r parameter similarly. If the /r parameter appears on the command line, the main function sets the options object's newstr property to the /r parameter's argument. In this case, it's possible to set the replacement string to an empty string (i.e., /r:""), so that the main function doesn't return with an error as it does with the /f parameter if the replacement string is empty. If the /r parameter doesn't appear on the command line, the main function sets the options object's newstr property to an underscore character.

After processing the /f and /r parameters, the main function checks whether the find and replacement strings are identical. If they are, renaming would accomplish nothing, so the main function outputs an error message and returns with a nonzero exit code.
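The /f and /r handling can be sketched as a small function. Under WSH, the values would come from the WScript.Arguments.Named collection; here a plain object stands in for it so that the logic is easy to follow. The function name, the shape of the return value, the error text, and the choice to compare the strings after the defaults are applied are all my assumptions, not the code from Listing 1.

```javascript
// "named" is a hypothetical plain object standing in for the script's
// named arguments, e.g. { f: "-", r: "" }.
function parseStrings(named) {
  var options = { oldstr: " ", newstr: "_" }; // defaults described in the text

  if (named.hasOwnProperty("f")) {
    if (named.f === "") {
      return { error: "The find string cannot be empty." };
    }
    options.oldstr = named.f;
  }
  if (named.hasOwnProperty("r")) {
    options.newstr = named.r; // an empty replacement string is allowed
  }
  // Assumption: the comparison happens after defaults are applied.
  if (options.oldstr === options.newstr) {
    return { error: "The find and replacement strings are identical." };
  }
  return { options: options };
}

var defaults = parseStrings({});              // oldstr " ", newstr "_"
var strip = parseStrings({ f: "-", r: "" });  // removes dashes entirely
var bad = parseStrings({ f: "" });            // error: empty find string
```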

Next, the main function tests for the presence of the /p, /s, and /t parameters, as the code at callout C shows. For each of these parameters that exists on the command line, the main function sets the corresponding property of the options object (prompt, recurse, or testmode, respectively) to true.

The main function's next task is to parse the /o parameter. The logic is as follows: If the /o parameter's argument contains the letter d, set the options object's dirs property to true. Likewise, if the /o parameter's argument contains the letter f, set the options object's files property to true. If the /o parameter's argument contains neither d nor f, the options object's dirs and files properties both remain false. If the /o parameter doesn't exist on the command line, the main function sets the options object's files property to true.

If the options object's dirs and files properties both remain false (i.e., the /o parameter contained neither the letters d nor f), the main function outputs an error message and returns with a nonzero exit code.
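The /o logic can be sketched in a few lines. As before, a plain object stands in for the named command-line arguments. The function name, the null return for the error case, and the conversion to lowercase are my assumptions rather than details from Listing 1.

```javascript
// "named" is a hypothetical plain object standing in for the script's
// named arguments, e.g. { o: "df" }.
function parseTargets(named) {
  var targets = { dirs: false, files: false };

  if (named.hasOwnProperty("o")) {
    var arg = String(named.o).toLowerCase(); // assumption: letters match in either case
    if (arg.indexOf("d") !== -1) {
      targets.dirs = true;
    }
    if (arg.indexOf("f") !== -1) {
      targets.files = true;
    }
  } else {
    targets.files = true; // default: rename only files
  }

  // Neither letter was found: the caller should report an error.
  if (!targets.dirs && !targets.files) {
    return null;
  }
  return targets;
}

var both = parseTargets({ o: "df" });   // { dirs: true, files: true }
var fallback = parseTargets({});        // { dirs: false, files: true }
var invalid = parseTargets({ o: "x" }); // null
```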

At this point, the main function has processed the script's command-line parameters and is ready to continue. The main function sets the options object's re property to a compiled version of the regular expression found in the oldstr property, as the code at callout D shows. The compile method compiles the regular expression into an internal format for faster execution.

The compile method's syntax is RE.compile(pattern, flags). RE is the regular-expression object, and pattern is the regular-expression pattern. The flags parameter is a string that can be empty or can contain the letters g, i, or m (in any combination). If flags contains g, the regular expression will find all occurrences of the pattern in the search string (global); without g, the expression will return only the first occurrence. Renamer.js always uses g. If flags contains i (ignore case), searches aren't case-sensitive. Renamer.js uses i if the /c option doesn't exist on the command line; if /c exists on the command line, Renamer.js leaves the i out of the flags parameter. (The m flag specifies multiline mode, which doesn't apply to file and folder names, so Renamer.js doesn't use this flag.)
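The effect of the flags is easy to demonstrate. This sketch uses the RegExp constructor, which accepts the same pattern and flags arguments, rather than the compile method that the script uses. The function name and the sample strings are hypothetical.

```javascript
// Build the flags string the way the text describes: always g,
// plus i unless the user asked for case-sensitive matching (/c).
function buildPattern(oldstr, caseSensitive) {
  var flags = caseSensitive ? "g" : "gi";
  return new RegExp(oldstr, flags);
}

var ignoreCase = buildPattern("data", false);
var exactCase = buildPattern("data", true);

var a = "My Data data".replace(ignoreCase, "X"); // "My X X"
var b = "My Data data".replace(exactCase, "X");  // "My Data X"
```

Without the g flag, only the first occurrence in each name would be replaced.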

Next, after the main function creates a reference to a FileSystemObject object, the function retrieves the collection of unnamed command-line parameters, because each unnamed parameter is the name of a folder that contains files and/or folders to be renamed. The main function iterates this collection by using the for statement and retrieves the contents of each argument by using the item method. The main function uses the FileSystemObject's FolderExists method to determine whether the folder exists. If the folder doesn't exist, the main function outputs a message to this effect and continues to the next folder in the collection. Otherwise, the main function uses the FileSystemObject object's GetFolder method to obtain a reference to the named folder and passes this reference, as well as a reference to the options object, to the process function, which I'll explain in more detail in the next section.

After the main function executes the process function for each folder named on the command line, the main function creates an output string that contains the results of its operations and echoes the results.

The Process Function
The process function at callout A declares some variables, then checks whether the options object's dirs property is true. If it's true, the process function retrieves the SubFolders collection from the current folder (i.e., the Folder object passed as the process function's first parameter). For each Folder object in the collection, the process function uses the search method to determine whether the current subfolder's name contains the regular-expression pattern. The search method returns -1 if it doesn't find the regular-expression pattern in the current subfolder's name, so if the search method didn't return -1, the process function uses the replace method (by using the options object's newstr property) to generate the replacement name. The process function then calls the rename function (which I'll explain shortly) to perform the renaming operation. If the rename function returns true (i.e., the rename was successful), the process function increments the options object's dirtotal property.

Next, the process function checks the value of the options object's recurse property. If the recurse property is true (i.e., the /s parameter exists on the command line), the function again retrieves the SubFolders collection from the current folder and iterates the collection with the for statement. The process function then calls itself to process each subfolder of the current folder.

Finally, the process function checks whether the options object's files property is true. If the property is true, the function retrieves the Files collection from the current folder (i.e., the Folder object passed as the function's first parameter). For each File object in the collection, the process function follows essentially the same procedure it did when renaming folders: It uses the search method to determine whether the regular-expression pattern occurs in the file's name, and if so, it uses the replace method to generate a replacement name. The process function then calls the rename function to rename the file. If the rename operation succeeded, the process function increments the options object's filetotal property.
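The three stages of the process function (rename subfolders, recurse, rename files) can be sketched against an in-memory tree. This is a simplified model, not the code from Listing 1: plain arrays stand in for the SubFolders and Files collections, names are changed by direct assignment instead of through the rename function, and the processFolder name and the tree layout are hypothetical.

```javascript
// folder: { name, files: [names], subfolders: [folders] } (hypothetical model)
// options: { re, newstr, dirs, files, recurse, dirtotal, filetotal }
function processFolder(folder, options) {
  var i, name;

  // Stage 1: rename matching subfolders of the current folder.
  if (options.dirs) {
    for (i = 0; i < folder.subfolders.length; i++) {
      name = folder.subfolders[i].name;
      if (name.search(options.re) !== -1) {
        folder.subfolders[i].name = name.replace(options.re, options.newstr);
        options.dirtotal++;
      }
    }
  }

  // Stage 2: descend into each subfolder when recursion is enabled.
  if (options.recurse) {
    for (i = 0; i < folder.subfolders.length; i++) {
      processFolder(folder.subfolders[i], options);
    }
  }

  // Stage 3: rename matching files in the current folder.
  if (options.files) {
    for (i = 0; i < folder.files.length; i++) {
      name = folder.files[i];
      if (name.search(options.re) !== -1) {
        folder.files[i] = name.replace(options.re, options.newstr);
        options.filetotal++;
      }
    }
  }
}

var tree = {
  name: "root",
  files: ["my file.txt", "done.txt"],
  subfolders: [{ name: "sub dir", files: ["a b c.txt"], subfolders: [] }]
};
var opts = {
  re: / /g, newstr: "_",
  dirs: true, files: true, recurse: true,
  dirtotal: 0, filetotal: 0
};
processFolder(tree, opts);
// tree.subfolders[0].name is now "sub_dir"; opts.dirtotal is 1, opts.filetotal is 2.
```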

The Rename Function
As mentioned, the process function calls the rename function to rename a file or folder. The rename function has three parameters: o is a reference to the file or folder being renamed, newname is the proposed new name of the file or folder (generated by the process function), and options is a reference to the options object created by the main function. The rename function uses a try block to trap errors that might occur when renaming a file. (Without a try block, an error will abort the entire script.)

Inside the try block, the rename function checks whether the options object's testmode property is false (i.e., the /t parameter isn't on the command line). If the testmode property is false, the rename function next checks whether the options object's prompt property is true (i.e., the /p parameter exists on the command line). If the prompt property is true, the rename function calls the query function (described next) to ask the user whether to rename the file. If the query function returns false, the user opted not to rename the file and the rename function returns false; otherwise, the rename function attempts to assign the new name to the file or folder object. If the assignment fails, the catch block handles the error gracefully: it writes an error message to standard error, and the rename function returns false.
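The control flow just described can be sketched as follows. Several details are assumptions on my part: the ask parameter is a hypothetical callback that stands in for the query function, the return value of true in test mode is a guess (the text doesn't say what the function returns in that case), and the catch block here only returns false where the real script also writes a message to standard error.

```javascript
// o: any object with a writable Name property (a stand-in for a File or
//    Folder object, whose Name property is assigned to perform a rename).
// ask: hypothetical callback standing in for the query function.
function rename(o, newname, options, ask) {
  try {
    if (!options.testmode) {
      if (options.prompt && !ask(o.Name, newname)) {
        return false; // the user declined
      }
      o.Name = newname; // may throw, e.g., if the new name is already in use
    }
    return true; // assumption: test mode reports a would-be rename as success
  } catch (e) {
    // The real script writes an error message to standard error here.
    return false;
  }
}

var file = { Name: "my file.txt" };
var ok = rename(file, "my_file.txt", { testmode: false, prompt: false });
// ok is true and file.Name is now "my_file.txt".
```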

The Query Function
The query function displays a prompt on standard output. The function then calls the ReadLine method of the WScript object's StdIn property to wait for the user to enter a string. The query function converts the input to lowercase and retrieves the input's first character. If the user's input started with the letter "y," the function returns true; otherwise, it returns false.
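The test that the query function applies to the user's answer comes down to one line. In this sketch, the isYes name is hypothetical, and the input string is passed in directly rather than read from WScript.StdIn.

```javascript
// Returns true when the answer starts with "y" in either case.
function isYes(input) {
  return input.toLowerCase().charAt(0) === "y";
}

var yes = isYes("Yes");  // true
var no = isYes("no");    // false
var empty = isYes("");   // false: pressing Enter alone counts as "no"
```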

Renaming Without Limits
Renamer.js brings the power of regular-expression pattern matching to the mundane task of renaming files and folders. If you need to perform a mass-renaming operation, Renamer.js may be just what you need, and you'll learn something about using regular expressions as well.
