Splitting Lists to Boost Query Performance

Divide and conquer


Have you ever had to perform an operation on many PCs and needed the results quickly? One way to accomplish this task is to break your input list of node names into smaller lists and run them separately in multiple command-shell windows. However, you could accidentally omit or duplicate nodes when you divide the input list. This method is also problematic because you must combine all the output files to obtain the total results. You must also ensure that all the separate instances have completed before you join the results, and you'll probably want the results in the same order as they are in the input file. If you cut and paste the results back together, you again risk omitting, duplicating, or incorrectly ordering entries. Because the multiple instances use several input and output files, you'll have to create separate copies of your script or code your script so that it accepts arguments for the input and output file paths.

Wouldn't it be useful to be able to launch a task on a list of several hundred or several thousand nodes, automatically split the input list and run it in multiple instances, then automatically recombine the results into one file? How about pinging a list of 2000 nodes in 5 minutes instead of 20 minutes or longer? If it sounds too good to be true, just hold on to your keyboard because I've created the ListSplitter.bat script to do just that. I'm going to use ListSplitter to perform a relatively simple query task to demonstrate how the script works, but you can use ListSplitter to perform other types of queries.

How ListSplitter Works

ListSplitter's logic follows this basic scheme:

1. The script automatically divides the list into separate pieces and lets the user configure the number of entries in the secondary (or breakout) lists.

2. The script begins launching operations in parallel as soon as each breakout list is compiled.

3. The script logs the operations in all the script instances.

4. When all lists have been created and operations have completed against all the nodes on each list, the script reassembles the separate log files into a master report that lists all the nodes in their original order.

The first section of ListSplitter in Listing 1, page 2, performs various configuration jobs, clears and sets the counters, locates the directory path in which the script is stored, and parses the input file before splitting it into the breakout files. After the script executes these preparatory tasks, the code at callout D copies a specified number of lines in the input file to a new breakout file with a unique filename. You use the Set BrkOutNum= statement in the script's configuration section (at the top of Listing 1) to set the number of lines to be copied; in our example, it's 200. The %L1counter% variable tracks which section of the script is performing processing; the variable also forms part of the breakout file's name.

After ListSplitter copies the specified number of lines, it launches the RunOps script, which Listing 2 shows. RunOps calls a utility such as Ping to query the nodes in the first breakout list. While RunOps is querying the nodes, script flow returns to ListSplitter, which creates a new file and copies the next batch of entries into it, repeating the process until it has processed all the lines in the original input file. The code at callout C in Listing 1 shows the If statement that changes the breakout filename after the counter reaches the configured number of entries and launches RunOps against the newly created breakout file.
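
The split-and-launch loop just described can be sketched roughly as follows. This is a minimal illustration, not the code in Listing 1. The input filename, the breakout filename pattern (brk1.txt, brk2.txt, and so on), the LineCount variable, and the way RunOps.bat receives the breakout filename are all assumptions made for the example; only the InputFile, BrkOutNum, and L1counter names come from the article.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical values for illustration only.
set InputFile=nodes.txt
set BrkOutNum=200

rem Remove leftover breakout files from an earlier run.
if exist brk*.txt del brk*.txt

set L1counter=1
set LineCount=0

rem Copy each input line into the current breakout file.
for /f "usebackq delims=" %%i in ("%InputFile%") do (
    >>"brk!L1counter!.txt" echo %%i
    set /a LineCount+=1
    rem When the file is full, launch RunOps against it and start a new file.
    if !LineCount! GEQ %BrkOutNum% (
        start "" /min cmd /c RunOps.bat "brk!L1counter!.txt"
        set /a L1counter+=1
        set LineCount=0
    )
)

rem Launch RunOps against the final, partially filled breakout file.
if !LineCount! GTR 0 start "" /min cmd /c RunOps.bat "brk!L1counter!.txt"
endlocal
```

Because Start returns immediately, the loop keeps filling the next breakout file while the earlier RunOps instances are still running.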

The RunOps :Task label section (at callout B in Listing 2) contains the code that actually performs the query operation (i.e., Ping). You can easily modify this code to perform virtually any machine query on a list of nodes, for example, checking registry entries, virus-definition file dates, file existence, or local group membership. The only requirement is that you must log results to the log file that's named for the particular breakout file that RunOps is running against. You can do so by using code similar to the following example:

Echo %target% OFF>>%Logfile%
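
To put that logging line in context, here is a hedged sketch of what a RunOps-style task might look like. It is not the code in Listing 2. It assumes that the breakout filename arrives as the first argument, that the log file takes the breakout file's name with a .log extension, and that a node counts as ON when Ping's output contains the string TTL=.

```batch
@echo off
rem Hypothetical sketch; %1 is the breakout file passed in by the caller.
set Logfile=%~n1.log

rem Run the task once for every node name in the breakout file.
for /f "usebackq delims=" %%t in ("%~1") do call :Task %%t
goto :eof

:Task
set target=%1
rem Send one echo request; a reply line contains the string TTL=.
ping -n 1 %target% | find "TTL=" >nul
if errorlevel 1 (
    echo %target% OFF>>%Logfile%
) else (
    echo %target% ON>>%Logfile%
)
goto :eof
```

To run a different query, you would replace the Ping and If lines in the :Task section and keep the pattern of appending one result line per node to %Logfile%.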

After RunOps has finished querying the nodes—but before ListSplitter can combine the files—we need a mechanism to check whether all instances of the RunOps script have actually finished running. Although these instances were launched sequentially, we don't know when they'll actually finish or whether they'll finish in order. The code at callout A in Listing 2 uses the Move command to change the name of each breakout file after the operation has run successfully against it. The changed filename signals that the RunOps instance associated with that file has completed successfully.
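
The rename-as-signal idea can be sketched in a couple of lines at the end of a RunOps-style script. The .done extension is a hypothetical naming choice for this example; the article doesn't state what name the code at callout A actually uses.

```batch
rem %1 is the breakout file this instance was given.
rem Renaming it in place marks the instance as finished.
move "%~1" "%~dpn1.done" >nul
```

Because the rename happens only after the last node has been queried, any breakout file that still carries its original name belongs to an instance that is still running or that failed.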

ListSplitter uses the code at callout A in Listing 1 to determine which input files are in the done state (i.e., are renamed). The script repeats this check every 10 seconds until it accounts for all the renamed files. (You can change the code as necessary to increase or decrease the checking interval.) Then, the code at callout B uses a somewhat obscure but valuable technique to chain the log files together. The first six lines of code use the Dir command to locate the log files and build a string that contains the names of the log files to be merged, joined by plus signs (+), which is the Copy command's syntax for concatenating files. The filenames are joined in the same order in which the breakout files were split so that the order of the results matches the order of the nodes in the original input file. The Copy command then uses the string to copy all the log files' content into one file. This string must be built on the fly because the number of log files varies according to the number of lines in the breakout files and the total number of items in the input file.
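
The wait-then-merge logic can be approximated as shown below. This sketch is an assumption-laden stand-in for callouts A and B, not a copy of them: it assumes breakout files named brk1.txt through brkN.txt that are renamed to .done when finished, log files named brk1.log through brkN.log, a known file count in Total, an output file named report.txt, and a Sleep utility available on the path. It also builds the list with a counting loop rather than with the Dir command so that the numeric order is explicit.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical: number of breakout files that were created.
set Total=10

:WaitLoop
rem Count the breakout files that have been renamed to .done.
set Done=0
for %%f in (brk*.done) do set /a Done+=1
if %Done% LSS %Total% (
    sleep 10
    goto WaitLoop
)

rem Build a string such as brk1.log+brk2.log+brk3.log in numeric order.
set CopyList=
for /l %%n in (1,1,%Total%) do (
    if defined CopyList (
        set CopyList=!CopyList!+brk%%n.log
    ) else (
        set CopyList=brk%%n.log
    )
)

rem Concatenate the logs, in order, into one report.
copy /b %CopyList% report.txt >nul
endlocal
```

Joining the names by number matters because a plain alphabetical sort can place brk10.log ahead of brk2.log, which would scramble the order of the final report.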

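The code at callout B is possible because of the Setlocal EnableDelayedExpansion command at the beginning of the script. By default, the command processor expands an environment variable once, when it reads a line or a parenthesized block of commands, so a variable that changes inside a For loop appears to keep its old value. With EnableDelayedExpansion in effect, a variable written between exclamation points (e.g., !var!) isn't expanded until the command that uses it actually executes, so each pass through the loop sees the current value. The setting remains in effect until either the matching Endlocal command or the end of the script is reached.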
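
A short standalone demonstration makes the difference visible. The variable name Count and the loop values are made up for this example.

```batch
@echo off
setlocal EnableDelayedExpansion
set Count=0
for %%a in (one two three) do (
    set /a Count+=1
    echo Parse-time value: %Count%   Run-time value: !Count!
)
endlocal
```

On each of the three passes, %Count% shows 0 because it was expanded when the whole For block was read, whereas !Count! shows 1, then 2, then 3. A string that grows inside a loop, such as the list of log filenames, depends on the second behavior.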

Using ListSplitter and RunOps
I tested ListSplitter and RunOps on a server running Windows 2000 Server Service Pack 3 (SP3) and on PCs running Windows XP SP1 and Win2K SP3. To use ListSplitter and RunOps, first download these scripts from the Windows Scripting Solutions Web site. Go to http://www.windowsitpro.com/windowsscripting, enter 43752 in the InstantDoc ID text box, then click the 43752.zip hotlink. (Column widths in the printed publication force us to wrap code lines, which might cause the printed code to run incorrectly.) Then, perform the following steps:

1. Create a folder for the two scripts. The scripts automatically create and locate their temporary files in that folder.

2. Review the code comments in both scripts. The comments contain additional information about how the scripts work.

3. Configure the input file location by editing the following line in ListSplitter's configuration section:

Set InputFile=\\server4\share


4. Substitute your output file location in the line

Set OutFile=\\server4\share


5. Configure the Sleep utility's location by modifying the line

Set SleepLoc=\\server3


6. Configure the number of line items you want in each breakout file by editing the line

Set BrkOutNum=number

To determine the value for number, you must consider the total number of items in the main input list. If you set the line-item number too low, you could create an excessive number of command sessions and exhaust node resources. You'll have to tune this number to your situation, based on factors such as PC or server resources and network throughput. As a general guide, use a figure that's about one-tenth the number of records in the input file.

7. If you want a header at the top of the final report, create a file called headerfile.txt to contain the report header. Insert a carriage return after the final line of header information to prevent subsequent report data from running into the header information. Place this file in the same folder as ListSplitter and RunOps.
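
For step 6, the one-tenth guideline is easy to compute before you edit the script. The following helper is a hypothetical convenience that isn't part of ListSplitter or RunOps; the input filename and the Total and Suggested variable names are assumptions for the example.

```batch
@echo off
rem Hypothetical input file name for illustration.
set InputFile=nodes.txt

rem Count the lines in the input file.
for /f %%c in ('find /c /v "" ^< "%InputFile%"') do set Total=%%c

rem Suggest roughly one-tenth of the total, with a minimum of 1.
set /a Suggested=Total/10
if %Suggested% LSS 1 set Suggested=1
echo %Total% entries; suggested BrkOutNum=%Suggested%
```

For an input list of 2000 nodes, the suggestion works out to 200 entries per breakout file, which yields 10 parallel instances. Treat the result as a starting point and tune it to your machine and network.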

ListSplitter has one caveat: It works most efficiently for large jobs—for example, input lists that contain hundreds or thousands of entries. Because of the script's processing overhead, ListSplitter probably won't save you time if you run it against only a small number of nodes.

Time to Split
ListSplitter can speed up your large-scale query operations by eliminating manual cutting and pasting of input and output data. Keep it handy for the next virus threat, security vulnerability, or other problem that requires you to quickly gather information about your workstation and server environment.
