Presenting the PowerShell Pipeline

The pipeline is a core PowerShell concept, and little in PowerShell makes sense without an understanding of it. Unix shells pioneered the pipeline concept, Cmd.exe adopted it, and PowerShell takes it further by passing objects instead of text. Before we get into the PowerShell pipeline, I need to provide some background on standard input and output.

Standard Input and Output

In Cmd.exe, the pipeline is closely related to standard input, input redirection, standard output, and output redirection. Briefly: Standard input is input you enter at the keyboard unless you tell the shell to read the input from somewhere else (i.e., input redirection). Standard output is the normal output that commands display on the screen unless you tell the shell to store the output somewhere else (i.e., output redirection).

To see how standard input works, enter the following command at a PowerShell or Cmd.exe prompt:

sort.exe

When you do this, the cursor sits and waits for input. This is because, by default, Sort.exe sorts standard input, and we didn’t provide any input, so it waits for us to type something. (Press Ctrl+C to cancel.)

Now, suppose you have a file called MyData.txt that you want to sort. Here is how you would display the sorted output of the file on the screen (standard output):

type MyData.txt | sort.exe

In this example, the Type command sends the content of the file MyData.txt to standard output. The pipe (|) takes this output and uses it as input to the Sort.exe program.

The core concept is this: When you use the pipe (|) character in a command, you are creating a pipeline. In a pipeline, the shell uses the output of the command on the pipe’s left-hand side as the input of the command on the right-hand side.
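For comparison, the same idea works without leaving PowerShell: Get-Content reads the file and Sort-Object sorts it. (On Windows, PowerShell defines sort as an alias for Sort-Object, so typing sort alone at a PowerShell prompt runs Sort-Object rather than Sort.exe.) This is a minimal sketch that assumes a throwaway, uniquely named temp file with made-up contents standing in for MyData.txt:

```powershell
# Hypothetical sample data written to a uniquely named temp file,
# standing in for MyData.txt.
$file = Join-Path ([System.IO.Path]::GetTempPath()) "MyData-$([guid]::NewGuid()).txt"
Set-Content -Path $file -Value 'cherry', 'apple', 'banana'

# Get-Content emits one string object per line; the pipe hands each one
# to Sort-Object, which outputs them in sorted order.
$sorted = Get-Content -Path $file | Sort-Object
$sorted

# Remove the throwaway file.
Remove-Item -LiteralPath $file
```

The left-hand command produces the data and the right-hand command consumes it, exactly as with type and sort.exe.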

In most command shells (such as Cmd.exe), standard output and standard input are text only. This can make many kinds of data manipulation and extraction an awkward exercise. Figure 1 shows a simple example of the contortions that Cmd.exe makes us go through to list text files last written in the current year.

Figure 1 - Cmd.exe shell scripts to list text files last written in the current year

 

The Sample1.cmd script outputs the last-write time of each file, followed by a hard tab character, followed by the file name. Sample2.cmd gets the current year and runs Sample1.cmd, outputting only the files whose year matches. (A red arrow marks the hard tab character in both scripts.) Finally, Figure 1 shows the output of Sample2.cmd (only File1.txt and File3.txt).

Notice that both scripts are forced to use string parsing that depends on the format of the date string (%%~tF in Sample1.cmd and %DATE% in Sample2.cmd). On systems whose regional settings use a date format other than the US one, the lines of code that use these date strings will have to be updated, because different locales use different date formats. In addition, the arcane nature of the Cmd.exe script syntax makes these scripts hard to read and maintain (what does %DATE:~10,4% mean, anyway?).

The point of this example is to show that a seemingly simple task (list files last written in the current year) is clumsy and awkward to do in a batch file, and a big part of the problem is that we’re forced to parse strings to determine the year. In addition, the year parsing depends on the locale, which may be a significant problem in environments that share scripts. Also notice that if the requirements change (for example, delete files last written before this year), the scripts are going to get even more complicated and even harder to read. There has to be a better way! Let’s see how PowerShell solves these kinds of problems.

PowerShell’s Pipeline

As noted, text-based shells (like Cmd.exe) use standard output and standard input to pass textual data between programs through the pipeline. PowerShell’s pipeline uses the same basic concept (use the output of one command as the input to another command), except that the output and input are objects, not text. This concept, although simple in principle, has far-reaching ramifications: a command on the right-hand side of the pipe can read an object’s properties directly instead of parsing strings.
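One way to see that the pipeline carries objects is to ask an object what it is. In the sketch below (the variable names are only illustrative), Get-Date produces a .NET DateTime object whose Year property is already an integer, and Get-Member lists the properties that a downstream command could read:

```powershell
# Get-Date writes a System.DateTime object to the pipeline, not a date string.
$now = Get-Date
$typeName = $now.GetType().FullName   # the object's .NET type name
$year = $now.Year                      # an integer read from a property

# Piping the same kind of object to Get-Member lists the properties that any
# command further down the pipeline can read directly.
$propertyNames = Get-Date | Get-Member -MemberType Property |
    Select-Object -ExpandProperty Name
$propertyNames
```

Because the year comes from a property rather than from a formatted string, it does not depend on the machine’s regional date format.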

Filtering with Where-Object

Let’s use the example from the previous section (list *.txt files last written this year). In PowerShell, we do this by getting the file system objects (Get-ChildItem), then selecting (Where-Object) only those file system objects whose LastWriteTime property falls in the current year. Here’s the command:

Get-ChildItem *.txt | Where-Object {
  $_.LastWriteTime.Year -eq (Get-Date).Year
}

You could type this command on a single line, but I split the command onto multiple lines to make it easier to read. The code between the curly braces, { }, is called a scriptblock. Inside the Where-Object scriptblock, the $_ variable means “the current object from the pipeline.” In other words, this command says: “Get file system objects matching *.txt [Get-ChildItem *.txt], and output only objects where [Where-Object] the year of the last write time of each object [$_.LastWriteTime.Year] is equal to [-eq] the current year [(Get-Date).Year].”

As you can see from this example, the Where-Object cmdlet lets us filter objects coming from the left side of the pipeline, and it outputs only objects that match the criteria we specify in the filter. Notice that we’re not doing any string parsing of the date: We’re simply asking each file what its year is.
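To try the filter without depending on what happens to be in the current directory, the following sketch builds a throwaway folder under the system temp path (named with a GUID) containing two made-up files, backdates one of them by a year, and then runs the same Where-Object filter:

```powershell
# Throwaway folder with two hypothetical files; the names are made up.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
$null = New-Item -ItemType File -Path (Join-Path $dir 'this-year.txt')
$older = New-Item -ItemType File -Path (Join-Path $dir 'last-year.txt')
# Setting the property on the FileInfo object updates the file on disk.
$older.LastWriteTime = (Get-Date).AddYears(-1)

# Same filter as the command above: keep only objects whose
# LastWriteTime.Year equals the current year.
$found = Get-ChildItem -Path $dir -Filter *.txt | Where-Object {
    $_.LastWriteTime.Year -eq (Get-Date).Year
}
$names = @($found | ForEach-Object Name)
$names

# Clean up the throwaway folder.
Remove-Item -LiteralPath $dir -Recurse
```

Only this-year.txt passes the filter; no date string is parsed at any point.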

Now, suppose we want to delete files last written before this year. To do this, we simply change our filter a bit and pipe to Remove-Item:

Get-ChildItem *.txt | Where-Object {
  $_.LastWriteTime.Year -lt (Get-Date).Year
} | Remove-Item

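All we changed here was to use -lt (less than) instead of -eq (equal) and to add the Remove-Item cmdlet after a pipe. (To preview which files would be deleted without actually deleting them, add the -WhatIf parameter to Remove-Item.)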

In these two PowerShell commands, notice that instead of passing text strings between commands, we’re passing objects: A file is an object, and the date of a file is also an object.
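Because deleting files is destructive, it can help to rehearse the pipeline first. This sketch (again assuming a GUID-named throwaway folder under the temp path and two made-up files) runs the deletion pipeline once with Remove-Item -WhatIf, which only reports what it would do, and then runs it for real:

```powershell
# Throwaway folder with one current file and one file backdated a year.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
$null = New-Item -ItemType File -Path (Join-Path $dir 'new.txt')
$old = New-Item -ItemType File -Path (Join-Path $dir 'old.txt')
$old.LastWriteTime = (Get-Date).AddYears(-1)

# Dry run: -WhatIf prints what Remove-Item would delete but deletes nothing.
Get-ChildItem -Path $dir -Filter *.txt | Where-Object {
    $_.LastWriteTime.Year -lt (Get-Date).Year
} | Remove-Item -WhatIf
$countAfterPreview = @(Get-ChildItem -Path $dir -Filter *.txt).Count

# Real run: the same pipeline without -WhatIf deletes old.txt.
Get-ChildItem -Path $dir -Filter *.txt | Where-Object {
    $_.LastWriteTime.Year -lt (Get-Date).Year
} | Remove-Item
$remaining = @(Get-ChildItem -Path $dir -Filter *.txt | ForEach-Object Name)

# Clean up the throwaway folder.
Remove-Item -LiteralPath $dir -Recurse
```

The file objects flow straight from Where-Object into Remove-Item; nothing converts them to text in between.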

Performing Actions with ForEach-Object

Aside from filtering objects in the pipeline with Where-Object, we can also pipe to the ForEach-Object cmdlet to perform an action for each object passed through the pipeline. Just as with Where-Object, the ForEach-Object cmdlet uses a scriptblock and the $_ variable to represent the current object in the pipeline.

For example, suppose we want to output the full path and filename of each *.txt file. The command is as follows:

Get-ChildItem *.txt | ForEach-Object {
  $_.FullName
}

The output of this command is the full path and filename of each *.txt file. You can, of course, perform many more actions inside the scriptblock. For example, suppose you want to record the names of, then remove, the *.log files in the C:\Logs directory. Here is a sample command that would do this:

Get-ChildItem C:\Logs\*.log | ForEach-Object {
  "Removing $($_.FullName)"
  Remove-Item -LiteralPath $_.FullName
} | Out-File C:\Logs\Cleanup.txt -Append

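For each file, this command outputs the string “Removing ” followed by the file’s full path, and then removes the file (Remove-Item). The $( ) subexpression is needed because, inside a double-quoted string, PowerShell expands only the variable itself: “$_.FullName” would expand $_ and then append the literal text “.FullName”. Out-File then appends all of the output strings to the file C:\Logs\Cleanup.txt.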
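Here is a sandboxed sketch of the same ForEach-Object pattern. It assumes a GUID-named throwaway folder under the temp path holding two made-up .log files, collects the “Removing” messages in a variable instead of writing them to C:\Logs\Cleanup.txt, and uses Remove-Item -LiteralPath $_.FullName so that file names containing wildcard characters such as [ ] are taken literally. The last lines show the difference the $( ) subexpression makes:

```powershell
# Throwaway folder with two hypothetical log files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
$null = New-Item -ItemType File -Path (Join-Path $dir 'a.log')
$null = New-Item -ItemType File -Path (Join-Path $dir 'b.log')

# ForEach-Object runs the scriptblock once per file. The string is written
# to the pipeline; Remove-Item deletes the file and outputs nothing.
$messages = Get-ChildItem -Path $dir -Filter *.log | ForEach-Object {
    "Removing $($_.FullName)"
    Remove-Item -LiteralPath $_.FullName
}
$messages
$remainingLogs = @(Get-ChildItem -Path $dir -Filter *.log).Count

# Subexpression vs. plain variable inside a double-quoted string.
$item = Get-Item -LiteralPath $dir
$expanded = "$($item.FullName)"   # evaluates the property: the full path
$literal = "$item.FullName"       # expands $item, then appends ".FullName"

# Clean up the throwaway folder.
Remove-Item -LiteralPath $dir -Recurse
```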

Of course, we can also combine filtering (Where-Object) with actions (ForEach-Object) to construct even more flexible commands. For example, suppose we want to remove *.log files last written more than six months ago, recording the name of each before removing it:

Get-ChildItem C:\Logs\*.log | Where-Object {
  $_.LastWriteTime -lt (Get-Date).AddMonths(-6)
} | ForEach-Object {
  "Removing $($_.FullName)"
  Remove-Item -LiteralPath $_.FullName
} | Out-File C:\Logs\Cleanup.txt -Append

Even if you’re not a PowerShell expert, a basic understanding of objects and the pipeline is enough to follow what these commands are doing.
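To experiment with the combined command without touching C:\Logs, the following sketch runs it against a GUID-named throwaway folder under the temp path. The file names are made up, one file is backdated seven months and the other one month, and the cleanup log is written inside the same folder:

```powershell
# Throwaway folder, hypothetical log files, and a local cleanup log.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
$logFile = Join-Path $dir 'Cleanup.txt'
$stale = New-Item -ItemType File -Path (Join-Path $dir 'stale.log')
$stale.LastWriteTime = (Get-Date).AddMonths(-7)
$recent = New-Item -ItemType File -Path (Join-Path $dir 'recent.log')
$recent.LastWriteTime = (Get-Date).AddMonths(-1)

# Filter, act, and log, exactly as in the command above.
Get-ChildItem -Path $dir -Filter *.log | Where-Object {
    # DateTime objects compare directly; no date-string parsing is involved.
    $_.LastWriteTime -lt (Get-Date).AddMonths(-6)
} | ForEach-Object {
    "Removing $($_.FullName)"
    Remove-Item -LiteralPath $_.FullName
} | Out-File -FilePath $logFile -Append

$remaining = @(Get-ChildItem -Path $dir -Filter *.log | ForEach-Object Name)
$logLines = @(Get-Content -Path $logFile)
$logLines

# Clean up the throwaway folder.
Remove-Item -LiteralPath $dir -Recurse
```

Only stale.log is removed, and Cleanup.txt records one “Removing” line for it.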

The Power of the Pipeline

The pipeline is the key to much of PowerShell’s power. Experiment with the examples presented earlier and you will find that PowerShell makes complex tasks much simpler than they are in Cmd.exe. For more information and examples, read the PowerShell help topic about_Pipelines (https://technet.microsoft.com/en-us/library/hh847902.aspx).
