Systems administrators often find themselves examining raw computer data that doesn't make much sense. Data from tools such as network-packet analyzers and disk editors should be easy to read, but instead it's a computer novel of hexadecimal codes and offset values. For busy administrators, sorting through this "hex dump" for useful information is a waste of time. So, I created a Perl script named HexDump.pl that converts computer hex dumps into legible text. To understand how this script works, you need to know the anatomy of a network (i.e., TCP/IP) packet and what a hex dump contains.
The Anatomy of a Network Packet
Most network-packet analyzers let you analyze each network packet. Such analyses describe all sorts of interesting information, such as the IP address the packet was traveling to, the IP address it came from, and the destination and source TCP ports. It's easy for network-packet analyzers to pick out this information because every packet must have it and it's in a predictable location.
Each network packet also contains a payload section that holds the actual data being transported. The content of the payload section depends on what server sent the packet and what client application reads it. For example, a packet's payload section might contain compressed audio and video data if the packet is going from a streaming media server to a music-player program, or it might contain a Web page's HTML code if the packet is going from a Web server to a Web-browser program.
What's in a Hex Dump
Most network-packet analyzers, such as WinDump (http://www.winpcap.org/windump), Ethereal (http://www.ethereal.com), and NetMon (a component of Microsoft Systems Management Server, or SMS), display hex dumps in a common format. Other tools, such as programming and disk editors, also use this format.
A hex dump typically contains multiple rows, each of which consists of byte-offset values, hex-based byte values (aka hex bytes), and an ASCII character display of the data (in that order). In Figure 1, you'll see a hex dump from an HTTP request for a Web page. The payload section starts in the fourth line.
This hex dump shows that at the beginning of the packet (offset of 0x0000), there are 4 bytes that contain the hex values of 0x00, 0x0f, 0xb5, and 0xbe. Now skip down to the line that starts with 0080 (an offset of 0x0080, or 128 bytes in decimal), which contains the hex values of 0x49, 0x66, 0x2d, 0x4d, and so on. If you look at the end of the line in the right column, you'll see that the bytes in this line represent the ASCII text If-Modified-Sinc. If you combine the bytes in the last three lines in the hex dump, you'll find that they represent the ASCII text If-Modified-Since: Tue, 07 Feb 2006 19:20:39 GMT.
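To make the offset and byte arithmetic concrete, here is a minimal sketch that converts the 0080 offset to decimal and decodes the first four byte values quoted above. The helper name bytes_to_text is hypothetical and isn't taken from HexDump.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of two-digit hex strings into the
# characters they encode.
sub bytes_to_text {
    my @hex_bytes = @_;
    return join '', map { chr(hex($_)) } @hex_bytes;
}

# The offset column is a hex number; hex() converts it to decimal.
my $offset = '0080';
printf "Offset 0x%s is %d bytes into the packet\n", $offset, hex($offset);

# The first four hex bytes on that row decode to the start of the header name.
print bytes_to_text(qw(49 66 2d 4d)), "\n";
```

Run as is, the sketch reports 128 for the offset and prints If-M, the first four characters of If-Modified-Since.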
As this example shows, it quickly becomes a burden to read text in a column that's only 8 or 16 characters wide. When you have to examine hundreds of packets, this task is quite time-consuming. Thus, I wrote HexDump.pl to make it easier to read hex dumps. The script waits for you to copy a hex dump to the Windows clipboard. The script then converts the hex dump to text and displays the results. When a byte contains data that isn't a valid printable text character, a period will be printed in its place. That way, you'll know that some data was present but just not presentable. For example, Figure 2 shows HexDump.pl's output for the hex dump in Figure 1.
How the Script Works
Listing 1 shows HexDump.pl. This simple script uses the Win32::Clipboard extension, which is part of ActiveState Software's ActivePerl standard distribution (http://www.activestate.com). For more information about this extension, see my article "Capture the Clipboard's Contents" (July 2006, InstantDoc ID 50376).
As callout A in Listing 1 shows, the script creates a Win32 Clipboard object. If the script can't create this object, the script fails. After the Clipboard object is created, the script calls the WaitForChange() method, passing in a timeout value of 100 milliseconds. This pause essentially clears any pending clipboard changes that have been queued.
The code at callout B starts a While loop that waits for a hex dump to be copied to the clipboard. This code uses the WaitForChange() method so that the script pauses until a change is made to the clipboard. In this instance, however, no timeout value is specified, so the script will wait indefinitely for a change.
After a change is detected, the While loop code runs. This code first checks to see whether the clipboard contains text. When the clipboard doesn't contain text (e.g., you copy a graphic image), the While loop continues to wait for a change. When the clipboard does contain text, a Foreach loop begins, as callout C shows. The Foreach loop enumerates each line of text in the clipboard. It does so by calling the Get() method to retrieve the clipboard's text, which it separates into rows using a newline (\n) as the delimiter.
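The wait-and-split logic described so far can be sketched as follows. This is an illustration, not the code from Listing 1: the function name next_clipboard_rows is hypothetical, and it accepts any object that provides the WaitForChange(), IsText(), and Get() methods so that the logic can be tried without a real clipboard.

```perl
use strict;
use warnings;

# Hypothetical helper: block until the clipboard holds text, then return
# that text as a list of rows. $clip is any object with WaitForChange(),
# IsText(), and Get() methods, such as a Win32::Clipboard object.
sub next_clipboard_rows {
    my ($clip) = @_;
    while (1) {
        $clip->WaitForChange();           # no timeout, so wait indefinitely
        next unless $clip->IsText();      # skip images and other non-text data
        return split /\n/, $clip->Get();  # one list element per row
    }
}

# Assumed usage on Windows (shown as comments so that the sketch loads anywhere):
#   use Win32::Clipboard;
#   my $clip = Win32::Clipboard() or die "Can't create the clipboard object\n";
#   $clip->WaitForChange(100);            # 100 ms pause to flush queued changes
#   my @rows = next_clipboard_rows($clip);
```

Passing the clipboard object in as a parameter is a design choice made for this sketch only; the article doesn't say whether the script structures its loop this way.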
As the Foreach loop processes each line of text, the code at callout D splits each row of text into three components—an offset value, a list of hex bytes, and ASCII data—using a regular expression (regex). When the loop fails to find these components, it moves on to the next row. When the loop finds these components, it assigns them to the scalar variables of $Offset, $Hex, and $Display, respectively. For the remainder of the script, only $Hex is used. The remaining $Offset and $Display variables are ignored; their real function was to help parse out the $Hex component.
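The regex in Listing 1 isn't reproduced here, so the pattern below is only a sketch of how such a split might look. It assumes rows in which the offset is four to eight hex digits (optionally followed by a colon), the hex bytes are two-digit values separated by single spaces, and at least two spaces precede the ASCII column. The function name parse_row is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one hex-dump row into its offset, hex-byte list,
# and ASCII display. Returns an empty list when the row doesn't match the
# assumed layout.
sub parse_row {
    my ($row) = @_;
    return unless $row =~ /^\s*([0-9a-fA-F]{4,8}):?\s+((?:[0-9a-fA-F]{2}\s)+)\s*(.*)$/;
    my ($offset, $hex, $display) = ($1, $2, $3);
    $hex =~ s/\s+$//;    # drop the spacing that trails the last hex byte
    return ($offset, $hex, $display);
}

my ($offset, $hex, $display) = parse_row('0080  49 66 2d 4d  If-M');
print "offset=$offset hex=[$hex] display=[$display]\n";
```

Analyzers that group bytes differently (for example, four hex digits per group) would need a different pattern.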
The rest of the Foreach loop is simple. For each $Hex value, the loop first removes any spaces. The loop then uses Perl's pack() function to convert the $Hex value into a character string. Any unprintable character in that string is replaced with a period. In this case, an unprintable character is one that's greater than 0x7e, or one that's less than 0x20 and isn't a linefeed (0x0a) or carriage return (0x0d). The loop finishes by appending the results of this hex conversion to the $Data variable, after which it starts processing the next row of data.
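The conversion step can be sketched in a few lines. The function name hex_to_printable is hypothetical, and the character class simply encodes the printable range as described above; the actual code in Listing 1 might express the rule differently.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a string of hex bytes into text, replacing
# unprintable bytes with periods.
sub hex_to_printable {
    my ($hex) = @_;
    $hex =~ s/\s+//g;                       # remove the spaces between bytes
    my $text = pack('H*', $hex);            # each pair of hex digits becomes one byte
    $text =~ s/[^\x20-\x7e\x0a\x0d]/./g;    # keep 0x20-0x7e, linefeed, carriage return
    return $text;
}

# The 0x00 and 0xff bytes aren't printable, so each becomes a period.
print hex_to_printable('49 66 2d 4d 00 ff'), "\n";
```

Under this rule, a tab (0x09) would also become a period, so a script that wants to handle tabs later would need to let them through here.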
After all rows of text are processed and the $Data variable contains all the text, the code at callout E runs. This block of code simply cleans up the text in $Data by replacing carriage returns (\r) with newlines (\n) unless the carriage return is already accompanied by a newline (\r\n). The script also replaces tab characters with a couple of spaces to make printed text look nicer.
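A cleanup pass of this kind might look like the following sketch. The function name clean_text is hypothetical, and "a couple of spaces" is assumed to mean two.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize line endings and tabs in the decoded text.
sub clean_text {
    my ($data) = @_;
    $data =~ s/\r(?!\n)/\n/g;   # a lone \r becomes \n; \r\n pairs are left alone
    $data =~ s/\t/  /g;         # each tab becomes two spaces (assumed width)
    return $data;
}

print clean_text("Host: example\rAccept:\t*/*\r\n");
```

The negative lookahead (?!\n) is what leaves existing \r\n pairs untouched.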
Perl to the Rescue
Whether you're analyzing network packets, examining sectors on a hard disk, or trying to extract data from a corrupt database file, HexDump.pl can come to your rescue. This script uses the Win32::Clipboard extension along with some simple Perl processing to convert a hex dump into simple text. It's much easier for human eyes to pore over this text than a myriad of hexadecimal codes and offset values.