Alternate data streams occasionally crop up as security concerns because an attacker might use these streams to hide files on your system. These streams are also of interest to law enforcement because people sometimes use them to hide illegal material and records of illegal activities. Most programmers don't understand alternate data streams, and few tools can detect their presence. I'll help you understand how NTFS stores a file and present an application you can use to display the data streams present in a file.
Every file consists of a set of attributes. Oddly enough, a file’s name isn’t part of the file; rather, the filename is a directory entry that points to the actual file. This level of indirection is necessary because Windows 2000 and Windows NT both support hard links (for details, take a look at the CreateHardLink() function). Think of a directory entry as a pointer: each directory entry tells the file system which file to access, and more than one directory entry can point to the same data. This concept won’t be new to you if you have any experience with UNIX systems. While these pointers have always been present in NT, until Win2K shipped, the API calls to create hard links were available only in the Device Driver Kit (DDK).
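To make the "two directory entries, one file" idea concrete, here is a minimal sketch. It assumes a C++17 compiler and uses the standard library's std::filesystem::create_hard_link() rather than calling CreateHardLink() directly; the function name and the file names in the usage note are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Writes data through one name, adds a second directory entry (a hard link)
// for the same file, and returns what can be read back through the second name.
// The function name is hypothetical and exists only for illustration.
std::string WriteThroughOneNameReadThroughOther(const fs::path& original,
                                                const fs::path& alias) {
    fs::remove(alias);      // start clean; removing a missing file is not an error
    fs::remove(original);

    {
        std::ofstream out(original);
        out << "shared data";
    }   // the stream closes here, so the data is on disk before we link

    // Throws fs::filesystem_error if the volume does not support hard links.
    fs::create_hard_link(original, alias);

    std::ifstream in(alias);
    std::string text;
    std::getline(in, text);
    return text;
}
```

Calling the function with two paths on the same volume should return the text written through the first name, and fs::hard_link_count() on either path should then report two links.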
File attributes consist of several fields. The first field describes whether a file is system, hidden, read-only, archive, or one of several less typical attributes. The second field describes the creation time, access time, write time, and the size of the file. You can retrieve file attributes in code using two functions: GetFileAttributesEx() and GetFileInformationByHandle().
In addition to the attributes, each file that you store on an NTFS volume typically contains two data streams. The first data stream stores the security descriptor (for more information on the security descriptor, see Setting Security), and the second stores the data within a file. For more information on how data streams and NTFS work, see David Solomon and Helen Custer, Inside Windows NT, second or third edition (Microsoft Press, 1998).
Alternate data streams are another type of named data stream that can be present within each file. Before we can search for alternate data streams, we need to create a file containing such a stream. Start by going to the command line and typing
Put some data in the file, save the file, and close Notepad. From the command line, type
and note the file size. Next, go to the command line and type
Type some new text into Notepad, save the file, and close Notepad. Check the file size again and notice that it hasn’t changed!
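The same experiment can be sketched in code. This is only an illustration, assuming the file lives on an NTFS volume, where a name of the form file:stream opens the named stream; on other file systems the colon is simply part of an ordinary filename and a second file is created instead. The helper names are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

// Replaces the contents of `path` with `text`.
void WriteText(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << text;
}

// Returns the entire contents of `path`.
std::string ReadText(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Writes normal data to `file`, then writes to "file:stream", and reports
// whether the size reported for `file` stayed the same afterward.
bool HiddenStreamLeavesSizeUnchanged(const std::string& file,
                                     const std::string& stream) {
    WriteText(file, "this is a normal data stream");
    const auto before = std::filesystem::file_size(file);

    // On NTFS this targets an alternate data stream of `file` (assumption:
    // the volume is NTFS; elsewhere this is just a separate file).
    WriteText(file + ":" + stream, "this is a hidden data stream");

    return std::filesystem::file_size(file) == before;
}
```

The reported size covers only the unnamed data stream, which is why adding a named stream leaves it untouched.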
If you open test.txt, you see your original data and nothing else. If you use the type command on the filename from the command line, you still get the original data. If you go to the command line and type
you get an error. If you have UNIX command-line tools (available from the Microsoft Windows NT 4.0 Resource Kit, various vendors, or this FTP site, which contains a free set that I prefer), try using cat. Using cat reveals the following:
[c:\temp]cat test.txt
this is a normal data stream
[c:\temp]cat test.txt:hidden.txt
this is a hidden data stream
Now that we have a file with an alternate data stream, how do we detect this stream? A search for "alternate data stream" on the Microsoft Developer Network (MSDN) library gives us some clues. The first is a pointer to a WIN32_STREAM_ID structure, and the second is a pointer to the tape backup functions that work with these structures. Although using the tape backup functions might seem easy at first, it took me a couple of passes to unravel exactly how to use the BackupRead() and BackupSeek() functions.
Look at Listing 1, liststreams.cpp, starting with wmain() at line 170. (To download the compiled version of the application, click Download the code from the Article Information box at the top right corner of this page.) Using wmain() delivers the command-line arguments as wide (Unicode) characters, which is more efficient than using ANSI strings because the OS uses Unicode at the lowest levels and no conversion is needed. After the usual error checking and usage information, the first function you encounter is EnableBackupRights(). For a more thorough understanding of process tokens, see my last article.
If you’re logged on as a member of the Administrators group, your process token will have backup rights present; however, to use these rights, an application must enable them. This fact is important because if a user had backup rights enabled all the time, that user could read any file on the system without triggering audits. Such behavior might be desirable for your tape backup application, but it isn’t recommended for day-to-day use of the system. EnableBackupRights() begins at line 118.
To enable backup rights in a process token, we must first open the process token with access of TOKEN_ADJUST_PRIVILEGES. Next, we need to initialize the TOKEN_PRIVILEGES structure. A Locally Unique Identifier (LUID) identifies each privilege on the system, and we obtain the LUID by calling LookupPrivilegeValue(). A TOKEN_PRIVILEGES structure can contain information about more than one privilege. For instance, we might want to enable backup and restore rights at the same time. Because we only need backup rights for our application to work (which simplifies matters), we can initialize the PrivilegeCount member to one. After we initialize the TOKEN_PRIVILEGES structure, we then pass it into AdjustTokenPrivileges(). Checking the return on this function is a little tricky because it can return success even if it didn't enable all the privileges. After a successful return, call GetLastError() and check for ERROR_NOT_ALL_ASSIGNED to be sure all the privileges asked for were actually set.
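The sequence just described could be sketched as follows. This is not the code from Listing 1; the function name EnableBackupPrivilege() is hypothetical, and the non-Windows branch is only a stub so that the sketch compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>

// Enables SeBackupPrivilege in the current process token.
// Returns true only if the privilege was actually enabled.
bool EnableBackupPrivilege() {
    HANDLE token = NULL;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token)) {
        return false;
    }

    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;                             // only the backup right
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

    bool enabled = false;
    if (LookupPrivilegeValueW(NULL, L"SeBackupPrivilege",
                              &tp.Privileges[0].Luid) &&
        AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL)) {
        // AdjustTokenPrivileges() can return TRUE without enabling anything;
        // in that case GetLastError() reports ERROR_NOT_ALL_ASSIGNED.
        enabled = (GetLastError() == ERROR_SUCCESS);
    }

    CloseHandle(token);
    return enabled;
}
#else
// Stub for non-Windows builds: there is no process token to adjust.
bool EnableBackupPrivilege() { return false; }
#endif
```

On Windows the sketch needs to be linked against advapi32.lib. A caller would typically invoke it once, before opening the file with FILE_FLAG_BACKUP_SEMANTICS.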
After we enable the backup right for the process, we open the file using the FILE_FLAG_BACKUP_SEMANTICS flag. This flag is the reason why we put all the extra work in EnableBackupRights(). If we don't set this flag, we won’t see the entire file when the application calls BackupRead(). Because we don’t know beforehand how large each file will be or how many streams a file might have, we need to read the file in a while() loop. Unlike an ordinary backup application that reads the actual file data, our application simply displays the streams. When the application calls BackupRead(), a WIN32_STREAM_ID structure will precede the stream data in the buffer, and the buffer needs to be large enough to hold both this structure and some of the file data. BackupRead() also takes the address of a void pointer that refers to an undocumented context structure. All we need to know is to initialize the context pointer to NULL before the first call and to call BackupRead() one final time with the bAbort parameter set to TRUE so that the function releases the context. Once BackupRead() returns, we must check to see whether the function actually read anything: like the ordinary ReadFile() function, BackupRead() notifies us when we reach the end of the file by returning success and 0 bytes read.
After you have some data in the buffer, the application calls DumpStreamId() at line 17. The application can obtain multiple data streams from one read operation, so this function does some tricky pointer manipulation to go from one WIN32_STREAM_ID structure to the next. If you plan on dealing with these structures, carefully read the comments in this function. Begin by looking to see whether the stream has a name (the security descriptor and normal data streams are not named). If the stream has a name, the WCHAR string that the cStreamName member holds isn’t null-terminated, so you have to have a safe way to copy the string to a buffer that can be null-terminated. (In this case, I chose to use STL’s wstring data type.) Next, convert the stream ID number to something meaningful, check the stream attributes, and print the stream size. Because NTFS allows files greater than 4GB, check the high DWORD of the stream size as well as the low DWORD. Finally, increment the pointer to the next stream, and the condition for the while() loop will check to see whether the pointer is beyond the read buffer. Now that you have at least some of the streams associated with the file, return into wmain().
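The pointer walk can be illustrated without any Windows headers. The sketch below is not DumpStreamId() from Listing 1; it assumes the documented WIN32_STREAM_ID layout (a 20-byte fixed header of stream ID, attributes, 64-bit size, and name length in bytes, followed by the name and then the stream data) and reads the fields byte by byte from a buffer. StreamInfo, ParseStreamHeaders(), and DescribeStreamId() are hypothetical names, and only three stream IDs are mapped.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Fixed part of WIN32_STREAM_ID: dwStreamId (4) + dwStreamAttributes (4) +
// Size (8) + dwStreamNameSize (4). The name and the data follow it.
constexpr std::size_t kHeaderBytes = 20;

struct StreamInfo {               // hypothetical holder for one header
    std::uint32_t id = 0;
    std::uint32_t attributes = 0;
    std::uint64_t size = 0;       // high and low DWORDs combined
    std::u16string name;          // empty for unnamed streams
};

// Reads a little-endian 32-bit value.
static std::uint32_t ReadU32(const unsigned char* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Maps a few stream IDs to the labels the article's output uses.
const char* DescribeStreamId(std::uint32_t id) {
    switch (id) {
        case 1:  return "Standard data";             // BACKUP_DATA
        case 3:  return "Security descriptor data";  // BACKUP_SECURITY_DATA
        case 4:  return "Alternative data stream";   // BACKUP_ALTERNATE_DATA
        default: return "Other stream type";
    }
}

// Walks every complete header in `buf`, stepping over each stream's data.
std::vector<StreamInfo> ParseStreamHeaders(const std::vector<unsigned char>& buf) {
    std::vector<StreamInfo> out;
    std::size_t pos = 0;
    while (buf.size() - pos >= kHeaderBytes) {
        const unsigned char* p = buf.data() + pos;
        StreamInfo s;
        s.id = ReadU32(p);
        s.attributes = ReadU32(p + 4);
        s.size = (std::uint64_t(ReadU32(p + 12)) << 32) | ReadU32(p + 8);
        const std::uint32_t nameBytes = ReadU32(p + 16);

        std::size_t remaining = buf.size() - pos - kHeaderBytes;
        if (nameBytes > remaining) break;            // name is cut off
        // The name is not null-terminated, so copy exactly nameBytes bytes.
        for (std::uint32_t i = 0; i + 1 < nameBytes; i += 2) {
            s.name.push_back(char16_t(p[kHeaderBytes + i] |
                                      (p[kHeaderBytes + i + 1] << 8)));
        }
        out.push_back(s);

        remaining -= nameBytes;
        if (s.size > remaining) break;               // data runs past the buffer
        pos += kHeaderBytes + nameBytes + std::size_t(s.size);
    }
    return out;
}
```

Copying the name into a string type with an explicit length mirrors the reason the article gives for using wstring: the buffer holds no terminator to rely on.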
If the number of bytes read equals the buffer size, the file might contain more streams, so the application calls BackupSeek(). BackupSeek() will seek only to the end of the current stream, so the application uses it to find the end of the stream, not necessarily the end of the file. There are three possible outcomes: we’re already at the end of the file, we’ve successfully moved the file pointer to the beginning of the next stream, or there has been an unexpected error. If we’re at the end of the file or encountered an error, it’s time to quit. Otherwise, we go back to the top and read more data (if available).
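For readers who want to see the read-and-seek loop end to end, here is a rough sketch. It deliberately uses a different pattern from the one liststreams.cpp is described as using: instead of reading a large buffer, it reads one WIN32_STREAM_ID header at a time, reads the name, and then calls BackupSeek() to skip that stream's data. The function name ListStreams() and the output format are hypothetical, and the non-Windows branch is only a stub so that the sketch compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cstddef>
#include <cstdio>
#include <string>

// Prints one line per stream in `path`.
// Returns the number of streams seen, or -1 on error.
int ListStreams(const wchar_t* path) {
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (file == INVALID_HANDLE_VALUE) return -1;

    const DWORD headerBytes = (DWORD)offsetof(WIN32_STREAM_ID, cStreamName);
    LPVOID context = NULL;          // opaque; must be NULL on the first call
    int count = 0;
    bool failed = false;

    for (;;) {
        WIN32_STREAM_ID header = {};
        DWORD read = 0;
        if (!BackupRead(file, (LPBYTE)&header, headerBytes, &read,
                        FALSE, TRUE, &context)) { failed = true; break; }
        if (read == 0) break;       // success with 0 bytes: end of file
        if (read < headerBytes) { failed = true; break; }

        std::wstring name;          // the name is not null-terminated on disk
        if (header.dwStreamNameSize > 0) {
            name.resize(header.dwStreamNameSize / sizeof(WCHAR));
            if (!BackupRead(file, (LPBYTE)&name[0], header.dwStreamNameSize,
                            &read, FALSE, TRUE, &context)) { failed = true; break; }
        }
        wprintf(L"id=%lu size=%lld name=%ls\n", header.dwStreamId,
                header.Size.QuadPart, name.c_str());
        ++count;

        // Skip the stream's data. BackupSeek() never moves past the end of
        // the current stream. This sketch does not treat a FALSE return as
        // fatal; production code should inspect GetLastError().
        if (header.Size.QuadPart > 0) {
            DWORD low = 0, high = 0;
            BackupSeek(file, header.Size.LowPart, (DWORD)header.Size.HighPart,
                       &low, &high, &context);
        }
    }

    // A final call with bAbort set to TRUE releases the context.
    DWORD unused = 0;
    BackupRead(file, NULL, 0, &unused, TRUE, FALSE, &context);
    CloseHandle(file);
    return failed ? -1 : count;
}
#else
// Stub for non-Windows builds: the backup API is not available.
int ListStreams(const wchar_t*) { return -1; }
#endif
```

Reading header by header keeps the buffer small and avoids the pointer arithmetic needed when several headers arrive in one read, at the cost of more calls into the API.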
If you try out the application on the file you created earlier, the output will be as follows:
Stream information on test.txt:
Security descriptor data
Stream contains security information
Stream size: High DWORD = 0, Low DWORD = 200
Standard data
Stream size: High DWORD = 0, Low DWORD = 28
Stream Name = :hidden.txt:$DATA
Alternative data stream
Stream size: High DWORD = 0, Low DWORD = 14
Several other operations, such as calling OpenProcess() with debug rights, require enabling user rights for a process, and this application presents some code you can use for that purpose. Although alternate data streams aren’t usually something that concerns a security-conscious programmer, this tool is very useful to security administrators and anyone doing computer forensics.