
Using VB and HTTP to Securely Upload Files

Downloads
upload_cgi.zip

 

Contributing author Kent Empie combines a VB CGI program with HTTP File Upload to securely transfer files

 

[Editor's Note: VB Solutions is about using Visual Basic (VB) to build a variety of solutions to specific business problems. This column doesn't teach you how to write VB, but how to use VB as a tool to provide quick, easy-to-implement solutions that you can use right away.]

Many organizations need the capability to upload files from a browser to a Web server. Although adding an FTP server can solve this problem, an FTP server introduces extra security risks and administrative tasks. Opening up an FTP port to the world increases your risk of unauthorized access from hackers because FTP doesn't encrypt the user ID, password, or content of the file. In addition, the FTP server and the Web server maintain two separate user databases, which complicates administration. This article, contributed by Kent Empie, presents an alternative to FTP that solves the problem of secure file uploads using your existing NT Web server and a Visual Basic (VB) implementation of the Common Gateway Interface (CGI). Using a VB CGI program in combination with HTTP File Upload, you can securely transfer files from a Web browser to your Web server.

An Overview of HTTP File Upload
Netscape first implemented HTTP File Upload in Navigator 2.0 in early 1996. Since then, Microsoft has implemented it in Internet Explorer (IE) 3.02a and IE 4.0. HTTP File Upload lets the browser accept a filename in a text input field. Screen 1 shows a typical HTTP File Upload form that an application might present to a user.

To the right of the File Name input field, a Browse button lets the user find a file via a standard File, Open dialog box. For security reasons (to keep Web sites from uploading files from a user's machine without the user's knowledge), the File Name field cannot be hidden, nor can it contain a default filename. Once the user clicks Upload File to submit the form, the browser transfers the contents of the file to the Web server.

Typically, an application that uses HTTP File Upload next displays a screen that notifies the user whether the file transfer was successful. Screen 2 shows an example user notification screen for a successful upload. In this example, the application notifies the user, displays the file name and size, and prompts the user with a screen that captures information so that a search engine can index the file. This example is just one type of application that you can build with the HTTP File Upload capability.

Now that you've seen how HTTP File Upload looks to the end user, let's take a look at the underlying components that make up the upload process. Screen 1 presents an overview of the HTTP File Upload process.

To begin the upload, the user first browses to a Web page on the Internet or a corporate intranet. (If you use HTTP File Upload over the Internet, you need to perform user authentication at this point.) As you saw in the example in Screen 1, the Web page includes a form for selecting a file on the user's local machine. The user enters a filename or browses to select a file from a local directory, and then clicks the form's submit button (Upload File in Screen 1). The browser then reads the selected file and posts it, together with the rest of the form's contents, to the Web server. The browser encodes the upload as multipart/form-data; that is, it separates the file from the other form fields with special boundary strings, in much the same way that mail programs use MIME boundaries to encode attachments in mail messages. Once the Web server receives the posted data, it calls a custom CGI program (e.g., a VB CGI program) that decodes the file and saves it to disk. The Web server invokes the CGI program named in the form's ACTION attribute. (For more information about the HTTP File Upload specifications, see the sidebar, "Background on HTTP File Upload.")
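To make the multipart encoding concrete, here is a short sketch of roughly what a browser posts for a single file field. It is written in Python rather than VB so the logic can be read and run on its own. The field name upfile, the boundary value, and the application/octet-stream part type are assumptions for illustration; real browsers choose their own boundary and may send a more specific Content-Type for the part. Note that each delimiter line is two hyphens followed by the boundary, and the closing delimiter also ends with two hyphens.

```python
def build_multipart_body(field_name, client_path, file_bytes, boundary):
    """Encode one file field roughly the way a browser posts it (RFC 1867 style).

    Returns the Content-Type header value and the request body as bytes.
    """
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{client_path}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"  # blank line: the part's headers end and the file bytes begin
    ).encode("latin-1")
    tail = f"\r\n--{boundary}--\r\n".encode("latin-1")  # closing delimiter
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, head + file_bytes + tail
```

For example, build_multipart_body("upfile", r"C:\docs\report.txt", b"Hello", "XyZ") yields a body that starts with the line --XyZ, carries the full client path in the filename parameter, and ends with --XyZ--.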

Visual Basic Using True CGI
If you're new to the Web arena, you might not be very familiar with CGI. CGI is a standard that programs use to communicate with a Web server on the server side. A program that incorporates the CGI standard communicates with a Web server in the following ways: It reads parameters from the command line, reads from Standard In, writes to Standard Out, and reads information that the server passes through environment variables. CGI is not language specific. You can implement CGI in any language that can communicate in these ways.
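The three channels are easy to see in a minimal CGI program. This sketch is written in Python rather than VB; the handle_request name and the response text are hypothetical, and it assumes only that the server sets the standard CGI variables REQUEST_METHOD and CONTENT_LENGTH.

```python
import os
import sys


def handle_request(environ, stdin, stdout):
    """One CGI round trip: environment variables in, Standard In in, Standard Out out."""
    method = environ.get("REQUEST_METHOD", "GET")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length > 0 else b""
    # A CGI response is a header block, a blank line, then the document itself.
    stdout.write(b"Content-type: text/html\r\n\r\n")
    page = f"<HTML><BODY>{method} request, {len(body)} bytes received</BODY></HTML>"
    stdout.write(page.encode("ascii"))


if __name__ == "__main__":
    # Binary streams, so posted file data is not altered by newline translation.
    handle_request(os.environ, sys.stdin.buffer, sys.stdout.buffer)
```

Passing the environment and streams in as arguments is a design choice that lets the same function be exercised without a Web server.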

To clarify one issue, the code in this article uses true CGI. Almost every CGI book I've examined incorrectly states that VB is not capable of executing true CGI programs. Before Microsoft released 32-bit VB 4.0, 16-bit VB 3.0 programmers had to use Win-CGI programming techniques to work around VB 3.0's inability to read from Standard In and write to Standard Out. With the Win-CGI workaround, programmers passed variables between the Win-CGI program and the Web server using INI files. Although this method was less efficient than true CGI, for 16-bit VB programmers it was a lifesaver. However, all that changed with 32-bit VB 4.0, which can read from Standard In and write to Standard Out by calling two Win32 API functions: ReadFile and WriteFile.

Inside the Upload_CGI Program
Now that you've seen an overview of the HTTP File Upload process, let's look at how you can create the VB CGI program that receives the uploaded file. To read environment variables as well as to read from Standard In and write to Standard Out, the upload_cgi application uses several functions that the Win32 API supplies. Because these Win32 API functions reside in an external DLL (kernel32.dll), you must declare them before you can use them in VB. Listing 1 shows the declarations for the Win32 API functions that upload_cgi uses.

The upload_cgi application uses the GetEnvironmentVariable function to read the environment variables that the Web server sets. GetEnvironmentVariable takes three parameters: a string that contains the name of the environment variable, a buffer that receives the variable's value, and the size of the buffer.

The upload_cgi application calls the GetStdHandle function to get a handle to Standard In or Standard Out. GetStdHandle takes one parameter that specifies the type of handle to be returned. A parameter value of STD_INPUT_HANDLE causes the function to return a handle for Standard In, and a parameter value of STD_OUTPUT_HANDLE causes the function to return a handle for Standard Out.

The Win32 API's ReadFile and WriteFile functions are similar, and each takes five parameters. The first parameter is a handle to the file. To use Standard In or Standard Out, this handle must be the one that GetStdHandle returns. The second parameter is a buffer that contains the data for the read or write operation. The third parameter is the number of bytes to read or write. In the fourth parameter, the function returns the number of bytes actually read or written. Finally, the fifth parameter designates whether overlapped I/O is to be used. The upload_cgi application doesn't use overlapped I/O, so the program sets this parameter to null.

The upload_cgi program starts after the Web server receives the posted data. Unlike most VB programs, which begin by displaying a form (or window), the nongraphical upload_cgi program begins by executing the Main subroutine, which Listing 2 presents.

At callout A in Listing 2, you can see that the first subroutine Main calls is InitCGIVariables. InitCGIVariables simply calls the GetCGIenvVar subroutine to retrieve each environment variable that the Web server sets. GetCGIenvVar, in turn, calls the Win32 API GetEnvironmentVariable function. Together, these variables supply information about the browser, the server, and the client's IP address, as well as other session information.
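In Python, the same step reduces to one lookup per variable. This is a sketch, assuming the standard CGI/1.1 variable names; the tuple below is a hypothetical selection rather than the list upload_cgi reads, and servers differ in which optional variables they set.

```python
import os

# Standard CGI/1.1 variable names; this particular selection is illustrative.
CGI_VARIABLES = (
    "REQUEST_METHOD", "CONTENT_TYPE", "CONTENT_LENGTH", "QUERY_STRING",
    "PATH_INFO", "REMOTE_ADDR", "HTTP_USER_AGENT", "SERVER_NAME", "SERVER_SOFTWARE",
)


def init_cgi_variables(environ=os.environ):
    """Read each CGI variable once, using an empty string when the server omits it."""
    return {name: environ.get(name, "") for name in CGI_VARIABLES}
```

Defaulting missing variables to an empty string mirrors how a VB routine would typically treat a variable that GetEnvironmentVariable cannot find.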

At B in Listing 2, the upload_cgi application calls the SendHeader function, which begins building the HTML results page to be sent back to the user when the file upload has completed.

At C in Listing 2, the upload_cgi program calls the GetStandardInData subroutine, which uses the Win32 API ReadFile function to read from Standard In. GetStandardInData, shown in Listing 3, reads the data that the user's browser posts.

GetStandardInData first calls the Win32 API GetStdHandle function to get a handle for Standard In. Next, the subroutine uses a Do loop to read the data. Within the loop, the VB String function makes sure that the gsBuff variable's buffer is large enough to hold the data read from Standard In. Then, GetStandardInData calls the ReadFile function, using the handle that GetStdHandle returned, to read the data stream from Standard In into that buffer. GetStandardInData compares the size of the data received so far with the CGI_Content_Length environment variable to determine when it has received all the information from Standard In.
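The loop matters because a single read can return fewer bytes than requested. Here is a Python sketch of the same idea; the function name mirrors the VB routine, but the chunk size and the decision to raise an error on a truncated post are assumptions.

```python
def get_standard_in_data(stream, content_length, chunk_size=4096):
    """Keep reading until content_length bytes arrive; one read may return fewer."""
    chunks = []
    received = 0
    while received < content_length:
        data = stream.read(min(chunk_size, content_length - received))
        if not data:  # stream ended early, so the post was truncated
            raise EOFError(f"expected {content_length} bytes, got {received}")
        chunks.append(data)
        received += len(data)
    return b"".join(chunks)
```

In a CGI program the stream would be the binary Standard In (sys.stdin.buffer in Python) and content_length would come from CONTENT_LENGTH.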

When GetStandardInData finishes, the Main subroutine resumes; at D in Listing 2, Main parses the data that the browser posted. The browser sends files in multipart format, and Main looks at the CGI_Content_Type environment variable to determine the multipart boundary.
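The boundary arrives as a parameter of the content type, for example multipart/form-data; boundary=---------------------------7cf2a327f01ae (an illustrative value). A Python sketch of pulling it out, assuming the parameter may optionally be quoted; in the body itself, each delimiter line is two hyphens followed by this value.

```python
def get_boundary(content_type):
    """Extract the boundary parameter from a multipart/form-data Content-Type."""
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() != "multipart/form-data":
        raise ValueError("not a multipart/form-data post: " + content_type)
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name.lower() == "boundary":
            return value.strip('"')  # the value may be quoted
    raise ValueError("no boundary parameter in: " + content_type)
```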

The example in Screen 2 shows what the CGI_Content_Type and CGI_Content_Length environment variables might look like. When surrounded with the MIME boundaries, the upload file looks like the example in Figure 1.

Because the file is wrapped in traditional MIME headers and footers, parsing the string that the Web server receives takes a little care. The form the browser submits also sends the filename, including the full path of the file on the client machine. Once the VB program has isolated the file's contents between the boundaries, all it needs to do is strip the path from the filename and write out the contents of the file in binary mode under that filename.
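Here is a Python sketch of that parsing step. It assumes the browser quotes the filename parameter and that only the first file part is wanted. It uses ntpath.basename, which strips both backslash- and slash-separated paths regardless of the server's operating system, because browsers may send a Windows path such as C:\docs\report.txt.

```python
import ntpath


def extract_upload(body, boundary):
    """Return (filename, content) for the first part that carries a file."""
    delimiter = b"--" + boundary.encode("latin-1")
    for part in body.split(delimiter):
        headers, sep, content = part.partition(b"\r\n\r\n")
        if not sep or b"filename=" not in headers:
            continue  # preamble, closing "--", or an ordinary form field
        disposition = headers.decode("latin-1")
        client_path = disposition.split('filename="', 1)[1].split('"', 1)[0]
        # Keep only the name, dropping the client machine's directory path.
        filename = ntpath.basename(client_path)
        # The CRLF before the next delimiter belongs to the encoding, not the file.
        if content.endswith(b"\r\n"):
            content = content[:-2]
        return filename, content
    raise ValueError("no file part found")
```

Splitting only at the first blank line of each part means a file that itself contains blank lines is returned intact.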

At E in Listing 2, you can see where the upload_cgi program checks for the target output directory and then writes the file to the Web server in binary mode. The file is now available to any applications that need it. (On a security note, don't place the uploaded files in a CGI directory or a public HTML directory without first assessing the security risk.) After the upload has completed, the Main subroutine sends a successful completion message to the browser, with all the environment variables the subroutine used.
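A Python sketch of the save step, assuming the target directory is created if it is missing. Applying ntpath.basename again and rejecting names that reduce to "." or ".." is an added precaution (not something the listing is described as doing) so that a crafted name such as ..\..\evil.exe cannot escape the upload directory.

```python
import ntpath
import os


def save_upload(target_dir, filename, content):
    """Write uploaded bytes in binary mode, creating the target directory if needed."""
    os.makedirs(target_dir, exist_ok=True)
    # Strip any directory part (either separator) so a crafted name cannot escape target_dir.
    safe_name = ntpath.basename(filename)
    if safe_name in ("", ".", ".."):
        raise ValueError(f"refusing to save a file named {filename!r}")
    path = os.path.join(target_dir, safe_name)
    with open(path, "wb") as f:  # binary mode: bytes are written exactly as received
        f.write(content)
    return path
```

As written, the sketch silently overwrites an existing file of the same name; a production version might refuse or rename instead.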

What Goes Up Must Come Down
It's easy to underestimate the task of downloading a file on the Web, because apparently all you have to do is make the file available in a public directory on a Web server, and anyone with a browser can download it. But the task isn't always that easy. The Web server assigns special MIME handling to many file types, which can get in the way even when you simply want to download a file and save it to disk. Also, you may not want to store your files in a public directory on a Web server. Even if you use authentication, you may not want to rely on the Web server's access control list (ACL) to decide whether a user may download a particular file. Instead, you might want to use a smart Web application (an application that determines access rights based on a set of events, such as whether the user has filled out a questionnaire or entered valid credit card information). You can easily handle HTTP downloads by using a CGI routine to send a file to a Web browser. The upload_cgi program includes an example CGI download routine: the DownloadFile subroutine shown in Listing 4.

When you implement HTTP file downloading, your CGI program first needs to check whether the person is allowed to access the file in question. This process, of course, depends on your environment and how you determine who can access your files. If you deny a user access to the file, you need to send a regular HTML header and a message to notify the user that the access criteria were not met. If the user has access rights, the program needs to immediately send a header that describes the file as a binary file. The format of the download header, which DownloadFile sends at callout A in Listing 4, is as follows:

Content-type: application/octet-stream
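The decision between the two responses can be sketched in Python as follows. The may_download callback is hypothetical and stands in for whatever access rules your site applies; the refusal wording is also illustrative. Either way, the header block ends with a blank line before any content.

```python
import html


def begin_download(user, filename, may_download):
    """Return (allowed, bytes_to_send_first) for a download request.

    may_download is a hypothetical callback standing in for your own access rules.
    """
    if may_download(user, filename):
        # Binary header; the blank line ends the header and the file's bytes follow.
        return True, b"Content-type: application/octet-stream\r\n\r\n"
    page = (
        "<HTML><BODY>Sorry, you do not meet the access criteria for "
        f"{html.escape(filename)}.</BODY></HTML>"
    )
    return False, b"Content-type: text/html\r\n\r\n" + page.encode("utf-8")
```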
After sending the header, DownloadFile uses the Do loop at B in Listing 4 to read from the disk file (which, of course, does not have to be in a public HTML directory) in binary mode and call the Send subroutine. The Send subroutine, shown in Listing 5, sends the data to the browser.

In Listing 5, Send uses the Win32 API GetStdHandle function to get the handle for Standard Out. The first parameter of the WriteFile function is this handle. The second parameter of WriteFile is the data to be transferred, with a carriage-return/line-feed pair appended. The third parameter contains the length of the data to be downloaded, and the fourth parameter receives the number of bytes sent after the WriteFile function finishes executing.
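Here is a Python sketch of the same read-and-send loop. The send_file name and chunk size are hypothetical. The sketch writes each chunk exactly as read, with no line terminator added, because any extra bytes written between chunks would change a binary file.

```python
def send_file(path, out, chunk_size=8192):
    """Copy a file to Standard Out in binary chunks, unmodified; return bytes sent."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            out.write(chunk)  # no CR/LF appended to the file's own bytes
            total += len(chunk)
    out.flush()
    return total
```

In a CGI program, out would be the binary Standard Out (sys.stdout.buffer in Python), written after the header.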

In a regular file download via an HREF link, the Web server chooses a MIME type for the file based on its extension. When a CGI program sends the file instead, the Web server doesn't know the file's contents; it simply passes along the binary stream under the header the program supplied, so the server will not try to send the file as a particular MIME type. Let's look at one way to call the CGI routine from an HTML form:

<FORM METHOD="POST" ACTION="/cgi-bin/file_download.exe?download:filename.doc">

<INPUT TYPE="SUBMIT" VALUE="filename.doc">

</FORM>

This example shows the download CGI program (file_download.exe) being called and passed the download file's name (filename.doc) as a CGI Query string. This arrangement works fine, but when the File, Save As dialog box shows up, the default file name will be the name of the CGI program, not the name of the file to be saved. To get around this problem, you can trick the browser into providing the correct file name as the default, as shown in this modified ACTION parameter:

ACTION="/cgi-bin/file_download.exe/filename.doc?documents/filename.doc"

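The server still executes file_download.exe, passing it the extra path (/filename.doc) in the PATH_INFO environment variable and the text after the question mark in QUERY_STRING. Because the URL's path now ends in filename.doc, however, the File, Save As dialog box defaults to the correct filename.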
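On the server side, the program can recover both names from standard CGI variables. This is a Python sketch, assuming a server that follows the CGI convention of passing the extra path in PATH_INFO and the text after the question mark in QUERY_STRING. Taking only the last segment of PATH_INFO also copes with a server that reports more of the URL there. The traversal check is an added precaution rather than something the article's code is described as doing, and URL decoding of the query string is left out.

```python
import posixpath


def resolve_download(environ):
    """Return (save_as_name, requested_file) for the PATH_INFO download trick."""
    # For ACTION="/cgi-bin/file_download.exe/filename.doc?documents/filename.doc",
    # a CGI server passes PATH_INFO="/filename.doc", QUERY_STRING="documents/filename.doc".
    save_as = posixpath.basename(environ.get("PATH_INFO", ""))
    requested = environ.get("QUERY_STRING", "")
    parts = requested.replace("\\", "/").split("/")
    # Reject empty, absolute, drive-qualified, and ".." paths so the query cannot
    # reach outside the download area the program is meant to serve.
    if not requested or requested.startswith(("/", "\\")) or ":" in requested or ".." in parts:
        raise ValueError(f"refusing download request {requested!r}")
    return save_as, requested
```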

Just the Tip of the Application Iceberg

In this article, I've shown how to use a VB CGI program to do HTTP File Uploads and downloads. The example upload_cgi program uploads a file to a directory and then echoes the contents of that directory to the user. The user can then download a file to verify that the upload worked properly.

You can easily modify this shell to meet many specific business needs. For instance, you can create an Internet or intranet file warehouse that allows uploading, indexing, and searching of the warehoused files. But this idea is just the tip of the iceberg. Once you have adapted the program to your company's needs, simply add user authentication and Secure Sockets Layer (SSL) to your server, and you get a very secure method for transferring files to your Web server.

We Want Your VB Code!
Windows NT Magazine wants to publish your VB solutions. Send us any interesting and useful VB solutions you've created for your business problems. If we agree that your VB solutions are valuable to our readers, we'll publish your code and pay you $100. You can send contributions or ideas for VB solutions to me at [email protected].
Obtaining the Code
The complete source and executable code for this VB solution is available for downloading from Windows NT Magazine's Web site at http://www.winntmag.com.

 

Background on HTTP File Upload
Ernesto Nebel and Larry Masinter from Xerox Corporation coined the term HTTP File Upload in their Request for Comments (RFC) 1867. Written in November 1995, this RFC proposed a new option for an HTML form, <form enctype=multipart/form-data...>, coupled with a new input type, type=file.

Nebel and Masinter initially developed and tested HTTP File Upload as a set of patches to Mosaic, and Netscape has supported HTTP File Upload since Navigator 2.0. The World Wide Web Consortium (W3C) officially accepted this standard in January 1997, as part of HTML 3.2. Netscape 2.0 and Microsoft Internet Explorer 3.0a and 4.0 support this standard.

Additional Reading
RFC 1867, "Form-based File Upload in HTML," ftp://ds.internic.net/rfc/rfc1867.txt

RFC 2068, "Hypertext Transfer Protocol -- HTTP/1.1," http://www.ics.uci.edu/pub/ietf/http/rfc2068.txt

 
