Better HTML and URL Encoding Functions

Defend Against Cross-site Scripting Attacks

Secure ASP.NET






By Don Kiely


The HtmlEncode and UrlEncode methods of the HttpServerUtility class in the System.Web namespace have long provided a first line of defense against cross-site scripting attacks. In these attacks, someone enters script code into an input box on a Web page, and the site later writes that input back into a page where the browser runs it. A simple example is to enter this literal text into a text box that prompts for a person's first name:


<script>alert("Ha ha! We've attacked your site!")</script>


When you redirect to another page and display what you thought was the person's first name, an alert box pops up with nefarious text. This is a trivial example of cross-site scripting. If you create Web pages, you should be well aware of this kind of attack and know how to protect against it. Search the Web for "cross-site scripting" to find lots of good information.


The HtmlEncode and UrlEncode methods provide protection by converting known bad characters in a string of text to either the &#DECIMAL; notation or single- and double-byte notations, respectively. Encoding the characters this way keeps the browser from interpreting them as script. When you pass the <script> code above through these methods, you get results like this:



&lt;script&gt;alert("Ha ha! We've attacked your site!")&lt;/script&gt;





These methods take a "known bad" approach to protecting against attacks. The idea is that certain characters are known to be a problem in these kinds of attacks, notably these characters: <, >, &, ", and characters with ASCII values of 160-255, inclusive. As long as you encode those characters, you should be safe, or so the thinking goes.
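To make the known-bad idea concrete, here is a minimal sketch in Python. It is not the .NET implementation. The function name and the entity table (including the double quote) are assumptions made for illustration only.

```python
# Illustrative sketch of a "known bad" (deny-list) HTML encoder.
# Not the .NET code; the names and the entity table are hypothetical.

NAMED_ENTITIES = {
    "<": "&lt;",
    ">": "&gt;",
    "&": "&amp;",
    '"': "&quot;",
}


def deny_list_html_encode(text: str) -> str:
    """Encode only characters that appear on a fixed list of bad ones."""
    out = []
    for ch in text:
        if ch in NAMED_ENTITIES:
            out.append(NAMED_ENTITIES[ch])
        elif 160 <= ord(ch) <= 255:
            # Characters in the 160-255 range become &#DECIMAL; references.
            out.append(f"&#{ord(ch)};")
        else:
            # Everything else passes through untouched, including any
            # character an attacker later finds a way to exploit.
            out.append(ch)
    return "".join(out)


print(deny_list_html_encode("<script>alert(1)</script>"))
# prints: &lt;script&gt;alert(1)&lt;/script&gt;
```

Note that the parentheses pass through unchanged. Anything not on the list is trusted by default, which is the weakness of this approach.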


The key word in the previous sentence is "should." You should be safe as long as an attacker doesn't come up with a way to attack your Web site using other characters. Unfortunately, that's exactly what has been happening lately, making the .NET encoding methods less useful. So Microsoft has shifted away from a known-bad strategy to a known-good strategy with its new Anti-CrossSite Scripting Library. The idea is that you shouldn't eliminate only the characters that you know are bad, because that list changes all the time. Instead, leave alone only the characters that you know are okay.


The library provides the same HtmlEncode and UrlEncode methods as the .NET Framework, but its versions encode all characters other than the following:

  • a to z
  • A to Z
  • 0 to 9
  • , (Comma)
  • . (Period)
  • - (Dash)
  • _ (Underscore)
  • Space (only in the UrlEncode function)


When you run the <script> code above through these new methods, here is what you get:



&#60;script&#62;alert&#40;&#8220;Ha ha&#33; We&#8217;ve attacked your site&#33;&#8221;&#41;&#60;&#47;script&#62;




As you can see, far less of the original text remains in its literal character form, meaning that less of the text could be considered executable by the browser. This isn't exactly a monumental change, and the code in the library is quite simple. However, it leaves far less opportunity for cross-site scripting attacks to succeed.
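The known-good idea can be sketched in a few lines of Python. This is an illustration of the technique, not the library's code, and the function name is hypothetical. Space is kept in the safe set because the sample output above leaves spaces untouched. Treat that choice as an assumption.

```python
import string

# Characters left unencoded (the allow list). Space is included here
# as an assumption, based on the HTML sample output in the article.
SAFE_CHARS = set(string.ascii_letters + string.digits + ",.-_ ")


def allow_list_html_encode(text: str) -> str:
    """Encode every character that is not on the allow list."""
    out = []
    for ch in text:
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            # Anything not known to be safe becomes &#DECIMAL;.
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(allow_list_html_encode("<script>alert('x')</script>"))
# prints: &#60;script&#62;alert&#40;&#39;x&#39;&#41;&#60;&#47;script&#62;
```

Because the default is to encode, a character that nobody has yet identified as dangerous is still neutralized.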


One difference in the AntiXSSLibrary versions of the HtmlEncode and UrlEncode functions is that each has only a single overload, which takes a string and returns the encoded string. The .NET Framework versions also have an overloaded form that takes both a string and a TextWriter object and writes the resulting output to the specified output stream. While you can easily code around this to use the AntiXSSLibrary versions, the missing overload could break some code, so be careful if you use the new functions in existing applications.
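The workaround amounts to a thin adapter that calls the string-returning encoder and writes the result to the output stream yourself. The sketch below shows the shape of that adapter in Python rather than .NET. The helper name is hypothetical, io.StringIO stands in for a TextWriter, and the standard library's html.escape stands in for the encoder.

```python
import html
import io
from typing import Callable, TextIO


def encode_to_writer(encode: Callable[[str], str], text: str, writer: TextIO) -> None:
    """Adapt a string-only encoder to a writer-style call."""
    writer.write(encode(text))


# io.StringIO plays the role of the output stream in this sketch.
buffer = io.StringIO()
encode_to_writer(html.escape, "<b>hi</b>", buffer)
print(buffer.getvalue())
# prints: &lt;b&gt;hi&lt;/b&gt;
```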


This initial release contains the binaries for versions 1.x and 2.0 of the .NET Framework. You can download the library here (


Don Kiely, MVP, MCSD, is a senior technology consultant, building custom applications as well as providing business and technology consulting services. His development work involves tools such as SQL Server, Visual Basic, C#, ASP.NET, and Microsoft Office. He writes regularly for several trade journals, and trains developers in database and .NET technologies. You can reach Don at mailto:[email protected] and read his blog at




