ASP.NET MVC and Search Engine Optimization

ASP.NET MVC has been widely touted as being search engine optimization (SEO) friendly because it makes search-engine-friendly URLs easy to use within web sites. In this briefing I'll also look at how it provides a flexible architecture that makes it easy to cleanly, and intelligently, handle requests for missing or moved content with just a few lines of code.

SEO and Missing Content

Since the goal of SEO is to ensure that site content is indexed efficiently, any time content moves or is removed, it's essential that robots or spiders pick up those changes correctly. Accordingly, if a request is made for http://www.myserver.com/something/that-does-not-exist, best practices for ensuring great end-user experiences while preserving SEO require the following:

Pretty error messages: some intelligible markup that tells users that the content they're looking for can't be found. Ideally, this will also include options for other links that might meet user needs, or help keep them on the site. In other words, you don't want your web server to just kick out a raw HTTP 404 that will get handled by the user's browser.

Proper HTTP response codes (such as HTTP 404s, 410s, and 301s or even 302s): these tell bots how to index missing or relocated content, even though you're giving end-users 'pretty' error messages. In other words, just because you're outputting nicely formatted error messages for humans, you can't be giving out HTTP 200 (OK) responses to bots.

Tackling Missing Content with ASP.NET

With 'classic' ASP.NET, there are a couple of ways to handle requests for missing content. One of the easiest is to use the web.config's customErrors element along with directives for how to handle 404s (and other status codes). Sadly, this approach actually kicks off a full-blown redirect (HTTP 302) to a defined catch-all page, which makes it completely laughable from an SEO perspective - even if it does allow you to provide end-users with pretty error messages.

In the past, when working with dynamic URLs that feature ids as part of the path or querystring, I've taken the approach of creating a NotFoundException class. Then, if a request for something results in a failure to locate that resource in the database (such as a request to /products.aspx?id=-999), I just throw a new NotFoundException. Within the Global.asax it's then fairly straightforward to check for NotFoundException instances using Server.GetLastError() in the Application_Error method. If a NotFoundException is detected, details can then be processed or logged as needed, and a call to Server.Transfer allows a pretty error message to be returned along with an HTTP 404 - without changing or redirecting the requested URL.

Overall, this process has worked fairly well for me in the past with ASP.NET applications, but NOT when it comes to locations that just don't exist. For example, if the request is:

    /products.aspx?id=-999

then the NotFoundException works well - because when you go to grab the -999 id from the database, it doesn't exist, and your code can handle that just fine. But what happens if someone fat-fingers the URL, gets a bad link, or somehow just gets a bit off kilter with something like:

    /procudts/?id=1277

In cases like this, ASP.NET doesn't offer many practical options. You could use customErrors handling from the web.config - but that results in requests being redirected. Likewise, you could implement your own HttpModule to validate requests, but that's a lot of work and flies in the face of DRY (Don't Repeat Yourself) - as you end up having to account for the URLs within your site twice.
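Returning to the NotFoundException check described above, here is a minimal sketch of the detection step. The ErrorInspector helper and its FindNotFound method are names I've made up for illustration. The sketch assumes that the exception handed back by Server.GetLastError() may be a wrapper (ASP.NET often wraps page exceptions in an HttpUnhandledException), so it walks the InnerException chain rather than checking only the outermost type.

```csharp
using System;

// Thrown by data-access code when a requested resource does not exist.
public class NotFoundException : Exception
{
    public NotFoundException(string message) : base(message) { }
}

// Hypothetical helper (not part of ASP.NET): finds a NotFoundException
// anywhere in an exception's InnerException chain.
public static class ErrorInspector
{
    public static NotFoundException FindNotFound(Exception error)
    {
        for (Exception current = error; current != null; current = current.InnerException)
        {
            NotFoundException notFound = current as NotFoundException;
            if (notFound != null)
                return notFound;
        }
        return null; // nothing in the chain signals missing content
    }
}

// Sketch of the call site inside Global.asax:
//   protected void Application_Error(object sender, EventArgs e)
//   {
//       if (ErrorInspector.FindNotFound(Server.GetLastError()) != null)
//       {
//           // log, then Server.Transfer to the friendly 404 page
//       }
//   }
```

Keeping the detection logic free of System.Web types means it can be exercised in a plain unit test without spinning up a web request.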

Tackling Missing URLs with ASP.NET MVC

Out of the box, ASP.NET MVC makes it much easier to handle requests for paths that don't exist. That, of course, is because it takes an entirely different approach to routing requests - where developers are able to specify the specific kinds of routes and requests that they would like to map to various parts of their applications. Then, for anything that doesn't match any predefined routes, it's an MVC best practice to define a 'catch-all' route that handles any non-defined routes.

With a catch-all defined, you're free to route that to any Controller Action you want. For me, it makes the most sense to send it to my 404 Action - where end-users are given a friendly "these aren't the droids you're looking for" error message, and robots are given a full-blown HTTP 404 to help ensure that nothing is indexed.

Defining this approach is simple within code. First, all you need is a catch-all routing definition (which you'll want to place dead last in your list of mapped routes) that maps to a controller that you want to use for handling these 404s:

    // Must be registered after every other route
    routes.MapRoute(
        "CatchAll",
        "{*catchall}",
        new { Controller = "Error", Action = "NotFound" }
    );

Then, within the controller action itself, you can do any desired logging that will let site admins know about any potentially missing content or bad inbound links, and so on. Likewise, you could also do some parsing of the bad URL itself to see if you might not be able to offer the requestor some viable options. But once any and all of that processing is done, just make sure to set the correct Response Code, and return a View.

    public class ErrorController : Controller
    {
        public ActionResult NotFound()
        {
            // do any logging,
            // processing, etc on the 404
            ErrorManager.Report404Details();

            Response.StatusCode = 404;

            // next line needed in IIS7
            Response.TrySkipIisCustomErrors = true;

            return View();
        }
    }

With ASP.NET Web Forms applications, one thing that has always driven me batty was that setting Response.StatusCode manually frequently ends with your Response being hijacked by the Web Forms HttpHandler as it detects the status code and then jumps in and routes the Response through any customErrors you have defined. When using MVC applications on IIS7, I haven't run into that problem at all - which is a welcome change as it allows me to set whatever Response Codes make sense, while still maintaining complete control of the emitted markup.

MVC and Missing Content

While MVC functionality makes it easy to handle bogus URLs, it also provides a very flexible and capable architecture that makes it easy to handle requests for bogus content made against legitimate routes, or request paths. So, for example, if you had a site that listed community members on a state-by-state basis, and someone entered the following request:

    /members/state/idaho/

That's something you'd expect to pull up a result. But if the request was for:

    /members/state/alberta/

what you'd REALLY like to do is make sure that users ended up on:

    /members/province/alberta/

You'd also want to make sure that spiders knew that there was a difference, so you wouldn't want to just throw out an HTTP Redirect (or HTTP 302).
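As a rough sketch of the kind of lookup the Alberta example implies, the helper below maps a mis-categorized request to its canonical path. Everything here is hypothetical: the RegionPaths class, the GetCanonicalPath method, and the hard-coded province list are stand-ins for whatever data your own site would use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decides whether a /members/{kind}/{name}/ request
// should really point somewhere else.
public static class RegionPaths
{
    // Illustrative data only; a real site would load this from its own store.
    private static readonly HashSet<string> Provinces =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "alberta", "ontario", "quebec" };

    // Returns the canonical path when a province was requested under
    // /members/state/, or null when the request needs no correction.
    public static string GetCanonicalPath(string kind, string name)
    {
        bool requestedAsState = string.Equals(kind, "state", StringComparison.OrdinalIgnoreCase);
        if (requestedAsState && name != null && Provinces.Contains(name))
            return "/members/province/" + name.ToLowerInvariant() + "/";
        return null;
    }
}
```

A controller could call GetCanonicalPath("state", "alberta") and, when it gets a non-null path back, signal a permanent move to that path instead of rendering an empty result.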

Extending ASP.NET MVC with a NotFoundResult

Happily, in order to handle this, and other similar needs (such as telling bots that content has been permanently moved or removed), I recently augmented one of my SEO-critical sites with a custom ActionResult implementation called a NotFoundResult. What's cool about this extension is how incredibly easy it was to implement - thanks to the well-thought-out nature of the ASP.NET MVC Framework.

From an implementation perspective, a bare-bones version of a NotFoundResult only takes a few lines of code to complete. In fact, a minimal approach almost has as many lines of error handling code (for 'just in case scenarios') as it does core logic:

    using System;
    using System.Web;
    using System.Web.Mvc;
    using System.Web.Routing;

    public class NotFoundResult : ActionResult
    {
        public NotFoundResult() { }

        public override void ExecuteResult(ControllerContext context)
        {
            try
            {
                // Reuse the current route, but point it at Error/NotFound
                RouteData rd = new RouteData(
                    context.RouteData.Route,
                    context.RouteData.RouteHandler);
                RouteValueDictionary routes =
                    new RouteValueDictionary(
                        new { Controller = "Error", Action = "NotFound" });
                foreach (var route in routes)
                    rd.Values.Add(route.Key, route.Value);

                // Run the Error controller within the current request
                IHttpHandler handler = new MvcHandler(
                    new RequestContext(context.HttpContext, rd));
                handler.ProcessRequest(HttpContext.Current);
            }
            catch (Exception ex)
            {
                // log/alert/etc:
                // Services.Blah.LogOrAlert(ex.Message)
                context.HttpContext.Response.StatusCode = 404;
                context.HttpContext.Response.Write("Resource Not Found.");
                context.HttpContext.Response.End();
            }
        }
    }

A more advanced version of this approach could then use an enumeration to define what kind of 'NotFoundResult' we are dealing with, such as:

    public enum NotFoundType
    {
        NotFound,
        Moved,
        MovedPermanently,
        Removed
    }

Then, of course, based upon the subtype of NotFoundResult returned from a Controller Action, you could define different 'handlers' by mapping to different Controller Actions within the Error Controller itself. That means replacing the RouteValueDictionary lines above with something like the following:

    RouteValueDictionary routes =
        new RouteValueDictionary(
            new { Controller = "Error",
                  Action = this.NotFoundType.ToString() });
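Each of those Error Controller actions then needs to emit a status code that matches its type. The sketch below shows one plausible mapping, using the 404, 410, 301 and 302 codes mentioned earlier. The pairing of each enum value with a code is my assumption, and the NotFoundStatus class is a hypothetical helper rather than anything built into ASP.NET MVC.

```csharp
using System;

public enum NotFoundType
{
    NotFound,
    Moved,
    MovedPermanently,
    Removed
}

// Hypothetical helper: picks the HTTP status code for each NotFoundType.
public static class NotFoundStatus
{
    public static int ToStatusCode(NotFoundType type)
    {
        switch (type)
        {
            case NotFoundType.Moved:
                return 302; // temporary redirect
            case NotFoundType.MovedPermanently:
                return 301; // permanent redirect
            case NotFoundType.Removed:
                return 410; // gone: content intentionally removed
            default:
                return 404; // not found
        }
    }
}
```

Note that the two redirect codes only make sense when the response also carries a Location header pointing at the new URL, so the Moved and MovedPermanently actions would need to know where the content went.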

Of course, knowing which NotFoundType to return requires some fairly intimate knowledge of your application and URLs - but as a developer that's stuff you should have a good feel for anyhow.

Signaling that requested content hasn't been found is quite simple, and works well with the command pattern - which I personally, and stylistically, prefer to use instead of throwing exceptions when content isn't found, or is known by the application to have been moved or removed. For example, consider the following sample, where code accounts for a product either not existing, or being 'retired' or archived:

    public class ProductController : Controller
    {
        public ActionResult Detail(int productId)
        {
            ProductRepository repo = ProductRepository.GetCurrent();
            Product requested = repo.GetProductById(productId);

            if (requested == null)
                return new NotFoundResult(NotFoundType.NotFound);

            if (requested.Archived)
                return new NotFoundResult(NotFoundType.Removed);

            // otherwise... normal handling of ViewData/etc...
            return View();
        }
    }
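The Detail action above passes a NotFoundType to the NotFoundResult constructor, which the bare-bones class shown earlier does not accept. Here is a minimal sketch of how the extended result might look, assuming the enum is stored in a NotFoundType property and that the Error Controller exposes one action per enum value (NotFound, Moved, MovedPermanently, Removed). The try/catch fallback from the bare-bones version is left out for brevity.

```csharp
using System;
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;

public enum NotFoundType
{
    NotFound,
    Moved,
    MovedPermanently,
    Removed
}

public class NotFoundResult : ActionResult
{
    // Which kind of "not found" the calling action detected.
    public NotFoundType NotFoundType { get; private set; }

    public NotFoundResult(NotFoundType notFoundType)
    {
        this.NotFoundType = notFoundType;
    }

    public override void ExecuteResult(ControllerContext context)
    {
        // Reuse the current route, but target the Error controller action
        // whose name matches the enum value (assumed to exist).
        RouteData rd = new RouteData(
            context.RouteData.Route,
            context.RouteData.RouteHandler);
        rd.Values.Add("controller", "Error");
        rd.Values.Add("action", this.NotFoundType.ToString());

        // Execute that action within the current request, so the URL
        // the client asked for is left unchanged.
        IHttpHandler handler = new MvcHandler(
            new RequestContext(context.HttpContext, rd));
        handler.ProcessRequest(HttpContext.Current);
    }
}
```

Because the result simply records the type it was given, a unit test can call Detail with a missing product id and assert on the returned NotFoundResult's NotFoundType without executing the result at all.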

At this point, if a NotFoundResult is returned, it will be routed to the correct Action within the Error Controller, where additional processing can be performed as needed, such as in the case of a NotFound result:

    public ActionResult NotFound()
    {
        // do any logging, lookups,
        // processing, etc on the 404:
        ErrorManager.Report404Details();

        Response.StatusCode = 404;

        // next line needed in IIS7
        Response.TrySkipIisCustomErrors = true;

        return View();
    }

And hopefully, that code looks familiar. It should - as it's the code I was using earlier to handle catch-all 404s. The nice thing about this approach is that you don't need two different mechanisms to handle missing data/content 404s and bad URLs (as is the case with classic ASP.NET). Likewise, other Actions (such as Removed, or Moved) can implement different, customized logic that attempts to decipher what users are requesting, drops that into ViewData, and then displays those links or resources to end-users while ensuring that bots don't get any goofy ideas about what's going on.

Conclusion

The beauty, of course, is that by adding a tiny bit of extensibility to existing ASP.NET MVC functionality, it's possible to create an ideal SEO experience where end-users are given helpful error pages that make their overall experience better, and bots are still given the HTTP Response Codes needed to keep SEO on par. Even better, managing both of these needs is straightforward from a developer perspective, as it allows you to reuse MVC conventions and harness routing and views to provide users and bots with the right content. This approach also makes it very easy to create unit tests that help ensure your application correctly handles both 'bogus' requests and moved or retired content. And pulling all of this off only takes a few lines of code to set up.
