I wrote this document simply because Ive witnessed first hand some dreadful fallacies emerge in the IT industry. The fallacies have likely been adopted largely because even though the developer is writing inefficient code, the code still works relatively quickly thanks, in most, to behind the scenes optimizations and ever faster hardware. But should the application be run on a lesser machine or more likely should the database grow in size the application will be become unusable simply due to those same misguided individual(s).

A warning to those who shudder when well established nomenclature isnt used and instead less conventional language or slang is used in its place:

It is important to recognize that rows are not records, fields are not columns and tables are not files. However, in some instances it is easier to speak of a single row in a table as being a single record and so on. So in some sections of this document I have abused conventional database terminology by using record to refer to a row or an attribute to refer to a column, etc.

Ten Database and ADO.NET Fallacies

The following comprise the list of what I would consider the most misguided fallacies in regards to designing a data model to be used in a SQL Server database and the subsequent use of that database in ADO.NET though some of the concepts are applicable across DBMSs and across application programming languages. Lastly, two of these fallacies are only applicable if the front end of your application happens to be ASP.NET.

1

Use Multi-Value Attributes Instead of Junction Tables

Some people have been informed of the dangers/cost of table joins. This has caused them to implement their own light-weight solution to table joins. Unfortunately this usually results in the light-weight solution costing many times more than the traditional table join. Relational databases have existed for decades and have been refined over those years. It would be folly to think that one could come up with a better solution in a much shorter span of time without understanding the underlying mechanics of those databases.

In one instance I met someone who was convinced that using a comma-delimited string as an attribute was a better solution than using a junction table. To illustrate what they proposed I have created the following simplistic example:

There is a table student:

student_num

first_name

last_name

classes

123456

John

Smith

1,2,4

123457

Jane

Doe

1,3,5

And a table class:

Uid

name

building

room_number

Biology

Grant Hall

123

History

McConnell Hall

231

Geology

Jeffrey Hall

312

Math

Richardson Hall

213

Philosophy

Dunning Hall

321

Effectively anytime that you wanted to select which students attended a particular class, you would have to select all student records and parse the classes attribute of each record.

As opposed to:

The new student table (notice the missing column classes):

student_num

first_name

last_name

123456

John

Smith

123457

Jane

Doe

The same class table shown previously and lastly the junction table:

student_num

class_uid

123456

123457

Now, to get all students registered to a particular class, an inner join can be done between the student table and the junction table. In many instances, the class table wouldnt need to be included in the join since the class data including the UID - would likely be stored in memory when the system starts up. This in-memory storage would likely take the form of a hashtable data structure to increase efficiency.

In the above section I have informally stated the reason why using a comma delimited list is an ineffective solution, but more formally:

The comma delimited list cant effectively be indexed.
Any select queries done on the mapping for the comma delimited list would require string parsing.
It removes the possibility of database optimizations on relationships between tables.

Another approach that people may be tempted to adopt is to use XML in a single column to store different attributes. Typically people will come to the conclusion that using mark-up is an efficient alternative to additional columns in a table. This is likely because of .Nets XMLReader and XMLWriter classes which, through various sources, have become associated with the definitive way to access data whether that datas source is a database or a file.

In some instances, storing XML in a single column is useful if there is no expectation that this data will be used in any database queries. Therefore, when in doubt use well understood database concepts (additional columns, junction tables, etc) to store the data.

2

Store XML to Capture Attributes Instead of Using Columns

As previously mentioned, whether it be the use of XML or some other mark-up, if queries are likely to involve these attributes, using XML instead of a separate column is only going to slow those queries.

Using the student table from the previous section:

student_num

first_name

last_name

123456

John

Comments

Plain text

10 Database and ADO.NET Fallacies

Ten Database and ADO.NET Fallacies

1

2

Comments

Plain text