Interview with David Campbell

David Campbell is the General Manager of Strategy, Infrastructure and
Architecture of Microsoft SQL Server. He joined Microsoft in 1994 and played
key roles in shaping the product that grew and evolved to become the SQL
Server that we know today—the product that many of us dedicate our careers
and passion to. You can find David’s bio here:
http://www.insidetsql.com#David_Campbell.

As I was preparing the questions for the interview, I realized that there were too
many questions that I wanted to ask, so I had to restrain myself.

To give you a sense of how much David is appreciated by people who know him,
a colleague of mine who has known him for years told me “David is one of the most
genuine people you will ever meet.”

I’d like to start with a question a bit off topic.

It has been almost four months since Jim Gray was reported missing at sea when
he went sailing to spread his mother's ashes. Jim Gray is a well-known and
admired figure in the SQL Server community and everyone who knows him feels
devastated by his disappearance. I understand his family has not given up hope
yet, and we all join them in hoping to find some sign of him.

Since you have been at Microsoft since 1994 and have been deeply involved in
shaping the SQL Server database technology, you most probably had many
interactions with Jim. Can you tell us about your interactions with him; what he
means to you, to Microsoft, to science? Who is Jim Gray the human being from
your perspective, and what was his contribution to SQL Server, databases, and
science in general as a scientist/researcher?

Wow, this is a tough one. Jim was a giant in the field of databases – actually he
was a giant in science. I guess what impressed me most about Jim was that he
made everyone around him feel like a giant as well. He was a great collaborator;
a great “connector” who brought people together from different companies, even
different fields, to pool their knowledge and make breakthrough advances. Jim
also had an amazing ability to look at things differently than those around him.
He could simply see many things that others couldn’t, and he could take ideas
from one domain and immediately see their relevance in some completely
different domain.

These two characteristics – the ability to see things where others couldn’t, and
the ability to connect people from across an amazing network, made Jim an
incredibly influential and loved person in many communities. We all miss him.

Can you tell us about your areas of responsibility beyond what everyone can
read in your bio, and in a less formal manner, what does your day look like?

The last couple of years have been very interesting. SQL Server has grown
tremendously and the product is now at the point where we can really start to
innovate. It took 10 years and 2-3 releases to go from a relational database
engine and client libraries to what we refer to as a “Complete Data Platform”
with Analytics, Data Mining, Integration, Reporting and more. We are no longer
“chasing taillights” and are now in a position to do some amazing things. I now
lead a group that we call “SIA – Strategy, Infrastructure & Architecture” and
spend much of my time working with other teams around the company trying to
chart how we can help advance the data platform vision to solve new and
interesting customer problems. I believe the next major advance in Information
Technology will come from addressing the gap between people and information.
Microsoft is the best place to deliver on this vision and I often tell people I have
one of the top 10 jobs in the company.

The biggest leap that SQL Server made over the years was probably the
redesign of both the Storage Engine and the Relational Engine in SQL Server 7.0.
Even today, with SQL Server 2005, the core engines are still very much based on
version 7.0. This was also the leap from a small database to an enterprise-level
one. Can you tell us what, in your eyes, were the most important architectural
changes of the engines that enabled SQL Server to become an enterprise level
database? Also, are there any major architectural changes planned in the
engines in future versions that you can share with us?

The architectural transition for SQL Server 7.0 was a great experience and
achievement for all of us who worked on it. If I were to pick two architectural
tenets that paid off and are still relevant today I’d have to say automation and
paying attention to Hubble’s law.

By the time we were redoing the engine for SQL Server 7.0 most database
servers had hundreds of control knobs and very few people truly understood what
they all did. The few that did know couldn’t turn them fast enough to keep up
with the dynamic workloads that were emerging in the mid-1990s. Furthermore,
we saw many, many systems that were performing poorly because they were
misconfigured. We rethought our philosophy around the knobs so that instead
of controlling the things we simply couldn’t figure out how to do dynamically in
the code (remember “hash buckets”, or “open objects”?), we added knobs to
capture the administrator’s intent. This was a great achievement but we were
ahead of the market when we released SQL Server 7.0 and this was frustrating.
DBAs were afraid we were going to put them out of work since they were paid to
change the knobs, typically when their pagers went off. Furthermore, our
competitors were saying, “SQL Server; how can that be a real database product?
– it has 20 knobs and ours has 500!” Of course, people have since realized the
value of this work and it is becoming commonplace in the industry.

Now to the Hubble’s Law thing. Hubble was the astronomer who first presented
evidence that the universe is expanding. I believe Pat Helland should get credit
for associating “Hubble’s Law” with the increasing latency or “distance” between
CPUs, main memory, and disks. In recognition of this effect we completely
redesigned the I/O subsystem and the query processor to try to do deep
predictive I/O forecasting over the disks and to turn as much random I/O
into skip-sequential I/O as we could. This is why we introduced things such as the
Index Allocation Map (IAM) to control and record allocations in a dense data
structure. These changes led to some amazing speedups and a couple of
memorable anecdotes. In the “wide update plans” that we introduced in SQL
Server 7.0 the query processor can do large updates by computing a delta stream,
sorting it, and then applying the changes in key order to an index thus only
touching each page once and in optimal disk order (assuming everything was
defragmented). This can dramatically reduce the random I/O for a large update.
We had someone file a bug when a typical large update query that used to
take over an hour on SQL Server 6.5 ran in less than a minute on SQL
Server 7.0. At first they were sure they had found a defect in the
product, but then they checked the data and it appeared correct. Ultimately they
filed a bug anyway to ask us to explain what was going on.
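
The effect of sorting the delta stream can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not how SQL Server implements wide update plans: the index is modeled as dense integer keys packed 100 to a page (the `ROWS_PER_PAGE` value, the `page_of` mapping, and the function names are all made up for illustration), and "moving to a different page" stands in for a random I/O.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity, purely illustrative


def page_of(key):
    """Map an index key to the page holding it (dense keys stored in key order)."""
    return key // ROWS_PER_PAGE


def count_page_visits(keys):
    """Count how often applying changes in this order moves to a different page."""
    visits = 0
    current = None
    for key in keys:
        page = page_of(key)
        if page != current:
            visits += 1
            current = page
    return visits


def compare(num_rows=10_000, num_updates=2_000, seed=7):
    """Return (visits in arrival order, visits in key order, distinct pages touched)."""
    rng = random.Random(seed)
    delta = rng.sample(range(num_rows), num_updates)  # delta stream, arrival order
    unsorted_visits = count_page_visits(delta)
    sorted_visits = count_page_visits(sorted(delta))  # delta stream, key order
    distinct_pages = len({page_of(k) for k in delta})
    return unsorted_visits, sorted_visits, distinct_pages


if __name__ == "__main__":
    print(compare())
```

Applied in key order, the changes visit each affected page exactly once, so the sorted count equals the number of distinct pages. Applied in arrival order, nearly every change lands on a different page from the one before it.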

The other anecdote is that for a time after SQL Server 7.0 was introduced
it was about the best disk driver stress test on the planet. I answered
many newsgroup posts that went like this: “I installed SQL Server 7.0 and now
my machine blue screens”. I’d respond, “I’ll bet you dollars to donuts if you
upgrade your I/O subsystem firmware and drivers the problem will go away”.
Sure enough it would. There was one hardware manufacturer that admitted to
me that they had seen the I/O blue screen in their labs with their tests but hadn’t
bothered to fix it since there were no real-world applications that encountered it
(before SQL Server 7.0 shipped).

Even though each version of SQL Server has improvements and enhancements in
several areas, I guess that each version has some main focus area (performance,
stability, functionality, high availability, security, etc.). What was the focus area
for each of the versions of SQL Server starting with 7.0, and what is the focus area in
Katmai?

The areas you list above are what we typically call “Enterprise Abilities” or simply
“the –ilities”. We need to move these forward with every release as the demands,
expectations, and scale of database applications continue to grow. Real quickly,
I’d say SQL Server 2000 was about finishing SQL Server 7.0. We initially thought
we’d need a very quick “point” release to respond to market feedback on 7.0. But
SQL Server 7.0 was such a dramatic improvement over 6.5, and was doing so well,
that after we finished our first round of SQL Server 2000 features we went back
in and added some more before shipping it.

SQL Server 2005 is a major release and is really the first example of the
“Complete Data Platform” I mentioned earlier. Having the Business Intelligence
services “in the box” has been tremendous. The fact that we have analysis,
integration, reporting, data mining, service broker, and native XML support in
the product is allowing people to build some incredible end to end applications.

For Katmai we have four major themes:

· “Enterprise Abilities” – which I mentioned before.

· “Beyond Relational”, which is about managing “all data, birth to
archival.”

· “Dynamic Development”, which is really about dramatically reducing the
“time to solution” to address information requirements – whether it’s a
new application, analysis over existing data, creating reports to drive the
business, etc.

· “Pervasive Insight”, which is about bringing Business Intelligence to
the masses. Rather than have BI be just about informing the tens of
strategic decisions a company makes on an annual basis, we want to use
it to empower the people who are really running the business day to day.

I’m sure that the Microsoft research group is laying the foundations these days
for the database of the future, and that the SQL Server team interacts with the
research group.

Can you take us 10 years forward in time; how do you envision SQL Server in the
year 2017?

I’ll frame my response by saying that the line between “platform” and
“application” tends to move upwards over time. 15 years ago networking wasn’t
part of the PC platform and you had to buy hardware and a TCP stack to put your
computer on the network. 7-10 years ago “Data Mining” wasn’t part of the
database platform, etc. Here are some predictions:

1. The system will be much more adaptive and scalable. It will be able to
meet increasing processing and capacity demands without a lot of
human intervention. We’ll be much closer to being able to wheel in
compute and storage resources and have the system integrate them into
the mix automatically.

2. The focus will move to the data itself rather than on the machinery used
to manipulate it. We’ll be less concerned with the plumbing and more
concerned with data quality, data protection, and information
production.

3. Most of the data services provided by SQL Server will be driven from a
common data model. Whether you’re creating a report, building an
information cube, or integrating data from another system, you will be
able to start from a common model of the key data entities such as
“customer”, “order”, or “prospect”.

4. Finally, fewer and fewer people will miss (or remember) the “open
databases” sp_configure option…

In the days of SQL Server 6.5 a single human being could aim at knowing SQL
Server really well. These days the product is so huge that you can’t master all
of it, only a certain area of it. For those who seek to invest their
time and effort building a career around SQL Server, what would you
recommend focusing on? Which SQL Server areas and technologies do you find
to have great potential?

I think the most diplomatic way to answer this is to look at the predictions above
and draw your own conclusions about how to build a career around SQL Server.
If you believe my predictions you can bet one way, if you don’t, bet the other.

Besides technical skills, what are you looking for at the personal level when
hiring new people for the SQL Server team?

I look for passion, perseverance, and the ability to work well with others. There’s
a “make stuff happen” gene that some people possess that I also try to spot. I’ve
seen all too many technically brilliant people fail due to a lack of these
characteristics.

What is the process of evaluating and adding new features to SQL Server? Has
anything changed in this process over the years?

This has changed tremendously. When we rebuilt the database engine for SQL
Server 7.0, or when the Analysis Services team built AS, we knew what we were
going to build. In many cases we had people who had built these things before, so
the playbook was pretty much written. Today we have all the key features and
customers no longer ask us if we have “row level locking”, or “online index build”.
Instead, they are asking us to help solve their business problems and to help them
do it quickly and effectively. We’re now much more mindful of customer needs
and building end to end scenarios to address those needs. Sometimes it will take
several releases for the entire picture to become clear. For example, in SQL
Server 2005 we built a great security infrastructure – these are building blocks. In
Katmai we’re using these building blocks to address customer scenarios like
information leak protection and transparent encryption.

A common question that I get from students and customers is “Should all tables
have a clustered index or are there cases where it makes sense to organize the
table as a heap?” What would you answer to such a question?

Hey, is this a quiz?

My rules of thumb are:

1. If you have a table with a single index you should generally make it a
clustered index.

2. Clustered indexes are typically the way to go but watch out for the key
size.

3. Watch out for insert rates on clustered indexes with monotonically
increasing keys. (You can distribute inserts sometimes by doing some key
munging tricks…)

4. For logging tables with no indexes and high insert rates go with a heap
as we can distribute the inserts across a number of pages.
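
Rule 3 can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not SQL Server behavior and not necessarily the trick David has in mind: it treats a clustered index as a sorted Python list, and it assumes one possible form of "key munging", namely leading the key with a small bucket number (the hypothetical `munge` function). It then measures what fraction of inserts land at the very end of the index, which is where a hot spot forms with monotonically increasing keys.

```python
import bisect


def munge(key, buckets):
    """Hypothetical key munging: lead the key with a bucket number derived from it."""
    return (key % buckets, key)


def tail_insert_fraction(keys, buckets=1):
    """Fraction of inserts that land at the very end of the sorted index."""
    index = []  # stands in for the clustered index, kept in key order
    tail_inserts = 0
    for key in keys:
        composite = munge(key, buckets)
        pos = bisect.bisect_left(index, composite)
        if pos == len(index):
            tail_inserts += 1  # new row goes after every existing row
        index.insert(pos, composite)
    return tail_inserts / len(keys)


if __name__ == "__main__":
    ascending = range(1, 1001)
    print(tail_insert_fraction(ascending, buckets=1))  # plain ascending key
    print(tail_insert_fraction(ascending, buckets=4))  # munged into 4 ranges
```

With the plain ascending key every insert goes to the end of the index. With four buckets only about a quarter do, and the rest are spread over the ends of the other three key ranges. The trade-off in this model is that rows are no longer stored in the order of the original key.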

How many people today are part of the various SQL Server teams? Can you
share the challenges the group went through and how you overcame those
challenges when changing from a small to such a big group?

The team is about 15 times larger than when I started. We’ve had a number of
challenges in growing so fast, and perhaps it’s no surprise that one of the
biggest challenges is staying nimble and agile while we grow. We’ve changed our
development process a lot in Katmai to address this and we’re still having some
growing pains but one of the things I really like about our team is that we’re
honest about our challenges and continuously strive to make things better. I
think customers will definitely see the results of our process improvements in the
polish and scenario completeness in our upcoming release but it will probably
take another release cycle to get really good at the process and tune it so day to
day life is efficient for everyone on the product development team.

From your and Microsoft’s experience, how has SQL Server 2005 been received, and
what is the adoption rate? What major issues did you face, and were these
addressed in Service Packs 1 and 2?

SQL Server 2005 adoption has been very good. We met with an analyst who
tracks the adoption rate of enterprise database releases and his data showed
incredible uptake thus far.

One of the challenges we had during SQL Server 2005 development was simply
that there was so much going on in the release and we were changing a lot of
infrastructure. Perhaps the most tension came in the management tools where
the team not only had to move our framework over to the managed shell but
also had to keep up with the features pouring in from the other component
teams. There were strong opinions on SSMS from customers who had to spend
their work day in it but the team has done a great job in addressing customer
feedback on SSMS in SP1 and SP2.

Once SQL Server was all about SQL; today, it’s about many technologies: CLR,
XML, and so on. Without underestimating the value of the added technologies, and
admitting that some important T-SQL features were added to SQL Server 2005
(CTEs, Ranking Calculations, Exception Handling), I do feel that T-SQL doesn’t get
enough focus and attention. For example, in terms of ANSI SQL grammatical
support SQL Server seems to be behind. Other database platforms have already
fully implemented the profound OVER clause, and have support for regular
expressions in SQL, intervals, full support for vector expressions/row value
constructors, more complete support for recursive CTEs (searching and cycles),
and so on. For full disclosure I should say that since my focus is T-SQL I may
be biased and always want more... Am I completely off base here? And if not, is
it that Microsoft simply believes that the surrounding technologies are more
important?

It’s a fair question and I’d expect nothing less from a T-SQL guru like you! The
real answer is that it’s a balancing act. Every component team wishes they could
do more in each release and every MVP and expert on a particular aspect of the
product feels like we could do so much more for their area. In reality we’re trying
to focus on helping our partners and customers solve their business problems. If
we really need great regular expression support in the engine to do this we’ll put
it in. Honestly, it’s a return on investment thing and we have to look at the big
picture across the entire product.

We spend a fair bit of time thinking about what not to put in as well. Very few
people would consider a Swiss army knife as elegant and once you put in a
feature, even if it’s only used by 2% of the customer base, it’s very hard to pull it
out again.

Can you tell us a little bit about yourself outside of work? What do you like to do
when not working; what kind of non-technical books do you like to read?

I have two teenage boys and the youngest one is a junior in high school so my
wife, Marcia, and I are spending time lately visiting colleges with him. I enjoy
travelling and photography and I credit Marcia with teaching me how important
travel experiences are. We’re trying to visit all the National Parks in the US –
(currently at 22 out of 58). These trips scratch both my travel and photography
itches.

Most of the non-technical books I read are about history, psychology or are
biographies of one form or another.

Back in the days when you were deeply involved in the shaping of the storage
engine of SQL Server 7.0 you used to participate in some private SQL
newsgroups. Your explanations were so detailed and clear and were considered
pure gold by many of us. For example, I remember how you solved a major
mystery that involved querying a small heap that reported a huge number of
logical reads, after the table underwent an update statement expanding varchar
strings. You explained that the expansion of the varchar strings that didn’t have
room to expand in their hosting pages caused a large number of forwarding
pointers, and that SQL Server had to jump back and forth between the page
holding the pointer and the page holding the pointed record. Suddenly it all
seemed to make sense. For us teachers and students, someone with both a lot
of knowledge and great explanatory skills is a rare sight, and the SQL Server
community can benefit from having this knowledge conveyed through books.
Have you considered, or are you considering, writing a book and passing on what
you know through such means?

Thanks for the kind words. I really miss writing about stuff but it takes time for
me to do it well and I have so many other things going on now that I find myself
frustrated if I try to respond quickly. Just last week I responded quickly to an
e-mail from a customer I met at an event and instead of giving him the full “it
depends” explanation with the rationale I gave him the quick conclusion since I
was falling behind in e-mail. Unfortunately, this confused him and others on the
thread and it took two more e-mail exchanges to sort it out.

I have a big queue of things to write about – hopefully I can jot some of it down
before I forget it all!
