Search This Blog

Thursday, July 06, 2006

SQL - Date Convert & Sort Problem

The Problem

I'll use Northwind for an example, since it's one that most DBAs have access to on a test server. You can also reset it up from the SQL Server media if you need to. Let's look at the orders table, in particular, this query:

select top 5
orderid
, customerid
, requireddate
from orders
order by requireddate desc
This is straightforward, give me the 5 orders that need to go out by the most forward "requireddate" on back. Running this gives me
orderid     customerid requireddate                                          
----------- ---------- ------------------------------------------------------
11061 GREAL 1998-06-11 00:00:00.000
11059 RICAR 1998-06-10 00:00:00.000
11074 SIMOB 1998-06-03 00:00:00.000
11075 RICSU 1998-06-03 00:00:00.000
11076 BONAP 1998-06-03 00:00:00.000

Notice that I have the June 11, 98 order first, then the June 10, the 3rd, etc. This is what I wanted, giving me the orders that will be due and working backwards. Running something like this with some flag to note if I've processed them allows me to see the newest orders coming in. I know it's a little funny, but the dates made more sense with the requireddate than the orderdate in this example.

So now suppose that your developer says that the date time thing messes him up and can you please just send back the date formatted a little nicer. Ok, no problem, so you change the query:

select top 5
o.orderid
, o.customerid
, CONVERT(char(10), o.requireddate, 101) as requireddate
from orders o
order by requireddate desc
Easy enough, we add a convert to send this back as a character value instead of a datetime and then we use the 101 conversion code to specify that we want mm/dd/yyyy.
Because this is going back to a client and it's not a query we're reading in QA, we want to give the column a name that can be referenced. So we choose the same name, requireddate, to make it easier on the developer. How many of you have changed a name and had a developer scream? And if they didn't, it it because they reference values by position? Tsk, tsk, that's a no-no.

In any case, you might run this, see that it runs and send it on. But did you examine the results carefully? They look like this:

orderid     customerid requireddate
----------- ---------- ------------
10763 FOLIG 12/31/1997
10764 ERNSH 12/31/1997
10370 CHOPS 12/31/1996
10371 LAMAI 12/31/1996
10761 RATTC 12/30/1997
Notice that the first date isn't the June 11th, 1998, but rather December 31st, 1997!?!?!!!?

Whoopsi. Hopefully things weren't too date dependent.

Still, it's interesting and I bet a few people get caught by this because adding a convert doesn't. While I'm sure that there are some better explanations and more technical ones, I'll give you mine because I think it's easier to understand.

I grabbed the query plan and put it up here:

Notice the order in which things are done. First the rows are retrieved. Then the new column is computed, this being the CONVERTed date field. Lastly the sort takes place, not on the original column value as stored in the table, but rather the new computer column, as changed to a character format. In this case, the worktable in tempdb, which houses only the result data, including the computed character column, is used for the sorting. So you get 12/31 as a higher value than 06-11 and so it comes first.

So how do you avoid this?

First of all, you read lots of articles like this one where someone else has made the mistake, you chuckle at them, and then it sticks in your mind as something to watch out for. Next you rename columns when you change their values. That way when someone comes down the road and adds a new order by or group by, it doesn't affect things. That may not always work and there are definitely reasons to keep the column name the same, so then you rely on rule 1 :)

The other thing you do, and which I did not, is test carefully. Even the smallest change can affect results in ways you did not think of. That's what happened here and I thought that a simple order by would solve things. I didn't check the data carefully. I most likely used my briefcase, which has 3 articles in it, so the issue may not have even appeared for me.

As with most things, this came down to proper QA and testing. We all move too fast at times and sometimes it matters, and sometimes it doesn't. I'm sure you see similar things in your company. Mistakes and bugs will slip through. They do in all software, even in lots of other industries. What you try your best to do is learn and make fewer ones as you move forward.

--------------------------------------------------------------------------

You could read the actual article in http://www.sqlservercentral.com/columnists/sjones/anotherdbawhoops.asp

No comments: