T-SQL Quiz - Dealing With Trends | You’ve Been Haacked

I’m not one to post a lot of quizzes on my blog. Let’s face it, while we may create altruistic reasons for posting quizzes such as:

It’s an interesting problem I thought up
It’s an interesting bug I ran into

we all know the real reasons for posting a quiz.

It serves as blog filler.
It’s a way to show off how smart the blogger is.

With that in mind, let me humbly present my latest SQL Quiz, which is something I ran into at work recently, and will not show off any smarts whatsoever.

The circumstances of this problem have been dramatically changed and simplified to both protect the guilty and save me from a lot of typing.

In this application, we have two tables. One contains a lookup list of various statistics. The second is a larger table of measurements for each of the statistics.

The following screenshot shows the data model.

Statistic table and Measurement
Table

The following screenshot shows the list of contrived statistics.

Statistic Table
Data

What we see above are the following:

LOC per bug - Lines of code per bug.
Simplicity Index - some magical number that purports to measure simplicity.
Awe Factor - The awe factor for the source code.

For each of these statistics, the larger, the better.

The following is a view of the Measurement table.

Measurement Table
Data

Each measurement has the previous score and current score (this is a denormalized version of the actual tables for the purposes of demonstration).

I needed to write a query that would show each of the stats for a given developer as well as a Trend Factor. The Trend Factor tells you whether or not the statistic is trending positive or negative, where positive is better and negative is worse.

Result of the
query

Here is my first cut at the stored procedure. It’s pretty straightforward. In order to make the important part of the query as clear as possible, I used a Common Table Expression to make sure the count of measurements for each statistic can be referenced as if it were a column.

CREATE PROC [dbo].[Statistics_GetForDeveloper](
  @Developer nvarchar(64)
)
AS
WITH MeasurementCount(StatisticId, MeasurementCount) AS
(
  SELECT s.Id
    ,MeasurementCount = COUNT(1)
  FROM Statistic s
    LEFT OUTER JOIN Measurement m ON m.StatisticId = s.Id
  GROUP BY s.Id
)
SELECT 
  Statistic = s.Title
  , Developer
  , CurrentScore
  , PreviousScore
  , mc.MeasurementCount
  , TrendFactor = (CurrentScore - PreviousScore)/mc.MeasurementCount
FROM Statistic s
  INNER JOIN MeasurementCount mc ON mc.StatisticId = s.Id
  LEFT OUTER JOIN Measurement m ON m.StatisticID = s.Id
WHERE Developer = @Developer
GO

I bolded the relevant part of the query. We calculate the TrendFactor by taking the current score, subtracting the previous score, and then dividing the difference by the number of measurements for that particular statistic. This tells us how that statistic is trending.

In this application, I am going to present an up arrow for trend factors larger than 0.1, a down arrow for trend factors less than -0.1, and a flat line for anything in between. A trend factor going upward is always considered a “good thing”.

The Challenge

This works for now because for each statistic, a larger value is considered better. But we need to add a new statistic, Deaths per LOC, which measures the number of deaths per line of code (gruesome, yes. But whoever said this industry is all roses and rainbows?). For this statistic, an upward trend is a “bad thing”.

Therefore, if the current score is larger than the previous score for this statistic, we would want the TrendFactor to be negative. Not only that, we may want to add more statistics in the future. Some for which larger values are better. And some for which smaller values are better.

So here is the quiz question. You are allowed to make a schema change to the Statistic table and to the stored procedure. What changes would you make to fulfill the requirements?

Bonus points, can you fulfill the requirements without using a CASE statement in the stored procedure?

Here is a SQL script that willl setup the tables and initial stab at the stored procedure for you. The script requires SQL Server 2005 or SQL Server Express 2005.

Comments

9 responses

Jon Galloway • April 9th, 2007
Hmm... I think I see my initials in the test data, and your hypothetical scores were way higher than mine...
Does that mean that this test takes place in some alternate reality universe? Should I assume that T-SQL function any differently in this bizarro-world you've imagined?

Haacked • April 9th, 2007
I think you're reading too much into a completely innocent example. Completely innocent. Says a lot about you, doesn't it.
;)

Damien Guard • April 9th, 2007
Add a new decimal column to the Statistics table called Weight (type Decimal(18,2) will suffice).
Fill in the new column with a value of 1 for existing Statistics and -1 for the new "Deaths per LOC" row.
Modify the line of the SP that says:
, TrendFactor = (CurrentScore - PreviousScore)/mc.MeasurementCount
To read:
, TrendFactor = (CurrentScore - PreviousScore)/mc.MeasurementCount*Weight
[)amien

Josh • April 9th, 2007
Why not do "LOC per death", which allows you to continue with "more is a good thing" in your trending?
--Josh

Damien Guard • April 9th, 2007
Oh it's probably worth noting that as well as letting you flip the scores from negative to positive this solution lets you perform overall weighting on the metrics.
You may decide for example that "LOC per bug" isn't quite as important as "Deaths per LOC" and this lets you tweak the weighting as you go along without affecting the recording of the statistics.
[)amien

never • April 9th, 2007
take it to a small town hall meeting and let the old people fart on it. that just about solves any problem. you can thank me later.

Jon Galloway • April 9th, 2007
I wrote up a solution last night but didn't post it because I had inside information on your solution. Interestingly enough, I not only used the same solution as [)amien, I used the same column name. As he indicated, using a float weight column allows you to do more with your aggregations. For instance, Analysis Services can use it as an Operator column which handles weighted rollups.
Why didn't I mention that yesterday when you chatted me about it? Because I was saving it for comments on the inevitable blog post... no, not really. I just didn't get the problem until you'd spelled it out all pretty like, with diagrams and stuff.
insert into statistic(id, title) values (4,'Deaths per LOC')
alter table statistic add weight float
update statistic set weight = 1.0 where title != 'Deaths per LOC'
update statistic set weight = -1.0 where title = 'Deaths per LOC'
ALTER PROC [dbo].[Statistics_GetForDeveloper](
@Developer nvarchar(64)
)
AS
WITH MeasurementCount(StatisticId, MeasurementCount) AS
(
SELECT s.Id
,MeasurementCount = COUNT(1)
FROM Statistic s
LEFT OUTER JOIN Measurement m ON m.StatisticId = s.Id
GROUP BY s.Id
)
SELECT
Statistic = s.Title
, Developer
, CurrentScore
, PreviousScore
, mc.MeasurementCount
, TrendFactor = Weight * (CurrentScore - PreviousScore)/mc.MeasurementCount
FROM Statistic s
INNER JOIN MeasurementCount mc ON mc.StatisticId = s.Id
LEFT OUTER JOIN Measurement m ON m.StatisticID = s.Id
WHERE Developer = @Developer
PRINT 'haacked is teh 5uxx0r'
GO

Kevin Dente • April 10th, 2007
Bah, forget this SQL stuff. Throw it in a cube, define a KPI, and set a trend expression there. SQL is so 1998. ;)

Jennifer Smith • April 14th, 2007
I think this is slightly similar to some of the following solutions, but here goes:
1. Add a bit field called "IsNegativeTrend" to the Statistic table (1 for Deaths per LOC, 0 for all the previous measures) (or you could default it)
2. In the Statistics_GetForDeveloper SP, calculate the trend factor as:
TrendFactor = (CurrentScore - PreviousScore)/mc.MeasurementCount * POWER(-1, s.IsNegativeTrend)
OK so this is much the same as storing -1/ 1 in the Statistic table. But I reckon storing a simple bit field makes the Statistic field a bit more readable: someone adding another statistic later on will have a better idea of what the column means.
Of course, the trend calculation is a tiny bit more complicated but you could always add the multiplier as calculated column if you felt like it! I am pretty sure SQL server would do some optimisation as well.