Programmer Helper: November 2005

Thursday, November 24, 2005

New Sober.Y variant is 2005 largest email worm outbreak

If you get a message from FBI or CIA or anyone, that looks like this
Examples of such messages include:

Dear Sir/Madam,
We have logged your IP-address on more than 30 illegal Websites.
Please answer our questions!
The list of questions are attached.
Yours faithfully,
Steven Allison
*** Federal Bureau of Investigation -FBI-

and has an attachment file in zip format or any other format, please delete the mail. DO NOT OPEN IT.

The first Sober was found in October 2003, over two years ago. F-Secure believes all 25 variants of this virus have been written by the same individual, operating from somewhere in Germany. Unlike most of the other widespread viruses nowadays, Sober doesn't seem to have a clear financial motive behind it.

Some Sober variants have displayed neo-nazi messages, but the latest version of the virus (Sober.Y) does not do this. However, all Sober variants send German messages to German email addresses and English messages to other addresses.

Several millions of emails infected with Sober.Y have been seen by Internet operators over the last hours.

MessageLabs has so far intercepted over 2.7-million copies of the new Sober virus, many of which are being spoofed to appear as though they are sent from the FBI or the CIA. The first copy was stopped on 21st November. The size of the attack indicates that this is a major offensive, certainly one of the largest in the last few months.

These emails suggest to recipients that their Internet use has been monitored by the FBI or CIA and that they have accessed illegal Web sites. The email directs users to open the ZIP attachment containing the executable, which once opened delivers the Sober virus payload. It then spreads by searching the infected computer for other email addresses to send copies of itself to, but ignoring any domains for certain security organizations, including MessageLabs.

The virus will send emails in German for domains ending .DE or .AT and a few others, with the remainder being sent in English. It seems that despite warnings, many recipients are still opening the emails allowing the virus to spread still further.

Since the virus first struck on Monday 21st November, the amount of viruses being sent per hour has approximately tripled, indicating that this particular strain of Sober virus has been written to rapidly exploit the so-called ‘zero hour’ holes in anti-virus security software (the time before anti-virus software writers have prepared and distributed an update to repair infected PCs).

Email Systems has noted that there are currently approximately thirty times the usual quantity of virus infected emails being sent and received online. This new virus has doubled email traffic in 24 hours.

Neil Hammerton, CEO of Email Systems, commented: “This is one of the worst viruses to strike in some time, spreading extremely rapidly. Although AV updates are actually now available from the major software vendors, it seems as though this particular variant managed to quickly grab a sufficiently large foothold to continue to propagate once the fixes were unveiled. Again this serves to underline the importance of using a specialist managed service for corporate email as users experience no zero hour syndrome as with AV software, given that our systems are specifically designed to continually check for virus-like activity and block infected emails before they reach the users’ network or PC.”

One of the reasons why this email worm seems to be so successful in spreading is that some of the messages it sends are fake warnings from FBI, CIA or from the German Bundeskriminalamt (BKA).

"The numbers we're now seeing with Sober.Y are just huge", comments Mikko Hypponen, Chief Research Officer at F-Secure Corporation. "This is the largest email worm outbreak of the year - so far!"

Wednesday, November 23, 2005

SQL Server Query Execution Plan Analysis

When it comes time to analyze the performance of a specific query, one of the best methods is to view the query execution plan. A query execution plan outlines how the SQL Server query optimizer actually ran (or will run) a specific query. This information if very valuable when it comes time to find out why a specific query is running slow.

There are several different ways to view a query's execution plan. They include:

From within Query Analyzer is an option called "Show Execution Plan" (located on the Query drop-down menu). If you turn this option on, then whenever you run a query in Query Analyzer, you will get a query execution plan (in graphical format) displayed in a separate window.
If you want to see an execution plan, but you don't want to run the query, you can choose the option "Display Estimated Execution Plan" (located on the Query drop-down menu). When you select this option, immediately an execution plan (in graphical format) will appear. The difference between these two (if any) is accountable to the fact that when a query is really run (not simulated, as in this option), current operations of the server are also considered. In most cases, plans created by either method will produce similar results.
When you create a SQL Server Profiler trace, one of the events you can collect is called: MISC: Execution Plan. This information (in text form) shows the execution plan used by the query optimizer to execute the query.
From within Query Analyzer, you can run the command SET SHOWPLAN_TEXT ON. Once you run this command, any query you execute in this Query Analyzer sessions will not be run, but a text-based version of the query plan will be displayed. If the query you are running uses temp tables, then you will have to run the command, SET STATISTICS PROFILE ON before running the query.

Of these options, I prefer using the "Show Execution Plan", which produces a graphical output and considers current server operations.

If you see any of the following in an execution plan, you should consider them warning signs and investigate them for potential performance problems. Each of them are less than ideal from a performance perspective.

Index or table scans: May indicate a need for better or additional indexes.
Bookmark Lookups: Consider changing the current clustered index, consider using a covering index, limit the number of columns in the SELECT statement.
Filter: Remove any functions in the WHERE clause, don't include Views in your Transact-SQL code, may need additional indexes.
Sort: Does the data really need to be sorted? Can an index be used to avoid sorting? Can sorting be done at the client more efficiently?

It is not always possible to avoid these, but the more you can avoid them, the faster your performance will be.

If you have a stored procedure, or other batch Transact-SQL code that uses temp tables, you cannot use the "Display Estimated Execution Plan" option in the Query Analyzer to evaluate it. Instead, you must actually run the stored procedure or batch code. This is because when a query is run using the "Display Estimated Execution Plan" option, it is not really run, and temp tables are not created. Since they are not created, any references to them in the code will fail, which prevents an estimated execution plan from being created.

On the other hand, if you use a table variable (available in SQL Server 2000) instead of a temp table, you can use the "Display Estimated Execution Plan" option

If you have a very complex query you are analyzing in Query Analyzer as a graphical query execution plan, the resulting plan can be very difficult to view and analyze. You may find it easier to break down the query into its logical components, analyzing each component separately.

The results of a graphical query execution plan are not always easy to read and interpret. Keep the following in mind when viewing a graphical execution plan:

In very complex query plans, the plan is divided into many parts, with each part, listed one on top of the other on the screen. Each part represents a separate process or step that the query optimizer had (has) to perform in order to get to the final results.
Each of the execution plan steps is often broken down into smaller sub-steps. Unfortunately, you don't view the sub-steps from left to right, but from right to left. This means you must scroll to the far right of the graphical query plan to see where each step starts.
Each of the sub-steps and steps is connected by an arrow, showing the path (order) taken of the query when it was executed.
Eventually, all of the parts come together at the top left side of the screen.
If you move your cursor above any of the steps or sub-steps, a pop-up windows is displayed, providing more detailed information about this particular step or sub-step.
If you move your cursor over any of the arrows connecting the steps and sub-steps, you see a pop-up window showing how many records are being moved from one step or sub-step to another step or sub-step.

The arrows that connect one icon to another in a graphical query plan have different thicknesses. The thickness of the arrow indicates the relative cost in the number of rows and row size of the data moving between each icon. The thicker the arrow, the more the relative cost is.

You can use this indicator as a quick gauge as to what is happening within the query plan of your query.

You will want to pay extra attention to thick arrows in order to see how it affects the performance of your query. For example, thick lines should be at the right of the graphical execution plan, not the left. If you see them on the left, this could indicate that too many rows are being returned, and that the query execution plan is less than optimal.

In an execution plan, each part of it is assigned a percentage cost. This represents how much this part costs in regard to resource use, relative to the rest of the execution plan. When you analyze an execution plan, you should focus your efforts on those parts that have the largest percentage cost. This way, you focus your limited time on those areas that have the greatest potential for a return on your time investment.

In an execution plan, you may have noticed that some parts of the plan are executed more than once. As part of your analysis of an execution plan, you should focus some of your time on any part that takes more than one execution, and see if there is any way to reduce the number of executions performed. The fewer executions that are performed, the faster the query will be executed.

In an execution plan you will see references to I/O and CPU cost. These don't have a "real" meaning, such as representing the use of a specific amount of resources. These figures are used by the Query Optimizer to help it make the best decision. But there is one meaning you can associate with them, and that is that a smaller I/O or CPU cost uses less server resources than a higher I/O or CPU cost.
When you examine a graphical SQL Server query execution plan, one of the more useful thing to look for is how indexes were used (if at all) by the query optimizer to retrieve data from tables from a given query. By finding out if an index was used, and how it was used, you can help determine if the current indexes are allowing the query to run as well as it possibly can.

When you place the cursor over a table name (and its icon) in a graphical execution plan, and display the pop-up window, you will see one of several messages. These messages tell you if and how an index was used to retrieve data from a table. They include:

Table Scan: If you see this message, it means there was no clustered index on the table and that no index was used to look up the results. Literally, each row in the table being queried had to be examined. If a table is relatively small, table scans can be very fast, sometimes faster than using an index.

So the first thing you want to do, when you see that a table scan has been performed, is to see how many rows there are in the table. If there are not many, then a table scan may offer the best overall performance. But if this table is large, then a table scan will most likely take a long time to complete, and performance will suffer. In this case, you need to look into adding an appropriate index(s) to the table that the query can use.

Let's say that you have identified a query that uses a table scan, but you also discover that there is an appropriate nonclustered index, but it is not being used. What does that mean, and how come the index was not used? If the amount of data to be retrieved is large, relative to the size of the table, or if the data is not selective (which means that there are many rows with the same values in the same column), a table scan is often performed instead of an index seek because it is faster. For example, if a table has 10,000 rows, and the query returns 1,000 of them, then a table scan of a table with no clustered index will be faster than trying to use a non-clustered index. Or, if the table had 10,000 rows, and 1,000 of the rows have the same value in the same column (the column being used in the WHERE clause), a table scan is also faster than using a non-clustered index.

When you view the pop-up window when you move the cursor over a table in a graphical query plan, notice the "Estimated Row Count" number. This number is the query optimizer's best guess on how many rows will be retrieved. If a table scan was done, and this number is very high, this tells you that the table scan was done because a high number of records were returned, and that the query optimizer believed that it was faster to perform a table scan than use the available non-clustered index.
Index Seek: When you see this, it means that the query optimizer used a non-clustered index on the table to look up the results. Performance is generally very quick, especially when few rows are returned.
Clustered Index Seek: If you see this, this means that the query optimizer was able to use a clustered index on the table to look up the results, and performance is very quick. In fact, this is the fastest type of index lookup SQL Server can do.
Clustered Index Scan: A clustered index scan is like a table scan, except that it is done on a table that has a clustered index. Like a regular table scan, a clustered index scan may indicate a performance problem. Generally, they occur for two different reasons. First, there may be too many rows to retrieve, relative to the total number of rows in the table. See the "Estimated Row Count" to verify this. Second, it may be due to the column queried in the WHERE clause may not be selective enough. In any event, a clustered index is generally faster than a standard table scan, as not all records in the table always have to be searched when a clustered index scan is run, unlike a standard table scan. Generally, the only thing you can do to change a clustered index scan to a clustered index seek is to rewrite the query so that it is more restrictive and fewer rows are returned.

In most cases, the query optimizer will analyze joins and JOIN the tables using the most efficient join type, and in the most efficient order. But not always. In the graphical query plan, you will see icons that represent the different types of JOINs used in the query. In addition, each of the JOIN icons will have two arrows pointing to it. The upper arrow pointing to the JOIN icon represents the outer table in the join, and the lower arrow pointing to the JOIN icon represent the inner table in the join. Follow the arrows back to see the name of the table being joined.

Sometimes, in queries with multiple JOINs, tracing the arrow back won't reveal a table, but another JOIN. If you place the cursor over the arrows pointing to the upper and lower JOINs, you will see a popup window that tells you how many rows are being sent to the JOIN for processing. The upper arrow should always have fewer rows than the lower arrow. If not, then the JOIN order selected by the query optimizer might be incorrect (see more on this below).

First of all, let's look at JOIN types. SQL Server can JOIN a table using three different techniques: nested loop, hash, and merge. Generally, the fastest type of join in a nested loop, but if that is not feasible, then a hash JOIN or merge JOIN is used (as appropriate), both of which tend to be slower than the nested JOIN.

When very large tables are JOINed, a merge join, not a nested loop join, may be the best option. The only way to really know is to try both and see which one is the most efficient.

If a particular query is slow, and you suspect it may be because the JOIN type is not the optimum one for your data, you can override the query optimizer's choice by using a JOIN hint. Before you use a JOIN hint, you will want to take some time and learn about each of the JOIN types, and how they are designed to work. This is a complicated subject, beyond the scope of this tip.

JOIN order is also selected by the query optimizer, which it trying to select the most efficient order to JOIN tables. For example, for a nested loop join, the upper table should be the smaller of the two tables. For hash joins, the same is true, the upper table should be the smaller of the two tables. If you feel that the query optimizer is selecting the wrong order, you can override it using JOIN hints.

In many cases, the only way to know for sure if using a JOIN hint to change JOIN type or JOIN order will boost or hinder performance is to give them a try and see what happens.

If your SQL Server has multiple CPUs, and you have not changed the default setting in SQL Server to limit SQL Server's ability to use all of the CPUs in the server, then the query optimizer will consider using parallelism to execute some queries. Parallelism refers to the ability to execute a query on more than one CPU at the same time. In many cases, a query that runs on multiple processors is faster than a query that only runs on a single processor, but not always.

The Query Optimizer will not always use parallelism, even though it potentially can. This is because the Query Optimizer takes a variety of different things into consideration before it decides to use parallelism. For example, how many active concurrent connections are there, how busy is the CPU, is there enough available memory to run parallel queries, how many rows are being processed, and what is the type of query being run? Once the Query Optimizer collects all the facts, then it decides if parallelism is best for this particular run of the query. You may find that one time a query runs without parallelism, but a few minutes later, the exact same query runs again, but this time, parallelism is used.

In some cases, the overhead of using multiple processors is greater than the resource savings of using them. While the query processor does try to weigh the pros and cons of using a parallel query, it doesn't always guess correctly.

If you suspect that parallelism might be hurting the performance of a particular query, you can turn off parallelism for this particular query by using the OPTION (MAXDOP 1) hint.

The only way to know for sure is to test the query both ways, and see what happens.

When reviewing a graphical execution plan, you may notice that one or more of the icon text is displayed in red, not in black, as is normal. This means that the related table is missing some statistics that the Query Optimizer would like to have in order to come up with a better execution plan.

To create the missing statistics, right-click on the icon and then select the option, "Create Missing Statistics." This will display the "Create Missing Statistics" dialog box, where you can then easily add the missing statistics.

If you are given the option to update missing statistics, you should always take the opportunity to do so as it will most likely benefit the performance of the query that is being analyzed.

Sometimes, when viewing a graphical query execution plan, you see an icon labeled "Assert." All this means is that the query optimizer is verifying a referential integrity or check constraint to see if the query will violate it or not. If not, there is no problem. But if it does, then the Query Optimizer will be unable to create an execution plan for the query and an error will be generated.

Often, when viewing a graphical query execution plan, you see an icon labeled "Bookmark Lookup." Bookmark lookups are quite common to see. Essentially, they are telling you that the Query Processor had to look up the row columns it needs from the table or a clustered index, instead of being able to read it directly from a non-clustered index.

For example, if all of the columns in the SELECT, JOIN, and WHERE clauses of a query don't all exist in the non-clustered index used to locate the rows that meet the query's criteria, then the Query Optimizer has to do extra work and look at the table or clustered index to find all the columns it needs to satisfy the query.

Another cause of a bookmark lookup is using SELECT *, which should never be used.

Bookmark lookups are not ideal from a performance perspective because extra I/O is required to look up all the columns for the rows to be returned.

If you think that a bookmark lookup is hurting a query's performance, you have four potential options to avoid it. First, you can create a clustered index that will be used by the WHERE clause, you can take advantage of index intersection, you can create a covering non-clustered index, or you can (if you have SQL Server 2000 Enterprise Edition, create an indexed view. If none of these are possible, or if using one of these will use more resources than using the bookmark lookup, then the bookmark lookup is the optimal choice.

Sometimes, the Query Optimizer will need to create a temporary worktable in the tempdb database. If this is the case, it will be indicated in the graphical query execution plan with an icon labeled like this: Index Spool, Row Count Spool, or Table Spool.

Anytime that a worktable is used, performance is generally hurt because of the extra I/O generated when maintaining a worktable. Ideally, there should be no worktables. Unfortunately, they cannot always be avoided. And sometimes their use can actually boost performance because using a worktable is more efficient that the alternatives.

In any event, the use of a worktable in a graphical query execution plan should raise an alert with you. Take a careful look at such a query and see if there is anyway it can be rewritten to avoid the work table. There may not be. But if there is, you are one step closer to boosting the performance of the query.

In a graphical query execution plan, often you see the Stream Aggregate icon. All this means is that some sort of aggregation into a single input is being performed. This is most commonly seen when a DISTINCT clause is used, or any aggregation operator, such as AVG, COUNT, MAX, MIN, or SUM.

The Query Analyzer is not the only tool that can generate and display query execution plans for queries. The SQL Server Profiler can also display them, albeit in text format only. One of the advantages of using Profiler instead of Query Analyzer to display execution plans is that it can do so for a great many queries from your actual production work, instead of running one at a time using Query Analyzer.

To capture and display query execution plans using Profiler, you must create a trace using the following configuration:

Events to Capture

Performance: Execution Plan
Performance: Show Plan All
Performance: Show Plan Statistics
Performance: Show Plan Text

Data Columns to Display

StartTime
Duration
TextData
CPU
Reads
Writes

Filters

Duration. You will want to specify a maximum duration, such as 5 seconds, so that you don't get flooded with too much data.

Of course, you can capture more information than is listed above in your trace, the above is only a guideline. But keep in mind that you don't want to capture too much data, as this could have a negative affect on your server's performance as the trace is being run.

If you use the OPTION FAST hint in a query, be aware that the Execution Plan results may not be what you expect. The Execution Plan that you get is based on the results of using the FAST hint, not the actual Execution Plan for the full query.

The FAST hint is used to tell the Query Optimizer to return the specified number of rows as fast as possible, even if they hurts the overall performance of the query. The purpose of this hint is to return a specified number of records quickly in order to produce an illusion of speed for the user. Once the specified number of rows is returned, the remaining rows are retuned as they would be normally.

So if you are using the FAST hint, the execution plan will be for only those rows that are returned FAST, not for all of the rows. If you want to see the execution plan for all the rows, then you must perform an Execution Plan of the query with the hint removed.

Please visit my other blogs too: http://edwardanil.blogspot.com for information and
http://netsell.blogspot.com for internet marketing. Thanks !!

Thursday, November 10, 2005

Stored Procedures Vs Dynamic SQL : The Argument Continues

Argument For Stored Procedures and Against Dynamic SQL
http://weblogs.asp.net/rhoward/archive/2003/11/17/38095.aspx
http://weblogs.asp.net/rhoward/archive/2003/11/18/38298.aspx
http://weblogs.asp.net/rhoward/archive/2003/11/18/38446.aspx

Argument For Dynamic SQL and Against Stored Procedures
http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx

Argument For Dynamic SQL and For Stored Procedures !!
http://www.simple-talk.com/2005/04/11/to-sp-or-not-to-sp-in-sql-server/

When did Dynamic SQL get such a bad name?
http://radio.weblogs.com/0101986/stories/2003/11/12/whenDidDynamicSqlGetSuchABadName.html

DATA POINTS
http://msdn.microsoft.com/msdnmag/issues/03/11/datapoints/

Bogus resumes and unblushing lies: navigating the database hiring waters

by Phil Factor

I have mixed feelings about selecting a team for a development project. I’ve been so long in the industry and have played every part - from blushing young novice to hard-bitten contractor to harassed employer - that I should find the process a simple, straightforward affair. In selecting the right people to work with though, I’ve learned that the more I know the more I know what I don’t know.

One problem with selecting candidates to undertake an IT role is that so many of them write bogus resumes and lie unblushingly about their skills and experiences during the interview. In a well-ordered universe, all references would be contacted, qualifications checked, and previous employers phoned to check the story, but I have yet to inhabit this dream world. And even with these precautions, errors are made.

Making false statements regarding qualifications on a resume and thereby obtaining a job is a serious criminal offence. I’ve only heard of prosecutions against teachers or doctors, but if the police were to turn their attention to the IT industry, our prisons would be overflowing. Quite often, when employment agencies call about a job they want me to apply for, I’m asked to rewrite my resume so it fits the job requirements more closely. Occasionally, they want to do it for me.

The unworthy wunderkind

I once got a job as a SQL Server developer for a telecommunications company. The pay was good, and it was a restful job after the helter-skelter of the dot-com boom. A month after I started, the IT director rushed in excitedly, holding a resume, saying he had just interviewed, and hired, a most excellent fellow to be my team leader. Would I move my desk over to give him a cubicle commensurate with his qualifications? The name sounded vaguely familiar, but I shrugged and thought fondly of my hourly rate - the perfect panacea for stress in any contractor.

The wunderkind arrived a fortnight later. I recognized him immediately. I’d recently employed him as a C++ programmer, before his reincarnation as a database expert. He had, in fact, been a mediocre C++ programmer with a poor grasp of databases. Then he emerged, like a moth from its cocoon, as a SQL Server expert. For three months I stared at him like Macbeth’s ghost of Banquo as he messed up catastrophically. I had, of course, told the IT director, who assumed I was motivated by professional jealousy.

Programming in Croydon

In another position, I interviewed a programmer for some tricky database work that required interfacing with a variety of languages. Was he familiar with Pascal, I asked? Had he used Visual Basic? Having worked my way through the list of languages and been assured with great effusions of sincerity that he had programmed in all of them, I moved on to other questions.

Our office was located in an out-of-the-way borough of London, and, with my radar on alert from his overenthusiastic responses, I thought to ask the candidate if he minded programming in Croydon, which was, of course, the name of our town. To my surprise, he burst into a speech about his expertise with the Croydon language!

To make matters worse, when employment agencies get wind of an opening in the IT field, there’s often a feeding frenzy and the hiring manager is called repeatedly and bombarded with resumes. Over the past 20 years, I have known a good number of agents, some of whom are honest and check the resumes of their candidates. To these I send Christmas cards every year. Three extra cards is not a great expense, after all.

Some candidates go too far. One resume I reviewed stated that the applicant had two PhDs and had worked for IBM, Microsoft and a host of other large employers. I was intrigued enough to interview the man, at which time it was immediately apparent that his resume was a complete fabrication. He wasn’t old enough to have attained his alleged qualifications, let alone his work experience. His ignorance of the elementary principles of database theory was startling as well.

Questions from a wily nincompoop

After countless experiences like this, I determined that the way to weed out the poor candidates was to ask some searching technical questions. I have lost count of the times I have suffered these myself, and I know only too well that one’s brain can be reduced to jelly when having to remember a few facts that are so familiar that, outside of an interview setting, they are like remembering the names of one’s parents.

It’s monstrous to confront someone with an examination without warning, like some human resource departments did in the 1990s with their intelligence and personality tests. I always warn candidates, via their agents, that they may be asked a few relevant technical questions, but that these would not take too long.

When conducting an interview, I generally adopt the persona of a kindly, middle-aged nincompoop. I gaze at the candidate with the bonhomie one imagines Santa would adopt. Other employers adopt a sullen approach, but I try not to upset the candidate just in case he subsequently lies himself into a job from which he might one day interview me. It has happened. The following approach never seems to have caused resentment in a candidate.

The first part of the interview consists of a rather optimistic account of the company and the pleasure one would derive from working for it. After this softening-up exercise, I gaze at the candidate benignly and ask how he would rate his SQL Server skills. If I have done the first part of the interview correctly, he will tell me his skills are superb.

“Oh, good, so you wouldn’t mind answering a few simple questions about SQL Server?” I ask, with a kindly, paternalistic smile. “Nothing particularly technical, you understand, but I would like to see how you would approach a simple problem in any of the major SQL databases.”

“Of course I wouldn’t,” the candidate inevitably replies, in anticipation of being tickled with a feather duster.

The metaphorical baseball bat that I then use consists of a number of simple questions that anyone who had done serious work in SQL Server would easily tackle but which immediately seem to sort the wheat from the chaff.. In no particular order, below is a small selection of the questions I use, just to give the feel for what they are like.

You have two tables of identical structure, with some identical entries and some different entries. How would you list out the rows from one table that were not contained in the other? How might you list out all entries in either table that were not common to both tables?

Not hard, eh? There are several neat ways of doing this, all of which are valid. I’m pleased when the candidate gets the first part of the question right, and overjoyed if he gets both. More often I hear replies such as: “One would never have two identically structured tables in a database,” or “It can’t be done.”

Imagine you are in charge of a database that has a customer table with an identity field used as a primary key. You find out that this table has duplicate entries. How would you go about finding them? What strategy would you use for eliminating them? What might you need to watch out for?

A good candidate will rattle on about the various tactics that could be adopted and the pitfalls of any rash attempt at de-duplication. It must be difficult to have had experience at the sharp end of commercial database work without being faced with the task of mopping up. I get a little flush of pleasure if the prospective candidate mentions the possibility of using the “group by” clause.

You are asked to produce an accounting report that lists credits and debits in each row on a number of customer accounts, and requires a column that gives a running total for that particular account holder after the transaction. How would you tackle that?

It is always nice to hear the words “correlated subquery” in the answer, but that’s not often the case. Maybe simple financial accounting skills are not taught any more. But even if the programmer is not familiar with subqueries, it is fascinating to see how he wrestles with the problem.

In the interview, I spell out the problem in more detail, and often pull out the report so the candidate can see what I’m talking about. Many times I’m told with complete assurance that it simply isn’t possible to do what I’ve requested in SQL, and I’m given an elaborate account of creating a Java module to do it. How’s that for insight into the candidate’s work?

There’s nothing too technical in my questions. A smart database developer will do as much as he can without specific details of a particular database system or version, as this general knowledge has a long shelf life. It’s more important to know where to get the information to do the job in the shortest possible time.

Also important is an appreciation of set theory, an understanding of SQL, and insight into non-procedural programming. You would weep to know how few candidates who claim to be professional database programmers even understand the questions, let alone come up with answers. If they can tackle one question, they can generally tackle them all. And even if one question is, by a quirk of past experience, unfamiliar to them, they will come up with wonderful solutions that illustrate their essential understanding.

Works well with others?

Before one runs away with the idea that these questions represent the royal road to the ideal database developer, remember that knowledge is nothing if the candidate can’t work with the team.

At one multinational company I worked for, the resume and erudition of a new Sybase programmer dazzled the IT director. He seemed wonderful, and at least technically, he was.

Soon after he had been shown his desk in our crowded, open-plan IT office, he began laughing out loud sporadically and inappropriately while reviewing code. It wasn’t my code, so I didn’t mind. But then his strange mannerisms began, complete with bizarre, jerky arm movements. His digestion seemed to be shot to pieces, and foul air issued noisily from him. Work in the department slowed to a crawl as every neck craned to see what he would do next next.

I tried to fit the man into the team. He confided in me that fools and idiots surrounded him. He mercilessly criticized the work of his colleagues. He definitely lacked personal skills.

Finally, after he had cleared an impressive space around his cubicle, his indignation at the incompetence of his co-workers proved too much for his self-restraint.

“Look at this code,” he spluttered. “It is outrageous! I shall go and speak to the IT director!”

“Hmm. Not a good idea.”

“He must be made aware!” he said firmly and with great zeal. So off he went. The IT director made it clear to me that the man had to go. Shortly thereafter, we reclaimed our space and settled back to our former mundane harmony.

There is something to be said for top-notch technical skills on the development team, but only when combined with an affable personality and appropriate social skills.

###

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 20 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

Programmer Helper

Search This Blog

Thursday, November 24, 2005

New Sober.Y variant is 2005 largest email worm outbreak

Wednesday, November 23, 2005

SQL Server Query Execution Plan Analysis

Thursday, November 10, 2005

Stored Procedures Vs Dynamic SQL : The Argument Continues

Bogus resumes and unblushing lies: navigating the database hiring waters

Links

Total Pageviews

Followers

My other articles

Pix Watch