Search This Blog

Monday, July 31, 2006

ASP Tips to Improve Performance and Style

Performance is a feature. You need to design for performance up front, or you get to rewrite your application later on. That said, what are some good strategies for maximizing the performance of your Active Server Pages (ASP) application?

This article presents tips for optimizing ASP applications and Visual Basic® Scripting Edition (VBScript). Many traps and pitfalls are discussed. The suggestions listed in this article have been tested on http://www.microsoft.com and other sites, and work very well. This article assumes that you have a basic understanding of ASP development, including VBScript and/or JScript, ASP Applications, ASP Sessions, and the other ASP intrinsic objects (Request, Response, and Server).

Often, ASP performance depends on much more than the ASP code itself. Rather than cover all wisdom in one article, we list performance-related resources at the end. These links cover both ASP and non-ASP topics, including ActiveX® Data Objects (ADO), Component Object Model (COM), databases, and Internet Information Server (IIS) configuration. These are some of our favorite links-be sure to give them a look.

  1. Cache Frequently-Used Data on the Web Server
  2. Cache Frequently-Used Data in the Application or Session Objects
  3. Cache Data and HTML on the Web Server's Disks
  4. Avoid Caching Non-Agile Components in the Application or Session Objects
  5. Do Not Cache Database Connections in the Application or Session Objects and Using the Session Object Wisely
  6. Encapsulate Code in COM Objects & Acquire Resources Late, Release Early
  7. Out-of-Process Execution Trades off Performance for Reliability
  8. Option Explicit, Local Variables and Script Variables
  9. Avoid Redimensioning Arrays
  10. Use Response Buffering
  11. Batch Inline Script and Response.Write Statements
  12. Check Connection, Using the OBJECT Tag, TypeLib Declarations
  13. Take Advantage of Your Browser's Validation Abilities & Enable Browser and Proxy Caching
  14. Avoid String Concatenation in Loops
  15. More on Fine Tuning

Len Cardinal
Senior Consultant, Microsoft Consulting Services
George V. Reilly
Microsoft IIS Performance Lead

Adapted from an article by Nancy Cluts
Developer Technology Engineer
Microsoft Corporation


Original Resource: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnasp/html/asptips.asp

ASP Tips: Tip 15: More on Fine Tuning

Use Server.Transfer Instead of Response.Redirect Whenever Possible

Response.Redirect tells the browser to request a different page. This function is often used to redirect the user to a log on or error page. Since a redirect forces a new page request, the result is that the browser has to make two round trips to the Web server, and the Web server has to handle an extra request. IIS 5.0 introduces a new function, Server.Transfer, which transfers execution to a different ASP page on the same server. This avoids the extra browser-to-Web-server round trip, resulting in better overall system performance, as well as better response time for the user. Check out New Directions in Redirection, which talks about Server.Transfer and Server.Execute.

Also see Leveraging ASP in IIS 5.0 for a full list of the new features in IIS 5.0 and ASP 3.0.

Use Trailing Slashes in Directory URLs

A related tip is to make sure to use a trailing slash (/) in URLs that point to directories. If you omit the trailing slash, the browser will make a request to the server, only to be told that it's asking for a directory. The browser will then make a second request with the slash appended to the URL, and only then will the server respond with the default document for that directory, or a directory listing if there is no default document and directory browsing has been enabled. Appending the slash cuts out the first, futile round trip. For user-friendliness, you may want to omit the trailing slash in display names.

For example, write:

<a href="http://msdn.microsoft.com/workshop/"
title="MSDN Web Workshop">http://msdn.microsoft.com/workshop</a>

This also applies to URLs pointing to the home page on a Web site: Use the following:
<a href="http://msdn.microsoft.com/">, not <a href="http://msdn.microsoft.com">.

Avoid Using Server Variables

Accessing server variables causes your Web site to make a special request to the server and collect all server variables, not just the one that you requested. This is akin to needing to retrieve a specific item in a folder that you have in that musty attic of yours. When you want that one item, you have to go to the attic to get the folder first, before you can access the item. This is the same thing that happens when you request a server variable—the performance hit occurs the first time you request a server variable. Subsequent requests for other server variables do not cause performance hits.

Never access the Request object unqualified (for example, Request("Data")). For items not in Request.Cookies, Request.Form, Request.QueryString, or Request.ClientCertificate, there is an implicit call to Request.ServerVariables. The Request.ServerVariables collection is much slower than the other collections.

Upgrade to the Latest and Greatest

System components are constantly updated and we recommend that you upgrade to the latest and greatest. Best of all would be to upgrade to Windows 2000 (and hence, IIS 5.0, ADO 2.5, MSXML 2.5, Internet Explorer 5.0, VBScript 5.1, and JScript 5.1). IIS 5.0 and ADO 2.5 implement spectacular performance gains on multiprocessor machines. Under Windows 2000, ASP scales nicely to four processors or more, whereas under IIS 4.0, ASP didn't scale well beyond two processors. The more script code and ADO usage in your application, the more performance benefits you'll see after upgrading to Windows 2000.

If you can't upgrade to Windows 2000 just yet, you can upgrade to the latest releases of SQL Server, ADO, VBScript and JScript, MSXML, Internet Explorer, and Windows NT 4 Service Packs. All of them offer improved performance as well as increased reliability.

Tune Your Web Server

There are several IIS tuning parameters that can improve site performance. For example, with IIS 4.0, we've often found that increasing the ASP ProcessorThreadMax parameter (see IIS documentation) can have significant benefits, especially on sites that tend to wait on back-end resources such as databases or other middle-ware products such as screen-scrapers. In IIS 5.0, you may find that turning on ASP Thread Gating is more effective than trying to find an optimal setting for AspProcessorThreadMax, as it is now known.

For good references, see Tuning IIS below.

The optimal configuration settings are going to be determined by (among other factors) your application code, the hardware it runs on, and the client workload. The only way to discover the optimal settings is to run performance tests, which brings us to the next tip.

Do Performance Testing

As we said before, performance is a feature. If you are trying to improve performance on a site, set a performance goal, then make incremental improvements until you reach your goal. Don't save all performance testing for the end of the project. Often, at the end of a project, it's too late to make necessary architectural changes, and you disappoint your customer. Make performance testing a part of your daily testing. Performance testing can be done against individual components, such as ASP pages or COM objects, or on the site as a whole.

Many people test the performance of their Web sites by using a single browser to request pages. This will give you a good feel for the responsiveness of the site, but it will tell you nothing about how well the site performs under load.

Generally, to accurately measure performance, you need a dedicated testing environment. This environment should include hardware that somewhat resembles production hardware in terms of processor speed, number of processors, memory, disk, network configuration, and so on. Next, you need to define your client workload: how many simultaneous users, the frequency of requests they will be making, the types of pages they'll be hitting, and so forth. If you don't have access to realistic usage data from your site, you'll need to guesstimate. Finally, you need a tool that can simulate your anticipated client workloads. Armed with these tools, you can start to answer questions such as "How many servers will I need if I have N simultaneous users?" You can also sniff out bottlenecks and target these for optimization.

Some good Web stress-testing tools are listed below. We highly recommend the Microsoft Web Application Stress (WAS) Toolkit. WAS allows you to record test scripts and then simulate hundreds or thousands of users hitting your Web servers. WAS reports numerous statistics, including requests per second, response time distributions, and error counts. WAS has both a rich-client and a Web-based interface; the Web interface allows you to run tests remotely.

ASP Tips: Tip 14: Avoid String Concatenation in Loops

Many people build a string in a loop like this:

s = "<table>" & vbCrLf
For Each fld in rs.Fields
s = s & " <th>" & fld.Name & "</th> "
Next

While Not rs.EOF
s = s & vbCrLf & " <tr>"
For Each fld in rs.Fields
s = s & " <td>" & fld.Value & "</td> "
Next
s = s & " </tr>"
rs.MoveNext
Wend

s = s & vbCrLf & "</table>" & vbCrLf
Response.Write s

There are several problems with this approach. The first is that repeatedly concatenating a string takes quadratic time; less formally, the time that it takes to run this loop is proportional to the square of the number of records times the number of fields. A simpler example should make this clearer.

s = ""
For i = Asc("A") to Asc("Z")
s = s & Chr(i)
Next

On the first iteration, you get a one-character string, "A". On the second iteration, VBScript has to reallocate the string and copy two characters ("AB") into s. On the third iteration, it has to reallocate s again and copy three characters into s. On the Nth (26th) iteration, it has to reallocate and copy N characters into s. That's a total of 1+2+3+...+N which is N*(N+1)/2 copies.

In the recordset example above, if there were 100 records and 5 fields, the inner loop would be executed 100*5 = 500 times and the time taken to do all the copying and reallocation would be proportional to 500*500 = 250,000. That's a lot of copying for a modest-sized recordset.

In this example, the code could be improved by replacing the string concatenation with Response.Write() or inline script (<% = fld.Value %>). If response buffering is turned on (as it should be), this will be fast, as Response.Write just appends the data to the end of the response buffer. No reallocation is involved and it's very efficient.

In the particular case of transforming an ADO recordset into an HTML table, consider using GetRows or GetString.

If you concatenate strings in JScript, it is highly recommended that you use the += operator; that is, use s += "some string", not s = s + "some string".

ASP Tips: Tip 13: Use Your Browser

Take Advantage of Your Browser's Validation Abilities

Modern browsers have advanced support for features such as XML, DHTML, Java applets, and the Remote Data Service. Take advantage of these features whenever you can. All of these technologies can save round trips to the Web server by performing client-side validation as well as data caching. If you are running a smart browser, the browser is capable of doing some validation for you (for example, checking that a credit card has a valid checksum before executing POST). Again, take advantage of this whenever you can. By cutting down on client-server round trips, you'll reduce the stress on the Web server and cut down network traffic (though the initial page sent to the browser is likely to be larger), as well as any back-end resources that the server accesses. Furthermore, the user will not have to fetch new pages as often, improving the experience. This does not relieve you of the need to do server-side validation—you should always do server-side validation as well. This protects against bad data coming from the client for some reason, such as hacking, or browsers that don't run your client-side validation routines.

Much has been made of creating "browser-independent" HTML. This concern often discourages the developer from taking advantage of popular browser features that could benefit performance. For truly high-performance sites that must be concerned about browser "reach," a good strategy is to optimize pages for the popular browsers. Browser features can be easily detected in ASP using the Browser Capabilities Component. Tools such as Microsoft FrontPage can help you design code that works with the browsers and HTML versions you wish to target.

Enable Browser and Proxy Caching

By default, ASP disables caching in browsers and proxies. This makes sense since by nature an ASP page is dynamic with potentially time-sensitive information. If you have a page that doesn't require a refresh on every view, you should enable browser and proxy caching. This allows browsers and proxies to use a "cached" copy of a page for a certain length of time, which you can control. Caching can greatly alleviate load on your server and improve the user experience.

What kind of dynamic pages might be candidates for caching? Some examples are:

  • A weather page, where the weather is only updated every 5 minutes.
  • A home page listing news items or press releases, which are updated twice a day.
  • A mutual fund performance listing, where underlying statistics are only updated every few hours.

Note that with browser or proxy caching, you'll get less hits recorded on your Web server. If you are trying to accurately measure all page views, or post advertising, you may not be happy with browser and proxy caching.

Browser caching is controlled by the HTTP "Expires" header, which is sent by a Web server to a browser. ASP provides two simple mechanisms to send this header. To set the page to expire at a certain number of minutes in the future, set the Response.Expires property. The following example tells the browser that the content expires in 10 minutes:

<% Response.Expires = 10 %>

Setting Response.Expires to a negative number or 0 disables caching. Be sure to use a large negative number, such as -1000 (a little more than a day), to work around mismatches between the clocks on the server and the browsers. A second property, Response.ExpiresAbsolute, allows you set the specific time at which the content will expire:

<% Response.ExpiresAbsolute = #May 31,2001 13:30:15# %>

Rather than using the Response object to set expiration, you can write a <meta> tag into the HTML, usually within the <HEAD> section of the HTML file. Some browsers will respect this directive, although proxies will not.

<meta equiv="Expires" value="May 31,2001 13:30:15">

Finally, you can indicate whether the content is valid for an HTTP proxy to cache, using the Response.CacheControl property. Setting this property to "Public" enables proxies to cache the content.

<% Response.CacheControl = "Public" %>

By default, this property is set to "Private." Note that you should not enable proxy caching for pages that show data specific to a user, as the proxy may serve pages to users that belong to other users.

ASP Tips: Tip 12: Check Connection, Using the OBJECT Tag, TypeLib Declarations

Use Response.IsClientConnected Before Embarking on Long Trips

If the user gets impatient, he or she may abandon your ASP page before you even start executing their request. If he clicks Refresh or moves to a different page on your server, you will have a new request sitting at the end of the ASP request queue and a disconnected request sitting in the middle of the queue. Often this happens when your server is under high load (so it has a long request queue, with correspondingly high response times) and this only makes the situation worse. There's no point executing an ASP page (especially a slow, heavyweight ASP page) if the user is no longer connected. You can check for this condition by using the Response.IsClientConnected property. If it returns False, you should call Response.End and abandon the rest of the page. In fact, IIS 5.0 codifies this practice—whenever ASP is about to execute a new request, it checks to see how long the request has been in the queue. If it has been there for more than 3 seconds, ASP will check to see if the client is still connected and immediately terminate the request if it's not. You can use the AspQueueConnectionTestTime setting in the metabase to adjust this timeout of 3 seconds.

If you have a page that takes a very long time to execute, you may also want to check Response.IsClientConnected at intervals. When response buffering is enabled, it is a good idea to do Response.Flush at intervals to give the user the impression that something is happening.

Note On IIS 4.0, Response.IsClientConnected will not work correctly unless you first do a Response.Write. If buffering is enabled, you'll also need to do a Response.Flush. On IIS 5.0, there is no need for this—Response.IsClientConnected works fine. In any case, Response.IsClientConnected has some costs, so only use it before an operation that takes at least, say 500 milliseconds (that's a long time if you're trying to sustain a throughput of dozens of pages per second). As a general rule of thumb, don't call it in every iteration of a tight loop, such as when painting the rows of a table—perhaps every 20th or 50th row of the table, instead.

Instantiate Objects Using the <object> Tag

If you need to refer to objects that might not be used in all code paths (especially Server- or Application-scoped objects), declare them by using the <object runat="server" id="objname"> tag in Global.asa rather than using the Server.CreateObject method. Server.CreateObject creates the object immediately. If you don't use that object later, you end up wasting resources. The <object id="objname"> tag declares objname, but objname isn't actually created until the first time that one of its methods or properties are used.

This is another example of lazy evaluation.

Use TypeLib Declarations for ADO and Other Components

When using ADO, developers often include adovbs.txt to get access to ADO's various constants. This file must be included on every page that wants to use the constants. This constant file is fairly large, adding a lot of overhead to every ASP page's compilation time and script size.

IIS 5.0 introduces the ability to bind to a component's type library. This allows you to reference the type library once and use it on every ASP page. Each page does not pay the penalty of compiling the constant file, and component developers do not have to build VBScript #include files for use in ASP.

To access the ADO TypeLib, place one of the following statements in Global.asa.

<!-- METADATA NAME="Microsoft ActiveX Data Objects 2.5 Library"
TYPE="TypeLib" UUID="{00000205-0000-0010-8000-00AA006D2EA4}" -->

or

<!-- METADATA TYPE="TypeLib"
FILE="C:\Program Files\Common Files\system\ado\msado15.dll" -->

ASP Tips: Tip 11: Batch Inline Script and Response.Write Statements

The VBScript syntax <% = expression %> writes the value of "expression" to the ASP output stream. If response buffering is not turned on, then each of these statements results in writing data to the browser over the network in many small packets. This is slow. Also, interspersing small amounts of script and HTML causes switching between the script engine and HTML, reducing performance. Thus, use the following tip: Replace closely-bunched inline expressions with one call to Response.Write. For example, in the following sample, there is one write to the response stream per field per row, and many switches between VBScript and HTML per row:


<table>
<% For Each fld in rs.Fields %>
<th><% = fld.Name %></th>
<%
Next
While Not rs.EOF
%>
<tr>
<% For Each fld in rs.Fields %>
<td><% = fld.Value %></td>
<% Next
</tr>
<% rs.MoveNext
Wend %>
</table>

The more efficient code, below, has one write to the response stream per row. All of the code is contained within one VBScript block:


<table>
<%
For each fld in rs.Fields
Response.Write ("<th>" & fld.Name & "</th>" & vbCrLf)
Next
While Not rs.EOF
Response.Write ("<tr>")
For Each fld in rs.Fields %>
Response.Write("<td>" & fld.Value & "</td>" & vbCrLf)
Next
Response.Write "</tr>"
Wend
%>
</table>

This tip has a much bigger effect when response buffering is disabled. It's best to enable response buffering, and then see if batching Response.Write helps performance.

(In this particular example, the nested loop that builds the body of the table (While Not rs.EOF...) can be replaced by a carefully constructed call to GetString.)

ASP Tips: Tip 10: Use Response Buffering

You can buffer a whole page worth of output by turning on "response buffering." This minimizes the amount of writes to the browser and thus improves overall performance. Each write has a lot of overhead (both in IIS and in the amount of data sent down the wire), so the fewer the writes there are, the better. TCP/IP works much more efficiently when it can send a few large blocks of data than when it has to send many small blocks because of its slow start and Nagling algorithms (used to minimize network congestion).

There are two ways of turning response buffering on. First, you can turn on response buffering for an entire application, using the Internet Services Manager. This is the recommended approach and response buffering is turned on by default for new ASP applications in IIS 4.0 and IIS 5.0. Second, on a page-by-page basis, you can enable response buffering by placing the following line of code near the top of the ASP page:

<% Response.Buffer = True %>

This line of code must be executed before any response data has been written to the browser (that is, before any HTML appears in the ASP script and before any Cookies have been set using the Response.Cookies collection). In general, it is best to turn response buffering on for an entire Application. This allows you to avoid the above line of code on every page.

Response.Flush

One common complaint about response buffering is that users perceive ASP pages as being less responsive (even though the overall response time is improved) because they have to wait for the entire page to be generated before they start to see anything. For long-running pages, you can turn response buffering off by setting Response.Buffer = False. However, a better strategy is to utilize the Response.Flush method. This method flushes all HTML that has been painted by ASP to the browser. For example, after painting 100 rows of a 1,000-row table, ASP can call Response.Flush to force the painted results to the browser; this allows the user to see the first 100 rows before the remaining rows are ready. This technique can give you the best of both worlds-response buffering combined with the gradual presentation of data to the browser.

(Note that in the above example of a 1,000-row table, many browsers won't start painting the table until they see the closing </table> tag. Check your targeted browsers for support. To get around this, try breaking the table into multiple tables with less rows, and call Response.Flush after each table. Newer versions of Internet Explorer will paint tables before they are fully downloaded, and will paint especially fast if you specify the table's column widths; this avoids forcing Internet Explorer to calculate the column widths by measuring the width of the contents of every cell.)

The other common complaint about response buffering is that it can use a lot of server memory when generating very large pages. Leaving aside the wisdom of generating large pages, this problem can also be addressed with judicious use of Response.Flush.

ASP Tips: Tip 9: Avoid Redimensioning Arrays

Try to avoid Redim arrays. As far as performance is concerned, if you have a machine that is constrained by physical memory size, it's much better to set the initial dimension of the array to its worst-case scenario—or to set the dimension to its optimal case and redim as necessary. This does not mean that you should just go out and allocate a couple of megabytes of memory if you know you aren't going to need it.

The code below shows you gratuitous use of Dim and Redim.

<%
Dim MyArray()
Redim MyArray(2)
MyArray(0) = "hello"
MyArray(1) = "good-bye"
MyArray(2) = "farewell"
...
' some other code where you end up needing more space happens, then ...
Redim Preserve MyArray(5)
MyArray(3) = "more stuff"
MyArray(4) = "even more stuff"
MyArray(5) = "yet more stuff"
%>

It is far better to simply Dim the array to the correct size initially (in this case, that's 5), than Redim the array to make it larger. You may waste a little memory (if you don't end up using all of the elements), but the gain will be speed.

ASP Tips: Tip 8: Option Explicit, Local Variables and Script Variables

Use Option Explicit

Use Option Explicit in your .asp files. This directive placed at the top of the .asp file forces the developer to declare all variables that will be used. Many programmers consider this helpful in debugging applications, as it avoids the chance of mistyping a variable name and inadvertently creating new variables (for example, MyXLMString=... instead of MyXMLString=).

Perhaps more important, it turns out that declared variables are faster than undeclared variables. Under the covers, the scripting run time references undeclared variables by name, every time they are used. Declared variables, on the other hand, are assigned an ordinal, either at compile time or run time. Subsequently, declared variables are referenced by this ordinal. Since Option Explicit forces variable declaration, it insures that all variables are declared and thus will be accessed quickly.

Use Local Variables in Subroutines and Functions

Local variables are those declared within subroutines and functions. Within a function or subroutine, local variable access is faster than global variable access. Use of local variables also tends to make code cleaner, so use them when you can.

Copy Frequently-Used Data to Script Variables

When accessing COM objects in ASP, you should copy frequently-used object data to script variables. This will cut down on COM method calls, which are relatively expensive compared to accessing script variables. When accessing Collection and Dictionary objects, this technique also cuts down on expensive lookups.

In general, if you are going to access object data more than once, put that data into a script variable. Prime targets for this optimization are Request variables (Form and QueryString variables). For example, your site may pass around a QueryString variable called UserID. Suppose this UserID is referenced a dozen times on a particular page. Instead of calling Request("UserID") a dozen times, assign the UserID to a variable at the top of the ASP page. Then use that variable throughout the page. This will save 11 COM method calls.

In practice, accessing COM properties or methods can be deceptively expensive. Here is an example, showing some fairly common code (syntactically speaking):

Foo.bar.blah.baz = Foo.bar.blah.qaz(1)
If Foo.bar.blah.zaq = Foo.bar.blah.abc Then ' ...

When this code runs, here's what happens:

  1. The variable Foo is resolved as a global object.
  2. The variable bar is resolved as a member of Foo. This turns out to be a COM method call.
  3. The variable blah is resolved as a member of Foo.bar. This, too, turns out to be a COM method call.
  4. The variable qaz is resolved as a member of foo.bar.blah. Yes, this turns out to be a COM method call.
  5. Invoke Foo.bar.blah.quaz(1). One more COM method call. Get the picture?
  6. Do steps 1 through 3 again to resolve baz. The system does not know if the call to qaz changed the object model, so steps 1 through 3 must be done again to resolve baz.
  7. Resolve baz as a member of Foo.bar.blah. Do the property put.
  8. Do steps 1 through 3 again and resolve zaq.
  9. Do steps 1 through 3 yet another time and resolve abc.

As you can see, this is terribly inefficient (and slow). The fast way to write this code in VBScript is:

Set myobj = Foo.bar.blah ' do the resolution of blah ONCE
Myobj.baz = myobj.qaz(1)
If Myobj.zaq = Myobj.abc Then '...

If you're using VBScript 5.0 or later, you can write this using the With statement:

With Foo.bar.blah
.baz = .qaz(1)
If .zaq = .abc Then '...
...
End With

Note that this tip also works with VB programming.

ASP Tips: Tip 7: Out-of-Process Execution Trades off Performance for Reliability

Both ASP and MTS/COM+ have configuration options that allow you to trade off reliability for performance. You should understand these trade-offs when building and deploying your application.

ASP Options

ASP Applications can be configured to run in one of three ways. With IIS 5.0, the terminology "isolation level" has been introduced to describe these options. The three isolation level values are Low, Medium, and High:

  • Low Isolation. This is supported in all versions of IIS and is the fastest. It runs ASP in Inetinfo.exe, which is the primary IIS process. If the ASP application crashes, so does IIS. (To restart IIS under IIS 4.0, Webmasters would monitor the site using tools such as InetMon, and fire off batch files to restart the server if it failed. IIS 5.0 introduces reliable restart, which automatically restarts a failed server.)
  • Medium Isolation. IIS 5.0 introduces this new level, which is referred to as out-of-process, since ASP runs outside of the IIS process. In Medium isolation, all ASP applications configured to run as Medium share a single process space. This reduces the number of processes required to run multiple out-of-process ASP applications on one box. Medium is the default isolation level in IIS 5.0.
  • High Isolation. Supported in IIS 4.0 and IIS 5.0, High isolation is also out-of-process. If ASP crashes, the Web server doesn't. The ASP application is automatically restarted on the next ASP request. With High isolation, each ASP application that is configured to run as High runs in its own process space. This protects ASP applications from each other. Its drawback is that it requires a separate process for each ASP application. This can add up to a lot of overhead when dozens of applications need to be hosted on one box.

Which option is the best? In IIS 4.0, there was a fairly steep performance penalty for running out-of-process. In IIS 5.0, a lot of work was done to minimize the cost of running ASP applications out-of-process. In fact, in most tests, ASP out-of-process applications in IIS 5.0 run faster than in-process applications in IIS 4.0. Regardless, in-process (Low isolation level) still produces the best performance on both platforms. However, you won't see much benefit to the Low isolation level if you have a relatively low hit rate or low maximum throughput. Therefore, you should not feel the need to reach for the Low isolation level until you need hundreds or thousands of pages per second per Web server. As always, test with multiple configurations and determine which trade-offs you are willing to make.

Note When you run ASP applications out-of-process (Medium or High isolation), they run in MTS on NT4 and COM+ on Windows 2000. That is, on NT4 they run in Mtx.exe, and on Windows 2000, they run in DllHost.exe. You can see these processes running in Task Manager. You can also see how IIS configures MTS Packages or COM+ Applications for out-of-process ASP applications.

COM Options

COM components also have three configuration options, though not completely analogous to the ASP options. COM components can be:"unconfigured," configured as Library Applications, or configured as Server Applications. Unconfigured means that the component is not registered with COM+. The component will run in the caller's process space, that is, they are "in-process." Library Applications are also in-process, but benefit from COM+'s services, including security, transactions, and context support. Server Applications are configured to run in their own process space.

You may see a slight benefit of unconfigured components over Library Applications. You're likely to see a large performance benefit of Library Applications over Server Applications. This is because Library Applications run in the same process as ASP, whereas Server Applications run in their own process. Inter-process calls are more expensive than in-process calls. Also, when passing data such as recordsets between processes, all of the data must be copied between the two processes.

Pitfall! When using COM Server Applications, if you pass objects between ASP and COM, make sure that the objects implement "marshal-by-value," or MBV. Objects that implement MBV copy themselves from one process to another. This is better than the alternative, in which the object remains in the creator's process, and the other process calls repeatedly into the creating process to use the object. Disconnected ADO recordsets will marshal-by-value; connected recordsets won't. The Scripting.Dictionary does not implement MBV and should not be passed between processes. Finally, a message to VB programmers out there: MBV is NOT achieved by passing a parameter ByVal. MBV is implemented by the original component author.

What to Do?

If we were to recommend configurations with reasonable trade-offs of performance versus reliability, they would be as follows:

  • On IIS 4.0, use ASP's Low Isolation level, and use MTS Server Packages.
  • On IIS 5.0, use ASP's Medium isolation level, and use COM+ Library Applications.

These are very general guidelines; hosting companies generally run ASP at Medium or High Isolation level, whereas single-purpose Web servers can be run at Low isolation. Measure the trade-offs and decide for yourself which configuration meets your needs.

ASP Tips: Tip 6: Encapsulate Code and Working on Resources

Encapsulate Code in COM Objects

If you have a lot of VBScript or JScript, you can often improve performance by moving the code to a compiled COM object. Compiled code typically runs faster than interpreted code. Compiled COM objects can access other COM objects through "early binding," a more efficient means of invoking COM object methods than the "late binding" employed by script.

There are advantages (beyond performance) to encapsulating code in COM objects:

  • COM objects are good for separating presentation logic from business logic.
  • COM objects enable code reuse.
  • Many developers find code written in VB, C++, or Visual J++ easier to debug than ASP.

COM objects have disadvantages, including initial development time and the need for different programming skills. Be warned that encapsulating small amounts of ASP may cause performance penalties, rather than gains. Typically, this happens when a small amount of ASP code is wrapped into a COM object. In this case, the overhead of creating and invoking the COM object outweighs the benefit of the compiled code. It is a matter of trial and error to determine what combination of ASP script and COM object code produces the best performance. Note that Microsoft has vastly improved script and ADO performance in Windows 2000/IIS 5.0 over Windows NT® 4.0/IIS 4.0. Thus, the performance advantage enjoyed by compiled code over ASP code has decreased with the introduction of IIS 5.0.

For great discussions about the benefits and pitfalls of using COM objects in ASP, see ASP Component Guidelines and Programming Distributed Applications with COM and Microsoft Visual Basic 6.0. If you do deploy COM components, it is particularly important that you stress test them. In fact, all ASP applications should be stress tested as a matter of course.

Acquire Resources Late, Release Early

Here's a short tip for you. In general, it's best to acquire resources late and release them early. This goes for COM objects as well as file handles and other resources.

ADO Connections and recordsets are the prime candidates for this optimization. When you are done using a recordset, say after painting a table with its data, release it immediately, rather than waiting until the end of the page. Setting your VBScript variable to Nothing is a best practice. Don't let the recordset simply fall out of scope. Also, release any related Command or Connection objects. (Don't forget to call Close() on recordsets or Connections before setting them = Nothing.) This shortens the time span in which the database must juggle resources for you, and releases the database connection to the connection pool as quickly as possible.

ASP Tips: Tip 5: Do Not Cache Database Connections in the Application or Session Objects and Using the Session Object Wisely

Caching ADO Connections is usually a bad strategy. If one Connection object is stored in the Application object and used on all pages, then all pages will contend for use of this connection. If the Connection object is stored in the ASP Session object, then a database connection will be created for every user. This defeats the benefits of connection pooling and puts unnecessarily high stress on both the Web server and the database.

Instead of caching database connections, create and destroy ADO objects on every ASP page that uses ADO. This is efficient because IIS has database connection pooling built in. More accurately, IIS automatically enables OLEDB and ODBC connection pooling. This ensures that creating and destroying connections on each page will be efficient.

Since connected recordsets store a reference to a database connection, it follows that you should not cache connected recordsets in the Application or Session objects. However, you can safely cache disconnected recordsets, which don't hold a reference to their data connection. To disconnect a recordset, take the following two steps:

    Set rs = Server.CreateObject("ADODB.RecordSet")
rs.CursorLocation = adUseClient ' step 1

' Populate the recordset with data
rs.Open strQuery, strProv

' Now disconnect the recordset from the data provider and data source
rs.ActiveConnection = Nothing ' step 2

More information about connection pooling can be found in the ADO and SQL Server references.


Now that we've espoused the virtues of caching in Applications and Sessions, we're going to suggest avoiding the Session object. Sessions have several pitfalls when used with busy sites, as we'll discuss. By busy, we generally mean sites requiring hundreds of pages a second or thousands of simultaneous users. This tip is even more important for sites that must scale horizontally—that is, those sites that utilize multiple servers to accommodate load or implement fault tolerance. For smaller sites, such as intranet sites, the conveniences of Sessions are worth the overhead.

To recap, ASP automatically creates a Session for every user that hits a Web server. Each Session has about 10 KB of memory overhead (on top of whatever data is stored in the Session), and slows all requests down a bit. The Session stays alive until a configurable timeout period, usually 20 minutes.

The biggest issue with Sessions is not performance but scalability. Sessions don't span Web servers; once a Session is created on one server, its data stays there. This means that if you use Sessions in a Web farm, you have to devise a strategy for each user's requests to always be directed to the server on which the user's Session exists. This is referred to as "sticking" a user to a Web server. The term "sticky sessions" derives from this. "Stuck" users will lose their Session state if the Web server crashes, since Sessions are not persisted to disk.

Strategies for implementing sticky sessions include hardware and software solutions. Solutions such as Network Load Balancing in Windows 2000 Advanced Server and Cisco's Local Director can implement sticky sessions, at the cost of some scalability. These solutions are not perfect. Rolling your own software solution at this point of time is not recommended (we used to use ISAPI filters and URL mangling and such).

The Application object doesn't span servers either; if you need to share and update Application data across the Web farm, you'll need to use a back-end database. Read-only Application data is still useful in Web farms, however.

Most mission-critical sites will want to deploy at least two Web servers, if for no other reason than increasing uptime (handling failover and server maintenance). Therefore, in designing your mission-critical application, you'll need to either implement "sticky sessions," or simply avoid Sessions, as well as any other state-management technique that stores user state on individual Web servers.

If you are not using Sessions, be sure to turn them off. You can do this for your application through the Internet Services Manager (see the ISM documentation). If you decide to use Sessions, you can minimize their performance impact in several ways.

You can move content that doesn't require Sessions (such as Help screens, visitor areas, and so forth.) into a separate ASP application that has Sessions turned off. On a page-by-page basis, you can provide a hint to ASP that you won't need the Session object on a given page; use the following directive placed at the top your ASP page:

<% @EnableSessionState=False %>

One good reason to use this directive is that Sessions create an interesting problem with framesets. ASP guarantees that only one request from a Session will be executing at any time. This insures that if a browser requests multiple pages for one user, only one ASP request will touch the Session at a time; this avoids multithreading problems when accessing the Session object. Unfortunately, as a result, all pages in a frameset will be painted in a serialized manner, one after another, rather than simultaneously. The user may have to wait a long time for all of the frames. The moral of the story: if certain frameset pages have no reliance on the Session, be sure to tell ASP, using the @EnableSessionState=False directive.

As an alternative to using the Session object, there are many options for managing Session state. For small amounts of state (less than 4 KB), we usually recommend using Cookies, QueryString variables, and hidden-form variables. For larger amounts of data such as shopping carts, a back-end database is the most appropriate choice. A lot has been written about state-management techniques in Web server farms. See the Session state references for more details.

ASP Tips: Tip 4: Avoid Caching Non-Agile Components in the Application or Session Objects

While caching data in the Application or Session object can be a good idea, caching COM objects can have serious pitfalls. It is often tempting to stuff frequently-used COM objects into the Application or Session objects. Unfortunately, many COM objects, including all of those written in Visual Basic 6.0 or earlier, can cause serious bottlenecks when stored in the Application or Session objects.

Specifically, any component that is not agile will cause performance bottlenecks when cached in the Session or Application objects. An agile component is a component marked ThreadingModel=Both that aggregates the Free-threaded marshaler (FTM), or is a component that is marked ThreadingModel=Neutral. (The Neutral model is new to Windows® 2000 and COM+.) The following components are not agile:

  • Free-threaded components (unless they aggregate the FTM).
  • Apartment-threaded components.
  • Single-threaded component.

Configured components (Microsoft Transaction Server (MTS)/COM+ library and server packages/applications) are not agile unless they are Neutral-threaded. Apartment-threaded components and other non-agile components work best at page scope (that is, they are created and destroyed on a single ASP page).

In IIS 4.0, a component marked ThreadingModel=Both was considered agile. In IIS 5.0, that is no longer sufficient. The component must not only be marked Both, it must also aggregate the FTM. Agility in Server Components describes how to make C++ components written with the Active Template Library aggregate the FTM. Be aware that if your component caches interface pointers, those pointers must themselves be agile, or must be stored in the COM Global Interface Table (GIT). If you can't recompile a Both-threaded component to aggregate the FTM, you can mark the component as ThreadingModel=Neutral. Alternatively, if you don't want IIS performing the agility check (thus, you want to allow non-agile components to be stored at Application or Session scope), you can set AspTrackThreadingModel to True in the metabase. Changing AspTrackThreadingModel is not recommended.

IIS 5.0 will throw an error if you attempt to store a non-agile component created with Server.CreateObject in the Application object. You can work around this by using <object runat="server" scope="application"> in Global.asa, but this is not recommended, as it leads to marshaling and serialization, as explained below.

What goes wrong if you cache non-agile components? A non-agile component cached in the Session object will "lock down" the Session to an ASP worker thread. ASP maintains a pool of worker threads that services requests. Normally, a new request is handled by the first-available worker thread. If a Session is locked down to a thread, then the request has to wait for its associated thread to become available. Here's an analogy that might help: you go to a supermarket, select some groceries, and pay for them at checkout stand #3. Thereafter, whenever you pay for groceries at that supermarket, you always have to pay for them at stand #3, even though other checkout stands might have shorter or even empty lines.

Storing non-agile components at Application scope has an even worse effect on performance. ASP has to create a special thread to run non-agile, Application-scoped components. This has two consequences: all calls have to be marshaled to this thread and all calls are serialized. Marshaling means that the parameters have to be stored in a shared area of memory; an expensive context switch is made to the special thread; the component's method is executed; the results are marshaled into a shared area; and another expensive context switch reverts control to the original thread. Serialization means that all methods are run one at a time. It is not possible for two different ASP worker threads to be executing methods on the shared component simultaneously. This kills concurrency, especially on multiprocessor machines. Worse still, all non-agile Application-scoped components share one thread (the "Host STA"), so the effects of serialization are even more marked.

Confused? Here are some general rules. If you are writing objects in Visual Basic (6.0) or earlier, do not cache them in the Application or Session objects. If you don't know an object's threading model, don't cache it. Instead of caching non-agile objects, you should create and release them on each page. The objects will run directly on the ASP worker thread, so there will be no marshaling or serialization. Performance will be adequate if the COM objects are running on the IIS box, and if they don't take a long time to initialize and destroy. Note that Single-threaded objects should not be used this way. Be careful—VB can create Single-threaded objects! If you have to use Single-threaded objects this way (such as a Microsoft Excel spreadsheet) don't expect high throughput.

ADO recordsets can be safely cached when ADO is marked as Free-threaded. To mark ADO as Free-threaded, use the Makfre15.bat file, which is typically located in the directory \\Program Files\Common\System\ADO.

Warning: ADO should not be marked as Free-threaded if you are using Microsoft Access as your database. The ADO recordset must also be disconnected In general, if you cannot control the ADO configuration on your site (for example, you are an independent software vendor [ISV] who sells a Web application to customers who will manage their own configurations), you are probably better off not caching recordsets.

Dictionary components are also agile objects. The LookupTable loads its data from a data file and is useful for combo-box data as well as configuration information. The PageCache object from Duwamish Books provides dictionary semantics, as does the Caprock Dictionary. These objects, or derivatives thereof, can form the basis of an effective caching strategy. Note that the Scripting.Dictionary object is NOT agile, and should not be stored at Application or Session scope.

ASP Tips: Tip 3: Cache Data and HTML on the Web Server's Disks

Sometimes, you may have too much data to cache in memory. "Too much" is a judgment call; it depends on how much memory you want to consume, as well as the number of items to cache and the frequency of which these items will be retrieved. In any case, if you have too much data for in-memory caching, consider caching data in text or XML files on the Web servers' hard disks. You can combine caching data on disks and in memory to build the optimum caching strategy for your site.

Note that when measuring the performance of a single ASP page, retrieving data on disk may not always be faster than retrieving the data from a database. But caching reduces load on the database and on the network. Under high loads, this will greatly improve overall throughput. Caching can be very effective when caching the results of an expensive query, such as a multitable join or a complex stored procedure, or caching large result sets. As always, test competing schemes.

ASP and COM provide several tools for building disk-based caching schemes. The ADO recordset Save() and Open() functions save and load recordsets from disk. You could use these methods to rewrite the sample code from the Application data caching tip, above, substituting a Save() to file for the code that writes to the Application object.

There are a few other components that work with files:

  • Scripting.FileSystemObject allows you to create, read, and write files.
  • MSXML, the Microsoft® XML parser that comes with Internet Explorer, supports saving and loading XML documents.
  • The LookupTable object (sample, used on MSN) is a great choice for loading simple lists from disk.

Finally, consider caching the presentation of data on disk, rather than the data itself. Prerendered HTML can be stored as an .htm or .asp file on disk; hyperlinks can point to these files directly. You can automate the process of generating HTML using commercial tools such as XBuilder, or Microsoft® SQL Server™ Internet publishing features. Alternatively, you can #include HTML snippets into an .asp file. You can also read HTML files from disk using the FileSystemObject, or use XML for early rendering.

ASP Tips: Tip 2: Cache Frequently-Used Data in the Application or Session Objects

The ASP Application and Session objects provide convenient containers for caching data in memory. You can assign data to both Application and Session objects, and this data will remain in memory between HTTP calls. Session data is stored per user, while Application data is shared between all users.

At what point do you load data into the Application or Session? Often, the data is loaded when an Application or Session starts. To load data during Application or Session startup, add appropriate code to Application_OnStart() or Session_OnStart(), respectively. These functions should be located in Global.asa; if they are not, you can add these functions. You can also load the data when it's first needed. To do this, add some code (or write a reusable script function) to your ASP page that checks for the existence of the data and loads the data if it's not there. This is an example of the classic performance technique known as lazy evaluation-don't calculate something until you know you need it. An example:
<%
Function GetEmploymentStatusList
Dim d
d = Application("EmploymentStatusList")
If d = "" Then
' FetchEmploymentStatusList function (not shown)
' fetches data from DB, returns an Array
d = FetchEmploymentStatusList()
Application("EmploymentStatusList") = d
End If
GetEmploymentStatusList = d
End Function
%>
Similar functions could be written for each chunk of data needed.

In what format should the data be stored? Any variant type can be stored, since all script variables are variants. For instance, you can store strings, integers, or arrays. Often, you'll be storing the contents of an ADO recordset in one of these variable types. To get data out of an ADO recordset, you can manually copy the data into VBScript variables, one field at a time. It's faster and easier to use one of the ADO recordset persistence functions GetRows(), GetString() or Save() (ADO 2.5). Full details are beyond the scope of this article, but here's a function that demonstrates using GetRows() to
return an array of recordset data:

' Get Recordset, return as an Array
Function FetchEmploymentStatusList
Dim rs
Set rs = CreateObject("ADODB.Recordset")
rs.Open "select StatusName, StatusID from EmployeeStatus", _
"dsn=employees;uid=sa;pwd=;"
FetchEmploymentStatusList = rs.GetRows() " Return data as an Array
rs.Close
Set rs = Nothing
End Function
A further refinement of the above might be to cache the HTML for the list, rather than the array. Here's a simple sample:
' Get Recordset, return as HTML Option list
Function FetchEmploymentStatusList
Dim rs, fldName, s
Set rs = CreateObject("ADODB.Recordset")
rs.Open "select StatusName, StatusID from EmployeeStatus", _
"dsn=employees;uid=sa;pwd=;"
s = "<select name=""EmploymentStatus">" & vbCrLf
Set fldName = rs.Fields("StatusName") ' ADO Field Binding
Do Until rs.EOF
' Next line violates Don't Do String Concats,
' but it's OK because we are building a cache
s = s & " <option>" & fldName & "</option>" & vbCrLf
rs.MoveNext
Loop
s = s & "</select>" & vbCrLf
rs.Close
Set rs = Nothing ' See Release Early
FetchEmploymentStatusList = s ' Return data as a String
End Function
Under the right conditions, you can cache ADO recordsets themselves in Application or Session scope. There are two caveats:

If you cannot guarantee that these two requirements will be met, do not cache ADO recordsets. In the
Non-Agile Components and Don't Cache Connections tips below, we discuss the dangers of storing COM objects in Application or Session scope.

When you store data in Application or Session scope, the data will remain there until you programmatically change it, the Session expires, or the Web application is restarted. What if the data needs to be updated? To manually force an update of Application data, you can call into an administrator-access-only ASP page that updates the data. Alternatively, you can automatically refresh your data periodically through a function. The following example stores a time stamp with the cached data and refreshes the data after a certain time interval.
<%
' error handing not shown...
Const UPDATE_INTERVAL = 300 ' Refresh interval, in seconds
' Function to return the employment status list
Function GetEmploymentStatusList
UpdateEmploymentStatus
GetEmploymentStatusList = Application("EmploymentStatusList")
End Function

' Periodically update the cached data
Sub UpdateEmploymentStatusList
Dim d, strLastUpdate
strLastUpdate = Application("LastUpdate")
If (strLastUpdate = "") Or _
(UPDATE_INTERVAL < DateDiff("s", strLastUpdate, Now)) Then

' Note: two or more calls might get in here. This is okay and will simply
' result in a few unnecessary fetches (there is a workaround for this)

' FetchEmploymentStatusList function (not shown)
' fetches data from DB, returns an Array
d = FetchEmploymentStatusList()

' Update the Application object. Use Application.Lock()
' to ensure consistent data
Application.Lock
Application("EmploymentStatusList") = d
Application("LastUpdate") = CStr(Now)
Application.Unlock
End If
End Sub
For another example, see World's Fastest ListBox with Application Data.

Be aware that caching large arrays in Session or Application objects is not a good idea. Before you can access any element of the array, the semantics of the scripting languages require that a temporary copy of the entire array be made. For example, if you cache a 100,000-element array of strings that
maps U.S. zip codes to local weather stations in the Application object, ASP must first copy all 100,000 weather stations into a temporary array before it can extract just one string. In this case, it would be much better to build a custom component with a custom method to store the weather stations—or to use one of the dictionary components.

One more comment in the spirit of not throwing out the baby with the bath water: Arrays provide fast lookup and storage of key-data pairs that are contiguous in memory. Indexing a dictionary is slower than indexing an array. You should choose the data structure that offers the best performance for your situation.

Sunday, July 30, 2006

ASP Tips: Tip 1: Cache Frequently-Used Data on the Web Server

Performance is a feature. You need to design for performance up front, or you get to rewrite your application later on. That said, what are some good strategies for maximizing the performance of your Active Server Pages (ASP) application?

This article presents tips for optimizing ASP applications and Visual Basic® Scripting Edition (VBScript). Many traps and pitfalls are discussed. The suggestions listed in this article have been tested on http://www.microsoft.com and other sites, and work very well. This article assumes that you have a basic understanding of ASP development, including VBScript and/or JScript, ASP Applications, ASP Sessions, and the other ASP intrinsic objects (Request, Response, and Server).

Often, ASP performance depends on much more than the ASP code itself. Rather than cover all wisdom in one article, we list performance-related resources at the end. These links cover both ASP and non-ASP topics, including ActiveX® Data Objects (ADO), Component Object Model (COM), databases, and Internet Information Server (IIS) configuration. These are some of our favorite links-be sure to give them a look.

Tip 1: Cache Frequently-Used Data on the Web Server

A typical ASP page retrieves data from a back-end data store, then paints the results into Hypertext Markup Language (HTML). Regardless of the speed of your database, retrieving data from memory is a lot faster than retrieving data from a back-end data store. Reading data from a local hard disk is also usually faster than retrieving data from a database. Therefore, you can usually increase performance by caching data on the Web server, either in memory or on disk.

Caching is a classic space-for-time tradeoff. If you cache the right stuff, you can see impressive boosts in performance. For a cache to be effective, it must hold data that is reused frequently, and that data must be (moderately) expensive to recompute. A cache full of stale data is a waste of memory.

Data that does not change frequently makes good candidates for caching, because you don't have to worry about synchronizing the data with the database over time. Combo-box lists, reference tables, DHTML scraps, Extensible Markup Language (XML) strings, menu items, and site configuration variables (including data source names (DSNs), Internet Protocol (IP) addresses, and Web paths) are good candidates for caching. Note that you can cache the presentation of data rather than the data itself. If an ASP page changes infrequently and is expensive to cache (for example, your entire product catalog), consider pregenerating the HTML, rather than repainting it for every request.

Thursday, July 27, 2006

A Fix() Function in T-SQL

Someone today asked me for a SQL Server function that would round a number to the first two significant places. The idea would be that 0.243 would round to 0.24, 0.00592 would round to 0.0059, and 34600 would round to 35000.

Like all good developers, I start with creating some tests. :)

I'm not using a proper testing tool here, this is just for fun - so I'm going to just make a resultset with two columns - the one that I should get, and the one that I expect to get. And I'll throw in a couple of extra numbers, to make sure I cater for the negative cases as well.

select * from
( select 0.243 param, dbo.fix(0.243) result, 0.24 wanted
union all select 0.00592, dbo.fix(0.00592), 0.0059
union all select 34600, dbo.fix(34600), 35000
union all select -3323, dbo.fix(-3323), -3300
union all select -3.59, dbo.fix(-3.59), -3.6
) t
where t.result <> t.wanted

Obviously this won't work yet, I don't have a function called fix (which I'm naming for the comparison with the 'fix' function in other systems). So let's create it:

create function dbo.fix(@num float) returns float as
begin
return (@num)
end

I'm using float because I'm lazy. I could use numeric, but float is quick to type, and does the job for now.

Right. Now I run my test, and all five results come back. Lovely. So now let's fix the function.

So how do we do this... well, the round() function in SQL Server will round a number nicely. It takes a parameter which is the number of decimal places you want. And it takes negative numbers. round(12345,-3) gives 12000. That's just what I'm after. So let's think about it. I want the length of the number. The length of 12345 is obviously 5 - so that works. I can convert the number into a string, count the length and get 5. Then I can get -3 from it by subtracting it from 2, which is the number of significant places I want.

Terrific. That'll work for positive integers. But the length of "1.2345" is 6, which clearly isn't right.

Here's where I feel myself turning into a real geek. I'm going to use the log10() function. log10(12345) is 4.09. Better still, log10(999) is 2.9996 and log10(1000) is 3. That's terrific. I can round that number down, add one, and there's my number length. It even works for fractions. log10(1.2345) is a little over 0, log10(0.9) is just under 0. log10(0.00343) is about -2.5. So if I round all these DOWN (that's the floor() function), and add one, I get what I need the length to be. I can't use the ceiling() function, because that wouldn't work for 1000 - for which I want a length of 4, not 3. I do actually need to round down and then add one, in case rounding down doesn't do anything.

So now I have my 'length', and if I subtract that from 2, then hopefully I get the number that I can use for the round function.

alter function dbo.fix(@num float) returns float as
begin
declare @res float
select @res = round(@num,2-(1+floor(log10(@num))))
return (@res)
end

Well this seems to do it. Of course I could change the 2 to a 1, and stop adding one to my floor. I run my test query, and no rows are returned. Great. All done.

Except that it isn't. High-school maths reminds me that you can't take the log of a negative number. So what's going on with my tests? I check the Messages (rather than the results grid), and I see a domain error has occurred. It'd be nice if this would kick a proper error, one that would display red text and stop my query from running at all, but such is life.

Either way, we know it's a problem now, so we can fix it.

I can't take the log of a negative number, so let's check to see if the number if negative, and if it is, we can use the log of the negative number instead. I mean, I still want to round 3.6 with a value of 1, and -34334 with -3. So I'll put a case statement into my function:

alter function dbo.fix(@num float) returns float as
begin
declare @res float
select @res = case
when @num > 0 then round(@num,1-floor(log10(@num)))
else round(@num,1-floor(log10(-@num)))
end
return (@res)
end

Now my test works without an error message. "(0 row(s) affected)" is what I want to see, and there it is. I'm happy.

A minor change is to let it take another function to show many significant digits are required. Simple change:

alter function dbo.fix(@num float, @digits int) returns float as
begin
declare @res float
select @res = case
when @num > 0 then round(@num,@digits-1-floor(log10(@num)))
else round(@num,@digits-1-floor(log10(-@num)))
end
return (@res)
end

And the guy who asked me for this? Well, he's gobsmacked that there really is a use for logarithms in the real world.

Original Source: http://www.sqlservercentral.com/articles/Advanced+Querying/afixfunctionintsql/2487/

Thursday, July 06, 2006

SQL - Date Convert & Sort Problem

The Problem

I'll use Northwind for an example, since it's one that most DBAs have access to on a test server. You can also reset it up from the SQL Server media if you need to. Let's look at the orders table, in particular, this query:

select top 5
orderid
, customerid
, requireddate
from orders
order by requireddate desc
This is straightforward, give me the 5 orders that need to go out by the most forward "requireddate" on back. Running this gives me
orderid     customerid requireddate                                          
----------- ---------- ------------------------------------------------------
11061 GREAL 1998-06-11 00:00:00.000
11059 RICAR 1998-06-10 00:00:00.000
11074 SIMOB 1998-06-03 00:00:00.000
11075 RICSU 1998-06-03 00:00:00.000
11076 BONAP 1998-06-03 00:00:00.000

Notice that I have the June 11, 98 order first, then the June 10, the 3rd, etc. This is what I wanted, giving me the orders that will be due and working backwards. Running something like this with some flag to note if I've processed them allows me to see the newest orders coming in. I know it's a little funny, but the dates made more sense with the requireddate than the orderdate in this example.

So now suppose that your developer says that the date time thing messes him up and can you please just send back the date formatted a little nicer. Ok, no problem, so you change the query:

select top 5
o.orderid
, o.customerid
, CONVERT(char(10), o.requireddate, 101) as requireddate
from orders o
order by requireddate desc
Easy enough, we add a convert to send this back as a character value instead of a datetime and then we use the 101 conversion code to specify that we want mm/dd/yyyy.
Because this is going back to a client and it's not a query we're reading in QA, we want to give the column a name that can be referenced. So we choose the same name, requireddate, to make it easier on the developer. How many of you have changed a name and had a developer scream? And if they didn't, it it because they reference values by position? Tsk, tsk, that's a no-no.

In any case, you might run this, see that it runs and send it on. But did you examine the results carefully? They look like this:

orderid     customerid requireddate
----------- ---------- ------------
10763 FOLIG 12/31/1997
10764 ERNSH 12/31/1997
10370 CHOPS 12/31/1996
10371 LAMAI 12/31/1996
10761 RATTC 12/30/1997
Notice that the first date isn't the June 11th, 1998, but rather December 31st, 1997!?!?!!!?

Whoopsi. Hopefully things weren't too date dependent.

Still, it's interesting and I bet a few people get caught by this because adding a convert doesn't. While I'm sure that there are some better explanations and more technical ones, I'll give you mine because I think it's easier to understand.

I grabbed the query plan and put it up here:

Notice the order in which things are done. First the rows are retrieved. Then the new column is computed, this being the CONVERTed date field. Lastly the sort takes place, not on the original column value as stored in the table, but rather the new computer column, as changed to a character format. In this case, the worktable in tempdb, which houses only the result data, including the computed character column, is used for the sorting. So you get 12/31 as a higher value than 06-11 and so it comes first.

So how do you avoid this?

First of all, you read lots of articles like this one where someone else has made the mistake, you chuckle at them, and then it sticks in your mind as something to watch out for. Next you rename columns when you change their values. That way when someone comes down the road and adds a new order by or group by, it doesn't affect things. That may not always work and there are definitely reasons to keep the column name the same, so then you rely on rule 1 :)

The other thing you do, and which I did not, is test carefully. Even the smallest change can affect results in ways you did not think of. That's what happened here and I thought that a simple order by would solve things. I didn't check the data carefully. I most likely used my briefcase, which has 3 articles in it, so the issue may not have even appeared for me.

As with most things, this came down to proper QA and testing. We all move too fast at times and sometimes it matters, and sometimes it doesn't. I'm sure you see similar things in your company. Mistakes and bugs will slip through. They do in all software, even in lots of other industries. What you try your best to do is learn and make fewer ones as you move forward.

--------------------------------------------------------------------------

You could read the actual article in http://www.sqlservercentral.com/columnists/sjones/anotherdbawhoops.asp