What they don’t tell you about Microsoft SQL Server

 

Part 1: Excessive Lock Escalation in Microsoft SQL Server 2000

 

Howard Long

3 July 2002

Updated 17 February 2003

Updated 3 March 2003

 

Copyright © Howard Long 2002,2003

 

Introduction

 

Lock escalation is the automatic process which converts row or page locks into entire table locks. This article investigates why SQL Server 2000 (and similarly SQL Server 7) performs escalation in ways that have previously been undocumented.

 

When compared to a SQL Server 6.5 system, there is a dramatic increase in lock escalation in SQL Server 2000 in some circumstances.

 

The result of this lock escalation is severe contention and therefore serialization with OLTP systems.

 

Identifying Lock Escalation

 

Using the SQL Server 2000 Profiler, the Lock:Escalation event records the object id of the table which is being escalated. The name of the table can be determined by using the object_name(id) function.

 

As well as showing the Lock:Escalation event, the SQL:StmtStarting and/or SP:StmtStarting can be of use. However, be wary of using these last two events on busy systems because they can generate an enormous amount of information. It may be better to narrow down the event by specifying a filter on a known SPID or LoginName in such circumstances.

 

Figure 1: SQL Profiler showing lock escalation

 

So what happens in SQL Server 2000?

 

Take a table which is about 800 bytes wide, and there are a million rows (see Appendix A for the code to generate this data).

 

create table table1

(

  ID int identity(1,1),

  filler1 char(255),

  filler2 char(255),

  filler3 char(255)

)

 

There is a non-clustered unique index on an identity int field.

 

create unique index ix_table1 on table1(ID)

 

A large number of rows, together with the width of the table and the non-clustered index are all important parts of the reproduction of the scenario. Perform a select on the data which will return 0.5% of the data:

 

dbcc freeproccache

go

select * from table1 where id between 1 and 5000

 

You will find that the select will escalate to a shared table lock. Indeed, if you were running SQL Server 2000 before SP2, you would find that the above query would escalate when returning only 765 rows. SP2 and SP3 escalate at 4583 rows.

 

The problem this causes in OLTP systems is that if the front end is quite slow at processing the data, the shared table locks generated by the read-only query will create contention for other users attempting to update, insert or delete. Not a good situation!

 

When one looks under the cover, things are a little more subtle. Re-run the query with the Profiler showing Lock:Acquired and Lock:Released events. The result appears to show each index seek leaf node and the corresponding row data locks being acquired and released as the query runs through. So logically there shouldn’t be a reason for escalation if the locks are being relinquished very soon after they are acquired. The reality is somewhat different.

 

To see what was really happening, the author wrote a small VB DB Library application, SQL2KTest to identify whether the locks really were being dropped. The application itself allows the user to key in an arbitrary query, but by default the query is the same as that above which creates the lock escalation.

 

Figure 2: SQL2KTest application used to show that row locks are not released

 

The application allows the user to deliberately simulate a slow client application by fetching only a limited number of rows from the result set at a time. It is possible to show that the locks are not released. Running the SQL2KTest default query, retrieving the first 1000 rows (SQL Server 2000 SP2 and above), and then executing sp_lock within Query Analyzer, many locks will still be in place.

 

Once more than 4852 rows have been retrieved, the locks are escalated into a single shared table lock. While this may be good for the performance of the single process running the query (because lock management overhead is greatly reduced), shared table locks can have a devastating affect on OLTP systems because no users can update the table in question.

 

We know that we are in trouble if a read-only process retrieves more than 4852 rows are retrieved. So could we not simply use a locking hint such as WITH (ROWLOCK)? This will improve the situation, but the reality is that often the area of the database in which the query is working is also the area that the OLTP transactions are doing work. Also the developer must be aware that there are performance and resource limitations to using WITH (ROWLOCK) on particularly large result sets.

 

spid   dbid   ObjId       IndId  Type Resource         Mode     Status

------ ------ ----------- ------ ---- ---------------- -------- ------

.

.

.

55     14     2041058307  2      KEY  (7900ac71caca)   S        GRANT

55     14     2041058307  2      KEY  (b000a5c6c0c4)   S        GRANT

55     14     2041058307  2      KEY  (10007c77a28e)   S        GRANT

55     14     2041058307  2      KEY  (07006d72ae42)   S        GRANT

55     14     2041058307  2      KEY  (a700b4c3cc08)   S        GRANT

55     14     2041058307  2      KEY  (c200dc7a7206)   S        GRANT

55     14     2041058307  2      KEY  (8700d9ff26a0)   S        GRANT

55     14     2041058307  2      KEY  (620005cb104c)   S        GRANT

55     14     2041058307  2      KEY  (750014ce1c80)   S        GRANT

55     14     2041058307  2      KEY  (d500cd7f7eca)   S        GRANT

55     14     2041058307  2      KEY  (1c00c4c874c4)   S        GRANT

55     14     2041058307  2      KEY  (bc001d79168e)   S        GRANT

55     14     2041058307  2      KEY  (ab000c7c1a42)   S        GRANT

55     14     2041058307  2      KEY  (0b00d5cd7808)   S        GRANT

55     14     2041058307  2      KEY  (c10005bf0515)   S        GRANT

55     14     2041058307  0      PAG  1:359            IS       GRANT

55     14     2041058307  2      KEY  (8400003a51b3)   S        GRANT

55     14     2041058307  2      KEY  (6100dc0e675f)   S        GRANT

55     14     2041058307  2      KEY  (7600cd0b6b93)   S        GRANT

55     14     2041058307  2      KEY  (9300113f5d7f)   S        GRANT

55     14     2041058307  2      KEY  (d60014ba09d9)   S        GRANT

55     14     2041058307  2      KEY  (1f001d0d03d7)   S        GRANT

55     14     2041058307  2      KEY  (bf00c4bc619d)   S        GRANT

55     14     2041058307  0      RID  1:359:0          S        GRANT

55     14     2041058307  2      KEY  (a800d5b96d51)   S        GRANT

55     14     2041058307  2      KEY  (08000c080f1b)   S        GRANT

.

.

.

 

Interestingly a clustered index rather than a non-clustered index produces the desired result, and locks are dropped as they are created. We’ll see later why this is the case.

 

So what happened in SQL Server 6.5?

 

When the above situation is run in SQL Server 6.5, no lock escalation occurs. This means that in a significant number of OLTP situations, SQL Server 6.5 can perform much better than SQL Server 2000. It is also a reason why corporates are struggling to upgrade to SQL Server 2000 and show any benefits when placed into a production environment. Indeed, it has often been the case that SQL Server 2000 will actually degrade performance.

 

When the SQL2KTest application is run on an identical dataset on SQL Server 6.5, we can show that only the current index seek page and data pages are left locked using sp_lock. (Note that SQL Server 6.5 only supported page locks and not row level locking, although pages were only 2K bytes compared to 8K in SQL Server 2000).

 

spid   locktype                            table_id    page        dbname

------ ----------------------------------- ----------- ----------- ---------------

10     Sh_intent                           848006052   0           testdatabase

10     Sh_page                             848006052   19152       testdatabase

10     Sh_page                             848006052   578356      testdatabase

 

Under SQL Server 6.5, unlike SQL Server 2000, the locks really are released as the index seek procedes. This result is that locking contention is far less likely.

 

In essence it now appears that for an identical query, the default isolation level under SQL Server 2000 is more conservative than that of SQL Server 6.5.

 

Under the hood

 

To discover what is going on here, it’s necessary to look a little further under the hood and think “query plan”.

 

Figure 3: Query plan generated in Query Analyzer without trace flag 8719

 

By default, SQL Server 2000 will not show you the whole story. Firstly, an undocumented trace flag 8719 will allow you to see that the query actually generates an additional Bookmark Lookup step and this step is essential in our investigation. To specify a trace flag, open the Properties of the SQL Server in question from Enterprise Manager and press the Startup Parameters button. The parameter is ‘-T8719’ and the ‘T’ is case sensitive.

 

Figure 4: Adding the 8719 trace flag in the Startup Parameters

 

Once the trace flag has been set in the Startup Parameters, it is necessary to restart the SQL Server.

Figure 5: Query plan generated in Query Analyzer with trace flag 8719

 

Once the server has restarted, re-run the Query. There will be an additional Bookmark Lookup in there which is keeping locks open on the table and not relinquishing them. This additional Bookmark Lookup is performing a prefetch which although beneficial to our process, is not necessarily beneficial to others because it effectively raises the default isolation level to READ REPEATABLE by not releasing locks while the statement runs.

 

A further investigation reveals that if the statistics indicate that there would be less than 10 rows returned, a different query plan is generated without the prefetch.

 

Figure 6: Query plan generated with trace flag 8719 in place, and when the statistics indicate less than 10 rows will be returned

 

Minimizing lock escalation and associated serialization

 

Now we have a clue as to some changes we can do to make the optimizer behave in the way we’d like: i.e. an isolation level maintained at the default READ COMMITTED so that locks are not held for the duration of the query and therefore lock escalation is reduced. Below are some methods.

 

 

 

 

 

 

Conclusion

 

Here we have a situation where it would seem that the performance of an OLAP RDBMS has been at the expense of OLTP.

 

What Microsoft were really attempting to achieve is difficult to gauge: was it the desire to promote the isolation level for read repeatability, or the individual performance increase to be gained from the prefetch? Either way, neither was documented externally at the time of writing.

 

Whichever way, the more one investigates SQL Server 2000’s nuances, the greater the realisation that SQL Server is not as ‘self-tuning’ as Microsoft marketing would have us believe.

 

The possible resolutions described are not perfect but they do go a long way to reducing lock contention in OLTP systems using SQL Server 2000.

 

The author is indebted to the Microsoft SQL Server developers and support engineers for their help in identifying and providing solutions for the lock escalation encountered in SQL Server 2000.

 

Appendix A

 

The test database population query below requires a 1.3Gb database with the “trunc. log on chkpt.” option set on. It take 15 minutes to run on a 700MHz laptop.

 

set nocount on

go

if exists (select * from sysobjects where name='table1' and type='U')

begin

  drop table table1

end

go

-- Create a fairly wide table, with separate fillern char(255) columns as 6.5 has char(255) limit

create table table1

(

  ID int identity(1,1),

  filler1 char(255),

  filler2 char(255),

  filler3 char(255)

)

go

-- Populate the table with a million rows

declare @count int

select @count=0

while @count<1000000

begin

  insert into table1(filler1,filler2,filler3) select '','',''

  select @count=@count+1

end

go

create unique index ix_table1 on table1(ID)

go

 

 

 

Reference

 

1.      INF: Resolving Blocking Problems That Are Caused by Lock Escalation in SQL Server: http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B323630