Index Tuning - Choosing a Clustered Index Column

Posts   
1  /  2
 
    
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 31-May-2005 10:23:43   

I would be interested to hear from any DBAs out there about their experience with choosing the most appropriate column for the clustered index.

I have a faily large DB with a bunch of tables with > 10M rows. The PKs are Guids, INSERTs are frequent (10%) and PKs are never updated. 95% of reads are on recent data.

Currently the clustered index is on the Guid PK, but obviously INSERT performance is suffering as a result.

I have been reading about different optimisation techniques which seem to give contradictory guidance. Some argue that the clustered index should based on a randomly distributed column, like a Guid. The argument states that this avoids having hot spots on the disk, which in my case is relevant because most reads are of recently inserted data which may conflict with insert page locks.

The argument against a randomly distributed clustered index states that INSERTs will be slow because of the frequent page splits. In addition the fact that all the recently added data (which in my case is read most often) will likely be cached as it all resides on a dense cluster of DB pages. Those for this approach recommend choosing an alternative column than the PK, a CreatedDate for example...

Intermediary or "link" tables (that only have 2 uniqueidentifier columns acting as the composite PK) are another question again... Some argue that you could add a dummy Identity column to act as the clustered index and add another composite index to the PKs. This will speed up inserts, but given these tables are paricularly narrow is this really necessary given the fact that the additional index will double the table's storage requirements.

Does anyone have any production experience of such issues and are willing to share your experiences?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 31-May-2005 10:46:06   

Clustered indexes are normally set on the PK, as fetches on a table are normally done using a PK, which then makes it very easy to read the right value.

the problem with GUIDs is that they're very large and apparently random. I'd seriously consider an identity column of bigint or int. Not only is inserting than much faster, as the clustered index is easier to update (the values are the index on themselves, with guid's that's another problem, you've to scan), they're also smaller.

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 31-May-2005 10:57:36   

Otis wrote:

I'd seriously consider an identity column of bigint or int. Not only is inserting than much faster, as the clustered index is easier to update (the values are the index on themselves, with guid's that's another problem, you've to scan), they're also smaller.

Thanks Frans, but unfortunately as well as operating a central server, the product is distributed at mutliple offline clients sites and PK uniqueness is essential. I thought long and hard before choosing Guids as PKs and in the end it was the lesser of two evils.

Otis wrote:

(the values are the index on themselves, with guid's that's another problem, you've to scan)

Do you mean when the DB engine is perfoming a comparison during the lookup? I thought about this too, and couldn't find much evidence to support major performance issues as a result.

Jimmy posted an article some time ago (http://www.informit.com/articles/article.asp?p=25862&seqNum=5&rl=1) and there is some discussion on long url also...

So I'm stuck with Guids now and still trying to figure out what column to choose for the clustered index... disappointed

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 31-May-2005 11:29:21   

Well, if a new GUID is to be inserted, it has to be positioned in the right spot, as a clustered index is an ordered index on disk. To find the spot, sqlserver has to find the right spot and with a 16 byte number that's much harder than with an int, as the 16byte number doesn't fit in a CPU register.

So even if a balanced tree is used, it still takes more time to find the right spot, especially with large numbers (and thus a lot of leafs)

But that's not the problem. The problem is the way a clustered index is used. A clustered index is used to order the memory pages for a table on disk: the clustered index is the order in which rows are stored physically. As GUID's are random, you'll get a lot of page fragmentation on disk, as the chance you've to insert a page in the middle of the table is pretty high (with an identity column, that's never happening).

from BOL:

Clustered indexes have one row in sysindexes with indid = 1. The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point the key value in the inserted row fits in the ordering sequence.

I.o.w.: if you use GUIDs as PKs, I think it's very wise to just use an index, not a clustered index, due to the randomness of the GUIDs.

With this knowledge though you can now order the pages on disk in a way so reads are done very fast AND indexed. Though, if the PK isn't suitable for a clustered index, I think I wouldn't use it on another field, unless you have a lot of read operations for multiple rows on that table, and you always order the results in a particular way.

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 31-May-2005 11:54:41   

Otis wrote:

I.o.w.: if you use GUIDs as PKs, I think it's very wise to just use an index, not a clustered index, due to the randomness of the GUIDs.

From wrote:

Try to avoid creating clustered indexes on sequential key fields that are inserted monotomaitcally, such as on an identity column. This can create a "hot spot" at the end of the table that results in possible locking contention and deadlocks at the end of the table and the index. Additionally, the clustered index will not be reusing available space on preceeding data pages because all new rows sort to the end of the table. ... The general recommendation is that you try to cluster on a data value that's somewhat randomly distributed throughout your table.

Which is what I meant (above) by "I have been reading about different optimisation techniques which seem to give contradictory guidance." disappointed

Otis wrote:

With this knowledge though you can now order the pages on disk in a way so reads are done very fast AND indexed. Though, if the PK isn't suitable for a clustered index, I think I wouldn't use it on another field, unless you have a lot of read operations for multiple rows on that table, and you always order the results in a particular way.

So are you suggestion not having a clustered index at all? I will look at this option...

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 31-May-2005 12:19:18   

Marcus wrote:

Otis wrote:

I.o.w.: if you use GUIDs as PKs, I think it's very wise to just use an index, not a clustered index, due to the randomness of the GUIDs.

From wrote:

Try to avoid creating clustered indexes on sequential key fields that are inserted monotomaitcally, such as on an identity column. This can create a "hot spot" at the end of the table that results in possible locking contention and deadlocks at the end of the table and the index. Additionally, the clustered index will not be reusing available space on preceeding data pages because all new rows sort to the end of the table. ... The general recommendation is that you try to cluster on a data value that's somewhat randomly distributed throughout your table.

Which is what I meant (above) by "I have been reading about different optimisation techniques which seem to give contradictory guidance." disappointed

Well, I'd never go for COMP guid's. You need uniqueness, so comp guids won't work.

So guid's it is. Then, a clustered index won't work, see above and your links. Therefore no clustered index is in this case better. Though you can create a clustered index on a field which is always higher than the previous value, so new pages get added to the end of the tree. Insert performance is then very high, though read performance can degrade if you never read the pages in sequence in which they're added.

Otis wrote:

With this knowledge though you can now order the pages on disk in a way so reads are done very fast AND indexed. Though, if the PK isn't suitable for a clustered index, I think I wouldn't use it on another field, unless you have a lot of read operations for multiple rows on that table, and you always order the results in a particular way.

So are you suggestion not having a clustered index at all? I will look at this option...

Definitely no clustered index on the GUID pk's. I never looked into this, but after your question and reading some stuff about this, I can only say: never use a clustered index on a GUID pk unless insert performance isn't a big issue. If you have a field in the table on which you sort ascending all the time, it's wise to add the clustered index on that column, because when fetching the data, it's already sorted when reading the pages from disk.

Frans Bouma | Lead developer LLBLGen Pro
Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 01-Jun-2005 03:31:06   

Another thing to watch for, is that when inserting data, if the data pages are full and a page split is required, then sql server must execute the split immediately. the problem with inserting into a clustered index is 1 it has to find where on disk to put the data, then shuffle all of the records.

I would, create a clustered index on an identity column, then a secondary index with just the guid in it. In both indexes, I would use fill factor 90% and pad index. By using fill factor 90 and pad index there should always be some space available to add data, and unless you are firehosing data, mandatory page splits can be avoided, and sql server can lazy write the data, e.g. when it has time.

the leaf pages of the secondary index will always be ordered in asc or desc order using the fields in the index and if they are build on top of a table with a clustered index, they will point directly to the datarows on disk that need to be fetched.

another thing that you can do, if you have additional physical drives, is place the index files on a seperate physical drive.

by clustering on an identity field, you would almost certainly always be inserting at the end of the file, and sql server could handle maintaining and splitting the secondary index in a more effecient manner.

what I have done in the past, even when insert performance couldnt be mitigated and was slowing batch processing was to drop the index prior to performing the long running operation and rebuilding it after the operation was completed. this took the process from 4 hours to 2 minutes. that being said, the long runniing operation only needed to happen 2 times per year and day to day inserts were rare and it was more important to get the data faster than it was to create it. so given this set of criteria, dropping and recreating the index worked for us.

so thats my theory.

JimFoye avatar
JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 01-Jun-2005 05:05:19   

Don't forget in SQL Server that the clustered index value is stored in all the secondary indexes. (I don't know if this changes in 2005). Does the table have a lot of secondary indexes? Every row in every one of them will have that GUID stored in it as the row locator.

The GUID column itself is a great candidate for a secondary index because of high selectivity. I would go with DevilDog's suggestion. Also make sure you defragment your database at some point. It must be pretty bad right now.

[Edit] I have found this book to be pretty good for SQL Server optimization. I see now it's in a second edition (I have only the first).

amazon

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 01-Jun-2005 09:40:46   

Great info, guys! simple_smile

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 01-Jun-2005 12:17:29   

Yes... Thanks guys. It seems that no-one considers the disk "hot-spot" to be an issue then...

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 01-Jun-2005 12:55:28   

Marcus wrote:

Yes... Thanks guys. It seems that no-one considers the disk "hot-spot" to be an issue then...

What's a disk hotspot?

Frans Bouma | Lead developer LLBLGen Pro
Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 01-Jun-2005 13:09:13   

i dont really consider it to be an issue. typically you get hot spots due to lack of ventilation, and IF youre always reading from the same particular disk sector. if you have a good raid array monitoring tool you will be able to see hot spots on your disks.

Ultimately, since the guids are random, and the sample is large, you should see a fairly even distribution of data. dbcc show_statistics will show you the groupings / distribtion of data. I think that you should find that the data has a fairly normal distribution.

dbcc showcontig will also show you the fragementation of the pages associated with a table or index.

one last thing to consider, is that if you are frequently causing data to move, then you need to defrag your indexes more often, otherwise performance may decrease over time. you can turn on the option to automatically update stats, but there is no guarantee when this will occur, so it could kick off during a time where high availability is required, so you might consider doing this when not alot of people are online, say as part of a weekly or monthly db maintenance plan.

when defragging and recreating indexes look at dbcc INDEXDEFRAG and dbcc DBREINDEX. INDEXDEFRAG can be done online and doesnt hold any locks that could prevent a read, update, insert, or delete.

JimFoye avatar
JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 01-Jun-2005 15:57:02   

Well, Marcus, it turns out that the author of that book, Sajal Dam, is right here in Austin where I live. I sent him an email and he suggested we get together sometime, so perhaps I can run your scenario by him in person and see what he thinks (he might make me pay for the coffee, though wink )

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 01-Jun-2005 18:35:47   

Otis wrote:

Marcus wrote:

Yes... Thanks guys. It seems that no-one considers the disk "hot-spot" to be an issue then...

What's a disk hotspot?

Frans from above:

From wrote:

Try to avoid creating clustered indexes on sequential key fields that are inserted monotomaitcally, such as on an identity column. This can create a "hot spot" at the end of the table that results in possible locking contention and deadlocks at the end of the table and the index. Additionally, the clustered index will not be reusing available space on preceeding data pages because all new rows sort to the end of the table. ... The general recommendation is that you try to cluster on a data value that's somewhat randomly distributed throughout your table.

I understood "hot spots" were areas of lock contention. If all new data is being added to the end of a table AND the data users are interested in reading/updating is mostly "recently added data", then you can see how transactions will attempt locks on mostly the same database pages and hence you have a hot spot while the rest of the table's data pages are rarely locked.

JimFoye wrote:

Well, Marcus, it turns out that the author of that book, Sajal Dam, is right here in Austin where I live. I sent him an email and he suggested we get together sometime, so perhaps I can run your scenario by him in person and see what he thinks (he might make me pay for the coffee, though wink )

Jim - I guess I'll owe you a coffee!! simple_smile

JimFoye avatar
JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 01-Jun-2005 18:43:15   

The general recommendation is that you try to cluster on a data value that's somewhat randomly distributed throughout your table.

You are spot-on about the contradictory nature of tuning advice. Compare that statement to this one from http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdb/cm_8_des_05_5h6b.asp

A clustered index is particularly efficient on columns that are often searched for ranges of values. After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if an application frequently executes a query to retrieve records between a range of dates, a clustered index can quickly locate the row containing the beginning date, and then retrieve all adjacent rows in the table until the last date is reached. This can help increase the performance of this type of query. Also, if there is a column(s) that is used frequently to sort the data retrieved from a table, it can be advantageous to cluster (physically sort) the table on that column(s) to save the cost of a sort each time the column(s) is queried.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 01-Jun-2005 20:09:15   

JimFoye wrote:

You are spot-on about the contradictory nature of tuning advice.

Yes... I guess it's a toss up between page lock contention, insert performance, number of read I/Os to fetch a given set of data and sort performance.

On the benefits of sort performance... I wonder can a clustered index sorted ASCENDING be used for a DESC sort? i would imagine the answer is yes, since it could just reads backwards...

Also on sort performance, if the data is spread out maybe you are only including every nth row of a table. Say Orders are presorted on OrderDate, but a given customer's orders will likely be be on seperate data pages because you will have more than 1 customer (hopefully)... Maybe the clustered index in this case should be on CustomerID, then OrderDate which would keep the relevant data ajacent and sorted.

A mine field...

JimFoye avatar
JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 01-Jun-2005 21:47:55   

On the benefits of sort performance... I wonder can a clustered index sorted ASCENDING be used for a DESC sort? i would imagine the answer is yes, since it could just reads backwards...

Yep.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 02-Jun-2005 11:44:35   

Marcus wrote:

Otis wrote:

Marcus wrote:

Yes... Thanks guys. It seems that no-one considers the disk "hot-spot" to be an issue then...

What's a disk hotspot?

Frans from above:

From wrote:

Try to avoid creating clustered indexes on sequential key fields that are inserted monotomaitcally, such as on an identity column. This can create a "hot spot" at the end of the table that results in possible locking contention and deadlocks at the end of the table and the index. Additionally, the clustered index will not be reusing available space on preceeding data pages because all new rows sort to the end of the table. ... The general recommendation is that you try to cluster on a data value that's somewhat randomly distributed throughout your table.

I understood "hot spots" were areas of lock contention. If all new data is being added to the end of a table AND the data users are interested in reading/updating is mostly "recently added data", then you can see how transactions will attempt locks on mostly the same database pages and hence you have a hot spot while the rest of the table's data pages are rarely locked.

Aha, thanks for the info! simple_smile

Though I really don't understand that recommendation you quoted. That suggests to create a clustered index on a GUID, though at the same time that will produce a lot of page splits, and thus ALSO a lot of slowness...

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 02-Jun-2005 14:25:00   

Otis wrote:

Though I really don't understand that recommendation you quoted. That suggests to create a clustered index on a GUID, though at the same time that will produce a lot of page splits, and thus ALSO a lot of slowness...

Maybe this would be relevant if the application requires frequent RepeatableRead or Serializable isolation level SELECTs and UPDATEs but few INSERTs. In that case clustering on a Guid would produce very low lock contention (unless user's were selecting/updating exactly the same data).

I have played with moving my clustered indexes off the PK GUID columns and initial tests are not showing substantial INSERT performance gains. SELECTs on the other hand are now much slower with Bookmark Lookups accounting for > 90% of the query cost in some cases... disappointed

I will keep playing until I find an optimal setting and will obviously report back any results here.

Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 02-Jun-2005 20:45:41   

Just curious, but what is the were clause you are using in your select statements?

JimFoye avatar
JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 03-Jun-2005 17:45:21   

So you tried our suggestions and got the exact opposite effect as predicted? Hmmm....Jack Murphy has returned to Ireland?

I'm meeting with Sajal Dam Sunday and I'm going to run this by him (why not? might as well get my coffee's worth). If you have any additional info you want me to have post it here before then.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 03-Jun-2005 20:37:53   

Devildog74 wrote:

Just curious, but what is the were clause you are using in your select statements?

Devildog here is an example of the bookmark lookup being 90% of the total for a mere 90 rows... This is using the Paging method and was tested on a copy of the DB on my workstation:

and the query is (cleaned up version):

SELECT ...
FROM ( [Folder] INNER JOIN [FolderPermission] 
ON  [Folder].[FolderUID]=[FolderPermission].[FolderUID]) 
WHERE ( [FolderPermission].[GroupUID] IN (@GroupUID1, @GroupUID2, @GroupUID3, @GroupUID4) 
And [FolderPermission].[CanRead] = @CanRead5 And NOT [Folder].[FolderUID] 
IN (SELECT [FolderPermission].[FolderUID] AS [FolderUID] FROM [FolderPermission] 
WHERE ( [FolderPermission].[CanRead] = @CanRead6 And [FolderPermission].[GroupUID] 
IN (@GroupUID7, @GroupUID8, @GroupUID9, @GroupUID10))))

and the full version:

exec sp_executesql N'CREATE TABLE #TempTable ([__rowcnt][int] IDENTITY (1,1) NOT NULL,[FolderUID][UniqueIdentifier] NULL,[OwnerUID][UniqueIdentifier] NULL,[FolderType][TinyInt] NULL,[Reference][NVarChar](50) NULL,[DisplayName][NVarChar](250) NULL,[Description][NVarChar](3000) NULL,[PublishDate][SmallDateTime] NULL,[IsPublished][Bit] NULL,[CreatedDate][SmallDateTime] NULL,[IsActive][Bit] NULL,[IsMarkedForDelete][Bit] NULL,[Timestamp] binary(8) NULL);INSERT INTO #TempTable ([FolderUID],[OwnerUID],[FolderType],[Reference],[DisplayName],[Description],[PublishDate],[IsPublished],[CreatedDate],[IsActive],[IsMarkedForDelete],[Timestamp]) SELECT DISTINCT TOP 49 [ModernArk].[dbo].[SdsArk_Folder].[FolderUID] AS [FolderUID],[ModernArk].[dbo].[SdsArk_Folder].[OwnerUID] AS [OwnerUID],[ModernArk].[dbo].[SdsArk_Folder].[FolderType] AS [FolderType],[ModernArk].[dbo].[SdsArk_Folder].[Reference] AS [Reference],[ModernArk].[dbo].[SdsArk_Folder].[DisplayName] AS [DisplayName],[ModernArk].[dbo].[SdsArk_Folder].[Description] AS [Description],[ModernArk].[dbo].[SdsArk_Folder].[PublishDate] AS [PublishDate],[ModernArk].[dbo].[SdsArk_Folder].[IsPublished] AS [IsPublished],[ModernArk].[dbo].[SdsArk_Folder].[CreatedDate] AS [CreatedDate],[ModernArk].[dbo].[SdsArk_Folder].[IsActive] AS [IsActive],[ModernArk].[dbo].[SdsArk_Folder].[IsMarkedForDelete] AS [IsMarkedForDelete],[ModernArk].[dbo].[SdsArk_Folder].[Timestamp] AS [Timestamp] FROM ( [ModernArk].[dbo].[SdsArk_Folder] INNER JOIN [ModernArk].[dbo].[SdsArk_FolderPermission] ON  [ModernArk].[dbo].[SdsArk_Folder].[FolderUID]=[ModernArk].[dbo].[SdsArk_FolderPermission].[FolderUID]) WHERE ( [ModernArk].[dbo].[SdsArk_FolderPermission].[GroupUID] IN (@GroupUID1, @GroupUID2, @GroupUID3, @GroupUID4) And [ModernArk].[dbo].[SdsArk_FolderPermission].[CanRead] = @CanRead5 And NOT [ModernArk].[dbo].[SdsArk_Folder].[FolderUID] IN (SELECT [ModernArk].[dbo].[SdsArk_FolderPermission].[FolderUID] AS [FolderUID] FROM [ModernArk].[dbo].[SdsArk_FolderPermission] WHERE ( [ModernArk].[dbo].[SdsArk_FolderPermission].[CanRead] = @CanRead6 And [ModernArk].[dbo].[SdsArk_FolderPermission].[GroupUID] IN (@GroupUID7, @GroupUID8, @GroupUID9, @GroupUID10)))) ORDER BY [ModernArk].[dbo].[SdsArk_Folder].[PublishDate] DESC;SELECT [FolderUID],[OwnerUID],[FolderType],[Reference],[DisplayName],[Description],[PublishDate],[IsPublished],[CreatedDate],[IsActive],[IsMarkedForDelete],[Timestamp] FROM #TempTable WHERE [__rowcnt] > @__rownoStart AND [__rowcnt] <= @__rownoEnd;DROP TABLE #TempTable', N'@GroupUID1 uniqueidentifier,@GroupUID2 uniqueidentifier,@GroupUID3 uniqueidentifier,@GroupUID4 uniqueidentifier,@CanRead5 tinyint,@CanRead6 tinyint,@GroupUID7 uniqueidentifier,@GroupUID8 uniqueidentifier,@GroupUID9 uniqueidentifier,@GroupUID10 uniqueidentifier,@__rownoStart int,@__rownoEnd int', @GroupUID1 = '13FFFEDE-4EE0-4754-84E2-1F9E339882FA', @GroupUID2 = 'AA5BCACF-1FDB-4FA8-BF3F-E3876B27A866', @GroupUID3 = '94E31AA3-A0B7-46E4-A40A-F43C9266DB03', @GroupUID4 = 'FD872A62-CD27-49A7-9946-A9A361BFDFDB', @CanRead5 = 1, @CanRead6 = 2, @GroupUID7 = '13FFFEDE-4EE0-4754-84E2-1F9E339882FA', @GroupUID8 = 'AA5BCACF-1FDB-4FA8-BF3F-E3876B27A866', @GroupUID9 = '94E31AA3-A0B7-46E4-A40A-F43C9266DB03', @GroupUID10 = 'FD872A62-CD27-49A7-9946-A9A361BFDFDB', @__rownoStart = 24, @__rownoEnd = 48

The tables concerned only have about 15K rows in them... is it usual for the bookmark lookups to be taking so long?

JimFoye wrote:

So you tried our suggestions and got the exact opposite effect as predicted? Hmmm....Jack Murphy has returned to Ireland?

He has indeed! simple_smile

I have even de-normalised one of my BIG tables to eliminate a JOIN... the tuning continues... smile

Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 03-Jun-2005 21:41:29   

Follow up questions?

Is is the FolderUID indexed in the SdsArk_Folder AND SdsArk_FolderPermission table? If so, what kind of index?

Are the GroupUID and CanRead fields in the SdsArk_FolderPermission table indexed? If so, what kind and are they in the same index and if they are what is the field order?

if you just run this query what does the execution plan look like?

 SELECT [ModernArk].[dbo].[SdsArk_FolderPermission].[FolderUID] AS [FolderUID] 
        FROM [ModernArk].[dbo].[SdsArk_FolderPermission] WHERE ([ModernArk].[dbo].[SdsArk_FolderPermission].[CanRead] = @CanRead6 
            And [ModernArk].[dbo].[SdsArk_FolderPermission].[GroupUID] 
            IN (@GroupUID7, @GroupUID8, @GroupUID9, @GroupUID10)

Order of fields in an index can have an impact on the usability of an index. Also, the fields included in joins can be indexed to increase performance of joins.

The bookmark lookup is telling you that it is flagging records that match the criteria (e.g. bookmarking them). The IO cost is relatively high for the bookmark lookup. Ideally you would want the IO to be as far away from 0.0 as possible. The other thing that is concerning is the row size. The row size is showing 3k, so I beleive that means that that particular portion of the query has to travers many leaf pages to find the data because there can only be 8k worth of data on a page.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 03-Jun-2005 22:03:13   

Devildog74 wrote:

Is is the FolderUID indexed in the SdsArk_Folder AND SdsArk_FolderPermission table? If so, what kind of index?

The following indexes exist:


ALTER TABLE [dbo].[SdsArk_Folder] ADD 
    CONSTRAINT [PK_SdsArk_Folder] PRIMARY KEY  NONCLUSTERED 
    (
        [FolderUID]
    )  ON [PRIMARY] 

ALTER TABLE [dbo].[SdsArk_FolderPermission] WITH NOCHECK ADD 
    CONSTRAINT [PK_SdsArk_FolderPermission] PRIMARY KEY  CLUSTERED 
    (
        [FolderUID],
        [GroupUID],
        [SkinAccountUID]
    )  ON [PRIMARY] 

 CREATE  INDEX [SdsArk_FolderPermission2] ON [dbo].[SdsArk_FolderPermission]([CanRead]) ON [PRIMARY]

 CREATE  INDEX [SdsArk_FolderPermission3] ON [dbo].[SdsArk_FolderPermission]([CanRead], [GroupUID]) ON [PRIMARY]

Devildog74 wrote:

if you just run this query what does the execution plan look like?

 SELECT [ModernArk].[dbo].[SdsArk_FolderPermission].[FolderUID] AS [FolderUID] 
        FROM [ModernArk].[dbo].[SdsArk_FolderPermission] WHERE ([ModernArk].[dbo].[SdsArk_FolderPermission].[CanRead] = @CanRead6 
            And [ModernArk].[dbo].[SdsArk_FolderPermission].[GroupUID] 
            IN (@GroupUID7, @GroupUID8, @GroupUID9, @GroupUID10)

That query returns a single Index Seek with a Sub Tree cost of 0.006.

Devildog74 wrote:

Order of fields in an index can have an impact on the usability of an index. Also, the fields included in joins can be indexed to increase performance of joins.

As far as I can tell the indexes are all there and are being used.

Devildog74 wrote:

The bookmark lookup is telling you that it is flagging records that match the criteria (e.g. bookmarking them). The IO cost is relatively high for the bookmark lookup. Ideally you would want the IO to be as far away from 0.0 as possible. The other thing that is concerning is the row size. The row size is showing 3k, so I beleive that means that that particular portion of the query has to travers many leaf pages to find the data because there can only be 8k worth of data on a page.

Regarding the row size (is that "row count" ?)... there is an nvarchar field on that table which has a size of 3000. Are you saying that it has reserved this space even thought the field values do not occupy the full field size?

Have a good weekend! simple_smile

Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 04-Jun-2005 23:59:01   

I am not too sure about the space being reserved or not, but I beleive that to be the case. I do know the more pages that have to be bookmarked the slower the query will execute. This is because the query has to traverse more pages to find its data.

I ran some samples on my pc over the weekend, and I am not sure how similar they will be to the results that you are finding.

I created a folders and folderpermission table with 30000 rows in it. I created 10 different groups and a completely random guid for each folder.

Here is what the table schema looks like:

 CREATE TABLE [Folder] (
    [FolderKey] [int] IDENTITY (1, 1) NOT NULL ,
    [FolderUID] [uniqueidentifier] NOT NULL ,
    [OwnerUID] [uniqueidentifier] NULL ,
    [FolderType] [tinyint] NULL ,
    [Reference] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DisplayName] [nvarchar] (250) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Description] [nvarchar] (3000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PublishDate] [smalldatetime] NULL ,
    [IsPublished] [bit] NULL ,
    [CreatedDate] [smalldatetime] NULL ,
    [IsActive] [bit] NULL ,
    [IsMarkedForDelete] [bit] NULL ,
    [Timestamp] [binary] (8) NULL ,
    CONSTRAINT [PK_Folder] PRIMARY KEY  NONCLUSTERED 
    (
        [FolderUID]
    ) WITH  FILLFACTOR = 90  ON [PRIMARY] 
) ON [PRIMARY]
GO

CREATE  CLUSTERED  INDEX [IX_Folder] ON [dbo].[Folder]([FolderKey]) WITH  FILLFACTOR = 90,  PAD_INDEX  ON [PRIMARY]
GO


CREATE TABLE [FolderPermission] (
    [FolderPermissionKey] [int] IDENTITY (1, 1) NOT NULL ,
    [CanRead] [bit] NOT NULL CONSTRAINT [DF_FolderPermission_CanRead] DEFAULT (0),
    [FolderUID] [uniqueidentifier] NOT NULL ,
    [GroupUID] [uniqueidentifier] NOT NULL ,
    [SkinAccoountUID] [uniqueidentifier] NOT NULL ,
    CONSTRAINT [PK_FolderPermission] PRIMARY KEY  NONCLUSTERED 
    (
        [FolderUID],
        [GroupUID],
        [SkinAccoountUID]
    ) WITH  FILLFACTOR = 90  ON [PRIMARY] ,
    CONSTRAINT [FK_FolderPermission_Folder] FOREIGN KEY 
    (
        [FolderUID]
    ) REFERENCES [Folder] (
        [FolderUID]
    )
) ON [PRIMARY]
GO

 CREATE  CLUSTERED  INDEX [IX_FolderPermission] ON [dbo].[FolderPermission]([FolderPermissionKey]) WITH  FILLFACTOR = 90,  PAD_INDEX  ON [PRIMARY]
GO

 CREATE  INDEX [IX_FolderPermission_1] ON [dbo].[FolderPermission]([FolderUID]) WITH  FILLFACTOR = 90,  PAD_INDEX  ON [PRIMARY]
GO

 CREATE  INDEX [ix_FolderPermission_GrpUID_CanRead] ON [dbo].[FolderPermission]([GroupUID], [CanRead]) WITH  FILLFACTOR = 90,  PAD_INDEX  ON [PRIMARY]
GO

Having the PK as you have it on the folderpermission will force data to move on every insert. in my tables, the data is similar to that of a heap, i.e. all data always goes at the end of the file and the secondary indexes are updated. because the size of the secndary index indexes are small, sql doesnt need to work as hard to place the inserted record.

when I wrote my test case in nunit, I used llblgen to create 30000 records, and they were all inserted in a matter of seconds. however, after the insert, showcontig showed me pretty heavy fragmentation on the secondary indexes, so I ran dbcc indexdefrag on the secondary indexes. the indexes were fragmented because of all of the data being inserted.

then I ran the sp_execsql command that you provided above, using 4 different guids for the grp id.

all in all, using the schema above, my execution plans show that the most detrimental operation for the entire sp_execsql is the sort operation that takes place. the sp_exec sql command will pull the data back in subsecond response time.

you will also see that i am clustering on the identity fields, which forces all new data to the end of the file.

hope this helps.

Also here is my unit test that I used to create the data:

        [Test]
        public void CreateDataMethod()
        {
            int ix =0;
            for (int i = 0; i < 30000; i++)
            {
                FolderPermissionController controller = ControllerFactory.GetController(
                    FolderPermissionController.FolderPermissionControllerKey) as FolderPermissionController;

                FolderEntity folder = new FolderEntity();
                folder.FolderUID = Guid.NewGuid();
                folder.OwnerUID = Guid.NewGuid();
                folder.FolderType = 1;
                folder.Reference = "Reference";
                folder.DisplayName = "Folder A";
                folder.Description = "Some Folder";
                folder.PublishDate = DateTime.Now;
                folder.IsPublished = true;
                folder.IsActive = true;
                folder.IsMarkedForDelete = false;

                controller.Save(folder,true);
            
                if (ix == 10)
                {
                    ix = 0;
                }

                FolderPermissionEntity permission = new FolderPermissionEntity();
                permission.CanRead = true;
                permission.GroupUID = GetGroup(ix);
                permission.SkinAccoountUID = Guid.NewGuid();
                permission.FolderUID = folder.FolderUID;
                ++ix;
                
                controller.Save(permission);
                Console.WriteLine("Saved Record " + i);
            }


        }

        private Guid GetGroup(int index)
        {
            System.Guid result = Guid.NewGuid();
            switch (index)
            {
                case 0:
                    result = new Guid("{3E1E7754-E81B-4c57-BBBA-956CB20C098D}");
                    break;
                case 1:
                    result = new Guid("{9C7E1BE7-0179-4902-A764-EB1FB98CCB08}");
                    break;
                case 2:
                    result = new Guid("{7F04048E-64E6-49cc-88D3-E4C7F691512D}");
                    break;
                case 3:
                    result = new Guid("{946580A4-1C5B-42c9-8C94-B8719BAC7CA8}");
                    break;
                case 4:
                    result = new Guid("{4C555491-091B-429b-90C3-D2DA4B1C25B4}");
                    break;
                case 5:
                    result = new Guid("{1FC1DA2F-35B4-4fd6-A755-E301E8F6D32A}");
                    break;
                case 6:
                    result = new Guid("{B2242049-D66B-4d4c-9280-EA2E88753D11}");
                    break;
                case 7:
                    result = new Guid("{BFD53AE0-239C-4b08-992B-C180405D385F}");
                    break;
                case 8:
                    result = new Guid("{3C12B7A6-DA67-4551-B169-5EB52E2376D5}");
                    break;
                case 9:
                    result = new Guid("{FA44F77E-3165-4eee-B841-4BCC85E76DE3}");
                    break;
            }

            return result;
        }

I also found one very interesting tidbit. If I run the query below without the orderby, the respose comes back instantly and the execution plan is totally different whereby all steps are using index scans and bookmark lookups. With the orderby clause left out of the query, all steps have IO less than 0.007 which is ideal, compared to 0.5. If I leave the ordere by clause in the tsql, the statement takes 2 seconds to execute. When I examine the sort step of the execution plan when the order by is in the statement, the IO is 25.0 which is really really bad.

 SELECT DISTINCT TOP 49 [Folder].[FolderUID] AS [FolderUID],
[Folder].[OwnerUID] AS [OwnerUID],
[Folder].[FolderType] AS [FolderType],
[Folder].[Reference] AS [Reference],
[Folder].[DisplayName] AS [DisplayName],
[Folder].[Description] AS [Description],
[Folder].[PublishDate] AS [PublishDate],
[Folder].[IsPublished] AS [IsPublished],
[Folder].[CreatedDate] AS [CreatedDate],
[Folder].[IsActive] AS [IsActive],
[Folder].[IsMarkedForDelete] AS [IsMarkedForDelete],
[Folder].[Timestamp] AS [Timestamp] 

FROM ( [Folder] 
INNER JOIN [FolderPermission] ON 
    [Folder].[FolderUID]=[FolderPermission].[FolderUID]) 
WHERE ( [FolderPermission].[GroupUID] 
    IN ('{7F04048E-64E6-49CC-88D3-E4C7F691512D}', 
 '{26D7182C-9964-40A4-A71B-6412DF936292}' , '{946580A4-1C5B-42C9-8C94-B8719BAC7CA8}',
'{3C12B7A6-DA67-4551-B169-5EB52E2376D5}') 
And [FolderPermission].[CanRead] = 1 
And NOT [Folder].[FolderUID] 
    IN (SELECT [FolderPermission].[FolderUID] AS [FolderUID] 
FROM [FolderPermission] 
WHERE ( [FolderPermission].[CanRead] = 2 
And [FolderPermission].[GroupUID] IN ('{7F04048E-64E6-49CC-88D3-E4C7F691512D}', 
 '{26D7182C-9964-40A4-A71B-6412DF936292}' , '{946580A4-1C5B-42C9-8C94-B8719BAC7CA8}',
'{3C12B7A6-DA67-4551-B169-5EB52E2376D5}')))) 

ORDER BY [Folder].[PublishDate] DESC;
1  /  2