Index Tuning - Choosing a Clustered Index Column

alexdresko
User
Posts: 336
Joined: 08-Jun-2004
# Posted on: 05-Jun-2005 00:23:34   

This thread makes me feel stupid. simple_smile Is there a good book or something I can read to get me up to speed on some of this high level stuff? OR, since this is stuff that I don't normally have to deal with, should I just pretend there are things I'll never understand? simple_smile

Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 05-Jun-2005 14:10:25   

Devildog74 wrote:

I ran some samples on my pc over the weekend, and I am not sure how similar they will be to the results that you are finding.

Thanks VERY much for looking at this in so much detail... smile I really appreciate your input!

I know this thread was all about choosing which column(s) to use for the clustered index, but your last post has produced some very positive improvements in my read performance as well! smile

I didn't fully appreciate the significance of index fragmentation and have now defragged the entire DB. Some of my tables have > 2 million rows and indexes were HEAVILY fragmented... flushed

There's a handy little script in BOL under DBCC SHOWCONTIG which defrags the entire DB. When it completed (after a long time), IO performance had improved dramatically.
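
The gist of it is something like this (a simplified sketch from memory, not the exact BOL script -- the real one checks fragmentation levels before defragging, so use that one in production and try this on a test database first):

-- Walk every index on every user table and defragment it.
DECLARE @TableID int, @IndexID int
DECLARE IndexCursor CURSOR FOR
    SELECT o.id, i.indid
    FROM sysobjects o
    JOIN sysindexes i ON i.id = o.id
    WHERE o.type = 'U'                            -- user tables only
      AND i.indid BETWEEN 1 AND 254               -- skip heaps and BLOB rows
      AND INDEXPROPERTY(o.id, i.name, 'IsStatistics') = 0
OPEN IndexCursor
FETCH NEXT FROM IndexCursor INTO @TableID, @IndexID
WHILE @@FETCH_STATUS = 0
BEGIN
    DBCC INDEXDEFRAG (0, @TableID, @IndexID)      -- 0 = current database
    FETCH NEXT FROM IndexCursor INTO @TableID, @IndexID
END
CLOSE IndexCursor
DEALLOCATE IndexCursor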

Devildog74 wrote:

Having the PK as you have it on the folderpermission table will force data to move on every insert. In my tables, the data behaves like a heap, i.e. all data always goes at the end of the file and the secondary indexes are updated. Because the secondary indexes are small, SQL doesn't need to work as hard to place the inserted record.

I tested adding a clustered index based on the <TableName>Key identity field as you described, and insert performance did improve (20%). I'm not sure that this figure is sufficient to justify the negative effect on read performance (queries fetching entity collections), which in general is more important.
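
For anyone wanting to try the same test, the change amounts to roughly this (table, column and constraint names are made up for illustration; substitute your own):

-- Recreate the Guid PK as nonclustered and cluster on the identity
-- surrogate instead, so inserts always append at the end.
ALTER TABLE FolderPermission DROP CONSTRAINT PK_FolderPermission

-- Build the clustered index first so the nonclustered PK below is
-- only built once against the final clustering key.
CREATE UNIQUE CLUSTERED INDEX CIX_FolderPermission
    ON FolderPermission (FolderPermissionKey)        -- the identity

ALTER TABLE FolderPermission ADD CONSTRAINT PK_FolderPermission
    PRIMARY KEY NONCLUSTERED (FolderPermissionID)    -- the Guid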

The real answer to ALL these problems is to remove the Guids as PKs altogether and replace them with int or bigint keys. I can then add a mapping table which maps between the Guid keys (which are used outside the application for uniqueness between sites) and the int keys. The nice thing about this, and something that I previously overlooked, is that the Guid -> int mapping is immutable and can therefore be safely cached by all application servers, reducing the need for extra lookups.
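
Something like this, to sketch the idea (all names hypothetical):

-- Guid -> int mapping table. The mapping is write-once, so
-- application servers can cache it indefinitely.
CREATE TABLE FolderKeyMap
(
    FolderKey   int IDENTITY(1, 1)   NOT NULL,  -- internal surrogate key
    FolderGuid  uniqueidentifier     NOT NULL,  -- external, cross-site id
    CONSTRAINT PK_FolderKeyMap PRIMARY KEY CLUSTERED (FolderKey),
    CONSTRAINT UQ_FolderKeyMap_Guid UNIQUE (FolderGuid)
)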

I am going to consider making this change for a future version of the application as this will not only solve the INSERT performance problems but will also make JOINs faster...

Devildog74 wrote:

I also found one very interesting tidbit. If I run the query below without the ORDER BY, the response comes back instantly and the execution plan is totally different: all steps use index scans and bookmark lookups. With the ORDER BY clause left out of the query, all steps have IO less than 0.007, which is ideal, compared to 0.5. If I leave the ORDER BY clause in the T-SQL, the statement takes 2 seconds to execute. When I examine the sort step of the execution plan with the ORDER BY in the statement, its IO is 25.0, which is really, really bad.

Very interesting simple_smile ... If you remove the DISTINCT clause, the plans are then as you would expect...
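
For anyone wanting to reproduce this kind of comparison, wrapping the query like this shows the IO and timings (use your own query; ours isn't shown here):

SET STATISTICS IO ON    -- logical/physical reads per table
SET STATISTICS TIME ON  -- parse/compile and execution times

-- ... run the query here, once with and once without the
-- ORDER BY / DISTINCT, and compare the output (plus the
-- graphical plan in Query Analyzer) ...

SET STATISTICS IO OFF
SET STATISTICS TIME OFF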

Thanks everyone for your input smile

Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 05-Jun-2005 14:51:26   

alexdresko wrote:

This thread makes me feel stupid. simple_smile Is there a good book or something I can read to get me up to speed on some of this high level stuff? OR, since this is stuff that I don't normally have to deal with, should I just pretend there are things I'll never understand? simple_smile

Most SQL Server books have a section on performance tuning, but as you can see from this discussion, attitudes and guidance vary.

I think it does absolutely no harm to understand the implications that some of your development design choices have on the final production system. My choice of Guids as PKs is a good example of something that led to problems further down the line...

JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 05-Jun-2005 18:35:09   

Damn, DevilDog, you put the rest of us to shame!

Marcus wrote:

I didn't fully appreciate the significance of index fragmentation and have now defragged the entire DB. Some of my tables have > 2 million rows and indexes were HEAVILY fragmented... flushed

There's a handy little script in BOL under DBCC SHOWCONTIG which defrags the entire DB. When it completed (after a long time), IO performance had improved dramatically.

Yeah, fragmentation can be a real performance drag. See my earlier suggestion that after changing your PK you would probably want to defrag the entire thing, since it had probably gotten pretty bad under the prior indexing. I didn't know about that script; I wrote a custom one based on Sajal Dam's book, which I tweaked for some reason (I don't remember why now).

Marcus wrote:

The real answer to ALL these problems is to remove the Guids as PKs altogether and replace them with int or bigint keys. I can then add a mapping table which maps between the Guid keys (which are used outside the application for uniqueness between sites) and the int keys.

I have a situation now where I have to have an external database with some data replicated from Microsoft CRM, which uses GUIDs everywhere. In effect I'm doing the same thing as you: using my own int PKs and keeping the GUIDs as (undeclared) FKs. Again... I wouldn't want to use a GUID as a PK unless it was absolutely necessary. But the relational model doesn't really restrict how many keys a table can have!

JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 06-Jun-2005 01:58:22   

Sajal says that the "hot spot" problem is pretty much a thing of the past, partly because of changes Microsoft made after 6.5, and partly because modern hardware (caching and disk arrays) means that inserts are taking place in multiple physical locations and times anyway.

Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 06-Jun-2005 10:12:44   

JimFoye wrote:

Damn, DevilDog, you put the rest of us to shame!

I don't know if I would say that. I just hope I was able to help.

It isn't very often that I get to geek out on something interesting and technical, and besides, I was trying to put off writing the code that I was "supposed" to be writing over the weekend, so helping Marcus out was a refreshing break.

jtgooding
User
Posts: 126
Joined: 26-Apr-2004
# Posted on: 06-Jun-2005 14:41:19   

I love topics like this; they are usually good at pointing out that no one rule fits all scenarios. Even between my last company and my current one, the differences in hardware and table sizes alone require an entirely different set of rules.

In my last company we would hit a million or two rows on our largest customer systems, but we ran on what I would consider minimal hardware: single CPU, RAID 5, etc.

At my current company we have 8 terabytes of data on a fiber SAN, 8-way CPUs, 8 GB of RAM per box, clustered. I think we have 4 setups like this serving around 2,000 people over something like 64 T1s.

Our system caches so much data and the SAN is so efficient that even full table scans aren't that painful. Mind you, the director would kick my ass if I started doing that, so obviously we don't =P

In my view, insert performance is hardly noticeable no matter what index we use, unless you are working with underpowered hardware for the size of the DB. We choose the clustering key by analyzing the most common queries, so that the desired results end up on the same pages. In this case it is almost always the identity column, or a multi-column key including a date (see the sketch below).

This greatly reduces the number of page reads and how much cache each table consumes, allowing more of the desired data (versus historical data) to be cached across our entire system of 1700-ish tables.
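
As a sketch, that kind of multi-column key looks like this (table and column names invented for illustration):

-- Cluster on a multi-column key including a date so the most
-- commonly queried rows sit together on the same pages.
CREATE CLUSTERED INDEX CIX_Transactions
    ON Transactions (AccountID, TransactionDate)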

Our greatest issue is always blocking, especially with so many users located remotely. Without NOLOCK on queries, while SQL Server is spooling requested rows it appears to block until the rows are actually sent to the user. I'm not sure if it is supposed to behave like this, but it appears to, especially with (shiver) Crystal Reports.
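
For reference, the hint is just this (table name invented), bearing in mind that NOLOCK can return uncommitted ("dirty") rows:

-- WITH (NOLOCK) reads without taking shared locks, so the query
-- neither blocks writers nor waits on them -- at the cost of
-- possibly reading uncommitted data.
SELECT r.ReportID, r.Status
FROM Reports r WITH (NOLOCK)
WHERE r.CreatedDate >= '20050601'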

Well, enough rambling. I'm just saying that there is no one cure-all for performance, especially given differences in hardware resources, user demands, etc. The best option is to build a test database, profile it under various loads, and find a solution that works for you in your environment.

John

Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 06-Jun-2005 19:19:09   

jtgooding wrote:

The best option is to build a test database, profile it under various loads, and find a solution that works for you in your environment.

Agreed, SQL Profiler is a very powerful and helpful tool.

It's funny that you mention RAID 5. I used to think it was fast until my company purchased a new DB server that services 500 concurrent users. It's got an internal RAID 1 on one channel, a RAID 1/0 on the second channel, and a second U320 RAID controller running another RAID 1/0 array on a direct attached storage device.

Failover wasn't a priority due to our SLAs, so we didn't cluster or anything like that. I ended up putting data files only on one of the RAID 1/0s, tempdb secondary filegroups and index files on the other RAID 1/0, and all transaction logs on the RAID 1. It screams and does a lot of parallel reads.
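
Roughly sketched, the layout is this (database name, paths and sizes are invented; only the filegroup placement matters):

CREATE DATABASE AppDB
ON PRIMARY
    (NAME = AppDB_Data,  FILENAME = 'E:\Data\AppDB_Data.mdf',   SIZE = 20GB),  -- RAID 1/0 #1
FILEGROUP IndexFG
    (NAME = AppDB_Index, FILENAME = 'F:\Index\AppDB_Index.ndf', SIZE = 10GB)   -- RAID 1/0 #2
LOG ON
    (NAME = AppDB_Log,   FILENAME = 'G:\Log\AppDB_Log.ldf',     SIZE = 4GB)    -- RAID 1

-- Indexes then go on the second array explicitly, e.g.:
-- CREATE INDEX IX_Foo ON Foo (Bar) ON IndexFG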

Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 07-Jun-2005 10:36:02   

And just when you thought this thread was coming to a close... Kimberly steps up to the mark: http://www.sqlskills.com/blogs/kimberly/PermaLink.aspx?guid=bdaee3f7-1e15-414b-b75f-a290db645159

Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 07-Jun-2005 13:42:32   

It's a pretty good article. I think the key to the entire article is when she says "base-table fragmentation"; this refers to the pages of the data table itself. I don't agree that a 16-byte GUID is narrow compared to a 4-byte int or an 8-byte bigint.

But for the most part, I think the recommendations in the article are right on. If your root / base pages are fragmented and out of order, all indexes built on top of those pages have to work that much harder to get at the physical data in the base pages.

I don't have a lot of experience with extended procedures, but I would be leery of using one just to get a sequential GUID; that naivety probably just stems from my lack of experience with extended procs.
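
For what it's worth, SQL Server 2005 is adding a built-in alternative, NEWSEQUENTIALID(), which sidesteps the extended proc entirely; it can only be used as a column default:

-- NEWSEQUENTIALID() generates ever-increasing GUIDs (valid only as
-- a column default). Table and column names are invented.
CREATE TABLE Folder
(
    FolderID uniqueidentifier NOT NULL
        CONSTRAINT DF_Folder_FolderID DEFAULT NEWSEQUENTIALID(),
    CONSTRAINT PK_Folder PRIMARY KEY CLUSTERED (FolderID)
)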
