Performance of Polymorphic fetches

Posts   
 
    
Posts: 30
Joined: 17-Sep-2006
# Posted on: 17-Sep-2006 11:38:47   

Hi there,

I make fairly extensive usage of a two-level hierarchy in my database (the supertype, "ContentItem", and then a load of subtypes such as "Article", "Code", "Book" etc). I saw the mention in the docs that in order to load the correct subtype, LLBLGen uses a join over all the possible subtypes. Given the number involved here, and the fact that I'm going to be very frequently wanting to fetch a content object based on the ID, and then find out what content type it is, I don't think using that join-everything behaviour is going to be a great idea. Also, looking at the joins that it is doing, it seems to be unnecessarily including the supertype (ContentItem) more than once....? This is what it looks like at the moment:

LEFT JOIN [df_ContentArticle] ON [df_ContentItem].[UniqueID]=[df_ContentArticle].[UniqueID]) LEFT JOIN [df_ContentBook] ON [df_ContentItem].[UniqueID]=[df_ContentBook].[UniqueID]) LEFT JOIN [df_ContentCode] ON [df_ContentItem].[UniqueID]=[df_ContentCode].[UniqueID]) LEFT JOIN [df_ContentEvent] ON [df_ContentItem].[UniqueID]=[df_ContentEvent].[UniqueID]) LEFT JOIN [df_ContentJob] ON [df_ContentItem].[UniqueID]=[df_ContentJob].[UniqueID]) LEFT JOIN [df_ContentOrganisation] ON
[df_ContentItem].[UniqueID]=[df_ContentOrganisation].[UniqueID]) LEFT JOIN [df_ContentProduct] ON [df_ContentItem].[UniqueID]=[df_ContentProduct].[UniqueID]) LEFT JOIN [df_ContentReview] ON [df_ContentItem].[UniqueID]=[df_ContentReview].[UniqueID]) LEFT JOIN [df_ContentSampleChapter] ON
[df_ContentItem].[UniqueID]=[df_ContentSampleChapter].[UniqueID]) LEFT JOIN [df_ContentUserGroup] ON [df_ContentItem].[UniqueID]=[df_ContentUserGroup].[UniqueID]) LEFT JOIN [df_ContentWebSite] ON [df_ContentItem].[UniqueID]=[df_ContentWebSite].[UniqueID]) WHERE ( ( [df_ContentItem].[UniqueID] = @UniqueId1))',N'@UniqueId1 int',@UniqueId1=1010

What do you think my best option would be? Call a seperate stored procedure to determine the content type first, and then a call to FetchEntity with the appropriate content type? Or write an SP that does all that and performs the appropriate join without a round-trip to the client?

I'd appreciate any advice you have!

Thanks

James

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39927
Joined: 17-Aug-2003
# Posted on: 17-Sep-2006 12:56:28   

james.crowley wrote:

Hi there,

I make fairly extensive usage of a two-level hierarchy in my database (the supertype, "ContentItem", and then a load of subtypes such as "Article", "Code", "Book" etc). I saw the mention in the docs that in order to load the correct subtype, LLBLGen uses a join over all the possible subtypes. Given the number involved here, and the fact that I'm going to be very frequently wanting to fetch a content object based on the ID, and then find out what content type it is, I don't think using that join-everything behaviour is going to be a great idea.

It's inevitable. How else would you be able to determine what the type is of a row? You can't, unless you have all fields of all the (direct) subtypes and check which ones are not null. The thing is: you can't do it differently as you have to decide what tables to join without looking at ContentItem's data, because the entity's data can be in 1, 2, maybe 10 other tables and it's unknown at fetch time in which ones.

This is caused by the fact that the relational model doesn't allow you to define inheritance in the model: with a ROW in ContentItem, you can't determine in a generic sql statement which tables to join, well , you can of course with a LEFT JOIN and a predicate, which is what the query does.

Also, looking at the joins that it is doing, it seems to be unnecessarily including the supertype (ContentItem) more than once....? This is what it looks like at the moment:

LEFT JOIN [df_ContentArticle] ON [df_ContentItem].[UniqueID]=[df_ContentArticle].[UniqueID]) LEFT JOIN [df_ContentBook] ON [df_ContentItem].[UniqueID]=[df_ContentBook].[UniqueID]) LEFT JOIN [df_ContentCode] ON [df_ContentItem].[UniqueID]=[df_ContentCode].[UniqueID]) LEFT JOIN [df_ContentEvent] ON [df_ContentItem].[UniqueID]=[df_ContentEvent].[UniqueID]) LEFT JOIN [df_ContentJob] ON [df_ContentItem].[UniqueID]=[df_ContentJob].[UniqueID]) LEFT JOIN [df_ContentOrganisation] ON
[df_ContentItem].[UniqueID]=[df_ContentOrganisation].[UniqueID]) LEFT JOIN [df_ContentProduct] ON [df_ContentItem].[UniqueID]=[df_ContentProduct].[UniqueID]) LEFT JOIN [df_ContentReview] ON [df_ContentItem].[UniqueID]=[df_ContentReview].[UniqueID]) LEFT JOIN [df_ContentSampleChapter] ON
[df_ContentItem].[UniqueID]=[df_ContentSampleChapter].[UniqueID]) LEFT JOIN [df_ContentUserGroup] ON [df_ContentItem].[UniqueID]=[df_ContentUserGroup].[UniqueID]) LEFT JOIN [df_ContentWebSite] ON [df_ContentItem].[UniqueID]=[df_ContentWebSite].[UniqueID]) WHERE ( ( [df_ContentItem].[UniqueID] = @UniqueId1))',N'@UniqueId1 int',@UniqueId1=1010

I don't see ContentItem being joined multiple times... It's in the predicates of the ON clauses a lot, to tell the RDBMS engine to join the data only if the row read from ContentItem is related to the row in the table joined, which is exactly what you want, because it lets the RDBMS engine figure out which rows to join from the subtype tables to the ContentItem rows, exactly what you want.

What do you think my best option would be? Call a seperate stored procedure to determine the content type first, and then a call to FetchEntity with the appropriate content type? Or write an SP that does all that and performs the appropriate join without a round-trip to the client?

I'd appreciate any advice you have!

In the inheritance documentation, a word of warning is mentioned (Pitfalls with inheritance), which hints you that target per entity can give performance problems if the inheritance tree has a lot of leafs. The performance dip is inenvitable, as the only way to fetch the data properly for ALL the subtypes in 1 set is to do a join. You can't create a single query which fetches all contentitems into their proper type instance without the joins you run into. SQL is a set-based language, so it's not imperatively traversing rows and on each row you're defining the decision what to join next. In fact, the RDBMS is doing this behind the scenes with clever RA, so I don't think it will be faster if you write it out yourself (no offence to you, but RDBMS-s are optimized for this kind of work).

In Conceptual Schema and Relational Database Design by Halpin/Nijssen, they discuss two ways of flattening an entity inheritance hierarchy: one with 1 table per entity type, and one with 1 table for a complete hierarchy branch. The complete hierarchy in a single table is faster, but has drawbacks too.

It also depends on what exactly is stored in the ContentItem fields: is this a lot of data or not.

With inheritance, LLBLGen Pro always fetches the rows fetched as the type they really represent, as that's what you want: if you tell LLBLGen Pro to fetch a collection of ContentItem entities, you'll get the entities in their actual type, like Article, Book etc. If you just want the ContentItem fields, use a dyn. list or typed list, eventually with a projection onto ContentItem entity instances directly (V2.0) simple_smile

Frans Bouma | Lead developer LLBLGen Pro
Posts: 30
Joined: 17-Sep-2006
# Posted on: 17-Sep-2006 13:49:34   

Otis wrote:

This is caused by the fact that the relational model doesn't allow you to define inheritance in the model: with a ROW in ContentItem, you can't determine in a generic sql statement which tables to join, well , you can of course with a LEFT JOIN and a predicate, which is what the query does.

Ah, sorry - I should have been clearer. The primary situation in which I'm going to be fetching ContentItems and actually care about the additional data is when there is only a single row involved, matching a specific ID; hence the comment about just looking up the Content type in a stored procedure and performing the (single) select with appropriate joins then.

I don't see ContentItem being joined multiple times...

You're quite right. The wrapping of the lines screwed with my head smile

If you just want the ContentItem fields, use a dyn. list or typed list, eventually with a projection onto ContentItem entity instances directly (V2.0) simple_smile

Could you give me a quick example of how to do that? Thanks very much!

James

Walaa avatar
Walaa
Support Team
Posts: 14995
Joined: 21-Aug-2005
# Posted on: 18-Sep-2006 07:06:17   

Please refer to LLBLGen Pro manual for the following titles: "Using the generated code -> Adapter/SelfServicing -> Using TypedViews, TypedLists and Dynamic Lists" "Using the generated code -> Adapter/SelfServicing ->Fetching DataReaders and projections"

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39927
Joined: 17-Aug-2003
# Posted on: 18-Sep-2006 10:15:39   

You could add a surrogate discriminator column to ContentItem, one which isn't used by LLBLGen Pro, but is used by you. (you've to set it to the proper value in an override of OnInitialized in the entity classes of course). You then first fetch it with a scalar query, and use it then to fetch the data using a dynamic list which you project onto an entity class.

Frans Bouma | Lead developer LLBLGen Pro