Looking for Self Join help (Adapter)

Posts   
 
    
ScottCate
User
Posts: 48
Joined: 11-May-2005
# Posted on: 23-May-2005 08:31:27   

I have root directories (think file system) working, but I can't seem to load child categories in a single pass. I can get them one at a time, but I'm confident there is a better way.

Here's my setup.

dbo.Account Creates .. AccountEntity dbo.Directory Creates .. DirectoryEntity

2 replatioships.

dbo.AccountDirectory (AccountPK | DirectoryPK)

I can get the acccount + all the Account Directories.

Here's where I get lost.

In dbo.Directories, there is a ParentDirectory field. If it's null, that's a root node, and should be loaded from the account. What I need is a collection of Directories that start from the directories found in the account (and all of their recursive child directoires).

I can't find any samples in the docs about self joins.

Thanks for the help, or pointing to docs that I'm missing. (Please feel free to move this thread if I've posted in the wrong section)

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 23-May-2005 11:12:06   

ScottCate wrote:

In dbo.Directories, there is a ParentDirectory field. If it's null, that's a root node, and should be loaded from the account. What I need is a collection of Directories that start from the directories found in the account (and all of their recursive child directoires).

Hi,

Frans will correct me here if I'm wrong, but I don't believe there is any easy way to recurse self joins in LLBLGen. I have a similiar setup, a graph rather than a tree, but the requirement to recurse is the same. The best I could come up with was to minimise the number of queries required to recurse the tree to (n + 1) where n is the deepest level of the tree I need to recurse to.

In your example you would

1) Fetch all the accounts and prefetch all their root directories. 2) In your application, recurse using Foreach fetched account and foreach prefetched account directory: 3) Add the directoy's PK to a new ArrayList. Also add the directory entity to a new Hashtable using its PK as a key. 4) Create a new EntityCollection, and fill it with all child directories (irrespective of their parents) using the previously compiled ArrayList of parent directory PKs using a PredicateFactory.CompareRange predicate. 5) Foreach directory returned use it's parent PK to lookup the parentEntity from the Hashtable and manually add this child directory to parent entity's child collection 6) if the entitycollection.count > 0 Repeat 3

Its not for the faint hearted but it works with reasonable performance. Hope it helps! simple_smile

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39788
Joined: 17-Aug-2003
# Posted on: 23-May-2005 11:49:43   

In SQL it's not possible to fetch a hierarchy at once when the hierarchy is stored in a single table. You can use Marcus' approach, you can also use a precalculation table with some precalc code. In that table, you store for every leaf the path to the root (so leaf's parent, then that parent's parent etc.).

The precalc makes sure updates of teh parent field of a leaf updates the leaf's paths. It's not easy, but the result is you can fetch a complete graph in one select.

Example: I've a category table for my cms.:


CREATE TABLE [CESys_Categories] (
    [CategoryID] [int] IDENTITY (1, 1) NOT FOR REPLICATION  NOT NULL ,
    [Description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [ParentCategoryID] [int] NOT NULL ,
    [LastModifiedAt] [datetime] NOT NULL ,
    CONSTRAINT [PK_CESys_Categories] PRIMARY KEY  CLUSTERED 
    (
        [CategoryID]
    )  ON [PRIMARY] 
) ON [PRIMARY]

precalc table:


CREATE TABLE [CESys_CategoryParents_Precalc] (
    [CategoryID] [int] NOT NULL ,
    [ParentCategoryIDInPath] [int] NOT NULL ,
    CONSTRAINT [PK_CESys_CategoryParents_Precalc] PRIMARY KEY  CLUSTERED 
    (
        [CategoryID],
        [ParentCategoryIDInPath]
    )  ON [PRIMARY] 
) ON [PRIMARY]

When you delete a category, the precalc's have to be deleted as well of course.

Now the procedure I use (it's a stored proc based system, little old wink )


--------------------------------------------------------------------
-- Stored procedure that will insert in CESys_CategoryParents_Precalc all
-- parent CategoryID's of the given CategoryID. Consider category A. A has
-- 2 childs: B and C. B has a child D. The childs plus the root in this hierarchy are stored on
-- 'leafs' (pathnodes) on the path from outerleaf (here: D or C) to the root (here: A)
-- This would result in the following parentinpath table:
-- 
-- CategoryID | ParentCategoryIDInPath
---------------------------------------
--   A    |  0
--   B    |  A
--   C    |  A
--   D    |  B
--   D    |  A
--
-- This way, fast non recursive retrieval is possible of all categories in the 
-- hierarchical path of a given category. If an Item is assigned to category D,
-- it has category B and A also, because these categories are in the path.
--
-- Gets: iCategoryID int, ID of category which parent categories should be determined
-- Gets: iLeafCategoryID int, ID of category on the pathnode (leaf) on the path from iCategoryID to the root of
--                          the Category hierarchy. In the example above, this can be B or A for iCategoryID=D.
-- Returns: iErrorCode int, 0 if succeeded, > 0 if SQLserver error.
-- Uses recursion (calls itself)
--------------------------------------------------------------------
-- 08-aug-2001
--  [FB] First Version
--------------------------------------------------------------------
ALTER     PROCEDURE [pr_cesys_PrecalculateCategoryIDPathsPerCategory]
    @iCategoryID int,
    @iLeafCategoryID int,
    @iErrorCode int OUTPUT
AS
DECLARE @iParentCID int,
    @bLocalTrans bit
-- User is allowed to perform the action.
IF @@TRANCOUNT >0
BEGIN
    SAVE TRANSACTION transPCPPC
    SET @bLocalTrans=0
END
ELSE
BEGIN
    -- Start new transaction
    BEGIN TRANSACTION transPCPPC
    SET @bLocalTrans=1
END
-- Get the parent of the given Category.
SELECT @iParentCID=ParentCategoryID FROM CESys_Categories WHERE CategoryID=@iLeafCategoryID
SELECT @iErrorCode=@@ERROR
IF @iErrorCode<>0
    GOTO Handler
-- Check if the categoryID found is a parent
IF @iParentCID > 0
BEGIN
    -- has a parent. store parent as ParentCategoryIDInPath in CESys_CategoryParents_Precalc
    INSERT CESys_CategoryParents_Precalc (CategoryID, ParentCategoryIDInPath)
        VALUES (@iCategoryID, @iParentCID)
    SELECT @iErrorCode=@@ERROR
    IF @iErrorCode<>0
        GOTO Handler
    -- went ok. Dive into recursion. This will limit the amount of levels in a category hierarchie to 32.
    EXEC pr_cesys_PrecalculateCategoryIDPathsPerCategory @iCategoryID, @iParentCID, @iErrorCode OUTPUT
    IF @iErrorCode<>0
        GOTO Handler
END
ELSE
BEGIN
    -- No parent. Insert a record with '0' as the parent ONLY if this routine
    INSERT CESys_CategoryParents_Precalc (CategoryID, ParentCategoryIDInPath)
        VALUES (@iCategoryID, 0)
    SELECT @iErrorCode=@@ERROR
    IF @iErrorCode<>0
        GOTO Handler
END
-- Done
IF @bLocalTrans=1
BEGIN
    COMMIT TRANSACTION transPCPPC
END
SELECT @iErrorCode=0
RETURN
Handler:
    -- iErrorCode has to have a value
    ROLLBACK TRANSACTION transPCPPC
    RETURN

This is doable in your own code as well without a problem (in C#/VB.NET with llblgen pro code) So in the rare cases the category gets updated (i.e. changes graph position), this routine is ran. The selects of the graph is very fast, just a join and a silly sort simple_smile

Frans Bouma | Lead developer LLBLGen Pro
psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 23-May-2005 16:13:58   

Hey,

Reading this thread is making me wonder if we are using the right approach. The current project we are working on has a lot of hierarchy tables in it. They are all single table self-joins.

We currently use a recursive prefetch path to fetch our self-referencing entity graph. To get the parent nodes, I use a dynamic typed list to grab the PK (or PK list) and create a predicate with it. Then I grab the base entity (or collection of base entities) with the predicate and prefetch path.

This probably leads to a lot of queries that return nothing, which for our purposes probably isn't that big a deal. It does avoid the loop of "read from database => process in code => read from the database => process in code". I think we are running more queries, but they all happen via the same connection and at the same time vs. doing a lot of "chatting" back and forth.

I'm not sure if that's a good tradeoff.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 23-May-2005 17:30:27   

If you can use a precaculated table as Frans describes then there is no better alternative... You may find however that you have some other criteria for recursion (permissions for example) which makes precaculation much more complex if not impossible.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39788
Joined: 17-Aug-2003
# Posted on: 23-May-2005 17:59:22   

You can also use an approach like CELKO describes [urldescription="here"]http://groups-beta.google.com/group/microsoft.public.sqlserver.programming/browse_frm/thread/a0971b548a1daa06/d94afd1d56858df4?q=tree+store+schema+author:CELKO&rnum=4&hl=en#d94afd1d56858df4[/url]

Frans Bouma | Lead developer LLBLGen Pro
Ian avatar
Ian
User
Posts: 511
Joined: 01-Apr-2005
# Posted on: 21-Oct-2005 23:14:24   

In your example you would

1) Fetch all the accounts and prefetch all their root directories. 2) In your application, recurse using Foreach fetched account and foreach prefetched account directory: 3) Add the directoy's PK to a new ArrayList. Also add the directory entity to a new Hashtable using its PK as a key. 4) Create a new EntityCollection, and fill it with all child directories (irrespective of their parents) using the previously compiled ArrayList of parent directory PKs using a PredicateFactory.CompareRange predicate. 5) Foreach directory returned use it's parent PK to lookup the parentEntity from the Hashtable and manually add this child directory to parent entity's child collection 6) if the entitycollection.count > 0 Repeat 3

Marcus, please could you post some simple code as I can't quite make out what you mean? Cheers, I.

alexdresko
User
Posts: 336
Joined: 08-Jun-2004
# Posted on: 21-Nov-2005 21:40:20   

Frans,

Couldn't you design LLBLGen to load an entire table's data into memory, wire up the self joins, and discard rows that aren't used?

I didn't realize until now that I might be shooting myself in the foot by using recursive self-joins...

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39788
Joined: 17-Aug-2003
# Posted on: 22-Nov-2005 09:58:38   

It's a possibility, though I think it's often not what people want as it might be you want do perform additional processing. But I see what you mean if you want to load a graph of objects which are all stored in a single table and which is now a bit of a pain to retrieve and setup (ok, not a true pain, but 'ome assembly is required')

Frans Bouma | Lead developer LLBLGen Pro
greenstone
User
Posts: 132
Joined: 20-Jun-2007
# Posted on: 11-Aug-2007 16:10:56   

I liked Joe Celko's approach...quite elegant. But for my application, the issues with Celko's approach skewed me back towards Otis' approach.

Here's a good article that has a discussion of the downsides of the Celko approach. http://www.sqlteam.com/article/more-trees-hierarchies-in-sql