HnD | PrefetchPath with maxNumberOfItemsToReturn, Page 1

PrefetchPath with maxNumberOfItemsToReturn

Posts

crisu01
User

Posts: 4
Joined: 01-Jun-2004

# Posted on: 25-Nov-2004 14:18:56

Im fetching a EntityCollection using PrefetchPath and only want the first 500 or so items from over a million. I know that's not good policy, but that's the kind of searches my users want. Using maxNumberOfItemsToReturn I get the first 500 items with no problem and the generated SQL looks like:

SELECT * FROM (SELECT DISTINCT ... FROM "SENDEJOURNAL" WHERE ( ( "SENDEJOURNAL"."KANAL" = :KanalId1) And ( "SENDEJOURNAL"."BEGINN" >= :Beginn2) ORDER BY "SENDEJOURNAL"."BEGINN" ASC) WHERE rownum <= 500

Now I have to fetch rather a lot of related entities for each item in the resultset and PrefetchPath would be wonderful. But PrefetchPath obviously ignores MaxResults and Sorter for the fetching of the related items and the generated SQL looks like:

SELECT ... FROM "PRODFOLGE" WHERE ( "PRODFOLGE"."ID" IN (SELECT "SENDEJOURNAL"."FOLGE_ID" AS "FolgeId" FROM  "PRODFOLGE", "SENDEJOURNAL" WHERE  "PRODFOLGE"."ID"(+)="SENDEJOURNAL"."FOLGE_ID" AND ( ("SENDEJOURNAL"."KANAL" = :KanalId1) And ( "SENDEJOURNAL"."BEGINN" >= :Beginn2) )

and

SELECT ... FROM "PRODSTAMM" WHERE ( "PRODSTAMM"."ID" IN (SELECT "PRODFOLGE"."STAMM_ID" AS "StammId" FROM "PRODFOLGE" WHERE ( "PRODFOLGE"."ID" IN (SELECT "SENDEJOURNAL"."FOLGE_ID" AS "FolgeId" FROM  "SENDEJOURNAL" WHERE  ( ( "SENDEJOURNAL"."KANAL" = :KanalId1) And ( "SENDEJOURNAL"."BEGINN" >= :Beginn2) ) ) )

More or less the SQL-Statement for fetching from SENDEJOURNAL should be in the subquery for the IN-Predicate in the other queries. I know that's not the most performable way known to man.

I have looked into the code of the ORMSupportClasses and when FetchEntityCollection is called with MaxItems and Sorter, FetchPrefetchPath is also called with MaxItems and a Sorter. But these parameters are not used.

Now I have tried something rather quick and very dirty and changed the code in FetchPrefetchPath like:

if(currentElement.Relation.StartEntityIsPkSide)
{
  rootEntitiesArePkSide = true;

  // root objects are PK side, so the subquery will contain the PK fields,
  // and the entities to fetch the FK fields
  if (j == 0)
    elementFilter.PredicateExpression.Add(new FieldCompareSetPredicate(
      currentElement.Relation.GetFKEntityFieldCore(j), null, 
      currentElement.Relation.GetPKEntityFieldCore(j), null, SetOperator.In,
      elementFilterSubQueryFilter, elementFilterSubQueryRelations,
      ((EntityRelation)currentElement.Relation).AliasFKSide,
      maxNumberOfItemsToReturn, sortClauses));
    else
      elementFilter.PredicateExpression.Add(new FieldCompareSetPredicate(
        currentElement.Relation.GetFKEntityFieldCore(j), null, 
        currentElement.Relation.GetPKEntityFieldCore(j), null, SetOperator.In,
        elementFilterSubQueryFilter, elementFilterSubQueryRelations,
        ((EntityRelation)currentElement.Relation).AliasFKSide, 0, null));
}

Hopefully injecting the missing statements into the Expression. But I was stopped short from trying it out because I could not rebuild the OracleDQE dll for the changed ORMSupportClasses dll due to lack of the DDTek.Oracle assembly.

So is there anything in the pipeline regarding FetchEntityCollection using PrefetchPath and MaxItems? It would be very helpful for me in easing a performance bottleneck. For the protocol I have also looked into using a typed list. But unfortunatedly thats not possible because in typed lists I am not able to join to another entity more than once.

Thanks

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 25-Nov-2004 15:03:55

(you can build an ODP.NET build for oracle by commenting out the build commands for 10g and datadirect in make11.cmd in the OracleDQE folder)

What you suggested doesn't work, that's why it's not implemented this way. I'll illustrate it using an example.

Say I fetch 20 customer objects and for each customer object I want 10 order objects. So I have to fetch 200 order objects. However when I specify a limit, I get 10 order objects, as the TOP in the query for the orders will limit my resultset to 10, not 200.

It will fetch the data now and will limit, per customer the orders stored. It will throw away the rest of the data.

This isn't the most efficient way of doing it. In fact, when I wrote this, I thought of adding the maxfilter (amountRootElements * maxChildsPerRoot) which will at least limit the amount of elements fetched), which would help greatly. I'll add that later today. (the total amount of objects in a path node is never more than the max limit per child * the amount of root elements, only equal or slightly less, which is a marginal issue)

As prefetch paths are build on top of the actual fetching framework, it doesn't have an open datareader it can walk through and stop when a limit is reached.

I've toyed with adding subquery SQL into the query to add a top per row but that was very slow (as in: seconds slower for 100 rows or so) so I skipped that.

I hope that limit I suggested above will work out. I'll let you know when I've it ready for testing.

(edit): the sorter is applied to the complete fetch of the node, not to the subquery, as the sorter doesn't matter in the subquery (it's an IN query after all).

I'll now test the change, and will mail you a new batch afterwards.

Frans Bouma | Lead developer LLBLGen Pro

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 25-Nov-2004 15:51:21

~~Ok, it seems to work

I've mailed you a new runtime libraries archive (to the address you registered with on this forum). It's a .zip archive, if you don't receive it, please let me know.

I couldn't believe it was so simple. I spend hours and hours trying to get this to work, but the most simple solution never occured to me . ~~

(edit) Hmm. it doesn't work as expected....

Frans Bouma | Lead developer LLBLGen Pro

crisu01
User

Posts: 4
Joined: 01-Jun-2004

# Posted on: 25-Nov-2004 16:04:10

Hmmm. I've thought about something like:

SELECT * FROM "PRODFOLGE" WHERE ("PRODFOLGE"."ID" IN (SELECT "FOLGE_ID" FROM (SELECT DISTINCT "SENDEJOURNAL"."FOLGE_ID", "SENDEJOURNAL"."BEGINN" FROM "SENDEJOURNAL" WHERE ("SENDEJOURNAL"."KANAL" = 1) ORDER BY "SENDEJOURNAL"."BEGINN" ASC) WHERE rownum <= 500)));

Where the original query is (more or less) inserted into the innermost IN clause. Of course this is not very fast. But a lot faster than fetching 500.000 lines from PRODFOLGE when only 500 are necessary.

This isn't the most efficient way of doing it. In fact, when I wrote this, I thought of adding the maxfilter (amountRootElements * maxChildsPerRoot) which will at least limit the amount of elements fetched), which would help greatly. I'll add that later today. (the total amount of objects in a path node is never more than the max limit per child * the amount of root elements, only equal or slightly less, which is a marginal issue)

I dont know if it is feasible to limit the amount of elements fetched especially in combination with a sorter and maxItems. Say I want the first 50 customers starting with "C" in alphabetical order and all orders for these customers. Currently using PrefetchPaths all orders would be fetched (could be a lot!). Even saying that I only want the first 10 orders of these customers and fetching only 500 orders would not help because there is no way that the right orders for the selected customers are fetched because sorting is also ignored.

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 25-Nov-2004 16:48:33

(lots of text deleted)

I begin to realize it isn't solvable. I'll check it out more further. I did apply the limit to the wrong query, so you can throw away the dlls I've send you.

The limit I wanted to add would result in (for 10 orders for 20 customers) (SELECT * FROM (SELECT * from ORDERS WHERE CustomerID IN (SELECT CustomerID FROM Customers WHERE... ) WHERE rownum <=200);

but that doesn't give correct results of course. If customer 1 has 2000 orders, only orders of customer1 are returned.

Frans Bouma | Lead developer LLBLGen Pro

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 25-Nov-2004 18:21:39

ps: you could create a dyn. list to overcome the bare performance bottleneck, if that really is the problem. See the typedlist documentation for an example how to construct and fetch a dyn. list, which can bridge the time until this is addressed (both hte multi-join in typed list and the prefetch path issue).

How is the class graph layout of the graph you want to fetch, as that can help me understand the query you send (i.e. which objects have which relations with eachother). At the moment the 'fastest' query I have on a tiny northwind db to fetch the last 4 orders for all customers in france is


-- select last 4 orders of each customer in france.
SELECT  *
FROM    Orders O 
WHERE   CustomerID IN
    (
        SELECT  CustomerID
        FROM    Customers
        WHERE   Country ='France'
    )
    AND 
    OrderID IN
    (
        SELECT  TOP 4 OrderID
        FROM    Orders
        WHERE   CustomerID IN
            (
                SELECT  CustomerID
                FROM    Customers 
                WHERE   Country ='France'
            )
            AND CustomerID = O.CustomerID
        ORDER BY
            OrderDate DESC
    )
ORDER BY
    CustomerID ASC

Which takes almost a second to execute on a dual proc box, however is constructable with the current DQE's.

I don't see any speed increase options anywhere, the join variant was much slower and took over 7 seconds... unacceptable.

(ps: removing the first subquery seems logical, but confuses the optimizer of sqlserver a lot, the query will then take over 7 seconds!)

Frans Bouma | Lead developer LLBLGen Pro