LINQ prefetch path does not stop at level 1

ITFromBit
User
Posts: 8
Joined: 19-Feb-2021
# Posted on: 29-Jan-2022 16:55:38   

Hi

I have two tables, job and category.

job and category have an m:1 relationship, so each job has a category_id.

Now I would like to query job with a PrefetchPath to category.

(from j in md.Job select j).WithPath(new PathEdge<CategoryEntity>(JobEntity.PrefetchPathCategory)).ToList();

This query retrieves all jobs with the associated category object (so far so good) - but in the category object the jobs collection (which lists all jobs for the given category) is also populated - for each job I get a full list of the same jobs under the category node.

I did not ask LLBLGen to follow this path; that is to say, I did not specify a subpath for category -> jobs.

BTW: QueryFactory does exactly the same thing.

What am I doing wrong here?

Best M

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 30-Jan-2022 10:01:19   

That's by design, but it's not as bad as you think: it executes 2 queries, not 3.

If you do:

myJob.Category = myCategory;

then this is also true:

myCategory.Jobs.Contains(myJob);

So when you specify the prefetch path job-category, it'll fetch for each job its related category, and when it merges the sets, it'll add the related job objects (from the first query) to the Jobs collection of each category.
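A minimal sketch of the two-way sync described above. The classes are hand-written stand-ins, not the code LLBLGen Pro generates; the class shapes and the setter logic are assumptions made only to illustrate why setting the m:1 side also fills the 1:n side.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-ins for the generated entity classes; NOT LLBLGen Pro code.
public class Category
{
    public int Id { get; set; }

    // 1:n side: filled as a side effect of setting Job.Category.
    public List<Job> Jobs { get; } = new List<Job>();
}

public class Job
{
    private Category? _category;

    public int Id { get; set; }

    // m:1 side: assigning it keeps the collection on the other side in sync.
    public Category? Category
    {
        get => _category;
        set
        {
            if (ReferenceEquals(_category, value)) return;

            _category?.Jobs.Remove(this);      // leave the old category's collection
            _category = value;

            if (value != null && !value.Jobs.Contains(this))
            {
                value.Jobs.Add(this);          // join the new category's collection
            }
        }
    }
}

public static class SyncDemo
{
    // Returns true when the assignment on the job side is visible from the category side.
    public static bool Run()
    {
        var myCategory = new Category { Id = 1 };
        var myJob = new Job { Id = 10 };

        myJob.Category = myCategory;

        return myCategory.Jobs.Contains(myJob);
    }
}
```

In this sketch the same Job object sits in both places; nothing is copied, the collection only holds a reference to it.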

If you don't want this, e.g. because you keep Category in memory and therefore it keeps the Jobs in memory too, you should make the Category-Jobs relationship hidden (delete the navigator in the designer). This is a global action, so if you use the navigator Category.Jobs in your queries, you should rewrite those to use the Job.Category navigator.

Frans Bouma | Lead developer LLBLGen Pro
ITFromBit
# Posted on: 30-Jan-2022 11:33:23   

Aah, I see.

Well, it is bad if the entity is serialized for some reason. And it comes totally unexpectedly, because that is what prefetch paths are there for: to explicitly specify which sub-entities I would like to include. So LLBLGen does something on its own that contradicts the idea of prefetch paths, in my opinion.

If I fetch 500 jobs I would expect to get 1000 entities: 500 job entities and 500 category entities. Now I am getting 251.000 entities, because in addition to the 1000 entities I want, I also get 500*500 times the job entity, and these are the very job entities that I queried in the first place, so I get 250.000 job entities that I already have. But since those are only object references (I presume?) it should, at least, not be a memory issue.

And, BTW, the data is also wrong: the jobs collection under the category only contains the 500 jobs, which is only a subset (the 500 jobs selected in the first place).

Nonetheless, I would have expected that I need to add a subpath to get exactly that, from category -> job using the category2job 1:m navigator.

Removing the navigator will solve the problem, but it also eliminates the ability to query the category and add a subpath to the jobs, as you mention.

I really wish I could configure this default merging behaviour. However, for now I will remove the navigator. Thank you for your support and your excellent ORM.

Otis
# Posted on: 31-Jan-2022 09:45:51   

ITFromBit wrote:

Aah, I see.

Well, it is bad if the entity is serialized for some reason. And it comes totally unexpectedly, because that is what prefetch paths are there for: to explicitly specify which sub-entities I would like to include. So LLBLGen does something on its own that contradicts the idea of prefetch paths, in my opinion.

Prefetch paths are about fetching related entities. So if you fetch set A and its related set B (that is, the B instances related to the A instances matching the predicates that define set A), then A is still related to B from B's point of view. That's why both sides of the relationship are kept in sync. Our framework has done this since its beginning in 2003. The result set doesn't show how it was fetched: it's a fetched set, with related entities, which you can navigate from both sides. In our opinion it's weird if you pick a Category in your fetched set and can't navigate to the related Job instances from the fetch, while from the other side you can.

ITFromBit wrote:

If I fetch 500 jobs I would expect to get 1000 entities: 500 job entities and 500 category entities. Now I am getting 251.000 entities, because in addition to the 1000 entities I want, I also get 500*500 times the job entity, and these are the very job entities that I queried in the first place, so I get 250.000 job entities that I already have. But since those are only object references (I presume?) it should, at least, not be a memory issue.

You get 500 job entities in memory, as it executes 2 queries, one for jobs and one for categories, and merges the instances in memory. It doesn't copy job instances, it references them. It's a graph.
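To make the "references, not copies" point concrete, here is a small sketch that imitates merging two fetched sets in memory and then counts distinct object instances. The classes, the `Run` helper and the way jobs are spread over categories are all hypothetical; this is not how LLBLGen Pro implements the merge, only an illustration of the resulting graph.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for fetched entities; NOT LLBLGen Pro classes.
public class Category
{
    public int Id;
    public List<Job> Jobs = new List<Job>();
}

public class Job
{
    public int Id;
    public int CategoryId;
    public Category? Category;
}

public static class MergeDemo
{
    public static (int DistinctJobs, int DistinctCategories, int JobReferences) Run(int jobCount, int categoryCount)
    {
        // "Query 1": the jobs. "Query 2": the categories those jobs point at.
        var jobs = Enumerable.Range(1, jobCount)
            .Select(i => new Job { Id = i, CategoryId = (i % categoryCount) + 1 })
            .ToList();
        var categories = Enumerable.Range(1, categoryCount)
            .Select(i => new Category { Id = i })
            .ToDictionary(c => c.Id);

        // In-memory merge: link both sides by foreign key. No job is copied.
        foreach (var job in jobs)
        {
            var category = categories[job.CategoryId];
            job.Category = category;
            category.Jobs.Add(job);
        }

        // Neither class overrides Equals, so HashSet compares by reference.
        var distinctJobs = new HashSet<Job>(categories.Values.SelectMany(c => c.Jobs)).Count;
        var distinctCategories = new HashSet<Category>(jobs.Select(j => j.Category!)).Count;
        var jobReferences = categories.Values.Sum(c => c.Jobs.Count);

        return (distinctJobs, distinctCategories, jobReferences);
    }
}
```

Calling `MergeDemo.Run(500, 5)` yields 500 distinct job objects, 5 distinct category objects, and 500 job references held by the category collections in total. The large numbers only show up when something walks the graph as if it were a tree, for example a serializer without reference handling.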

ITFromBit wrote:

And, BTW, the data is also wrong: the jobs collection under the category only contains the 500 jobs, which is only a subset (the 500 jobs selected in the first place).

Yes, because it's a fetched set, not a view on the database. The data therefore isn't wrong at all. The concept is exactly the same as when you'd fetch a category and some jobs based on a predicate. That too would contain fewer job instances in memory than there are in the database. That doesn't mean the data is 'wrong', it means your set has a subset of the data in the database. That's always the case with fetched sets. You can never assume a fetched set in memory represents the application state; it's always a subset of that, materialized based on a set of predicates.

ITFromBit wrote:

Nonetheless, I would have expected that I need to add a subpath to get exactly that, from category -> job using the category2job 1:m navigator.

If you want to fetch categories and all their jobs, then yes. You fetch jobs and their related categories. Jobs that aren't in the jobs set aren't in memory at that moment.

ITFromBit wrote:

Removing the navigator will solve the problem, but it also eliminates the ability to query the category and add a subpath to the jobs, as you mention. I really wish I could configure this default merging behaviour. However, for now I will remove the navigator. Thank you for your support and your excellent ORM.

What's the downside for you of the current system? You can navigate the fetched graph (which can never be seen as a real life view on the database!) from either entity.

ITFromBit
# Posted on: 31-Jan-2022 09:51:58   

No, that is perfect. One just needs to know. My opinion is: if I want to see the reverse relationship, back from category to job, all I need to do is add a prefetch path and do it explicitly. Different views here, but for a good reason. I will get used to it; I had just never noticed it until now.

The only downside is serialization. One has to eliminate the unwanted entity references under category, otherwise the JSON file gets really, really big. Of course the serializer can be configured to use object references instead of copying the object, but that may not be supported on the consumer's side, and often is not.

So thank you again for your explanations. Looking forward to rel 5.9!

Otis
# Posted on: 31-Jan-2022 09:59:48   

ITFromBit wrote:

No, that is perfect. One just needs to know. My opinion is: if I want to see the reverse relationship, back from category to job, all I need to do is add a prefetch path and do it explicitly. Different views here, but for a good reason. I will get used to it; I had just never noticed it until now.

The only downside is serialization. One has to eliminate the unwanted entity references under category, otherwise the JSON file gets really, really big. Of course the serializer can be configured to use object references instead of copying the object, but that may not be supported on the consumer's side, and often is not.

It is indeed a problem for serialization, as the references back from category to job are seen as a cycle. I think you can configure JSON serializers to stop serializing at that point, but they might miss elements in the graph that way. If you are using it in serialization, removing the 1:n navigator of a relationship is a good choice. (The m:1 navigator is necessary for the fk-pk relationship, as not having the m:1 navigator means the pk side isn't 'visible' from the fk side, so fk's aren't synced with their pk's.)
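For reference, a sketch of the two serializer settings mentioned above, using System.Text.Json on plain stand-in classes. `ReferenceHandler.IgnoreCycles` needs .NET 6 or later and `ReferenceHandler.Preserve` needs .NET 5 or later. The `Job` and `Category` classes and the `CycleDemo` helper are hypothetical; serializing real generated entity classes may need additional configuration that is not shown here.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical POCOs with the same job <-> category cycle; NOT LLBLGen Pro entities.
public class Category
{
    public int Id { get; set; }
    public List<Job> Jobs { get; set; } = new List<Job>();
}

public class Job
{
    public int Id { get; set; }
    public Category? Category { get; set; }
}

public static class CycleDemo
{
    public static Job BuildGraph()
    {
        var category = new Category { Id = 1 };
        var job = new Job { Id = 10, Category = category };
        category.Jobs.Add(job);   // back-reference that closes the cycle
        return job;
    }

    // With default options the serializer gives up on the cycle and throws.
    public static bool DefaultThrows(Job job)
    {
        try
        {
            JsonSerializer.Serialize(job);
            return false;
        }
        catch (JsonException)
        {
            return true;
        }
    }

    // Stops at the point where an object would be revisited and writes null there.
    public static string SerializeIgnoringCycles(Job job) =>
        JsonSerializer.Serialize(job, new JsonSerializerOptions
        {
            ReferenceHandler = ReferenceHandler.IgnoreCycles
        });

    // Emits $id / $ref metadata so each object is written once; the consumer must understand it.
    public static string SerializePreservingReferences(Job job) =>
        JsonSerializer.Serialize(job, new JsonSerializerOptions
        {
            ReferenceHandler = ReferenceHandler.Preserve
        });
}
```

Ignoring cycles keeps the JSON plain but drops the back-references, while preserving references keeps the whole graph at the cost of metadata that the consuming side has to support, which is the trade-off discussed in this thread.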

Alternatively, you could look into derived models for serialization. However, it depends on whether the client that consumes the serialized data is a thick client or a thin client: a thick .NET client might benefit from using entity instances there instead of 'dumb' DTOs.
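As a rough illustration of the DTO idea, the sketch below projects an in-memory job graph into flat, one-directional DTOs before serialization. It is hand-written and is not the code the designer generates for derived models; the record names, the `Name` field and the `ToDtos` helper are assumptions for illustration.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for fetched entities; NOT LLBLGen Pro classes.
public class Category
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Job> Jobs { get; } = new List<Job>();
}

public class Job
{
    public int Id { get; set; }
    public Category? Category { get; set; }
}

// DTOs only point from job to category, so the serialized shape has no cycle.
public record CategoryDto(int Id, string Name);
public record JobDto(int Id, CategoryDto? Category);

public static class DtoDemo
{
    public static List<JobDto> ToDtos(IEnumerable<Job> jobs) =>
        jobs.Select(j => new JobDto(
                j.Id,
                j.Category == null ? null : new CategoryDto(j.Category.Id, j.Category.Name)))
            .ToList();
}
```

Because `CategoryDto` has no collection of jobs, a serializer can walk the result as a plain tree, whatever back-references the source graph holds.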

So thank you again for your explanations. Looking forward to rel 5.9!

Should be up later today :)
