Limiting Entities Affected by Recursive Saves

Posts   
 
    
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 26-May-2021 22:16:18   

Build Version: 5.5 (5.5.1) RTM Build Date: 17-Jan-2019 Project: Adapter project targetting .NET 4.5.2 Database: MS SQL 2019

So we currently have a job that takes generates a collection of entities from user-submitted data, validates them, and then adds the valid entities to a unit of work. Other processing happens and other kinds of entities may or may not be added to the unit of work, and then that unit of work is committed. Each of these add actions sets the recurse flag to true.

We're finding that in some cases, an entity we created but did not add to the unit of work gets saved when we call commit. We think that this is possibly because the recursive save traversal finds its way to the entity objects that were removed during validation and not added to the unit of work.

First, is this possible? If the answer is yes, is there a way to restrict how the recursive save finds new entities? or a way to prevent an entity from being caught by a recursive save? We'd like to avoid having to manually add all the entities to the unit of work without saving recursively.

Thanks!

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 27-May-2021 08:29:25   

mprothme wrote:

First, is this possible? If the answer is yes, is there a way to restrict how the recursive save finds new entities? or a way to prevent an entity from being caught by a recursive save? We'd like to avoid having to manually add all the entities to the unit of work without saving recursively.

Yes. It's possible. I would recommend to investigate how that entity ended up in the save graph. It might be wise to check the code to know the reason. The entity itself doesn't have state about whether it belongs to a UOW. However the IsDirty and IsNew flags could be useful if you want to exclude some entity from being saved.

David Elizondo | LLBLGen Support Team
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39773
Joined: 17-Aug-2003
# Posted on: 27-May-2021 10:49:19   

To investigate why that entity ended up in teh queue, check if the entity is referenced by another entity which is in the UoW, so it gets visited that way. If you want to detach an entity from the graph, call ((IEntity2)entity).DetachFromGraph() to make sure it's removed from all referencing entities so it shouldn't end up in an UoW

Frans Bouma | Lead developer LLBLGen Pro
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 27-May-2021 21:37:44   

Thanks for the replies!

There is a parent grouping entity that the validated entities point to (many to one), and In this case, we set the entity object on all the entities before validation, so my guess is that the save went [Validated_Entity] -> [Group] -> [Excluded_Entity]. When we remove the assignment and just set the ID we don't run into the case where the excluded entity is saved. Just to be safe we'll probably call detach from graph just in case.

A follow-up question. Assuming I add all the entities I want to save (both parent and child) to a unit of work individually, does it determine the correct order for saving and assign ids to the right objects? One solution that was proposed internally is to build the entities we want to save as we do today, but individually add each element and subelement to the unit of work with recurse set to false.

ex. today we have something like this

var uow = new UnitOfWork2();
var parentEntityCollection = GetEntitiesToValidate(); 
//validation and filtering happens
uow.AddCollectionForSave(parentEntityCollection, false, true) // add collection indicating it should save recursively
/// commit the transaction

but would something like this have the same effect (assuming the only entities the recursive save would catch are the child entities)

var uow = new UnitOfWork2();
var parentEntityCollection = GetEntitiesToValidate(); 
//validation and filtering happens
var childEntities = parentEntityCollection.SelectMany(parent => parent.children).ToArray()
uow.AddCollectionForSave(childEntities , false, false) // add all entities without recursive save
uow.AddCollectionForSave(parentEntityCollection, false, false) // add all entities without recursive save
/// commit the transaction
daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 28-May-2021 08:36:19   

mprothme wrote:

There is a parent grouping entity that the validated entities point to (many to one), and In this case, we set the entity object on all the entities before validation, so my guess is that the save went [Validated_Entity] -> [Group] -> [Excluded_Entity]. When we remove the assignment and just set the ID we don't run into the case where the excluded entity is saved. Just to be safe we'll probably call detach from graph just in case.

Good you figured it out. And yes, it's better to use just set the ID to not mess with hard references when it's not necessary.

mprothmeA wrote:

ex. today we have something like this

var uow = new UnitOfWork2();
var parentEntityCollection = GetEntitiesToValidate(); 
//validation and filtering happens
uow.AddCollectionForSave(parentEntityCollection, false, true) // add collection indicating it should save recursively
/// commit the transaction

but would something like this have the same effect (assuming the only entities the recursive save would catch are the child entities)

var uow = new UnitOfWork2();
var parentEntityCollection = GetEntitiesToValidate(); 
//validation and filtering happens
var childEntities = parentEntityCollection.SelectMany(parent => parent.children).ToArray()
uow.AddCollectionForSave(childEntities , false, false) // add all entities without recursive save
uow.AddCollectionForSave(parentEntityCollection, false, false) // add all entities without recursive save
/// commit the transaction

Those two blocks won't have the same effect. In the first case LLBLGen can resolve the dependencies because you are saving a graph recursively. In the second block the framework will try to save the child entities first and it will fail because missing parent ID field.

David Elizondo | LLBLGen Support Team
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 28-May-2021 18:43:48   

Just to confirm, even if object references are set (that tie the parent to the child and vice versa), the UnitOfWork2 class doesn't or isn't able to determine the order of inserts?

If not, I think that would be a really nice QOL feature to add, maybe something like one of the following:

  • The Unit of work determines can determine the insert order using similar logic to how a recursive save works, assuming entities reference each other and are all present in the unit of work.
  • When using a Unit of Work, allow limiting the recursive save so that only entities added explicitly to the UOW are traversed.
  • A generic way to filter what entities the traversal visits, potentially through a predicate. This could be passed on commit or adapter.save if the recurse flag was true.

In any of the above cases, if a required entity wasn't added (FK reference for example), throwing an error would be fine.

Our application is fairly normalized, so It's a common case where we'd create the Group, Element, and Child entities at the same time. Because no entities are previously saved this would require that we use references instead of IDs.

Assuming an Element was filtered out and not added to the unit of work, it would still get caught by a recursive save (through the group). We can use one of the previously mentioned methods to remove it from the save, but we'd need to execute that action not only on the entity itself but also on any descendants that might get caught up in the traversal.

From a development perspective, this means that if we add new entity types to our structure we have to be vigilant in making sure that those types get caught by the removal/detach process. Because filtering is conditional, this means that if we missed a type we'd sometimes save a filtered record, but not always. For me at least, it's difficult to debug the traversal to figure out where we made a mistake (If there's a nice way to do this please let me know simple_smile ).

If we had one of the above options, then if we add new entities and forget to add them explicitly to the Unit of Work or predicate or whatever, we just don't ever save that type, which feels a lot easier to track down in testing. Additionally, if you don't add something you need and get an error (eg required because of a foreign key), then that's pretty easy to track down as well.

I know that both of the above cases can be solved by "just be careful when you add new things" but the reality is always that mistakes get made, and then the question is "how easy is this to catch in initial testing, and how easy is it to debug if we don't."

I want to end this by mentioning that I really like your product, we've just been bit by this a few times, and it's always been a challenge to sort out. I also get that this may be impossible to do given the current architecture and how traversal works, but I figured I'd toss it out there just in case.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39773
Joined: 17-Aug-2003
# Posted on: 29-May-2021 10:42:12   

mprothme wrote:

Just to confirm, even if object references are set (that tie the parent to the child and vice versa), the UnitOfWork2 class doesn't or isn't able to determine the order of inserts?

If references are set, it forms a graph and that graph is topologically sorted (which is a sort based on dependency), when the UoW is committed, that is, if you specified a recursive save when adding the entity to the UoW. This is true by default for adapter, so unless you explicitly specify it, all entities added are seen as a member of a graph and all their references are determined and then sorted.

If not, I think that would be a really nice QOL feature to add, maybe something like one of the following:

  • The Unit of work determines can determine the insert order using similar logic to how a recursive save works, assuming entities reference each other and are all present in the unit of work.
  • When using a Unit of Work, allow limiting the recursive save so that only entities added explicitly to the UOW are traversed.
  • A generic way to filter what entities the traversal visits, potentially through a predicate. This could be passed on commit or adapter.save if the recurse flag was true.

In any of the above cases, if a required entity wasn't added (FK reference for example), throwing an error would be fine.

The graph visit is breadth first and this is a simple but crucial algorithm, it's not something we like to tinker with (as doing so might lead to edge cases which fail. We sadly had that in a distant past). So it visits all nodes in the graph, breadth first, then sorts these based on dependency using a topological sort. However you can specify if the entity shouldn't be recursed when adding the entity to the UoW: specify false for recurse (it's in an overload). This doesn't necessarily mean the entity won't be visited as part of another entity's recursive traversal of the graph. It only specifies that the entity added with that call isn't traversed further when all elements in the UoW are handled.

In general this should work fine tho. Either entities are new and to be inserted, or changed and should be updated, or empty and are ignored in the save process. If nothing changed, things aren't going to lead to phantom inserts.

Our application is fairly normalized, so It's a common case where we'd create the Group, Element, and Child entities at the same time. Because no entities are previously saved this would require that we use references instead of IDs.

Assuming an Element was filtered out and not added to the unit of work, it would still get caught by a recursive save (through the group). We can use one of the previously mentioned methods to remove it from the save, but we'd need to execute that action not only on the entity itself but also on any descendants that might get caught up in the traversal.

From a development perspective, this means that if we add new entity types to our structure we have to be vigilant in making sure that those types get caught by the removal/detach process. Because filtering is conditional, this means that if we missed a type we'd sometimes save a filtered record, but not always. For me at least, it's difficult to debug the traversal to figure out where we made a mistake (If there's a nice way to do this please let me know simple_smile ).

I understand, so you add these entities to the graph to make validation etc. easier, but you don't want them to be persisted when the entities of the graph they're added to are saved, as they were just there for e.g. validation? And to make sure these don't end up being saved, you now have to jump through hoops.

But, how would that in practice be done easier in another form? You still have to precisely specify which entities to skip. And are entities related to these entities also to be skipped? If yes, what if these entities are also reachable by other paths in the graph? (as that happens) What if that path is traversed first, should these entities then be skipped or kept? simple_smile

If we had one of the above options, then if we add new entities and forget to add them explicitly to the Unit of Work or predicate or whatever, we just don't ever save that type, which feels a lot easier to track down in testing. Additionally, if you don't add something you need and get an error (eg required because of a foreign key), then that's pretty easy to track down as well.

I know that both of the above cases can be solved by "just be careful when you add new things" but the reality is always that mistakes get made, and then the question is "how easy is this to catch in initial testing, and how easy is it to debug if we don't."

Aren't these entities you don't want to save, really readonly entities? You can mark entities as readonly in the designer. This will make them non-persistable, so even though they're in the graph, they won't be persisted. You can also create an authorizer and deny persistence based on e.g. data on the active thread that's available to the authorizer and deny persistence. You won't get an exception, the entity is just ignored. You can inject the authorizer using a dependencyinjection scope if you want, so you can decide at specific places to inject only that authorizer into that entity, in cases where the entity has to be saved elsewhere (otherwise it's always readonly simple_smile ) This might help making things easier for you.

I want to end this by mentioning that I really like your product, we've just been bit by this a few times, and it's always been a challenge to sort out. I also get that this may be impossible to do given the current architecture and how traversal works, but I figured I'd toss it out there just in case.

Thanks! I hope the tools I've pointed out above might help solve this in a generic way. If not, let us know simple_smile

Frans Bouma | Lead developer LLBLGen Pro
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 01-Jun-2021 22:43:41   

I understand, so you add these entities to the graph to make validation etc. easier, but you don't want them to be persisted when the entities of the graph they're added to are saved, as they were just there for e.g. validation? And to make sure these don't end up being saved, you now have to jump through hoops.

Well, we’re not explicitly adding them to the graph to make validation easier. We construct the entities with their relationships so that when we save recursively, we catch everything we intended to save in the right order, and we don’t need to worry about that when we develop new functionality. We typically run validation on entity objects because everything needs to at least translate to that to be saved, so it gives us a consistent set of types to develop validations against.

Our typical use case is:

  1. Receive data via an endpoint or some other means.
  2. Translate data to entity objects. If the received data is nested, it usually ends up going into parent child tables, this means that necessarily the entity objects include parent entities and child entities.
  3. Validate the entity objects. Typically, when we iterate over each “parent” object, validating it and its child entities. If validation passes the entity is added to the Unit of Work, if it fails its not added.
  4. Commit the Unit of Work.

The issue that sometimes the traversal ends up in our list of entities we don’t want to save via another “intermediary entity”. Normally we’d just assign FK values instead of the actual entity object so that there is no intermediary for it to travel through, but sometimes that’s not possible because the intermediary is also new.

If we run into that case, need to determine what the intermediary entity is (which is hard), and then reorganize code so that we save it first without recursion, and assign its ID value instead using an object reference.

But, how would that in practice be done easier in another form? You still have to precisely specify which entities to skip. And are entities related to these entities also to be skipped? If yes, what if these entities are also reachable by other paths in the graph? (as that happens) What if that path is traversed first, should these entities then be skipped or kept?

We were thinking that the simpler approach would be to allow the UOW (or some other construct) to determine save order using a graph consisting of only entities explicitly added to the UOW. In that case, the developer must ensure that everything that should be saved ends up in the UOW, but they don’t need to worry that something not added to the UOW ends up persisted (or worry about exclusions). If they forget to add something, then that entity never gets saved, but with decent unit testing, that's fairly obvious. This does introduce the case where we don't add an entity to the UOW that would be required for save (because of a non-nullable FK) but that's also not too hard to track down if you know the underlying table structure.

In practice it’s not too hard to write an extension method on Entity2 that uses reflection to iterate through Entity/Collection properties on an object to pull everything you wanted into a flat list that could be added to a unit of work. We’d initially planned on doing this with code like the following:

// using reflection, iterate through all properties and entity collections present on parent entity, where the type is in the provided list
var toSave = entities.SelectMany(parentEntity => parentEntity.GetRelatedEntities(new[]{
    typeOf(ChildEntity), 
    typeOf(GroupEntity)
})).Distinct().ToArray();
// code that adds each item in toSave to a UnitOfWork

but since the UOW doesn’t determine the save order for entities you add to it individually this approach wouldn't work.

Aren't these entities you don't want to save, really read-only entities?

We do want to be able to save all the "types" we're submitting. Our validation removes newly submitted items if that item has errors, but saves the ones that don't (we give the user the option to fix their errors after the initial save). Sometimes the group an item ties to doesn't yet exist and if that happens we create a new group, which ends up acting as the "intermediary" between entities we've removed and entities we've added to the unit of work.

I think the Authorizer sounds like it might do what we want, but getting that hooked into dependency injection and passing it the set of entities we want to use for filtering when we save might be a bit complex for our case (it may introduce more challenges than it solves) but thanks for the suggestion!

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39773
Joined: 17-Aug-2003
# Posted on: 02-Jun-2021 09:03:21   

The authorizer is actually very easy to setup, it shouldn't take much code at all, there's an example in the documentation.

Only persisting the entities added to the UoW as a graph isn't really going to work, because they might not form a graph. So it might not be possible to determine the order, simply because the entity tying things together isn't reachable (as it's not added). If you add all entities with 'recurse' set to false, any entity in the UoW is examined but not recursed, however this will miss entities not added to the UoW but which should have been persisted (this might be entities which become dirty when an entity is saved, for instance they now reference a new entity instead of a previous existing entity, so the FK will change).

The root problem therefore is IMO that some entities shouldn't be persisted, but I still don't understand why they should be skipped in your case: you describe a case with an intermediary entity, but why it should be skipped is still unclear to me (as it has to be persisted if it's new).

Frans Bouma | Lead developer LLBLGen Pro
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 02-Jun-2021 17:48:53   

So first of all thanks for chatting with me for this long, I really appreciate it!

The root problem, therefore, is IMO that some entities shouldn't be persisted, but I still don't understand why they should be skipped in your case: you describe a case with an intermediary entity, but why it should be skipped is still unclear to me (as it has to be persisted if it's new).

Sorry for the confusion, it's not that the intermediary should be skipped, it's that we have the following case

  1. We create 3 new entity objects, Parent A, Parent B, and Group 1 (our intermediary). Additionally, each parent has some number of Children that reverence it via an FK. The entities are tied to each other like so:
    • Parent Entity A -> Group 1
    • Parent Entity B -> Group 1 (same object reference that Parent entity A points to)
  2. We validate Parent A and Parent B. Parent A passes validation, Parent B Fails
  3. Parent A is added to the UOW with recursive save set to true, and Parent B is not.
  4. We call UOW.Commit
  5. The recursive save traverses Parent A -> Group 1 -> Parent B. As a result Parent B and its children are saved even though it wasn't added to the UOW

What we'd like to have is a way to say "Don't traverse to Parent B from Group 1", or more generally, "Don't traverse to anything not added to the UOW". We could call detach on Parent B and its children, but our thought was that if you individually added Parent A, the children of Parent A and Group 1 to the UOW with recurse set to false, it would be nice to have the UOW determine that it needs to save Group 1, then Parent A, then the children of Parent A first (Parent A has an FK to Group 1, and the Children of Parent A have an FK to Parent A).

If you add all entities with 'recurse' set to false, any entity in the UoW is examined but not recursed, however, this will miss entities not added to the UoW but which should have been persisted (this might be entities that become dirty when an entity is saved, for instance, they now reference a new entity instead of a previously existing entity, so the FK will change).

I think we get that. Our thought was that today a developer can:

  1. use recursion, and potentially run into the case where a removed entity is unintentionally saved. This is what we use today, and it causes us some issues.
  2. use recursion, and intentionally detach all of the entities that should not be saved. This works, but it's difficult to validate and debug. There are a lot of cases where you think you detach everything and the filtered entity doesn't save, but in reality, you missed an entity and didn't hit the edge case where an entity reference was set instead of its FK value. If you do miss something and you notice the error in production, you end up having to recreate a very specific case to reproduce the bug and figure out what's going on.

Our thought was that it would be nice to allow developers to be intentional, explicitly adding all entities they want to save to the UOW, but because the UOW doesn't determine insert order for new entities, a developer can't use it in this fashion. Specifically that you could use the same graph and traversal algorithm, but add a check (controlled by an optional boolean flag on UOW.Commit) that, on traversal, validates that the traversed entity was directly added to the Unit of Work. If an entity is not present, it is treated as if it was explicitly detached from the graph (so not saved, and not used to traverse to additional entities).

We understand that this could result in save errors, or entities not being saved, but validation in unit tests is then reduced to checking if everything saved and that there were no errors (which seems a lot easier).

When we talked about the authorizer, we thought we might make the set of entities we wanted to save available to it, and remove any inserts that weren't in that list, but it seemed like it might be challenging to do that communication (we'll dig into it further though).

Given that this functionality doesn't exist today, does LLBLGen Pro expose any way for you to construct your own graph, run your recursion algorithm, and get back the ordered results?

Also, other than being skipped in the save, what are the effects of detaching something from the graph, and can you re-attach an entity at a later date?

Thanks again!

Walaa avatar
Walaa
Support Team
Posts: 14987
Joined: 21-Aug-2005
# Posted on: 02-Jun-2021 23:32:03   

IMHO, keeping an invalid entity around might not be the best thing to do here. I'd explicitly remove it from the graph that would be persisted once proven invalid, or rewind the process and allow for the entity to be modified to reach the valid state before resuming the saving process.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39773
Joined: 17-Aug-2003
# Posted on: 03-Jun-2021 09:59:42   

I agree with Walaa, that's the best route to take.

If you really don't want to, you could add a property, e.g. 'bool ValidForSave', to the CommonEntityBase class (in a partial class) and if the validator marks the entity invalid, here ParentB, set that property to true, then use a simple authorizer which checks if that property is set to true, and if so, simply deny the save action. This will cause the system to simply skip ParentB, as the authorizer denies the save, no queries is generated, and it's skipped as if nothing happens. Of course, if you indirectly also mark all entities related to ParentB as 'invalid' if ParentB is invalid, then this won't work, but then you have a bigger problem.

Frans Bouma | Lead developer LLBLGen Pro
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 15-Jun-2021 21:14:46   

Thanks for your advice here, I think we have enough to go on, I really appreciate it.

Just out of curiosity, if you detach an entity from the graph, is there a way/method to re-attach it? Mainly so that if we saved valid entries after detaching all of the invalid records, we could later re-attatch them, allow the user to modify, and then save recursively?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39773
Joined: 17-Aug-2003
# Posted on: 16-Jun-2021 10:05:46   

mprothme wrote:

Thanks for your advice here, I think we have enough to go on, I really appreciate it.

Just out of curiosity, if you detach an entity from the graph, is there a way/method to re-attach it? Mainly so that if we saved valid entries after detaching all of the invalid records, we could later re-attatch them, allow the user to modify, and then save recursively?

Just reference it to one of the entities in the graph and it will be found. E.g. if you have myCustomer and myOrder and you detach myOrder from the graph and later on you want it to take part of it, you can do myCustomer.Orders.Add(myOrder); and it's back in the graph.

Frans Bouma | Lead developer LLBLGen Pro
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 16-Jun-2021 16:11:31   

Thanks!