TargetPerEntityHierarchy and Nullable fields

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 29-Sep-2017 12:31:32   

I've been using TargetPerEntity inheritance very successfully for many years (v4.2). Now that I've started a new project, I thought I would look at TargetPerEntityHierarchy.

1) The warning in the documentation about 'Mixing of inheritance hierarchy types': can I assume this just refers to mixing hierarchy types within the same inheritance tree, and that it is OK to use both TargetPerEntityHierarchy and TargetPerEntity inheritance trees within the same project?

2) The docs mention

Every field of the root entity is defined as mandatory (not nullable), while every field of a subtype is defined as nullable.

I don't understand why it says the root entity fields are not nullable. Indeed, before I read this, I created a TargetPerEntityHierarchy with nullable root entity fields and they were created in the database as expected. Documentation error?

Now, the mistake I made was creating a subtype that had a non-nullable field. There were no errors or warnings generated, but of course there was a crash when trying to save a record. (Maybe there is now a warning in later versions of LLBLGen about non-nullable fields in TargetPerEntityHierarchy subtypes?)

I understand of course why the database table fields have to be nullable, but I think I assumed LLBLGen would put a non-nullable property on the entity and deal with it being nullable in the database (i.e. if not specifically set via the property, then save a NULL).

Should I assume then that I either have to make all TargetPerEntityHierarchy subtype fields nullable (and use .Value in code for those fields) or change that inheritance tree to TargetPerEntity to lose that restriction?

3) Are there any differences, inheritance-wise, between 4.2 and 5.2? I'm thinking of upgrading but, as you know, I have added some tweaks to DataAccessAdapter et al. over the years and I can't risk breaking those, at least for my main project; maybe my new one is the opportunity to see what the latest version can do.

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 29-Sep-2017 12:38:58   

Sorry - posted to wrong group.

Moved :) -- Otis

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 29-Sep-2017 15:26:13   

simmotech wrote:

I've been using TargetPerEntity inheritance very successfully for many years (v4.2). Now that I've started a new project, I thought I would look at TargetPerEntityHierarchy.

1) The warning in the documentation about 'Mixing of inheritance hierarchy types': can I assume this just refers to mixing hierarchy types within the same inheritance tree, and that it is OK to use both TargetPerEntityHierarchy and TargetPerEntity inheritance trees within the same project?

Correct. Mixing them in the same inheritance hierarchy won't work as we don't support that.

2) The docs mention

Every field of the root entity is defined as mandatory (not nullable), while every field of a subtype is defined as nullable.

I don't understand why it says the root entity fields are not nullable. Indeed, before I read this, I created a TargetPerEntityHierarchy with nullable root entity fields and they were created in the database as expected. Documentation error?

Looks like it. Do you have an exact location for me (a URL is fine) so we can have a look?

Now, the mistake I made was creating a subtype that had a non-nullable field. There were no errors or warnings generated, but of course there was a crash when trying to save a record. (Maybe there is now a warning in later versions of LLBLGen about non-nullable fields in TargetPerEntityHierarchy subtypes?)

I understand of course why the database table fields have to be nullable, but I think I assumed LLBLGen would put a non-nullable property on the entity and deal with it being nullable in the database (i.e. if not specifically set via the property, then save a NULL).

Should I assume then that I either have to make all TargetPerEntityHierarchy subtype fields nullable (and use .Value in code for those fields) or change that inheritance tree to TargetPerEntity to lose that restriction?

Nullable fields are required for subtypes because a sibling type B might map different fields than a subtype A, and saving A doesn't write the fields of B; I think that's the cause of the crash you ran into (if one of the fields in B is non-nullable). It doesn't look for 'all fields' in the target table, as it doesn't know these: you might map just a subset of fields in an entity, and the mappings don't contain fields which aren't mapped.
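As a sketch of the workaround you describe (entity and field names are hypothetical, this isn't generated code): keep the generated subtype field nullable, and add a non-nullable convenience property in a partial class that decides what a database NULL means:

    // Hypothetical TPEH subtype: ClerkEntity derives from EmployeeEntity, and
    // its ClearanceLevel field is generated as int? because subtype fields
    // are nullable in TargetPerEntityHierarchy.
    public partial class ClerkEntity
    {
        // Non-nullable view over the nullable generated property; a DB NULL
        // (field never set) reads back as 0 here.
        public int ClearanceLevelOrDefault
        {
            get { return this.ClearanceLevel ?? 0; }
            set { this.ClearanceLevel = value; }
        }
    }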

3) Are there any differences, inheritance-wise, between 4.2 and 5.2? I'm thinking of upgrading but, as you know, I have added some tweaks to DataAccessAdapter et al. over the years and I can't risk breaking those, at least for my main project; maybe my new one is the opportunity to see what the latest version can do.

Feature-wise, no, other than better detection of inheritance trees to add additional filters etc. The adjustments made to DataAccessAdapter should be migratable pretty easily, I think: methods which were virtual are still virtual. v4.2 already uses QueryParameters, so I think it will be rather minor.

You can always try it out of course and see how far you'll get :)

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 29-Sep-2017 16:41:42   

From the docs (I forgot about the online version)... https://www.llblgen.com/Documentation/4.2/Designer/hh_start.htm - look under the subheading 'Physical representation in the relational data model and supported discriminator field types'; it starts on line 2. (v5.2 is the same, BTW.)

You can always try it out of course and see how far you'll get :)

OK, you smooth talker, you - I'll have a play over the weekend. :)

I am a bit shocked, however, looking at the notes about migrating from v4.2 to v5.0, to see the removal of the option for string uniquing! :(

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 29-Sep-2017 17:07:03   

simmotech wrote:

From the docs (I forgot about the online version)... https://www.llblgen.com/Documentation/4.2/Designer/hh_start.htm - look under the subheading 'Physical representation in the relational data model and supported discriminator field types'; it starts on line 2. (v5.2 is the same, BTW.)

Thanks, we'll look into it!

You can always try it out of course and see how far you'll get :)

OK, you smooth talker, you - I'll have a play over the weekend. :)

8-)

I am a bit shocked, however, looking at the notes about migrating from v4.2 to v5.0, to see the removal of the option for string uniquing! :(

Yes. It turned out it was actually slowing things down considerably, and as the string was already allocated (by the datareader) it didn't have any net effect on memory: measuring it didn't reveal more strings in memory when it was removed, and memory allocated stayed the same, which is logical, as the datareader has already allocated the string before it's read. So there was no real reason to have it in the framework, as it only had downsides, and it was cut.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 29-Sep-2017 18:52:33   

Probably too late now, but I think it should have been left in as an opt-in feature (i.e. not setting this._stringCacheForFetcher = new UniqueList<string>(256); in the DaoBase ctor).

I remain convinced that for bigger/long-lived fetches it does improve things, potentially by a lot.

Sure, the data reader allocates strings regardless (a pity UniqueList can't be used at that level), but that was always the case. The difference is what happens when a GC occurs.

With UniqueList, and assuming the entities are kept, all but one of the duplicate strings will be collected, whereas without it, potentially thousands of copies of the same string remain. So, at some point, overall memory has to be less with UniqueList. If you are not seeing that, then I guess it is down to how it was measured: if it was measured in a fetch loop just after ProjectionUtils.UniqueStringInRow was called, you wouldn't see any change, whereas if you measured after a larger fetch was complete, I cannot think why you wouldn't see a difference.
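To make concrete what I mean by uniquing (UniqueList<string> is the framework's own type, so this Dictionary-based stand-in is only a sketch of the idea): duplicate column values get funneled through a cache that hands back the first instance seen, so all rows share one string object per distinct value.

    // Sketch only: a Dictionary-based stand-in for the uniquing role that
    // UniqueList<string> played in the fetch pipeline.
    using System.Collections.Generic;

    public sealed class StringUniquer
    {
        private readonly Dictionary<string, string> _seen =
            new Dictionary<string, string>();

        // Returns the first instance seen for equal contents, so duplicates
        // produced by the datareader become garbage straight away.
        public string Unique(string value)
        {
            if (value == null)
            {
                return null;
            }
            string existing;
            if (_seen.TryGetValue(value, out existing))
            {
                return existing;
            }
            _seen.Add(value, value);
            return value;
        }
    }

With this in place, each duplicate the datareader allocates is unreferenced as soon as the row is processed and can go in the very next GC0.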

When that GC occurs, and at what level, is also relevant. If a GC0 occurs whilst the datareader is still looping fetching data, then the duplicates will go immediately with UniqueList, because the datareader only has the latest set of strings and all the entities are sharing the single instances in UniqueList. This also has the advantage of freeing memory more quickly, because it is just a GC0 collect and not the more thorough GC1 collect.

Now, without UniqueList, a GC0 occurring won't save any duplicate string memory and will increase the likelihood of a GC1 collection being needed. That is much slower than a GC0, so if they occur during a datareader loop, then that must slow down the fetch too.
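One way to check that empirically, using the documented GC.CollectionCount counters (the fetch delegate stands for whatever query you want to measure):

    // Sketch: count GC0/GC1 collections that happen while a fetch runs.
    using System;

    static class GcProbe
    {
        public static void ReportCollections(Action fetch)
        {
            int gen0Before = GC.CollectionCount(0);
            int gen1Before = GC.CollectionCount(1);

            fetch();   // e.g. an adapter.FetchEntityCollection(...) call

            Console.WriteLine("GC0 during fetch: " +
                (GC.CollectionCount(0) - gen0Before));
            Console.WriteLine("GC1 during fetch: " +
                (GC.CollectionCount(1) - gen1Before));
        }
    }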

A 20% performance hit is a worry - that seems like a lot to me, but if it is generally true, then making the feature opt-in would resolve that: just turn it on where it helps.

Your claims about GCs taking up to 15 minutes to occur don't tally with my experience at all. In theory that could be the case, but even so, it just means the memory savings are realized a bit later. It is also important to remember that the memory is shared between all threads. I run most of my queries (in a WinForms app) on a background thread to leave the app responsive, but just clicking on things can force GCs (my machine has 32GB but .NET seems frugal with its allocations). So I find it more likely that my objects live through GCs.

Anyway, that 20% hit is annoying me - I'm going to see if I can do better than UniqueList<string>, just for my own edification.

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 01-Oct-2017 11:30:52   

simmotech wrote:

Probably too late now, but I think it should have been left in as an opt-in feature (i.e. not setting this._stringCacheForFetcher = new UniqueList<string>(256); in the DaoBase ctor).

I remain convinced that for bigger/long-lived fetches it does improve things, potentially by a lot.

Sure, the data reader allocates strings regardless (a pity UniqueList can't be used at that level), but that was always the case. The difference is what happens when a GC occurs.

With UniqueList, and assuming the entities are kept, all but one of the duplicate strings will be collected, whereas without it, potentially thousands of copies of the same string remain. So, at some point, overall memory has to be less with UniqueList. If you are not seeing that, then I guess it is down to how it was measured: if it was measured in a fetch loop just after ProjectionUtils.UniqueStringInRow was called, you wouldn't see any change, whereas if you measured after a larger fetch was complete, I cannot think why you wouldn't see a difference.

The only use case where it would make a difference is if you cache a large set of entities and keep them around for a long period of time, so they outlive a GC. But even in that situation, what is the net gain? Unless you cache a tremendous number of entities (and why would you do that?), it's not going to gain you much memory. After all, the memory that matters is the memory that is consumed with every fetch, which is something you aren't avoiding with this feature.

The price you have to pay for it, however, is huge: it added a significant amount to the time used to materialize objects. The only use case is the caching of large sets of entities (and we're not talking 100 or so, as that would gain you perhaps a few KB, which will never matter, but tens of thousands if not more before it shows up - and remember, only if you cache them long enough!), and that use case is better served by resultset caching, so the feature was not worth having, on the contrary. We didn't keep it because no-one would enable it, or better: no-one would search for this in the docs to enable it; it would almost never benefit you. I doubt it will benefit you either.

When that GC occurs, and at what level, is also relevant. If a GC0 occurs whilst the datareader is still looping fetching data, then the duplicates will go immediately with UniqueList, because the datareader only has the latest set of strings and all the entities are sharing the single instances in UniqueList. This also has the advantage of freeing memory more quickly, because it is just a GC0 collect and not the more thorough GC1 collect.

Now, without UniqueList, a GC0 occurring won't save any duplicate string memory and will increase the likelihood of a GC1 collection being needed. That is much slower than a GC0, so if they occur during a datareader loop, then that must slow down the fetch too.

A 20% performance hit is a worry - that seems like a lot to me, but if it is generally true, then making the feature opt-in would resolve that: just turn it on where it helps.

I measured it a lot; there's no way around it: it hurt our performance badly. An opt-in feature that almost everyone will pass on is a burden, as we still have to maintain it, so we cut it.

Your claims about GCs taking up to 15 minutes to occur don't tally with my experience at all. In theory that could be the case, but even so, it just means the memory savings are realized a bit later.

Please think about the actual use case in which this would gain you anything, and how much it would gain you in practice in that situation. You assume I cut it on a whim, but that's not the case. I spent close to a week on this alone.

It is also important to remember that the memory is shared between all threads. I run most of my queries (in a WinForms app) on a background thread to leave the app responsive, but just clicking on things can force GCs (my machine has 32GB but .NET seems frugal with its allocations). So I find it more likely that my objects live through GCs.

No, you make a mistake there: the feature did uniquing per fetch, so per adapter call. If you had multiple fetches running on multiple threads, you didn't share the same UniqueList across them (as that would have required locks!); every adapter instance had its own UniqueList instance. Next fetch, new adapter, new unique list.

That's also why resultset caching is the better choice if you want caching of these values: you can then really share data across entity objects, save memory, and have better control over it too.

Anyway, that 20% hit is annoying me - I'm going to see if I can do better than UniqueList<string>, just for my own edification.

Save your energy, it's not going to work :) Been there, done that. When I saw the string uniquing was a major bottleneck, I first looked at how to speed things up in that area, but it's not doable: UniqueList is rather fast, and HashSet doesn't do much better.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 02-Oct-2017 09:49:44   

I now agree with you :(

I couldn't download v5.2 (licence expired), so I had a play with this yesterday.

Whilst I could speed up UniqueList by ~5-10%, it wasn't anywhere near enough to justify its use. Only getting access to the datareader and uniquing the strings as they are read by the TDS parser thingy would possibly make any difference, and that is very locked down - not even I'm mad enough to try tweaking that!

The point I was making about fetching on background threads wasn't about sharing a UniqueList, but that GCs would occur much more often than the 15-minute interval previously mentioned, and hence there would be more opportunities to realize any savings made.

It also dawned on me that I wasn't actually aware DataAccessAdapter used a UniqueList by default. :$ :$ :$

Looking back at my app, there is one place in particular where a large TypedView stays around permanently and is updated periodically, but with few expected changes, so I kept a UniqueStringList as a static and applied it just to the columns where I knew there would be a lot of duplicates. I'll leave that in, but I'll search for a place in DataAccessAdapter to set StringCacheForFetcher to null before it gets used.
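Roughly like this (a sketch only; 'Category' is a hypothetical column, a plain Dictionary stands in for my UniqueStringList, and TypedViews derive from DataTable):

    // Sketch: after refreshing the long-lived typed view, collapse duplicate
    // strings in one known-duplicate-heavy column via a shared static cache.
    using System.Collections.Generic;
    using System.Data;

    static class CategoryUniquing
    {
        private static readonly Dictionary<string, string> _cache =
            new Dictionary<string, string>();

        public static void UniqueCategoryColumn(DataTable typedView)
        {
            foreach (DataRow row in typedView.Rows)
            {
                string s = row["Category"] as string;   // hypothetical column
                if (s == null) continue;
                string existing;
                if (_cache.TryGetValue(s, out existing))
                {
                    // Reuse the first instance seen. Note: assigning marks the
                    // row as changed; call typedView.AcceptChanges() afterwards
                    // if that matters for your refresh logic.
                    row["Category"] = existing;
                }
                else
                {
                    _cache.Add(s, s);
                }
            }
        }
    }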

A 20% speed improvement for free - so thanks for your work on that! :)

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 02-Oct-2017 10:40:59   

Glad we're on the same page :) The optimization inside SqlDataReader could be done using string interning (which has a downside, according to MS), but it does require altering .NET code. You could make a proposal for .NET Core though ;) - an optional optimization which interns a string before it's created from a series of bytes inside the datareader. How SqlDataReader does this is rather clever: it has a byte array with the bytes coming from the server and converts parts of that to new type instances with a generic struct/class, if I recall correctly. Memory could be saved if it could check, before creating a string, whether a series of bytes would lead to a string which is already interned; if so, it should return the interned string. This would save memory from the get-go, but at the expense of having to check for an interned string, which can hurt performance. It's a balance: you gain memory while paying some CPU.
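For reference, a tiny sketch of the CLR intern pool that idea would lean on (String.Intern is the documented API; the strings and byte decoding here are only illustrative):

    // Sketch: String.Intern returns one canonical instance per distinct
    // string contents, which is what the datareader-level idea would exploit.
    using System;
    using System.Text;

    class InternDemo
    {
        static void Main()
        {
            byte[] bytes = Encoding.UTF8.GetBytes("ACTIVE");
            string fresh = Encoding.UTF8.GetString(bytes); // new allocation
            string pooled = String.Intern(fresh);          // canonical instance

            // The freshly decoded string is a separate object...
            Console.WriteLine(ReferenceEquals(fresh, pooled));                // False
            // ...but every interned lookup yields the same single instance.
            Console.WriteLine(ReferenceEquals(pooled, String.Intern("ACTIVE"))); // True
        }
    }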

Corrected the doc error in the v5.1-v5.3 docs.

Frans Bouma | Lead developer LLBLGen Pro