Insert / Update query batching question

Posts
usschad
User
Posts: 71
Joined: 11-Sep-2008
# Posted on: 11-Aug-2022 18:48:55   

Reading the documentation, I noticed the section about Inserts which aren't batchable: https://www.llblgen.com/Documentation/5.9/LLBLGen%20Pro%20RTF/Using%20the%20generated%20code/Adapter/gencode_querybatching.htm#inserts-which-arent-batchable

I have two questions:

1) The entities I would like to batch insert do have a self-referencing column, but it is always null on insert. I'm just curious whether it will still batch the inserts if the self-referencing FK is null.

2) Also, assuming the self-referencing column weren't there: if I'm saving entities in a batch but they have child records, does it still batch the inserts, or does it have to insert each one sequentially so that it can retrieve the PK to set the FK on the child objects?

    using var adapter = new DataAccessAdapter();
    var coll = new EntityCollection<InstrumentEntity>(entities);

    // Send the insert/update statements to the DB in batches of 50.
    adapter.BatchSize = 50;

    var unitOfWork = new UnitOfWork2();
    // false: don't refetch the saved entities; true: save recursively (child entities too)
    unitOfWork.AddCollectionForSave(coll, false, true);
    await unitOfWork.CommitAsync(adapter, true);

The number of records I'm saving at the moment in my tests is 10,000 root entities, with maybe 60,000-70,000 child entities. When I test in Azure, it seems to run indefinitely; the job ran for 11 hours (though I'm hoping that isn't an LLBLGen issue). When I ran it locally, the database save in the code above took about 7 minutes. It obviously isn't batching, but I wanted to get more clarity on why, hence the two questions. If it is just the self-referencing FK, then I can probably pull that relationship into a separate table, as I don't think that's something we'll be doing very often; but if it also doesn't batch because the child elements require the identity to be pulled back, then I might need a different approach. 10,000 records isn't the upper limit of what we need to be able to support.

Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 12-Aug-2022 02:30:49   

7 minutes is a huge amount of time indeed. I recall there was a fix related to this in an earlier build; which runtime library version are you using?

usschad
User
Posts: 71
Joined: 11-Sep-2008
# Posted on: 12-Aug-2022 04:17:38   

Sorry, I meant to include all that:

5.9.0

But I'm still interested in answers to those first two questions, so that I can understand with certainty when batching will or won't occur.

Thanks.

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 12-Aug-2022 10:12:02   

You added a link to a PR on a repo in your first post, which likely was meant to be a link to our documentation :)

In any case, entities with relationships to themselves aren't batchable. The main reason is that the engine can't know up front whether it also has to update rows as part of the query (e.g. to set the FK to a row inserted in the same batch), so it skips those entities and their inserts are executed individually.
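To illustrate the shape that triggers this, here's a minimal sketch assuming a hypothetical ParentInstrument self-reference on an entity like the InstrumentEntity from the first post (the navigator/field names are illustrative, not from the actual model):

    // Illustrative only: an entity type referencing itself via a hypothetical
    // ParentInstrument navigator backed by a ParentInstrumentId FK.
    var parent = new InstrumentEntity();
    var child = new InstrumentEntity();
    child.ParentInstrument = parent;   // FK points back to the same entity type

    // The FK value of 'child' is only known after 'parent' has been inserted,
    // so rows of this type can't be packed into one batched INSERT; the runtime
    // executes these inserts one by one even when adapter.BatchSize is set.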

If you save a graph, e.g. Customer, Order, OrderLines, it'll batch queries per entity type, as it uses a topological sort per entity type. Older versions would end up with a queue like Customer1, Order1, OrderLines1, OrderLines2, Customer2, Order2, OrderLines3, etc. (where every customer has 1 order). The batching algorithm now packs the customers together, then the orders, then the order lines: it first sorts the types based on dependency and then the instances of those types (as any customer comes before any order).
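To make that ordering concrete, here's a minimal sketch of such a graph save, assuming a generated Customer/Order/OrderLine model (the entity, field and navigator names are illustrative, not taken from a real project):

    using var adapter = new DataAccessAdapter();
    adapter.BatchSize = 100;

    // Hypothetical source data for the example.
    var importRows = new[]
    {
        new { CustomerName = "Acme", Quantity = 3 },
        new { CustomerName = "Initech", Quantity = 5 },
    };

    var customers = new EntityCollection<CustomerEntity>();
    foreach(var row in importRows)
    {
        var customer = new CustomerEntity { CompanyName = row.CustomerName };
        var order = new OrderEntity { OrderDate = DateTime.UtcNow };
        order.OrderLines.Add(new OrderLineEntity { Quantity = row.Quantity });
        customer.Orders.Add(order);
        customers.Add(customer);
    }

    var uow = new UnitOfWork2();
    uow.AddCollectionForSave(customers, false, true);   // don't refetch, do save recursively
    // With the topological sort described above, all customer INSERTs are batched
    // first, then all order INSERTs, then all order line INSERTs.
    await uow.CommitAsync(adapter, true);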

Keep batch sizes small; 80-120 is optimal in most cases, but it depends on the number of parameters generated per entity (so if you have very wide entities, lower the batch size). If you use high batch sizes you get a lot of parameters, and it's likely slower. But as you're working on Azure, the DB delays are astronomical compared to anything else, so any extra roundtrip is likely already slower than the DB having to parse 2000 parameters.
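As a rough back-of-the-envelope check (assuming roughly one parameter per inserted field, which is an approximation, not a documented formula):

    // Rough sizing sketch: parameters per batch ≈ fields per entity × batch size.
    // SQL Server caps a single command at 2100 parameters, so stay well below that.
    int fieldsPerEntity = 15;      // assumption: the width of the entity being inserted
    int batchSize = 100;
    int parametersPerBatch = fieldsPerEntity * batchSize;   // 1500, comfortably under 2100
    Console.WriteLine(parametersPerBatch);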

Inserting a lot of rows shouldn't take 7 minutes though. Inserting a lot of rows into a DB inside one transaction isn't that taxing on LLBLGen Pro (benchmarks show that; we're a couple of ms behind EFCore in this run, but that's basically fluctuating DB performance, looking at the individual run numbers), but it can be really slow if the DB's files aren't pre-sized. I.e. inserting a lot of rows will probably make the DB increase the file size of your catalog, which can be slow, and it'll also store a row in the .log file of your catalog; if that file isn't big enough, growing it will take a while too.

So you might want to check where the bottleneck is with a profiler (e.g. use 1000 entities first and profile that)

Frans Bouma | Lead developer LLBLGen Pro
usschad
User
Posts: 71
Joined: 11-Sep-2008
# Posted on: 12-Aug-2022 13:01:39   

Thanks. I updated that link, not that you didn't know what I was referring to, but for posterity.

FWIW, that 7-minute DB insert wasn't reaching out to Azure; I was using my LocalDB.

I'll run a profiler before I change anything just to see what is happening now. Then I can remove the self referencing FK since we aren't currently using that and try again to see how much that changes things.

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 12-Aug-2022 14:12:47   

usschad wrote:

Thanks. I updated that link, not that you didn't know what I was referring to, but for posterity.

Yeah, I figured as much :)

FWIW, that 7-minute DB insert wasn't reaching out to Azure; I was using my LocalDB.

That's what I understood indeed: it's going to a local DB, so there's something weird going on. 7 minutes is truly slow.

I'll run a profiler before I change anything just to see what is happening now. Then I can remove the self referencing FK since we aren't currently using that and try again to see how much that changes things.

Please let us know your findings.

Frans Bouma | Lead developer LLBLGen Pro
usschad
User
Posts: 71
Joined: 11-Sep-2008
# Posted on: 29-Aug-2022 16:43:40   

Hi, just checking in. I never could get the profiler to work, locally or in Azure. I think the ability to use App Insights profiler on Function Apps is relatively new; it gives me some sort of progress indicator (that spins forever) saying that profiling is starting, but nothing ever really happens.

But I did remove that self referencing FK and it got it down to about 2 minutes (from 7).

Of course, in Azure it wasn't really running for 11 hours; it's a Function App, and I believe it was just timing out after 30 minutes. With the removal of the FK, Azure was able to process in just under 30 minutes. Since that was still slow, we looked at the App Insights performance measures and saw some serious spikes in memory/CPU. We scaled up both the DB and the App Service Plan together, and it inserted the 60-70K records in ~4 minutes.

In your opinion, is this time expected/acceptable for that number of records? Or does something still sound wrong?

At the moment, it is less of an emergency, since it is completing in a reasonable amount of time. But if there's a chance of making it more efficient, I'd love to figure it out as I imagine it will only get slower as our volume scales.

Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 29-Aug-2022 22:12:31   

By profiler, we mean the ORM Profiler; it's available in the downloads section under My Account when you sign in to LLBLGen.com with your customer ID.

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 30-Aug-2022 09:41:12   

usschad wrote:

Hi, just checking in. I never could get the profiler to work, locally or in Azure. I think the ability to use App Insights profiler on Function Apps is relatively new; it gives me some sort of progress indicator (that spins forever) saying that profiling is starting, but nothing ever really happens.

But I did remove that self referencing FK and it got it down to about 2 minutes (from 7).

Of course, in Azure it wasn't really running for 11 hours; it's a Function App, and I believe it was just timing out after 30 minutes. With the removal of the FK, Azure was able to process in just under 30 minutes. Since that was still slow, we looked at the App Insights performance measures and saw some serious spikes in memory/CPU. We scaled up both the DB and the App Service Plan together, and it inserted the 60-70K records in ~4 minutes.

In your opinion, is this time expected/acceptable for that number of records? Or does something still sound wrong?

It still sounds like a lot. Inserting a lot of data in Azure will make you run into the IOPS limit of your VM though, so that might also be a limiting factor. Also, try to pre-size the database file if that's possible (i.e. define a size that's bigger than what you have now); that way the DB server won't have to resize the file when it's full and there is still data to insert (this is the biggest bottleneck in databases when inserting data, most of the time). This also goes for the log file, by the way! Inserting 70K rows in one transaction will make SQL Server create one entry in the log file per record until the transaction is complete, so make sure that file is also not too small.
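For example, on a local SQL Server / LocalDB instance the pre-sizing could be done with a one-off ALTER DATABASE call (Azure SQL Database manages its files differently). A minimal sketch, where the catalog name, logical file names and sizes are all assumptions to be checked against sys.database_files:

    using Microsoft.Data.SqlClient;

    // One-off pre-sizing of the data and log files before a large import.
    // "MyCatalog" and the logical file names are placeholders, not real names.
    var connectionString = @"Server=(localdb)\MSSQLLocalDB;Integrated Security=true";
    using var connection = new SqlConnection(connectionString);
    connection.Open();
    using var command = new SqlCommand(@"
        ALTER DATABASE [MyCatalog] MODIFY FILE (NAME = MyCatalog,     SIZE = 2048MB);
        ALTER DATABASE [MyCatalog] MODIFY FILE (NAME = MyCatalog_log, SIZE = 1024MB);", connection);
    command.ExecuteNonQuery();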

At the moment, it is less of an emergency, since it is completing in a reasonable amount of time. But if there's a chance of making it more efficient, I'd love to figure it out as I imagine it will only get slower as our volume scales.

What Walaa said: please use our ORM Profiler, and use it e.g. locally so you can see what the app is really doing. You can then tweak things locally to make it faster and deploy the results of that on Azure.
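If it helps, hooking the ORM Profiler's interceptor into the app is typically a single call at startup. A minimal sketch, assuming the SD.Tools.OrmProfiler.Interceptor package is referenced (check the ORM Profiler documentation for the exact package/namespace for your target framework):

    using SD.Tools.OrmProfiler.Interceptor;

    // Call once at application startup, before any DataAccessAdapter is created,
    // so all DbConnection/DbCommand activity is routed to the profiler.
    InterceptorCore.Initialize("MyImportApp");   // the string is just a display name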

Frans Bouma | Lead developer LLBLGen Pro