HnD | SQL statement batches for performance., Page 1

SQL statement batches for performance.

Posts

JamesUK
User

Posts: 7
Joined: 25-Oct-2005

# Posted on: 30-Oct-2005 02:27:26

My experimental LLBLGen project is progressing but when monitoring behavior of the mapper via SQL Server Profiler I have noted that statement batches are not used.

Batching SQL statements is a very simple technique that pays big performance dividends.

In an earlier post on this subject it was claimed that statement batches had not been implemented in LLBLGen because some DBs do not support then. This might be so but a driver-level pass through shim would deal with the issue.

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 30-Oct-2005 10:04:24

Batching can only be applied if you know up front that a set of actions will be executed. If you call a set of SaveEntity() calls on a DataAccessAdapter instance, when should the batch be sent? Don't think that by concatenating sql statements, batching automatically works. There is a limit on the # of parameters and the length of the query. In .NET 2.0, the SqlClient uses batching for DataAdapter calls, though they had several problems to make this a generic routine hence it's not public, to illustrate the problems you can run into.

Frans Bouma | Lead developer LLBLGen Pro

JamesUK
User

Posts: 7
Joined: 25-Oct-2005

# Posted on: 30-Oct-2005 11:39:37

Otis wrote:

Batching can only be applied if you know up front that a set of actions will be executed.

Sure but isn’t that why you created the concept of a UnitOfWork in your product? Furthermore there are cases when LLBLGen knows it must create its own unit of work since it will declare a transaction around a sequence of updates.

Otis wrote:

If you call a set of SaveEntity() calls on a DataAccessAdapter instance, when should the batch be sent?

Even within the scope of a single SaveEntity() a batch of statements can be helpful. A trace of such an example prompted me to post my question. I had two classes A and B in a 1:n relationship. I created a new A and then added 20 new class B's to the contained collection and did a save. LLBLGen knew it had to declare a transaction so a batch of statements would have been an option in this case.

Otis wrote:

Don't think that by concatenating SQL statements, batching automatically works. There is a limit on the # of parameters and the length of the query. In .NET 2.0, the SqlClient uses batching for DataAdapter calls, though they had several problems to make this a generic routine hence it's not public, to illustrate the problems you can run into.

I frequently batch 50 to 100 statements on my own projects to no ill effect. You could always set batching below a conservative threshold.

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 30-Oct-2005 12:19:32

JamesUK wrote:

Otis wrote:

Batching can only be applied if you know up front that a set of actions will be executed.

Sure but isn’t that why you created the concept of a UnitOfWork in your product? Furthermore there are cases when LLBLGen knows it must create its own unit of work since it will declare a transaction around a sequence of updates.

Sure, but the transaction doesn't have to be over when unitofwork commits nor does the transaction have to start when commit is called. I understand there are cases where you theoretically could say: 'hey, you're now doing a sequence of statements, this could be batched', though not all situations which look like these situations are actually batch-candidates. Though I agree with you that batching can be applied in theory in some situations.

But there is another problem. See below.

Otis wrote:

If you call a set of SaveEntity() calls on a DataAccessAdapter instance, when should the batch be sent?

Even within the scope of a single SaveEntity() a batch of statements can be helpful. A trace of such an example prompted me to post my question. I had two classes A and B in a 1:n relationship. I created a new A and then added 20 new class B's to the contained collection and did a save. LLBLGen knew it had to declare a transaction so a batch of statements would have been an option in this case.

Correct, it knows the queues when they're processed.

That doesn't mean it could be batched automatically. Your example is a good one: new A and new set of B's. All B's will have A's PK in their FK once A is saved.

Now, A has an identity field as PK or sequence applied to the PK. How are you going to save A's new PK value in the batch? You can't do that, as you have to fetch the new PK value, sync it with all the FK's related to A's PK, then save the B's. Only at THAT moment B's queries can be produced, because the FK field in the B's is changed.

This is one of the core reasons I didn't build in batching in the first place. The amount of code to get every situation correct is that complex that it might (likely) be slower than the small amount of extra traffic: the connection is kept open, so any new statement sent is executed without much delay. Also, the rdbms parser doesn't have to cut in the sql buffer as well, statements are parsed more faster (though that won't be that significant).

Otis wrote:

Don't think that by concatenating SQL statements, batching automatically works. There is a limit on the # of parameters and the length of the query. In .NET 2.0, the SqlClient uses batching for DataAdapter calls, though they had several problems to make this a generic routine hence it's not public, to illustrate the problems you can run into.

I frequently batch 50 to 100 statements on my own projects to no ill effect. You could always set batching below a conservative threshold.

I would if I could do that reliably, but it's often not a situation you can use. Sure, in some systems it might be usable, but in other systems/schemas it might not. Also, the orm core operates on a fairly high abstraction level, which means that it works on entities, produces query objects and asks them to execute themselves. Because this is generic code, it requires highly specialized pipelines per db if I change it for batching support (if that would be possible) as for example ODP.NET doesn't support batching as well as access, SqlCE and firebird. I don't know if that's worth the effort, speed wise. I also don't know (never read anything about it) what a batched query does with the query execution plans inside a db: are these cached per separate query or for the whole batch?

Frans Bouma | Lead developer LLBLGen Pro

JamesUK
User

Posts: 7
Joined: 25-Oct-2005

# Posted on: 30-Oct-2005 13:01:30

Thank you for taking time out on a Sunday to provide such a detailed reply! Please don't spend too much time dealing with an LLBLGen beginner like me, put all those brain cycles into producing an amazing LLBLGen V2 product.

Anyhow at least I understand your thinking on this subject and now appreciate I will have to engineer a different mechanism for those occasions in my application where batching would help performance.

Otis wrote:

That doesn't mean it could be batched automatically. Your example is a good one: new A and new set of B's. All B's will have A's PK in their FK once A is saved.

Now, A has an identity field as PK or sequence applied to the PK. How are you going to save A's new PK value in the batch? You can't do that, as you have to fetch the new PK value, sync it with all the FK's related to A's PK, then save the B's. Only at THAT moment B's queries can be produced, because the FK field in the B's is changed.

In these special situations I would be happy to pass a flag to LLBLGen to indicate it can leave the PK's and FK's out of sync on the middle tier post completion of a batched save.

Otis wrote:

This is one of the core reasons I didn't build in batching in the first place. The amount of code to get every situation correct is that complex that it might (likely) be slower than the small amount of extra traffic: the connection is kept open, so any new statement sent is executed without much delay. Also, the rdbms parser doesn't have to cut in the sql buffer as well, statements are parsed more faster (though that won't be that significant).

I strongly suspect the extra cpu cycles required to plan a batched save would be minor compared to the saving obtained by reducing calls across a LAN to a DB server and probably to a local DB server as well. I have done a lot of this type of tuning and I am not talking about shaving milliseconds off, sometimes the end result gets a "wow" from the end user as a noticeable pause is reduced to a sub second event.

Otis wrote:

I also don't know (never read anything about it) what a batched query does with the query execution plans inside a db: are these cached per separate query or for the whole batch?

I can only speak about SQL Server but I am 95% confident the arrival of batched or single statements has no effect on management of query plans, this will be done per prepared statement.

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 31-Oct-2005 09:53:10

JamesUK wrote:

Thank you for taking time out on a Sunday to provide such a detailed reply! Please don't spend too much time dealing with an LLBLGen beginner like me, put all those brain cycles into producing an amazing LLBLGen V2 product.

All (potential) customers are important to us

Otis wrote:

This is one of the core reasons I didn't build in batching in the first place. The amount of code to get every situation correct is that complex that it might (likely) be slower than the small amount of extra traffic: the connection is kept open, so any new statement sent is executed without much delay. Also, the rdbms parser doesn't have to cut in the sql buffer as well, statements are parsed more faster (though that won't be that significant).

I strongly suspect the extra cpu cycles required to plan a batched save would be minor compared to the saving obtained by reducing calls across a LAN to a DB server and probably to a local DB server as well. I have done a lot of this type of tuning and I am not talking about shaving milliseconds off, sometimes the end result gets a "wow" from the end user as a noticeable pause is reduced to a sub second event.

Ok, I'll do some testing then to see what difference it makes .

Otis wrote:

I also don't know (never read anything about it) what a batched query does with the query execution plans inside a db: are these cached per separate query or for the whole batch?

I can only speak about SQL Server but I am 95% confident the arrival of batched or single statements has no effect on management of query plans, this will be done per prepared statement.

Thanks for the info

Frans Bouma | Lead developer LLBLGen Pro

Marcus
User

Posts: 747
Joined: 23-Apr-2004

# Posted on: 02-Nov-2005 15:04:43

I've got agree that batching has a HUGE potential for performance improvement... This is something I was going to tackle myself if it wasn't going to make the product...

Otis
LLBLGen Pro Team

Posts: 39933
Joined: 17-Aug-2003

# Posted on: 02-Nov-2005 15:41:49

Marcus wrote:

I've got agree that batching has a HUGE potential for performance improvement... This is something I was going to tackle myself if it wasn't going to make the product...

Keep in mind that the concerns I have put forward are likely to make the batching support be pretty limited, if at all available.

Frans Bouma | Lead developer LLBLGen Pro