Prepared statements?

magic
User
Posts: 125
Joined: 24-Nov-2008
# Posted on: 21-May-2009 16:05:49   

I'm wondering how to make prepared statements work with LGP? A search of both the documentation and the forum didn't return a single result for these keywords.

We are importing large numbers of records from XML into the database on a regular basis. I know from implementations with Oracle that prepared statements can have a major impact on runtime in such scenarios.

MTrinder
User
Posts: 1461
Joined: 08-Oct-2008
# Posted on: 21-May-2009 21:23:32   

Sorry, but prepared statements aren't supported: they require the same number of parameters etc., and we don't cache query objects on an open connection for this (the next insert / update could use different parameters).

If you are doing any serious amount of importing into SQL Server then investigate SQL Server's BCP utility - it basically hoses a pre-prepared data file straight into the table and is exceptionally fast.
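
If the import runs from .NET code, the same bulk-copy facility is also reachable via ADO.NET's SqlBulkCopy. A minimal sketch (the connection string and the dbo.ImportTarget table are placeholders, not anything from your project):

using System.Data;
using System.Data.SqlClient;

// Build (or transform) the rows in memory first...
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Rows.Add(1, "example");

// ...then push them to the server in one bulk operation.
using (var connection = new SqlConnection(connectionString))   // connectionString: placeholder
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.ImportTarget";   // placeholder target table
        bulkCopy.WriteToServer(table);
    }
}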

Matt

magic
User
Posts: 125
Joined: 24-Nov-2008
# Posted on: 21-May-2009 22:28:10   

Thank you for the tip about BCP, I will investigate it.

Nevertheless, I'm not happy about a solution that is not based on LGP because I want to do things with the same technology. Either LGP, or not LGP ...

I'm also not happy about the fact that LGP can't do prepared statements though ... what if you save a whole entity collection? Why can't this run through a prepared statement?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39749
Joined: 17-Aug-2003
# Posted on: 22-May-2009 10:56:22   

magic wrote:

Thank you for the tip about BCP, I will investigate it.

Imports done with dedicated bulk-import tools are very fast compared to the normal route. If you don't do data processing before the inserts, these tools can significantly increase performance as they bypass the whole SQL pipeline.

The problem you describe only occurs with imports, so I'd look at these tools for those specific purposes.

Nevertheless, I'm not happy about a solution that is not based on LGP because I want to do things with the same technology. Either LGP, or not LGP ...

I'm also not happy about the fact that LGP can't do prepared statements though ... what if you save a whole entity collection? Why can't this run through a prepared statement?

It's not that we forgot about this; we didn't implement it because it's not really useful compared to the compromises we would have to make to get it implemented. The problem is that inserts and updates aren't done with the same parameters over and over again: every insert and update can be different and is treated as such. Preparing statements only pays off if you use the same command over and over again, as a prepared command must not be altered, otherwise it would have to be prepared again. That leads to the problem of keeping multiple prepared commands in a list and then either picking one that has already been used, or creating a new one, preparing it and placing it in the list as well, etc.
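
For reference, this is the pattern preparing is designed for in plain ADO.NET: one command, prepared once, executed many times with only the parameter values changing. A minimal sketch (the table, columns and the rows source are placeholders):

using System.Data;
using System.Data.SqlClient;

using (var connection = new SqlConnection(connectionString))   // connectionString: placeholder
{
    connection.Open();
    using (var command = new SqlCommand(
        "INSERT INTO dbo.ImportTarget (Id, Name) VALUES (@id, @name)", connection))
    {
        // Parameter types (and sizes for variable-length types) must be set before Prepare().
        command.Parameters.Add("@id", SqlDbType.Int);
        command.Parameters.Add("@name", SqlDbType.NVarChar, 100);
        command.Prepare();   // prepare once...

        foreach (var row in rows)   // rows: placeholder source
        {
            // ...then only the values change between executions.
            command.Parameters["@id"].Value = row.Id;
            command.Parameters["@name"].Value = row.Name;
            command.ExecuteNonQuery();
        }
    }
}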

This requires governing and control by code overseeing the whole operation, as it is very inefficient to prepare statements which aren't re-used (preparing itself takes a performance hit). It also requires that the DQE is aware of this list of commands so it won't re-create a command. All of this requires the distributed pipeline we have now to be centralized and packed together again, with the parts able to peek into each other's data structures, something we won't do.

On several databases it's also not really a true performance enhancement, because execution plans are already re-used at the RDBMS level for parameterized queries, which reduces the benefit of preparing the statement.

So to add this we would need a separate pipeline optimized for it, which first sorts the entities by their changed fields so we can create one command per group and prepare it if necessary. This can be tedious, however, as comparing these entities can take significant time if your entities are big (you have to determine the different sets of parameters).
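
To make that concrete: such a pipeline would first have to bucket the entities by their exact set of changed fields before it could share one prepared command per bucket. A minimal, self-contained sketch of only the grouping step (FakeEntity is a stand-in for illustration; the real runtime has no such pipeline):

using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for an entity that knows which of its fields were changed.
class FakeEntity
{
    public string Name { get; set; }
    public HashSet<string> ChangedFields { get; } = new HashSet<string>();
}

class GroupingSketch
{
    static void Main()
    {
        var entities = new List<FakeEntity>
        {
            new FakeEntity { Name = "P1", ChangedFields = { "Price" } },
            new FakeEntity { Name = "P2", ChangedFields = { "Price", "Stock" } },
            new FakeEntity { Name = "P3", ChangedFields = { "Price" } }
        };

        // Key = the sorted set of changed field names; entities sharing a key could share
        // one prepared UPDATE command. Building these keys is the comparison cost mentioned above.
        var groups = entities.GroupBy(
            e => string.Join(",", e.ChangedFields.OrderBy(n => n)));

        foreach (var g in groups)
        {
            Console.WriteLine($"changed fields [{g.Key}]: {g.Count()} entities");
        }
    }
}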

I'm not sure this justifies the code, or whether it would be any faster than it is today. Our DQEs are very fast, as is the rest of the pipeline, and modern databases increasingly cache and optimize execution plans themselves. Creating different sets of parameters for different prepared commands (and preparing them) is also not always faster for every collection, only for big collections. In other words: everything is possible to build in; the only questions are whether it's of any use, whether it justifies the code added and, more importantly, how to switch it on and off, because as I explained above, you only want this with large(r) sets of similar entities.

Frans Bouma | Lead developer LLBLGen Pro
magic
User
Posts: 125
Joined: 24-Nov-2008
# Posted on: 01-Jun-2009 17:09:46   

Otis, thank you for the explanations.

Otis wrote:

Imports done with dedicated bulk-import tools are very fast compared to the normal route. If you don't do data processing before the inserts, these tools can significantly increase performance as they bypass the whole SQL pipeline.

In this case we can't use BCP because we have to do data processing before the inserts.

One last question to sum up the runtime optimization of multiple inserts (SaveEntity vs. SaveEntityCollection): should there be a difference in the runtime of these two code snippets (in theory), or should the runtime be exactly the same regardless of the method?

foreach (Entity e in c)   // c is the EntityCollection
{
    adapter.SaveEntity(e);
}

// versus

adapter.SaveEntityCollection(c);

MTrinder
User
Posts: 1461
Joined: 08-Oct-2008
# Posted on: 02-Jun-2009 21:52:17   

I can't imagine there will be any significant difference - the same queries are going to be run against the db, and that will be the bottleneck.

Care to try it out and let us know how you get on ?

Matt

magic
User
Posts: 125
Joined: 24-Nov-2008
# Posted on: 02-Jun-2009 22:04:10   

MTrinder wrote:

I can't imagine there will be any significant difference - the same queries are going to be run against the db, and that will be the bottleneck.

Care to try it out and let us know how you get on ?

I currently have such fluctuating response times from my (development) db that I'm afraid I won't be able to get reasonable stats for a comparison. Or I would have to write a small application to gather stats from a large number of runs.

I was thinking that theoretically the database should cache execution plans etc. if a query is executed several times in a row, so a set of runs should be faster than the sum of the single runs. I was also thinking that LGP itself could have optimization routines for saving a whole entity collection, but it seems like it doesn't.

Also, when LGP is processing/saving a tree of entities (recursive save) where each layer has multiple nodes, does it save the information breadth-first, depth-first or by another approach?

MTrinder
User
Posts: 1461
Joined: 08-Oct-2008
# Posted on: 02-Jun-2009 22:15:58   

SQL Server caches execution plans regardless of where and when the query comes from, so any benefit gained from a cached plan will apply equally to both scenarios.

As for the recursive tree save, it saves in the order required to avoid issues in the database, so parents have to be saved before children, etc...

magic
User
Posts: 125
Joined: 24-Nov-2008
# Posted on: 02-Jun-2009 22:29:50   

MTrinder wrote:

SQL Server caches execution plans regardless of where and when the query comes from, so any benefit gained from a cached plan will apply equally to both scenarios.

As for the recursive tree save, it saves in the order required to avoid issues in the database, so parents have to be saved before children, etc...

Sure, I understand that. What I was trying to say though was:

If you have chains of entities (arrows being foreign key relations):

A1 -> B1 -> C1 -> D1
A1 -> B2 -> C2 -> D2
A1 -> B2 -> C3 -> D3

And you execute

adapter.SaveEntity(A1, true, true);

It could go either:

adapter.SaveEntity(A1);
adapter.SaveEntity(B1);
...(C1);
...(D1);
...(B2);
...(C2);
etc.

or

adapter.SaveEntity(A1);
adapter.SaveEntity(B1);
...(B2);
...(C1);
...(C2);
etc.

Since I would think that the cache for execution plans etc. has only a certain size, when executing the SaveEntity() method for a big tree (a couple of hundred, maybe thousands, of times) the traversal order could make a difference, couldn't it?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39749
Joined: 17-Aug-2003
# Posted on: 03-Jun-2009 10:44:59   

It's about transactions. If you call SaveEntity() sequentially and don't explicitly start a transaction on the adapter first, a new transaction is started and committed every time. This can cause overhead on the DB. With the collection, one transaction is started for the whole collection.

It's therefore key:
- if you sequentially call SaveEntity(), start a transaction manually, THEN call SaveEntity() multiple times, and commit the transaction
- if you call SaveEntityCollection(), you can leave everything to the adapter
- you can also use a UnitOfWork, which will commit everything in one go as well
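
A minimal sketch of the first two options (entitiesToSave / entityCollection are placeholders for your own entities; the adapter comes from the generated DatabaseSpecific project):

using System.Data;   // IsolationLevel

// Option 1: one explicit transaction around sequential SaveEntity() calls.
using (var adapter = new DataAccessAdapter())
{
    adapter.StartTransaction(IsolationLevel.ReadCommitted, "ImportBatch");
    try
    {
        foreach (var entity in entitiesToSave)
        {
            adapter.SaveEntity(entity);
        }
        adapter.Commit();
    }
    catch
    {
        adapter.Rollback();
        throw;
    }
}

// Option 2: SaveEntityCollection() wraps the whole collection in one transaction itself.
using (var adapter = new DataAccessAdapter())
{
    adapter.SaveEntityCollection(entityCollection);
}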

If you call SaveEntity() to do a recursive save, the graph is traversed every time. With a unit of work this is done for every entity as well, but it uses the same lookup tables to see whether it has already traversed an entity, which can be a little more efficient, but you'll need a LOT of entities to see the difference.

Frans Bouma | Lead developer LLBLGen Pro
magic
User
Posts: 125
Joined: 24-Nov-2008
# Posted on: 03-Jun-2009 16:01:08   

Otis, thank you for the explanation.

We have a web service that "uploads" data to the server. We start a transaction at the beginning and commit at the end of the processing.

So I understand it does not really matter whether I save each entity in the leaves of the data structure, or whether I save the "root node" and let LGP do the recursive saving, right?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39749
Joined: 17-Aug-2003
# Posted on: 03-Jun-2009 16:25:20   

magic wrote:

Otis, thank you for the explanation.

We have a web service that "uploads" data to the server. We start a transaction at the beginning and commit at the end of the processing.

So I understand it does not really matter whether I save each entity in the leaves of the data structure, or whether I save the "root node" and let LGP do the recursive saving, right?

No, in that case it doesn't matter much. However, by 'starting a transaction' with a web service, it might be that you're now using an MS DTC-controlled transaction, correct? That could be slower than, for example, uploading all entities at once, creating a UnitOfWork at the service and saving them all at once in one normal transaction.
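
A minimal sketch of that last option, assuming the service receives the uploaded entities as one collection (uploadedEntities is a placeholder):

// Inside the service method that receives the uploaded entities.
var uow = new UnitOfWork2();
uow.AddCollectionForSave(uploadedEntities);

using (var adapter = new DataAccessAdapter())
{
    // 'true': the UnitOfWork starts and commits one normal (non-DTC) transaction itself.
    uow.Commit(adapter, true);
}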

Frans Bouma | Lead developer LLBLGen Pro