Upsert (Insert Updates)?

Posts   
 
    
Bashar
User
Posts: 108
Joined: 11-Nov-2004
# Posted on: 21-Jun-2005 12:00:14   

Hi there!

I'm at the Microsoft SQL Yukon training in Paris France with Omar, and one of the interesting subjects that came up during the training was 'upserts'; an update statement that will insert new records if it finds nothing to update. And I was wondering, since we're using some form of dynamic SQL statements in LLBL couldn't you just have this as one of the save parameters for the adapter.

I'd simply ask the adapter to update the record and insert it if it cannot find it.

Just a thought there for you.

Thanks, Bashar Lulu

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39801
Joined: 17-Aug-2003
# Posted on: 21-Jun-2005 12:23:01   

upserts shiver wink

It seems odd to me. The thing is: the SaveEntity action already is an upsert indate or whatever you want to call it: it checks if it has to insert an entity or update an entity.

Though it does that based on what it knows of the entity in memory, not of what's in the database. I don't see why you would want an insert if an update was suggested. That suggests, the entity was removed after it was read and it now has to be re-stored... though, why was it deleted in the first place? simple_smile IMHO it's then better to investigate why the delete took place and solve it higher-up in the application than to decide on a very low level 'oh well, update didn't work because the entity is gone, lets just insert it...', as it's a concurrency issue which should IMHO be solved on a high level, or am I wrong in this?

Frans Bouma | Lead developer LLBLGen Pro
Bashar
User
Posts: 108
Joined: 11-Nov-2004
# Posted on: 21-Jun-2005 13:38:03   

I think you're right, in general practices. simple_smile But, I'll give you a situation where upserts would be implemented. I have an application where information from the stock exchange is fed (from a comma separated text file) into the application, a number of calculations are applied to the information inserted and then different graphs are built.

Now, the information entered various in size depending on the graph they want, but it can for several hundred thousand records.

The problem is, most of the time the information is already available in the database, but on other times it isn't and again sometimes the information outdated. So the best solution is to reload all the records again (from the text file) and then create the graph from the new information.

This can also be done using the return values from the SaveEntity method for an update and if that returns 0 records I should go ahead and save the record. But this would require at least 2 trips to the database per record which is not necessarily very efficient.

On the other hand with an upsert this would very easily be implemented utilizing a single database trip!

Comm ca?

Thanks, Bashar Lulu

alexdresko
User
Posts: 336
Joined: 08-Jun-2004
# Posted on: 21-Jun-2005 15:01:55   

Isn't this the same thing as an upsert?

DAL.EntityClasses.Customer customer = new DAL.EntityClasses.Customer(14); // <-- customer ID

if (customer.Fields.State != EntityState.Fetched) { customer = new DAL.EntityClasses.Customer(); }

customer.Name = "Alex"; adapter.SaveEntity(customer);

arschr
User
Posts: 894
Joined: 14-Dec-2003
# Posted on: 21-Jun-2005 15:16:10   

I think upserts are handy in at least 2 ways 1) the logic is happening on the server 2) when the master source od data is not the database and you are syncing the database such a merge replication. Of course if the master copy is elsewhere you may really need a delupsert.

Bashar
User
Posts: 108
Joined: 11-Nov-2004
# Posted on: 21-Jun-2005 15:36:47   

alexdresko wrote:

Isn't this the same thing as an upsert?

DAL.EntityClasses.Customer customer = new DAL.EntityClasses.Customer(14); // <-- customer ID

if (customer.Fields.State != EntityState.Fetched) { customer = new DAL.EntityClasses.Customer(); }

customer.Name = "Alex"; adapter.SaveEntity(customer);

I don't really understand C# that well but I do believe you're making 2 trips to the database. This would take a drastic amount of time when trying to update a large number of records!

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39801
Joined: 17-Aug-2003
# Posted on: 22-Jun-2005 10:51:50   

Bashar wrote:

I think you're right, in general practices. simple_smile But, I'll give you a situation where upserts would be implemented. I have an application where information from the stock exchange is fed (from a comma separated text file) into the application, a number of calculations are applied to the information inserted and then different graphs are built.

Now, the information entered various in size depending on the graph they want, but it can for several hundred thousand records.

The problem is, most of the time the information is already available in the database, but on other times it isn't and again sometimes the information outdated. So the best solution is to reload all the records again (from the text file) and then create the graph from the new information.

This can also be done using the return values from the SaveEntity method for an update and if that returns 0 records I should go ahead and save the record. But this would require at least 2 trips to the database per record which is not necessarily very efficient.

On the other hand with an upsert this would very easily be implemented utilizing a single database trip!

Thanks for the example, that's indeed a situation in which you might want to do this. The question then is: is SqlServer offering this through a statement or do you have to catch the error in T-SQL and convert it into an update?

Frans Bouma | Lead developer LLBLGen Pro
Bashar
User
Posts: 108
Joined: 11-Nov-2004
# Posted on: 22-Jun-2005 11:31:23   

I was thinking more in the sense of the 'If Exists' function in t-sql. First check for the existance of the record and then act upon the returned value. If it does exist then update it, otherwise insert it. This is ok since it's all done on the server.

Does this sound ok, or are there limitations here? disappointed

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39801
Joined: 17-Aug-2003
# Posted on: 23-Jun-2005 11:24:00   

Bashar wrote:

I was thinking more in the sense of the 'If Exists' function in t-sql. First check for the existance of the record and then act upon the returned value. If it does exist then update it, otherwise insert it. This is ok since it's all done on the server. Does this sound ok, or are there limitations here? disappointed

If (select ... exists) does a table scan as well (or pk index scan), so it's not particular faster. I think it would be more appropriate to use the TRY CATCH statement now available in T-SQL: perform the INSERT, if it fails, (and you end up in the CATCH for the pk violation error), perform an update. Though I'm not sure if the hit of the error is slower than the exists query. One thing it might have as an advantage (but I'm guessing here) is that the execution plan isn't recompiled every time, and with the IF you run that risk.

Frans Bouma | Lead developer LLBLGen Pro
PhilD
User
Posts: 19
Joined: 23-Jun-2005
# Posted on: 23-Jun-2005 15:03:54   

Otis wrote:

Bashar wrote:

I was thinking more in the sense of the 'If Exists' function in t-sql. First check for the existance of the record and then act upon the returned value. If it does exist then update it, otherwise insert it. This is ok since it's all done on the server. Does this sound ok, or are there limitations here? disappointed

If (select ... exists) does a table scan as well (or pk index scan), so it's not particular faster. I think it would be more appropriate to use the TRY CATCH statement now available in T-SQL: perform the INSERT, if it fails, (and you end up in the CATCH for the pk violation error), perform an update. Though I'm not sure if the hit of the error is slower than the exists query. One thing it might have as an advantage (but I'm guessing here) is that the execution plan isn't recompiled every time, and with the IF you run that risk.

One of the standard CRUD procs is to write a "Save" using this pattern:

IF EXISTS (SELECT * FROM MyTable WHERE PKColumns = ...) BEGIN UPDATE... END ELSE BEGIN INSERT... END

Obviously this will be a ** seek ** on the index used to implement the PK, not a scan.

PhilD
User
Posts: 19
Joined: 23-Jun-2005
# Posted on: 23-Jun-2005 15:10:39   

Bashar wrote:

Hi there!

I'm at the Microsoft SQL Yukon training in Paris France with Omar, and one of the interesting subjects that came up during the training was 'upserts'; an update statement that will insert new records if it finds nothing to update. And I was wondering, since we're using some form of dynamic SQL statements in LLBL couldn't you just have this as one of the save parameters for the adapter.

I'd simply ask the adapter to update the record and insert it if it cannot find it.

Just a thought there for you.

Thanks, Bashar Lulu

And the MS implementation of this feature utterly sucks, because you basically have to write both the INSERT and the UPDATE in one syntactical statement! It would have been much better if SQL server was to figure out one from the other (I think MySQL has something that can do this using a SAVE statement...or maybe it was POSTGRES...I can't quite remember)

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39801
Joined: 17-Aug-2003
# Posted on: 23-Jun-2005 16:00:23   

PhilD wrote:

Otis wrote:

Bashar wrote:

I was thinking more in the sense of the 'If Exists' function in t-sql. First check for the existance of the record and then act upon the returned value. If it does exist then update it, otherwise insert it. This is ok since it's all done on the server. Does this sound ok, or are there limitations here? disappointed

If (select ... exists) does a table scan as well (or pk index scan), so it's not particular faster. I think it would be more appropriate to use the TRY CATCH statement now available in T-SQL: perform the INSERT, if it fails, (and you end up in the CATCH for the pk violation error), perform an update. Though I'm not sure if the hit of the error is slower than the exists query. One thing it might have as an advantage (but I'm guessing here) is that the execution plan isn't recompiled every time, and with the IF you run that risk.

One of the standard CRUD procs is to write a "Save" using this pattern:

IF EXISTS (SELECT * FROM MyTable WHERE PKColumns = ...) BEGIN UPDATE... END ELSE BEGIN INSERT... END

Obviously this will be a ** seek ** on the index used to implement the PK, not a scan.

Yes I meant an index scan/seek, The table scan is of course if you don't use pk columns to match if the row already exists, but that was pretty foolish of myself: why would you ever do that without unique (and thus non-indexed) columns flushed

Frans Bouma | Lead developer LLBLGen Pro
Bashar
User
Posts: 108
Joined: 11-Nov-2004
# Posted on: 24-Jun-2005 09:49:44   

So.. what's the final verdict...

Are you gonna add this in? confused

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39801
Joined: 17-Aug-2003
# Posted on: 24-Jun-2005 10:31:16   

Bashar wrote:

So.. what's the final verdict...

Are you gonna add this in? confused

I'm undecided. The main issue is: what to do in the update? Just overwrite all fields? Or merge data as in: leave it to the developer what to do?

Frans Bouma | Lead developer LLBLGen Pro