Setting PK field
Joined: 01-Feb-2006
For our simple reference data types, we want to use byte/tinyint. For possible compatibility with SQL Compact in the future, which I believe doesn't support identity columns other than int/bigint, we are considering removing the identity from the PK in these tables.
What is the recommended way for Adapter of inserting one or a list of new entities (of the same type) and incrementing the PK(s) accordingly?
Cheers Simon
You can hook into one of the events raised on the process of saving an entity.
For example, you can use the validation framework and handle SaveEntityBeforeSave. There you should check whether the entity is new (IsNew), and if so fetch the new sequence value from some service or static method.
Be aware that any sequencing not done by the DB might result in duplicates, e.g. when the transaction rolls back or when multiple threads access the central sequence source.
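The hook described above could be sketched roughly like this. This is a sketch only: the exact base class, method signature and the SequenceService helper are assumptions, so check the ORMSupportClasses reference for your LLBLGen Pro version.

```csharp
// Sketch only - ValidatorBase/ValidateEntityBeforeSave names and the
// SequenceService helper are assumptions, not verified against a specific version.
public class ReferenceDataValidator : ValidatorBase
{
    public override void ValidateEntityBeforeSave(IEntityCore involvedEntity)
    {
        var entity = (EntityBase2)involvedEntity;
        if (entity.IsNew)
        {
            // Fetch the next PK value from some central service or static method.
            var pkField = entity.Fields.PrimaryKeyFields[0];
            entity.SetNewFieldValue(pkField.Name, SequenceService.GetNext(entity.GetType()));
        }
        base.ValidateEntityBeforeSave(involvedEntity);
    }
}
```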
Otis wrote:
Be aware that any sequencing not done by the DB might result in duplicates, e.g. when the transaction rolls back or when multiple threads access the central sequence source.
No duplicates on a Primary Key and there won't be a central sequence source - all based on the single table involved.
Input will be an EntityCollection<T> (where Count is typically just one).
I've added this code to a DataAccessAdapter partial class (do I really have to create an entity just to find its PK??)
public TKeyType FetchMaxValue<TEntity, TKeyType>() where TEntity : EntityBase2, new()
{
    var entity = new TEntity();
    var result = GetScalar(entity.Fields.PrimaryKeyFields[0], AggregateFunction.Max);
    return result == DBNull.Value
        ? default(TKeyType)
        : (TKeyType)result;
}
so I can call this and then set the PK incrementally on each of the entities in the list and save them. All within the same transaction of course.
This is just for reference data which is added to occasionally - I wouldn't do this for normal tables - they stay as int PKs with Db sequences. But it seems silly to use an int for Titles (i.e. Mr, Mrs etc) - there isn't going to be 4 billion of those!
This won't work, multiple threads (e.g. in a webapplication) or multiple desktop applications accessing the same database could end up with the same PK value. Never do sequencing yourself, leave it to the DB.
Otis wrote:
This won't work, multiple threads (e.g. in a webapplication) or multiple desktop applications accessing the same database could end up with the same PK value. Never do sequencing yourself, leave it to the DB.
Hmm, I do get worried when you tell me something contradictory to my understanding since you know way more about databases than I ever will
So let me tell you my thinking and you can tell me where I am wrong.
1) Databases didn't always have sequences available so there must have been a way of achieving this.
2) SQL Compact doesn't support sequence fields on tinyint PKs. Are you suggesting we should use int PKs regardless just to ensure sequencing?
3) The database will not allow duplicate PKs so, at worst, an error will be thrown which can be caught and the action replayed.
4) My plan was to have a transaction (correctly configured**) which will lock the table for the duration of the transaction and therefore it is safe to retrieve the highest PK in use, increment it for each of the new entities, and save those entities back before committing the transaction.
** Need a WITH (HOLDLOCK, UPDLOCK) or WITH (TABLOCKX, HOLDLOCK) on the SELECT, I think - I assume there is somewhere in LLBLGen where I can poke these hints in??
With that you're effectively re-building identity columns, but not as efficiently as a DB can do it.
Sequences need a centralized, atomic system to work. I.o.w.: the system which increases them must, in all cases, return a new, unique number. So increasing an existing number requires a lock on the existing number's location, increase it, store the new number and then return it. There are ways to do that, but why bother? It's built-in and without locks. So it will work more efficiently than your system with a restrictive lock: your lock makes all inserts wait for that single lock.
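The "lock, increase, store, return" cycle described above can be illustrated in-process with a minimal C# sketch (a real DB identity does the equivalent internally, across all connections):

```csharp
using System;

// Minimal in-process illustration of what a sequence generator must do:
// lock the current value's location, increase it, store it, return it.
public class Sequence
{
    private readonly object _gate = new object();
    private int _current;

    public Sequence(int start) { _current = start; }

    public int Next()
    {
        lock (_gate)   // without this lock, two callers could read the same value
        {
            _current++;
            return _current;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var seq = new Sequence(10);
        Console.WriteLine(seq.Next());  // 11
        Console.WriteLine(seq.Next());  // 12
    }
}
```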
So, use int PKs with Identity and don't waste time on this; it's not going to give you any advantage, on the contrary. In general these schemes were built with stored procs, so you can find more of them with Google, along with plenty of advice against them.
Otis wrote:
With that you're effectively re-building identity columns, but not as efficiently as a DB can do it.
Sequences need a centralized, atomic system to work. I.o.w.: the system which increases them must, in all cases, return a new, unique number. So increasing an existing number requires a lock on the existing number's location, increase it, store the new number and then return it. There are ways to do that, but why bother? It's built-in and without locks. So it will work more efficiently than your system with a restrictive lock: your lock makes all inserts wait for that single lock.
So, use int PKs with Identity and don't waste time on this; it's not going to give you any advantage, on the contrary. In general these schemes were built with stored procs, so you can find more of them with Google, along with plenty of advice against them.
I agree 100% for most tables and will do that but this is for tiny reference tables which will have insertions once-in-a-blue-moon, if ever.
Is it really worth having int FKs to them in potentially hundreds of thousands of transactions?
I'm certainly not suggesting this should be a feature of LLBLGen, but if there is a place in DataAccessAdapter where I can insert table hints then it shouldn't take too long to write the code to do this. I then have the option to do it this way. Is there such a place?
simmotech wrote:
Otis wrote:
With that you're effectively re-building identity columns, but not as efficiently as a DB can do it.
Sequences need a centralized, atomic system to work. I.o.w.: the system which increases them must, in all cases, return a new, unique number. So increasing an existing number requires a lock on the existing number's location, increase it, store the new number and then return it. There are ways to do that, but why bother? It's built-in and without locks. So it will work more efficiently than your system with a restrictive lock: your lock makes all inserts wait for that single lock.
So, use int PKs with Identity and don't waste time on this; it's not going to give you any advantage, on the contrary. In general these schemes were built with stored procs, so you can find more of them with Google, along with plenty of advice against them.
I agree 100% for most tables and will do that but this is for tiny reference tables which will have insertions once-in-a-blue-moon, if ever.
Why do you need sequenced fields then? If they almost never get inserted, you could get away with the costs of doing it yourself, though I wouldn't spend time on this, as IMHO it's wasted time on a problem that's already been solved.
I'm certainly not suggesting this should be a feature of LLBLGen, but if there is a place in DataAccessAdapter where I can insert table hints then it shouldn't take too long to write the code to do this. I then have the option to do it this way. Is there such a place?
What kind of hints do you want to add? Only no-lock hints are possible, but if you want to append other hints to table names it's a little more work and requires a custom DQE. I don't really recommend it in this case.
I would like to use sequenced fields everywhere, I really would. LLBLGen and SQL Server support them for all int sizes.
SQL Server Compact doesn't, however. So I either have to compromise my design and use int PK/FKs everywhere, or spend a bit of time writing a single method that emulates sequenced fields. Once it's written I don't need to worry about it. I really don't mind low-level code.
Walaa wrote:
IMHO, there is no use for this. PKs are recommended to be artificial, i.e. a sequence. And I see no value in encrypting them
Arghh!! SQL Server Compact does not support sequences for non-int PKs! Not possible, not allowed. Will not work.
If I want to use a non-int PK (which I do because it is the correct thing to do database design-wise and database size-wise) then I will have to emulate a Sequence.
I'm not asking you guys to make this a feature and I'm not asking you to write the code for me, I just needed a clue as to what component will let me add the hints I need to make this work (custom DQE). Frans said "This won't work" but once those hints are in then, I believe, it is guaranteed to produce a non-duplicate PK in a thread-safe manner.
(If it doesn't work then I will cheat and catch the duplicate key exception and retry until it does.)
Who mentioned encryption??
Please calm down... , we didn't make SQL CE suck so much, Microsoft did.
If you want this, I recommend you call a proc which produces the new PK value for you and then set the PK with the value you got from the proc. You otherwise need to be able to insert a value you create inside the INSERT query, which is not possible to generate with the code as you can't emit expressions in the INSERT query using the current DQE design.
So if you really REALLY want to go this route (which you shouldn't), create a proc, call it to obtain the new value, and set the PK with that value. Dunno if CE has proc support now, but you could hack that in with a piece of SQL which specifically works on CE and obtains the new value for it.
Otis wrote:
Please calm down... , we didn't make SQL CE suck so much, Microsoft did.
If you want this, I recommend you call a proc which produces the new PK value for you and then set the PK with the value you got from the proc. You otherwise need to be able to insert a value you create inside the INSERT query, which is not possible to generate with the code as you can't emit expressions in the INSERT query using the current DQE design.
So if you really REALLY want to go this route (which you shouldn't), create a proc, call it to obtain the new value, and set the PK with that value. Dunno if CE has proc support now, but you could hack that in with a piece of SQL which specifically works on CE and obtains the new value for it.
This is bizarre... we really must be talking at cross purposes here.
Getting the next PK is not a problem, it's MAX(ID) + 1 - we don't need stored procedures for that, and SQL Compact does not support stored procedures anyway.
It was always the plan to set the PK locally, not mess about with INSERT SQL building. This is the code:-
var id = adapter.FetchMaxValue<TEntity, byte>();
var toSave = new EntityCollection<TEntity>();
foreach (var description in descriptions)
{
    var entity = new TEntity();
    entity.SetNewFieldValue(valueFieldName, ++id);
    entity.SetNewFieldValue(descriptionFieldName, description);
    toSave.Add(entity);
}
adapter.SaveEntityCollection(toSave);
It's one SELECT query followed by an INSERT query, all within one transaction. This will work in my scenario 999 times out of 1000, but just to be absolutely sure: if I put the correct table-locking hints on the SELECT query then they are preserved to the end of the transaction (not just the SELECT query), thus preventing the race conditions you are concerned about.
I will create an extension method similar to FetchMaxValue called FetchMaxPKAndLockForUpdate and hide any hacks in there so the application code stays clear (or even implement the whole code above as an extension method)
This is neat, logical, efficient, safe, and will just require inserting a string within a string. I have not looked at the DQE yet but I'm guessing it's possible somehow.
Cheers Simon
simmotech wrote:
Otis wrote:
Please calm down... , we didn't make SQL CE suck so much, Microsoft did.
If you want this, I recommend you call a proc which produces the new PK value for you and then set the PK with the value you got from the proc. You otherwise need to be able to insert a value you create inside the INSERT query, which is not possible to generate with the code as you can't emit expressions in the INSERT query using the current DQE design.
So if you really REALLY want to go this route (which you shouldn't), create a proc, call it to obtain the new value, and set the PK with that value. Dunno if CE has proc support now, but you could hack that in with a piece of SQL which specifically works on CE and obtains the new value for it.
This is bizarre... we really must be talking at cross purposes here.
Getting the next PK is not a problem, it's MAX(ID) + 1 - we don't need stored procedures for that, and SQL Compact does not support stored procedures anyway.
It was always the plan to set the PK locally, not mess about with INSERT SQL building. This is the code:-
var id = adapter.FetchMaxValue<TEntity, byte>();
var toSave = new EntityCollection<TEntity>();
foreach (var description in descriptions)
{
    var entity = new TEntity();
    entity.SetNewFieldValue(valueFieldName, ++id);
    entity.SetNewFieldValue(descriptionFieldName, description);
    toSave.Add(entity);
}
adapter.SaveEntityCollection(toSave);
It's one SELECT query followed by an INSERT query, all within one transaction. This will work in my scenario 999 times out of 1000, but just to be absolutely sure: if I put the correct table-locking hints on the SELECT query then they are preserved to the end of the transaction (not just the SELECT query), thus preventing the race conditions you are concerned about.
Current situation: max ID is 10. Now two threads want to insert a row. BOTH do MAX(ID) and both get 10 back. You can only prevent this by locking the entire table for READS, so the transaction which locks first blocks both readers and writers. This means that effectively you create a single-user database: no selects whatsoever can happen on that table, because if two selects can happen, you run the risk of duplicate IDs. Both our threads will insert a value of 11 for the ID -> boom.
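That two-thread interleaving can be reproduced in a few lines (a sketch, with an in-memory list standing in for the table and the interleaving made explicit):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class RaceDemo
{
    static void Main()
    {
        // The "table": max ID is currently 10.
        var table = new List<int> { 8, 9, 10 };

        int t1Read = table.Max();   // T1 reads MAX(ID) -> 10
        int t2Read = table.Max();   // T2 reads MAX(ID) before T1 inserts -> also 10
        table.Add(t1Read + 1);      // T1 inserts 11
        table.Add(t2Read + 1);      // T2 inserts 11 as well -> duplicate PK, boom

        Console.WriteLine(table.Count(id => id == 11));  // prints 2
    }
}
```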
I will create an extension method similar to FetchMaxValue called FetchMaxPKAndLockForUpdate and hide any hacks in there so the application code stays clear (or even implement the whole code above as an extension method)
This is neat, logical, efficient, safe, and will just require inserting a string within a string. I have not looked at the DQE yet but I'm guessing it's possible somehow.
It's not efficient; you have to lock the entire table exclusively to prevent two reads of the rows at the same time. Two insert actions from two different sources (two desktop apps targeting the same DB, two threads in a website) need to be executed atomically, one after the other: T1 locks the entire table, T1 reads max ID, T1 increments it by 1, T1 inserts the new row, T1 unlocks the table. Then T2 locks the entire table, T2 reads max ID, T2 increments it by 1, T2 inserts the new row, T2 unlocks the table.
There's no other way: T2 can't be allowed to read a single row in the table, as that would make T2's process continue with the value and potentially increase the same value as T1 had read.
I've seen your proposed solution a lot over the years; it's not unique, but all its variants share one thing: they're unsafe, inefficient and actually not really clever. Instead of using the safe and efficient mechanism built into the DB, you re-invent it with your own system which requires table locking, which makes it perform very poorly compared to the built-in system.
I really don't see why you want to waste time on this: it makes your application perform slower, and if you don't want to lock the entire table, you'll run into the problem that there might be duplicate IDs.
Simon, it's your time of course and your software, so if you want to waste time on this and make your software perform sub-optimally, that's your choice. What annoys me a bit is that you apparently think you can beat the system and do what others thought they could do too but failed at: creating a sequence system outside the DB that's safe and efficient.
I referred to a stored proc solution because you can then use a sequence table. See, your solution is the worst of its kind for this. There's another one, which might even work out, yet it's also a waste of time because it creates exactly what the DB system already provides.
You create a table, Sequences:

CREATE TABLE Sequences
(
    ID int,
    Value tinyint,
    CONSTRAINT ID_PK PRIMARY KEY (ID)
);
Now you create a proc which has an input parameter for the sequence id and an output parameter which returns the new sequence value.
What it does is:
- SELECT, with an exclusive lock, the value for the sequence with the ID specified
- increment the value by 1 and update the row with the new value
- return the new value, committing the transaction.
This locks just 1 row, not the entire table.
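The proc body described above could look roughly like this in T-SQL (a sketch; the proc and parameter names are illustrative, and since SQL CE lacks procs you'd run the equivalent UPDATE as a plain parameterized statement there):

```sql
-- Sketch: names are illustrative. The single UPDATE is atomic, so it both
-- increments the sequence row and captures the new value in one statement,
-- locking just that one row rather than an entire table.
CREATE PROCEDURE pr_NextSequenceValue
    @sequenceID int,
    @newValue   tinyint OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE Sequences
    SET @newValue = Value = Value + 1
    WHERE ID = @sequenceID;
END
```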
It's still very stupid to use this, as it simply re-implements identity fields using interpreted SQL statements, but as you are so determined to re-implement it, at least choose the less stupid way to do it, not the MAX(ID)+1 route: this locks just one row, not an entire table, and it selects and updates the value in a transaction inside a proc, so no roundtrips are needed and the locks are held far less long than in the worst-case scenario, namely your solution.
But I digress... I've mentioned it enough times that re-implementing identity is a dumb move.
Thanks for your eloquent explanation of the race condition. Your summary, that the table must be locked, is what I concluded also, hence me mentioning it back in my third message in this thread.
** Need a WITH (HOLDLOCK, UPDLOCK) or WITH (TABLOCKX, HOLDLOCK) on the SELECT, I think
I think you over-dramatise things by claiming I am effectively "creating a single user database". I don't agree at all. Readers will only be delayed in the extremely unlikely event that a writer has locked the table and even then for a millisecond or two. Remember this is simple reference data, not transactional data. Additions to the tables might be one or two a month. Fewer in most cases.
"application perform slower", "sub-optimal". This probably hurt the most, I am Mr-Optimiser!
But premature optimization is the root of all evil and besides, the size saving will at least cancel out any overhead and perhaps even increase speed. None of this would be noticeable to any user anyway.
What annoys me a bit is that you apparently think you can beat the system and do what others thought they could do too but failed at: creating a sequence system outside the DB that's safe and efficient.
It's safe. And it's efficient enough for its purpose. I'm not trying to 'beat the system', but on the other hand I'm not going to let the system beat me. Mandatory support for SQL Compact would have me change my byte-sized FKs to int-sized FKs just to speed up the handful of inserts per year by a millionth of a second. The whole reason for supporting SQL Compact is to save space, nothing else. Adding 75% waste to each byte FK is ludicrous. Now *that* is sub-optimal.
Your stored procedure idea is not in scope since Sql Compact does not support them.
But I digress... I've mentioned it enough times that re-implementing identity is a dumb move.
Not as dumb as changing bytes to ints just to have an easy life.
Anyway, for anyone who is interested, here is the code in full:
public partial class DataAccessAdapter
{
    string tableHints;

    public bool SaveEntityWithPKSequenceEmulation<TEntity>(TEntity entityToSave) where TEntity : EntityBase2
    {
        return SaveEntityCollectionWithPKSequenceEmulation(new EntityCollection<TEntity>(new[] { entityToSave })) == 1;
    }

    public int SaveEntityCollectionWithPKSequenceEmulation<TEntity>(EntityCollection<TEntity> collectionToSave) where TEntity : EntityBase2
    {
        // Quick return if nothing to do
        if (collectionToSave.Count == 0) return 0;

        // Find the primary key for the entity type
        var primaryKeyFields = collectionToSave[0].Fields.PrimaryKeyFields;
        if (primaryKeyFields.Count != 1) throw new InvalidOperationException("Must be exactly one Primary Key field.");

        // Validate the PK type
        var primaryKeyField = primaryKeyFields[0];
        if (primaryKeyField.DataType != typeof(byte) && primaryKeyField.DataType != typeof(short)) throw new InvalidOperationException("Primary Key must be byte or short.");
        var isByte = primaryKeyField.DataType == typeof(byte);

        // Start a transaction
        StartTransaction(IsolationLevel.ReadCommitted, "PKSequenceEmulation");

        // Find the highest PK currently in use
        // with a full table lock (TABLOCKX) and hold that lock (HOLDLOCK)
        // until the end of the transaction
        tableHints = "TABLOCKX, HOLDLOCK";
        var scalar = GetScalar(primaryKeyField, AggregateFunction.Max);
        var id = scalar == DBNull.Value ? (short)0 : Convert.ToInt16(scalar);

        // Check there is room to add the new entities
        if (id + collectionToSave.Count > (isByte ? byte.MaxValue : short.MaxValue)) throw new OverflowException("Not enough Primary Keys left.");

        // Assign each entity its PK
        foreach (var entity in collectionToSave)
        {
            entity.SetNewFieldValue(primaryKeyField.Name, (isByte ? (object)(byte)++id : ++id));
        }

        // Save as normal
        var result = SaveEntityCollection(collectionToSave);

        // No exception so we can commit
        Commit();
        return result;
    }

    protected override IRetrievalQuery CreateSelectDQ(IEntityFields2 fieldsToFetch, IFieldPersistenceInfo[] persistenceInfoObjects, IPredicateExpression filter, long maxNumberOfItemsToReturn, ISortExpression sortClauses, IRelationCollection relationsToWalk, bool allowDuplicates, IGroupByCollection groupByClause, int pageNumber, int pageSize)
    {
        // Do the normal thing
        var result = base.CreateSelectDQ(fieldsToFetch, persistenceInfoObjects, filter, maxNumberOfItemsToReturn, sortClauses, relationsToWalk, allowDuplicates, groupByClause, pageNumber, pageSize);

        // Do we need to add some table hints?
        if (!string.IsNullOrEmpty(tableHints))
        {
            // Yes, so mangle the query to insert them
            result.Command.CommandText = Regex.Replace(result.Command.CommandText, @"FROM\s+[a-z\][.]+", "$0 WITH (" + tableHints + ")", RegexOptions.IgnoreCase);

            // Reset because this is a one-shot thing
            tableHints = null;
        }
        return result;
    }
}
It supports byte and Int16 as PK types and has been tested with the unit test below: three background threads each continually add two entities at a time until there is no more room, whilst the main thread displays the count frequently.
[Test]
public void TestSaveEntityCollectionWithInt16PKSequenceEmulationMultiThreadingThrash()
{
    var writerThreadsProcess = new WaitCallback(
        state =>
        {
            var threadName = state.ToString();
            while (true)
            {
                var collectionToSave = new EntityCollection<AssetClassEntity>
                {
                    new AssetClassEntity { Description = threadName + " item 1" },
                    new AssetClassEntity { Description = threadName + " item 2" }
                };

                try
                {
                    using (var adapter = new DataAccessAdapter())
                    {
                        adapter.SaveEntityCollectionWithPKSequenceEmulation(collectionToSave);
                    }
                }
                catch (OverflowException)
                {
                    break;
                }
            }
        }
    );

    ThreadPool.QueueUserWorkItem(writerThreadsProcess, "Thread #1");
    ThreadPool.QueueUserWorkItem(writerThreadsProcess, "Thread #2");
    ThreadPool.QueueUserWorkItem(writerThreadsProcess, "Thread #3");

    var pkField = AssetClassFields.ID;

    // Main thread read
    while (true)
    {
        using (var adapter = new DataAccessAdapter())
        {
            var scalar = adapter.GetScalar(pkField, AggregateFunction.Max);
            var count = scalar == DBNull.Value ? (short)0 : Convert.ToInt16(scalar);
            Console.WriteLine(count);
            if (count > short.MaxValue - 2) break;
            Thread.Sleep(100);
        }
    }
}
Just found another good use for this.
We have some reference data that the user can add to but we want to maintain a number of 'well-known' special items that are there when the app is installed and their IDs can be used in code. It could be that we want to add new 'well-known' items in a future version and we won't be able to do that if the IDs just increment.
So I've added a parameter
short reservedCount = 0
to the two public methods and changed the id declaration to
var id = Math.Max(reservedCount, scalar == DBNull.Value ? (short) 0 : Convert.ToInt16(scalar));
Now we can reserve say 50 items and let users start adding theirs above that.