Entity Objects and Serialization

jeffreygg
User
Posts: 805
Joined: 26-Oct-2003
# Posted on: 16-Jul-2004 00:01:22   

Hi, Frans. I'm curious about how well your objects will deserialize (in my case for persistence) when taking into account things like column name and/or index changes and especially column additions and removals.

I'm working on the next phase of my custom/ad-hoc query system. In the first round I found a problem with deserializing my query fields and entities when the underlying schema changed - particularly when the number of fields changed in an entity's .Fields() collection (I use my own custom objects right now, not yours). When I deserialized an entity from disk, the formatter (I think it's the formatter) needs to instantiate a "new" object of the same type, then transfer the information over. However, if the objects are different, either because the number of members changed, or in my case because the number of items in the .Fields() collection changed, the deserialization fails.

I realized in my case that I didn't abstract enough, so I'm working on separating entity and field definitions (name, expression, etc.) from instructions (alias, sort order, group order), but I'm losing a lot of schema information, such as relationships, in the process.

I realize you must have most of this already taken care of (and so I'd rather not reinvent the wheel), but I'm curious about how the LLBLGen entities, fields, and the new features you're putting in (many joins to the same table, etc.) will handle similar situations... Thanks.

Jeff...

Oh, yeah: and how are the new features progressing? Still looking for a beta near the end of the month?

Otis
LLBLGen Pro Team
Posts: 39752
Joined: 17-Aug-2003
# Posted on: 16-Jul-2004 09:23:11   

jeffreygg wrote:

Hi, Frans. I'm curious about how well your objects will deserialize (in my case for persistence) when taking into account things like column name and/or index changes and especially column additions and removals.

Rule of thumb: the same code has to be used at the serialization side as at the deserialization side. If you're using different versions of your code on either side, things will go wrong.
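(As a minimal illustration of that rule: a binary round-trip with .NET's BinaryFormatter. The MyEntity type here is made up; the point is that the formatter records the type's identity in the stream, so the deserializing side must have the exact same version of that type available.)

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MyEntity
{
    public string Name;
    public int Id;
}

public class RoundTripExample
{
    public static void Main()
    {
        MyEntity entity = new MyEntity();
        entity.Name = "Chai";
        entity.Id = 1;

        BinaryFormatter formatter = new BinaryFormatter();

        // Serialize: the stream gets the data plus the type's identity.
        using(FileStream stream = new FileStream("entity.bin", FileMode.Create))
        {
            formatter.Serialize(stream, entity);
        }

        // Deserialize: the formatter instantiates the same type again and
        // restores its members. If MyEntity changed shape in the meantime
        // (members added or removed), this throws a SerializationException.
        using(FileStream stream = new FileStream("entity.bin", FileMode.Open))
        {
            MyEntity restored = (MyEntity)formatter.Deserialize(stream);
            Console.WriteLine(restored.Name);
        }
    }
}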

I'm working on the next phase of my custom/ad-hoc query system. In the first round I found a problem with deserializing my query fields and entities when the underlying schema changed - particularly when the number of fields changed in an entity's .Fields() collection (I use my own custom objects right now, not yours). When I deserialized an entity from disk, the formatter (I think it's the formatter) needs to instantiate a "new" object of the same type, then transfer the information over. However, if the objects are different, either because the number of members changed, or in my case because the number of items in the .Fields() collection changed, the deserialization fails.

True, you then get errors because the formatter can't deserialize a member.

I realized in my case that I didn't abstract enough, so I'm working on separating entity and field definitions (name, expression, etc.) from instructions (alias, sort order, group order), but I'm losing a lot of schema information, such as relationships, in the process.

I realize you must have most of this already taken care of (and so I'd rather not reinvent the wheel), but I'm curious about how the LLBLGen entities, fields, and the new features you're putting in (many joins to the same table, etc.) will handle similar situations... Thanks.

Serialization is about data. So when you serialize a set of entities to a stream (a file, a block of memory), you're in fact serializing the data of those entities to the stream, not the entities themselves. So if I create new instances during the deserialization action and store the data I receive in those new instances, I'll have the same entities at the other end. That's the idea.

So everything that is baked into an entity when you instantiate it without data is also available when you re-create an entity right before the deserialized data is stored in that new entity instance. This means: relations, and the code that makes things possible.

As all data is stored inside an entity, including synchronization setup data, that data is serialized with the entity as well. This means that when you serialize a customer with a filled order collection, after deserialization the orders in that collection will have a reference to the customer object holding the collection, and the customer object will have sync info stored for the order objects in its collection.
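(A sketch of how a type controls that data transfer via .NET's ISerializable; the CustomerSketch type is illustrative, not LLBLGen's actual implementation. The key point: only the data goes into the stream, and the deserialization constructor runs on a freshly created instance that already carries all the compiled-in behavior.)

using System;
using System.Runtime.Serialization;

[Serializable]
public class CustomerSketch : ISerializable
{
    // Behavior (relations, sync logic) lives in the type itself; only
    // the data below travels through the stream.
    private string _companyName;
    private int _customerId;

    public CustomerSketch(string companyName, int customerId)
    {
        _companyName = companyName;
        _customerId = customerId;
    }

    // Serializing side: write the data, nothing else.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("companyName", _companyName);
        info.AddValue("customerId", _customerId);
    }

    // Deserializing side: the formatter creates a fresh instance (with all
    // behavior compiled in) and hands it the serialized data to restore.
    protected CustomerSketch(SerializationInfo info, StreamingContext context)
    {
        _companyName = info.GetString("companyName");
        _customerId = info.GetInt32("customerId");
    }
}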

Xml serialization/deserialization with ReadXml and WriteXml follows a slightly different path: there I rebuild the graph, which in fact rebuilds all sync info and references.

Oh, yeah: and how are the new features progressing? Still looking for a beta near the end of the month?

I hope so, yes. The last 4 days were a real struggle with the join-with-self/multiple join stuff, but it's almost done now. I have added alias support for entities in a relation, and you can specify in a filter which entity you want to filter on, but that comes back in a lot of places, and the join routine had to be rewritten completely, which was harder to get right than I thought, but it's working now simple_smile (I can load a customer filtered on visitingaddress.street and billingaddress.street, which was impossible before simple_smile ). The only thing which will be a struggle now is the multi-fetch stuff. However, in the past weeks I've worked out a lot of ideas for the various things to be implemented, like paging, aggregate functions, expressions, etc., so it looks good.

Frans Bouma | Lead developer LLBLGen Pro
jeffreygg
User
Posts: 805
Joined: 26-Oct-2003
# Posted on: 16-Jul-2004 09:56:50   

Well, I guess my main concern is not the serialization of the data, but of the definition of the query, in the form of collections of fields, relations, sort instructions and group instructions. My question is how well these items will stand up in the scenario above (i.e., serialize query, change schema, deserialize query).

It seems to me that a query system should be able to withstand any schema changes short of relationship changes, as those are the only definitions in the schema that, were they to change, would actually change the schema. The rest are just names.

Jeff...

BTW, good deal on the next beta. Looking forward to it! simple_smile

Otis
LLBLGen Pro Team
Posts: 39752
Joined: 17-Aug-2003
# Posted on: 16-Jul-2004 10:33:29   

jeffreygg wrote:

Well, I guess my main concern is not the serialization of the data, but of the definition of the query, in the form of collections of fields, relations, sort instructions and group instructions. My question is how well these items will stand up in the scenario above (i.e., serialize query, change schema, deserialize query).

Well, you have data, which is passed by value, and you have behavior commands, defined to act on that data. These two are separated, but at the same time closely connected. I'll explain that below.

It seems to me that a query system should be able to withstand any schema changes short of relationship changes, as those are the only definitions in the schema that, were they to change, would actually change the schema. The rest are just names.

No, this is a misunderstanding simple_smile The same misunderstanding which feeds the myth that you should be able to 'just change some mappings when the schema changes and your code should continue to work'. Of course, if I rename a field in the Customer table from Foo to Bar, nothing will change: the entity field mapped on that table field is not renamed, the mapping just changes. Also, if you hide a relation in the code on the deserialization side, you won't get the code for utilizing that relation when you instantiate a serialized stream of data back into entity objects, as that is defined in the code.

However, what about changing the PK field from int to GUID? Or deciding to add 2 columns of data? Or splitting up a table into two tables? That last one will perhaps obviously fail, but the other, smaller changes also have severe impact. What if a column in a table changes its type, from ntext to nvarchar(100)? Will the data fit? No, probably not. simple_smile

It's also not logical: your application may be distributed, but that doesn't mean it's not a single application anymore. For one layer to understand the data another layer sends to it, communication has to stay possible. That's the main thing: you want the same type of instances with the same data at the other side; how that happens is up to the layers below your code. However, if you use different code at the other end, it will not work, as that end will never be able to re-instantiate the same type of objects, so the data to be merged with those instances probably will not match.

BTW, good deal on the next beta. Looking forward to it! simple_smile

simple_smile Designer support for typed lists will be done later, btw, but that's logical, as the designer support for the new stuff will require a totally new typed list editor. You will be able to formulate lists in code to fill datatables, btw, so it's not that big of an issue for now.

Frans Bouma | Lead developer LLBLGen Pro
jeffreygg
User
Posts: 805
Joined: 26-Oct-2003
# Posted on: 16-Jul-2004 11:07:42   

Otis wrote:

However, what about changing the PK field from int to GUID? Or deciding to add 2 columns of data? Or splitting up a table into two tables? That last one will perhaps obviously fail, but the other, smaller changes also have severe impact. What if a column in a table changes its type, from ntext to nvarchar(100)?

Actually, yes: aside from the table split (which would result in different relationships anyway), I think a properly constructed query system should be able to weather those changes. Why? Because what's persisted is not the definitions themselves, but simply a proxy that stands in for the definitions: the "thing" the user wants to see, no matter how the underlying data is stored.

In your case the proxy is the EntityFieldIndex enums that allow one to retrieve a field without knowing anything about how that field is defined. If that proxy is persisted, along with specific instructions about how to render it (alias, sort order, group order, etc.), then that field (and perhaps even its "parent" table/entity) can be changed ad infinitum without affecting the stored query. The only problem I see is if the column's type changes, in which case everything will still work fine except for filters/criteria/parameters (and even those might still work).
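(A sketch of that proxy idea, with hypothetical types: what gets persisted is a stable field reference plus the user's rendering instructions, and the actual definition is looked up again at load time.)

using System;

[Serializable]
public class FieldInstructionSketch
{
    // The proxy: a stable index (e.g. an entity field index enum value)
    // rather than the field's definition or column name.
    public int FieldIndex;

    // Rendering instructions chosen by the user. These survive schema
    // changes because they reference the proxy, not the definition.
    public string Alias;
    public int SortOrder;     // -1 means: not sorted
    public int GroupOrder;    // -1 means: not grouped

    // At load time FieldIndex is resolved against the *current* generated
    // definitions (e.g. via a field factory), so column renames or
    // reordered columns don't invalidate the stored instruction.
}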

The problem I'm working through right now is storing the relationships between the requested field and the appropriate parent table/entity when taking into account multiple joins to the same table. I know that you've gotten past this issue (man, if it took you 4 days I think I underestimated how long it's gonna take me) and so I want to evaluate LLBLGen for the purpose. My decision point is this: if I use LLBLGen for my query system, are the serialized queries abstracted well enough from the definitions that I can change the schema (up to a point) without throwing all of the user's carefully designed queries in the trash bin when they are deserialized after the changes?

Will the data fit? No, probably not. simple_smile

See, I think we're not talking on the same level here. The data gets generated when the deserialized query is executed. It's not stored anywhere...

Jeff...

Otis
LLBLGen Pro Team
Posts: 39752
Joined: 17-Aug-2003
# Posted on: 16-Jul-2004 11:27:42   

Oh, I thought you were talking about the different types of code on either side of a serialization / deserialization chain. simple_smile

If you have a set of predicates which are serialized to disk, you serialize entity field objects and, for example, the data to filter on, inside predicate objects, which are containers for the entity fields the predicate works on, plus values and operators.

At runtime, when the predicate is executed, it is transformed into SQL. It is at that point that the actual mapping, and thus the database information, is consulted: which table and which table field the field is mapped on, etc.

So if that field is moved from place 2 to place 10 in a table, that's not important. For selfservicing, it is important to note that persistence info is stored inside the field. This means that when you serialize a predicate to disk and then alter the field's mapping, the predicate data will contain field data mapped on the old database field. For adapter, this data is stored in the persistence info factory, which works with entity names and entity field names to look up the mapping info at runtime, so when you change a mapping it will still work. However, if you remove an entity field, it will of course not work.
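(A much-simplified sketch of that adapter-style late lookup; the factory below is illustrative, not LLBLGen's actual persistence info factory. Because the mapping is consulted by name at query-generation time, a changed mapping is picked up automatically, while a removed field simply fails the lookup.)

using System.Collections;

public class PersistenceInfoFactorySketch
{
    // Maps "EntityName.FieldName" to the current database column name.
    private Hashtable _mappings;

    public PersistenceInfoFactorySketch()
    {
        _mappings = new Hashtable();
        _mappings["CustomerEntity.CompanyName"] = "CompanyName";
    }

    // Returns null when the entity field no longer exists: a removed
    // field can't be resolved anymore.
    public string GetColumnName(string entityName, string fieldName)
    {
        return (string)_mappings[entityName + "." + fieldName];
    }
}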

Predicates are not designed to be kept, as they reflect a moment in time at runtime. In your situation, if you're creating a system which creates predicate objects at runtime (which is what I understand you're doing), I'd store the data which goes into that system to produce the predicates. You can then 'replay' the creation of the predicates at runtime.
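(A sketch of that replay approach with hypothetical types: persist plain descriptions of the predicate's inputs, then feed them back into the same predicate-building code at runtime, against the current generated code.)

using System;

[Serializable]
public class PredicateDescriptionSketch
{
    // The inputs that produced the predicate, not the predicate itself.
    public string EntityName;    // e.g. "Customer"
    public string FieldName;     // e.g. "Country"
    public string Operator;      // e.g. "Equal"
    public object Value;         // e.g. "USA"

    // To 'replay': hand these values back to whatever code built the
    // original predicate (in LLBLGen terms, look the field up via the
    // generated factories and construct a new predicate object). Since
    // that lookup happens now, mapping changes made after this
    // description was saved are picked up automatically.
}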

(man, if it took you 4 days I think I underestimated how long it's gonna take me)

Nah, once you get the idea, it's pretty simple. Here's the routine that does the work at the moment: RelationCollection.ToQueryText(). As you can see, it's a lot of alias mumbo jumbo. Initial unit tests succeed, so it looks promising simple_smile .


/// <summary>
/// Converts the set of relations to a set of nested JOIN query elements using ANSI join syntax. Oracle 8i doesn't support ANSI join syntax
/// and therefore the OracleDQE has its own join code.
/// It uses a database specific creator object for database specific syntax, like the format of the tables / views and fields. 
/// </summary>
/// <param name="uniqueMarker">int counter which is appended to every parameter name. The counter is increased by every parameter creation,
/// making sure each parameter is unique in the custom filter predicates</param>
/// <returns>The string representation of the INNER JOIN expressions of the contained relations, when ObeyWeakRelations is set to false (default)
/// or the string representation of the LEFT/RIGHT JOIN expressions of the contained relations, when ObeyWeakRelations is set to true</returns>
/// <exception cref="ApplicationException">When the DatabaseSpecificCreator is not set</exception>
/// <exception cref="ORMRelationException">when the relation set contains an error and is badly formed. For example when the relation collection
/// contains relations which do not have an entity in common, which can happen when a bad alias is specified</exception>
public string ToQueryText(ref int uniqueMarker)
{
    if(_databaseSpecificCreator==null)
    {
        throw new System.ApplicationException("DatabaseSpecificCreator object not set. Cannot create query part.");
    }

    // Hashtable with the object name + the alias as key (e.g. "[dbo].[Customers] C") and a boolean as value which signals whether the
    // object was added weakly or not.
    Hashtable objectsWithAliasesAdded = new Hashtable();
    StringBuilder queryText = new StringBuilder(256);
    
    // clear any previously created objects
    _customFilterParameters = new ArrayList();

    for(int i=0;i<List.Count;i++)
    {
        EntityRelation relation = (EntityRelation)this[i];
        string pkElement, fkElement, aliasPKSide, aliasFKSide, pkElementReference, fkElementReference, joinType;
        bool pkElementAddedWeak=false, fkElementAddedWeak=false, addFKSide=true, addPKSide=true, relationChainIsWeak=false;

        // construct the "PKelement jointype JOIN FKelement" join fragment
        aliasPKSide = relation.AliasPKSide;
        aliasFKSide = relation.AliasFKSide;
        pkElement = _databaseSpecificCreator.CreateObjectName(relation.GetPKFieldPersistenceInfo(0));
        fkElement = _databaseSpecificCreator.CreateObjectName(relation.GetFKFieldPersistenceInfo(0));
        pkElementReference = pkElement;
        fkElementReference = fkElement;
        joinType = "INNER";

        if(aliasPKSide.Length>0)
        {
            pkElement+= " " + aliasPKSide;
            pkElementReference = aliasPKSide;
        }

        if(aliasFKSide.Length>0)
        {
            fkElement+= " " + aliasFKSide;
            fkElementReference = aliasFKSide;
        }

        // check if PK side or FK side are already added to the query text. If so, and if we're not the first iteration,
        // we can drop the side already added. 
        if(i>0)
        {
            if(objectsWithAliasesAdded.ContainsKey(pkElement))
            {
                // PK side already added 
                addPKSide = false;
                relationChainIsWeak = (bool)objectsWithAliasesAdded[pkElement];
            }
            else
            {
                // pk side is not added, fk side has to be in the list of already added elements. If not, the relation
                // set contains an error (FROM A INNER JOIN B ON A.x = B.x INNER JOIN D ON C.x = D.x -> error, C is not in the list)
                if(!objectsWithAliasesAdded.ContainsKey(fkElement))
                {
                    // not added as well. Error
                    throw new ORMRelationException("Relation at index " + i + " doesn't contain an entity already added to the FROM clause. Bad alias?");
                }
                relationChainIsWeak = (bool)objectsWithAliasesAdded[fkElement];
                addFKSide = false;
            }
        }

        if( ((_obeyWeakRelations && relation.IsWeak) || 
                relationChainIsWeak || 
                (relation.WeaknessHint == RelationWeaknessHint.IsWeak))
            && !(relation.WeaknessHint == RelationWeaknessHint.IsStrong))
        {
            if(relation.TypeOfRelation==RelationType.ManyToOne)
            {
                // Always join towards the FK in this situation (m:1 relation).
                // Order.CustomerID - Customer.CustomerID, where Order.CustomerID can be null
                // PK side is mentioned first, FK side is mentioned second, so a RIGHT join will
                // include all elements of the FK side, in this case Order, despite a NULL.
                pkElementAddedWeak=true;
                fkElementAddedWeak=false;

                if(addFKSide)
                {
                    joinType = "RIGHT";
                }
                else
                {
                    // swap join type from RIGHT to LEFT, as the join order of the elements changes: not: PK right join FK, but FK LEFT JOIN PK,
                    // as FK side is already in the join list.
                    joinType = "LEFT";
                }
            }
            else
            {
                // Always join towards the PK in this situation (1:n or 1:1 relation)
                pkElementAddedWeak=false;
                fkElementAddedWeak=true;

                if(addFKSide)
                {
                    joinType = "LEFT";
                }
                else
                {
                    // swap join type from LEFT to RIGHT, as the join order of the elements changes: not: PK left join FK, but FK RIGHT JOIN PK,
                    // as FK side is already in the join list.
                    joinType = "RIGHT";
                }
            }
        }

        if(addFKSide)
        {
            objectsWithAliasesAdded[fkElement] = fkElementAddedWeak;
        }
        if(addPKSide)
        {
            objectsWithAliasesAdded[pkElement] = pkElementAddedWeak;
        }

        // construct query elements
        if(addPKSide && addFKSide)
        {
            queryText.AppendFormat(" {0} {1} JOIN {2} ON ", pkElement, joinType, fkElement);
        }
        else
        {
            if(addPKSide)
            {
                // pk side only
                queryText.AppendFormat(" {0} JOIN {1} ON ", joinType, pkElement);
            }
            else
            {
                // fk side only
                queryText.AppendFormat(" {0} JOIN {1} ON ", joinType, fkElement);
            }
        }

        // create ON clauses.
        for(int j=0;j<relation.AmountFields;j++)
        {
            if(j>0)
            {
                queryText.Append(" AND");
            }
            queryText.AppendFormat(" {0}.{1}={2}.{3}", 
                pkElementReference, 
                _databaseSpecificCreator.CreateFieldNameSimple(relation.GetPKFieldPersistenceInfo(j), relation.GetPKEntityFieldCore(j).Name),
                fkElementReference,
                _databaseSpecificCreator.CreateFieldNameSimple(relation.GetFKFieldPersistenceInfo(j), relation.GetFKEntityFieldCore(j).Name));
        }

        // if this EntityRelation has a custom filter, add that filter with AND. 
        if(relation.CustomFilter!=null)
        {
            if(relation.CustomFilter.Count>0)
            {
                relation.CustomFilter.DatabaseSpecificCreator = _databaseSpecificCreator;
                queryText.AppendFormat(" AND {0}", relation.CustomFilter.ToQueryText(ref uniqueMarker));
                // add parameters created by this custom filter to our general list.
                _customFilterParameters.AddRange(relation.CustomFilter.Parameters);
            }
        }
    }
    return queryText.ToString();
}

Frans Bouma | Lead developer LLBLGen Pro
jeffreygg
User
Posts: 805
Joined: 26-Oct-2003
# Posted on: 16-Jul-2004 11:49:14   

OK, I had a similar, if simplified, version of what you did (I only allowed 1:1 relations, so it was much easier), but here's my question: how did you associate a given field with a given entity in the relation chain? If the "Location" table is listed twice in the relation chain using different relationships, how did you track which field goes with which "instance" of the "Location" table?

The second question I have is basically my earlier question: how well will your system work with schema changes? I know that obviously I will have to regenerate the LLBLGen code, but if I have a persisted query, how well will it weather the change? Or perhaps the better question is: "What changes to the database's schema will prevent those serialized queries from being deserialized?".

I appreciate the code though and will try to answer my question as it relates specifically to the relationship portion you provided. I've reviewed it, but haven't looked at it specifically regarding my deserialization problem.

With that I'm going to see if my bed and I can get along...

Jeff...

Otis
LLBLGen Pro Team
Posts: 39752
Joined: 17-Aug-2003
# Posted on: 16-Jul-2004 21:36:15   

jeffreygg wrote:

OK, I had a similar, if simplified, version of what you did (I only allowed 1:1 relations, so it was much easier), but here's my question: how did you associate a given field with a given entity in the relation chain? If the "Location" table is listed twice in the relation chain using different relationships, how did you track which field goes with which "instance" of the "Location" table?

For entity fetches this is simple: the table supplying the data for the fetch is not aliased. For typed lists, an alias for the entity the field originates from is mandatory. A field therefore contains (in the case of a typed list) the alias for the entity it belongs to.

It's not foolproof: aliasing things wrongly in the filters will not give you the results you planned. IMHO this is OK, as long as the developer has enough information about which aliases an entity has received in a typed list (this will be generated into the code once the designer is updated).

The second question I have is basically my earlier question: how well will your system work with schema changes? I know that obviously I will have to regenerate the LLBLGen code, but if I have a persisted query, how well will it weather the change? Or perhaps the better question is: "What changes to the database's schema will prevent those serialized queries from being deserialized?".

When you rename a table or field used in the predicate, you have to create them again. That's about it. simple_smile

Frans Bouma | Lead developer LLBLGen Pro
jeffreygg
User
Posts: 805
Joined: 26-Oct-2003
# Posted on: 17-Jul-2004 00:10:37   

Otis wrote:

For entity fetches this is simple: the table supplying the data for the fetch is not aliased. For typed lists, an alias for the entity the field originates from is mandatory. A field therefore contains (in the case of a typed list) the alias for the entity it belongs to.

Hmmm, when does the system generate the alias? If I build a fields collection for a typed list (that includes multi-joins) by adding fields from Table A and Table B, then persist/serialize the collection, then deserialize it and decide to add another field from Table A, will the alias from the pre-serialize Table A match the alias from the post-serialize Table A? I.e., will fields added after the deserialization contain the same alias as the fields added before the serialization, when they're added from the same table/relation? (Whew, hope that came across right.) I'm worried that if the alias is generated at the original time of creation of the typed list, then subsequent field additions from the same table will not contain the same alias.

When you rename a table or field used in the predicate, you have to create them again. That's about it. simple_smile

Great news! Beyond the original needs (multi-joins/aggregates), the persistence issue is my only other critical requirement (at least until I come up with more wink ). Thanks, Frans. Looking forward to the update.

Jeff...

Otis
LLBLGen Pro Team
Posts: 39752
Joined: 17-Aug-2003
# Posted on: 17-Jul-2004 10:18:06   

jeffreygg wrote:

Otis wrote:

For entity fetches this is simple: the table supplying the data for the fetch is not aliased. For typed lists, an alias for the entity the field originates from is mandatory. A field therefore contains (in the case of a typed list) the alias for the entity it belongs to.

Hmmm, when does the system generate the alias? If I build a fields collection for a typed list (that includes multi-joins) by adding fields from Table A and Table B, then persist/serialize the collection, then deserialize it and decide to add another field from Table A, will the alias from the pre-serialize Table A match the alias from the post-serialize Table A? I.e., will fields added after the deserialization contain the same alias as the fields added before the serialization, when they're added from the same table/relation? (Whew, hope that came across right.) I'm worried that if the alias is generated at the original time of creation of the typed list, then subsequent field additions from the same table will not contain the same alias.

Field lists of typed lists are stored in ResultsetFields objects, one per typed list. This is a subclass of EntityFields(2). It gets a new Add overload which accepts the entity alias for the entity the field belongs to. That's about it simple_smile

So per field you have to specify an alias IF you want to join multiple tables. If not, you don't need to specify an alias, of course.
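(Based on that description, usage might look like the sketch below. This is hypothetical, since the overload was still being built at the time, so the final API may differ: the Address entity participates twice, once per alias, and each field records which "instance" of the table it belongs to.)

// Hypothetical sketch of the alias-accepting Add overload described above.
ResultsetFields fields = new ResultsetFields(3);
fields.Add(EntityFieldFactory.Create(CustomerFieldIndex.CompanyName), 0);
fields.Add(EntityFieldFactory.Create(AddressFieldIndex.Street), 1, "VisitingAddress");
fields.Add(EntityFieldFactory.Create(AddressFieldIndex.Street), 2, "BillingAddress");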

When you rename a table or field used in the predicate, you have to create them again. That's about it. simple_smile

Great news! Beyond the original needs (multi-joins/aggregates), the persistence issue is my only other critical requirement (at least until I come up with more wink ). Thanks, Frans. Looking forward to the update.

simple_smile I'm finally done (except for some of the unit tests) with the join-with-self/multiple-joins-to-the-same-table stuff, and with the additions to predicate factories, predicates and sort clauses to accept aliases... The non-ANSI join equivalent for Oracle was hard...

The expression stuff I cooked up (I still have to implement it) looks promising too simple_smile

Frans Bouma | Lead developer LLBLGen Pro