HnD | Reusing Fields in Projection, Page 1

Reusing Fields in Projection

Posts

simmotech
User

Posts: 1024
Joined: 01-Feb-2006

# Posted on: 08-Jul-2013 09:11:54

This query fails with a System.ArgumentException : An item with the same key has already been added.

This is the reuse of Amount and VATAmount fields for the GrossTransactionAmount property. I would have expected that the expression parser would just have reused the expression rather than trying to create a new, identical one.

Using aliases does work but results in two identical columns being brought back twice which seems wasteful.

I also tried hosting out the two fields into variables but still get the same exception.

.Select(() => new
                              {
                  ID = AccountsTransactionFields.ID.ToValue<int>(),
                  LegalBodyID = AccountsTransactionFields.LegalBodyID.ToValue<int?>(),
                  NetTransactionAmount = AccountsTransactionFields.Amount.Sum().ToValue<decimal>(),
                  VATTransactionAmount = AccountsTransactionFields.VATAmount.Sum().ToValue<decimal>(),
                  GrossTransactionAmount = [b]AccountsTransactionFields.Amount.Sum().ToValue<decimal>() + AccountsTransactionFields.VATAmount.Sum().ToValue<decimal>()[/b],
                  AllocatedAmount = qf.Field("y", "TotalAllocatedAmount").Sum().ToValue<decimal>(),
                  TotalAllocatedItems = qf.Field("y", "TotalAllocatedItems").Count().ToValue<int>(),
                  LastAllocationDate = qf.Field("y", "LastAllocationDate").Max().ToValue<DateTime>()
          }

Otis
LLBLGen Pro Team

Posts: 39912
Joined: 17-Aug-2003

# Posted on: 08-Jul-2013 10:49:23

You have to use aliases, as a scalar query in the projection can't be re-used in the same projection. It can be re-used if you calculate the scalar query in a query which you append with a projection (so the query with the scalar is a derived table).

Frans Bouma | Lead developer LLBLGen Pro

simmotech
User

Posts: 1024
Joined: 01-Feb-2006

# Posted on: 08-Jul-2013 11:31:25

I don't understand.

.Select(() => new { A = AccountsTransactionFields.ID.ToValue<int>(), B = AccountsTransactionFields.ID.ToValue<int>(), }

Surely the expression evaluator would see this as the same? Or at least the "AccountsTransactionFields.ID" part?

I can't see how the result given to A could ever be different to that given to B

Otis
LLBLGen Pro Team

Posts: 39912
Joined: 17-Aug-2003

# Posted on: 08-Jul-2013 11:38:37

I meant:

SELECT (A+B) AS C, (C+D) AS E FROM X;

because that's what you're doing with the Sum(). Instead do:

SELECT T.C, (T.C+T.D) AS E FROM (SELECT A+B AS C, D FROM X) T

Frans Bouma | Lead developer LLBLGen Pro

simmotech
User

Posts: 1024
Joined: 01-Feb-2006

# Posted on: 08-Jul-2013 12:41:38

Hi Frans

Your're on fire with the extra quick responses today

Unfortunately I'm not and I don't understand your final message at all.

Lets forget my first example as it was more complicated anyway and I eventually replaced all three fields with this (to get rid of the nulls):-

new Expression(Functions.Coalesce(AccountsTransactionFields.Amount, 0), ExOp.Add, Functions.Coalesce(AccountsTransactionFields.VATAmount, 0)).Sum().As("GrossTransactionAmount").ToValue<decimal>()

As I understand it, in a SQL query at least, a field statement can be reused many times and will always return the same value (per row)

SELECT
  x.Amount AS A,
  x.Amount as B,
FROM ....

x.Amount is evaluated just once but returned twice.

Further onto that, I assume that

SELECT
  x.Amount AS A,
  x.Amount as B,
  x.Amount / 2 as C
FROM ....

would still evaluate x.Amount just the once but divide it by 2 before assigning it to C. Thus three outputs all from the same, single input.

Now, my understanding of a projector function is that it will, for each row in a rowset, project the values from that row onto something else, applying expressions if required etc.

The way I thought LLBLGEN was doing this is to take the lambda expression and have it parsed into an expression tree. From that expression tree, you are 'interpreting' the subexpressions etc. to produce the final outputs (properties) then combined into a single anonymous type final output.

My guess (and this is all it was) is that the string "AccountsTransactionFields.ID.ToValue<int>()" from my previous AB example would have been parsed to an Expression (somewhere in the middle of an ExpressionTree) and stored in a Dictionary. Now, either the .NET lambda parser OR your LLBLGen 'interpreter' can realise that the Expression being assigned to A and the Expression assigned to B are identical and can then optimize by not having to process the same Expression twice.

Now by the same token (no pun intended), the "AccountsTransactionFields.ID" could also be in the dictionary in its own right. So if I added "C=AccountsTransactionFields.ID + 100" (or whatever) to my projection, its output could be calculated from the same input as used in the outputs for A and B. There would be no need for aliases since only one field comes back from the database; it is just used differently for the project outputs.

You said that a scalar query can't be reused in the same projection but you didn't say why.

Walaa
Support Team

Posts: 14995
Joined: 21-Aug-2005

# Posted on: 09-Jul-2013 18:31:30

.Select(() => new { ID = AccountsTransactionFields.ID.ToValue<int>(), LegalBodyID = AccountsTransactionFields.LegalBodyID.ToValue<int?>(), NetTransactionAmount = AccountsTransactionFields.Amount.Sum().ToValue<decimal>(), VATTransactionAmount = AccountsTransactionFields.VATAmount.Sum().ToValue<decimal>(), GrossTransactionAmount = AccountsTransactionFields.Amount.Sum().ToValue<decimal>() + AccountsTransactionFields.VATAmount.Sum().ToValue<decimal>(), AllocatedAmount = qf.Field("y", "TotalAllocatedAmount").Sum().ToValue<decimal>(), TotalAllocatedItems = qf.Field("y", "TotalAllocatedItems").Count().ToValue<int>(), LastAllocationDate = qf.Field("y", "LastAllocationDate").Max().ToValue<DateTime>() }

I suggest you re-project the results while doing the addition you need.

eg: Check the Total property in the below example.

var q = from C in metaData.MyEntity
        select new 
        {
            Id = C.Id,
            Amount = C.Amount,
            VAT = C.VAT
        };


var q1 = from C in q
         select new
         {
             Id = C.Id
            Amount = C.Amount,
            VAT = C.VAT
            Total = C.Amount + C.VAT
         };

var x = q1.ToList();

simmotech
User

Posts: 1024
Joined: 01-Feb-2006

# Posted on: 10-Jul-2013 10:09:18

Thanks Walaa but I'm not using Linq to LLCoolJ - I'm having enough trouble with Queryspec.

Anyway, this question has moved on a little.

What I am questioning is why the Queryspec expression compiler is making a one to one mapping between the number of returned results and the fields it creates to populate them?

I have been looking at the (decompiled) source code and can see how it works (well partly - there is a lot going on in there!) and I have tried to prove to myself I was right.

Take this example:-

        [Test]
        public void TestTweakToQueryExpressionCompilation()
        {
            var qf = new QueryFactory();

            var query = qf.AccountsTransaction
                .Select(() => new
                              {
                                  A = AccountsTransactionFields.ID.ToValue<int>(),
                                  B = AccountsTransactionFields.ID.ToValue<int>() + 1
                              }
                );

            PerformTestInTransactionWithRollback(adapter =>
                {
                    var result = adapter.FetchQuery(query);

                    Assert.AreEqual(result[0].A + 1, result[0].B);
                });
        }

Currently, LLBLGen will throw an exception because it doesn't like the same field being used twice - one of them needs to be aliased. Adding this does work but the same value is brought back from the database twice-

SELECT
  [AccountsTransaction].[ID],
  [AccountsTransaction].[ID] AS [dup] 
FROM
  [AccountsTransaction]

Now after reverting back to the original test code and changing .Select() to .SelectXXX() and adding a custom version of ProjectionLambaTransformerXXX, the query works without an exception, without needing an artificial alias and returns only one field from the database.

What I did was change HandleToValueCall() to detect that expressionToHandle.Arguments[0] had already been processed and reused the previously generated Expression and avoided incrementing _indexerIndex. This meant that _selectArguments now only needs one entry and ".Call $r.get_Item(0)" on the Resultset got used twice - once in each projection. Exactly what the Projection said to do. Now mine was obviously a fudge owing to lot of code being private/internal but it would only need a Dictionary<object, Expression> and an appropriately clever Comparer** to make this work for all expressions within the projection.

As well as caching the simple Field Expression like I did, you could also cache other immutable Expressions - AccountsTransactionFields.ID.ToValue<int>() in this example thus saving another Expression compilation on B which is a duplicate of the one done for A.

Its all win-win. The data returned from the database will be the minimum necessary; the Expressions generated will be fewer thus also taking less time to create and reducing memory usage. It is also safe because for a given ResultSet row, the values are immutable.

**The Comparer would have to recognize that Field.X is not the same as Field.X.Sum() for example plus other attributes settings in the Field but I guess you have already come across most of this elsewhere in LLBLGen. Now if both those happen to be present, the transformer could automatically internally alias the scalar version in a well-known manner to prevent the duplicate key issue. But once that is done, both could be reused any number of times within the projection - exactly like SQL does (although it has the advantage of being able to use a string as the key)

Otis
LLBLGen Pro Team

Posts: 39912
Joined: 17-Aug-2003

# Posted on: 10-Jul-2013 11:51:20

The fields have to be aliased because the code doesn't check whether the values are indeed the same so without an alias there can't be a reference to it. So please specify an alias for the field if they are the same.

Frans Bouma | Lead developer LLBLGen Pro

Otis
LLBLGen Pro Team

Posts: 39912
Joined: 17-Aug-2003

# Posted on: 10-Jul-2013 11:59:49

A little more info ->

simmotech wrote:

What I am questioning is why the Queryspec expression compiler is making a one to one mapping between the number of returned results and the fields it creates to populate them?

because it has to: there has to be a 100% verified and unique connection between value coming from the DB and its destination and it should always work in all cases.

I have been looking at the (decompiled) source code and can see how it works (well partly - there is a lot going on in there!) and I have tried to prove to myself I was right.

Take this example:-

        [Test]
        public void TestTweakToQueryExpressionCompilation()
        {
            var qf = new QueryFactory();

            var query = qf.AccountsTransaction
                .Select(() => new
                              {
                                  A = AccountsTransactionFields.ID.ToValue<int>(),
                                  B = AccountsTransactionFields.ID.ToValue<int>() + 1
                              }
                );

            PerformTestInTransactionWithRollback(adapter =>
                {
                    var result = adapter.FetchQuery(query);

                    Assert.AreEqual(result[0].A + 1, result[0].B);
                });
        }

That's the .NET part, but this is what's going on:


                .Select(() => new
                              {
                                  A = AccountsTransactionFields.ID.ToValue<int>(),
                                  B = AccountsTransactionFields.ID.ToValue<int>() + 1
                              }

will be converted to:

.Select(AccountsTransactionFields.ID, AccountsTransactionFields.ID)

and a lambda is created:

(a, b) => new
              {
                  A = (int)a[b[0]],
                  B = (int)a[b[1]] + 1
              }

It doesn't check whether the fields it obtains from the lambda are equal or not, as it can't do that. In your situation it might be obvious, but throw in a couple of expressions and they're not easily comparable, if at all.

It will return all the elements always, it doesn't check whether one element is equal to the other and therefore it can return less elements: that not only messes up the resultset layout, it also makes the code tremendously complex, and it's not said it will work in all cases. it's also a bit silly: you requested the field twice, that's your code, if you don't want it to be returned twice, don't fetch it twice.

Will this be changed etc.? No, sorry.

Frans Bouma | Lead developer LLBLGen Pro

simmotech
User

Posts: 1024
Joined: 01-Feb-2006

# Posted on: 10-Jul-2013 15:28:40

That was a rather pissy response to be honest.

Especially this bit:-

it's also a bit silly: you requested the field twice, that's _your_ code, if you don't want it to be returned twice, don't fetch it twice.

That was noddy code purely to describe the mechanism.

And actually, I didn't want to fetch it twice, I wanted to fetch it once then use it twice which isn't the same thing.

It is LLBLGen imposing what I believe to be an artificial limitation.

I'm pretty sure that SQL doesn't do an Index scan twice for this query SELECT COUNT() AS Count1, COUNT() AS Count2 FROM Region It works out what to do for "COUNT(*)" in that scope and then reuses it as many times as it is seen in the SELECT list (and GROUP BY for that matter).

Well that is noddy code again so here is something from Northwind:

SELECT
p.ProductName,
COUNT(*) AS TotalCount,
SUM(od.Quantity * od.UnitPrice) AS TotalAmount,
SUM(od.Quantity * od.UnitPrice) / COUNT(*) AS AverageAmount
FROM [Order Details] od
JOIN Orders o ON o.OrderID = od.OrderID
JOIN Products p ON p.ProductID = od.ProductID
GROUP BY p.ProductName

COUNT(*) and SUM(od.Quantity * od.UnitPrice) are both evaluated once, not twice.

You said:-

will be converted to:

.Select(AccountsTransactionFields.ID, AccountsTransactionFields.ID)

and a lambda is created:

Code:
(a, b) => new
             {
                 A = (int)a[b[0]],
                 B = (int)a[b[1]] + 1
             }

which makes me think you didn't fully read was I was suggesting.

What I was suggesting is that the conversion becomes (in your terms)

.Select(AccountsTransactionFields.ID)

and a lambda is created:

(a, b) => new
             {
                 A = (int)a[b[0]],
                 B = (int)a[b[0]] + 1
             }

That means your next comment

It doesn't check whether the fields it obtains from the lambda are equal or not, as it can't do that.

may not be relevant as the field expression used in the projection would be one and the same.

and

that not only messes up the resultset layout, it also makes the code tremendously complex

is also not relevant since the resultset layout is 'better' than it is currently because it is only as big as it needs to be and contains no duplicate values. The projection elements won't access anything outside that array so nothing is messed up. Tremendously complex? Not sure which code you were thinking about here but outside the ProjectionLambdaTransformer class, well nothing at all changes that I can see. The only difficult part is writing a comparer identifying whether Field/FieldExpression variations are equivalent or not. I suspect that a combination of usage of DynamicQueryEngineBase.DetermineIfFieldAliasIsRequired and ExpressionTreeKeyCreator would solve 90% of that anyway.

I don't mind suggesting things that ultimately get declined. I don't feel I am wasting my time because often there is something I hadn't thought of or, like the 'use bits instead of bools' suggestion I made a few days ago, the benefits aren't certain or not enough to justify the change.

But I do try to ensure I am not talking bollocks before I post a suggestion. In this case, I spent a lot of time studying decompiled code then more time proving it does work at least in the scenarios I can think if. If you can say it won't work in scenario X because Y then fine, just tell me.

I do however object to having my suggestions dismissed immediately as silly - I am not a f*ckwit.

Otis
LLBLGen Pro Team

Posts: 39912
Joined: 17-Aug-2003

# Posted on: 10-Jul-2013 16:44:56

simmotech wrote:

That was a rather pissy response to be honest.

Especially this bit:-
it's also a bit silly: you requested the field twice, that's _your_ code, if you don't want it to be returned twice, don't fetch it twice. 
That was noddy code purely to describe the mechanism.

It's a general reply to your request to be able to read a value twice without an alias: if you want it twice, you have to alias it, but why would one do that, because it's rather silly: to obtain a value once is enough.

Well, if it sounded 'pissy', I just see again a wall of text in front of me, just as I predicted when I wrote my reply: a debate that goes on and on about a topic that's actually an edge case and has a proper solution. What I want to avoid is spending 2 hours a day on debates on super-edge cases. I have explained why it's the way it is.

And actually, I didn't want to fetch it twice, I wanted to fetch it once then use it twice which isn't the same thing.

But aren't you able to do that, by aliasing them? I have explained above that that's necessary. I can't say it more simply: identical elements have to be aliased, it's that simple. That you might not want to do that is duly noted, but besides the point: I cant allow that.

It is LLBLGen imposing what I believe to be an artificial limitation.

I'm pretty sure that SQL doesn't do an Index scan twice for this query SELECT COUNT() AS Count1, COUNT() AS Count2 FROM Region It works out what to do for "COUNT(*)" in that scope and then reuses it as many times as it is seen in the SELECT list (and GROUP BY for that matter).

SELECT x.C AS Count1, X.C AS Count2
FROM (SELECT COUNT(*) AS C FROM Region) x

You again assume that what you think works in an isolated case works always on all databases. But that isn't the case. You aren't faced with that problem, I am, so I solved it the way it works on all databases: force an alias for the edge case where an element is required twice: the query isn't aware of the DB it's run on, so it has to work with what works everywhere, always.

The query above aliases the value twice, like you do, but obtains it once. If you want to re-use a value twice, but obtain it once, and you don't want to alias things (and I really wonder why that is such a problem. It clearly takes less time than these debates) use the approach I specified right above

Well that is noddy code again so here is something from Northwind:
SELECT
p.ProductName,
COUNT(*) AS TotalCount,
SUM(od.Quantity * od.UnitPrice) AS TotalAmount,
SUM(od.Quantity * od.UnitPrice) / COUNT(*) AS AverageAmount
FROM [Order Details] od
JOIN Orders o ON o.OrderID = od.OrderID
JOIN Products p ON p.ProductID = od.ProductID
GROUP BY p.ProductName
COUNT(*) and SUM(od.Quantity * od.UnitPrice) are both evaluated once, not twice.

On all databases, in all cases? No. You assume it is, or if it isn't, it isn't your problem (which is fine). But it is to us. That's the main issue I have with the debates you start: you have good points, but you put aside the arguments I come up with, simply because they don't apply to your world, but our code is required to work outside that world as well.

I was referring to this:


SELECT  x.ProductName, 
        x.TotalCount, 
        x.TotalAmount,
        (x.TotalAmount / x.TotalCount) AS AverageAmount
FROM (  SELECT  p.ProductName,
                COUNT(*) AS TotalCount,
                SUM(od.Quantity * od.UnitPrice) AS TotalAmount
        FROM [Order Details] od
                INNER JOIN Orders o ON o.OrderID = od.OrderID
                INNER JOIN Products p ON p.ProductID = od.ProductID
        GROUP BY p.ProductName) x

Same query, single evaluations, no duplicate retrievals.

What you want us to do is re-use expressions, because 'they're already there', but as I explained above, that's not going to work because the code is more complex than you might think.

You said:-
will be converted to:

.Select(AccountsTransactionFields.ID, AccountsTransactionFields.ID)

and a lambda is created:

Code:
(a, b) => new
             {
                 A = (int)a[b[0]],
                 B = (int)a[b[1]] + 1
             }
which makes me think you didn't fully read was I was suggesting.

What I was suggesting is that the conversion becomes (in your terms)
.Select(AccountsTransactionFields.ID)
and a lambda is created:
(a, b) => new
             {
                 A = (int)a[b[0]],
                 B = (int)a[b[0]] + 1
             }

I exactly meant that, which is not doable, because we have to verify whether all fields in the normal projection are indeed the same. A field comparison isn't hard, a complex expression, scalar queries, case trees with expression etc. etc. are. It's also, sorry, code which is only there to overcome an edge case which doesn't need this code to run.

I'm not going to spend considerable amounts of time on complex code which serves an edge case which only pops up if one refuses to write a query differently .

That means your next comment

It doesn't check whether the fields it obtains from the lambda are equal or not, as it can't do that.

may not be relevant as the field expression used in the projection would be one and the same.

No they're not. They're different objects.

Tremendously complex? Not sure which code you were thinking about here but outside the ProjectionLambdaTransformer class, well nothing at all changes that I can see. The only difficult part is writing a comparer identifying whether Field/FieldExpression variations are equivalent or not. I suspect that a combination of usage of DynamicQueryEngineBase.DetermineIfFieldAliasIsRequired and ExpressionTreeKeyCreator would solve 90% of that anyway.

This paragraph perfectly illustrates what I meant earlier: that you shove aside the bigger issues we have to face if we simply go ahead and do what you think is OK. I don't blame you for not seeing what we have to deal with, you only work with your code, but please do understand that when I say things, they're not because I'm unable to write a piece of code, but first and foremost: we have to deal with larger scopes than you, and therefore I see things differently.

To give you an idea how complex this code can become, please look at the code which tries to find fields which refer to derived table fields: PersistenceCore.FixupDerivedTableTargetingFields, to give you an idea. and this is just a small part in a large chain of calls to get from a query description to a SQL query. Perhaps for your particular query it's obvious what should be done: just re-use the same field with the same aggregate. It's not hard to compare a field object and its aggregate function. It becomes much harder when you take into account large expressions, case statements, function calls which accept fields, expressions, aggregates, tables joined multiple times, inheritance etc. etc. Not impossible, but seriously hard code, and if that code is there just to make it easier to write a query in a specific edge case situation, it's not a valuable addition as the time spent on it is not worth it: first of all it takes a lot of time, and the code is complex and has to work always on all databases, and thus has to maintained for years to come.

Sorry, but then I say: No.

I don't mind suggesting things that ultimately get declined. I don't feel I am wasting my time because often there is something I hadn't thought of or, like the 'use bits instead of bools' suggestion I made a few days ago, the benefits aren't certain or not enough to justify the change.

A bugfix justifies a change, making something inconvenient more convenient for a lot of people justifies a change and expanding on a vision of the code justifies a change. All other things don't.

But I do try to ensure I am not talking bollocks before I post a suggestion. In this case, I spent a lot of time studying decompiled code then more time proving it does work at least in the scenarios I can think if. If you can say it won't work in scenario X because Y then fine, just tell me. I do however object to having my suggestions dismissed immediately as silly - I am not a f*ckwit.

If I thought you are a fuckwit, I'd just said 'no' and wouldn't have given you an explanation several times already in this thread. (which took me over 2 hours already, today alone).

Support is free, we help everyone for free. But I don't have unlimited time, on the contrary. I'm more than willing to explain why things are the way they are, but please do realize that we have a bigger scope to support than you. I have no obligation to explain things to you how they are and why things might not work the way you think they'll be, but I'll try, as I think it might help the user better understand the code etc.

I do however want to ask from you that if I explain things to you and say 'it's going to be complex and it's the way it is because of <reason>', you understand that I'm not saying that as an excuse but because that's the reason it's the way it is. If I want to fill this thread with excuses I'd just say "No, not gonna happen" and not explain things ever. It would surely take much less time.

Btw, the full sourcecode is available on the website, no decompile needed (and it comes with comments)

Frans Bouma | Lead developer LLBLGen Pro

simmotech
User

Posts: 1024
Joined: 01-Feb-2006

# Posted on: 10-Jul-2013 18:44:04

Frans

Thanks for taking the time.

I have a few comments (which don't ask you to reconsider!) then I'll leave it alone.

A bugfix justifies a change, making something inconvenient more convenient for a lot of people justifies a change and expanding on a vision of the code justifies a change. All other things don't.

I like this - its very clear.

SELECT T.C, (T.C+T.D) AS E
FROM
(SELECT A+B AS C, D FROM X) T

Right, I'm finally getting it that I will have to learn how to write queries in a different way. I don't find writing SQL Queries easy - never have - but I have learned enough to write SQL queries in a particular way which has worked for me. So its not a case of me "refusing to write a query differently", its me not knowing how to, basically.

I didn't get it from the simple one here but the other examples you provided seemed clearer to me. Thanks. I will practice writing queries to use subqueries.

may not be relevant as the field expression used in the projection would be one and the same.

No they're not. They're different objects.

That's where we differ I think. Maybe my terminology was wrong - I meant it as the resultset[x] to which the compiled Expression uses to project its result.

You may be right and there is still something I don't know but as I see it, what I was suggesting will work on all databases, and all queries - large or small, derived/nested or whatever it doesn't matter because I wasn't getting in deep like you do to actually create the expression. That would still be necessary but when it is recognized that given x is the input and y is the output then the next time you see an x (regardless of how big or complicated it is) you can use y again. The range of x on which this process can be done would be limited to those where x goes to y.

COUNT(*) and SUM(od.Quantity * od.UnitPrice) are both evaluated once, not twice.

On all databases, in all cases? No. You assume it is, or if it isn't, it isn't your problem (which is fine). But it is to us. That's the main issue I have with the debates you start: you have good points, but you put aside the arguments I come up with, simply because they don't apply to your world, but our code is required to work outside that world as well.

Now if whatever database is used returns different values for the same field/expression in the same query then regardless of whether it is aliased or not then you have problems anyway!

Walaa
Support Team

Posts: 14995
Joined: 21-Aug-2005

# Posted on: 10-Jul-2013 19:30:43

I guess we don't have much to add, here. This thread might turn very useful in the future if we find the need to revisit this routine.