Breaking change in Linq provider coming up

Posts   
 
    
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39769
Joined: 17-Aug-2003
# Posted on: 28-Aug-2009 12:05:36   

Starting with build 2.6.9.828 of the linq provider a breaking change is implemented. This is unfortunate, but we have to implement this change as it creates wrong results in nested aggregates.

The change is about distinct. By default, due to a mistake, the 'allowduplicates' flag on a QueryExpression is left as 'false'. This means that by default, all queries will get distinct filtering. For most entity queries this is not a problem, the system already performs duplicate filtering. For other queries it might be a bit less straight forward. Some queries which now rely on this bug will produce different results. However, actually, the queries always have produced the wrong results in the first place.

The problem becomes bigger when nested aggregates are used: the query aggregated with a scalar value contains 'distinct', which creates in some cases wrong results as duplicates are filtered out.

We had several unit-tests which failed after making the change, but we feel we have to as otherwise the queries are not definable in Linq as one can't set the AllowDuplicates flag to true with a linq construct (only to false using Distinct()).

The fix is made at line 74 in the updated ExpressionClasses\QueryExpression.cs. If you feel this change hurts your application, compile the sourcecode of the linq provider and comment out line 74, of the sourcecode of build 2.6.9.828 or later.

The current build has the fix at line 70 of that same class, commented out.

We apologize for this inconvenience.

Frans Bouma | Lead developer LLBLGen Pro
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39769
Joined: 17-Aug-2003
# Posted on: 28-Aug-2009 13:17:48   

The following two queries failed in our tests, which illustrate the possible impact for your code: Q1:


LinqMetaData metaData = new LinqMetaData(adapter);
var q1 = from soh in metaData.SalesOrderHeader
        from sod in soh.SalesOrderDetailCollection
        select soh;
var q = from soh in q1
        select new { soh.SalesOrderId };

int count = 0;
foreach(var v in q)
{
    count++;
    Assert.IsTrue(v.SalesOrderId > 0);
}
Assert.AreEqual(31465, count);

This now results in: 121317 instead 31465

Q2:


LinqMetaData metaData = new LinqMetaData(adapter);
var q1 = from c in metaData.Customer
         join o in metaData.Order on c.CustomerId equals o.CustomerId
         select c;

var q = from c in q1
        select new { c.CustomerId, c.CompanyName };

int count = 0;
foreach(var v in q)
{
    count++;
}
Assert.AreEqual(89, count);

This now results in 818 rows (one for each order), instead of 89.

Q1's SQL:


SELECT [LPA_L1].[SalesOrderId] 
FROM 
(
    SELECT  [LPA_L3].[SalesOrderID] AS [SalesOrderId], [LPA_L3].[RevisionNumber], [LPA_L3].[OrderDate], [LPA_L3].[DueDate], 
            [LPA_L3].[ShipDate], [LPA_L3].[Status], [LPA_L3].[OnlineOrderFlag], [LPA_L3].[SalesOrderNumber], [LPA_L3].[PurchaseOrderNumber], 
            [LPA_L3].[AccountNumber], [LPA_L3].[CustomerID] AS [CustomerId], [LPA_L3].[ContactID] AS [ContactId], [LPA_L3].[SalesPersonID] AS [SalesPersonId],
            [LPA_L3].[TerritoryID] AS [TerritoryId], [LPA_L3].[BillToAddressID] AS [BillToAddressId], [LPA_L3].[ShipToAddressID] AS [ShipToAddressId], 
            [LPA_L3].[ShipMethodID] AS [ShipMethodId], [LPA_L3].[CreditCardID] AS [CreditCardId], [LPA_L3].[CreditCardApprovalCode], 
            [LPA_L3].[CurrencyRateID] AS [CurrencyRateId], [LPA_L3].[SubTotal], [LPA_L3].[TaxAmt], [LPA_L3].[Freight], [LPA_L3].[TotalDue], 
            [LPA_L3].[Comment], [LPA_L3].[rowguid] AS [Rowguid], [LPA_L3].[ModifiedDate] 
    FROM ( [AdventureWorks].[Sales].[SalesOrderHeader] [LPA_L3]  INNER JOIN [AdventureWorks].[Sales].[SalesOrderDetail] [LPA_L4]  
            ON  [LPA_L3].[SalesOrderID]=[LPA_L4].[SalesOrderID])
) [LPA_L1]

Due to the join, duplicates will occur. As a value is selected instead of entities, this will create a longer list of values. The following fix solves it:


var q = (from soh in q1
        select new { soh.SalesOrderId }).Distinct();

-> 'Distinct()' is appended to the query as we're interested in the # of unique values.

SQL Q2, which has the same problem:

SELECT [LPA_L1].[CustomerId], [LPA_L1].[CompanyName] 
FROM 
(
    SELECT  [LPA_L3].[CustomerID] AS [CustomerId], [LPA_L3].[CompanyName], [LPA_L3].[ContactName], [LPA_L3].[ContactTitle], [LPA_L3].[Address], [LPA_L3].[City], 
            [LPA_L3].[Region], [LPA_L3].[PostalCode], [LPA_L3].[Country], [LPA_L3].[Phone], [LPA_L3].[Fax] 
    FROM ( [Northwind].[dbo].[Customers] [LPA_L3]  INNER JOIN [Northwind].[dbo].[Orders] [LPA_L4]  
            ON  [LPA_L3].[CustomerID] = [LPA_L4].[CustomerID])
) [LPA_L1]

Again, appending distinct to the query solves it:

var q = (from c in q1
        select new { c.CustomerId, c.CompanyName }).Distinct();

So this only occurs in: - fetches of value lists / custom projections. Fetches of entities are not affected either: - joins are used and the fetched data is on the '1' side of a 1:n relation which is used in the join (which create duplicates) or: - a custom projection is used on an entity set where less fields are fetched which could potentially contain duplicate rows.

If you before wondered why duplicates were automatically filtered out in custom projections, this wasn't a 'feature' but actually a bug.

Frans Bouma | Lead developer LLBLGen Pro