Again about DISTINCT in SQL (select Entity from View)

Posts   
 
    
Posts: 21
Joined: 28-Jul-2008
# Posted on: 17-Aug-2009 09:13:02   

Hi ! We again start test LLBLGen in out project - and we have a problem: 1. Simple VIEW in DB contains two table with LEFT JOIN, ID field is PK 2. Generated entity from this view (we try manual mark ID as PK/identity - no changes in result) 3. Select (using linq) from DB some entities with simple query from our view 4. In generated SQL we see DISTINCT - for what ? Time to take 20 records (we use paging) more 5-6 sec, if we use SAME query but witout DISTINCT keyword - execution time less 0.1 sec. Possible without manual creating Projectors,Projection and etc. fetch entity without DISTINCT keyword in SQL ?

P.S. Lastest (today) LLBLGen from site, Oracle native drivers, Oracle 10g with lastest patchset, lastest ODP.NET.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 17-Aug-2009 10:30:52   

With simple query you mean: var q = from c in metadata.Customer where c.Country=="Russia" select c;

?

Without joins, distinct shouldn't be there on entity fetches. For projections, so not entity fetches, it is due to this: a bug in the linq provider which we can't fix at the moment as it will break existing code: the allowduplicates flag is not initialized to true in linq queries. See line 69-70 in ExpressionClasses\QueryExpression.cs in the linq provider sourcecode. This way, AllowDuplicates is always false initially, so distinct is always emitted. Normally, one would want distinct values anyway, so it's not a big deal, but that's precisely the point why we can't add AllowDuplicates=true; as init: code which now depends on this 'feature' (read: bug) is going to break if we change this. I hate it, trust me, but I can't change this now. You can uncomment the line in the sourcecode to fix it manually.

I understand from your post (it would have been better if you would have posted that simple query, as entity fetches are different from projections) that you used an entity fetch? (just to be sure)

(edit) the issue I mentioned above doesn't pop up always, it's only in effect in some cases. For example:


var q = from e in metaData.Emp
        where e.Job.StartsWith("S")
        select new { e.Empno, e.Deptno, e.Ename };

or


var q = from e in metaData.Emp
        where e.Job.StartsWith("S")
        select e;

don't emit Distinct.

(table or view, that's no difference, the DQE has no clue if the target is a table or a view)

Frans Bouma | Lead developer LLBLGen Pro
Posts: 21
Joined: 28-Jul-2008
# Posted on: 18-Aug-2009 06:01:26   

Thanks for quick answer ! A little more details:

  1. In DB view:
CREATE VIEW MY_VIEW AS SELECT T1.*, T2.FIELD_1, T2.FIELD_2 FROM T1 LEFT JOIN T2 ON T1.ID = T2.T1_ID;

this view contains about 5 million records.

  1. In LLBLGen mapper we just create "Entity From View" - MyView, and mark ID as "part of PK"

  2. In C# code:


var q = from e in metaData.MyView select e;
var result = q.Take(20).ToList();

  1. When we turn on debugging trough OracleDQE="4" - we see SQL query (after delete some junk):

SELECT DISTINCT .....all_MY_VIEW_fields..... FROM MY_VIEW WHERE ROWNUM <= 20;

  1. In result time to take 20 record - more 6 sec, if I just manual execute this query in SQLDeveloper - same, when I remove "DISTINCT" keyword - execution time about 0.01sec
Posts: 21
Joined: 28-Jul-2008
# Posted on: 18-Aug-2009 06:32:26   

New information. I use simple table for better understanding internal processes:


using System;
using System.Diagnostics;
using Model.DatabaseSpecific;
using Model.EntityClasses;
using Model.FactoryClasses;
using Model.HelperClasses;

namespace LLBLGenExperiment
{
    internal class LlblGenExperiment
    {
        private static void Main(string[] args)
        {
            using (var adapter = new DataAccessAdapter())
            {
                var userEntities = new EntityCollection<UserEntity>(new UserEntityFactory());
                Trace.WriteLine("FetchEntityCollection");
                adapter.FetchEntityCollection(userEntities, null);
                Trace.WriteLine("Fetched " + userEntities.Count + " entities.");
            }
        }
    }
}

result:


FetchEntityCollection
'LLBLGenExperiment.vshost.exe' (Managed): Loaded 'D:\Work\Projects\Glavsnab.Ring.Realty\src\Enterprise\Tools\LLBLGenExperiment\bin\x86\Debug\SD.LLBLGen.Pro.DQE.Oracle10g.NET20.dll'
Method Enter: CreatePagingSelectDQ
Method Enter: CreateSelectDQ
Method Enter: CreateSelectDQ
Generated Sql query: 
    Query: SELECT "ACCOUNT"."ACCOUNT_ID" AS "Id", "ACCOUNT"."LAST_NAME" AS "LastName", "ACCOUNT"."EMAIL" AS "Email", "ACCOUNT"."ACTIVATED" AS "Activated", "ACCOUNT"."PASSWORD_HASH" AS "PasswordHash", "ACCOUNT"."PASSWORD_SALT" AS "PasswordSalt", "ACCOUNT"."DATE_CREATE" AS "DateCreate", "ACCOUNT"."DATE_LAST_LOGON" AS "DateLastLogOn", "ACCOUNT"."DATE_LAST_ACTIVITY" AS "DateLastActivity", "ACCOUNT"."DATE_LAST_PASSWORD_CHANGE" AS "DateLastPasswordChange", "ACCOUNT"."REGISTRATION_GUID" AS "RegistrationGuid", "ACCOUNT"."DATE_REGISTRATION" AS "DateRegistration", "ACCOUNT"."PHONE" AS "Phone", "ACCOUNT"."IS_DELETED" AS "IsDeleted", "ACCOUNT"."FIRM_ID" AS "FirmId", "ACCOUNT"."FIRST_NAME" AS "FirstName" FROM "ACCOUNT"

Method Exit: CreateSelectDQ
Method Exit: CreatePagingSelectDQ: no paging.
'LLBLGenExperiment.vshost.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_32\System.EnterpriseServices\2.0.0.0__b03f5f7f11d50a3a\System.EnterpriseServices.Wrapper.dll', Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
'LLBLGenExperiment.vshost.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_MSIL\System.Design\2.0.0.0__b03f5f7f11d50a3a\System.Design.dll', Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
Fetched 158 entities.

All look fine, BUT when I add paging (thrid parameter in FetchEntityCollection method):


adapter.FetchEntityCollection(userEntities, null, 20);

then result is:


FetchEntityCollection
...
Query: SELECT * FROM (SELECT DISTINCT "ACCOUNT"."ACCOUNT_ID" AS "Id", "ACCOUNT"."LAST_NAME" AS "LastName", "ACCOUNT"."EMAIL" AS "Email", "ACCOUNT"."ACTIVATED" AS "Activated", "ACCOUNT"."PASSWORD_HASH" AS "PasswordHash", "ACCOUNT"."PASSWORD_SALT" AS "PasswordSalt", "ACCOUNT"."DATE_CREATE" AS "DateCreate", "ACCOUNT"."DATE_LAST_LOGON" AS "DateLastLogOn", "ACCOUNT"."DATE_LAST_ACTIVITY" AS "DateLastActivity", "ACCOUNT"."DATE_LAST_PASSWORD_CHANGE" AS "DateLastPasswordChange", "ACCOUNT"."REGISTRATION_GUID" AS "RegistrationGuid", "ACCOUNT"."DATE_REGISTRATION" AS "DateRegistration", "ACCOUNT"."PHONE" AS "Phone", "ACCOUNT"."IS_DELETED" AS "IsDeleted", "ACCOUNT"."FIRM_ID" AS "FirmId", "ACCOUNT"."FIRST_NAME" AS "FirstName" FROM "ACCOUNT") WHERE rownum <= 20
...
Fetched 20 entities.

We can see that DISTINCT keyword ! They ALWAYS appear when we using paging. For simple tables SELECT * FROM T1 fully equal SELECT DISTINCT * FROM T1, hovewer for JOINED tables like out VIEW (with left join) adding DISTINCT keyword after SELECT bring to catastrophic increase execution time rage

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 18-Aug-2009 10:11:01   

The limiter value (the 20) is indeed triggering the DISTINCT, I'll look into this if this is necessary or a bug.

(edit) I think it's a bug: distinct is always requested if there is a limit. However the caller can verify if there is really a requirement for it (i.e. the fetch is for entities, so they're already unique, if there are no duplicate causing relations in the relationscollection).

Frans Bouma | Lead developer LLBLGen Pro
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 18-Aug-2009 12:34:22   

I've altered the runtime a bit, so with no relationships you won't see distinct when fetching entities. It's still not ideal, there are cases where distinct might have been avoided but you will see it in the query, simply because detection for duplicates should be more advance than it is now. We've planned to do that work in 3.0

Attached you'll find a temp build of this fix. Could you please test this to see if it makes a difference?

Frans Bouma | Lead developer LLBLGen Pro
Posts: 21
Joined: 28-Jul-2008
# Posted on: 19-Aug-2009 05:47:59   

Thanks for DLL ! Work gread - no DISTINCT in SQL and execution time reduce to milliseconds simple_smile DISTINCT still exist in COUNT(*) expression - when we use query.Count() command, but in this case no different, DISTINCT ignored.

Thanks again for quick help.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 19-Aug-2009 09:47:45   

exception.cpp wrote:

Thanks for DLL ! Work gread - no DISTINCT in SQL and execution time reduce to milliseconds simple_smile DISTINCT still exist in COUNT(*) expression - when we use query.Count() command, but in this case no different, DISTINCT ignored.

Thanks again for quick help.

You mean that Distinct is emitted with the Count() query? That's actually an issue in the linq provider. See QueryExpression.cs at the top, we forgot to initialize AllowDuplicates to true, and adding that now will break code unfortunately. So we postponed that till v3.0. You can uncomment the line we commented out to get that behavior in the linq provider sourcecode if you want.

Frans Bouma | Lead developer LLBLGen Pro