Linq to LLBLGen Pro creates an index in memory

Posts   
 
    
RJReinders
User
Posts: 12
Joined: 04-May-2023
# Posted on: 04-May-2023 12:54:02   

I am working on a project to convert complex batchjobs from a legacy system to .net using LLBLGen Runtime Framework. The batchjobs all follow a similar pattern: perform a complex query using (mostly left) joins, determine additional properties and write the result set to a file. This file is then used as the source for the next query.

Using Linq to LLBLGen Pro I can write a very effective (well-performing) first query, injecting subqueries where necessary (including in the first from statement). Simplified example:

var db = new LinqMetaData(adapter);

var inputQuery = 
            from jInput in db.BatchJobInput
            where
                jInput.UserCode == user &&
                jInput.LogNr == logNr
            select jInput;
            
var query = from input in inputQuery
            join cl in db.Cluster
                on new { input.AdministrationNr, ClusterNr = input.ClusterNr.GetValueOrDefault() }
                equals new { cl.AdministrationNr, cl.ClusterNr }
                into clJoin
            from cl in clJoin.DefaultIfEmpty(null)
            
            // other joins
            
            select new Line
            {
                ClusterFound = cl != null,
                ClusterNr = cl.ClusterNr
                
                // other properties
            };      

However, when I put the resulting Line objects into a POCO (the way the original implementation wrote them to a file) and use that as the input for the next query, LLBL starts behaving very strangely. Simplified example:

var db = new LinqMetaData(dataAccessAdapter);

IQueryable<OutcomePoco> query = from subFileLine in subFile1.Lines.AsQueryable()
    join cl in db.Cluster
        on new { subFileLine.AdministrationNr, subFileLine.WorkOrder.ClusterNr }
        equals new { cl.AdministrationNr, cl.ClusterNr }
        into clJoin
    from cl in clJoin.DefaultIfEmpty(null)
    // additional joins
    select new OutcomePoco(subFileLine)
    {
        ClusterExists = cl != null,
        ClusterDescription = cl.Description

        // additional properties from other joins
    };

var test = query.ToArray();

The logging shows that LLBL will read the entire Cluster table into memory (a Select * From statement without where clause) rather than using the values provided in the Lines to filter the set. This will be an unworkable solution for my business case, because some of the tables contain over a million records. I don't understand why linq to llblgen behaves so differently when providing a (.net?) IQueryable over an (LLBL?) IQueryable.

The 1st scenario described above even works correctly if the subquery is generated by a different LinqMetadata instance than the main query into which it is injected (same DataAccessAdapter instance though), so I assumed I could just work with IQuerables from any source. If I execute the subquery first and use its result set as input for the main query, I again observe this behaviour (completely forgets to WHERE and reads everything into memory).

How might I resolve this issue? Passing the first query (not executed) as input parameter to the next query is also no solution, since I need to manipulate the resultset in the meantime (calculating or concatenating fields, etc.). Also I would be very interested in understanding what I'm doing wrong and why the runtime behaves this way.

edit: I should mention the underlying database is an Oracle database and I am using Oracle.ManagedDataAccess.Core in the DQE.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39766
Joined: 17-Aug-2003
# Posted on: 05-May-2023 09:57:01   

The main issue is that this linq query:

var db = new LinqMetaData(dataAccessAdapter);

IQueryable<OutcomePoco> query = from subFileLine in subFile1.Lines.AsQueryable()
    join cl in db.Cluster
        on new { subFileLine.AdministrationNr, subFileLine.WorkOrder.ClusterNr }
        equals new { cl.AdministrationNr, cl.ClusterNr }
        into clJoin
    from cl in clJoin.DefaultIfEmpty(null)
    // additional joins
    select new OutcomePoco(subFileLine)
    {
        ClusterExists = cl != null,
        ClusterDescription = cl.Description

        // additional properties from other joins
    };

var test = query.ToArray();

Will be converted in its entirety to SQL, however you join a set coming from a file to a table. The file is in memory, the table is in the database. Linq makes it easy to write a single query over multiple sources as if they all are originating in the same location but in practice this isn't the case in your situation: the lines from the file aren't in the database so you can't join a table with these lines.

What you can do however is rewrite the query a bit to pass id's from the lines in the file to the query as an IN clause, using the Linq's Contains() function (so first project the file to objects, then use Contains(), a bit similar to query 7 in this example: https://www.llblgen.com/Documentation/5.10/LLBLGen%20Pro%20RTF/Using%20the%20generated%20code/Linq/gencode_linq_generalusage.htm#queryable-contains), however it depends on how many lines are present in the subFile1 file.

As you're left-joining with the lines in the file, it is a bit tricky as an IN clause might do an inner join instead. Alternatively it might be easier to inline the query that produced the file altogether and run it as 1 query, skipping the file route?

Frans Bouma | Lead developer LLBLGen Pro