Doing Bulk/Direct INSERT INTO with Dynamic Select Query in v5.9

Posts
DaveH2022
User
Posts: 5
Joined: 15-Sep-2022
# Posted on: 12-Nov-2022 02:21:03   

Hi Frans/others,

I've resolved all MY issues upgrading from LLBL v4.2 to v5.9. I'm down to this one special case that I've worked around, but I'm hoping for a more efficient solution.

We have one case where we let the user define filtering criteria to find records. We start an IRelationPredicateBucket with our internal "base criteria" and then dynamically add 0-N additional predicates based on the user's selections. (Which means it would not be practical to implement in a stored procedure.)
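
For illustration, the construction is roughly this (entity/field names made up, not our actual model):

// start with our internal "base criteria", then add 0-N user-driven predicates
var bucket = new RelationPredicateBucket(UserFields.IsDeleted == false);
foreach(var selection in userSelections)
{
    bucket.PredicateExpression.AddWithAnd(UserFields.JobCode == selection.JobCode);
}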

With LLBL v5.9 I can execute FetchDataReader with that IRelationPredicateBucket and an IEntityFields2 list of fields to fetch, and LLBL builds and executes the exact "select query" that I need -- in this case returning 2 columns for records that match the dynamic filtering criteria. For example, it starts with: exec sp_executesql N'SELECT @p2 AS [RuleId], [DB].[Schema].[Table].[Id] FROM ...

Which is to say my IRelationPredicateBucket and my IEntityFields2 variables are still getting populated just as they were under v4.2, and v5.9 is able to use those in FetchDataReader to build (and execute) the "select query."
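
For reference, the call is essentially this (using the FetchDataReader overload that takes readerBehavior, max rows, and allowDuplicates; both columns are uniqueidentifier in our case):

using(var reader = adapter.FetchDataReader(fieldsToFetch, bucket, CommandBehavior.CloseConnection, 0, true))
{
    while(reader.Read())
    {
        var ruleId = reader.GetGuid(0);   // the @p2 AS [RuleId] constant
        var id = reader.GetGuid(1);       // [DB].[Schema].[Table].[Id]
    }
}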

The challenge: The "select" query might return 100,000 or more rows, which are the values that need to be directly inserted into a many-to-many join table. In v4.2 we implemented an example that used GetFieldPersistenceInfos and CreateQueryFromElements to get LLBL to "build the select statement", which we would then prepend with insert into [DB].[Schema].[Table] (ColumnName1, ColumnName2) and run via ExecuteActionQuery, so that we'd deliver a single "insert into X select from Y" query to SQL Server, which "instantly" inserts 100,000 records into a table.

I have created a workaround (use FetchDataReader to fetch only the one ID value I need, then build/execute batches of "insert into" statements). It does work, and using adapter.BatchSize made it bearable. However, when there's a large number of matching records (a common use case), it is significantly slower than the "SQL-Server-side" code we used with v4.2.

In v4.2 we implemented an example that started with IRelationPredicateBucket and IEntityFields2 and leveraged LLBL methods to build the "select query" (with all the dynamic predicates) and prepend the result with INSERT INTO Table(Col1, Col2), which could then be executed in a single trip to the SQL Server (the server executes the "insert into X select from Y" almost instantly, even when many records need to be inserted).

The code in v4.2 used GetFieldPersistenceInfos, which I know was a breaking change in v5.8. I implemented the changes detailed at https://www.llblgen.com/Documentation/5.9/LLBLGen%20Pro%20RTF/migratingcode.htm#breaking-changes-v5.8 so I was able to get the GetFieldPersistenceInfos call to execute.

However, our v4.2 code is also using CreateQueryFromElements which I assume has been removed/renamed/moved because using reflection to retrieve that method info just returns NULL (so our code can't call the method). I found https://www.llblgen.com/tinyforum/Thread/22320 from 2013 that suggested using CreateSelectDQ instead of CreateQueryFromElements, so I added:

public IRetrievalQuery OurCreateSelectDQ(QueryParameters parameters)
{
    // public wrapper so our adapter code can reach the protected CreateSelectDQ
    return base.CreateSelectDQ(parameters);
}

to our derived QueryCreationManager. (I also added a protected override to our partial class InsertFromQueryDataAccessAdapter : DataAccessAdapter so that it uses our class derived from SD.LLBLGen.Pro.ORMSupportClasses.Adapter.QueryCreationManager.)

I created the QueryParameters as the example showed (but using the populated variables being used successfully in the v4.2 code...confirmed no attributes are null). However, calling base.CreateSelectDQ throws a null reference exception (with no innerException).

After trying many different variations, I finally gave up and wrote the workaround so the code will function, but it's very noticeably slower when a large number of records need to be inserted (which unfortunately is a common scenario).

The question: Is there any way to ask LLBL to build a SQL select statement and just return the SQL to me (without executing it)? The starting point: IRelationPredicateBucket bucket has the filter criteria, and IEntityFields2 fieldsToFetch identifies the fields to fetch. The variables are valid (that is, calling FetchDataReader would result in LLBL building and successfully executing the exact SELECT query that I need).

If I could get the actual SQL that would be executed, I could just inject the "insert into" piece and I'd get exactly what I need. For example, here's what SQL Profiler captured from FetchDataReader, followed by how I want to change it:

exec sp_executesql N'SELECT @p2 AS [RuleId], [DB].[Schema].[Table].[Id] FROM ...
exec sp_executesql N'**INSERT INTO [DB].[Schema].[OtherTable] (Col1, Col2)** SELECT @p2 AS [RuleId], [DB].[Schema].[Table].[Id] FROM ...

Executing that slightly modified SQL would go to SQL Server in a single trip, and SQL Server would "instantly" insert 100,000 records into the specified table (using the select query that LLBL built dynamically).

Thanks! --Dave

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 12-Nov-2022 10:21:22   

The queries are created by passing a filled QueryParameters object to the DynamicQueryEngine. To make sure it will create a query, the persistence infos for all used mapped elements have to be merged with the QueryParameters' contents. This is done by the QueryCreationManager since 5.8. To create a query you thus have to do two things, and one of them is an internal method call, which makes this problematic. The methods are:

var entityfactory = _queryCreationManager.PrepareParametersForCollectionFetch(parameters);
IRetrievalQuery selectQuery = _queryCreationManager.CreateSelectDQ(parameters);

You can obtain the QueryCreationManager yourself and call CreateSelectDQ through a method you add to the derived class, but you can't call PrepareParametersForCollectionFetch as it's internal. Reflection can help, but if there's another way, that's preferable.
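
With reflection it'd be something along these lines (untested sketch; it assumes the internal method takes just the QueryParameters as its only argument, as shown above):

// requires using System.Reflection;
var prepareMethod = queryCreationManager.GetType()
                        .GetMethod("PrepareParametersForCollectionFetch", BindingFlags.Instance | BindingFlags.NonPublic);
prepareMethod.Invoke(queryCreationManager, new object[] { parameters });
// after that merge step, a wrapper like your OurCreateSelectDQ(parameters) can build the IRetrievalQuery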

As there was never a use case to do this, our interfaces aren't open for this.

I however think there's an easier workaround. When you call FetchDataReader(fields, filter ... ), it'll call into FetchDataReader(readerBehavior, QueryParameters), which will build the SELECT query and call FetchDataReader(IRetrievalQuery, readerBehavior) but not execute it yet.

Override public virtual IDataReader FetchDataReader(IRetrievalQuery queryToExecute, CommandBehavior readerBehavior) in your specific adapter. You'll get the SQL query passed in with the queryToExecute object. Now, to insert the rows of the select into the table you want instead of fetching them, modify the DbCommand in the queryToExecute object: prepend the query with your INSERT INTO ... fragment. Then call the base method and you'll execute the reader. As the query executes an insert, the # of rows is 0, but the query still executes. You could append a SELECT 1 FROM table or whatever if this fails; I haven't tested it, but I think it'll work fine.

The reader obviously has no value, but call Read() on it once and then close it.
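
In code, something like this untested sketch (the InsertIntoFragment property is something you'd add yourself; it needs using System.Data; and SD.LLBLGen.Pro.ORMSupportClasses):

public partial class InsertFromQueryDataAccessAdapter : DataAccessAdapter
{
    // set this before calling FetchDataReader; when set, the generated SELECT
    // is executed server-side as an INSERT INTO ... SELECT
    public string InsertIntoFragment { get; set; }

    public override IDataReader FetchDataReader(IRetrievalQuery queryToExecute, CommandBehavior readerBehavior)
    {
        if(!string.IsNullOrEmpty(this.InsertIntoFragment))
        {
            // e.g. "INSERT INTO [DB].[Schema].[OtherTable] (Col1, Col2) "
            queryToExecute.Command.CommandText = this.InsertIntoFragment + queryToExecute.Command.CommandText;
        }
        return base.FetchDataReader(queryToExecute, readerBehavior);
    }
}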

Frans Bouma | Lead developer LLBLGen Pro
DaveH2022
User
Posts: 5
Joined: 15-Sep-2022
# Posted on: 15-Nov-2022 19:00:34   

I also would prefer to avoid calling private methods. I like the concept of overriding FetchDataReader and prepending the "insert into {Db.Schema.Table} (Col1, Col2)" so I'll try it out.

In the meantime I improved my workaround. Initially I fetched the single Id column to get the (100,000) values that needed to be inserted, and then (in batches) I was creating an entity, setting its 2 values, and appending it to a UOW for save (executed in a single trip). With 2 parameters the batch size was 1,000 (to stay under SQL Server's 2,100-parameter limit).
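
That first version was essentially this (entity name made up):

adapter.BatchSize = 1000;   // 2 parameters per insert keeps each batch under the 2,100 cap
var uow = new UnitOfWork2();
foreach(var userId in fetchedUserIds)
{
    uow.AddForSave(new RuleUserEntity { RuleId = ruleId, UserId = userId });
}
uow.Commit(adapter, true);  // one trip per batch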

I decided that if I was OK with the concept of injecting a hard-coded "insert into X (Col1,Col2)" into a query (which would break in the unlikely event the table or either column was renamed), then why not just write the SQL? I still leverage LLBL to select the 1 value I need, but instead of creating entities I use a StringBuilder to build a query like: declare @staticId uniqueidentifier = 'GUID'; insert into X(Col1,Col2) select @staticId, t.Id from (select Id='Guid1' union all select 'Guid2' union all select 'Guid3') t
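
Per batch, that's roughly this (variable names made up; the values are GUIDs we just fetched from our own DB, so concatenating them is safe; needs using System.Text;):

var sb = new StringBuilder();
sb.Append("declare @staticId uniqueidentifier = '").Append(ruleId).Append("'; ");
sb.Append("insert into X(Col1,Col2) select @staticId, t.Id from (");
for(int i = 0; i < idBatch.Count; i++)
{
    sb.Append(i == 0 ? "select Id='" : " union all select '").Append(idBatch[i]).Append("'");
}
sb.Append(") t");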

That eliminated the overhead of creating/populating many entities that get used to build a batch of 1K "insert into" statements to be executed, and it reduced the number of trips because I could bump the batch size to 5K or higher, which gets the speed "close enough" to the "one statement in a single trip" speed that it should be an acceptable approach...but I'm going to try your "override FetchDataReader" approach too.

I know this topic has come up over the years. Perhaps you'd consider adding a public BulkInsert method? It could take something like:

IRelationPredicateBucket selectJoinsAndFilter -- for building the "from...where..." part
IEntityFields2 fieldsToSelect -- subset of selectJoinsAndFilter fields to build the "select col1 [, col2 [, col3]]" part
IEntity2 / typeof(IEntity2) entityToInsertInto -- to get the [Db].[Schema].[Table] to insert into
IEntityFields2 fieldsToInsertInto  -- identifies fields in the specified "insertIntoEntity"

The first 2 parameters already work in FetchDataReader to build the required "select" statement. The last 2 parameters would provide your method the info needed to dynamically build the "insert into X (col1 [, col2 [,col3]])" to prepend before the "select". (Actually fieldsToInsertInto would have the restriction/sanity check that all the fields must be from the same entity, so you could probably get the entity detail from any of those fields [instead of having to separately provide entityToInsertInto].)
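
Something like this, purely to illustrate (hypothetical method, name made up):

// fieldsToInsertInto supplies both the target columns and, via its entity, the target table
public int BulkInsertFromQuery(IRelationPredicateBucket selectJoinsAndFilter,
                               IEntityFields2 fieldsToSelect,
                               IEntityFields2 fieldsToInsertInto)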

Thanks!

--Dave

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 16-Nov-2022 14:12:26   

DaveH2022 wrote:

I also would prefer to avoid calling private methods. I like the concept of overriding FetchDataReader and prepending the "insert into {Db.Schema.Table} (Col1, Col2)" so I'll try it out.

In the meantime I improved my workaround. Initially I fetched the single Id column to get the (100,000) values that needed to be inserted, and then (in batches) I was creating an entity, setting its 2 values, and appending it to a UOW for save (executed in a single trip). With 2 parameters the batch size was 1,000 (to stay under SQL Server's 2,100-parameter limit).

I decided that if I was OK with the concept of injecting a hard-coded "insert into X (Col1,Col2)" into a query (which would break in the unlikely event the table or either column was renamed), then why not just write the SQL? I still leverage LLBL to select the 1 value I need, but instead of creating entities I use a StringBuilder to build a query like: declare @staticId uniqueidentifier = 'GUID'; insert into X(Col1,Col2) select @staticId, t.Id from (select Id='Guid1' union all select 'Guid2' union all select 'Guid3') t

That eliminated the overhead of creating/populating many entities that get used to build a batch of 1K "insert into" statements to be executed, and it reduced the number of trips because I could bump the batch size to 5K or higher, which gets the speed "close enough" to the "one statement in a single trip" speed that it should be an acceptable approach...but I'm going to try your "override FetchDataReader" approach too.

Also remember we have a Plain SQL API which offers parameterization: https://www.llblgen.com/Documentation/5.9/LLBLGen%20Pro%20RTF/Using%20the%20generated%20code/gencode_executing_plain_sql.htm which can help you execute the whole query with parameters efficiently.
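
E.g. (table and parameter names made up; see the linked docs for the exact overloads):

// ExecuteSQL is an extension method in SD.LLBLGen.Pro.ORMSupportClasses
var rowsAffected = adapter.ExecuteSQL(
        "INSERT INTO [DB].[Schema].[OtherTable] (Col1, Col2) SELECT @ruleId, u.[Id] FROM [DB].[Schema].[User] u WHERE u.[JobCode] = @jobCode",
        new { ruleId = staticRuleId, jobCode = jobCode });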

I know this topic has come up over the years. Perhaps you'd consider adding a public BulkInsert method? It could take something like:

IRelationPredicateBucket selectJoinsAndFilter -- for building the "from...where..." part
IEntityFields2 fieldsToSelect -- subset of selectJoinsAndFilter fields to build the "select col1 [, col2 [, col3]]" part
IEntity2 / typeof(IEntity2) entityToInsertInto -- to get the [Db].[Schema].[Table] to insert into
IEntityFields2 fieldsToInsertInto  -- identifies fields in the specified "insertIntoEntity"

The first 2 parameters already work in FetchDataReader to build the required "select" statement. The last 2 parameters would provide your method the info needed to dynamically build the "insert into X (col1 [, col2 [,col3]])" to prepend before the "select". (Actually fieldsToInsertInto would have the restriction/sanity check that all the fields must be from the same entity, so you could probably get the entity detail from any of those fields [instead of having to separately provide entityToInsertInto].)

It's come up from time to time indeed; however, defining an API that covers most scenarios is more complex than you'd think. Bulk insert is also a scenario that's often done with SqlBulkCopy, or by importing external data from a source; if the source is a query run on the database itself, there are questions like whether the projection and the insert fields line up, etc. The query to fetch the data with should be as flexible as any select query, so it's a matter of e.g. any linq/queryspec query you can formulate as source, which requires a way to formulate such a query to pass into the API.

It's an interesting concept we might look into at some point, but it's so specific to a given use case that if the API isn't flexible enough you'll have to fall back to a workaround system anyway.

Frans Bouma | Lead developer LLBLGen Pro
DaveH2022
User
Posts: 5
Joined: 15-Sep-2022
# Posted on: 16-Nov-2022 19:06:54   

defining an API that covers most scenarios is more complex than you'd think

That wouldn't surprise me! However, the scenarios I saw seemed similarly narrow to mine, where we are "frustratingly close" to a direct insert: we're able to create a dynamic IRelationPredicateBucket selectJoinsAndFilters and IEntityFields2 selectFields that FetchDataReader uses to return the exact rows and columns that need to be inserted. We just lack a method that could also accept a separate list of IEntityFields2 insertIntoFields (limitations: all fields must be from the same entity, and maybe the entity must also map to a table, not a view). The method would use insertIntoFields to start the SQL command text with "insert into Table1(col1,col2)" and then append the "select" SQL command text that FetchDataReader already builds.

If it helps, my use case is not loading lots of data from a file. (We have a separate "data load" that'll process 100K records over just a few minutes [leveraging LLBL to try fetching an entity, applying the incoming data to the entity, then saving the entity...which does nothing if the entity wasn't actually changed]. Very efficient, and "a few minutes" is totally fine in an offline service.)

My use case is an admin using our web app to build an "Audience" by creating a series of individual Rules, like "ADD Users in Job Codes X,Y,Z" then "REMOVE Users in (US state of) TX or FL". The admin can preview a paged list of Users for each Rule. Because some of the filtering options can be complex, we simplify by executing the actual query only once, and the resulting UserIds get stored in a Rule-to-User join table. Not only can the "paged preview" just be a simple "Rule-to-User join to User" query, but the Rule-to-User join table will end up with the results for all the Rules, which an action stored procedure uses to adjust the overall Audience-to-User join table.

The admin's filtering criteria gets serialized and saved so that we can (A) restore the filtering criteria into the UI so they can adjust their filtering criteria and preview/save changes, and (B) have code that uses that filtering criteria to dynamically add predicates to a predicate bucket in order to update the Rule-to-User join table for a Rule.

The underlying User data is "constantly" updated, so we must regularly "refresh" every Rule (adjusting the Rule-to-User join table) and every Audience (adjusting the Audience-to-User join table), so we want those processes to be efficient. We must use application code (not a stored procedure) to build the dynamic select query. Ideally, we'd not be fetching (100K) records from the DB server to the App server just so the code can push some form of (100K) "insert into" statements back to the DB server.

Therefore, I'd love to have a helper method (even if it had restrictions/limitations) that internally used the same FetchDataReader process to build the "select" SQL command text, but prepended it with the dynamically built "insert into JoinTable(Col1,Col2)" SQL command text so that the command could be executed directly on the DB server, leveraging its strength to efficiently insert a large batch of (100K) records into a table via a single insert statement. (For now I'll try your suggestion to prepend the static/hard-coded "insert into" SQL. In our case it's highly unlikely that the JoinTable name or columns will change, but it feels "dirty" to hard-code that part when the select query is built dynamically by LLBL.)

Thanks again for listening, and for continuing to safely improve LLBLGen--it continues to serve us well (must be 10-15 years now).

--Dave

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 17-Nov-2022 10:20:03   

Thanks for your feedback! :) We'll look into whether (and in what form) we should add this to the API. Won't be in the next version tho.

Frans Bouma | Lead developer LLBLGen Pro