Inline Integer Predicate Parameters in Generated Queries

Posts   
 
    
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 04-Aug-2021 21:46:10   

Build Version: 5.5 (5.5.1) RTM Build Date: 17-Jan-2019 Project: Adapter project targetting .NET 4.5.2 Database: MS SQL 2019

Is there currently any way to either configure or customize the way LLBLGen pro generates queries so that range filters based on integer or numeric inputs are inlined into the query instead of passed as parameters, specifically when the query generates something like.

Where ([LPA_L1].Field IN (@P1,...,Pn)) -- P1 through PN are all integer values

In our application, the database is fairly normalized, and nearly every table has an integer identity primary key. Bulk operations are also common, which means we routinely end up filtering data based on large sets of integer values, and as a result, consistently run into the 2100 parameter limit that MS SQL enforces.

What this usually means is that we end up with a lot of batching logic to pull and filter based on those integer key values. This isn't too bad if we're just filtering on the primary key, but we have lots of cases where a user can filter on that in addition to other reference data fields (ex, we have a contract table that has a product, location, and company foreign keys that users can filter off of). In those cases, we either have to do multiple queries that usually pull more data than we need, or implement complex baching logic with the goal of minimizing the number of queries given the submitted filters while staying under the parameter limit.

I'm assuming that this feature isn't supported/enabled out of the box for security reasons. Our thought here was that since we only update data through entity classes, the type safety of the integer field would prevent users from pushing in strings or other things that might allow for SQL injection or some other security vulnerability, and having access to this feature would massively simplify a lot of our code.

Also if it helps, we're typically using IQueriable<EntityType> resolved out of the LinqMetaData to build our queries.

Thanks!

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 05-Aug-2021 08:45:11   

mprothme wrote:

Build Version: 5.5 (5.5.1) RTM Build Date: 17-Jan-2019 Project: Adapter project targetting .NET 4.5.2 Database: MS SQL 2019

Thanks for the details.

mprothme wrote:

Is there currently any way to either configure or customize the way LLBLGen pro generates queries so that range filters based on integer or numeric inputs are inlined into the query instead of passed as parameters, specifically when the query generates something like.

In short: No, at least with the normal ways of fetching (QuerySpec, Linq2Linq, Low level API).

mprothme wrote:

I'm assuming that this feature isn't supported/enabled out of the box for security reasons. Our thought here was that since we only update data through entity classes, the type safety of the integer field would prevent users from pushing in strings or other things that might allow for SQL injection or some other security vulnerability, and having access to this feature would massively simplify a lot of our code.

You are right, it's for security reasons.

A couple of workarounds in your case:

a) Use a Custom SQL Function as constant and specify whatever you need there. Ref...

b) Use plain sql and then map the results to entity objects. Ref...

David Elizondo | LLBLGen Support Team
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 06-Aug-2021 18:14:05   

Thanks for the reply.

I don't think we'd go the route of writing our own SQL queries, just because that seems to defeat a lot of the purpose for using an ORM in general.

I think you're saying this, but just to clarify, there's no way to take something like the following, and replace that filter with something that avoids the 2100 parameter limit while using an IQueryable resolved out of LinqMetaData, even through the use of a custom function?

int[] filters = GetFilters();// pseudo-code 
var results = entityQueryable.where(entity => filters.contains(entity.integer_key)).ToArray()

If that's the case, can you go into a few examples of how allowing something like this with integer keys could lead to security issues? I just ask because we couldn't think of any off the top of our heads, and I'd like to be able to go back to the team with some solid reasoning why this isn't supported, in part because we've had a few people on our team pushing us to switch to entity framework, and while I think LLBLGen Pro offers a lot of benefits over EF, this doesn't seem to be an issue EF suffers from.

Thanks!

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39760
Joined: 17-Aug-2003
# Posted on: 09-Aug-2021 09:29:32   

There's no way currently to work around the 2100 parameter limit in sqlserver till Microsoft lifts that limit for their batching API. We looked into alternatives but there aren't that many (concat everything to a string, split it on the server, etc.). We don't want to inline values in a query no matter what. Integers sound simple enough but other values might also be simple, or hit the 2100 limit, plus we want to avoid embedding elements in a query at all, if we allow one thing what more is allowed? that is a confusing message and we don't want to give that.

That said, in general these values come from somewhere. If they're from the database, it might be you could replace the 2100 values with a subquery that produces them. If they're from outside the database, you could implement it yourself with a custom FieldCompareRange predicate class, where you concat all values together using string.Join(", ", filters) and embed that directly into the output. We don't recommend this, as it's a slippery slope but if you need to do it, you can. I name it a slippery slope as today it's ints, but tomorrow it might be stings, or guids or other typed of values you need to push to the db in a predicate with 2100+ values.

Having 2100 values in an IN clause is pretty slow btw, and as you hit a limit, you're going to send way more. 10,000? Is there a limit? It might be good to look into where the values come from and if they're retrievable from the DB with a subquery. That might sound slow but it is likely faster.

Frans Bouma | Lead developer LLBLGen Pro
mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 10-Aug-2021 17:57:05   

There's no way currently to work around the 2100 parameter limit in sqlserver till Microsoft lifts that limit for their batching API. We looked into alternatives but there aren't that many (concat everything to a string, split it on the server, etc.).

I’m guessing jamming everything into temp tables and joining to the table probably didn’t work or was just not feasible?

We don't want to inline values in a query no matter what. Integers sound simple enough but other values might also be simple, or hit the 2100 limit, plus we want to avoid embedding elements in a query at all, if we allow one thing what more is allowed? that is a confusing message and we don't want to give that.

I agree that may be confusing, but if it was disabled by default and had appropriate warnings given in the documentation I think it wouldn't be out of line with similar solutions in other libraries. I only mentioned integer values because those are most often PK identifiers and don’t seem to be susceptible SQL injection or similar security risks. That said, giving the user the ability to select different data types that do or do not get inlined in an IN clause would be fine as well.

In the end, I think giving the developer the option to make the decision would be preferable, as that seems to be the trend with other major ORM options.

That said, in general these values come from somewhere. If they're from the database, it might be you could replace the 2100 values with a subquery that produces them.

Unfortunately, that’s often not the case for us, or if it is it increases the complexity of various querying solutions (we've had challenges getting nested queries to work when using IQueryables, which is what we nearly always use). It’s often the case that a user has the ability to view and report on large sets of data, and then select various filters from large lists.

For example, it’s common for the user to do something like pull back a report of 2500 orders, and I want to see invoice data tied to any of those items, but only if they tie to a list of 50 products at 10 locations. All the keys we want to filter on are integer primary keys, but I need to batch the ids and either pull back more data than I need to and filter by product/location in memory or implement some kind of nested batching where I keep track of how many parameters I’ve used (even though it’s not always 1 to 1). The in-memory filtering works, but sometimes it means we end up pulling way more info back from the database than we need.

Having 2100 values in an IN clause is pretty slow btw, and as you hit a limit, you're going to send way more. 10,000?

In my experience, it’s not really been that slow, or at least the tradeoff between speed and code complexity ends up being worth it. The +2100 parameter case happens often enough that just about everything has to work with more parameters than that, but it doesn’t happen often enough that it significantly impacts the day-to-day user experience.

You mention using subqueries a lot, is there a good resource/page in your documentation on best practices with writing subqueries using LINQ IQueriables? We’ve just tried that in the past and haven’t had success (we get various kinds of errors about being unable to cast subqueries into different types), but it’s possible we’ve just been using it wrong. I did a bit of searching in your documentation but its likely that my google-fu isn't that great.

And thanks again for taking the time to respond!

Walaa avatar
Walaa
Support Team
Posts: 14986
Joined: 21-Aug-2005
# Posted on: 11-Aug-2021 03:36:56   

You mention using subqueries a lot, is there a good resource/page in your documentation on best practices with writing subqueries using LINQ IQueriables? We’ve just tried that in the past and haven’t had success (we get various kinds of errors about being unable to cast subqueries into different types), but it’s possible we’ve just been using it wrong. I did a bit of searching in your documentation but its likely that my google-fu isn't that great.

This is an example of a subquery in Linq:

var q = (from o in metaData.Order
            where o.ShipVia == 1 &&
            o.CustomerId == (from o2 in metaData.Order
                            where o2.ShipVia == o.ShipVia &&
                            o2.OrderDate == o.OrderDate
                            orderby o2.OrderDate
                            select o2.CustomerId).First()
            orderby o.OrderDate
            select o);

You can also use QuerySpec, which we recommend

mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 11-Aug-2021 17:08:51   

Awesome thanks! I didn't know if there were issues with using the same Queryable inside the query or if there was some other way to do it!

I still think this doesn't solve all of our problems, because if I have a subquery with a filter list it likely still counts towards the parameter limit correct?

Walaa avatar
Walaa
Support Team
Posts: 14986
Joined: 21-Aug-2005
# Posted on: 11-Aug-2021 23:37:42   

The idea behind the subquery suggestion is to avoid sending too many parameters in the query, provided that the values of these parameters can be queried from the database.

Thus a subquery will return data to filter the main query, and these are not parameters.

mprothme avatar
mprothme
User
Posts: 80
Joined: 05-Oct-2017
# Posted on: 12-Aug-2021 17:09:57   

Ah ok, so that wouldn't really work in our case I guess? Just because there's not a set of simple rules or predicates for these queries because it's usually something like

-- SQL SELECT QUERY HERE
Where  Table.PrimaryKeyColumn in (a second large list of user-submitted ids)  
AND Table.ForeignKeyColumn1 in (a large list of user-submitted ids)

So even if we split the foreign key filter into a subquery returning all the results from Table, and then joined to that in the main query, we'd still be passing the same number of parameters.

arschr
User
Posts: 894
Joined: 14-Dec-2003
# Posted on: 12-Aug-2021 17:56:04   

The ugly way I've handled this is to have a tables in the database (not tempdb) with columns like

Table: LookupJobs JobId int Identity, JobDate DateTimeOffset, JobUser nvarchar(25), JobCompleteDate DateTimeOffset null

Table: LookupJobDetail Id bigint identity, JobId int, LookupKey int,

I then, create a job getting a JobId, Insert the jobId and lookupkeys into the detail table and join to it with the jobid as the parameter.

Then there is a process to keep this table clean once the job is complete or old.

One upside is that lookups can be saved and reused if appropriate.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39760
Joined: 17-Aug-2003
# Posted on: 13-Aug-2021 09:25:10   

Indeed, a table with 'temp' values you can join with is usually an approach that could work. There are little other alternatives, besides the ones already mentioned.

Frans Bouma | Lead developer LLBLGen Pro