Filtering using "IN" and a big list of integers

gantww
User
Posts: 4
Joined: 25-Aug-2006
# Posted on: 09-Dec-2006 19:49:57   

Hello all, I have an ASP.NET form where the user selects a bunch of options using checkboxes that are in a GridView. The code in question receives a List<long> of the selected ids.

Now, another table links to the table with the ids being selected on the first page. I need to filter what comes back based on the list. How do I go about doing this?

Should I just create a PredicateExpression and loop through my list, adding a FieldCompareValuePredicate for each of the items in the list, or is there another way?

Chester
Support Team
Posts: 223
Joined: 15-Jul-2005
# Posted on: 10-Dec-2006 06:38:20   

The documentation has an example of this in the section titled "Best practises - How do I ... ?". The example is titled "How do I write a filter which does WHERE field IN (1, 2, 5)".

For example, here's a way to do it with an ArrayList:


// create a list of pks
ArrayList pks = new ArrayList();
pks.Add(1);
pks.Add(4);
pks.Add(321);
pks.Add(376);
pks.Add(424);
pks.Add(450);

// now create a query to fetch the Production.Products that match these pks;
// comparing a field with a list via == produces a FieldCompareRangePredicate (an IN clause)
IPredicateExpression filter = new PredicateExpression();
filter.Add(ProductFields.ProductId == pks);
ProductCollection products = new ProductCollection();
products.GetMulti(filter);
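
Since the original question receives a List<long>, here's a minimal sketch of the same fetch using a generic list and an explicit FieldCompareRangePredicate (assuming the SelfServicing API and the Product entity from above; GetSelectedIds and the field/collection names are illustrative, not from the post):

// sketch: selectedIds is the List<long> received from the GridView checkboxes
List<long> selectedIds = GetSelectedIds();   // hypothetical helper returning the checked ids

// FieldCompareRangePredicate generates WHERE ProductId IN (@p1, @p2, ...)
IPredicateExpression filter = new PredicateExpression();
filter.Add(new FieldCompareRangePredicate(ProductFields.ProductId, selectedIds));

ProductCollection products = new ProductCollection();
products.GetMulti(filter);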

arschr
User
Posts: 894
Joined: 14-Dec-2003
# Posted on: 10-Dec-2006 15:06:27   

I would like to see an additional method supported by LLBLGen Pro 2.5 for when you have a list that is too large for "IN" to deal with: the support libraries would create a temp table to hold the large list and join against it.

Otis
LLBLGen Pro Team
Posts: 39749
Joined: 17-Aug-2003
# Posted on: 10-Dec-2006 17:17:28   

arschr wrote:

I would like to see an additional method supported by LLBLGen Pro 2.5 for when you have a list that is too large for "IN" to deal with: the support libraries would create a temp table to hold the large list and join against it.

I don't think that will be in the toolkit soon. It means the code would first have to examine the complete predicate expression, then decide whether it needs to do a temp table insert (and that means a big pile of inserts), and then join with that table instead of emitting the large IN clause. That's a complex piece of code solving what is really an application problem: a large IN clause is better written as a FieldCompareSetPredicate, which takes its values from the DB.

The occasions where one would fetch a set of rows based on a large set of values passed in from the client aren't that common. In most cases the code which compiles the set of values to compare with can be rewritten as a filter which obtains the same values from one or more tables instead.
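
For reference, the set-based alternative mentioned above keeps the values in the database entirely: a FieldCompareSetPredicate renders the IN as a subquery. A rough sketch in SelfServicing syntax (the Order/Customer field names and the country filter are made up for illustration):

// sketch only; produces: WHERE Order.CustomerId IN (SELECT CustomerId FROM Customer WHERE Country = @p1)
IPredicateExpression filter = new PredicateExpression();
filter.Add(new FieldCompareSetPredicate(
    OrderFields.CustomerId,             // field to compare
    CustomerFields.CustomerId,          // field defining the set
    SetOperator.In,                     // IN
    CustomerFields.Country == "USA"));  // filter applied to the set's source table

OrderCollection orders = new OrderCollection();
orders.GetMulti(filter);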

Frans Bouma | Lead developer LLBLGen Pro
jaschag
User
Posts: 79
Joined: 19-Apr-2006
# Posted on: 10-Dec-2006 18:11:38   

Out of interest, what are the limits for an IN clause?

arschr
User
Posts: 894
Joined: 14-Dec-2003
# Posted on: 10-Dec-2006 20:59:42   

I don't think that will be in the toolkit soon. ... That's a complex piece of code solving what is really an application problem: a large IN clause is better written as a FieldCompareSetPredicate, which takes its values from the DB.

There are two places I can easily think of where this capability becomes useful.

First, when using prefetch paths you can get into a situation where the filter has to perform a lot of work just to identify the set of entities to be retrieved; you then want to retrieve that set of entities plus a number of one-to-many child collections via prefetch paths.

Second, if you have a set of entities identified by compound values, I can't think of a way an IN clause can deal with it. A join to a temp table holding the keys of the desired entities makes this easy, whereas building the corresponding predicate expression (one OR'ed clause per key pair, as sketched below) may exceed the limits allowed for the select query.
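
As an illustration of why such a predicate grows so quickly, here is a sketch of a compound-key filter built with plain predicates (keyPairs and the Order field names are hypothetical): every key becomes its own AND'ed pair, OR'ed with the rest, so the parameter count is twice the number of keys.

// sketch only; keyPairs and the field names are hypothetical
IPredicateExpression filter = new PredicateExpression();
foreach (KeyValuePair<string, DateTime> key in keyPairs)
{
    // (CustomerId = @pN AND OrderDate = @pM) for this key...
    IPredicateExpression pair = new PredicateExpression();
    pair.Add(OrderFields.CustomerId == key.Key);
    pair.Add(OrderFields.OrderDate == key.Value);

    // ...OR'ed with the pairs for all other keys
    filter.AddWithOr(pair);
}
// 1000 compound keys already means 2000 parameters in the generated query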

Chester
Support Team
Posts: 223
Joined: 15-Jul-2005
# Posted on: 11-Dec-2006 00:26:09   

jaschag wrote:

Out of interest, what are the limits for an in clause?

This is really a question for the database vendors. It depends on the database, the size of the tables involved, indexing schemes, etc. The SQL Server parameter limit is around 2500, Oracle's is around 1000, and who knows about the rest.

I think this question really reveals a limitation in SQL (the language) more than anything. Although the language does have the IN operator, it is often not very efficient for large enumerations. Creating temp tables CAN increase performance, but not always, and I'm not sure every database supports them.

mihies
User
Posts: 800
Joined: 29-Jan-2006
# Posted on: 11-Dec-2006 11:34:04   

Chester wrote:

I think this question really reveals a limitation in SQL (the language) more than anything. Although the language does have the IN operator, it is often not very efficient for large enumerations. Creating temp tables CAN increase performance, but not always,

One of the perf problems with a temp table is that you have to issue a ton of insert statements, which results in much more traffic compared to IN, where you pass only the values. And yes, SQL should support arrays.

jtgooding
User
Posts: 126
Joined: 26-Apr-2004
# Posted on: 11-Dec-2006 14:25:10   

For large set parameters I use an XML doc as a parameter. This gets around the 8000-byte limitation and allows me to do a join against the set. I use SQL similar to the following to match a phone list against a massive do-not-call list.

This allows me to process a large number of requests in a single query, using an 'IN'-style statement. Note this is SQL Server 2000 code; it can be a bit different in 2005.

DECLARE @DocHandle int

-- @NumberList is the XML parameter holding the phone numbers,
-- e.g. <ROOT><Phone Number="5551234567"/>...</ROOT>
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @NumberList

SELECT n.PhoneNumber, n.CallID, r.UpdatedOn
FROM DNPRequestNumbers AS n
    JOIN DNPRequest AS r
        ON n.CallID = r.CallID
    JOIN OPENXML (@DocHandle, '/ROOT/Phone', 1) WITH (Number varchar(10)) AS x
        ON n.PhoneNumber = x.Number
ORDER BY r.UpdatedOn

EXEC sp_xml_removedocument @DocHandle
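
On the calling side, the XML parameter can be assembled from the id list before running the query above; a minimal sketch (GetSelectedNumbers and the parameter handling are illustrative, not from the post):

// sketch only: build the attribute-centric XML that the OPENXML call above expects
List<string> phoneNumbers = GetSelectedNumbers();   // hypothetical source of the large list

StringBuilder xml = new StringBuilder("<ROOT>");
foreach (string number in phoneNumbers)
{
    xml.AppendFormat("<Phone Number=\"{0}\"/>", number);
}
xml.Append("</ROOT>");

// pass the result as the @NumberList parameter of the command that runs the query above
SqlParameter numberList = new SqlParameter("@NumberList", SqlDbType.NText);
numberList.Value = xml.ToString();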
jaschag
User
Posts: 79
Joined: 19-Apr-2006
# Posted on: 11-Dec-2006 18:13:01   

mihies wrote:

Chester wrote:

I think this question really reveals a limitation in SQL (the language) more than anything. Although the language does have the IN operator, it is often not very efficient for large enumerations. Creating temp tables CAN increase performance, but not always,

One of the perf problems with a temp table is that you have to issue a ton of insert statements, which results in much more traffic compared to IN, where you pass only the values. And yes, SQL should support arrays.

You can also use a BULK INSERT statement with a CSV file if you're using SQL Server, but it does require specific permissions.

Walaa
Support Team
Posts: 14983
Joined: 21-Aug-2005
# Posted on: 12-Dec-2006 07:20:41   

There are two places I can easily think of where this capability becomes useful. First when using prefetchpaths you can get to a situation where the filter has to perform a "lot" of work to get to a point where it's identified a set of entities to be retrieved, you then want it to retrieve that set of entities, plus a number of sets of one to many child collections using prefetch paths.

The following is copied from the docs: LLBLGen Pro lets you tweak this query generation by specifying a threshold, DataAccessAdapter.ParameterisedPrefetchPathThreshold, which tells the runtime libraries what to do: produce a subquery with a select, or a subquery with a range of values. The value set for ParameterisedPrefetchPathThreshold specifies at which number of parent entities (in our example, the customer entities) it has to switch to a subquery with a select. ParameterisedPrefetchPathThreshold is set to 50 by default. Tests showed a threshold of 200 is still efficient, but to be sure it works on every database, the value is set to 50.
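
For Adapter users that threshold is simply a property on the DataAccessAdapter instance; a quick sketch (the Customer entity, factory and prefetch path names are illustrative generated-code names, not from the docs quote):

// sketch, assuming the Adapter template group and typical generated names
using (DataAccessAdapter adapter = new DataAccessAdapter())
{
    // switch to a subquery-with-select once more than 100 parent entities are involved
    adapter.ParameterisedPrefetchPathThreshold = 100;

    EntityCollection<CustomerEntity> customers = new EntityCollection<CustomerEntity>(new CustomerEntityFactory());
    IPrefetchPath2 path = new PrefetchPath2((int)EntityType.CustomerEntity);
    path.Add(CustomerEntity.PrefetchPathOrders);

    adapter.FetchEntityCollection(customers, null, path);
}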

The SQL Server parameter limit is around 2500; Oracle's is around 1000.

Instead of having one IN query with a big range of values to filter upon, you may wish to divide the big list into smaller lists (say, 100 parameters each) and have multiple fetches (each using the FieldCompareRangePredicate) into the same entity collection (divide and conquer).
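
A minimal sketch of that divide-and-conquer approach, reusing the SelfServicing Product example from earlier in the thread (the batch size, GetSelectedIds and the names involved are illustrative):

// sketch only; GetSelectedIds() is a hypothetical source of the full id list
const int batchSize = 100;
List<long> selectedIds = GetSelectedIds();
ProductCollection products = new ProductCollection();

for (int i = 0; i < selectedIds.Count; i += batchSize)
{
    // take the next slice of at most batchSize ids
    List<long> batch = selectedIds.GetRange(i, Math.Min(batchSize, selectedIds.Count - i));

    // one IN query per batch, each with at most batchSize parameters
    IPredicateExpression filter = new PredicateExpression();
    filter.Add(new FieldCompareRangePredicate(ProductFields.ProductId, batch));

    // fetch the batch and append the results to the accumulating collection
    ProductCollection batchResults = new ProductCollection();
    batchResults.GetMulti(filter);
    foreach (ProductEntity product in batchResults)
    {
        products.Add(product);
    }
}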