Get data in batches

peerke
User
Posts: 23
Joined: 19-Aug-2008
# Posted on: 27-Aug-2008 13:23:24   

Hi all.

I'm currently working on an application that reads data from a database, performs some business logic on that data and then writes it to a file for use in another application. The amount of data can be quite large: over a million records. Does LLBLGen Pro support fetching data in batches? What I would like is to read a batch of, say, 10,000 entities/records, perform the magic, and then fetch the next 10,000, until all data has been fetched. The reason is that I don't want to hold a million entities in memory. Is that possible? If so, how would I accomplish it? I can imagine this might cause issues with records that are inserted by another thread while our program is running.

Any thoughts on this?

Thanks!

peerke
User
Posts: 23
Joined: 19-Aug-2008
# Posted on: 27-Aug-2008 14:22:17   

I found out that this can be accomplished by using the paging functionality of entity collections, typed views and typed lists (RTFM!).

This is how I do it:


ContactsTypedView contacts = new ContactsTypedView();
DateTime lastExportDate = GetLastExportDate(ExportType.Contacts);

// Compare against the DateTime value directly instead of a formatted string,
// so the database doesn't have to do an implicit string-to-date conversion.
IPredicateExpression lastModifiedFilter =
    new PredicateExpression(ContactsFields.Lastmodified > lastExportDate);

int amount = contacts.GetDbCount(false, lastModifiedFilter);
int pageSize = 1000;

// Round up: a partial last page still counts as a page.
int pages = amount / pageSize;
if ((amount % pageSize) > 0) pages += 1;

for (int i = 1; i <= pages; i++)
{
    contacts.Clear();
    // Fetch page i (pageSize rows) matching the filter.
    contacts.Fill(0, null, true, lastModifiedFilter, null, null, i, pageSize);
    // do magic here
}

But my question still stands: what about records that are inserted into the set that I'm fetching in batches?

Walaa
Support Team
Posts: 14993
Joined: 21-Aug-2005
# Posted on: 27-Aug-2008 14:40:07   

If you are fetching the data in a sorted way (e.g. ORDER BY Id), then you can re-calculate the total number of pages to fetch inside the for loop. That way, by the time the routine finishes, it will also have picked up records that were inserted while it was running.
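A minimal in-memory sketch of re-checking the page count each iteration (a plain List<int> stands in for the database table; in the real code the count would come from GetDbCount and each batch from Fill, as in the snippet above — the names PageCount and ProcessInPages are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

class PagingDemo
{
    // Round-up page count: a partial last page still counts as one page.
    public static int PageCount(int total, int pageSize)
    {
        return (total + pageSize - 1) / pageSize;
    }

    // `table` stands in for the database table; rows may be appended
    // to it between iterations by another thread/process.
    public static int ProcessInPages(List<int> table, int pageSize, Action<List<int>> onBatch)
    {
        int processed = 0;
        // PageCount(table.Count, ...) sits in the loop condition, so it is
        // re-evaluated every iteration: rows inserted meanwhile extend the loop.
        for (int page = 1; page <= PageCount(table.Count, pageSize); page++)
        {
            int offset = (page - 1) * pageSize;
            int count = Math.Min(pageSize, table.Count - offset);
            onBatch(table.GetRange(offset, count));
            processed += count;
        }
        return processed;
    }
}
```

Note that this only works reliably with a stable sort order; without an ORDER BY, rows can shift between pages and be skipped or processed twice.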

Another trick is to fetch the TOP xxx records WHERE ID > n ORDER BY ID, where n is 0 on the first run and is then updated inside the loop to the max ID of the latest fetched batch.
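A small in-memory sketch of that trick (a List<int> of IDs stands in for the table; with LLBLGen the equivalent would be a predicate on the ID field, a sort expression, and a row limit — the names FetchBatch and ProcessAll are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class KeysetPagingDemo
{
    // Stand-in for: SELECT TOP batchSize ... WHERE Id > lastId ORDER BY Id
    public static List<int> FetchBatch(List<int> table, int lastId, int batchSize)
    {
        return table.Where(id => id > lastId)
                    .OrderBy(id => id)
                    .Take(batchSize)
                    .ToList();
    }

    public static int ProcessAll(List<int> table, int batchSize, Action<List<int>> onBatch)
    {
        int processed = 0;
        int lastId = 0;                          // n starts at 0
        while (true)
        {
            List<int> batch = FetchBatch(table, lastId, batchSize);
            if (batch.Count == 0) break;         // nothing left to fetch
            onBatch(batch);
            processed += batch.Count;
            lastId = batch[batch.Count - 1];     // max ID of the latest batch
        }
        return processed;
    }
}
```

Because each query seeks past the last seen ID instead of using a page offset, rows inserted with higher IDs while the loop runs are simply picked up by a later batch, and no offset arithmetic can drift.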