Working with in-memory / cached data

Daniel9
User
Posts: 19
Joined: 05-Apr-2005
# Posted on: 24-May-2005 06:22:19   

Hi There,

I'm just after a bit of feedback on the best way to go about working with a large set of data that has been retrieved from the database.

I'm working on a 3D modelling system. It has a Model table holding the model bounds, etc., and each Model contains a number of Blocks. Each block has x, y and z values giving its position.

I am using the Adapter model and have a prefetch path to load the model and its blocks for a given date into memory. As there are a large number of blocks to be queried, I thought maintaining an in-memory cache of these objects would be the fastest solution. If I want to find the block at a given x, y, z coordinate, should I be:

Looping through the block collection to see which matches the criteria?

OR

Retrieving the data from the database, putting it into a DataTable and then using a DataView on that table to see if any rows match the criteria?

OR

Creating and maintaining a Hashtable for each searchable item?
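For comparison, here is a minimal Java sketch of the first and third options, with Java's `HashMap` playing the role of the .NET `Hashtable`. `Coord`, `Block`, `findByScan` and `buildIndex` are hypothetical names, not LLBLGen Pro types. The sketch assumes integer grid coordinates with at most one block per coordinate. Hashing floating-point coordinates would rely on exact equality, which needs more care.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Block entity and its x, y, z position.
// Records get equals()/hashCode() generated from their fields, so Coord works as a map key.
record Coord(int x, int y, int z) {}
record Block(int id, Coord position) {}

class BlockLookup {
    // Option 1: loop through the collection, O(n) work per lookup.
    static Block findByScan(List<Block> blocks, Coord target) {
        for (Block b : blocks) {
            if (b.position().equals(target)) {
                return b;
            }
        }
        return null; // no block at that coordinate
    }

    // Option 3: build the index once (O(n)); each lookup is then O(1) on average.
    static Map<Coord, Block> buildIndex(List<Block> blocks) {
        Map<Coord, Block> index = new HashMap<>();
        for (Block b : blocks) {
            index.put(b.position(), b); // assumes at most one block per coordinate
        }
        return index;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            // Place 1000 blocks on a 10 x 10 x 10 grid.
            blocks.add(new Block(i, new Coord(i % 10, (i / 10) % 10, i / 100)));
        }
        Coord target = new Coord(3, 4, 5);
        Map<Coord, Block> index = buildIndex(blocks);
        System.out.println(findByScan(blocks, target).id()); // 543
        System.out.println(index.get(target).id());          // 543
    }
}
```

The trade-off is that the index has to be updated whenever a block moves or is added. In exchange, lookups no longer get slower as the number of blocks grows.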

How are other people working with large datasets?

Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 24-May-2005 09:44:37   

I previously wrote an image processing application that supported some complex computer vision. Speed was the number 1 factor for me, as the application was processing bank statements and the key metric was statements per second.

I'm not sure if you've ever worked with computer vision, but it involves a very large amount of data which needs to be cross-referenced at high speed. Hashtables are the only way to go if you need raw lookup speed, since a lookup by key takes roughly constant time no matter how many items are stored.

If all your data will fit into memory, then use a Hashtable. If not, maybe you don't need all your data in memory at the same time; in that case you could use a Hashtable-based cache which automatically manages the expiry of old data.
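One way to get a Hashtable-based cache that drops old data automatically is Java's `LinkedHashMap` in access order with `removeEldestEntry` overridden. Choosing size-bounded least-recently-used eviction is my own assumption. An expiry policy based on age would instead need a timestamp per entry. `LruCache` and `maxEntries` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded LRU cache built on LinkedHashMap.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put() after an insert; returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "block A");
        cache.put("b", "block B");
        cache.get("a");            // touch "a" so "b" becomes the eldest entry
        cache.put("c", "block C"); // cache is over capacity, so "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Because `get` counts as an access, data that is used often stays in memory, and data nobody has touched recently is what gets dropped first.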

Another idea, depending on how your application processes the data, is to preload the data into the cache on another thread just before it is required. Loading data from the DB is IO intensive while processing is CPU intensive, so the two can overlap. I use this "Just In Time" approach on my current project, ModernArk, which is a large digital asset management system: I know in advance what data I will need next, and while I am processing the current selection, I am fetching the next.
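The "fetch the next batch while processing the current one" idea can be sketched with a single background IO thread. `Prefetcher`, `loadBatch` and `process` are hypothetical stand-ins for the database fetch and the CPU-bound work, and the batch contents are placeholder numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class Prefetcher {
    // Hypothetical stand-in for an IO-bound database fetch of one batch.
    static List<Integer> loadBatch(int batchNo) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            rows.add(batchNo * 10 + i);
        }
        return rows;
    }

    // Hypothetical stand-in for the CPU-bound processing step.
    static int process(List<Integer> batch) {
        int sum = 0;
        for (int v : batch) {
            sum += v;
        }
        return sum;
    }

    // Processes batches 0..count-1, fetching batch i+1 on a background
    // thread while batch i is being processed on the calling thread.
    static List<Integer> run(int count) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        List<Integer> results = new ArrayList<>();
        try {
            CompletableFuture<List<Integer>> next =
                    CompletableFuture.supplyAsync(() -> loadBatch(0), io);
            for (int i = 0; i < count; i++) {
                List<Integer> current = next.join(); // blocks only if the fetch is not finished
                if (i + 1 < count) {
                    final int nextNo = i + 1;
                    next = CompletableFuture.supplyAsync(() -> loadBatch(nextNo), io);
                }
                results.add(process(current)); // runs while the next fetch is in flight
            }
        } finally {
            io.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // [3, 33, 63]
    }
}
```

If processing a batch takes about as long as fetching one, most of the database wait is hidden behind the processing.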

Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 24-May-2005 16:18:13   

I would also go for an in-memory Hashtable if possible; retrieving data from a database is often slower. I'm not sure if the software is some sort of scene graph, but if it is, I think real-time DB access might be too slow. In that case, as Marcus said, you have to think ahead and fetch the data before you need to process it.

Frans Bouma | Lead developer LLBLGen Pro
Daniel9
User
Posts: 19
Joined: 05-Apr-2005
# Posted on: 25-May-2005 01:29:21   

Thanks for your thoughts guys.

I'm going to give the Hashtable(s) a go. I plan on loading and maintaining them using a separate Windows service, so we'll get the threading for free.

As far as saving the changed entities goes, from a performance point of view, would it be best to listen for changes on the objects using the EntityContentsChanged event? Are there any adverse memory effects to adding a new handler for each entity?

Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 25-May-2005 09:19:45   

Events can slow down the app if a single action sets off a lot of events. When you introduce a lot of events, chances are that after a while the app has a lot of event handlers active at any given time. As long as you take that into account, events are not a problem.
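One way to keep the cost per event low is to share a single handler object across all entities. That handler only records which entities changed, and you save only those. This sketch uses a hypothetical `TrackedBlock` class with its own listener list. It is not the LLBLGen Pro `EntityContentsChanged` API, only an illustration of the pattern in Java.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical entity that notifies its listeners when a field changes.
// This is NOT the LLBLGen Pro API; it only illustrates the pattern.
class TrackedBlock {
    final int id;
    private int x;
    private final List<Consumer<TrackedBlock>> listeners = new ArrayList<>();

    TrackedBlock(int id, int x) { this.id = id; this.x = x; }

    void addChangeListener(Consumer<TrackedBlock> l) { listeners.add(l); }

    void setX(int newX) {
        if (newX == x) return; // value unchanged, so no event is raised
        x = newX;
        for (Consumer<TrackedBlock> l : listeners) l.accept(this);
    }
}

// Hypothetical tracker shared by every entity: handling an event is just
// one hash-set insert, and an entity changed twice is stored once.
class DirtyTracker implements Consumer<TrackedBlock> {
    private final Set<TrackedBlock> dirty = new LinkedHashSet<>();

    @Override
    public void accept(TrackedBlock b) { dirty.add(b); }

    // Returns the changed entities in first-change order (e.g. for one batched save) and resets.
    List<TrackedBlock> drain() {
        List<TrackedBlock> out = new ArrayList<>(dirty);
        dirty.clear();
        return out;
    }

    public static void main(String[] args) {
        DirtyTracker tracker = new DirtyTracker();
        List<TrackedBlock> blocks = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            TrackedBlock b = new TrackedBlock(i, 0);
            b.addChangeListener(tracker); // the same handler object for every entity
            blocks.add(b);
        }
        blocks.get(1).setX(7);
        blocks.get(3).setX(2);
        blocks.get(1).setX(8); // already dirty; still listed once
        for (TrackedBlock b : tracker.drain()) System.out.println(b.id); // 1, then 3
    }
}
```

Each entity still holds a reference to the shared tracker, but there is only one handler object overall instead of one per entity. A burst of changes to the same entity also collapses into a single save.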
