ASP.Net Caching Questions

psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 29-May-2007 15:41:16   

All,

I posted this on Google Groups but didn't get much help/info. I thought I would post it here and see if anyone could provide any insight.


I am designing a system that will involve an IIS/ASP.Net application server. The main purpose of the application server will be to load large amounts of static data into memory and do fast lookups on it. The total cache size at the time of first installation will be roughly 350MB, but may grow to as large as 10GB over the next five years.

My questions:

  1. Can the ASP.Net cache utilize this much memory, assuming the processor and OS are 64-bit?

  2. If the cache size were 10GB, how much memory would the machine need to prevent cache/application recycling (assume very little memory usage in the app outside of the cache).

  3. Since the loading of the static data would be expensive and time consuming, would any other steps need to be taken to avoid application restarts?

  4. Is there any limit to how large a single object in the cache can be (it's likely one dataset or hashtable in our system could approach 200MB).

  5. Finally, is there another caching option that is more efficient/ usable/scalable/etc. than the one provided with ASP.Net?

Thanks for any insight.

Phil

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 30-May-2007 11:12:02   

psandler wrote:

All,

I posted this on Google Groups but didn't get much help/info. I thought I would post it here and see if anyone could provide any insight.


I am designing a system that will involve an IIS/ASP.Net application server. The main purpose of the application server will be to load large amounts of static data into memory and do fast lookups on it. The total cache size at the time of first installation will be roughly 350MB, but may grow to as large as 10GB over the next five years.

That's a lot :)

I'd split it in multiple parts and use multiple servers, with databases which have everything in memory. (so if the DB is 2GB, give it 3GB of memory so it can load it all in memory). Then, I'd use a central governing gate which redirects the request to the proper server with the data. Of course I have no clue how the data is structured or how lookups take place, so it might be this isn't possible at all.

Mind you: the databases are used to store the cached data, so for example if you have reports to show, the data gathered for the report is stored in the db so easy lookup/fetches can take place.

You then can scale out pretty easily: if you need 20GB of data, no problem.

My questions:

  1. Can the ASP.Net cache utilize this much memory, assuming the processor and OS are 64-bit?

In theory it should.

  2. If the cache size were 10GB, how much memory would the machine need to prevent cache/application recycling (assume very little memory usage in the app outside of the cache).

That's indeed a problem: the cache is likely to be part of the app. Another question is how you look up the data. Is it key-based? If so, I'm not sure it will be very fast if you have a lot of elements in the cache (say 50 million).

  3. Since the loading of the static data would be expensive and time consuming, would any other steps need to be taken to avoid application restarts?

Store the data in another process. Databases, for example, already have caches. You can tune Oracle, for example (and also SQL Server, I presume), to have a big cache for queries, so if you store the processed data in a DB, querying it again is fast and will likely be served from the DB's cache. You then have to partition the data in some way so you can add more servers if you need to.

  4. Is there any limit to how large a single object in the cache can be (it's likely one dataset or hashtable in our system could approach 200MB).

I'm not aware of cache element size limits.

  5. Finally, is there another caching option that is more efficient/ usable/scalable/etc. than the one provided with ASP.Net?

I'd use the ASP.NET cache for page fragments and data directly needed to render pages. Any other data, like large resultsets for reports which are requested often, I'd store in a DB with a lot of memory, so the DB cache is big and you're fetching the data from memory anyway. Add proper partitioning to that and you're set. The reason for this is that you now predict the total data size will be 10GB, but what happens if it becomes 20GB? Or 100GB? Adding a bunch of blades and RAM is then the only thing you need to do. Also, reading a report resultset that was calculated from a lot of tables out of a single table which isn't in memory isn't that slow either. So you could also think of using a server with 3-4GB of RAM, on the assumption that the most-requested resultsets stay cached in memory by the RDBMS, while the rarely requested ones sit on disk and are read into memory when needed.

Frans Bouma | Lead developer LLBLGen Pro
PilotBob
User
Posts: 105
Joined: 29-Jul-2005
# Posted on: 30-May-2007 23:01:50   

I guess my question would be: why do you want to cache so much data? Are you sure caching would be faster than the DB lookups? Remember, SQL Server automatically caches data; the more memory you give it, the more it will cache.

Also, you can use output caching in ASP.NET to cache the resulting pages. That way, requests don't even go through the page life cycle.

I assume that with 20GB of cached data you need some way to do lookups in that cache. How are you storing your cache: is it a dataset, a custom collection? If you are interested, I can point you to a webcast in which cached dataset performance was compared to going to the DB, and going to the DB was much faster. Of course, this was .NET 1.1, and datasets are supposed to be a lot faster in 2.0.

Some other options: have you looked at any in-process databases? A lot of the performance cost of DB retrieval is the cross-process marshaling. I think SQLite and VistaDB are examples of in-process DBs.

If you are looking for rocket-fast, I've heard good things about Inter-cache's database. What they do is keep 100% of the index in memory, so stuff is VERY fast.

You might also want to look at Prevalence. I think there is a C# implementation out there somewhere. Basically, it is a big object cache, and files are used to store the state every now and then, but if your data is read-only you won't even need that.

BOb

psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 30-May-2007 23:13:51   

Hey Frans,

Thanks for your reply. One thing I may not have made clear is that the data lookups will have to be as fast as possible. So if each lookup takes 1ms instead of 2ms, that would be considered a huge improvement in the system. Also, when I said "application server", I meant it more or less literally--this will be a host process that receives messages to do specific jobs, and does not have a user interface.

I'm not sure if that would change your responses or not.

Further discussion below . . .

Otis wrote:

psandler wrote:

All,

I posted this on Google Groups but didn't get much help/info. I thought I would post it here and see if anyone could provide any insight.


I am designing a system that will involve an IIS/ASP.Net application server. The main purpose of the application server will be to load large amounts of static data into memory and do fast lookups on it. The total cache size at the time of first installation will be roughly 350MB, but may grow to as large as 10GB over the next five years.

That's a lot :)

Yeah. :/

Otis wrote:

I'd split it in multiple parts and use multiple servers, with databases which have everything in memory. (so if the DB is 2GB, give it 3GB of memory so it can load it all in memory). Then, I'd use a central governing gate which redirects the request to the proper server with the data. Of course I have no clue how the data is structured or how lookups take place, so it might be this isn't possible at all.

This falls in line with part of what I was thinking. So assume that the data can be categorized by client (not exactly how it works but it's a good analogy)--each application server could manage a certain number of clients. The "central governing gate" (I like that term) would then have knowledge of which server handled which clients.
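
A minimal sketch of that routing idea, written in Python purely as a language-neutral illustration (the system discussed here is ASP.Net). The client IDs, server names, and the `route` function are hypothetical and not part of any design described in this thread:

```python
# Hypothetical routing table kept by the "central governing gate":
# it records which application server holds the cached data for each client.
CLIENT_TO_SERVER = {
    "client-a": "appserver-1",
    "client-b": "appserver-1",
    "client-c": "appserver-2",
}


def route(client_id: str) -> str:
    """Return the name of the server that owns this client's data."""
    try:
        return CLIENT_TO_SERVER[client_id]
    except KeyError:
        # An unknown client is an error rather than a silent default.
        raise LookupError(f"no server assigned for client {client_id!r}") from None


print(route("client-c"))
```

Scaling out under this scheme would mean adding a server and reassigning some clients in the table; the gate itself holds only the mapping, not the data.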

Otis wrote:

Mind you: the databases are used to store the cached data, so for example if you have reports to show, the data gathered for the report is stored in the db so easy lookup/fetches can take place.

You then can scale out pretty easily: if you need 20GB of data, no problem.

My questions:

  1. Can the ASP.Net cache utilize this much memory, assuming the processor and OS are 64-bit?

In theory it should.

  2. If the cache size were 10GB, how much memory would the machine need to prevent cache/application recycling (assume very little memory usage in the app outside of the cache).

That's indeed a problem: the cache is likely to be part of the app. Another question is how you look up the data. Is it key-based? If so, I'm not sure it will be very fast if you have a lot of elements in the cache (say 50 million).

Yes, it would be key-based and involve a lot of small values stored in a hashtable, but the data could be nested so that no single hashtable had more than ~100K items in it.

Think in terms of a HUGE set of lookup tables stored in memory, and each request would have to literally do tens of thousands of lookups as fast as possible.
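
To make the nesting concrete, here is a small sketch in Python (used only for illustration; the real system would use .NET hashtables). The table names, keys, values, and the `lookup` helper are all made up for the example:

```python
# Hypothetical nested lookup tables: the outer key selects a table,
# the inner key selects a value. Splitting the data this way keeps
# each inner table small even when the total data set is large.
tables = {
    "rates": {1001: 0.25, 1002: 0.40},
    "codes": {1001: "A", 1002: "B"},
}


def lookup(table_name, key, default=None):
    """Two hash lookups: first find the inner table, then the value."""
    inner = tables.get(table_name)
    if inner is None:
        return default
    return inner.get(key, default)


print(lookup("rates", 1002))
print(lookup("codes", 9999, default="?"))
```

Each request would call something like `lookup` tens of thousands of times, so the cost per call is what matters, not the cost of building the tables.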

Otis wrote:

  3. Since the loading of the static data would be expensive and time consuming, would any other steps need to be taken to avoid application restarts?

Store the data in another process. Databases, for example, already have caches. You can tune Oracle, for example (and also SQL Server, I presume), to have a big cache for queries, so if you store the processed data in a DB, querying it again is fast and will likely be served from the DB's cache. You then have to partition the data in some way so you can add more servers if you need to.

Would this perform as well (or nearly as well) as using in-process lookups? I started thinking down the ASP.Net cache route because I didn't want to move the data across the network. It didn't occur to me to use a database on the same server to do the lookups.

I'll have to look into data caching for SQL Server. So the idea would be to query all the data on startup so the data would all be cached?

Otis wrote:

  4. Is there any limit to how large a single object in the cache can be (it's likely one dataset or hashtable in our system could approach 200MB).

I'm not aware of cache element size limits.

  5. Finally, is there another caching option that is more efficient/ usable/scalable/etc. than the one provided with ASP.Net?

I'd use the ASP.NET cache for page fragments and data directly needed to render pages. Any other data, like large resultsets for reports which are requested often, I'd store in a DB with a lot of memory, so the DB cache is big and you're fetching the data from memory anyway. Add proper partitioning to that and you're set. The reason for this is that you now predict the total data size will be 10GB, but what happens if it becomes 20GB? Or 100GB? Adding a bunch of blades and RAM is then the only thing you need to do. Also, reading a report resultset that was calculated from a lot of tables out of a single table which isn't in memory isn't that slow either. So you could also think of using a server with 3-4GB of RAM, on the assumption that the most-requested resultsets stay cached in memory by the RDBMS, while the rarely requested ones sit on disk and are read into memory when needed.

I'll have to look into the option of caching in the database, and do some testing around performance of application cache vs. database cache. My tests in the past always included a network component, so perhaps having the database on the same machine would provide different results.

Thanks,

Phil

psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 31-May-2007 00:06:18   

I did some additional cache vs. database lookup testing. It was admittedly unscientific, but here are the methodology and results:

I created XXX records in the ASP.Net cache, each with a key of a number between 1 and XXX. I also created a table in the database (on the same machine) that had a single record in it.

I created a new adapter with leaveConnectionOpen = true. I fetched the record from the database once to "warm up" the connection.

I then fetched the key/value from the cache 10K times and fetched the record from the database (using GetScalar) 10K times. I used values of 100K and 1M for XXX; the results were pretty much the same for both.

Fetching 10K times from the database took ~2.5 seconds.

Fetching 10K times from the cache took ~0.01 seconds.

I'm not sure if this is definitive or not--I'm sure there is some overhead I'm adding with the database call. Maybe there's something I could use that's faster than GetScalar?
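
Per lookup, those totals work out to roughly 2.5 s / 10,000 = 0.25 ms for a database fetch and 0.01 s / 10,000 = 1 µs for a cache fetch, a factor of about 250.

For anyone who wants to repeat the shape of this test, the sketch below mirrors the methodology in Python. It is only an approximation under loud assumptions: a plain `dict` stands in for the ASP.Net cache, an in-memory SQLite database stands in for SQL Server (SQLite runs in-process, so it avoids the cross-process call that the original test paid for), and every name in it is hypothetical. Its timings are not comparable to the numbers above.

```python
import sqlite3
import time

N_RECORDS = 100_000  # stands in for "XXX" in the description above
N_FETCHES = 10_000   # number of repeated lookups per test

# Stand-in for the cache: keys 1..N_RECORDS in an in-memory hash table.
cache = {i: f"value-{i}" for i in range(1, N_RECORDS + 1)}

# Stand-in for the database: one table containing a single record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO lookup VALUES (1, 'value-1')")


def fetch_from_cache():
    return cache[1]


def fetch_from_db():
    # Scalar fetch: one column of one row.
    return conn.execute("SELECT value FROM lookup WHERE id = 1").fetchone()[0]


def time_fetches(fetch, n=N_FETCHES):
    """Call fetch() n times; return (elapsed seconds, last result)."""
    start = time.perf_counter()
    for _ in range(n):
        result = fetch()
    return time.perf_counter() - start, result


fetch_from_db()  # warm-up fetch, as in the original test
cache_seconds, cache_value = time_fetches(fetch_from_cache)
db_seconds, db_value = time_fetches(fetch_from_db)

print(f"cache: {cache_seconds:.4f} s total, {cache_seconds / N_FETCHES * 1e6:.2f} us per fetch")
print(f"db:    {db_seconds:.4f} s total, {db_seconds / N_FETCHES * 1e6:.2f} us per fetch")
```

Reporting the per-fetch figure alongside the total makes it easier to compare against a latency target such as the 1 ms vs. 2 ms mentioned earlier in the thread.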

PilotBob, thanks for your response. I'll take a look at Prevalence.

Phil