Product Info: Cache Management (Web Farm)

Posts: 4
Joined: 19-Aug-2006
# Posted on: 19-Aug-2006 14:16:45   

I was looking at ways of managing session state in ASP.NET 2.0 in a web farm environment. I wanted something much cheaper than Microsoft SQL Server and a little more ad hoc than setting up MySQL.

I found the following product, which looks interesting. It sounds easy to add and remove servers at will, and to use only part of a machine if you have capacity to spare.

I fancy having LLBL templates that make use of this. :P

It's called **memcached** (very original :) ) and you can get it at http://www.danga.com/memcached/

Description from their site:

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

I've not tried it yet, but thought I'd share it anyway.

Best regards, Craig

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 19-Aug-2006 14:40:09   

How is this PHP-oriented Unix caching system related to .NET?

Spam?

Frans Bouma | Lead developer LLBLGen Pro
Posts: 4
Joined: 19-Aug-2006
# Posted on: 20-Aug-2006 01:12:35   

Otis wrote:

How is this PHP-oriented Unix caching system related to .NET?

Spam?

I guess you never looked at the client APIs page then (http://www.danga.com/memcached/apis.bml).

So the .NET client is at http://sourceforge.net/projects/memcacheddotnet/

Thanks for assuming it's spam; not the warmest of receptions for my first post. I mean, I am in the forum that you described as a "forum for offtopic talk", right? Besides, unless I've misread, it's actually very on topic.

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 20-Aug-2006 12:02:23   

craiginscotland wrote:

Otis wrote:

How is this PHP-oriented Unix caching system related to .NET?

Spam?

I guess you never looked at the client APIs page then (http://www.danga.com/memcached/apis.bml).

So the .NET client is at http://sourceforge.net/projects/memcacheddotnet/

I'm sorry, I looked at the page you linked to and it was Linux/PHP oriented. I couldn't find a word about .NET on it.

Thanks for assuming it's spam; not the warmest of receptions for my first post. I mean, I am in the forum that you described as a "forum for offtopic talk", right? Besides, unless I've misread, it's actually very on topic.

We get our share of spam posts on this forum, hence my question (it was just a question). A spam post sometimes looks like a regular post but is actually there to draw links to a site. When I looked at the page you linked to, I didn't have the feeling it was .NET related, so I asked whether this was spam or not. A misunderstanding, I guess. :) My apologies if I offended you; you had everything against you (first post, link to an unclear page, me misunderstanding the context, etc.), so I made the error.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 4
Joined: 19-Aug-2006
# Posted on: 20-Aug-2006 18:44:34   

I've since done a little more surfing. Someone's done a Win32 port, so you can have a Windows version running as a console app or server and have a C# .NET client.

http://jehiah.com/projects/memcached-win32/

The comms protocol is documented, so you could write your own client too, I guess (and you can use telnet as a client during debugging).
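As a rough illustration of what that documented text protocol looks like, here is a minimal Python sketch (Python only to keep it short; the thread's context is C#). It builds the same `set` and `get` lines you would type into a telnet session and parses a single-key `get` reply. The function names are made up for this example, and it only formats and parses bytes; it does not open a connection.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a text-protocol 'set': a header line, then the raw data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Frame a text-protocol 'get' for a single key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_reply(reply: bytes):
    """Return the value from a single-key 'get' reply, or None on a miss."""
    if reply == b"END\r\n":
        return None  # the server found nothing under that key
    header, _, rest = reply.partition(b"\r\n")
    # The header line looks like: VALUE <key> <flags> <bytes>
    length = int(header.split()[3])
    return rest[:length]
```

Sent over a socket to a memcached server (port 11211 by default), the bytes from `build_set("greeting", b"hello")` should be answered with `STORED`, and a later `get greeting` with a `VALUE` header, the data and `END`.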

I'm just trying to see how I can use this with LLBLGen Pro so it would know it need not hit the database for queries. I can do this with plain old embedded SQL strings, but when you add the complexity of LLBL generating the queries for you, I guess you would have to hash the values you were passing into LLBL instead of trying to guess the output of the Predicate classes?

I'd appreciate any comments or ideas. Basically, is a library like this something you would have to add support for within LLBL, or are there enough "hooks" to use caching of this type at a higher level?

I can't help feeling that support at a lower level (within LLBL) would allow the predicate classes to make use of multiple such cache objects for each table when building its queries. I believe it's already capable of joining tables together after their return from the database (and that you advise trying to get the db to do such work), but what if LLBL could combine the two, making use of the cache for some and pulling what's needed from the db?

Here is an article explaining how it works and some good replies explaining some issues depending on the nature of your use.

http://www.linuxjournal.com/article/7451

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 21-Aug-2006 11:26:30   

craiginscotland wrote:

I've since done a little more surfing. Someone's done a Win32 port, so you can have a Windows version running as a console app or server and have a C# .NET client.

http://jehiah.com/projects/memcached-win32/

The comms protocol is documented, so you could write your own client too, I guess (and you can use telnet as a client during debugging).

I'm just trying to see how I can use this with LLBLGen Pro so it would know it need not hit the database for queries. I can do this with plain old embedded SQL strings, but when you add the complexity of LLBL generating the queries for you, I guess you would have to hash the values you were passing into LLBL instead of trying to guess the output of the Predicate classes?

That's indeed a BIG problem: how to determine if the set requested is exactly the same as one already cached? You can only do that by generating the query AND comparing all values.

I'd appreciate any comments or ideas. Basically, is a library like this something you would have to add support for within LLBL, or are there enough "hooks" to use caching of this type at a higher level?

It depends on why you want to cache. If you want to cache data to avoid doing roundtrips to the database: forget it. The reason is simple. Say you have a given number of entities in your cache. Then you want to load all customers matching a filter. You can't know if ALL entities in the database which match the filter are in the cache. So you have to fetch, for example, the PKs of these entities with the filter and compare them with the entities in the cache, then fetch the ones that aren't in the cache, AND fetch the ones which are outdated (which also requires some timestamp field). This effectively creates more overhead than you have when you simply fetch the entities.
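To make the overhead concrete, here is a toy Python sketch of the steps described above. Everything in it is invented for illustration: the "database" and the cache are plain dictionaries, the version number stands in for the timestamp field, and round trips are simply counted.

```python
# Toy "database": primary key -> (country, version stamp). All data is made up.
DB = {1: ("UK", 1), 2: ("UK", 3), 3: ("NL", 1), 4: ("UK", 1)}
# Entity cache holding only some rows; row 2 is outdated (version 2 vs 3).
CACHE = {1: ("UK", 1), 2: ("UK", 2)}


def fetch_plain(country):
    """One round trip: just ask the database for the matching rows."""
    rows = {pk: row for pk, row in DB.items() if row[0] == country}
    return rows, 1


def fetch_via_cache(country):
    """Correct cache use: ask the db which PKs/versions match, then fill gaps."""
    round_trips = 1  # query 1: PKs + version stamps matching the filter
    wanted = {pk: row[1] for pk, row in DB.items() if row[0] == country}
    stale_or_missing = [pk for pk, version in wanted.items()
                        if pk not in CACHE or CACHE[pk][1] != version]
    result = {pk: CACHE[pk] for pk in wanted if pk not in stale_or_missing}
    if stale_or_missing:
        round_trips += 1  # query 2: fetch the rows the cache could not supply
        for pk in stale_or_missing:
            result[pk] = DB[pk]
    return result, round_trips
```

Both functions return the same three UK rows, but the cache-aware version needs two round trips where the plain fetch needs one, and it still had to run the filter in the database.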

If you want to cache because you want unique entity instances, OK, that's a reason. LLBLGen Pro offers the context class for this, so you have the tool at hand. :)

What's way more efficient is caching of processing output. Say you have a website which is visited by thousands of users per hour. All these visitors request a set of entities when they first view the start page. If you cache the start page for even, say, 10 seconds, you can already save a lot of roundtrips to the db. The visitor doesn't know whether that's a cached page, and even so: perhaps the response took a slow route from the webserver and stalled for 10 seconds, so in every situation the data arriving in the web browser is stale and outdated anyway. This type of caching saves not only database roundtrips but also the processing time of the data.
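A minimal Python sketch of that idea, assuming nothing about ASP.NET's own output caching: the `OutputCache` class and its `get_or_render` method are hypothetical names, and the render callback stands in for whatever builds the page from the entities.

```python
import time


class OutputCache:
    """Keeps rendered output for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable so the expiry logic can be tested
        self._entries = {}       # key -> (expires_at, output)

    def get_or_render(self, key, render):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no db fetch, no processing
        output = render()        # expired or never rendered: do the real work
        self._entries[key] = (now + self.ttl, output)
        return output
```

With a 10 second lifetime, `cache.get_or_render("startpage", build_start_page)` would run the hypothetical `build_start_page` at most once per 10 seconds, no matter how many visitors arrive in between.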

There's also the caching of non-volatile data: lists of countries, lists of codes which hardly change, lists of entities which can be loaded once per day, etc. You don't need a sophisticated cache for that, though.

I can't help feeling that support at a lower level (within LLBL) would allow the predicate classes to make use of multiple such cache objects for each table when building its queries.

No, that would be a false feeling of efficiency, as I described above. :)

Frans Bouma | Lead developer LLBLGen Pro
sami
User
Posts: 93
Joined: 28-Oct-2005
# Posted on: 21-Aug-2006 13:39:20   

It depends on why you want to cache. If you want to cache data to avoid doing roundtrips to the database: forget it. The reason is simple. Say you have a given number of entities in your cache. Then you want to load all customers matching a filter. You can't know if ALL entities in the database which match the filter are in the cache. So you have to fetch, for example, the PKs of these entities with the filter and compare them with the entities in the cache, then fetch the ones that aren't in the cache, AND fetch the ones which are outdated (which also requires some timestamp field). This effectively creates more overhead than you have when you simply fetch the entities.

What about scenarios where you can live with having outdated data, but would benefit greatly from using a distributed caching solution such as memcached? I am sure such a scenario is more than viable. So far I have survived by using httpcache in manager classes (single-server environment), which is pretty OK for me, but lower-level support would indeed be nice too.

You are suggesting caching the processing output, but you might want to do some processing too, where you actually need the cached data. Of course, care must be taken to make sure caching is a viable solution when doing so.

To the OP: thanks for the link, I never knew this kind of thing existed. The Linux Journal article was interesting too.

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 21-Aug-2006 14:19:05   

sami wrote:

It depends on why you want to cache. If you want to cache data to avoid doing roundtrips to the database: forget it. The reason is simple. Say you have a given number of entities in your cache. Then you want to load all customers matching a filter. You can't know if ALL entities in the database which match the filter are in the cache. So you have to fetch, for example, the PKs of these entities with the filter and compare them with the entities in the cache, then fetch the ones that aren't in the cache, AND fetch the ones which are outdated (which also requires some timestamp field). This effectively creates more overhead than you have when you simply fetch the entities.

What about scenarios where you can live with having outdated data, but would benefit greatly from using a distributed caching solution such as memcached? I am sure such a scenario is more than viable. So far I have survived by using httpcache in manager classes (single-server environment), which is pretty OK for me, but lower-level support would indeed be nice too.

It has nothing to do with outdated data; it has everything to do with which data matches a given filter. If you want all customers matching a given filter, how do you know the cache contains all entities you want to retrieve? You can only know that if the cache contains all data, OR you fetch the data from the db and compare it with the cache, which defeats the purpose of having a cache for avoiding db fetches.

So you might want to suggest that it would be nice to have, but it would add absolutely nothing, OR your code simply looks at the cache and says: "just give me the ones matching the filter from the cache", but that could give less data than you would otherwise get, so i.o.w. your code would then function incorrectly.

Thus, if you can prove to me that a cache can avoid db fetches without doing the actual fetch (otherwise why bother ;) ), I'm all ears, but I haven't found a way to use a cache for the purpose of avoiding db fetches inside the o/r mapper core.

Sure, IF your code knows a shortcut and simply wants to pull data from a cache, ok, but then you can do that now as well.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 4
Joined: 19-Aug-2006
# Posted on: 21-Aug-2006 23:00:28   

I have not thought this out completely, but I was thinking of the simple cases to get started with.

So read only queries at first...

If I do a query (pseudocode) like [CustomerID=10 or Region="UK"], then I'd pull back the data, cache the LLBL object and use that same [CustomerID=10 or Region="UK"] string as the key.

I assume doing this same query over and over would return the same results. I also assume that deserializing it from the cache would be faster than building the SQL, running it, parsing it, returning it, and all the other things you must do.
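A rough Python sketch of turning a filter plus its parameter values into a cache key. The `cache_key` function, the `"q:"` prefix and the argument layout are all invented for illustration and say nothing about how LLBLGen Pro represents predicates. Hashing keeps the key short and free of spaces, which matters because memcached keys are limited to 250 bytes and may not contain whitespace.

```python
import hashlib
import json


def cache_key(entity: str, filter_text: str, params: dict) -> str:
    """Build a deterministic key from the entity name, filter and parameters."""
    payload = json.dumps(
        {"entity": entity, "filter": filter_text, "params": params},
        sort_keys=True,             # parameter order must not change the key
        separators=(",", ":"),
    )
    return "q:" + hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

For example, `cache_key("Customer", "CustomerID=@id OR Region=@region", {"id": 10, "region": "UK"})` gives the same key on every web server in the farm, so each of them would look in the same cache slot. It only recognises queries that are written identically; two differently phrased filters that return the same rows still get different keys.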

Now, this obviously falls over as soon as the data changes, but then again (and maybe I'm being MS SQL 2005 specific here) the database would let you know if data had changed, and you could invalidate your cache object (or update it, depending on your requirements).

One small improvement on this is having LLBL update commands automatically invalidate that cache (and again refresh it if that's what you wanted).

But the real benefit of something similar to this being inside the LLBL engine is for handling the more complex queries as below.

Customer.ID = 10 OR Customer.Countries.Region = UK (say, bringing back Countries objects for Scotland, England, Wales and Northern Ireland).

LLBL's predicate parsing stuff would normally (roughly, for the sake of argument) do a customer query and then a Region query and then some "joining", either in the database or after return (depending on a great many things, I'd assume).

This is where LLBL can see what it's doing internally, know this is two queries and pull one or the other or both "bits" from cache.

Sorry for the utter rubbish example, but this is the sort of level that I think would need either:

  • Inside knowledge of LLBL, OR
  • Enough "hooks" to still make that caching decision externally. For example, a callback function that is handed a hash key and decides whether the cache is stale or not; LLBL need not know how you generated that hash key.

This would allow a pluggable framework that could be used with any caching method/library.
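The second option might look roughly like this in Python. It is only a sketch of the shape of such hooks: `CacheProvider`, `DictCache` and `fetch_with_hooks` are hypothetical names, not anything LLBLGen Pro offers, and the key callback is supplied by the caller so the fetch code never needs to know how the key was produced.

```python
from typing import Any, Callable, Optional, Protocol


class CacheProvider(Protocol):
    """Anything with get/set can be plugged in: in-process, memcached, etc."""

    def get(self, key: str) -> Optional[Any]: ...
    def set(self, key: str, value: Any) -> None: ...


class DictCache:
    """In-process stand-in; a memcached-backed class could expose the same methods."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


def fetch_with_hooks(query: str,
                     run_query: Callable[[str], Any],
                     make_key: Callable[[str], str],
                     cache: CacheProvider) -> Any:
    """Consult the cache via the caller's key; run the query only on a miss."""
    key = make_key(query)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = run_query(query)
    cache.set(key, result)
    return result
```

Swapping `DictCache` for another provider changes where the results live without touching `fetch_with_hooks`. Deciding when an entry has gone stale is left out here and would need its own hook.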

Again, even only doing read queries would be a big help.

Lastly, on a slightly different twist, I wonder how I could use memcached during development to measure hit rate (by directly querying the cache after an application run) to determine which "objects" would have been worth caching and which would not :-)
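One possible starting point, sketched in Python: memcached's `stats` command returns `STAT <name> <value>` lines that include `get_hits` and `get_misses`, so the overall hit rate can be worked out from its reply. The two function names are made up, and the sample reply in the usage note is invented, not captured from a real server.

```python
def parse_stats(reply: str) -> dict:
    """Turn the text reply of memcached's 'stats' command into a dict."""
    stats = {}
    for line in reply.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_rate(stats: dict) -> float:
    """Fraction of 'get' requests that were served from the cache."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0
```

Fed a reply such as `"STAT get_hits 75\r\nSTAT get_misses 25\r\nEND\r\n"`, `hit_rate(parse_stats(...))` works out to 0.75. These counters are server-wide, so judging individual "objects" would need separate counting in the application, for instance per key prefix.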

Cheers and thanks for the feedback ...

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 22-Aug-2006 11:08:22   

craiginscotland wrote:

I have not thought this out completely, but I was thinking of the simple cases to get started with.

So read only queries at first...

If I do a query (pseudocode) like [CustomerID=10 or Region="UK"], then I'd pull back the data, cache the LLBL object and use that same [CustomerID=10 or Region="UK"] string as the key.

I assume doing this same query over and over would return the same results. I also assume that deserializing it from the cache would be faster than building the SQL, running it, parsing it, returning it, and all the other things you must do.

Oh, individual entities aren't a problem of course. :) If you want individual entities, it might be a good idea, but with sets it's not: you then run into the problem I discussed above.

Now, this obviously falls over as soon as the data changes, but then again (and maybe I'm being MS SQL 2005 specific here) the database would let you know if data had changed, and you could invalidate your cache object (or update it, depending on your requirements).

That's not that fast, and it's indeed SQL Server 2005 specific.

One small improvement on this is having LLBL update commands automatically invalidate that cache (and again refresh it if that's what you wanted).

But the real benefit of something similar to this being inside the LLBL engine is for handling the more complex queries as below.

Customer.ID = 10 OR Customer.Countries.Region = UK (say, bringing back Countries objects for Scotland, England, Wales and Northern Ireland).

LLBL's predicate parsing stuff would normally (roughly, for the sake of argument) do a customer query and then a Region query and then some "joining", either in the database or after return (depending on a great many things, I'd assume).

This is where LLBL can see what it's doing internally, know this is two queries and pull one or the other or both "bits" from cache.

It can only do that if it's an individual entity. In all other situations, it can't consult the cache, as it can't know if the cache's data == the db's data.

Sorry for the utter rubbish example, but this is the sort of level that I think would need either:

  • Inside knowledge of LLBL, OR
  • Enough "hooks" to still make that caching decision externally. For example, a callback function that is handed a hash key and decides whether the cache is stale or not; LLBL need not know how you generated that hash key.

This would allow a pluggable framework that could be used with any caching method/library.

Again, even only doing read queries would be a big help.

Only for individual entity requests.

Frans Bouma | Lead developer LLBLGen Pro
mihies
User
Posts: 800
Joined: 29-Jan-2006
# Posted on: 22-Aug-2006 11:24:14   

Frans, I am considering the same approach as the OP, but before you say that it doesn't work, hear me out. :) Page (or partial) caching is a problem when Ajax is involved. I am not an Ajax expert, but it seems to me that Ajax kills that sort of caching (can somebody correct me?). I can live with the fact that not all data might be displayed (in case of a discrepancy between cache and database) within a 60s timeframe.

Thus caching selects for a certain timeframe seems the best option for me.

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 22-Aug-2006 12:01:03   

Ok, I think there are a couple of things which are mixed up:

  • Caching resultsets of a query isn't the same as using an entity cache.
  • Cache usage for individual entity fetches is something else than using a cache for set fetches.

Miha, what you're saying is something else than what I understood the OP wanted to do. Say I want all customers from country X with more than 10 orders. If I run that query, I get for example 20 customer entities.

You say: store that result for n seconds and if the same query is executed within that n seconds, return the same resultset. OK.

That's something else than saying: here's a big cache with entities. You run the query and you don't know if the entities in the cache matching the filter are all the entities you'll get anyway.

The main thing is: correctness. You expect from the o/r mapper core that if you run a query you get the results available at that time. Not just the results from the cache (which can be 1 entity while the db contains 1000 for example).

Caching resultsets for n seconds is something else. You then simply expand what you find 'acceptable staleness of data'.

The sole reason I don't cache resultsets is that the management for this is very hard, as it has to perform reasonably as well (otherwise, why bother). Every set fetched has to be stored with a unique hash built from the full query + parameters. If you've ever tried to do this, you'll have learned that it is almost impossible to achieve, but you could go for a long string as the key. The next step is invalidating the complete set if one entity is updated. One could only achieve that if every entity object used is uniqued by the same cache. Otherwise updating an entity will cause problems, as it can be a different entity object with the same data. Trying to unique entities limits the scope of an o/r mapper a lot, as you get problems with attaching/detaching etc.
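For comparison, here is a much coarser scheme than the per-entity uniquing described above, sketched in Python: each cached set is registered under the entity types it contains, and any update to a type throws away every set that touches it. The `ResultsetCache` class and its methods are hypothetical, and the approach trades precision for simplicity, since it discards sets the update may not have affected.

```python
from collections import defaultdict


class ResultsetCache:
    """Caches whole resultsets and invalidates them per entity type."""

    def __init__(self):
        self._sets = {}                          # query key -> cached rows
        self._keys_by_type = defaultdict(set)    # entity type -> query keys

    def store(self, key, rows, entity_types):
        self._sets[key] = rows
        for entity_type in entity_types:
            self._keys_by_type[entity_type].add(key)

    def get(self, key):
        return self._sets.get(key)

    def invalidate(self, entity_type):
        """Call after any insert/update/delete of the given entity type."""
        for key in self._keys_by_type.pop(entity_type, set()):
            self._sets.pop(key, None)
```

Saving a single customer would then call `invalidate("Customer")` and drop every cached set involving customers, including joined sets, while sets built only from other entity types stay in place.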

And the fun thing is: the RDBMS also caches resultsets. It's already built in. So why do all that hard work again, when it's already available?

Furthermore, if the processing code of the resultset is called a lot, why not cache the processing output for n seconds instead? It saves even more power. :)

Most caches are used for uniquing of entity objects. LLBLGen Pro uses a context for that, to offer more flexible uniquing than a central cache. Another purpose of caches is often change tracking, something which isn't necessary in LLBLGen Pro as entities do their own changetracking.

Frans Bouma | Lead developer LLBLGen Pro
sami
User
Posts: 93
Joined: 28-Oct-2005
# Posted on: 22-Aug-2006 12:53:48   

So you might want to suggest that it would be nice to have, but it would add absolutely nothing, OR your code simply looks at the cache and says: "just give me the ones matching the filter from the cache", but that could give less data than you would otherwise get, so i.o.w. your code would then function incorrectly.

What I am trying to say is that in some scenarios this is totally ok behavior.

mihies
User
Posts: 800
Joined: 29-Jan-2006
# Posted on: 22-Aug-2006 13:17:45   

Hi Frans,

Right, you are right. I was mentioning my scenario (where I am going to do some caching) and wasn't implying at all that you should cache anything in any way. Sorry if that was your impression (was it? :-))

You are right: you shouldn't do any such caching at all. After all, there is your context object one can use to do caching.

mihies
User
Posts: 800
Joined: 29-Jan-2006
# Posted on: 22-Aug-2006 13:19:04   

Furthermore, if the processing code of the resultset is called a lot, why not cache the processing output for n seconds instead? It saves even more power

Yup, that's my idea.

Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 22-Aug-2006 14:07:48   

sami wrote:

So you might want to suggest that it would be nice to have, but it would add absolutely nothing, OR your code simply looks at the cache and says: "just give me the ones matching the filter from the cache", but that could give less data than you would otherwise get, so i.o.w. your code would then function incorrectly.

What I am trying to say is that in some scenarios this is totally ok behavior.

Could you elaborate on this a bit? For me, if I ask a method to get me the data matching a filter, I want all the data matching the filter, not just the data the method finds in some cache (which can be 1% of the total data!). I.o.w.: in which scenarios is it not your need to get correct results back?

Frans Bouma | Lead developer LLBLGen Pro