Shared object cache for scalability - any experiences?

csmac3144
User
Posts: 74
Joined: 12-Sep-2007
# Posted on: 21-Nov-2007 17:19:49   

I am looking at http://www.scaleoutsoftware.com as a solution for the scalability limitations in ASP.Net. Enterprise Java does a lot of this stuff, but .Net doesn't out of the box. Has anyone tried building apps like this using LLBLGen as the ORM tool? In particular, I'm interested in how and why you might use an "identity map" and "repository" (Fowler) approach. You can't use the Application object in ASP.Net because it cannot cross nodes. Session can only cross nodes if it is stored in SQL Server, but this has huge limitations if the in-memory object graph is large.

I'm wondering what templates people might have created around this scenario, and also anything to watch out for. I'd also welcome opinions on the costs and benefits of storing large numbers of objects in RAM and updating the database as required (especially concurrency issues).

Thanks!

jmeckley
User
Posts: 403
Joined: 05-Jul-2006
# Posted on: 21-Nov-2007 18:44:37   

I just learned of a technique to accomplish this. I haven't had a chance to flesh out the details, but I'll give it a shot.

There is the request/response scope and there is the AppDomain. The request/response scope is alive for a single request only; the AppDomain is available for the entire life of the application.

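First you need the repository:

using System.Collections.Generic;

// Placeholder domain types so the snippet compiles.
public interface IWidget { }
public class Widget : IWidget { }

public interface IRepository
{
     IEnumerable<IWidget> GetAllWidgets();
}

public class Repository : IRepository
{
     // Stand-in for the real database fetch; yields a single widget.
     public IEnumerable<IWidget> GetAllWidgets()
     {
          yield return new Widget();
     }
}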

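Then you need a ProxyRepository, which keeps a local copy of the results:

using System.Collections.Generic;
using System.Linq;

public class ProxyRepository : IRepository
{
     private readonly IRepository underlyingRepository;
     private List<IWidget> cachedWidgets;

     public ProxyRepository(IRepository repositoryToWrap)
     {
          underlyingRepository = repositoryToWrap;
     }

     public IEnumerable<IWidget> GetAllWidgets()
     {
          // ToList() runs the underlying query once and keeps the results; caching
          // the lazy iterator itself would re-run the query on every enumeration.
          // Not thread-safe: two concurrent first calls may both hit the database.
          if (cachedWidgets == null) cachedWidgets = underlyingRepository.GetAllWidgets().ToList();
          return cachedWidgets;
     }
}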

Now you need a dependency resolver, which stores an instance of IRepository. In its simplest terms, it's a map from each interface type to the implementation that should be handed out for it. For more details on this, check out JP's blog [http://www.jpboodhoo.com/blog/StaticGatewayPart2.aspx].

Then, in the Application_Start event, load the dependency resolver with a concrete implementation (ProxyRepository) of the interface IRepository. This is the area I'm still trying to understand. It involves some static methods, but essentially you're storing concrete implementations of the objects the system requires (usually constructor arguments), so it would look something like this:

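// Global.asax.cs: raised once each time the application (AppDomain) starts.
protected void Application_Start(object sender, EventArgs e)
{
   // Load the dependency resolver with new ProxyRepository(new Repository()) as the implementation of IRepository.
}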

Now, whenever you need the repository, call DependencyResolver.GetImplementationOf<IRepository>(), which returns the ProxyRepository instance.
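A minimal sketch of such a resolver, assuming a plain static dictionary keyed by interface type. The names DependencyResolver and GetImplementationOf<T> come from the post; Register<T> and the rest of this shape are hypothetical, not the code from the linked blog post.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical static gateway: maps an interface type to the single instance
// that should be handed out for it. Not the implementation from the blog post.
public static class DependencyResolver
{
    // ConcurrentDictionary because request threads may resolve concurrently.
    private static readonly ConcurrentDictionary<Type, object> implementations =
        new ConcurrentDictionary<Type, object>();

    // Intended to be called once at startup, e.g. from Application_Start.
    public static void Register<T>(T implementation) where T : class
    {
        if (implementation == null) throw new ArgumentNullException(nameof(implementation));
        implementations[typeof(T)] = implementation;
    }

    // Returns the registered instance; every caller gets the same object.
    public static T GetImplementationOf<T>() where T : class
    {
        if (implementations.TryGetValue(typeof(T), out object instance))
            return (T)instance;
        throw new InvalidOperationException("No implementation registered for " + typeof(T).FullName);
    }
}
```

With this shape, Application_Start would call DependencyResolver.Register<IRepository>(new ProxyRepository(new Repository())), and later code would call DependencyResolver.GetImplementationOf<IRepository>().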

Because the objects were loaded into a resolver set up at startup, every user gets the same instance of ProxyRepository. Because the results are stored in a private field, the values are preserved for the life of the application (the AppDomain), or until the field is set back to null.
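Because one instance is shared by every request thread, the null check in GetAllWidgets can race, so two concurrent first requests may both hit the database. One way to make the load happen exactly once is System.Lazy<T>, whose constructor taking only a factory uses LazyThreadSafetyMode.ExecutionAndPublication (one thread runs the factory, the others wait for its result). The SharedListCache class below is a hypothetical helper, not part of the approach described above; inside the proxy it would wrap underlyingRepository.GetAllWidgets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: loads a list once, on first use, and shares it
// between all threads until Invalidate() is called.
public sealed class SharedListCache<T>
{
    private readonly Func<IEnumerable<T>> loader;
    // volatile so a swap done by Invalidate() is seen by other threads.
    private volatile Lazy<IReadOnlyList<T>> cached;

    public SharedListCache(Func<IEnumerable<T>> loader)
    {
        this.loader = loader ?? throw new ArgumentNullException(nameof(loader));
        cached = CreateLazy();
    }

    // Default mode is ExecutionAndPublication: the loader runs on one thread only.
    private Lazy<IReadOnlyList<T>> CreateLazy() =>
        new Lazy<IReadOnlyList<T>>(() => loader().ToList().AsReadOnly());

    // Read-only view, so callers cannot modify the shared list.
    public IReadOnlyList<T> Items => cached.Value;

    // Drops the cached list; the next read reloads it.
    public void Invalidate() => cached = CreateLazy();
}
```

Note that with this mode an exception thrown by the loader is cached too, so a failed load stays failed until Invalidate() is called.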

I realize the real magic of this approach is missing; that is the piece I'm still trying to understand as well. Hopefully this gets you started.

csmac3144
User
Posts: 74
Joined: 12-Sep-2007
# Posted on: 21-Nov-2007 18:57:44   

jmeckley:

Thanks for that feedback. That is a very interesting approach. One issue I don't see covered is how you can share the object pool seamlessly across a server farm...

That's one of the main reasons I'm looking to do this, to get the levels of scalability that J2EE people take for granted.

The scaleoutsoftware people basically implement a shared cache which transparently distributes the object pool across N servers. You can then load balance/failover at this level, rather than relying on a SQL Server cluster. SQL Server is fine for some situations, but if you have a large system it becomes prohibitively expensive to read/write all that state data all day long.
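To make "transparently distributes the object pool across N servers" a little more concrete, here is a generic sketch of one common placement scheme: hash each key to a primary node and keep a copy on the next node, so losing one box loses no entries. This is an assumption for illustration only, not ScaleOut's API or a description of how their product works; all names are hypothetical, and the "nodes" are plain in-memory dictionaries rather than servers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-process model of a partitioned cache with one backup copy per key.
public sealed class PartitionedCache
{
    private readonly Dictionary<string, object>[] nodes;
    private readonly bool[] alive;

    public PartitionedCache(int nodeCount)
    {
        if (nodeCount < 2) throw new ArgumentException("Need at least two nodes for a backup copy.");
        nodes = Enumerable.Range(0, nodeCount).Select(_ => new Dictionary<string, object>()).ToArray();
        alive = Enumerable.Repeat(true, nodeCount).ToArray();
    }

    // Stable FNV-1a hash so every server computes the same placement
    // (string.GetHashCode is randomized per process on .NET Core).
    private int Primary(string key)
    {
        uint h = 2166136261;
        foreach (char c in key) { h ^= c; h *= 16777619; }
        return (int)(h % (uint)nodes.Length);
    }

    private int Backup(string key) => (Primary(key) + 1) % nodes.Length;

    public void Put(string key, object value)
    {
        // Write both copies; a real product would do this over the network.
        nodes[Primary(key)][key] = value;
        nodes[Backup(key)][key] = value;
    }

    public bool TryGet(string key, out object value)
    {
        // Try the primary first, fall back to the backup if the primary is down.
        foreach (int n in new[] { Primary(key), Backup(key) })
            if (alive[n] && nodes[n].TryGetValue(key, out value)) return true;
        value = null;
        return false;
    }

    // Simulates a server dying: its copies become unreachable.
    public void FailNode(int index) => alive[index] = false;
}
```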

Scaleoutsoftware uses a UDP-based communication protocol to sync the cache in real time (TCP is no good because of the way sockets behave when a server dies). They even make a higher-end product which can perform a similar feat across datacenters (geographic failover), although I'm not sure how close to real time that one is.

jmeckley
User
Posts: 403
Joined: 05-Jul-2006
# Posted on: 21-Nov-2007 19:30:48   

Oh, I misunderstood the problem. I haven't had any experience with server farms.

At a previous job the hardware guys had a load balancer in front of 4 application servers. I think it was configured so that a user was directed to the same server for every request within the same session: the first request of a session was routed to one of the 4 (say, server 3), and every subsequent request within that session was routed to server 3 as well.
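The behaviour described above is usually called session affinity or "sticky sessions". A rough sketch of the routing decision, assuming the balancer keys on a session id (for example from a cookie) and keeps a table of assignments; the class and its methods are hypothetical, since real load balancers are configured rather than programmed like this.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sticky-session router: the first request of a session is assigned
// a server round-robin; later requests with the same session id reuse it.
public sealed class StickyRouter
{
    private readonly int serverCount;
    private readonly ConcurrentDictionary<string, int> assignments =
        new ConcurrentDictionary<string, int>();
    private int counter = -1;

    public StickyRouter(int serverCount)
    {
        this.serverCount = serverCount;
    }

    public int Route(string sessionId) =>
        assignments.GetOrAdd(sessionId, _ =>
            (int)((uint)Interlocked.Increment(ref counter) % (uint)serverCount));

    // When a server dies its sessions get reassigned, but any session data that
    // lived only in that server's memory is gone.
    public void ForgetServer(int server)
    {
        foreach (var pair in assignments)
            if (pair.Value == server) assignments.TryRemove(pair.Key, out _);
    }
}
```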

Not having any experience with it, I would think this is more of a hardware/server-admin function. In a perfect IT utopia the developer wouldn't have to be concerned with hardware configurations.

csmac3144
User
Posts: 74
Joined: 12-Sep-2007
# Posted on: 21-Nov-2007 19:50:29   

No, it is actually mostly a software problem. ASP.Net can share the Session object among multiple servers by storing it in SQL Server between requests (you need a clustered SQL Server to achieve failover there). This allows for a load-balanced ASP.Net application; however, it requires shipping Session back and forth to SQL Server with every single request. This works for small apps, but not for anything serious.
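A rough way to see why this doesn't scale: with out-of-process session state, the session is loaded at the start of a request and, in the worst case, written back in full at the end, so the traffic per request grows with the size of the object graph. The sketch below assumes System.Text.Json as a stand-in serializer (ASP.NET's own session serialization uses a different format, so the byte counts are only indicative); the Order type and SessionCost helper are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical object kept in session.
public sealed class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }
    public List<string> Lines { get; set; } = new List<string>();
}

public static class SessionCost
{
    // Worst-case bytes crossing the wire per request: the whole session is
    // read from the store before the request and written back after it.
    public static long BytesPerRequest(Dictionary<string, object> session)
    {
        byte[] payload = JsonSerializer.SerializeToUtf8Bytes(session);
        return 2L * payload.Length;
    }
}
```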

On top of this there is simply no way at all to have a shared pool of objects at the Application level in ASP.Net. Hence the third-party add-ons.

You need to program to their API.

Walaa
Support Team
Posts: 14987
Joined: 21-Aug-2005
# Posted on: 22-Nov-2007 15:02:11   

Why don't you use the StateServer mode, which keeps Session objects in memory on a separate state server? Check my blog post: http://walaapoints.blogspot.com/2007/07/aspnet-session-state-brief.html

csmac3144
User
Posts: 74
Joined: 12-Sep-2007
# Posted on: 22-Nov-2007 15:07:02   

Walaa:

StateServer is still a single point of failure. If you lose that server you lose the whole application. SQL Server is actually better because at least you can create a multi-node cluster so that losing one machine will not bring down the application.

Among other things we build systems that scan no-fly lists and perform other security functions for over 50 airlines. We simply cannot tolerate application failure for any reason.

Also, the Session object is in scope for the current user only. This doesn't solve the problem of application-scoped objects (e.g., a list of all the world's airports and their details) which should be in memory for fast access and accessible to every user on the system at once.

superzaif
User
Posts: 5
Joined: 31-Oct-2005
# Posted on: 22-Nov-2007 23:33:39   

I am still not sure why a hardware load balancer doesn't help in the session scenario. We do this constantly with "sticky sessions" on the load balancers, with the timeout set according to requirements. This way session objects never need to be shared between servers, because the user never loses the session with the server on which it was started. (Look at products like BIG-IP, NetScaler, etc.)

For the application-level data, it depends on how large the dataset is and how you are using it. We use a SQL Server Analysis Services cube with some caching on the app servers. This produces acceptable results for us and also keeps the data fresh, as we refresh the cube every hour. For these lookups we use MDX queries and do not involve LLBLGen at all. After all, it's just read-only, fast-lookup data.
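The "some caching on the app servers" part is typically a time-based cache: each server keeps lookup results in memory and reloads an entry once it is older than the refresh interval (an hour here, to match the cube refresh). A minimal sketch under that assumption; the class is hypothetical, and in ASP.NET you would more likely use the built-in cache with an absolute expiration. The injectable clock exists only to make the expiry testable.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical per-server lookup cache with absolute expiry.
public sealed class ExpiringCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTime LoadedUtc)> entries =
        new ConcurrentDictionary<TKey, (TValue, DateTime)>();
    private readonly TimeSpan timeToLive;
    private readonly Func<TKey, TValue> load;
    private readonly Func<DateTime> utcNow;

    public ExpiringCache(TimeSpan timeToLive, Func<TKey, TValue> load, Func<DateTime> utcNow = null)
    {
        this.timeToLive = timeToLive;
        this.load = load;
        this.utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public TValue Get(TKey key)
    {
        DateTime now = utcNow();
        // Fresh entry: serve from memory, no trip to the cube or database.
        if (entries.TryGetValue(key, out var entry) && now - entry.LoadedUtc < timeToLive)
            return entry.Value;
        // Missing or stale: reload and restamp. Concurrent callers may both reload.
        TValue value = load(key);
        entries[key] = (value, now);
        return value;
    }
}
```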

csmac3144
User
Posts: 74
Joined: 12-Sep-2007
# Posted on: 23-Nov-2007 01:07:07   

Sticky sessions can work in some situations, however they are not a full solution of the sort I am looking for. Once a client establishes affinity with a server that client's session will be lost if the node fails. Primitive affinity systems using IP addresses are prone to failure when clients come in from a proxy (e.g., AOL). Otherwise you need a cookie or something.

The scenario you describe still does not achieve the sort of multi-node, application-scoped object caching that separates J2EE/EJB app servers from .Net.

There is a perception among large system integrators (e.g., we deal with SAIC) that .Net cannot scale to handle large systems. I believe it can, however it cannot do so without a true multi-node caching system. Why Microsoft refuses to build this themselves I don't know. I expect it is because they don't seriously target the high-end enterprise market.

Otis
LLBLGen Pro Team
Posts: 39788
Joined: 17-Aug-2003
# Posted on: 23-Nov-2007 11:00:41   

I think caching is something which is often done at the wrong level. Most websites are effectively read-only systems with only the occasional write. If your website writes to the content it serves far less often than it reads it, most of the gain comes from simply caching the final output, and that's at a high level, not at the lower level where the O/R mapper lives.

Java app servers have a feature which is missing in .NET, known as cross-system object awareness: on box A you can get a reference to an object on box B without marshalling/serialization issues. On .NET it's not transparent: you have to make objects serializable, and what you get are always copies.
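A small illustration of the "always copies" point: once an object has crossed a process boundary via serialization, the receiver holds an independent copy, so a change on one side is invisible on the other unless it is explicitly sent back. System.Text.Json stands in here for whatever serializer the transport uses; the Airport type and Wire.Transfer helper are hypothetical.

```csharp
using System.Text.Json;

// Hypothetical cached object.
public sealed class Airport
{
    public string Code { get; set; }
    public string Name { get; set; }
}

public static class Wire
{
    // Simulates sending an object to another box: serialize here, deserialize
    // "over there". The result shares no references with the original.
    public static T Transfer<T>(T value) =>
        JsonSerializer.Deserialize<T>(JsonSerializer.Serialize(value));
}
```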

Is that truly necessary? No. On the web, the fastest performance is achieved by doing a thing just once and repeating it only when you have to.

Take this trick:
- you have a dynamic system that can render the website from data in the db. Rendering the complete site takes, for example, 5 seconds.
- you therefore render the site on that system every 10 seconds, from top to bottom.
- you cache all output on a different box.
- your users visit THAT box.

You can now serve users a site which is never slow; you always serve them data which is roughly up to date (no user will notice that it lags by around 10 seconds), and it can handle a very high load. If the load on the render server gets too heavy, you can widen the window, or make it more advanced: render some parts every second and other parts every minute.
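The render-every-N-seconds trick boils down to a background job that rebuilds the pages and atomically swaps in a complete new snapshot, while requests only ever read the current snapshot. A minimal in-process sketch, assuming pages are keyed by URL path and produced by a caller-supplied render function; the names are hypothetical, and in the setup described above the snapshot would be pushed to a separate front-end box rather than kept in the rendering process.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical snapshot cache: readers never wait on rendering.
public sealed class PrerenderedSite : IDisposable
{
    private readonly Func<IReadOnlyDictionary<string, string>> renderAll;
    private readonly Timer timer;
    // Replaced wholesale by Refresh(); never mutated after publication.
    private IReadOnlyDictionary<string, string> snapshot;
    private int rendering; // 1 while a render pass is running

    public PrerenderedSite(Func<IReadOnlyDictionary<string, string>> renderAll, TimeSpan interval)
    {
        this.renderAll = renderAll;
        Refresh(); // initial render so the very first requests are served
        timer = new Timer(_ => Refresh(), null, interval, interval);
    }

    // Renders every page and publishes the result in one reference swap.
    // Skips the pass if the previous one is still running.
    public void Refresh()
    {
        if (Interlocked.Exchange(ref rendering, 1) == 1) return;
        try { Volatile.Write(ref snapshot, renderAll()); }
        finally { Volatile.Write(ref rendering, 0); }
    }

    // Serves whatever was rendered last; null if the page does not exist.
    public string Get(string path) =>
        Volatile.Read(ref snapshot).TryGetValue(path, out var html) ? html : null;

    public void Dispose() => timer.Dispose();
}
```

Swapping the whole dictionary instead of updating it in place means a reader always sees one consistent version of the site, never a half-rendered mix.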

The big point is that you only serve cached data; you never do something twice. This forum, for example, renders the posts and stores them as-is in the db. Reading a thread, the most expensive operation on a forum, is then an exercise of fetching the HTML to display and building the page. We could go further and cache the block of messages for a minute. As the page length is defined per user, we can't do that at the moment, but if that were fixed, we could.
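The render-on-write idea (convert a post to HTML once when it is saved and store that HTML alongside the raw text) can be sketched as follows. The in-memory list stands in for the database table, and the trivial encode-and-wrap step is a placeholder for the real markup-to-HTML conversion; all names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical stored post: raw text plus the HTML rendered when it was saved.
public sealed class StoredPost
{
    public string RawText { get; set; }
    public string Html { get; set; }
}

public sealed class PostStore
{
    private readonly List<StoredPost> posts = new List<StoredPost>(); // stands in for the DB table
    public int RenderCount { get; private set; }

    // Write path (rare): the expensive rendering happens exactly once per post.
    public void Save(string rawText)
    {
        RenderCount++;
        string html = "<p>" + WebUtility.HtmlEncode(rawText).Replace("\n", "<br/>") + "</p>";
        posts.Add(new StoredPost { RawText = rawText, Html = html });
    }

    // Read path (common): concatenate the stored HTML, no rendering at all.
    public string RenderThread()
    {
        var page = new StringBuilder();
        foreach (var post in posts) page.Append(post.Html);
        return page.ToString();
    }
}
```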

A high-traffic website should be doing the same thing: stuff which is always shown as read-only data (and most websites are this way: users are almost never able to change the content of a website in depth) should be rendered once, then cached and served from that cache. Why bother re-rendering it for every user? That leaves the dynamic parts, and you'll quickly see that the parts which really need to be dynamic (e.g. a shopping cart) aren't going to bring your server down either, because they're not the majority of the data served to the visitor.

Frans Bouma | Lead developer LLBLGen Pro