Sanity check + Remoting

Tim
User
Posts: 5
Joined: 24-Sep-2004
# Posted on: 27-Jun-2005 12:18:29   

Hi there,

I need a sanity check (1) and some remoting advice (2)

Background:

We're building an n-tier application. Right now we have a database (DB), a business layer (BL), and several client applications (PL).

The BL is essentially a calculation engine. It needs to load entities and use them for processing. It doesn't modify entities. To make the code in the engine "easier" (lazy-loading in particular), it looks like the self-servicing model is the way to go.

The PL, on the other hand, should never be able to save objects itself - it must always pass them to the business layer.

The PL must also be able to show large datasets that normally come straight from the database (usually tens of thousands of rows, but some users would like to be able to go through millions of rows in a grid).

(1) So, the sanity check. What I'm inclined to do is the following:

On the BL:

use self-servicing for the calculations within the BL

send/receive datatables to the client (via a remoting interface)

On the PL:

send/receive datatables

Is that a reasonable way of doing things?

(2) Has anybody had experience of exposing large datasets to the client over remoting? I've tried sending reasonable-sized datasets using remoting and the speed is disturbingly bad (the database fetch of 7500 rows took 3 seconds, and then it took another 30 to get it to the client application).

I'm at a bit of a loss as to how to solve this - we don't want the PL to talk directly to the database but everything else we've tried has been too slow to use in anger. Any input is greatly appreciated.

Thanks,

Tim.

Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 27-Jun-2005 17:50:44   

Tim wrote:

Hi there,

I need a sanity check (1) and some remoting advice (2)

Background:

We're building an n-tier application. Right now we have a database (DB), a business layer (BL), and several client applications (PL).

The BL is essentially a calculation engine. It needs to load entities and use them for processing. It doesn't modify entities. To make the code in the engine "easier" (lazy-loading in particular), it looks like the self-servicing model is the way to go.

The PL, on the other hand, should never be able to save objects itself - it must always pass them to the business layer.

... which means you should use adapter instead. I'd use adapter. The small downside of not having lazy loading is compensated for by prefetch paths.

The PL must also be able to show large datasets that normally come straight from the database (usually tens of thousands of rows, but some users would like to be able to go through millions of rows in a grid).

No-one is able to process on screen even 2000 rows in a grid. I can assure you, no grid will be able to bind to millions of rows, nor do you want to do that. It's the same as giving a person a stack of a million pages to process them.

(1) So, the sanity check. What I'm inclined to do is the following:

On the BL:

use self-servicing for the calculations within the BL

send/receive datatables to the client (via a remoting interface)

On the PL:

send/receive datatables

Is that a reasonable way of doing things?

No, as you make it unnecessarily difficult for yourself :). I'd use adapter and send the entity collections to the PL; they're manipulated there, the PL sends them back to the server, and they're saved there. No extra conversion code.

(2) Has anybody had experience of exposing large datasets to the client over remoting? I've tried sending reasonable-sized datasets using remoting and the speed is disturbingly bad (the database fetch of 7500 rows took 3 seconds, and then it took another 30 to get it to the client application).

A dataset always serializes to XML, even in binary remoting. An entity collection does not, it will be much smaller.

Nevertheless, sending 7500 rows to the client for display is IMHO way too much. No user can cope with 7500 rows in a grid. Also some grids won't like it a lot.

I'm at a bit of a loss as to how to solve this - we don't want the PL to talk directly to the database but everything else we've tried has been too slow to use in anger. Any input is greatly appreciated.

Well, I first would try to send only that data to the PL which is actually processable by the user. It's of no use to send 10,000 rows if the user can only read 100.

Frans Bouma | Lead developer LLBLGen Pro
Tim
User
Posts: 5
Joined: 24-Sep-2004
# Posted on: 28-Jun-2005 10:42:06   

Otis wrote:

... which means you should use adapter instead. I'd use adapter. The small downside of not having lazy loading is compensated for by prefetch paths.

Ok, I'll investigate the prefetch system.

Otis wrote:

No-one is able to process on screen even 2000 rows in a grid. I can assure you, no grid will be able to bind to millions of rows, nor do you want to do that. It's the same as giving a person a stack of a million pages to process them.

Part of the problem is that at the moment we have quite a fast "fat client" where the user can scroll through large amounts of data should they wish to (it's not efficient, but they like working that way). The users are going to take a lot of convincing that they can't work like that any more.

But ok. We'll force the user to go through their data using filters to cut down the fetch size or paginate the resultset. We'll still need to fetch several thousand rows in some cases (generally they're reports) but we'll cross that bridge when we come to it. :)

Otis wrote:

A dataset always serializes to XML, even in binary remoting. An entity collection does not, it will be much smaller.

We tested that too, but found that the entity collection took even longer to deserialise than the datatable -- we're still investigating that one, though.

Thanks for the help, Otis. :)

Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 28-Jun-2005 11:15:14   

Tim wrote:

Otis wrote:

No-one is able to process on screen even 2000 rows in a grid. I can assure you, no grid will be able to bind to millions of rows, nor do you want to do that. It's the same as giving a person a stack of a million pages to process them.

Part of the problem is that at the moment we have quite a fast "fat client" where the user can scroll through large amounts of data should they wish to (it's not efficient, but they like working that way). The users are going to take a lot of convincing that they can't work like that any more.

But ok. We'll force the user to go through their data using filters to cut down the fetch size or paginate the resultset. We'll still need to fetch several thousand rows in some cases (generally they're reports) but we'll cross that bridge when we come to it. :)

I think it's more a mindset: "I want to be able to look at all the data if I have to". With pagination they can, and you also make sure the system doesn't break down. When hundreds of clients are requesting hundreds of thousands of rows, your system is definitely not going to perform, no matter what you choose as data-access technology.

Otis wrote:

A dataset always serializes to XML, even in binary remoting. An entity collection does not, it will be much smaller.

We tested that too, but found that the entity collection took even longer to deserialise than the datatable -- we're still investigating that one, though.

Ok, just to avoid confusion: you loaded data into an entity collection, serialized / deserialized it to a binary file using the binary formatter, and that's slower than the DataSet's XML serialization? I'll have to check it, but I seriously doubt it. I wasn't talking about XML serialization of course; that is indeed slower (though the compact XML is pretty OK). In particular, the size of the data to send over the wire is much smaller than with the XML of the dataset. You have to use remoting of course, but with large amounts of data you definitely don't want to use XML webservices.

Frans Bouma | Lead developer LLBLGen Pro
Franck
User
Posts: 2
Joined: 28-Jun-2005
# Posted on: 28-Jun-2005 15:57:36   

Hi Otis,

I work with Tim on this problem. Here are the results of the tests I've done.

I've tried serializing entity collections to a file using the following code :


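// Requires: using System.IO; using System.Runtime.Serialization.Formatters.Soap;
// Fetch all rows of the table into a self-servicing collection.
TABLENAMECollection collec = new TABLENAMECollection();
collec.GetMulti(null);

// Serialize the whole collection to a file with the SOAP (XML) formatter.
FileStream sxml = new FileStream("TABLENAMECollection.xml", FileMode.Create);
SoapFormatter formatterxml = new SoapFormatter();
formatterxml.Serialize(sxml, collec);
sxml.Close();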

This serializes a self-servicing collection to a file. I did a similar serialization with BinaryFormatter instead of SoapFormatter (as one can choose between those two for standard remoting).

This test gave me an 800 MB file for XML serialization and 73 MB for binary.

I also tried the adapter model, filling the entity collection using the following code:


EntityCollection collec = new EntityCollection(new TablenameEntityFactory());
DataAccessAdapter adapter = new DataAccessAdapter();
adapter.FetchEntityCollection(collec,null);

This test gave a 670 MB file for XML serialization and 58 MB for binary.

The original table has about 7300 rows with about 80 fields per row. If I convert this into a DataTable, keeping only the necessary information (the actual data), it shrinks those sizes to 17 MB and 5 MB (XML and binary respectively).

It appears that a lot of space is taken by validators or other objects linked to the entities, which seems to make this approach unusable for large collections.

Did I miss something here? Is there another, faster way you would use remoting with entities?

Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 28-Jun-2005 16:47:23   

Soap is very verbose, that's why I only talked about binary serialization.

An entity contains more objects than a datarow, which causes some extra overhead in binary serialization, though 58MB is a lot for 7500 objects, as everything is pretty much an object reference, not values. When I serialize 2000 orderdetail objects to a file, binary, it's 1.75MB, though that record is pretty small.

Hmm. :(

I really don't know why the data is so big. The collection serializes a couple of booleans, a string and other tiny data, then all the entities. Each entity serializes its fields object, a couple of empty hashtables and some strings, which are minor. The fields object contains a small hashtable and the field objects, but these are also not very big.

I'll see what the cause is of the huge data per entity, as 2000 orderdetail objects shouldn't take much more than perhaps 200 KB.

Frans Bouma | Lead developer LLBLGen Pro
Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 28-Jun-2005 17:35:16   

When I use SOAP and serialize one entity, much of the exported data describes object structure and per-field data. As the entities are separate units, and their fields are therefore not part of a bigger column collection, their object structure data is stored in the exported data, so object restore is possible.

Now, for a single set of entities, without any changes whatsoever, I could get away with storing only the data elements in the output, using an implementation of ISerializable; these are then set back into the field objects created by the constructor when deserializing. This could eliminate a lot of the data in the output. However, as soon as there is some form of hierarchy, or fields have been changed, etc., this results in field references which are hard to set up again, if at all, when solely the data is exported. This would thus result in exporting the current data as well, so nothing would be gained.

A datatable is exported as: column data and then per row the cells. That's it.

To overcome the data-limit, you should use paging. For reports created on the client, where perhaps 100,000 rows or more are to be read to the client for report-generation, I think that's not that efficient, even with datasets. It's then better to perform the report data calculations on the server and send the data to be bound to the report to the client.
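As a rough illustration of the paging idea, here is a minimal C# sketch. The `Paging` class and its method names are hypothetical and not part of LLBLGen Pro; it pages over an in-memory list only to show the arithmetic. In a real system the page number and page size would be passed down to the database query, so that only one page of rows ever crosses the wire.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not an LLBLGen Pro API).
public static class Paging
{
    // Returns the 1-based page `pageNumber` of `source`, with at most `pageSize` items.
    // A page past the end of the data yields an empty list.
    public static List<T> GetPage<T>(IList<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        // Use long arithmetic so a very large page number cannot overflow.
        long start = (long)(pageNumber - 1) * pageSize;
        List<T> page = new List<T>();
        for (long i = start; i < source.Count && i < start + pageSize; i++)
        {
            page.Add(source[(int)i]);
        }
        return page;
    }

    // Number of pages needed to show `totalRows` rows (rounds up).
    public static int PageCount(int totalRows, int pageSize)
    {
        return (totalRows + pageSize - 1) / pageSize;
    }
}
```

With a page size of 100, the roughly 7300-row table mentioned earlier in the thread would be split into `Paging.PageCount(7300, 100)` pages, and the client would request them one at a time.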

Frans Bouma | Lead developer LLBLGen Pro
Franck
User
Posts: 2
Joined: 28-Jun-2005
# Posted on: 30-Jun-2005 08:56:13   

Otis wrote:

When I use SOAP and serialize one entity, much of the exported data describes object structure and per-field data. As the entities are separate units, and their fields are therefore not part of a bigger column collection, their object structure data is stored in the exported data, so object restore is possible.

You're right, by looking in more detail into the XML file, we can see a lot of references.

Otis wrote:

Now, for a single set of entities, without any changes whatsoever, I could get away with storing only the data elements in the output, using an implementation of ISerializable; these are then set back into the field objects created by the constructor when deserializing. This could eliminate a lot of the data in the output. However, as soon as there is some form of hierarchy, or fields have been changed, etc., this results in field references which are hard to set up again, if at all, when solely the data is exported. This would thus result in exporting the current data as well, so nothing would be gained.

A datatable is exported as: column data and then per row the cells. That's it.

That's actually what we ended up doing. We're now only storing the necessary elements. LLBLGen gives us an advantage here, since the stream doesn't need to carry any metadata (such as the column names that we would use in "normal" serialization). Those can be generated using a custom template, which improves the speed a lot and reduces the amount of data transferred as well (we're at around 4 MB now, I think). So thanks for developing LLBLGen :)

The only downside appears if we want to use a collection on the client side (for validation, for instance): it takes a lot of time to recreate the collection from the dataset.
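To make the "only the necessary elements" idea concrete, here is a minimal C# sketch of a metadata-free row stream. It is not the code generated by our template: `OrderRow`, its three fields and `CompactRowCodec` are hypothetical names chosen for illustration. The point is that both sides agree on the field order in advance, so the stream holds just a row count followed by the raw values.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical row type standing in for the data of one generated entity.
public sealed class OrderRow
{
    public int OrderId;
    public string CustomerCode;
    public decimal Amount;
}

// Hypothetical codec: writes values only, no column names or type information.
public static class CompactRowCodec
{
    // Assumes CustomerCode is never null; BinaryWriter.Write(string) rejects null,
    // so nullable fields would need an extra presence flag per value.
    public static byte[] Write(IList<OrderRow> rows)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            using (BinaryWriter writer = new BinaryWriter(stream))
            {
                writer.Write(rows.Count);
                foreach (OrderRow row in rows)
                {
                    writer.Write(row.OrderId);
                    writer.Write(row.CustomerCode);
                    writer.Write(row.Amount);
                }
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    // Reads the values back in exactly the order Write emitted them.
    public static List<OrderRow> Read(byte[] payload)
    {
        using (BinaryReader reader = new BinaryReader(new MemoryStream(payload)))
        {
            int count = reader.ReadInt32();
            List<OrderRow> rows = new List<OrderRow>(count);
            for (int i = 0; i < count; i++)
            {
                OrderRow row = new OrderRow();
                row.OrderId = reader.ReadInt32();
                row.CustomerCode = reader.ReadString();
                row.Amount = reader.ReadDecimal();
                rows.Add(row);
            }
            return rows;
        }
    }
}
```

The resulting byte array can be passed over remoting as a single blob. The trade-off is the one described above: the receiver gets plain values, so rebuilding full entity objects from them costs extra time on the client.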

Otis wrote:

To overcome the data-limit, you should use paging. For reports created on the client, where perhaps 100,000 rows or more are to be read to the client for report-generation, I think that's not that efficient, even with datasets. It's then better to perform the report data calculations on the server and send the data to be bound to the report to the client.

You're right, paging is also something we should investigate later on if the speed is still a problem (no doubt it will be).

Otis
LLBLGen Pro Team
Posts: 39794
Joined: 17-Aug-2003
# Posted on: 30-Jun-2005 11:09:56   

You could opt for having a typedlist filled, then serialize that, pass the blob over, and on the client rebuild the collection from the typedlist, though that indeed might also take some time.

Frans Bouma | Lead developer LLBLGen Pro