Entity Collection Size
Joined: 01-Sep-2005
What are the Limitations on size of Entity Collections - Self Servicing?
We have started looking at retrieving entity collections and are starting to see the application crawl when getting large collections. Each client will probably not require more than 1 or 2 collections at a time for main processing (up to 100 properties - lots of strings) - we can limit the size of these to the low 100's. The client applications will also require the use of lookup tables - 25+ lookup tables with probably 20-25 records each.
This will probably run ok for 1 or 2 users.
We are concerned about the AppServer at 9:00am, when possibly 30-40 users all log on and each receive 1 or 2 moderate-size collections and 25+ lookup table collections (each AppServer may be servicing up to 100 users). This could leave an AppServer holding ((40x25) + (2x40)) 1000+ collections - worst case ((100x25) + (2x100)) 2700+ collections.
We have tried some preliminary tests with plain entities (i.e. no additional properties). Getting 600 entities with 60 properties (a lot of strings) works ok, but 15000 takes so long we gave up!
Joined: 09-Aug-2004
I believe it inherits from CollectionBase, which uses an ArrayList as the internal collection. So in theory you should be able to hold as many elements as your system can handle.
As far as speed goes, do you need all of those properties? I believe you can select which fields of a table you want to retrieve.
Joined: 01-Sep-2005
Our application is very lookup intensive. LLBL cannot create multiple entities for a single table (if I am wrong, please let me know). The number of properties is not something we can change; we need all that data, unless we break up our tables - but we would just end up with a lot of 1:1 table relations.
Joined: 09-Aug-2004
For some reason I thought, and still think, you can have partial entities.
For strict lookup purposes you can create a typed list, in which case you define which fields from which tables you want to pull. Typed lists are read-only and you don't get entities, you get TypedListRow objects. For display this will work fine.
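To give an idea (a sketch only - the generated names below, such as StudentLookupTypedList and its fields, are hypothetical, and the generated API may differ), filling a typed list in self-servicing looks roughly like this:

// Sketch: fetching a designer-defined typed list in self-servicing.
// "StudentLookup" and its fields are hypothetical names.
StudentLookupTypedList lookup = new StudentLookupTypedList();
lookup.Fill();    // fetches only the fields defined in the typed list
for(int i = 0; i < lookup.Rows.Count; i++)
{
    Console.WriteLine(lookup[i].StudentName);   // typed row accessor
}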
Do you need to pull all the data at once? Can you do paging? Caching with distributed machines? Just ideas
Is this a web app or a windows app?
Joined: 01-Sep-2005
This is a WinForms app. I know we can use typed lists etc. but what concerns me is the performance... We have a table we are at the moment performing some testing on - 60 properties, 15000 records. To read in the table and send it over to the client as a DataSet is almost instantaneous (less than 10 seconds). To read in the table and send it over to the client as an entity collection is sloooooow (a little over 10 minutes).
The entity collection is the basic entity generated by LLBL - no additional stuff.
PaulMckenzie wrote:
What are the Limitations on size of Entity Collections - Self Servicing?
We have started looking at retrieving entity collections and are starting to see the application crawl when getting large collections. Each client will probably not require more than 1 or 2 collections at a time for main processing (up to 100 properties - lots of strings) - we can limit the size of these to the low 100's. The client applications will also require the use of lookup tables - 25+ lookup tables with probably 20-25 records each.
The self-servicing collections do a lookup for each Add. This will be fixed in 1.0.2005.1 with a flag which allows you to simply add without the check, like in adapter, which will speed up additions of data to large collections a great deal (the method is already there; it's used internally by the fetcher).
Though 25 elements in a collection shouldn't be a problem at all.
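To see why the per-Add check matters at scale, here's a minimal standalone illustration (not LLBLGen code): a Contains() check per Add makes filling a collection O(n^2), while plain appends are O(n).

// Hypothetical illustration: cost of a uniqueness check per Add vs. plain adds.
using System;
using System.Collections;

class AddCostDemo
{
    static void Main()
    {
        const int n = 15000;

        ArrayList withCheck = new ArrayList();
        DateTime start = DateTime.Now;
        for(int i = 0; i < n; i++)
        {
            if(!withCheck.Contains(i))   // linear scan on every Add: O(n^2) overall
            {
                withCheck.Add(i);
            }
        }
        Console.WriteLine("checked adds: {0}", DateTime.Now - start);

        ArrayList noCheck = new ArrayList();
        start = DateTime.Now;
        for(int i = 0; i < n; i++)
        {
            noCheck.Add(i);              // amortized O(1) append: O(n) overall
        }
        Console.WriteLine("unchecked adds: {0}", DateTime.Now - start);
    }
}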
This will probably run ok for 1 or 2 users.
We are concerned about the AppServer at 9:00am, when possibly 30-40 users all log on and each receive 1 or 2 moderate-size collections and 25+ lookup table collections (each AppServer may be servicing up to 100 users). This could leave an AppServer holding ((40x25) + (2x40)) 1000+ collections - worst case ((100x25) + (2x100)) 2700+ collections.
We have tried some preliminary tests with plain entities (i.e. no additional properties). Getting 600 entities with 60 properties (a lot of strings) works ok, but 15000 takes so long we gave up!
15000 in one collection? I'm not sure how you're looking up your data, but if you have large sets of data for lookup, please consider a hashtable.
The Find function uses a linear search, so with a lot of objects this is slow. If you want an index on an entity collection for a given field, create a hashtable with buckets as the values, in which you store multiple entities which all have the value of the key (if the field doesn't have unique values, you have to do that). Below I've pasted my multi-value hashtable class, which can help you a great deal.
PaulMckenzie wrote:
LLBL cannot create multiple entities for a table (if I am wrong - please let me know). The number of properties is not something we can change we need all that data, unless we breakup our tables - but we will just end up with a lot of 1:1 table relations.
This will come in 1.0.2005.1
PaulMckenzie wrote:
I know we can use typed lists etc. but what concerns me is the performance... We have a table we are at the moment performing some testing on - 60 properties, 15000 records. To read in the table and send it over to the client as a DataSet is almost instantaneous (less than 10 seconds). To read in the table and send it over to the client as an entity collection is sloooooow (a little over 10 minutes).
How are you sending it? Via remoting? That's indeed slow, as a lot of data will create a lot of slowness in the binary serializer. See this thread for more info: http://www.llblgen.com/tinyforum/Messages.aspx?ThreadID=4037
Ok, here's my multi-value hashtable and accompanying helper class. It might help you speed up searches, as you can create simple indexes with this class for fast retrieval of your lookup data, even in large datasets. These classes will be part of the ORMSupportClasses in 1.0.2005.1.
/// <summary>
/// ArrayList which contains solely unique values.
/// </summary>
public class UniqueValueList : ArrayList
{
    /// <summary>
    /// Creates a new <see cref="UniqueValueList"/> instance.
    /// </summary>
    public UniqueValueList() : base()
    {
    }

    /// <summary>
    /// Creates a new <see cref="UniqueValueList"/> instance.
    /// </summary>
    /// <param name="c">Collection of objects to add to this collection. It will use the overridden Add method, so duplicates are skipped.</param>
    public UniqueValueList(ICollection c)
    {
        foreach(object o in c)
        {
            this.Add(o);
        }
    }

    /// <summary>
    /// Adds the range specified. Values already present in the list are skipped.
    /// </summary>
    /// <param name="c">Collection with new objects to add</param>
    public override void AddRange(ICollection c)
    {
        foreach(object o in c)
        {
            // Add() performs the uniqueness check.
            this.Add(o);
        }
    }

    /// <summary>
    /// Inserts the value at the specified index if it's not already present in the list, otherwise it's a no-op.
    /// </summary>
    /// <param name="index">Index to insert at.</param>
    /// <param name="value">Value to insert.</param>
    public override void Insert(int index, object value)
    {
        if(!this.Contains(value))
        {
            base.Insert(index, value);
        }
    }

    /// <summary>
    /// Adds the specified value if it's not already in the list.
    /// </summary>
    /// <param name="value">Value to add.</param>
    /// <returns>index of the value if it's added, or the first index it already appears on</returns>
    public override int Add(object value)
    {
        int index = this.IndexOf(value);
        if(index < 0)
        {
            return base.Add(value);
        }
        return index;
    }

    /// <summary>
    /// Inserts the range at the specified index. Values already present in the list are skipped.
    /// </summary>
    /// <param name="index">Index to insert at.</param>
    /// <param name="c">Collection with values to insert.</param>
    public override void InsertRange(int index, ICollection c)
    {
        // filter out values already in this list, so uniqueness is preserved.
        UniqueValueList toInsert = new UniqueValueList();
        foreach(object o in c)
        {
            if(!this.Contains(o))
            {
                toInsert.Add(o);
            }
        }
        base.InsertRange(index, toInsert);
    }

    /// <summary>
    /// Gets or sets the <see cref="Object"/> at the specified index.
    /// If the value to set is already in the list, the set operation is a no-op.
    /// </summary>
    public override object this[int index]
    {
        get
        {
            return base[index];
        }
        set
        {
            if(!this.Contains(value))
            {
                base[index] = value;
            }
        }
    }
}
/// <summary>
/// Specialized hashtable which can store multiple values for a given key. All values for a key are stored in a UniqueValueList. When the
/// value for a key is requested, the UniqueValueList is returned, not an individual value.
/// </summary>
public class MultiValueHashtable : Hashtable
{
    /// <summary>
    /// Initializes a new instance of the <see cref="MultiValueHashtable"/> class.
    /// </summary>
    /// <param name="capacity">Initial capacity.</param>
    public MultiValueHashtable(int capacity) : base(capacity)
    {
    }

    /// <summary>
    /// Initializes a new instance of the <see cref="MultiValueHashtable"/> class.
    /// </summary>
    public MultiValueHashtable() : base()
    {
    }

    /// <summary>
    /// Creates a new <see cref="MultiValueHashtable"/> instance.
    /// </summary>
    /// <param name="d">Dictionary with the initial contents.</param>
    public MultiValueHashtable(IDictionary d) : base(d)
    {
    }

    /// <summary>
    /// Adds an element with the specified key and value into the <see cref="T:System.Collections.Hashtable"/>.
    /// </summary>
    /// <param name="key">The key of the element to add.</param>
    /// <param name="value">The value of the element to add. If the key already exists, the value is added to the existing list of values
    /// for that key, unless the value is already in that list.</param>
    public override void Add(object key, object value)
    {
        UniqueValueList valuesStored = null;
        if(base.ContainsKey(key))
        {
            // key already present, append to its bucket.
            valuesStored = (UniqueValueList)base[key];
        }
        else
        {
            // new key, create a fresh bucket.
            valuesStored = new UniqueValueList();
            base.Add(key, valuesStored);
        }
        valuesStored.Add(value);
    }

    /// <summary>
    /// Adds the objects in the collection as values for the specified key.
    /// </summary>
    /// <param name="key">Key to store the values under.</param>
    /// <param name="values">Values to store.</param>
    public void Add(object key, ICollection values)
    {
        UniqueValueList valuesStored = null;
        if(base.ContainsKey(key))
        {
            // key already present, append to its bucket.
            valuesStored = (UniqueValueList)base[key];
        }
        else
        {
            // new key, create a fresh bucket.
            valuesStored = new UniqueValueList();
            base.Add(key, valuesStored);
        }
        valuesStored.AddRange(values);
    }

    /// <summary>
    /// Determines whether the hashtable contains the key and, if so, whether the list of values stored under that key contains the value specified.
    /// </summary>
    /// <param name="key">Key to check.</param>
    /// <param name="value">Value to check for.</param>
    /// <returns>true if the key exists and its list of values contains the specified value</returns>
    public virtual bool Contains(object key, object value)
    {
        if(!base.ContainsKey(key))
        {
            return false;
        }
        return ((UniqueValueList)base[key]).Contains(value);
    }

    /// <summary>
    /// Gets / sets the list of values for the given key. Hides the original indexer.
    /// </summary>
    /// <remarks>The getter returns null if the key is not found. The setter adds the passed-in values to the key's bucket.</remarks>
    public new UniqueValueList this[object key]
    {
        get
        {
            if(!base.ContainsKey(key))
            {
                return null;
            }
            return (UniqueValueList)base[key];
        }
        set
        {
            this.Add(key, (ICollection)value);
        }
    }
}
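As a usage example (the entity and field names are hypothetical), the class above can serve as a simple index over a fetched lookup collection:

// Index a fetched collection by a field value so lookups avoid a linear Find.
// Assumes 'students' is an already-fetched collection; StudentEntity and
// CountryCode are hypothetical names.
MultiValueHashtable countryIndex = new MultiValueHashtable();
foreach(StudentEntity student in students)
{
    countryIndex.Add(student.CountryCode, student);   // one bucket per distinct value
}

UniqueValueList newZealanders = countryIndex["NZ"];   // all matches, or null if none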
Joined: 01-Sep-2005
Thanks Otis for the response.
The standard user will not be making this sort of demand on the app, but there will be times (e.g. financial returns, government returns, bulk student processing, end-of-year rollover, etc.) when this sort of bulk entity assessment/processing will be required. A frequent use of this is the importing/exporting of bulk data.
We have undertaken further testing: without remoting, fetching takes ~3-5 seconds (sub-1 second for a DataSet). This is just a load of all records - no sorting, no filtering - i.e. GetMulti(null). A big concern is the memory consumption - 155MB with the collection and 68MB with the DataSet. These are the "out-of-the-box" standard DataSet and the "out-of-the-box" standard LLBLGen Pro generated entities. This is not a big dataset for us; many of our datasets will be in the 100,000's. Some of this processing can, and will, be performed in the DB via stored procedures - some of the processing however is very complex and we would prefer it to be performed in C#.
The major time and memory usage I mentioned for remoting was due to returning the Collection – Approx 10 minutes and 600MB!
One major requirement we have from our clients is the ability to perform “ad-hoc” queries (current functionality in existing App). A consequence of this is the possibility they can/will request all records (e.g. all students). We can assess this and stop it being returned, but the memory usage will be large and in an AppServer servicing up to 100 clients this is a problem! Would paging help with this?
Is the above possible: users creating "ad-hoc" queries to return dynamically created sets of data - e.g. 5 fields from "Student", 2 fields from "Degree", 4 fields from "Course", 3 fields from "Funding", and 5 fields from "Payments"? Also, what type information is returned?
I like what LLBL does and how it is used - but speed and memory usage are a concern when churning through large amounts of data.
Joined: 01-Sep-2005
Another approach is the use of typed lists containing all fields... We have performed some testing and these are significantly faster than entity collections. To read in the table and send it over to the client as a DataSet is almost instantaneous (less than 10 seconds); to read in the table and send it over to the client as a TypedList is also almost instantaneous (less than 10 seconds). The really slow bit appears to be the sending to the client, though reading is slower than with a DataSet. This is good - slower than DataSets, but good... The downside is that typed lists are more memory hungry: between 1.5 and 2 times the memory required by DataSets.
When is your next version, 1.0.2005.1, due out?
PaulMckenzie wrote:
Thanks Otis for the response.
The standard user will not be making this sort of demand on the app, but there will be times (e.g. financial returns, government returns, bulk student processing, end-of-year rollover, etc.) when this sort of bulk entity assessment/processing will be required. A frequent use of this is the importing/exporting of bulk data.
Import/export of data might be a bottleneck, as you're then talking about massive amounts of data in memory to be saved into a db system. Systems like SQL Server's DTS, for example, are much more efficient at this.
We have undertaken further testing: without remoting, fetching takes ~3-5 seconds (sub-1 second for a DataSet). This is just a load of all records - no sorting, no filtering - i.e. GetMulti(null). A big concern is the memory consumption - 155MB with the collection and 68MB with the DataSet. These are the "out-of-the-box" standard DataSet and the "out-of-the-box" standard LLBLGen Pro generated entities. This is not a big dataset for us; many of our datasets will be in the 100,000's. Some of this processing can, and will, be performed in the DB via stored procedures - some of the processing however is very complex and we would prefer it to be performed in C#.
The major time and memory usage I mentioned for remoting was due to returning the Collection – Approx 10 minutes and 600MB!
Yes, at the moment a lot of elements in an entity are not shared. For example, 2 Customer entities have no data in common, not even the names of the fields. With large sets of data this causes a lot of objects to be created in remoting scenarios.
To avoid having a couple of scenarios getting mixed up, I'd like to point out some things. For non-hierarchical sets of objects in an entity collection, a custom formatter for remoting can make a big difference (in .NET you can write your own formatter, which can pack data much more efficiently than you will get with the default binary/soap formatters). To transport an entity collection's contents of non-hierarchical entities (i.e. the entities don't have a reference to a related entity inside themselves) you need:
- the entity factory set on the entity collection
- per entity an array with their currentvalue values, one per field
- per entity an array with their dbvalue values, one per field
- per entity an array with their ischanged flags, one per field
- per entity a flag for whether the entity is new
The Serialize method in your custom formatter packs this data into a datablock which is sent to the client. There, the Deserialize method of your formatter reads the factory first, creates an entity collection object and uses the factory to produce new entities. Then it uses the arrays of values to fill each entity.
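As a rough sketch of that transport format (the type names below are illustrative, not the actual ORMSupportClasses types), the payload could look like this:

// Illustrative packed transport format for a flat entity collection:
// per entity only the raw values and flags travel, no field metadata.
[Serializable]
class PackedEntityData
{
    public object[] CurrentValues;    // current value per field
    public object[] DbValues;         // database value per field
    public bool[]   IsChangedFlags;   // change-tracking flag per field
    public bool     IsNew;            // true if the entity is unsaved
}

[Serializable]
class PackedCollectionData
{
    public IEntityFactory     Factory;  // recreates empty entities client-side
    public PackedEntityData[] Entities; // field names/types come from the factory
}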
For hierarchical object graphs this is not usable, as each object can have a graph inside itself, which makes it very hard to optimize. The DataSet doesn't need to do this, as every row is non-hierarchical, PLUS every row in a dataset doesn't contain field info; it shares that with every other row.
Sending large sets of data to a client (and I mean 100,000's of rows) is not what remoting or webservices are designed for. With these amounts of data you need efficient protocols, like a custom formatter. A client application which pulls 100,000 rows of data to produce a form is better off asking the SERVER to produce the REAL data based on those 100,000 rows and send that instead.
One major requirement we have from our clients is the ability to perform “ad-hoc” queries (current functionality in existing App). A consequence of this is the possibility they can/will request all records (e.g. all students). We can assess this and stop it being returned, but the memory usage will be large and in an AppServer servicing up to 100 clients this is a problem! Would paging help with this?
Paging is REQUIRED if your resultset is larger than a couple of hundred rows, simply because a user can't cope with thousands of rows in a form; no-one will look at 10,000 rows in a grid. Not only are today's datagrids extremely slow with a big set of rows in the datasource, it's also not useful: the user won't look at all the rows anyway. So paging can help you a great deal with this.
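For instance (a sketch only; the entity names are hypothetical and the exact GetMulti overload may differ per runtime version), a paged self-servicing fetch looks roughly like:

// Fetch page 3, 50 rows per page, instead of the whole table.
// StudentCollection / StudentFieldIndex.LastName are hypothetical names.
StudentCollection students = new StudentCollection();
ISortExpression sorter = new SortExpression(
    SortClauseFactory.Create(StudentFieldIndex.LastName, SortOperator.Ascending));
// args: filter, max rows (0 = no limit), sort, relations, page number, page size
students.GetMulti(null, 0, sorter, null, 3, 50);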
Is the above possible: users creating "ad-hoc" queries to return dynamically created sets of data - e.g. 5 fields from "Student", 2 fields from "Degree", 4 fields from "Course", 3 fields from "Funding", and 5 fields from "Payments"? Also, what type information is returned?
If you're creating a new resultset from fields of a couple of entities, you can do that either with a typed list or, if you want to do it dynamically in code, via a dynamic list (please see dynamic lists in the typed view/list documentation in Using the generated code). A dynamic list is created for read-only fetching: you build the resultset from entity fields, specify a filter, aggregates, group by etc., define the page you want and then call the fetch routine, which creates the sql query necessary and fetches the data into a datatable. This thus means the data is read-only, because the data comes from various entities and is therefore not an object graph.
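A rough dynamic list sketch in self-servicing code (the field and relation names are hypothetical, and the GetMultiAsDataTable parameter list is abbreviated from memory; check the documentation for the exact overload):

// Build a read-only resultset from fields of several entities.
ResultsetFields fields = new ResultsetFields(2);
fields.DefineField(StudentFieldIndex.Name, 0, "StudentName");
fields.DefineField(DegreeFieldIndex.Title, 1, "DegreeTitle");

RelationCollection relations = new RelationCollection();
relations.Add(StudentEntity.Relations.DegreeEntityUsingDegreeId);

DataTable results = new DataTable();
TypedListDAO dao = new TypedListDAO();
// fills 'results' with the joined, read-only data; the full overload also
// takes filter, sort, group by and paging parameters.
dao.GetMultiAsDataTable(fields, results, 0, null, null, relations, false, null, null, 0, 0);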
PaulMckenzie wrote:
Another approach is the use of typed lists containing all fields... We have performed some testing and these are significantly faster than entity collections. To read in the table and send it over to the client as a DataSet is almost instantaneous (less than 10 seconds); to read in the table and send it over to the client as a TypedList is also almost instantaneous (less than 10 seconds).
True, a typed list is derived from a DataTable. They're designed to be fast-fetching alternatives for resultsets based on various entities.
The really slow bit appears to be the sending to the client, though reading is slower than with a DataSet. This is good - slower than DataSets, but good... The downside is that typed lists are more memory hungry: between 1.5 and 2 times the memory required by DataSets.
TypedLists don't contain any more data (except for 1 boolean) than a normal DataTable, because they're derived from a DataTable. So they can't use twice the amount of memory, simply because they ARE a DataTable. I find it very strange that they eat up that amount of memory...
When is your next version, 1.0.2005.1, due out?
Beta should start next Monday or, if I don't manage to finish everything, Tuesday.
I'll do some tests as well to see if it will break any code if I move a lot of the inner variables of an EntityField2 object to the non-serialized section (because they're filled in by the factory anyway, I think). The binary formatter's order in which data is handled is a bit obscure, so I have to follow the good ol' trial/error approach.
Ok, after serializing the entity field data in a separate object (one per entity) and skipping the entityfields object, I managed to bring down the size of all Northwind orders plus all their order detail objects (2800 objects) from 5.42MB to 3.49MB. Speed also increased a lot. This is a hierarchy, so it's promising.
The data I store in the entityfield data object is:
private string _alias, _objectAlias;
private object _currentValue, _dbValue;
private bool _isChanged, _isNull;
private AggregateFunction _aggregateFunctionToApply;
private IExpression _expressionToApply;
so change tracking isn't lost at all. All other info is filled in by the factory when creating a new entityfields object.
I'm now trying to squeeze more out of it in other areas.
I think that dropping alias and objectalias from this object also makes sense, as these are set during queries, and this object is solely used when serializing/deserializing an entity, not a predicate; a predicate uses the normal field serializing/deserializing code. Same goes for aggregate/expression.
(edit) Ok, got it to 3MB now. That is with this set:
private object _currentValue, _dbValue;
private bool _isChanged, _isNull;
Serializing/deserializing it to disk takes 2-3 seconds, in a unit test, debug build.
(edit2): writing it to compact xml is, btw, 4.44MB.
I think this adjustment is great: almost 50% smaller.
(edit3): a single flat collection of orderdetail objects (~2000) is now 980KB, was 1.7MB. I can't get it smaller, because the sync info has to be stored as well in the case of a hierarchy. A single flat collection could be smaller, though it would then have to know whether any contained entity object has a related entity or not.
(edit4): whoa! Checking out the .NET 2.0 datatable serialization code, it seems there is another way: bitarrays and object arrays. Should be even smaller. Hold on.
(edit5): ok, using object arrays and bitarrays for the fields / tracking flags (and thus not the object with the fields mentioned above), I managed to bring down a flat orderdetails collection to 761KB (was 1.7MB). The orders with all order details hierarchy is now 2.61MB (was originally 5.42MB).
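A sketch of that packing idea (a standalone illustration, not the actual runtime code): field values travel in one object[] per entity, and the two boolean flags per field collapse into a BitArray, so n fields cost n object slots plus 2n bits instead of separate bool arrays.

// Illustrative packer: values into one object[], flags into one BitArray.
using System;
using System.Collections;

class FieldPacker
{
    // Pack current values plus isChanged/isNull flags for serialization.
    public static void Pack(object[] values, bool[] isChanged, bool[] isNull,
                            out object[] packedValues, out BitArray packedFlags)
    {
        int n = values.Length;
        packedValues = values;                 // values travel as one array
        packedFlags = new BitArray(2 * n);     // 2 bits per field
        for(int i = 0; i < n; i++)
        {
            packedFlags[2 * i] = isChanged[i];
            packedFlags[2 * i + 1] = isNull[i];
        }
    }
}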
Joined: 05-Aug-2005
PaulMckenzie wrote:
We have a table we are at the moment performing some testing on - 60 properties, 15000 records.
Why would you possibly need to send 15,000 items over the wire to the client? What would the user do with a list of items this big?
BOb
Joined: 01-Sep-2005
Sounds like the next version (and the .NET 2 version) will help speed greatly - thanks... When is the .NET 2 version expected?
PaulMckenzie wrote:
Sounds like the next version (and the .NET 2 version) will help speed greatly - thanks... When is the .NET 2 version expected?
v1.0.2005.1, which contains the enhancements, is now in beta. v2.0 is planned for the end of 2005 but could slip to January 2006; it's too early to tell, but we're targeting the end of 2005.
Joined: 01-Sep-2005
PaulMckenzie wrote:
Excellent - what will be the cost?
v2.0? For current customers, 49 euro per developer, for a limited period of time (we move to a per-developer license for v2.0).
Joined: 01-Sep-2005
PaulMckenzie wrote:
What will the cost be for new customers?
Probably 229 euro per developer. We require customers to have a license for the designer. So if you have a team of 4 people and 2 use the designer, you (in v2.0) need 2 licenses.