Tweak in CollectionCore<T>

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 08-Dec-2006 10:14:54   

Hi

In the AddRange() method, I think the line

_contents.Capacity += c.Count;

should be

_contents.Capacity = _contents.Count + c.Count;

Cheers Simon

Otis
LLBLGen Pro Team
Posts: 39943
Joined: 17-Aug-2003
# Posted on: 08-Dec-2006 10:26:30   

Capacity is always >= the count, because the capacity grows as soon as the internal storage has to grow.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 08-Dec-2006 11:06:08   

Otis wrote:

Capacity is always >= the count, because the capacity grows as soon as the internal storage has to grow.

That's right, but the code adds the new count to the existing capacity, so the capacity always increases, even when there is already room for the new items.

Actually, I think it should be

_contents.Capacity = Math.Max(_contents.Capacity, _contents.Count + c.Count);

(I mixed up Capacity with EnsureCapacity, which never reduces the capacity, only increases it.)

By adding the new count to the existing count, you only increase the capacity when it is actually necessary.

    [Test]
    public void TestEmptyCollectionCapacity()
    {
        EntityCollection<VoyageEntity> data = new EntityCollection<VoyageEntity>();
        Console.WriteLine(data.Capacity);
        data.Add(new VoyageEntity());
        Console.WriteLine(data.Capacity);
    }

    [Test]
    public void TestEmptyCollectionCapacityRange()
    {
        EntityCollection<VoyageEntity> data = new EntityCollection<VoyageEntity>();
        Console.WriteLine(data.Capacity);
        data.AddRange(new VoyageEntity[] {new VoyageEntity()});
        Console.WriteLine(data.Capacity);
    }

The former does not increase the Capacity: it stays at 256. The latter's Capacity is 257. If you add 1000 entities, it would be 1256.
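The difference between the two assignments can be reproduced without LLBLGen Pro at all. The following is a minimal sketch that assumes `_contents` behaves like a plain `List<T>`; the class and method names are hypothetical and are not taken from CollectionCore<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the AddRange() logic discussed above.
// A List<int> plays the role of _contents; this is not the LLBLGen Pro code.
public static class CapacityGrowthSketch
{
    // Mirrors the original line: the capacity always grows by c.Count,
    // even if the existing buffer already has room.
    public static int AddRangeOriginal(List<int> contents, ICollection<int> c)
    {
        contents.Capacity += c.Count;
        contents.AddRange(c);
        return contents.Capacity;
    }

    // Mirrors the proposed line: the capacity only grows when the
    // combined count no longer fits in the existing buffer.
    public static int AddRangeProposed(List<int> contents, ICollection<int> c)
    {
        contents.Capacity = Math.Max(contents.Capacity, contents.Count + c.Count);
        contents.AddRange(c);
        return contents.Capacity;
    }
}
```

Called with `new List<int>(256)` and a one-element array, `AddRangeOriginal` returns 257 and `AddRangeProposed` returns 256. With a 1000-element array they return 1256 and 1000 respectively.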

Incidentally, InitClassCore() is called with an initial capacity of 32 or 256, depending on how the collection is created. I think that 256 is way too much, taking up 1KB even when the collection is empty.

List<T> uses 4 as a default and then doubles in size. I've tried various options for presizing collections in my serialization stuff and I've not yet found a worthwhile reason to presize unless the size is actually known. In fact, the time taken often increased because of additional GCs. The internal resizing is extremely quick.
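The growth pattern of `List<T>` can be observed directly. This is a small sketch with hypothetical names; it records every distinct `Capacity` value seen while adding items to a default-constructed list. On Microsoft's implementations an empty `List<T>` reports a capacity of 0, and the default of 4 is allocated on the first `Add`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class ListGrowthSketch
{
    // Adds itemCount items to a default-constructed List<int> and returns
    // each distinct Capacity value observed, starting with the initial one.
    public static List<int> ObservedCapacities(int itemCount)
    {
        var list = new List<int>();
        var capacities = new List<int> { list.Capacity };
        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            // Record the capacity only when it differs from the last one seen.
            if (list.Capacity != capacities[capacities.Count - 1])
            {
                capacities.Add(list.Capacity);
            }
        }
        return capacities;
    }
}
```

For ten items, `ObservedCapacities(10)` yields 0, 4, 8, 16.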

Cheers Simon

Otis
LLBLGen Pro Team
Posts: 39943
Joined: 17-Aug-2003
# Posted on: 08-Dec-2006 11:29:21   

simmotech wrote:

Otis wrote:

Capacity is always >= the count, because the capacity grows as soon as the internal storage has to grow.

That's right, but the code adds the new count to the existing capacity, so the capacity always increases, even when there is already room for the new items.

Actually, I think it should be

_contents.Capacity = Math.Max(_contents.Capacity, _contents.Count + c.Count);

(I mixed up Capacity with EnsureCapacity, which never reduces the capacity, only increases it.)

By adding the new count to the existing count, you only increase the capacity when it is actually necessary.

Good point. I'll change it.

    [Test]
    public void TestEmptyCollectionCapacity()
    {
        EntityCollection<VoyageEntity> data = new EntityCollection<VoyageEntity>();
        Console.WriteLine(data.Capacity);
        data.Add(new VoyageEntity());
        Console.WriteLine(data.Capacity);
    }

    [Test]
    public void TestEmptyCollectionCapacityRange()
    {
        EntityCollection<VoyageEntity> data = new EntityCollection<VoyageEntity>();
        Console.WriteLine(data.Capacity);
        data.AddRange(new VoyageEntity[] {new VoyageEntity()});
        Console.WriteLine(data.Capacity);
    }

The former does not increase the Capacity: it stays at 256. The latter's Capacity is 257. If you add 1000 entities, it would be 1256.

Incidentally, InitClassCore() is called with an initial capacity of 32 or 256, depending on how the collection is created. I think that 256 is way too much, taking up 1KB even when the collection is empty.

That's a tiny mistake on my part. There are two calls to InitClassCore(), one passes 32, the other passes 256. I've now corrected the 256 one to also use 32.

List<T> uses 4 as a default and then doubles in size. I've tried various options for presizing collections in my serialization stuff and I've not yet found a worthwhile reason to presize unless the size is actually known. In fact, the time taken often increased because of additional GCs. The internal resizing is extremely quick.

Cheers Simon

'32' is an arbitrary number I picked; I don't have proof that it fits the most common case. Thinking about how many items a collection typically holds, and wanting to avoid resizes as much as possible, the initial capacity shouldn't be too small, but also not too big (256 is too much: it allocates too much memory in large collections of entities which themselves have collections of related entities).

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 08-Dec-2006 12:36:10   

Otis wrote:

'32' is an arbitrary number I picked; I don't have proof that it fits the most common case. Thinking about how many items a collection typically holds, and wanting to avoid resizes as much as possible, the initial capacity shouldn't be too small, but also not too big (256 is too much: it allocates too much memory in large collections of entities which themselves have collections of related entities).

Hmm, it's hard to come up with a good default value. The testing I am doing at the moment involves a lot of M:1 and 1:M relationships via Prefetch paths and frequently results in just a single entity in the collection. Is there anywhere a DefaultCollectionCapacity could be set? Or maybe the Prefetch path code could call TrimExcess once a member collection is filled?

Cheers Simon

Otis
LLBLGen Pro Team
Posts: 39943
Joined: 17-Aug-2003
# Posted on: 08-Dec-2006 12:53:20   

Not at the moment. A small number also implies memory fragmentation due to resizes, though: the CLR has to allocate a bigger buffer, block-copy the references over and discard the original buffer. Memory fragmentation then leads to compaction in the GC, which can slow things down. That only matters if entities are actually added to the collection, of course, but a collection with capacity == count automatically has to resize on the very next add. My benchmarking tests (as far as you can rely on benchmarks, of course ;) ) suggest that the fewer resizes you have, the faster the code.
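Both effects can be made visible with a plain `List<T>`. The sketch below uses hypothetical names and is not taken from the LLBLGen Pro code: it counts how often the capacity changes for a given initial capacity, and checks that a trimmed list has to grow again on the next add. It counts resizes only and does not measure time or fragmentation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers, for illustration only.
public static class ResizeCountSketch
{
    // Counts how often Capacity changes while adding itemCount items
    // to a list created with the given initial capacity.
    public static int CountResizes(int initialCapacity, int itemCount)
    {
        var list = new List<int>(initialCapacity);
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }

    // Trims a one-item list so that capacity == count, adds one more item
    // and reports whether the capacity had to grow.
    public static bool ResizesAfterTrim()
    {
        var list = new List<int>(32) { 1 };
        list.TrimExcess();              // capacity drops to the count
        int trimmedCapacity = list.Capacity;
        list.Add(2);                    // no spare slot left, so the buffer grows
        return list.Capacity > trimmedCapacity;
    }
}
```

For 1000 items, `CountResizes(1000, 1000)` reports no resizes, and an initial capacity of 32 needs fewer resizes than an initial capacity of 4. `ResizesAfterTrim()` returns true, which is the cost a TrimExcess call in the prefetch path code would carry if more entities were added afterwards.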

Frans Bouma | Lead developer LLBLGen Pro