Bulk Import Question

ChicagoKiwi
Posts: 134
Joined: 04-Mar-2005
# Posted on: 27-Oct-2005 02:20:08   

I've written an import to move data from our legacy system to the new (better, faster, etc.) system. There will be a one-time load of the existing data and then an on-going load of changes (the legacy system stays unfortunately...).

I'm looping through all the records to be imported, applying various business rules to transform them to the new structure, and then committing. I maintain one adapter throughout the import, which I pass to each method that requires it. The commit time varies depending on exactly what's being created in the new structure, but it seems to trend upward overall as I progress through the rows. Is it possible that the adapter is getting "loaded down" with the entities which have been committed and not completely disposed of? I don't think it's a DB issue, because if I stop the import process and then pick up where I left off, the speed seems to increase.
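In rough outline the loop looks something like this (heavily simplified sketch; LegacyRecord, GetLegacyRecords and TransformToNewStructure are just stand-ins for my real code):

    // Simplified sketch of the import loop; type and method names are placeholders.
    using (DataAccessAdapter adapter = new DataAccessAdapter())
    {
        foreach (LegacyRecord record in GetLegacyRecords())
        {
            // Apply the business rules and build the new-structure entities for this record.
            EntityCollection newEntities = TransformToNewStructure(record, adapter);

            // Persist this record's entities before moving on to the next one.
            adapter.SaveEntityCollection(newEntities);
        }
    }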

This is all based on highly unscientific methods of testing and timing, so if you tell me I'm crazy I'll just go away... wink As much as anything, I wanted to check that keeping the same adapter for the entire import is a good idea.

Otis
LLBLGen Pro Team
Posts: 39933
Joined: 17-Aug-2003
# Posted on: 27-Oct-2005 11:20:57   

You should keep an eye on what you create in the way of object graphs. If you add a given entity E to another entity X's collection, E then references X and its graph, so if you save E recursively, X and all its related entities are examined as well.

So if you're importing, stick with collections: don't build graphs, just build flat collections and save them non-recursively.
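In code, the idea is roughly this (a minimal sketch; OrderEntity / OrderEntityFactory and the field names are just stand-ins for your own entities):

    // Build a flat collection of new entities and save it non-recursively,
    // instead of wiring everything up into one big graph and saving that.
    EntityCollection toSave = new EntityCollection(new OrderEntityFactory());

    foreach (LegacyRow row in legacyRows)
    {
        OrderEntity order = new OrderEntity();
        order.LegacyId = row.Id;   // map plain fields; don't set related entity references
        toSave.Add(order);
    }

    // refetch = false, recurse = false: only the entities in the collection
    // are persisted, no graph traversal per entity.
    adapter.SaveEntityCollection(toSave, false, false);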

Also, if you're doing a bulk import inside 1 transaction, it will be slower.

Frans Bouma | Lead developer LLBLGen Pro
ChicagoKiwi
Posts: 134
Joined: 04-Mar-2005
# Posted on: 27-Oct-2005 14:57:29   

Otis wrote:

You should keep an eye on what you create in the way of object graphs. If you add a given entity E to another entity X's collection, E then references X and its graph, so if you save E recursively, X and all its related entities are examined as well.

So if you're importing, stick with collections: don't build graphs, just build flat collections and save them non-recursively.

I'm deliberately doing non-recursive saves already; however, some of my keys are being defined during the import (as part of a sequence), so I need the refetch and the FK/PK sync that comes from adding an entity to another entity's collection. I'm minimizing this as much as possible.
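For example, something along these lines (simplified; CustomerEntity/OrderEntity and the Build... helpers are hypothetical stand-ins):

    // The parent's PK comes from a sequence, so it's only known after a save with refetch.
    CustomerEntity customer = BuildCustomerFromLegacy(legacyRecord);
    adapter.SaveEntity(customer, true);       // refetchAfterSave = true, so the new PK is read back

    // Adding the child to the parent's collection gives me the FK/PK sync,
    // so the order picks up the refetched customer PK.
    OrderEntity order = BuildOrderFromLegacy(legacyRecord);
    customer.Orders.Add(order);

    adapter.SaveEntity(order, false, false);  // non-recursive save of the child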

Otis wrote:

Also, if you're doing a bulk import inside 1 transaction, it will be slower.

I'm doing the import as one transaction per source entity (which results in many destination entities) but I'm keeping the same adapter open for the entire import.

I noticed that memory usage creeps up through the import, but when the adapter is closed a significant amount gets released. I'm assuming I need to do something different from what I'm currently doing to make this more efficient.

Any ideas? confused

Otis
LLBLGen Pro Team
Posts: 39933
Joined: 17-Aug-2003
# Posted on: 01-Nov-2005 11:20:57   

ChicagoKiwi wrote:

Otis wrote:

You should keep an eye on what you create in the way of object graphs. If you add a given entity E to another entity X's collection, E then references X and its graph, so if you save E recursively, X and all its related entities are examined as well.

So if you're importing, stick with collections: don't build graphs, just build flat collections and save them non-recursively.

I'm deliberately doing non-recursive saves already; however, some of my keys are being defined during the import (as part of a sequence), so I need the refetch and the FK/PK sync that comes from adding an entity to another entity's collection. I'm minimizing this as much as possible.

You only need one reference from one graph to another...

Otis wrote:

Also, if you're doing a bulk import inside 1 transaction, it will be slower.

I'm doing the import as one transaction per source entity (which results in many destination entities) but I'm keeping the same adapter open for the entire import.

As long as you commit the transaction, it should be OK, but during a transaction the save will be slower. I use hashtables to keep track of which entities are participating in the transaction, so that shouldn't be a problem over time... adding an entry to a hashtable is a constant-time operation, not something that accumulates as the import progresses.

I noticed that memory usage creeps up through the import, but when the adapter is closed a significant amount gets released. I'm assuming I need to do something different from what I'm currently doing to make this more efficient. Any ideas? confused

What you could try for testing is to close the connection as well when you commit. It might be that the Oracle ODP.NET resources are what you see building up in memory; these aren't necessarily freed as long as the connection is kept open.
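Something along these lines, for testing (a sketch; newEntities is whatever you save per legacy record):

    // Don't keep the physical connection open across legacy records; close it
    // after each commit so the provider's resources can be released.
    adapter.KeepConnectionOpen = false;

    adapter.StartTransaction(IsolationLevel.ReadCommitted, "ImportRecord");
    try
    {
        adapter.SaveEntityCollection(newEntities);
        adapter.Commit();
    }
    catch
    {
        adapter.Rollback();
        throw;
    }
    finally
    {
        // Explicitly close the connection once this record's work is done.
        adapter.CloseConnection();
    }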

Frans Bouma | Lead developer LLBLGen Pro