It's never a trivial job to integrate databases when the records don't have unique primary keys in all databases. A few weeks ago I built a similar integration script for which I decided to use Entity Framework.
First the good news. With EF's DbContext
API it's ridiculously easy to insert a complete object graph and make EF take care of all newly generated primary keys an foreign keys. The reason why this is so easy is that when an object's state is changed to Added
all of its adhering objects become Added
as well and EF figures out the right order of inserts. This is truly great! It made me build the core of the copy routine in a few hours, which would have been many days if I should have done it in T-SQL for example. The latter is much much more error prone too.
Of course life isn't that easy. Now the bad news:
This takes tons of machine resources. Of course I used a new context instance for each copy step, but still I had to execute the program on a machine with a decent processor and a fair amount of internal memory. The exact specifications don't matter, the message is: test with the largest databases and see what kind of beast you need. If the memory consumption can't be managed by any machine at your disposal, you have to split up the routine in smaller chunks, but that will take more programming.
The object graph that's changed to Added
must be divergent. By this I mean that there should only be 1-n
associations starting from the root. The reason is, EF will really mark all objects as Added
. So if somewhere in the graph a few branches refer back to the same object (because there is a n-1
association), these "new" objects will be multiplied, because EF doesn't know their identity. An example of this could be Company -< Customer
-< Order
>- OrderType
: when there are only 2 order types, inserting one root company with 10 customers with 10 orders each will create 100 order type records in stead of 2.
So the hard part is to find paths your class structure that are divergent as much as possible. This won't always be possible. If so, you'll have to add the leaves of the converging paths first. In the example: first insert order types. When a new company is inserted you first load the existing order types into the context and then add the company. Now link the new orders to the existing order types. This can only be done if you can match objects by natural keys (in this example: the order type names), but usually this is possible.
You must take care not to insert multiple copies of master data. Suppose the order types in the previous example are the same in all databases (although their primary keys may differ!). The order types from the source database should not be reinserted in the target database. Moreover, you must fix the references in the source data to the correct records in the target database (again by matching by natural key).
So although it wasn't trivial it was doable and the job was done in a relatively short time. I'm sure that other alternatives (t-SQL, integration services, BIDS, if doable at all) would have taken more time or would have been more buggy. And the problem with bugs in this area is that they may become apparent much later.
I later found out that the issues I describe under 2) are related to fetching the source objects with AsNoTracking
. See this interesting post: Entity Framework 6 - use my getHashCode(). I used AsNoTracking
because it performs better and it reduces memory consumption.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…