Below is my application database table which contains SQL queries stored in a table: QueryStorage
Id Query ConnectionString Rdbms
1 select... Data Source Sql Server
2 select... Data Source Oracle
The SQL queries in the above table are updated through web service and we are not allowed to update above the queries, though we can add something on top of the query something like this:
Query stored in table: select id as LinkedColumn, Amount as CompareColumn from Source
Tweaked query from my c# app:select Q.LinkedColumn, Q.CompareColumn from (stored sql query) as Q
I am trying to compare 2 unordered list like below:
Query executed for Id = 1(Sql server)
from QueryStorage table records are like below:
select Id as LinkedColumn,CompareColumn from Source
List 1:
LinkedColumn CompareColumn
1 100
2 200
3 300
4 400
5 500
6 600
7 700
8 800
9 900
10 1000
Query executed for Id = 2(Oracle)
from QueryStorage table records are like below:
select Id as LinkedColumn,CompareColumn from Target
List 2:
LinkedColumn CompareColumn
10 10
9 20
8 30
7 40
6 50
5 60
4 70
3 80
2 90
1 5
I want to join on LinkedColumn from source to target
and then do comparison on CompareColumn
which should give me following output:
SrcLinkedColumn SrcCompareColumn TgtLinkedColumn TgtCompareColumn
1 100 1 5
2 200 2 90
Logic:
var data = (from s in List1.AsEnumerable()
join t in List2.AsEnumerable() on s.Field<string>("LinkedColumn") equals t.Field<string>("LinkedColumn")
where s.Field<decimal>("CompareColumn") != t.Field<decimal>("CompareColumn")
select new
{
srcLinkedcol = s.Field<string>("LinkedColumn"),
srcCompareCol = s.Field<decimal>("CompareColumn"),
tgtLinkedCol = t.Field<string>("LinkedColumn"),
tgtCompareCol = t.Field<decimal>("CompareColumn")
}).ToList();
There will be millions of records from source to target which I want to compare with so big issue is of out of memory exception
which we are facing right now by loading all data in memory and then doing comparison with above linq query.
There are 2 solutions which I have thought of like below:
1) Open 2 ordered data reader.
Pros :
- No memory exception
- Fast as there will be 1 to 1 comparision of LinkedColumn for List1 and
List2 records.
Cons :
- Order by is require on LinkedColumns and as i have no control over
query as because it is dumped by webservice in QueryStorage table so
user is explicitly require to submit query with order by on
LinkedColumn.
- Wont work if order by on Linkedcolumn is not present.
- Order by query have performance overhead so because of this user may not include order by on LinkedColumn in query.
2) Compare chunk by chunk records by tweaking query and adding OffSet and FetchRowNext
. This is how I am thinking for algorithm:
But i still feel that with second approach i can get memory exception issue because at some steps where data from source and target are not matching i will be storing them inside buffer(datatable or list etc..) for next chunk comparision.
Can anybody please guide me what should be the good algorithm for this or any better way to address this?
Note : I dont want to use LinkedServer and SSIS.
See Question&Answers more detail:
os