I was hoping a savant would answer this one and tell me I was dumb for asking, but here are my current thoughts; maybe this will bring them out. The path forward as I see it is below.
Given only read access, it is best to pull all data at the onset from the different sources into an environment where I have full development rights (e.g., a PostgreSQL or MySQL server on localhost if the data is small, or possibly a single development area on Azure that the enterprise grants full development rights on). Build this extract in an ETL tool such as SSIS, and rerun it whenever updated data is needed to capture any delta in the sources. Then perform the joins on this consolidated copy.
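For concreteness, here's roughly what that first pull could look like in Python (pandas + SQLAlchemy) instead of SSIS; the connection strings, queries, and table names are all made-up placeholders:

```python
# A minimal sketch of the "pull everything into one staging DB" step.
# Every DSN, query, and table name below is a hypothetical placeholder.
import pandas as pd
from sqlalchemy import create_engine

# Read-only sources (placeholder DSNs and queries).
SOURCES = {
    "crm_customers": (
        "mssql+pyodbc://readonly@crm-server/crm?driver=ODBC+Driver+17+for+SQL+Server",
        "SELECT * FROM dbo.Customers",
    ),
    "erp_orders": (
        "oracle+oracledb://readonly@erp-server/ERP",
        "SELECT * FROM ORDERS",
    ),
}

# Local staging database where I have full development rights.
staging = create_engine("postgresql+psycopg2://dev:dev@localhost:5432/staging")

for table_name, (dsn, query) in SOURCES.items():
    src = create_engine(dsn)
    df = pd.read_sql(query, src)                  # full snapshot of the source
    df.to_sql(table_name, staging,
              if_exists="replace", index=False)   # overwrite on each refresh
    print(f"loaded {len(df)} rows into staging.{table_name}")
```

Rerunning the script is the crude "delta" strategy here: it just replaces each staging table with a fresh snapshot, which is fine at small scale.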
This makes sense because I need to experiment with the joins, the data is small (or transferable) enough that duplicating it is not a resource hog, and I can run the experimental joins directly in a SQL editor instead of an ETL tool.
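Once everything lives in one staging database, the cross-source join becomes an ordinary single-server query. Something like this (against the same made-up tables as above):

```python
# The experimental join, now runnable against a single server.
# Table and column names are the hypothetical ones loaded above.
import pandas as pd
from sqlalchemy import create_engine

staging = create_engine("postgresql+psycopg2://dev:dev@localhost:5432/staging")

joined = pd.read_sql(
    """
    SELECT c.customer_id, c.name, o.order_id, o.order_total
    FROM crm_customers AS c
    JOIN erp_orders    AS o ON o.customer_id = c.customer_id
    """,
    staging,
)
print(joined.head())
```

In practice I'd iterate on that SQL in a query editor and only move it into code or an ETL package once the join logic is settled.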
If I had a much larger dataset, I would sample it down, perform the same exercise, and then, once the cross-server join was understood, execute it at scale in an ETL tool, probably in some batched fashion that could eventually be run in parallel.
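A rough sketch of that scaled-up variant, under the same hypothetical names: sample first to prototype the join, then stream each large table into staging in fixed-size chunks, with one worker per table so the per-table loads run in parallel:

```python
# Sketch of the batched, parallel load. DSNs and table names are
# hypothetical; chunked reads via pandas' chunksize are real behaviour.
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
from sqlalchemy import create_engine

STAGING_DSN = "postgresql+psycopg2://dev:dev@localhost:5432/staging"

def sample(dsn: str, table: str, n: int = 10_000) -> pd.DataFrame:
    """Pull a small sample to prototype the join against.
    LIMIT syntax varies by dialect (TOP / FETCH FIRST elsewhere)."""
    src = create_engine(dsn)
    return pd.read_sql(f"SELECT * FROM {table} LIMIT {n}", src)

def load_in_batches(dsn: str, table: str, chunksize: int = 50_000) -> None:
    """Stream one large source table into staging in fixed-size chunks."""
    src = create_engine(dsn)
    staging = create_engine(STAGING_DSN)  # one engine per worker thread
    first = True
    for chunk in pd.read_sql(f"SELECT * FROM {table}", src, chunksize=chunksize):
        chunk.to_sql(table.lower(), staging,
                     if_exists="replace" if first else "append", index=False)
        first = False

TABLES = [
    ("mssql+pyodbc://readonly@crm-server/crm?driver=ODBC+Driver+17+for+SQL+Server",
     "Customers"),
    ("oracle+oracledb://readonly@erp-server/ERP", "ORDERS"),
]

# One thread per source table; each table still loads in serial chunks.
with ThreadPoolExecutor(max_workers=len(TABLES)) as pool:
    for dsn, table in TABLES:
        pool.submit(load_in_batches, dsn, table)
```

The same shape translates directly to an SSIS package: one data-flow task per source table, each reading in batches, with the tasks allowed to execute concurrently.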
He who fights the dragon too long becomes the dragon himself; gaze too long into the abyss, and the abyss gazes back into you…