Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
385 views
in Technique[技术] by (71.8m points)

sql server - best way to "glue" columns together

I need to combine columns from about 15 tables in one large table. Something the following works. But it takes very long to run while the CPU usage spikes to 100%, which causes concerns. Any suggestions will be highly appreciated.

declare @t1 table (empid int)
declare @t2 table (empid int, phone varchar(50))
declare @t3 table (empid int, license varchar(50))
declare @t4 table (empid int, email varchar(100))

insert into @t1 values (1)
insert into @t1 values (2)
insert into @t1 values (3)
insert into @t2 values (1, '5551234')
insert into @t2 values (2, '5553333')
insert into @t2 values (2, 'ttt2222')
insert into @t3 values (2, 'L4455')
insert into @t3 values (3, 'L7890')
insert into @t4 values (2, 'xxx@abc')

SELECT t1.empid, t2.phone, t3.license, t4.email
FROM
    @t1 t1
    LEFT OUTER JOIN
    (SELECT empid, phone, row_number() over (partition by empid order by phone) as rn 
    FROM @t2) t2 
    ON t2.empid = t1.empid
    FULL OUTER JOIN
    (SELECT empid, license, row_number() over (partition by empid order by license) as rn 
    FROM @t3) t3 
    ON t3.empid=t1.empid and (t2.rn is null or t3.rn = t2.rn)
    FULL OUTER JOIN
    (SELECT empid, email, row_number() over (partition by empid order by email) as rn 
    FROM @t4) t4
    ON t4.empid=t1.empid and t4.rn=coalesce(t2.rn, t3.rn) --image how long this coalesce clause is going to be for the 15th table?
order by t1.empid, t2.rn
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your question is not really clear and it would be helpful if you included expected result. Let me guess what you want...

I'll give more meaningful names to the tables in your example and add few more rows to highlight the problem. In real life these tables would be real tables, of course, not variables, but I'll stick with variables to make this sample script easy to run and try. I'm using SQL Server 2008 for this example.

declare @TMain table (empid int);
declare @TPhones table (empid int, phone varchar(50));
declare @TLicenses table (empid int, license varchar(50));
declare @TEmails table (empid int, email varchar(100));

insert into @TMain values (1);
insert into @TMain values (2);
insert into @TMain values (3);
insert into @TMain values (4);

insert into @TPhones values (1, '5551234');
insert into @TPhones values (2, '5551111');
insert into @TPhones values (2, '5552222');
insert into @TPhones values (2, '5553333');
insert into @TPhones values (2, '5554444');

insert into @TLicenses values (2, 'L4455');
insert into @TLicenses values (3, 'L7890');

insert into @TEmails values (2, 'xxx@abc');
insert into @TEmails values (2, 'yyy@abc');
insert into @TEmails values (2, 'zzz@abc');

Simple variant

There is a fast, efficient and wrong naive approach:

SELECT
    Main.empid
    ,Phones.phone
    ,Licenses.license
    ,Emails.email
FROM
    @TMain AS Main
    LEFT JOIN @TPhones AS Phones ON Phones.empid = Main.empid
    LEFT JOIN @TLicenses AS Licenses ON Licenses.empid = Main.empid
    LEFT JOIN @TEmails AS Emails ON Emails.empid = Main.empid
ORDER BY Main.empid, phone, license, email;

It produces Cartesian product of all rows and duplicates rows. This is the result set of the query above. You can see that empid = 2 returned 12 rows, which is 4 phones multiplied by 3 emails and by 1 license. My guess is that you want to see only 4 rows for empid = 2. In other words, for each empid the result should have minimum possible number of rows (I'll show the correct result set in the end).

empid   phone   license email
1   5551234 NULL    NULL
2   5551111 L4455   xxx@abc
2   5551111 L4455   yyy@abc
2   5551111 L4455   zzz@abc
2   5552222 L4455   xxx@abc
2   5552222 L4455   yyy@abc
2   5552222 L4455   zzz@abc
2   5553333 L4455   xxx@abc
2   5553333 L4455   yyy@abc
2   5553333 L4455   zzz@abc
2   5554444 L4455   xxx@abc
2   5554444 L4455   yyy@abc
2   5554444 L4455   zzz@abc
3   NULL    L7890   NULL
4   NULL    NULL    NULL

Long variant

I'm not sure whether my proposed approach below is more efficient than yours. You'll have to try both and compare performance for your data.

We'll need a table of numbers. SQL, Auxiliary table of numbers http://web.archive.org/web/20150411042510/http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html http://dataeducation.com/you-require-a-numbers-table/

Again, in real life you'll have a proper table of numbers, but for this example I'll use the following:

declare @TNumbers table (Number int);
insert into @TNumbers values (1);
insert into @TNumbers values (2);
insert into @TNumbers values (3);
insert into @TNumbers values (4);
insert into @TNumbers values (5);

The main idea behind my approach is to make a helper table that would contain correct number of rows for each empid at first and then use this table to get results efficiently.

We'll start with counting number of phones, licenses, e-mails for each empid:

WITH
CTE_Rows
AS
(
    SELECT Phones.empid, COUNT(*) AS EmpRows
    FROM @TPhones AS Phones
    GROUP BY Phones.empid

    UNION ALL

    SELECT Licenses.empid, COUNT(*) AS EmpRows
    FROM @TLicenses AS Licenses
    GROUP BY Licenses.empid

    UNION ALL

    SELECT Emails.empid, COUNT(*) AS EmpRows
    FROM @TEmails AS Emails
    GROUP BY Emails.empid
)

Then we calculate the maximum number of rows for each empid:

,CTE_MaxRows
AS
(
    SELECT
        CTE_Rows.empid
        ,MAX(CTE_Rows.EmpRows) AS MaxEmpRows
    FROM CTE_Rows
    GROUP BY CTE_Rows.empid
)

The CTE above has one row for each empid: empid itself and a maximum number of phones, licenses, e-mails for this empid. Now we need to expand this table and generate the given number of rows for each empid. Here I'm using the Numbers table for it:

,CTE_RowNumbers
AS
(
SELECT
    CTE_MaxRows.empid
    ,Numbers.Number AS rn
FROM
    CTE_MaxRows
    CROSS JOIN @TNumbers AS Numbers
WHERE
    Numbers.Number <= CTE_MaxRows.MaxEmpRows
)

Then we need to add row numbers to all tables with data, which we'll use for joining later:

,CTE_Phones
AS
(
    SELECT
        Phones.empid
        ,ROW_NUMBER() OVER (PARTITION BY Phones.empid ORDER BY phone) AS rn
        ,Phones.phone
    FROM @TPhones AS Phones
)
,CTE_Licenses
AS
(
    SELECT
        Licenses.empid
        ,ROW_NUMBER() OVER (PARTITION BY Licenses.empid ORDER BY license) AS rn
        ,Licenses.license
    FROM @TLicenses AS Licenses
)
,CTE_Emails
AS
(
    SELECT
        Emails.empid
        ,ROW_NUMBER() OVER (PARTITION BY Emails.empid ORDER BY email) AS rn
        ,Emails.email
    FROM @TEmails AS Emails
)

Now we are ready to join all this together. CTE_RowNumbers has exact number of rows that we need, so there is no need for complex FULL JOINs here, simple LEFT JOIN is enough:

,CTE_Data
AS
(
    SELECT
        CTE_RowNumbers.empid
        ,CTE_Phones.phone
        ,CTE_Licenses.license
        ,CTE_Emails.email
    FROM
        CTE_RowNumbers
        LEFT JOIN CTE_Phones ON CTE_Phones.empid = CTE_RowNumbers.empid AND CTE_Phones.rn = CTE_RowNumbers.rn
        LEFT JOIN CTE_Licenses ON CTE_Licenses.empid = CTE_RowNumbers.empid AND CTE_Licenses.rn = CTE_RowNumbers.rn
        LEFT JOIN CTE_Emails ON CTE_Emails.empid = CTE_RowNumbers.empid AND CTE_Emails.rn = CTE_RowNumbers.rn
)

We are almost done. I guess, it is possible that the main table has some empids that don't have any related data (no phones, no liceses, no e-mails), like empid = 4 in my sample data. To get these empids in the result set I'll left join the CTE_Data to the main table:

SELECT
    Main.empid
    ,CTE_Data.phone
    ,CTE_Data.license
    ,CTE_Data.email
FROM
    @TMain AS Main
    LEFT JOIN CTE_Data ON CTE_Data.empid = Main.empid
ORDER BY Main.empid, phone, license, email;

To get the full script just put all code blocks from this post together in the same order as they appear here.

This is the result set:

empid   phone   license email
1   5551234 NULL    NULL
2   5551111 L4455   xxx@abc
2   5552222 NULL    yyy@abc
2   5553333 NULL    zzz@abc
2   5554444 NULL    NULL
3   NULL    L7890   NULL
4   NULL    NULL    NULL

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...