Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


sql - Efficient query to split a delimited column into a separate table

I have some data that includes a column with delimited data. Essentially, each value in that column holds multiple records:

A0434168.A2367943.A18456972.A0135374.A0080362.A0084546.A0100991.A0064071.A0100858

The values are of variable length, and delimited by periods. I've been attempting to create a lookup table for this data, using a cursor. Due to the volume of data, the cursor is unreasonably slow.

My cursor looks like the following:

DECLARE @ptr nvarchar(160)
DECLARE @aui nvarchar(15)
DECLARE @getmrhier3 CURSOR 

SET @getmrhier3 = CURSOR FORWARD_ONLY FOR
    SELECT  cast(ptr as nvarchar(160)),aui
    FROM    mrhier3
OPEN @getmrhier3
FETCH NEXT
    FROM @getmrhier3 INTO @ptr, @aui

WHILE @@FETCH_STATUS = 0
BEGIN
    if(len(@ptr) > 0)
    begin
        if(charindex('.',@ptr) > 0)
        begin
            insert into mrhierlookup(hieraui,aui)
            values      (substring(@ptr,0,charindex('.',@ptr)),@aui)

            update  mrhier3
            set     ptr = substring(@ptr,charindex('.',@ptr)+1,LEN(@ptr))
            where   aui = @aui 
              and   ptr = @ptr
        end
        else
        begin
            insert into mrhierlookup(hieraui,aui)
            values      (@ptr,@aui)

            update  mrhier3
            set     ptr = ''
            where   aui = @aui 
              and   ptr = @ptr
        end
    end
    FETCH NEXT
        FROM @getmrhier3 INTO @ptr, @aui
END

CLOSE       @getmrhier3
DEALLOCATE  @getmrhier3

The current version of the cursor only handles the leading value of the column: each pass inserts the first value into the lookup table and strips it from ptr. All lengths are arbitrary. The column is at most ~150 characters long.

With the current dataset, building the lookup table will likely take days. It will have several million records.

Is there a more efficient way to parse this data out into a separate table, so that join operations run faster?


1 Reply


Create a split function:

CREATE FUNCTION dbo.SplitStrings(@List NVARCHAR(MAX))
RETURNS TABLE
AS
   RETURN ( SELECT Item FROM
       ( SELECT Item = x.i.value('(./text())[1]', 'nvarchar(max)')
         FROM ( SELECT [XML] = CONVERT(XML, '<i>'
         + REPLACE(@List, '.', '</i><i>') + '</i>').query('.')
           ) AS a CROSS APPLY [XML].nodes('i') AS x(i) ) AS y
       WHERE Item IS NOT NULL
   );
GO
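
As a quick sanity check, the function can be called on a literal taken from the sample data in the question. This is only a sketch and assumes the function above has already been created in the current database:

```sql
-- Hypothetical smoke test: split a short sample value on periods.
-- Row order is not guaranteed without an ORDER BY.
SELECT Item
FROM dbo.SplitStrings(N'A0434168.A2367943.A18456972');
```

This should return one row per period-delimited value. Note that the function builds XML from the input, so it relies on the values not containing characters such as & or <, which holds for the alphanumeric identifiers shown in the question.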

Then get rid of all the cursor and looping nonsense and do this:

INSERT dbo.mrhierlookup
(
  hieraui,
  aui
)
SELECT s.Item, m.aui
  FROM dbo.mrhier3 AS m
  CROSS APPLY dbo.SplitStrings(m.ptr) AS s
GROUP BY s.Item, m.aui;
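
On SQL Server 2016 or later, with the database at compatibility level 130 or higher, the built-in STRING_SPLIT function may be able to stand in for the custom function. That version requirement is an assumption; the question does not say which version is in use. A rough sketch using the same table and column names as the question:

```sql
-- Assumes SQL Server 2016+ and database compatibility level >= 130.
-- The CAST mirrors the one in the question's cursor, in case ptr is not
-- already a char/varchar/nvarchar column.
INSERT dbo.mrhierlookup (hieraui, aui)
SELECT DISTINCT s.value, m.aui
FROM dbo.mrhier3 AS m
CROSS APPLY STRING_SPLIT(CAST(m.ptr AS nvarchar(160)), N'.') AS s
WHERE s.value <> N'';   -- skip empty pieces from leading, trailing, or doubled periods
```

SELECT DISTINCT plays the same role here as the GROUP BY above: it collapses duplicate (value, aui) pairs before the insert.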
