Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
279 views
in Technique[技术] by (71.8m points)

c# - SSIS Task for inconsistent column count import?

Problem.

I regularly receive a feed files from different suppliers. Although the column names are consistent the problem comes when some suppliers send text files with more or less columns in there feed file.

Furthermore the arrangement of these files are inconsistent.

Other than the Dynamic data flow task provided by Cozy Roc is there another way I could import these files. I am not a C# guru but i am driven torwards using a "Script Task" control flow or "Script Component" Data flow task.

Any suggestion, samples or direction will greatly be appreciated.

http://www.cozyroc.com/ssis/data-flow-task

Some forums

http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400

http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Off the top of my head, I have a 50% solution for you.

The problem

SSIS really cares about meta data so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent meta data makes use of the Flat File Source troublesome.

Query based solution

If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table-the order of columns does not matter nor does the presence of extra columns matter.

Variables

I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.

  • InputFolder is hard wired to the source folder. In my example, it's C:ssisdataKipreal
  • CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
  • Query is an expression "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFilename]

variables window

Connection manager

Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the ConnectionManager of "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties="text;HDR=Yes;FMT=CSVDelimited;";"

Control Flow

My Control Flow looks like a Data flow task nested in a Foreach file enumerator

control flow

Foreach File Enumerator

My Foreach File enumerator is configured to operate on files. I put an expression on the Directory for @[User::InputFolder] Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully Qualified", choose "Name and Extension"

Foreach File Enumerator - Collection tab

In the Variable Mappings tab, assign the value to our @[User::CurrentFileName] variable

Foreach File Enumerator - Variable Mappings tab

At this point, each iteration of the loop will change the value of the @[User::Query to reflect the current file name.

Data Flow

This is actually the easiest piece. Use an OLE DB source and wire it as indicated.

Data flow

Use the FileOLEDB connection manager and change the Data Access mode to "SQL Command from variable." Use the @[User::Query] variable in there, click OK and you're ready to work. oledb file source

Sample data

I created two sample files input5columns.csv and input7columns.csv All of the columns of 5 are in 7 but 7 has them in a different order (col2 is ordinal position 2 and 6). I negated all the values in 7 to make it readily apparent which file is being operated on.

col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444

and

col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222

Running the package results in these two screen shots

5 column file 7 column file

What's missing

I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...