Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Want VBA in Excel to read a very large CSV and create an output file containing a small subset of the CSV

I have a CSV file of 1.2 million text records. The alphanumeric fields are wrapped in quotation marks; the date/time and numeric fields are not.

For example "Fred","Smith",01/07/1967,2,"7, The High Street","Anytown","Anycounty","LS1 7AA"

What I want to do is write some VBA in Excel (more or less the only tool available to me that I am reasonably proficient in) that reads the CSV record by record, performs a check (as it happens, on the last field, the post code) and then outputs a small subset of the 1.2m records to a new output file.

I understand how to open the two files, read a record, do what I need to do with the data and write it out (I will just output the input record with a prefix denoting an exception type).

What I don't know is how to parse the CSV in VBA properly. I can't do a simple text scan and search for commas, because the text sometimes contains commas (which is why the text fields are wrapped in quotation marks).

Is there a fantastic command that would let me quickly get the data from the nth field in my record?

What I want is s_work = field(s_input_record,5) where 5 is the field number in my CSV....

Many thanks, C


1 Reply


The following code should do the trick. I don't have Excel in front of me, so I haven't tested it, but the concept is sound.

If this ends up being too slow, we can look at ways to improve the efficiency.

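' The caller supplies the full paths of the input and output files.
Sub SelectSomeRecords(inputFileName As String, outputFileName As String)
    Dim testLine As String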

    Open inputFileName For Input As #1
    Open outputFileName For Output As #2

    While Not EOF(1)
        Line Input #1, testLine
        If RecordIsInteresting(testLine) Then
            Print #2, testLine
        End If
    Wend

    Close #1
    Close #2
End Sub

Function RecordIsInteresting(recordLine As String) As Boolean
    Dim lineItems(1 To 8) As String

    ' A Sub is called without parentheses around its argument list
    GetRecordItems lineItems, recordLine

    ' do your custom checking here:
    RecordIsInteresting = (lineItems(8) = "LS1 7AA")
End Function

Sub GetRecordItems(items() As String, recordLine As String)
    Dim itemString As String
    Dim itemIndex As Long
    Dim charIndex As Long
    Dim inQuote As Boolean
    Dim testChar As String

    inQuote = False
    charIndex = 1
    itemIndex = LBound(items)
    itemString = ""

    While charIndex <= Len(recordLine)
        testChar = Mid$(recordLine, charIndex, 1)

        If inQuote Then
            If testChar <> Chr$(34) Then
                itemString = itemString & testChar
            ElseIf Mid$(recordLine, charIndex + 1, 1) = Chr$(34) Then
                ' a doubled quote inside a quoted field is a literal quote
                itemString = itemString & testChar
                charIndex = charIndex + 1
            Else
                ' closing quote; the following comma ends the field
                inQuote = False
            End If
        ElseIf testChar = Chr$(34) Then
            inQuote = True
        ElseIf testChar = "," Then
            ' an unquoted comma ends the current field
            If itemIndex <= UBound(items) Then items(itemIndex) = itemString
            itemString = ""
            itemIndex = itemIndex + 1
        Else
            itemString = itemString & testChar
        End If

        charIndex = charIndex + 1
    Wend

    ' store the final field, which has no trailing comma;
    ' fields beyond the size of the array are ignored
    If itemIndex <= UBound(items) Then items(itemIndex) = itemString
End Sub
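
If you would rather have the one-line call from the question (s_work = field(s_input_record,5)), the same quote-aware scan can be wrapped in a function that returns only the nth field. This is a minimal, untested sketch: the name CsvField is my own (VBA has no built-in for this), it assumes each record sits on a single line, and it simply drops quote characters, so a doubled quote inside a quoted field is not preserved.

```vba
' Hypothetical helper: returns field number fieldNumber (1-based) from one
' CSV record, or "" if the record has fewer fields than that.
Function CsvField(recordLine As String, ByVal fieldNumber As Long) As String
    Dim charIndex As Long
    Dim currentField As Long
    Dim inQuote As Boolean
    Dim testChar As String
    Dim result As String

    currentField = 1
    For charIndex = 1 To Len(recordLine)
        testChar = Mid$(recordLine, charIndex, 1)
        If testChar = Chr$(34) Then
            ' Toggle the quoted state; the quote itself is not kept
            inQuote = Not inQuote
        ElseIf testChar = "," And Not inQuote Then
            ' An unquoted comma ends the current field
            If currentField = fieldNumber Then Exit For
            currentField = currentField + 1
        ElseIf currentField = fieldNumber Then
            ' Only characters of the requested field are collected
            result = result & testChar
        End If
    Next charIndex

    If currentField = fieldNumber Then CsvField = result
End Function
```

With the sample record from the question, CsvField(testLine, 5) would give 7, The High Street and CsvField(testLine, 8) would give LS1 7AA. Because it stops scanning as soon as the requested field ends, it does less work than filling the whole array when you only need an early field, but for the last field (the post code) the cost is about the same.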
