Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
185 views
in Technique[技术] by (71.8m points)

excel - I have 55,000 rows of rental data to clean, the data is inconsistent and contains text and numbers

I have 55,000 rows of rental data, in one of the columns is the rent prices per week.

This column is not sanitized which has led to a nightmarish combination of numbers, text and phone numbers contained within these fields.

I am trying to extract just the weekly rental figure or return a blank cell if none is provided.

Code current employed below, which copies the column to another column and runs a series of find and replaces against this data. This is leading to about 75%/80% being perfect and the remainder being almost entirely useless (data is either butchered like the below image or contains items like $400$500 which still needs to be cleaned)

filter list

The above data contained the following text as an example of how its butchered:

List

Sub A_Core_Clean_Level_1()

'Application.ScreenUpdating = False
Rows("1:1").AutoFilter

Sheets("Rental Data").Range("I1").EntireColumn.Copy Range("R1").EntireColumn

Columns("R:R").Replace What:="1oo", Replacement:="100", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="2oo", Replacement:="200", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="3oo", Replacement:="300", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="4oo", Replacement:="400", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="5oo", Replacement:="500", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="6oo", Replacement:="600", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="7oo", Replacement:="700", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="9oo", Replacement:="800", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="9oo", Replacement:="900", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:="1 day", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="2 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="3 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="4 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="5 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="6 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="7 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="8 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="9 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="10 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:=".00*", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="-", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="$ ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" $", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:=" 1 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 2 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 3 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 4 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 5 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 6 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 7 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 8 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 9 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 10 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 11 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 12 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 13 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 14 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 15 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 16 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 17 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 18 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 19 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 20 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 21 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 22 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 23 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 24 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 25 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:="a", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="b", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="c", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="d", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="e", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="f", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="g", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="h", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="i", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="j", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="k", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="l", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="m", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="n", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="o", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="p", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="q", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="r", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="s", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="t", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="u", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="v", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="w", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="x", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="y", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="z", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:="..", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=".", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is a job for a User Defined Function (UDF). Here is a prototype.

Function Rental(Cell As Range) As Variant

    Dim CellVal As String           ' Cell's contents
    Dim p       As Integer          ' position of $ in CellVal
    
    CellVal = Cell.Value
    p = InStr(CellVal, "$") + 1
    If p = 1 Then
        Rental = "No $$$"
    Else
        Rental = Val(Mid(CellVal, p))
    End If
End Function

Install the function in a standard code module of your workbook. That's a module you have to insert yourself, by default Module1. None of the existing code sheets can do the job.

Call the function from the worksheet with syntax as shown below, where A2 is the cell in which you have your butchered data. You can copy the formula down in the same way as built-in Excel formulas.

=Rental(A2)

In its present shape the function will look for the $ sign and returns whatever numbers follow it, up till the next non-numeric character or the end of the string. Your system's decimal separator will be deemed numeric but not the thousands separator.

If no $-sign is found the function returns "No $$$". Here the advantage of a UDF comes in. Instead of this you might hitch on another method of looking for the value you want.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...