Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Using re.sub to manipulate pandas df column with stored constraints **V HELPFUL**

This problem took me a while to figure out, and there were only bits and pieces on Stack Overflow, so I thought I'd share in case anyone else is running into it.

Goal:

  • 1- Replace abbreviations in an entire pandas DataFrame column, using constraints stored in a dictionary
  • 2- Avoid replacing substrings that occur inside other words
  • 3- Do so on the whole column at once, without an explicit O(n) loop over the rows

Example Input.

Expected answer

  • '107 AVELRIGHT AV' -> '107 AVELRIGHT Avenue'
  • '1084 LAMONTDRO LA' -> '1084 LAMONTDRO Lane'

Without Getting

  • '107 AVELRIGHT AV' -> '107 AvenueELRIGHT Avenue'
  • '1084 LAMONTDRO LA' -> '1084 LaneMONTDriveO Lane'
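
To make the difference between the two outcomes above concrete, here is a minimal sketch using only the standard `re` module on plain strings. The helper names `naive` and `bounded` are made up for illustration, and the dictionary is a three-entry subset of the full mapping. The `\b` word-boundary anchor is one common way to keep a match from landing inside a longer word.

```python
import re

# Three-entry subset of the abbreviation dictionary, for illustration only.
add_map = {"AV": "Avenue", "DR": "Drive", "LA": "Lane"}

def naive(text):
    """Plain substring replacement: also rewrites abbreviations inside words."""
    for abbr, full in add_map.items():
        text = text.replace(abbr, full)
    return text

def bounded(text):
    """Regex replacement anchored with \\b so only whole words are rewritten."""
    for abbr, full in add_map.items():
        text = re.sub(r"\b%s\b" % re.escape(abbr), full, text)
    return text

print(naive("107 AVELRIGHT AV"))     # 107 AvenueELRIGHT Avenue
print(naive("1084 LAMONTDRO LA"))    # 1084 LaneMONTDriveO Lane
print(bounded("107 AVELRIGHT AV"))   # 107 AVELRIGHT Avenue
print(bounded("1084 LAMONTDRO LA"))  # 1084 LAMONTDRO Lane
```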
question from:https://stackoverflow.com/questions/65931181/using-re-sub-to-manipulate-pandas-df-column-with-stored-constraints-v-helpful


1 Reply

# Here is the dictionary of constraints (abbreviation -> full word)
add_map = {
    "AV": "Avenue",
    "BV": "Boulevard",
    "BP": "Bypass",
    "BY": "Bypass",
    "CL": "Circle",
    "DR": "Drive",
    "LA": "Lane",
    "PY": "Parkway",
    "RD": "Road",
    "ST": "Street",
    "WY": "Way",
    "TR": "Trail",
}

You can have as many constraints as you'd like. The solution below loops over the dictionary entries (O(j), where j = number of dict constraints), not over the DataFrame rows; each replacement is applied to the whole column at once.

import re

# data_909 is the DataFrame that holds the 'Address' column.
obj = data_909['Address'].copy()  # work on a copy so the original column is untouched
for k, v in add_map.items():  # iterate through the dictionary constraints
    # \b anchors the abbreviation at word boundaries, so "AV" inside
    # "AVELRIGHT" is not touched (solves problem 2).
    rule1 = r"\b%s\b" % re.escape(k)
    # The callable receives each match object and returns the replacement:
    # the dictionary value if the matched text is a key, else the text unchanged.
    rule2 = lambda m: add_map.get(m.group(), m.group())
    # re.IGNORECASE also matches lowercase forms such as "av"; these are not
    # dictionary keys, so rule2 returns them unchanged.
    obj = obj.str.replace(rule1, rule2, regex=True, flags=re.IGNORECASE)
data_909['Address_new'] = obj  # save the result in a new column
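
For anyone wondering what the `rule2` callable actually does, here is a small sketch with plain `re.sub`, which accepts a function as the replacement just as `Series.str.replace` does. The function name `lookup` and the two test strings are made up for illustration, and the `\bAV\b` pattern is an assumption of this sketch.

```python
import re

add_map = {"AV": "Avenue", "LA": "Lane"}

def lookup(match):
    # match.group() is the exact text the pattern matched, e.g. "AV".
    # dict.get returns the mapped value, or the matched text itself
    # when that text is not a key in the dictionary.
    return add_map.get(match.group(), match.group())

# "AV" is a key, so the match is expanded.
upper = re.sub(r"\bAV\b", lookup, "29 DEVLIN AV")

# With IGNORECASE the pattern also matches "av", but "av" is not a key,
# so the callable hands the matched text back unchanged.
lower = re.sub(r"\bAV\b", lookup, "29 devlin av", flags=re.IGNORECASE)

print(upper)  # 29 DEVLIN Avenue
print(lower)  # 29 devlin av
```

If lowercase abbreviations should be expanded too, one option is to look up `match.group().upper()` instead; whether that is desirable depends on the data.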



Hope this helps anyone with a similar problem. Cheers

Sample CSV:

          Date      Time        Address
2   2015-09-02  13:43:19  241 WARREN ST
11  2015-09-03  10:33:28   29 DEVLIN AV
12  2015-09-03  11:23:11  247 LARREN ST
13  2015-09-03  11:30:55     5 ASTER LA
25  2015-09-04  04:50:52    54 W 5th ST
32  2015-09-04  14:29:47    4 STRUCE RD
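
As a variation (not the loop used in the answer), the sample addresses above can be handled in a single pass by joining all dictionary keys into one alternation pattern. This is a sketch under assumptions: `data_909` is rebuilt here as a stand-in frame with only the `Address` column, `add_map` is a four-entry subset, and `expand` is a made-up helper name.

```python
import re
import pandas as pd

# Four-entry subset of the abbreviation dictionary, for illustration only.
add_map = {"AV": "Avenue", "LA": "Lane", "RD": "Road", "ST": "Street"}

# Stand-in for the real DataFrame, using the addresses from the sample.
data_909 = pd.DataFrame({"Address": [
    "241 WARREN ST", "29 DEVLIN AV", "247 LARREN ST",
    "5 ASTER LA", "54 W 5th ST", "4 STRUCE RD",
]})

# One pattern for every key: \b(?:AV|LA|RD|ST)\b
# Longer keys go first so a longer abbreviation is tried before a shorter one.
keys = sorted(add_map, key=len, reverse=True)
pattern = r"\b(?:%s)\b" % "|".join(re.escape(k) for k in keys)

def expand(match):
    # The match is case-sensitive, so the matched text is always a key.
    return add_map[match.group(0)]

data_909["Address_new"] = data_909["Address"].str.replace(pattern, expand, regex=True)
print(data_909)
```

This keeps the work to one vectorized `str.replace` call regardless of how many entries the dictionary has, and words such as ASTER, STRUCE and LARREN are left alone because the abbreviations inside them are not at word boundaries.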

