Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Extract text after "/" in a data frame column

I have a data frame with two columns, Link and Value. The Link column holds URLs such as "abcd.com/efgh/ijkl/mnop". The frame has 10,000 rows, which I sampled from 100,000 rows.

I want to extract the text after the last "/" (that is, the first "/" when reading from right to left). For the sample above, I want to extract "mnop".

I want to do this for all 10,000 rows of the Link column, while the Value column should not be affected.

I was able to use

a = sapply(webdatatest, substring, 36)

but this is not a dynamic method, because the position of the last "/" varies from row to row. It also affected the second column.

So I need some help with this.


1 Reply


Try basename(). According to its help page, it "removes all of the path up to and including the last path separator (if any)."

basename("abcd.com/efgh/ijkl/mnop")
# [1] "mnop"

It is vectorized, so you can just stick the whole column in there.

basename(rep("abcd.com/efgh/ijkl/mnop", 3))
# [1] "mnop" "mnop" "mnop"

So, to apply this to one column, link, of a data frame webdata (substitute your own names, such as webdatatest and Link), you can simply do

webdata$link <- basename(webdata$link)
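To see that only the one column changes, here is a minimal sketch on a made-up data frame. The sample rows and the value column are assumptions for illustration, not the asker's real data.

```r
# Hypothetical sample data standing in for the asker's data frame
webdata <- data.frame(
  link  = c("abcd.com/efgh/ijkl/mnop", "abcd.com/qrst", "abcd.com/u/v"),
  value = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Overwrite only the link column; the value column is never passed to basename()
webdata$link <- basename(webdata$link)

webdata$link   # "mnop" "qrst" "v"
webdata$value  # 10 20 30, unchanged
```

Because the assignment targets webdata$link alone, this avoids the problem with sapply() over the whole data frame, which applies the function to every column.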

The other obvious function would be sub(), but I think basename() will do the trick and it's easier.

sub(".*/", "", rep("abcd.com/efgh/ijkl/mnop", 3))
# [1] "mnop" "mnop" "mnop"
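The two approaches can differ on edge cases, so it may be worth checking which behavior you want. The sketch below uses made-up links (an assumption, including one with a trailing "/" and one with no "/" at all) to compare them.

```r
# Hypothetical inputs: a normal URL, one ending in "/", and one without any "/"
links <- c("abcd.com/efgh/ijkl/mnop", "abcd.com/efgh/", "mnop")

# ".*/" is greedy, so it matches everything up to and including the LAST "/";
# a string ending in "/" is therefore reduced to the empty string
sub_result <- sub(".*/", "", links)

# basename() removes trailing separators before splitting the path,
# so a string ending in "/" still yields its last component
base_result <- basename(links)

sub_result   # "mnop" ""     "mnop"
base_result  # "mnop" "efgh" "mnop"
```

If your links never end in "/", both give the same result; otherwise pick the one whose handling of the trailing slash matches what you need.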
