I want to use a regex to extract all URLs from text in a dataframe, into a new column. I have some older code that I have used to extract keywords, so I'm looking to adapt the code for a regex. I want to save a regex as a string variable and apply here:
data$ContentURL <- apply(sapply(regex, grepl, data$Content, fixed=FALSE), 1, function(x) paste(selection[x], collapse=','))
It seems that fixed=FALSE
should tell grepl
that its a regular expression, but R doesn't like how I am trying to save the regex as:
regex <- "http.*?1-\d+,\d+"
My data is organized in a data frame like this:
data <- read.table(text='"Content" "date"
1 "a house a home https://www.foo.com" "12/31/2013"
2 "cabin ideas https://www.example.com in the woods" "5/4/2013"
3 "motel is a hotel" "1/4/2013"', header=TRUE)
And would hopefully look like:
Content date ContentURL
1 a house a home https://www.foo.com 12/31/2013 https://www.foo.com
2 cabin ideas https://www.example.com in the woods 5/4/2013 https://www.example.com
3 motel is a hotel 1/4/2013
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…