Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
425 views
in Technique[技术] by (71.8m points)

r - Remove text inside brackets, parens, and/or braces

I am in need of a function that extracts any type of bracket ie (), [], {} and the information in between. I created it and get it to do what I want but I get an annoying warning that I don't really know what it means. I want the annoying warning to go away either by fixing what's wrong with my code or suppressing the warning. I attempted this with suppressWarnings() but it didn't work because I don't think I used it correctly.

This function uses regmatches and requires R version 2.14 or higher

Here's the function below and an example to reproduce the warning. Thank you for the help.

################
# THE FUNCTION #
################
bracketXtract <- function(text, bracket = "all", include.bracket = TRUE) {

    bracketExtract <- if (include.bracket == FALSE) {
        function(Text, bracket) {
                  switch(bracket, 
                        square = lapply(Text, function(j) gsub("[\[\]]", "", 
                                 regmatches(j, gregexpr("\[.*?\]", j))[[1]], 
                                 perl = TRUE)), 
                        round =  lapply(Text, function(j) gsub("[\(\)]", "", 
                                 regmatches(j, gregexpr("\(.*?\)", j))[[1]])), 
                        curly =  lapply(Text, function(j) gsub("[\{\}]", "", 
                                 regmatches(j, gregexpr("\{.*?\}", j))[[1]])), 
                        all =    { P1 <- lapply(Text, function(j) gsub("[\[\]]", "", 
                                         regmatches(j, gregexpr("\[.*?\]", j))[[1]], 
                                         perl = TRUE))
                                   P2 <- lapply(Text, function(j) gsub("[\(\)]", "", 
                                         regmatches(j, gregexpr("\(.*?\)", j))[[1]]))
                                   P3 <- lapply(Text, function(j) gsub("[\{\}]", "", 
                                         regmatches(j, gregexpr("\{.*?\}", j))[[1]]))
                    apply(cbind(P1, P2, P3), 1, function(x) rbind(as.vector(unlist(x))))
                })
        }
    } else {
        function(Text, bracket) {
                  switch(bracket, 
                         square = lapply(Text, function(j) regmatches(j, 
                                  gregexpr("\[.*?\]", j))[[1]]), 
                         round =  lapply(Text, function(j) regmatches(j, 
                                  gregexpr("\(.*?\)", j))[[1]]), 
                         curly =  lapply(Text, function(j) regmatches(j, 
                                  gregexpr("\{.*?\}", j))[[1]]), 
                         all =    { P1 <- lapply(Text, function(j) regmatches(j, 
                                          gregexpr("\[.*?\]", j))[[1]])
                                    P2 <- lapply(Text, function(j) regmatches(j, 
                                          gregexpr("\(.*?\)", j))[[1]])
                                    P3 <- lapply(Text, function(j) regmatches(j, 
                                          gregexpr("\{.*?\}", j))[[1]])
                apply(cbind(P1, P2, P3), 1, function(x) rbind(as.vector(unlist(x))))
            })
        }
    }
    if (length(text) == 1) {
        unlist(lapply(text, function(x) bracketExtract(Text = text,
            bracket = bracket)))
    } else {
        sapply(text, function(x) bracketExtract(Text = text, 
            bracket = bracket))
    }
} 

##################
# TESTING IT OUT #
##################
j <- "What kind of cheese isn't your cheese? {wonder} Nacho cheese! [groan] (Laugh)"                                                          
bracketXtract(j, 'round')
bracketXtract(j, 'round', include.bracket = FALSE)

examp2<-data.frame(var1=1:4)                                                                                                                               
examp2$text<-as.character(c("I love chicken [unintelligible]!", "Me too! (laughter) It's so good.[interupting]", 
             "Yep it's awesome {reading}.", "Agreed."))

#=================================#
# HERE"S WHERE THE WARNINGS COME: #
#=================================#                                                                                                                                                            
examp2$text2<-bracketXtract(examp2$text, 'round')                                                                                                  
   examp2
examp2$text2<-bracketXtract(examp2$text, 'all')                                                                                                  
   examp2
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Maybe this function is a little more straight-forward? Or at least more compact.

bracketXtract <-
    function(txt, br = c("(", "[", "{", "all"), with=FALSE)
{
    br <- match.arg(br)
    left <-        # what pattern are we looking for on the left?
        if ("all" == br) "\(|\{|\["
        else sprintf("\%s", br)
    map <-         # what's the corresponding pattern on the right?
        c(`\(`="\)", `\[`="\]", `\{`="\}",
          `\(|\{|\[`="\)|\}|\]")
    fmt <-         # create the appropriate regular expression
        if (with) "(%s).*?(%s)"
        else "(?<=%s).*?(?=%s)"
    re <- sprintf(fmt, left, map[left])
    regmatches(txt, gregexpr(re, txt, perl=TRUE))    # do it!
}

No need to lapply; the regular expression functions are vectorized in that way. This fails with nested parentheses; likely regular expressions won't be a good solution if that's important. Here we are in action:

> txt <- c("I love chicken [unintelligible]!",
+          "Me too! (laughter) It's so good.[interupting]",
+          "Yep it's awesome {reading}.",
+          "Agreed.")
> bracketXtract(txt, "all")
[[1]]
[1] "unintelligible"

[[2]]
[1] "laughter"    "interupting"

[[3]]
[1] "reading"

[[4]]
character(0)

This fits without trouble into a data.frame.

> examp2 <- data.frame(var1=1:4)
> examp2$text <- c("I love chicken [unintelligible]!",
+                  "Me too! (laughter) It's so good.[interupting]",
+                  "Yep it's awesome {reading}.", "Agreed.")
> examp2$text2<-bracketXtract(examp2$text, 'all')
> examp2
  var1                                          text                 text2
1    1              I love chicken [unintelligible]!        unintelligible
2    2 Me too! (laughter) It's so good.[interupting] laughter, interupting
3    3                   Yep it's awesome {reading}.               reading
4    4                                       Agreed.                      

The warning you were seeing has to do with trying to stick a matrix into a data frame. I think the answer is "don't do that".

> df = data.frame(x=1:2)
> df$y = matrix(list(), 2, 2)
> df
  x    y
1 1 NULL
2 2 NULL
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...