Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

dataframe - How to subset a text by 2 sentences in R?

I have the following dataframe:

df = data.frame(Text = c("This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.",
                         "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.",
                         "SO is great. You can get many things solve. Additional paragraph."),
                stringsAsFactors = FALSE)

I have used the following to split the text into sentences:

library(textshape)

split_sentence(df$Text)

However, I would like to keep only the first 2 sentences of each entry in the "Text" column, so that I get a list like:

This is great.
A really great place to be.
Good morning.
There are very skilled programmers here. 
SO is great.
You can get many things solve.

Can anyone help me?

Thanks!



1 Reply


You could split Text into a separate row for every sentence and then keep only the first 2 sentences that came from each original row. Using dplyr (plus tidyr for the splitting step) you can do this as:

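library(dplyr)

df %>%
  mutate(row = row_number()) %>%                    # remember which original row each sentence came from
  tidyr::separate_rows(Text, sep = '\\.\\s*') %>%   # one row per sentence; the full stop and following spaces are dropped
  group_by(row) %>%
  slice(1:2) %>%                                    # keep the first 2 sentences of each original row
  ungroup() %>%
  select(-row)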

#  Text                                   
#  <chr>                                  
#1 This is great                          
#2 A really great place to be             
#3 Good morning                           
#4 There are very skilled programmers here
#5 SO is great                            
#6 You can get many things solve        
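If you would rather keep the full stops (as in the desired output in the question) and avoid the dplyr/tidyr dependency, a base R sketch along these lines should also work. It assumes every sentence ends with a full stop followed by whitespace, so abbreviations such as "e.g." would be split incorrectly and sentences ending in "?" or "!" are not handled. The names `sentences`, `first_two` and `First2` are only illustrative.

```r
# Sample data from the question
df <- data.frame(
  Text = c("This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.",
           "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.",
           "SO is great. You can get many things solve. Additional paragraph."),
  stringsAsFactors = FALSE
)

# Split on whitespace that follows a full stop; the lookbehind (?<=\\.)
# only checks for the full stop, so it stays attached to its sentence
sentences <- strsplit(df$Text, "(?<=\\.)\\s+", perl = TRUE)

# First 2 sentences of every row, flattened into one character vector
first_two <- unlist(lapply(sentences, head, 2))
print(first_two)

# Alternatively: one string per original row holding its first 2 sentences
df$First2 <- vapply(sentences,
                    function(s) paste(head(s, 2), collapse = " "),
                    character(1))
print(df$First2)
```

The first variant matches the list shown in the question; the second keeps one row per original entry, which may be handier if the data frame has other columns that must stay aligned.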
