Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Extract "words" from a string

I have a table with 153 rows and 9 columns. I am interested in the character string in the first column: I want to extract the fourth word of each string and build a new list from it, so the result will have 153 rows and 1 column.

An example of the first two rows of column 1 of this database table:

[1] Resistance_Test DevID (Ohms) 428
[2] Diode_Test SUBLo (V) 353

"Words" are separated by spaces, so the fourth word of the first row is "428" and the fourth word of the second row is "353". How can I create a new list containing the fourth word of all 153 rows?


1 Reply

Use gsub() with a regular expression

x <- c("Resistance_Test DevID (Ohms) 428", "Diode_Test SUBLo (V) 353")
ptn <- "(.*? ){3}"
gsub(ptn, "", x)

[1] "428" "353"

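This works because the regular expression (.*? ){3} matches exactly three ({3}) groups of characters, each followed by a space (.*? ), and gsub() replaces the match with an empty string. In a four-word string there is only one such match, so only the fourth word is left.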

See ?gsub and ?regexp for more information.
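Since the question asks for a one-column result built from a table, here is a minimal sketch of an alternative that splits on spaces instead of using a regular expression. The data frame `df` and its column name `desc` are hypothetical stand-ins for the real 153 x 9 table, and the sketch assumes words are separated by single spaces.

```r
# Hypothetical stand-in for the 153 x 9 table; only the first column matters here.
df <- data.frame(
  desc = c("Resistance_Test DevID (Ohms) 428", "Diode_Test SUBLo (V) 353"),
  stringsAsFactors = FALSE
)

# Split each string on single spaces, then take the fourth piece of each.
words <- strsplit(df$desc, " ", fixed = TRUE)
fourth <- vapply(words, function(w) w[4], character(1))

# One-column result with one row per input row.
result <- data.frame(fourth = fourth, stringsAsFactors = FALSE)
print(result)
```

A row with fewer than four words yields NA here rather than an error, which makes malformed rows easy to spot.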


If your data has structure that you didn't mention in your question, the regular expression may become even simpler.

For example, if you are always interested in the last word of each line:

ptn <- "(.*? )"
gsub(ptn, "", x)
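The lazy pattern removes every "characters followed by a space" chunk in turn, which leaves only the last word. As a sketch of an equivalent approach, assuming words are separated by single spaces, a single greedy match does the same job in one substitution:

```r
x <- c("Resistance_Test DevID (Ohms) 428", "Diode_Test SUBLo (V) 353")

# Greedy ".*" runs up to the last space, so one substitution
# strips everything before the last word.
last_word <- sub("^.* ", "", x)
print(last_word)
```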

Or perhaps you know for sure that you only need the digits and can discard everything else:

ptn <- "\\D"
gsub(ptn, "", x)
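Removing every non-digit also keeps digits that occur earlier in the string, so a name such as "Test2" would contribute to the result. If that is a concern, one possible sketch (assuming the value of interest is always a run of digits at the very end of the string) extracts just that run:

```r
x <- c("Resistance_Test DevID (Ohms) 428", "Diode_Test SUBLo (V) 353")

# Locate the run of digits at the end of each string and pull it out.
m <- regexpr("[0-9]+$", x)
trailing <- regmatches(x, m)

# Convert to numbers if the values are needed for arithmetic.
nums <- as.numeric(trailing)
print(nums)
```

Note that regmatches() drops strings with no match, so the result can be shorter than the input when some rows do not end in digits.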
