Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Deleting words from within R

How do I delete multiple words and empty lines? I am importing a large text file into R using readLines. It is composed of 10,000+ pages, and ultimately I am trying to create a data frame from it.

Each page has the following two lines, which I would like to remove in ONE STEP, along with the empty line that follows them:

Set reorder point
Plan category: P

Below is an example text file which I named MM:

Set reorder point Plan category: P

Plnt SLoc Material           Description              L.T MRP Stat Auto MatSG PC PN    Freq  Qty    CFreq     CQty Cur.RPt New.RPt CurRepl NewRepl Updt    Cost   ServStock  Unit  OpenMatResb DFStorLocLevel
0231 0002 GB.C152260-00001   ASSY PISTON & SEAL/O-RING 44  PD       X    A    A  A     18     136                     30      29      50      43         24.88      51.000  EA
0231 0002 WH.112734          MOTOR REDUCER, THREE-PHAS 41  PD       X    B    B  A     16      17                      3       3       5       4        483.87       1.000  EA      X
Set reorder point
Plan category: P

Plnt SLoc Material           Description              L.T MRP Stat Auto MatSG PC PN    Freq  Qty    CFreq     CQty Cur.RPt New.RPt CurRepl NewRepl Updt    Cost   ServStock  Unit  OpenMatResb DFStorLocLevel
0231 0002 WH.920569          SPINDLE MOTOR MINI O      22  PD       X    A    A  A     69      85                     15       9      25      13        680.91      21.000  EA
0231 0002 GB.C150583-00001   VALVE-AIR MDI             64  PD       X    A    A  A     16     113                     50      35      80      52         19.96     116.000  EA
Set reorder point
Plan category: P

Plnt SLoc Material           Description              L.T MRP Stat Auto MatSG PC PN    Freq  Qty    CFreq     CQty Cur.RPt New.RPt CurRepl NewRepl Updt    Cost   ServStock  Unit  OpenMatResb DFStorLocLevel
0231 0002 FG.124-0140        BEARING                   32  PD       X    A    A  A     36     205                     35      32      50      48         21.16      55.000  EA
0231 0002 WP.254997          BEARING,BALL .9843 X 2.04 52  PD       X    A    A  A     18     155                     50      39     100      58          2.69     181.000  EA
Set reorder point
Plan category: P

I tried the following code in two steps, but each call removes only one of the phrases, and neither removes the line itself:

gsub("Set reorder point", "", MM)
gsub("Plan category: P", "", MM)
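A short sketch of why the attempt above behaves this way, using a tiny made-up vector in place of the real file (its contents are assumptions): gsub() never modifies MM in place, so unless each result is assigned, the second call starts again from the original text. Even when the calls are chained, gsub() only blanks the matched text, leaving an empty element where the line used to be.

```r
# Hypothetical miniature of the imported file: one element per line
MM <- c("Set reorder point", "Plan category: P", "",
        "0231 0002 WH.920569 SPINDLE MOTOR MINI O")

# gsub() returns a modified copy; MM itself is unchanged unless reassigned,
# so the second call has to be applied to the result of the first
step1 <- gsub("Set reorder point", "", MM)
step2 <- gsub("Plan category: P", "", step1)

# The matched text is gone, but the elements (lines) are still there, now empty
print(step2)
print(length(step2))
```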

This deletion needs to be done within R. Many thanks in advance.

1 Reply

As Ironholds suggested, gsub is the way to go. You can try:

# Read the file; each line becomes one element of a character vector
df <- readLines("MM.txt")

df.clean <- gsub("^Set.*$|^Plan.*$", "", df)

This replaces every line beginning with 'Set' or 'Plan' with an empty string ("").
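To see the pattern at work without the real file, here is a small sketch on assumed sample lines (in the answer they come from readLines("MM.txt")). It also includes the case from the first page of the example, where both header phrases sit on a single line; ^Set.*$ blanks that whole line too.

```r
# Assumed sample lines; in the answer these come from readLines("MM.txt").
# The first element has both header phrases on one line.
df <- c("Set reorder point Plan category: P",
        "Set reorder point",
        "Plan category: P",
        "",
        "0231 0002 FG.124-0140 BEARING")

# Any line starting with "Set" or "Plan" becomes ""; other lines are untouched
df.clean <- gsub("^Set.*$|^Plan.*$", "", df)
print(df.clean)
```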

The unwanted lines are now empty strings, so they can be dropped, together with the blank lines that were already in the file, with:

df.clean<-df.clean[df.clean!=""]
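Since the question asked for a single step, here is a sketch of an alternative that is not part of the answer above: instead of blanking lines with gsub() and subsetting afterwards, grepl() can flag the unwanted lines directly, and one logical subset drops them. The sample vector and the name drop_pattern are illustrative assumptions. Unlike df.clean != "", this pattern also treats lines containing only spaces as blank.

```r
# Assumed sample lines standing in for readLines("MM.txt")
MM <- c("Set reorder point", "Plan category: P", "", "   ",
        "Plnt SLoc Material", "0231 0002 WP.254997")

# One pattern covering both header lines and blank or whitespace-only lines
drop_pattern <- "^(Set reorder point|Plan category: P)|^[[:space:]]*$"

# grepl() flags matching lines; negating it keeps everything else in one step
MM.clean <- MM[!grepl(drop_pattern, MM)]
print(MM.clean)
```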

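In the pattern "^Set.*$|^Plan.*$", ^ and $ are anchors: ^ matches the start of each line (each element of the vector), $ matches its end, and .* matches everything in between.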
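To illustrate why the ^ anchor matters, a small sketch with a made-up data line (the GASKET description is hypothetical, not taken from the file): without ^, the pattern would also cut off any data line whose description happens to contain "Set".

```r
# Hypothetical lines: a page header and a data line that happens to contain "Set"
x <- c("Set reorder point", "0231 0002 GASKET Set of 4")

anchored   <- gsub("^Set.*$", "", x)  # only lines that start with "Set"
unanchored <- gsub("Set.*$", "", x)   # "Set" anywhere, through end of line

print(anchored)
print(unanchored)
```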

...