Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Stata remove entire word from string

I have a string variable from which I want to remove certain words. Many other words contain these as a partial match, and I don't want those touched: a word should be removed if and only if it is a complete match.

clear
* Add in some example data
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end

* Create a copy to compare before/after stripping
gen strip_words = words

* This is a list of words I want removed. In reality, this is a fairly long list
local removs "mor ten bad"
* For each word in the list, remove the complete word from the string
foreach w of local removs {
    replace strip_words = subinstr(strip_words, "`w'","", .) 
}

list
     +---------------------------------------------------------------+
     | index                            words            strip_words |
     |---------------------------------------------------------------|
  1. |     1              more mor morph test            e ph test   |
  2. |     2   ten tennis tenner tenth keeper     nis ner th keeper  |
  3. |     3           badder baddy bad other         der dy other   |
     +---------------------------------------------------------------+

I've tried padding the string with spaces using replace strip_words = " " + strip_words + " ", but then the replacement also removes the spaces separating the other words. My desired output would be:

     +-------------------------------------------------------------------------+
     | index                            words                      strip_words |
     |-------------------------------------------------------------------------|
  1. |     1              more mor morph test              more  morph test    |
  2. |     2   ten tennis tenner tenth keeper    tennis tenner tenth keeper    |
  3. |     3           badder baddy bad other           badder baddy  other    |
     +-------------------------------------------------------------------------+

1 Reply

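Use subinword(), which replaces a match only when it is a complete, space-delimited word (a token at the start or end of the string also counts). See help string functions.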

clear
* Add in some example data
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end

* Create a copy to compare before/after stripping
gen strip_words = words

* This is a list of words I want removed. In reality, this is a fairly long list
local removs "mor ten bad"
* For each word in the list, remove the complete word from the string
foreach w of local removs {
    replace strip_words = subinword(strip_words, "`w'","", .) 
}

* Collapse the runs of internal blanks left where words were removed
replace strip_words = itrim(strip_words)
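
As a quick check of the whole-word approach, here is a minimal sketch that reruns the example data and verifies the result with assert. One thing here is an addition and not part of the answer: the outer trim(), which drops the leading blank left behind when the removed word was first in the string (row 2). Newer Stata releases document these functions as strtrim() and stritrim(); the older names are assumed to still work.

```stata
clear
* Same example data as in the question
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end

gen strip_words = words

* Words to remove, only where they appear as complete words
local removs "mor ten bad"
foreach w of local removs {
    replace strip_words = subinword(strip_words, "`w'", "", .)
}

* itrim() collapses internal runs of blanks to one blank;
* trim() (an addition, not in the answer) drops leading/trailing blanks
replace strip_words = trim(itrim(strip_words))

* Each assert stops with an error if the condition is false
assert strip_words == "more morph test" in 1
assert strip_words == "tennis tenner tenth keeper" in 2
assert strip_words == "badder baddy other" in 3

list
```

If the single-blank output is not wanted and the original spacing should be preserved, as in the desired output shown in the question, skip the trim(itrim()) line.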
