Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Search for three words written in different "shapes" using grep

I have a text file with the following contents:

gvožđa gvozda gvozdja
гвожђа

These are four words, but they all mean the same thing: iron.

The "d", "dj", "đ", and "ђ" are four spellings of a single phone.

I am using the following grep command to search for these words:

grep '\s*[gг][vв]o[žжz](dj|[dđђ])a\s*' filename

This grep command gives no output at all. Why? It should print all of these words from the file:

gvožđa
gvozda
gvozdja
гвожђа
Question from: https://stackoverflow.com/questions/65885288/search-for-three-words-written-in-different-shapes-using-grep


1 Reply


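The pattern fails for two reasons. First, it contains only Latin o and a, so it cannot match the Cyrillic о and а in the Cyrillic word. Second, it uses POSIX ERE syntax (the unescaped ( | ) alternation group) without the -E option, so grep parses it as a BRE, where ( and | are literal characters.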
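
To see the BRE/ERE difference in isolation, here is a minimal sketch that uses only the ASCII spellings, so it does not depend on locale settings. The temporary file and the simplified pattern gvoz(dj|d)a are assumptions made for illustration, not the pattern from the question.

```shell
# Hypothetical sample file containing only the ASCII spellings.
sample=$(mktemp)
printf '%s\n' 'gvozda' 'gvozdja' > "$sample"

# Default BRE: ( and | are literal characters, so no line matches.
# grep exits non-zero when nothing matches, hence the "|| true".
bre_count=$(grep -c 'gvoz(dj|d)a' "$sample" || true)

# With -E (ERE): ( | ) form an alternation group, so both lines match.
ere_count=$(grep -Ec 'gvoz(dj|d)a' "$sample")

echo "BRE: $bre_count, ERE: $ere_count"
rm -f "$sample"
```

This prints "BRE: 0, ERE: 2": the same pattern text matches nothing until -E is added.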

You can use

grep -Eo '[gг][vв][oо][žжz](dj|[dđђ])[aа]' filename
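
A rough way to check this end to end is to recreate the file and run both patterns against it. The sketch below assumes the file is UTF-8 encoded and that a UTF-8 locale named C.UTF-8 is installed (substitute any UTF-8 locale your system has); the temporary file stands in for "filename".

```shell
# Assumption: the C.UTF-8 locale exists; without a UTF-8 locale, grep
# treats the multibyte letters inside [...] as separate bytes.
export LC_ALL=C.UTF-8

# Hypothetical sample file reproducing the contents from the question.
sample=$(mktemp)
printf '%s\n' 'gvožđa gvozda gvozdja' 'гвожђа' > "$sample"

# Pattern from the question: BRE syntax, Latin-only o and a. No output.
original=$(grep '\s*[gг][vв]o[žжz](dj|[dđђ])a\s*' "$sample" || true)

# Corrected pattern: -E enables the alternation group, and
# [oо] / [aа] accept both the Latin and the Cyrillic letter.
fixed=$(grep -Eo '[gг][vв][oо][žжz](dj|[dđђ])[aа]' "$sample")

printf 'original pattern: [%s]\n' "$original"
printf '%s\n' "$fixed"
rm -f "$sample"
```

Under these assumptions the original pattern yields an empty result, while the corrected one prints the four words, one per line.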

The \s* at both ends of your pattern serves no purpose: it matches zero or more whitespace characters, so it can always match nothing, and \s is a GNU grep extension.

I added the -o option so that grep prints each match on its own line, not the whole matching line.
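
The effect of -o is easiest to see on a line that holds more than one match. This is a small sketch with ASCII-only data; the file and the simplified pattern are illustrative assumptions.

```shell
# Hypothetical one-line sample with two matching words.
sample=$(mktemp)
printf '%s\n' 'gvozda gvozdja' > "$sample"

# Without -o: the whole matching line is printed once.
lines=$(grep -E 'gvoz(dj|d)a' "$sample")

# With -o: each matched substring is printed on its own line.
matches=$(grep -Eo 'gvoz(dj|d)a' "$sample")

printf 'without -o:\n%s\n' "$lines"
printf 'with -o:\n%s\n' "$matches"
rm -f "$sample"
```

Without -o the output is the single line "gvozda gvozdja"; with -o it is "gvozda" and "gvozdja" on separate lines.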

See the online grep demo.

