Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


notepad++ - Removing Duplicate lines with random text behind it

I have text like this in Notepad++

Random Text Here:188.0.0.0
Random Text Here:188.0.3.0
Random Text Here:188.2.0.0

However, some of the numbers at the end are duplicated, and I want to get rid of the duplicates. For example:

Random Text Here:188.0.3.0
Random Different Text Here:188.0.3.0

How would I go about doing that in bulk, as there are thousands of these lines?


1 Reply


In Notepad++ I would try the following multi-step process.

(1) Use a regular expression to move the IP address and a fixed marker string to the front of every line, changing Random Text Here:188.0.0.0 to :188.0.0.0!!Random Text Here.

(2) Use the TextFX plugin to sort the file, removing exact duplicate lines.

(3) Use a regular expression to find and remove lines that duplicate an IP address. This may need multiple passes.

(4) Use a regular expression to put the text back in the right order.

(5) (Optional) sort the file again.

Problems with the above approach:

(a) The "random text" that sorts first for an IP address will be the one that is kept, not the first in the original file.

(b) The result will be ordered by IP address or by the random text depending on whether step (5) is used.
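
If problems (a) and (b) matter, a short script outside Notepad++ can avoid both. The sketch below is not part of the Notepad++ method described here; it is a Python alternative that assumes every line ends with a colon followed by a dotted four-part number, as in the question. The names LINE_RE and dedupe_keep_first are made up for illustration.

```python
import re

# Assumed line shape (from the question): any text, a colon, then a dotted
# four-part number at the end of the line.
LINE_RE = re.compile(r"^(.*):(\d+\.\d+\.\d+\.\d+)$")


def dedupe_keep_first(lines):
    """Keep the first line seen for each trailing IP, in original order."""
    seen = set()
    kept = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Lines that do not end in an IP are passed through unchanged.
            kept.append(line)
            continue
        ip = match.group(2)
        if ip not in seen:
            seen.add(ip)
            kept.append(line)
    return kept


sample = [
    "Random Text Here:188.0.0.0",
    "Random Text Here:188.0.3.0",
    "Random Different Text Here:188.0.3.0",
    "Random Text Here:188.2.0.0",
]
result = dedupe_keep_first(sample)
print("\n".join(result))
```

Because the set only records which IP addresses have been seen, the first line in the file wins for each address and the surviving lines keep their original order.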

In more detail:

(0) Choose a character or a short string that does not occur in the input file. I will use !!.

(1) Do a regular expression replace on the file (with the ". matches newline" option unchecked) to change ^(.*)(:\d+\.\d+\.\d+\.\d+)$ to $2!!$1.

(2) Use the TextFX plugin to sort the file. Choosing the unique-lines sort option may be useful to reduce the number of lines.

(3) Do a regular expression replace on the file (with the ". matches newline" option unchecked) to change ^(:\d+\.\d+\.\d+\.\d+)!!(.*)\r\n\1!!.*$ to $1!!$2. Each match replaces a pair of adjacent lines that share an IP address with the first line of the pair, so when several lines have the same IP address one pass removes about half of them. Run the same replacement repeatedly until it reports that no changes were made. You may need to alter the \r\n part depending on the line endings in your file.

(4) Do a regular expression replace on the file (with the ". matches newline" option unchecked) to change ^(:\d+\.\d+\.\d+\.\d+)!!(.*)$ to $2$1.

(5) (Optional) sort the file again.
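
To try the steps on a small sample before touching the real file, they can be imitated in Python. This is only a rough sketch of steps (1) to (4), not what Notepad++ does internally: it assumes "\n" line endings, assumes !! never occurs in the input, and uses Python's re syntax (\1 rather than $1 in replacements). The names MARK, IP and dedupe_by_ip are hypothetical.

```python
import re

MARK = "!!"                      # step (0): marker assumed absent from the input
IP = r":\d+\.\d+\.\d+\.\d+"      # a colon followed by a dotted four-part number


def dedupe_by_ip(lines):
    text = "\n".join(lines)
    # Step (1): "Text:1.2.3.4" becomes ":1.2.3.4!!Text".
    text = re.sub(rf"^(.*)({IP})$", rf"\2{MARK}\1", text, flags=re.M)
    # Step (2): sort the lines, dropping exact duplicates.
    text = "\n".join(sorted(set(text.split("\n"))))
    # Step (3): collapse adjacent lines that share an IP; each pass merges
    # non-overlapping pairs, so repeat until a pass changes nothing.
    passes = 0
    while True:
        text, count = re.subn(
            rf"^({IP}){MARK}(.*)\n\1{MARK}.*$", rf"\1{MARK}\2", text, flags=re.M
        )
        if count == 0:
            break
        passes += 1
    # Step (4): put the text back in front of the IP.
    text = re.sub(rf"^({IP}){MARK}(.*)$", r"\2\1", text, flags=re.M)
    return text.split("\n"), passes


sample = [
    "Random Text Here:188.0.0.0",
    "Random Text Here:188.0.3.0",
    "Random Different Text Here:188.0.3.0",
    "Random Text Here:188.2.0.0",
    "Random Text Here:188.0.3.0",
]
deduped, passes_needed = dedupe_by_ip(sample)
print("\n".join(deduped))
print("passes that made changes:", passes_needed)
```

On this sample the line kept for 188.0.3.0 is the "Random Different Text Here" one, because it sorts first, which is problem (a) in practice.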

