EDITED FOR CLARIFICATION & SPECIFICITY
I know this is a tough one, but I thought I'd ask anyway...
I'm using grep or egrep ("grep -E", with extended regex capability). I was also told that the strings utility could be used and may help with this effort, but I haven't fully explored that option yet...
Input file: a binary file, so it contains all kinds of junk.
Desired Output: strings that meet all of these conditions:
Return ONLY strings of 8-24 readable characters. Whitespace " " is excluded, as it is the delimiter (separator) between strings in the input file.
ONLY the following characters can make up a string, and they are allowed ANYWHERE (beginning, end, middle) in a string:
"0-9" "a-z" "A-Z" ! # $ % ^ & ( ) @ ~ " ' ] ? [ * + ; , =
- The following characters are NOT allowed in a string:
/ . | : < > except that the dot '.' can appear ONLY at the beginning or at the end of the string, NOT in the middle. However, I have removed it completely from the regex, because I don't know the syntax for specifying that it can only be at the beginning or end of a string, and if I include the dot in the regex, it returns tons of "false strings" (junk).
- No string should contain 3 or more repeated back-to-back characters, i.e. strings that have 3 or more repeated (back-to-back) chars should be ignored,
e.g. aaab^s zY&$$$$[[[[[[777th, or ((((%%_+++------ should be ignored.
- All non-readable characters should be ignored; none of them is acceptable in a string,
e.g. subscripts 1q n× ÷ ± D à ?? ? è á ? ù ? ? ò etc...
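To make the conditions above concrete, here is a minimal sketch in Python rather than grep. It is only an illustration under stated assumptions: that any byte outside printable ASCII separates candidates just like a space does, and that a leading or trailing dot counts toward the 8-24 length. The names is_wanted and extract are made up for this example.

```python
import re
import string

# Characters allowed anywhere in a string (copied from the list above).
ALLOWED = set(string.ascii_letters + string.digits + "!#$%^&()@~\"']?[*+;,=")

# Assumption: every byte outside printable ASCII (0x21-0x7E) is a separator,
# so spaces and binary junk both split the input into candidate tokens.
SEPARATORS = re.compile(rb"[^\x21-\x7e]+")

# Three or more identical characters back to back, e.g. "aaa" or "$$$$".
TRIPLE = re.compile(r"(.)\1\1")


def is_wanted(token: str) -> bool:
    """Return True if the token satisfies every condition listed above."""
    # Assumption: a dot at either edge counts toward the length limit.
    if not 8 <= len(token) <= 24:
        return False
    core = token
    # A dot is tolerated only as the very first or very last character.
    if core.startswith("."):
        core = core[1:]
    if core.endswith("."):
        core = core[:-1]
    if not core or any(ch not in ALLOWED for ch in core):
        return False
    return TRIPLE.search(token) is None


def extract(data: bytes) -> list[str]:
    """Split raw bytes into candidate tokens and keep the wanted ones."""
    tokens = (t.decode("ascii") for t in SEPARATORS.split(data) if t)
    return [t for t in tokens if is_wanted(t)]
```

A possible way to use it would be extract(open(path, "rb").read()), where path is whatever binary file is being examined.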
I've tested some of your suggestions and so far, this regex does about 90% of the job.
(?!(.)1{3})[0-9a-zA-Z!#$%^&()@~"'*-+][;,=]{8,24}
but only when tested on dubdubdubrubular.com or dubdubdub.gethifi.com/tools/regex. For some reason, grep is choking on it!
For your reference, I'm including a sample of the binary file in question:
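A likely reason grep chokes: "grep -E" uses POSIX extended regular expressions, which have no lookahead, so the "(?!...)" group cannot work there (GNU grep's -P option does accept Perl-style lookahead). Also, in the pattern as written above, the {8,24} quantifier applies only to the [;,=] class, not to the first bracket. The following is a rough sketch, using Python's re module instead of grep, of how a single pattern with lookaheads might be arranged. It assumes each candidate token has already been split out of the file, and that edge dots count toward the length; CLASS, PATTERN and matches are names invented for this example.

```python
import re

# One character from the allowed list; ] and [ are escaped inside the class.
CLASS = r"""[0-9a-zA-Z!#$%^&()@~"'\]?\[*+;,=]"""

PATTERN = re.compile(
    r"(?=.{8,24}\Z)"      # whole token is 8-24 characters long
    r"(?!.*(.)\1\1)"      # no character repeated 3+ times in a row, anywhere
    r"\.?"                # optional dot, but only as the first character
    + CLASS + r"+"        # allowed characters in the middle
    + r"\.?"              # optional dot, but only as the last character
)


def matches(token: str) -> bool:
    """Return True if the entire token is accepted by PATTERN."""
    return PATTERN.fullmatch(token) is not None
```

Both lookaheads sit at the start of the token, so each one inspects the whole token before any character is consumed; that is the part POSIX ERE has no equivalent for.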
Sample:
http://pastebin.com/wY6a0Uir
Note: if you test the sample on http://www.gethifi.com/tools/regex you'll see that line #21 of the results, for example, should not have been returned.
Hope this clarifies the question a bit rather than confusing it more :)
Cheers!