Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
570 views
in Technique[技术] by (71.8m points)

regex - How to compare and substitute strings in different lines in unix

I want to compare and substitute strings present in different lines in unix

For example I have a file with two words in each line

<a> <b>
<d> <e>
<b> <c>
<c> <e>

If second word of any line matched with first word of any other line then second word of this line should be replaced with second word of matched line and it should iterate until there is no match between second word of the line with first word of another line

I need result like

<a> <e>
<b> <e>
<c> <e>
<d> <e>

I am new to unix and not getting any idea how to implement this. Can any one give suggestions or explain how we can do this

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is VERY clearly a case for a recursive descent solution:

$ cat tst.awk
function descend(node) {return (map[node] in map ? descend(map[node]) : map[node])}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

$ awk -f tst.awk file
<a> <e>
<b> <e>
<c> <e>
<d> <e>

If infinite recursion in your input is a possibility, here;s an approach that will print as the 2nd field the last node before the recursion starts and put a "*" next to it so you know it's happening:

$ cat tst.awk
function descend(node,  child, descendant) {
    stack[node]
    child = map[node]
    if (child in map) {
        if (child in stack) {
            descendant = node "*"
        }
        else {
            descendant = descend(child)
        }
    }
    else {
        descendant = child
    }
    delete stack[node]
    return descendant
}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

.

$ cat file
<w> <w>
<x> <y>
<y> <z>
<z> <x>
<a> <b>
<d> <e>
<b> <c>
<c> <e>

$ awk -f tst.awk file
<w> <w>*
<x> <z>*
<y> <x>*
<z> <y>*
<a> <e>
<b> <e>
<c> <e>
<d> <e>

If you need the output order to match the input order and/or or to print duplicate lines twice, change the bottom 2 lines of the script to:

{ keys[++numKeys] = $1; map[$1] = $2 }
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        print key, descend(key)
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...