Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
180 views
in Technique[技术] by (71.8m points)

Pivot table content using awk command

i would like to use awk to pivot the table content, can someone share how will i do this. thanks

table1

FEATURE,TESTER,LICENSE_USED,PROGRAM,AREA
Low,T6712,23,Element01,FT1
High,T7911,54,Element03,FT2
Medium,E8123,48,Element02,FT3
High,F4309,54,Element02,PB1
Low,F4309,23,Element01,PB1
Low,T7911,23,Element04,FT1
High,E8123,54,Element05,FT2
Medium,F4309,48,Element01,PB1

expected output, the Feature becomes column with the values from license used

TESTER,PROGRAM,AREA,High,Low,Medium
E8123,Element02,FT3,0,0,48
E8123,Element05,FT2,54,0,0
F4309,Element01,PB1,0,23,48
F4309,Element02,PB1,54,0,0
T6712,Element01,FT1,0,23,0
T7911,Element03,FT2,54,0,
T7911,Element04,FT1,0,23,0

another output, would it be possible transpose the expected output just like below?:

TESTER,E8123,E8123,F4309,F4309,T6712,T7911,T7911
PROGRAM,Element02,Element05,Element01,Element02,Element01,Element03,Element04
AREA,FT3,FT2,PB1,PB1,FT1,FT2,FT1
High,0,54,0,54,0,54,0
Low,0,0,23,0,23,0,23
Medium,48,0,48,0,0,,0
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's the original input:

FEATURE,TESTER,LICENSE_USED,PROGRAM,AREA
Low,T6712,23,Element01,FT1
High,T7911,54,Element03,FT2
Medium,E8123,48,Element02,FT3
High,F4309,54,Element02,PB1
Low,F4309,23,Element01,PB1
Low,T7911,23,Element04,FT1
High,E8123,54,Element05,FT2
Medium,F4309,48,Element01,PB1

Let's pretty that up so we can see what's going on:

$ tr , '11'  < data.txt | column -tR 3,5
FEATURE  TESTER  LICENSE_USED  PROGRAM    AREA
Low      T6712             23  Element01   FT1
High     T7911             54  Element03   FT2
Medium   E8123             48  Element02   FT3
High     F4309             54  Element02   PB1
Low      F4309             23  Element01   PB1
Low      T7911             23  Element04   FT1
High     E8123             54  Element05   FT2
Medium   F4309             48  Element01   PB1

Here's the expected output:

$ tr , '11'  < expected.txt | column -tR 2,3,4,5,6,7,8
TESTER       E8123      E8123      F4309      F4309      T6712      T7911      T7911
PROGRAM  Element02  Element05  Element01  Element02  Element01  Element03  Element04
AREA           FT3        FT2        PB1        PB1        FT1        FT2        FT1
High             0         54          0         54          0         54          0
Low              0          0         23          0         23          0         23
Medium          48          0         48          0          0          0          0

It appears we want three column headers to be translated to row headers and we want to use Low, Medium, and High as row headers per tester/element/area (sorting by Tester makes this apparent):

$ tr , '11'  < data.txt | column -tR 3 | sort -k2
Medium   E8123             48  Element02  FT3
High     E8123             54  Element05  FT2
Low      F4309             23  Element01  PB1
Medium   F4309             48  Element01  PB1
High     F4309             54  Element02  PB1
Low      T6712             23  Element01  FT1
Low      T7911             23  Element04  FT1
High     T7911             54  Element03  FT2
FEATURE  TESTER  LICENSE_USED  PROGRAM    AREA

We can easily see that Testers work on different Elements too, so we'll have to account for that:

BEGIN {
    FS=","
}
NR > 1 {
    data[$2,$4,$5] = data[$2,$4,$5] $1 ":" $3 FS
}
END {
    #construct the table
    for (tester_element_area in data) {
        split(tester_element_area, parts, SUBSEP)

        tester  = parts[1]
        element = parts[2]
        area    = parts[3]

        n = split(data[tester_element_area], d)

        template["High"]   = 0
        template["Medium"] = 0
        template["Low"]    = 0

        for (i = 1; i <= n; i++) {
            split(d[i], license, ":")

            degree = license[1]
            value  = license[2]

            template[ degree ] = value
        }

        table["TESTER"]  = table["TESTER"]  FS tester
        table["PROGRAM"] = table["PROGRAM"] FS element
        table["AREA"]    = table["AREA"]    FS area
        table["High"]    = table["High"]    FS template["High"]
        table["Medium"]  = table["Medium"]  FS template["Medium"]
        table["Low"]     = table["Low"]     FS template["Low"]
    }

    #print the table
    header[1] = "TESTER"
    header[2] = "PROGRAM"
    header[3] = "AREA"
    header[4] = "High"
    header[5] = "Low"
    header[6] = "Medium"

    for (i = 1; i <= 6; i++) {
        header_name = header[i]

        printf header_name

        n = split(table[header_name], parts)

        for (j = 1; j <= n; j++) {
            if (j > 1) {
                printf FS
            }
            printf parts[j]
        }
        print ""
    }
}

Let's see what it returns:

$ awk -f prog.awk < data.txt | tr , '11' | column -tR2,3,4,5,6,7,8
TESTER       E8123      T7911      F4309      E8123      T6712      T7911      F4309
PROGRAM  Element05  Element04  Element02  Element02  Element01  Element03  Element01
AREA           FT2        FT1        PB1        FT3        FT1        FT2        PB1
High            54          0         54          0          0         54          0
Low              0         23          0          0         23          0         23
Medium           0          0          0         48          0          0         48

Not too shabby, the columns aren't ordered exactly correct. They should be sorted. If you're willing to use GAWK, it requires a small code change:

END {
        for (tester_element_area in data) {                                     
                cols[++i] = tester_element_area                                 
        }                                                                       

        m = asort(cols)                                                         

        #construct the table                                                    
        for (k = 1; k <= m; k++) {                                              
                tester_element_area = cols[k]
                ...

Output:

$ awk -f prog.awk < data.txt | tr , '11' | column -tR2,3,4,5,6,7,8
TESTER       E8123      E8123      F4309      F4309      T6712      T7911      T7911
PROGRAM  Element02  Element05  Element01  Element02  Element01  Element03  Element04
AREA           FT3        FT2        PB1        PB1        FT1        FT2        FT1
High             0         54          0         54          0         54          0
Low              0          0         23          0         23          0         23
Medium          48          0         48          0          0          0          0

Update: Sort by Area

NR > 1 {
    data[$5,$2,$4] = data[$5,$2,$4] $1 ":" $3 FS
}
END {
        for (tester_element_area in data) {                                     
                cols[++i] = tester_element_area
        }                                                                       

        m = asort(cols)                                                         

        #construct the table                                                    
        for (k = 1; k <= m; k++) {                                              
                tester_element_area = cols[k]
                split(tester_element_area, parts, SUBSEP)

                area     = parts[1]
                tester   = parts[2]
                element  = parts[3]

Output:

TESTER       T6712      T7911      E8123      T7911      E8123      F4309      F4309
PROGRAM  Element01  Element04  Element05  Element03  Element02  Element01  Element02
AREA           FT1        FT1        FT2        FT2        FT3        PB1        PB1
High             0          0         54         54          0          0         54
Low             23         23          0          0          0         23          0
Medium           0          0          0          0         48         48          0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...