I am working with a text file in which each record is separated by a blank line. I want to extract the records that meet certain criteria.
For example, my text file looks like this:
#EVM predictionEVM prediction: Mode:STANDARD S-ratio: 2.52 11043-11477 orient(-) score(1246.00)
11477 11043 single- 4 6 {SNAP_model.scaffold6_size143996-snap.2;SNAP
#EVM prediction: Mode:STANDARD S-ratio: 1.00 20968-21183 orient(+) score(432.00)
20968 21183 single+ 1 3 {GeneID_mRNA_scaffold6_size143996_6;GeneID}
#EVM prediction: Mode:STANDARD S-ratio: 1.00 21940-22362 orient(-) score(846.00)
22362 21940 single- 4 6 {GeneID_mRNA_scaffold6_size143996_7;GeneID}
#EVM prediction: Mode:STANDARD S-ratio: 12.32 33363-34677 orient(+) score(21500.00)
33363 33495 initial+ 1 1 {SNAP_model.scaffold6_size143996-snap.3;SNAP},{GeneID_mRNA_scaffold6_size143996_10;GeneID},{Augustus_model.g38.t1;Augustus}
33496 33611 INTRON {SNAP_model.scaffold6_size143996-snap.3;SNAP},{GeneID_mRNA_scaffold6_size143996_10;GeneID},{Augustus_model.g38.t1;Augustus},{ev_type:GeMoMa/ID=model.scaffold6_size143996.rna-XM_007036272.2_R0;GeMoMa}
33612 33741 internal+ 2 2 {SNAP_model.scaffold6_size143996-snap.3;SNAP},{GeneID_mRNA_scaffold6_size143996_10;GeneID},{Augustus_model.g38.t1;Augustus},{ev_type:GeMoMa/ID=model.scaffold6_size143996.rna-XM_007036272.2_R0;GeMoMa}
33742 33842 INTRON {SNAP_model.scaffold6_size143996-snap.3;SNAP},{GeneID_mRNA_scaffold6_size143996_10;GeneID},{Augustus_model.g38.t1;Augustus},{ev_type:GeMoMa/ID=model.scaffold6_size143996.rna-XM_007036272.2_R0;GeMoMa}
33843 34677 terminal+ 3 3 {SNAP_model.scaffold6_size143996-snap.3;SNAP},{GeneID_mRNA_scaffold6_size143996_10;GeneID},{Augustus_model.g38.t1;Augustus}
#EVM prediction: Mode:STANDARD S-ratio: 2.41 46394-48564 orient(-) score(9677.00) noncoding_equivalent(4012.03) raw_noncoding(7194.39) offset(3182.36)
46879 46394 terminal- 4 6 {GeneID_mRNA_scaffold6_size143996_13;GeneID}
47512 46880 INTRON {GeneID_mRNA_scaffold6_size143996_13;GeneID}
48256 47513 internal- 4 6 {GeneID_mRNA_scaffold6_size143996_13;GeneID}
48366 48257 INTRON {Augustus_model.g41.t1;Augustus}
48429 48367 internal- 4 6 {Augustus_model.g41.t1;Augustus}
48510 48430 INTRON {Augustus_model.g41.t1;Augustus}
48564 48511 initial- 4 6 {Augustus_model.g41.t1;Augustus}
Now I want to extract the records with a score greater than 1000. That means removing the second and third records, which have score(432.00) and score(846.00).
I have written this awk code:
awk -F '[()]' '{if ($4 > 1000) print $0}' input.out
but it outputs only the first line of each matching record, i.e.
#EVM predictionEVM prediction: Mode:STANDARD S-ratio: 2.52 11043-11477 orient(-) score(1246.00)
#EVM prediction: Mode:STANDARD S-ratio: 12.32 33363-34677 orient(+) score(21500.00)
#EVM prediction: Mode:STANDARD S-ratio: 2.41 46394-48564 orient(-) score(9677.00) noncoding_equivalent(4012.03) raw_noncoding(7194.39) offset(3182.36)
But I want to extract the complete record for every score greater than 1000.
Please help me extract the complete records.
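A possible direction, offered as a sketch rather than a definitive answer: awk evaluates its condition once per input line, and only the "#EVM" header lines carry a score in $4, so the one-liner above can only ever print headers. One way around this is to make the decision on the header line, remember it in a flag, and print every following line while the flag is set. The sketch below assumes every record starts with a line beginning with "#EVM" and that the score is always the second parenthesised value on that line. The function name filter_by_score and the optional threshold argument are made up for illustration.

```shell
# Hypothetical helper: print every record whose header score exceeds a threshold.
# Usage: filter_by_score FILE [MIN_SCORE]   (MIN_SCORE defaults to 1000)
filter_by_score() {
    awk -F '[()]' -v min="${2:-1000}" '
        # Header line: with "(" and ")" as separators, $4 is the number in score(...).
        # Adding 0 forces a numeric comparison.
        /^#EVM/ { keep = ($4 + 0 > min) }
        # Print the header and every following line while the record is kept.
        keep
    ' "$1"
}
```

It could then be called as filter_by_score input.out. Because the flag is reset only on a header line, this does not depend on the blank lines between records; a blank line that follows a kept record is printed along with it.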
question from:
https://stackoverflow.com/questions/65662237/filtering-text-file-based-on-values-in-some-lines-using-awk