I have a big text file that looks like the following small example.
small example:
chr1 37091 37122 D00645:305:CCVLRANXX:1:1104:21074:48301 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1104:4580:50451 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1106:13064:5974 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1106:16735:48726 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:2210:5043:83540 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:2204:15744:24410 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:2204:19627:73060 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:2206:8497:68295 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1312:11371:24672 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1312:17050:42431 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1312:12969:62696 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1312:6478:73521 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1312:8402:80222 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1309:19837:15007 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1309:20126:89687 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1310:2838:27860 0 -
chr1 37091 37122 D00645:305:CCVLRANXX:1:1310:7280:85906 0 -
chr1 54832 54863 D00645:305:CCVLRANXX:1:2102:19886:3949 0 -
chr1 74307 74338 D00645:305:CCVLRANXX:1:2203:13233:29983 0 -
chr1 74325 74356 D00645:305:CCVLRANXX:1:1310:7266:92995 0 -
chr1 93529 93560 D00645:305:CCVLRANXX:1:1103:1743:29602 0 +
chr1 93529 93560 D00645:305:CCVLRANXX:1:1101:16098:97354 0 +
I am trying to count the lines that share the same 1st, 2nd and 3rd columns and write a new file with 4 columns: the first 3 columns are the same as in the original file, and the 4th column is the number of times that combination occurs. For example, there are 17 rows with chr1 37091 37122.
Here is the expected output for the small example above:
expected output:
chr1 37091 37122 17
chr1 54832 54863 1
chr1 74307 74338 1
chr1 74325 74356 1
chr1 93529 93560 2
I wrote this code in Python, but it does not return what I want. Do you know how to fix it?
infile = open('infile.txt', 'rb')
content = []
for i in infile:
    content.append(i.split())
final = []
for j in range(len(content)):
    if content[j] == content[j-1]:
        final.append(content[j])
with open('outfile.txt','w') as f:
    for sublist in final:
        for item in sublist:
            f.write(item + ' ')
        f.write('\n')
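One likely reason the code above produces nothing useful: it compares whole rows, and the 4th column (the read ID) is different on every line, so two rows are never equal. It also appends matching rows instead of counting them, and at j = 0 the index j-1 wraps around to the last row. A minimal sketch of one possible fix follows, assuming whitespace-separated columns and that only the first three columns define a group. The function names count_regions and write_counts are made up for illustration, and the file names are the placeholders from the question.

```python
from collections import Counter


def count_regions(lines):
    """Count rows per (chrom, start, end) key, kept in first-seen order."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:  # skip blank or malformed lines
            continue
        # Only the first three columns form the key; the read ID,
        # score and strand columns are ignored.
        counts[tuple(fields[:3])] += 1
    return counts


def write_counts(in_path, out_path):
    """Read in_path line by line and write 'chrom start end count' rows."""
    # Text mode, so the fields are str rather than bytes.
    with open(in_path) as src, open(out_path, "w") as dst:
        for key, n in count_regions(src).items():
            dst.write(" ".join(key) + " " + str(n) + "\n")


# Hypothetical usage with the file names from the question:
# write_counts("infile.txt", "outfile.txt")
```

The file is read one line at a time, so memory use grows with the number of distinct regions rather than the number of lines. Since Counter is a dict subclass, it keeps insertion order on Python 3.7+, so the output rows appear in the order each region is first seen. If the input is already sorted so that identical regions are adjacent, itertools.groupby would be an alternative that avoids holding all counts in memory.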