Python Hash not being updated in csv file output

Question

posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I have code that takes a directory of CSV files, hashes one column of each line, and then combines all the files into one. The problem is that the output contains the same hash value on every line; the hash is not recomputed for each line. Here is the code:

 import glob
 import hashlib

 files = glob.glob( '*.csv' )
 output="combined.csv"

 with open(output, 'w' ) as result:
     for thefile in files:
        f = open(thefile)
        m = f.readlines()
        for line in m[1:]:
            fields = line.split()       
            hash_object = hashlib.md5(b'(fields[2])')
            newline = fields[0],fields[1],hash_object.hexdigest(),fields[3]
            joined_line = ','.join(newline)
            result.write(joined_line + '\n')
        f.close()

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:23:26+0000

You are creating a hash of a fixed bytestring, b'(fields[2])'. Those bytes have no relationship to your CSV data, even though they spell out the same characters as the expression that indexes your row variable.
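To see the difference concretely, here is a minimal sketch. The two rows are made-up sample data, not taken from the question's files; they only stand in for what line.split() might return.

```python
import hashlib

# Hypothetical rows, standing in for the result of line.split()
rows = [
    ["a", "b", "secret-1", "d"],
    ["e", "f", "secret-2", "h"],
]

# The literal b'(fields[2])' is a fixed sequence of bytes; it never reads the variable.
literal_digests = [hashlib.md5(b'(fields[2])').hexdigest() for fields in rows]

# Encoding the actual column value produces a digest that depends on the row.
value_digests = [hashlib.md5(fields[2].encode('utf8')).hexdigest() for fields in rows]

print(literal_digests[0] == literal_digests[1])  # True: same bytes, same hash
print(value_digests[0] == value_digests[1])      # False: different values, different hashes
```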

You need to pass in bytes from your actual row:

hash_object = hashlib.md5(fields[2].encode('utf8'))

I am assuming your fields[2] column is a string, so you need to encode it first to get bytes. The UTF-8 encoding can represent every code point a Python string would normally contain, so it is a safe default here.
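As a small illustration of why the encode step is needed: in Python 3, hashlib.md5 accepts bytes and rejects str with a TypeError. The helper name md5_hex below is only an illustrative choice, not something from the question.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 hex digest of a string, encoded as UTF-8 first."""
    return hashlib.md5(text.encode('utf8')).hexdigest()

# Passing a str directly is rejected; hashlib wants bytes.
try:
    hashlib.md5("hello")
    rejected = False
except TypeError:
    rejected = True

print(rejected)          # True
print(md5_hex("hello"))  # 5d41402abc4b2a76b9719d911017c592
```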

You also appear to be re-inventing the CSV reading and writing wheel; you probably should use the csv module instead:

 import csv

 # ...

 with open(output, 'w', newline='') as result:
     writer = csv.writer(result)

     for thefile in files:
        with open(thefile, newline='') as f:
            reader = csv.reader(f)
            next(reader, None)  # skip first row
            for fields in reader:
                hash_object = hashlib.md5(fields[2].encode('utf8'))
                newrow = fields[:2] + [hash_object.hexdigest()] + fields[3:]
                writer.writerow(newrow)
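If you want to check the behaviour without touching real files, the same loop can be run against in-memory data. This is only a sketch: the function name hash_column, its index parameter, and the sample rows are assumptions made up for the example.

```python
import csv
import hashlib
import io

def hash_column(reader, writer, index=2):
    """Copy rows from reader to writer, replacing one column with its MD5 hex digest."""
    next(reader, None)  # skip the header row, as in the loop above
    for fields in reader:
        digest = hashlib.md5(fields[index].encode('utf8')).hexdigest()
        writer.writerow(fields[:index] + [digest] + fields[index + 1:])

# Hypothetical sample input: one header row and two data rows.
sample = io.StringIO(
    "id,name,email,city\n"
    "1,Ann,ann@example.com,Oslo\n"
    "2,Bob,bob@example.com,Rome\n",
    newline='',
)
out = io.StringIO(newline='')

hash_column(csv.reader(sample), csv.writer(out))
lines = out.getvalue().splitlines()

for line in lines:
    print(line)  # each row now carries its own digest in the third column
```

Note that the question's code split each line on whitespace, while csv.reader splits on commas by default. If the input files are not comma-separated, the reader's delimiter argument would need to match the actual format.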
