The Python 2 docs for filecmp.cmp() say:
Unless shallow is given and is false, files with identical os.stat()
signatures are taken to be equal.
That sounds as if two files that are identical except for their os.stat() signatures will be considered unequal. However, this does not seem to be the case, as the following snippet illustrates:
import filecmp
import os
import shutil
import time
with open('test_file_1', 'w') as f:
    f.write('file contents')
shutil.copy('test_file_1', 'test_file_2')
time.sleep(5) # pause to get a different time-stamp
os.utime('test_file_2', None) # change copied file's time-stamp
print 'test_file_1:', os.stat('test_file_1')
print 'test_file_2:', os.stat('test_file_2')
print 'filecmp.cmp():', filecmp.cmp('test_file_1', 'test_file_2')
Output:
test_file_1: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0,
st_uid=0, st_gid=0, st_size=13L, st_atime=1320719522L, st_mtime=1320720444L,
st_ctime=1320719522L)
test_file_2: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0,
st_uid=0, st_gid=0, st_size=13L, st_atime=1320720504L, st_mtime=1320720504L,
st_ctime=1320719539L)
filecmp.cmp(): True
As you can see, the two files' time stamps (st_atime, st_mtime, and st_ctime) are clearly not the same, yet filecmp.cmp() indicates that the two are identical. Am I misunderstanding something, or is there a bug in either filecmp.cmp()'s implementation or its documentation?
Update
The Python 3 documentation has been rephrased and currently says the following. IMHO this is an improvement only in the sense that it better implies that files with different time stamps might still be considered equal even when shallow is True:
If shallow is true, files with identical os.stat() signatures are
taken to be equal. Otherwise, the contents of the files are compared.
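To make the quoted sentence concrete, here is a minimal Python 3 sketch of the one case where shallow actually changes the result: two files with different contents but the same size and modification time. The file names and the timestamp value are arbitrary choices for illustration, not anything taken from the docs.

```python
import filecmp
import os
import tempfile

# Hypothetical scratch files; names and timestamp are arbitrary.
tmp_dir = tempfile.mkdtemp()
path_a = os.path.join(tmp_dir, 'a.txt')
path_b = os.path.join(tmp_dir, 'b.txt')

# Same size (4 bytes each), different contents.
with open(path_a, 'w') as f:
    f.write('aaaa')
with open(path_b, 'w') as f:
    f.write('bbbb')

# Force identical access and modification times on both files.
os.utime(path_a, (1000000000, 1000000000))
os.utime(path_b, (1000000000, 1000000000))

# shallow defaults to True: matching type, size and mtime is enough.
shallow_result = filecmp.cmp(path_a, path_b)
# shallow=False: the file contents are actually read and compared.
deep_result = filecmp.cmp(path_a, path_b, shallow=False)

print(shallow_result, deep_result)
```

With the default shallow comparison the two files should be reported as equal even though their contents differ, while shallow=False should report them as different.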
FWIW I think it would have been better to simply have said something like this:
If shallow is true, file content is compared only when os.stat() signatures are unequal.
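The decision order behind that wording can be sketched roughly as follows. This is only a paraphrase of what the standard library appears to do, assuming the "signature" is the (file type, size, modification time) triple; signature() and cmp_sketch() are made-up names, and the real function also caches results, which is left out here.

```python
import os
import stat


def signature(path):
    """Return an assumed (file type, size, mtime) signature for path."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def cmp_sketch(f1, f2, shallow=True):
    """Illustrative stand-in for filecmp.cmp(), without its cache."""
    s1 = signature(f1)
    s2 = signature(f2)
    # Anything that is not a regular file is never reported as equal.
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    # Shallow shortcut: identical signatures, so contents are never read.
    if shallow and s1 == s2:
        return True
    # Files of different sizes cannot have identical contents.
    if s1[1] != s2[1]:
        return False
    # Signatures differ (or shallow is false): compare the bytes.
    with open(f1, 'rb') as a, open(f2, 'rb') as b:
        while True:
            chunk_a = a.read(8192)
            chunk_b = b.read(8192)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files exhausted at the same point
                return True
```

Read this way, differing time stamps only rule out the shortcut; they do not by themselves make the files unequal, which matches the True result in the question.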