I know there have been some questions regarding file reading, binary data handling and integer conversion using struct
before, so I come here to ask about a piece of code I have that I think is taking too much time to run. The file being read is a multichannel datasample recording (short integers), with intercalated intervals of data (hence the nested for
statements). The code is as follows:
# channel_content is a dictionary, channel_content[channel]['nsamples'] is a string
for rec in xrange(number_of_intervals)):
for channel in channel_names:
channel_content[channel]['recording'].extend(
[struct.unpack( "h", f.read(2))[0]
for iteration in xrange(int(channel_content[channel]['nsamples']))])
With this code, I get 2.2 seconds per megabyte read with a dual-core with 2Mb RAM, and my files typically have 20+ Mb, which gives some very annoying delay (specially considering another benchmark shareware program I am trying to mirror loads the file WAY faster).
What I would like to know:
- If there is some violation of "good practice": bad-arranged loops, repetitive operations that take longer than necessary, use of inefficient container types (dictionaries?), etc.
- If this reading speed is normal, or normal to Python, and if reading speed
- If creating a C++ compiled extension would be likely to improve performance, and if it would be a recommended approach.
- (of course) If anyone suggests some modification to this code, preferrably based on previous experience with similar operations.
Thanks for reading
(I have already posted a few questions about this job of mine, I hope they are all conceptually unrelated, and I also hope not being too repetitive.)
Edit: channel_names
is a list, so I made the correction suggested by @eumiro (remove typoed brackets)
Edit: I am currently going with Sebastian's suggestion of using array
with fromfile()
method, and will soon put the final code here. Besides, every contibution has been very useful to me, and I very gladly thank everyone who kindly answered.
Final Form after going with array.fromfile()
once, and then alternately extending one array for each channel via slicing the big array:
fullsamples = array('h')
fullsamples.fromfile(f, os.path.getsize(f.filename)/fullsamples.itemsize - f.tell())
position = 0
for rec in xrange(int(self.header['nrecs'])):
for channel in self.channel_labels:
samples = int(self.channel_content[channel]['nsamples'])
self.channel_content[channel]['recording'].extend(
fullsamples[position:position+samples])
position += samples
The speed improvement was very impressive over reading the file a bit at a time, or using struct
in any form.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…