You should check all of the header data to see what the actual sizes are. Broadcast Wave Format files contain an even larger extension subchunk. WAV and AIFF files from Pro Tools have even more extension chunks, which are undocumented, as well as data after the audio. If you want to be sure where the sample data begins and ends, you need to actually look for the data chunk ('data' for WAV files and 'SSND' for AIFF).
As a review, all WAV subchunks conform to the following format:
Subchunk Descriptor (4 bytes)
Subchunk Size (4-byte unsigned integer, little-endian)
Subchunk Data (Subchunk Size bytes, followed by one pad byte if the size is odd)
This is very easy to process. All you need to do is read the descriptor and the data size; if the descriptor is not the one you are looking for, skip ahead by that size to the next subchunk. A simple Java routine to do that would look like this:
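import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

//
// Quick note for people who don't know Java well:
// 'in.read(...)' returns -1 when the stream reaches
// the end of the file, but it may also return fewer
// bytes than requested, and 'in.skip(...)' may skip
// fewer. The two helpers below loop until the request
// is met and return false if the file ends first.
//
public static void printWaveDescriptors(File file)
        throws IOException {
    try (InputStream in = new FileInputStream(file)) {
        byte[] bytes = new byte[4];

        // Read first 4 bytes.
        // (Should be RIFF descriptor.)
        if (!readFully(in, bytes)) {
            return;
        }
        printDescriptor(bytes);

        // First subchunk will always be at byte 12.
        // (There is no other dependable constant.)
        if (!skipFully(in, 8)) {
            return;
        }

        for (;;) {
            // Read each chunk descriptor.
            if (!readFully(in, bytes)) {
                break;
            }
            printDescriptor(bytes);

            // Read chunk length (unsigned, little endian).
            if (!readFully(in, bytes)) {
                break;
            }
            long length = Integer.toUnsignedLong(
                  Byte.toUnsignedInt(bytes[0])
                | Byte.toUnsignedInt(bytes[1]) << 8
                | Byte.toUnsignedInt(bytes[2]) << 16
                | Byte.toUnsignedInt(bytes[3]) << 24
            );

            // Skip the length of this chunk, plus the pad
            // byte that follows an odd-sized chunk.
            // Next bytes should be another descriptor or EOF.
            if (!skipFully(in, length + (length & 1))) {
                break;
            }
        }

        System.out.println("End of file.");
    }
}

// Fills the whole buffer; returns false if EOF comes first.
private static boolean readFully(InputStream in, byte[] buf)
        throws IOException {
    int off = 0;
    while (off < buf.length) {
        int n = in.read(buf, off, buf.length - off);
        if (n < 0) {
            return false;
        }
        off += n;
    }
    return true;
}

// Skips exactly n bytes; returns false if EOF comes first.
private static boolean skipFully(InputStream in, long n)
        throws IOException {
    while (n > 0) {
        long skipped = in.skip(n);
        if (skipped > 0) {
            n -= skipped;
        } else if (in.read() < 0) {
            return false;
        } else {
            n--;
        }
    }
    return true;
}

private static void printDescriptor(byte[] bytes) {
    String desc = new String(bytes, StandardCharsets.US_ASCII);
    System.out.println("Found '" + desc + "' descriptor.");
}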
For example, here is the output for a random WAV file I had:
Found 'RIFF' descriptor.
Found 'bext' descriptor.
Found 'fmt ' descriptor.
Found 'minf' descriptor.
Found 'elm1' descriptor.
Found 'data' descriptor.
Found 'regn' descriptor.
Found 'ovwf' descriptor.
Found 'umid' descriptor.
End of file.
Notably, both 'fmt ' and 'data' legitimately appear in between other chunks here, because Microsoft's RIFF specification says that subchunks can appear in any order. Even some major audio systems that I know of get this wrong and do not account for it.
So if you want to find a certain chunk, loop through the file checking each descriptor until you find the one you're looking for.
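As a rough sketch of that search loop, the routine below returns the offset and size of the first top-level chunk with a given descriptor. The names `WavChunkFinder`, `ChunkLocation`, and `findChunk` are hypothetical, chosen only for illustration. It assumes a plain RIFF/WAVE file whose first subchunk starts at byte 12, and it does not handle RF64 files or chunks nested inside 'LIST' chunks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class WavChunkFinder {

    /** Where a chunk's payload sits in the file (hypothetical helper type). */
    public static final class ChunkLocation {
        public final long dataOffset; // offset of the first payload byte
        public final long dataSize;   // payload size in bytes, without the pad byte

        ChunkLocation(long dataOffset, long dataSize) {
            this.dataOffset = dataOffset;
            this.dataSize = dataSize;
        }
    }

    /**
     * Walks the top-level chunks and returns the first one whose
     * descriptor equals id (for example "data" or "fmt ").
     * Returns null if no such chunk is found.
     */
    public static ChunkLocation findChunk(RandomAccessFile raf, String id)
            throws IOException {
        byte[] header = new byte[8];
        long fileLength = raf.length();
        long pos = 12; // skip 'RIFF', the RIFF size, and 'WAVE'

        while (pos + 8 <= fileLength) {
            raf.seek(pos);
            raf.readFully(header);

            String found = new String(header, 0, 4, StandardCharsets.US_ASCII);
            // Chunk size is an unsigned 32-bit little-endian integer.
            long size = (header[4] & 0xFFL)
                    | (header[5] & 0xFFL) << 8
                    | (header[6] & 0xFFL) << 16
                    | (header[7] & 0xFFL) << 24;

            if (found.equals(id)) {
                return new ChunkLocation(pos + 8, size);
            }
            // Move to the next chunk; odd-sized chunks carry one pad byte.
            pos += 8 + size + (size & 1);
        }
        return null;
    }
}
```

With something like this, `findChunk(raf, "data")` gives you the start and length of the sample data no matter how many other chunks come before or after it. Before reading samples, it is still worth checking that `dataOffset + dataSize` does not exceed the file length, since a truncated file can declare more data than it holds.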