You're actually pretty close. But the code is confusing: specifically the variable names and what actual values they represent. As a result, you appear to be just guessing the math. So let's go back to square one and determine what exactly it is we need to do, and the math will very easily fall out of it.
First, imagine we have exactly one sample for each of the five channels. That group of samples, one per channel, is called an audio frame. The frame looks like this:
[channel0][channel1][channel2][channel3][channel4]
The width of a sample in one channel is called byterate in your code, but I don't like that name. I'm going to call it bytes_per_sample instead. You can easily see the width of the entire frame is this:
int bytes_per_frame = bytes_per_sample * channel_count;
It should be equally obvious that to find the starting offset for channel c within a single frame, you multiply as follows:
int sample_offset_in_frame = bytes_per_sample * c;
That's just about all you need! The last bit is your z loop, which covers each byte in a single sample for one channel. I don't know what z is supposed to represent, apart from being a random single-letter identifier you chose, but hey, let's just keep it.
Putting all this together, you get the absolute offset of sample s in channel c, and then you copy the individual bytes out of it:
int sample_offset = bytes_per_frame * s + bytes_per_sample * c;
for (int z = 0; z < bytes_per_sample; ++z) {
    audio.push_back(audio_ptr[sample_offset + z]);
}
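As a minimal sketch, the offset arithmetic above can be wrapped in a small helper so it is easy to check by hand. The name sample_offset_of and the use of std::size_t are my own choices for illustration, not something taken from your code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original code): byte offset of
// sample s in channel c within an interleaved audio buffer.
std::size_t sample_offset_of(std::size_t s, std::size_t c,
                             std::size_t bytes_per_sample,
                             std::size_t channel_count)
{
    assert(c < channel_count);  // the channel index must exist in the frame
    const std::size_t bytes_per_frame = bytes_per_sample * channel_count;
    // Skip s whole frames, then skip c samples inside that frame.
    return bytes_per_frame * s + bytes_per_sample * c;
}
```

For example, with 2-byte samples and five channels, a frame is 10 bytes wide, so sample 2 of channel 1 starts at byte 2 * 10 + 1 * 2 = 22.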
Note that this assumes you're looping over the number of samples, not the number of bytes in your channel. So let's show all the loops for completeness' sake:
const int bytes_per_sample = bitrate / 8;
const int bytes_per_frame = bytes_per_sample * channel_count;
const int num_samples = audio_size / bytes_per_frame;
for (int c = 0; c < channel_count; ++c)
{
    int sample_offset = bytes_per_sample * c;
    for (int s = 0; s < num_samples; ++s)
    {
        for (int z = 0; z < bytes_per_sample; ++z)
        {
            audio.push_back(audio_ptr[sample_offset + z]);
        }
        // Skip to next frame
        sample_offset += bytes_per_frame;
    }
}
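If you want something you can compile and test in isolation, here is a rough sketch of the same loops wrapped in a function. The function name deinterleave, its parameter list, and the choice to return one flat std::vector (all of channel 0, then all of channel 1, and so on) are assumptions on my part; adapt them to however you actually store each channel:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wrapper around the loops above. Converts interleaved audio
// into a planar layout: every sample of channel 0, then channel 1, etc.
std::vector<std::uint8_t> deinterleave(const std::uint8_t* audio_ptr,
                                       std::size_t audio_size,
                                       std::size_t bits_per_sample,
                                       std::size_t channel_count)
{
    // Assumption: samples are a whole number of bytes wide.
    assert(bits_per_sample > 0 && bits_per_sample % 8 == 0);
    assert(channel_count > 0);

    const std::size_t bytes_per_sample = bits_per_sample / 8;
    const std::size_t bytes_per_frame = bytes_per_sample * channel_count;
    // Integer division drops any incomplete trailing frame.
    const std::size_t num_samples = audio_size / bytes_per_frame;

    std::vector<std::uint8_t> audio;
    audio.reserve(num_samples * bytes_per_frame);

    for (std::size_t c = 0; c < channel_count; ++c)
    {
        std::size_t sample_offset = bytes_per_sample * c;
        for (std::size_t s = 0; s < num_samples; ++s)
        {
            for (std::size_t z = 0; z < bytes_per_sample; ++z)
            {
                audio.push_back(audio_ptr[sample_offset + z]);
            }
            // Skip to the same channel in the next frame
            sample_offset += bytes_per_frame;
        }
    }
    return audio;
}
```

Calling it on a tiny hand-written buffer (say two channels, 16-bit samples, three frames) is a quick way to confirm that the bytes of each channel come out contiguous and in order.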
You'll see here that I split the math up so that it does fewer multiplications in the loops. This is mostly for readability, but it might also help a compiler understand what's happening when it tries to optimize. Concerns over optimization are secondary (and in your case, there are much more expensive things going on with those vectors and the map).
The most important thing is that you have readable code, with reasonable variable names, that makes logical sense.