Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


How do I use audio sample data from Java Sound?

This question is usually asked as a part of another question but it turns out that the answer is long. I've decided to answer it here so I can link to it elsewhere.

Although I'm not aware of a way for Java to produce audio samples for us at this time, if that changes in the future, this can be a place for it. JavaFX has some related functionality, for example AudioSpectrumListener, but it still doesn't provide a way to access samples directly.


I'm using javax.sound.sampled for playback and/or recording but I'd like to do something with the audio.

Perhaps I'd like to display it visually or process it in some way.

How do I access audio sample data to do that with Java Sound?

See also:


Well, the simplest answer is that at the moment Java can't produce sample data for the programmer.

This quote is from the official tutorial:

There are two ways to apply signal processing:

  • You can use any processing supported by the mixer or its component lines, by querying for Control objects and then setting the controls as the user desires. Typical controls supported by mixers and lines include gain, pan, and reverberation controls.

  • If the kind of processing you need isn't provided by the mixer or its lines, your program can operate directly on the audio bytes, manipulating them as desired.

This page discusses the first technique in greater detail, because there is no special API for the second technique.

Playback with javax.sound.sampled largely acts as a bridge between the file and the audio device. The bytes are read in from the file and sent off.

Don't assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they aren't. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them. Using 8-bit is one way to avoid the complexity described here, if you're just playing around.)

So instead, I'll enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer will not cover how to encode them, but it's included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.

This is a long answer but I wanted to give a thorough overview.


A Little About Digital Audio

Generally when digital audio is explained, we're referring to Linear Pulse-Code Modulation (LPCM).

A continuous sound wave is sampled at regular intervals and the amplitudes are quantized to integers of some scale.

Shown here is a sine wave sampled and quantized to 4-bit:

[Figure: a sine wave sampled at regular intervals and quantized to 4-bit LPCM]

(Notice that in two's complement representation, the most positive value is 1 less in magnitude than the most negative value: for 4-bit, the range is -8 to 7. This is a minor detail to be aware of. For example, if you're clipping audio and forget this, the positive clips will overflow.)
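To make the asymmetric range concrete, here's a minimal sketch of quantizing a normalized sample to a signed integer with clipping. The class and method names (Quantize, quantize) are hypothetical choices of mine. The clamp uses -8 at the bottom but 7 at the top for 4-bit, so a full-scale positive input of 1.0 clips to 7 instead of overflowing to 8.

```java
// Hypothetical helper: quantize a normalized sample in [-1, 1] to a signed
// integer of the given bit depth, clipping to the two's complement range.
class Quantize {
    static long quantize(double sample, int bitsPerSample) {
        long fullScale = 1L << (bitsPerSample - 1); // 8 for 4-bit
        long value = Math.round(sample * fullScale);
        long min = -fullScale;    // most negative, e.g. -8 for 4-bit
        long max = fullScale - 1; // most positive, e.g. 7 for 4-bit
        // Clip to [min, max]; clamping to +fullScale instead would overflow.
        return Math.max(min, Math.min(max, value));
    }

    public static void main(String[] args) {
        // Sample one cycle of a sine wave at 16 regular intervals, quantized to 4-bit.
        int n = 16;
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < n; k++) {
            double x = Math.sin(2 * Math.PI * k / n);
            sb.append(quantize(x, 4)).append(' ');
        }
        System.out.println(sb.toString().trim());
    }
}
```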

When we have audio on the computer, we have an array of these samples. A sample array is what we want to turn the byte array into.

To decode PCM samples, we don't care much about the sample rate or number of channels, so I won't be saying much about them here. Channels are usually interleaved, so that if we had an array of them, they'd be stored like this:

Index 0: Sample 0 (Left Channel)
Index 1: Sample 0 (Right Channel)
Index 2: Sample 1 (Left Channel)
Index 3: Sample 1 (Right Channel)
Index 4: Sample 2 (Left Channel)
Index 5: Sample 2 (Right Channel)
...

In other words, for stereo, the samples in the array just alternate between left and right.
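If you need each channel separately (for example, to draw one waveform per channel), the interleaved array can be split up. This is a minimal sketch, assuming the samples have already been decoded to floats and the array length is a whole number of frames; deinterleave is a hypothetical helper name.

```java
// Hypothetical helper: split an interleaved sample array into one array per
// channel. Assumes samples.length is a multiple of channels.
class Deinterleave {
    static float[][] deinterleave(float[] samples, int channels) {
        int frames = samples.length / channels;
        float[][] out = new float[channels][frames];
        for (int frame = 0; frame < frames; frame++) {
            for (int ch = 0; ch < channels; ch++) {
                // One frame's samples sit next to each other: L, R, L, R, ...
                out[ch][frame] = samples[frame * channels + ch];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] stereo = {0.1f, -0.1f, 0.2f, -0.2f, 0.3f, -0.3f};
        float[][] split = deinterleave(stereo, 2);
        System.out.println(java.util.Arrays.toString(split[0])); // left channel
        System.out.println(java.util.Arrays.toString(split[1])); // right channel
    }
}
```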


Some Assumptions

All of the code examples will assume the following declarations:

  • byte[] bytes; The byte array, read from the AudioInputStream.
  • float[] samples; The output sample array that we're going to fill.
  • float sample; The sample we're currently working on.
  • long temp; An interim value used for general manipulation.
  • int i; The position in the byte array where the current sample's data starts.
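For reference, here's one way bytes might be filled in the first place. This is a sketch rather than the only approach: readFrames is a hypothetical helper, and it assumes a PCM stream whose frame size is known. AudioInputStream.read only ever reads an integral number of frames, so the result always holds whole frames.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

class ReadBytes {
    // Hypothetical helper: reads up to maxFrames frames into a new byte array.
    // Assumes a PCM stream with a known (positive) frame size.
    static byte[] readFrames(AudioInputStream in, int maxFrames) throws IOException {
        int frameSize = in.getFormat().getFrameSize();
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frame size not specified");
        }
        byte[] buffer = new byte[maxFrames * frameSize];
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n <= 0) {
                break; // -1 means end of stream; also stop if nothing was read
            }
            total += n;
        }
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        // 44.1 kHz, 16-bit, mono, signed, little-endian PCM.
        AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);
        byte[] raw = {0x0F, 0x27, 0x00, 0x00, (byte) 0xFF, (byte) 0xFF}; // 3 frames
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(raw), format, raw.length / format.getFrameSize());
        byte[] bytes = readFrames(in, 1024);
        System.out.println(bytes.length + " bytes, " + format.getEncoding());
    }
}
```

The main method builds an in-memory stream so the example runs without an audio file; with a real file you'd typically get the stream from AudioSystem.getAudioInputStream(File).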

We'll normalize all of the samples in our float[] array to the range of -1f <= sample <= 1f. All of the floating-point audio I've seen comes this way and it's pretty convenient.

If our source audio doesn't already come like that (integer samples, for example, don't), we can normalize the samples ourselves using the following:

sample = sample / fullScale(bitsPerSample);

Here, fullScale is $2^{\text{bitsPerSample}-1}$, i.e. Math.pow(2, bitsPerSample - 1).
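As a small sketch of that normalization: fullScale and normalize here are hypothetical helpers, and the shift 1L << (bitsPerSample - 1) computes the same power of two as Math.pow. Note that the most positive integer normalizes to just under 1 (32767/32768 for 16-bit), because of the asymmetric two's complement range.

```java
// Hypothetical helpers for normalizing signed integer samples to about [-1, 1].
class Normalize {
    // 2^(bitsPerSample - 1), e.g. 32768 for 16-bit.
    static float fullScale(int bitsPerSample) {
        return (float) (1L << (bitsPerSample - 1));
    }

    static float normalize(long sample, int bitsPerSample) {
        return sample / fullScale(bitsPerSample);
    }

    public static void main(String[] args) {
        System.out.println(normalize(-32768, 16)); // most negative 16-bit value
        System.out.println(normalize(32767, 16));  // most positive 16-bit value
    }
}
```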


How do I coerce the byte array into meaningful data?

The byte array contains the sample frames, split into individual bytes and laid out one after another. Decoding this is actually very straightforward, except for something called endianness, which is the ordering of the bytes within each sample.

Here's a diagram. This sample (packed into a byte array) holds the decimal value 9999:

  24-bit sample as big-endian:

 bytes[i]   bytes[i + 1] bytes[i + 2]
 ┌──────┐     ┌──────┐     ┌──────┐
 00000000     00100111     00001111

 24-bit sample as little-endian:

 bytes[i]   bytes[i + 1] bytes[i + 2]
 ┌──────┐     ┌──────┐     ┌──────┐
 00001111     00100111     00000000

They hold the same binary values; however, the byte orders are reversed.

  • In big-endian, the more significant bytes come before the less significant bytes.
  • In little-endian, the less significant bytes come before the more significant bytes.

WAV files are stored in little-endian order and AIFF files are stored in big-endian order. Endianness can be obtained from AudioFormat.isBigEndian.
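To see the two byte orders side by side without an audio file, java.nio.ByteBuffer can write the same value in either order. This sketch uses a 32-bit int because ByteBuffer has no 24-bit put, so 9999 gets one extra zero byte compared to the diagram; intBytes is a hypothetical helper. It also calls AudioFormat.isBigEndian on formats constructed in code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;

class Endianness {
    // Hypothetical helper: the 4 bytes of an int in the requested byte order.
    static byte[] intBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        // 9999 = 0x0000270F
        System.out.println(Arrays.toString(intBytes(9999, ByteOrder.BIG_ENDIAN)));    // [0, 0, 39, 15]
        System.out.println(Arrays.toString(intBytes(9999, ByteOrder.LITTLE_ENDIAN))); // [15, 39, 0, 0]

        // AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian)
        AudioFormat wavLike = new AudioFormat(44100f, 16, 2, true, false);
        AudioFormat aiffLike = new AudioFormat(44100f, 16, 2, true, true);
        System.out.println(wavLike.isBigEndian());  // false
        System.out.println(aiffLike.isBigEndian()); // true
    }
}
```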

To concatenate the bytes and put them into our long temp variable, we:

  1. Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign-extension when the byte is automatically promoted. (char, byte and short are promoted to int when arithmetic is performed on them.) See also "What does value & 0xff do in Java?"
  2. Bit shift each byte into position.
  3. Bitwise OR the bytes together.

Here's a 24-bit example:

long temp;
if (isBigEndian) {
    temp = (
          ((bytes[i    ] & 0xffL) << 16)
        | ((bytes[i + 1] & 0xffL) <<  8)
        |  (bytes[i + 2] & 0xffL)
    );
} else {
    temp = (
           (bytes[i    ] & 0xffL)
        | ((bytes[i + 1] & 0xffL) <<  8)
        | ((bytes[i + 2] & 0xffL) << 16)
    );
}

Notice that the shift order is reversed based on endianness.

This can also be generalized to a loop, which can be seen in the full code at the bottom of this answer. (See the unpackAnyBit and packAnyBit methods.)
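Here's a minimal sketch of that loop generalization. It's my own version, not the unpackAnyBit method mentioned above, and unpackBits is a hypothetical name. It assumes whole-byte sample sizes from 1 to 8 bytes, so the result fits in a long.

```java
// Hypothetical helper: loop form of the 24-bit concatenation example.
class Unpack {
    // Concatenates bytesPerSample bytes starting at bytes[i] into a long,
    // without sign extension.
    static long unpackBits(byte[] bytes, int i, int bytesPerSample, boolean isBigEndian) {
        long temp = 0L;
        for (int b = 0; b < bytesPerSample; b++) {
            // Big-endian: the first byte is the most significant.
            // Little-endian: the first byte is the least significant.
            int shift = isBigEndian ? 8 * (bytesPerSample - 1 - b) : 8 * b;
            // & 0xFFL stops the byte from being sign-extended when promoted.
            temp |= (bytes[i + b] & 0xFFL) << shift;
        }
        return temp;
    }

    public static void main(String[] args) {
        byte[] big = {0x00, 0x27, 0x0F};    // 9999 as 24-bit big-endian
        byte[] little = {0x0F, 0x27, 0x00}; // 9999 as 24-bit little-endian
        System.out.println(unpackBits(big, 0, 3, true));     // 9999
        System.out.println(unpackBits(little, 0, 3, false)); // 9999
    }
}
```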

Now that we have the bytes concatenated together, we can take a few more steps to turn them into a sample. The next steps depend on the actual encoding.

How do I decode Encoding.PCM_SIGNED?

The two's complement sign must be extended. This means that if the most significant bit (MSB) is set to 1, we fill all the bits above it with 1s. The arithmetic right-shift (>>) will do the filling for us automatically if the sign bit is set, so I usually do it this way:

int bitsToExtend = Long.SIZE - bitsPerSample;
float sample = (temp << bitsToExtend) >> bitsToExtend;

(Here Long.SIZE is 64, the number of bits in a long. If our temp variable weren't a long, we'd use the size of its type instead: for example, with int temp we'd use Integer.SIZE, which is 32.)

To understand how this works, here's a diagram of sign-extending 8-bit to 16-bit:

 11111111 is the byte value -1, but the upper bits of the short are 0.
 Shift the byte's MSB into the MSB position of the short.

 0000 0000 1111 1111
 <<                8
 ───────────────────
 1111 1111 0000 0000

 Shift it back and the right-shift fills all the upper bits with 1s.
 We now have the short value of -1.

 1111 1111 0000 0000
 >>                8
 ───────────────────
 1111 1111 1111 1111

Positive values (that had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right-shift.

Then normalize the sample, as described in Some Assumptions.
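Putting sign-extension and normalization together for an already-concatenated temp value gives something like the following minimal sketch. signExtend and decodeSigned are hypothetical names, and bitsPerSample is assumed to be at most 32.

```java
class DecodeSigned {
    // Hypothetical helper: sign-extends the low bitsPerSample bits of temp
    // by shifting the sample's MSB up to bit 63 and arithmetic-shifting back.
    static long signExtend(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        return (temp << bitsToExtend) >> bitsToExtend;
    }

    // Hypothetical helper: sign-extends, then normalizes by 2^(bitsPerSample - 1).
    static float decodeSigned(long temp, int bitsPerSample) {
        float fullScale = (float) (1L << (bitsPerSample - 1));
        return signExtend(temp, bitsPerSample) / fullScale;
    }

    public static void main(String[] args) {
        System.out.println(signExtend(0xFFFFFFL, 24));   // all 24 bits set: -1
        System.out.println(signExtend(0x7FFFFFL, 24));   // MSB clear: 8388607, unchanged
        System.out.println(decodeSigned(0x800000L, 24)); // most negative 24-bit value: -1.0
    }
}
```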

You might not need to write explicit sign-extension if your code is simple

Java does sign-extension automatically when converting from one integral type to a larger type, for example byte to int. If you know that your input and output formats are always signed, you can use the automatic sign-extension while concatenating bytes in the earlier step.

Recall from the section above (How do I coerce the byte array into meaningful data?) that we used b & 0xFF to prevent sign-extension from occurring. If you just remove the & 0xFF from the most significant byte, sign-extension will happen automatically.

For example, the following decodes signed, big-endian, 16-bit samples:

for (int i = 0; i + 1 < bytes.length; i += 2) { // each sample is 2 bytes
    int sample = (bytes[i] << 8) // high byte is sign-extended
               | (bytes[i + 1] & 0xFF); // low byte is not
    // ...
}
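The same trick works for other sizes and byte orders, as long as you leave the mask off whichever byte is most significant. For little-endian data, that's the last byte of each sample. Here's a minimal sketch for signed, little-endian, 24-bit samples; Decode24LittleEndian and decode are hypothetical names.

```java
class Decode24LittleEndian {
    // Hypothetical helper: decodes signed, little-endian, 24-bit samples into
    // ints. Assumes bytes.length is a multiple of 3.
    static int[] decode(byte[] bytes) {
        int[] samples = new int[bytes.length / 3];
        for (int i = 0; i + 2 < bytes.length; i += 3) {
            samples[i / 3] = (bytes[i + 2] << 16)         // most significant: sign-extended
                           | ((bytes[i + 1] & 0xFF) << 8) // middle byte: masked
                           | (bytes[i] & 0xFF);           // least significant: masked
        }
        return samples;
    }

    public static void main(String[] args) {
        byte[] bytes = {
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, // -1
            0x0F, 0x27, 0x00,                      // 9999
            0x00, 0x00, (byte) 0x80                // most negative 24-bit value
        };
        System.out.println(java.util.Arrays.toString(decode(bytes))); // [-1, 9999, -8388608]
    }
}
```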

How do I decode Encoding.PCM_UNSIGNED?

We turn it into a signed number. Unsigned samples are simply offset so that, for example:

  • An unsigned value of 0 corresponds to the most negative signed value.
  • An unsigned value of $2^{\text{bitsPerSample}-1}$ corresponds to the signed value of 0.
  • An unsigned value of $2^{\text{bitsPerSample}} - 1$ corresponds to the most positive signed value.

So this turns out to be pretty simple. Just subtract the offset:

float sample = temp - fullScale(bitsPerSample);

Then normalize the sample, as described in Some Assumptions.
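For example, with 8-bit unsigned samples the offset is 128. Here's a minimal sketch (decodeUnsigned is a hypothetical helper) that subtracts the offset and then normalizes. Note that an 8-bit sample read from a byte array needs & 0xFF first, because Java's byte type is signed.

```java
class DecodeUnsigned {
    // Hypothetical helper: removes the unsigned offset of 2^(bitsPerSample - 1),
    // then normalizes by that same full-scale value.
    static float decodeUnsigned(long temp, int bitsPerSample) {
        float fullScale = (float) (1L << (bitsPerSample - 1));
        float sample = temp - fullScale;
        return sample / fullScale;
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, (byte) 0x80, (byte) 0xFF}; // three 8-bit unsigned samples
        for (byte b : bytes) {
            // Mask to get the unsigned value 0..255 before decoding.
            System.out.println(decodeUnsigned(b & 0xFF, 8));
        }
    }
}
```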

How do I decode Encoding.PCM_FLOAT?

Encoding.PCM_FLOAT was added in Java 7.

In practice, floating-point PCM is typically either IEEE 32-bit or IEEE 64-bit and already normalized to the range of ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.

// IEEE 32-bit
float sample = Float.intBitsToFloat((int) temp);
// IEEE 64-bit
double sampleAsDouble = Double.longBitsToDouble(temp);
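Combining Float.intBitsToFloat with the byte concatenation gives a minimal sketch for 32-bit float samples (decodeFloat32 is a hypothetical name). As an alternative, java.nio.ByteBuffer can do the same job with wrap, order and the absolute getFloat(int).

```java
class DecodeFloat {
    // Hypothetical helper: decodes one IEEE 32-bit float sample at bytes[i].
    static float decodeFloat32(byte[] bytes, int i, boolean isBigEndian) {
        long temp = 0L;
        for (int b = 0; b < 4; b++) {
            int shift = isBigEndian ? 8 * (3 - b) : 8 * b;
            temp |= (bytes[i + b] & 0xFFL) << shift; // concatenate without sign extension
        }
        // Reinterpret the 32 bits as an IEEE float; no normalization needed.
        return Float.intBitsToFloat((int) temp);
    }

    public static void main(String[] args) {
        int bits = Float.floatToIntBits(0.5f); // 0x3F000000
        byte[] little = {
            (byte) bits, (byte) (bits >>> 8), (byte) (bits >>> 16), (byte) (bits >>> 24)
        };
        System.out.println(decodeFloat32(little, 0, false));
    }
}
```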


...