Your code will only return correct NV21 if there is no padding at all and the U and V planes overlap, so that together they actually hold interleaved VU values. This happens quite often for preview, but in that case you allocate an extra w*h/4 bytes for your array (which presumably is not a problem). For a captured image you may need a more robust implementation, e.g.

    // imports needed by the method below:
    // import android.media.Image;
    // import java.nio.ByteBuffer;
    // import java.nio.ReadOnlyBufferException;

    private static byte[] YUV_420_888toNV21(Image image) {
        int width = image.getWidth();
        int height = image.getHeight();
        int ySize = width*height;
        int uvSize = width*height/4;

        byte[] nv21 = new byte[ySize + uvSize*2];

        ByteBuffer yBuffer = image.getPlanes()[0].getBuffer(); // Y
        ByteBuffer uBuffer = image.getPlanes()[1].getBuffer(); // U
        ByteBuffer vBuffer = image.getPlanes()[2].getBuffer(); // V

        int rowStride = image.getPlanes()[0].getRowStride();
        assert(image.getPlanes()[0].getPixelStride() == 1);

        int pos = 0;

        if (rowStride == width) { // likely
            yBuffer.get(nv21, 0, ySize);
            pos += ySize;
        }
        else {
            // rows are padded: copy width bytes per row, skipping the padding
            int yBufferPos = -rowStride; // not an actual position
            for (; pos<ySize; pos+=width) {
                yBufferPos += rowStride;
                yBuffer.position(yBufferPos);
                yBuffer.get(nv21, pos, width);
            }
        }

        rowStride = image.getPlanes()[2].getRowStride();
        int pixelStride = image.getPlanes()[2].getPixelStride();

        assert(rowStride == image.getPlanes()[1].getRowStride());
        assert(pixelStride == image.getPlanes()[1].getPixelStride());

        if (pixelStride == 2 && rowStride == width && uBuffer.get(0) == vBuffer.get(1)) {
            // maybe V and U planes overlap as per NV21, which means vBuffer[1] is alias of uBuffer[0]
            byte savePixel = vBuffer.get(1);
            try {
                vBuffer.put(1, (byte)~savePixel);
                if (uBuffer.get(0) == (byte)~savePixel) {
                    vBuffer.put(1, savePixel);
                    vBuffer.get(nv21, ySize, 1);
                    uBuffer.get(nv21, ySize + 1, uBuffer.remaining());
                    return nv21; // shortcut
                }
                // the planes do not overlap; restore the byte we modified
                vBuffer.put(1, savePixel);
            }
            catch (ReadOnlyBufferException ex) {
                // unfortunately, we cannot check if vBuffer and uBuffer overlap
            }
            // unfortunately, the check failed. We must save U and V pixel by pixel
        }

        // other optimizations could check if (pixelStride == 1) or (pixelStride == 2),
        // but performance gain would be less significant

        for (int row=0; row<height/2; row++) {
            for (int col=0; col<width/2; col++) {
                int vuPos = col*pixelStride + row*rowStride;
                nv21[pos++] = vBuffer.get(vuPos);
                nv21[pos++] = uBuffer.get(vuPos);
            }
        }

        return nv21;
    }

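To see what the row stride and pixel stride handling does without an Android device, here is a minimal sketch that works on plain java.nio buffers instead of an Image. The class name StridedNv21Sketch, the method toNv21 and its parameter list are made up for illustration; it assumes even width and height, and it only shows the slow per-pixel path, not the shortcut.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class StridedNv21Sketch {
    /**
     * Packs separately strided Y, U and V planes into a tightly packed NV21 array.
     * Hypothetical helper: the strides are passed in explicitly instead of being
     * read from android.media.Image.Plane.
     */
    static byte[] toNv21(ByteBuffer y, int yRowStride,
                         ByteBuffer u, ByteBuffer v,
                         int uvRowStride, int uvPixelStride,
                         int width, int height) {
        byte[] nv21 = new byte[width * height + 2 * (width / 2) * (height / 2)];
        int pos = 0;

        // Luma: take `width` bytes from each row and skip the row padding.
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                nv21[pos++] = y.get(row * yRowStride + col);
            }
        }

        // Chroma: NV21 stores V first, then U, for every 2x2 block of pixels.
        for (int row = 0; row < height / 2; row++) {
            for (int col = 0; col < width / 2; col++) {
                int idx = row * uvRowStride + col * uvPixelStride;
                nv21[pos++] = v.get(idx);
                nv21[pos++] = u.get(idx);
            }
        }
        return nv21;
    }

    public static void main(String[] args) {
        // 4x2 image; every Y row carries 2 bytes of padding (row stride 6).
        byte[] yData = {1, 2, 3, 4, 0, 0, 5, 6, 7, 8, 0, 0};
        byte[] uData = {10, 11, 0, 0}; // planar chroma, pixel stride 1
        byte[] vData = {20, 21, 0, 0};
        byte[] nv21 = toNv21(ByteBuffer.wrap(yData), 6,
                ByteBuffer.wrap(uData), ByteBuffer.wrap(vData), 4, 1, 4, 2);
        System.out.println(Arrays.toString(nv21));
    }
}
```

The absolute get(int) calls keep the sketch independent of each buffer's current position, at the cost of being slower than the bulk get(byte[], int, int) used in the method above.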
If you intend to pass the resulting array to C++ anyway, you can take advantage of the fact that the buffer returned by getBuffer() will always have isDirect return true, so the underlying data can be mapped as a pointer in JNI with GetDirectBufferAddress, without doing any copies.
This means that the same conversion can be done in C++ with minimal overhead. In C++, you may even find that the actual pixel arrangement is already NV21!
PS: Actually, this can be done in Java with negligible overhead; see the line if (pixelStride == 2 && … above. When that check succeeds, we can bulk copy all chroma bytes to the resulting byte array, which is much faster than running the loops, but still slower than what can be achieved for this case in C++. For a full implementation, see Image.toByteArray().
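The overlap probe and the bulk copy can be tried outside Android with two views over one direct buffer. This is only a sketch under assumptions: the class ChromaOverlapSketch and its methods view, chromaOverlaps and bulkCopyVu are hypothetical names, and the layout (V view starting at byte 0, U view starting at byte 1 of the same VUVU… memory) imitates what devices with NV21-like storage typically expose rather than anything the API guarantees.

```java
import java.nio.ByteBuffer;

final class ChromaOverlapSketch {
    /** Returns a view of backing[from, to) that shares memory with it. */
    static ByteBuffer view(ByteBuffer backing, int from, int to) {
        ByteBuffer d = backing.duplicate();
        d.clear();        // position = 0, limit = capacity
        d.limit(to);
        d.position(from);
        return d.slice();
    }

    /** True if writing v[1] is visible through u[0], i.e. the planes alias. */
    static boolean chromaOverlaps(ByteBuffer u, ByteBuffer v) {
        if (v.isReadOnly()) {
            return false; // cannot probe, so take the slow path
        }
        byte saved = v.get(1);
        if (u.get(0) != saved) {
            return false;
        }
        v.put(1, (byte) ~saved); // flip the byte and look at it through u
        try {
            return u.get(0) == (byte) ~saved;
        } finally {
            v.put(1, saved);     // always restore the original value
        }
    }

    /** Copies one V byte, then everything in u, which yields V U V U ... */
    static void bulkCopyVu(ByteBuffer u, ByteBuffer v, byte[] out, int offset) {
        ByteBuffer vv = v.duplicate(); // duplicates leave the callers' positions alone
        ByteBuffer uu = u.duplicate();
        vv.position(0);
        uu.position(0);
        vv.get(out, offset, 1);
        uu.get(out, offset + 1, uu.remaining());
    }

    public static void main(String[] args) {
        ByteBuffer backing = ByteBuffer.allocateDirect(8);
        backing.put(new byte[]{20, 10, 21, 11, 22, 12, 23, 13}); // V U V U ...
        ByteBuffer v = view(backing, 0, 7);
        ByteBuffer u = view(backing, 1, 8);
        System.out.println("overlap detected: " + chromaOverlaps(u, v));
    }
}
```

Checking isReadOnly() up front is a design choice of this sketch; it avoids relying on ReadOnlyBufferException for control flow, while the try/finally guarantees the probed byte is restored.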