A highly optimized shader-based approach for performing a nine-hit Gaussian blur was presented by Daniel Rákos. His process uses the underlying interpolation provided by texture filtering in hardware to perform a nine-hit filter using only five texture reads per pass. This is also split into separate horizontal and vertical passes to further reduce the number of texture reads required.
I rolled an implementation of this, tuned for OpenGL ES and the iOS GPUs, into my image processing framework (under the GPUImageFastBlurFilter class). In my tests, it can perform a single blur pass of a 640x480 frame in 2.0 ms on an iPhone 4, which is pretty fast.
I used the following vertex shader:
attribute vec4 position;
attribute vec2 inputTextureCoordinate;
uniform mediump float texelWidthOffset;
uniform mediump float texelHeightOffset;
varying mediump vec2 centerTextureCoordinate;
varying mediump vec2 oneStepLeftTextureCoordinate;
varying mediump vec2 twoStepsLeftTextureCoordinate;
varying mediump vec2 oneStepRightTextureCoordinate;
varying mediump vec2 twoStepsRightTextureCoordinate;
void main()
{
gl_Position = position;
vec2 firstOffset = vec2(1.3846153846 * texelWidthOffset, 1.3846153846 * texelHeightOffset);
vec2 secondOffset = vec2(3.2307692308 * texelWidthOffset, 3.2307692308 * texelHeightOffset);
centerTextureCoordinate = inputTextureCoordinate;
oneStepLeftTextureCoordinate = inputTextureCoordinate - firstOffset;
twoStepsLeftTextureCoordinate = inputTextureCoordinate - secondOffset;
oneStepRightTextureCoordinate = inputTextureCoordinate + firstOffset;
twoStepsRightTextureCoordinate = inputTextureCoordinate + secondOffset;
}
and the following fragment shader:
precision highp float;
uniform sampler2D inputImageTexture;
varying mediump vec2 centerTextureCoordinate;
varying mediump vec2 oneStepLeftTextureCoordinate;
varying mediump vec2 twoStepsLeftTextureCoordinate;
varying mediump vec2 oneStepRightTextureCoordinate;
varying mediump vec2 twoStepsRightTextureCoordinate;
// const float weight[3] = float[]( 0.2270270270, 0.3162162162, 0.0702702703 );
void main()
{
lowp vec3 fragmentColor = texture2D(inputImageTexture, centerTextureCoordinate).rgb * 0.2270270270;
fragmentColor += texture2D(inputImageTexture, oneStepLeftTextureCoordinate).rgb * 0.3162162162;
fragmentColor += texture2D(inputImageTexture, oneStepRightTextureCoordinate).rgb * 0.3162162162;
fragmentColor += texture2D(inputImageTexture, twoStepsLeftTextureCoordinate).rgb * 0.0702702703;
fragmentColor += texture2D(inputImageTexture, twoStepsRightTextureCoordinate).rgb * 0.0702702703;
gl_FragColor = vec4(fragmentColor, 1.0);
}
to perform this. The two passes can be achieved by sending a 0 value for the texelWidthOffset
(for the vertical pass), and then feeding that result into a run where you give a 0 value for the texelHeightOffset
(for the horizontal pass).
I also have some more advanced examples of convolutions in the above-linked framework, including Sobel edge detection.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…