On my system, I compiled with g++ test.cpp -march=native -O2 -c -o test
Disassembling the resulting object file gives, for the plain version (loop body only):
30: c5 f9 57 c0 vxorpd %xmm0,%xmm0,%xmm0
34: c5 fb 2a c0 vcvtsi2sd %eax,%xmm0,%xmm0
38: c4 e2 f1 99 c2 vfmadd132sd %xmm2,%xmm1,%xmm0
3d: c5 fb 11 04 c2 vmovsd %xmm0,(%rdx,%rax,8)
42: 48 83 c0 01 add $0x1,%rax
46: 48 39 c8 cmp %rcx,%rax
49: 75 e5 jne 30 <_Z11ProcessAutoii+0x30>
And for the intrinsics version:
88: c5 f9 57 c0 vxorpd %xmm0,%xmm0,%xmm0
8c: 8d 50 01 lea 0x1(%rax),%edx
8f: c5 f1 57 c9 vxorpd %xmm1,%xmm1,%xmm1
93: c5 fb 2a c0 vcvtsi2sd %eax,%xmm0,%xmm0
97: c5 f3 2a ca vcvtsi2sd %edx,%xmm1,%xmm1
9b: c5 f9 14 c1 vunpcklpd %xmm1,%xmm0,%xmm0
9f: c4 e2 e9 98 c3 vfmadd132pd %xmm3,%xmm2,%xmm0
a4: c5 f8 29 04 c1 vmovaps %xmm0,(%rcx,%rax,8)
a9: 48 83 c0 02 add $0x2,%rax
ad: 48 39 f0 cmp %rsi,%rax
b0: 75 d6 jne 88 <_Z11ProcessSSE2ii+0x38>
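So in short: with -march=native the compiler emits AVX/FMA instructions (VEX-encoded, with a fused multiply-add) from the plain C++ version on its own. It still handles one double per iteration (vfmadd132sd), though, whereas the intrinsics version handles two (vfmadd132pd).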
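To make the difference between the two listings concrete, here is a minimal sketch of what the two loops might look like in source form. test.cpp is not shown here, so the function names (FillScalar, FillPacked), their signatures, and the computation itself (out[i] = i * scale + offset, guessed from the convert, multiply, add, store sequence in the disassembly) are assumptions, not the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical plain loop: one double per iteration.
// The compiler is free to pick the instructions (e.g. a scalar FMA).
void FillScalar(double* out, std::size_t count, double scale, double offset) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<double>(i) * scale + offset;
    }
}

// Hypothetical SSE2 loop: same computation, two doubles per iteration.
// Assumes count is even and out is 16-byte aligned (needed by _mm_store_pd).
void FillPacked(double* out, std::size_t count, double scale, double offset) {
    assert(count % 2 == 0);
    const __m128d vscale = _mm_set1_pd(scale);
    const __m128d voffset = _mm_set1_pd(offset);
    for (std::size_t i = 0; i < count; i += 2) {
        // _mm_set_pd takes the high element first, so lane 0 holds i.
        const __m128d idx = _mm_set_pd(static_cast<double>(i + 1),
                                       static_cast<double>(i));
        _mm_store_pd(out + i, _mm_add_pd(_mm_mul_pd(idx, vscale), voffset));
    }
}
```

Both functions should write the same values; only the number of elements handled per iteration differs, which is what the scalar (sd) versus packed (pd) instructions in the listings reflect.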
Edit: after playing a bit more with the flags to get SSE2 only in both cases:
g++ test.cpp -msse2 -O2 -c -o test
The compiler still does something different from what you generate with intrinsics. Compiler-generated version:
30: 66 0f ef c0 pxor %xmm0,%xmm0
34: f2 0f 2a c0 cvtsi2sd %eax,%xmm0
38: f2 0f 59 c2 mulsd %xmm2,%xmm0
3c: f2 0f 58 c1 addsd %xmm1,%xmm0
40: f2 0f 11 04 c2 movsd %xmm0,(%rdx,%rax,8)
45: 48 83 c0 01 add $0x1,%rax
49: 48 39 c8 cmp %rcx,%rax
4c: 75 e2 jne 30 <_Z11ProcessAutoii+0x30>
Intrinsics version:
88: 66 0f ef c0 pxor %xmm0,%xmm0
8c: 8d 50 01 lea 0x1(%rax),%edx
8f: 66 0f ef c9 pxor %xmm1,%xmm1
93: f2 0f 2a c0 cvtsi2sd %eax,%xmm0
97: f2 0f 2a ca cvtsi2sd %edx,%xmm1
9b: 66 0f 14 c1 unpcklpd %xmm1,%xmm0
9f: 66 0f 59 c3 mulpd %xmm3,%xmm0
a3: 66 0f 58 c2 addpd %xmm2,%xmm0
a7: 0f 29 04 c1 movaps %xmm0,(%rcx,%rax,8)
ab: 48 83 c0 02 add $0x2,%rax
af: 48 39 f0 cmp %rsi,%rax
b2: 75 d4 jne 88 <_Z11ProcessSSE2ii+0x38>
The compiler does not unroll or vectorize the loop here: its version processes one double per iteration, while the intrinsics version processes two. Which one is faster depends on many things, so you should benchmark both versions.
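For the benchmark, something as simple as the following timing helper may be enough to get a first impression. This is only a sketch: MeanMicros is a hypothetical name, not part of any library, and it measures wall-clock time with std::chrono::steady_clock without any warm-up or statistics.

```cpp
#include <chrono>

// Hypothetical helper: calls `work` `reps` times and returns the mean
// wall-clock time per call in microseconds.
template <typename Work>
double MeanMicros(Work&& work, int reps) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int r = 0; r < reps; ++r) {
        work();
    }
    const std::chrono::duration<double, std::micro> elapsed =
        Clock::now() - start;
    return elapsed.count() / reps;
}
```

Pass it a lambda that calls one version of the loop, then a second lambda for the other version, with the same flags and the same array size. Make sure the filled array is actually read afterwards (for example, print one element), otherwise the optimizer may remove the work you are trying to time.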