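You cannot mix x87 FPU and XMM calculations directly: no instruction moves a value between an ST(i) register and an XMM register.
When you calculate something on the FPU, you must store the result to memory (as @Elderbug said) and then load it into an XMM register, because 64-bit code on x64 Windows returns floating-point values in XMM0.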
There can still be an advantage to using the FPU on 64-bit systems, because its internal precision can be 80 bits, provided you set the precision-control field (bits 8 and 9 of the FPU Control Word) accordingly:
single precision (24-bit mantissa) = 00b
double precision (53-bit mantissa) = 10b
extended precision (64-bit mantissa) = 11b
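As a minimal sketch of how that field could be set from MASM on x64: the routine below reads the current control word, sets bits 8 and 9 to 11b, and loads it back. The name `set_fpu_extended` is made up for illustration, and using `[rsp+8]` as scratch space assumes a leaf function under the Windows x64 calling convention (caller-provided shadow space).

```asm
; Hypothetical helper (the name is an assumption, not from the original answer):
; switch the x87 FPU to extended precision by setting control word bits 8,9 to 11b.
.code
set_fpu_extended PROC
    fnstcw WORD PTR [rsp+8]          ; store the current control word in the shadow space (assumes Win64 ABI)
    or     WORD PTR [rsp+8], 0300h   ; bits 8,9 = 11b -> 64-bit mantissa
    fldcw  WORD PTR [rsp+8]          ; load the modified control word
    ret
set_fpu_extended ENDP
END
```

Keep in mind that `fstp QWORD PTR` still rounds to a 53-bit mantissa when the value is stored, so the extra precision only helps intermediate results that stay on the FPU stack. The Windows x64 convention also treats the x87 control word as nonvolatile, so ordinary code would likely save the old value and restore it before returning.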
If you want to use the FPU:
fld QWORD PTR x ; load var into the FPU: ST(0) = x (MASM syntax)
fadd ST(0), ST(0) ; ST(0) = x + x
fstp QWORD PTR x ; store the result back in var and pop the FPU stack
movsd xmm0, QWORD PTR x ; load the result into xmm0 for the return
NOTE: movsd always requires SSE2. On a CPU that only supports SSE1 it raises an invalid-opcode exception (#UD). See the Intel® 64 and IA-32 Architectures Software Developer's Manual:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
However, if you run Windows 8, 8.1, or 10, this is never an issue, because the OS itself lists SSE2 as a system requirement.
EDIT: SSE2 is part of the x86-64 baseline (as Peter Cordes pointed out in the comments), so you can always use it in 64-bit code.
If you want to use SSE2 with XMM registers instead (scalar double precision here):
movsd xmm0, QWORD PTR x
addsd xmm0, xmm0 ; this instruction also requires SSE2
; the result is already in xmm0, ready to return
Also note that you cannot mix XMM and MMX registers either!
(Only the instructions MOVQ2DQ and MOVDQ2Q move data directly from one register file to the other; no other instruction can.)
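A short fragment showing the two conversion instructions in MASM syntax. It is only a sketch: the register choices are arbitrary, and it assumes mm0 already holds valid data.

```asm
    movq2dq xmm0, mm0    ; copy the 64 bits of mm0 into the low half of xmm0; the upper half is zeroed
    movdq2q mm1, xmm0    ; copy the low 64 bits of xmm0 into mm1
    emms                 ; empty the MMX state before any later x87 FPU code runs
```

Both instructions require SSE2. The `emms` is there because the MMX registers alias the x87 register stack.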
If your function takes parameters and has to run on a Windows operating system, you need to give it a valid function prolog and epilog. See: https://future2048.blogspot.com
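To illustrate, here is a hedged sketch of a complete ml64 procedure that takes a double in xmm0, doubles it on the FPU, and returns it in xmm0. The name `double_via_fpu` is hypothetical, and the frame directives assume the Windows x64 calling convention. It is not taken from the linked blog.

```asm
; Hypothetical example: double double_via_fpu(double x), Windows x64 ABI assumed.
.code
double_via_fpu PROC FRAME
    sub   rsp, 8                     ; prolog: reserve 8 bytes of local scratch space
    .ALLOCSTACK 8                    ; unwind info for the allocation above
    .ENDPROLOG

    movsd QWORD PTR [rsp], xmm0      ; the first double argument arrives in xmm0; spill it to memory
    fld   QWORD PTR [rsp]            ; ST(0) = x
    fadd  ST(0), ST(0)               ; ST(0) = x + x
    fstp  QWORD PTR [rsp]            ; store the result and pop the FPU stack
    movsd xmm0, QWORD PTR [rsp]      ; the return value goes back in xmm0

    add   rsp, 8                     ; epilog: release the scratch space
    ret
double_via_fpu ENDP
END
```

A leaf function this small could also use the caller's shadow space and skip the frame entirely. The explicit prolog and epilog are shown because they are needed once a function allocates stack space or calls other functions.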