c - Massive fprintf speed difference without "-std=c99"

Question

Welcome To Ask or Share your Answers For Others

c - Massive fprintf speed difference without "-std=c99"

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

c - Massive fprintf speed difference without "-std=c99"

I had been struggling for weeks with a poor-performing translator I had written. On the following simple bechmark

#include<stdio.h>

int main()
{
    int x;
    char buf[2048];
    FILE *test = fopen("test.out", "wb");
    setvbuf(test, buf, _IOFBF, sizeof buf);
    for(x=0;x<1024*1024; x++)
        fprintf(test, "%04d", x);
    fclose(test);
    return 0
}

we see the following result

bash-3.1$ gcc -O2 -static test.c -o test
bash-3.1$ time ./test

real    0m0.334s
user    0m0.015s
sys     0m0.016s

As you can see, the moment the "-std=c99" flag is added in, performance comes crashing down:

bash-3.1$ gcc -O2 -static -std=c99 test.c -o test
bash-3.1$ time ./test

real    0m2.477s
user    0m0.015s
sys     0m0.000s

The compiler I'm using is gcc 4.6.2 mingw32.

The file generated is about 12M, so this is a difference between of about 21MB/s between the two.

Running diff shows the the generated files are identical.

I assumed this has something to do with file locking in fprintf, of which the program makes heavy use, but I haven't been able to find a way to switch that off in the C99 version.

I tried flockfile on the stream I use at the beginning of the program, and an corresponding funlockfile at the end, but was greeted with compiler errors about implicit declarations, and linker errors claiming undefined references to those functions.

Could there be another explanation for this problem, and more importantly, is there any way to use C99 on windows without paying such an enormous performance price?

Edit:

After looking at the code generated by these options, it looks like in the slow versions, mingw sticks in the following:

_fprintf:
LFB0:
    .cfi_startproc
    subl    $28, %esp
    .cfi_def_cfa_offset 32
    leal    40(%esp), %eax
    movl    %eax, 8(%esp)
    movl    36(%esp), %eax
    movl    %eax, 4(%esp)
    movl    32(%esp), %eax
    movl    %eax, (%esp)
    call    ___mingw_vfprintf
    addl    $28, %esp
    .cfi_def_cfa_offset 4
    ret
    .cfi_endproc

In the fast version, this simply does not exist; otherwise, both are exactly the same. I assume __mingw_vfprintf seems to be the slowpoke here, but I have no idea what behavior it needs to emulate that makes it so slow.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:16:29+0000

After some digging in the source code, I have found why the MinGW function is so terribly slow:

At the beginning of a [v,f,s]printf in MinGW, there is some innocent-looking initialization code:

__pformat_t stream = {
    dest,                   /* output goes to here        */
    flags &= PFORMAT_TO_FILE | PFORMAT_NOLIMIT, /* only these valid initially */
    PFORMAT_IGNORE,             /* no field width yet         */
    PFORMAT_IGNORE,             /* nor any precision spec     */
    PFORMAT_RPINIT,             /* radix point uninitialised  */
    (wchar_t)(0),               /* leave it unspecified       */
    0,                          /* zero output char count     */
    max,                        /* establish output limit     */
    PFORMAT_MINEXP          /* exponent chars preferred   */
};

However, PFORMAT_MINEXP is not what it appears to be:

#ifdef _WIN32
# define PFORMAT_MINEXP    __pformat_exponent_digits() 
# ifndef _TWO_DIGIT_EXPONENT
#  define _get_output_format()  0 
#  define _TWO_DIGIT_EXPONENT   1
# endif
static __inline__ __attribute__((__always_inline__))
int __pformat_exponent_digits( void )
{
  char *exponent_digits = getenv( "PRINTF_EXPONENT_DIGITS" );
  return ((exponent_digits != NULL) && ((unsigned)(*exponent_digits - '0') < 3))
    || (_get_output_format() & _TWO_DIGIT_EXPONENT)
    ? 2
    : 3
    ;
}

This winds up getting called every time I want to print, and getenv on windows must not be very quick. Replacing that define with a 2 brings the runtime back to where it should be.

So, the answer comes down to this: when using -std=c99 or any ANSI-compliant mode, MinGW switches the CRT runtime with its own. Normally, this wouldn't be an issue, but the MinGW lib had a bug which slowed its formatting functions down far beyond anything imaginable.

Categories

c - Massive fprintf speed difference without "-std=c99"

c - Massive fprintf speed difference without "-std=c99"

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags