Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Why is malloc() allocating 2 more bytes than it's supposed to?

I'm writing a C- compiler. Flex recognizes my string token and passes it to a function that stores it in a struct containing info about it, but first the string needs its escape characters removed (the escape character is '\'). Here is my code that does that:

char* removeEscapeChars(char* svalue)
{
    char* processedString; //will be the string with escape characters removed
    int svalLen = strlen(svalue);
    printf("svalLen (size of string passed in): %d\n", svalLen);
    printf("svalue (string passed in): %s\n", svalue);
    int foundEscapedChars = 0;
    for (int i = 0; i < svalLen;) 
    {
        if (svalue[i] == '\\') {
            //Found escaped character
            if (svalue[i+1] == 'n') {
                //Found newline character
                svalue[i] = '\n';
            }
            else if (svalue[i+1] == '0') {
                //Found null character
                svalue[i] = '\0';
            }
            else {
                //Any other character
                svalue[i] = svalue[i+1];
            }
            i++;
            foundEscapedChars++;
            for (int j = i; j < svalLen + 1; j++) {
                svalue[j] = svalue[j+1];
            }
        }
        else {
            i++;
        }
    }
    int newSize = svalLen - foundEscapedChars;
    processedString = (char*) malloc(newSize * sizeof(char));
    memcpy(processedString, svalue, newSize * sizeof(char));
    printf("newSize: %d\n", newSize);
    printf("processedString: %s\n", processedString);
    printf("processedString Size: %d\n", strlen(processedString));
    
    free(svalue);
    return processedString;
}

It works 99% of the time, but when it's tested on this specific string (or a similar one with 40 characters), "-//W3C//DTD XHTML 1.0 Transitional//EN", malloc() appears to allocate memory for a string 2 bytes too large. The output is below. Notice that I pass int newSize to malloc(), which prints as 40, and then strlen() returns 42. sizeof(char) is also 1. The main issue is that it's inserting garbage characters at the end of the string. What gives?

"-//W3C//DTD XHTML 1.0 Transitional//EN"
svalLen (size of string passed in): 40
svalue (string passed in) "-//W3C//DTD XHTML 1.0 Transitional//EN"
newSize: 40
processedString: "-//W3C//DTD XHTML 1.0 Transitional//EN"Z
processedString Size: 42
Line 47 Token: STRINGCONST Value: "-//W3C//DTD XHTML 1.0 Transitional//EN"Z Len: 40 Input: "-//W3C//DTD XHTML 1.0 Transitional//EN"
question from:https://stackoverflow.com/questions/65929228/why-is-malloc-allocating-2-more-bytes-than-its-supposed-to


1 Reply


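malloc() is not over-allocating. The buffer is never NUL-terminated: malloc(newSize) leaves no room for the terminator and memcpy() copies only newSize bytes, so printf("%s") and strlen() keep reading past the end of the allocation until they happen to hit a zero byte. Here's a reworking of your code that takes a different, more conventional approach to processing strings. Start with a function that counts escape characters, as this will be useful in the next step: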

int escapeCount(char* str) {
    int c = 0;

    // Can just increment and work through the string using the given pointer
    while (*str) {
        // Backslash something here
        if (*str == '\\') {
            ++str;
            ++c;
        }

        if (*str) {
          // Handle unmatched \ at end of string
          ++str;
        }
    }

    return c;
}

Now using that information you can allocate the correct buffer size:

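#include <stdlib.h>  // malloc
#include <string.h>  // strlen

char* removeEscapeChars(char* str)
{
    // IMPORTANT: Allocate strlen() + 1 for the NUL byte not counted
    char* result = malloc(strlen(str) - escapeCount(str) + 1);
    char* r = result;

    if (result == NULL) {
        return NULL;
    }

    while (*str) {
        if (*str == '\\') {
            ++str;

            if (!*str) {
                // Unmatched \ at end of string: drop it
                break;
            }

            switch (*str) {
                case 'n':
                    *r = '\n';
                    break;
                case 'r':
                    *r = '\r';
                    break;
                case 't':
                    *r = '\t';
                    break;
                default:
                    *r = *str;
                    break;
            }
        }
        else {
            *r = *str;
        }

        ++str;
        ++r;
    }

    // Terminate the result; the loop stops before copying the NUL byte
    *r = '\0';

    return result;
}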
