Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
950 views
in Technique[技术] by (71.8m points)

arrays - Duplicate removal

Let's be honest, this is a hw question.

The question in its entirety:

Implement duplicate removal algorithm in a one-dimensional array using C++/Java in O(n) time complexity with no extra space. For example, if the input array is {3,5,5,3,7,8,5,8,9,9} then the output should be {3,5,7,8,9}.

I have thought about it for quite a while and haven't been able to solve it yet.

My thoughts:

  1. I could remove duplicates in O(n) if the array was sorted. But the fastest sorting algorithm I know has O(n*log(n)) complexity.

  2. One algorithm that sorts in O(n) is bin or bucket sort. The problem here is that it cannot be implemented without using extra space.

  3. I wonder if it is possible at all.

I have researched a bit and have found nothing new.

Thanks for any help.

P.S.: I would have given it more time if it weren't for the exam tomorrow.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is indeed possible, just use in-place radix sort.

It runs for O(kn) where k is constant for any standard numeric data type, and requires O(1) extra space.

Here`s the code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/// in-place 32-bit recursive radix sort
void I32R(int32_t *data, uint32_t size, uint32_t nbit) {
    uint32_t dbgn = (uint32_t)-1, dend = size;

    while (++dbgn < dend)
        if (data[dbgn] & nbit)
            while (dbgn < --dend)
                if (~data[dend] & nbit) {
                    data[dbgn] ^= data[dend];
                    data[dend] ^= data[dbgn];
                    data[dbgn] ^= data[dend];
                    break;
                }
    if ((nbit >>= 1) && (dend > 1))
        I32R(data, dend, nbit);
    if (nbit && (size - dend > 1))
        I32R(data + dend, size - dend, nbit);
}

/// O_t(n) / O_s(1) duplicate remover
int32_t dups(int32_t *data, uint32_t size) {
    int32_t iter, *uniq = data;

    if (size < 2)
        return size;
    for (iter = 0; iter < size; iter++)
        data[iter] ^= (1 << 31);
    I32R(data, size, 1 << 31);
    data[0] ^= (1 << 31);
    for (iter = 1; iter < size; iter++)
        if (*uniq != (data[iter] ^= (1 << 31)))
            *++uniq = data[iter];
    return uniq - data + 1;
}

void parr(int32_t *data, uint32_t size) {
    for (; size; size--)
        printf("%4d%s", *data++, (size == 1)? "

" : ", ");
}

int main() {
    int32_t iter, size, *data;

    data = malloc((size = 256) * sizeof(*data));
    for (iter = 0; iter < size; iter++)
        data[iter] = (int8_t)rand() & -3;
    parr(data, size);
    parr(data, dups(data, size));
    free(data);
    return 0;
}

N.B.#1: inverting the sign bit before sorting is necessary for positive numbers to become greater than negatives, as radix sort only operates on unsigned values.

N.B.#2: this is just a rough illustration, never really tested.

N.B.#3: oh wow, this is actually faster than qsort()!

N.B.#4: and now there`s a non-recursive version of the sorting function; the usage is pretty much the same, save for the lack of nbit:

void I32NR(int32_t *data, uint32_t size) {
    int32_t mask, head;
    struct {
        uint32_t init, size, nbit, edge;
    } heap[32];

    heap[0].nbit = 32;
    heap[0].size = size;
    heap[0].init = head = 0;
    do {
        size = heap[head].init - 1;
        mask = 1 << ((heap[head].nbit & 0x7F) - 1);
        heap[head].edge = heap[head].size;
        while (++size < heap[head].edge)
            if (data[size] & mask)
                while (size < --heap[head].edge)
                    if (~data[heap[head].edge] & mask) {
                        data[size] ^= data[heap[head].edge];
                        data[heap[head].edge] ^= data[size];
                        data[size] ^= data[heap[head].edge];
                        break;
                    }
        heap[head].nbit = ((heap[head].nbit & 0x7F) - 1)
                        |  (heap[head].nbit & 0x80);
        if ((heap[head].nbit & 0x7F) && (heap[head].edge > 1)) {
            heap[head + 1] = heap[head];
            heap[head + 1].size = heap[head].edge;
            heap[++head].nbit |= 0x80;
            continue;
        }
        do {
            if ((heap[head].nbit & 0x7F)
            &&  (heap[head].size - heap[head].edge > 1)) {
                heap[head + 1] = heap[head];
                heap[head + 1].init = heap[head].edge;
                heap[++head].nbit &= 0x7F;
                break;
            }
            while ((head >= 0) && !(heap[head--].nbit & 0x80));
        } while (head >= 0);
    } while (head >= 0);
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...