Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
45 views
in Technique[技术] by (71.8m points)

java - Optimization Problem on files split based on size

I am facing an optimization problem and I would like to find an algorithm capable of solving it.

Hyphotesis: the total sum of files size is at least greater than 15 MB. So total size is [15 MB; +∞]. For simplicity think of infinity as 100 GB,

Problem: I have a list of files with size between 3 KB and 4 MB. I have to zip those files and I need to guarantee that the sum of files size before zip them together is between 15 MB and 150 MB. There is any known algorithm to deal with this problem? In order not to have an algorithm to much costly in terms of computational requirements it is acceptable not to minimize the number of chunks (so each chunk is not mandatory to be as big as possible).

Thanks, Giuseppe

question from:https://stackoverflow.com/questions/65885139/optimization-problem-on-files-split-based-on-size

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We can tweak the well-known first-fit decreasing algorithm to do this.

import random

K = 1000
B = 1
KB = K * B
MB = K * KB


class File:
    def __init__(self):
        self.size = random.randrange(3 * KB, 4 * MB)


class Chunk:
    def __init__(self, max_chunk_size):
        self.free = max_chunk_size
        self.files = []

    def add(self, file):
        if self.free < file.size:
            return False
        self.free -= file.size
        self.files.append(file)
        return True

    def enlarge(self, delta_free):
        assert delta_free >= 0
        self.free += delta_free

    def size(self):
        return sum(file.size for file in self.files)


def first_fit_decreasing(files, min_chunk_size, max_chunk_size):
    chunks = []
    for file in sorted(files, key=lambda file: file.size, reverse=True):
        for existing_chunk in chunks:
            if existing_chunk.add(file):
                break
        else:
            if len(chunks) >= 2:
                chunks[-2].enlarge(min_chunk_size)
            new_chunk = Chunk(max_chunk_size - min_chunk_size)
            new_chunk.add(file)
            chunks.append(new_chunk)
    if chunks[-1].size() < min_chunk_size:
        chunks[-2].enlarge(min_chunk_size)
        for file in chunks[-1].files:
            chunks[-2].add(file)
        del chunks[-1]
    return chunks


def test(n):
    files = [File() for i in range(n)]
    min_chunk_size = 15 * MB
    max_chunk_size = 150 * MB
    chunks = first_fit_decreasing(files, min_chunk_size, max_chunk_size)
    assert sorted(id(file) for file in files) == sorted(
        id(file) for chunk in chunks for file in chunk.files
    )
    for chunk in chunks:
        assert min_chunk_size <= chunk.size() <= max_chunk_size
    print(len(chunks), "chunks")
    print(sum(chunk.free for chunk in chunks) / MB, "MB free")


if __name__ == "__main__":
    for t in range(1000):
        test(150)
    test(10000)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...