Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
460 views
in Technique[技术] by (71.8m points)

python - How to use expand in snakemake when some particular combinations of wildcards are not desired?

Let's suppose that I have the following files, on which I want to apply some processing automatically using snakemake:

test_input_C_1.txt
test_input_B_2.txt
test_input_A_2.txt
test_input_A_1.txt

The following snakefile uses expand to determine all the potential final results file:

rule all:
    input: expand("test_output_{text}_{num}.txt", text=["A", "B", "C"], num=[1, 2])

rule make_output:
    input: "test_input_{text}_{num}.txt"
    output: "test_output_{text}_{num}.txt"
    shell:
        """
        md5sum {input} > {output}
        """

Executing the above snakefile results in the following error:

MissingInputException in line 4 of /tmp/Snakefile:
Missing input files for rule make_output:
test_input_B_1.txt

The reason for that error is that expand uses itertools.product under the hood to generate the wildcards combinations, some of which happen to correspond to missing files.

How to filter out the undesired wildcards combinations?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The expand function accepts a second optional non-keyword argument to use a different function from the default one to combine wildcard values.

One can create a filtered version of itertools.product by wrapping it in a higher-order generator that checks that the yielded combination of wildcards is not among a pre-established blacklist:

from itertools import product

def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            # Use frozenset instead of tuple
            # in order to accomodate
            # unpredictable wildcard order
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator

# "B_1" and "C_2" are undesired
forbidden = {
    frozenset({("text", "B"), ("num", 1)}),
    frozenset({("text", "C"), ("num", 2)})}

filtered_product = filter_combinator(product, forbidden)

rule all:
    input:
        # Override default combination generator
        expand("test_output_{text}_{num}.txt", filtered_product, text=["A", "B", "C"], num=[1, 2])

rule make_output:
    input: "test_input_{text}_{num}.txt"
    output: "test_output_{text}_{num}.txt"
    shell:
        """
        md5sum {input} > {output}
        """

The missing wildcards combinations can be read from the configfile.

Here is an example in json format:

{
    "missing" :
    [
        {
            "text" : "B",
            "num" : 1
        },
        {
            "text" : "C",
            "num" : 2
        }
    ]
}

The forbidden set would be read as follows in the snakefile:

forbidden = {frozenset(wc_comb.items()) for wc_comb in config["missing"]}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...