Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

perl - HOMER de novo motif discovery cannot open hg19 fasta files

I have some ChIP-seq data in BAM format. At some point I wanted to do de novo motif discovery using HOMER's findMotifsGenome.pl script.

The problem seems to be that the script cannot open the reference genome FASTA files, even though they were installed by HOMER itself!

Has anyone encountered this problem?

The Linux command used:

$ perl /home/chipseq_project/homer/bin/findMotifsGenome.pl /home/chipseq_project/homer/findpeak_output/peaks.txt hg19 /home/chipseq_project/homer/motif_output/ -size given

The stdout text:

    Position file = /home/chipseq_project/homer/findpeak_output/peaks.txt
    Genome = hg19
    Output Directory = /home/chipseq_project/homer/motif_output/
    Using actual sizes of regions (-size given)
    Fragment size set to given
    Found mset for "human", will check against vertebrates motifs
    Peak/BED file conversion summary:
            BED/Header formatted lines: 0
            peakfile formatted lines: 7662

    Peak File Statistics:
            Total Peaks: 7662
            Redundant Peak IDs: 0
            Peaks lacking information: 0 (need at least 5 columns per peak)
            Peaks with misformatted coordinates: 0 (should be integer)
            Peaks with misformatted strand: 0 (should be either +/- or 0/1)

    Peak file looks good!

    Background fragment size set to 81 (avg size of targets)
    Background files for 81 bp fragments found.

    Extracting sequences from directory: /home/chipseq_project/homer/.//data/genomes/hg19//
    !!Could not open file for 1 (.fa or .fa.masked)
    !!Could not open file for 10 (.fa or .fa.masked)
    !!Could not open file for 11 (.fa or .fa.masked)
    !!Could not open file for 12 (.fa or .fa.masked)
    !!Could not open file for 13 (.fa or .fa.masked)
    !!Could not open file for 14 (.fa or .fa.masked)
    !!Could not open file for 15 (.fa or .fa.masked)
    !!Could not open file for 16 (.fa or .fa.masked)
    !!Could not open file for 17 (.fa or .fa.masked)
    !!Could not open file for 18 (.fa or .fa.masked)
    !!Could not open file for 19 (.fa or .fa.masked)
    !!Could not open file for 2 (.fa or .fa.masked)
    !!Could not open file for 20 (.fa or .fa.masked)
    !!Could not open file for 21 (.fa or .fa.masked)
    !!Could not open file for 22 (.fa or .fa.masked)
    !!Could not open file for 3 (.fa or .fa.masked)
    !!Could not open file for 4 (.fa or .fa.masked)
    !!Could not open file for 5 (.fa or .fa.masked)
    !!Could not open file for 6 (.fa or .fa.masked)
    !!Could not open file for 7 (.fa or .fa.masked)
    !!Could not open file for 8 (.fa or .fa.masked)
    !!Could not open file for 9 (.fa or .fa.masked)
    !!Could not open file for X (.fa or .fa.masked)
    !!Could not open file for Y (.fa or .fa.masked)

    Not removing redundant sequences


    Sequences processed:
            0 total

    Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
    Freq    Bin     Count

    Total sequences set to 50000

    Choosing background that matches in CpG/GC Content...

    Illegal division by zero at /home/chipseq_project/homer/bin/assignGeneWeights.pl line 63.
    Assembling sequence file...
    Normalizing lower order oligos using homer2

    Reading input files...
    0 total sequences read
    Autonormalization: 1-mers (4 total)
            A       inf%    inf%    -nan
            C       inf%    inf%    -nan
            G       inf%    inf%    -nan
            T       inf%    inf%    -nan
    Autonormalization: 2-mers (16 total)
            AA      inf%    inf%    -nan
            CA      inf%    inf%    -nan
            GA      inf%    inf%    -nan
            TA      inf%    inf%    -nan
            AC      inf%    inf%    -nan
            CC      inf%    inf%    -nan
            GC      inf%    inf%    -nan
            TC      inf%    inf%    -nan
            AG      inf%    inf%    -nan
            CG      inf%    inf%    -nan
            GG      inf%    inf%    -nan
            TG      inf%    inf%    -nan
            AT      inf%    inf%    -nan
            CT      inf%    inf%    -nan
            GT      inf%    inf%    -nan
            TT      inf%    inf%    -nan
    Autonormalization: 3-mers (64 total)
    Normalization weights can be found in file: /home/chipseq_project/homer/motif_output//seq.autonorm.tsv
    Converging on autonormalization solution:
    ...............................................................................
    Final normalization:    Autonormalization: 1-mers (4 total)
            A       inf%    inf%    -nan
            C       inf%    inf%    -nan
            G       inf%    inf%    -nan
            T       inf%    inf%    -nan
    Autonormalization: 2-mers (16 total)
            AA      inf%    inf%    -nan
            CA      inf%    inf%    -nan
            GA      inf%    inf%    -nan
            TA      inf%    inf%    -nan
            AC      inf%    inf%    -nan
            CC      inf%    inf%    -nan
            GC      inf%    inf%    -nan
            TC      inf%    inf%    -nan
            AG      inf%    inf%    -nan
            CG      inf%    inf%    -nan
            GG      inf%    inf%    -nan
            TG      inf%    inf%    -nan
            AT      inf%    inf%    -nan
            CT      inf%    inf%    -nan
            GT      inf%    inf%    -nan
            TT      inf%    inf%    -nan
    Autonormalization: 3-mers (64 total)
    Finished preparing sequence/group files

    ----------------------------------------------------------
    Known motif enrichment

    Reading input files...
    0 total sequences read
    264 motifs loaded
    Cache length = 11180
    Using binomial scoring
    Checking enrichment of 264 motif(s)
    |0%                                    50%                                  100%|

    Illegal division by zero at /home/chipseq_project/homer/bin/findKnownMotifs.pl line 142.
    ----------------------------------------------------------
    De novo motif finding (HOMER)

    Scanning input files...

    !!! Something is wrong... are you sure you chose the right length for motif finding?
    !!! i.e. also check your sequence file!!!

    Scanning input files...

    !!! Something is wrong... are you sure you chose the right length for motif finding?
    !!! i.e. also check your sequence file!!!

    -blen automatically set to 2
    Scanning input files...

    !!! Something is wrong... are you sure you chose the right length for motif finding?
    !!! i.e. also check your sequence file!!!
    Use of uninitialized value in numeric gt (>) at /home/chipseq_project/homer/bin/compareMotifs.pl line 1289.
    !!! Filtered out all motifs!!!
    Job finished - if results look good, please send beer to ..

    Cleaning up tmp files...

1 Reply


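One thing to check: whether the chromosome naming in your peak/BED file is consistent with the chromosome naming in the genome you are using. For example, you shouldn't have '12' for chromosome 12 in your peak file when the genome of interest calls it 'chr12'. The "Could not open file for 1 (.fa or .fa.masked)" lines in your log suggest that this is what is happening here: the peaks appear to be named '1', '2', ..., 'X', 'Y', while the hg19 files are most likely named with the 'chr' prefix.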
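If the names do turn out to differ, one way to fix it is to rewrite the chromosome column of the peak file before running HOMER again. Below is a minimal Python sketch, not part of HOMER. It assumes a tab-delimited HOMER peak file with the chromosome in the second column and header lines starting with '#'. The function names and the output file name are made up for illustration.

```python
def add_chr_prefix(lines, chrom_col=1):
    """Yield peak-file lines with UCSC-style chromosome names ('1' -> 'chr1').

    chrom_col is the 0-based index of the chromosome column:
    1 for a HOMER peak file (assumed here), 0 for a BED file.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and '#' header/comment lines through unchanged.
        if not stripped or stripped.startswith("#"):
            yield stripped
            continue
        fields = stripped.split("\t")
        # Leave lines that are too short to hold a chromosome column alone.
        if len(fields) <= chrom_col:
            yield stripped
            continue
        chrom = fields[chrom_col]
        if not chrom.startswith("chr"):
            # Ensembl-style 'MT' corresponds to 'chrM' in UCSC hg19.
            fields[chrom_col] = "chrM" if chrom == "MT" else "chr" + chrom
        yield "\t".join(fields)


def convert_file(src_path, dst_path, chrom_col=1):
    """Write a copy of src_path with 'chr'-prefixed chromosome names."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out in add_chr_prefix(src, chrom_col):
            dst.write(out + "\n")


# Example (hypothetical output name):
# convert_file("peaks.txt", "peaks.chr.txt")
```

After converting, pass the new file to findMotifsGenome.pl in place of the original peaks.txt. If the "Could not open file" messages persist, the naming was probably not the only problem and the contents of the hg19 genome directory are worth checking directly.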

