Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Using itertools.product and want to seed a value

I've written a small script to download pictures from a website. It iterates over 7-character alphanumeric IDs, where the first character is always a digit. The problem is that if I stop the script and start it again, I have to start all over.

Can I somehow seed itertools.product with the last value I got, so I don't have to go through them all again?

Thanks for any input.

Here is part of the code:

import cStringIO
import itertools
import urllib2

from PIL import Image  # assumption: Image comes from PIL/Pillow

numbers = '0123456789'
alnum = numbers + 'abcdefghijklmnopqrstuvwxyz'

# 7 characters: a leading digit followed by six alphanumerics
len7 = itertools.product(numbers, alnum, alnum, alnum, alnum, alnum, alnum)

for p in len7:
    currentid = ''.join(p)

    # semi-static vars
    url = 'http://mysite.com/images/' + currentid

    # need to get the real URL because of the redirect
    print "Trying " + url
    req = urllib2.Request(url)
    res = openaurl(req)  # helper defined elsewhere in the script (not shown)
    if res == "continue":
        continue
    finalurl = res.geturl()

    # we have the full URL; now check whether it is real
    try:
        file = urllib2.urlopen(finalurl)
    except urllib2.HTTPError, e:
        print e.code
        continue  # nothing to read; skip to the next ID

    im = cStringIO.StringIO(file.read())
    img = Image.open(im)
    writeimage(img)  # helper defined elsewhere in the script (not shown)

1 Reply


Here's a solution based on PyPy's library code (thanks to agf's suggestion in the comments).

The current position is available via the .state attribute and can be set via .goto(state), where state is an index into the sequence (starting at 0). There's a demo at the end (you need to scroll down, I'm afraid).

This is much faster than generating and discarding values: goto() computes the per-position indices directly, with one divmod per input sequence, instead of stepping through every earlier tuple.
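To make the comparison concrete, here is a minimal sketch of both approaches in Python 3. It is not part of the answer's code. The names nth_product and skip_by_discarding are made up for illustration, and the sketch assumes non-empty, indexable pools and a non-negative n.

```python
import itertools

def nth_product(n, *pools):
    """Return the n-th tuple (0-based) that itertools.product(*pools) would yield."""
    picked = []
    # The last pool varies fastest, so peel off "digits" from the right.
    for pool in reversed(pools):
        n, i = divmod(n, len(pool))
        picked.append(pool[i])
    if n > 0:
        raise IndexError("index past the end of the product")
    return tuple(reversed(picked))

def skip_by_discarding(n, *pools):
    """Slow alternative: generate and throw away the first n tuples."""
    return next(itertools.islice(itertools.product(*pools), n, None))

print(nth_product(4, 'abc', '12'))         # same tuple as the line below
print(skip_by_discarding(4, 'abc', '12'))
```

The first function does one divmod per pool regardless of n, while the second has to produce n tuples before returning, which is what makes it slow deep into a 7-character ID space.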

> cat prod.py 

class product(object):

    def __init__(self, *args, **kw):
        if len(kw) > 1:
            raise TypeError("product() takes at most 1 argument (%d given)" %
                             len(kw))
        self.repeat = kw.get('repeat', 1)
        self.gears = [x for x in args] * self.repeat
        self.num_gears = len(self.gears)
        self.reset()

    def reset(self):
        # initialization of indicies to loop over
        self.indicies = [(0, len(self.gears[x]))
                         for x in range(0, self.num_gears)]
        self.cont = True
        self.state = 0

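    def goto(self, n):
        # Jump straight to the n-th tuple (0-based) by treating n as a
        # mixed-radix number whose digits are the per-gear indices.
        self.reset()
        self.state = n
        x = self.num_gears
        while n > 0 and x > 0:
            x -= 1
            # the last gear turns fastest, so peel off digits from the right
            n, m = divmod(n, len(self.gears[x]))
            self.indicies[x] = (m, self.indicies[x][1])
        if n > 0:
            # n was at least the total number of tuples
            self.reset()
            raise ValueError("state exceeded")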

    def roll_gears(self):
        # Starting from the end of the gear indicies work to the front
        # incrementing the gear until the limit is reached. When the limit
        # is reached carry operation to the next gear
        self.state += 1
        should_carry = True
        for n in range(0, self.num_gears):
            nth_gear = self.num_gears - n - 1
            if should_carry:
                count, lim = self.indicies[nth_gear]
                count += 1
                if count == lim and nth_gear == 0:
                    self.cont = False
                if count == lim:
                    should_carry = True
                    count = 0
                else:
                    should_carry = False
                self.indicies[nth_gear] = (count, lim)
            else:
                break

    def __iter__(self):
        return self

    def next(self):
        if not self.cont:
            raise StopIteration
        l = []
        for x in range(0, self.num_gears):
            index, limit = self.indicies[x]
            l.append(self.gears[x][index])
        self.roll_gears()
        return tuple(l)

p = product('abc', '12')
print list(p)
p.reset()
print list(p)
p.goto(2)
print list(p)
p.goto(4)
print list(p)
> python prod.py 
[('a', '1'), ('a', '2'), ('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
[('a', '1'), ('a', '2'), ('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
[('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
[('c', '1'), ('c', '2')]

You should test it more, since I may have made a dumb mistake, but the idea is quite simple, so you should be able to fix it :o) You're free to use my changes; I don't know what the original PyPy licence is.

Also, state isn't really the full state: it doesn't include the original arguments, it's just an index into the sequence. Maybe it would have been better to call it index, but there are already indici[sic]es in the code...
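Since the state is only an index, and the question asks about seeding with the last value rather than a count, the last ID can be converted back to its index. The following is a rough Python 3 sketch and not part of the original answer. index_of is a hypothetical helper, last_id is a made-up example value, and the pools are the ones from the question.

```python
def index_of(item, *pools):
    """Return the 0-based position of `item` in itertools.product(*pools) order."""
    if len(item) != len(pools):
        raise ValueError("item length does not match the number of pools")
    n = 0
    for value, pool in zip(item, pools):
        # Horner's rule over a mixed-radix number; pool.index raises
        # ValueError if the value does not occur in the pool.
        n = n * len(pool) + pool.index(value)
    return n

numbers = '0123456789'
alnum = numbers + 'abcdefghijklmnopqrstuvwxyz'
pools = (numbers,) + (alnum,) * 6

last_id = '0000010'                        # hypothetical last ID tried before stopping
resume_at = index_of(last_id, *pools) + 1  # index of the first untried ID
print(resume_at)
```

With the class above, the resulting number is what you would pass to .goto() to continue with the first untried ID.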

Update

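Here's a simpler version that uses the same idea but works by transforming a sequence of numbers: a function maps an index n to the n-th tuple, so you just imap it over count(n) to get the sequence offset by n. Note that in this version the first sequence varies fastest, so the order differs from itertools.product.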

> cat prod2.py 

from itertools import count, imap

def make_product(*values):
    def fold((n, l), v):
        (n, m) = divmod(n, len(v))
        return (n, l + [v[m]])
    def product(n):
        (n, l) = reduce(fold, values, (n, []))
        if n > 0: raise StopIteration
        return tuple(l)
    return product

print list(imap(make_product(['a','b','c'], [1,2,3]), count()))
print list(imap(make_product(['a','b','c'], [1,2,3]), count(3)))

def product_from(n, *values):
    return imap(make_product(*values), count(n))

print list(product_from(4, ['a','b','c'], [1,2,3]))

> python prod2.py 
[('a', 1), ('b', 1), ('c', 1), ('a', 2), ('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]
[('a', 2), ('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]
[('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]

(The downside here is that if you want to stop and restart, you need to keep track yourself of how many values you have used.)
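One way to ease that bookkeeping is to have the generator hand back the index together with each tuple. The sketch below is a Python 3 variant, not the answer's code. Python 3 is used because imap and tuple parameters such as fold((n, l), v) no longer exist there. This product_from is a hypothetical rewrite that follows itertools.product order (last sequence fastest), unlike prod2.py.

```python
def product_from(start, *pools):
    """Yield (index, tuple) pairs in itertools.product order, beginning at `start`."""
    total = 1
    for pool in pools:
        total *= len(pool)
    for n in range(start, total):
        rest, picked = n, []
        # Decode n as a mixed-radix number, last pool fastest.
        for pool in reversed(pools):
            rest, i = divmod(rest, len(pool))
            picked.append(pool[i])
        yield n, tuple(reversed(picked))

for n, combo in product_from(4, 'abc', '12'):
    # n is the value to record (for example in a file) once combo is done,
    # so that a later run can call product_from(n + 1, ...).
    print(n, combo)
```

Storing n after each processed item and restarting with n + 1 is one possible design. Whether to save after every item or only periodically depends on how expensive repeating a few downloads would be.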

