I am crawling a site that may have a large number of start_urls, for example:
http://www.a.com/list_1_2_3.htm
I want to populate start_urls with URLs matching [list_\d+_\d+_\d+.htm], and extract items from URLs matching [node_\d+.htm] during the crawl.
Can I use CrawlSpider to do this? And how can I generate the start_urls dynamically while crawling?
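
To make the two URL shapes concrete, here is a minimal sketch using only Python's standard library. It assumes the patterns are meant as regular expressions (with the dot before htm escaped) and that the three numbers in a list URL come from known ranges. The names classify and generate_start_urls, and the ranges passed to them, are hypothetical and only for illustration. In Scrapy, the same regexes could presumably be passed to LinkExtractor(allow=...) inside a CrawlSpider rule, but this sketch does not depend on Scrapy.

```python
import re
from itertools import product

# Assumed regex forms of the two URL shapes described in the question.
LIST_PAGE = re.compile(r"list_\d+_\d+_\d+\.htm$")
ITEM_PAGE = re.compile(r"node_\d+\.htm$")


def classify(url):
    """Return 'list' for listing pages, 'item' for item pages, else None."""
    if LIST_PAGE.search(url):
        return "list"
    if ITEM_PAGE.search(url):
        return "item"
    return None


def generate_start_urls(a_values, b_values, c_values, base="http://www.a.com"):
    """Lazily yield listing-page URLs for every combination of the three numbers.

    The value ranges are hypothetical; a generator avoids building a huge
    start_urls list up front.
    """
    for a, b, c in product(a_values, b_values, c_values):
        yield f"{base}/list_{a}_{b}_{c}.htm"
```

For example, generate_start_urls(range(1, 3), [2], [3]) yields the list_1_2_3.htm and list_2_2_3.htm URLs one at a time, and classify can then decide whether a discovered link should be followed as a listing page or parsed as an item page.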