A little crawler (practice project)
This post was last edited by ai0by on 2019-3-27 17:04

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import random

BASE_URL = "http://jigadori.fkoji.com"

def spy(url):
    # fetch and parse one listing page
    page = requests.get(url).content
    soup = BeautifulSoup(page, "html.parser")
    for row in soup.find_all('div', {'class': 'row'}):
        for photo in row.find_all('div', {'class': 'photo'}):
            for tag in photo.find('div', {'class': 'photo-link-outer'}).find('a').find_all('img'):
                img = tag.get('src')
                print(img)
                # random 6-letter name for the saved file
                name = ''.join(random.sample('zyxwvutsrqponmlkjihgfedcba', 6))
                downImg(img, name)
    # follow the "next page" link; stop on the last page
    nextlink = soup.find('p', {'class': 'go-to-next-page'})
    if nextlink is None:
        return
    spy(BASE_URL + nextlink.find('a').get('href'))

def downImg(img, name):
    try:
        r = requests.get(img)
    except Exception:
        print("failed to fetch the image")
        return
    with open('./img/good%s.jpg' % name, 'wb') as f:
        f.write(r.content)

if __name__ == '__main__':
    spy(BASE_URL)
```
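As a side note, the three nested `find_all` loops in `spy()` can be collapsed into a single CSS selector with `soup.select`. A minimal sketch, using a made-up HTML snippet in place of the real page:

```python
from bs4 import BeautifulSoup

# toy HTML mimicking the structure the crawler walks
html = """
<div class="row">
  <div class="photo">
    <div class="photo-link-outer">
      <a href="#"><img src="http://example.com/1.jpg"></a>
    </div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# one selector instead of three nested loops
srcs = [img["src"] for img in soup.select("div.row div.photo div.photo-link-outer a img")]
print(srcs)  # ['http://example.com/1.jpg']
```

The selector matches the same tags as the nested `find`/`find_all` chain, just in one pass.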
http://fulicos.sbcoder.cn/2019/03/21/5c92e798d4689.png
http://fulicos.sbcoder.cn/2019/03/21/5c92e794e9827.png
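One caveat with the script above: `spy()` calls itself once per page, so a long pagination chain can hit Python's default recursion limit (1000 frames). The same traversal can be written as a loop; here is a sketch with stand-in `get_page`/`get_next` callables so it runs without the network:

```python
def crawl(start_url, get_page, get_next):
    """Iterative pagination: same traversal as a recursive spy(), no stack growth."""
    url = start_url
    visited = []
    while url is not None:
        visited.append(url)
        page = get_page(url)      # stand-in for requests.get(url)
        url = get_next(page)      # stand-in for reading the go-to-next-page link
    return visited

# tiny fake site: page 'a' links to 'b', 'b' links nowhere
pages = {'a': 'b', 'b': None}
order = crawl('a', lambda u: u, lambda p: pages[p])
print(order)  # ['a', 'b']
```

In the real script the loop body would fetch and parse the page, then set `url` from the `go-to-next-page` anchor, returning `None` when it is absent.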
Yesterday morning I saw the resources a big shot posted but didn't manage to save them, so I wrote my own. It's not polished, but at least it works....

Check out requests

Respect to the big shot.......

666, you made me spend ages browsing http://jigadori.fkoji.com/ again

Check out requests+asyncio
ansheng posted at 2019-3-21 09:40:
Check out requests+asyncio
Thanks for the pointer, I'll go take a look

After a day of studying PyCharm I can actually read this; looks like the effort wasn't wasted

So you really do post this stuff every day, huh

I can't open this site, did you crawl it to death???
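The requests+asyncio suggestion above can be sketched as blocking `requests` calls pushed into a thread pool via `run_in_executor`; `fetch` here is a placeholder (real code would call `requests.get(url).content`) so the sketch runs without the network:

```python
import asyncio

def fetch(url):
    # placeholder for a blocking call such as requests.get(url).content
    return "downloaded:" + url

async def download_all(urls):
    loop = asyncio.get_running_loop()
    # run the blocking fetches in the default thread pool so they overlap
    tasks = [loop.run_in_executor(None, fetch, u) for u in urls]
    return await asyncio.gather(*tasks)

if __name__ == '__main__':
    results = asyncio.run(download_all(["/a.jpg", "/b.jpg"]))
    print(results)  # ['downloaded:/a.jpg', 'downloaded:/b.jpg']
```

For a fully asynchronous client (no thread pool at all), aiohttp would be the more idiomatic route.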
Is there a crawler that grabs all of 1024?