Original poster: wenguonideshou

An alternative script for downloading from 91: introducing requests, the new scraping library by Kenneth Reitz

Posted 2018-3-1 13:23:18
On Win7 with Python 3.6, running it fails:

error occurred during loading data. Trying to use cache server http://d2g6u4gh6d9rq0.cloudfront.net/browsers/fake_useragent_0.1.10.json
Traceback (most recent call last):
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Program Files\Python36\Lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Program Files\Python36\Lib\http\client.py", line 964, in send
    self.connect()
  File "C:\Program Files\Python36\Lib\http\client.py", line 1392, in connect
    super().connect()
  File "C:\Program Files\Python36\Lib\http\client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Program Files\Python36\Lib\socket.py", line 724, in create_connection
    raise err
  File "C:\Program Files\Python36\Lib\socket.py", line 713, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 67, in get
    context=context,
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error timed out>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 154, in load
    for item in get_browsers(verify_ssl=verify_ssl):
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 97, in get_browsers
    html = get(settings.BROWSERS_STATS_PAGE, verify_ssl=verify_ssl)
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
Traceback (most recent call last):
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Program Files\Python36\Lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Program Files\Python36\Lib\http\client.py", line 964, in send
    self.connect()
  File "C:\Program Files\Python36\Lib\http\client.py", line 1392, in connect
    super().connect()
  File "C:\Program Files\Python36\Lib\http\client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Program Files\Python36\Lib\socket.py", line 724, in create_connection
    raise err
  File "C:\Program Files\Python36\Lib\socket.py", line 713, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 67, in get
    context=context,
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error timed out>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 154, in load
    for item in get_browsers(verify_ssl=verify_ssl):
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 97, in get_browsers
    html = get(settings.BROWSERS_STATS_PAGE, verify_ssl=verify_ssl)
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Program Files\Python36\Lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python36\Lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Program Files\Python36\Lib\http\client.py", line 964, in send
    self.connect()
  File "C:\Program Files\Python36\Lib\http\client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Program Files\Python36\Lib\socket.py", line 704, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "C:\Program Files\Python36\Lib\socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 67, in get
    context=context,
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1346, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/91/91.py", line 2, in <module>
    from requests_html import HTMLSession
  File "C:\Program Files\Python36\lib\site-packages\requests_html.py", line 22, in <module>
    useragent = UserAgent()
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\fake.py", line 69, in __init__
    self.load()
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\fake.py", line 78, in load
    verify_ssl=self.verify_ssl,
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 250, in load_cached
    update(path, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 245, in update
    write(path, load(use_cache_server=use_cache_server, verify_ssl=verify_ssl))
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 189, in load
    verify_ssl=verify_ssl,
  File "C:\Program Files\Python36\lib\site-packages\fake_useragent\utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached

Process finished with exit code 1
Posted 2018-3-1 14:18:52

maxfly wrote on 2018-3-1 15:11

On Win7 with Python 3.6, running it fails:

error occurred during loading data. Trying to use cache serve ...

91 is blocked by the firewall. I tested locally over a VPN;
for real runs I put the script on an overseas VPS.
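Original poster | Posted 2018-3-1 14:40:29

wenguonideshou wrote on 2018-3-1 15:14

91 is blocked by the firewall. I tested locally over a VPN;
for real runs I put the script on an overseas VPS.

But that URL opens fine directly in the browser...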
Posted 2018-3-1 15:01:17
Open source makes life better
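Original poster | Posted 2018-3-1 15:09:06

maxfly wrote on 2018-3-1 15:17

But that URL opens fine directly in the browser...

I added a script for crawling with the "little airplane" proxy client (小飞机) running on the local machine. Reading it leaves me completely lost...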
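Posted 2018-3-1 15:01:00

wenguonideshou wrote on 2018-3-1 15:37

I added a script for crawling with the "little airplane" proxy client (小飞机) running on the local machine.

Adding the following is enough to make it work: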
from fake_useragent import UserAgent
ua = UserAgent(verify_ssl=False)
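
If the lookup still fails after that change (for example when the data hosts cannot be reached at all, as in the traceback earlier in the thread), one option is to fall back to a fixed User-Agent string instead of crashing. Below is a minimal sketch, assuming the `UserAgent(verify_ssl=False)` call from the reply above is available in the installed fake_useragent release. `FALLBACK_UA`, `default_factory`, and `pick_user_agent` are hypothetical names made up for this example, and the fallback string is just a sample Chrome-on-Windows value.

```python
# Sketch only: these names are illustrative and not part of fake_useragent.
FALLBACK_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/64.0.3282.186 Safari/537.36"
)


def default_factory():
    """Build a fake_useragent UserAgent the way the reply above suggests."""
    # Imported lazily so a missing package is handled like any other failure.
    from fake_useragent import UserAgent

    return UserAgent(verify_ssl=False)


def pick_user_agent(factory=default_factory, fallback=FALLBACK_UA):
    """Return a random User-Agent string, or `fallback` if the lookup fails."""
    try:
        return factory().random
    except Exception:
        # Covers FakeUserAgentError (retries exhausted), ImportError, and
        # a TypeError from releases that do not accept verify_ssl.
        return fallback
```

Calling `pick_user_agent()` with no arguments tries fake_useragent first and only then uses the fixed string. Note that this would not by itself stop `requests_html` from building its own `UserAgent()` at import time, which is where the traceback in the first reply originates.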


Posted 2018-3-1 15:11:54
Bump. Respect to 戏子.
Posted 2018-3-1 15:14:11
Good stuff. Supporting the expert, and thanks for sharing.
Posted 2018-3-1 15:11:00
This thread should have an access restriction added.
Posted 2018-3-1 15:17:57
An expert really is an expert.