在Python中使用requests库爬取数据时返回为空如何解决-编程学习网

在Python中使用requests库爬取数据时返回为空如何解决？很多新手对此不是很清楚，为了帮助大家解决这个难题，下面小编将为大家详细讲解，有这方面需求的人可以来学习下，希望你能有所收获。

Python主要用来做什么

Python主要应用于：1、Web开发；2、数据科学研究；3、网络爬虫；4、嵌入式应用开发；5、游戏开发；6、桌面应用开发。

html字段：

在Python中使用requests库爬取数据时返回为空如何解决

robots协议：

在Python中使用requests库爬取数据时返回为空如何解决

现在我们开始用python IDLE 爬取

在Python中使用requests库爬取数据时返回为空如何解决

import requestsr = requests.get("https://baike.so.com/doc/24368318-25185095.html")r.status_coder.text

结果分析，我们可以成功访问到该网页，但是得不到网页的结果。被360搜索识别，我们将headers修改。

在Python中使用requests库爬取数据时返回为空如何解决

输出有个小插曲，网页内容很多，我是想将前500个字符输出，第一次格式错了

import requestsheaders = {  'Cookie':'OCSSID=4df0bjva6j7ejussu8al3eqo03',  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'         '(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',}r = requests.get("https://baike.so.com/doc/24368318-25185095.html"， headers = headers)r.status_coder.text

接着我们对需要的内容进行爬取，用(.find)方法找到我们内容位置，用(.children)下行遍历的方法对内容进行爬取，用(isinstance)方法对内容进行筛选：

import requestsfrom bs4 import BeautifulSoupimport bs4headers = {  'Cookie':'OCSSID=4df0bjva6j7ejussu8al3eqo03',  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'         '(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',}r = requests.get("https://baike.so.com/doc/24368318-25185095.html", headers = headers)r.status_coder.encoding = r.apparent_encodingsoup = BeautifulSoup(r.text, "html.parser")for tr in soup.find('tbody').children:if isinstance(tr, bs4.element.Tag):tds = tr('td')print([tds[0].string, tds[1].string, tds[2].string])

得到结果如下：

在Python中使用requests库爬取数据时返回为空如何解决

修改输出的数目，我们用Clist列表来存取所有城市的排名，将前20个输出代码如下：

import requestsfrom bs4 import BeautifulSoupimport bs4Clist = list() #存所有城市的列表headers = {  'Cookie':'OCSSID=4df0bjva6j7ejussu8al3eqo03',  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'         '(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',}r = requests.get("https://baike.so.com/doc/24368318-25185095.html", headers = headers)r.encoding = r.apparent_encoding #将html的编码解码为utf-8格式soup = BeautifulSoup(r.text, "html.parser") #重新排版for tr in soup.find('tbody').children:   #将tbody标签的子列全部读取if isinstance(tr, bs4.element.Tag):  #筛选tb列表，将有内容的筛选出啦  tds = tr('td')  Clist.append([tds[0].string, tds[1].string, tds[2].string])for i in range(21):  print(Clist[i])

最终结果：

在Python中使用requests库爬取数据时返回为空如何解决

看完上述内容是否对您有帮助呢？如果还想对相关知识有进一步的了解或阅读更多相关文章，请关注编程网行业资讯频道，感谢您对编程网的支持。

文章详情

在Python中使用requests库爬取数据时返回为空如何解决

Python主要用来做什么

软考中级精品资料免费领

相关文章

猜你喜欢

在Python中使用requests库爬取数据时返回为空如何解决

在Python中使用os.path.exists()函数时返回false如何解决

Vue3中使用reactive时后端有返回数据但dom没有更新如何解决