Python Web Scraping Basics: A Summary of selenium Library Usage


1. Introduction to selenium

Official website: https://www.selenium.dev/


In short, selenium is a library used mainly to write browser automation scripts.

2. Basic usage of selenium


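from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = 'http://www.baidu.com'

# Instantiate the webdriver
# (raw string so the backslashes in the Windows path are kept literally;
# Selenium 4 takes the driver path through a Service object, since executable_path= is deprecated)
path = r'C:\Program Files (x86)\Python38-32\chromedriver.exe'
browser = webdriver.Chrome(service=Service(path))

# Visit Baidu with Chrome
# get() returns None, so the rendered HTML is read from page_source
browser.get(url)
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(browser.page_source)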

3. Common usage


'''
Purpose: common selenium usage
Date: 2021-05-22 21:37:05
'''

from selenium import webdriver

# Import the Options class
from selenium.webdriver.chrome.options import Options

url = "https://movie.douban.com/"

# Instantiate Options
chrome_options = Options()

# Set browser arguments

# --headless runs the browser without showing its window
chrome_options.add_argument('--headless')

# Set the language and User-Agent to reduce the chance of anti-scraping detection
# (Chrome command-line switches must start with "--")
chrome_options.add_argument('--lang=zh-CN')

UserAgent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'

chrome_options.add_argument('--user-agent=' + UserAgent)

# Start the browser with the options above
# (pass them as options=; the older chrome_options= keyword is deprecated)
driver = webdriver.Chrome(options=chrome_options)

# Maximize the browser window
# driver.maximize_window()

# Minimize the browser window
# driver.minimize_window()

driver.get(url)

# Get the page title
print(driver.title)

# page_source holds the HTML of the page
print(driver.page_source)
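
The browser arguments above are plain Chrome command-line switches, so they can be assembled separately from the browser start-up. The following is a minimal sketch, not part of the original tutorial: `build_chrome_args` and `apply_args` are hypothetical helper names, and `options` is assumed to be any object exposing `add_argument()`, such as selenium's `Options`.

```python
def build_chrome_args(headless=True, lang=None, user_agent=None):
    """Return a list of Chrome command-line switches (hypothetical helper)."""
    args = []
    if headless:
        args.append('--headless')
    if lang:
        args.append(f'--lang={lang}')
    if user_agent:
        args.append(f'--user-agent={user_agent}')
    return args


def apply_args(options, args):
    """Call options.add_argument() once per switch and return the options object."""
    for arg in args:
        options.add_argument(arg)
    return options
```

With selenium installed, `apply_args(Options(), build_chrome_args(lang='zh-CN'))` would produce an options object that can be passed to `webdriver.Chrome(options=...)`.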

4. Setting, getting, and deleting cookies


from selenium import webdriver
import time

# Start the browser
driver = webdriver.Chrome()
driver.get('https://www.youdao.com')
time.sleep(5)

# Add a cookie
driver.add_cookie({'name':'login','value':'登录'})

# Get all cookies
allCookies = driver.get_cookies()

print('All cookies:',allCookies)

# Get the cookie whose name is "login"
cookie = driver.get_cookie('login')
print('Cookie named login:',cookie)

# Delete a single cookie
driver.delete_cookie('login')
print("\n--------------remaining cookies\n",driver.get_cookies())

# Delete all cookies
driver.delete_all_cookies()

print("-------------remaining cookies------------\n",driver.get_cookies())

time.sleep(60)
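
Because `get_cookies()` returns a list of plain dictionaries, the cookies can be stored in a file and added back in a later session. The sketch below is one possible way to do this and is not from the original article: `save_cookies` and `load_cookies` are hypothetical helpers, and `driver` is assumed to be any object with an `add_cookie()` method.

```python
import json


def save_cookies(cookies, path):
    """Write a list of cookie dicts (as returned by driver.get_cookies()) to a JSON file."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(cookies, f, ensure_ascii=False, indent=2)


def load_cookies(driver, path):
    """Read cookies from the JSON file and add each one to the driver.

    The driver should already be on a page of the cookies' domain,
    because add_cookie() only accepts cookies for the current domain.
    """
    with open(path, encoding='utf-8') as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return cookies
```

A typical use would be `save_cookies(driver.get_cookies(), 'cookies.json')` at the end of one run and `load_cookies(driver, 'cookies.json')` after `driver.get(...)` in the next; the file name is only an example.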

5. File upload and download

File upload (upload.html)


<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <h1>Simulating a file upload with selenium</h1>
    <input type='file' name='file' />
</body>
</html>


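'''
Purpose: upload a file with selenium (used together with upload.html)
Date: 2021-05-23 09:56:53
'''


from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()

url = 'http://localhost:52330/selenium/upload.html'

driver.get(url)

# Selenium 4 replaces find_element_by_name() with find_element(By.NAME, ...)
ele = driver.find_element(By.NAME, 'file')
print("Element found:",ele)

# Note: the path must not contain Chinese characters
# (raw string so the backslash is kept literally)
ele.send_keys(r'D:\dcsdk_eventv3.db')
time.sleep(10)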
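
Sending a path to a file input only works when the file actually exists, and an absolute path is the safe choice. As a small sketch under those assumptions, the path can be checked before it is handed to `send_keys()`; `absolute_upload_path` is a hypothetical helper, not a selenium function.

```python
from pathlib import Path


def absolute_upload_path(path):
    """Return the absolute path of an existing file as a string.

    Raises FileNotFoundError if the path does not point to a regular file.
    """
    p = Path(path).expanduser().resolve()
    if not p.is_file():
        raise FileNotFoundError(f'No such file to upload: {p}')
    return str(p)
```

The upload line would then read `ele.send_keys(absolute_upload_path(r'D:\dcsdk_eventv3.db'))`, which fails early with a clear error instead of silently uploading nothing.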

File download


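'''
Purpose: simulate a file download
Date: 2021-05-23 10:21:28
'''

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Set the directory where files are saved; if it is not set, files go to the Downloads folder by default

options = webdriver.ChromeOptions()

prefs = {'download.default_directory':'D:\\'}
options.add_experimental_option('prefs',prefs)

# Start the browser (the options must be passed in, otherwise the download directory is ignored)
driver = webdriver.Chrome(options=options)

# Download the PC version of WeChat
driver.get('https://pc.weixin.qq.com')

# Maximize the browser window
driver.maximize_window()
time.sleep(5)
# Click the download button
# (Selenium 4 replaces find_element_by_class_name() with find_element(By.CLASS_NAME, ...))
driver.find_element(By.CLASS_NAME, 'download-button').click()

time.sleep(30)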
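
A fixed `time.sleep(30)` either wastes time or is too short for a slow connection. One alternative, sketched here as an assumption rather than as the article's method, is to poll the download directory: Chrome keeps an unfinished download under a temporary name ending in `.crdownload`, so the download can be treated as finished once a new file has appeared and no such temporary file remains. `wait_for_download` is a hypothetical helper.

```python
import os
import time


def wait_for_download(directory, before, timeout=60, poll=0.5):
    """Wait until `directory` contains new, fully downloaded files.

    `before` is the collection of file names present before the download started.
    Returns the sorted list of new file names, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(directory))
        # Chrome's in-progress downloads end with '.crdownload'
        in_progress = [n for n in names if n.endswith('.crdownload')]
        new_files = sorted(
            n for n in names - set(before) if not n.endswith('.crdownload')
        )
        if new_files and not in_progress:
            return new_files
        if time.monotonic() >= deadline:
            raise TimeoutError(f'download did not finish within {timeout} seconds')
        time.sleep(poll)
```

In the script above this would mean taking `before = set(os.listdir('D:\\'))` ahead of the click and calling `wait_for_download('D:\\', before)` in place of the final sleep.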

6. Switching windows


from selenium import webdriver
import time

url = 'https://www.baidu.com/'
driver = webdriver.Chrome()

# Implicit wait: set once, it applies for the whole lifetime of the driver
driver.implicitly_wait(30)

driver.get(url)

# Open a new window with JavaScript
js = 'window.open("https://www.sogou.com/")'
driver.execute_script(js)


# Get the handle of the window that currently has focus
current_window = driver.current_window_handle

print(current_window)
# Get the handles of all browser windows
handles = driver.window_handles
print('All window handles\n------------------\n',handles)


'''
All window handles
------------------
 ['CDwindow-7FB808B4F24EF5385A9AFBDC21FA13B9', 'CDwindow-E879C0A64E734C3F88468A4388F48E3B']
'''

# Pause so the switch is visible
time.sleep(3)


# Switch windows by handle
# Switch to the Baidu window
# (switch_to.window() replaces the deprecated switch_to_window())
driver.switch_to.window(handles[0])
time.sleep(3)


# Switch to the Sogou window
driver.switch_to.window(handles[1])
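
Indexing the handle list assumes the new window is always the last entry, but the WebDriver specification does not guarantee any particular order. A more defensive approach, shown here only as a sketch, is to compare the handles before and after the new window is opened; `new_window_handle` is a hypothetical helper.

```python
def new_window_handle(before, after):
    """Return the single handle that is in `after` but not in `before`.

    Raises ValueError if zero or more than one new handle is found.
    """
    added = [h for h in after if h not in before]
    if len(added) != 1:
        raise ValueError(f'expected exactly one new window, found {len(added)}')
    return added[0]


# Intended use with a live driver (not executed here):
#   before = driver.window_handles
#   driver.execute_script('window.open("https://www.sogou.com/")')
#   driver.switch_to.window(new_window_handle(before, driver.window_handles))
```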

7. Hands-on project


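'''
Purpose: get familiar with automated operations in selenium
Date: 2020/5/22
'''


from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

# The Keys class defines many special keys and shortcuts


url = 'https://www.baidu.com'

# Raw string so the backslashes in the Windows path are kept literally
path = r'C:\Program Files (x86)\Python38-32\chromedriver.exe'
driver = webdriver.Chrome(service=Service(path))
driver.get(url)

# Get the search input element
element = driver.find_element(By.ID, 'kw')

# Type text into the input box
element.send_keys('python你')
time.sleep(2)

# Delete the last character
element.send_keys(Keys.BACK_SPACE)
time.sleep(2)

# Add a space followed by "教程" (tutorial)
element.send_keys(Keys.SPACE)
element.send_keys("教程")
time.sleep(2)


# Ctrl+A: select everything in the input box
element.send_keys(Keys.CONTROL, 'a')
time.sleep(2)

# Ctrl+X: cut the contents of the input box
element.send_keys(Keys.CONTROL, 'x')
time.sleep(2)
# Ctrl+V: paste
element.send_keys(Keys.CONTROL, 'v')
time.sleep(2)

# Press Enter on the search button
driver.find_element(By.ID, 'su').send_keys(Keys.ENTER)
time.sleep(10)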
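
The fixed `time.sleep()` calls in this script are there so the effect of each step is visible. When the goal is instead to wait for something to happen, polling a condition is usually more robust than sleeping for a guessed duration. The helper below is a generic standard-library sketch of that idea, with the hypothetical name `wait_until`; selenium itself ships `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll)
```

For example, `wait_until(lambda: 'python' in driver.title)` would wait for the results page title instead of sleeping for ten seconds; the exact title check is only an assumption about the page.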

