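While following Day 7, lesson 06 (headless browser + avoiding detection) of the course "Python超强爬虫8天速成(完整版)爬取各种网站数据实战案例", I ran into some problems with the code the instructor demonstrated. Below are the problems and how I worked through them, shared for reference and feedback. The instructor's code: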
from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options
from selenium.webdriver import ChromeOptions
# non visual interface
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# avoid detection risks
option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = webdriver.Chrome(executable_path='./chromedriver.exe', chrome_options=chrome_options, options=option)
driver.get('https://www.baidu.***')
# get page source
print(driver.page_source)
sleep(2)
driver.quit()
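I started out on Selenium 3.7, where this code fails with TypeError: __init__() got an unexpected keyword argument 'options'. As a beginner I was puzzled by this and could not find a suitable fix online, so I tried upgrading Selenium to version 4.1.0. After the upgrade there are two warnings instead.
01: DeprecationWarning: executable_path has been deprecated, please pass in a Service object. It is raised by driver = webdriver.Chrome(executable_path='./chromedriver.exe').
Solution: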
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Create a Service object that points at the ChromeDriver executable
service = Service('./chromedriver.exe')
# Initialize the Chrome WebDriver through the Service object
driver = webdriver.Chrome(service=service)
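02: DeprecationWarning: use options instead of chrome_options. It is raised by driver = webdriver.Chrome(service=service, chrome_options=chrome_options, options=option).
Both chrome_options and option would have to be passed as options, and at first I did not know how to resolve that. Since ChromeOptions is just an alias for the Options class, what worked in the end was putting the headless and anti-detection settings into a single Options object, as follows:
chrome_options = Options()
# non visual interface
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# avoid detection risks
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = webdriver.Chrome(service=service, options=chrome_options)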
In my tests, both running in the background (headless) and the anti-detection setting took effect.
Final code:
from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
# non visual interface
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# avoid detection risks
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
# Create a Service object that points at the ChromeDriver executable
service = Service('./chromedriver.exe')
# Initialize the Chrome WebDriver through the Service object
driver = webdriver.Chrome(service=service, options=chrome_options)
driver.get('https://www.baidu.***')
print(driver.page_source)
sleep(2)
driver.quit()
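The final code targets Selenium 4. If the same script also has to run where only an older 3.x release is installed, the keyword names can be chosen from the version string. This is only a rough sketch: the function name chrome_keyword_names is hypothetical, and the cut-off at major version 4 is an assumption based on the two versions tried in this post (later 3.x releases may already accept options as well).

```python
def chrome_keyword_names(selenium_version: str) -> dict:
    """Hypothetical helper: pick webdriver.Chrome keyword names by major version."""
    major = int(selenium_version.split('.')[0])
    if major >= 4:
        # Selenium 4 style: driver path wrapped in a Service object, settings via options=
        return {'driver': 'service', 'options': 'options'}
    # Selenium 3.x style: plain path string, settings via chrome_options=
    return {'driver': 'executable_path', 'options': 'chrome_options'}


print(chrome_keyword_names('3.7.0'))
print(chrome_keyword_names('4.1.0'))
```

In a real script the version string would come from selenium.__version__, and for Selenium 4 the value passed under service must be a Service object rather than a path string.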
Any feedback or pointers are welcome.