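Chaoxing Multithreaded Xuexitong Helper, Version 2310 (Python): Code Analysis, Part 2

For screenshots and contact details, skip to the end of the post.

The software is free; the download link is at the end of Part 1.

Today we pick up where we left off and keep working through what this code does.

Main content:

Task 1: User login and cookie merging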

import getpass
import os
import random
import time


def step_1():
    """Prompt for credentials until login succeeds, then build the global request headers."""
    global cookieStr, uid, global_headers
    sign_sus = False
    while not sign_sus:
        os.system("cls")  # clears the console (Windows only)
        uname = input("Enter your phone number: ")
        # getpass hides the typed password; the original input() call echoed it
        password = getpass.getpass("Enter your password (input is hidden): ")
        sign_in_rsp = sign_in(uname, password)  # sign_in() is defined elsewhere in the program
        sign_in_json = sign_in_rsp.json()
        if not sign_in_json['status']:
            print(sign_in_json.get('msg2'), "\n\nPress Enter to re-enter your account details")
            input()
        else:
            sign_sus = True
            print("Login successful, processing your data...")
            # Purely cosmetic spinner animation
            for _ in range(random.randint(5, 10)):
                for frame in ('[\\]', '[|]', '[/]', '[-]'):
                    print('        {}\r'.format(frame), end='')
                    time.sleep(0.1)
            print('        [OK]\r', end='')
            print("Loading complete")
            time.sleep(1)
    uid = sign_in_rsp.cookies['_uid']
    # Merge every cookie from the login response into a single Cookie header string
    cookieStr = ''
    for item in sign_in_rsp.cookies:
        cookieStr = cookieStr + item.name + '=' + item.value + ';'
    global_headers = {
        'Cookie': cookieStr,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36 Edg/85.0.564.51'
    }

Notes:

There is not much to analyze in Task 1. I added a small decorative spinner animation of my own, so let's skip ahead for now.
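
The cookie-merging step at the end of Task 1 can be sketched on its own. This is only a minimal illustration, assuming the cookies are available as a plain name-to-value mapping. `build_cookie_header` and the sample values are hypothetical, and unlike the original loop it joins the pairs with "; " instead of appending a trailing semicolon.

```python
def build_cookie_header(cookies: dict) -> str:
    """Join name=value pairs into one Cookie header string (hypothetical helper)."""
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


# Made-up sample values, for illustration only
sample_cookies = {"_uid": "123456", "session": "abc"}
headers = {"Cookie": build_cookie_header(sample_cookies)}
print(headers["Cookie"])
```

Building the string once and reusing it in a headers dictionary mirrors what the original does with `cookieStr` and `global_headers`.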

Task 2: Reading the courses and printing course information

import os

import requests
from lxml import etree


def step_2():
    """Fetch the course list page and record each open course in course_dict."""
    global course_dict
    class_url = "******"
    class_rsp = requests.get(url=class_url, headers=global_headers)
    if class_rsp.status_code == 200:
        class_HTML = etree.HTML(class_rsp.text)
        os.system("cls")
        print("{:-^50s}".format("Your currently open courses\n"))
        i = 0
        course_dict = {}
        for class_item in class_HTML.xpath("/html/body/div/div[2]/div[3]/ul/li[@class='courseItem curFile']"):
            try:
                class_item_name = class_item.xpath("./div[2]/h3/a/@title")[0]
                # Courses still waiting to open have no link yet, so the a tag is missing
                # and the [0] lookup raises IndexError.
                i += 1
                print(class_item_name)
                course_dict[i] = [class_item_name, "https://mooc1-2.chaoxing.com{}".format(class_item.xpath("./div[1]/a[1]/@href")[0])]
            except IndexError:
                pass
        print("———————————————————————————————————")
    else:
        print("Failed to load courses, please contact the author at zhouwangxu@vip.qq.com")

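Notes:

This function reads the course list and prints the course information. It first sends a GET request to the given URL (masked as `******` here) to fetch the page's HTML. If the request succeeds, it parses the HTML, extracts each course's name and link, prints the name, and stores both in a dictionary.

In more detail, the program calls `requests.get()` with the global variable `global_headers` as the request headers. If the status code is 200, the response text is passed to `etree.HTML()`, which returns an object `class_HTML` representing the HTML document. The XPath expression `"/html/body/div/div[2]/div[3]/ul/li[@class='courseItem curFile']"` then selects every `li` element whose class attribute is 'courseItem curFile', that is, the list of currently open courses. For each course, the program tries to extract the name and the link. The name is printed inside the loop, and the name together with the full link is saved in the dictionary `course_dict` under a running index. The links themselves are not printed, and the dictionary is not traversed again afterwards.

Note that courses that have not opened yet may have a slightly different HTML structure from open ones: a missing `a` tag makes the XPath lookup return an empty list, so indexing it with `[0]` raises an exception. The try-except statement catches that exception so the course is skipped instead of crashing the program.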
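
The "select the list items, then skip entries without a link" pattern can be tried without network access. The sketch below is an assumption-laden stand-in: it uses the standard library's `xml.etree.ElementTree` (not lxml, which the original uses) on a small, well-formed, made-up snippet, and `collect_courses`, `SAMPLE` and the `example.invalid` base URL are all hypothetical. It checks for missing nodes explicitly instead of relying on try-except.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page structure may differ.
SAMPLE = """
<ul>
  <li class="courseItem curFile">
    <div><a href="/course/1">cover</a></div>
    <div><h3><a title="Linear Algebra">Linear Algebra</a></h3></div>
  </li>
  <li class="courseItem curFile">
    <div></div>
    <div><h3><a title="Not Yet Open">Not Yet Open</a></h3></div>
  </li>
</ul>
"""


def collect_courses(markup: str, base_url: str = "https://example.invalid") -> dict:
    """Return {index: [title, full_link]} for every item that has both a title and a link."""
    root = ET.fromstring(markup.strip())
    courses = {}
    index = 0
    for item in root.findall("./li[@class='courseItem curFile']"):
        title_node = item.find("./div[2]/h3/a")
        link_node = item.find("./div[1]/a")
        if title_node is None or link_node is None:
            continue  # e.g. a course that has no link yet
        index += 1
        courses[index] = [title_node.get("title"), base_url + link_node.get("href")]
    return courses


print(collect_courses(SAMPLE))
```

Incrementing the index only after both lookups succeed keeps the dictionary keys consecutive.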

Getting the redirected URL and the cpi

import requests
from urllib import parse


def url_302(oldUrl: str):
    # On a 302 response, requests follows the Location header by default;
    # allow_redirects=False disables that so the header can be read directly.
    course_302_rsp = requests.get(url=oldUrl, headers=global_headers, allow_redirects=False)
    new_url = course_302_rsp.headers.get("Location")
    if new_url is None:
        new_url = oldUrl
    result = parse.urlparse(new_url)
    new_url_data = parse.parse_qs(result.query)
    try:
        cpi = new_url_data.get("cpi")[0]
    except TypeError:
        # .get() returned None because the query string has no cpi parameter
        print("fail to get cpi")
        cpi = None
    return {"new_url": new_url, "cpi": cpi}

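Notes:

This code defines a function named `url_302` that takes one string parameter, `oldUrl`, the original URL.

The function first sends a GET request to `oldUrl` with `requests.get()` and stores the response in `course_302_rsp`. Here `headers=global_headers` sets the request headers to the global variable `global_headers`, and `allow_redirects=False` disables automatic redirect following.

Next, it reads the Location field from the response headers and assigns it to `new_url`. If there is no Location header, `new_url` falls back to `oldUrl`.

It then parses `new_url` with `parse.urlparse()` and stores the result in `result`, and parses `result.query` with `parse.parse_qs()`, storing the outcome in `new_url_data`.

Finally, it tries to read the cpi value from `new_url_data` into `cpi`. If that fails, it prints an error message and sets `cpi` to None. The function returns a dictionary with two keys, `new_url` and `cpi`; the old URL is only included when it was used as the fallback for `new_url`.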
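
The query-string step can be checked in isolation with the standard library. This is a minimal sketch; `extract_cpi` and the `example.invalid` URLs are hypothetical and not taken from the program.

```python
from urllib import parse


def extract_cpi(url: str):
    """Return the first cpi value in the URL's query string, or None if it is absent."""
    query = parse.urlparse(url).query
    params = parse.parse_qs(query)  # maps each parameter name to a list of values
    values = params.get("cpi")
    return values[0] if values else None


print(extract_cpi("https://example.invalid/course?courseid=1&cpi=12345"))
print(extract_cpi("https://example.invalid/course?courseid=1"))
```

Because `parse_qs()` returns a list for every parameter, the value has to be taken with `[0]`, and a missing parameter has to be handled before indexing.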

Getting all course information

import requests
from lxml import etree


def course_get(url: str):
    # Browser-like request headers; cookieStr is the global built in step_1()
    course_headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
        'Connection': 'keep-alive',
        'Cookie': cookieStr,
        'Host': 'mooc1-2.chaoxing.com',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36 Edg/85.0.564.51'
    }
    course_rsp = requests.get(url=url, headers=course_headers)
    # Parse the page and return the HTML tree for later XPath queries
    course_HTML = etree.HTML(course_rsp.text)
    return course_HTML

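Notes:

This function is mostly a block of request headers: it sends a GET request with them and returns the parsed HTML tree. It is worth a glance, but nothing here needs further explanation.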

Reading chapters recursively

def recursive_course(course_unit_list, chapter_mission, level):
    # __list_get is a helper defined elsewhere in the program
    for course_unit in course_unit_list:
        h3_list = course_unit.xpath("./h3")
        for h3_item in h3_list:
            chapter_status = __list_get(h3_item.xpath("./a/span[@class='icon']/em/@class"))
            if chapter_status == "orange":
                # Chapters whose status icon has the class "orange" are queued in chapter_mission
                print("--" * level, __list_get(h3_item.xpath("./a/span[@class='articlename']/@title")), "      ", __list_get(h3_item.xpath("./a/span[@class='icon']/em/text()")))
                chapter_mission.append("*********{}".format(__list_get(h3_item.xpath("./a/@href"))))
            else:
                print("--" * level, __list_get(h3_item.xpath("./a/span[@class='articlename']/@title")), "      ", chapter_status)
        # Nested div elements hold sub-chapters; recurse one level deeper
        chapter_item_list = course_unit.xpath("./div")
        if chapter_item_list:
            recursive_course(chapter_item_list, chapter_mission, level + 1)
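
As a rough illustration of the recursion above, the following sketch walks a nested chapter structure depth-first, indents each title by its level, and collects the entries marked "orange". The dictionaries, their field names, and `collect_pending` are hypothetical stand-ins for the parsed HTML nodes, not the program's actual data.

```python
def collect_pending(units, pending, level=0, lines=None):
    """Depth-first walk over nested units; returns the indented outline lines."""
    if lines is None:
        lines = []
    for unit in units:
        lines.append("--" * level + " " + unit["title"])
        if unit.get("status") == "orange":
            pending.append(unit["href"])
        children = unit.get("children", [])
        if children:
            collect_pending(children, pending, level + 1, lines)
    return lines


# Made-up sample data, for illustration only
chapters = [
    {"title": "Chapter 1", "status": "done", "href": "/c1", "children": [
        {"title": "1.1 Intro", "status": "orange", "href": "/c1/1"},
        {"title": "1.2 Basics", "status": "done", "href": "/c1/2"},
    ]},
    {"title": "Chapter 2", "status": "orange", "href": "/c2"},
]

pending = []
outline = collect_pending(chapters, pending)
print("\n".join(outline))
print(pending)
```

Passing the `pending` list down through every call lets all recursion levels append to the same list, which is how `chapter_mission` is used in the original.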

Threads: storing the URLs in a queue

import threading
import time
from queue import Empty, Queue


def createQueue(urls):
    urlQueue = Queue()
    for url in urls:
        urlQueue.put(url)
    return urlQueue


class spiderThread(threading.Thread):
    def __init__(self, threadName, urlQueue, cpi):
        super(spiderThread, self).__init__()
        self.threadName = threadName
        self.urlQueue = urlQueue
        self.cpi = cpi

    def run(self):
        while True:
            # get_nowait() avoids blocking forever when another thread takes the
            # last item between an empty() check and a get() call
            try:
                chapter = self.urlQueue.get_nowait()
            except Empty:
                break
            deal_misson([chapter], self.cpi, 0)  # defined elsewhere in the program
            time.sleep(0.2)


def createThread(threadCount, urlQueue, cpi):
    threadQueue = []
    for i in range(threadCount):
        # Create several threads in a loop and pass the shared queue to each one
        spiderThreading = spiderThread("threading_{}".format(i), urlQueue=urlQueue, cpi=cpi)
        threadQueue.append(spiderThreading)  # collect the thread in the list
    return threadQueue

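Notes:

This code defines two functions and one class.

The first function, `createQueue(urls)`, stores a group of URLs in a queue for later processing. It takes a list of URLs, creates an empty queue object `urlQueue`, puts each URL into it, and returns the queue.

The second function, `createThread(threadCount, urlQueue, cpi)`, creates the threads that will process the URLs in the queue. It takes three parameters: `threadCount` is the number of threads to create, `urlQueue` is the URL queue object, and `cpi` is the cpi value of the current page. The function creates an empty list `threadQueue`, then creates the threads in a loop, passing the shared queue to each one. Every thread is an instance of the `spiderThread` class. Despite its name, `threadQueue` is a plain list rather than a queue or a real thread pool, and the function only creates the threads: the caller still has to call `start()` on each of them.

The class `spiderThread` inherits from `threading.Thread` and represents one crawler thread. It has a constructor and a `run()` method. The constructor initializes the thread's name, URL queue, and cpi value. The `run()` method holds the thread's main logic: it keeps taking URLs from the queue, calls `deal_misson()` on each one, and sleeps 0.2 seconds before the next iteration, exiting the loop once the queue is empty.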
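
The queue-plus-worker-threads pattern described above, including the start and join steps that the excerpt does not show, might look like the sketch below. `run_workers`, `record`, and the sample data are hypothetical, and the handler simply doubles each number in place of real processing.

```python
import threading
from queue import Empty, Queue


def run_workers(items, worker_count, handler):
    """Put items in a shared queue and process them with worker_count threads."""
    work = Queue()
    for item in items:
        work.put(item)

    def worker():
        while True:
            try:
                item = work.get_nowait()  # never blocks; raises Empty when drained
            except Empty:
                return
            handler(item)

    threads = [threading.Thread(target=worker, name="worker_{}".format(i))
               for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every worker has finished


results = []
results_lock = threading.Lock()


def record(item):
    # The lock makes the shared-list update explicit and safe
    with results_lock:
        results.append(item * 2)


run_workers(range(10), 4, record)
print(sorted(results))
```

Since the order in which threads pick up items is not deterministic, the results are sorted before printing.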

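That's all for this installment. Below is a video of a successful test run.

The video shows four accounts running at the same time, with each account doing its "studying" on multiple threads.

Link to the original video

Contact

If you have questions, contact QQ1219235650 (secondary account)

zhouwangxu@vip.qq.com

Source: https://blog.csdn.net/2301_77423777/article/details/133745687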
