# PythonSpider

**Repository Path**: shiya_liu/PythonSpider

## Basic Information

- **Project Name**: PythonSpider
- **Description**: Python web crawlers: covers sending requests, POST requests, multi-level pages, proxy IPs, multithreading, dynamically loaded content, data storage, Selenium, the Scrapy framework, distributed crawling, and machine vision, with explanatory documentation and source-code examples. If you find it useful, please star ⭐️
- **Primary Language**: Python
- **License**: GPL-2.0
- **Default Branch**: master
- **Homepage**: https://gitee.com/shiya_liu/PythonSpider
- **GVP Project**: No

## Statistics

- **Stars**: 6
- **Forks**: 1
- **Created**: 2023-05-08
- **Last Updated**: 2025-09-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Python, Spider, Documentation, Source Examples

## README

# PythonSpider

[![Python Version](https://img.shields.io/badge/python-3.8+-green)](https://www.python.org)
[![CI status](https://github.com/smicallef/spiderfoot/workflows/Tests/badge.svg)](https://github.com/LiuShiYa-github/PythonSpider/actions)

![img.png](Image/img.png)

## Disclaimer

* This repo records the code and notes from my time learning Python web crawling; the tutorial videos I followed came from the internet.
* The code and tutorials are **for learning and exchange only; do not use them for any commercial purpose!**

## Topics Covered
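👉 View the topics covered

**Chapter 1**

```text
01 Overview of web crawlers
02 How urllib.request works and how to use it
03 Using regular expressions (re)
```

**Chapter 2**

```text
01 Persistent data storage: CSV
02 Persistent data storage: MySQL
03 Persistent data storage: MongoDB
04 The requests module
05 Incremental crawlers implemented with MySQL and Redis
```

**Chapter 3**

```text
01 Crawlers: scraping images
02 XPath syntax
03 Parsing and extracting data with lxml + XPath
```

**Chapter 4**

```text
01 Advanced use of the requests module
02 Using proxy IPs
03 Scraping data with POST requests
```

**Chapter 5**

```text
01 Crawling dynamically loaded data
02 The JSON parsing module and full-site scraping
03 Multithreaded crawlers
04 Multithreaded crawling of multi-level pages
05 Simulated login with cookies
```

**Chapter 6**

```text
01 Selenium + PhantomJS, Chrome, Firefox
02 Common Selenium methods
03 Advanced Selenium operations
```

**Chapter 7**

```text
01 How the Scrapy framework works
02 The Scrapy configuration file explained
03 Middleware
04 Handling POST requests in Scrapy
05 The Scrapy image pipeline
06 The Scrapy file pipeline
```

**Chapter 8**

```text
01 Distributed crawling with Scrapy: principles
02 Distributed crawling with Scrapy: implementation
03 Machine vision and tesseract
04 Scraping mobile data
```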
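To illustrate the incremental-crawler topic from Chapter 2, here is a minimal sketch of the usual idea: fingerprint each URL and skip any whose fingerprint has already been recorded. This is not the repository's code. The `SeenUrls` class and `filter_new` function are hypothetical names, the MD5 fingerprint is an assumed choice, and a plain in-memory `set` stands in for the Redis set or MySQL table that a real crawler would use to persist fingerprints between runs.

```python
import hashlib


class SeenUrls:
    """Hypothetical in-memory stand-in for a persistent store of URL fingerprints."""

    def __init__(self):
        self._fingerprints = set()

    def add_if_new(self, url: str) -> bool:
        """Record the URL and return True if it was not seen before, else False."""
        # Assumption: an MD5 hex digest of the URL serves as the fingerprint.
        fingerprint = hashlib.md5(url.encode("utf-8")).hexdigest()
        if fingerprint in self._fingerprints:
            return False
        self._fingerprints.add(fingerprint)
        return True


def filter_new(urls, seen):
    """Return only the URLs that have not been crawled yet, preserving order."""
    return [url for url in urls if seen.add_if_new(url)]


if __name__ == "__main__":
    seen = SeenUrls()
    # First run: every URL is new, so all of them would be crawled.
    print(filter_new(["https://example.com/a", "https://example.com/b"], seen))
    # Second run: only the URL that did not appear in the first run remains.
    print(filter_new(["https://example.com/b", "https://example.com/c"], seen))
```

With Redis, the same check is commonly done with a single `SADD`, which reports whether the member was newly added, so the lookup and the insert happen in one step.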
## Examples
👉 View the examples

* [Scrape Tieba HTML](https://gitee.com/shiya_liu/PythonSpider/blob/master/01%E7%AC%AC%E4%B8%80%E7%AB%A0%EF%BC%9A%E7%88%AC%E8%99%AB%E6%A6%82%E8%BF%B0+urllib+re/TiebaSpider.py "Show on hover")
* [Maoyan classic movies: save to CSV one row at a time](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieCSVWriterow.py "Show on hover")
* [Maoyan classic movies: save to CSV several rows at a time](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieCSVWriterows.py "Show on hover")
* [Maoyan classic movies: store in MySQL](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieMysql.py "Show on hover")
* [Maoyan classic movies: store in MongoDB](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieMongoDB.py "Show on hover")
* [Autohome: incremental crawler based on Redis](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/CarHomeSpiderIncrementalRedis.py "Show on hover")
* [Autohome: incremental crawler based on MySQL](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/CarHomeSpiderMysqlIncre.py "Show on hover")
* [Image scraping: crawl wallhaven.cc](https://gitee.com/shiya_liu/PythonSpider/blob/master/03%E7%AC%AC%E4%B8%89%E7%AB%A0%EF%BC%9Alxml+xpath/SpiderWallhavenSelenimu.py "Show on hover")
* [Scrape Lianjia second-hand housing listings with XPath](https://gitee.com/shiya_liu/PythonSpider/blob/master/03%E7%AC%AC%E4%B8%89%E7%AB%A0%EF%BC%9Alxml+xpath/LianHomeSpider.py "Show on hover")
* [requests.post: scrape Youdao translation results](https://gitee.com/shiya_liu/PythonSpider/blob/master/04%E7%AC%AC%E5%9B%9B%E7%AB%A0%EF%BC%9Arequests%E7%9A%84%E9%AB%98%E7%BA%A7%E4%BD%BF%E7%94%A8/PostYoudaoTranslate.py "Show on hover")
* [requests.proxies: scrape free high-anonymity proxies from 飞度代理 and test their availability](https://gitee.com/shiya_liu/PythonSpider/blob/master/04%E7%AC%AC%E5%9B%9B%E7%AB%A0%EF%BC%9Arequests%E7%9A%84%E9%AB%98%E7%BA%A7%E4%BD%BF%E7%94%A8/ProxyIpPool.py "Show on hover")
* [Autohome data scraping: two-level pages](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/CarHomeSpider.py "Show on hover")
* [Scrape dynamically loaded JSON: Douban drama movie rankings](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/DoubanPlotSpider.py "Show on hover")
* [Scrape dynamically loaded JSON: all movies on Douban](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/DoubanPlotStorageJsonSpider.py "Show on hover")
* [Multithreaded scraping of dynamically loaded JSON: social apps on Huawei AppGallery](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/HuaweiAppMultithreading.py "Show on hover")
* [Multithreaded scraping of dynamically loaded JSON: Tencent recruitment](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/MultilevelPageMultithreading.py "Show on hover")
* [Selenium headless browser: fetch books about web crawlers from JD.com](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/JdSeleniumOptionsSpider.py "Show on hover")
* [Simulate a QQ Mail login with Selenium](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/SeleniumLoginQQmail.py "Show on hover")
* [Scrape the NetEase Cloud Music charts with Selenium](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/SeleniumWangyiyunMusic.py "Show on hover")
* [Scrape the latest administrative division codes with Selenium](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/mzbSelniumSpider.py "Show on hover")
* [Scrapy middleware: random User-Agent and proxy IP addresses, scraping Che168](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Che168-middlewares "Show on hover")
* [Scrapy multi-level page scraping: Che168](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Che168 "Show on hover")
* [Scrapy data persistence: scraping Guazi used cars](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Guazi "Show on hover")
* [Scrapy sending all queued URLs at once: scraping Guazi used cars](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Guazi2 "Show on hover")
* [Scrapy file handling: the complete Daomu Biji (盗墓笔记) series](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Daomu "Show on hover")
* [Scrapy POST scraping: KFC stores](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/KFC "Show on hover")
* [Scrapy scraping of pages three or more levels deep: PPT templates](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/PPT "Show on hover")
* [Scrapy image scraping: photos of girls from 360 Browser](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/SO "Show on hover")
* [Scrapy distributed crawler: Tencent recruitment](https://gitee.com/shiya_liu/PythonSpider/tree/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/Tencent "Show on hover")
* [Mobile data scraping: Youdao Translate](https://gitee.com/shiya_liu/PythonSpider/blob/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/Mobilephone_youdao.py "Show on hover")
* [Douban slider CAPTCHA](https://gitee.com/shiya_liu/PythonSpider/blob/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/DoubanLimitSlider.py "Show on hover")
* [Image recognition with pytesseract](https://gitee.com/shiya_liu/PythonSpider/blob/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/Pytesseract.py "Show on hover")
* [Scrape the bilibili dance section top 100](https://gitee.com/shiya_liu/PythonSpider/blob/master/09%E5%AE%9E%E6%88%98/BilibiliDanceTop100.py)
* [Lagou job listings](https://gitee.com/shiya_liu/PythonSpider/blob/master/09%E5%AE%9E%E6%88%98/lagou.py)
* [Analysis of internet industry job listings](https://gitee.com/shiya_liu/PythonSpider/blob/master/10Flask%E6%95%B0%E6%8D%AE%E5%8F%AF%E8%A7%86%E5%8C%96/LagouFlask.py)
* [Display of the Weibo hot search top 20](https://gitee.com/shiya_liu/PythonSpider/blob/master/10Flask%E6%95%B0%E6%8D%AE%E5%8F%AF%E8%A7%86%E5%8C%96/WeiboFlask.py)
* [Weibo hot search](https://gitee.com/shiya_liu/PythonSpider/blob/master/09%E5%AE%9E%E6%88%98/WebHotSearch.py)
* [Display of Maoyan movie genres](https://gitee.com/shiya_liu/PythonSpider/blob/master/10Flask%E6%95%B0%E6%8D%AE%E5%8F%AF%E8%A7%86%E5%8C%96/MaoyanFilmFlask.py)