# PythonSpider
**Repository Path**: shiya_liu/PythonSpider
## Basic Information
- **Project Name**: PythonSpider
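- **Description**: Python web scraping: covers sending requests, POST requests, multi-level pages, proxy IPs, multithreading, dynamically loaded content, data storage, Selenium, the Scrapy framework, distributed crawling, and machine vision, with documentation and source-code examples.
If you find it useful, please give it a star ⭐️~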
- **Primary Language**: Python
- **License**: GPL-2.0
- **Default Branch**: master
- **Homepage**: https://gitee.com/shiya_liu/PythonSpider
- **GVP Project**: No
## Statistics
- **Stars**: 6
- **Forks**: 1
- **Created**: 2023-05-08
- **Last Updated**: 2025-09-17
## Categories & Tags
**Categories**: Uncategorized
**Tags**: Python, Spider, Documentation, Source-code examples
## README
# PythonSpider

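## Disclaimer
* This repo records the code and notes I wrote while learning Python web scraping; the course videos were sourced from the internet.
* The code and tutorials are **for learning and exchange only; do not use them for any commercial purpose!**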
## Topics
**Chapter 1**
```text
01 Overview of web crawlers
02 How urllib.request works and how to use it
03 Using regular expressions (re)
```
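As a minimal sketch of the two Chapter 1 tools, the snippet below uses `urllib.request` to build (and optionally send) a request with a browser-like User-Agent, and `re` to pull `(href, title)` pairs out of HTML. The base URL, the `kw`/`pn` parameter names and the link pattern are hypothetical illustrations, not the values used in the repo's `TiebaSpider.py`.

```python
import re
import urllib.parse
import urllib.request

# Hypothetical browser-like User-Agent; some sites reject urllib's default one.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Non-greedy groups stop at the first closing quote; re.S lets "." span newlines.
LINK_PATTERN = re.compile(r'<a href="(.*?)" title="(.*?)"', re.S)


def build_request(base_url, params):
    """Percent-encode the query as UTF-8 (so Chinese keywords work) and attach headers."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(f"{base_url}?{query}", headers=HEADERS)


def fetch_html(request, timeout=10):
    """Send the request and decode the body as UTF-8 (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_links(html):
    """Return (href, title) pairs for every matching <a> tag."""
    return LINK_PATTERN.findall(html)
```

`build_request` and `parse_links` run offline; only `fetch_html` touches the network, which keeps the parsing logic easy to test against saved pages.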
**Chapter 2**
```text
01 Data persistence - CSV
02 Data persistence - MySQL
03 Data persistence - MongoDB
04 The requests module
05 Incremental crawling with MySQL and Redis
```
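Incremental crawling boils down to fingerprinting each URL and skipping the ones already stored. A minimal sketch, assuming an in-memory Python set as a stand-in for the Redis set or MySQL table the chapter uses; it also shows `csv.writer(...).writerows`, which writes several rows in one call (`writerow` writes one). The class and function names are hypothetical.

```python
import csv
import hashlib
import io


def fingerprint(url):
    """MD5 hex digest of the URL: a fixed-length key that is cheap to store."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


class SeenFilter:
    """Remembers fingerprints of URLs that were already crawled.

    Assumption: a Python set stands in for a Redis set or a MySQL table.
    Like redis-py's sadd(), add() returns 1 for a new member and 0 for an
    existing one. Unlike Redis or MySQL, this set is lost when the process exits.
    """

    def __init__(self):
        self._seen = set()

    def add(self, url):
        fp = fingerprint(url)
        if fp in self._seen:
            return 0
        self._seen.add(fp)
        return 1


def save_new_rows(rows, seen, out):
    """Write only rows whose first column (the URL) is new; return the count."""
    new_rows = [row for row in rows if seen.add(row[0]) == 1]
    csv.writer(out).writerows(new_rows)  # one call writes every new row
    return len(new_rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    seen = SeenFilter()
    save_new_rows([("https://example.com/a", "A")], seen, buffer)
    save_new_rows([("https://example.com/a", "A")], seen, buffer)  # skipped
    print(buffer.getvalue(), end="")
```

With Redis, the set would be replaced by a call such as `redis.Redis().sadd("seen_urls", fp)`, whose return value (1 for new, 0 for existing) plays the same role; the key name `seen_urls` is an assumption.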
**Chapter 3**
```text
01 Scraping images
02 XPath syntax
03 Extracting data with lxml + XPath
```
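The chapter uses lxml for XPath. To keep the sketch dependency-free, it swaps in the standard library's `xml.etree.ElementTree`, whose `findall` supports a subset of XPath (including `[@attr='value']` predicates) but only parses well-formed markup. The listing markup and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed listing markup; real pages are usually messy HTML,
# which is why a forgiving parser such as lxml is preferred in practice.
SAMPLE = """
<ul>
  <li class="item"><a href="/house/1">Two-bedroom flat</a><span class="price">300</span></li>
  <li class="item"><a href="/house/2">Studio</a><span class="price">150</span></li>
  <li class="ad"><a href="/ad">Sponsored</a></li>
</ul>
"""


def parse_listings(markup):
    """Return title, link and price for every <li class="item">, skipping ads."""
    root = ET.fromstring(markup.strip())
    results = []
    for li in root.findall(".//li[@class='item']"):
        link = li.find("a")
        price = li.find("span[@class='price']")
        results.append({
            "title": link.text,
            "href": link.get("href"),
            "price": int(price.text),
        })
    return results
```

With lxml the equivalent link query would be `lxml.etree.HTML(html).xpath("//li[@class='item']/a/@href")`, which tolerates malformed HTML and supports full XPath 1.0.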
**Chapter 4**
```text
01 Advanced use of the requests module
02 Using proxy IPs
03 Scraping data with POST requests
```
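The chapter's examples use `requests.post(url, data=form)` and the `proxies=` argument. A stdlib sketch of the same two ideas with `urllib`: passing `data=` turns a request into a POST, and a `ProxyHandler` routes traffic through a proxy, which is also how a proxy pool can be checked for dead entries. The form field names and the `http://httpbin.org/ip` test URL are assumptions.

```python
import http.client
import urllib.parse
import urllib.request


def build_post(url, form):
    """Form-encode the fields; supplying data= makes urllib send a POST."""
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data, headers={"User-Agent": "Mozilla/5.0"})


def proxy_opener(proxy):
    """Route both http and https traffic through one proxy (cf. requests' proxies=)."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def is_proxy_alive(proxy, test_url="http://httpbin.org/ip", timeout=5):
    """Return True if a request through the proxy succeeds within the timeout.

    test_url is an assumption; any stable page works.
    """
    try:
        with proxy_opener(proxy).open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False  # refused, timed out, or answered with an HTTP error
```

Filtering a scraped proxy list is then `[p for p in candidates if is_proxy_alive(p)]`; a short timeout keeps slow free proxies from stalling the check.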
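**Chapter 5**
```text
01 Scraping dynamically loaded data
02 The json module and site-wide crawling
03 Multithreaded crawlers
04 Multithreaded crawling of multi-level pages
05 Simulated login with cookies
```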
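The multithreaded crawlers in this chapter combine a queue of page numbers, worker threads, and JSON parsing of the dynamically loaded API responses. A minimal sketch of that pattern, where `fake_fetch` is a hypothetical stand-in for the real HTTP call so the example runs offline:

```python
import json
import queue
import threading


def fake_fetch(page):
    """Hypothetical stand-in for an HTTP call returning a JSON body,
    e.g. requests.get(api_url, params={"page": page}).text in a real crawler."""
    return json.dumps({"page": page, "items": [f"item-{page}-{i}" for i in range(2)]})


def crawl(pages, fetch=fake_fetch, num_threads=4):
    """Fetch every page concurrently and collect the parsed items."""
    tasks = queue.Queue()  # thread-safe FIFO of page numbers
    for page in pages:
        tasks.put(page)

    results = []
    lock = threading.Lock()  # guards the shared results list

    def worker():
        while True:
            try:
                page = tasks.get_nowait()
            except queue.Empty:
                return  # no work left; let the thread finish
            data = json.loads(fetch(page))  # parse the dynamically loaded JSON
            with lock:
                results.extend(data["items"])

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)  # threads finish in any order
```

Threads suit this workload because the time is spent waiting on the network, during which the GIL is released.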
**Chapter 6**
```text
01 Selenium + PhantomJS, Chrome, Firefox
02 Common Selenium methods
03 Advanced Selenium operations
```
**Chapter 7**
```text
01 How the Scrapy framework works
02 Scrapy settings file explained
03 Middleware
04 Handling POST requests in Scrapy
05 Scrapy image pipeline
06 Scrapy file pipeline
```
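A Scrapy downloader middleware can rotate User-Agents and proxies by implementing the `process_request(request, spider)` hook; Scrapy's built-in `HttpProxyMiddleware` reads the proxy from `request.meta["proxy"]`. A hedged sketch: the class name and both pools are hypothetical, and the class does not import Scrapy, so it can be exercised with a fake request object.

```python
import random

# Hypothetical pools; a real project might load these from settings or a proxy-pool service.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]
PROXIES = ["http://127.0.0.1:8888", "http://127.0.0.1:8889"]


class RandomUserAgentProxyMiddleware:
    """Downloader middleware that picks a random User-Agent and proxy per request."""

    def __init__(self, user_agents=USER_AGENTS, proxies=PROXIES, rng=None):
        self.user_agents = list(user_agents)
        self.proxies = list(proxies)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # HttpProxyMiddleware picks the proxy up from request.meta["proxy"].
        request.meta["proxy"] = self.rng.choice(self.proxies)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

To enable it, a project would list it in `settings.py`, e.g. `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomUserAgentProxyMiddleware": 543}`, where the module path is hypothetical.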
**Chapter 8**
```text
01 How distributed Scrapy crawling works
02 Implementing distributed Scrapy crawling
03 Machine vision and tesseract
04 Scraping mobile app data
```
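This README does not name the library behind the distributed example. A common choice is the third-party scrapy-redis package; as a hedged sketch under that assumption, a project's `settings.py` points Scrapy's scheduler and duplicate filter at Redis so every machine pulls requests from one shared queue and shares one set of fingerprints. The Redis address is hypothetical.

```python
# settings.py (sketch, assuming scrapy-redis is installed)

# Scheduler that keeps the request queue in Redis instead of in process memory.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Fingerprint-based duplicate filter shared by every worker through Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# Keep the queue and fingerprints after the crawl stops, so it can be resumed.
SCHEDULER_PERSIST = True

# Optional: also push scraped items into Redis for later processing.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}

# Hypothetical address of the shared Redis server.
REDIS_HOST = "127.0.0.1"
REDIS_PORT = 6379
```

Each machine then runs the same spider; because the queue and duplicate filter live in Redis, a URL scheduled by one worker is not crawled again by another.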
## Examples
* [Scrape Tieba HTML pages](https://gitee.com/shiya_liu/PythonSpider/blob/master/01%E7%AC%AC%E4%B8%80%E7%AB%A0%EF%BC%9A%E7%88%AC%E8%99%AB%E6%A6%82%E8%BF%B0+urllib+re/TiebaSpider.py)
* [Maoyan classic movies - save to CSV - one row at a time](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieCSVWriterow.py)
* [Maoyan classic movies - save to CSV - multiple rows at once](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieCSVWriterows.py)
* [Maoyan classic movies - store in MySQL](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieMysql.py)
* [Maoyan classic movies - store in MongoDB](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/MaoyanClassicMovieMongoDB.py)
* [Autohome incremental crawler based on Redis](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/CarHomeSpiderIncrementalRedis.py)
* [Autohome incremental crawler based on MySQL](https://gitee.com/shiya_liu/PythonSpider/blob/master/02%E7%AC%AC%E4%BA%8C%E7%AB%A0%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%8C%81%E4%B9%85%E5%8C%96+requests/CarHomeSpiderMysqlIncre.py)
* [Image scraping - crawl wallhaven.cc](https://gitee.com/shiya_liu/PythonSpider/blob/master/03%E7%AC%AC%E4%B8%89%E7%AB%A0%EF%BC%9Alxml+xpath/SpiderWallhavenSelenimu.py)
* [Scrape Lianjia second-hand housing listings with XPath](https://gitee.com/shiya_liu/PythonSpider/blob/master/03%E7%AC%AC%E4%B8%89%E7%AB%A0%EF%BC%9Alxml+xpath/LianHomeSpider.py)
* [Fetch Youdao Translate results with requests.post](https://gitee.com/shiya_liu/PythonSpider/blob/master/04%E7%AC%AC%E5%9B%9B%E7%AB%A0%EF%BC%9Arequests%E7%9A%84%E9%AB%98%E7%BA%A7%E4%BD%BF%E7%94%A8/PostYoudaoTranslate.py)
* [Scrape free high-anonymity proxies from Feidu Proxy with requests proxies and test their availability](https://gitee.com/shiya_liu/PythonSpider/blob/master/04%E7%AC%AC%E5%9B%9B%E7%AB%A0%EF%BC%9Arequests%E7%9A%84%E9%AB%98%E7%BA%A7%E4%BD%BF%E7%94%A8/ProxyIpPool.py)
* [Autohome data scraping - two-level pages](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/CarHomeSpider.py)
* [Scrape dynamically loaded JSON - Douban drama movie rankings](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/DoubanPlotSpider.py)
* [Scrape dynamically loaded JSON - all movies on Douban](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/DoubanPlotStorageJsonSpider.py)
* [Multithreaded scraping of dynamically loaded JSON - social apps on Huawei AppGallery](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/HuaweiAppMultithreading.py)
* [Multithreaded scraping of dynamically loaded JSON - Tencent job postings](https://gitee.com/shiya_liu/PythonSpider/blob/master/05%E7%AC%AC%E4%BA%94%E7%AB%A0%EF%BC%9A%E5%A4%9A%E7%BA%A7%E9%A1%B5%E9%9D%A2+%E5%A4%9A%E7%BA%BF%E7%A8%8B+Cookie%E7%99%BB%E5%BD%95/MultilevelPageMultithreading.py)
* [Fetch web-crawler books from JD.com with Selenium in headless mode](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/JdSeleniumOptionsSpider.py)
* [Simulate logging in to QQ Mail with Selenium](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/SeleniumLoginQQmail.py)
* [Scrape the NetEase Cloud Music charts with Selenium](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/SeleniumWangyiyunMusic.py)
* [Scrape the latest administrative division codes with Selenium](https://gitee.com/shiya_liu/PythonSpider/blob/master/06%E7%AC%AC%E5%85%AD%E7%AB%A0%EF%BC%9ASelenium+PhantomJS+Chrome+Firefox/mzbSelniumSpider.py)
* [Scrapy middleware - random User-Agent and proxy IPs - scrape Che168](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Che168-middlewares)
* [Scrapy multi-level page scraping - Che168](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Che168)
* [Scrapy data persistence - scrape Guazi used cars](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Guazi)
* [Scrapy: enqueue all URLs at once - scrape Guazi used cars](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Guazi2)
* [Scrapy file handling - the complete Daomu Biji series](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/Daomu)
* [Scrapy POST scraping - KFC store locations](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/KFC)
* [Scrapy scraping across three or more page levels - PPT templates](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/PPT)
* [Scrapy image scraping - girl photos from 360 Browser](https://gitee.com/shiya_liu/PythonSpider/tree/master/07%E7%AC%AC%E4%B8%83%E7%AB%A0%EF%BC%9AScrapy%E6%A1%86%E6%9E%B6+%E4%B8%AD%E9%97%B4%E4%BB%B6/SO)
* [Scrapy distributed crawler - Tencent job postings](https://gitee.com/shiya_liu/PythonSpider/tree/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/Tencent)
* [Mobile data scraping - Youdao Translate](https://gitee.com/shiya_liu/PythonSpider/blob/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/Mobilephone_youdao.py)
* [Douban slider CAPTCHA](https://gitee.com/shiya_liu/PythonSpider/blob/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/DoubanLimitSlider.py)
* [Recognize text in images with pytesseract](https://gitee.com/shiya_liu/PythonSpider/blob/master/08%E7%AC%AC%E5%85%AB%E7%AB%A0%EF%BC%9A%E5%88%86%E5%B8%83%E5%BC%8F+%E6%BB%91%E5%9D%97+%E7%A7%BB%E5%8A%A8%E7%AB%AF/Pytesseract.py)
* [Scrape the bilibili dance section top 100](https://gitee.com/shiya_liu/PythonSpider/blob/master/09%E5%AE%9E%E6%88%98/BilibiliDanceTop100.py)
* [Lagou job listings](https://gitee.com/shiya_liu/PythonSpider/blob/master/09%E5%AE%9E%E6%88%98/lagou.py)
* [Analysis of internet-industry job postings](https://gitee.com/shiya_liu/PythonSpider/blob/master/10Flask%E6%95%B0%E6%8D%AE%E5%8F%AF%E8%A7%86%E5%8C%96/LagouFlask.py)
* [Weibo trending top 20 display](https://gitee.com/shiya_liu/PythonSpider/blob/master/10Flask%E6%95%B0%E6%8D%AE%E5%8F%AF%E8%A7%86%E5%8C%96/WeiboFlask.py)
* [Weibo trending searches](https://gitee.com/shiya_liu/PythonSpider/blob/master/09%E5%AE%9E%E6%88%98/WebHotSearch.py)
* [Maoyan movie genre display](https://gitee.com/shiya_liu/PythonSpider/blob/master/10Flask%E6%95%B0%E6%8D%AE%E5%8F%AF%E8%A7%86%E5%8C%96/MaoyanFilmFlask.py)