How to Scrape Dynamic Data in One Step with Python

Published: 2021-10-09 16:15:55 | Source: 億速云 | Views: 183 | Author: 柒染 | Category: Big Data

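Today let's talk about how to scrape dynamic data in one step with Python. Many people may not be familiar with the topic, so the editor has summarized the following for you; hopefully you will get something out of this article.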

Preface

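It's thesis-writing season again, and many followers have messaged me asking whether I could scrape some data for their papers. Around the same time, a young woman was planning to buy a kitten, so I searched the web, found the site below, and rolled up my sleeves to take on the task. If you enjoy web scraping or need some data scraped, feel free to message me.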

Page Analysis

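We start by visiting this address: http://www.maomijiaoyi.com/index.php?/chanpinliebiao_pinzhong_38.html
The page shows a list of cats. Inspecting the request with F12 (the browser's developer tools), however, shows that the server returns an HTML page rather than the JSON we usually work with. To get a cat's details, such as price, phone number, age, breed, and view count, we also have to open the detail page linked from the list.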
What we need to do at this point:

Parse the returned list page
Extract the region data
Request each cat's detail page
Parse the returned detail page
Save the data to a CSV file
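
The five steps above can be sketched as a small pipeline. This is only a minimal illustration, not the article's actual code: `crawl`, `fetch`, `parse_list`, and `parse_detail` are hypothetical names, and the fetching and parsing functions are passed in so that the flow can be followed (and tried out) without touching the real site.

```python
import csv
import io


def crawl(list_url, fetch, parse_list, parse_detail, out):
    """Run the five steps once for a single list page.

    fetch(url) -> HTML text
    parse_list(html) -> iterable of (detail_url, region) pairs
    parse_detail(html) -> dict of fields
    All three are hypothetical callables supplied by the caller.
    """
    rows = []
    list_html = fetch(list_url)                        # download the list page
    for detail_url, region in parse_list(list_html):   # steps 1-2: links and regions
        detail_html = fetch(detail_url)                # step 3: request the detail page
        fields = parse_detail(detail_html)             # step 4: parse it
        rows.append({"region": region, **fields, "url": detail_url})
    if rows:                                           # step 5: save as CSV
        writer = csv.DictWriter(out, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Usage with stand-in data instead of real HTTP requests
fake_pages = {"list": "<list page>", "/cat/1": "<detail page>"}
buffer = io.StringIO()
result = crawl(
    "list",
    fetch=fake_pages.__getitem__,
    parse_list=lambda html: [("/cat/1", "Beijing")],
    parse_detail=lambda html: {"price": "1000"},
    out=buffer,
)
```

Passing the fetch and parse functions in as arguments keeps the control flow separate from the site-specific selectors, which are the part most likely to change.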

CSV File

Running the program saves the scraped records to a CSV file (screenshot omitted).

Code Implementation

1. Import the dependencies

`import requests  # sends HTTP requests; pip install requests`
`import parsel  # HTML page parser; pip install parsel`
`import csv  # writes the CSV file (standard library)`

2. Get the list of cats

`# i is the page number; it must be defined before this line (for example by a loop over pages)`
`url = "http://www.maomijiaoyi.com/index.php?/chanpinliebiao_pinzhong_37_" + str(i) + "--24.html"`
`headers = {`
`    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36'`
`}`
`data = requests.get(url=url, headers=headers).text  # HTML of the list page`
`selector = parsel.Selector(data)`
`urls = selector.css('div .content:nth-child(1) a::attr(href)').getall()  # links to the detail pages`

3. Fetch each cat's details from the detail-page links

`# regionAndURL is assumed to hold (detail-page path, region) pairs collected from the list page`
`for s in regionAndURL:`
`    url = "http://www.maomijiaoyi.com" + s[0]`
`    address = s[1]`
`    data = requests.get(url=url, headers=headers).text`
`    selector = parsel.Selector(data)`
`    title = selector.css('.detail_text .title::text').get().strip()  # listing title`
`    price = selector.css('.info1 span:nth-child(2)::text').get().strip()  # price`
`    viewsNum = selector.css('.info1 span:nth-child(4)::text').get()  # view count`
`    commitment = selector.css('.info1 div:nth-child(2) span::text').get().replace("賣家承諾: ", "")  # seller's commitment`
`    onlineOnly = selector.css('.info2 div:nth-child(1) .red::text').get()  # number currently on sale`
`    variety = selector.css('.info2 div:nth-child(3) .red::text').get()  # breed`
`    prevention = selector.css('.info2 div:nth-child(4) .red::text').get()  # preventive care`
`    contactPerson = selector.css('.user_info div:nth-child(1) .c333::text').get()  # contact name`
`    phone = selector.css('.user_info div:nth-child(2) .c333::text').get()  # phone number`
`    shipping = selector.css('.user_info div:nth-child(3) .c333::text').get().strip()  # shipping cost`
`    purebred = selector.css('.item_neirong div:nth-child(1) .c333::text').get().strip()  # purebred or not`
`    quantityForSale = selector.css('.item_neirong div:nth-child(3) .c333::text').get().strip()  # quantity for sale`
`    catSex = selector.css('.item_neirong div:nth-child(4) .c333::text').get().strip()  # sex`
`    catAge = selector.css('div.xinxi_neirong .item:nth-child(2) div:nth-child(2) .c333::text').get().strip()  # age`
`    dewormingSituation = selector.css(`
`        'div.xinxi_neirong .item:nth-child(2) div:nth-child(3) .c333::text').get().strip()  # deworming status`
`    canWatchCatsInVideo = selector.css(`
`        'div.xinxi_neirong .item:nth-child(2) div:nth-child(4) .c333::text').get().strip()  # whether the cat can be viewed by video`

4. Save the data to a CSV file

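`f = open('喵咪.csv', mode='a', encoding='utf-8', newline='')`
`csvHeader = csv.DictWriter(f,`
`    fieldnames=['地區', '標簽', '價格', '瀏覽次數', '賣家承諾', '在售只數', '品種', '預防', '聯系人姓名', '電話',`
`                '運費', '是否純種', '待售數量', '貓咪性別', '貓咪年齡', '驅蟲情況', '可視頻看貓咪', '詳情地址'])`
`# write the header row`
`csvHeader.writeheader()`
`dis = {`
`    '地區': address,  # region`
`    '標簽': title,  # listing title`
`    '價格': price,  # price`
`    '瀏覽次數': viewsNum,  # view count`
`    '賣家承諾': commitment,  # seller's commitment`
`    '在售只數': onlineOnly,  # number currently on sale`
`    '品種': variety,  # breed`
`    '預防': prevention,  # preventive care`
`    '聯系人姓名': contactPerson,  # contact name`
`    '電話': phone,  # phone number`
`    '運費': shipping,  # shipping cost`
`    '是否純種': purebred,  # purebred or not`
`    '待售數量': quantityForSale,  # quantity for sale`
`    '貓咪性別': catSex,  # sex`
`    '貓咪年齡': catAge,  # age`
`    '驅蟲情況': dewormingSituation,  # deworming status`
`    '可視頻看貓咪': canWatchCatsInVideo,  # whether the cat can be viewed by video`
`    '詳情地址': url  # detail-page URL`
`}`
`csvHeader.writerow(dis)  # write one record`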
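
Because the file is opened in append mode, calling `writeheader()` on every run adds another header row to the existing file. A minimal sketch of one way around this, writing the header only when the file is new or empty; `append_row` is a hypothetical helper, not part of the article's code.

```python
import csv
import os


def append_row(path, fieldnames, row):
    """Append one record to a CSV file, writing the header only for a new or empty file."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode="a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
```

In the loop from step 3 this could be called as `append_row('喵咪.csv', fieldnames, dis)`, where `fieldnames` is the same list passed to `csv.DictWriter` above. The `with` block also closes the file after each write, which the original snippet never does.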

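After reading the above, do you have a better understanding of how to scrape dynamic data in one step with Python? If you would like to learn more, follow the 億速云 industry news channel. Thank you for your support.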
