Scraping Douban's Top 250 Highest-Rated Movies with Python

Published: 2020-07-23 21:59:21 Source: Internet Views: 369 Author: 莫渺1996 Category: Programming Languages
import requests
import pymysql
import time
import re
import xlwt
from lxml import etree

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
          'Cookie': 'gr_user_id = c6f58a39 - ea25 - 4f58 - b448 - 545070192c4e;59a81cc7d8c04307ba183d331c373ef6_gr_session_id = e8e4b66f - 440a - 4ae7 - a76a - fe2dd2b34a26;59a81cc7d8c04307ba183d331c373ef6_gr_last_sent_sid_with_cs1 = e8e4b66f - 440a - 4ae7 - a76a - fe2dd2b34a26;59a81cc7d8c04307ba183d331c373ef6_gr_last_sent_cs1 = N % 2FA;59a81cc7d8c04307ba183d331c373ef6_gr_session_id_e8e4b66f - 440a - 4ae7 - a76a - fe2dd2b34a26 = true;grwng_uid = 9ec14ad9 - 5ac0 - 4bb1 - 81c1 - bc60d2685710;abtest_ABTest4SearchDate = b;xzuuid = 79426b52;_uab_collina = 154660443606130958890473;TY_SESSION_ID = 907f32df - c060 - 49ca - b945 - 98215cc03475;rule_math = pvzq3r06hi'}

conn = pymysql.connect(host= 'localhost',user= 'root',passwd='momiao5201314',db='doubanmovie',port=3306,charset='utf8')
cursor = conn.cursor()  # create a cursor object

'''
# Create a workbook and set its encoding
workbook = xlwt.Workbook(encoding = 'utf-8')
# Create a worksheet
worksheet = workbook.add_sheet('My Worksheet')
# Define the header row
header = ['movie_name','director','actors','style','country','release_time','time','score']
for h in range(len(header)):
    worksheet.write(0,h,header[h])  # write() is a worksheet method, not a workbook method
'''

def get_movie_url(url):
    html = requests.get(url,headers=headers)
    selector = etree.HTML(html.text)
    movie_urls = selector.xpath('//div[@class="hd"]/a/@href')
    for movie_url in movie_urls:
        #print(movie_url)
        get_movie_info(movie_url)

def get_movie_info(url):
    html = requests.get(url,headers=headers)
    selector = etree.HTML(html.text)
    try:
        movie_name = selector.xpath('//*[@id="content"]/h2/span[1]/text()')  # 1. movie title
        #print(movie_name)
        director = selector.xpath('//*[@id="info"]/span[1]/span[2]/a/text()')  # 2. director
        #print(director)
        actors = selector.xpath('//*[@id="info"]/span[3]/span[2]')[0]  # first matching node; raises IndexError if the cast block is missing (the author marked this XPath as an open question)
        actor = actors.xpath('string(.)')  # 3. cast: all text under the node as one string
        #print(actor)
        # Join every genre with '/'. Indexing [0] and [1] raised IndexError, and so skipped the movie, when only one genre was listed.
        style = '/'.join(re.findall('<span property="v:genre">(.*?)</span>',html.text,re.S))  # 4. genres
        #print(style)
        # Douban pages are served in Simplified Chinese, so the label literals below must match the page text exactly.
        country = re.findall('<span class="pl">制片国家/地区:</span>(.*?)<br/>',html.text,re.S)  # 5. country/region of production
        #print(country)
        release_time = re.findall('上映日期:</span>.*?>(.*?)</span>',html.text,re.S)  # 6. release date
        #print(release_time)
        runtime = re.findall('<span class="pl">片长:</span>.*?>(.*?)</span>',html.text,re.S)  # 7. running time (renamed so it does not shadow the time module)
        #print(runtime)
        score = selector.xpath('//*[@id="interest_sectl"]/div[1]/div[2]/strong/text()')  # 8. rating
        #print(score)
        print(str(movie_name))
        # Earlier attempt, kept for reference: it had a stray comma after "score" in the column list.
        #sql = 'insert into doubanmovie(name,director,actor,style,country,release_time,time,score,) values("{}","{}","{}","{}","{}","{}","{}","{}")'.format(movie_name,director,actor,style,country,release_time,runtime,score)
        # Parameterized query: the driver handles quoting and escaping of the values.
        cursor.execute("insert into doubanmovie(name,director,actor,style,country,release_time,time,score) values(%s,%s,%s,%s,%s,%s,%s,%s)",(str(movie_name),str(director),str(actor),str(style),str(country),str(release_time),str(runtime),str(score)))
    except IndexError:
        pass

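if __name__ == '__main__':
    # 10 list pages of 25 movies each: start=0,25,...,225
    urls = ['https://movie.douban.com/top250?start={}&filter='.format(num) for num in range(0,250,25)]
    for url in urls:
        get_movie_url(url)
        time.sleep(2)  # pause between list pages to limit the request rate
    conn.commit()  # persist all inserted rows
    cursor.close()
    conn.close()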
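
The script above passes str() of each xpath/findall result to the database, so a column can end up holding the text of a Python list (for example "['...']") rather than the value itself. The sketch below is one possible way to normalize matches into plain strings before inserting. It runs offline against a made-up HTML fragment. SAMPLE_HTML, join_matches and page_urls are hypothetical names introduced only for this illustration, and the fragment merely imitates the markup the regexes above target.

```python
import re

# Hypothetical sample markup (an assumption, not fetched from Douban),
# shaped like the fields the regexes in the script look for.
SAMPLE_HTML = (
    '<span property="v:genre">剧情</span> / '
    '<span property="v:genre">犯罪</span><br/>'
    '<span class="pl">制片国家/地区:</span> 美国<br/>'
)


def join_matches(pattern, text, sep="/"):
    """Return every regex match, stripped and joined by sep ('' if none)."""
    return sep.join(m.strip() for m in re.findall(pattern, text, re.S))


def page_urls(total=250, step=25):
    """Build the paginated list URLs used in the script's main block."""
    template = 'https://movie.douban.com/top250?start={}&filter='
    return [template.format(n) for n in range(0, total, step)]


genres = join_matches(r'<span property="v:genre">(.*?)</span>', SAMPLE_HTML)
country = join_matches(r'<span class="pl">制片国家/地区:</span>(.*?)<br/>', SAMPLE_HTML)

print(genres)
print(country)
print(len(page_urls()))
```

Values produced this way can be passed to cursor.execute() directly, without wrapping them in str(). An empty string is returned when nothing matches, so a missing field no longer raises IndexError.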
