To scrape text from a web page with Python and save it to a file, follow these steps:

Import the requests library, which sends HTTP requests to fetch the page, and the BeautifulSoup library, which parses the page content.

import requests
from bs4 import BeautifulSoup
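
Use the get method of the requests library to send a GET request, then read the page content from the text attribute of the response.

url = 'URL of the page to scrape'
response = requests.get(url)
html = response.text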
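
The request above has no timeout and does not check the HTTP status, so a slow server can hang the script and an error page can be saved as if it were the real content. A minimal sketch of a more defensive fetch follows. The helper name fetch_html and the 10-second default are assumptions for illustration, and the encoding fallback is a heuristic rather than a guaranteed fix for garbled text.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the decoded body of `url`, raising on network or HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of saving an error page.
    response.raise_for_status()
    # When a text response declares no charset, requests assumes ISO-8859-1,
    # which often garbles non-Latin pages. Fall back to the encoding that
    # requests detects from the body itself (heuristic, assumed acceptable here).
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

Calling fetch_html(url) can then stand in for the requests.get(url) and response.text pair.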

Use the BeautifulSoup library to parse the page content and extract the text you need.

soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()
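
Called with no arguments, get_text() also returns the contents of script and style elements and keeps the page's original whitespace. If only the readable text is wanted, one possible approach is sketched below. The helper name extract_text and the sample HTML are assumptions for illustration, not part of the steps above.

```python
from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements whose contents are code or styling rather than readable text.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    # separator="\n" joins text fragments with newlines; strip=True trims
    # each fragment and skips fragments that are only whitespace.
    return soup.get_text(separator="\n", strip=True)


sample = (
    "<html><head><style>p{}</style></head>"
    "<body><h1> Title </h1><script>var x=1;</script>"
    "<p>Hello <b>world</b></p></body></html>"
)
print(extract_text(sample))
```

Because get_text splits on every tag boundary, inline elements such as the bold word in the sample end up on their own line; pass a space as the separator if that is not wanted.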

Use the open function to open a file, then write the content with the write method.

with open('path of the output file', 'w', encoding='utf-8') as file:
    file.write(text)
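
open() raises FileNotFoundError when the target directory does not exist yet. A small sketch using pathlib that creates missing parent directories first is shown here. The helper name save_text is hypothetical.

```python
from pathlib import Path


def save_text(text: str, path: str) -> Path:
    """Write `text` to `path` as UTF-8, creating parent directories if needed."""
    target = Path(path)
    # exist_ok=True makes this a no-op when the directory is already there.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Like the 'w' mode used above, write_text overwrites any existing file at that path.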
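
Complete code example:

import requests
from bs4 import BeautifulSoup

# Fetch the page
url = 'URL of the page to scrape'
response = requests.get(url)
html = response.text

# Parse the HTML and extract its text
soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()

# Save the text to a file
with open('path of the output file', 'w', encoding='utf-8') as file:
    file.write(text)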
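
In the code, replace the placeholder string assigned to url with the URL of the page you want to scrape, and replace the placeholder path passed to open with the path where you want the file saved.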
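
Instead of editing string literals each time, the same steps can be wrapped in a function that takes the URL and the output path as parameters. This is only a sketch: the name scrape_to_file, the 10-second timeout, and the status check are assumptions added for illustration, and the URL and file name in the usage comment are placeholders.

```python
import requests
from bs4 import BeautifulSoup


def scrape_to_file(url: str, output_path: str) -> int:
    """Download `url`, extract its text, save it, and return the character count."""
    response = requests.get(url, timeout=10)
    # Stop early on 4xx/5xx responses rather than saving an error page.
    response.raise_for_status()
    text = BeautifulSoup(response.text, "html.parser").get_text()
    with open(output_path, "w", encoding="utf-8") as file:
        file.write(text)
    return len(text)


# Example call (placeholder values, not executed here):
# scrape_to_file("https://example.com", "page.txt")
```

Returning the character count gives the caller a quick way to notice a page that produced no text.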