When scraping data with Python, there are several common ways to store the results:
Save to a file:

Use the csv module to write the data to a CSV file.

    import csv

    data = [['Name', 'Age'], ['Alice', 25], ['Bob', 30]]
    # newline='' avoids blank lines between rows on Windows
    with open('output.csv', 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerows(data)
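Scraped records often arrive as dictionaries rather than lists. As a minimal sketch, assuming every record is a dict with the same keys, csv.DictWriter can write the header row from the field names. The `records` data, the `write_records` helper, and the `scraped.csv` filename are illustrative and not part of the original example.

```python
import csv

# Hypothetical scraped records; each dict becomes one CSV row.
records = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
]

def write_records(path, rows, fieldnames):
    """Write a list of dicts to a CSV file, starting with a header row."""
    with open(path, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

write_records('scraped.csv', records, fieldnames=['Name', 'Age'])
```

Passing `fieldnames` explicitly fixes the column order, which keeps the output stable if the scraper later builds its dicts in a different key order.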
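Use the json module to write the data to a JSON file.

    import json

    data = {'Name': 'Alice', 'Age': 25}
    with open('output.json', 'w', encoding='utf-8') as file:
        # ensure_ascii=False keeps non-ASCII characters readable in the file
        json.dump(data, file, ensure_ascii=False, indent=4)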
Write the data directly to a plain-text file.

    data = 'Alice,25\nBob,30'
    with open('output.txt', 'w', encoding='utf-8') as file:
        file.write(data)
Save to a database:

Use the sqlite3 module to store the data in a SQLite database.

    import sqlite3

    conn = sqlite3.connect('example.db')
    cursor = conn.cursor()
    cursor.execute('CREATE TABLE IF NOT EXISTS users (name TEXT, age INTEGER)')
    data = [('Alice', 25), ('Bob', 30)]
    # "?" placeholders let the driver handle quoting safely
    cursor.executemany('INSERT INTO users VALUES (?, ?)', data)
    conn.commit()
    conn.close()
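A scraper often visits the same item more than once, so it can help to make inserts idempotent. The sketch below assumes a UNIQUE constraint on the `name` column and uses SQLite's INSERT OR IGNORE to skip rows that are already stored. The `save_users` helper and the in-memory database are illustrative assumptions, not part of the original example.

```python
import sqlite3

def save_users(conn, rows):
    """Insert (name, age) rows, skipping names that are already stored."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            'CREATE TABLE IF NOT EXISTS users (name TEXT UNIQUE, age INTEGER)'
        )
        conn.executemany(
            'INSERT OR IGNORE INTO users (name, age) VALUES (?, ?)', rows
        )

conn = sqlite3.connect(':memory:')  # in-memory database, for demonstration only
save_users(conn, [('Alice', 25), ('Bob', 30)])
save_users(conn, [('Alice', 25), ('Carol', 41)])  # the second 'Alice' is ignored
count = conn.execute('SELECT COUNT(*) FROM users').fetchone()[0]
print(count)
conn.close()
```

Note that INSERT OR IGNORE keeps the first stored row. If newer scraped values should overwrite older ones, a different conflict strategy would be needed.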
Use the mysql-connector-python or pymysql package to store the data in a MySQL database.

    import mysql.connector

    # Replace the connection settings with your own
    conn = mysql.connector.connect(
        host='localhost',
        user='yourusername',
        password='yourpassword',
        database='mydatabase'
    )
    cursor = conn.cursor()
    cursor.execute('CREATE TABLE IF NOT EXISTS users (name VARCHAR(255), age INT)')
    data = [('Alice', 25), ('Bob', 30)]
    cursor.executemany('INSERT INTO users VALUES (%s, %s)', data)
    conn.commit()
    conn.close()
Use the psycopg2 package to store the data in a PostgreSQL database.

    import psycopg2

    # Replace the connection settings with your own
    conn = psycopg2.connect(
        host='localhost',
        user='yourusername',
        password='yourpassword',
        dbname='mydatabase'  # "database" also works as a deprecated alias
    )
    cursor = conn.cursor()
    cursor.execute('CREATE TABLE IF NOT EXISTS users (name VARCHAR(255), age INT)')
    data = [('Alice', 25), ('Bob', 30)]
    cursor.executemany('INSERT INTO users VALUES (%s, %s)', data)
    conn.commit()
    conn.close()
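Save to the file system:

Write the data as JSON Lines, with one JSON object per line.

    import json

    data = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
    with open('output.jsonl', 'w', encoding='utf-8') as file:
        for item in data:
            # one record per line, so the file can be appended to and read back line by line
            file.write(json.dumps(item) + '\n')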
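Reading a JSON Lines file back is the mirror image of writing it: decode each line on its own. The following is a small sketch. The `read_jsonl` helper and the `sample.jsonl` filename are hypothetical names chosen for illustration.

```python
import json

def read_jsonl(path):
    """Yield one decoded object per non-empty line of a JSON Lines file."""
    with open(path, 'r', encoding='utf-8') as file:
        for line in file:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)

# Write a small sample file, then read it back.
sample = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
with open('sample.jsonl', 'w', encoding='utf-8') as file:
    for item in sample:
        file.write(json.dumps(item, ensure_ascii=False) + '\n')

loaded = list(read_jsonl('sample.jsonl'))
```

Because `read_jsonl` is a generator, a large file can be processed one record at a time without loading everything into memory.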
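Use the pickle module to serialize the data and save it to a file.

    import pickle

    data = {'Name': 'Alice', 'Age': 25}
    # pickle writes bytes, so the file must be opened in binary mode
    with open('output.pkl', 'wb') as file:
        pickle.dump(data, file)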
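To restore pickled data, open the file in binary read mode and call pickle.load. The round trip below is a minimal sketch, and the `sample.pkl` filename is illustrative. Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.

```python
import pickle

data = {'Name': 'Alice', 'Age': 25}

# Serialize the object to disk.
with open('sample.pkl', 'wb') as file:
    pickle.dump(data, file)

# Load it back; only do this with files you created or otherwise trust.
with open('sample.pkl', 'rb') as file:
    restored = pickle.load(file)
```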
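Save to a cache:

Use the redis-py package to store the data in a Redis cache.

    import json
    import redis

    r = redis.Redis(host='localhost', port=6379, db=0)
    data = {'Name': 'Alice', 'Age': 25}
    # Redis stores strings/bytes, so serialize the dict to JSON first
    r.set('user:1', json.dumps(data))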
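Cached scrape results usually should not live forever. As a hedged sketch, the helpers below store a record with an expiry time through the `ex` argument of redis-py's `set`, then read it back with `get`. The `cache_record` and `load_record` names and the one-hour default are assumptions made for illustration. The functions accept any client object that provides `set` and `get`, so the block itself does not need a running Redis server.

```python
import json

def cache_record(client, key, record, ttl_seconds=3600):
    """Store a dict as JSON under `key`, expiring after `ttl_seconds`."""
    client.set(key, json.dumps(record, ensure_ascii=False), ex=ttl_seconds)

def load_record(client, key):
    """Return the cached dict, or None if the key is missing or has expired."""
    raw = client.get(key)  # redis-py returns bytes by default
    return json.loads(raw) if raw is not None else None

# Usage with a real server (assumes redis-py is installed and Redis is running):
#   import redis
#   r = redis.Redis(host='localhost', port=6379, db=0)
#   cache_record(r, 'user:1', {'Name': 'Alice', 'Age': 25})
#   print(load_record(r, 'user:1'))
```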
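Which storage method to choose depends on your specific needs, such as the volume of data, whether you need fast access, and whether the data must be shared across systems.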