NLTK(Natural Language Toolkit)是一個用于自然語言處理的Python庫,可以用來轉換文本數據。以下是使用NLTK庫轉換文本的一些常見方法:
from nltk.tokenize import word_tokenize
text = "This is a sample sentence."
tokens = word_tokenize(text)
print(tokens)
from nltk import pos_tag
tokens = word_tokenize("This is a sample sentence.")
tags = pos_tag(tokens)
print(tags)
from nltk import ne_chunk
tokens = word_tokenize("Barack Obama was born in Hawaii.")
tags = pos_tag(tokens)
entities = ne_chunk(tags)
print(entities)
from nltk.stem import PorterStemmer, WordNetLemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
word = "running"
stemmed_word = stemmer.stem(word)
lemmatized_word = lemmatizer.lemmatize(word)
print(stemmed_word, lemmatized_word)
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
text = "This is a sample sentence."
tokens = word_tokenize(text)
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)
這些是NLTK庫中一些常用的文本轉換方法,可以根據具體的需求選擇合適的方法進行文本處理。