I am trying to read a text file, strip the punctuation, and make everything lowercase. I then want to print the total number of words, the number of unique words (so "a", for instance, is counted only once even if it appears in the text 20 times), and the most frequently occurring words along with their frequencies (e.g. a:20).
I realize there are similar questions on StackOverflow, but I am a beginner trying to solve this problem with a minimal number of imports, and I was wondering whether there is a way to code this without importing something like collections.
I have my code below, but I do not understand why I am not getting the answer that I need. This code is printing the entirety of the text file (with each word on a new line, and all punctuation removed), then printing:
e 1
n 1
N 1
o 1
I assume this is "None" split into characters, each with its frequency. Why is my code giving me this answer, and what can I do to change it?
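That assumption can be checked in isolation. The sketch below uses a hypothetical function, prints_only, that prints but has no return statement, then converts its result to a string and counts the characters the same way the posted loop does. It is only meant to illustrate where the four letters could come from, not to reproduce the full program.

```python
def prints_only():
    # Printing a value is not the same as returning it.
    print("hello")


result = prints_only()   # no return statement, so result is None
as_text = str(result)    # the four-character string "None"

count = {}
for ch in as_text:       # iterating over a string yields single characters
    count[ch] = count.get(ch, 0) + 1

for ch, times in count.items():
    print(ch, times)
```

If this matches what the real program does, the counting loop is never seeing the words at all, only the text of the missing return value.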
Code below:
file = open("C:\\Users\\Documents\\AllSonnets.txt", "r")

def strip_sonnets():
    import string
    new_file = file.read().split()
    for words in new_file:
        data = words.translate(string.punctuation)
        data = data.lower()
        data = data.strip(".")
        data = data.strip(",")
        data = data.strip("?")
        data = data.strip(";")
        data = data.strip("!")
        data = data.replace("'", "")
        data = data.replace('"', "")
        data = data.strip(":")
        print(data)

new_file = strip_sonnets()
new_file = str(new_file)
count = {}
for w in new_file:
    if w in count:
        count[w] += 1
    else:
        count[w] = 1
for word, times in count.items():
    print(word, times)
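One possible way to get the three numbers described above, using only the string module that the posted code already imports, is sketched here. The function names clean_words, count_words, and most_frequent, as well as the sample text, are made up for illustration. The sketch assumes Python 3 and assumes that "punctuation" means the ASCII characters in string.punctuation. The key differences from the posted code are that the cleaning function returns its result instead of printing it, and that the counting loop runs over a list of words rather than over a string.

```python
import string


def clean_words(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    # str.maketrans with a third argument maps each listed character to None,
    # which is the table format str.translate expects in Python 3.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def count_words(words):
    """Build a word -> frequency dict without importing collections."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def most_frequent(counts, n=10):
    """Return the n (word, count) pairs with the highest counts."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical sample text standing in for the contents of AllSonnets.txt.
sample = ("From fairest creatures we desire increase, "
          "That thereby beauty's rose might never die; "
          "from the rose, THE rose!")

words = clean_words(sample)
counts = count_words(words)
print("total words:", len(words))
print("unique words:", len(counts))
for word, times in most_frequent(counts, 3):
    print(word + ":" + str(times))
```

For the real file, sample would be replaced by the string returned from file.read(). Sorting the dictionary items by count is a simple choice that avoids collections.Counter. Words with equal counts stay in the order they first appeared in the text.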
Counting number of words and unique words from txt file- Python