I'm trying to build a frequency table of words on Linux using shell commands.
I'd like to know how many times "?xml" occurs in the file.
Option 1 is grep, sort, and uniq:
cat allWords.txt | grep "<?xml" | sort | uniq -c
1 Data=<?xml
12 'http://ift.tt/1AJdaVl'><bCard><?xml
1 <?xml?>
1099 <?xml
4 '<?xml'
3 '<?xml
These counts are correct.
Option 2 is to use awk, which is faster:
awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" allWords.txt | grep "<?xml"
554 <?xml
6 'http://ift.tt/1AJdaVl'><bCard><?xml
3 '<?xml'
These counts are incorrect for some words but correct for others. awk would be just fine for me if I could get it to work right.
So why is there a difference?
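For comparison, here is a minimal sketch of a variant that does not rely on a multi-character RS. POSIX leaves the behavior of awk unspecified when RS contains more than one character (gawk and mawk treat it as a regular expression, other implementations may not), so this may or may not be the cause of the mismatch. The function name word_freq is hypothetical, and the sketch assumes tokens are separated only by spaces, tabs, or newlines.

```shell
# Hypothetical helper: count every whitespace-separated token in a file.
# tr turns spaces and tabs into newlines (squeezing repeats), so each token
# sits on its own line and awk can keep its default record separator.
word_freq() {
    tr -s ' \t' '\n\n' < "$1" |
        awk 'NF { count[$0]++ } END { for (w in count) print count[w], w }'
}

# Usage (grep -F treats the pattern as a fixed string):
#   word_freq allWords.txt | grep -F '<?xml'
```

If this version agrees with the grep | sort | uniq -c counts, the handling of RS in the installed awk is a likely suspect. If it does not, the difference lies elsewhere.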
Word frequency table - different results