python - find the occurrences of words in a file

I am trying to count how many times each word occurs in a file. I have a text file (TEST.txt) with the following content:

ashwin programmer india 
amith programmer india

The result I expect is:

{'ashwin': 1, 'programmer': 2, 'india': 2, 'amith': 1}

The code I am using is:

for line in open('TEST.txt', 'r'):
    word = Counter(line.split())
    print(word)

The result I get is:

Counter({'ashwin': 1, 'programmer': 1,'india':1}) 
Counter({'amith': 1, 'programmer': 1,'india':1})

Can anyone help me? Thanks in advance.


2013-02-26 Ashwin

Use the update method of Counter. For example:

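from collections import Counter

# Sample data standing in for the contents of TEST.txt.
data = '''\
ashwin programmer india
amith programmer india'''

c = Counter()
for line in data.splitlines():
    # update() adds the counts from each line to the running total.
    c.update(line.split())
print(c)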

Output:

Counter({'india': 2, 'programmer': 2, 'amith': 1, 'ashwin': 1})


2013-02-26 06:59:37

+1 Just what I was about to post. This makes good use of the specialised 'Counter.update' method and does not require reading the whole file into memory ... –
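To make the point about memory concrete, here is a minimal sketch of the same update approach applied to a real file instead of an in-memory string. The helper name count_words and the temporary copy of TEST.txt are assumptions made for illustration; they are not part of the answer above.

```python
import os
import tempfile
from collections import Counter


def count_words(path):
    """Count whitespace-separated words in the file at `path`, one line at a time."""
    counts = Counter()
    with open(path, "r") as fh:
        for line in fh:  # the file object yields one line per iteration
            counts.update(line.split())
    return counts


# Recreate the sample file from the question in a temporary directory
# (hypothetical setup so that the sketch runs on its own).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "TEST.txt")
    with open(sample_path, "w") as fh:
        fh.write("ashwin programmer india\namith programmer india\n")
    result = count_words(sample_path)

# Counter is a dict subclass, so dict() gives the plain-dict form.
print(dict(result))
```

On Python 3.7 or later, where dicts keep insertion order, this should print {'ashwin': 1, 'programmer': 2, 'india': 2, 'amith': 1}, which is the format asked for in the question.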

You are iterating over each line and calling Counter each time. You want Counter to run over the entire file. Try:

from collections import Counter

# Read the whole file and split it into words in one go.
with open("TEST.txt", "r") as f:
    contents = f.read().split()
print(Counter(contents))


2013-02-26 06:55:45 Anorov

It might still be better to process the file line by line instead of ... – jadkik94

@jadkik94 If he is processing every line in that block anyway, why would it make a difference? – Anorov

@Anorov What if you have a 50 GB file that you want to count? (One that just happens to contain only 3 unique words) .... –
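For what it is worth, the single Counter call can be kept without loading everything at once by passing it a generator expression. This is only a sketch: the function name count_words_streaming is made up here, and io.StringIO stands in for an open file.

```python
import io
from collections import Counter


def count_words_streaming(lines):
    """Count words from any iterable of lines without building a full word list."""
    # The generator yields one word at a time, so only the current line
    # and the Counter itself are held in memory.
    return Counter(word for line in lines for word in line.split())


# io.StringIO stands in for an open file; a real file object is also
# an iterable of lines and can be passed in the same way.
fake_file = io.StringIO("ashwin programmer india\namith programmer india\n")
counts = count_words_streaming(fake_file)
print(counts["programmer"], counts["ashwin"])
```

With a file that has few unique words, the Counter stays small no matter how large the input is, which is the case the 50 GB comment describes.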

from collections import Counter

cnt = Counter()

for line in open('TEST.txt', 'r'):
    for word in line.split():
        # Missing keys count as 0 in a Counter, so += works directly.
        cnt[word] += 1

print(cnt)


2013-02-26 06:57:13

Thank you, I got it working – Ashwin

Use a defaultdict:

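from collections import defaultdict

def read_file(fname):
    # Missing keys start at 0, so each word can be incremented directly.
    words_dict = defaultdict(int)

    # The with block closes the file even if an error occurs.
    with open(fname, 'r') as fp:
        for line in fp:
            # split() with no argument also drops the trailing newline,
            # which split(' ') would leave attached to the last word.
            for word in line.split():
                words_dict[word] += 1

    return words_dict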


2014-01-19 02:02:08 GrilledTuna

FILE_NAME = 'file.txt'

wordCounter = {}

with open(FILE_NAME, 'r') as fh:
    for line in fh:
        # Remove punctuation characters and lower-case the string.
        # split() then breaks the line into a list of words.
        word_list = line.replace(',', '').replace('\'', '').replace('.', '').lower().split()
        for word in word_list:
            # Add the word to the wordCounter dictionary.
            if word not in wordCounter:
                wordCounter[word] = 1
            else:
                # If the word is already in the dictionary, update its count.
                wordCounter[word] = wordCounter[word] + 1

print('{:15}{:3}'.format('Word', 'Count'))
print('-' * 18)

# Print each word and its number of occurrences.
for (word, occurrence) in wordCounter.items():
    print('{:15}{:3}'.format(word, occurrence))
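As a possible variation on the same idea, the chained replace calls could be swapped for a regular expression and the manual dictionary for a Counter. This is a sketch under assumptions, not the answer's method: the function name normalized_counts, the sample string, and the definition of a word as a run of [a-z0-9'] characters are all chosen here for illustration. Note that, unlike the code above, this pattern keeps apostrophes inside words.

```python
import re
from collections import Counter


def normalized_counts(text):
    """Lower-case `text` and count runs of letters, digits and apostrophes."""
    # Assumption: a word is any run matching [a-z0-9']+ after lower-casing.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


sample = "India, india. INDIA programmer"
counts = normalized_counts(sample)

print('{:15}{:3}'.format('Word', 'Count'))
print('-' * 18)
# most_common() returns (word, count) pairs, highest count first.
for word, occurrence in counts.most_common():
    print('{:15}{:3}'.format(word, occurrence))
```

Using most_common() also sorts the table by frequency, which plain dictionary iteration does not do.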


2017-02-20 15:42:06
