diff --git a/chatanalyzer/__main__.py b/chatanalyzer/__main__.py
index 795acde..a2def92 100644
--- a/chatanalyzer/__main__.py
+++ b/chatanalyzer/__main__.py
@@ -33,15 +33,10 @@ def main():
             if ftfy.ftfy(m["sender_name"]) == p.name:
                 if m["type"] == "Generic" and "content" in m:
                     p.add_message(m["timestamp_ms"], m["content"])
-
-    # participants[0].print_df()
 
     WORDS_IN_CHAT = count_all_words(participants)
     for p in participants:
         p.count_words()
-    # print(WORDS_IN_CHAT["counts"])
-    # for p in participants:
-    #     WORDS_IN_CHAT.extend(p.get_words(longer_than=0))
 
     print("The participants of this chat:")
     for p in participants:
diff --git a/chatanalyzer/analyzing.py b/chatanalyzer/analyzing.py
index 9330e4b..01c366d 100644
--- a/chatanalyzer/analyzing.py
+++ b/chatanalyzer/analyzing.py
@@ -29,15 +29,6 @@ def create_dataframe(participants):
                 "sender": []}
     df = pd.DataFrame(skeleton)
     for p in participants:
-        # date=[]
-        # date += (p.messages.keys())
-        # message=[]
-        # message += (p.messages.values())
-
-        # m_di = {"date": date,
-        #         "message": message}
-
-        # m_df = pd.DataFrame(m_di)
 
         m_df = p.messages_df
 
diff --git a/chatanalyzer/participant.py b/chatanalyzer/participant.py
index 21cb5e4..a3c9aa3 100644
--- a/chatanalyzer/participant.py
+++ b/chatanalyzer/participant.py
@@ -16,9 +16,7 @@ def count_all_words(participants):
         df = pd.concat([df, p.messages_df])
 
     words = df.set_index(['timestamp']).apply(lambda x: x.str.split(' ').explode()).reset_index()
-    # print(words)
     words_count = words['message'].value_counts().sort_index()
-    # words_count.index = pd.PeriodIndex(words_count.index)
     df_out = words_count.rename_axis('message').reset_index(name='counts')
     df_out['message'] = df_out.apply(lambda row : to_uppercase(row['message']), axis = 1)
     return df_out.sort_values(by="counts", ascending=True).set_index("message").to_dict()["counts"]
@@ -33,14 +31,11 @@ class Participant:
         self.words = pd.DataFrame()
 
     def add_message(self, timestamp, message):
-        # self.messages[str(datetime.fromtimestamp(timestamp/1000))] = ftfy.ftfy(message)
         self.messages_df = self.messages_df.append(dict(zip(self.messages_df.columns,[str(datetime.fromtimestamp(timestamp/1000)), ftfy.ftfy(message)])), ignore_index=True)
 
     def count_words(self):
         words = self.messages_df.set_index(['timestamp']).apply(lambda x: x.str.split(' ').explode()).reset_index()
-        # print(words)
         words_count = words['message'].value_counts().sort_index()
-        # words_count.index = pd.PeriodIndex(words_count.index)
         self.words = words_count.rename_axis('message').reset_index(name='counts')
 
     def get_words(self, longer_than=0):