Hi everyone,
I’m reaching out for advice on how to analyze a large dataset of chat logs between my ex and me. About 99% of our custody-related communication was done via Facebook Messenger. Luckily, Facebook allows you to download the entire history. Here’s some context:
- Background: My ex and I have been separated since 2013 and co-parented without court involvement until March of this year. Unfortunately, she cut off all communication, removed me as the father at my daughter’s school (and possibly moved her to a different school or to homeschooling), and has prevented me from seeing my daughter. She’s doing it out of spite, and it is working well for her so far since the court system is a bit skewed.
- Legal Situation: A few years ago, my ex pursued child support, so I opened a paternity rights case. She later backed out, but I kept my case open because I felt excluded from important parenting decisions. I reactivated it after she cut me off. Recently, I hired a more aggressive lawyer to prepare for mediation after delays with my previous lawyer.
- Current Problem: To prepare for mediation/court, I need to compile evidence of custody arrangements. Unfortunately, I never logged my time or had formal agreements signed. Now I’m scrambling to organize this data to lessen the financial blow.
What I’ve Done So Far:
- Exported all chat logs and used AI tools to convert the data into CSV format (the script I used is at the end of this post).
- The CSV includes columns like `Sender`, `Timestamp`, `Message`, and `Action` (e.g., Pickup/Drop-Off/Other).
- I’ve identified some common keywords like “ready,” “meet,” “leaving,” “driving,” etc., which are often used in discussions about custody exchanges.
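For illustration, here is a minimal sketch of how the keywords above could be matched as whole words. It uses only Python's standard `re` module. The names `EXCHANGE_KEYWORDS` and `find_keywords` are made up for this example, and the keyword list is just the four words mentioned above.

```python
import re

# The four keywords mentioned above; extend this list as needed.
EXCHANGE_KEYWORDS = ["ready", "meet", "leaving", "driving"]


def find_keywords(message, keywords=EXCHANGE_KEYWORDS):
    """Return the keywords that appear as whole words in the message (case-insensitive)."""
    found = []
    for word in keywords:
        # \b marks a word boundary, so "ready" does not match inside "already".
        if re.search(rf"\b{re.escape(word)}\b", message, flags=re.IGNORECASE):
            found.append(word)
    return found


print(find_keywords("Hey. Can you meet around noon..."))
print(find_keywords("Shes already forgot about it as of now"))
```

Matching on word boundaries means "already" is not counted as a hit for "ready", but it also means "meeting" is not a hit for "meet". Variants like that would have to be added to the list explicitly.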
Challenges:
- I have almost no programming experience and am struggling with analyzing the data at a granular level.
- I need help identifying and flagging messages related to custody arrangements (e.g., pickups/drop-offs) and discussions about money.
- My goal is to calculate overnight stays and create a clear timeline of custody exchanges.
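As a rough sketch of the money flagging, one option is a regular expression for dollar amounts plus a short list of money-related words. The word list, the patterns, and the name `mentions_money` are all assumptions made for this example and are not taken from the actual chat logs.

```python
import re

# Matches amounts such as "$50", "$1,200.50", or "50 dollars".
AMOUNT_PATTERN = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d{2})?|\b\d[\d,]*\s?(?:dollars|bucks)\b",
    re.IGNORECASE,
)

# Hypothetical list of money-related words; adjust to the wording in the real logs.
MONEY_WORDS = re.compile(
    r"\b(?:child support|pay|paid|payment|owe|money)\b",
    re.IGNORECASE,
)


def mentions_money(message):
    """Return True if the message contains a dollar amount or a money-related word."""
    return bool(AMOUNT_PATTERN.search(message) or MONEY_WORDS.search(message))


print(mentions_money("Can you send $50 for school lunch?"))
print(mentions_money("Hey. Can you meet around noon..."))
```

A bare number such as the "436" in a road name is not flagged, because the amount pattern requires a dollar sign or a currency word next to it.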
What I’m Looking For:
- Tips or Python scripts that can help me filter messages by keywords (e.g., “ready,” “meet”) and flag relevant rows in the CSV.
- Guidance on how to calculate overnight stays based on timestamps (e.g., pickups after 5 PM or drop-offs before 10 AM).
- Suggestions for visualizing this data (e.g., timelines or charts) to present in court.
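To make the overnight question concrete, here is a minimal sketch of one possible counting rule. It pairs each pickup with the next drop-off and counts the number of midnights between them. It assumes the `Action` labels are accurate and that "Pickup" means my daughter starts staying with me. The function name `count_overnights` and the sample dates are invented for illustration and are not real entries from the logs.

```python
from datetime import datetime


def count_overnights(events):
    """Pair each Pickup with the next Drop-Off and count the nights in between.

    events: list of (timestamp, action) tuples, where action is
    "Pickup", "Drop-Off", or "Other".
    Returns a list of (pickup_time, dropoff_time, nights) tuples.
    """
    stays = []
    pickup_time = None
    for timestamp, action in sorted(events):
        if action == "Pickup" and pickup_time is None:
            pickup_time = timestamp
        elif action == "Drop-Off" and pickup_time is not None:
            # Difference in calendar dates = number of midnights crossed.
            nights = (timestamp.date() - pickup_time.date()).days
            stays.append((pickup_time, timestamp, nights))
            pickup_time = None
    return stays


# Made-up events, for illustration only.
example_events = [
    (datetime(2021, 3, 5, 17, 30), "Pickup"),
    (datetime(2021, 3, 5, 18, 0), "Other"),
    (datetime(2021, 3, 7, 9, 15), "Drop-Off"),
    (datetime(2021, 3, 12, 18, 0), "Pickup"),
    (datetime(2021, 3, 12, 20, 0), "Drop-Off"),
]

for pickup, dropoff, nights in count_overnights(example_events):
    print(pickup, "->", dropoff, "nights:", nights)
```

With this rule, a same-day pickup and drop-off counts as zero nights. A repeated "Pickup" before any drop-off is ignored, which may or may not be the right call for messy real data.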
Here’s an example of what my CSV looks like:
```
Sender,Timestamp,Message,Action
Alex (full name hidden),2021-03-04 20:45:33,"So she should have told you we already did the reading assignment. Lol",Pickup
Brandy (full name hidden),2021-03-05 10:18:43,"Shes already forgot about it as of now",Pickup
Brandy (full name hidden),2021-03-06 09:07:52,"Hey. Can you meet around noon... I'm going to be at 436 and palm springs by Publix.",Pickup
```
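As a starting point for a timeline, here is a small sketch that reads rows in the format above with the standard `csv` module, converts the timestamps, and prints one line per message in chronological order. The function names `load_rows` and `format_timeline` are made up for this example, and the sample rows are embedded as a string so the snippet runs on its own.

```python
import csv
import io
from datetime import datetime

SAMPLE_CSV = """Sender,Timestamp,Message,Action
Alex (full name hidden),2021-03-04 20:45:33,"So she should have told you we already did the reading assignment. Lol",Pickup
Brandy (full name hidden),2021-03-05 10:18:43,"Shes already forgot about it as of now",Pickup
Brandy (full name hidden),2021-03-06 09:07:52,"Hey. Can you meet around noon... I'm going to be at 436 and palm springs by Publix.",Pickup
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts sorted by timestamp."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["Timestamp"] = datetime.strptime(row["Timestamp"], "%Y-%m-%d %H:%M:%S")
        rows.append(row)
    rows.sort(key=lambda r: r["Timestamp"])
    return rows


def format_timeline(rows, width=60):
    """Return one text line per row: time, action, sender's first name, message start."""
    lines = []
    for row in rows:
        when = row["Timestamp"].strftime("%Y-%m-%d %H:%M")
        first_name = row["Sender"].split()[0]
        text = row["Message"][:width]
        lines.append(f"{when} | {row['Action']:<8} | {first_name}: {text}")
    return lines


for line in format_timeline(load_rows(SAMPLE_CSV)):
    print(line)
```

For the real file, the text could be read with `open(path, newline="", encoding="utf-8")` and passed to `csv.DictReader` directly instead of going through `io.StringIO`.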
I’ve tried using AI tools like Perplexity.ai for analysis, but it didn’t fully analyze the file as needed. I’m open to hiring someone if necessary but would love any tips or pointers from this community first.
Thanks in advance for any help or advice you can provide!
from bs4 import BeautifulSoup
import pandas as pd
# Load your Facebook HTML file
html_file = r"message_1.html"
with open(html_file, "r", encoding="utf-8") as file:
    soup = BeautifulSoup(file, "html.parser")
# Extract message threads
messages = []
message_blocks = soup.find_all('div', class_='_a6-g') # Main container for each message
for message in message_blocks:
    try:
        # Extract sender name
        sender_tag = message.find('div', class_='_2ph_ _a6-h _a6-i')
        sender = sender_tag.text.strip() if sender_tag else "Unknown"
        # Extract timestamp
        timestamp_tag = message.find('div', class_='_a72d')
        timestamp = timestamp_tag.text.strip() if timestamp_tag else "Unknown"
        # Extract message content
        content_tag = message.find('div', class_='_2ph_ _a6-p')
        if content_tag:
            content = content_tag.get_text(separator=" ").strip()
        else:
            content = "Unknown"
        # Append extracted data to list
        messages.append({'Sender': sender, 'Timestamp': timestamp, 'Message': content})
    except AttributeError as e:
        print(f"Error parsing message: {e}")
# Convert to Pandas DataFrame
df = pd.DataFrame(messages)
# Debugging: Print the first few rows of the DataFrame and check for missing columns
print("DataFrame contents:")
print(df.head())
# Remove duplicates and empty messages
df = df.drop_duplicates()
df = df[df['Message'].str.strip() != ""]
# Parse and clean Timestamp column
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%b %d, %Y %I:%M:%S %p', errors='coerce')
# Drop rows with invalid timestamps (optional)
df = df.dropna(subset=['Timestamp'])
# Sort DataFrame by Timestamp
df = df.sort_values(by='Timestamp')
# Save sorted DataFrame to CSV (ensure no PermissionError)
csv_file_path = r'C:\temp\sorted_custody_schedule_new.csv'
try:
    df.to_csv(csv_file_path, index=False)
    print(f"Sorted custody-related messages saved to {csv_file_path}.")
except PermissionError as e:
    print(f"PermissionError: {e}. Please close any program using the file or save with a different name.")
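One thing to watch in the script above: `errors='coerce'` followed by `dropna` silently discards any message whose timestamp does not match the single format string. As a hedged sketch, the parsing step could instead try several formats and report what fails. The second format below is a hypothetical variant, not something confirmed from the Facebook export, and `parse_timestamp` is a made-up helper name.

```python
from datetime import datetime

KNOWN_FORMATS = [
    "%b %d, %Y %I:%M:%S %p",  # the format used in the script above
    "%b %d, %Y, %I:%M %p",    # hypothetical variant; adjust to what the export contains
]


def parse_timestamp(text, formats=KNOWN_FORMATS):
    """Try each format in turn; return a datetime, or None if nothing matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


samples = ["Mar 04, 2021 8:45:33 PM", "Mar 06, 2021, 9:07 AM", "Unknown"]
unparsed = [s for s in samples if parse_timestamp(s) is None]
print(f"{len(unparsed)} of {len(samples)} timestamps could not be parsed: {unparsed}")
```

Counting the unparsed values before dropping them makes it visible how many messages would otherwise disappear from the CSV.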
submitted by /u/akolozvary to r/learnpython