Web scraping code sometimes works but usually doesn’t

Background: This post originally got deleted, so I edited it to be more readable. I'm trying to scrape Pro Football Reference and store passing data for the Dallas Cowboys in dataframes, one dataframe per year (i.e. regular_season['2023'] would be a dataframe, regular_season['2005'] another, etc.). My code sometimes works, but usually it only gets the 2000-2002 or the 2000-2008 data and then freezes with the code still running (I'm trying to scrape 2000 through 2023). I think I may need to make some timing adjustments, but I'm not sure. Here is my code. I left out the package imports to save space:

    base_url = 'https://www.pro-football-reference.com/teams/dal/{}.htm'
    url_list = []
    regular_season = {}

    # Build one team-page URL per season, 2000 through 2023
    for i in range(2000, 2024):
        url_list.append(base_url.format(str(i)))

    for url in url_list:
        print(f'scraping {url}')
        # The year is the first four characters of the file name, e.g. '2000.htm'
        year = url.split('/')[-1][0:4]
        driver.get(url)
        # Wait up to 10 seconds for at least one <table> to appear
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        page_source = driver.page_source
        tables = pd.read_html(page_source)
        regular_season[year] = tables[43]
        print(f'{year} year added')
        time.sleep(2)
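One way to test the timing hunch above is to make each page load fail loudly and retry after a growing pause, rather than letting the loop hang. The sketch below is only an illustration, not a confirmed fix for the freeze: `build_urls` and `fetch_with_retry` are hypothetical helper names, the 5-second base delay and 3 attempts are arbitrary guesses, and whether the site is actually throttling requests is an assumption.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

BASE_URL = "https://www.pro-football-reference.com/teams/dal/{}.htm"


def build_urls(start: int, end: int) -> Dict[str, str]:
    """Map each season (string key) to its team page URL; end is inclusive."""
    return {str(year): BASE_URL.format(year) for year in range(start, end + 1)}


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,          # assumed value, tune as needed
    base_delay: float = 5.0,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on an exception, wait base_delay * 2**n and try again.

    The last failure is re-raised so the caller sees which page broke.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            if attempt == attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            print(f"attempt {attempt + 1} failed ({exc!r}); retrying in {delay}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")


# Hypothetical usage with the post's Selenium `driver` (not run here):
# driver.set_page_load_timeout(30)  # a stalled driver.get() now raises
# for year, url in build_urls(2000, 2023).items():
#     fetch_with_retry(lambda: driver.get(url))
```

With a page-load timeout set on the driver, a stalled `driver.get(url)` raises an exception instead of blocking, and the printed retry messages show which season the run got stuck on. The `sleep` parameter is injectable only so the helper can be exercised without real waiting.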

submitted by /u/BeBetterMySon to r/learnpython

