Need Help Scraping WNBA Team Pages — Selenium Struggles with Dynamic Pages /u/raciallyambiguousmf Python Education

Hey everyone,

I’m working on a project in which I scrape WNBA team pages. Basically, I want to analyze each team’s season tickets and determine how prices have increased YoY as the league has gained popularity since Caitlin Clark entered it. The idea is to analyze schedules, matchups, and demand for big games, and then compare that against ticket prices to predict potential resale value.

**I have little technical experience. I understand some extremely basic concepts of Python and programming, and have been working with ChatGPT so far.**

Right now, I’m building a Python script with Selenium to scrape data from each team’s official website about season ticket pricing, deposits, availability, and arena seating charts.

Where It’s Working:

  • I’ve got it set up to open each team’s site and navigate to the tickets/memberships section. (I did this by manually grabbing the URLs, since not every team’s page structure is the same.)
  • Pulling text-based data like deposits and availability windows. (Not cleanly, however.)
  • Logging the URLs for manual checking
  • The data is spotty for all of the above except the URL logging

Where It’s Struggling:

  • Dynamic Content Loading – Some sites take forever to load or don’t render the data Selenium is looking for right away.
  • Popups and Overlays – Cookie consent banners keep blocking clicks, even though I’m trying to handle them in the code.
  • Inconsistent Layouts – Team websites use different structures and labels (e.g., “season tickets” vs. “memberships”), so my script sometimes stops at the general tickets page instead of digging deeper.
  • Image Extraction – A lot of teams have arena pricing charts as images, and scraping these isn’t working reliably, especially when images are loaded dynamically.
    • Ideally, I could pull these images and load them into separate sheets of something like an .XLS workbook. Right now my program exports to .CSV, which I then convert to Excel
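
To make the image problem concrete, here is a minimal sketch of one way to find chart images in a page's rendered HTML (for example, the string Selenium exposes as `driver.page_source`). It is not the code from the Gist. The function name `find_image_urls`, the list of lazy-loading attributes, and the keyword list are all assumptions for illustration, and real team sites may use different attribute names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect image URLs from <img> tags, including lazy-loaded ones."""

    # Attributes that lazy-loading scripts often use instead of src.
    # This list is an assumption; inspect the real pages to confirm.
    LAZY_ATTRS = ("data-src", "data-lazy-src", "data-original")

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Prefer a lazy-load attribute, because src may hold a placeholder.
        raw = next((attr_map[a] for a in self.LAZY_ATTRS if attr_map.get(a)), None)
        raw = raw or attr_map.get("src")
        # Skip inline data: placeholders, which are not downloadable files.
        if raw and not raw.startswith("data:"):
            url = urljoin(self.base_url, raw)  # resolve relative paths
            if url not in self.urls:
                self.urls.append(url)


def find_image_urls(html, base_url, keywords=("seat", "pricing", "map", "chart")):
    """Return absolute image URLs whose path contains one of the keywords."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    return [u for u in parser.urls if any(k in u.lower() for k in keywords)]


if __name__ == "__main__":
    sample = (
        '<img src="data:image/gif;base64,AAAA" data-src="/img/seating-chart.png">'
        '<img src="/img/logo.png">'
    )
    print(find_image_urls(sample, "https://team.example.com/tickets"))
```

Filtering on the URL is crude, since a chart could be named anything, so matching on `alt` text as well might catch more. The returned URLs could then be downloaded and placed on separate worksheets by a library that writes .xlsx files.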

What I’m Looking For:

  • Better Ways to Handle Dynamic Content – Should I be using explicit waits differently, or is there a better tool than Selenium?
  • Popup Handling Tips – Are there any best practices for identifying and closing cookie banners and overlays?
  • Image Scraping Advice – How can I reliably find and save images like seating charts, even if they’re loaded dynamically?
  • API Recommendations – Are there APIs (e.g., Ticketmaster) that might simplify this instead of scraping?
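
For context on the explicit-wait question, the underlying idea is a polling loop: keep checking a condition until it is true or a timeout expires, instead of sleeping for a fixed time. The sketch below shows that pattern in plain Python. The helper `wait_until` is hypothetical and not part of Selenium; swallowing every exception while polling is a simplifying assumption.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout expires.

    Exceptions raised by condition() are treated as "not ready yet",
    which is a simplification chosen for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except Exception:
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    attempts = {"count": 0}

    def page_ready():
        # Stand-in for a real check such as "are price elements present?"
        attempts["count"] += 1
        return "loaded" if attempts["count"] >= 3 else None

    print(wait_until(page_ready, timeout=5, poll=0.1))
```

Selenium ships this same pattern as `WebDriverWait(driver, 10).until(...)` combined with the `expected_conditions` helpers, for instance `EC.presence_of_element_located((By.CSS_SELECTOR, ".price"))`, where the `.price` selector is only a placeholder. Waiting on the specific element that holds the data tends to be more reliable than waiting for the page as a whole.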

I’ll take literally any advice or feedback, whether it’s related to my program or just strategy (i.e., am I even approaching this task correctly?).

So far, I have successfully written a script that scrapes the official WNBA schedule and pulls each team’s home games. This is the next step in my overall plan.

Thanks in advance!

See the attached link for the GitHub Gist of my project.

submitted by /u/raciallyambiguousmf
