I have a simple script that uses Playwright (Python) with headless Chrome and BeautifulSoup to return the contents of a page. It works for regular URLs, but whenever Cloudflare/CAPTCHA is involved, it fails. I have tried adding random waits, using playwright-stealth, setting custom headers, etc., and nothing seems to work. Is there any workaround for this?
This is my script:
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
from bs4 import BeautifulSoup
import time
import random

url = "https://sports.betonline.ag/sportsbook/basketball/nba"

try:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
        )
        page = context.new_page()
        stealth_sync(page)  # patch common headless fingerprints before navigating
        page.goto(url)
        time.sleep(random.uniform(2, 5))
        page.wait_for_load_state("networkidle")
        time.sleep(random.uniform(1, 3))
        page_content = page.content()
        browser.close()

    soup = BeautifulSoup(page_content, "html.parser")
    print(soup.text)
except Exception as e:
    print(f"An error occurred: {e}")
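One small improvement regardless of which anti-bot approach you try: the script above prints whatever HTML comes back, even when that HTML is the Cloudflare interstitial rather than the sportsbook page. A minimal sketch of a detection helper is below; the marker strings are assumptions based on common Cloudflare challenge pages and may need adjusting for this site.

```python
# Heuristic markers commonly seen on Cloudflare challenge pages
# (assumed, not exhaustive -- tune for the site you target).
CHALLENGE_MARKERS = ("just a moment", "checking your browser", "cf-challenge")

def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML looks like a Cloudflare interstitial
    rather than real page content, so the caller can retry or fail
    loudly instead of silently printing challenge text."""
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

Calling this on `page_content` before parsing with BeautifulSoup makes it obvious whether a run actually got past the challenge or just scraped the "Just a moment..." page.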
submitted by /u/makelefani