🕷️ Python Web Scraping with Selenium – Dynamic Sites
Selenium drives a real browser, so it can scrape JavaScript-heavy sites that static parsers like BeautifulSoup can't handle – and it can automate logins, fill forms, and click buttons along the way.
For production scrapers, consider Playwright (`pip install playwright`, then `playwright install` to download its browsers) – it's faster and has better async support than Selenium.
💻 Code Example:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

# 1. Setup Chrome in headless mode (no visible browser window)
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(options=options)

try:
    # 2. Navigate to page
    driver.get('https://pynfinity.com/welcome/home')
    print(f"Title: {driver.title}")

    # 3. Wait for element to appear (up to 10 seconds)
    wait = WebDriverWait(driver, 10)
    search_box = wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, '#search-input'))
    )

    # 4. Interact - type and submit
    search_box.clear()
    search_box.send_keys("Python regex tutorial santoshtvk")

    # 5. Find results
    results = driver.find_elements(By.CSS_SELECTOR, '.result-item')
    for result in results[:5]:
        print(result.text)

    # 6. Take a screenshot
    driver.save_screenshot('pynfinity_search.png')
finally:
    driver.quit()  # Always close the browser!
```
| Concept | Key Takeaway |
|---|---|
| By.CSS_SELECTOR | Most reliable – use for IDs, classes, attributes |
| By.XPATH | Powerful but brittle – avoid if CSS works |
| By.ID | Fastest – only works when element has unique ID |
| WebDriverWait | Explicit wait – always prefer over time.sleep() |
| driver.execute_script() | Run custom JavaScript directly in the browser |
Keep exploring and happy coding! 🚀