The Ultimate Web Scraping With Python Bootcamp 2023
Learn to extract data from the web with python in a single course covering selectolax, playwright, scrapy and more
What you’ll learn
Understand the fundamentals of web scraping in python from absolute scratch
Scrape information from static and dynamic websites and extract it to a variety of formats
Intercept and emulate hidden APIs to identify highly productive alternatives to getting your data
Master the requests library for working with HTTP
Parse and extract content from HTML using beautifulsoup, selectolax, and Microsoft Playwright
Master complex CSS selectors including descendant, child, sibling combinators
Create scrapy crawlers and practice items, itemloaders and custom pipelines
Integrate scrapy with playwright for highly performant, fine-tuned dynamic website crawling
Practice processing and extracting data to a variety of formats including csv, json, xml, and SQL
Requirements
No programming experience needed – I’ll teach you everything you need to know
No paid software required – we’ll be using open-sourced python libraries
A computer with access to the internet
Description
Prepare to learn real skills you can put into practice right away
Welcome to the Ultimate Web Scraping With Python Bootcamp, the only course you need to go from a complete beginner in python to a very competent web scraper.
Web scraping is the process of programmatically extracting data from the web. A scraping agent visits a web resource, extracts content from it, and then processes that content to pull out the specific information of interest.
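To make the visit, extract, process loop concrete, here is a minimal sketch. It is an illustration rather than course material: the page is an inline string standing in for a downloaded document, the `TitleExtractor` and `scrape_titles` names are made up for this example, and it uses the standard library's `html.parser` instead of the parsing libraries covered in the course so that it runs without any installs.

```python
import json
from html.parser import HTMLParser

# Stand-in for a fetched page; a real agent would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h2 class="title">First post</h2>
  <p>Intro text</p>
  <h2 class="title">Second post</h2>
</body></html>
"""


class TitleExtractor(HTMLParser):
    """Collects the text of every <h2 class="title"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())


def scrape_titles(html):
    """Extract step: parse the markup and return the titles found."""
    parser = TitleExtractor()
    parser.feed(html)
    return parser.titles


# Process step: turn the extracted values into a structured format.
titles = scrape_titles(SAMPLE_HTML)
print(json.dumps({"titles": titles}))
```

The same three stages appear in every scraper, whichever libraries end up doing the fetching and parsing.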
Scraping is the kind of programming skill that offers immediate feedback, and can be used to automate a wide variety of data collection and processing tasks.
Over the next 17+ hours, we will methodically cover everything you need to know to write web scraping agents in python.
This bootcamp is organized in three parts of increasing difficulty, designed to help you build your skills progressively.
Part I – Begin
- a detailed overview of the request-response cycle
- understanding user-agents, HTTP verbs, headers and statuses
- understanding why custom headers can often be used to bypass paywalls
- mastering the requests library to work with HTTP in python
- what stateless means and how cookies work
- exploring the role of proxies in modern web architectures
- mastering beautifulsoup for parsing and data extraction
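As a rough illustration of the header and cookie topics in Part I, the sketch below builds two requests without sending them. Everything specific here is an assumption made for the example: the URL, the header values, the `sessionid` cookie, and the `build_request` helper. It uses the standard library's `urllib.request` and `http.cookies` rather than the requests library taught in the course, purely so it runs without installs.

```python
from http.cookies import SimpleCookie
from urllib.request import Request

# Hypothetical URL, used only for illustration.
URL = "https://example.com/articles"


def build_request(url, set_cookie_header=None):
    """Build (but do not send) a GET request with custom headers."""
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; demo-scraper/0.1)",
        "Accept": "text/html",
    }
    if set_cookie_header:
        # HTTP is stateless: the server only recognizes a returning client
        # if the client echoes the cookie back on the next request.
        jar = SimpleCookie()
        jar.load(set_cookie_header)
        headers["Cookie"] = "; ".join(
            f"{name}={morsel.value}" for name, morsel in jar.items()
        )
    return Request(url, headers=headers, method="GET")


first = build_request(URL)
# Pretend the first response carried this Set-Cookie header.
second = build_request(URL, "sessionid=abc123; Path=/; HttpOnly")

print(first.get_method(), first.has_header("Cookie"))
print(second.get_header("Cookie"))
```

The first request carries no cookie, so a server would treat it as a new visitor. The second one sends the session identifier back, which is how state is layered on top of a stateless protocol.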
Part II – Refine
- identifying and using hidden APIs and understanding the benefits they offer
- emulating headers, cookies, and body content with ease
- automatically generating python code from intercepted API requests using postman and httpie
- working with the highly performant selectolax parsing library
- mastering CSS selectors
- introducing Microsoft Playwright for headless browsing and dynamic rendering
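The hidden API idea from Part II can be sketched as follows, under stated assumptions. The endpoint, the headers, and the JSON shape are all hypothetical, chosen to resemble what a browser's network tab might reveal. The request is built but never sent, and the response is an inline sample. Only the standard library is used here, so this is not the tooling the course itself demonstrates.

```python
import csv
import io
import json
from urllib.request import Request

# Hypothetical endpoint, as it might appear in the browser's network tab.
API_URL = "https://example.com/api/products?page=1"

# Hypothetical response body standing in for what the endpoint would return.
SAMPLE_RESPONSE = """
{"page": 1,
 "items": [
   {"id": 101, "name": "Kettle", "price": {"amount": 24.5, "currency": "USD"}},
   {"id": 102, "name": "Toaster", "price": {"amount": 31.0, "currency": "USD"}}
 ]}
"""


def build_api_request(url):
    """Recreate (but do not send) the request the page itself would make."""
    return Request(url, headers={
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    })


def items_to_csv(raw_json):
    """Flatten the nested JSON items into CSV text."""
    payload = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "name", "amount", "currency"])
    for item in payload["items"]:
        writer.writerow([
            item["id"],
            item["name"],
            item["price"]["amount"],
            item["price"]["currency"],
        ])
    return buffer.getvalue()


request = build_api_request(API_URL)
csv_text = items_to_csv(SAMPLE_RESPONSE)
print(csv_text)
```

Because the data arrives already structured as JSON, there is no HTML to parse at all, which is the main reason hidden APIs tend to be the more productive route when a site offers one.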
Part III – Master
- learning how to set up scrapy and explore its command line interface (“the scrapy tool”)
- dynamically exploring response objects using scrapy shell
- understanding and defining item schemas and loading data using itemloaders and input/output processors
- writing PageMethods to give precise instructions to the headless browser from right within scrapy
- defining custom pipelines for saving into SQL databases and highly customized output formats
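To give a feel for the last item, here is a minimal sketch of a pipeline that saves items to SQL. It is a plain class shaped like a scrapy item pipeline, using the conventional `open_spider`, `process_item`, and `close_spider` method names, but scrapy itself is not imported and the methods are called by hand. The `SQLitePipeline` name, the `quotes` table, and the item fields are assumptions made for this example.

```python
import sqlite3


class SQLitePipeline:
    """Plain class shaped like a scrapy item pipeline (scrapy is not imported)."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.connection = None

    def open_spider(self, spider):
        # Called once when the crawl starts: open the database, create the table.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS quotes (author TEXT, text TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item: insert it, then pass it along.
        self.connection.execute(
            "INSERT INTO quotes (author, text) VALUES (?, ?)",
            (item["author"], item["text"]),
        )
        self.connection.commit()
        return item

    def close_spider(self, spider):
        # Called once when the crawl ends.
        self.connection.close()


# Drive the pipeline by hand, the way the framework would during a crawl.
pipeline = SQLitePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"author": "Ada", "text": "Hello"}, spider=None)
pipeline.process_item({"author": "Alan", "text": "World"}, spider=None)
rows = pipeline.connection.execute(
    "SELECT author, text FROM quotes ORDER BY author"
).fetchall()
print(rows)
pipeline.close_spider(spider=None)
```

In a real project the class would be registered in the project settings and the framework would call these methods itself. The parameterized `?` placeholders keep scraped text from being interpreted as SQL.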
In this bootcamp, I will take you step-by-step through engaging video lectures and teach you everything you need to know to get started with web scraping in python.
By the end of this course, you will have a complete toolset to conceptualize and implement scraping agents for any website you can imagine.
See you inside!
Who this course is for:
- Anyone who wants to learn how to collect data from the web programmatically
- Students with or without web scraping experience looking to level up
- Complete beginners with no experience
Created by Andy Bek
Last updated 9/2023