Web scraping considerations



Data tidying and importing

Data Science with R

Ethics

“Can you?” vs “Should you?”

Challenges

Unreliable formatting

Data broken into many pages

Workflow

A new R workflow

  • When working in a Quarto document, your analysis is re-run each time you render

  • If you scrape the web inside a Quarto document, you re-scrape the data each time you render, which is undesirable (and not nice to the site you are scraping)!

  • An alternative workflow:

    • Put your scraping code in an R script
    • Save the interim data scraped by that script as CSV or RDS files
    • Load the saved data in your analysis in your Quarto document
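
The alternative workflow above could look something like the sketch below. It is only an illustration, not a prescribed implementation: the function names `scrape_titles()` and `scrape_once()`, the URL, the CSS selector, and the file path are all hypothetical placeholders, and it assumes the rvest package is installed.

```r
# scrape.R: run this script by hand, not from the Quarto document.

# Hypothetical scraper: the URL and the "h2" selector are placeholders,
# not taken from any real site used in this course.
scrape_titles <- function(url = "https://example.com/articles") {
  page <- rvest::read_html(url)
  titles <- rvest::html_text2(rvest::html_elements(page, "h2"))
  data.frame(title = titles)
}

# Scrape only when no saved copy exists; otherwise reuse the file on disk.
# `scrape_fn` is any zero-argument function that returns the scraped data.
scrape_once <- function(path, scrape_fn) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    saveRDS(scrape_fn(), path)
  }
  readRDS(path)
}
```

In the script you would call `scrape_once("data/titles.rds", scrape_titles)` once; the Quarto document then only needs `titles <- readRDS("data/titles.rds")`, so rendering never touches the website. Deleting the RDS file is the explicit signal to scrape again.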