scrapeRateProfs
Python Module for scraping reviews from RateMyProfessor.
This module scrapes reviews of professors at a given university from ratemyprofessor.com.
It does this in the following steps:
Step 1: Get the list of professors whose names start with each letter.
Step 2: Get the list of all reviews for each professor.
Step 3: Get each review from that list.
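The three steps can be sketched as nested loops. This is only an illustration of the control flow, not the module's actual code: the function name scrape_school and the three callables passed into it are hypothetical stand-ins for whatever the module uses to fetch and parse pages.

```python
import string


def scrape_school(list_professors, list_review_pages, parse_reviews,
                  letters=string.ascii_uppercase):
    """Yield (professor, review) pairs by walking the three steps in order.

    All three callables are hypothetical placeholders:
      list_professors(letter)  -> iterable of professors for that letter
      list_review_pages(prof)  -> iterable of review pages for that professor
      parse_reviews(page)      -> iterable of reviews found on that page
    """
    for letter in letters:
        # Step 1: professors whose names start with this letter.
        for prof in list_professors(letter):
            # Step 2: the list of review pages for this professor.
            for page in list_review_pages(prof):
                # Step 3: each individual review on that page.
                for review in parse_reviews(page):
                    yield prof, review
```

Passing the fetch and parse functions in as arguments keeps the loop structure separate from any assumptions about the site's page layout.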
It can also be interrupted and (effectively) resume from where it stopped without having to redownload all the previous files. It does this by storing the web pages it has downloaded and compacting the downloaded pages into a format that is easy to process.
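A minimal sketch of how storing downloaded pages makes a run resumable, assuming one cache file per URL. The function fetch_cached, the SHA-1 file naming, and the download callable are all hypothetical and are not taken from the module itself.

```python
import hashlib
import time
from pathlib import Path


def fetch_cached(url, cache_dir, download, delay=0.0):
    """Return the page body for url, downloading it only if it is not on disk.

    download is a hypothetical callable taking a URL and returning text.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one file per URL, named by the URL's SHA-1 digest.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    cached = cache_dir / name
    if cached.exists():
        # Already downloaded in an earlier (possibly interrupted) run.
        return cached.read_text(encoding="utf-8")
    body = download(url)
    # Write to a temporary file first so an interruption cannot leave
    # a partially written page that looks complete.
    tmp = cached.with_name(name + ".part")
    tmp.write_text(body, encoding="utf-8")
    tmp.replace(cached)
    # Pause only after a real download, never after a cache hit.
    time.sleep(delay)
    return body
```

With this pattern, restarting after an interruption only costs the time needed to read the pages already on disk.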
Usage:
scrapeRateProfs.py [-h] -sid SchoolID [-delay DELAY] -o OUTPUT -path PATH
Inputs:
-h, --help show this help message and exit
-sid SchoolID ID of the school on RateMyProf
-delay DELAY Amount of time to pause after downloading a website
-o OUTPUT Path to output file for reviews
-path PATH Directory where the webpages should be downloaded
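The usage line above could be produced by an argparse setup along these lines. This is a sketch reconstructed from the help text only: the function name build_parser, the argument types, the default delay, and the assumption that the delay is in seconds are not documented in the original.

```python
import argparse


def build_parser():
    """Build a parser matching the documented usage line (a reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="scrapeRateProfs.py",
        description="Python Module for scraping reviews from RateMyProfessor.",
    )
    parser.add_argument("-sid", metavar="SchoolID", required=True,
                        help="ID of the school on RateMyProf")
    # Assumption: delay is a number of seconds; the real default is not documented.
    parser.add_argument("-delay", type=float, default=1.0,
                        help="Amount of time to pause after downloading a website")
    parser.add_argument("-o", dest="output", metavar="OUTPUT", required=True,
                        help="Path to output file for reviews")
    parser.add_argument("-path", required=True,
                        help="Directory where the webpages should be downloaded")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.sid, args.delay, args.output, args.path)
```

The school ID, output file, and download directory are marked required because the usage line shows them without square brackets, while -delay is optional.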
Key Outputs:
- TSV file containing the review information.
- TSV file containing the aggregate information for each professor.
- Condensed set of the downloaded information.
The formats are described in the accompanying README.