Crawl data from the selected results provider
- Discover events, e.g. from https://www.sportsoft.cz/en/results
  - input: year
  - output: dataset of events with parameters such as name, date, location, sport type, and URL of the detail page
- For each event, crawl the results
  - input: URL of the event's detail page (from the events dataset above)
  - output: dataset with all results: sub-event name, rank, name (possibly with some unique identifier?), nationality, year of birth, club, category, and time
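As a rough illustration of the two output datasets described above, here is a minimal sketch using only the Python standard library. The class names, field names, field types, and the `to_csv` helper are assumptions made for illustration; the task only lists which parameters should be captured. The sample event in the usage note is made up, not real data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Event:
    """One row of the events dataset (field names are assumptions)."""
    name: str
    date: str        # kept as text; the provider's date format is not assumed
    location: str
    sport_type: str
    detail_url: str  # input for the results crawl


@dataclass
class Result:
    """One row of the results dataset (field names are assumptions)."""
    sub_event: str
    rank: Optional[int]           # None when no rank is listed (assumption)
    name: str
    nationality: str
    year_of_birth: Optional[int]  # None when not published (assumption)
    club: str
    category: str
    time: str                     # kept as text, exactly as shown on the page


def to_csv(rows, row_type) -> str:
    """Serialize dataclass rows to CSV text, one column per dataclass field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(row_type)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

For example, `to_csv([Event("Sample Run", "2024-05-01", "Prague", "running", "https://example.com/event/1")], Event)` yields a header row followed by one data row. Keeping `date` and `time` as plain text avoids guessing the provider's formats at the prototype stage.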
Recommended tools:
- Selenium
- Beautiful Soup
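A minimal sketch of how Beautiful Soup might be used for the event-discovery step. The HTML below is a made-up sample: the `events` table class, the column order, and the `parse_events` helper are all assumptions, not the actual markup of the results provider, which would need to be inspected first.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page layout may differ.
SAMPLE_HTML = """
<table class="events">
  <tr><th>Date</th><th>Name</th><th>Location</th><th>Sport</th></tr>
  <tr>
    <td>2024-05-01</td>
    <td><a href="/en/results/1">Sample Run</a></td>
    <td>Prague</td>
    <td>running</td>
  </tr>
</table>
"""


def parse_events(html, base_url):
    """Extract event rows from the (assumed) events table in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="events")
    if table is None:
        return []
    events = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 4:
            continue  # skip the header row and any malformed rows
        link = cells[1].find("a")
        events.append({
            "date": cells[0].get_text(strip=True),
            "name": cells[1].get_text(strip=True),
            "location": cells[2].get_text(strip=True),
            "sport_type": cells[3].get_text(strip=True),
            # Resolve relative links against the listing page URL.
            "url": urljoin(base_url, link["href"]) if link else None,
        })
    return events


events = parse_events(SAMPLE_HTML, "https://www.sportsoft.cz/en/results")
```

If the page is rendered with JavaScript, the `html` argument could come from Selenium's `driver.page_source` after `driver.get(url)`; for static pages, Beautiful Soup alone may be enough.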
You do NOT have to focus on creating production-grade code; a prototype that works on your machine is OK for now.
Edited by Matej Mojzeš