50,000+ Horse Race Database
November 26, 2010 at 16:17 #16869
I have downsized the original 150,000 horse race database to a more manageable 50,000+ race file (4MB), and saved it as a .xls file so that old versions of Excel (i.e. Excel 97 and later) can open it.
The data covers 2006 to 2010 and was originally put together to produce some standard times and to learn web scraping.
It will be updated on a regular basis if anybody is interested.
Download: http://tinyurl.com/23yymyn
December 10, 2010 at 10:22 #331886
Hi, yes, I’m interested. Updates would be good.
Thanks for making this available. I’ve been looking through it and it would seem there are quite a few errors. The extra-fast and extra-slow times are easy enough to spot, but mistakes within normal parameters aren’t. If possible (!) it might be a good idea to scrape Racing Post (RP) data and compare the two.
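A rough sketch of that comparison in Python, assuming each scrape has been reduced to a dictionary keyed by (date, course, off time) with the winning time in seconds. The keys, the times and the 0.5 second tolerance are all made up for illustration; nothing here comes from the actual spreadsheet.

```python
def compare_times(source_a, source_b, tolerance=0.5):
    """Return races present in both sources whose times differ by more than `tolerance` seconds."""
    mismatches = {}
    # Only races found in both scrapes can be compared.
    for key in source_a.keys() & source_b.keys():
        if abs(source_a[key] - source_b[key]) > tolerance:
            mismatches[key] = (source_a[key], source_b[key])
    return mismatches


# Hypothetical sample data: (date, course, off time) -> winning time in seconds.
sporting_life = {
    ("2010-12-10", "Cheltenham", "13:30"): 245.3,
    ("2010-12-10", "Cheltenham", "14:05"): 98.4,
    ("2010-12-10", "Doncaster", "12:50"): 310.0,
}
racing_post = {
    ("2010-12-10", "Cheltenham", "13:30"): 252.3,
    ("2010-12-10", "Cheltenham", "14:05"): 98.5,
}

suspect = compare_times(sporting_life, racing_post)
for race, (time_a, time_b) in sorted(suspect.items()):
    print(race, time_a, time_b)
```

A mismatch only says the two sources disagree, not which one is wrong, so flagged races would still need checking by hand.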
The date format is a little awkward; if it were along the lines of 2010-12-10 (for example), it would make re-ordering easier and enable a unique numbering of each entry (per spreadsheet), which is useful when correcting mistakes.
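Something like the following Python would do the conversion and numbering. The source format here (day/month/year, e.g. 10/12/2010) is only a guess at what the spreadsheet uses, and the `date` and `id` field names are hypothetical; change `source_format` to match the real column.

```python
from datetime import datetime


def to_iso(date_text, source_format="%d/%m/%Y"):
    """Convert a date string to YYYY-MM-DD. The default source format is an assumption."""
    return datetime.strptime(date_text, source_format).date().isoformat()


def number_rows(rows, source_format="%d/%m/%Y"):
    """Sort rows by date and give each one a unique id starting at 1."""
    ordered = sorted(rows, key=lambda row: to_iso(row["date"], source_format))
    return [
        {**row, "date": to_iso(row["date"], source_format), "id": index}
        for index, row in enumerate(ordered, start=1)
    ]


# Hypothetical rows for illustration only.
rows = [
    {"date": "10/12/2010", "course": "Cheltenham"},
    {"date": "26/11/2010", "course": "Newbury"},
]
numbered = number_rows(rows)
for row in numbered:
    print(row["id"], row["date"], row["course"])
```

Because YYYY-MM-DD strings sort the same way as the dates they represent, a plain text sort in the spreadsheet would then put the races in date order.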
Probably the most serious omission from the data is the class of race; I’m not sure how you can make accurate standard times without it. If you take an average from all races, you are not accounting for the different average class peculiar to each course. It would also be useful (at least from a research perspective) to have age, weight, official rating (OR) and winning distance data (if possible!).
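To illustrate the point about class, here is a minimal Python sketch that groups times by course, distance and class before averaging. It is not how the original standard times were produced; the median is just one reasonable choice, and the field names (`course`, `distance_f`, `race_class`, `time_s`) and the sample figures are invented.

```python
from collections import defaultdict
from statistics import median


def standard_times(races):
    """Median winning time for each (course, distance, class) group."""
    groups = defaultdict(list)
    for race in races:
        key = (race["course"], race["distance_f"], race["race_class"])
        groups[key].append(race["time_s"])
    return {key: median(times) for key, times in groups.items()}


# Hypothetical races: the same course and distance, but two different classes.
races = [
    {"course": "Ascot", "distance_f": 5, "race_class": 1, "time_s": 60.0},
    {"course": "Ascot", "distance_f": 5, "race_class": 1, "time_s": 62.0},
    {"course": "Ascot", "distance_f": 5, "race_class": 1, "time_s": 61.0},
    {"course": "Ascot", "distance_f": 5, "race_class": 5, "time_s": 63.0},
    {"course": "Ascot", "distance_f": 5, "race_class": 5, "time_s": 65.0},
]

by_class = standard_times(races)
for key, time_s in sorted(by_class.items()):
    print(key, time_s)
```

Pooling all five races would give a single figure that depends on how many races of each class the course happens to stage, which is the bias described above.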
Whatever, it’s an admirable and inspirational project. I wish you well.
December 10, 2010 at 16:47 #331941
Yes, you are right about there being mistakes. If you click on the link for any race it will take you to the Sporting Life archive race; there is nothing wrong with the scraping, it’s a fault in the Sporting Life database.
I have dropped the project anyway, as web scraping the Sporting Life data is a waste of time.
December 10, 2010 at 19:56 #331970
Anonymous
Don’t give up, there were only a few minor errors!
Have you tried the Racing Post?