54,000+ Horse Race Database
- This topic has 13 replies, 5 voices, and was last updated 12 years, 1 month ago by TheBluesBrother.
September 10, 2012 at 14:23 #22601
I have put together an Excel .xlsx database of 54,000+ lines covering every winner in GB and Ireland from 1/1/2005 up to 5/9/12.
You will need Excel 2007 or 2010 to read the .xlsx file (5.8MB), or an Excel .xlsx reader/converter.
The database contains: race date, country code, course name, pattern class, race distance, race title, horse name, horse age, winner time, time/secs, weight, BHA ratings, RPR/Raceform ratings and official going.
This might be useful to somebody who wants to compile their own standard times.
http://tinyurl.com/btco724
EDIT: I am working my way through the list slowly, correcting the lines that are out of sync.
September 10, 2012 at 15:05 #412647
Just taken a look at this; one big problem is that you can’t distinguish between Maidens and Handicaps.
September 10, 2012 at 15:14 #412649
I didn’t know you could export such a big sample. I tried to export every trainer in cats A-D, but RI kept saying "overflow" and crashing. Do I need to do something different for big samples?
September 10, 2012 at 15:26 #412650
I will amend the file to add race types.
I omitted the official going which has just been added.
September 10, 2012 at 15:40 #412654
Thank you The Blues Brother. Maybe add the weight of the winner and the finishing positions of the other runners if you can; then we can predict what sort of time a horse needs to run to finish in each place.
Now let’s get to work on making some reliable standards. I am going to start with Lingfield AW.
September 10, 2012 at 16:13 #412661
Just added the race title to the file.
http://tinyurl.com/btco724 (5.8MB)
I will add the weight of the winner to the file, but not the finishing positions of the other runners, as the file would end up being massive.
September 10, 2012 at 17:11 #412674
Just amended the file to show the winners’ weights.
http://tinyurl.com/btco724 (5.8MB)
September 10, 2012 at 17:31 #412678
Do you know how to project what time a horse should be running on the basis of its BHA rating?
I’ve just quickly done Ascot, but you seem to have only 3 runnings of the King’s Stand despite the data going back to 2005?
Anyhow, this is how you do it: filter on a distance you’d like to use, copy the Time/Secs column alongside the BHA rating from the data file, and put the data into SPSS.
Go to Variable View and make sure both variables are "Numeric" and measured as "Scale". Then click Data > Define Variable Properties > BHA > (un-tick "Limit number of values displayed to") Continue > tick Missing on the value "0", then OK.
Then find Analyze at the top > Regression > Curve Estimation > BHA goes into Independent Variable and Time goes into Dependent Variable > tick Linear > ensure "Plot models" and "Constant in equation" are ticked > Save > Predicted values.
Go to Data View, find the new column beside your BHA figures, copy it into Excel, sort in ascending order, and you get something like this.
BHA  Predicted time (secs)
116 59.80
114 59.88
113 59.92
110 60.04
108 60.12
103 60.31
102 60.35
102 60.35
101 60.39
101 60.39
101 60.39
100 60.43
98 60.51
98 60.51
98 60.51
98 60.51
96 60.59
96 60.59
96 60.59
95 60.62
94 60.66
94 60.66
94 60.66
92 60.74
89 60.86
88 60.90
88 60.90
88 60.90
88 60.90
85 61.02
85 61.02
83 61.09
83 61.09
82 61.13
81 61.17
79 61.25
79 61.25
77 61.33
74 61.45
73 61.48
72 61.52
71 61.56
71 61.56
September 10, 2012 at 20:41 #412701
Good stuff BB.
Is the BHA rating the rating awarded for that race or the rating going into that race (i.e. its best rating to that point)?
September 11, 2012 at 05:38 #412745
It would be the BHA rating going into the race.
September 11, 2012 at 05:49 #412746
Nice work here. I might download the latest version of SPSS and have a go myself.
The problem with compiling Ascot standard times comes with the mile races: there are two race distances, one on the straight course and the other on the round course.
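For anyone without SPSS, the linear fit described earlier in the thread can be sketched in Python with NumPy. This is only a rough equivalent under stated assumptions, not the SPSS procedure itself: the function names are made up for illustration, the first five sample pairs are copied from the predicted values posted in the thread (so they already lie almost exactly on a line), and the final row with a rating of 0 is a made-up placeholder that shows how unrated winners are skipped. Real input would be the raw Time/Secs and BHA columns for one course and distance.

```python
import numpy as np

def fit_time_on_rating(ratings, times):
    """Fit time = intercept + slope * rating by least squares.

    Rows with a rating of 0 are dropped, mirroring the step in the
    thread where 0 is marked as a missing value.
    """
    ratings = np.asarray(ratings, dtype=float)
    times = np.asarray(times, dtype=float)
    keep = ratings != 0
    # polyfit with degree 1 returns [slope, intercept]
    slope, intercept = np.polyfit(ratings[keep], times[keep], 1)
    return float(slope), float(intercept)

def predict_time(rating, slope, intercept):
    """Predicted winning time in seconds for a given BHA rating."""
    return intercept + slope * rating

# First five pairs: predicted values quoted in the thread.
# Last pair: hypothetical unrated winner (rating 0), ignored by the fit.
ratings = [116, 110, 100, 88, 71, 0]
times = [59.80, 60.04, 60.43, 60.90, 61.56, 62.00]

slope, intercept = fit_time_on_rating(ratings, times)
print(f"seconds per rating point: {slope:.4f}")
print(f"predicted time at 100: {predict_time(100, slope, intercept):.2f}")
```

Because Ascot's straight and round miles are different trips, each would need its own fit rather than being pooled under one distance.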
September 11, 2012 at 11:14 #412772
Great data BB. Thanks.
On this particular bit of the topic:
Surely if you want to predict Time, then Time should be the independent variable and BHA should be the dependent variable? Also, for the sake of precision, why not first find the coefficient of correlation between the two variables to check the strength of the relationship? Then look at the equivalent coefficient using more than one dependent variable, most probably BHA and Official Going, and produce a multiple regression analysis that predicts Times from BHA and Official Going. Otherwise, Times predicted from BHA rating alone will be based on the average Official Going, giving a Time that holds only for average going, which is probably not what you want if the race in question is run on non-average Official Going.
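A note on labels before the sketch: in the usual regression terminology, the quantity being predicted (Time) is the dependent variable and the predictors (BHA, Official Going) are the independent variables. With that labelling, the two suggestions above, a correlation check followed by a multiple regression on rating and going, might look roughly like this in Python. Everything here is an assumption for illustration: the function names are hypothetical, going is encoded as 0/1 indicator columns against a baseline level, and the sample rows are synthetic, generated from a made-up formula purely so the code runs. They are not taken from the database.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def fit_time_on_rating_and_going(ratings, goings, times):
    """Least-squares fit of time on rating plus going indicator columns.

    The alphabetically first going level is the baseline; every other
    level gets a coefficient measuring its offset from that baseline.
    """
    ratings = np.asarray(ratings, dtype=float)
    times = np.asarray(times, dtype=float)
    levels = sorted(set(goings))
    baseline, others = levels[0], levels[1:]
    columns = [np.ones_like(ratings), ratings]
    for level in others:
        columns.append(np.array([1.0 if g == level else 0.0 for g in goings]))
    design = np.column_stack(columns)
    coefs, *_ = np.linalg.lstsq(design, times, rcond=None)
    names = ["intercept", "rating"] + [f"going={level}" for level in others]
    return baseline, {name: float(c) for name, c in zip(names, coefs)}

def predict_time(model, rating, going):
    """Predicted time for a rating on a given going (baseline adds nothing)."""
    return (model["intercept"] + model["rating"] * rating
            + model.get(f"going={going}", 0.0))

# Synthetic rows from a made-up formula, for illustration only.
ratings = [70, 80, 90, 100, 110, 75, 85, 95, 105]
goings = ["Good"] * 5 + ["Soft"] * 4
times = [62.0 - 0.04 * r + (1.5 if g == "Soft" else 0.0)
         for r, g in zip(ratings, goings)]

r_value = pearson_r(ratings, times)
baseline, model = fit_time_on_rating_and_going(ratings, goings, times)
print(f"correlation of rating with time: {r_value:.3f}")
print(f"baseline going: {baseline}, coefficients: {model}")
```

On real data, a going level with only a handful of races would give an unreliable offset, so sparse levels may need merging before fitting.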
September 11, 2012 at 11:27 #412775
Quote: I didn’t know you could export such a big sample. I tried to export every trainer in cats A-D, but RI kept saying "overflow" and crashing. Do I need to do something different for big samples?
Do your project in blocks of about 20,000 lines and you will have no problem with a stack overflow. Raceform Interactive’s memory cannot handle any more.
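As a small planning aid, the block boundaries for a big export can be worked out ahead of time. This sketch assumes nothing about Raceform Interactive itself (the export is still done by hand in the program); the function name is hypothetical, and 54,000 is used only as a round figure for the size of this database.

```python
def block_ranges(total_lines, block_size=20_000):
    """Return (start, end) line ranges, end exclusive, covering total_lines."""
    return [(start, min(start + block_size, total_lines))
            for start in range(0, total_lines, block_size)]

ranges = block_ranges(54_000)
for start, end in ranges:
    print(f"export lines {start + 1} to {end}")
```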
October 4, 2012 at 04:39 #415131
This database has now been cleaned up and the rows that were out of sync have been corrected.
The database has been updated to cover 1/1/2005 to 1/10/2012.