2000 Guineas

My contribution here. http://www.chef-de-race.com/dosage/classics/2009/2009_2000_guineas_preview.htm Also in today's RU under different headline.

I keep an open mind on dosage, Steve, and have used it to good effect before now, but I'm struggling a bit with the logic you're applying in terms of the mathematical integrity, and your incorporation of 'outliers' in the arithmetic mean. If you're using a moving average (which you appear to be) then extreme values will disproportionately impact on it, and it could become quite volatile as they fall off the sample period. There are clearly different ways of calculating an average, and I would have thought something closer to a mode would have been more informative in helping you identify the cluster in a histogram (or sweet spot, if you like).
 
Yes, quite so. This is how we identify aptitudinal type and eventually place them in the scale of categories ranging from Brilliant through Intermediate, Classic, Solid and Professional, if there is sufficient evidence of prepotency at a given distance for the progeny of a given stallion.

However, for simplicity, a general readership can appreciate a mean average from recent performances and then be shown how an ideal type may deviate from the mean. A DI either side of 2 is a good rule of thumb for a typical 2,000 Guineas winner. This functions as a usable and surprisingly accurate indicator for this particular race.
 
I disagree actually, Steve, as I think he (like any other trainer) has an obligation to the sport, its development, and its reputation. That's not to say that he has to take the public onside regarding his every thought, but he's taking a perverse pleasure from seemingly sticking the media away, and the paying customers/punters are getting caught in the cross-fire.

We live on planet Football, and racing is hardly in a position to routinely alienate its followers in the same way football can get away with. The authorities need to explore Bolger's behaviour a bit more closely and see what sanctions they might be able to invoke, or introduce, to prevent this kind of damaging activity.

...It irks me too. I can stomach studied reticence; it's misinformation I have a problem with.
 
But didn't the misinformation come 3 weeks ago?

That would make the non-declaration of Intense Focus just reticence then, no?
 
People have different ideas about Bolger. I ignore him as hanging anything on what he says is dangerous. I prefer to concentrate on the fundamentals of the horse and create a logical scenario for myself e.g. what would I do if I owned the horse. It seems to work better than waiting for a steer from him.

I figured that the Derby was the race for New Approach, for example, and along with some others ignored the untruth that he had been left in the race "by mistake" (a fat lie if ever there was one).

I normally wouldn't back a Bolger horse for anything ante post. But sometimes you have to trust your gut and ignore the misdirection.
 
I'm not talking about the dosage profile Steve, I'm more interested in exploring the idea of using the arithmetic mean of the DI's as an indicator as to where you should source the winner from.

Let me give you a hypothetical example as it would still illustrate the dangers of using outliers to set an average

DI's of any race winners

1.30
1.29
1.28
1.32
1.26
1.30
1.31
1.36
1.31
4.33

The arithmetic mean would be 1.61 (rounded to 2 decimal places), yet all but one of your winners fall below that average. If you plotted them on a chart, your sweet spot would be nowhere near the 1.61 that your mean would have you looking for. Instead you should surely be throwing out the top and bottom decile and setting your average based on a median, or even something closer to a mode. Or even a mean of the most popular modal scores by cluster.

Essentially it's not dosage that's the issue, but rather how you're applying the maths to use the figures. By putting everything on the mean there's a danger that you end up looking in the wrong direction I'd have thought

If you're using a 10 year moving average, then the winner who prevailed at 4.33 would fall out of the sample at the next renewal and, depending on what the new winner's figure was, your average would alter dramatically all of a sudden because you'd incorporated 'outliers'.
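
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the hypothetical example above. It compares the plain mean with the median, the most common values and a trimmed mean (top and bottom value discarded), then shows what happens to a ten-year average once the 4.33 winner leaves the window. The `trimmed_mean` helper and the replacement winner's DI of 1.30 are assumptions made purely for illustration; they are not part of anyone's published method.

```python
from statistics import mean, median, multimode

# Hypothetical winners' DIs, copied from the post above.
dis = [1.30, 1.29, 1.28, 1.32, 1.26, 1.30, 1.31, 1.36, 1.31, 4.33]


def trimmed_mean(values, drop=1):
    """Mean after discarding the `drop` lowest and `drop` highest values."""
    ordered = sorted(values)
    return mean(ordered[drop:len(ordered) - drop])


plain_mean = mean(dis)
middle = median(dis)
trimmed = trimmed_mean(dis)  # with 10 values, drop=1 removes the top and bottom decile
modes = multimode(dis)       # every value tied for most frequent

# Moving-average effect: the 4.33 winner leaves the ten-year window and a
# new winner arrives. The 1.30 figure is an assumption for illustration only.
next_window = [d for d in dis if d != 4.33] + [1.30]
next_mean = mean(next_window)

print(f"mean {plain_mean:.2f}, median {middle:.3f}, trimmed mean {trimmed:.2f}")
print("most common values:", modes)
print(f"mean after the outlier drops out: {next_mean:.2f}")
```

On these figures the median and trimmed mean both sit at about 1.31, inside the cluster, while the plain mean is dragged up to 1.61 by the single outlier and falls back to about 1.30 as soon as that winner drops out of the window.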
 
Yes I understood you the first time. I'm using a mean for practical purposes then showing how a typical winner deviates from that mean.

You're right this isn't a Dosage issue but a simple means of expressing what a typical winner looks like. A typical winner tends to fall between the mean (about 1.6) and up to around DI 2, which is more typical of an ideal type.
 
I can't see how it would have drawn you onto Haafhd then? (as the copy mentions it did).

If you were using an 11 year moving average then 1.65 would still have been your mean, but almost exclusively because of Zafonic's outlying value of 4.20 corrupting the dataset. Normal practice would be to remove extreme values and just accept their occasional occurrence as random distribution that happens in most datasets, but is not typical of the profile. With Zafonic removed, your arithmetic mean would be 1.55.

If you look at the current figures that you used in the article, then you've got another outlier in 'Golan' albeit the opposite end of the scale this time. You do however have quite a tight cluster of 8 winners between 2.33 and 1.57.

72% is probably quite good and I think most people would accept that. If you put them on an x/y axis with CDs on one axis and DIs on the other there's a distinct pattern. However if you draw the line at 1.65 then 7 of your values are above it, and 4 below it, which shouldn't happen, and suggests that the mean is too low. This is largely down to the outlying values of Golan, and to a lesser extent Footstepsinthesand and Refuse to Bend.

This isn't unusual though, in so far as roughly 68% of values should theoretically fall within 1 standard deviation of the mean (34% either side, assuming a normal distribution), and these outliers would probably be picked up by the second standard deviation, which encompasses roughly 95%.

Funnily enough, if you concentrated purely on the area where the density of the plot is (something which would be more akin to a mode than a mean) then your actual mean of these values is 1.92, which encompasses a high of 2.33 (Haafhd) and a low of 1.57 (Island Sands), which is of course pretty well where you said it should be.

As I said, you can plot these on a graph and almost do a dot to dot exercise that gives you quite a tight little sweet spot; much more so than this one that allows you to move across a much wider range at your own discretion.

It wouldn't materially alter your selection incidentally, but the qualifiers would be:

Mastercraftsman
Oquba
Set Sail
Himalaya
and Monitor Closely

It's not the legitimacy of dosage I'm querying, but rather whether it's right to use the arithmetic mean, and whether or not a better hot spot couldn't be generated, or even a hot spot and a warm spot. Mind you, there's always the issue of a weighted index too, which might reflect changes in breeding patterns. Dosage itself accommodates the principle.

I can't decide if the maths is corrupting the methodology, or the methodology corrupting the maths. Having said that, if it ain't broke, don't fix it.
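
The 7-above, 4-below split and the 1.92 cluster mean can be checked with a short Python sketch. It uses the eleven winners' DIs as they are listed (with the same shortened names) later in this thread; the 1.57 to 2.33 cluster bounds are the ones quoted above. Treat it as a rough check under those assumptions rather than anyone's official calculation.

```python
from statistics import mean, median

# Winners' DIs as listed later in the thread (names shortened as posted there).
winners = {
    "Henry": 1.92, "Cockney": 1.91, "George": 1.67, "Footsteps": 1.08,
    "Haafhd": 2.33, "Refuse": 1.05, "Rock": 2.16, "Golan": 0.60,
    "Kings B": 2.06, "Island": 1.57, "King of K's": 1.78,
}

overall_mean = mean(list(winners.values()))
overall_median = median(list(winners.values()))

# Which winners sit either side of the plain mean?
above = [name for name, di in winners.items() if di > overall_mean]
below = [name for name, di in winners.items() if di < overall_mean]

# The dense cluster described above: DI from 1.57 up to 2.33 inclusive.
cluster = {name: di for name, di in winners.items() if 1.57 <= di <= 2.33}
cluster_mean = mean(list(cluster.values()))

print(f"mean {overall_mean:.2f}, median {overall_median:.2f}")
print(f"{len(above)} above the mean, {len(below)} below")
print(f"cluster of {len(cluster)} winners, mean {cluster_mean:.3f}")
```

The median of all eleven (1.78) lands between the plain mean and the cluster mean without needing anything thrown out, which is one reason it is often preferred when a sample has a few extreme values.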
 
I'm sure Godolphin indicated that Shaweel was doubtful, and even mentioned something to do with giving him time to re-acclimatise, which I thought was interesting (if they did) as this would represent something of a departure from their previous policy, but it seems to have gone largely unremarked on.

Yesterday they said he was doubtful, not before.

Yesterday was also the earliest I'd heard anything official and, due to personal interest, I've been keeping an eye out for any news on whether he was likely to run. Having said that, I held off from backing him once he was entered for the Greenham, as his disappointing run there made me think he might not line up on Saturday. IF was also beaten on his reappearance so maybe the signs were there? Admittedly I'm not sure if Bolger said IF would run at Newmarket after his seasonal reappearance :confused:
 
Can our Irish friends tell us the correct pronunciation of Gan Amhras?

I've been presuming it's 'Gan Avras' on the basis that 'mh' in Scottish Gaelic tends to be pronounced 'v'.
 
Haafhd's DI is just over 2. He conformed to a long-term average which I know to be around DI 2 and to our ballpark figure of a typical winner which is also around DI 2, just as the Derby is around DI 1.

The group of past winners that I’ve used in this year’s analysis is really no more than for illustrative purposes. It gives anyone unfamiliar with the system a handle on the sort of horse that typically wins. Not exactly dumbing down but the sort of illustrative device that an Editor requires.

When Golan drops out of this selection it will revert to its long term average of closer to DI 2 once more (assuming that another significant outlier doesn’t take his place). So it’s a way in to illustrating what sort of colt generally does well in terms of stamina blend.

Consider the following table:

2007
COLT (SIRE/DAM SIRE)                           DP                 DI     CD
Mofarij (Bering/Nureyev)                        3- 0-17-4-2 = 26  0.79  -0.08
Tobosa (Tobougg/Royal Academy)                  6- 1- 7-4-4 = 22  0.91   0.05
Eagle Mountain (Rock Of Gibraltar/Darshaan)     2- 2-11-1-2 = 18  1.12   0.06
Teofilo (Galileo/Danehill)                      4- 2-12-4-0 = 22  1.20   0.27
Truly Royal (Noverre/Polish Precedent)          6- 2-13-1-4 = 26  1.26   0.19
Diamond Tycoon (Johannesburg/Last Tycoon)       2- 1- 6-1-0 = 10  1.50   0.40
Strategic Prince (Dansili/Diesis)               3- 2- 7-1-1 = 14  1.55   0.36
Vital Equine (Danetime/Selkirk)                 3- 2-10-1-0 = 16  1.67   0.44
Duke Of Marmalade (Danehill/Kingmambo)          8- 4-23-0-1 = 36  1.88   0.50
Cockney Rebel (Val Royal/Known Fact)            6- 1- 7-2-0 = 16  1.91   0.69
US Ranger (Danzig/Red Ransom)                   7-12-26-0-1 = 46  2.29   0.52
Adagio (Grand Lodge/Cadeaux Genereux)           6-10- 8-4-0 = 28  2.50   0.64
Mount Nelson (Rock Of Gibraltar/Selkirk)        5- 2- 9-0-0 = 16  2.56   0.75
Prime Defender (Bertolini/Superlative)          6- 5-11-0-0 = 22  3.00   0.77
Yellowstone (Rock Of Gibraltar/Exclusive Era)   7- 4-11-0-0 = 22  3.00   0.82
Major Cadeaux (Cadeaux Genereux/Woodman)        6- 1- 7-0-0 = 14  3.00   0.93
Haatef (Danzig/Mr Prospector)                  15-13-24-0-0 = 52  3.33   0.83
Dutch Art (Medicean/Spectrum)                   7- 0- 5-0-0 = 12  3.80   1.17
Halicarnassus (Cape Cross/Relaunch)             6- 4- 6-0-0 = 16  4.33   1.00

Look at the three together in the middle just below DI 2 down to our mean of DI 1.6 then recall the result. Three of the first four were contained in this quite narrow band. This also appears to be the case in most years. Exceptions will inevitably occur in any system. All I’m doing in this analysis is identifying the sort of colt that typically does well. Some find this useful.
 
It's too many years since I did standard deviation (and even then I'm sure we used a calculator and I've forgotten the button punch sequence). There must be someone out there who can do it though?

If we say the arithmetic mean is 1.65 based on the following:

Henry = 1.92
Cockney = 1.91
George = 1.67
Footsteps = 1.08
Haafhd = 2.33
Refuse = 1.05
Rock = 2.16
Golan = 0.60
Kings B = 2.06
Island = 1.57
King of K's = 1.78

Then my understanding is that one standard deviation dictates that 68% of the values that make up the mean fall within this deviation? That's 34% either side of the mean on the dataset?

What would that spread be? And who would be the qualifiers on this year's entries? (Well, I could do the second bit myself once I know the spread.)

The next thing I'd be curious to explore is if the outliers in bold (Footsteps, Refuse and Golan) were removed; then we have a much tighter grouping that's supplied 72% of the winners but with 8 values instead of 11. Consequently the new mean is now 1.92, as the outliers are low now that Zafonic's dropped off the sample.

What would the new spread be according to 1 standard deviation now, and who would qualify on this stricter criteria?

I'm sure it could be done on a chart, (to help visualise it) and I'm equally sure it's a 5 minute job to someone who knows how to do it.

Anyone with the maths and or IT skills? (or more probably, the tragic inclination to work it out for me please?)
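
A rough answer to the spread question, as a minimal Python sketch using only the eleven DIs listed above. It assumes the sample standard deviation (dividing by n - 1, `statistics.stdev`); swapping in `statistics.pstdev` gives the slightly narrower population version. The `one_sd_band` helper is a name made up for this example. This year's contenders' DIs are not listed in this post, so the qualifiers would still have to be checked by hand against the printed bands.

```python
from statistics import mean, stdev

# DIs of the eleven winners listed above, in the same order.
all_dis = [1.92, 1.91, 1.67, 1.08, 2.33, 1.05, 2.16, 0.60, 2.06, 1.57, 1.78]

# The same list with the three low outliers (Footsteps, Refuse, Golan) removed.
tight_dis = [d for d in all_dis if d not in (1.08, 1.05, 0.60)]


def one_sd_band(values):
    """Return (low, high): the mean minus and plus one sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1); an assumption, see the note above
    return centre - spread, centre + spread


band_all = one_sd_band(all_dis)
band_tight = one_sd_band(tight_dis)
inside_all = [d for d in all_dis if band_all[0] <= d <= band_all[1]]

print(f"all 11 winners: DI {band_all[0]:.2f} to {band_all[1]:.2f}")
print(f"{len(inside_all)} of {len(all_dis)} winners fall inside that band")
print(f"8-winner cluster: DI {band_tight[0]:.2f} to {band_tight[1]:.2f}")
```

With only eleven values, and a distribution that is clearly not bell-shaped, the 68% figure is a guide at best; the bands are better read as a description of where past winners sat than as a prediction.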
 
Here's a graph comparing the DI and CD of 19 of the last 20 Guineas winners (Tirol left out - not enough dosage points) and the DI and CD of 14 of this year's 17 contenders (as per Steve's article):

[Image: graph of DI against CD for past Guineas winners and this year's contenders, with a shaded selection area]

The shaded area is my attempt - with no statistical basis offered! - to use it as a method for selection, containing as it does a little over half of the winners, but just under a quarter of the contenders.
 
Good stuff GF, I knew there was a way of doing this.

Is that based on the arithmetic mean to one standard deviation?
 
It's not based on anything - it's simply DI vs CD, and the shaded area is entirely by inspection.
 
is the dosage for the past winners not different now from the time they ran?

dosage is updated with new chefs etc..changing each individual profile each time a new chef is added

would this make past example comparisons not viable?
 
Well we need a proper mathmagician to do the technical stuff then.

Having said that, I still think the spread's interesting, as there's a clear cluster, and then a few minor groupings. I'm assuming Zafonic is the outlier on the high side, and there's another one there too at about 3.70. These two must be quite historical, and yet I thought conventional wisdom was that we were supposed to be breeding for speed?

I'm guessing that Pennekamp, Golan and Mark Of Esteem are the group of three who dip below the positive CD line so that would be 2001, 1996 and 1995. Zafonic would be 1993 and the other high outlier must pre-date that.

It would seem to be a penalty kick for Mastercraftsman then, especially as the outliers are increasingly representing history. Would it be indicative of a more finite breeding industry, as the recent winners are increasingly occupying a narrower window, suggesting a greater intensity of effort to produce a certain type of horse? If anything it's leaning toward stamina influences being more important outside of the hot spot, but if we've got a year where pace might be an issue (and there doesn't appear to be an obvious group to do this) then the box might shift slightly to the right and bring the likes of Finjaan, Lord Shanakill and Oceans Minstrel into the mix.

The ones that I thought might need a fast pace to be at their best, such as Intense Focus and Arazan, aren't lining up, which leaves Sea The Stars, Gan Amhras and, I thought, Cityscape as the ones who'd benefit from it being forced. Is Gran Ducel in there by rights, or there to assist the other two?
 
Consequently the new mean is now 1.92 as the outliers are dropped off the sample.

Exactly. The typical winner is just short of DI 2, which is the rule of thumb figure I tend to use (i.e. the mean adjusted to the typical).
 
is the dosage for the past winners not different now from the time they ran?

dosage is updated with new chefs etc..changing each individual profile each time a new chef is added

would this make past example comparisons not viable?


New chefs are added. Rainbow Quest is the only one to have affected the calculations of recent Guineas performers to any significant extent. The Dosage should become more accurate for those so affected. Where this has occurred, the old and new figures appear in my calculations.
 
for instance

in 1998 BATSHOOF'S dosage profile was 18-4-12-0-0 (34) DI = 4.67 CD = 1.18

it is now 18-4-20-8-0 (50) DI = 1.78 CD = 0.64

quite a difference

so comparing present dosages to past winners dosages isn't really viable...unless you know the dosage at the time they ran
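
For reference, both Batshoof figures can be reproduced from the five-part profile. The sketch below uses the usual definitions, DI = (B + I + C/2) / (C/2 + S + P) and CD = (2B + I - S - 2P) / total points, which match the numbers quoted in this thread. The `dosage` function name and its handling of a profile with no stamina points are assumptions for this example.

```python
def dosage(profile):
    """Return (DI, CD), rounded to 2 places, for a profile (B, I, C, S, P).

    B, I, C, S, P are the Brilliant, Intermediate, Classic, Solid and
    Professional points. Assumes the profile has at least one point.
    """
    b, i, c, s, p = profile
    total = b + i + c + s + p
    if total == 0:
        raise ValueError("profile has no dosage points")
    speed = b + i + c / 2      # speed wing plus half the Classic points
    stamina = c / 2 + s + p    # stamina wing plus half the Classic points
    # With no stamina points at all the index is unbounded (assumed convention).
    di = speed / stamina if stamina else float("inf")
    cd = (2 * b + i - s - 2 * p) / total
    return round(di, 2), round(cd, 2)


batshoof_1998 = dosage((18, 4, 12, 0, 0))
batshoof_now = dosage((18, 4, 20, 8, 0))

print("Batshoof 1998:", batshoof_1998)  # (4.67, 1.18)
print("Batshoof now: ", batshoof_now)   # (1.78, 0.64)
```

The extra eight Classic and eight Solid points all land on the stamina side or in the shared middle, which is why the DI falls so sharply even though the Brilliant and Intermediate points are unchanged.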
 