Who Tests The Testers’ Tests?

by Rant on August 10, 2008 · 37 comments

in Doping in Sports, Floyd Landis, Olympics

One of the big questions that came out of the Floyd Landis case was whether the tests used to declare that an athlete has tested positive for a banned substance are capable of doing so to an acceptable degree of accuracy. This issue has been debated in a number of forums, including here and over at Trust But Verify over the last couple of years. Suffice it to say that there are a number of different opinions on the subject.

But when an article critical of some of the scientific or statistical methods involved in anti-doping science appears in a highly respected journal like Nature, something very powerful is being stated. Whether you agree or disagree with what Donald Berry wrote, it is at the very least a wake-up call to the anti-doping scientists and agencies that the process needs refinement.

When the journal publishes what can only be called a scathing editorial taking the anti-doping world to task, that’s an even greater indictment of the current practices. Trust But Verify has an excellent piece about the editorial and what it says. TBV has another excellent piece that looks more closely at Landis’ data, taking into account Donald Berry’s commentary in his article for Nature. The second piece is the more technical of the two, but they are both good discussions. If you haven’t already done so, take the time to read both articles. Both the articles and the comments made to them are a must-read for anyone interested in how the anti-doping system works.

Over the past few days, I’ve been mulling over what Berry’s article means, and I’ve been following the discussions at TBV. As Larry points out in a comment over at TBV, Berry raises two main points:

One, exactly what are the sensitivity (the ability to correctly flag true positives) and the specificity (the ability to correctly clear true negatives) of WADA’s testing? Is it 95%, 99% or some other value? Because we don’t know the answer to the question, it’s hard to evaluate how good (or poor) the testing done by anti-doping labs really is. Berry offers up a couple of interesting observations about the specificity of the tests, when applied to multiple tests, and what the false-positive rate could be. Start with a specificity of 95%. For an athlete who never doped, being tested 8 times (as Landis was at the 2006 Tour) would lead to a 34% probability that one of those eight tests could come up as a false positive. Scary. At the 99% level, that same athlete would still have an 8% chance of a false positive result. And in a situation where 126 tests were carried out, at a 1% false positive rate, there would be a 72% chance that at least one of the tests would be a false positive. This is not so unexpected, perhaps, given that a test with a one percent false positive rate, conducted more than 100 times, could reasonably be expected to have such a result. But even so, these numbers are cause for concern.
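The figures in that paragraph all come from the same short calculation. Here is a minimal Python sketch of it, assuming every test on a clean athlete is independent and has the same specificity (the function name is mine, not Berry's):

```python
def prob_at_least_one_false_positive(specificity: float, n_tests: int) -> float:
    """Chance that a clean athlete sees at least one false positive.

    Assumes the tests are independent and share one specificity,
    which is a simplification made for this sketch.
    """
    # Probability that every single test correctly comes back negative.
    all_negative = specificity ** n_tests
    return 1.0 - all_negative


# The three scenarios discussed above: (specificity, number of tests).
for specificity, n_tests in [(0.95, 8), (0.99, 8), (0.99, 126)]:
    p = prob_at_least_one_false_positive(specificity, n_tests)
    print(f"specificity {specificity:.0%}, {n_tests} tests: {p:.0%}")
```

Run as written, the three results round to 34%, 8% and 72%, matching the numbers quoted from Berry.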

As Michael Press pointed out at rec.bicycles.racing:

Remember too that with
99% true positive rate
99% true negative rate
5% usage rate in the tested population
a positive test leaves a 17% chance that the testee is not positive.

This kind of calculation applies to all medical screening
tests of the sort used daily in the population at large.
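Press's figure is an application of Bayes' rule. As a rough check, the sketch below works it through, assuming his "true positive rate" is the test's sensitivity and his "true negative rate" is its specificity (the function and variable names are mine):

```python
def prob_clean_given_positive(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Bayes' rule: probability that someone who tests positive is clean."""
    # Dopers who are correctly flagged.
    true_positives = sensitivity * prevalence
    # Clean athletes who are wrongly flagged.
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return false_positives / (true_positives + false_positives)


p = prob_clean_given_positive(sensitivity=0.99, specificity=0.99, prevalence=0.05)
print(f"Chance a positive result belongs to a clean athlete: {p:.1%}")
```

With these inputs the answer comes out at about 16%, a shade under the 17% quoted, but the point stands either way: roughly one positive in six would belong to a clean athlete.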

What does this mean? In short, if these were the standards that WADA uses, clean athletes should be very concerned about the possibility of testing positive when, in fact, they aren’t cheating.

The bigger question, as Larry notes, is the one in the headline to this piece. Who tests the testers’ tests? How do we know that the tests used by WADA labs work as intended, and that they catch only those who cheat? Unfortunately, the world of anti-doping “science” is a pretty closed system. Yes, there is an annual conference where members of the community get together and discuss various topics and present papers and so forth. But how open is it to the view of outside scientists with expertise in the same, or similar, areas?

As Donald Berry sees it, and the journal Nature concurs, it appears that very little in the way of research and experimentation geared towards validating WADA’s multitude of anti-doping tests has been published. As Nature observes, citing Berry’s article:

The ability of an anti-doping test to detect a banned substance in an athlete is calibrated in part by testing a small number of volunteers taking the substance in question. But Berry says that individual labs need to verify these detection limits in larger groups that include known dopers and non-dopers under blinded conditions that mimic what happens during competition.

Drug testing should not be exempt from the scientific principles and standards that apply to other biomedical sciences.

Nature believes that accepting ‘legal limits’ of specific metabolites without such rigorous verification goes against the foundational standards of modern science, and results in an arbitrary test for which the rate of false positives and false negatives can never be known. By leaving these rates unknown, and by not publishing and opening to broader scientific scrutiny the methods by which testing labs engage in study, it is Nature‘s view that the anti-doping authorities have fostered a sporting culture of suspicion, secrecy and fear.

If this is true, how can we have faith that the tests catch those who cheat, and don’t inadvertently label innocent athletes as cheats? If WADA’s tests, on average, have a one percent false-positive rate, then at this year’s Tour, we could expect that the several hundred tests conducted (in reality many more tests are conducted, as each sample is actually tested for a number of drugs) would produce several potential false positive results. This is rather disconcerting, because once someone is said to have a positive (or “non-negative” in a bit of Orwellian anti-doping double-speak), that person is as good as convicted in the court of public opinion. Not to mention the whole anti-doping adjudication system.

Before we go destroying a person’s reputation, we ought to be darned sure that we’re acting on information that is pretty well rock solid. In the world of sports and doping, that means we need to be highly confident that the anti-doping tests work as advertised.

So who’s out there testing the testers’ tests? Is anyone? Are the labs that develop these tests doing so? Are other labs conducting tests to validate results reported by those who have developed the tests? We shouldn’t just trust those who’ve developed tests to say they work. They have a vested interest in doing so — having said that they have a test and that it works, it would be quite embarrassing to have to turn around and sheepishly admit that, well, no, actually it doesn’t.

The thing is, if we’re working to ensure fairness in sports, we need tests we can believe in. Not because one person, or one lab, says so. We need tests that are validated and verified, and for which the rate of false positives and false negatives is known with a reasonable degree of certainty. Unfortunately, this is just the kind of information that is difficult to locate. It may exist somewhere. But for an athlete trying to defend him- or herself against an accusation of doping, this information is hard to come by — if it exists. And if it exists, WADA’s cause would be better served by making it public.

What better way to show the world that Joe or Jane Athlete, caught by a specific test, is a doper than to show how the test that caught Joe or Jane has an incredibly low rate of false positives, taken as a single test, or even when applied to testing a large population of athletes?

Which brings me back to the question of just how precise the testing needs to be in order to ensure that even when administered multiple times, the test produces relatively few false positives.

Just how accurate do the individual tests need to be in order to ensure that the rate of false positives is low enough that few, if any, innocent athletes will be prosecuted for a doping violation? That depends on where you draw the line at just what is acceptable. Is one out of one thousand an acceptable rate? If so, for a system where 126 tests are conducted (as in the 2006 Tour), then the individual tests need to have a specificity of roughly 99.999%. In such a system, on average only one out of 100,000 individual tests would return a false positive. Applied to sets of 126 tests, roughly one set out of 1000 would contain a false positive. Should one out of 10,000 be the acceptable rate for such a grouping? If so, you need tests that have a specificity of roughly 99.9999%. In this case, only one out of each million individual tests would return a false positive. Want only one false positive out of 100,000 to be the cutoff for the number of tests conducted in 2006? That means the individual tests need to have a specificity of roughly 99.99999%. In this last example, each individual test would return, on average, only one false positive out of ten million tests conducted.
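For anyone who wants to check those orders of magnitude, the calculation can be run in reverse: pick the acceptable chance of any false positive across the whole set of tests, then solve for the per-test specificity. This is only a sketch, assuming independent tests with identical specificity; the function name is made up for illustration.

```python
def required_specificity(target_rate: float, n_tests: int) -> float:
    """Per-test specificity needed so that the chance of at least one
    false positive across n_tests independent tests equals target_rate."""
    # Solve 1 - specificity**n_tests = target_rate for specificity.
    return (1.0 - target_rate) ** (1.0 / n_tests)


n_tests = 126  # the number of tests cited for the 2006 Tour
for target in (1e-3, 1e-4, 1e-5):
    spec = required_specificity(target, n_tests)
    print(f"1 in {round(1 / target):,} across {n_tests} tests "
          f"-> per-test specificity {spec:.6%}")
```

The exact values come out slightly stricter than the round figures in the paragraph above, but the order of magnitude is the same.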

It’s enough to make your head spin, isn’t it? So what is an acceptable level of false positives? That’s a crucial question to ask when evaluating just how good a test is. After all, people’s livelihoods depend on these tests being accurate. And along with the question of false positives is the question of false negatives. For a good test, the rate of both should be low. But if you have to choose, it’s better to have fewer false positives than false negatives. Someone who escapes the tests once may, at a later time, get caught.

As Donald Berry said in his article in Nature:

I believe that test results much more unusual than the 99th percentile among non-dopers should be required before they can be labelled ‘positive’.

As you can see, you have to go very far, indeed. Berry’s critique of the manner in which the WADA tests have been developed points to many areas that need to be improved.

Whether a substance can be measured directly or not, sports doping laboratories must prospectively define and publicize a standard testing procedure, including unambiguous criteria for concluding positivity, and they must validate that procedure in blinded experiments. Moreover, these experiments should address factors such as substance used (banned and not), dose of the substance, methods of delivery, timing of use relative to testing, and heterogeneity of metabolism among individuals.

To various degrees, these same deficiencies exist elsewhere — including in some forensic laboratories. All scientists share responsibility for this. We should get serious about interdisciplinary collaborations, and we should find out how other scientists approach similar problems. Meanwhile, we are duty-bound to tell other scientists when they are on the wrong path.

Nature’s editorial concurs, saying:

Detecting cheats is meant to promote fairness, but drug testing should not be exempt from the scientific principles and standards that apply to other biomedical sciences, such as disease diagnostics. The alternative could see the innocent being punished while the guilty escape on the grounds of reasonable doubt.

The anti-doping system must be built on a solid scientific foundation. A very important part of that foundation is the validation of the anti-doping tests, and the determination of just how capable the tests are of catching cheaters without catching innocent victims. Without such validation, and without that information being readily accessible, we have no way to know if the tests developed by WADA and their affiliates do anything remotely like what they claim.

Who tests the testers’ tests?

Somebody needs to, if we’re going to have an anti-doping system we can believe in.

fmk August 10, 2008 at 8:48 pm: “For an athlete who never doped, being tested 8 times (as Landis was at the 2006 Tour) would lead to a 34% probability that one of those eight tests could come up as a false positive.”

So if, say, I was tested twenty-two times in a season, by Berry’s analysis I’d have false positived … seven and a half times?

Christ in a cage, but ‘the most tested athlete on the planet’ must have had a *very* lucky year in 2004, especially with at least eleven of those tests coming at the Tour de France alone, meaning – statistically speaking – he’d have dinged false positive three-and-three-quarters times there.
Rant August 10, 2008 at 9:09 pm: Not quite. The probability that one test out of those eight would come up as a false-positive was 34 percent. That doesn’t mean that one-third of those eight tests would come up false positive, however.
In the case of “the most tested athlete on the planet”, the probability that one of those 22 tests would come up as a false positive would be just over 67 percent, given the same degree of specificity. Still, if the tests only have a specificity of 95 percent (which no one knows for sure), said athlete was very lucky, indeed. If the specificity was 99 percent (Berry’s other scenario), however, “Sir Lance-a-lot” would have had a 20 percent chance that one of those tests would come up as a false positive. Still lucky to escape, just not quite to the same degree.
fmk August 10, 2008 at 10:01 pm: Do explain how you calculated 67%.
Larry August 10, 2008 at 11:12 pm: fmk, I believe that 67% =1 minus 95% to the 22nd power (.95 x .95 x .95 x .95, etc., 22 times).
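Two different quantities are getting mixed together in this exchange: the probability of at least one false positive in a series of tests, and the expected number of false positives in that series. A small sketch of both for the 22-test case, assuming independent tests and a 5% false-positive rate (that is, 95% specificity, as in Berry's first scenario):

```python
n_tests = 22
false_positive_rate = 0.05  # assumption: 95% specificity, tests independent

# Average number of false positives over many such 22-test seasons.
expected_false_positives = n_tests * false_positive_rate

# Chance that a single 22-test season contains one or more false positives.
prob_at_least_one = 1.0 - (1.0 - false_positive_rate) ** n_tests

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"Chance of at least one:   {prob_at_least_one:.1%}")
```

Under those assumptions the expected count is a little over one, not seven and a half, and the chance of at least one is the 67-odd percent Larry describes.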
Morgan Hunter August 10, 2008 at 11:25 pm: “Testing – Testing? We don’t need – no stinking testing!” To paraphrase an old screen villain – passing himself off as a Mexican bandido – “We have WADA!”

Meanwhile – “back at the ranch” – the “original” Olympics turns out to be just like our version of “cage fighting” – except many times – the BEST won only when he killed his opponent – AND THE CROWDS WENT WILD!!! They loved it!

“—Yes Angela – the modern Olympics STARTED WITH A LIE and continues to hide “the lie” by throwing in the phrase – “the Olympic Ideal”…

“—Think about this Angela: “Aren’t we the lucky ones? After all every sporting event is just supposed to be a “civilized form” of a very ancient thing called “WHO’S THE TOP BANANA?”

“—That is a very good question Angela – “The old Olympics” – were sponsored by the rulers in existence at the time – yes. How better to control the “aggressive nature” of their populace AND MAKE MONEY doing so – then to put the fight for dominance into the form called sport! The difference is – the ancient Olympics – was sponsored by the government of that time.”

“—Today?” Angela – “Why we have REAL SPONSORS – who sponsor the Olympic Idea…not government…”

“—You are right Angela. The IOC and most all governing bodies do have a major controlling block of power that is directly funded by individual governments…”

“…You know Angela – you are becoming very annoying dear – can we please change the subject? Daddy has to be refereeing a hockey match in twenty minutes.”

HONEY! WHERE ARE THE KEYS TO THE TURNIP CART???????
Larry August 10, 2008 at 11:43 pm: fmk and others, I don’t think the right conclusion to draw from Dr. Berry’s article is that the Lance Armstrongs and Floyd Landises of the world are lucky if they can avoid a false positive test. It’s easy to get seduced by the seeming certainty of the numbers. The reality is much fuzzier.

If you want to focus on the statistics, then the proper lesson to draw is that the testing is far from a certain thing. You can use Dr. Berry’s analysis as a proper counter to the nonsense we heard from Dr. De Ceaurriz (head of the French lab) who said that the testing was “foolproof” and that “no error is possible”. But you cannot quantify the likelihood of an error with any precision. An error is possible in any case, and errors are likely in general. The fact that errors occur is close to a certainty. More than this you cannot say.

Jean C has already made the argument (a sound one in general terms; I don’t agree with his particulars) that facts in addition to the test need to be taken into consideration. This gets to the so-called “prosecutor’s fallacy” discussed by Dr. Berry in his article: these tests become a lot more reliable if there’s independent evidence supporting the test result. So, for example, if the prosecutor finds DNA evidence linking a defendant to a murder, and there’s already evidence against that defendant (he was seen at the scene of the crime, he knew the murder victim, etc.), that makes the evidence much more reliable and reduces the possibility of the test being false. If on the other hand the prosecutor simply uses the DNA evidence against a computerized gene pool containing millions of people’s DNA, then the likelihood of a false positive goes way up. It’s possible to debate the nature of the independent evidence in the Landis case (and indeed, whether some of this evidence is truly independent), but the nature of this other evidence mucks up the statistics.

You can’t decide the Landis case on the basis of the math.

On the other hand, the points made by Dr. Berry about “testing the tests” are unequivocal, and damning.

Morgan, just when I think I’ve figured out who Angela is, I encounter your family turnip cart, and now I don’t know what to think!
Morgan Hunter August 11, 2008 at 12:26 am: Dear Confused…..

…”Would you mind? Hold my flag while I get my half-gallon cup of beer and 37 hot dogs with extra onions?”

But hey – the turnip cart? – Ye-ep – it’s “pimped!”
fmk August 11, 2008 at 1:32 am: So. That’s an interesting way of calculating the number. Assume there’s only two possible outcomes and if it’s not one it’s the other. But I’m going to save questioning the calculation logic a moment.

Let’s assume we had a sample population of 400 tests. The chances of there being a false positive in among those tests, by Berry’s calculation is 1-(0.95^400) – or certainty as near as makes no odds (you actually only need just over a hundred tests to reach certainty in Berry’s calculation).

Now, where could we get our hands on a population of 400 samples? I know! The Tour de France! They did 400 tests this year (more, actually, quite a few more). And what did they get? Twenty-two positives. Good chance that one of them’s one of Berry’s false positives, no?

Except .. well, hasn’t everyone either admitted they junked or produced a TUE to explain the positive? Well, except Casper, but he’s accepted the test isn’t a false positive, he’s just got to prove he was legit taking the drug he took.

So there’s 400 samples. 22 positives. A racing certainty of at least one false positive. And no one screaming false positive! false positive!

Makes ya wonder, doesn’t it? Could logic this weak really be used to prove Landis wasn’t a junkie?
Jean C August 11, 2008 at 1:46 am: This can help those who want to do some calculations with Bayes’ rule
http://faculty.vassar.edu/lowry/bayes.html
fmk August 11, 2008 at 1:53 am: Oh nuts, it ate a comment again. Bah humbug.

Larry, probably shorter this time. Largely, I agree with you. Testing has never been about yes and no, black or white, one roll of the roulette wheel and that’s your career you’re betting. It’s always been about judgement and always been part of a process.

But … well I’m sorry, if others are going to try and twist this math to prove silly things I don’t see why I shouldn’t too. It’s math for God’s sake. It’s why it’s there, to twist and turn and prove whatever the hell you want with it. It’s no wonder so many of the PoMo deconstructionists liked to use math symbols to support their arguments.

On the ‘most tested athlete on the planet’ thing … a joke, just a joke. I’m actually agnostic on that front, mostly. Don’t like him because of his attitude to people who talked about junking but laugh when I read Walsh bullshitting on about VAM and VO2 Max like he understands it and it proves the man was a junkie. But I just thought it’d be a laugh to put his boast to the test, and 22 is his boast. You should check his 2004 results to see how few of them could have been out-of-competition tests. God but the UCI are great, aren’t they? Always been at the forefront of testing athletes, they have. Always. It says so on their website.
Rant August 11, 2008 at 5:58 am: fmk,
Sorry it took a bit to get back to you on the calculation. It is the formula that Larry said. It is interesting that no one is screaming false positive. At least one of the cyclists who came up positive (Beltran) denies using EPO. Could he have been a false positive? Or is he just someone denying his actions to escape punishment? Unfortunately, the way the anti-doping system currently works, we’ll probably never know. But he’s assumed guilty by the doping narcs, and by the media and most fans. Funny, though. We haven’t heard a peep about his B sample. Or did I miss it in all the other hubbub?
Rant August 11, 2008 at 6:12 am: Larry,
Right you are. Math can’t prove Landis’ (or Armstrong’s or any athlete’s) innocence. But it does, as you say, point to less certainty in the results than some in the anti-doping establishment claim. And that’s definitely something that should be looked into, and corrected through better, more accurate testing techniques. When you get a situation where a relatively small group of tests (like, say all those conducted at the Tour de France) has a high probability of returning a false positive, then things need to be tightened up. (And not by making it more difficult for an athlete to challenge a positive result).
The point of Berry’s article is not to prove Floyd Landis’ innocence, it’s to raise questions about whether or not anti-doping tests and anti-doping science actually deliver the claimed results. Berry points out some missing, critical information needed to judge whether the tests are as foolproof as some claim. In the process, he shows that while some levels of specificity might sound foolproof, when many tests are conducted, the odds of a false positive can increase quite a bit.
That’s something that everyone should be concerned about, including the anti-doping warriors. Instead of brushing aside criticism, they should be embracing it and working towards more transparency and ensuring that tests are of sufficient degrees of specificity to ensure truly minimal chances of false positives occurring.
Morgan,
You’re a fan of Angela Lansbury? 😉
Rant August 11, 2008 at 7:21 am: fmk,
Sorry about your comment getting stuck in limbo there. Something about the phase of the sun and the moon, etc. 🙂
Morgan Hunter August 11, 2008 at 8:54 am: Rant,

Yeah – I know the old girl – she was a hot number – even back in the 60’s…and it was well known that she loved taking rides on pimped turnip carts….Aha – really….

And – as you state —– “The point of Berry’s article is not to prove Floyd Landis’ innocence, it’s to raise questions about whether or not anti-doping tests and anti-doping science actually deliver the claimed results.” — no matter how some would like to “swing” the concrete into the land of the dubious…
Larry August 11, 2008 at 10:48 am: Jean C, thanks. Terrific link!

Rant, you understand statistics better than I do. I am playing around with the calculator on the site referenced by Jean C, and I’m getting surprising numbers. I am trying to figure out what are the odds of any announced positive being a false positive, using a 95% confidence level (P(B|~A)=.05) and given varying percentages of doping in the pro peloton (P(A)). I’m getting some surprising results. If 30% of the peloton is doping, then the chance of any positive being a false positive is about 11%. If 10% of the peloton is doping, then the chances that the positive is false go up to 31%. If 5% of the peloton is doping, then the chances go up to 49% (half the announced AAFs would be false?). And if the 5 AAFs announced in this year’s Tour accurately represent the percentage of riders doping (2.5%), then the chance that any announced positive is false is … 66%?

Gee, even with a 99% confidence level in your test and 2.5% of riders doping, you have a 28% chance of any announced positive being a false positive.

Is this right?
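Larry's figures can be roughly reproduced with a few lines of Python. One loud caveat: the comment does not say what sensitivity was entered into the calculator, so the sketch below assumes a sensitivity of 1.0 (every doper tests positive). That value is my assumption, and the function name is made up for illustration.

```python
def share_of_positives_that_are_false(prevalence: float,
                                      specificity: float,
                                      sensitivity: float = 1.0) -> float:
    """Bayes' rule: P(clean | positive test).

    sensitivity=1.0 is an assumption made for this sketch; the comment
    above does not state the value that was used.
    """
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_pos = sensitivity * prevalence
    return false_pos / (false_pos + true_pos)


# Varying shares of the peloton doping, with a 95% specificity test.
for prevalence in (0.30, 0.10, 0.05, 0.025):
    share = share_of_positives_that_are_false(prevalence, specificity=0.95)
    print(f"{prevalence:.1%} doping, 95% specificity: {share:.0%} of positives false")

# The 99% specificity case with 2.5% of riders doping.
share = share_of_positives_that_are_false(0.025, specificity=0.99)
print(f"2.5% doping, 99% specificity: {share:.0%} of positives false")
```

Under that assumption the results land within about a percentage point of the numbers in the comment. Lowering the sensitivity shrinks the pool of true positives, so the false share can only go up from there.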
wildiris August 11, 2008 at 12:28 pm: There is an aspect of this debate that some posters here seem to be having some confusion about. The scatter plots are the result of the same test done on multiple individuals, not the same test done multiple times on the same individual.

If it were the second case, then testing the same individual over and over would eventually lead to a false positive. This is not the case that Berry is considering.

What Berry is specifically referring to is the situation where some individuals, because of their metabolism or body chemistry, will test positive for doping even though they are, in fact, clean. And there will also be individuals who, because of their metabolism or body chemistry, will test clean, even though they are, in fact, doping. What Berry is pointing out is, it is these possibilities that the drug testing agencies have never dealt with in a scientifically accepted manner.
Rant August 11, 2008 at 12:30 pm: Larry,
Strange things happen when you start considering probabilities of certain events. Just because a test sounds definitive for a single occurrence doesn’t mean that it’s so definitive when applied to multiple occurrences.
I’ve only played with the numbers a bit over there. But you’re in the ballpark. Running the numbers quoted in the main post gives almost the same value as Michael Press quotes (16.1% via Jean C’s link vs. 17% that Press states). Careful on the interpretation, though. Just because the odds are X doesn’t mean that X percent of the positive results will definitely be false positive. It means that the chances are X percent that a given value is false positive. When you get to around 50 percent, you might just toss a coin, instead. Heads, he’s a doper. Tails, he’s not. 😉
wildiris,
Thanks for pointing that out.
Larry August 11, 2008 at 1:12 pm: wildiris, guilty as charged, but …

I don’t think it’s necessarily the case that the scatter plots are the result of the same test performed once on multiple individuals. In fact, I know that this is NOT how the French lab (then LNDD, now AFLD) set its margin of error to achieve the required expanded uncertainty 95% level of confidence (see WADA International Standard for Laboratories rule 5.4.4.3.2.2). To measure what was required for this level of confidence, the LNDD measured the same urine pool 30 times over a period of 7 months. That’s a scatter plot where the same test was performed multiple times on the same individual. (Actually, technically speaking, this wasn’t an INDIVIDUAL’S urine, it was a pool of urine collected from multiple individuals and mixed together. Nevertheless, all of the tests were performed on an identical pool of urine.)

Dr. Berry was considering a wide variety of things in his article, and performing multiple tests on the same individual is one of the things he considered. To quote the article:

“Because [Landis] was among the leaders he produced 8 pairs of urine samples … So there were 8 opportunities for a true positive — and 8 opportunities for a false positive. If he never doped and assuming a specificity of 95%, the probability of all 8 samples being labelled ‘negative’ is the eighth power of 0.95, or 0.66. Therefore, Landis’s false-positive rate for the race as a whole would be about 34%.”

I agree that Dr. Berry is making a more general point about false positives that is extremely important. I don’t agree that Dr. Berry’s point relates to some possible sources of error and not to others. True, some people may have general biochemistry that would cause a test performed on them to generate false positives or false negatives. But others may have a specific biochemical condition that could throw the tests off, like an infection. In still other cases, the person’s biochemistry may not be the issue — it may simply be a case that the testers will screw up occasionally. The CIR test used in the Landis case is a difficult test to perform and an easy test to screw up (another point made by Dr. Berry). A false positive rate would include all possible sources of error.
Thomas A. Fine August 11, 2008 at 2:43 pm: wildiris,

I think Berry is primarily concerned with normal variations along a (presumed) gaussian distribution. Each one of us will occasionally have values that approach and exceed the threshold, by random chance.

When sampling a population for values, the sampling doesn’t by itself differentiate between people who are habitually at one end or the other of the curve, and people who are average, but occasionally vary all the way to both ends of the curve.

Larry,

Those big numbers with lots of decimals after them (e.g. 99.9999) are almost certainly unobtainable. And certainly can’t apply to current testosterone CIR methods. I think it’d take a much bigger threshold to push this test to 99.9 percent. If we did, we’d let off lots of dopers, and still catch a dozen or so innocents each year (based on 2005, when there were more than 12,000 tests of cyclists).

Given that we can’t obtain the sensitivity and specificity that we desire, this demands that if we use the tests as is, we must use much lighter sentences. It’s why I keep saying we have to design anti-doping policy around the science we have, not the science we wish we had.

tom
Rant August 11, 2008 at 2:51 pm: Tom,

Given that we can’t obtain the sensitivity and specificity that we desire, this demands that if we use the tests as is, we must use much lighter sentences. It’s why I keep saying we have to design anti-doping policy around the science we have, not the science we wish we had.

Quite right.
William Schart August 11, 2008 at 3:01 pm: Good point Tom. And we also have to make things easier for an athlete with a false positive to clear his name.

As I see it (and I’m sure Larry will correct me if I am wrong), at present, an athlete who is the victim of a false positive must 1. prove that the lab made some specific violation in processing his sample and 2. prove specifically that that error produces the false positive result. And the rules regarding things like discovery are such that the athlete is effectively denied access to evidence which could help him to prove these things (or alternatively, decide that the case against him is solid and there is no point in proceeding).

I think some people are making conclusions about Armstrong’s alleged luck based on the Gambler’s fallacy.

http://en.wikipedia.org/wiki/Gamblers_fallacy
Larry August 11, 2008 at 3:33 pm: Tom, agreed 100%. I think this has been TBV’s argument for quite some time. I’ve come to the opinion that an anti-doping violation should be treated as a no-fault kind of offense (going hand in hand with strict liability is no-fault liability), where in the absence of corroborating facts, WADA would treat the AAF like a red card in soccer, or like a swimmer who false starts once too often. The rider who tested positive in-competition would be disqualified from the race, and might have to go through drug counseling or something like that. We could discuss what would be appropriate for repeat offenders — possibly nothing more than repeat disqualification.

This approach would have to be accompanied by a massive change in the tone used by WADA and the ADAs. Instead of denouncing the character of every athlete with an AAF, the authorities would need to say that the rider was being DQed as a precaution and out of a desire to hold a clean competition, even at the cost of barring riders when we’re not sure whether the rider doped.

Of course, if the ADAs could prove (beyond a reasonable doubt, or at least with substantial evidence) HOW the rider doped, showing that the rider intentionally doped or was unreasonably careless, THEN we could talk about substantial penalties.

Great post, Tom.

William, I no longer even try to summarize the WADA rules in non-lawyer terms. At the moment, the basic rule is that the rider must prove that the lab departed from a rule set forth in the WADA International Standard for Laboratories (ISL), at which point the ADA must prove that this departure did not cause the AAF. (This rule changes a bit beginning in January.) However, only a small portion of the ISL addresses how a lab is supposed to conduct its tests; a larger portion of the ISL addresses how lab methods are supposed to be validated, and it’s not clear whether an athlete can challenge a lab on these method validation rules, particularly after the lab method has been reviewed as part of the lab’s accreditation. But even more confusing is what happens after the athlete proves an ISL departure (remember that in the AAA hearing in Malibu, even the majority arbitrators agreed that Landis had proved a number of ISL departures). While the Landaluce case stated that the ADA must at least present SOME evidence that the ISL departure did not cause the AAF, in practice nearly ANY kind of presentation of evidence by the ADA seems to be enough. As far as evidence goes, strictly speaking the athlete is limited to the standard documentation package required by the ISL, but the arbitrators have discretion to require the labs to produce additional evidence (and in the Landis case, there WAS a lot of additional evidence produced, though it wasn’t always the additional evidence sought by the Landis team). Remember that the EDF reprocessing requested by the Landis team was NOT something specifically required by the ISL.
bannaoj August 11, 2008 at 4:21 pm: Below is a link to an interesting piece I found in Sports Illustrated online. The beginning talks about how common it is for world records to fall in swimming in the current climate (the suits plus new pools etc) In the last couple of paragraphs swimming is contrasted by track and field and the “presumption of guilt” where doping is concerned. No direct cycling content but worth a casual read for the curious.

http://sportsillustrated.cnn.com/2008/olympics/2008/writers/tim_layden/08/11/world.records/index.html?eref=T1
Lester August 11, 2008 at 5:52 pm: What about the “B” sample? Since the “B” test must confirm the “A” test, doesn’t that greatly decrease the probability of a suspension due to a false positive? (Provided that the false positive is a result of statistical probability rather than testing error or test manipulation.)

If that is true, then the reliability of the test (to not give false positives) need not be nearly as high because the probability is basically squared.

The way I see it is that if there is a 1 in 100 chance of a false positive, then the chance of a second false positive for the same sample is 1 in 100 times 1 in 100.

So something like 1 in 10,000? Is this right?
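
The arithmetic in this question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1% single-test false-positive rate is the figure used in the comment above, while the function name and the `repeat_share` parameter (the fraction of false positives whose cause would show up again on a retest) are hypothetical and not something WADA or any lab has published.

```python
def confirmed_false_positive_rate(p_a: float, repeat_share: float = 0.0) -> float:
    """Chance that a clean sample fails both the "A" and the "B" test.

    p_a          -- false-positive rate of a single test (assumed, e.g. 0.01)
    repeat_share -- hypothetical fraction of false positives whose cause
                    persists on retest (0.0 means the two tests are independent)
    """
    # If the cause persists, the "B" test fails for certain;
    # otherwise it fails only with the ordinary single-test chance.
    p_b_given_a = repeat_share + (1.0 - repeat_share) * p_a
    return p_a * p_b_given_a


# Fully independent tests: the probability is simply squared.
independent = confirmed_false_positive_rate(0.01)

# Assumed case: half of all false positives would repeat on a retest.
half_repeat = confirmed_false_positive_rate(0.01, repeat_share=0.5)

print(f"independent tests: about 1 in {1 / independent:,.0f}")
print(f"half repeat:       about 1 in {1 / half_repeat:,.0f}")
```

With `repeat_share` at zero the result is the squared figure from the question. As the share of repeatable false positives grows, the combined rate moves back toward the single-test rate, which is why the proviso about independence matters so much.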
Larry August 11, 2008 at 6:41 pm: Lester, great question! I’ve been asking this question myself, and I don’t have a definitive answer yet, but I’ll share what I’ve picked up.

If you look at the exchange of information above between me and wildiris, you’ll see that we’re both contemplating the reasons behind false positives. If the reason for the false positive is an honest and non-systematic lab mistake, then yes, repeating the test could be helpful. But there are lots of OTHER potential reasons for false positives. Perhaps the lab makes the same mistakes over and over (in the Landis case, such lab mistakes were uncovered), in which case doing the test a second time isn’t going to help. Or perhaps the false positive we’re looking at has nothing to do with a mistake, but is a flaw in the test procedure itself. As Tom points out here, human biochemistry is NOT simple and it is NOT uniform. Tests that work well on the majority of the population will work poorly or not at all on some people, and this is a cause for false positives (the cause that wildiris focused on). Moreover, there are considerable fluctuations in the biochemistry of any single person! The athlete’s biochemistry may have been affected on a particular day by a viral infection, or dehydration, or having recently trained at altitude, or even (as my friend Tom has proven) by drinking a beer or two. In any of these cases, simply repeating the test is likely only to result in a repetition of the same false result.

All this being said, many people on this forum strongly argue for the importance of “B” testing, precisely because it should result in some (probably unquantifiable) improvement in test accuracy. But to get the most out of “B” testing, the testing should be done in the way William and others here have recommended: the test should be performed in a “blind” fashion (the lab should not know that the test is a “B” test, and that there’s been a prior positive finding on the “A” sample) by a different lab than the one that performed the “A” test. Of course, there are practical obstacles in doing “B” testing in this manner, but the failure to overcome these obstacles reduces the value of the “B” testing.

The problem with the existing system is that the labs KNOW that they are conducting “B” tests, so they KNOW that if the test does not come out positive, then the lab has made a mistake. In the Landis case, the lab technician that performed the “B” test referred to the test as a “confirming” test … and the technician who performed the “B” test was supervised by her boss … the boss who happened to have performed the “A” test in the first place. In such a case, the potential for the “B” testing to reduce errors is of course reduced.

If you ask our host Rant (a doping historian, among his many talents) how many times a “B” test has ever contradicted an “A” test, you’ll learn that it hardly ever happens. The most famous example of conflicting “A” and “B” tests is currently being played out with Iban Mayo, who had an “A” test positive and a “B” test negative. This was a highly unusual case where, because of vacation schedules, the same lab was NOT able (not at first!) to perform both the “A” and the “B” testing. When the international cycling union (UCI) received the Mayo “B” test negative, they proceeded to shop the “B” sample to other labs until they were able to get a “B” test positive (not surprisingly, at the same lab that had initially performed the “A” test!). I don’t want to debate the merits of the Mayo testing at the moment, but this is a good illustration of the attitude towards “B” testing of certain authorities in cycling, and why the “B” testing does not provide us with the maximum possible (unquantifiable) assurance of accuracy.
Thomas A. Fine August 11, 2008 at 11:54 pm: Lester,

Floyd’s defense has been so focused on lab errors that it’s easy to only think of that. But false positives are not primarily about lab errors, they’re primarily about actual variations in human beings. This kind of false positive, when retested, yields the same positive result, because the value is truly high, just not because of doping.

These are the kinds of positives that Berry was focused on.

tom
Jean C August 12, 2008 at 2:25 am: Lester,

Yes, you are right about squaring the probability… if we agree that the 2 “events” are not linked.

For retesting the same sample, that is right.

But as Thomas points out, that is not right for the variability of the human body. That kind of problem can be slightly reduced by complementary testing in different conditions.
For example, tests from other stages, or out-of-competition tests, could be used by any athlete with an AAF.

As pointed out elsewhere, some LNDD tests were developed with a population of 30 people. But I am sure that from the first tests until today they have used real testing to adjust their references, as is done in many industrial or scientific processes.

I don’t think we can treat EPO testing like other testing, because of the necessary human interpretation of the results. And Mayo’s samples are rumoured to contain EPO of the 2nd or 3rd generation.
fmk August 12, 2008 at 3:53 am: That SI piece really should have done more on the pool technology side of things. This from elsewhere:

“The pool here is specifically designed to make swimmers feel more comfortable. At three metres deep, it is deeper than many others which means there is less resistance and turbulence off the bottom. It is wider, too, which means the waves can be dispersed into the empty outside lanes and into a sophisticated gutter system. The lane markers are also designed to force water down rather than outwards. This is not just any old council lido.”

Given that even weak – traditionally non-doping – nations are posting faster times, it seems hard to believe that doping is the sole cause of the advanced performances at these Games.

Coming from a country where some people still cherish Michelle Smith’s Gold medals I’m not going to claim swimming is clean. But I’m not sure how fair it is to point the dirty syringe at it just because records are tumbling, not without fairly considering the other factors.
Jean C August 12, 2008 at 5:18 am: CAS has confirmed the 2 year ban of Mayo
http://www.lequipe.fr/Cyclisme/breves2008/20080812_125559_iban-mayo-suspendu-2-ans_Dev.html
wildiris August 12, 2008 at 6:15 am: It was fmk’s comment at the top that prompted my comment. Rant, Larry and Tom, thanks for expanding and clarifying, what I admit was an incomplete observation.

As an engineer who has worked in the medical field, I can say with confidence that, compared to what we in industry have to go through to get an FDA approval, the current drug testing schemes for athletes wouldn’t even pass the laugh test.
Dave August 12, 2008 at 6:58 am: Award can be found here:

http://www.tas-cas.org/d2wfiles/document/1799/5048/0/Press release Mayo ANgl.pdf

I guess they can do whatever they want, as the rules are just guidelines…

David
bannaoj August 12, 2008 at 11:08 am: I was extremely amused at the men’s 4x100 free relay. Yes, the US beat the French … which is a hotter topic in cycling than swimming. Yes, the times were faster due to the pool and the suits.

But IMO the reason why the U.S. won was the seeding and the ability to draft in water as you can in cycling. If the two teams had not been seeded next to each other, and the U.S. swimmer Lezak had not been able to catch the French swimmer’s wake, draft for the first 50 meters and save a bit of energy, I don’t believe the U.S. would have won.

Swimming events are seeded based on the semi-final times. The fastest two semifinalists get the middle lanes, the next two go to either side, etc. This gives a “clean water” advantage to the fastest swimmers, particularly in the older pools. You’ll see many swimming races that have a flying V of swimmers at the finish, with the fastest in the center. However, with this pool it doesn’t matter as much. One of the obvious indications to the contrary is Aaron Peirsol, who loafed his way through prelims and the semis, and then beat the field from, I think, lane 2. But he admitted he had to do it blind, and just “swim his own race”. You can’t see the field from the edge like you can in the middle lanes. That’s another advantage to the middle: you can often give the hairy eyeball to your closest competition… as happened in that 4x100 free relay.
Rant August 12, 2008 at 11:26 am: bannaoj,
I was quite amused to see the US win the 4×100 relay, especially after a certain French swimmer engaged in a bit of trash talk. I’m with you: without the lane position that enabled Lezak to engage in a little drafting, I think the French team would have won. Like real estate, it’s all about location, location, location, eh?
Thomas A. Fine August 12, 2008 at 12:03 pm: Yeah, but the French guy was retarded (or retahded as we say around hea’) because he was swimming really close to the USA guy, instead of on the other side of the lane.

I still say these suits should not be allowed. You can never go back now and compare old records to new. On the other hand, a pool is a pool, and there’s always going to be random differences in hydrodynamics. Unless they’re filling them with EPO…

Also, let me say that the USA guy that swam the anchor leg and ostensibly won it for USA has quite possibly the worst stroke I’ve ever seen in my life (for a top athlete). His breathing stroke took fully twice the time as his other side, and he bobbed up and down in the water like someone that doesn’t know they shouldn’t lift their head to breathe. It was actually a pretty good imitation of my swimming technique (but six times as fast).

tom
Rant August 12, 2008 at 12:34 pm: Wasn’t it Alain Bernard, the anchor, who was talking smack about smashing the US team? That would be a fine comeuppance, eh? Lezak’s stroke may not have been pretty, but he had the smarts to know how to win. Afterwards, I thought I saw a quote from Bernard or from someone else on the French squad that the US win was a triumph of experience over talent. Well, maybe. But if you’ve gotten to that level of the sport, you ought to have enough experience to know that you don’t swim too close to the lane marker that separates you from your closest competitor.
bannaoj August 12, 2008 at 1:44 pm: Tom, the differences in the pool design are *huge* and make as much difference at the elite level as the suits do, if not more. In one sense it levels the playing field a bit, because Peirsol would have been at a considerable disadvantage in an outer lane in the older pools: because of the wave rebound effect, the water used to be much more turbulent in the outer lanes than it was in the center ones. It means that someone can laze through the qualifying rounds knowing that if they make the top 8 they’ve got a decent chance of winning. It used to be that you needed to be in the best 4 lanes of the 8 in the pool in order to have a good chance of winning if your competition was in the same class as you. So you needed to swim harder in the qualifying heats to get yourself into those particular lanes where the best water was. The new pools change the qualifying strategies significantly, and if the swimmers are able to conserve energy in qualifying, it can change the ultimate outcome of the finals, especially when someone is swimming multiple events. You’ll see Phelps do the same thing in his prelims and semis because he’s got to save something in the tank for his finals, and in this pool you can set world records in outside lanes more easily.

(This happened with the women: the world record was set in the semis by the Zimbabwean girl, but the U.S. swimmer won and the world record holder came in 2nd. Neither of them equalled the world record time set in the semi, so it is arguable that if the Zimbabwean girl had conserved a little more energy in that semi, she probably would have won gold.)

As to the other issue you raise, Tom, about the guy’s technique: you don’t criticize the rocking back and forth that McEwen or Cavendish do at the end of the race in a sprint. While technique is all fine and good for a time trial, in a sprint finish, form is sacrificed for power and acceleration. As Lezak had the fastest relay split time in history, I believe this was exactly the case. (FYI, relay splits are timed from when the toes leave the starting block, rather than from a gun going off, which is why they can be faster than ordinary race times for the same distance.) In fact Lezak’s race was one of the most analogous to a cycling sprint finish I’ve ever seen. He tucked into his rival’s slipstream until the last 15 seconds of the race, when he rocketed past his competition. Rolling a bit much, yeah, but he was wringing every ounce of power he could out of his body.

While not completely equivalent, one quick and dirty way to compare power output between sports is to look at the amount of time the events take. Considering how crude it is, it is surprisingly accurate. 500m in track cycling is very similar to 50m in water; 1km in track cycling is roughly comparable to 100m in water. But the world records in cycling are often from high altitude circuits, because the air is thinner. These new pools could be looked at as giving a similar advantage to the swimmer that the “high altitude” circuits do for track cyclists.
William Schart August 12, 2008 at 11:03 pm: It amazes me that world class athletes still engage in pre-event trash talk, thereby providing bulletin board material for the trashed, who then go on to win. You seem to see it over and over again. That is perhaps one further reason why the US team won.