Towards A New Anti-Doping Approach, Part III

by Rant on December 17, 2008 · 18 comments

As a brief recap, the previous post discussed the question of what drugs should be banned. The short answer: Only those drugs (or techniques, like blood doping) that have a performance-enhancing effect for the sport in question.

So, naturally, the next question would be: How do we determine which drugs actually are performance-enhancing and which ones aren’t? In a word: Science. Using a scientific approach to research and experimentation, we should be able to determine which drugs/techniques have a beneficial effect, in a sporting context. Then we can decide whether or not to ban those drugs/techniques. (Side note: Aspirin has a pain-relieving effect, as well as an anti-inflammatory effect. Should we ban athletes from using aspirin during a race? After all, an athlete who can endure more pain or discomfort may perform better.)

One of the problems we encounter, when doing biomedical-type research, is that the human body is an amazingly complex organism/machine. Sometimes, things we believe to be logically true aren’t borne out in research the way we expect. (Otherwise, if everything worked out as expected, any number of drugs that never made it to market would have been the cure for cancer, heart disease, diabetes and more.)

Our knowledge of how things operate, in a biological sense, will never be perfect. At least, not in my lifetime. So it follows, then, that because the science behind the anti-doping tests deals with human biological processes, it will never be perfect, either.

Now, we’d all like the testing and the theories behind the testing that is done in the name of clean sport to be better. The better the testing, the more accurate the testing, the more likely we are to catch those who dope. Makes sense, doesn’t it?

We’d all like for the tests to be perfectly designed and perfectly executed. If that were the case, then only the guilty would get caught and the innocent would be spared any harm. That’s not the way the world works, however. To paraphrase Tom Fine, in the struggle to rid sports of the scourge of doping, we are confined to the science and testing techniques we have, versus those that we would like to have.

Currently we have some rather draconian punishment for anyone who is found through testing (and when athletes exercise their rights, the appeals process) to have doped. Given the possibility that the tests aren’t up to scratch, and that there is always a chance that a truly innocent person might get caught and destroyed by the system, should our anti-doping system traffic in such stringent punishments?

Does it make sense to have such extreme punishments, even for minor infractions, when the science behind the testing is imperfect? From my point of view, no. Because there is some uncertainty in the testing side of the equation, we need to have some flexibility in the enforcement side. Before we punish someone for a doping offense, the system should also take into account whether we can definitively know that the only explanation for a test result is doping.

If doping is one explanation and there are other possible explanations, we must rule out those other explanations before doing anything else. Can we do that with the existing tests? Sometimes. But not always.

Problem is (for someone who’s been accused of doping), in the current system the science behind the tests is deemed to be infallible. In other words, perfect. This is the wrong approach, for a number of reasons, not the least of which is the disadvantage someone wrongly accused of doping has in terms of clearing his or her name. Another equally important problem with this approach is that it’s the very process of questioning the assumptions and questioning the methods behind the science that leads to the improvement of the science and the testing. By making it impossible to challenge the way the tests are conceived and developed, flaws in the theories and practices of those tests may not be uncovered. If a test is built on faulty assumptions, then a number of athletes could pay a steep price and it will likely be a very long time — if ever — before the flaws in the test are uncovered and corrected.

We may be limited to the science we have. But we shouldn’t limit one very important aspect of science: The ability to question prevailing assumptions and possibly prove them wrong. And, as I’ve already noted, the process of questioning and testing can and does lead to the changes, adjustments, revisions and better understanding that create the science we would like to have. You can eventually get there from here. Exactly how long it takes is another matter. And in the meantime, we have to have the flexibility in the system to account for a certain amount of uncertainty.

As I discussed earlier in this series, that flexibility should be borne out in a way that doesn’t cause great harm to an athlete’s career, while finding a way to counter the benefits an athlete might get from using PEDs. Assessing a time penalty, which Morgan suggested, would allow a cyclist to race but would also take away any competitive advantage gained from doping. That seems like a good approach.

So how can we determine who’s doping and who’s not? Well, assuming we’ve got the science in order, then that’s through testing. How much and how often becomes a challenge, as does cost. We’ll save that for another time. First, let’s ask the question: Who should be conducting the tests? The obvious answer would be labs that specialize in this kind of drug testing. Which means places like UCLA, Montreal, Lausanne, and the oft-maligned (sometimes deservedly so, other times not so much) LNDD.

But before doing so, the labs need to have consistent guidelines on how to perform and interpret the tests. This is an area where the World Anti-Doping Agency has fallen flat on its face. While their mission is to “harmonise” drug testing practices at anti-doping labs around the world, they have failed to do so. If you go back to the Landis case, the talk of differing standards for declaring an adverse finding at different labs is proof enough that WADA has not yet achieved consistency in the anti-doping world.

I believe that this is, in part, because they have been trying to do too much (I’ll leave that for another post for more discussion). For the moment, the postulate will be that it’s WADA’s job to set clear standards on what procedures are performed, the manner in which they are performed, and the way the data resulting from those procedures is analyzed. This is not too much to ask. And once the labs have clear-cut criteria, the answer as to who did what will be the same, regardless of whether the tests were performed in Montreal, Los Angeles, Châtenay-Malabry or Timbuktu. This assumes that all lab personnel are properly trained. Standards for that, too, are the province of WADA. And WADA should be enforcing those standards when certifying a lab as competent to perform anti-doping tests.

All of this plays into the question for the next post: Exactly what should the role of WADA be? And further down the road, we’ll take a look at some of Larry’s observations, including the issue of “selective prosecution.” Keep those comments coming, everyone. By the time this series is complete, it will have evolved a great deal based on your observations and suggestions.

Jean C December 18, 2008 at 5:04 am: From http://www.lequipe.fr/Cyclisme/ENTRETIEN_PIERRE_BORDRY_1.html

BORDRY: “NO LONGER ON OVERDUE”
By Anthony THOMAS
Organizer of the doping controls at the Tour de France after the ASO-UCI conflict, the AFLD played a key role this year in tracking down cheaters and helped unmask the use of EPO-CERA. Its president, Pierre Bordry, has granted a long interview in which he talks about the benefits of targeted controls and the culture change he sees in the peloton.

“Pierre Bordry, after the conflict between the UCI and ASO (organizer of the Tour de France), the French Anti-Doping Agency found itself responsible for the doping controls. How did you feel about this mission?

We designed an appropriate testing policy. I took care to visit Pat McQuaid (president of the UCI) to be transparent. We gave all the results live to the UCI. It was transparent for the riders too: we did what we had said. They wanted something worthy of the reputation of the Tour de France.

As in 2007, the biggest cases were revealed during the Tour. Is the fight against doping still a two-speed race?

The UCI's rules provide for testing the day's winner, the runner-up, the Yellow Jersey and a few randomly selected riders. That would be fine if the Tour de France lasted a few days, but not for a three-week race. If someone takes a banned product, such as EPO, the window of opportunity to detect the product in his urine is very small, a few days. So the athlete arranges things so that he is unlikely to be tested. He hides in the pack and emerges when the product is still having its effect but can no longer be detected. We did things differently: we took a blood sample from the 187 riders at the start, then another in Toulouse, as part of the fight against doping. We warned that if there were abnormal elements in these blood samples, we would look for the use of doping products. Instead of testing by regulation, it is targeting.

However, we had to wait several months after the end of the Tour to unmask the riders positive for EPO-CERA.

The tradition is urine. We found traces of EPO that were not clear enough. The lab explained afterwards that it could go back to the blood samples and detect CERA.

The use of CERA was apparently known in the peloton several months before the Tour de France.
We must ask the UCI why they did not look for it earlier. We knew before the start of the Tour de France that there were possibilities of CERA being used in cycling competitions. EPO-CERA was obtained before the Tour to see how it reacts. We were informed either by foreign anti-doping authorities or by cyclists themselves. Since June-July, the Lausanne laboratory and Roche Laboratories had been trying to build a test. It was thought that it would be out for the Olympics and before the Tour de France. Unfortunately, this test was released after the Tour de France. All the doubtful urine samples were taken and the search for EPO was resumed in the blood with the Châtenay test, in cooperation with Lausanne, in September-October. The penalty is based on the Châtenay-Malabry analysis results.

So you’re satisfied with your strategy of targeting doubtful profiles?

The big difference in the organization of the controls is the targeting and its unpredictable side. If the athlete knows two hours beforehand that he will be tested, as was the case with the UCI, things can happen: the guy may abandon, and so on. For example, when we wanted to test Ricco at the Cholet time trial, he tried to evade the control. The escorts had to run after him to catch him. Why was he doing that? Perhaps because he wanted to carry out a manipulation to avoid a positive test. They were all convinced we would not find EPO-CERA. They did not lose.

Seven positive cases in the whole peloton: is it a symptom of a culture change or the symbol of a fight that is still insufficient?
It is possible that some dopers escaped us. What was found in September still lets us say that many are not doped. They have normal blood parameters. And this is a very big change.

By warning the riders with abnormal blood profiles, the objective was also to play on deterrence.
The deterrent effect worked, but I thought it would work better, because we did exactly what we said before the Tour de France. But between the two blood samples, some profiles “improved” and some riders performed worse than expected. Some riders, obviously, paid attention.

Do you consider that the lag of the controls behind the cheaters is shrinking?

These positive cases proved a posteriori, evidence that is furthest behind. There is no anti-doping research without a very effective. For now, the agency has a Scientific Advisory Board with high level including five foreign famous advisors. They are all very aware of new techniques. We know in advance new products, new molecules. This lets you search. Today, I know dozens of molecules that could soon be used for doping. It could happen fairly quickly. The most difficult challenge will be to put them on the list of banned products. The image of large laboratories suffers diversion of the use of their molecules. So the labs are beginning to contact the agency for reporting products that could pose problems. “
Morgan Hunter December 18, 2008 at 7:37 am: Hey Jean,

I am glad to see that some “sanity” is coming into the “talk about doping” — I have never been against “dope control” what I am against is the methods and rationalizations involved. Thanks for the translation.

Larry,

I do not accuse you personally of being cynical – I hope you understand this – I find the idea cynical. You are absolutely right – “we” have “a snowball’s chance in hell” of changing the activities of the alphabet soup – but what we do have is a forum to perhaps come up with REASONABLE solutions to the present problem.

I see every one of your points as true as far as the “realities” of the present situation go – but I hope to “brainstorm” with others and perhaps come up with a paradigm that is better than the one being used today. Since the powers that be claim that “their way” is the “only and best way”, I see no other alternative than to do exactly what Rant is trying to collate here.

It is my opinion — that if we do come up with “reasonable” and workable solutions for the problems that exist — then we have a better chance to approach the problem with more than just verbal resistance to what we now have. I guess I see this as an “open dialogue” with the aim of doing just that.

No doubt about it — it is not a “simple” circumstance — therefore I do not expect it to be a “magic bullet-one size fits all” solution.

I believe that the “effort” must be made to come up with solutions — whether Rant gets voted into the presidency of WADA or not.(:-)

I believe we see eye to eye concerning the “things that just don’t work” in the present system. These must be changed — the only way they will be changed is if we come up with “workable solutions.”

From reading the Pierre Bordry interview – upon close inspection, even with it being a machine translation – one can discern that there is a change in the tone and content to some degree.
Jean C December 18, 2008 at 8:32 am: Larry,

I don’t understand why you aren’t in favour of targeting.

It seems that targeting is common everywhere today.

In science, industry and so on… even the police use it.

Or would you say that their targeting is erroneous?
Larry December 18, 2008 at 11:24 am: Jean C –

Let’s discuss targeting.

In the broadest and most general terms, targeting simply means focusing more attention on some things, and less attention on other things. You are right, we “target” all the time. For example, if you put me in charge of inspecting bridges, I might “target” the older bridges for a more complete inspection, figuring that old bridges were more likely to have problems than new bridges.

So then, why would I have a problem with the current use of targeting to increase the effectiveness of the anti-doping system in cycling?

My primary problem is that of selective prosecution. Consider the example I’ve used before, that (traffic permitting) most American drivers routinely drive faster than the posted speed limit. There is a myth in the United States that you’re more likely to get a speeding ticket if you drive a red car … so let’s say that the police decided to target drivers of red cars as likely speeders. What we would see, for certain, is that drivers of red cars would be more likely to get speeding tickets than, say, drivers of white cars. Is this because the targeting worked, that it correctly identified a group of drivers more likely to drive too fast? We’ll never know, because the targeting itself focused disproportionally on drivers of red cars — the targeting alone resulted in a larger percentage of these drivers receiving speeding tickets.

We can see the same problem in cycling. Traditionally, cycling has targeted the race leaders: the guys in the maillot jaune, the guys who win stages, and so forth. Not surprisingly, we’ve seen that a relatively high percentage of the top riders have tested positive, compared to the rest of the peloton. Does that mean (as some have suggested) that the top riders are doping more frequently than the domestiques, or even that you have to dope to be a top rider? The answer is, we can’t tell from these statistics, because the top riders have been targeted for much more extensive testing. Whichever group you target, you’re going to find a larger percentage of whatever it is that you’re looking for.

Now, I have no problem with targeting the race leaders. We CARE more about whether they are doping, and while we’d like to see a competition that is 100% free of dope, we especially want to see a podium in Paris where all of the winners rode clean. There’s also a second, important factor: targeting the race leaders is a transparent and purely objective form of targeting. We know who is going to be targeted, and why.

I DO have a problem with the targeting performed in this year’s Tour de France. This targeting lacked transparency — we do not know who was being targeted, or why. We suspect that much of the targeting was based on blood tests performed at the start of the Tour, but we do not know the criteria used to evaluate these tests. We also suspect that some people were targeted because they’ve been suspected of doping by the ADAs, or because of a past history of doping.

If you take as a given that you’re likely to catch a larger percentage of the target group, it is essential that the targets be selected fairly. So, for example, if the AFLD decided to target any rider with a hematocrit level above 48%, I would have no objection. This may or may not be an effective target, but at least the target is based on a reasonable theory and is purely objective. I get nervous if the targeted group consists of riders with “suspicious” blood levels, when there are no objective criteria for what is and is not suspicious. I get VERY nervous when the target is based on the nationality of the rider, or whether the rider is performing “suspiciously” on the road, or is obnoxious, or has made critical comments about the powers in cycling.

Truth is, those of us who live in the U.S. have witnessed the potential dangers of “targeting”. Groups targeted by U.S. law enforcement have traditionally been racial and religious minorities, the poor, political dissidents, and other groups lacking in social and political power. These groups have been selectively prosecuted, and selectively convicted, generating skewed statistics that suggest (unfairly) that these groups are more likely to engage in criminal behavior.

This is the major part of my problem with targeting. But I have a second objection, which is why targeting is required at all. While we don’t have all the facts (again, a lack of transparency), the AFLD targeting appeared to be a matter of repeated testing of the targeted riders. In other words, the targeting is based on the idea that to catch a rider like Ricco, you need to repeat the same test 10 times to see if he fails it once or twice.

I understand that there are legitimate reasons for repeated testing — you may be testing for a doping substance that disappears quickly from the human body, or that can be masked by a clever rider who knows that a test is coming. But there’s a dark side to repeated testing. As we’ve seen from the Berry article in Nature, ALL anti-doping tests are based on the measurement of a statistical anomaly — where an athlete fails a doping test, what’s really happened is that the scientists have measured something in the athlete’s urine or blood that is statistically unusual. However, if you repeatedly perform the same test on the same athlete, even if you perform the test properly, the chances increase that a test will display an anomaly. It’s like rolling dice: give me one roll of the dice, and the chances of my rolling “snake eyes” (1 – 1) are very small. Give me enough rolls of the dice, and I promise you I’ll roll snake eyes sooner or later.
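As a rough illustration of the dice analogy, the sketch below computes the chance of seeing at least one "snake eyes" over a number of rolls. It assumes every roll is independent and uses fair dice (a 1-in-36 chance per roll); the function names are made up for this example, and real anti-doping tests on the same athlete need not be independent in this way.

```python
import math


def chance_of_at_least_one(p_single: float, trials: int) -> float:
    """Chance that an event with per-trial probability p_single
    happens at least once in `trials` independent trials."""
    return 1.0 - (1.0 - p_single) ** trials


def trials_for_even_odds(p_single: float) -> int:
    """Smallest number of independent trials at which the event is
    at least as likely as not to have happened once."""
    return math.ceil(math.log(0.5) / math.log(1.0 - p_single))


SNAKE_EYES = 1 / 36  # both dice show a 1

if __name__ == "__main__":
    for rolls in (1, 10, 25, 100):
        pct = chance_of_at_least_one(SNAKE_EYES, rolls)
        print(f"{rolls:>3} rolls: {pct:.1%} chance of at least one snake eyes")
    print("Rolls needed for better-than-even odds:",
          trials_for_even_odds(SNAKE_EYES))
```

Under these assumptions, a single roll is a long shot, but by the 25th roll the odds tip past even.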

Ricco asked a good question — it was obnoxious, coming from him, but still a good question: if they tested him 8 times for CERA, how come he only failed 2 tests?

Anyway, I hope this begins to answer your question.
Larry December 18, 2008 at 11:26 am: Morgan, got it, agreed.
Morgan Hunter December 18, 2008 at 4:16 pm: Rant,

With the intent of bringing information that may be “pertinent” to this matter — I ran into this little tidbit. I recommend that the readership Google the commentator and the Institute he speaks for.

I found it of interest because it seemed to me that it “addresses” the mentality and the political posturing behind the ideas fostering our present day situation in “battling doping and the purported belief” why it is the correct approach to the problem.

It would appear — that the “implied promise of drug testing” does not work, as expected.

Who the heck is Lewis Maltby? Maltby is the founder and president of the National Workrights Institute. As a senior private sector executive, Maltby learned that human rights and corporate efficiency are not only compatible, but mutually reinforcing. He left the corporate world in 1988 and founded the National Workplace Rights Office of the American Civil Liberties Union. In 2000, Maltby and his ACLU staff realized the need for an independent organization to fight for human rights on the job and created the National Workrights Institute.
“The number of employers conducting drug testing is in a long-term decline,” Lewis Maltby, president of the National Workrights Institute, reports. “And most employers who do test, only test for preemployment.”

Maltby cites figures from American Management Association member surveys that show a steady drop in private-sector drug testing, from a peak of 81 percent in 1996 down to 62 percent in 2004. Why the drop?

“Employers are beginning to realize that drug testing is not producing any improvement in the bottom line,” Maltby says. “Most employers who bought into drug testing did so because the government and the drug-test industry promised it would increase safety and productivity, and that promise was not kept.”
Jean C December 19, 2008 at 2:52 am: Larry,

About Berry’s article, I don’t remember, or more probably was unable to tell, whether it was badly written. His points applied to hct limit testing are:
– among 100 riders, 1 will test false positive
– one sample tested 100 times gives the same result 100 times (false negative or false positive, or negative or positive)

Berry was not referring to the % of error linked with the testing process, but to the differences that can be found among the bodies of many people, which can cause a false positive result.

If I am correct, Berry gave a “Nostradamus” prediction about Beijing testing; was he right?
I doubt it, because he missed the point that athletes are tested often, so for the older tests that kind of error is already known (like Cunego, who often has a hct level higher than 50%).
He forgot one other major point: there are fewer discrepancies among athletes than in the whole population!

If someone could reread Berry’s article and confirm that, it would be fine! Thanks
Jean C December 19, 2008 at 3:17 am: Larry,

More specifically about targeting,

Even if you were right about Berry’s point, your fears about the testing errors would not change the problem: innocents would be affected, and athletes, targeted or not, would be false positives… an innocent, targeted or not targeted, would be harmed (for the same number of tests, of course!)!
And if the targeted riders were mostly the bad guys, the false positive would be a “divine” punishment for their sins… Better to have a false positive for a doped rider than to have a clean athlete with a false positive.
But Berry was wrong…

About the selected targets, I do think that we must trust the people who are doing the selection, as some of us trust …the athletes! It would be unfair to accuse a rider of doping when a positive test exists, but it would be fair to say that WADA or NANDA or a lab or a technician would be corrupt with no evidence!
Jean C December 19, 2008 at 7:27 am: And about Ricco,

CERA molecules are big. They pass into urine only under great physical effort. So it’s not abnormal that they didn’t get an AAF for all of his urine samples! Probably he tested positive on the hardest stages. I recall one of the two was the Cholet ITT, and probably the second was at the medium-mountain SuperBesse stage!
I read somewhere that a positive EPO test requires 80% of the max graduation, but there is very little doubt at 50%.
I do think that with the biopassport a 50% correlated with other abnormal parameters could become an AAF.
William Schart December 19, 2008 at 9:24 pm: Quoting Jean:

“And if the targeted riders were mostly the bad guys, the false positive would be a “divine” punishment for their sins… Better to have a false positive for a doped rider than to have a clean athlete with a false positive.”

The problem is that we don’t necessarily know if targeted riders are in fact the bad guys because 1. we don’t know what criteria are used for targeting riders and 2. suspicions about riders being dirty are often based on very circumstantial evidence. It is a very poor system indeed if we think it acceptable to convict a rider via a false positive because we only suspect the rider may be dirty.

Targeting a rider like Landis, who does have a “record”, is IMO acceptable: there is a definite reason for it. Some of us may not think the conviction was valid, but that is beside the point here. But should Armstrong be targeted because of the various allegations against him? Perhaps, perhaps not. How many rumors, allegations, innuendos, etc. are required before we target a rider?

Then there is the possibility we may be targeting the wrong riders if our targeting is based on faulty information. If we spend a lot of time and money testing riders who are in fact clean, there is less time and money to test other riders who may be dirty.
Rant December 19, 2008 at 10:03 pm: Jean,
Berry’s article can be found here. Unfortunately, one has to pay to see the whole article. I’ve got a printed version somewhere. I’ll try to dig it up in the next few days. If my memory is correct, there was no “Nostradamus” prediction about the Olympics and doping at the Olympics in his article.
The probabilities he calculated involved the possibility of a false positive over a given number of tests, assuming both 95% and 99% specificity. Although he made no conclusion about Landis’ guilt or innocence, he calculated that if the tests used on Landis’ sample had a 95% specificity during the 2006 Tour, then from the 8 samples Landis gave there would be a 34% chance that one of the samples would come up as a false positive. If those tests had 99% specificity, then there would still be an 8% chance of a false positive.
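The 34% and 8% figures can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the athlete is clean, each of the 8 tests is independent, and each has the same specificity. The function name is invented for this example and is not taken from Berry's article.

```python
def chance_of_any_false_positive(specificity: float, n_tests: int) -> float:
    """Chance that a clean athlete fails at least one of n_tests
    independent tests, each with the given specificity.

    Specificity is the chance a clean sample correctly tests negative,
    so specificity ** n_tests is the chance of passing every test.
    """
    return 1.0 - specificity ** n_tests


if __name__ == "__main__":
    for specificity in (0.95, 0.99):
        risk = chance_of_any_false_positive(specificity, n_tests=8)
        print(f"specificity {specificity:.0%}, 8 tests: {risk:.0%}")
    # specificity 95%, 8 tests: 34%
    # specificity 99%, 8 tests: 8%
```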
Berry also made the point that the rate of false positives is meaningless without knowing the rate of true positives from a given test. So, if the rate of false positives is 1 out of 1000 tests a test can sound pretty darned accurate. But that can be misleading, in terms of false positives. Consider such a case, and where the rate of true positives is 1 out of 100 tests. Out of 1000 tests, one can expect 10 true positives and 1 false positive. Which means that in that scenario, each positive result has a 1 in 11 chance of being a false positive. That’s not quite so convincing a test result. The failure to put the rate of false positives in the context of true positives and to argue that because false positive results are so rare that an accused person must be guilty is sometimes known as the “Prosecutor’s Fallacy.”
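The 1-in-11 figure follows directly from the two rates in that scenario. The sketch below uses the same simplification as the paragraph above, treating both rates as "per test performed"; the function name is made up for illustration.

```python
from fractions import Fraction


def share_of_positives_that_are_false(false_pos_rate: Fraction,
                                      true_pos_rate: Fraction) -> Fraction:
    """Of all positive results, the fraction that are false positives.

    Both rates are expressed per test performed, as in the scenario
    above (1 false positive per 1000 tests, 1 true positive per 100).
    """
    return false_pos_rate / (false_pos_rate + true_pos_rate)


if __name__ == "__main__":
    share = share_of_positives_that_are_false(Fraction(1, 1000),
                                              Fraction(1, 100))
    print(share)  # 1/11
```

Exact fractions are used instead of floats so the result prints as the same 1/11 quoted in the text.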
William,
Regarding targeting (which is a topic I’ll be taking up at some point), my view is that we need to have clear, specific standards around who gets targeted. There needs to be some credible evidence, not just a bunch of rumors, that suggests an athlete is doping before he or she gets the targeted testing treatment. Testing positive might lead to further testing, for example. Perhaps not after a first offense, but after a second or third positive result, targeted testing might be called for. But I could be on board with targeting after an initial positive result.
In that case, it sends a powerful message to would-be cheaters. If you’re going to be doping, and we catch you, we’re going to keep an eye on you from that point forward. That won’t stop the most determined (or most delusional), but I’d hazard a guess that the thought of being under constant scrutiny might stop a good portion of potential cheats before they even start.
Morgan Hunter December 20, 2008 at 5:47 am: Rant,

You say: “If you’re going to be doping, and we catch you, we’re going to keep an eye on you from that point forward. That won’t stop the most determined (or most delusional), but I’d hazard a guess that the thought of being under constant scrutiny might stop a good portion of potential cheats before they even start.”

I do not believe that such a method would “stop” cheating – in a sense – we have part of this “thinking” in action now.

If we take away the debate about the “testing procedures” being “effective” or scientifically backed – we have the situation you describe. People “know” that they “are being tested” – but this does not deter them from cheating. One explanation is – “well – the testing allows too many to slip through.” This may very well be true – but even if it didn’t, the response from cheaters will not be to “not cheat” but how to “get away with their cheating”, so the issue then comes full circle again – what is to be a “good solution” against cheating and how we deal with an innocent being “caught” in the net of the guilty.

In my opinion – it is simply a question of looking at this as: HOW DO WE FRUSTRATE or, to put it another way – how do we negate the “benefits” of cheating?

Today – we have a flawed testing system that only “functions” because the alphabet soup has made it impossible to “question” their results. IN REALITY by doing this – they also have curtailed our ability to “improve” the system – NOT ONLY THAT – but the issue of guilt or innocence cannot be gotten at.
Rant December 20, 2008 at 1:02 pm: Morgan,
I’m thinking about this targeting not quite in the sense you might have taken it. What I’m thinking is, if we’re going to have targeted testing, there have to be some transparent “rules” about who gets tested.
Now, targeted testing could live in a system where the punishment for being caught is to negate the advantage, à la a time penalty or adjustment. That would catch people who are consistently and systematically cheating. Well, it would catch some of them. And it could serve notice that because you tested positive, we’re going to keep a closer eye on you. I don’t know that it’s so much a deterrent as it is a warning. Keep up your current behavior, and you’ll continually suffer the consequences.
That said, all testing needs to be scientifically sound and accurate. If it’s not, then we haven’t improved much, if at all. What I’m against is arbitrary, capricious targeting of athletes. A “we’re gunning for you because we don’t like you” kind of approach is simply unacceptable.
Disciplinary actions should be in proportion to the offense. And anyone accused should have the opportunity to challenge and refute the allegations in a transparent way, based on rules that allow for a fair hearing, rather than the biased, one-sided process that passes for hearings today.
Where assessing time penalties, instead of outright bans, has an advantage is that if an athlete successfully defends him/herself, the results can be easily readjusted.
The problem of what to do about groups (teams, for example) who systematically cheat, however, is more complicated. But I believe that there is a way to address that wrinkle, too.
Morgan Hunter December 20, 2008 at 4:04 pm: Good job Rant – thanks for the forum.
Larry December 21, 2008 at 12:37 am: Jean C, Rant has responded ably to most of the points you’ve raised.

Berry made a number of strong points in his article that are difficult to pick up without reading very closely. For me, the most significant point raised by Berry is that doping analysis is primarily a question of statistics and probability. The tests being performed in anti-doping (at least, the test used to convict Landis) are tests that look for conditions that supposedly are (1) highly unusual, and (2) connected to a prohibited doping practice. They are not tests that directly determine the presence of a doping product.

This is a subtle but important point. The anti-doping tests can only determine that an athlete has PROBABLY doped. There is always going to be a possibility of a false positive. Some results are simply going to be unusual, outside of the curve, off the scale, however you’d like to put it. That’s not a criticism of the tests, or the scientists, or the labs. That’s just how it is.

Repeat the same tests over and over on the same athlete, and the chance of a false positive increases. That’s simple statistics.

In theory, if you test the same guy enough times, chances are he’s going to fail one of the tests.
Jean C December 22, 2008 at 5:11 am: Berry’s article

The science of doping

Recently, the international Court of Arbitration for Sport upheld doping
charges against cyclist Floyd Landis, stripping him of his title as
winner of the 2006 Tour de France and suspending him from competition
for two years. The court agreed with the majority opinion of a divided
three-member American Arbitration Association (AAA) panel and
essentially placed a stamp of approval on a laboratory test indicating
that Landis had taken synthetic testosterone. Although Landis asserts
his innocence, his options for recourse have all but dried up.

Already, in the run-up to this year’s Olympic Games, vast amounts of
time, money and media coverage have been spent on sports doping. Several
doping experts have contended that tests aren’t sensitive enough and let
dozens of cheaters slip through the cracks. And some athletes are facing
sanctions. Upon testing positive for clenbuterol, US swimmer Jessica
Hardy was held back from the Olympic team and faces a two-year ban from
the sport. She is attesting her innocence. China has already banned
several athletes, some of them for life, on doping charges. Indeed, many
world-class athletes will find their life’s accomplishments and
ambitions, their integrity and their reputations hinging on urine or
blood tests. But when an athlete tests positive, is he or she guilty of
doping? Because of what I believe to be inherent flaws in the testing
practices of doping laboratories, the answer, quite possibly, is no.

In my opinion, close scrutiny of quantitative evidence used in Landis’s
case shows it to be non-informative. This says nothing about Landis’s
guilt or innocence. It rather reveals that the evidence and inferential
procedures used to judge guilt in such cases don’t address the question
correctly. The situation in drug-testing labs worldwide must be remedied.
Cheaters evade detection, innocents are falsely accused and sport is
ultimately suffering.

Prosecutor’s fallacy

One factor at play in many cases that involve statistical reasoning, is
what’s known as the prosecutor’s fallacy [1]. At its simplest level, it
concludes guilt on the
basis of an observation that would be extremely rare if the person were
innocent. Consider a blood test that perfectly matches a suspect to the
perpetrator of a crime. Say, for example, the matching profile occurs in
just 1 out of every 1,000 people. A naive prosecutor might try to
convince a jury that the odds of guilt are 999:1, that is, the
probability of guilt is 0.999. The correct way to determine odds comes
from Bayes’ rule [2–4] and is equal to 999 times P/(1-P) where P is the
‘prior probability’ of guilt. Prior probability can be difficult to
assess, but could range from very small to very large based on
corroborating evidence implicating the suspect. The prosecutor’s claim
that the odds are 999:1 implies a prior probability of guilt equal to
0.5 (in which case P and 1-P cancel). Such a high value of P is
possible, but it would require substantial evidence. Suppose there is no
evidence against the suspect other than the blood test: he was
implicated only because he was from the city where the crime occurred.
If the city’s population is one million then P is 1/1,000,000 and the
odds of his guilt are 1001:1 against, which corresponds to a probability
of guilt of less than 0.001.
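
The arithmetic in this passage can be checked with a minimal Python sketch. It uses the odds form of Bayes' rule, with the factor of 999 taken straight from the text above; the function names are illustrative and not from the article.

```python
from fractions import Fraction

def posterior_odds(likelihood_factor, prior):
    """Posterior odds of guilt = likelihood factor x prior odds, P/(1-P)."""
    prior = Fraction(prior)
    return likelihood_factor * prior / (1 - prior)

def odds_to_probability(odds):
    """Convert odds in favour of guilt into a probability of guilt."""
    return odds / (1 + odds)

# The factor 999 is the value quoted in the text; it is not derived here.
even_prior = posterior_odds(999, Fraction(1, 2))          # the prosecutor's implied prior
city_prior = posterior_odds(999, Fraction(1, 1_000_000))  # one suspect in a city of a million

print(even_prior, float(odds_to_probability(even_prior)))
print(city_prior, float(odds_to_probability(city_prior)))
```

With a prior of 0.5 the odds come out at 999 to 1 in favour of guilt. With a prior of one in a million they come out at 1 to 1001, that is, 1001:1 against.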

The prosecutor’s fallacy is at play in doping cases. For example,
Landis’s positive test result seemed to be a rare event, but just how
rare? In doping cases the odds are dictated by the relative likelihood
of a positive test assuming the subject was doping (‘sensitivity’)
against a positive result assuming no doping (which is one minus
‘specificity’). Sensitivity and specificity are crucial measures that
must be estimated with reasonable accuracy before any conclusion of
doping can be made, in my opinion. The studies necessary to obtain good
estimates are not easy to do. They require known samples, both positive
and negative for doping, tested by blinded technicians who use the same
procedures under the same conditions present in actual sporting events.
In my view, such studies have not been adequately done, leaving the
criterion for calling a test positive unvalidated.

Laboratory practices

Urine samples from cyclists competing in the 2006 Tour de France were
analysed at the French national anti-doping laboratory (LNDD) in
Châtenay-Malabry. This is one of 34 laboratories accredited by the World
Anti-Doping Agency to receive and analyse test samples from athletes.
The LNDD flagged Landis’s urine sample following race stage 17, which he
won, because it showed a high ratio of testosterone to epitestosterone.

Based on the initial screening test, the LNDD conducted gas
chromatography with mass spectrometry, and isotope ratio mass
spectrometry on androgen metabolites in Landis’s sample. Such laboratory
tests involve a series of highly sophisticated processes that are used
to identify the likelihood of abnormal levels of plant-based androgen
metabolites (from dietary or pharmaceutical sources) in a urine sample.
The goal is to differentiate from endogenous androgen metabolites
normally found in urine. Mass spectrometry requires careful sample
handling, advanced technician training and precise instrument
calibration. The process is unlikely to be error-free. Each of the
various steps in handling, labelling and storing an athlete’s sample
represents opportunity for error. In arbitration hearings, the AAA threw
out the result of the LNDD’s initial screening test because of improper
procedures. In my opinion, this should have invalidated the more
involved follow-up testing regardless of whether or not sensitivity and
specificity had been determined. Nevertheless, the AAA ruled the
spectrometry results sufficient to uphold charges of doping.

During arbitration and in response to appeals from Landis, the LNDD
provided the results of its androgen metabolite tests for 139 ‘negative’
cases, 27 ‘positive’ cases, and Landis’s stage 17 results (see Fig. 1).
These data were given to me by a member of Landis’s defence team. The
criteria used to discriminate a positive from a negative result are set
by the World Anti-Doping Agency and are applied to these results in Fig.
1b and d. But we have no way of knowing which cases are truly positive
and which are negative. It is proper to establish threshold values such
as these, but only to define a hypothesis; a positive test criterion
requires further investigation on known samples. The method used to
establish the criterion for discriminating one group from another has
not been published, and tests have not been performed to establish
sensitivity and specificity. Without further validation in independent
experiments, testing is subject to extreme biases. The LNDD lab
disagrees with my interpretation. But if conventional doping testing
were to be submitted to a regulatory agency such as the US Food and Drug
Administration [5] to qualify as a diagnostic test for a disease, it would
be rejected.

The problem with multiples

Landis seemed to have an unusual test result.
Because he was among the leaders he provided 8 pairs of urine samples
(of the total of approximately 126 sample-pairs in the 2006 Tour de
France). So there were 8 opportunities for a true positive, and 8
opportunities for a false positive. If he never doped and assuming a
specificity of 95%, the probability of all 8 samples being labelled
‘negative’ is the eighth power of 0.95, or 0.66. Therefore, Landis’s
false-positive rate for the race as a whole would be about 34%. Even a
very high specificity of 99% would mean a false-positive rate of about
8%. The single-test specificity would have to be increased to much
greater than 99% to have an acceptable false-positive rate. But we don’t
know the single-test specificity because the appropriate studies have
not been performed or published.
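
As a rough check on these figures, here is a minimal Python sketch. It assumes, as the passage does, that the 8 tests are independent and share a single specificity; the function name is illustrative.

```python
def chance_of_false_positive(specificity, n_tests):
    """Chance that a clean athlete fails at least one of n independent tests."""
    return 1 - specificity ** n_tests

# Specificities of 95% and 99% over 8 samples, as in the text.
for specificity in (0.95, 0.99):
    rate = chance_of_false_positive(specificity, 8)
    print(f"specificity {specificity:.0%}: {rate:.0%} chance of a false positive")
# specificity 95%: 34% chance of a false positive
# specificity 99%: 8% chance of a false positive
```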

More important than the number of samples from one individual is the
total number of samples tested. With 126 samples, assuming 99%
specificity, the false-positive rate is 72%. So, an apparently unusual
test result may not be unusual at all when viewed from the perspective
of multiple tests. This is well understood by statisticians, who
routinely adjust for multiple testing. I believe that test results much
more unusual than the 99th percentile among non-dopers should be
required before they can be labelled ‘positive’.
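
The same calculation can be turned around to ask how specific a single test would need to be. The sketch below is only an illustration: it assumes independent tests, and the 1% race-wide target is a hypothetical choice, not a figure from the article.

```python
def chance_of_false_positive(specificity, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - specificity ** n_tests

def required_specificity(target_rate, n_tests):
    """Per-test specificity that keeps the overall false-positive chance
    across n independent tests at target_rate."""
    return (1 - target_rate) ** (1 / n_tests)

race_wide = chance_of_false_positive(0.99, 126)
needed = required_specificity(0.01, 126)  # the 1% target is an assumption

print(f"{race_wide:.0%}")  # about 72%, matching the text
print(f"{needed:.5f}")
```

Under these assumptions the per-test specificity has to be a little above 0.9999 to hold the race-wide rate to 1%.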

Other doping tests are subject to the same weak science as testosterone,
including tests for naturally occurring substances, and some that claim
to detect the presence of a foreign substance. Detecting a banned
foreign substance in an athlete’s blood or urine would seem to be clear
evidence of guilt. But as with testing for synthetic testosterone, such
tests may actually be measuring metabolites of the drug that are
naturally occurring at variable levels.

Whether a substance can be measured directly or not, sports doping
laboratories must prospectively define and publicize a standard testing
procedure, including unambiguous criteria for concluding positivity, and
they must validate that procedure in blinded experiments. Moreover,
these experiments should address factors such as substance used (banned
and not), dose of the substance, methods of delivery, timing of use
relative to testing, and heterogeneity of metabolism among individuals.
To various degrees, these same deficiencies exist elsewhere, including
in some forensic laboratories. All scientists share responsibility for
this. We should get serious about interdisciplinary collaborations, and
we should find out how other scientists approach similar problems.
Meanwhile, we are duty-bound to tell other scientists when they are on
the wrong path.
Jean C December 22, 2008 at 6:11 am: Larry, Rant,

Of course, Berry is right about the statistics but wrong in their application, especially in the values he uses.
Of course multiple tests increase the probability of a false positive, and the increase depends on the number of tests. We just have to know whether the increase drastically affects that probability: if the odds of a false positive are 0.01%, a targeted athlete would have just a 1% probability of a false positive after 100 tests!

Consider applying his values to the Beijing testing (many GC/IRMS tests were allegedly done), or to the testing of the most-tested riders of the recent era, like Cipollini, Jalabert or Kelly (they each had more than 300 tests during their careers; Armstrong had just a third as many as any one of them)!

What Berry forgot:
– the labs’ scientists know that kind of basic statistics very well!
– a positive test is not the result of a single test but the combination of multiple tests and measurements

And the margin of error is the sum of different kinds of errors: methodological, procedural, differences between human bodies.
For example: a test with a procedural error of 1% can reach 0.0001% by taking 3 different measurements.

The combination of a screening test and GC/IRMS drastically decreases the methodological errors.
Athletes are a subset of humans, so the tested population has less variation than the whole population.

William,
I doubt that the targeting of riders is done on the basis of hearsay. WADA, UCI and AFLD had records of the testing from previous years (hct level, …). So when you have a rider who has an hct level around 40 off season and 48 at the beginning of the TDF, that guy is a perfect target! Even more so if he had finished a GT with a very high hct level.
Sudden improvements in performance are a clue too.
I do think that people are more rational than what we write.

Larry (bis),
I am not surprised that red cars get more fines than cars of other colors.
The human body reacts more to the color red; that is our legacy for detecting danger: an aggressor ready to fight pumps more blood, so he is redder.
So unconsciously we pay more attention to red things!
Rant December 22, 2008 at 11:38 am: Jean,
I’m not entirely sure I’m following you. Regarding the comment about the odds of 0.01%, where are you getting that? p, the probability that an event will occur, is a value between 0 and 1. When talking percentages, you need to multiply p by 100 to reveal the percentage chance of an event occurring. So if p=0.001 (one of Berry’s examples), then expressed as a percentage, the outcome has a likelihood of 0.1 percent, or one occurrence out of 1000 tries.
An AAF is the result of more than one kind of test, although that’s not always the case. But when the drug is testosterone, as illustrated in the article, it is. For the probability of a false positive from multiple types of tests, I believe that’s determined by the product of the probabilities for each test type, rather than the sum. Your illustration certainly points in that direction. The sum of the probabilities for a series of three tests, as you’ve illustrated, would make it three times more likely that a false positive would occur rather than three or more orders of magnitude less likely.
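To see the difference between the product and the sum, here is a minimal Python sketch using the 1% error rate and three measurements from Jean's example. It assumes the measurements are fully independent, which nothing in this thread establishes; the function names are illustrative.

```python
import math

def all_wrong_together(error_rates):
    """Chance that every independent measurement is wrong at once (product rule)."""
    return math.prod(error_rates)

def at_least_one_wrong(error_rates):
    """Chance that one or more of the independent measurements is wrong."""
    return 1 - math.prod(1 - e for e in error_rates)

rates = [0.01, 0.01, 0.01]  # three measurements, 1% error each (Jean's example)

print(f"{all_wrong_together(rates):.4%}")  # 0.0001%
print(f"{at_least_one_wrong(rates):.2%}")  # 2.97%, close to the simple sum of 3%
```

The product applies only when a positive requires every measurement to agree. If any one of them can trigger a positive on its own, the error rate moves toward the sum instead.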
Using a screening test followed by a more accurate test will definitely narrow the results. Any positive on the screening test will trigger the (we hope) more accurate test. To that extent, the negatives (both true and false negatives) are weeded out. But to evaluate the results of the final test, Berry makes a good point. You need to know the final test’s capabilities not only on the false positives, but on the true positives, in order to understand what the results indicate. On the second test, if there are 10 true positives for every false positive, then it’s not as effective as if there were 100 true positives for every false positive, and so forth. A false positive rate, as commonly spoken about, refers to the number of false positives for all tests given, rather than the number of false positives compared to true positives. The prosecutor’s fallacy is the use of the former, rather than accounting for the latter. And it gives a distorted picture of just how accurate test results are, which I believe is one of Berry’s main points. WADA doesn’t publish such statistics though, and from what I’ve seen, neither do their affiliated labs. So it’s a leap of faith to assume that the end result (an AAF) has an extraordinarily high probability of being correct. The fact is, we don’t have enough information to make a conclusion one way or another.
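The ratio of true positives to false positives can be sketched in a few lines of Python. Every number below is hypothetical, since the real sensitivity, specificity and share of dopers among tested athletes are exactly what hasn't been published; the function name is illustrative too.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive results that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical values: 90% sensitivity, 99% specificity, 10% of athletes doping.
ppv = positive_predictive_value(0.90, 0.99, 0.10)
print(f"{ppv:.1%} of positives are true positives")
print(f"about {ppv / (1 - ppv):.0f} true positives per false positive")
```

With these made-up inputs, a test that is wrong about only 1% of clean athletes still yields roughly 10 true positives for every false positive, and the ratio shifts as the assumed share of dopers changes.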
Thanks for posting the quote from Berry’s article. I’ve seen a longer article by him on the subject, so I’m not sure if that was the whole thing from Nature, or just the editorial that accompanied the article.