The Science Of It All

by Rant on September 25, 2007 · 50 comments

in Doping in Sports, Floyd Landis, Tour de France

Much of the majority panel’s decision to find Floyd Landis guilty of doping during the 2006 Tour de France hinges on the underlying science of the tests, and testing processes, used in anti-doping testing. If you’ve read the majority’s opinion, or seen discussion of it on this blog or in other places, then you’ve no doubt seen references to gas chromatography (GC), mass spectrometry (MS), or isotope ratio mass spectrometry (IRMS). IRMS is also referred to as carbon isotope ratio, or CIR. These testing techniques may be combined and referred to as GC/MS (gas chromatography immediately followed by mass spectrometry) or GC/IRMS (gas chromatography immediately followed by isotope ratio mass spectrometry).

Lots of names and acronyms. But what are they? A while back I found an article called GC/MS Analysis, written by Frederic Douglas, that does a good job of explaining the basics. So rather than re-invent the wheel, I’m going to quote from his article.

A description of gas chromatography

Imagine a pile of different types of balls resting at the bottom of an inclined, paved driveway. This pile includes ball bearings, marbles, ping pong balls, golf balls, wiffle balls, handballs, tennis balls, hockey pucks, baseballs, soccer balls, volley balls, basketballs, footballs, and bowling balls. Attempt to move this motley collection of balls up the driveway with a normal leafblower. Some of the pile will quickly move to the top of the driveway immediately, some balls will migrate at varying speeds, and some balls may take an eternity to reach the end of the driveway.

The difference in the time that each type of ball takes to travel to the top depends upon the characteristics of each ball. Obviously, the lighter balls travel more quickly. Also, some balls may take longer due to their shape, like the hockey puck or the football. The different balls interact with each other as the air from the leaf blower acts on the pile. This interaction may hinder or accelerate the ball’s travel as the balls strike each other. The surface characteristics of the ball may be important, as in the examples of the tennis ball and golf ball.

GC [gas chromatography] analysis depends on similar phenomena to separate chemical substances. A mixture of chemicals present in a specimen can be separated in the GC column. Some chemical and physical characteristics of the molecules cause them to travel through the column at different speeds. If the molecule has low mass it may travel more swiftly. Also, the molecule’s shape may affect the time needed to exit the column. How the different substances relate to each other may cause the time needed to travel the column to increase or decrease. Interactions between the sample’s molecule and the column surface may cause the molecule to be retained inside the column for a different amount of time than similar molecules that interact with the column differently.

A description of mass spectrometry

MS [mass spectrometry] analysis is commonly used in arson investigations, engine exhaust analysis, petroleum product analysis, and for blood monitoring in surgery. MS identifies substances by electrically charging the specimen molecules, accelerating them through a magnetic field, breaking the molecules into charged fragments and detecting the different charges. A spectral plot displays the mass of each fragment. A technician can use a compound’s mass spectrum for qualitative identification. The technician uses these fragment masses as puzzle pieces to piece together the mass of the original molecule, the “parent mass.”

The parent mass is analogous to the picture on top of a puzzle box, a guide to the end result obtained by putting together the fragment masses, or puzzle pieces. From the molecular mass and the mass of the fragments, reference data is compared to determine the identity of the specimen. Each substance’s mass spectrum is unique. Providing that the interpretation of the output correctly determines the parent mass, MS identification is conclusive.

A description of isotope ratio mass spectrometry

Douglas’ article doesn’t talk about IRMS, so I’ll give it my best shot here. Isotopes of an element are variations of that element with a different atomic mass, due to more (or fewer) neutrons in the atom’s nucleus. Carbon has three naturally occurring isotopes: carbon-12 (12C), which has 6 protons and 6 neutrons; carbon-13 (13C), which has one additional neutron; and carbon-14 (14C), which has yet another.

In isotope ratio mass spectrometry, as applied to anti-doping testing, the item of interest is the ratio of carbon-13 to carbon-12. Both forms of carbon occur naturally. The theory behind the test is that the proportion of carbon-13 in synthetic hormones (which are derived from plant sterols) is different from the proportion of carbon-13 in hormones made by the body. If the results of an IRMS test exceed a certain threshold, then the test results indicate the use of a synthetic hormone. At least, that’s the idea.

Because we are (literally) what we eat, a person who’s a strict vegetarian and eats a lot of plant-derived food will have a different amount of carbon-13 in his or her system than someone who’s a strict carnivore. WADA even has research that indicates that over time, through diet, a person can actually test positive by the current criteria without having taken synthetic hormones. That said, having a vegetarian meal the night before a big race isn’t going to make you test positive. And that’s not an issue in this case. Just an interesting aside.

So, to get to the technique. In IRMS testing, the compound is combusted into CO2, and then the relative quantities of CO2 with a molecular mass of 44 (containing carbon-12) and CO2 with a molecular mass of 45 (containing carbon-13) are compared.

This result gives essentially the same ratio as if just the carbon-12 and carbon-13 atoms could be separated out by themselves. At the end of IRMS testing, however, you don’t have the building blocks of a compound to look at; you only have CO2. So if the compound you want to analyze isn’t properly isolated and identified before the IRMS test, your results will not be correct.
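For readers who like to see the arithmetic, here is a minimal Python sketch of the comparison. Everything specific in it is an assumption for illustration: the ion counts are invented, the function names are mine, and the reference ratio is a placeholder for whatever standard a lab calibrates against. Isotope results are conventionally reported as a “delta” value in parts per thousand relative to a standard, which is what the second function computes. A real instrument also corrects the mass-45 signal for a small oxygen-17 contribution, which this sketch ignores.

```python
# Hypothetical illustration only: the counts below are invented, not lab data.

# Assumed 13C/12C ratio of the reference standard (placeholder value).
REFERENCE_RATIO = 0.0112372


def ratio_45_to_44(count_44, count_45):
    """Ratio of mass-45 CO2 to mass-44 CO2 detected after combustion."""
    if count_44 <= 0:
        raise ValueError("count_44 must be positive")
    return count_45 / count_44


def delta_per_mil(sample_ratio, reference_ratio=REFERENCE_RATIO):
    """Difference between sample and reference ratios, in parts per thousand."""
    return (sample_ratio / reference_ratio - 1.0) * 1000.0


# Two made-up compounds measured in the same sample.
reference_compound = ratio_45_to_44(count_44=1_000_000, count_45=10_980)
target_compound = ratio_45_to_44(count_44=1_000_000, count_45=10_920)

# The test looks at how far apart the two delta values are.
difference = delta_per_mil(reference_compound) - delta_per_mil(target_compound)

if __name__ == "__main__":
    print(round(delta_per_mil(reference_compound), 2))
    print(round(delta_per_mil(target_compound), 2))
    print(round(difference, 2))
```

Notice that nothing in these numbers says which compound the CO2 came from. That information has to come from the separation step before combustion.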

A description of retention time

The amount of time that a compound is retained in the GC column is known as the retention time. The technician should measure retention time from the sample injection until the compound elutes from the column. The retention time can aid in differentiating between some compounds. However, retention time is not a reliable factor to determine the identity of a compound. If two samples do not have equal retention times, those samples are not the same substance. However, identical retention times for two samples only indicate a possibility that the samples are the same substance. Potentially thousands of chemicals may have the same retention time, peak shape, and detector response.
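A minimal Python sketch of that logic follows. The compound names, retention times, and tolerance are invented for illustration; the point is only that a retention-time match narrows the field without settling the identity.

```python
# Hypothetical retention times in minutes; names and numbers are invented.
LIBRARY = {
    "compound A": 9.98,
    "compound B": 10.02,
    "compound C": 12.40,
    "compound D": 10.00,
}


def candidates_by_retention_time(observed_rt, library, tolerance=0.05):
    """Return every library entry whose retention time is within tolerance.

    A compound outside the tolerance is ruled out. A compound inside it is
    only a possibility, because several compounds can share a retention time.
    """
    return sorted(
        name for name, rt in library.items()
        if abs(rt - observed_rt) <= tolerance
    )


if __name__ == "__main__":
    # An unknown peak at 10.00 minutes still leaves three candidates.
    print(candidates_by_retention_time(10.00, LIBRARY))
```

Even in this four-entry toy library, one retention time leaves three candidates standing, and a real sample can contain compounds that aren’t in the library at all.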

Retention time vs. Relative retention time

In paragraph 185 of the majority’s ruling, they state:

185. The additional time added to the RT of the analyte or standard in the IRMS will always be a constant time, regardless of the individual substances or compounds being measured. Consequently, the retention times of the compounds emerging from the GC/MS system cannot be the same as those coming from the GC/C/IRMS. Likewise, the RRTs will also be different. Taking the example used above, if the RT from the GC/MS is 10 min for the target analyte and 5 min for the internal standard, in the case of IRMS, we may be adding an additional 1 minute for the combustion of those compounds to take place. The reason that the additional time is the same for each substance/compound is that the substance or compound is no longer in its original form; they have been combusted completely to form CO2. As such, the RT for the target analyte at the end of the IRMS would be 11 min and the RT for the internal standard is 6 min. This results in a RRT of 11/6. Arithmetically speaking it is not possible for the RTs and the RRTs to be identical in the GC/MS and GC/IRMS systems nor can it be ensured that it will be within TD2003IDCR.

Relative retention time is a comparison of the retention time of a material to the retention time of a standard compound in the same system. So if unknown compound X takes 10 minutes to go through, and the standard S takes 5, as the majority uses in their example, then the relative retention time is 10/5. For the second system, the relative retention time is 11/6.

Can the two systems be compared? The answer is yes, as long as there is a constant difference between the two systems. In the majority’s example, the constant difference is one minute. So, adjusting for that difference, the relative retention time in the second system is (11-1)/(6-1), or 10/5. That is, it’s the same. Whenever there is a consistent difference such comparisons can be made. So if it takes twice as long in one system as the other, the comparison would still hold up. To illustrate that mathematically, if the original ratio is X/S, the second system would yield a ratio of 2X/2S, which is the same ratio (the 2s cancel out).
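The arithmetic above can be checked with a few lines of Python. This is a sketch using the majority’s example numbers; the function names are mine, and it assumes the offset between the two systems is known and truly constant.

```python
def relative_retention_time(analyte_rt, standard_rt):
    """Retention time of the analyte divided by that of the standard."""
    return analyte_rt / standard_rt


def rrt_after_offset(analyte_rt, standard_rt, offset):
    """RRT after subtracting a constant delay from both retention times."""
    return relative_retention_time(analyte_rt - offset, standard_rt - offset)


# Majority's example: 10 and 5 minutes in GC/MS, 11 and 6 minutes in GC/C/IRMS.
gcms_rrt = relative_retention_time(10.0, 5.0)
irms_raw_rrt = relative_retention_time(11.0, 6.0)
irms_adjusted_rrt = rrt_after_offset(11.0, 6.0, offset=1.0)

# Second case from the text: everything takes twice as long.
doubled_rrt = relative_retention_time(2 * 10.0, 2 * 5.0)

if __name__ == "__main__":
    print(gcms_rrt, irms_raw_rrt, irms_adjusted_rrt, doubled_rrt)
```

Note that the unadjusted ratios, 10/5 and 11/6, are not equal. The comparison only works once the constant offset has been subtracted, so everything depends on knowing what that offset is.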

Some limitations to GC/MS testing

For GC/MS testing, Douglas notes, there are some limitations.

Although many consider GC/MS to be the “gold standard” in scientific analysis, GC/MS does have some limitations. Because great faith is maintained in GC/MS analysis, erroneous results are not expected and hard to dispute. However, false positives and false negatives are possible.

Some problems with GC/MS originate in improper conditions in the GC portion of the analysis. If the GC instrument does not separate the specimen’s compounds completely, the MS feed is impure. This usually results in background “noise” in the mass spectrum. If the carrier gas in the GC process is not correctly deflected from entering the MS instrument, similar contamination may occur.

Also, the MS portion suffers from the inexact practice of interpreting mass spectra. An analyst must correlate computer calculations with system conditions. The typical memory bank for MS identification contains about 5000 spectra for a particular group of compounds. Even if a competent analyst could find conclusive results pointing to one substance out of 5000 substances, this does not rule out the remaining over 200,000 known existing chemicals. For the 5000-spectra memory bank, the typical computer result is limited to as many as six possible identifications.
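To give a feel for what a library search involves, here is a rough Python sketch. It is not a description of any particular instrument’s software: the spectra are invented, the names are placeholders, and cosine similarity is simply one plausible way to score how closely two spectra agree. The limit of six candidates mirrors the figure Douglas mentions.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Similarity of two spectra given as {mass: intensity} dictionaries."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[m] * spec_b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(unknown, library, limit=6):
    """Return up to `limit` library names, best score first."""
    scored = [(cosine_similarity(unknown, spec), name)
              for name, spec in library.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:limit]]


# Invented spectra: X and Y are deliberately almost identical.
LIBRARY = {
    "reference X": {100: 20.0, 200: 100.0, 300: 45.0},
    "reference Y": {100: 25.0, 200: 100.0, 300: 40.0},
    "reference Z": {150: 100.0, 250: 60.0},
}
unknown = {100: 22.0, 200: 100.0, 300: 43.0}

if __name__ == "__main__":
    print(top_matches(unknown, LIBRARY))
```

In this toy case the two best candidates score almost the same, so the ranking alone can’t settle the question. And a search like this can only ever return something that is already in the library.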

So who’s right?

That’s the $2 million question. Did the majority (or their adviser) get the science right or not? Did LNDD carry out the tests properly and properly interpret the results? There’s an interesting discussion going on at TBV over whether one can make a determination based on the graphs of the GC/MS results and the GC/IRMS results. One thing to note about the differences between the graphs is that when they are adjusted to fit on the same scale, the two sets of results appear to fall outside of WADA’s specified tolerances of 1% or 0.2 minutes, whichever is less. It’s also important to remember that just because two things look the same doesn’t mean they are.

You could dress me up in the same kit as Floyd Landis, put me on the same brand of bike with all the same components and so on, but I can guarantee that in a 40K time trial, Floyd would leave me in the dust — unless he spotted me about 10 or 15 minutes at the start.
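As a rough illustration of the tolerance mentioned above, here is a small Python sketch. It assumes the rule works the way I described it: the allowed difference is 1% of the reference retention time or 0.2 minutes, whichever is smaller. The function name and the sample numbers are hypothetical, not taken from the case documents.

```python
def within_rt_tolerance(sample_rt, reference_rt, percent=1.0, absolute=0.2):
    """Check a retention time against the smaller of two allowed differences.

    All times are in minutes. `percent` is a percentage of the reference
    retention time; `absolute` is a fixed number of minutes.
    """
    allowed = min(reference_rt * percent / 100.0, absolute)
    return abs(sample_rt - reference_rt) <= allowed


if __name__ == "__main__":
    # At 10 minutes, 1% (0.1 min) is the tighter limit.
    print(within_rt_tolerance(10.05, 10.0))
    print(within_rt_tolerance(10.15, 10.0))
    # At 30 minutes, 0.2 min is the tighter limit.
    print(within_rt_tolerance(30.25, 30.0))
```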

[Note: At some point in the future, we will present an article that looks in greater detail at the arguments made in the majority ruling and see just how well they stack up to the science of the tests. The person writing the article for me has a full schedule, a very busy professional life, and is doing this as a spare time project. In other words, it may be a while before that article is done.]

Larry September 25, 2007 at 12:31 pm

Rant, I have a lot of questions and things I’d like to clarify.

First: the GC. If I’m following things, what the GC does is to take a substance (like urine) that contains a lot of compounds, and the GC spits out each compound, one by one. Standing alone … the GC cannot identify the compound, and it cannot measure the amount of each compound. Is this correct? (Yes I know, the GC is going to be coupled with an MS or an IRMS. But for the moment, I want to understand what each machine is supposed to do.)

Next: given that a substance like urine can contain any number of compounds, how can we be certain that the GC is doing its job and separating out each compound? How can we be certain that the GC is not spitting out two different compounds at the same time? I think this is a problem called “interference”.

Let’s use your driveway analogy. I’m standing at the end of the driveway with a leaf blower, and at the bottom of my driveway are a bunch of different kinds of balls. Let’s say that we have footballs and basketballs in the mix. Let’s say that basketballs are heavier than footballs, so all other things being equal, basketballs would move up the driveway more slowly than footballs. But footballs are oblong, so maybe they don’t roll as well as basketballs. Maybe the relative lightness of the football and the roundness of the basketball are factors that cancel each other out, and the footballs and basketballs move up the driveway at the same rate. How can we tell that this is taking place? (If you want to reserve a discussion of this “interference” until after I ask questions about the MS, that’s fine.)

Rant September 25, 2007 at 5:11 pm

Larry,

Let’s see: GC is for separation, MS or IRMS is to determine what, precisely (or not), might be there.

As to the many substances in urine, one way would be to process it through some reactions designed to remove some of the contaminants so that only a few things remain. How well that was performed might have an impact on the results.

And finally, for the driveway analogy: One clue might be the fragments that show up in MS, which would be performed after everything migrates through the GC part of the system. If you wind up seeing too many different types of fragments of varying weights, etc., you might be able to surmise that there’s contamination. In IRMS, you wouldn’t be able to tell very easily, because the compound is combusted into CO2, and without any traces of what the chemical was originally, this could lead to some incorrect interpretations.

Larry September 25, 2007 at 6:02 pm

Rant –

Yes, good point about removing contaminants you’re not interested in measuring. But I want to return to the question, how are you supposed to know that you need to remove these contaminants?

OK, so we’ve established that we have a GC that, in theory at least, takes a sample containing a mix of compounds and shoots out the compounds, one compound at a time. But we don’t know what the compounds are yet, and we don’t know the quantity of any compound. So next, we hook up the GC to an MS. The MS is something like a compound particle counter. As each compound is shot out of the GC, the MS measures the number of particles in each shot.

The MS produces one of the two kinds of charts we keep seeing on TBV and elsewhere, with a graph that looks something like a range of steep and narrow mountains. The x-axis of the graph is a time line, representing the “retention time” for each compound. Using your driveway analogy, retention time is the amount of time it takes for the “balls” representing each compound in the sample to travel to the end of the GC driveway. The y axis of the graph represents the amount of the compound. So you can measure the amount of the compound by the height of the peak in the GC/MS graph, or perhaps you measure the amount of the compound by the area contained within the peak. Not sure.

This is correct so far?

So, if I’m following this right, if you connect a GC to an MS, you can separate the compounds contained in a sample and you can measure the amount of each compound … but you still have not identified the compound. (That’s going to require doing a comparative analysis of GC/MS charts produced on spiked samples, but I want to hold the discussion of the spiked samples for a later post.)

OK. Let’s go back to the question of interference, where we have two different kinds of balls (let’s say that they are footballs and basketballs) that travel up the driveway at the same time. Let’s say that the footballs and basketballs travel up the driveway at EXACTLY the same retention time. How are you going to be able to tell from the GC/MS chart that you have two different balls with identical retention times? I don’t see how you can. You’re just going to end up (at the end of the day) thinking that you have more basketballs (or footballs) than you really have.

I’ve read a little bit about interference, but the stuff I’ve read about interference talks about peaks on the GC/MS chart that overlap. So in our example, there’d be a peak for footballs that would overlap with the peak for basketballs. In other words, the valley between the two peaks would not reach all the way down to the bottom of the GC/MS graph. I understand that in such a case, the lab analyst would be able to identify the overlapping peaks, see that he (or she) has a problem, and would work to somehow eliminate the interference (perhaps, as you suggested, by filtering out the footballs or the basketballs). But if the footballs and basketballs form a single peak rather than overlapping peaks, I don’t know how the lab technician is supposed to know that there’s interference.

I hope that my plodding through this material is not driving you nuts. I actually hope that we can figure out some of the important stuff, if I can nail these basics down.

Rant September 25, 2007 at 6:24 pm

Larry,

The test protocol would tell the person performing the test to do so, generally speaking. As scientists, they should be well aware of the many compounds that could show up in urine, so any that they aren’t concerned about and that they can remove through one method or another will make it easier to isolate what they’re really looking for. As scientists performing these tests, they should be aware of the kinds of contaminants and how they could be removed. If not, they’re not well trained in their work.

The amount of a compound present is determined by the area of the peak which represents the compound. And you’re correct, if the peaks completely overlap, you would not be able to discern which is which and how much of each you have. So that’s a big concern.

In theory (and well-run practice), the MS data should tell you what’s there. What you have are peaks for the various ions that come off the parent molecule, along with information about the molecule’s mass. By the location of the peaks and the molecular mass, it should be possible to determine what the parent molecule is. Problem is, with poor (or no) separation, everything gets mucked up. Where there is total overlap, the technician (or whoever’s interpreting the data) might not know that there is interference. And in that case, you could be tricked into thinking something is there, or that there is more of that something than is actually there.
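To put some numbers on that, here is a rough Python sketch of measuring a peak by its area. The data points are invented, the baseline is assumed to be flat, and the trapezoid rule stands in for whatever integration the lab software actually uses.

```python
def peak_area(times, signal, baseline=0.0):
    """Trapezoid-rule area under a peak, measured above a flat baseline."""
    if len(times) != len(signal):
        raise ValueError("times and signal must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        left = max(signal[i - 1] - baseline, 0.0)
        right = max(signal[i] - baseline, 0.0)
        area += 0.5 * (left + right) * width
    return area


# Invented data: a target compound and an interferent that elute together.
times = [0.0, 1.0, 2.0]
target_only = [0.0, 10.0, 0.0]
interferent = [0.0, 4.0, 0.0]

# With total overlap the detector sees only the sum of the two signals.
observed = [t + i for t, i in zip(target_only, interferent)]

if __name__ == "__main__":
    print(peak_area(times, target_only))
    print(peak_area(times, observed))
```

The observed peak has the same shape and position as the target alone, just a larger area, which is exactly how complete overlap can make it look like there is more of something than is really there.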

Larry September 25, 2007 at 7:17 pm

Rant, I’m going to drop another question on you, because I think it is directly pertinent to the GC/MS discussion.

As you know, the majority decision in the FL arbitration threw out the GC/MS test used to determine FL’s ratio of testosterone and epitestosterone (T/E ratio). According to the majority decision, the T/E testing is ultimately supposed to separately measure the ratio of three “ions”. Does this mean that, when we’re talking about a substance like testosterone, we’re not really talking about a single “compound” with a single retention time, but instead we’re talking about a substance made up of a lot of compounds (or “ions”), each with its own retention time?

Rant September 25, 2007 at 7:28 pm

Larry,

When you’re talking about almost any compound, it can be broken down into smaller components. If these components are electrically charged (either positive or negative), they’re called ions. For an MS test, the compound is ionized and the masses (strictly, the mass-to-charge ratios) of the resulting ions are measured. From there, you piece the puzzle together to determine what compound it was.

I would have to revisit the testing protocol to see how, exactly, the T/E ratio is determined, but I believe it has to do with the ratios of the areas of the peaks for T and E found in a gas chromatogram.

Larry September 25, 2007 at 9:15 pm

Rant, the more reading I do, the less I understand, and the more frustrated I get. I don’t know what I’m looking at any more.

Yes it is true, the GC does separate out each of the compounds in a urine sample. But the MS does more than count the particles in the compound. The MS blasts the molecules in the compound with electrons, causing them to break into pieces. Actually, it appears that not all of the molecules in the substance break into pieces, some of the molecules stay intact. Once the molecules have been attacked by the electrons, the resultant pieces (both the molecules that survived intact, and the pieces of the molecules that were broken apart) are called “ions”. (Answering my earlier question.) The molecules that did not break apart are called the molecular ions, and the molecules that did break apart are called the fragment ions.

The detector in the MS counts these ions separately, based on their mass (or maybe it’s based on their mass/charge ratio, I’m not sure). The resultant pattern is called a “spectrum”, and each compound you might want to detect has a characteristic spectrum (which probably differs depending on the type of machine you use). So when the GC/MS produces a spectrum, you should be able to compare it to a library of mass spectra of known compounds.

While I don’t want to jump ahead, I guess it’s these ions that are the stuff being further analyzed in the IRMS.

I’m too tired at the moment to ask questions. I guess there’s one kind of graph produced by the GC alone, and another by the GC/MS, and I wish I could just trust the lab people to do this stuff correctly so I wouldn’t have to work this hard trying to understand this.

William Schart September 26, 2007 at 4:26 am

I am troubled by the statement Rant quotes that just because a sample being tested has the same RT as a given substance, the ID is not necessarily valid as some other substance could have the same RT. Somewhere recently I read that a lab might have a library of several thousand compounds, but there are many more compounds than what these libraries have. To use our driveway analogy, you might have a library of the “RT”s of American footballs, baseballs, softballs, golf balls, and tennis balls, but have no info on soccer balls, Aussie footballs, rugby balls and so on. Now it just might be that the “RT” for an American football is the same as for an Aussie football. So how can you tell which variety you have?

Another question: the panel threw out Augenstein’s idea of using RRT because there were 2 machines involved and they claim you can’t compare RRTs between the 2 machines. Implied here is that the good doctor either wasn’t aware that there were 2 different machines used, or ignored that fact. But isn’t the use of 2 machines SOP and thus something that Dr. A would be well aware of? If so, this isn’t necessarily a case that he “got it wrong” but more a case of 2 competing methodologies (taking a generous view of things). A more cynical view would be that the panel cobbled up a reason to discount his testimony, wrapping it up in some pseudo-scientific and mathematical mumbo jumbo. Since other WADA labs can’t contradict what LNDD did, we don’t know how, for example, the well-respected UCLA lab does this.

Of course, per WADA rules the science behind all the testing is presumed to be sound and can’t be challenged. But it is sounding more and more to me that the science isn’t all that cut and dried. There may very well be sound ways of conducting GC/MS and GC/IRMS testing that can reliably ID the substance of interest if it does occur in the sample being tested, but it still is quite questionable that LNDD employed sound methods.

Rant September 26, 2007 at 4:35 am

Larry,

You’re getting it. I was trying to keep it a bit simpler, but you’ve definitely got the idea of GC/MS at this point. GC/IRMS is a bit different. Instead of being ionized and fragmented directly, the compound is first combusted into CO2, and it’s the CO2 that gets measured. Otherwise, the same kinds of principles hold.

Larry September 26, 2007 at 6:41 am

William, I understand what you are saying. However, from all I am reading, it appears that GC/MS is very well accepted science. It’s used all over the place. And while it’s obviously more complicated than, say, a pregnancy test, it also does not appear to be rocket science. We’re routinely relying on GC/MS every day for a lot of things. I’m not sure what these things are, LOL, but this is what I’m being told.

The driveway analogy only adequately describes the GC portion of the machinery. I think you are quite right, if you’re just looking at the GC part of things, then in our driveway analogy, you have to worry about Aussie footballs and other unexpected stuff that might crop up in someone’s urine. But the MS portion of the analysis gives us a lot more information that should enable us to tell that we might be dealing with something like an Aussie football.

If we add MS to our driveway analogy, then you can see the GC as the guy with the leaf blower at the bottom of the GC driveway, blowing the various balls up the driveway. But we have to add an MS guy to the picture. If I understand this correctly … the MS guy is standing to the side of the driveway, near the end of the driveway, and HE has an Uzi! As the balls arrive near the end of the driveway, the MS guy riddles the balls with rapid machine gun fire. This causes the balls to break into pieces. Moreover, the balls don’t break into pieces randomly — they actually fragment predictably, and each kind of ball fragments in a way that’s characteristic for that ball. (Since we’re dealing with electrons and molecules, the fragmentation introduced by the MS is not the chaotic mess you’d expect if we were actually dealing with bullets and sports balls.)

So (and we’re probably pushing this driveway analogy WAY too far), basketballs might fragment so that the Uzi misses 20% of the basketballs, another 20% of the basketballs fragment into 2 pieces, another 40% fragment into three pieces, and another 20% fragment into 4 pieces. These pieces are the “ions” we keep reading about. The MS then detects all of these ions, and produces a spectrum graph showing the number and atomic mass (or mass/charge, I’m still not sure) of each ion type. So, what you should end up with is a two-dimensional spectrum that is characteristic of basketballs. It’s something like a fingerprint. It’s pretty precise, and you can compare it to the graphs in your spectrum library to make certain you are truly dealing with basketballs.

So in our example, assume that the basketballs and the Aussie footballs arrive at the end of the MS driveway at the same time. The MS guy blasts the footballs and basketballs with Uzi fire. The science tells us that the footballs and basketballs will fragment in different ways, producing a spectrum graph that you won’t be able to recognize. So in theory at least, you’d be able to tell that something went wrong, that what you’re seeing represents some kind of “interference” between the footballs and basketballs.

However, from reading the majority decision, it appears that interference can be a good deal more subtle than what I’ve described with basketballs and footballs. Remember, the majority threw out the LNDD’s T/E testing, because they only compared one ion of FL’s testosterone and epitestosterone. The WADA rules required them to compare 3 ions. The majority said that you had to compare 3 ions “in order to verify that there are no interferences at those ions, which could potentially affect the quantification, abundance or size of the peaks … Dr. Goldberger testified that in the case of ion 432 (the ion monitored by LNDD) there are over ten compounds that have a 90% to 100% abundance of ion 432.”

From this, you can tell that there is some problem with my explanation. From my explanation, LNDD should have been able to determine with precision whether any compound had interfered with their measurement of testosterone at ion 432. If there was such selective interference, then the measurement of ion 432 should have been out of proportion to the measurement of the other testosterone ions, and the spectral graph should not have matched the spectrum for testosterone in LNDD’s spectrum library.

So, we can say with some reasonable certainty that I’m STILL failing to understand what GC/MS is all about!

My failure to understand GC/MS may be caused by the fact that I don’t understand how the GC/MS produces separate spectrum graphs for every compound in an athlete’s urine sample. Does the lab have to run a separate GC/MS test for every compound they need to test for? That seems impossible to me, given the long list of WADA prohibited substances. But if they’re testing for multiple substances in the same GC/MS run, how do they distinguish between the ions in urine compound A and the ions in urine compound B?

Rant, if you’re still hanging in there, and you don’t regret having introduced the subject of footballs and basketballs, and you can shed some light here … care to do so?

Larry September 26, 2007 at 8:14 am

Hey! I just read on TBV that this topic is being written for the “scientifically challenged”. LOL. I resemble that remark!

If this is really a topic intended for the scientifically challenged … can we expect that Cynthia Mongongu is lurking around here somewhere?

Rant September 26, 2007 at 8:37 am

Larry,

Let’s change your analogy a bit here. Let’s say you have molecules made up of soccer balls, basketballs, rugby balls, footballs and golf balls, in differing amounts, lightly glued together. You have a number of different types of molecules, and they may move through the system at different rates. So different that it’s possible to get mass spectra on each molecule before the next molecule’s parts enter the MS portion of the device. The glue that holds the molecules together can be easily broken down by a bit of heat and a bit of pressure.

So now you have the components moving through and being detected. But the soccer balls and the basketballs are of similar weight, density, etc., so they move together simultaneously or almost simultaneously.

You get a certain mass spectrum, which can be associated with the compound soc-bask-rug-golf-itol. Let’s say this is your standard for comparison.

Now imagine you have an unknown compound (which turns out to be soc2-foot-golf-itol). You put that through the system, too, and magically, you have a mass spectrum that looks like soc-bask-rug-golf-itol, because soccer balls and basketballs move through the system in almost identical ways and the peak in the MS data looks the same for both.

Footballs and rugby balls, it turns out, don’t move through the system at quite the same rate, but they’re close. Close enough that on a purely visual inspection it’s easy to confuse the two.

If you only identify the compound based on the peak for soccer balls, you’ll get the ID wrong, because in one instance the peak really represents soccer balls and basketballs and in the other it represents only soccer balls. That’s why you need more than one peak (or ion) to identify the compound. Someone somewhere must have determined that the odds of getting the ID right based on three unique peaks are pretty high. But on a single peak, the odds of misidentification are pretty high, too.

With good separation, it should (theoretically) be possible to identify multiple compounds in a single sample, as they would move through the system at different rates — even though these compounds are ultimately made of varying amounts of carbon, hydrogen, oxygen and nitrogen (and perhaps a few other elements thrown in). As long as they are sufficiently separated, they should be able to get multiple results from one run. My impression is that tests are run to look for specific things, and that multiple tests are performed, using the different “aliquots” taken from the original sample.
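Here is a rough Python sketch of why several ions beat one. The ion labels, intensities, and the 20% tolerance are all invented for illustration, and comparing abundances relative to the largest peak is just one simple way to do the check, not necessarily what any lab’s protocol specifies.

```python
def ions_match(sample, reference, tolerance=0.2):
    """Compare relative ion abundances in a sample against a reference.

    Both arguments map an ion label to an intensity. Each intensity is
    scaled to the largest peak in its own spectrum, and every reference
    ion must agree within `tolerance` (a fraction of the reference value).
    """
    base_sample = max(sample.values(), default=0.0)
    base_reference = max(reference.values(), default=0.0)
    if base_sample == 0 or base_reference == 0:
        return False
    for ion, ref_intensity in reference.items():
        ref_rel = ref_intensity / base_reference
        smp_rel = sample.get(ion, 0.0) / base_sample
        if abs(smp_rel - ref_rel) > tolerance * ref_rel:
            return False
    return True


# Invented numbers. In the second sample something extra lands on ion_a.
reference = {"ion_a": 100.0, "ion_b": 50.0, "ion_c": 25.0}
clean_sample = {"ion_a": 200.0, "ion_b": 100.0, "ion_c": 50.0}
interfered_sample = {"ion_a": 300.0, "ion_b": 100.0, "ion_c": 50.0}

if __name__ == "__main__":
    print(ions_match(clean_sample, reference))
    print(ions_match(interfered_sample, reference))
    # Looking at ion_a alone, there is nothing to compare it against.
    print(ions_match({"ion_a": 300.0}, {"ion_a": 100.0}))
```

With three ions, the extra signal on ion_a throws the proportions off and the check fails. With only one ion monitored, the same interference passes unnoticed.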

Larry September 26, 2007 at 10:44 am

Rant, either I’m confused or there’s something in your revised analogy that requires clarification.

In your post, you refer to compounds and to molecules. I’ll do some Chemistry 101 here, mostly because there may be people lurking here like me who don’t remember their high school chemistry. All compounds are molecules but not all molecules are compounds. For example, H2 is a molecule, but it’s not a compound. A compound requires at least two different elements. I’m assuming that you’re using these terms in the way I’m describing.

In your revised analogy, the various different kinds of balls are molecules (I think). They are combined together (loosely glued, as you put it) to form different compounds. They’re all at the bottom of the driveway. The GC guy turns on the leaf blower, and the stuff at the bottom of the driveway starts moving up the driveway. What’s not clear to me is exactly what form this stuff is in as it’s being blown up the driveway: are we blowing up the compounds or the molecules? (Please remember, I’m only asking about the GC at this point, I’m holding questions about the MS until I get this GC thing figured out.)

I’ll hold further questions awaiting clarification.

Rant September 26, 2007 at 11:46 am

Larry,

Point taken regarding compounds versus molecules. Since we’re talking about biological molecules the terms are interchangeable, as each molecule is composed of several different types of elements, and a compound is a molecule comprising more than one element.

In the analogy, the form is ultimately molecules that we’re talking about. The balls represent different types of structures, which could be single atoms, or single ions, or clusters of atoms in a grouping (like CH2OH).

Michael September 26, 2007 at 12:19 pm

Is this correct:
The GC/MS is used to identify the presence and mass of specific molecules in a given sample. If we take a sandwich and want to know the ratio of mayo to bologna we would filter the sample (to remove the lettuce, tomato and bread) and then run the remains through the GC/MS. The sample is broken into component ions (ingredients) by this process. There are multiple ion markers that indicate the presence of mayo and bologna (each ion would represent an ingredient in the sample). WADA states that we must use at least three of these markers to ensure that we are measuring the correct stuff. You can see how this becomes sticky – each product has its own recipe, therefore we would need to know that our markers are ingredients in the products in question (and not mutual ingredients or contaminants). The resulting spectra output would indicate the mayo/bologna ratio through interpolation and comparison to charts supplied by the device manufacturer. The spectra are fingerprints for the specific compounds found. The manufacturer of the GC/MS must provide charts full of spectra of commonly looked for compounds. Each specific GC/MS machine would create its own fingerprint for each compound and only if it is calibrated to a specific setting (I assume that sample size is also a factor).
Presumably, it would be purely coincidental if two different machines provided the same spectral output, because each machine would break-up the compounds differently and filter them at different speeds and precision. However, the question of RRT implies that all GC/MS create the same specific ions, and the only difference between the spectra and output data is caused by relative retention time.
Now the sandwich analogy kind of fails when you get to IRMS. Nevertheless, if the mayo/bologna ratio is found to show abnormally high ratios of mayo to bologna then we have to find out how much of the mayo is that good egg and oil kind and how much is that artificial “juiced” kind. So I assume that the sample is FURTHER processed to isolate the carbon (in the form of CO2) through combustion. The IRMS then can measure the C12 and C13 (I assume because they have a different electrical charge and mass).
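As a rough illustration of what the IRMS number means: carbon isotope results are usually reported as a "delta" value, the relative deviation of the sample's C13/C12 ratio from that of a standard, in parts per thousand (per mil). The sketch below assumes the commonly quoted ratio for the VPDB standard; the two sample ratios are invented and are not values from the Landis case.

```python
# Commonly quoted 13C/12C ratio of the VPDB standard (treated as an assumption here).
R_STANDARD = 0.0112372


def delta13c(r_sample, r_standard=R_STANDARD):
    """Delta value in per mil: relative deviation of the sample ratio from the standard."""
    return (r_sample / r_standard - 1.0) * 1000.0


# Invented sample ratios, chosen to land near -24 and -28 per mil.
sample_a = delta13c(R_STANDARD * (1 - 0.024))
sample_b = delta13c(R_STANDARD * (1 - 0.028))

# The quantity of interest is usually the difference between two delta values.
difference = sample_a - sample_b

print(round(sample_a, 2), round(sample_b, 2), round(difference, 2))
```

The two isotopes are told apart by their mass; the delta notation simply makes very small differences in the ratio easier to read and compare.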
Does the lab take the same material that was used in the GC/MS and then process it in the IRMS, or do they take another portion of the original sample? The GC must be performed, in order to perform the IRMS (?) so that it can be shown that the carbon measured does in fact come from the product in question.
I assume this is why there are questions of relative retention time and the use of two different machines. LNDD did not utilize standard chromatographic conditions between runs, and therefore could not accurately (mathematically) adjust for the relative retention times between the GC and the IRMS. This means that LNDD should not have been able to prove the results. Dr. Brenna says that they (visually?) compare the peaks to see if they can find peaks that match the baseline rather than accurately adjusting for RRT – in fact Brenna said that you couldn’t always adjust for RRT between machines, which makes no sense to me. He seems to be saying that if two different machines are used, then the only “reliable” method to adjust is a visual comparison of the peaks arbitrarily scaled to a baseline. Huh?

William Schart September 26, 2007 at 12:53 pm

Larry:

I guess you are getting at what my next question was: are these machines/tests only used for anti-doping work or more general? I had assumed the latter, but I guess the panel’s dismissal of those who aren’t “part of the club” kind of threw me just a bit.

OK, they are machines and tests used for things other than WADA purposes. Now there certainly must be SOPs for all this. Landis’ witnesses testified as to one direction for an SOP but USADA went another direction. So who is right?

Over at TbV, one poster mentioned 3 possibilities (paraphrasing here):

1. The ID of compounds here is somewhat of an art.

2. There is some degree of differing opinions in the scientific community as to how to ID compounds, with the Landis camp on one side and USADA on the other side.

3. There is one accepted way to do the ID, so either one side is right or the other.

If #1 above is the case, one wonders if the technicians making the ID were sufficiently qualified to make the necessary IDs. If it was Mongongu and the other woman (forget her name), I’d question that based on what I read re their testimony.

If #3 is the case, then there must somewhere be some commonly accepted standard that can indicate the commonly accepted method. If it sides with the Landis camp, then he definitely got screwed; if it sides with USADA, then I think us Landis supporters just may have to bite the bullet here.

If #2 is the case, then this is the most problematic situation. How to judge between 2 competing opinions, either of which has support from the scientific community? I can’t come up with a good answer to that, except to suggest that if this is truly the case, then perhaps we should not be using this to destroy someone’s career until the scientific community can sort it out.

If these tests and machines are commonly used outside of the WADA system, then there are people who are qualified to answer these questions and who are not handcuffed by the WADA omerta. Perhaps Rant’s promised guest writer will shed some light here.

Larry September 26, 2007 at 1:26 pm

Rant –

The “sports balls” in our analogy (the atoms/ions/atom clusters) are separated from our mixture as a result of the work performed by the GC? Or do we only see these sports balls after the MS has ionized everything?

Larry September 26, 2007 at 2:27 pm

William, yes, you and I are seeking answers to the same questions. FWIW, I am the person who asked the question about which of the three possibilities described the state of GC/MS testing. I am told that GC/MS testing is relatively routine stuff, not easy to do correctly, but not rocket science either. If that’s the case, then we ought to be able to puzzle out whether or not the science supports the majority decision in FL’s case.

It’s a little strange that world-class experts like those presented by the FL team would disagree so sharply with the experts presented by USADA. Or maybe not so strange. In a case like this, we need to look carefully at what each side had to say. I suspect that both sides were telling the truth, but that they were talking past each other in some way. But I don’t know for certain. Let’s see what we can figure out by asking a lot of questions and learning what we can learn about this science.

Rant, Michael is anticipating where I’m trying to go, when I ask you about the stage in the process where the sports balls first emerge. The parties could not agree how to use the GC/MS data to identify the peaks graphed by the IRMS. That’s one reason why I’m spending so much time trying to figure out exactly how the GC/MS actually works.

Rant September 26, 2007 at 5:20 pm

Michael,

That’s a heck of an analogy. But I think you can take it all the way through the IRMS, especially with the idea of determining if the mayo has been “juiced.”

Larry,

The sports ball molecules I talked about earlier should emerge from the GC intact. It’s when they go into the MS or the IRMS that they get broken apart. What I was talking about earlier was an example of GC followed by MS. Should have made that more clear.

William,

Those do seem to be the three possibilities (as Larry originally posted at TBV). I’d say it’s a worthwhile exercise puzzling through all of this, as it’s a good way to determine who is closer to “The Truth.”

Larry,
How to interpret the GC/MS and GC/IRMS data seems to be the central point of debate amongst the experts, as far as I can tell. It makes me wonder, if there’s this much real contention between scientists as to how this data should be interpreted whether any results can ever be definitively determined. And not just in Floyd’s case, but every other case of this kind.

Larry September 26, 2007 at 6:30 pm

Rant –

OK. Then in our long-suffering analogy, we start with a mixture of sports balls – compounds at the bottom of our GC driveway. We switch on the GC, and (in theory, at least), each type of sports ball (each compound) proceeds separately up the driveway. We acknowledge the possibility, however, that two or more types of sports balls might have characteristics that would have them both move up the driveway at the same time, but we’ll hopefully be able to control for this later on in the process.

At the end of the driveway, the MS Uzi starts firing electrons, fragmenting the sports balls into ions, and the ions are counted by the MS. Then the MS prepares a graph. The graph, I believe, looks like the graphs you can see in the TBV discussion. Let’s use figure 1a from this discussion, which all can view at http://bp2.blogger.com/_xX3hgPBOgag/RvbJfIA2PWI/AAAAAAAAAkw/xIVwWkS8ijI/s1600-h/landis-f3-gcms-usada-348.png.

If I’m reading correctly the graph at figure 1a, what we’re seeing is a graph showing when the fragments of the various sports balls reach the MS fragment counter, and the amount of each kind of fragment reaching the counter. The ions are sorted by retention time, the retention time being a measurement of the mass (or maybe the mass/charge ratio) of the ion in question. The taller the peak, the more ions were counted by the MS fragment counter. Correct?

Looking at the graph at figure 1a, I can count about 17 large peaks, and even more tiny peaks. Four of the large peaks are labeled, and if you’ve closely followed the testimony in the FL case, you’ll know that the first 3 of these labels refer to metabolites identified with testosterone. (The fourth label, pregnane, is a “reference compound”. We haven’t tried to figure out yet the significance of reference compounds.)

First question: what are all those other peaks in figure 1a? The ones without labels? Are these unidentified testosterone ions, or are they ions from another compound?

Michael September 26, 2007 at 8:05 pm

Elemental analyzers (in this case a capillary gas chromatograph) based on the flash combustion of solid organic samples, are interfaced to IRMS to facilitate C12/C13 isotopic analysis of unprocessed samples. The GC is necessary to separate the molecules. These molecules are then fed into the IRMS (I don’t know if this is a direct feed). The IRMS first ionizes the sample and then separates the ions according to their mass to charge ratio.
It seems to me that if an IRMS analysis comes up with an unusual level of C13 found in a pattern that roughly approximates the pattern found in the GC then apparently the lab is free to believe that the C13 comes from the molecule in question. If this assumption is considered reasonable by the scientific community, then I guess FL’s only recourse would be to show that the samples were contaminated (wasn’t this very thing shown in the Wiki-defense?).
Another way to look at it: If the lab can confidently say that C13 is in the sample, is it really necessary for them to actually prove its source? My guess would be yes because FL almost certainly ate his vegetables the night before, and certainly was loaded up with C13 from his cortisone shots. But perhaps the science would say otherwise.
This strikes me as an unusual lack of precision, but maybe not.

Rant September 27, 2007 at 7:36 am

Larry,

The identified peaks show various metabolites of testosterone, with the exception of pregnane, the “reference.”

What those other items are is a good question, and one we’re not going to be able to answer very easily. I think this is a GC graph, since they’re identifying compounds (a/k/a molecules), as opposed to parts of a compound (ions). If that’s the case, they must be other substances found in the urine sample. But that’s not clear. The other peaks could represent the ions that make up each of the compounds, for example. Perhaps on the USADA page that TBV references (348, I believe) there is more information about what type of output this is. It’s interesting to note that the machine is listed as an “MSD22” and the acquisition method (if I’m following the graph right) appears to be manual rather than automatic.

Michael,

There could be a number of reasons for the GC/MS and the GC/IRMS graphs to look similar, when in fact they show different things. One needs to show that the conditions under which the former was done either match, or are sufficiently close to matching, the conditions of the latter to be able to draw the conclusion that one equals the other. Careful investigation of those conditions might reveal that they weren’t matched. In which case, you can’t be confident that the GC/IRMS graph has the same meaning as the GC/MS graph.

Given that Landis had a TUE for cortisone, and the metabolites of that cortisone would eventually clear his system, just measuring C13 wouldn’t be enough to say there’s a positive finding. They’ve got to be able to positively identify what the C13 came from, because if it did come from the cortisone, then he’d be covered by the TUE.

The lack of precision in process and procedure by the lab techs performing these tests is disturbing, to say the least.

Larry September 27, 2007 at 12:06 pm

Rant –

I had thought we were dealing with a GC/MS graph. I thought that the “MS” in the machine listing meant “Mass Spectrometer”. I could be wrong. FWIW, I think I’ve only seen two types of graphs in this case — the GC/MS graphs and the GC/IRMS graphs. There might be a third type of graph floating around in this case, but I haven’t identified it.

Also FWIW, there’s no more information on USADA 348 than what’s shown in figure 1a at TBV. Also, there’s no reference I can find, in either the arbitration testimony or the majority opinion, to manual versus automatic acquisition method. So I don’t know what that tells us.

I don’t think we can progress any further here until we know what we’re looking at.

By the way, a question to leave hanging for the moment: suppose we ARE looking at a GC/MS graph, and the identified metabolites ARE ions. How is LNDD supposed to use this graph to identify peaks on the GC/IRMS graph? I could be wrong, but I had thought that the GC/IRMS does NOT ionize the GC compounds before combusting them. So how can you compare retention times (or even “eyeball” peaks on a graph) if one graph measures the mass (or mass/charge ratio) of ionized compounds and the other graph measures the mass of un-ionized compounds?

Rant September 27, 2007 at 12:42 pm

Larry,

In looking at the new discussion on retention times at TBV, my moment of unclarity has been, well, clarified. The graphs we’re looking at are GC/MS output. There was some testimony, though I can’t recall where, that had to do with the automatic vs. manual processing of data. I don’t know that it would impact the output we see, it just seemed like something we might want to keep in the backs of our minds.

Comparing peaks on a GC/MS graph with peaks on a GC/IRMS graph is/would be a tricky thing, especially if the peaks happen to be different sizes or in different locations, time-wise. Once the incoming unknowns are thoroughly combusted, I’m not sure that the resulting peaks for the different masses of CO2 on the output would have any relation to the peaks on a GC/MS test of the same unknown. Similar shapes on similar graphs could well be coincidental, rather than of any real significance.

Larry September 27, 2007 at 1:17 pm

Rant –

Looking at Michael’s last post, I need to make a correction: it looks like a GC/IRMS DOES ionize compounds. So when we’re comparing GC/MS peaks to GC/IRMS peaks, I guess we are comparing ions to ions. However, I think that different ionizers might produce different ions, or ions in different relative quantities. I don’t know this for certain, but it does appear that different machines produce different spectrums for the same compound. So … I’m still not certain about the validity of comparing GC/MS peaks to GC/IRMS peaks. Even if you used the same GC to produce both sets of peaks, the ionization takes place (I think) in the MS and the IRMS, not in the common GC.

Going back to our GC/MS graph in figure 1a at TBV, I’d still like to know what we’re seeing when we look at the unidentified peaks. Are these unidentified (and perhaps unimportant) testosterone metabolite ions, or are these ions from other substances? This relates to a second and more general question. I’ve assumed (I don’t know, but I’ve assumed) that when a lab first tests an athlete’s sample with a GC/MS, the test is performed to look for a number of possible prohibited substances, and not one substance at a time. So … (1) does the machine produce a number of graphs in a single run (each looking for a single prohibited substance) or (2) does it produce a single graph that tests for a number of substances?

My guess is that choice (2) above is correct. If that’s the case, then perhaps the unidentified peaks on figure 1a are ions of other metabolites in the sample. These peaks might be significant for testing for other prohibited substances, or they might represent metabolites that we don’t care about. But if these unidentified peaks represent ions from metabolites other than testosterone, then how is it that these other peaks are mixed in with the testosterone peaks? I thought that the GC spit out each substance, one at a time. Then I’d expect the MS would ionize each substance, again one at a time. So, the first substance would enter the ionizer, get ionized, and splat! The ions for the first substance would hit the MS particle counter, not all at the same time, but in a distinct grouping in time (sort of like standing on a path and having a swarm of gnats fly by you. They’re not all going to fly by at the same time, but they’ll be grouped together and you’ll have some time to recover before the next swarm flies by). There would then be a pause as we wait for the GC to spit out a second substance. Then the second substance would come in, get ionized and splat! The third substance, splat! And so forth, in each case with a time delay between splats as we wait for the GC to spit out another substance. With the result that each substance’s ions would be grouped together on the GC/MS chart, and you would NOT end up with interlaced ions from different substances.

Any thoughts on this? (I can only hope that my gnats analogy joins our proud parade of analogies to sports balls and sandwiches!)

Rant September 27, 2007 at 6:35 pm

Larry,

The answer is 2, it provides a graph of a number of substances. A good illustration of this is in TBV’s relative retention time discussion, where he shows a reference standard compared to an unknown. The graphs we’re seeing in the Landis case, of any significance, are of the unknowns (a/k/a Floyd’s samples).

The gnats analogy is pretty good. The only way things get mixed together is if two substances move through without enough separation for there to be any time between them. Then you get some overlap. If they move together near-simultaneously, then you’ll run into some problems in identification. Did that particular gnat come from group A or group B?
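How much "time between swarms" is enough can be put into a number. A standard textbook measure is chromatographic resolution: twice the gap between two peaks' retention times divided by the sum of their baseline widths, with values of about 1.5 or more conventionally treated as fully separated. The sketch below uses invented retention times and widths (in minutes), not figures from any real run.

```python
def resolution(t1, w1, t2, w2):
    """Chromatographic resolution from retention times and baseline peak widths."""
    return 2.0 * abs(t2 - t1) / (w1 + w2)


BASELINE_SEPARATION = 1.5  # conventional threshold for fully separated peaks

# Two swarms of gnats half a minute apart, each 0.2 minutes wide.
well_separated = resolution(t1=10.0, w1=0.2, t2=10.5, w2=0.2)

# The same two swarms only 0.1 minutes apart: they overlap.
overlapping = resolution(t1=10.0, w1=0.2, t2=10.1, w2=0.2)

print(well_separated >= BASELINE_SEPARATION)  # separated cleanly
print(overlapping >= BASELINE_SEPARATION)     # not separated
```

When the value drops well below the threshold, the "which group did that gnat come from?" question has no clean answer from timing alone.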

Larry September 27, 2007 at 8:18 pm

Rant –

If the answer is (2), then either (a) all of the peaks in figure 1a in the TBV discussion (USADA page 348) are testosterone metabolite ions (in our analogy, they’re part of the cloud of “gnats” associated with testosterone, so they all arrived together at the MS particle detector at close to the same time), or (b) some or all of these other peaks are ions of some other metabolite, in which case we’re seeing some kind of interference (in our analogy, we have an overlapping group of gnats).

If (b) is the case, then I’m not sure we can trust the graph — some of the peaks may represent a combination of ions from two different metabolites. And my guess is that (b) IS correct. Otherwise, why does figure 1a show so many more peaks than the testosterone blank shown in TBV figure 1 (USADA 345)?

But … it CAN’T be the case that we can throw out chart 1a, simply because it shows more peaks than figure 1. That’s too simple — I don’t know why it’s too simple, but it can’t be right, or else one of the FL witnesses would have pointed this out.

On the other hand, I’ll note that in the sample to spectrum library comparison shown at http://www.unsolvedmysteries.oregonstate.edu/GCMS_06.shtml, the peaks in the sample and in the library spectrum are identical. The example shown at this web site may have been over-simplified so as to clearly illustrate the workings of GC/MS, but then maybe someone can point to a real world example on the web of how to make these sample to library spectrum comparisons.

Rant September 28, 2007 at 5:54 am

Larry,

(b) is the answer. Here’s why: When running reference standards (i.e., solutions that contain a known compound or set of compounds), you get a specific peak or set of peaks. With an unknown solution, since you don’t know exactly what’s there, you’ll get a series of peaks. Some of them may correspond to your reference, and some won’t. As long as the conditions under which you run the reference standards and the unknown are the same, the conclusion would be that the peaks that match are the same as those items in the reference solution. Therefore, you’ve identified part of the unknown.

Supposing you had other reference standards that contained different compounds, you could then see if the other peaks match those. If they do, you’ve identified even more of what’s in the unknown. If you have enough reference standards, you could eventually identify everything that’s contained in the sample.

In the case of Floyd’s sample, they want to locate the four metabolites used for the testosterone testing, along with a reference and an internal standard (the last two are not necessarily the same thing). They’re not so concerned about the other compounds, whatever they may be. So they don’t throw out the data just because it has other stuff that they don’t care about. They just ignore that data.

The key is that for the tests of both the reference standards and the unknowns, the conditions need to match. If not, then you need to determine a way (if possible) to correlate the results, or to re-run the tests under identical conditions. The latter is the better way, as then the comparisons are more straight-forward.
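One common way to correlate runs is to work with relative retention times: divide each peak's retention time by that of a reference compound present in both runs, then look for peaks in the unknown that line up with the standard within some tolerance. The Python sketch below illustrates the idea only. The compound names, the times (in minutes), the tolerance, and the helper functions are all hypothetical; none of it is taken from LNDD's procedure or the case documents.

```python
def relative_retention_times(retention_times, reference_name):
    """Divide every retention time by that of the chosen reference compound."""
    ref = retention_times[reference_name]
    return {name: t / ref for name, t in retention_times.items()}


def match_peaks(standard_rrt, unknown_rrts, tolerance):
    """For each named standard peak, find the closest unknown peak within tolerance."""
    matches = {}
    for name, rrt in standard_rrt.items():
        closest = min(unknown_rrts, key=lambda u: abs(u - rrt))
        matches[name] = closest if abs(closest - rrt) <= tolerance else None
    return matches


# Reference-standard run: known compounds with invented retention times.
standard_run = {
    "reference": 10.0,
    "metabolite_A": 12.0,
    "metabolite_B": 15.0,
    "metabolite_C": 18.0,
}

# Unknown run on slightly different conditions: unlabeled peaks, everything
# comes out about 10% later. The peak at 11.0 is taken to be the reference.
unknown_peak_times = [11.0, 13.2, 14.3, 16.5]
unknown_reference_time = 11.0

standard_rrt = relative_retention_times(standard_run, "reference")
unknown_rrts = [t / unknown_reference_time for t in unknown_peak_times]

matches = match_peaks(standard_rrt, unknown_rrts, tolerance=0.01)
for name, rrt in matches.items():
    print(name, "matched" if rrt is not None else "no match")
```

In this made-up example metabolites A and B line up once the times are scaled by the reference, while metabolite C has no counterpart in the unknown. Whether such scaling is valid between two different instruments is exactly the kind of question the experts disagreed about.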

Michael September 28, 2007 at 6:35 am

A sample passes into a GC and it will separate the molecules as the sample travels the length of the column. The molecules take different amounts of time (retention time) to come out. The mass spectrometer is downstream and captures, ionizes, and detects the molecules separately as they are injected from the GC. The GC is in effect purifying the sample for the MS, so that the molecules are injected into the MS one at a time. The GC is imperfect because multiple molecules could appear at the same time (making it impossible to identify the molecules from the GC alone). The resulting GC filtered molecule passes into the MS and the MS breaks the separated molecules into identifiable ions. The GC time is relevant so that you can compare the ions with the GC filtered molecules – in other words, you would know that the ions represent a product that passed through the GC at time X. This is important to increase the accuracy of the findings. If a mass spectrum appears at a given retention time in a GC-MS analysis, it provides an increased certainty that the product searched for is in the sample.

To perform the GC/IRMS the sample must be filtered by the GC in the same way as for the GC/MS, for a similar reason (The lab must ensure that it is only measuring the C13 isotope from the molecules in question). The IRMS ionizes the molecule (the MS part of the IRMS) and then subjects the resultant ions to a combustion process that separates them according to their mass to charge ratio. It is critical that the sample be filtered by the GC before entering the IRMS so that only a single molecule enters at a given time.

So, Larry your example #2 is mostly correct. Your examples of what this means, (a) and (b) are also both correct. There are impurities in the sample versus the blank. The reasons for the variations are probably related to precision of the GC in its filtration process. Perhaps there is also a QC issue here, but I don’t see the variations as substantial, and obviously FL’s experts didn’t either.

But I can’t see how the arbiters got around the simple fact that RRT is apparently critical to the proper use of a GC/IRMS. I read it over and over and can’t reconcile what the manufacturer’s rep said, with what Brenna said, and with what the arbiters said.

Rant September 28, 2007 at 6:54 am

Michael,

Well put. One of the real puzzlers in the majority’s opinion is how they reconcile what Dr. Davis said with what Dr. Brenna said with whatever input Dr. Botre gave them during deliberations (and which may not be entirely reflected in the final document, except implicitly). I find their explanations, though, wanting for lack of real clarity. If justice is to be served, I think they needed to do a better job of explaining why, and I think they needed to do so in a way that’s understandable to a broader audience. Someone recently suggested to me in an email that you really demonstrate mastery of a topic by being able to explain it in clear, simple terms. That’s something which is missing from the majority’s opinion as far as I can see.

Larry September 28, 2007 at 10:04 am

Michael and Rant, thank you!

So let’s return to our belabored analogy of the sports balls and the GC driveway. I’m going to try and restate the analogy in terms of what you’ve written, and please tell me if you think I have it described correctly.

The IDEAL GC leafblower would blow the sports balls up the driveway so that each different kind of sports ball would proceed up the driveway in a single swarm (or a “pulse”, which is a term I’ve seen elsewhere), each swarm would contain only a single type of sports ball and each swarm would arrive at a different time, with a long enough pause between swarms to make it easy to separately measure each swarm. However, we have not yet invented the perfect GC. The GCs we have today do not produce swarms containing balls of only a single type. So, we might end up with both footballs and basketballs in a swarm we want to measure for basketballs only.

Now, in my usual “multiple choice” fashion, I’m going to suggest three reasons why our swarms are not pure:

(1) In our example, it works out that footballs and basketballs are naturally going to move up the GC driveway at the same speed. Footballs are lighter than basketballs and should move faster than basketballs, but they’re oblong and don’t roll as easily as basketballs, so these two factors cancel each other out, and these two balls (while very different) just happen to move up the driveway together.

(2) Our GC isn’t perfect. It DOES separate the sports balls at the bottom of the driveway into separate swarms, or pulses, before the balls reach the end of the driveway. But the swarms are not and cannot be pure swarms made up of only a single type of ball. This could be due to a couple of factors. It could be because the leaf blower GRADUALLY separates the different kinds of balls from each other — the balls start moving up the driveway in more or less an undifferentiated swarm, and gradually differentiate into different swarms as they move up the driveway, but the driveway is not long enough to permit complete differentiation. Or it might be that a certain number of golf balls and rugby balls are going to be swept up by (and carried along with) the swarm of basketballs, just like a swarm of gnats might sweep up a few flies or mosquitos that get caught up in the flight path of the gnats. This is the case even though golf balls might generally move faster up the driveway than basketballs, and rugby balls might naturally move more slowly up the driveway than basketballs. Possibly it does not matter why the GC does not achieve perfect separation of different kinds of balls. But you end up with the result that each swarm or pulse is not a “pure” mixture of a single type of ball.

(3) Some combination of (1) AND (2) occurs in a GC.

From what I’ve learned, I think that cases (1) and (2) are very different, and would have different practical results that would show on our GC/MS graphs. Let’s say that (1) is the correct answer to my question. Then in my example, when we use the GC/MS to test for basketballs, and there are footballs in the mixture at the bottom of the driveway, then we can be certain that the resultant GC/MS graph is going to contain basketball ions and football ions. Maybe we can’t identify that the football ions ARE football ions. But if it’s typical for a mix of sports balls to contain footballs, and if we’re experienced at testing these mixes for the presence of basketballs, then we’re used to seeing the exact pattern of ions we get in our test, and we know we can ignore the non-basketball ions. In contrast, one day we might test for basketballs, and get a GC/MS graph with the usual peaks for basketballs, plus unidentified peaks that are NOT the sizes and in the places where we’re used to finding unidentified peaks when we test for basketballs (in other words, these peaks are NOT measuring football ions). That would give us a clue that something funny is going on.

In contrast, imagine that (2) or (3) is the case. We should end up with a GC/MS graph that shows basketball ions and unidentified ions (ions from other kinds of balls). But we would not necessarily expect to see these unidentified ions display in any familiar pattern. The process I’ve described in (2) is more chaotic: in the (2) scenario, we’ve acknowledged that the GC cannot achieve complete separation, and the degree of the separation you’d see in any particular case is going to vary depending on complex and unmeasurable factors (such as the particular composition and density of the sports ball mixture we’re trying to analyze). So in (2), we would not expect to see any particular pattern of non-basketball ions on the GC/MS graph we’re using to test for the presence of basketballs.

My guess is that either (2) or (3) is the correct answer here. If (1) were the correct answer, then I’d expect to see the parties paying more attention to the unidentified peaks in these graphs.

Please let me know if you know the answer to this question. I have more questions to follow!

Michael September 28, 2007 at 10:27 am

I think that (2) is correct. However, the mixture that results is consistent. In other words, we know that our T/E ratio will also indicate a certain level of whatever – always. I think that it is known that golf balls might show up with the footballs, and that it doesn’t matter how many, because when the MS breaks them up they can be subtracted out (they would provide a known spectral output). If it is known that they will show up, then it is easy to account for them.

Now apply this to FL’s case: If the arbiters said that the GC/MS was inaccurate, then how do they say that the GC/IRMS is accurate?

Isn’t the retention time all that matters? For a lab to claim that, because the GC/MS peaks LOOK the same as the GC/IRMS peaks, they must be the same seems too inaccurate. Isn’t there a mathematical way to prove that the two things are measuring the same thing?

Rant September 28, 2007 at 11:32 am

Michael,

I’m pretty confident there is a way to mathematically prove (or disprove, depending on the results) what you suggest. In fact, I believe some of Dr. Meier-Augenstein’s testimony spoke to that very question. The actual proof, I imagine, will be rather complex.

Larry September 28, 2007 at 1:23 pm

Michael, with all of my chemistry 101 questions, I’m trying to work my way up to some of the fundamental questions in the FL case. You’ve mentioned two of these questions: (i) when the majority decision states that you can do an “eyeball” comparison of the peaks in a GC/MS graph and a GC/IRMS graph, isn’t this just a “fuzzy” comparison of retention times or relative retention times, and (ii) how can the GC/IRMS test be valid if it relies on an invalid GC/MS test?

Michael, you’re raising a topic I WOULD like to focus on in an upcoming post. You’re suggesting (I think) that the GC/MS graphs we’re seeing are produced after the “subtracting out” of a certain amount of ions derived from metabolites other than the metabolite we’re trying to measure. I think I’ll be ready to dive into that question soon, but I need more background information first.

Let’s assume that you’re right, Michael, and that in my prior post, choice (2) is the correct choice. (Rant, do you agree?) So, we’re acknowledging that a GC does not do a perfect job of separating a mixture into pulses containing just a single compound. Each pulse contains a mixture of compounds. If the test is performed correctly, then presumably each pulse will MOSTLY contain a single compound, but there will be other compounds in the mix, and these compounds produce most or all of the unidentified ions we see in a GC/MS chart like the one shown in figure 1a (USADA 348) of the TBV discussion. You also stated that while these peaks may be unidentified, they are NOT unexpected — they’re going to show up in predictable places and sizes.

So, when an analyst looks at a chart like the one at figure 1a, he (or she) will say something like, “look, there are the three identifying ions for testosterone we expect to see at about RT 10.8, 15.2 and 15.5 … and look, there are the two relatively tall peaks I usually see on these testosterone graphs at 10.7 and 11.2, and the smaller peak I sometimes see at 12.0 … yep, looks like a pretty typical testosterone graph to me … only that little blip over at 16.0, I don’t usually see that, but it’s probably not so large that I need to worry about it.” So I might guess, then, that most of the unidentified peaks on figure 1a are “typical” for GC/MS graphs testing for testosterone.

When I say “typical”, I mean that you can find other GC/MS graphs with the same number of these unidentified peaks, and on these graphs you find these peaks in roughly the same locations and with roughly the same height. I’m not saying that every GC/MS graph will have the same collection of unidentified peaks. But following Michael, you’d expect to see patterns emerge. Going back to our sports ball analogy, the peaks shown on figure 1a might represent a test for the presence of basketballs, where we also got some peaks for footballs and golf balls. You wouldn’t expect the same peaks to show up in every graph where you were looking for basketballs, because maybe golf balls and footballs aren’t present in every sample. But you’d figure that you’re going to get footballs, golf balls and basketballs in enough samples so you could look at a GC/MS graph for basketballs and say, yes, these unidentified peaks “typically” occur in a significant number of other GC/MS graphs.

So … given that I didn’t hear anyone on the FL team question the number, size or location of these unidentified peaks, I’m going to assume that these unidentified peaks are “typical” for testosterone GC/MS graphs.

Still … shouldn’t we be a little bit surprised at the number and size of these unidentified peaks? Ignore for the moment the peak for the reference sample. Then the first, second and fourth tallest peaks on the chart are identified testosterone metabolites, but the third tallest peak is an unidentified ion. In fact, the peak for 5a Androstanol AC is closely surrounded by two fairly tall unidentified peaks. I count about 12 unidentified peaks with an abundance over 1,000,000. By my rough calculations, the majority of the ions graphed on figure 1a are unidentified. Doesn’t that surprise anyone? Even if we completely rule out any interference, doesn’t that mean that the graph shows the result of an ionized pulse, where the majority of the compounds in the pulse were NOT testosterone?

Rant September 30, 2007 at 6:47 pm

Larry,

From your previous example, I think both 1 and 2 are possible, but that more often the scenario in 2 is what occurs. As for being surprised at the number and size of the unidentified peaks, I don’t think so. In urine, which contains many substances being eliminated from the body, I think it’s safe to hazard a guess that there are many more compounds than what are showing up in graph 1a. I suspect that if you looked at the same type of graph for other athletes, you would find a similar number of unidentified peaks. Perhaps not all in the same places (depends on what each athlete might have in his/her system), but similar.

Larry September 30, 2007 at 9:19 pm

Rant, one or both of us are getting confused here. Of course a substance like urine is going to contain many compounds. If our GC/MS graph is supposed to display ALL of these compounds to us, then I would expect to see a lot of sizable peaks unrelated to whatever drug it is that we’re testing for.

However, in theory at least, the GC/MS graph is supposed to organize all of these compounds for us. Let’s say for the moment that we disconnect the MS portion of the machine and just graph what comes out of the GC. The GC is supposed to separate the compounds in a mixture, so that each pulse from the GC contains predominantly a single compound. Let’s say that a urine sample contains 10 compounds, and that the GC separates each compound by two seconds of retention time. Then the resultant graph would contain 10 peaks, one at 2 seconds retention time, one at 4 seconds, one at 6 seconds, etc.

Next, let’s add the MS to our theoretical example. The MS ionizes each pulse that comes out of the GC. The molecules from a single compound will all be identical before ionization, but the ionized molecules will NOT all be identical: there will be a characteristic number of different kinds of ions produced when a single molecule type is ionized, and each of these ions will have its own distinct mass or mass/charge ratio. Also, these ions will be produced in a characteristic proportion (perhaps 25% of ion 1, 20% of ion 2, etc.). Now, it’s my understanding that the ions themselves have different retention times based on their mass (or mass/charge), which is how each ion shows up on the GC/MS graph at a different point on the x axis.

However (and this is critical to my understanding of what we’re looking at), while the ions for a particular compound in a particular GC pulse will have different retention times, these retention times should be relatively close together. So, let’s go back to the example, where we had a GC spitting out pulses of (mostly) a single compound every two seconds. Let’s look at the first pulse, the one that without ionization had a retention time of two seconds. Let’s say that the first compound goes through the MS and gets ionized into 3 kinds of ions. These three ions will not all have the same retention time, but you’d expect them to have retention times that are relatively close together — maybe 1.8 seconds for the first ion, 2.1 seconds for the second ion and 2.3 seconds for the third ion. To be sure, I would not expect the retention time for these ions to be LONGER than for the ions we’re going to get when the MS ionizes the SECOND pulse spit out of the GC.

So, when we look at a graph like figure 1a, and we see two peaks identified as testosterone ions, I’m assuming that these ions came from molecules that were part of a single pulse emitted from the GC. I’m assuming that any unidentified peaks between these two peaks ALSO represent ions from this single pulse. Yes, I understand, the particular pulse being measured for testosterone in figure 1a was probably not made up exclusively of testosterone molecules — these pulses are not pure, so we’d expect to find some other kinds of peaks made of ions from other kinds of molecules.

However, I’m surprised at the size and number of these other peaks that are visible in figure 1a. What does this mean? Does this mean that the “testosterone” pulse spit out of the GC did not consist predominantly of testosterone molecules? That maybe 35% of the molecules in the pulse were testosterone, and the other 65% were a melange of other compounds?

What am I missing here?

Rant October 1, 2007 at 4:18 am

Larry,

I don’t think you’re missing anything here. Just getting hung up a bit on what those other peaks are. So this graph contains a few peaks that represent metabolites of testosterone. Those other peaks, if you had a reference sample for other compounds, are other compounds or metabolites of those compounds. When they created the “fractions” that contain various compounds for various tests, some other things were left behind, even though the vast majority of things probably was cleared away. Consider testosterone: they’re supposed to locate 4 metabolites. If there are, say, 16 or 17 other peaks, it could be that those represent metabolites of a few other compounds, maybe even related compounds. Perhaps, even, there are some things in there that should be identified to make the analysis more complete. But, as these tests have been defined, those peaks aren’t of any interest, so the lab doesn’t identify what they are. Perhaps those other peaks are other metabolites of testosterone or epitestosterone, even. If they are, then the question would become, why don’t they seem concerned about those peaks? I’m curious as to why they weren’t identified and what they might be. With peaks that line up closely to identified peaks, the issue of contamination occurs, so in that case, knowing what those other peaks are would be essential.

I think your conclusion that the fraction did not consist primarily of testosterone, or its metabolites, is correct. The concentrations of these compounds are so small in the liquid that it’s like searching for a needle in a haystack. There’s a good chance that other things exist in similar or greater abundance. Still, it would be interesting to know what these other things are.

Larry October 1, 2007 at 7:59 am

Rant, OK, you and I are on the same page now. If these unidentified peaks are unidentified testosterone metabolites, then we should see these same peaks (in the same places along the x-axis and with similar heights relative to the identified peaks) in all GC/MS graphs for testosterone produced by the same machine. If these unidentified peaks represent metabolites for compounds other than testosterone (because the GC pulse we’re analyzing is not made up solely of testosterone molecules), then we would not necessarily expect to see these peaks appear in a familiar pattern, but I would expect that these peaks would be relatively small compared to the identified peaks (reflecting the relative abundance of testosterone in the analyzed GC pulse).

If we have a situation where the peaks are relatively large and appear in unfamiliar places, then my guess would be that something is wrong.

Let me raise another question. How unique are the ions produced by the MS? I’m no organic chemist, but on an atomic level, the human body is pretty simple. 87% of human body atoms are either hydrogen or oxygen. If we add carbon and nitrogen to this mix, then we’ve got 99% of our atoms covered. There’s a good deal of atomic similarity between metabolites that perform very different functions; I recall reading somewhere that testosterone molecules look very similar to cholesterol molecules. If you blasted testosterone and cholesterol molecules with electrons in an MS ionizer, might some of the testosterone ions be identical (in mass or mass/charge, at least) to the cholesterol ions? (This is one reason why I’ve focused on the previous question, about unidentified peaks. If the GC pulses are not relatively pure, then I worry that each peak we see on the GC/MS chart might represent ions from different compounds.)

Larry October 3, 2007 at 12:23 pm

Rant, I’m going to toss out another question at this point, though it involves skipping ahead of ourselves in the analysis.

The big debate, on TBV and elsewhere, is how do we identify the peaks in a GC/IRMS graph? We’re supposed to compare the positions of the ions identified on the GC/MS graph to the ions on the GC/IRMS graph. There’s a debate as to whether you can make this comparison mathematically, using a relative retention time kind of computation, or whether you have to use a more fuzzy and subjective “eyeballing” kind of approach to look at the pattern and location of the peaks on the two graphs.

The majority opinion points out that the gas hold-up time will be different in a GC/MS test than in a GC/IRMS test (I think the GC/IRMS is supposed to be longer), and no one seems to dispute this; the only dispute is whether, as you’ve argued, the differences can be factored into an accurate computation of relative retention time. Moreover, even if we factor out the gas hold-up time, there is a difference in the speed in which the GC/IRMS does its processing, compared to the speed of the GC/MS. So, in the example you used in this post, the GC/IRMS would always be half as fast as the GC/MS, once you took the gas hold-up time out of the equation. This is why you’d compare the GC/MS and the GC/IRMS based on RELATIVE retention time.

Are we all missing a third potential problem? Both the GC/MS and GC/IRMS machines rely on ionization of the compounds pulsed by the GC. And I think we can find support for the statement that different machines ionize in different ways and produce different ionization results. Given these differences, how can you use the GC/MS as a point of comparison to identify the peaks on a GC/IRMS graph?

Rant October 3, 2007 at 7:53 pm

Larry,

The big problem with the GC/MS and GC/IRMS tests done in the Landis case is this: When the standards were run (the mix-cal acetates), they didn’t include all the metabolites that they were looking for. So the comparison between knowns and unknowns is not as well executed as it should have been. If all four metabolites had been put into the standard, then a very easy and direct comparison could be made between the standard and the unknown, leading to an identification of the presence of those same metabolites (or not) in the unknown.

In Dr. Meier-Augenstein’s testimony, he spoke of how it appeared that the lab must have been using relative retention times to make a guess as to what peaks were what. The arbitrators, in paragraphs 182 through 188, argued (among other things) that relative retention times can’t be compared for two different systems, such as GC/MS and GC/IRMS. Their reasoning was faulty and, when it came to deciding what to believe, contradictory.

To use the graphs that came out of both runs and “eyeball” the results is to apply a type of relative retention time test. The graphs appear similar, if not almost identical. So, on the one hand, they say you can’t use relative retention times to compare between systems. And on the other hand, they allow in evidence that was determined in exactly that manner.

In the process, however, they never really addressed Dr. Meier-Augenstein’s main point: The lab never proved that the peaks they claim are the metabolites are, in fact, the metabolites. How so? Because they never ran the types of standards necessary to make the proper comparisons.

In short, had the tests been run properly, they wouldn’t have needed to use the GC/MS as a point of comparison with the GC/IRMS. And the results would have been much more credible.

Larry October 3, 2007 at 10:10 pm

Rant, I need to check out the facts you are referring to about the missing metabolites in the standards run — I was not aware of this problem. (By “standards”, I assume you mean the “spiked” sample that’s run immediately before and immediately after the athlete’s sample.)

Yes, I agree, the “eyeball” test seems to me to be an imprecise relative retention test. I made that argument on TBV, and no one responded, but I think it’s true nevertheless.

It’s still not 100% clear to me whether you can do a relative retention time analysis between results from a GC/MS and a GC/IRMS. Yes, I understand that (11 – 1) / (6 – 1) = 2. And yes, the majority opinion says that the “1” is constant for all metabolites. But I think that the “2” may not be constant for all metabolites, and that would destroy the calculation. I don’t have proof for this, I’m trying to figure it out.
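To make that arithmetic concrete, here is a minimal Python sketch of the relative retention time calculation. Everything in it is an illustrative assumption built from the example figures above (11 minutes on one system, 6 on the other, 1 minute of hold-up time on each); the function names and the extra retention times are made up and are not taken from any lab data. It shows the ratio coming out as a constant 2 when every compound scales the same way, and what it looks like when one compound does not, which is the situation I'm worried about.

```python
def adjusted_ratio(rt_a, rt_b, holdup_a=1.0, holdup_b=1.0):
    """Ratio of hold-up-corrected retention times (minutes) on system A vs. system B."""
    return (rt_a - holdup_a) / (rt_b - holdup_b)


def ratios(pairs):
    """Apply adjusted_ratio to a list of (rt_on_A, rt_on_B) pairs."""
    return [adjusted_ratio(a, b) for a, b in pairs]


# The example from the discussion: (11 - 1) / (6 - 1).
print(adjusted_ratio(11.0, 6.0))

# Hypothetical retention times for three compounds on two systems.
# Here every compound scales by the same factor, so the ratio is constant.
constant_case = [(11.0, 6.0), (21.0, 11.0), (31.0, 16.0)]

# Here the third compound scales differently, so the ratio is no longer
# a single number that can be applied across all peaks.
varying_case = [(11.0, 6.0), (21.0, 11.0), (33.0, 16.0)]

print(ratios(constant_case))
print(ratios(varying_case))
```

If the ratios from real standards run on both systems looked like the second list instead of the first, a single conversion factor could not be trusted to map peaks from one graph onto the other.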

I don’t get how you can avoid using the GC/MS to identify the peaks on the GC/IRMS. Are you saying that you could run standards (spiked samples) in the GC/IRMS? Hmm. I haven’t seen that possibility mentioned anywhere before.

I’ll try to do some more research and see what I can find.

Rant October 4, 2007 at 5:00 am

Larry,

Not only am I saying you could run spiked samples containing all the substances of interest in order to make the comparisons with an unknown sample, I would go further to say that you should run spiked samples to do so. The best, most accurate comparison will be between a known sample and an unknown sample run in identical or near-identical conditions. And the simplest way to ensure that is to run a spiked sample prior to the unknown in order to calibrate things. Or to have run a spiked sample previously under the same conditions to use as a comparison. LNDD did neither. They included an “internal standard,” and, if I recall correctly, only a single metabolite in their “spiked sample.” That’s not good enough to make an accurate comparison for the other metabolites. Things might look correct, but how do we know they are? Unless there’s a reference plot of data to compare against, it’s all just speculation. Informed speculation, perhaps, but speculation nonetheless.

Larry October 4, 2007 at 5:42 am

Rant, I have not done the research, but you seem to have a good point on using samples in a GC/IRMS test. You would use a sample spiked with artificial (produced from soy) testosterone, or wouldn’t it matter?

The lawyer part of my brain likes this argument, because it turns the FL majority opinion on its head. If you can’t compare relative retention times between GC/MS and GC/IRMS machines, then the results from these two machines are not comparable in any kind of systematic way, and this practically ARGUES for the need to do sampling before and after a GC/IRMS test.

Did anyone make this point during the arbitration?

Rant October 4, 2007 at 6:11 am

Larry,

I think it would be preferable to use natural (vs. artificial) substances to spike the samples with. As long as what’s used is consistent, it probably doesn’t matter which you choose. There would be a slight difference in retention time between the real and the artificial versions of a compound, but that is covered by WADA’s plus/minus 1% or plus/minus 0.2 minutes tolerances.
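As a rough illustration of what such a tolerance check amounts to, here is a small Python sketch. The function name and the sample retention times are hypothetical, and treating the two windows as alternatives (a match if either one is satisfied) is my assumption; the discussion above only gives the two figures, plus/minus 1% and plus/minus 0.2 minutes.

```python
def within_tolerance(rt_sample, rt_reference, pct=0.01, abs_minutes=0.2):
    """Return True if two retention times (minutes) agree within either window.

    Assumption: a match under EITHER the percentage window or the absolute
    window counts. How the two windows combine is not stated in the discussion.
    """
    diff = abs(rt_sample - rt_reference)
    within_pct = diff <= pct * rt_reference
    within_abs = diff <= abs_minutes
    return within_pct or within_abs


# Hypothetical peaks compared against a reference peak at 15.0 minutes.
print(within_tolerance(15.1, 15.0))  # 0.1 min apart
print(within_tolerance(15.5, 15.0))  # 0.5 min apart
```

Note that a check like this only works if there is a reference retention time for each compound of interest on the same system, which is exactly what a fully spiked standard would provide.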

I think that the point about not being able to compare relative retention times between the two systems and the need to do a more thorough “calibration” using all the items of interest was the real point of Dr. Meier-Augenstein’s testimony, which both the panel and their adviser missed, judging by how the decision was written up.

Larry October 4, 2007 at 2:59 pm

Rant, another question to throw out there.

When we look at the graphs in the FL majority opinion, or at TBV … are we looking at the actual raw data from FL’s samples?

We have not yet discussed the LNDD’s manual manipulation of the data. This is a part of the science I do not understand at all. The little bit I’ve read indicates that the data can be manipulated, that it was manipulated, and that on some level it’s OK to do so. Under what circumstances could it possibly be kosher to manipulate the data? And at what point(s) did the manipulation take place in the FL case? Do the charts we see represent manipulated data (and if so, what’s the point of trying to analyze them)?

If LNDD DID manipulate the raw data, did this happen with both the “A” samples and the “B” samples? Meaning that FL’s team witnessed the manipulation?

Rant October 4, 2007 at 7:38 pm

Larry,

Those graphs should represent the actual raw data. Part of the manipulation of the raw data involved adding in data points or moving data points to make it easier to calculate the area under various peaks in the graphs. That area under the peaks represents the amount of a substance present. By manipulating the data, and destroying (even if only accidentally or through incompetence) the original data, it’s difficult or impossible to go back and reanalyze the real original data, minus the lab tech’s manipulations. Such manipulations happened to both the A and B samples, I believe.

The areas under those peaks played a role in determining the T/E ratios and the concentrations of the various compounds identified.

It’s OK to make various manipulations to make it easier to calculate various things (up to a point), but what’s not OK is failing to record the manipulations made, and failing to keep a backup of the original, unmanipulated data. Without that, there’s no way to go back and check the lab tech’s work. Perhaps Mme. Mongongu or Mme. Frelat made some mistakes in those calculations that affected the end result. Unfortunately, we’ll never really know.

Larry October 5, 2007 at 6:28 am

Rant, agreed that the labs should record the various manipulations (not sure that the ISLs or the WADA rules require this …). I can also see a reason for making manipulations as part of an initial review of the data. We’ve talked about the number of unidentified peaks on a GC/MS graph. I could see removing the peaks you’re not interested in, to get a clearer picture of what’s going on with the peaks you ARE interested in. Sort of like digitally enhancing a picture of a large crowd, to show what’s going on with one or two people in the crowd. Then of course, you’d present both pictures in evidence — you would not pass off the enhanced picture as the original picture.

I don’t understand any reason for manipulating data in order to make a data calculation. A dumb example: let’s say you’re trying to calculate the area described by a peak on the GC/MS graph, and the shape is irregular. Let’s also say that you could calculate the area if the shape were a triangle. I suppose you could manipulate the data so that it describes a triangle. But how can you be sure that your manipulated peak had the same area as the original peak? There’s only one way to be sure that the data manipulations did not affect the area of the peak, and that’s to compare the area of the peak before and after the data manipulation. But if you could calculate the area described by the peak before the data manipulation, then there was no need to manipulate the data in the first place! Maybe this IS a dumb example, but the more I think about it, the less dumb it seems to me.

When you say that the graphs “should” represent the actual data, do you mean (1) “should” in the sense that the graphs probably represent the actual data, but who the hell knows in this case, or (2) “should” in an ideal sense, meaning that it would be good practice to present graphs containing the raw data, but that’s not what LNDD did in this case, so the graphs we have in front of us contain manipulated data?

Rant October 5, 2007 at 7:50 am

Larry,

Good analogy with the pictures.

As I recall, one of the lab techs said she added some data points and perhaps moved some data points in order to make it easier to calculate the area under the curve in the graphs. You’re right, just by adding or moving data points, the area under the curve changes.

A good software package of the type they were using should have a number of mathematical tools necessary to calculate the area under the curve, so the data manipulation done by the lab should (in the ideal sense) not have been necessary in order to calculate the area under that portion of the curve. But they were using software that is long outdated, and who the heck knows how good the mathematical tools in that package were (my guess: Dr. Davis, perhaps?).
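To show why no manipulation should be needed, here is a minimal Python sketch of one standard way to get the area under an irregular peak, the trapezoidal rule, applied directly to the raw points. The data points are invented for illustration and have nothing to do with the actual Landis chromatograms, and I'm not claiming this is the method LNDD's software used. The same sketch also shows how moving or adding a single point changes the computed area.

```python
def peak_area(points):
    """Trapezoidal area under a peak, given (time, abundance) points sorted by time."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += (t1 - t0) * (y0 + y1) / 2.0
    return area


# Hypothetical raw data for one irregular peak.
original = [(10.0, 0.0), (10.5, 40.0), (11.0, 100.0), (11.5, 30.0), (12.0, 0.0)]

# The same peak with one data point moved upward.
moved = [(10.0, 0.0), (10.5, 40.0), (11.0, 100.0), (11.5, 50.0), (12.0, 0.0)]

# The same peak with one data point added on the trailing edge.
added = [(10.0, 0.0), (10.5, 40.0), (11.0, 100.0), (11.25, 90.0),
         (11.5, 30.0), (12.0, 0.0)]

print(peak_area(original))
print(peak_area(moved))
print(peak_area(added))
```

The three areas differ, which is the point: without a record of which points were touched and a copy of the untouched data, nobody can tell how far the reported area drifted from the real one.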

As for the graphs, themselves, I meant “should” as in (1), as you pointed out. The graphs ought to be the raw data. Whether or not they are is another matter.

Larry October 11, 2007 at 7:23 am

Rant, it would be good to continue this discussion. However, this area of your blog has kind of slipped below the horizon, it’s a little bit hard to find these days. Plus, I think we’ve probably reached the point in our discussion where we need the active help of that science expert you mentioned a while back. I figure that this science expert is still bent over his bunsen burner or something. Sometimes those experts can get kind of busy.

Rant October 11, 2007 at 11:09 am

Larry,

Quite right. I think it’s time to start a new entry, “The Science of It All, part deux.” You’re also correct in assuming that my mystery ranter is still rather busy. What he’s working on will be quite interesting once it’s complete. I’ve seen a draft, but the article is not ready for “prime time.”
