Published 6/1/2011
Annie C. Hayashi

The well-designed clinical trial

Panel addresses the methodological hurdles of conducting orthopaedic trials

Designing a randomized orthopaedic surgical trial presents unique challenges, particularly as compared to nonsurgical studies.

These challenges were addressed in the session on “Methodological Issues in Clinical Trials,” held during the Orthopaedic Research Society’s (ORS) “Clinical Research Forum: Learning from the Past, Looking to the Future,” which was presented at the ORS Annual Meeting on Jan. 15, 2011, in Long Beach, Calif.

Blinding: In surgical trials?
A double-blind study, in which both participants and investigators are blinded to group assignments, is considered the gold standard for randomized controlled trials (RCTs). In an orthopaedic trial, however, investigators can’t blind surgeons to the treatment assignment.

When physicians and participants are not blinded to treatment assignments, they may be unduly influenced by that information. “Physicians or investigators may ask a question about a symptom or outcome in such a way that it may affect the patient’s response, particularly with ‘patient-oriented outcomes’ such as pain, disability, and satisfaction,” said Dennis M. Black, PhD.

Not blinding the study staff could also affect the primary outcome. “For example, staff may be responsible for classifying degrees of improvement in pain. If the treatment assignments are known, staff may classify similar improvements in pain differently, depending on the patient’s treatment group,” he explained. “That is obviously a problem.

“When surgery/nonsurgery status is known to the staff and the patients, the surgery group may be more likely to report improvement in pain due to the placebo effect,” said Dr. Black. “The staff may also be unconsciously biased in assessing pain or disability.”

If participants or staff can’t be blinded to treatment, Dr. Black suggested blinding outcome ascertainment instead. In a multicenter study with central adjudication, the outcome evaluators can be blinded to the treatment. For example, radiographs can be sent electronically to a central location for blinded adjudication.

Having participants and staff guess which treatment group they were assigned to is one way of measuring the degree of blinding. “They should be correct about 50 percent of the time. If they are correct 90 percent of the time, there is a problem with the blinding,” Dr. Black said. “If the percentage is substantial, that could be an important part of the discussion in the paper and in the analysis of the results.”
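
The guess check Dr. Black describes can be sketched in a few lines of Python. This is an illustration only, not a method presented at the session: it assumes a two-arm trial with equal allocation (so chance accuracy is 50 percent), and the function name and tallies are hypothetical.

```python
from math import comb


def binomial_upper_tail(correct: int, total: int, chance: float = 0.5) -> float:
    """Probability of at least `correct` right guesses out of `total`
    if every guess were pure chance (exact binomial upper tail)."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical tallies: 100 participants each guess their own assignment.
for correct in (52, 90):
    p = binomial_upper_tail(correct, 100)
    print(f"{correct}/100 correct guesses: one-sided p = {p:.3g}")
```

A small p value suggests participants are guessing better than chance, which would be worth reporting as a possible failure of blinding.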

Subgroup and interim analyses
The prudent use of subgroup analysis is as important as blinding to a well-designed clinical trial.

“Post-hoc subgroup analyses are common but they may distort the results,” explained Saam Morshed, MD, PhD, MPH. “They can lead to overinterpretation, fruitless and expensive additional studies, and ultimately, suboptimal patient care.”

Planned subgroup analyses can often require a larger sample size, which can affect the cost of the study. This should be considered when investigators plan budgets and apply for funding.

As with subgroup analyses, interim analyses can be beneficial or detrimental.

According to Brad Petrisor, MD, MSc, investigators can perform interim analyses so participants aren’t unnecessarily put at risk. If the interim results indicate that the treatment is highly effective or is potentially harmful, the research team may decide to stop the study.

“Ideally, a Data Safety Monitoring Board (DSMB) composed of people outside the trial—clinical experts, methodologists, and statisticians—conducts this interim analysis and suggests whether the study should continue or stop,” said Dr. Petrisor.

But many trials that are stopped early do not have DSMBs. “Trials stopped early for an apparent benefit may overestimate the truth,” he said. “An interim analysis is the only way to detect this.”

According to the Consolidated Standards of Reporting Trials (CONSORT) statement, four key methodological elements should be reported when an RCT is stopped early: planned sample size, planned interim analysis, stopping rule used, and adjusted estimates for interim analysis.

“Be cautious about stopping early for apparent benefit and be cautious of small trials with low event rates. If an interim analysis is going to be done, wait until a number of events have occurred—perhaps as many as 200 or 300. Then set the interim p value, which determines statistical significance, very tightly—at, for example, 0.001,” Dr. Petrisor concluded.
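
As a rough illustration of that advice, the check below refuses to consider stopping until a minimum number of events has accrued and then applies a strict interim threshold. It is a simplified sketch, not a validated stopping rule: the pooled two-proportion z-test, the function names, and the default values (200 events, p below 0.001) are assumptions chosen to mirror the figures quoted above. A real trial would rely on a prespecified rule reviewed by its DSMB.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided p value from a pooled two-proportion z-test."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation in outcomes, so no evidence of a difference
    z = (events_a / n_a - events_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def crosses_interim_boundary(
    events_a: int,
    n_a: int,
    events_b: int,
    n_b: int,
    min_events: int = 200,      # assumed minimum, from the range quoted above
    p_threshold: float = 0.001,  # assumed strict interim threshold
) -> bool:
    """True only if enough events have accrued AND the interim p value
    falls below the strict threshold. Direction of effect is not checked."""
    if events_a + events_b < min_events:
        return False
    return two_proportion_p_value(events_a, n_a, events_b, n_b) < p_threshold


# Hypothetical interim looks
print(crosses_interim_boundary(10, 50, 30, 50))      # too few events so far
print(crosses_interim_boundary(90, 500, 150, 500))   # many events, large difference
print(crosses_interim_boundary(110, 500, 130, 500))  # many events, small difference
```

The first look is rejected despite a striking difference because only 40 events have occurred, which is the small-trial situation Dr. Petrisor warns about.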

Outcomes: Establishing a clinically relevant scientific truth
“Establishing the scientific truth is only part of a trial,” said Kurt P. Spindler, MD. “We have to know whether that truth is making a clinically relevant, meaningful difference to the patient.

“Even if a clinically relevant difference is found, is it worth the resources or the cost to society or the individual? I can make a medial collateral ligament heal more quickly, but it may cost $5,000 and only be worthwhile to a professional athlete making in excess of $1 million a year and not to anyone else,” he explained.

“If we follow this paradigm to change practice, our outcomes should really establish a clinically relevant scientific truth,” he said.

In deciding which outcomes to select, Dr. Spindler advised looking at the outcome measures. “Even if you are conducting a study that costs $10 million, you will probably be limited to one, two, or three outcome measures,” he said. “The group differences need to be determined as well as the statistics and appropriate power. Decide whether the outcome is clinically relevant and can alter practice.”

According to Dr. Spindler, the process of adjudicating outcomes includes establishing clear criteria, collecting clinical documents, and using outside experts who work independently from the study. Disagreements about a case should be resolved by two or three experts. Those who collect and evaluate data should be blinded to treatment.

Sample size: How large is large enough?
“A sample size calculation is your best estimate of the number of patients that you will need to do a trial with a given study design,” said Dr. Morshed. “This is not objective and absolute. It is based on assumptions that can’t be tested. A meaningful starting point has to be established along with a testable hypothesis.”

According to Dr. Morshed, a hypothesis is derived from a well-proposed research question. He illustrated the elements of a good research question with the mnemonic “PICO”: Patients (in the study), Intervention (the specific intervention), Comparison (the comparisons in the study), and Outcomes (primary, secondary, and tertiary).

“The question ought to be simple, specific, and stated in advance,” he said.

The research question is used to develop a null hypothesis, which proposes no difference between the control and experimental groups, and an alternative hypothesis, which proposes that a difference exists. If the data lead investigators to reject the null hypothesis, the alternative hypothesis is accepted. “A good, well-stated research hypothesis also allows us to select a statistical test,” he said.

“The statistical test is largely based on the outcome and its variability, which will assign a probability to the study findings. The probability is realized under the assumption of the null hypothesis that will provide a p value.

“Beyond p values, we need to understand other elements of hypothesis testing—alpha-errors and beta-errors,” he said.

A study’s finding falls into one of four categories. In a true positive, a difference is found and actually exists; the probability of this result, the study’s power, is expressed as 1-β. A false positive (an α or Type-I error) occurs when a difference is found, but no difference actually exists. In a false negative (a β or Type-II error), no difference is found when a difference actually exists. Finally, in a true negative, no difference is found and none actually exists; its probability is expressed as 1-α.

The effect size chosen is key. To determine an appropriate effect size, Dr. Morshed suggested conducting a pilot study to get a better estimate of what is a reasonable or clinically meaningful effect size.

“Variability is proportional to the required sample size,” he continued. “The less precise a measurement is, the greater the likelihood of overlap between the comparison groups, and the greater the sample size needed to detect a difference. Variability has both ‘within subject’ and ‘between subject’ components.”

According to Dr. Morshed, once investigators have completed these steps, they will have enough information to plug the values into a calculator and determine the necessary sample size.
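
For readers who want to see what such a calculator does, the sketch below uses one common textbook approximation for comparing the means of two equal-sized groups: n per group = 2 × ((z for α + z for power) × σ / δ)², where δ is the effect size and σ the standard deviation. The choice of formula, the function name, and the example numbers are assumptions for illustration; the session did not specify a particular calculator, and other designs and outcome types call for other formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(
    effect_size: float,  # smallest difference in means worth detecting (delta)
    std_dev: float,      # assumed common standard deviation (sigma)
    alpha: float = 0.05,  # two-sided Type-I error rate
    power: float = 0.80,  # 1 - beta
) -> int:
    """Normal-approximation sample size per group for a two-group
    comparison of means with equal allocation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * std_dev / effect_size) ** 2
    return ceil(n)


# Hypothetical inputs: detect a 5-point difference on a pain scale
# whose standard deviation is assumed to be 10 points.
print(sample_size_per_group(effect_size=5, std_dev=10))  # 63 per group
# Doubling the variability roughly quadruples the requirement.
print(sample_size_per_group(effect_size=5, std_dev=20))
```

The second call shows why variability matters: under this formula the required sample size grows with the square of the standard deviation.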

But investigators also have to consider other factors that can affect sample size. Loss to follow-up is a major issue in clinical studies. Sample size estimates have to be increased to account for anticipated losses.
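
One simple way to make that adjustment, shown here as an assumption rather than as the method discussed at the session, is to divide the required number of analyzable patients by the proportion expected to complete follow-up. The function name and the 20 percent loss figure are hypothetical.

```python
from math import ceil


def inflate_for_loss_to_follow_up(n_required: int, expected_loss: float) -> int:
    """Number to enroll so that about `n_required` patients remain
    after a fraction `expected_loss` is lost to follow-up."""
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be at least 0 and less than 1")
    # Rounding first guards against floating-point noise pushing an
    # exact result (for example 100.0) up to the next whole patient.
    return ceil(round(n_required / (1 - expected_loss), 9))


# Hypothetical: 63 analyzable patients needed per group, 20% expected loss.
print(inflate_for_loss_to_follow_up(63, 0.20))  # 79 to enroll per group
```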

“The best way to increase the power of a study is to increase its sample size,” he said. “We should all be doing larger trials. But when we are faced with certain limitations, other methods can be used to increase the power without sacrificing or increasing the sample size.”

The ORS Clinical Research Forum, at the 2011 ORS Annual Meeting in Long Beach, Calif., was organized by Theodore Miclau, MD; Kristy L. Weber, MD; George F. Muschler, MD; and Mohit Bhandari, MD. The next ORS Clinical Research Forum will be held Feb. 6, 2012, in San Francisco.

Annie C. Hayashi is the manager, development and communications of the Orthopaedic Research Society. She can be reached at hayashi@ors.org
