Dancing a two-step: Analysing non-random data.

posted Dec 29, 2011, 10:44 AM by Admin PEMRF   [ updated Aug 25, 2013, 10:00 PM ]

Data quality will certainly be one of the problems with the exabytes of information that will come from HITECH-mandated electronic records. Much of the stored data is likely meaningless, inserted only as part of endless reimbursement gamesmanship. A thoughtful data harvesting plan can eliminate much of this noise. Let’s pretend for a moment that the problems plaguing such data sets have been solved. Other potential pitfalls remain.

We are then left with the need to ensure that the resulting observational data sets are analyzed meaningfully. There is ample reason to worry, as even a casual stroll through the trauma literature shows. Aside from the obvious risk of confusing association with causality, there are other concerns.

One is endogeneity. At its simplest, a researcher may observe an association between two variables without realizing that some other, unobserved variable is associated with both of them. There is a great video describing this here (http://www.youtube.com/watch?v=dLuTjoYmfXs).
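
To make the hidden-variable problem concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the function names, the sample size, and the assumption that the exposure and the outcome are each just the unobserved variable plus independent noise. By construction the exposure has no effect on the outcome, yet the two appear clearly correlated until the hidden variable is accounted for (here with the standard partial correlation formula).

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_hidden_variable(n=20000, seed=0):
    """Return (raw, adjusted) correlation between exposure and outcome.

    Hypothetical setup: u is the unobserved variable; x and y both
    depend on u, but y does not depend on x at all.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [ui + rng.gauss(0, 1) for ui in u]  # "exposure"
    y = [ui + rng.gauss(0, 1) for ui in u]  # "outcome"

    r_xy = pearson(x, y)
    r_xu = pearson(x, u)
    r_yu = pearson(y, u)
    # Partial correlation of x and y, holding u fixed.
    adjusted = (r_xy - r_xu * r_yu) / ((1 - r_xu ** 2) * (1 - r_yu ** 2)) ** 0.5
    return r_xy, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate_hidden_variable()
    print(f"raw correlation:      {raw:.3f}")
    print(f"adjusted for hidden u: {adjusted:.3f}")
```

In this toy setup the raw correlation should come out around 0.5 and the adjusted one close to zero. In real data, of course, the whole difficulty is that the hidden variable was never recorded, so no such adjustment is available.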

Another problem is failing to adjust for the non-random nature of the data. The outcome of interest to a particular investigator is not likely to be random. And if the outcome is being studied in conjunction with an exposure variable, it is equally unlikely that assignment of that variable is random. So the analysis must account for the fact that neither the dependent nor the independent variable is random. In some cases the forces that determine this non-randomness, other than the hypothesis being tested, may push these variables in directions that are inconvenient for researchers. A medication may be more or less likely to be given to a smaller child, and a smaller child may be more or less likely to have the outcome being studied. All of this suggests a two-step approach.

So pediatric emergency medicine and other observational studies will become increasingly reliant on well thought-out but relatively complex models to predict the baseline probability of exposures and outcomes. Only after completing this step will investigators be able to meaningfully assess the effect of the independent variable on the outcome. There are a variety of ways to do this, and the appropriate one will depend on the circumstances. But the concept of a two-step approach (one step to model the underlying probability of exposure, the other to model the underlying probability of the outcome) prior to testing a hypothesis is an important one.
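
As a rough sketch of what the two steps can look like, the simulation below invents a cohort in which smaller children are both more likely to receive a medication and more likely to have the outcome, while the medication itself does nothing. All of the numbers, names, and the single yes/no "small child" covariate are assumptions made for illustration. Inverse probability weighting and standardization are just two of the many possible ways to use the exposure and outcome models, not the only or necessarily the best ones.

```python
import random


def simulate(n=50000, seed=1):
    """Hypothetical cohort: one (small_child, treated, outcome) tuple per patient."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        small = rng.random() < 0.5
        treated = rng.random() < (0.7 if small else 0.3)  # size drives exposure
        outcome = rng.random() < (0.4 if small else 0.1)  # size drives outcome; drug has no effect
        rows.append((small, treated, outcome))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_risk_difference(rows):
    """Outcome risk in treated minus untreated, ignoring size."""
    return (mean(o for _, t, o in rows if t)
            - mean(o for _, t, o in rows if not t))


def exposure_model(rows):
    """Step 1: probability of exposure given size."""
    return {s: mean(t for s2, t, _ in rows if s2 == s) for s in (True, False)}


def outcome_model(rows):
    """Step 2: probability of the outcome given size and exposure."""
    return {(s, t): mean(o for s2, t2, o in rows if s2 == s and t2 == t)
            for s in (True, False) for t in (True, False)}


def weighted_risk_difference(rows, p_exposed):
    """Risk difference after weighting each patient by 1 / P(exposure received)."""
    num = {True: 0.0, False: 0.0}
    den = {True: 0.0, False: 0.0}
    for small, treated, outcome in rows:
        p = p_exposed[small] if treated else 1 - p_exposed[small]
        num[treated] += outcome / p
        den[treated] += 1 / p
    return num[True] / den[True] - num[False] / den[False]


def standardized_risk_difference(rows, risk):
    """Average the within-size risk differences over the size distribution."""
    p_small = mean(s for s, _, _ in rows)
    return (p_small * (risk[(True, True)] - risk[(True, False)])
            + (1 - p_small) * (risk[(False, True)] - risk[(False, False)]))


if __name__ == "__main__":
    rows = simulate()
    print("naive:       ", round(naive_risk_difference(rows), 3))
    print("weighted:    ", round(weighted_risk_difference(rows, exposure_model(rows)), 3))
    print("standardized:", round(standardized_risk_difference(rows, outcome_model(rows)), 3))
```

In this made-up cohort the naive comparison should suggest the medication raises risk by roughly 12 percentage points, while both adjusted estimates should sit near zero, which is the truth that was built into the simulation. With real records the covariates are many and often continuous, so each step would need a proper regression model rather than simple stratum averages.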