Correlation When Data Are Missing

June 2010 | Metters, Rich

Variable correlation is important for many operations research models. Many inventory, revenue management, and queuing models presume uncorrelated demand between products, market segments, or time periods. The specific model applied, or the resulting policies of a model, can differ drastically depending on variable correlation. Missing data are a common problem in the real-world application of operations research models. This work is at the junction of these two topics, correlation and missing data. We propose a test of independence between two variables when data are missing. The typical method for determining correlation with missing data ignores all data pairs in which one point is missing. The test presented here incorporates all data. The test can be applied when both variables are continuous, when both are discrete, or when one variable is discrete and the other is continuous. The test makes no assumptions about the distribution of the two variables, and thus it can be used to extend the application of non-parametric rank tests, such as Spearman’s rank correlation, to the case where data are missing. An example is shown where failure to incorporate the incomplete data yields incorrect policies.
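For readers unfamiliar with the "typical method" mentioned in the abstract, the following is a minimal sketch of complete-case Spearman rank correlation, in which any pair with a missing value is discarded before ranking. It illustrates only the baseline the abstract contrasts against, not the test proposed in the paper. The function names, the use of None to mark a missing value, and the sample data are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def complete_case_spearman(x, y):
    """Spearman's rho using only pairs in which both values are observed.

    Missing values are represented by None (an assumption of this sketch).
    Returns (rho, number_of_pairs_used).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys)), len(pairs)


# Hypothetical demand series for two products, each with one missing period.
demand_a = [1.0, 2.0, None, 4.0, 5.0, 6.0]
demand_b = [2.0, None, 3.0, 5.0, 4.0, 7.0]

rho, n_used = complete_case_spearman(demand_a, demand_b)
print(f"rho = {rho:.3f} from {n_used} of {len(demand_a)} pairs")
```

In this made-up data set, two of the six pairs are dropped even though each contains one observed value. That discarded information is what the abstract says the proposed test is designed to incorporate.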

Author

Co-author(s)

  • M. Parzen
  • S. Lipsitz
  • G. Fitzmaurice

Publication(s)

The Journal of the Operational Research Society