Are Ordinal or Categorical Measures a More Reliable Measure of Spinal Cord Injury on MRI? - A Validation of the National Institute of Neurological Disorders and Stroke (NINDS) SCI MRI Common Data Elements (CDE) Instrument.

The NINDS CDE SCI project was designed to harmonize data collection for NIH-funded clinical studies in spinal cord injury (SCI). The feature set combines categorical and ordinal (direct measurement) features of SCI, based on features developed in prior published work. The purpose of this study was to determine whether categorical representations of SCI on MRI outperform absolute measures in a multi-reader blinded evaluation.

Materials and Methods:
This study focused on a subset of 18 of the 52 NINDS CDE elements that relate directly to the injured spinal cord. Features included length and location of cord edema and hemorrhage, absolute measures of the canal, cord, and lesion length, and the BASIC score. Four neuroradiologists and one spine neurosurgeon from five institutions were recruited as independent readers. Thirty-five SCI MRI studies from twelve different centers were pre-selected from a collection of over 120 studies. Anonymized exams were loaded into a cloud-based viewer platform. After a single training session, all 35 exams were scored independently by the five experts at their own pace. The exam order was then randomized and the exams were re-scored in a second round. Inter- and intra-rater agreement was assessed using kappa statistics for categorical items and the intraclass correlation coefficient (ICC) for ordinal measures, each with 95% confidence intervals.
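The two agreement statistics named above can be illustrated with a minimal sketch. The abstract does not state which kappa variant or ICC form was used, so the code below assumes Fleiss' kappa (multiple readers, nominal categories) and ICC(2,1) (two-way random effects, absolute agreement, single rater). The function names and the toy data are hypothetical and are not taken from the study; confidence intervals are omitted.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa. ratings: one list per exam, one categorical label per reader.

    Assumes every exam was scored by the same number of readers.
    """
    n_exams = len(ratings)
    n_readers = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    # counts[i][j] = number of readers who assigned exam i to category j
    counts = [[Counter(row)[c] for c in categories] for row in ratings]
    # Observed agreement per exam, then averaged over exams
    p_i = [
        (sum(c * c for c in row) - n_readers) / (n_readers * (n_readers - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_exams
    # Chance agreement from the overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_exams * n_readers)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


def icc_2_1(scores):
    """ICC(2,1). scores: one list per exam, one numeric measurement per reader."""
    n = len(scores)     # exams
    k = len(scores[0])  # readers
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between-exam mean square
    msc = ss_cols / (k - 1)             # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical toy data: 4 exams scored by 3 readers
    edema_location = [["C4", "C4", "C4"], ["C5", "C5", "C6"],
                      ["C6", "C6", "C6"], ["C4", "C5", "C4"]]
    lesion_length_mm = [[21.0, 23.5, 20.0], [35.0, 31.0, 36.5],
                        [12.0, 15.0, 11.5], [48.0, 44.0, 50.0]]
    print("Fleiss kappa:", round(fleiss_kappa(edema_location), 3))
    print("ICC(2,1):", round(icc_2_1(lesion_length_mm), 3))
```

In this layout, inter-rater agreement uses one column per reader within a round. Intra-rater agreement can be sketched the same way by treating one reader's round-one and round-two scores as the two columns.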

Inter-rater agreement for all features in round one evaluations ranged from poor 0.22 (0.06, 0.37) to excellent 0.99 (0.99, 1.00). Highest inter-rater agreement was found for categorical features of edema/hemorrhage length/location relative to anatomic reference (ICC range 0.69 - 0.99) whereas lower inter-rater ICCs were found for absolute measures (ICC range 0.22 - 0.83). There was good agreement for measures at the level of injury (ICC range 0.73 - 0.83). Only minor differences in agreement were observed overall between the two reading sessions. Intra-rater ICCs overall ranged from good to excellent (ICC range 0.78 to 1.00) with removal of outliers. There was no significant difference in performance between experienced neuroradiologists and the spine surgeon.

Categorical measures of SCI are more reliable and reproducible than absolute measures in clinical practice. The devised NINDS SCI MRI CDE instrument provides a uniform method for capturing reliable quantitative and categorical data for SCI investigational work and clinical trials. Multi-center SCI clinical trials should adopt categorical measures of SCI on MRI because of their better reproducibility overall.