OBJECTIVE: To evaluate the inter- and intraobserver agreement of the Magerl AO and AOSpine thoracolumbar fracture classification systems.
METHODS: The participants were divided into two groups, one composed of six spine surgeons and the other of 18 orthopedic residents. On two occasions one month apart, the participants analyzed and classified 25 radiographs of thoracolumbar fractures using both classification systems, Magerl AO and AOSpine. Classification reliability was analyzed using the kappa coefficient (k).
RESULTS: Considering fracture type and subtype, the Magerl AO classification system showed fair interobserver agreement (k = 0.32), whereas the AOSpine classification system showed moderate interobserver agreement (k = 0.59). The Magerl AO classification showed fair intraobserver agreement for both residents and specialists (k = 0.21 and 0.38, respectively), while the AOSpine classification showed good agreement among residents (k = 0.62) and moderate agreement among specialists (k = 0.53).
CONCLUSIONS: When evaluating fracture morphology, the AOSpine thoracolumbar fracture classification system presented better reliability and reproducibility than the Magerl AO classification system.
Keywords: Spinal injuries; Magerl AO classification; AOSpine classification; Interobserver and intraobserver agreement.
Citation: Lopes FAR, Ferreira APRB, Santos RAA, Maçaneiro CH. Intraobserver and interobserver reproducibility of the old and new classifications of toracolombar fractures☆. 53(5):521. doi:10.1016/j.rboe.2018.07.015
Note: ☆ Study conducted at Instituto de Ortopedia e Traumatologia de Joinville, Joinville, SC, Brazil.
Received: May 19 2017; Accepted: July 13 2017
INTRODUCTION
Approximately 90% of spinal fractures affect the thoracic and lumbar regions.1 The thoracolumbar junction is very susceptible to fractures because it marks the transition from the more rigid thoracic spine to the more flexible lumbar spine.1 Over 50% of the injuries occur between T11 and L2, usually due to high-energy trauma, and are accompanied by other lesions,2 such as intra-abdominal lesions (splenic and hepatic lesions), limb fractures, and brain trauma.1
Several classification systems for thoracolumbar fractures had been described before Magerl et al.3 introduced the AO classification system for these fractures in 1994. That system used vector forces as a classification criterion and included fractures caused by compression, distraction, and torsional forces.1,3,4 It was established with the aim of creating a standard classification system; however, it is not practical, which reduces its reliability.1 The AOSpine classification group proposed a new system, using the Magerl AO classification as the main reference, in order to create a similar classification with a better and more immediate application in clinical practice. The resulting AOSpine classification has appeared to be fairly reliable and accurate, but it needs further assessment.5
A bone fracture classification system should be reliable, valid, and accurate, as it will aid in the prognosis and indication of treatment. A classification system is considered to be reliable when a single evaluator obtains consistent results by classifying the same fracture at different times, or when several evaluators produce the same result with the same classification. When this reliability is observed in medical practice, the classification is considered to be accurate.6,7 According to Vaccaro et al.,8 the current classification, AOSpine, allows a better understanding of the lesion, since it not only modified and simplified the old morphological classification but also added six neurological lesion topics and two patient modifiers, which help to guide treatment. However, there is still no globally agreed classification for thoracolumbar fractures.9
This study was aimed at evaluating the reproducibility of the two AO classifications regarding the morphology of thoracolumbar fractures by assessing the intra- and inter-observer agreement.
MATERIAL AND METHODS
The study was submitted to the Research Ethics Committee of the Hospital Municipal São José (HMSJ; Joinville, SC), and approved through the Brazil Platform, under No. 1.769.539.
The study included six spine specialists working in the same orthopedics department and 18 physicians in training in their first, second, and third year of the orthopedics and traumatology residency at the Institute of Traumatology and Orthopedics (ITO) of Joinville (Santa Catarina State, Brazil). The participants were divided into two groups, one with the spine specialists and the other with the resident physicians. Prior to their inclusion in the study, all participants signed an informed consent form.
Twenty-five radiographs in the anteroposterior and lateral views showing different patterns of thoracolumbar fractures were selected from the HMSJ (Joinville, SC) and ITO (Joinville, SC) files. The images were selected by a second-year orthopedics resident and by an orthopedic surgeon specialized in the spine; both were familiar with the classification systems. All patient identification data were removed from the images. Radiographs with artifacts and those with poor image quality, poor patient positioning, or technical defects that could have compromised the evaluation were excluded.
Magerl AO classification
This classification system uses vector forces as a classification criterion. The fractures are divided into three types: A, B, and C. Type A consists of fractures caused by compression forces; type B, by distraction forces; and type C, by torsional, rotational forces. Each type is divided into three larger groups, in numerical order; each group is then subdivided into three subgroups, according to fracture morphology, allowing a more detailed description. Severity is defined by the classification and it increases from types A to C; the same occurs in the groups and subgroups.1,3,4
AOSpine classification
The AOSpine classification allows for a better understanding of the lesion. It is based on the Magerl AO morphological classification, to which six topics of neurological injury and two patient modifiers were added. For the morphological division, the types are the same as in the old classification, A to C, i.e., fractures caused by compression, distraction, and torsional forces, respectively. The main difference lies in the groups: type A is subdivided into four groups, type B into three groups, and type C has no subdivision.5,8
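To make the structural difference between the two systems concrete, the sketch below counts how many categories an observer must choose from at the finest level of each system. It uses only the group and subgroup counts stated in the two descriptions above; the dictionary layout and the function name `finest_level_categories` are hypothetical and serve only as an illustration.

```python
# Counts taken from the descriptions in the text; the layout is illustrative only.
# Each entry maps a fracture type to (number of groups, subgroups per group).
MAGERL_AO = {"A": (3, 3), "B": (3, 3), "C": (3, 3)}
AOSPINE_MORPHOLOGY = {"A": (4, 0), "B": (3, 0), "C": (0, 0)}


def finest_level_categories(system):
    """Count the distinct categories available at the most detailed level."""
    total = 0
    for groups, subgroups_per_group in system.values():
        if groups == 0:
            total += 1  # a type with no subdivision is a single category
        elif subgroups_per_group == 0:
            total += groups  # groups are the finest level
        else:
            total += groups * subgroups_per_group
    return total


magerl_count = finest_level_categories(MAGERL_AO)
aospine_count = finest_level_categories(AOSPINE_MORPHOLOGY)
```

Under these counts, an observer using the Magerl AO system chooses among more than three times as many finest-level categories as one using the morphological part of AOSpine.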
The images to be classified were sent to the participants by e-mail. Images were assessed and classified in three stages. The first stage was pre-test training, the purpose of which was to calibrate the participants' classifications. Participants received by e-mail a video tutorial explaining the two thoracolumbar fracture classification systems (the original Magerl AO and the current AOSpine), together with illustrations of the two classification systems to consult during image classification. In this pre-test stage, five images were classified using the two classification systems. Participants then sent their responses by e-mail to the resident physician responsible for the study, who, together with the spine specialist, reviewed the answers and sent the participants feedback with explanations. This stage was included so that, in the next two stages, the images could be classified in a more standardized manner. In the second stage, participants received by e-mail the 25 previously selected images, which they classified using the two classification systems. In the third stage, 30 days later, the same 25 images were sent by e-mail to the participants in a modified order, and the participants were required to classify them once again using the two classification systems.
All stages were performed individually by each participant. Participants had no access to the patients' medical history, treatment, or other complementary tests. In all stages, in addition to the 25 images to be classified, participants also received illustrations of the two classification systems to consult during image classification. The deadline for the participants to classify the images was one week. The responses were then sent by e-mail to the resident physician responsible for the study.
The AOSpine classification system characterizes fracture morphology, but also takes into account the neurological aspects of the patient for clinical decision. In turn, the Magerl AO system is predominantly morphological.10 Therefore, in this study, only the morphological part of the AOSpine classification, which is divided into types A, B, and C, with their respective subgroups, was considered for classification purposes.
Agreement was assessed using the weighted kappa coefficient method, which takes into account the fact that the variable is ordinal. The following interpretations of the kappa coefficient were used to assess the scores: <0.00, no agreement; 0.00-0.20, weak; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, good; 0.81-1.00, excellent agreement.11
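As an illustration of the statistic described above, the following is a minimal sketch of a weighted kappa for two sets of ordinal ratings, together with the interpretation bands quoted in the text. The text does not state which weighting scheme was used or how ratings from many observers were combined, so the linear weights, the two-rating setup, and the function names `weighted_kappa` and `interpret_kappa` are assumptions made here for illustration only.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories):
    """Linearly weighted Cohen's kappa for two ratings of the same items.

    `categories` lists the ordinal labels from least to most severe.
    Linear weighting is an assumption; the text does not specify the scheme.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")

    rank = {label: i for i, label in enumerate(categories)}
    n = len(ratings_a)
    span = len(categories) - 1

    # Observed disagreement: mean normalized rank distance between paired ratings.
    observed = sum(
        abs(rank[a] - rank[b]) for a, b in zip(ratings_a, ratings_b)
    ) / (n * span)

    # Expected disagreement under independence, from the marginal counts.
    counts_a = Counter(rank[a] for a in ratings_a)
    counts_b = Counter(rank[b] for b in ratings_b)
    expected = sum(
        abs(i - j) * counts_a[i] * counts_b[j]
        for i in counts_a
        for j in counts_b
    ) / (n * n * span)

    if expected == 0:
        # All ratings fell in one shared category; returning 1.0 is a convention.
        return 1.0
    return 1.0 - observed / expected


def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands quoted in the text."""
    if kappa < 0.0:
        return "no agreement"
    if kappa <= 0.20:
        return "weak"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Hypothetical ratings of four fractures by the same observer on two occasions.
first_round = ["A", "B", "C", "A"]
second_round = ["A", "B", "C", "A"]
example_kappa = weighted_kappa(first_round, second_round, ["A", "B", "C"])
```

With linear weights, a disagreement between types A and C counts twice as much as one between adjacent types, which is how the ordinal nature of the variable enters the calculation.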
Inter- and intra-observer agreement tests were performed using SPSS 20.0 (IBM Statistics) and GraphPad software (QuickCalcs).
RESULTS
The Magerl AO classification presented fair inter-observer agreement (k = 0.32), considering the type and subtype of the fractures in all images, while the AOSpine classification obtained moderate inter-observer agreement (k = 0.59; Table 1). When considering only the type of fracture (A/B/C), without discriminating its subtype, the overall kappa value of the Magerl AO classification was 0.75, indicating good agreement, and that of AOSpine was 0.85, indicating excellent agreement; nonetheless, no statistically significant difference was observed between the two classification systems (p = 0.57; Table 2). The kappa values for the inter-observer agreement of the Magerl AO and AOSpine classifications for each morphological type of fracture are shown in Table 2.
| Classification | Residents vs. specialists (kappa value) | p |
| Magerl AO classification | 0.32 | 0.138 |
Taking into account the type and subtype of the fractures, the Magerl AO classification showed a fair intra-observer agreement, both for resident physicians (k = 0.21) and the specialist surgeons (k = 0.38). In turn, the AOSpine classification showed a good intra-observer agreement among resident physicians (k = 0.62) and a moderate agreement among specialists (k = 0.53; Table 3).
| Classification | Residents, kappa value (p) | Specialists, kappa value (p) |
| Magerl AO classification | 0.21 (0.074) | 0.38 (0.084) |
| AOSpine classification | 0.62 (0.052) | 0.53 (0.063) |
Considering only the morphological type of fractures (A/B/C), the intra-observer reproducibility of the Magerl AO classification was good (k = 0.68 for the resident physicians and k = 0.76 for specialists). The AOSpine classification presented an excellent reproducibility for both resident physicians (k = 0.82) and specialists (k = 0.96). However, there was no statistically significant difference between the two classification systems, both for resident physicians (p = 0.67) and specialists (p = 0.36; Table 4). The kappa values that describe the intra-observer agreement for the Magerl AO and AOSpine classifications for each morphological type of fractures are shown in Table 4.
| Classification | Residents (kappa value) | Specialists (kappa value) |
| Magerl AO classification |
Applicability of the classification systems
Of the participants, 68.2% believe that the AOSpine classification has better applicability when compared with the Magerl AO classification.
DISCUSSION
To the best of the authors' knowledge, the literature presents very few studies that have evaluated the inter- and intra-observer agreement between the two AO classification systems for thoracolumbar fractures, the former Magerl AO and the current AOSpine.
The present study demonstrated that the AOSpine classification presented better inter- and intra-observer agreement than the Magerl AO classification when the type and subtype of the fractures were included. When considering only the type of fracture (A/B/C), the inter- and intra-observer agreement increased considerably in both classification systems, and no statistically significant difference was observed between them; this result is explained by the fact that the criteria used to classify the type of fracture are the same in both systems. Kepler et al.10 also used the AOSpine classification system to analyze 25 cases, which were classified by 100 spine surgeons. Considering only the type of fracture (A/B/C), the inter-observer agreement was good (k = 0.74) and the intra-observer agreement was excellent (k = 0.81).
Considering the type and subtype of the fractures, the present findings corroborate those of previous studies that also found low to moderate inter- and intra-observer agreement when using the Magerl AO classification. Oner et al.12 assessed the Magerl AO classification system and indicated a low inter-observer agreement (k = 0.35) and a moderate intra-observer agreement (k = 0.41). Wood et al.13 observed that considering the type and the group of the Magerl AO classification (A1, A2, A3, B1, B2, etc.) to assess interobserver agreement, the k-value was 0.53 (from 0.33 to 0.68), which indicates a moderate agreement. That is, the more complex that classification becomes, including types, groups, and subgroups, the lower the agreement, which affects its reproducibility. Maçaneiro et al.14 assessed the inter-observer agreement of 40 cases of thoracolumbar spine fractures using the Magerl AO classification and observed a fair agreement, both for fracture type (k = 0.39) and group (k = 0.32).
In the present study, the low inter- and intra-observer agreement of the Magerl AO classification system can be justified by its complexity, as it is a very inclusive system in which the observer needs to evaluate many variables, hindering its use in clinical practice. Another possible reason would be the high number of training physicians included in the present study; their inexperience in using such a classification may have statistically contributed to the low agreement. Furthermore, it could be suggested that the lack of complementary exams in the evaluation of the images, such as computed tomography or magnetic resonance imaging, may have interfered in the classification results.
The AOSpine Trauma Knowledge Forum developed the new classification system, AOSpine, which combined the characteristics of the Magerl AO system and the Thoracolumbar Injury Classification System (TLICS), aiming to create a globally accepted system. After this classification system was developed, that group also assessed the morphological aspects of the classification, measuring the inter- and intra-observer reliability. Considering the type and group of fractures, the inter-observer agreement was good (k = 0.64), and so was the intra-observer agreement (k = 0.77).8 In a recent study, the inter-observer reliability of three classification systems (Magerl AO, TLICS, and AOSpine) was assessed, considering the type and group of fractures. Similarly to the present study, the Magerl AO classification presented fair agreement (k = 0.38). In turn, in contrast to the moderate agreement found in the present study (k = 0.59), the AOSpine classification presented good inter-observer agreement (k = 0.62).17 Azimi et al.18 also observed excellent k-values (from 0.83 to 0.89) in the intra- and inter-observer assessment using the AOSpine classification.
In the present study, participants were asked for their personal opinion regarding the applicability of Magerl AO and AOSpine classification systems. Most of the participants (68.2%) believed that AOSpine had better applicability.
In fact, AOSpine was shown to be a reproducible and valid classification system, easily understood and readily applicable in clinical practice. A worldwide accepted classification system is important for surgeons and researchers to reach a standardized diagnosis and treatment for thoracolumbar fractures.
The authors' efforts to exclude low-contrast or poorly positioned radiographs do not reflect the way these exams are used in clinical practice. Moreover, the lack of further imaging tests in the present study, such as computed tomography and magnetic resonance imaging, may cause failure to diagnose posterior complex injuries and limit the results.
Although this is a promising system for classifying fractures, the authors suggest that future studies should assess the other variables of the AOSpine system, i.e., the neurological state and the key modifiers, as well as evaluating their relationship with the morphological classification types.
CONCLUSION
It was observed that the AOSpine classification system for thoracolumbar fractures presented better reliability and reproducibility when compared with the Magerl AO classification system. The AOSpine was shown to be a good system to classify thoracolumbar fractures regarding their morphology, justifying its standardization for use with these fractures.