
The University of Texas Houston Stroke Registry (UTHSR): implementation of enhanced data quality assurance procedures improves data quality



Limited information has been published regarding standard quality assurance (QA) procedures for stroke registries. We share our experience regarding the establishment of enhanced QA procedures for the University of Texas Houston Stroke Registry (UTHSR) and evaluate whether these QA procedures have improved data quality in UTHSR.


All 5093 patient records abstracted and entered in UTHSR between January 1, 2008 and December 31, 2011 were considered in this study. We conducted reliability and validity studies. To assess the reliability and validity of data captured by abstractors, a random subset of 30 records was re-abstracted for select key variables by two abstractors. These 30 records were also re-abstracted by a team of experts that included a vascular neurologist, serving as the “gold standard”. We assessed inter-rater reliability (IRR) between the two abstractors as well as the validity of each abstractor against the “gold standard”. Depending on the scale of the variables, IRR was assessed with Kappa or intra-class correlations (ICC) using a 2-way, random effects ANOVA. For assessment of the validity of data in UTHSR, we re-abstracted another set of 85 patient records for which all discrepant entries were adjudicated by a vascular neurology fellow and added to our “gold standard” set. We assessed the level of agreement between the registry data and the “gold standard” as well as sensitivity and specificity. We used logistic regression to compare error rates across years to assess whether a significant improvement in data quality was achieved during 2008–2011.


The error rate dropped significantly, from 4.8% in 2008 to 2.2% in 2011 (P < 0.001). The two abstractors had an excellent IRR (Kappa or ICC ≥ 0.75) on almost all key variables checked. Agreement between data in UTHSR and the “gold standard” was excellent for almost all categorical and continuous variables.


Establishment of a rigorous data quality assurance for our UTHSR has helped to improve the validity of data. We observed an excellent IRR between the two abstractors. We recommend training of chart abstractors and systematic assessment of IRR between abstractors and validity of the abstracted data in stroke registries.



Medical registries have been used for many years as sources of clinical data that can support evidence-based medicine and decision-making. Registries are classified according to the disease or disorder, and are defined by patients having the same diagnosis. Stroke is the leading cause of serious, long-term disability and the fourth leading cause of death in the United States [1]. Stroke is the second leading cause of death globally, and all nations, regardless of their health care system, face similar medical and economic burdens [2].

The Harvard Registry is the oldest stroke registry in the US [3, 4]. During the last decade, there has been an increased interest in developing other stroke registries to monitor and collect data for improving the quality of care for stroke patients through the assessment of adherence to established performance measures for acute stroke care [5], to study the epidemiology and etiology of specific types of strokes, and to decrease the proportion of premature deaths and disabilities caused by acute stroke. A brief summary of several stroke registries is provided in Table 1.

Table 1 Brief summary of several stroke registries in the world

Despite the availability of many stroke registries, limited information has been published regarding standard procedures to ensure reliability and validity of data in stroke registries [5, 16, 18, 29–31]. For example, Reeves et al. (2008) reported data regarding the reliability of abstracted data collected during 2001–2004 from the Michigan PCNASR [29]. Recently, Xian et al. (2012) reported data regarding the validity of data in the GWTG registry, indicating a high level of accuracy for select variables, including age, diagnosis, arrival date/time, and tPA therapy, when compared with audited data from medical records [5]. However, information regarding the development and implementation of standard procedures to ensure data quality for stroke registries is limited, particularly at various stages of data management, including chart abstraction and quality control of data.

Since 2001, the Specialized Programs of Translational Research in Acute Stroke (SPOTRIAS), funded by the National Institute of Neurological Disorders and Stroke (NINDS), supported the development of prototype registries, led by academic principal investigators and medical institutions, to collect data on the quality of care provided to stroke patients from the initial emergency response to hospital discharge. Currently, there are eight funded SPOTRIAS sites that collaborate on this effort. The Stroke Program at the University of Texas Health Science Center at Houston (UTHealth) is part of the SPOTRIAS network [32].

The UTHealth stroke program has played a significant role in the treatment and prevention of stroke and is committed to high quality research, clinical practice, education, and optimal implementation of thrombolysis therapy following acute stroke in Houston, with thrombolytic treatment rates exceeding 30%. As a major stroke center, the Memorial Hermann Hospital-Texas Medical Center, Houston (MHH-TMC) has served as a leader in stroke research for some of the most important acute stroke studies in the world, including the NINDS tissue plasminogen activator (tPA) trial, which led to the approval of the clot-dissolving drug tPA in the treatment for acute stroke [33]. These achievements contributed to our success in being selected as a SPOTRIAS site.

The UTHealth SPOTRIAS data core is responsible for data abstraction, data entry, quality control, statistical analysis, and management of data for the UTHealth Stroke Registry (UTHSR) and other clinical trials. The data core has invested a significant amount of effort to improve the quality of the data in UTHSR. During the past 10 years, we have gained significant insight regarding the design, development, maintenance, quality control, and utilization of our stroke registry. The purpose of this article is to describe the development and assessment of enhanced quality assurance (QA) procedures in UTHSR and compare data quality in UTHSR before and after implementation of our enhanced QA procedures.


History of Houston stroke program and development of UTHSR

Houston is the fourth most populated city in the US and is home to the largest medical center in the world [34]. MHH-TMC was the first hospital in the Texas Medical Center. The Neurology department at MHH-TMC was one of the first in the US to use tPA for acute stroke and it was also the first hospital named as a “primary stroke center” by the state of Texas [35]. Historically, the stroke program at UTHealth formed in 1979 with the recruitment of Dr. Grotta. In 1986, the stroke team began to keep a written “log” of all patients admitted to the stroke service. Once the UTHealth stroke team started testing tPA in 1989, they began to keep slightly more detailed records. Once tPA was approved in 1996 [36], the data collected pertaining to tPA use became even more detailed in the stroke database leading to some of the UTHealth stroke team’s earlier publications. In 2002, the SPOTRIAS P50 mechanism funded a data core, which prompted the UTHealth stroke team to convert to an electronic database. In 2003, the UTHealth stroke team began to design and record data in UTHSR for all patients admitted to the UTHealth stroke service at MHH-TMC. In 2008, with the second round of SPOTRIAS funding, the data core made a specific commitment to develop and implement enhanced strategies for improving data quality in UTHSR. The data core represents a collaborative team of investigators supported by data managers, statisticians, programmers (system and web developers), two chart abstractors, and a quality control abstractor. An organizational chart that demonstrates the role and working relationships among various members of the Data Core is provided in Figure 1.

Figure 1 Organizational chart for UTHealth Stroke Data Core.

Design of UTHSR, data elements, and data sources

Attributes considered in designing a registry must ensure that data are valid, reliable, responsive, interpretable, and translatable [37]. UTHSR is a prospective registry initially designed to capture essential information on all patients admitted to the UTHealth in-patient stroke service at MHH-TMC, with the primary aims of tracking the number of patients treated with intravenous (IV) tPA, their essential demographics, and complication rates, and of supporting research by members of the stroke team. With the funding of SPOTRIAS, the Principal Investigators (PIs) of the original SPOTRIAS sites decided to collect common data elements describing essential demographics of all patients treated with IV tPA or enrolled in any clinical trial. UTHSR was consequently expanded to incorporate other elements, including variables needed for clinical trials conducted by the UT stroke team, variables needed for reporting to The Joint Commission (TJC) [38, 39], and select variables to meet minimum requirements for reporting to the Centers for Medicare and Medicaid Services (CMS) as they pertain to the vascular neurology aspects of required reporting [40]. All patients admitted to the stroke unit at MHH-TMC are entered in UTHSR and classified by stroke diagnosis subtype, including infarct (non-hemorrhagic stroke), intracerebral hemorrhage (ICH), intraventricular hemorrhage (IVH), transient ischemic attack (TIA), subarachnoid hemorrhage (SAH), epidural hematoma (EDH), subdural hematoma (SDH), non-acute infarct, and others that cannot be classified as any of the above (“Not stroke”).
Other data elements include admission information (e.g., arrival date and time), medical history, National Institutes of Health Stroke Scale (NIHSS), modified Rankin Scale (mRS) score, Glasgow Coma Scale (GCS), laboratory results, CT scan, CT scan angiogram, MRI, MR angiogram images, thrombolysis therapy (e.g., tPA time and door to needle time), intra-arterial therapy (IAT), complications, and discharge information including: death, mRS on discharge (or day 7, whichever comes first), discharge disposition (home, skilled nursing facility, etc.), and particularly patient education and mRS at 90 days. Currently, the UTHealth stroke team captures up to 235 variables for each patient depending on stroke subtypes. Since some of these variables have multiple responses (e.g., medical history), the number of fields in UTHSR is 372. As UTHSR is modified, corresponding changes to the codebook are made; the codebook is also updated periodically as changes to the abstraction rules are identified or where clarity can be improved. The data core has developed policies for documentation. Members of the data core are responsible for adhering to all policies and procedures established.

Data sources and data entry into UTHSR

The most important source for data abstraction is the MHH-TMC electronic medical record (EMR), which includes all related personal and medical information. All registry data are manually abstracted from electronic chart review and from rounding with the stroke team. Our abstractors review the entire chart and capture the required information for each patient from admission to discharge from the stroke service. Ambiguous and questionable data, particularly from complicated cases, are discussed in weekly meetings whose regular attendees include vascular neurology UTHealth faculty members involved in the stroke program. In addition, the data core holds weekly meetings to give the abstractors an opportunity to discuss issues related to UTHSR and the abstraction of data. At first, abstraction was carried out by stroke research nurses and fellows, but after SPOTRIAS funding, we hired dedicated full-time abstractors who are key members of the data core. Requirements are familiarity with medical terminology and willingness to stay for at least 1 year; our abstractors are usually health care professionals or other medical personnel. Abstractor training will be described later, but is a critical part of the quality assurance component of UTHSR. The UTHSR codebook serves as the protocol for data abstraction. Since registries must confidentially maintain patients’ health information, security is an important issue [37]. We maintain confidentiality during all phases of data abstraction, monitoring, analysis, dissemination, and publication. The UTHealth servers reside in a secure location with limited access. All data are automatically backed up daily with redundant storage in a protected off-site location in accordance with UTHealth policies.

Data quality assurance procedures

QA steps include the establishment and implementation of procedures that ensure the quality of data from the point of abstraction to analysis [41]. Strategies and procedures that help to improve data quality in stroke registries include training of abstractors [42], assessment of reliability (e.g., inter-rater reliability (IRR) between abstractors) [5, 29, 30], and assessment of validity of data [43]. After data cleaning [44] and resolution of potential discrepancies, the validity of the data (e.g., accuracy level or error rate) is assessed based on a sample of re-abstracted or audited records in the registry. For UTHSR, our QA procedures focus mainly on the training of data abstractors, assessment of IRR between abstractors, development and implementation of formal data cleaning procedures, and evaluation of validity of data by calculating the error rate or accuracy rate and other measures of validity (e.g., sensitivity, specificity), as described below.

Data abstractors’ training

For UTHSR, each abstractor is trained by using a codebook (or data dictionary) because a good understanding of the variables in the registry and where to locate their appropriate values by abstractors is essential to ensure data quality. The training of data abstractors also includes re-abstraction of data for a set of patients whose records are already available in UTHSR. Once an abstractor has demonstrated a reasonable level of confidence in abstraction, we provide a new set of patients for abstraction. During this phase, the abstractors attend rounds on the stroke clinical service and enter data in the registry under the supervision of a more experienced abstractor. Abstractors are also trained in contacting the discharged patients to obtain the 90-day mRS data.

Reliability study

Reeves et al. (2008) demonstrated the assessment of inter-rater reliability to establish the consistency of abstraction between two abstractors (hospital abstractors and the audit abstractor) [29]. In this study, to evaluate IRR between two abstractors, we randomly selected 30 patient records from 2008–2011 and asked two abstractors to re-abstract select variables, including initial presentation, arrival time, age, INR, stroke type, onset time, tPA therapy, tPA time, symptomatic hemorrhage, mRS on discharge, and disposition at discharge. For continuous variables, IRR is assessed through the intra-class correlation coefficient (ICC) [29]. For categorical variables, IRR is assessed using the Kappa statistic [29, 42]. Since IRR measures alone are not sufficient to assess the consistency of the abstraction, for binary variables we also assessed the Bias Index (BI) as defined by Reeves et al. (2008) [29]. For continuous variables, we calculated mean differences to assess the BI. We also provide 95% confidence intervals for the IRR measures.
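For illustration, both the Kappa statistic and the bias index for a binary variable can be computed directly from the paired abstractions. The following is a minimal Python sketch; the function names are ours and the coded values are hypothetical, not taken from UTHSR:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two abstractors' categorical codes of the same records."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb.get(k, 0) for k in ca) / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

def bias_index(a, b, positive=1):
    """Bias index for a binary variable: difference in the proportion of
    'positive' codes between the two abstractors (0 indicates no systematic bias)."""
    n = len(a)
    return (sum(x == positive for x in a) - sum(x == positive for x in b)) / n
```

As the text notes, a high kappa alone can mask a systematic shift between abstractors, which is why the bias index is reported alongside it.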

Data cleaning

Data cleaning refers to a set of processes that involve identification and resolution of all discrepant data, including missing values, incorrect or out-of-range values, or implausible responses that are logically inconsistent with other responses in the database [37, 44]. Establishing standard data cleaning processes helps to detect and correct errors, resulting in higher data quality [45]. Since 2003, UTHSR has undergone a series of changes, including the development and implementation of additional data quality checks in our data collection system to prevent data entry errors. In addition, we have developed and implemented data cleaning procedures, including about 350 univariable and multivariable rules that detect potential data inconsistencies (invalid missing, impossible, and implausible entries). Invalid missing fields are defined as those where the information should have been collected but was not collected or not entered in the registry. Impossible data are defined as entries that are invalid, do not comply with the codebook, or are out of range. Implausible data are defined as entries that are logically inconsistent with data in other fields or seem unusual based on statistical rules (e.g., Chebyshev’s rule). Chebyshev’s rule states that for random variables with finite variance, no more than 4% of the data can lie more than five standard deviations from the mean of the distribution. This rule helps to identify potential outliers regardless of the shape of the distribution [46, 47]. For some variables, we also check measurements above the 90th and below the 10th percentiles. Based on these rules, the data management analyst prepares a list of invalid missing, impossible, and implausible data for the abstractors so they can double-check their entries and resolve potential errors. Once these issues are addressed, the data manager reruns the same program to confirm that all issues are resolved.
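The Chebyshev and percentile screens described above can be sketched as follows. This is an illustrative Python sketch only: the thresholds mirror the rules in the text (five standard deviations; the 10th–90th percentile band, here approximated by a crude sorted-index percentile), while the function and variable names are ours:

```python
import statistics

def flag_implausible(values, k=5.0, lo_pct=10, hi_pct=90):
    """Flag entries more than k standard deviations from the mean
    (by Chebyshev's rule, at most 1/k**2 = 4% of any finite-variance
    distribution for k = 5), plus entries outside the 10th-90th
    percentile band, for manual review by the abstractors."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    srt = sorted(values)
    lo = srt[int(len(srt) * lo_pct / 100)]       # crude percentile cutoffs
    hi = srt[int(len(srt) * hi_pct / 100) - 1]
    flags = []
    for i, v in enumerate(values):
        if abs(v - mean) > k * sd:
            flags.append((i, v, "chebyshev"))
        elif v < lo or v > hi:
            flags.append((i, v, "percentile"))
    return flags
```

In practice such flags only mark entries for re-checking against the chart; they do not by themselves prove an entry is wrong.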

Validity study

The validity of data captured by abstractors, assessed against a set of audited medical records or a “gold standard”, provides a measure of the quality of the abstracted data [43]. Xian et al. (2012) reported an overall composite accuracy rate of 96.1% for all data elements in the GWTG-Stroke registry [5]. The Riks-Stroke registry also reported at least 95% consistency between the medical chart and what is recorded in Riks-Stroke for stroke subtype and clinical data, but much lower consistency (approximately 85%) for data related to the health-care organization at the participating hospitals [16]. We have established a rigorous process for assessing the validity of data in UTHSR. For this purpose, we randomly selected 115 patient records from UTHSR, consisting of the 30 patient records (with the selected variables that were used for the reliability study) and another set of 85 patient records for which our quality control (QC) abstractor re-abstracted data for all variables by reviewing medical record charts and hospital records. These data were adjudicated by a team of experts in the data core that included a vascular neurologist (faculty or fellow), herein called the “gold standard”. These 115 patient records resulted in a total of 8877 data entries or data points in UTHSR during 2008–2011. We calculated the proportion of discrepant data elements relative to the number of entries re-abstracted (i.e., 8877). This error rate reflects a combination of errors in data entry into the registry and errors in abstraction. For binary variables, we also assessed sensitivity and specificity as shown in Reeves et al. (2011) [31, 43]. Since sensitivity and specificity are not applicable to continuous variables, we calculated differences between the two sets of data for these variables. Agreement for these variables is defined as no difference between the two sets of values. We also calculated the mean difference for each variable between the two sets of data.
In addition, since we had the “gold standard” for the 30 patient records that were included in the reliability study, we assessed validity of data abstracted by each of the two abstractors involved in the reliability study.
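The per-variable validity measures used here (agreement, sensitivity, specificity) and the overall error rate reduce to simple counts against the “gold standard”. A minimal Python sketch, with hypothetical codes rather than UTHSR records:

```python
def validity_measures(registry, gold, positive=1):
    """Agreement, sensitivity, and specificity of registry entries for one
    binary variable, compared against adjudicated 'gold standard' values."""
    pairs = list(zip(registry, gold))
    tp = sum(r == positive and g == positive for r, g in pairs)
    tn = sum(r != positive and g != positive for r, g in pairs)
    fp = sum(r == positive and g != positive for r, g in pairs)
    fn = sum(r != positive and g == positive for r, g in pairs)
    return {
        "agreement": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def error_rate(n_discrepant, n_entries):
    """Overall error rate: discrepant data points / re-abstracted data points."""
    return n_discrepant / n_entries
```

The same agreement calculation applies to continuous variables when "agreement" is defined, as in the text, as an exact match between the two values.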

Statistical analysis

We conducted descriptive analyses to provide summary statistics for certain key variables in UTHSR. For continuous variables with normal distributions, we used means (with standard deviations (SD)), but when there was a significant departure from normality (e.g., skewed distributions) we used medians (with interquartile ranges (IQR)). We calculated performance measures of care for stroke patients (STK-1 to STK-10) [48, 49] in UTHSR based on patient eligibility related to diagnoses of ischemic stroke and hemorrhagic stroke. We used the Kappa statistic to estimate IRR for nominal variables, along with 95% lower confidence limits (LCL) for IRR [50]. For continuous and ordinal variables, we used ICC with a 2-way, random effects ANOVA model [29]. In addition, we calculated the bias index (BI) [29] between data re-abstracted by the two abstractors. For binary variables, the bias index ranges between −1 and +1, with zero indicating no bias [29]. For continuous variables, we used the mean difference between the two sets of values compared as the bias index. When the two abstractors are compared, a positive or negative BI indicates bias between the abstractors [51]. However, when each abstractor is compared with the “gold standard”, a positive or a negative BI shows that the distribution of values produced by the abstractor is shifted to the right or left of the “gold standard”, respectively.
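The single-rater ICC from the 2-way, random effects ANOVA model (ICC(2,1) in the Shrout–Fleiss notation) can be computed from the subjects-by-raters table of scores. The following sketch assumes a complete table with no missing ratings; it is an illustration of the standard formula, not the exact implementation used in SPSS or SAS:

```python
def icc_2way_random(ratings):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-rater ICC
    for an n-subjects x k-raters table of scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Mean squares from the two-way ANOVA decomposition.
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # raters
    sse = (sum((x - grand) ** 2 for r in ratings for x in r)
           - k * sum((m - grand) ** 2 for m in row_means)
           - n * sum((m - grand) ** 2 for m in col_means))
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Note that a constant shift between raters lowers this absolute-agreement ICC even when the rank ordering is perfect, which is one reason the bias index is reported alongside it.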

For the validity study, as mentioned earlier we assessed level of agreement, sensitivity, and specificity along with 95% confidence intervals (CIs). Finally, we used logistic regression to compare error rates between 2008 and subsequent years. We also calculated 95% CIs for error rates in different years. All comparisons were made at 5% level of significance. All analyses were conducted using SPSS, Version 20 [52] or SAS software, Version 9.3 [53].
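As a sketch of the year-to-year comparison described above: Wald confidence intervals for yearly error rates, and a two-proportion z test that is asymptotically equivalent to the Wald test from a logistic regression of error status on a binary year indicator, can be computed as follows (function names are ours; this is an illustration, not the SPSS/SAS code used in the study):

```python
import math

def wald_ci(errors, n, z=1.96):
    """Error rate and its 95% Wald confidence interval, given the number of
    discrepant data points (errors) among n re-abstracted data points."""
    p = errors / n
    hw = z * math.sqrt(p * (1 - p) / n)
    return p, (p - hw, p + hw)

def two_proportion_z(e1, n1, e2, n2):
    """z statistic comparing two error rates (e.g., 2008 vs. a later year),
    using the pooled standard error under the null of equal rates."""
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se
```

With the yearly counts of discrepant and total data points, these give per-year CIs of the kind displayed in Figure 2 and a significance test for each year against the 2008 reference.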


Descriptive analysis of our total registry data indicated that the mean age of patients was 62.7 years (SD = 15.8); 50.2% were male; 31.7% were African-American; 48.3% were Caucasian; and 14.1% were Hispanic. The distribution of stroke subtypes included 51.4% infarcts and 24.6% hemorrhagic strokes (ICH, SAH, EDH, SDH, and IVH).

About 61% of the patients arrived at the hospital by ambulance and nearly 22% were transported by air. Nearly 39% were transferred patients from other hospitals. Approximately 37% of patients who had infarct reported an onset time of less than 2 hours prior to presentation. For patients with infarct, the mean time from arrival to thrombolysis treatment time was 68 minutes (including patients treated with off-label use of thrombolytic). The overall in-hospital mortality among the stroke patients was 8.2% (ischemic stroke 5.8%, ICH 21.4%). The median discharge mRS was 4.0 for all discharged patients. Additional information regarding demographics and other characteristics of the patients at hospital arrival and discharge are reported in Table 2.

Table 2 Summary characteristics of patients at arrival and discharge time in UTHSR, 2008–2011 (N=5093)

Overall, during 2008–2011, 32.2% of patients presenting to our emergency department (ED) with acute cerebral infarct received tPA within 4.5 hours of symptom onset and 24.1% received tPA within 3 hours of symptom onset. The data indicate a significant upward trend in the proportion of patients who received tPA between 2008–2011 (P-value based on Chi-square test for linear trend < 0.02). Other data related to the tPA times are presented in Table 3.

Table 3 Distribution of “Onset to tPA Time” among infarct patients presenting at MHH-TMC, 2008–2011 (N=1627)

The rate of discharge on antithrombotic therapy (STK-2) was 94.3%, and 87.1% of eligible patients received thrombolytic therapy (STK-4), meeting the standards set for these performance measures. Similarly, for stroke education (STK-8), 88.8% of patients met the standard set for the performance measure. The Chi-square test for linear trend did not show any significant upward or downward trends in the performance measures over the 4 years. All data related to the performance measures are reported in Table 4.

Table 4 Summary statistics regarding performance measures of care for stroke patients in UTHSR by year, 2008-2011

We observed an excellent IRR (Kappa ≥ 0.75) between the two abstractors for most categorical variables, including diagnosis of stroke as infarct and ICH, tPA therapy, disposition, and initial presentation. IRR was moderate (0.40 ≤ Kappa < 0.75) for diagnosis of stroke as TIA. For most continuous variables, we also observed an excellent IRR (ICC ≥ 0.75), including age, onset time, arrival time, INR, and mRS on discharge, except for tPA time, for which IRR was poor (ICC = −0.48). Similarly, when we assessed the validity of categorical variables abstracted by each abstractor against the “gold standard”, we observed at least 93.3% agreement for both abstractors. For both abstractors, we also found 100% sensitivity for these variables, except for initial presentation by abstractor #2, which had a sensitivity of 95.4%. Specificity was 100% for all categorical variables except for disposition at “home” and “skilled nursing facility” for both abstractors, which had specificities of 91.3% and 96.6%, respectively. For continuous variables, we observed a correlation of at least 0.83 for all variables for both abstractors, except for tPA time and INR for abstractor #2, which had correlation coefficients of −0.49 and 0.39, respectively. For continuous variables, we also report mean differences. All data regarding the validity of the abstraction by the two abstractors against the “gold standard” are summarized in Table 5.

Table 5 IRR between the two abstractors, and validity against the “gold standard” for select variables, (N=30)

For all categorical variables in UTHSR we observed an accuracy rate (% agreement with the “gold standard”) of at least 96.5%. While sensitivity and specificity were 100% for most binary variables, wake-up stroke had the lowest sensitivity (91.7%) and IA therapy had the lowest specificity (92.9%). For continuous variables, we also observed a high level of agreement (>94.2%) for most variables, except for some date/time variables, including CT time and arrival time, which had accuracy rates of 80.8% and 85.2%, respectively. We also found very high correlations (r > 0.96) between UTHSR records and the “gold standard” for almost all continuous variables, except for glucose, which had r = 0.86. Mean differences for these variables between UTHSR records and the “gold standard” are reported in Table 6.

Table 6 Measures of validity of UTHSR data against “gold standard” based on 115 patient records, 2008–2011

Finally, our analysis indicated error rates of 4.8% [95% CI (3.9, 5.7)], 2.3% [95% CI (1.7, 3.0)], 4.6% [95% CI (3.8, 5.5)], and 2.2% [95% CI (1.6, 2.8)] for years 2008, 2009, 2010, and 2011, respectively. Furthermore, the differences between the error rate in 2008 and those in subsequent years were statistically significant (all P < 0.001), except for the difference between 2008 and 2010. The numbers of data points used for calculation of error rates and 95% CIs for each of the four years, 2008–2011, are displayed in Figure 2.

Figure 2 Estimated error rate (%) in UTHSR data with 95% CIs by year during 2008–2011.


In this article, we describe the development and assessment of enhanced quality assurance procedures in UTHSR and compare data quality in UTHSR before and after their implementation. Since we implemented our enhanced QA procedures for UTHSR at the end of 2008, when we received the second round of funding for SPOTRIAS, we believe 2008 serves as an important reference point, and any potential improvements in our registry data would be reflected from 2009 onward. Our finding of a significant reduction in the UTHSR error rate from 4.8% in 2008 to 2.2% in 2011 (P < 0.0001) indicates that our effort in enhancing and formalizing the QA procedures has been successful in reducing the error rate. Though we observed an increase in the error rate from 2.3% in 2009 to 4.6% in 2010 (P < 0.0001), we believe this is partly attributable to the recruitment of two new chart abstractors in early 2010.

Other stroke registries have published data from different parts of the world [2, 3, 6, 9, 10, 12, 16–23, 25, 27, 28, 54]. However, limited data have been reported regarding data quality as well as data quality assurance procedures for stroke registries. In fact, among the fourteen stroke registries that have published articles in English for which we had access to the full articles [2, 3, 5, 6, 10, 13, 14, 16–19, 21, 24, 25, 27–30, 54, 55], we found that only seven (50%) reported on the quality of the data in their registries [2, 5, 10, 13, 14, 16, 18, 24, 29, 54, 55]. The remaining seven registries did not discuss their data quality assurance procedures [3, 17, 19, 21, 25, 27, 28]. Since one of the aims of developing stroke registries is to improve the quality of care and to reduce mortality attributable to stroke [6], it is imperative that the data used for such assessments are of high quality. Therefore, we devote a significant portion of the discussion in this paper to the enhanced QA procedures that we have established for ensuring data quality from chart abstraction to data management and analysis.

Training of abstractors and assessment of reliability and validity of abstracted data

Reeves et al. (2008) have highlighted the importance of training in maintaining data quality for the PCNASR registry [29]. Our quality assurance process rests on the understanding that each new abstractor should receive sufficient training on codebook definitions and stroke-specific issues (e.g., CT time). Historically, our abstractors were healthcare professionals, including nurse practitioners; only recently have we hired non-clinicians. This shift affects the level of training required. To ensure reliability and validity of data, we believe training of abstractors should be mandatory [29].

The complexity and differences in the interpretation of the key variables in medical records for stroke patients demands continuous training and evaluation of the work conducted by the abstractors to ensure data quality in a stroke registry. For evaluation of data abstracted, we have conducted a reliability study to assess IRR between the two abstractors and a validity study to assess the level of agreement, sensitivity, and specificity between the data abstracted by each abstractor and the “gold standard”. For these evaluations, we have only used select variables from a set of 30 patient records for which the “gold standard” data were available.

We found excellent IRR (ICC ≥ 0.75) between the two abstractors for most continuous variables, including age, onset time, arrival time, and mRS on discharge. These findings are consistent with those reported by Reeves et al. (2008) for PCNASR, which indicated excellent reliability for age, stroke onset time, ED arrival time, and mRS [29]. Our findings are also consistent with those reported by Xian et al. (2012) for the GWTG-Stroke registry, indicating an excellent IRR for age [5]. We also found poor IRR (ICC = −0.48) for tPA time, with a large mean difference between the two abstractors caused by discrepancies in dates (i.e., a wrong day, month, or year) for 3 of the 7 patients who received tPA therapy.

For categorical variables, we observed excellent IRR (Kappa ≥ 0.75) between the two abstractors for most variables, including diagnosis of stroke as infarct and ICH, tPA therapy, disposition, and initial presentation, except for diagnosis of stroke as TIA, which had a moderate IRR (Kappa = 0.65). These findings are consistent with those reported by Xian et al. (2012) for the GWTG-Stroke registry, which indicated excellent IRR for final clinical diagnosis, tPA therapy, and discharge destination [5]. However, in PCNASR, Reeves et al. (2008) found poor reliability for stroke team consultation, time of initial brain imaging, discharge destination, and stroke/TIA diagnosed in the emergency department [29]. Nonetheless, excellent IRR between the two abstractors alone does not establish the validity of the data abstracted. We believe assessment of both IRR and validity (% agreement, sensitivity, and specificity) provides a more informed evaluation of abstractors’ performance.

Our validity study based on 30 patient records revealed that for all selected categorical variables, the agreement between the data abstracted by each of the two abstractors and the “gold standard” was above 93.3%. Since the level of agreement alone does not indicate the validity of the abstracted data, we also computed a bias index between each of the two abstractors and the “gold standard”. For example, for initial presentation at MHH-TMC we observed an agreement of 96.7% for abstractor #2, but the bias index for this variable against the “gold standard” was −0.03, indicating a tendency for abstractor #2 to record initial presentation as “other hospitals” when the “gold standard” indicated initial presentation at MHH-TMC. For continuous variables, we found a correlation of at least 0.83 for all selected variables, except for tPA time (r = −0.49) and INR (r = 0.39) for abstractor #2. Because a high correlation alone does not indicate the validity of the abstracted data, we also computed mean differences for these variables between each of the two abstractors and the “gold standard”. For example, for onset time we observed a perfect correlation (r = 1) for both abstractors, but the mean difference for this variable against the “gold standard” was 9.4 minutes for abstractor #1 and 14.1 minutes for abstractor #2. Interpretation of mean differences is similar to that of the bias index described earlier. The 95% CIs for the mean differences indicate no significant differences between the data abstracted by the two abstractors and the “gold standard”, except for tPA time for abstractor #2, for which the 95% CI is not reported. This is because abstractor #2 recorded 3 wrong dates (i.e., a wrong day, month, or year) out of the 7 dates for patients who received tPA therapy, which resulted in a misleading mean difference. These findings indicate areas where additional training may be needed for the abstractors. Xian et al. (2012) used an audit abstractor to assess the validity and accuracy of abstractors’ work against the audit data in the GWTG-Stroke registry. They also reported a high accuracy rate for the majority of variables, such as age, diagnosis and evaluation, arrival date/time, and tPA therapy [5]. We believe that objective evaluation of the abstractors’ performance, together with additional training, will result in improved data quality.
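
The agreement, bias-index, and mean-difference checks described above can be sketched as follows; the function names and example values are illustrative only, and the t critical value is passed in rather than looked up:

```python
import math
import statistics

def agreement_and_bias(abstractor, gold):
    """Percent agreement and bias index for a binary variable
    (1 = yes, 0 = no); a negative bias index means the abstractor
    records "yes" less often than the gold standard does."""
    n = len(gold)
    agree = 100 * sum(a == g for a, g in zip(abstractor, gold)) / n
    b = sum(a == 1 and g == 0 for a, g in zip(abstractor, gold))
    c = sum(a == 0 and g == 1 for a, g in zip(abstractor, gold))
    return agree, (b - c) / n

def mean_diff_ci(abstractor, gold, t_crit):
    """Mean difference vs. the gold standard with a 95% CI; pass the
    t critical value for n - 1 df (e.g., about 2.045 for n = 30)."""
    d = [a - g for a, g in zip(abstractor, gold)]
    n, mean = len(d), statistics.mean(d)
    half = t_crit * statistics.stdev(d) / math.sqrt(n)
    return mean, (mean - half, mean + half)
```

The point of the pairing is that a high percent agreement (or a high correlation) can coexist with a systematic directional error, which only the bias index (or the mean difference) reveals.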

Data cleaning

Data cleaning is an essential component of QA processes and includes the identification and resolution of all discrepant data in the database [37, 44]. However, the extent of data cleaning varies among the registries that have reported their QA procedures [2, 5, 13, 18, 24]. As mentioned earlier, for UTHSR we developed a comprehensive program of about 350 univariable and multivariable validation rules that were used to identify discrepant data. In contrast, for the GWTG-Stroke registry, the data abstraction tool included predefined logic features and user alerts to identify potentially invalid formats or data entries. Required fields were structured so that valid data had to be entered before the record could be saved, and range checks flagged inconsistent or out-of-range data and prompted the user to correct or review the entries [13]. Similarly, Hsieh et al. (2010) reported using logic checks and variable limits to prevent inaccurate data entries in TSR [2]. Since the data cleaning process does not capture all discrepant data, some recommend a random audit of a fraction of the data in a stroke registry; for example, Hsieh et al. (2010) randomly audited 5% of all cases entered in TSR [2].
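
A validation-rule program of the kind described above can be sketched as a list of labelled predicates applied to each record. The rules shown are illustrative examples only, not any of the registry’s actual ~350 rules:

```python
def validate(record, rules):
    """Run a record through (label, check) validation rules and
    return the labels of the rules it violates."""
    return [label for label, check in rules if not check(record)]

# Illustrative rules only -- field names and ranges are hypothetical.
RULES = [
    # univariable range checks
    ("age between 18 and 110", lambda r: 18 <= r["age"] <= 110),
    ("NIHSS between 0 and 42", lambda r: 0 <= r["nihss"] <= 42),
    # multivariable consistency check: imaging cannot precede arrival
    ("CT time not before arrival", lambda r: r["ct_time"] >= r["arrival_time"]),
]
```

Rules of this form can be batch-applied to the whole database after each abstraction cycle, with the violated labels routed back to the abstractors for adjudication.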

Assessment of validity for data in UTHSR

It is important to note that the accuracy of data depends on the type (e.g., categorical or continuous) and the complexity of capturing accurate values for variables in a registry. For example, some registries have reported lower accuracy for variables that involve date/time (e.g., time to an event) [56]. It is therefore important to assess the validity of data for key variables in a stroke registry.

Our assessment of the validity of data against the “gold standard” for categorical variables, based on 115 patient records for key variables and 85 patient records for the remaining variables, indicates excellent agreement (> 96%) between the UTHSR data and the “gold standard”. Our findings are consistent with those reported by Xian et al. (2012), who found high levels of agreement between audited data and medical records for categorical variables, except for DVT prophylaxis, which had 79% agreement [5]. In addition, sensitivity and specificity for categorical variables in UTHSR were excellent (above 92.9%). For continuous variables, we found a correlation of at least 0.86 for all selected variables. However, agreement for CT time and arrival time was 80.8% and 85.2%, respectively. For the reasons described earlier, we also computed the mean difference for these variables between UTHSR and the “gold standard”. For example, for arrival time we observed a perfect correlation (r = 1) for UTHSR, but the mean difference for this variable against the “gold standard” was 26.7 [95% CI (−36.3, 89.6)] minutes. Therefore, specific attention needs to be paid to CT time and arrival time, because wrong dates (i.e., a wrong day, month, or year), or errors due to using different sources for capturing these variables, can produce a mean difference between the abstracted data and the “gold standard”. Others have also reported difficulty in capturing time-related variables in stroke registries. For example, George et al. (2009) reported that for the majority (57.8%) of patients in PCNASR, the time from onset of symptoms to hospital arrival was not recorded or was not known [6]. Xian et al. (2012) also reported on the validity of data for continuous variables in the GWTG-Stroke registry, indicating accuracy rates of at least 85% for the majority of variables, notably agreement of more than 93.6% for arrival time and brain imaging time. However, their higher levels of agreement for the arrival and brain imaging times could be due to their definition of agreement for continuous variables: they considered such data accurate if the values in the registry and in the audited records were within 15 minutes of each other [5], whereas we required perfect agreement for these variables in our study.
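
The effect of the agreement definition on time variables can be shown with a small sketch: the same records yield different agreement rates under exact matching (our criterion) and under a 15-minute window (as used for GWTG-Stroke). The timestamps are hypothetical:

```python
from datetime import datetime

def time_agreement(registry, audited, tolerance_min=0):
    """Fraction of records whose registry timestamp matches the audited
    timestamp within `tolerance_min` minutes (0 = exact match)."""
    hits = sum(abs((r - a).total_seconds()) <= tolerance_min * 60
               for r, a in zip(registry, audited))
    return hits / len(registry)

# Hypothetical arrival times: the second record is off by 10 minutes.
reg = [datetime(2011, 5, 1, 10, 0), datetime(2011, 5, 1, 10, 10)]
aud = [datetime(2011, 5, 1, 10, 0), datetime(2011, 5, 1, 10, 0)]
```

Here `time_agreement(reg, aud, 0)` gives 50% agreement while `time_agreement(reg, aud, 15)` gives 100%, which illustrates why tolerance-based definitions report higher agreement than exact matching on identical data.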

During 2008–2011, we observed a significant improvement in the UTHSR error rate (a drop from 4.8% in 2008 to 2.2% in 2011). We attribute this improvement to our efforts in enhancing and formalizing our quality assurance procedures, which included enhancements in software development, improved instructions in the codebook, training of abstractors, evaluation of reliability between abstractors, data cleaning, and assessment of the validity of data in UTHSR. A few stroke registries have published accuracy rates or error rates as indicators of the overall validity of data in their registries [5, 16]. For example, Asplund et al. (2011) reported an accuracy rate of at least 95% between the medical charts and the data recorded in the Riks-Stroke registry for stroke subtype and clinical data, but somewhat lower accuracy (approximately 85%) for data related to the healthcare organization at the participating hospitals [16]. Similarly, Xian et al. (2012) reported a 96.1% accuracy rate for the GWTG-Stroke registry [5]. These published error rates are similar to that of our 2008 data from UTHSR, before our enhanced data quality assurance procedures were implemented. To our knowledge, our overall 2011 error rate of 2.2% in UTHSR appears to be the lowest among all error rates published by stroke registries so far.

There are no established acceptable error rates for key variables in stroke registries, but for other epidemiologic studies and clinical trials the CDC recommends an error rate of 0.3% (3 per 1,000 entries) [57], which is usually achieved by conducting double data entry. Others have reported more liberal acceptable error rates of 1% to 5% for the general databases used by many companies, whereas an error rate of 0.1% to 0.5% is considered acceptable for clinical trials [58, 59]. Studies that used single data entry have reported higher error rates than those that used double data entry [60]. Since data errors in stroke registries can arise during either abstraction or data entry, achieving a significant reduction in the error rate requires double data abstraction as well as double data entry. The decision between double and single data entry largely depends on the availability of resources. For these reasons, some registries employ a single data abstraction and entry procedure along with a quality assurance procedure of re-abstracting and re-entering a small fraction (e.g., 5%) of all records to assess data quality [2].
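
As a rough illustration of comparing error rates between two years, a two-proportion z-test can serve as a simple stand-in for the logistic regression used in the study. The counts below are hypothetical, since the paper reports percentages rather than denominators:

```python
import math

def two_proportion_z(errors1, n1, errors2, n2):
    """Two-sided z-test comparing two error proportions; a simple
    stand-in for a year-by-year logistic regression on error status."""
    p1, p2 = errors1 / n1, errors2 / n2
    pooled = (errors1 + errors2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # = 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, with a hypothetical 10,000 checked entries per year, a drop from 4.8% to 2.2% errors yields a z of about 10 and a p-value far below 0.001, consistent with the significance reported for the observed decline.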

Since the main focus of this paper is to assess the effect of enhanced QA procedures on data quality in UTHSR, we have limited our discussion of the data obtained on our patients. While UTHSR data are mostly consistent with other stroke registries of hospitalized stroke patients, we observed differences in some characteristics. For example, the mean age of patients in UTHSR is 62.7 years (median = 63), significantly younger than that of the patients in ASTRAL (median = 72.5) [21] and PCNASR in Michigan (mean = 70.9 years) [29]. Four other participating states (Georgia, Illinois, Massachusetts, and North Carolina) that contributed data to PCNASR during 2005–2007 reported a median age of 72 years, although Georgia had younger patients (median = 67 years) [6]. In our registry, 51.4% of patients had infarcts and 24.6% had hemorrhagic stroke. PCNASR reported that 14% of their patients had hemorrhagic and 56–58% had ischemic stroke [6, 29]. The large number of patients with ICH who were transferred to MHH-TMC, rather than presenting directly to the ED, could explain the relatively higher percentage of ICH patients in UTHSR. Compared with most other registries, UTHSR has a higher rate of air transport (22.5%). For example, the Austrian Stroke Unit Registry reported that only 4.1% of patients were transported by helicopter from 32 stroke units between 2003 and 2009 [61]. This difference could be due to the fact that for over 30 years the UT Stroke team and MHH-TMC have been providing medevac helicopter service (Life Flight®) to bring stroke patients quickly to our stroke unit. In addition, depending upon distance and stroke severity, many other hospitals also choose to fly transferred patients to MHH-TMC. Furthermore, for more than five years, our stroke team has utilized telemedicine to “reach out” to smaller hospitals that do not have stroke expertise.

In UTHSR, the distribution of baseline NIHSS score (NIHSS on admission) was as follows: 40% of patients had scores ≤ 4, 31.5% had scores of 5–14, and 28.5% had scores above 14. Our result for less severe stroke (baseline NIHSS ≤ 4) is consistent with the Swiss registry (ASTRAL) (40.8%) [21]. For patients with ICH, the median NIHSS score was twice that of patients with ischemic stroke. In UTHSR, the percentage of eligible patients who received tPA treatment within 3 hours showed an upward trend, increasing from 21.3% in 2008 to 27.3% in 2011. The Joint Commission Disease-Specific Care (DSC) program suggested 10 items as performance measures for stroke registries [48, 49]. We did not find a significant linear trend in these performance measures during 2008–2011.

Finally, we recognize that implementation of the proposed enhanced QA procedures may be cost prohibitive for institutions that lack the required resources. We have been fortunate to have funding from SPOTRIAS, which helped us develop and implement these enhanced QA procedures in UTHSR. We believe that the utility of the proposed QA procedures depends on the objectives and uses of the data in a stroke registry. Therefore, stroke centers that have, or are planning to establish, a stroke registry should carefully evaluate their needs and implement a suitable level of QA procedures to ensure the reliability and validity of their data.


The decision to use 30 patient records for the reliability study (i.e., to assess IRR) was based on the following reports. First, Sim and Wright (2005) [62] showed that a sample size of 30 provides nearly 80% power for detecting an association between the two abstractors as long as Kappa ≥ 0.5. Second, Walter et al. (1998) showed that a sample size greater than 22 provides at least 80% power for detecting an association between the two abstractors with respect to continuous measures as long as the ICC ≥ 0.5 [63]. However, we acknowledge the limitation that the sample of 30 “gold standard” records may not be sufficient for conducting inferential statistics on these data. Therefore, for assessment of the validity of data in UTHSR, we combined the 30 “gold standard” records with another 85 patient records for which information on all variables was re-abstracted and adjudicated by our data core team, which included a vascular neurology clinician (faculty or fellow). This resulted in a total of 115 patient records for assessing the validity of data in UTHSR. Since only select variables were included in the “gold standard” for the 30 patient records, the numbers of observations available for different variables in the validity study vary. We therefore recommend careful interpretation of the various inferences that could be made from these measures of validity. Overall, we believe the numbers of patient records used for the reliability and validity studies provide important information as to whether additional training is needed for any of the abstractors and for which variables such training is needed.


In this study, we have described the University of Texas Houston Stroke Registry (UTHSR) and shown that establishment of enhanced data quality assurance procedures has helped improve the validity of our data. Our enhanced quality assurance procedures included training of abstractors, assessment of IRR between abstractors, assessment of the validity of abstracted data against the “gold standard”, and development and implementation of univariable and multivariable data cleaning rules. We observed excellent inter-rater reliability and validity for almost all key variables. Our data compare well with data from other registries and certification guidelines, and demonstrate tPA treatment rates that are among the highest reported.


  1. Towfighi A, Saver JL: Stroke declines from third to fourth leading cause of death in the United States: historical perspective and challenges ahead. Stroke; a journal of cerebral circulation. 2011, 42 (8): 2351-2355. 10.1161/STROKEAHA.111.621904.

  2. Hsieh FI, Lien LM, Chen ST, Bai CH, Sun MC, Tseng HP, Chen YW, Chen CH, Jeng JS, Tsai SY, et al: Get With the Guidelines-Stroke performance indicators: surveillance of stroke care in the Taiwan Stroke Registry: Get With the Guidelines-Stroke in Taiwan. Circulation. 2010, 122 (11): 1116-1123. 10.1161/CIRCULATIONAHA.110.936526.

  3. Caplan LR, Mohr JP: Harvard Stroke Registry: make-up and purpose. Neurology. 1979, 29 (5): 755-10.1212/WNL.29.5.755.

  4. Mohr JP, Caplan LR, Melski JW, Goldstein RJ, Duncan GW, Kistler JP, Pessin MS, Bleich HL: The Harvard Cooperative Stroke Registry: a prospective registry. Neurology. 1978, 28 (8): 754-762. 10.1212/WNL.28.8.754.

  5. Xian Y, Fonarow GC, Reeves MJ, Webb LE, Blevins J, Demyanenko VS, Zhao X, Olson DM, Hernandez AF, Peterson ED, et al: Data quality in the American Heart Association Get With The Guidelines-Stroke (GWTG-Stroke): results from a national data validation audit. Am Heart J. 2012, 163 (3): 392-398. 10.1016/j.ahj.2011.12.012.

  6. George MG, Tong X, McGruder H, Yoon P, Rosamond W, Winquist A, Hinchey J, Wall HK, Pandey DK: Paul Coverdell National Acute Stroke Registry Surveillance - four states, 2005–2007. MMWR Surveill Summ. 2009, 58 (5): 1-23.

  7. Reeves MJ, Arora S, Broderick JP, Frankel M, Heinrich JP, Hickenbottom S, Karp H, Labresh KA, Malarcher A, Mensah G, et al: Acute stroke care in the US: results from 4 pilot prototypes of the Paul Coverdell National Acute Stroke Registry. Stroke; a journal of cerebral circulation. 2005, 36 (6): 1232-1240.

  8. Centers for Disease Control and Prevention: PCNASR History. 2013, 2-13-2013. 3-1-2013

  9. Caplan LR, Wityk RJ, Glass TA, Tapia J, Pazdera L, Chang HM, Teal P, Dashe JF, Chaves CJ, Breen JC, et al: New England Medical Center Posterior Circulation registry. Ann Neurol. 2004, 56 (3): 389-398. 10.1002/ana.20204.

  10. Caplan L, Chung CS, Wityk R, Glass T, Tapia J, Pazdera L, Chang HM, Dashe J, Chaves C, Vemmos K, et al: New England medical center posterior circulation stroke registry: I. Methods, data base, distribution of brain lesions, stroke mechanisms, and outcomes. J Clin Neurol. 2005, 1 (1): 14-30. 10.3988/jcn.2005.1.1.14.

  11. Caplan LR: Stroke classification: a personal view. Stroke; a journal of cerebral circulation. 2011, 42 (1 Suppl): S3-S6.

  12. American Heart Association: American Stroke Association. 2011, Get With The Guidelines-Stroke, 10-7-2011. 4-30-2012

  13. Schwamm LH, Fonarow GC, Reeves MJ, Pan W, Frankel MR, Smith EE, Ellrodt G, Cannon CP, Liang L, Peterson E, et al: Get With the Guidelines-Stroke is associated with sustained improvement in care for patients hospitalized with acute stroke or transient ischemic attack. Circulation. 2009, 119 (1): 107-115. 10.1161/CIRCULATIONAHA.108.783688.

  14. Reeves MJ, Grau-Sepulveda MV, Fonarow GC, Olson DM, Smith EE, Schwamm LH: Are quality improvements in the get with the guidelines: stroke program related to better care or better data documentation?. Circ Cardiovasc Qual Outcomes. 2011, 4 (5): 503-511. 10.1161/CIRCOUTCOMES.111.961755.

  15. Fonarow GC, Reeves MJ, Smith EE, Saver JL, Zhao X, Olson DW, Hernandez AF, Peterson ED, Schwamm LH: Characteristics, performance measures, and in-hospital outcomes of the first one million stroke and transient ischemic attack admissions in get with the guidelines-stroke. Circ Cardiovasc Qual Outcomes. 2010, 3 (3): 291-302. 10.1161/CIRCOUTCOMES.109.921858.

  16. Asplund K, Hulter AK, Appelros P, Bjarne D, Eriksson M, Johansson A, Jonsson F, Norrving B, Stegmayr B, Terent A, et al: The Riks-Stroke story: building a sustainable national register for quality assessment of stroke care. Int J Stroke. 2011, 6 (2): 99-108. 10.1111/j.1747-4949.2010.00557.x.

  17. Fang J, Kapral MK, Richards J, Robertson A, Stamplecoski M, Silver FL: The Registry of the Canadian Stroke Network: an evolving methodology. Acta Neurol Taiwan. 2011, 20 (2): 77-84.

  18. Cadilhac DA, Lannin NA, Anderson CS, Levi CR, Faux S, Price C, Middleton S, Lim J, Thrift AG, Donnan GA: Protocol and pilot data for establishing the Australian Stroke Clinical Registry. Int J Stroke. 2010, 5 (3): 217-226. 10.1111/j.1747-4949.2010.00430.x.

  19. Hajat C, Heuschmann PU, Coshall C, Padayachee S, Chambers J, Rudd AG, Wolfe CD: Incidence of aetiological subtypes of stroke in a multi-ethnic population based study: the South London Stroke Register. J Neurol Neurosurg Psychiatry. 2011, 82 (5): 527-533. 10.1136/jnnp.2010.222919.

  20. Smeeton NC, Corbin DO, Hennis AJ, Hambleton IR, Fraser HS, Wolfe CD, Heuschmann PU: A comparison of acute and long-term management of stroke patients in Barbados and South London. Cerebrovasc Dis. 2009, 27 (4): 328-335. 10.1159/000202009.

  21. Michel P, Odier C, Rutgers M, Reichhart M, Maeder P, Meuli R, Wintermark M, Maghraoui A, Faouzi M, Croquelois A, et al: The Acute STroke Registry and Analysis of Lausanne (ASTRAL): design and baseline analysis of an ischemic stroke registry including acute multimodal imaging. Stroke; a journal of cerebral circulation. 2010, 41 (11): 2491-2498. 10.1161/STROKEAHA.110.596189.

  22. Hofer C, Kiechl S, Lang W: The Austrian Stroke-Unit-Registry. Wiener Medizinische Wochenschrift. 2008, 158 (15-16): 411-417. 10.1007/s10354-008-0563-6.

  23. Heuschmann PU, Kolominsky-Rabas PL, Kugler C, Leffmann C, Neundorfer B, Haass A, Lowitzsch K, Berger K: Quality assurance in treatment of stroke: basic module of the German Stroke Registry Study Group. Gesundheitswesen. 2000, 62 (10): 547-552. 10.1055/s-2000-13039.

  24. Heuschmann PU, Berger K, Misselwitz B, Hermanek P, Leffmann C, Adelmann M, Buecker-Nott HJ, Rother J, Neundoerfer B, Kolominsky-Rabas PL: Frequency of thrombolytic therapy in patients with acute ischemic stroke and the risk of in-hospital mortality: the German Stroke Registers Study Group. Stroke; a journal of cerebral circulation. 2003, 34 (5): 1106-1113. 10.1161/01.STR.0000065198.80347.C5.

  25. Thorvaldsen P, Davidsen M, Bronnum-Hansen H, Schroll M: Stable stroke occurrence despite incidence reduction in an aging population: stroke trends in the Danish monitoring trends and determinants in cardiovascular disease (MONICA) population. Stroke; a journal of cerebral circulation. 1999, 30 (12): 2529-2534. 10.1161/01.STR.30.12.2529.

  26. Thorvaldsen P, Kuulasmaa K, Rajakangas AM, Rastenyte D, Sarti C, Wilhelmsen L: Stroke trends in the WHO MONICA project. Stroke; a journal of cerebral circulation. 1997, 28 (3): 500-506. 10.1161/01.STR.28.3.500.

  27. Wang Y, Cui L, Ji X, Dong Q, Zeng J, Wang Y, Zhou Y, Zhao X, Wang C, Liu L, et al: The China National Stroke Registry for patients with acute cerebrovascular events: design, rationale, and baseline patient characteristics. Int J Stroke. 2011, 6 (4): 355-361. 10.1111/j.1747-4949.2011.00584.x.

  28. Takizawa S, Shibata T, Takagi S, Kobayashi S: Seasonal Variation of Stroke Incidence in Japan for 35631 Stroke Patients in the Japanese Standard Stroke Registry, 1998–2007. J Stroke Cerebrovasc Dis. 2011, 22 (1): 36-41.

  29. Reeves MJ, Mullard AJ, Wehner S: Inter-rater reliability of data elements from a prototype of the Paul Coverdell National Acute Stroke Registry. BMC Neurol. 2008, 8 (19):

  30. Yoon SS, George MG, Myers S, Lux LJ, Wilson D, Heinrich J, Zheng ZJ: Analysis of Data-Collection Methods for an Acute Stroke Care Registry. Am J Prev Med. 2006, 31 (6 Suppl 2): S196-S201.

  31. Reeves MJ, Wehner S, Organek N, Birbeck GL, Jacobs BS, Kothari R, Hickenbottom S, Mullard AJ: Accuracy of identifying acute stroke admissions in a Michigan Stroke Registry. Prev Chronic Dis. 2011, 8 (3): A62-

  32. National Institute of Neurological Disorders and Stroke (NINDS): Specialized Program Of Translational Research in Acute Stroke (SPOTRIAS) network. 2012, NINDS, 3-29-2012

  33. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group: Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995, 333 (24): 1581-1587.

  34. The City of Houston: Houston Facts and Figures. 2011, Houston, Texas: The City of Houston official site, 9-8-2011

  35. Memorial Hermann Hospital: “Primary Stroke Center” by the State of Texas. 2012, Memorial Hermann Hospital, 3-29-2012

  36. Zivin JA: Acute stroke therapy with tissue plasminogen activator (tPA) since it was approved by the U.S. Food and Drug Administration (FDA). Ann Neurol. 2009, 66 (1): 6-10. 10.1002/ana.21750.

  37. Gliklich RE, Dreyer NA: Registries for evaluating patient outcomes: a user's guide. 2010, US Dept. of Health and Human Services, Public Health Service, Agency for Healthcare Research and Quality

  38. The Joint Commission: Joint Commission on Accreditation of Healthcare Organizations (JCAHO) standards. 2012, 4-21-2012

  39. Chang A, Schyve PM, Croteau RJ, O’Leary DS, Loeb JM: The JCAHO patient safety event taxonomy: a standardized terminology and classification schema for near misses and adverse events. Int J Qual Health Care. 2005, 17 (2): 95-105. 10.1093/intqhc/mzi021.

  40. Centers for Medicare and Medicaid Services: Centers for Medicare and Medicaid Services (CMS). 2012, 7-18-2012

  41. Gassman JJ, Owen WW, Kuntz TE, Martin JP, Amoroso WP: Data quality assurance, monitoring, and reporting. Control Clin Trials. 1995, 16 (2 Suppl): 104S-136S.

  42. Reisch LM, Fosse JS, Beverly K, Yu O, Barlow WE, Harris EL, Rolnick S, Barton MB, Geiger AM, Herrinton LJ, et al: Training, Quality Assurance, and Assessment of Medical Record Abstraction in a Multisite Study. Am J Epidemiol. 2003, 157 (6): 546-551. 10.1093/aje/kwg016.

  43. Malin JL, Kahn KL, Adams J, Kwan L, Laouri M, Ganz PA: Validity of cancer registry data for measuring the quality of breast cancer care. J Natl Cancer Inst. 2002, 94 (11): 835-844. 10.1093/jnci/94.11.835.

  44. Van den Broeck J, Argeseanu Cunningham S, Eeckels R, Herbst K: Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities. PLoS Med. 2005, 2 (10): e267-10.1371/journal.pmed.0020267.

  45. Rahbar MH, Wyatt G, Sikorskii A, Victorson D, Ardjomand-Hessabi M: Coordination and management of multisite complementary and alternative medicine (CAM) therapies: experience from a multisite reflexology intervention trial. Contemp Clin Trials. 2011, 32 (5): 620-629. 10.1016/j.cct.2011.05.015.

  46. Maletic JI, Marcus A: Data Cleansing: A Prelude to Knowledge Discovery. Data Mining and Knowledge Discovery Handbook. Edited by: Oded M, Lior R. 2010, Springer, 19-32. 2

  47. Barnett V, Lewis T: Outliers in Statistical Data. 1994, John Wiley and Sons

  48. Centers for Medicare & Medicaid Services (CMS) and The Joint Commission: Specifications Manual for National Hospital Inpatient Quality Measures-Version 3.2.C. 2011, 3-12-0013

  49. Joint Commission Disease-Specific Care (DSC): Disease-Specific Care Certification Program STROKE Performance Measurement Implementation Guide. 2008, 5-6-2012, 2

  50. Hripcsak G, Heitjan DF: Measuring agreement in medical informatics reliability studies. J Biomed Inform. 2002, 35 (2): 99-110. 10.1016/S1532-0464(02)00500-2.

  51. Byrt T, Bishop J, Carlin JB: Bias, prevalence and kappa. J Clin Epidemiol. 1993, 46 (5): 423-429. 10.1016/0895-4356(93)90018-V.

  52. SPSS Inc: IBM SPSS Statistics 20. 2011, Chicago, IL, USA: SPSS Inc., an IBM Company

  53. SAS Institute Inc.: SAS® 9.3. 2011, Cary, NC, USA: SAS Institute Inc.

  54. Reeves MJ, Fonarow GC, Smith EE, Pan W, Olson D, Hernandez AF, Peterson ED, Schwamm LH: Representativeness of the Get With The Guidelines-Stroke Registry: comparison of patient and hospital characteristics among Medicare beneficiaries hospitalized with ischemic stroke. Stroke; a journal of cerebral circulation. 2012, 43 (1): 44-49. 10.1161/STROKEAHA.111.626978.

  55. Reeves MJ, Parker C, Fonarow GC, Smith EE, Schwamm LH: Development of stroke performance measures: definitions, methods, and current measures. Stroke; a journal of cerebral circulation. 2010, 41 (7): 1573-1578. 10.1161/STROKEAHA.109.577171.

  56. Hlaing T, Hollister L, Aaland M: Trauma registry data validation: Essential for quality trauma care. J Trauma Acute Care Surg. 2006, 61 (6): 1400-1407. 10.1097/01.ta.0000195732.64475.87.

  57. Centers for Disease Control and Prevention: National Survey of Ambulatory Surgery Data Collection and Processing. 2009, 4-8-2010

  58. Prokscha S: Entering data. Practical Guide to Clinical Data Management. 2007, Taylor & Francis Group, 43-52. 2

  59. Bagniewska A, Black D, Molvig K, Fox C, Ireland C, Smith J, Hulley S: Data quality in a distributed data processing system: the SHEP pilot study. Control Clin Trials. 1986, 7 (1): 27-37. 10.1016/0197-2456(86)90005-X.

  60. Reynolds-Haertle RA, McBride R: Single vs. double data entry in CAST. Control Clin Trials. 1992, 13 (6): 487-494. 10.1016/0197-2456(92)90205-E.

  61. Reiner-Deitemyer V, Teuschl Y, Matz K, Reiter M, Eckhardt R, Seyfang L, Tatschl C, Brainin M: Helicopter transport of stroke patients and its influence on thrombolysis rates: data from the Austrian Stroke Unit Registry. Stroke; a journal of cerebral circulation. 2011, 42 (5): 1295-1300. 10.1161/STROKEAHA.110.604710.

  62. Sim J, Wright CC: The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005, 85 (3): 257-268.

  63. Walter SD, Eliasziw M, Donner A: Sample size and optimal designs for reliability studies. Stat Med. 1998, 17 (1): 101-110. 10.1002/(SICI)1097-0258(19980115)17:1<101::AID-SIM727>3.0.CO;2-E.


This research is co-funded by the National Institute of Neurological Disorders and Stroke (NINDS) through a SPOTRIAS program grant [5P50NS044227] awarded to the University of Texas Health Science Center at Houston (UTHealth). We also acknowledge the support provided by the Biostatistics/ Epidemiology/ Research Design (BERD) component of the Center for Clinical and Translational Sciences (CCTS) for this project. CCTS is mainly funded by the National Center for Research Resources (NCRR) through an NIH Centers for Translational Science Award (CTSA) grant (UL1 RR024148), with its continuation through National Center for Advancing Translational Sciences (NCATS) grant (UL1TR000321), awarded to the University of Texas Health Science Center at Houston in 2006. The content is solely the responsibility of the authors and does not represent the official views of the NINDS, NCATS and NCRR or NIH. Finally, we acknowledge contributions by other colleagues who have contributed to the earlier phase of the development of UTHSR including Dr. Karen Albright and Ms. Miriam Morales as well as other members of Memorial Hermann Hospital -Texas Medical Center.

Author information



Corresponding author

Correspondence to Mohammad H Rahbar.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MHR, NRG, MRS, and JCG have made substantial contributions to conception and study design; NRG, JCG, SIS, MRS, FSV, JDT, AT, AAD, RMM, and EEC contributed to acquisition of data; MHR, NRG, MRS, RP, AT, EEC, FSV, JDT, AAD, RMM have made contributions to data quality assurance procedures; MHR, HP, MAH, RP, and MRS conducted data analysis; MHR, NRG, JCG, and MAH have contributed to interpretation of data; MHR, MAH, and AT significantly contributed to drafting of the manuscript and NRG, JCG, SIS, MRS provided critical revision of the manuscript; All authors have read and approved the final version submitted for publication.


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Rahbar, M.H., Gonzales, N.R., Ardjomand-Hessabi, M. et al. The University of Texas Houston Stroke Registry (UTHSR): implementation of enhanced data quality assurance procedures improves data quality. BMC Neurol 13, 61 (2013).


  • Stroke
  • Registry
  • Quality assurance procedures
  • Inter-rater reliability
  • Validity
  • Quality control
  • Error rate
  • Gold standard