The University of Texas Houston Stroke Registry (UTHSR): implementation of enhanced data quality assurance procedures improves data quality

Background Limited information has been published regarding standard quality assurance (QA) procedures for stroke registries. We share our experience regarding the establishment of enhanced QA procedures for the University of Texas Houston Stroke Registry (UTHSR) and evaluate whether these QA procedures have improved data quality in UTHSR. Methods All 5093 patient records that were abstracted and entered in UTHSR, between January 1, 2008 and December 31, 2011, were considered in this study. We conducted reliability and validity studies. For reliability and validity of data captured by abstractors, a random subset of 30 records was used for re-abstraction of select key variables by two abstractors. These 30 records were re-abstracted by a team of experts that included a vascular neurologist clinician as the “gold standard”. We assessed inter-rater reliability (IRR) between the two abstractors as well as validity of each abstractor with the “gold standard”. Depending on the scale of variables, IRR was assessed with Kappa or intra-class correlations (ICC) using a 2-way, random effects ANOVA. For assessment of validity of data in UTHSR we re-abstracted another set of 85 patient records for which all discrepant entries were adjudicated by a vascular neurology fellow clinician and added to the set of our “gold standard”. We assessed level of agreement between the registry data and the “gold standard” as well as sensitivity and specificity. We used logistic regression to compare error rates for different years to assess whether a significant improvement in data quality has been achieved during 2008–2011. Results The error rate dropped significantly, from 4.8% in 2008 to 2.2% in 2011 (P < 0.001). The two abstractors had an excellent IRR (Kappa or ICC ≥ 0.75) on almost all key variables checked. Agreement between data in UTHSR and the “gold standard” was excellent for almost all categorical and continuous variables. 
Conclusions Establishment of a rigorous data quality assurance for our UTHSR has helped to improve the validity of data. We observed an excellent IRR between the two abstractors. We recommend training of chart abstractors and systematic assessment of IRR between abstractors and validity of the abstracted data in stroke registries.


Background
Medical registries have been used for many years as sources of clinical data that can support evidence-based medicine and decision-making. Registries are classified according to the disease or disorder, and are defined by patients having the same diagnosis. Stroke is the leading cause of serious, long-term disability and the fourth leading cause of death in the United States [1]. Stroke is the second leading cause of death globally, and all nations, regardless of their health care system, face similar medical and economic burdens [2].
The Harvard Registry is the oldest stroke registry in the US [3,4]. During the last decade, there has been an increased interest in developing other stroke registries to monitor and collect data for improving the quality of care for stroke patients through the assessment of adherence to established performance measures for acute stroke care [5], to study the epidemiology and etiology of specific types of strokes, and to decrease the proportion of premature deaths and disabilities caused by acute stroke. A brief summary of several stroke registries is provided in Table 1.
Despite availability of many stroke registries, limited information has been published regarding standard procedures to ensure reliability and validity of data in stroke registries [5,16,18,29-31]. For example, Reeves et al. (2008) reported data regarding reliability of abstracted data collected during 2001-2004 from Michigan PCNASR [29]. Recently, Xian et al. (2012) reported data regarding validity of data in the GWTG registry, indicating a high level of accuracy for select variables including age, diagnosis, arrival date/time, and tPA therapy when compared with audited data from medical records [5]. However, information regarding development and implementation of standard procedures to ensure data quality for stroke registries is limited, particularly at various stages of data management including chart abstraction and quality control of data.
Since 2001, the Specialized Program Of Translational Research in Acute Stroke (SPOTRIAS), funded by the National Institute of Neurological Disorders and Stroke (NINDS), supported the development of prototype registries, which were led by academic principal investigators and medical institutions, to collect data on the quality of care provided to stroke patients from the initial emergency response to hospital discharge. Currently, there are eight funded SPOTRIAS sites that collaborate on this effort. The Stroke Program at the University of Texas Health Science Center at Houston (UTHealth) is part of the SPOTRIAS network [32].
The UTHealth stroke program has played a significant role in the treatment and prevention of stroke and is committed to high quality research, clinical practice, education, and optimal implementation of thrombolysis therapy following acute stroke in Houston, with thrombolytic treatment rates exceeding 30%. As a major stroke center, the Memorial Hermann Hospital-Texas Medical Center, Houston (MHH-TMC) has served as a leader in stroke research for some of the most important acute stroke studies in the world, including the NINDS tissue plasminogen activator (tPA) trial, which led to the approval of the clot-dissolving drug tPA in the treatment for acute stroke [33]. These achievements contributed to our success in being selected as a SPOTRIAS site.
The UTHealth SPOTRIAS data core is responsible for data abstraction, data entry, quality control, statistical analysis, and management of data for the UTHealth Stroke Registry (UTHSR) and other clinical trials. The data core has invested a significant amount of effort to improve the quality of the data in UTHSR. During the past 10 years, we have gained significant insight regarding the design, development, maintenance, quality control, and utilization of our stroke registry. The purpose of this article is to describe the development and assessment of enhanced quality assurance (QA) procedures in UTHSR and compare data quality in UTHSR before and after implementation of our enhanced QA procedures.

History of Houston stroke program and development of UTHSR
Houston is the fourth most populated city in the US and is home to the largest medical center in the world [34]. MHH-TMC was the first hospital in the Texas Medical Center. The Neurology department at MHH-TMC was one of the first in the US to use tPA for acute stroke and it was also the first hospital named as a "primary stroke center" by the state of Texas [35]. Historically, the stroke program at UTHealth formed in 1979 with the recruitment of Dr. Grotta. In 1986, the stroke team began to keep a written "log" of all patients admitted to the stroke service. Once the UTHealth stroke team started testing tPA in 1989, they began to keep slightly more detailed records. Once tPA was approved in 1996 [36], the data collected pertaining to tPA use became even more detailed in the stroke database leading to some of the UTHealth stroke team's earlier publications. In 2002, the SPOTRIAS P50 mechanism funded a data core, which prompted the UTHealth stroke team to convert to an electronic database. In 2003, the UTHealth stroke team began to design and record data in UTHSR for all patients admitted to the UTHealth stroke service at MHH-TMC. In 2008, with the second round of SPOTRIAS funding, the data core made a specific commitment to develop and implement enhanced strategies for improving data quality in UTHSR. The data core represents a collaborative team of investigators supported by data managers, statisticians, programmers (system and web developers), two chart abstractors, and a quality control abstractor. An organizational chart that demonstrates the role and working relationships among various members of the Data Core is provided in Figure 1.

Design of UTHSR, data elements, and data sources
Attributes considered in designing a registry must ensure that data are valid, reliable, responsive, interpretable, and translatable [37]. UTHSR is a prospective registry initially designed to capture essential information on all patients admitted to the UTHealth in-patient stroke service at MHH-TMC, with the primary aims of tracking the number of patients treated with intravenous (IV) tPA, their essential demographics, and complication rates, and to support research by members of the stroke team. With the funding of SPOTRIAS, the Principal Investigators (PIs) of the original SPOTRIAS sites decided to obtain common data elements that described essential demographics of all patients treated with IV tPA or enrolled in any clinical trials. UTHSR was consequently expanded to incorporate other elements including those variables that were needed for clinical trials that were conducted by the UT stroke team and variables that were needed for reporting to The Joint Commission (TJC) [38,39], as well as select variables to meet minimum requirements for reporting to Centers for Medicare and Medicaid Services (CMS) as they pertain to the vascular neurology aspects of required reporting [40]. All patients who have been admitted to the stroke unit at MHH-TMC are classified by stroke diagnosis subtypes, including infarct (non-hemorrhagic stroke), intracerebral hemorrhage (ICH), intraventricular hemorrhage (IVH), transient ischemic attack (TIA), subarachnoid hemorrhage (SAH), epidural hematomas (EDH), subdural hematomas (SDH), non-acute infarct, and others that could not be classified as any of the above ("Not stroke"), and are entered in UTHSR. Other data elements include admission information (e.g., arrival date and time), medical history, National Institutes of Health Stroke Scale (NIHSS), modified Rankin Scale (mRS) score, Glasgow Coma Scale (GCS), laboratory results, CT scan, CT scan angiogram, MRI, MR angiogram images, thrombolysis therapy (e.g., tPA time and door to needle time), intra-arterial therapy (IAT), complications, and discharge information including: death, mRS on discharge (or day 7, whichever comes first), discharge disposition (home, skilled nursing facility, etc.), and particularly patient education and mRS at 90 days. Currently, the UTHealth stroke team captures up to 235 variables for each patient depending on stroke subtypes. Since some of these variables have multiple responses (e.g., medical history), the number of fields in UTHSR is 372. As UTHSR is modified, corresponding changes to the codebook are made; the codebook is also updated periodically as changes to the abstraction rules are identified or where clarity can be improved. The data core has developed policies for documentation. Members of the data core are responsible for adhering to all policies and procedures established.

Table 1 Brief summary of several stroke registries
… [6-8]: … with an overall goal of tracking and improving the quality of hospital-based acute stroke care currently available to reduce mortality attributable to stroke, prevent stroke-related disabilities, and prevent recurrent strokes [6].
New England Medical Center Posterior Circulation Registry (NEMC-PCR): From 1988-1996 the NEMC-PCR thoroughly evaluated all posterior circulation ischemia patients using brain imaging, vascular studies, and appropriate cardiac and hematological investigations to study the epidemiology and etiology of specific types of strokes [9-11].
Get With The Guidelines (GWTG): Since 2003, the American Heart Association/American Stroke Association has developed a national stroke registry and quality improvement program, known as Get With The Guidelines (GWTG) [5,12-15].
Swedish Stroke Register (Riks-Stroke): Riks-Stroke was established in 1994; patients are followed during the first year after stroke [16].
Registry of the Canadian Stroke Network (RCSN): RCSN was established in 2001 to allow for the assessment and monitoring of stroke care delivery and outcomes [17].
Australian Stroke Clinical Registry (AuSCR): AuSCR was established in 2009 to provide national data on the process of care and outcomes for patients who are admitted to hospitals with acute stroke or transient ischemic attack [18].
South London Stroke Register (SLSR): SLSR is a population based stroke registry that includes stroke patients of all age groups between 1995-1999 [19,20].

Data sources and data entry into UTHSR
The most important source for data abstraction is the MHH-TMC electronic medical record (EMR), which includes all related personal and medical information. All registry data are manually abstracted from electronic chart review and from rounding with the stroke team. Our abstractors review the entire chart and capture the required information for each patient from admission to discharge from the stroke service. Ambiguous and questionable data, particularly from complicated cases, are discussed in weekly meetings whose regular attendees include vascular neurology UTHealth faculty members involved in the stroke program. In addition, the data core holds weekly meetings to give the abstractors an opportunity to discuss issues related to UTHSR and the abstraction of data. At first, abstraction was carried out by stroke research nurses and fellows, but after SPOTRIAS funding, we hired dedicated full-time abstractors who are key members of the data core. Requirements are familiarity with medical terminology and willingness to stay for at least 1 year; our abstractors are usually health care professionals or other medical personnel. Abstractor training, described later, is a critical part of the quality assurance component of UTHSR. The UTHSR codebook serves as the protocol for data abstraction. Since registries must confidentially maintain patients' health information, security is an important issue [37]. We maintain confidentiality during all phases of data abstraction, monitoring, analysis, dissemination, and publication. The UTHealth servers reside in a secure location with limited access. All data are automatically backed up daily with redundant storage in a protected off-site location in accordance with UTHealth policies.

Data quality assurance procedures
QA steps include establishment and implementation of procedures that ensure the quality of data from the point of abstraction to analysis [41]. Strategies and procedures that help to improve data quality in stroke registries include training of abstractors [42], assessment of reliability (e.g., inter-rater reliability (IRR) between abstractors) [5,29,30], and validity of data [43]. After data cleaning [44] and resolution of potential discrepancies, an assessment of validity of data (e.g., accuracy level or error rate) based on a sample of re-abstracted or audited records in the registry is done. For UTHSR, our QA procedures mainly focus on the training of data abstractors, assessment of IRR between abstractors, development and implementation of formal data cleaning procedures, and evaluation of validity of data by calculating error rate or accuracy rate and other measures of validity (e.g., sensitivity, specificity), as will be described here.

Data abstractors' training
For UTHSR, each abstractor is trained by using a codebook (or data dictionary) because a good understanding of the variables in the registry and where to locate their appropriate values by abstractors is essential to ensure data quality. The training of data abstractors also includes re-abstraction of data for a set of patients whose records are already available in UTHSR. Once an abstractor has demonstrated a reasonable level of confidence in abstraction, we provide a new set of patients for abstraction. During this phase, the abstractors attend rounds on the stroke clinical service and enter data in the registry under the supervision of a more experienced abstractor. Abstractors are also trained in contacting the discharged patients to obtain the 90-day mRS data.

Reliability study
Reeves et al. (2008) have demonstrated assessment of inter-rater reliability to establish the reliability of abstraction between two abstractors (hospital abstractors and the audit abstractor) [29]. In this study, for evaluating IRR between two abstractors, we randomly selected 30 patient records between 2008-2011 and asked two abstractors to re-abstract select variables including initial presentation, arrival time, age, INR, stroke type, onset time, tPA therapy, tPA time, symptomatic hemorrhage, mRS on discharge, and disposition at discharge. For continuous variables, IRR is assessed through the intra-class correlation coefficient (ICC) [29]. For categorical variables, IRR is assessed using the Kappa statistic [29,42]. Since the IRR measures alone are not sufficient to assess the consistency of the abstraction, for binary variables, we assessed the Bias Index (BI) as defined by Reeves et al. (2008) [29]. For continuous variables, we calculated mean differences to assess the BI. We also provide 95% confidence intervals for the IRR measures.
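As an illustration of the two categorical measures named above, the following is a minimal Python sketch, not the registry's actual code. It assumes Cohen's Kappa for two raters and takes the bias index in its common form BI = (b − c)/n, where b and c are the two discordant cell counts; the exact formula in [29] is not reproduced here, so that form is an assumption. The function names and the tPA codes are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's Kappa for two raters coding the same records."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater1)
    # Observed proportion of records on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

def bias_index(rater1, rater2, positive=1):
    """Assumed form BI = (b - c) / n for a binary variable.

    b = records coded positive by rater1 only,
    c = records coded positive by rater2 only.
    Ranges from -1 to +1; zero means no systematic bias.
    """
    n = len(rater1)
    b = sum(x == positive and y != positive for x, y in zip(rater1, rater2))
    c = sum(x != positive and y == positive for x, y in zip(rater1, rater2))
    return (b - c) / n

# Hypothetical tPA therapy codes (1 = yes, 0 = no) for ten records.
abstractor1 = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
abstractor2 = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
kappa = cohen_kappa(abstractor1, abstractor2)
bi = bias_index(abstractor1, abstractor2)
```

A positive BI in this sketch means abstractor 1 records "yes" more often than abstractor 2, which is the kind of systematic shift that Kappa alone would not reveal.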

Data cleaning
Data cleaning refers to a set of processes that involve identification and resolution of all discrepant data, including missing values, incorrect or out-of-range values, or implausible responses that are logically inconsistent with other responses in the database [37,44]. Establishing standard data cleaning processes helps to detect and correct errors, resulting in higher data quality [45]. Since 2003, UTHSR has undergone a series of changes including the development and implementation of additional data quality checks in our data collection system program for prevention of data entry errors. In addition, we have developed and implemented data cleaning procedures, including about 350 univariable and multivariable rules that detect potential data inconsistencies (invalid missing, impossible, and implausible). Invalid missing fields are defined as those where the information should have been collected but was not collected or not entered in the registry. Impossible data are defined as data entries that are invalid, do not comply with the codebook, or are out of range. Implausible data are defined as those that are logically inconsistent with data in other fields or seem to be unusual based on statistical rules (e.g., Chebyshev's rule). Chebyshev's rule states that for random variables with finite variance, no more than 4% of the data can lie more than five standard deviations from the mean of the distribution, in either direction. This rule helps to identify potential outliers regardless of the shape of the distribution [46,47]. For some variables, we check the measurements above the 90th and below the 10th percentiles. Based on these rules, the data management analyst prepares a list of invalid missing, impossible, and implausible data for the abstractors so they can double check their entries and resolve potential errors. Once these issues are addressed, the data manager will rerun the same program to confirm that all issues are resolved.
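The three rule categories can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not one of the roughly 350 UTHSR rules: the function names, the use of the population standard deviation, and the example values are all hypothetical. The NIHSS range of 0 to 42 is the standard range of that scale.

```python
from statistics import mean, pstdev

def check_field(value, low, high, required=True):
    """Classify one numeric entry against a codebook range.

    Returns "invalid missing", "impossible", or None when no rule fires.
    """
    if value is None:
        return "invalid missing" if required else None
    if not (low <= value <= high):
        return "impossible"
    return None

def chebyshev_outliers(values, k=5.0):
    """Indices of values more than k standard deviations from the mean.

    By Chebyshev's inequality at most 1/k**2 of any distribution with
    finite variance lies this far out (4% for k = 5), so flagged values
    are candidates for the "implausible" list.
    """
    sd = pstdev(values)
    if sd == 0:
        return []  # no spread, nothing can be flagged
    mu = mean(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sd]

# Hypothetical NIHSS entries; the scale runs from 0 to 42.
nihss = [4, None, 57, 12]
issues = [check_field(v, 0, 42) for v in nihss]

# Hypothetical glucose values (mg/dL) with one extreme entry at the end.
glucose = [100] * 30 + [1000]
flagged = chebyshev_outliers(glucose)
```

Flagged entries are only candidates: as described above, they go back to the abstractors for verification against the chart instead of being corrected automatically.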

Validity study
Validity of captured data by abstractors in a registry against a set of audited medical records or a "gold standard" provides an assessment of quality of data abstracted [43]. Xian et al. (2012) reported an overall composite accuracy rate of 96.1% for all data elements in the GWTG-Stroke registry [5]. Riks-Stroke registry also reported at least 95% consistency between the medical chart vs. what is recorded in Riks-Stroke registry for stroke subtype and clinical data, but much lower (approximately 85%) for data related to the health-care organization at the participating hospitals [16].
We have established a rigorous process for assessing validity of data in UTHSR. For this purpose, we randomly selected 115 patient records from UTHSR that consisted of the 30 patient records (with selected variables that were used for the reliability study) and another set of 85 patient records for which our quality control (QC) abstractor re-abstracted data for all variables by reviewing medical record charts and hospital records. These data were adjudicated by a team of experts in the data core that included a vascular neurologist (faculty or fellow), herein called the "gold standard". These 115 patient records resulted in a total of 8877 data entries or data points in UTHSR during 2008-2011. We calculated the proportion of discrepant data elements relative to the number of entries re-abstracted (i.e., 8877). This error rate could reflect a combination of data entry errors in the registry and errors in abstraction. For binary variables, we also assessed sensitivity and specificity as described in [31,43]. Since sensitivity and specificity are not applicable for continuous variables, we calculated differences between the two sets of data for these variables. Agreement for these variables is defined as no difference between the two sets of values. We also calculated the mean difference for each variable between the two sets of data. In addition, since we had the "gold standard" for the 30 patient records that were included in the reliability study, we assessed validity of data abstracted by each of the two abstractors involved in the reliability study.
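The validity measures for a binary variable can be illustrated with a short Python sketch. It assumes the usual definitions (sensitivity = true positives over all gold-standard positives, specificity = true negatives over all gold-standard negatives); the function name and the example codes are hypothetical and are not UTHSR data.

```python
def validity_measures(registry, gold, positive=1):
    """Compare registry entries with gold-standard entries for one binary variable."""
    if len(registry) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(registry, gold))
    tp = sum(r == positive and g == positive for r, g in pairs)
    fn = sum(r != positive and g == positive for r, g in pairs)
    fp = sum(r == positive and g != positive for r, g in pairs)
    tn = sum(r != positive and g != positive for r, g in pairs)
    n = len(pairs)
    return {
        "agreement": (tp + tn) / n,    # proportion of matching entries
        "error_rate": (fp + fn) / n,   # proportion of discrepant entries
        # Undefined (None) when the gold standard has no positives/negatives.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }

# Hypothetical tPA therapy codes (1 = yes, 0 = no) for ten records.
registry_tpa = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
gold_tpa = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
result = validity_measures(registry_tpa, gold_tpa)
```

The overall error rate described above is the same discrepancy proportion, pooled over all re-abstracted entries (8877 in this study) instead of a single variable.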

Statistical analysis
We conducted descriptive analyses to provide summary statistics for certain key variables in UTHSR. For continuous variables with normal distributions, we report means with standard deviations (SD), but when there was a significant departure from normality (e.g., skewed distributions) we report medians with interquartile ranges (IQR). We calculated performance measures of care for stroke patients (STK-1 to STK-10) [48,49] in UTHSR based on eligibility of patients related to diagnoses of ischemic stroke and hemorrhagic stroke. We used the Kappa statistic to estimate IRR for nominal variables along with 95% lower confidence limits (LCL) for IRR [50]. For continuous and ordinal variables, we used ICC with a 2-way, random effects ANOVA model [29]. In addition, we calculated the bias index (BI) [29] between data re-abstracted by the two abstractors. For binary variables, the bias index ranges between −1 and +1 with zero indicating no bias [29]. For continuous variables we used the mean difference between the two sets of values compared as the bias index. When the two abstractors are compared, a positive or negative BI indicates bias between the abstractors [51]. However, when each abstractor is compared with the "gold standard", a positive or a negative BI shows that the distribution of values produced by the abstractor is shifted to the right or left of the "gold standard", respectively.
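For readers who want to see how an ICC follows from a two-way ANOVA decomposition, here is a minimal Python sketch. It assumes the single-rater, absolute-agreement form often written ICC(2,1); the text above does not state which two-way random effects form was used, so that choice, the function name, and the example ages are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a two-way random effects ANOVA without replication.

    ratings: list of rows, one row per patient record and one column
    per abstractor. Needs at least two rows and two columns.
    """
    n = len(ratings)       # number of records
    k = len(ratings[0])    # number of abstractors
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for records, abstractors, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical ages abstracted by two abstractors for five records.
ages = [[62, 62], [71, 70], [45, 45], [83, 84], [58, 58]]
icc_age = icc_2_1(ages)
```

Because the estimate is a ratio of mean squares, it can fall below zero when the abstractors disagree more within records than the records differ from one another, which is how a value such as −0.48 can arise.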
For the validity study, as mentioned earlier we assessed level of agreement, sensitivity, and specificity along with 95% confidence intervals (CIs). Finally, we used logistic regression to compare error rates between 2008 and subsequent years. We also calculated 95% CIs for error rates in different years. All comparisons were made at 5% level of significance. All analyses were conducted using SPSS, Version 20 [52] or SAS software, Version 9.3 [53].
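The year-to-year comparison can be sketched as follows. With a single binary predictor (a later year versus 2008), the logistic regression slope equals the log odds ratio of the 2 × 2 table and its Wald standard error is the square root of the summed reciprocal cell counts, so no model-fitting library is needed for this special case. This is an illustration only: the function name is hypothetical, and the per-year denominators below are invented for the example because per-year counts are not given in the text.

```python
import math

def compare_error_rates(err_ref, n_ref, err_new, n_new):
    """Wald test for the odds ratio of error in a new year versus a reference year.

    Equivalent to a logistic regression of error (yes/no) on one year indicator.
    """
    a, b = err_new, n_new - err_new   # errors / correct entries, new year
    c, d = err_ref, n_ref - err_ref   # errors / correct entries, reference year
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci95 = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return {"odds_ratio": math.exp(log_or), "z": z, "p_value": p_value, "ci95": ci95}

# HYPOTHETICAL counts: 2000 entries per year is an invented denominator,
# chosen only so that the rates match the reported 4.8% and 2.2%.
res = compare_error_rates(err_ref=96, n_ref=2000, err_new=44, n_new=2000)
```

An odds ratio below 1 with a confidence interval that excludes 1 corresponds to a significant drop in the error rate relative to the reference year.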

Results
Descriptive analysis of our total registry data indicated that the mean age of patients was 62.7 years (SD = 15.8); 50.2% were male; 31.7% were African-American; 48.3% were Caucasian; and 14.1% were Hispanic. The distribution of stroke subtypes included 51.4% infarcts and 24.6% hemorrhagic strokes (ICH, SAH, EDH, SDH, and IVH).
About 61% of the patients arrived at the hospital by ambulance and nearly 22% were transported by air. Nearly 39% were transferred patients from other hospitals. Approximately 37% of patients who had infarct reported an onset time of less than 2 hours prior to presentation. For patients with infarct, the mean time from arrival to thrombolysis treatment time was 68 minutes (including patients treated with off-label use of thrombolytic). The overall in-hospital mortality among the stroke patients was 8.2% (ischemic stroke 5.8%, ICH 21.4%). The median discharge mRS was 4.0 for all discharged patients. Additional information regarding demographics and other characteristics of the patients at hospital arrival and discharge are reported in Table 2.
Overall, during 2008-2011, 32.2% of patients presenting to our emergency department (ED) with acute cerebral infarct received tPA within 4.5 hours of symptom onset and 24.1% received tPA within 3 hours of symptom onset. The data indicate a significant upward trend in the proportion of patients who received tPA between 2008-2011 (P-value based on Chi-square test for linear trend < 0.02). Other data related to the tPA times are presented in Table 3.
The discharge rate on antithrombotic therapy (STK-2) was 94.3% and 87.1% of eligible patients received thrombolytic therapy (STK-4) that met the standards set for performance measures. Similarly, for stroke education (STK-8) 88.8% of the patients met the standards set for performance measure. The Chi-square test for linear trend did not show any significant upward or downward trends in the performance measures over the 4 years. All data related to the performance measures are reported in Table 4.
We observed an excellent IRR (Kappa ≥ 0.75) between the two abstractors for most categorical variables including diagnosis of stroke as infarct and ICH, tPA therapy, disposition, and initial presentation. IRR was moderate (0.40 ≤ Kappa < 0.75) for diagnosis of stroke as TIA. For most continuous variables, we also observed an excellent IRR (ICC ≥ 0.75), including age, onset time, arrival time, INR and mRS on discharge; the exception was tPA time, for which IRR was poor (ICC = −0.48). Similarly, when we assessed the validity of categorical variables abstracted by each of the two abstractors against the "gold standard", agreement was above 93.3% for all selected variables. For continuous variables, we also reported mean differences. All data regarding the validity of the abstraction by the two abstractors against the "gold standard" are summarized in Table 5.
For all categorical variables in UTHSR we observed an accuracy rate (% agreement with the "gold standard") of at least 96.5%. While sensitivity and specificity were 100% for most binary variables, wake-up stroke had the lowest sensitivity (91.7%) and IA therapy had the lowest specificity (92.9%). For continuous variables, we also observed a high level of agreement (>94.2%) for most variables, except for some date/time variables including CT time and arrival time that had accuracy rates of 80.8% and 85.2%, respectively. We also found very high correlations (r > 0.96) between UTHSR records and the "gold standard" for almost all continuous variables, except for glucose, which had r = 0.86. Mean differences for these variables between UTHSR records and the "gold standard" are reported in Table 6.
Finally, our analysis indicated error rates of 4.8% in 2008, 2.3% in 2009, 4.6% in 2010, and 2.2% in 2011.

Discussion
In this article, we describe the development and assessment of enhanced quality assurance procedures in UTHSR and compare data quality in UTHSR before and after implementation of our enhanced QA procedures. Since we implemented our enhanced QA procedures for UTHSR at the end of 2008, when we received the second round of funding for SPOTRIAS, we believe 2008 serves as an important reference point and any potential improvements in our registry data would be reflected from 2009 onward. Our finding of a significant reduction in UTHSR error rate from 4.8% in 2008 to 2.2% in 2011 (P < 0.0001) indicates that our effort in enhancing and formalizing the QA procedures has been successful in reducing the error rate. Though we observed an increase in error rate from 2.3% in 2009 to 4.6% in 2010 (P < 0.0001), we believe this is partly attributable to recruitment of two new chart abstractors in early 2010.

Training of abstractors and assessment of reliability and validity of abstracted data
Reeves et al. (2008) have highlighted the importance of training in maintaining data quality for PCNASR registries [29]. Our quality assurance process rests on an understanding that each new abstractor should be provided sufficient training on codebook definitions and specialty issues of stroke (e.g., CT time). Historically, our abstractors were healthcare professionals including nurse practitioners; only recently have we hired non-clinicians. This makes a difference in the level of training that needs to be done. To ensure reliability and validity of data, we believe training of abstractors should be mandatory [29]. The complexity and differences in the interpretation of the key variables in medical records for stroke patients demand continuous training and evaluation of the work conducted by the abstractors to ensure data quality in a stroke registry. For evaluation of data abstracted, we have conducted a reliability study to assess IRR between the two abstractors and a validity study to assess the level of agreement, sensitivity, and specificity between the data abstracted by each abstractor and the "gold standard". For these evaluations, we have only used select variables from a set of 30 patient records for which the "gold standard" data were available.
We found excellent IRR (ICC ≥ 0.75) between the two abstractors for most continuous variables including age, onset time, arrival time, and mRS on discharge. These findings are consistent with those reported by Reeves et al. (2008) for PCNASR that indicated excellent reliability for age, stroke onset time, ED arrival time, and mRS [29]. Our findings are also consistent with those reported by Xian et al. (2012) for GWTG-Stroke registry indicating an excellent IRR for age [5]. We also found poor IRR (ICC = −0.48) for tPA time, with a large mean difference between the two abstractors caused by discrepancies in dates (i.e., a wrong day, month, or year) for 3 of the 7 patients who received tPA therapy.
For categorical variables, we observed excellent IRR (Kappa ≥ 0.75) between the two abstractors for most variables including diagnosis of stroke as infarct and ICH, tPA therapy, disposition, and initial presentation, except for diagnosis of stroke as TIA that had a moderate IRR (Kappa = 0.65). These findings are consistent with those reported by Xian et al. (2012) for GWTG-Stroke registry that indicated excellent IRR for final clinical diagnosis, tPA therapy and discharge destination [5]. However, in PCNASR, Reeves et al. (2008) found poor reliability for stroke team consultation, time of initial brain imaging, discharge destination, and stroke/TIA diagnosed in emergency department [29]. Nonetheless, excellent IRR between the two abstractors alone does not indicate the validity of data abstracted. We believe assessment of both IRR and validity (% agreement, sensitivity, and specificity) provides a more informed evaluation of abstractors' performance.
Our validity study based on 30 patient records revealed that for all selected categorical variables, the agreement between the data abstracted by each of the two abstractors and the "gold standard" was above 93.3%. Since the level of agreement alone does not indicate the validity of the data abstracted, we also computed the bias index between each of the two abstractors and the "gold standard". For example, for initial presentation at MHH-TMC we observed an agreement of 96.7% for abstractor #2, but the bias index for this variable against the "gold standard" was −0.03, indicating a tendency for abstractor #2 to record initial presentation as "other hospitals" while the "gold standard" indicated initial presentation at MHH-TMC. For continuous variables, we found a correlation of at least 0.83 for all selected variables, except for tPA time (r = −0.49) and INR (r = 0.39) for abstractor #2. Because a high correlation alone does not indicate the validity of the data abstracted, we also computed mean differences for these variables between each of the two abstractors and the "gold standard". For example, for onset time we observed a perfect correlation (r = 1) for both abstractors, but the mean difference for this variable against the "gold standard" was 9.4 minutes for abstractor #1 and 14.1 minutes for abstractor #2. Interpretation of mean differences is similar to that of the bias index as described earlier. The 95% CIs for mean differences indicate no significant differences between the data abstracted by the two abstractors and the "gold standard", except for tPA time for abstractor #2, for which the 95% CI is not reported. This is because abstractor #2 reported 3 wrong dates (i.e., a wrong day, month, or year) out of 7 dates for patients who received tPA therapy, which resulted in a misleading mean difference. These findings indicate areas where additional training may be needed for the abstractors. Xian et al. (2012) used an audit abstractor to assess the validity and accuracy of abstractors' work against the audit data in the GWTG-Stroke registry. They also reported a high accuracy rate for the majority of variables, such as age, diagnosis and evaluation, arrival date/time, and tPA therapy [5]. We believe an objective evaluation of the abstractors' performance and providing additional training will result in improved data quality.

STK-2 = Discharged on Antithrombotic Therapy: Ischemic stroke patients prescribed antithrombotic therapy at hospital discharge.
STK-3 = Anticoagulation Therapy for Atrial Fibrillation/Flutter: Ischemic stroke patients with atrial fibrillation/flutter who are prescribed anticoagulation therapy at hospital discharge.
STK-4 = Thrombolytic Therapy: Acute ischemic stroke patients who arrive at this hospital within 2 hours of time last known well and for whom IV t-PA was initiated at this hospital within 3 hours of time last known well.
STK-5 = Antithrombotic Therapy By End of Hospital Day 2: Ischemic stroke patients administered antithrombotic therapy by the end of hospital day 2.
STK-6 = Discharged on Statin Medication: Ischemic stroke patients with LDL greater than or equal to 100 mg/dL, or LDL not measured, or who were on a lipid-lowering medication prior to hospital arrival are prescribed statin medication at hospital discharge.
STK-7 = Dysphagia Screening: Patients with ischemic or hemorrhagic stroke who undergo evidence-based bedside testing protocol approved by the hospital before being given any food, fluids, or medication by mouth.
STK-8 = Stroke Education: Ischemic or hemorrhagic stroke patients or their caregivers who were given educational materials during the hospital stay addressing all of the following: activation of emergency medical system, need for follow-up after discharge, medications prescribed at discharge, risk factors for stroke, and warning signs and symptoms of stroke.
STK-9 = Smoking Cessation/Advice/Counseling: Patients with ischemic or hemorrhagic stroke with a history of smoking cigarettes, who are, or whose caregivers are, given smoking cessation advice or counseling during hospital stay. For the purposes of this measure, a smoker is defined as someone who has smoked cigarettes anytime during the year prior to hospital arrival.
STK-10 = Assessed for Rehabilitation: Ischemic or hemorrhagic stroke patients who were assessed for rehabilitation services.
** Calculations for measures STK-7 and STK-9 are based on definitions provided in Disease-Specific Care Certification Program STROKE Performance Measurement Implementation Guide (2008) [49].
a 95% LCL = 95% Lower Confidence Limit; b 95% CI = 95% Confidence Interval; c MD = Mean difference for reliability study is calculated based on abstractor #1 minus abstractor #2; for validity study the difference is based on abstractor #1 or abstractor #2 minus the "gold standard"; d ICC = Intra-class correlation; NR = Not reported because calculations were misleading due to an error in re-abstraction by abstractor #2 who reported 3 wrong dates (i.e., a wrong day, month or year) out of 7 dates for patients who received tPA therapy, which resulted in a misleading mean difference.
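The percent agreement and bias index discussed for the abstractor validity study can be illustrated with a short sketch. It assumes the bias index is the difference between the abstractor's and the "gold standard"'s proportions of "yes" entries (discordant "abstractor yes" minus discordant "gold standard yes", divided by the number of records); the exact formula used for UTHSR is not stated here, so this definition is an assumption. The cell counts below are hypothetical, chosen only so that 30 records with a single discordant entry reproduce the reported 96.7% and −0.03.

```python
def agreement_and_bias(abstractor, gold):
    """Percent agreement and bias index for paired yes/no (True/False) entries."""
    if len(abstractor) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    n = len(gold)
    agree = sum(a == g for a, g in zip(abstractor, gold))
    # Discordant pairs, split by which source recorded "yes".
    only_abstractor_yes = sum(a and not g for a, g in zip(abstractor, gold))
    only_gold_yes = sum(g and not a for a, g in zip(abstractor, gold))
    percent_agreement = 100 * agree / n
    bias_index = (only_abstractor_yes - only_gold_yes) / n
    return percent_agreement, bias_index

# Hypothetical data: True = initial presentation at MHH-TMC.
gold = [True] * 21 + [False] * 9
abstractor = [True] * 20 + [False] * 10  # one presentation recorded as "other"
pct, bias = agreement_and_bias(abstractor, gold)
print(f"agreement = {pct:.1f}%, bias index = {bias:.2f}")
```

A negative value under this sign convention means the abstractor recorded "yes" less often than the "gold standard", which matches the direction of the tendency described for abstractor #2.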

Data cleaning
Data cleaning is an essential component of QA processes and includes identification and resolution of all discrepant data in the database [37,44]. However, the extent of data cleaning varies among the registries that have reported their QA procedures [2,5,13,18,24]. As mentioned earlier, for UTHSR we developed a comprehensive program of about 350 univariable and multivariable validation rules to identify discrepant data. In contrast, for the GWTG-Stroke registry, the data abstraction tool included predefined logic features and user alerts to identify potentially invalid formats or data entries. Required fields were structured so that valid data must be entered before the data are saved. Range checks were used for inconsistent or out-of-range data and prompted the user to correct or review data entries that were considered out of range [13]. Similarly, Hsieh et al. (2010) reported using logic checks and variable limits to prevent inaccurate data entries in TSR [2]. Since the data cleaning process does not capture all discrepant data, some recommend a random audit of a fraction of the data in a stroke registry. For example, Hsieh et al. (2010) used random auditing of 5% of all cases entered in TSR [2].
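As an illustration of what univariable and multivariable validation rules can look like, the sketch below checks one record against a handful of rules. The field names (`nihss_baseline`, `arrival_time`, `discharge_time`, `tpa_given`, `tpa_time`) and the specific rules are hypothetical and are not taken from the UTHSR rule set; only the 0 to 42 range of the NIHSS total score is a standard fact.

```python
from datetime import datetime

def check_record(rec):
    """Return a list of rule violations for one abstracted record (a dict)."""
    problems = []
    # Univariable range check: the NIHSS total score runs from 0 to 42.
    nihss = rec.get("nihss_baseline")
    if nihss is not None and not 0 <= nihss <= 42:
        problems.append("nihss_baseline outside 0-42")
    # Univariable required-field check.
    arrival = rec.get("arrival_time")
    if arrival is None:
        problems.append("arrival_time is required")
    # Multivariable check: discharge cannot precede arrival.
    discharge = rec.get("discharge_time")
    if arrival is not None and discharge is not None and discharge < arrival:
        problems.append("discharge_time precedes arrival_time")
    # Multivariable check: a tPA time implies that tPA was given.
    if not rec.get("tpa_given") and rec.get("tpa_time") is not None:
        problems.append("tpa_time recorded but tpa_given is not set")
    return problems

clean = {
    "nihss_baseline": 7,
    "arrival_time": datetime(2011, 3, 1, 8, 0),
    "discharge_time": datetime(2011, 3, 5, 14, 0),
    "tpa_given": True,
    "tpa_time": datetime(2011, 3, 1, 8, 50),
}
flawed = {
    "nihss_baseline": 45,
    "arrival_time": datetime(2011, 3, 2, 10, 0),
    "discharge_time": datetime(2011, 3, 1, 9, 0),
    "tpa_given": False,
    "tpa_time": datetime(2011, 3, 2, 10, 40),
}
for name, rec in (("clean", clean), ("flawed", flawed)):
    print(name, check_record(rec))
```

Returning a list of violations, rather than stopping at the first one, lets every discrepancy in a record be reviewed and resolved in one pass.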

Assessment of validity for data in UTHSR
The accuracy of data depends on the type of variable (e.g., categorical or continuous) and on how difficult it is to capture accurate values for it in a registry. For example, some registries have reported lower levels of accuracy for variables that involve date/time (e.g., time to an event) [56]. Therefore, it is important to assess validity of data for key variables in a stroke registry. Our assessment of validity of data against the "gold standard" for categorical variables, based on 115 patient records for key variables and 85 patient records for the remaining variables, indicates excellent agreement (> 96%) between the UTHSR data and the "gold standard". Our findings are consistent with those reported by Xian et al. (2012), indicating high levels of agreement between audited data and medical records for categorical variables, except for DVT prophylaxis, which had 79% agreement [5]. In addition, levels of sensitivity and specificity for categorical variables in UTHSR were excellent (above 92.9%). For continuous variables, we found a correlation of at least 0.86 for all selected variables. However, agreement for CT time and arrival time was 80.8% and 85.2%, respectively. For reasons described earlier, we also computed the mean difference for these variables between UTHSR and the "gold standard". For example, for arrival time we observed a perfect correlation (r = 1) for UTHSR, but the mean difference for this variable against the "gold standard" was 26.7 [95% CI (−36.3, 89.6)] minutes. Therefore, specific attention needs to be paid to CT time and arrival time, because wrong dates (i.e., a wrong day, month, or year), or occasionally the use of different sources to capture these variables, could produce a mean difference between the data abstracted by the abstractors and the "gold standard". Others have also reported difficulty in capturing time-related variables in stroke registries. For example, George et al. (2009) reported that for the majority (57.8%) of patients in PCNASR, the time from onset of symptoms to hospital arrival was not recorded or was not known [6]. Xian et al. (2012) also reported on the validity of data for continuous variables in the GWTG-Stroke registry, indicating accuracy rates of at least 85% for the majority of variables, notably an agreement of more than 93.6% for arrival time and brain imaging time. However, their higher levels of agreement for the arrival and brain imaging times could be due to their definition of agreement for continuous variables: they consider such data accurate if the values in the registry and in the audited records are within 15 minutes of each other [5], whereas we required perfect agreement for these variables in our study.
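The effect of the agreement definition on time variables can be seen in a small sketch that scores the same paired times under exact matching and under a 15-minute tolerance. The function name `percent_agreement` and the timestamps are illustrative assumptions, not registry data.

```python
from datetime import datetime, timedelta

def percent_agreement(registry_times, gold_times, tolerance=timedelta(0)):
    """Percent of paired times that differ by no more than `tolerance`."""
    pairs = list(zip(registry_times, gold_times))
    if not pairs:
        raise ValueError("no paired times to compare")
    hits = sum(abs(r - g) <= tolerance for r, g in pairs)
    return 100 * hits / len(pairs)

# Hypothetical arrival times: exact match, +5 min, +10 min, wrong day.
base = datetime(2011, 3, 1, 8, 0)
gold = [base] * 4
registry = [
    base,
    base + timedelta(minutes=5),
    base + timedelta(minutes=10),
    base + timedelta(days=1),
]
exact = percent_agreement(registry, gold)
within_15 = percent_agreement(registry, gold, tolerance=timedelta(minutes=15))
print(f"exact: {exact:.0f}%, within 15 minutes: {within_15:.0f}%")
```

With these made-up values, exact matching gives 25% agreement while the 15-minute window gives 75%, so agreement figures from registries that use different definitions are not directly comparable.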
During 2008-2011, we observed a significant improvement in the UTHSR error rate, which dropped from 4.8% in 2008 to 2.2% in 2011. We attribute this improvement to our effort in enhancing and formalizing our quality assurance procedures, which include enhancements in software development, improved instructions in the codebook, training of abstractors, evaluation of reliability between abstractors, data cleaning, and assessment of validity of data in UTHSR. A few stroke registries have published accuracy rates or error rates as indicators of the overall validity of data in their registries [5,16]. For example, Asplund et al. (2011) reported an accuracy rate of at least 95% between the medical charts and data recorded in the Riks-Stroke registry for stroke subtype and clinical data, but somewhat lower (approximately 85%) for data related to the healthcare organization at the participating hospitals [16]. Similarly, Xian et al. (2012) reported a 96.1% accuracy rate for data quality of the GWTG-Stroke registry [5]. These published rates are similar to the error rate of our 2008 data from UTHSR, before our enhanced data quality assurance procedures were implemented. However, to our knowledge, our overall error rate of 2.2% in 2011 appears to be the lowest among all error rates published from stroke registries so far.
There are no established acceptable error rates for key variables in stroke registries, but for epidemiologic studies and clinical trials the CDC recommends an error rate of 0.3% (3 per 1,000 entries) [57], which is usually achieved by conducting double data entry. Others have reported a more liberal estimate of the error rate, 1% to 5%, for general databases used by many companies, whereas an error rate of 0.1% to 0.5% is considered acceptable for clinical trials [58,59]. Studies that used single data entry procedures reported higher error rates than those that used double data entry [60]. Since errors in stroke registries can arise during abstraction or during data entry, achieving a significant reduction in the error rate requires double data abstraction as well as double data entry. The decision whether to use a double or single data entry process largely depends upon the availability of resources. For these reasons, some registries employ single data abstraction and data entry along with a quality assurance procedure in which a small fraction (5%) of all the records is re-abstracted and re-entered to assess data quality [2].
Since the main focus of this paper is to assess the effect of enhanced QA procedures on data quality in UTHSR, we limited our discussion of the data obtained on our patients in UTHSR. While UTHSR data are mostly consistent with other stroke registries for hospitalized stroke patients, we observed differences in some characteristics. For example, the mean age of patients in UTHSR is 62.7 years (median = 63), significantly younger than that of the patients in ASTRAL (median = 72.5) [21] and PCNASR in Michigan (mean = 70.9 years) [29]. Four other participating states (Georgia, Illinois, Massachusetts, and North Carolina) that contributed data to PCNASR during 2005-2007 reported a median age of 72 years, but Georgia had younger patients (median = 67 years) [6]. In our registry, 51.4% had infarcts and 24.6% had hemorrhagic stroke. PCNASR reported that 14% of their patients had hemorrhagic stroke and 56-58% had ischemic stroke [6,29]. The very large number of patients with ICH who were transferred to MHH-TMC, rather than presenting directly to the ED, could explain the relatively higher percentage of ICH patients in UTHSR. Compared with most other registries, UTHSR has a higher rate of air transport (22.5%). For example, the Austrian Stroke Unit Registry reported that only 4.1% of patients were transported by helicopter from 32 stroke units between 2003 and 2009 [61]. This difference could be due to the fact that for over 30 years the UT Stroke team and MHH-TMC have been providing medevac helicopter service (Life Flight®) to bring stroke patients quickly to our stroke unit. In addition, depending upon distance and stroke severity, many other hospitals also choose to fly transferred patients to MHH-TMC. Furthermore, for more than five years, our stroke team has utilized telemedicine to "reach out" to smaller hospitals that do not have stroke expertise.
In UTHSR, the distribution of the baseline NIHSS score (NIHSS on admission) was 40% less than or equal to 4, 31.5% within 5-14, and 28.5% more than 14. Our result for less severe stroke (baseline NIHSS ≤ 4) is consistent with the Swiss registry (ASTRAL) (40.8%) [21]. For patients with ICH, the median NIHSS score was twice that of patients with ischemic stroke. In UTHSR, the percentage of eligible patients who received tPA treatment within 3 hours showed an upward trend, increasing from 21.3% in 2008 to 27.3% in 2011. The Joint Commission Disease-Specific Care (DSC) suggested 10 items as performance measures for stroke registries [48,49]. We did not find a significant linear trend in the performance measures during 2008-2011.
Finally, we recognize that implementation of the proposed enhanced QA procedures may be cost prohibitive due to unavailability of the required resources for some institutions. We have been fortunate to have funding from SPOTRIAS that helped us to develop and implement these enhanced QA procedures in UTHSR. We believe that the utility of the proposed QA procedures is dependent on the objectives and use of data in stroke registries. Therefore, stroke centers that have or are planning to establish a stroke registry should carefully evaluate their needs and implement a suitable level of QA procedure to ensure reliability and validity of data.

Limitations
The decision to use 30 patient records for the reliability study (i.e., to assess IRR) was based on the following reports. First, Sim and Wright (2005) [62] have shown that a sample size of 30 provides nearly 80% power for detecting an association between the two abstractors as long as Kappa ≥ 0.5. Second, Walter et al. (1998) have shown that a sample size greater than 22 provides at least 80% power for detecting any association between the two abstractors with respect to continuous measures as long as the ICC ≥ 0.5 [63]. However, we acknowledge as a limitation that the sample of 30 "gold standard" records may not be sufficient to conduct inferential statistics on these data. Therefore, for assessment of validity of data in UTHSR, we combined the 30 "gold standard" records with another 85 patient records for which information about all variables was re-abstracted and adjudicated by our data core team, which included a vascular neurology clinician (faculty or fellow). This resulted in a total of 115 patient records used for assessing the validity of data in UTHSR. Since only select variables were included in the "gold standard" for the 30 patient records, the number of observations available in the validity study for UTHSR varies by variable. Therefore, we recommend careful interpretation of inferences drawn from these measures of validity. Overall, we believe the number of patient records used for the reliability and validity studies provides important information as to whether additional training is needed for any of the abstractors and for which variables such training is needed.

Conclusions
In this study, we have described the UTHealth Stroke Registry and shown that establishment of enhanced data quality assurance has helped to improve the validity of our data. Our enhanced quality assurance procedures included training of abstractors, assessment of IRR between abstractors as well as assessment of validity of data abstracted compared with the "gold standard", and development and implementation of univariable and multivariable data cleaning rules. We have observed an excellent inter-rater reliability and validity for almost all key variables. Our resulting data compare well with data from other registries and certification guidelines, and demonstrate tPA treatment rates that are among the highest reported.