This master's thesis presents a possible use of reimbursement data for treatments provided in the healthcare system, which are collected by the Health Insurance Institute of Slovenia (ZZZS), to build a dashboard displaying a system of key performance indicators of quality and efficiency of care for patients with coronary artery disease in the Slovenian healthcare system. The goal of the work was to design, from administrative data, a system of performance indicators of quality and efficiency of care for patients with coronary artery disease that follows the paradigm of Value-Based Healthcare and Donabedian's model of measuring the quality and efficiency of healthcare. The performance indicators should be presented on the dashboard in a user-friendly, transparent, understandable, and interactive manner, which I achieved with the R package Shiny. The research process of building the dashboard consisted of four steps, in which a combination of qualitative and quantitative scientific methods was used in collaboration with a panel of domain experts to answer three research questions. Because the data used in this work are administrative, meaning their primary purpose is to assure payment for the provided healthcare services and products, the first research question assesses their quality, fitness for use, and possible biases for the secondary use of designing the system of key performance indicators. The second research question, which is also the motivation for building the dashboard, is whether there is a significant observed difference in the quality and efficiency of healthcare among providers, as measured with the designed system of key performance indicators. 
Since the data enable this, the third research question examines whether the healthcare outcomes and processes measured with the designed system of key performance indicators are connected with the context of the patients' local area of residence - the municipality. The research included patients with a diagnosis of coronary artery disease who were hospitalized in the Slovenian healthcare system between 1 January 2015 and 30 June 2021. The dashboard compares 14 Slovenian hospitals, and the connection between the performance indicators and the contexts of patients' local areas of residence is analyzed for 212 Slovenian municipalities.
I started the research process by designing the final selection of 13 performance indicators of quality and efficiency presented on the dashboard. First, I conducted a semi-structured interview with two representatives of the ZZZS, during which I gained basic insights into how they design and calculate the indicators they use. After the interview, I conducted a focus group, for which I prepared two sets of candidate indicators. The first, broader set was prepared using mostly online sources, while the second set was developed on the theoretical basis of the Value-Based Healthcare paradigm and Donabedian's model, also taking into account the feasibility of calculation from the data received. After the first iteration of the focus group, the final selection of indicators presented on the dashboard was made in collaboration with a panel of domain experts, using an online survey and a second iteration of the focus group.
In the second stage of the research process, I focused on assessing the quality of the data on paid treatments and on whether they are fit for designing the prototype of the dashboard. The qualitative research methods I used at this stage were a semi-structured interview and a focus group. I interviewed two ZZZS representatives to gain insights into how the data are created and stored and into the problems that may arise during their creation. To estimate the quality of the data, I used two different approaches, both recommended by the relevant literature. With the first approach, I assessed the quality of the data by estimating three data quality dimensions - conformance, completeness, and plausibility - all defined by strict methodological criteria. The second approach was based on concepts in medicine (admissions, discharges, deaths, etc.) and the assumption that these concepts have mostly constant variability (statistical stability), which means that large and unexplained deviations from that assumption could point to data quality issues. For both approaches, I used graphical visualizations and tabular summaries.
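The statistical stability idea can be illustrated with a small sketch. It is written in Python for illustration only and is not the procedure used in the thesis, which relied on graphical and tabular inspection in R. The function name, the z-score rule, the threshold, and the monthly counts are all hypothetical assumptions.

```python
from statistics import mean, stdev


def flag_unstable_periods(counts, z_threshold=3.0):
    """Return indices of periods whose count lies more than z_threshold
    sample standard deviations away from the mean of all periods.

    This simple z-score rule is an assumption made for illustration;
    the thesis does not prescribe a specific numeric rule.
    """
    if len(counts) < 2:
        return []
    center = mean(counts)
    spread = stdev(counts)
    if spread == 0:
        return []  # perfectly constant series: nothing to flag
    return [
        i for i, count in enumerate(counts)
        if abs(count - center) / spread > z_threshold
    ]


# Hypothetical monthly admission counts; the drop at index 8 mimics an outage.
monthly_admissions = [410, 395, 402, 420, 398, 405, 415, 400, 120, 408, 399, 412]
flagged = flag_unstable_periods(monthly_admissions, z_threshold=2.5)
print(flagged)
```

A flagged period is only a prompt for further investigation: as in the thesis, a deviation may have a legitimate explanation, such as a pandemic or a change in data warehouse procedures.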
The purpose of the third stage of the research process was to calculate the 13 indicators of quality and efficiency of care in a way that enables the best possible comparison of healthcare providers, which means the values are controlled for the different levels of risk present at the level of patients (age, sex, etc.), hospitals, and patients' local areas of residence - municipalities in Slovenia. This approach enabled me to address the question of whether there is a significant difference in quality and efficiency of care among providers, and to assess whether healthcare outcome and process indicators are connected with the contexts of patients' local areas of residence - municipalities. Based on the nature of these research questions, the type of performance indicators I was working with, and recommendations in the literature, I decided to use the quantitative method of statistical regression. For analysing and calculating all 13 performance indicators, I used the generalized linear mixed models methodology. For performance indicators that are represented as a ratio (Hospital mortality), I used multilevel logistic regression, and for performance indicators of a numerical type, I used either multilevel negative binomial or multilevel Gamma regression. I started the calculation procedure by preparing the model variables. Thirteen dependent variables represented the performance indicators, one per indicator; fourteen explanatory variables represented the characteristics of patients, two explanatory variables represented the characteristics of hospitals, and eighteen explanatory variables represented the properties of Slovenian municipalities. For the purpose of dimensionality reduction, I applied exploratory factor analysis to 17 variables that represented characteristics of municipalities. 
The method indicated the use of three factor variables, which, together with the variable Average time of transportation to the hospital, became the new municipality-level explanatory variables in the models. For 12 out of 13 performance indicators of quality and efficiency of care, I split the data on patients who met the inclusion criteria into training and test data sets in a 50:50 ratio using stratified random sampling. Due to the small sample size, I did not split the data for one indicator. On the training set, I estimated three types of regression models. The first type included no explanatory variables, the second type included only patient-level explanatory variables, and the third type additionally included the variables that represented hospital and municipality characteristics. The hospital and municipality context variables, which represent the random part of the generalized linear mixed effects models, were always included. To find more parsimonious models, I used the stepwise method, which works on the principle of minimizing the Bayesian information criterion (BIC). With the stepwise method, I was able to reduce the number of patient-characteristic explanatory variables included in the models. The results of the statistical modelling were the coefficient estimates and their corresponding 95-percent Wald confidence intervals, which are set out in tables. Generalized linear mixed effects models allow the estimation of the so-called Variance partition coefficient (VPC) and Median odds/incidence ratio (MOR/MIR), both used to assess the connection between the characteristics of contexts (patients, hospitals, municipalities) and the dependent variable that represents each performance indicator. For the assessment of model fit to the data, I used BIC and Nakagawa R^2, a pseudo coefficient of determination. 
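For orientation, the VPC and MOR can be computed from estimated random-intercept variances as in the following Python sketch. It assumes the commonly used latent-variable formulation for logistic models, in which the patient-level residual variance is fixed at pi^2/3; the thesis does not state which formulation was applied, and the function names and variance values below are hypothetical.

```python
import math
from statistics import NormalDist

# Assumption: latent-variable approach, level-1 variance of the logistic distribution.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def vpc_logistic(var_hospital, var_municipality):
    """Share of total latent variance attributed to each context
    in a logistic model with two random intercepts."""
    total = var_hospital + var_municipality + LOGISTIC_RESIDUAL_VARIANCE
    return var_hospital / total, var_municipality / total


def median_odds_ratio(cluster_variance):
    """MOR = exp(sqrt(2 * variance) * z_0.75), where z_0.75 is the
    75th percentile of the standard normal distribution."""
    z75 = NormalDist().inv_cdf(0.75)
    return math.exp(math.sqrt(2 * cluster_variance) * z75)


# Hypothetical random-intercept variances on the logit scale.
vpc_hosp, vpc_muni = vpc_logistic(var_hospital=0.20, var_municipality=0.02)
mor_hosp = median_odds_ratio(0.20)
print(round(vpc_hosp, 3), round(vpc_muni, 3), round(mor_hosp, 2))
```

A MOR of 1 corresponds to no variation between clusters. For count models with a log link, the same expression applied to the random-intercept variance is usually read as a median incidence (rate) ratio.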
For the assessment of the models' predictive power, I used the area under the receiver operating characteristic (ROC) curve (AUC) and the root mean square error (RMSE). The final values of the performance indicators displayed on the dashboard were calculated using the standardization methodology. I used the most complex model to predict the dependent variable for every patient in the test data. For every provider, I calculated the quotient of the observed and the expected (model-predicted) value of the performance indicator. I called this provider-specific quotient a weight. The dashboard displays the standardized value of the performance indicator, which is the product of the provider-specific weight and an appropriate measure of central tendency (the mean for 12 indicators, the median for 1 indicator) over all the patients included in the analysis.
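The standardization step can be sketched as follows. This Python example is an illustration under stated assumptions, not the thesis code: it assumes that the provider-level observed and expected values are obtained by summing patient-level observations and model predictions, it uses the overall mean as the central tendency value, and all names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def standardized_indicator(records):
    """Compute a standardized indicator value per provider.

    records: iterable of (provider, observed, expected) tuples, one per
    patient, where expected is the model prediction for that patient.
    Returns {provider: weight * overall_mean}, with
    weight = sum(observed) / sum(expected) for that provider.
    """
    observed_sum = defaultdict(float)
    expected_sum = defaultdict(float)
    all_observed = []
    for provider, observed, expected in records:
        observed_sum[provider] += observed
        expected_sum[provider] += expected
        all_observed.append(observed)
    overall = mean(all_observed)  # central tendency over all patients
    return {
        provider: observed_sum[provider] / expected_sum[provider] * overall
        for provider in observed_sum
    }


# Hypothetical test-set records: (provider, died in hospital, predicted risk).
records = [
    ("A", 1, 0.125), ("A", 0, 0.125), ("A", 0, 0.125), ("A", 0, 0.125),
    ("B", 1, 0.5), ("B", 0, 0.5), ("B", 0, 0.5), ("B", 0, 0.5),
]
result = standardized_indicator(records)
print(result)
```

In this toy example both providers have the same observed mortality, but provider A treated lower-risk patients, so its weight exceeds 1 and its standardized value lies above the overall mean, while provider B's lies below it.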
The fourth and final step of the research process was building the prototype of the dashboard itself. I designed it based on the insights from the semi-structured interview with representatives of the ZZZS and on good practices from the relevant literature. In theoretical terms, the dashboard is of a strategic type, meaning it offers a general overview of the quality and efficiency of care between 1 January 2015 and 30 June 2021. I used modern, interactive visualizations and an intuitive graphical user interface for adjusting them dynamically. The dashboard consists of three pages. The first page uses Value Boxes to display the observed values of the performance indicators at the population level. The values can be filtered by date of hospitalization, sex, age, and type of coronary artery disease. The second page of the dashboard provides a comparison between the observed hospitals. Using scatterplots, I show the standardized values of the performance indicators for every healthcare provider, together with the corresponding 95-percent confidence intervals and a central tendency line. The scatterplots are divided into sub-pages based on the stage of the healthcare process they represent. The last page of the dashboard displays the observed and model-predicted values of the indicators by municipality.
Based on the collaboration with domain experts and the results of the online survey and the focus groups, the first step of the research process resulted in a final selection of 13 key performance indicators of quality and efficiency of care that are displayed on the dashboard. They are divided into five sets based on the stages of healthcare for patients. I named these sets as follows: mortality, re-hospitalizations, process of care, patient care after discharge or rehabilitation, and economic aspects of healthcare.
The second stage of the research process showed that the quality of the data on paid healthcare treatments, estimated using both the data quality dimensions and the conceptual approach, is generally high. There is little to no noncompliance with the established variable conformance rules. Row completeness of the data is also very close to 100 percent. Based on the analysis of the values the variables take, there is very little reason to believe they are not plausible, which was also validated in the focus group with the panel of domain experts. I did find some possible data quality issues using the conceptual approach, but these were explained by the influence of the COVID-19 pandemic, and one larger outage I detected was explained by changes in data warehouse procedures at the ZZZS. The general assessment, also informed by the semi-structured interview, is that the data are mostly fit for my purpose of building the dashboard.
The main result of the third stage of building the dashboard, which consisted of calculating the performance indicator values using statistical modelling, is that applying the generalized linear mixed models methodology was justified, since the estimated variances of the random intercepts for the hospital and municipality contexts were neither equal nor close to zero. BIC and Nakagawa R^2 seemed to indicate that the models fitted the data better for those performance indicators that relate more closely to the time of the index hospitalization. BIC generally improved the most with the inclusion of the patient-characteristic explanatory variables and did not improve much further with the inclusion of the hospital and municipality explanatory variables. The predictive power of the models, measured with AUC and RMSE, was also generally better for those performance indicators that were closer to the time of the index hospitalization. In addition to BIC, the observed Variance partition coefficients (VPC) seemed to indicate that the main source of risk lies at the individual patient level, followed by the hospital context and the context of the patient's local area of residence - the municipality. For the latter, no significant connection with the outcome or process indicators is observed. I base this claim on the detected VPC values, on the absence of a significant improvement in BIC or Nakagawa R^2 when adding the variables of patients' local area contexts, and on the exponentially transformed estimated model coefficients for these variables, which are in most cases very close to 1, meaning no significant change in odds or relative ratios.
Based on the observed values of the 13 performance indicators displayed on the dashboard, I observe the expected difference in quality and efficiency of care that stems from different baseline risks associated with patient characteristics (age, sex, type of coronary artery disease). The comparison of providers seems to indicate differences both in quality, measured with the relevant outcomes, and in efficiency, measured with process indicators of the care provided to patients. Some differences persist even after I control for the different levels of risk that stem from the patient, hospital, and patients' local area contexts using the adjusted values approach. For performance indicators of the ratio type, the difference between providers is on average 4 to 10 percentage points, depending on the indicator. The differences between hospitals are also visible in the VPC and MOR/MIR values; they generally decrease as explanatory variables are added to the models, although some observed variability remains unexplained.
This master's thesis is a retrospective observational study. The main limitation of this type of research stems from unknown confounding variables, which are associated with both the dependent and the independent variables and can affect the model results if left unaccounted for. A further limitation of my work is the use of data that are primarily intended for other purposes, so I need to emphasize all the observed and unobserved biases and limitations of such data that can influence the results. There is also a considerable simplification in the study of the association between local area contexts and performance indicators, namely the assumption that all patients from the same municipality have the same level of risk from local area characteristics. I also have to emphasize the limitation in determining the index hospitalization, which in this work was taken to be the first hospitalization recorded in the data for each patient; this could in fact be a second or third hospitalization if the true index hospitalization happened before 1 January 2015 and is therefore not present in the data. I would advise some caution when interpreting the results because of the statistical methods used, since having only 14 hospitals at the second level of the models is quite a small sample for the generalized linear mixed effects models methodology.