Predicting Risk-of-Readmission for Congestive Heart Failure Patients: A Multi-Layer Approach

Kiyana Zolfaghar, MS 1, Nele Verbiest, MS 2, Jayshree Agarwal, MS 1, Naren Meadem, BS 1, Si-Chi Chin, PhD 1, Senjuti Basu Roy, PhD 1, Ankur Teredesai, PhD 1, David Hazel, MS 1, Paul Amoroso, MD 3, Lester Reed, MD 3

1 Institute of Technology, CWDS, University of Washington Tacoma, WA; 2 Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium; 3 Multicare Health System, Tacoma, WA

Abstract

Mitigating risk-of-readmission of Congestive Heart Failure (CHF) patients within 30 days of discharge is important because such readmissions are not only expensive but also a critical indicator of provider care and quality of treatment. Accurately predicting the risk-of-readmission may allow hospitals to identify high-risk patients and eventually improve quality of care by identifying factors that contribute to such readmissions. In this paper, we investigate the problem of predicting risk-of-readmission as a supervised learning problem, using a multi-layer classification approach. Earlier contributions attempted, with limited success, to assess a risk value for 30-day readmission by building a single direct predictive model. In contrast, we first split the problem into stages, (a) at risk in general, (b) at risk within 60 days, and (c) at risk within 30 days, and then build a suitable classifier for each stage, thereby increasing the ability to predict the risk accurately through multiple layers of decision. The advantage of our approach is that each subtask can use the classification model best suited to it. Moreover, each subtask can be solved using different features and training data, leading to a more confident assessment of risk than a one-shot, single-layer approach.
An experimental evaluation on actual hospital patient record data from Multicare Health System shows that our model is significantly better at predicting risk-of-readmission of CHF patients within 30 days after discharge compared to prior attempts.

Introduction

With the overwhelming increase in available health care data, analyzing and mining this data has gained more interest over the last decade. Improving awareness, personalizing medical treatments and ameliorating health care standards are only a few examples of the opportunities that result from mining health care data [1]. In this work, we focus on building a predictive model to enhance quality of care [2] for patients with congestive heart failure. The main goal is to predict the level of risk of patients being discharged after treatment for Congestive Heart Failure (CHF), in order to assess whether they are likely to be at high risk of readmission within the next 30 days. We approach this as a classification problem: patients are classified into high or low risk given historical discharge data along with a variety of other parameters. We leverage historic patient data that contains admission-readmission histories of CHF patients.

Hospital readmission is expensive and generally preventable [3]. If CHF readmission could be predicted accurately, hospitals could invest more purposefully in improving hospital care by reducing the risk of infection, reconciling medications, educating patients on exactly which symptoms to monitor, and assessing the readiness of patients for discharge [4]. At first the 30-day window seems arbitrary, but it is a clinically meaningful time window for hospitals, and the Centers for Medicare & Medicaid Services (CMS) has started using the 30-day all-cause heart failure readmission rate as a publicly reported efficiency metric.
Moreover, the all-cause 30-day readmission rate for patients with CHF has increased by 11 percent between 1992 and [...].

Predicting if patients discharged with CHF will be readmitted within 30 days is traditionally approached as a single classification task. We observe two main drawbacks of this approach: (a) the classification of risk of readmission is highly imbalanced, as can be seen from Figure 1, and is hence inherently difficult to solve [5], because traditional classification methods tend to assign most patients to the majority class (no readmission) when the training data consists mostly of majority instances; and (b) including all patients discharged with CHF when building the classification model might not be meaningful, as patients that were discharged after a long length of stay can have characteristics that are totally different from those of patients discharged after a short length of stay, and are hence irrelevant for the 30-day classification task.

Figure 1: The number of times a patient was readmitted within 30 days after discharge from CHF in a span of 3 years.

In this paper we address these drawbacks by introducing a multi-layer classification strategy. The main idea is that we first build a rough model that predicts if patients will be readmitted within a given time window longer than 30 days, and then use a more refined model to predict if patients will be readmitted within 30 days. Specifically, in order to predict if a patient discharged after CHF will be readmitted within 30 days, we first use a coarse-grained model to predict if the patient is likely to be readmitted at all (in any reasonable timeframe). If not, we can conclude that the patient is unlikely to be readmitted within 30 days (a very short timeframe). Otherwise, we predict if the patient will be readmitted within a large time window.
If not, then we can conclude that the patient will not be readmitted within 30 days. If the outcome is that the patient will be readmitted within the large time window, we use the more refined model to predict if the patient will be readmitted within 30 days.

This multi-layer classifier allows for flexibility in many ways. The main advantage is that we can use different models for problems of different granularity. If we use different classifiers for different layers, we can use different features for each layer, and the final classification task can be more refined because it only considers patients in the training data that were readmitted within the large time window. The second advantage is that we can split the imbalanced classification problem into relatively more balanced classification problems.

The main contributions of this paper are: (a) we introduce a multi-layer classifier to predict if patients are likely to be readmitted within 30 days after being discharged from CHF; and (b) we perform an experimental study using a real-world data set provided by Multicare Health System.

The remainder of this paper is structured as follows. In the next section, we describe our multi-layer approach in detail, together with the classifiers and feature selection methods that are used in the layers. Next, we evaluate the performance of our approach in the experimental section and compare it with state-of-the-art methods. Afterwards, we study related work, and we conclude and suggest further research directions in the concluding section.

Multi-layer Classification for Readmission of Congestive Heart Failure Patients

In this section we propose a multi-layer classifier method for predicting readmission of congestive heart failure patients. Instead of tackling the classification problem at once, we divide it into three sub-problems, as depicted in Figure 2. For a new patient discharged after CHF treatment, we first predict if the patient will ever be readmitted to the hospital.
If the prediction is that the patient will likely never be readmitted, we are done with the prediction task. If the outcome is that the patient may be readmitted (i.e. predicted yes), we use another model (layer) to predict if the patient will be readmitted within 60 days. Again, if the outcome is no, this means that the patient will not be readmitted within 60 days, and hence we output that the patient will not be readmitted within 30 days neither. If the outcome is again a yes, we use yet another model (hence multi-layer) to predict if the patient will be readmitted within 30 days. The outcome of this final classification is then returned as the final classification. Figure 2 Subdivision of the classification problem into multiple layers. Training data that is used in each layer is different. The upper layer uses all the training data. At the second layer, only the patients in the training data that are readmitted are used. In the last layer of the problem, only the patients that are admitted within 60 days are used. As a result, the training data that is used in the second and final layer is more refined than the original data. The purpose of this is to provide each sub-problem only with the relevant data. For example, if we want to predict if a patient will be readmitted within 30 days, the information about patients that will never be readmitted is not relevant and might disturb the classification. Another important advantage of this approach is that the highly imbalanced problem is divided into three more or less balanced problems. The data distribution is depicted in Figure 3. In general, a classification problem is called imbalanced if its Imbalance Ratio (IR, number of majority instances divided by the number of minority instances) is more than 2. In the original problem, the positive class (patients readmitted within 30 days) covered 1477 patients, while the majority class covered 8293 patients. 
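The layered decision procedure, the nested training subsets, and the imbalance ratio defined above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `layer_tasks`, `imbalance_ratio`, `cascade_predict` and `toy_days` are hypothetical, "never readmitted" is encoded as `None`, day 60 and day 30 are treated as inside their windows, each layer's classifier is abstracted as any callable returning True or False, and the ten-patient sample is made up.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical encoding: days until the next admission after discharge,
# or None if the patient was never readmitted.
Days = Optional[int]


def layer_tasks(days: Sequence[Days]) -> List[Tuple[List[int], List[bool]]]:
    """Return (training indices, binary labels) for each of the three layers."""
    everyone = list(range(len(days)))
    # Layer 1 trains on all patients: was the patient ever readmitted?
    layer1 = (everyone, [days[i] is not None for i in everyone])
    # Layer 2 trains only on readmitted patients: readmitted within 60 days?
    # (Counting day 60 itself as "within" is an assumption.)
    readmitted = [i for i in everyone if days[i] is not None]
    layer2 = (readmitted, [days[i] <= 60 for i in readmitted])
    # Layer 3 trains only on patients readmitted within 60 days: within 30?
    within_60 = [i for i in readmitted if days[i] <= 60]
    layer3 = (within_60, [days[i] <= 30 for i in within_60])
    return [layer1, layer2, layer3]


def imbalance_ratio(labels: Sequence[bool]) -> float:
    """Majority-class count divided by minority-class count."""
    positives = sum(labels)
    negatives = len(labels) - positives
    minority = min(positives, negatives)
    if minority == 0:
        raise ValueError("only one class present; the ratio is undefined")
    return max(positives, negatives) / minority


def cascade_predict(
    patient: dict,
    ever: Callable[[dict], bool],
    within_60: Callable[[dict], bool],
    within_30: Callable[[dict], bool],
) -> bool:
    """Route one patient through the layers; any 'no' ends the cascade."""
    if not ever(patient):
        return False
    if not within_60(patient):
        return False
    return within_30(patient)


# Made-up sample of ten discharges (not data from the study).
toy_days = [None, None, None, None, 10, 25, 45, 55, 90, 200]
direct_labels = [d is not None and d <= 30 for d in toy_days]
direct_ir = imbalance_ratio(direct_labels)
layer_irs = [imbalance_ratio(labels) for _, labels in layer_tasks(toy_days)]
```

On this toy sample the direct 30-day task has an imbalance ratio of 4.0, while the three layer tasks have ratios of 1.5, 2.0 and 1.0, which mirrors the motivation for splitting the problem. The numbers carry no meaning beyond the illustration.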
The imbalance ratio of this problem is 5.6, making it severely imbalanced. At the first layer, the number of patients who were never readmitted is 5503 and the total number of patients considered is 9770, resulting in an IR of 1.7 and thus a more balanced problem that is generally easier to solve. The threshold of 60 days at the second layer of the multi-layer classifier was chosen to balance the second-layer problem, such that the IR of the second layer is 1. The number of patients that were readmitted within 30 days is 1477, so the IR of the final layer is 1.4. We conclude that, using this multi-layer approach, the heavily imbalanced original problem is divided into subtasks (layers) that are more or less balanced.

Figure 3: Distribution of the patients based on the number of days until readmission after CHF. By dividing the problem in three parts, each of the subtasks is balanced.

Furthermore, we can consider different features in each sub-problem. For instance, features that are good for predicting if a patient will ever be readmitted might not be relevant for predicting if the patient will be readmitted within 30 days. Therefore, we apply feature selection in every layer of the multi-layer classifier. As a result, each layer works with features that are suited to the corresponding classification task. The feature selection technique that we use in this paper is the Chi-square test [6], as this technique has proven to be successful in earlier works. This test calculates for each feature a score that expresses its relevance with respect to the decision class, and then decides based on this score which features to retain.

Finally, we can also use different classifiers for the different sub-problems. There are two advantages related to this property. The first is that a classifier may be well suited for one classification problem but not for another. For instance, one classifier can work well for the second-layer problem, but not for the third-layer problem.
Secondly, some classifiers require a longer running time than others, and it might not always be feasible to apply them to each layer of the problem. However, it is possible to apply these more involved classifiers to the final layer of the classification problem only. We expect that using a more refined classifier for the final layer of our approach will improve classification results.

We propose two different multi-layer classifiers, as described in Table 1. The first classifier, which we refer to as MLC1, is a multi-layer classifier that uses the Naïve Bayes (NB [7]) classifier in each layer of the problem. The second classifier, called MLC2, uses NB in the first two coarse layers of the problem, and then uses a Support Vector Machine (SVM [8]) classifier for the final classification problem. We work with NB because it is a fast and simple model that has been shown to be effective in many real-world problems. The SVM classifier is more time-consuming, but it is generally more accurate. Therefore, we use it in the last layer of one of the multi-layer classifiers.

Table 1: The classifiers (NB or SVM) that are used in each layer of the two multi-layer classifiers.

Layer | MLC1 | MLC2
Predicting if patient will ever be readmitted | NB | NB
Predicting if patient will be readmitted within 60 days | NB | NB
Predicting if patient will be readmitted within 30 days | NB | SVM

Experimental Evaluation: Set-up

The dataset used to derive our readmission prediction model is provided by Multicare Health System (MHS). We are given a set of tables, each of which contains data related to the patients. Hospital encounters with a discharge diagnosis of CHF (primary or secondary) are considered as the potential index admission due to CHF. We only consider patients with a discharge diagnosis carrying one of the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9 CM) codes related to CHF, listed in Table 2.
Table 2: The ICD-9 CM codes for CHF

ICD-9 CM code | Description
 | Malignant hypertensive heart disease with heart failure
 | Benign hypertensive heart disease with heart failure
 | Unspecified hypertensive heart disease with heart failure
 | Malignant hypertensive heart and kidney disease with heart failure and with chronic kidney disease stage I through stage IV, or unspecified
 | Malignant hypertensive heart and kidney disease with heart failure and chronic kidney disease stage V or end stage renal disease
 | Benign hypertensive heart and kidney disease with heart failure and with chronic kidney disease stage I through stage IV, or unspecified
 | Benign hypertensive heart and kidney disease with heart failure and chronic kidney disease stage V or end stage renal disease
 | Unspecified hypertensive heart and kidney disease with heart failure and with chronic kidney disease stage I through stage IV, or unspecified
 | Unspecified hypertensive heart and kidney disease with heart failure and chronic kidney disease stage V or end stage renal disease
428.XX | Heart failure codes

Every patient is identified by a unique patient id, and each hospital encounter is uniquely identified by an admission id. Multiple admissions (i.e., readmissions) of the same patient can be identified by using the patient id. Our entity of observation is the individual CHF hospital encounter, and we consider only admissions after which the patient is discharged to home, in order to exclude inter-hospital transfers. Admissions that end in in-hospital death are not included in our analysis because we are interested in predicting readmissions. We calculate the days elapsed between the last discharge due to CHF and the next admission in order to identify if the readmission has occurred within 30 days.
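The days-elapsed calculation described above can be sketched as follows. This is a hedged illustration only: the record layout (`patient_id`, `admission_id`, `admit_date`, `discharge_date`), the function name `days_to_next_admission`, and the toy encounters with their dates are all assumptions, since the actual table schema is not given here. For simplicity the sketch also assumes every listed encounter already passed the cohort filters (CHF discharge diagnosis, discharged to home, no in-hospital death).

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional


def days_to_next_admission(encounters: List[dict]) -> Dict[str, Optional[int]]:
    """Map each admission id to the days between its discharge and the same
    patient's next admission, or None when there is no later admission."""
    by_patient = defaultdict(list)
    for enc in encounters:
        by_patient[enc["patient_id"]].append(enc)

    gaps: Dict[str, Optional[int]] = {}
    for visits in by_patient.values():
        # Order each patient's encounters chronologically by admission date.
        visits.sort(key=lambda e: e["admit_date"])
        for current, following in zip(visits, visits[1:]):
            gap = (following["admit_date"] - current["discharge_date"]).days
            gaps[current["admission_id"]] = gap
        # The most recent encounter has no subsequent admission on record.
        gaps[visits[-1]["admission_id"]] = None
    return gaps


# Hypothetical encounters; ids and dates are invented for illustration.
toy_encounters = [
    {"patient_id": "P1", "admission_id": "A2",
     "admit_date": date(2010, 1, 25), "discharge_date": date(2010, 1, 30)},
    {"patient_id": "P1", "admission_id": "A1",
     "admit_date": date(2010, 1, 1), "discharge_date": date(2010, 1, 5)},
    {"patient_id": "P1", "admission_id": "A3",
     "admit_date": date(2010, 6, 1), "discharge_date": date(2010, 6, 3)},
    {"patient_id": "P2", "admission_id": "B1",
     "admit_date": date(2010, 3, 1), "discharge_date": date(2010, 3, 4)},
]

gaps = days_to_next_admission(toy_encounters)
# Label used for the final task: readmitted within 30 days of discharge.
readmitted_within_30 = {aid: g is not None and g <= 30 for aid, g in gaps.items()}
```

In this toy example only encounter `A1` is followed by a readmission within 30 days (20 days after discharge). The same gap values could also supply the "ever readmitted" and "within 60 days" labels used by the earlier layers.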
The dataset consists of CHF hospitalizations for patients discharged since [...]. It provides information on 6348 patients diagnosed with CHF, and the number of hospital encounters generated by these patients during [...] is [...]. As mentioned earlier, various supporting tables are provided to give a complete understanding of the patients with heart failure and to identify the attributes to be used as predictor variables in modeling. A detailed description of some of the attributes is given in Table 3.

The key socio-demographic factors related to patients are gender, race and marital status. Other important factors pertinent to CHF are ejection fraction (the volumetric fraction of blood pumped out of the ventricle with each heartbeat), blood pressure, primary and secondary diagnosis, other comorbidity variables, the APR-DRG code (All Patient Refined Diagnosis Related Groups, a classification system that classifies patients according to reason of admission) for severity of illness, and the APR-DRG code for risk of mortality. Information about the discharge disposition of patients, such as the discharge status, discharge destination, length of stay and follow-up plans, is also found to be correlated with CHF readmissions. In addition, the 34 cardiovascular and comorbidity attributes [14] listed in Table 3 are used.

In our initial exploration we observed that ejection fraction has about 59% missing values, followed by the APR-DRG code for severity of illness (13.3%) and blood pressure (12.6%). We imputed the missing values of ejection fraction and removed the instances with other null values; the final dataset on which the model is built consists of 9770 instances.

Table 3: Description of different attributes

Variable | Type | Mean / No. of Domain Values
Age | Numeric | 69
Gender | Categorical | 2 (M, F)
Marital status | Categorical | 9, such as married, divorced
Ethnic group | Categorical | 9, such as Caucasian, Asian, African-American
Discharge follow-up plan | Categorical | 7, such as 2 days, 5 days
Discharge destination | Categorical | 70
Discharge status | Categorical | 15, such as discharged to home, discharged to rehab facility
Admit source | Categorical | 6, such as transfer from hospital, emergency room
Admit type | Categorical | 4, such as elective, emergency
Blood pressure | Categorical | 9
Ejection fraction value | Numeric |
Secondary diagnosis count | Numeric |
Discharge APR-DRG severity of illness | Categorical | 4, such as 1 (least severe), 2, 3, 4 (most severe)
Discharge APR-DRG risk of mortality | Categorical | 4, such as 1 (least severe), 2, 3, 4 (most severe)
Length of stay | Numeric | 5
IsHFPrimary | Categorical | 2 (Y, N)
Congestive heart failure | Categorical | 2 (0,1)
Acute coronary syndrome | Categorical | 2 (0,1)
Arrhythmias | Categorical | 2 (0,1)
Cardio-respiratory failure and shock | Categorical | 2 (0,1)
Valvular and rheumatic heart disease | Categorical | 2 (0,1)
Vascular or circulatory disease | Categorical | 2 (0,1)
Chronic atherosclerosis | Categorical | 2 (0,1)
Other and unspecified heart disease | Categorical | 2 (0,1)
Hemiplegia, paraplegia, paralysis, functional disability | Categorical | 2 (0,1)
Stroke | Categorical | 2 (0,1)
Renal failure | Categorical | 2 (0,1)
COPD | Categorical | 2 (0,1)
Diabetes and DM complications | Categorical | 2 (0,1)
Disorders of fluid/electrolyte/acid base | Categorical | 2 (0,1)
Other urinary tract disorders | Categorical | 2 (0,1)
Decubitus ulcer or chronic skin ulcer | Categorical | 2 (0,1)
Other gastrointestinal disorders | Categorical | 2 (0,1)
Peptic ulcer, hemorrhage, other specified gastrointestinal disorders | Categorical | 2 (0,1)
Severe hematological disorders | Categorical | 2 (0,1)
Nephritis | Categorical | 2 (0,1)
Dementia and senility | Categorical | 2 (0,1)
Metastatic cancer and acute leukemia | Categorical | 2 (0,1)
Cancer | Categorical | 2 (0,1)
Liver and biliary disease | Categorical | 2 (0,1)
End-stage renal disease or dialysis | Categorical | 2 (0,1)
Asthma | Categorical | 2 (0,1)
Iron deficiency and other/unspecified anemias and blood disease | Categorical | 2 (0,1)
Pneumonia | Categorical | 2 (0,1)
Drug/alcohol abuse/dependence/psychosis | Categorical | 2 (0,1)
Major psych disorders | Categorical | 2 (0,1)
Depression | Categorical | 2 (0,1)
Other psychiatric disorders | Categorical | 2 (0,1)
Fibrosis of lung and other chronic lung disorders | Categorical | 2 (0,1)
Protein-calorie malnutrition | Categorical | 2 (0,1)

We compare our model with two relevant baseline methods. Both baseline methods first apply the same feature selection method to the data as our model, namely Chi-square. After that, we use NB for one baseline and SVM for the other to classify the data. Both baseline methods use all the data to predict if a patient discharged from CHF will be readmitted within 30 days. Before running the algorithms on the data, we first impute missing values in the Ejection Fraction feature. We do this for the baseline methods as well as for our