Learning from Cardiac Arrhythmias to Improve Customer Satisfaction — Working with Imbalanced Data

Felipe Alonso-Atienza

d&a blog

Background

Customer satisfaction is a complex indicator. It can change after every interaction, can be influenced by external factors, and can lead to multiple outcomes. Cumulative dissatisfaction can lead a customer to end the relationship with the service provider, yet it is difficult to measure with traditional methods such as customer satisfaction surveys. Machine learning algorithms now make it possible to estimate customer satisfaction using data from different sources.
Recent advances in the treatment of life-threatening arrhythmias offer interesting solutions for estimating customer satisfaction. Heart arrhythmias are changes in the normal sequence of electrical impulses in the heart. Severe arrhythmias can cause cardiac arrest and death. The only effective way to treat these lethal arrhythmias is to apply a high-energy electric defibrillation shock using an automated external defibrillator (AED). An AED includes a shock advice algorithm that analyzes the electrocardiogram (ECG) and delivers an electric shock if lethal (so-called shockable) arrhythmias are detected. This algorithm processes real-time data from the heart and calculates the intensity of the electroshock needed to revert the arrhythmia.

The Machine Learning methodology applied in the detection of life-threatening arrhythmias can also be used to identify dissatisfied customers, using different types of data.

Analytical Framework and Data Sources

Developing an automatic shockable arrhythmia detector for AEDs using machine learning techniques requires working with imbalanced datasets, where standard classification algorithms are biased toward the majority class, which compromises their performance.
There are several ways to deal with imbalanced data in classification:

  • Data sampling, which either reduces the majority class (undersampling) or enlarges the minority class (by oversampling or by synthesizing new samples).
  • Cost-sensitive learning, which introduces a higher penalization cost for misclassifying minority-class samples.
  • Suitable assessment metrics. Accuracy should not be the metric that guides the algorithm's learning process; the balanced error rate (BER) or the F1-score should be used instead.
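
The first two strategies can be sketched in a few lines of plain Python. This is only an illustrative sketch on toy data, not the pipeline used in the arrhythmia study: the function names, the 97/3 class split, and the "balanced" weighting rule (number of samples divided by number of classes times class count) are assumptions made for the example.

```python
import random
from collections import Counter


def random_undersample(X, y, majority_label, seed=0):
    """Randomly drop majority-class samples until both classes have equal size.

    Assumes a binary problem in which `majority_label` really is the larger class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    kept = rng.sample(majority_idx, len(minority_idx))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_oversample(X, y, minority_label, seed=0):
    """Duplicate minority-class samples (with replacement) until classes are balanced."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority_idx)
    extra = rng.choices(minority_idx, k=n_majority - len(minority_idx))
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def balanced_class_weights(y):
    """Per-class misclassification cost: n_samples / (n_classes * class_count).

    Rare classes receive a larger weight, which is one simple way to set up
    cost-sensitive learning.
    """
    counts = Counter(y)
    n_samples = len(y)
    return {label: n_samples / (len(counts) * c) for label, c in counts.items()}


# Toy imbalanced dataset (hypothetical): 97 negatives, 3 positives.
y = [0] * 97 + [1] * 3
X = [[float(i)] for i in range(100)]

X_under, y_under = random_undersample(X, y, majority_label=0)
X_over, y_over = random_oversample(X, y, minority_label=1)
weights = balanced_class_weights(y)

print(Counter(y_under))  # 3 samples per class
print(Counter(y_over))   # 97 samples per class
print(weights)           # the minority class gets the larger weight
```

Undersampling discards information from the majority class, while oversampling by duplication adds no new information and can encourage overfitting, so in practice the choice between them (or class weights) is usually made by validation.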

In the case of arrhythmias, the BER metric can be used to set the free parameters of machine learning algorithms, for several reasons: i) it is defined as a trade-off between sensitivity (Se) and specificity (Sp),

BER = 1 – 0.5*(Se + Sp),

which are key metrics in medical diagnosis settings; ii) it is easy to compute; and iii) it yields good resulting performance. In addition, a higher penalization cost must be introduced for misclassifying the minority class.
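
To make the definition concrete, the following sketch computes the BER from predictions and uses it to pick a free parameter. It assumes a binary problem with both classes present; the decision threshold stands in for any free parameter, and the labels, scores, and candidate values are toy data invented for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a binary problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def balanced_error_rate(y_true, y_pred, positive=1):
    """BER = 1 - 0.5 * (Se + Sp); assumes both classes occur in y_true."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    se = tp / (tp + fn)  # sensitivity: fraction of positives detected
    sp = tn / (tn + fp)  # specificity: fraction of negatives detected
    return 1 - 0.5 * (se + sp)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_threshold(y_true, scores, candidates):
    """Pick the candidate threshold with the lowest BER (hypothetical helper)."""
    def ber_at(thr):
        y_pred = [1 if s >= thr else 0 for s in scores]
        return balanced_error_rate(y_true, y_pred)
    return min(candidates, key=ber_at)


# Toy imbalanced labels: 97 negatives, 3 positives.
y_true = [0] * 97 + [1] * 3

# A classifier that always predicts the majority class looks good on
# accuracy (0.97) but has the worst possible balanced error rate (0.5).
always_negative = [0] * 100
print(accuracy(y_true, always_negative))
print(balanced_error_rate(y_true, always_negative))

# Toy classifier scores, used to select a threshold by minimizing the BER.
scores = [0.1] * 90 + [0.6] * 7 + [0.4, 0.7, 0.8]
chosen = best_threshold(y_true, scores, candidates=[0.3, 0.5, 0.65, 0.9])
print(chosen)
```

In a real setting the same selection would be done on held-out or cross-validated predictions rather than on the training data.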

Results

By using the BER metric to assess the performance of machine learning algorithms, it is possible to i) provide a robust life-threatening arrhythmia detector; and ii) identify which ECG parameters are most important for the detection of shockable rhythms in AEDs.

At BBVA, this methodology is now being tested in other types of scenarios, such as Quality by Behavioral Analytics. The objective is to obtain a customer satisfaction score based on customer behavior. Input attributes include customer profile, sociodemographics, digital behavior, transactions performed, products owned, and claims.

To test the validity of the approach, we trained an attrition model, assuming that dissatisfied customers are those who leave the bank. This constitutes an imbalanced dataset, in which the positive class (customers who leave) accounts for only about 3% of the samples (p+1 ≈ 3%), and the methodology used for the study of arrhythmias showed promising results. We reached Se and Sp values higher than 80% and an area under the ROC curve of 0.91.