Data Science • Project

Telecom QoE Analytics

End-to-End Analytics Pipeline from EDA to Strategic Insights

XGBoost · LightGBM · SHAP · Isolation Forest · STL Decomposition · Optuna · Pandas · Seaborn
2024 - 2025
Python 3.11
Data Science Portfolio

Overview

Telecom QoE Analytics is a comprehensive data science practice project built on a synthetic telecom digital-twin dataset. It demonstrates end-to-end analytics capability, from raw data profiling and rigorous statistical testing to advanced machine learning modeling and strategic troubleshooting, focused on improving Quality of Experience (QoE) in telecommunications networks.

The project implements a six-phase analytics pipeline: (1) Data Profiling & EDA, (2) Statistical Analysis & Causal Inference, (3) ML Regression for QoE Prediction, (4) ML Classification for Degradation Prediction, (5) Unsupervised Learning & Anomaly Detection, (6) Executive Summary & Strategic Insights. All phases prioritize interpretability and actionability over theoretical complexity.

Key findings: cell congestion has a massive effect size on QoE (Cohen's d = -2.12), far outweighing other metrics; XGBoost achieved strong R² performance (0.7247) for QoE prediction; LightGBM achieved a high ROC-AUC (0.9645) for degradation classification with excellent recall (0.92); and anomalies cluster around the 5 PM busy hour, suggesting a peak-load correlation.

Key Achievement: Built a comprehensive six-phase analytics pipeline with statistical rigor, SHAP interpretability, and high-performance ML models, demonstrating end-to-end data science methodology.

Key Metrics & Results

  • 6-Phase Analytics Pipeline
  • Statistical Rigor
  • SHAP Interpretability
  • XGBoost & LightGBM Models

Problem Statement

Telecom operators need to understand the drivers of Quality of Experience (QoE) degradation in order to prioritize network investments. Traditional analytics surface correlations but lack causal inference; ML models need interpretability for field engineers; and anomaly detection must balance recall against precision for SLA compliance.

Business Context

Network operations require actionable insights, not just model predictions. Field engineers need to understand why cells are degraded (feature importance). False negatives (missing outages) are more costly than false positives (false alarms). Strategic recommendations must translate technical findings to business value.

Solution Architecture

A six-phase structured pipeline: (1) Data Profiling with schema validation and QoE distribution analysis, (2) Statistical Analysis with ANOVA and effect size (Cohen's d), (3) ML Regression using XGBoost with Optuna tuning, (4) ML Classification using LightGBM with class imbalance handling, (5) Unsupervised Learning with STL decomposition and Isolation Forest, (6) Executive Summary translating findings to strategic recommendations.

System Components

Statistical Rigor Phase

Hypothesis testing (ANOVA) confirms QoE differences between network segments, and effect size analysis (Cohen's d) quantifies impact magnitude, identifying congestion as the primary driver (d=-2.12).
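As a minimal sketch of this phase (on synthetic stand-in samples, since the real dataset isn't reproduced here), the ANOVA test and Cohen's d computation look like:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical QoE scores for congested vs. non-congested cells;
# the means/spreads are illustrative, not the project's actual data.
qoe_normal = rng.normal(4.0, 0.5, 500)
qoe_congested = rng.normal(2.9, 0.5, 500)

# One-way ANOVA: does mean QoE differ between the two segments?
f_stat, p_value = stats.f_oneway(qoe_normal, qoe_congested)

def cohens_d(a, b):
    """Effect size: mean difference scaled by the pooled standard deviation."""
    pooled_sd = np.sqrt(
        ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
        / (len(a) + len(b) - 2)
    )
    return (a.mean() - b.mean()) / pooled_sd

d = cohens_d(qoe_congested, qoe_normal)  # negative: congestion lowers QoE
print(f"ANOVA p={p_value:.2e}, Cohen's d={d:.2f}")
```

By Cohen's conventions, |d| ≥ 0.8 is already a "large" effect, so a d near -2 dwarfs typical drivers.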

ML Regression Pipeline

XGBoost Regressor tuned with Optuna. Achieved R²=0.7247, MAE=0.3672, RMSE=0.4560 on test set. Feature importance identifies latency and congestion as top predictors.

ML Classification Pipeline

LightGBM classifier with class-imbalance handling. Achieved ROC-AUC=0.9645 with precision=0.46 and recall=0.92 for the minority 'Low QoE' class; high recall enables proactive intervention. Serves as the engine for a Customer Experience Management (CEM) dashboard.

Anomaly Detection System

STL Decomposition for trend/seasonality removal, followed by Isolation Forest. Successfully isolated anomalies (~5% of data) clustering around 5 PM busy hour.

SHAP Interpretability

Game-theoretic feature attribution showing that congestion, not just signal strength, is the primary QoE driver. Provides explainability for business stakeholders.

Technology Stack Rationale

XGBoost and LightGBM were chosen over deep learning for their strength on tabular data and superior interpretability. SHAP provides consistent feature attribution, unlike biased gain-based importance metrics. Isolation Forest handles multivariate anomalies, STL decomposition removes temporal patterns, and Optuna automates hyperparameter tuning.


Challenges & Solutions

Challenge 1

Moving beyond correlation to causal inference

Solution

Implemented ANOVA hypothesis testing and effect size analysis (Cohen's d). Quantified impact magnitude: congestion has d=-2.12, far outweighing other factors, strong evidence that congestion is the primary driver rather than a mere correlate.

Challenge 2

Ensuring ML model interpretability for business stakeholders

Solution

Adopted SHAP for feature attribution, which provides game-theoretic consistency guarantees. Showed that congestion, not just signal strength, is the primary QoE driver, directly informing the backhaul expansion recommendation.

Challenge 3

Handling class imbalance in degradation prediction

Solution

Used LightGBM with class_weight parameter. Tuned threshold to maximize recall (sensitivity) for minority 'Low QoE' class. Achieved strong ROC-AUC performance with high recall.
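The threshold-tuning step can be sketched with scikit-learn's precision-recall curve, here on hypothetical predicted probabilities: pick the highest threshold that still meets a recall floor for the minority class:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted 'Low QoE' probabilities,
# standing in for the classifier's test-set output.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)
y_proba = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, 1000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_proba)

# recall is non-increasing as the threshold rises; take the last (highest)
# threshold whose recall still meets the floor, maximizing precision given
# that missed outages cost more than false alarms.
target_recall = 0.90
ok = np.where(recall[:-1] >= target_recall)[0]
best_threshold = thresholds[ok[-1]]
achieved_recall = recall[ok[-1]]
print(f"threshold={best_threshold:.2f}, recall={achieved_recall:.2f}")
```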

Challenge 4

Defining 'normal' in highly dynamic networks

Solution

Used STL decomposition to remove trend/seasonality from time-series. Applied Isolation Forest on residuals. Successfully isolated anomalies (~5%) clustering around 5 PM busy hour.

Results & Impact

Identified congestion as primary QoE driver (effect size d=-2.12). Achieved strong ML performance on synthetic dataset: XGBoost R²=0.7247, LightGBM ROC-AUC=0.9645. Generated strategic recommendations: prioritize backhaul expansion, optimize latency, deploy proactive alerts. Models demonstrate capability for CEM dashboard deployment.

Model Performance

  • XGBoost Regression: R²=0.7247, MAE=0.3672, RMSE=0.4560 on test set
  • LightGBM Classification: ROC-AUC=0.9645 with precision=0.46, recall=0.92 for Low QoE
  • Anomaly Detection: Successfully isolated ~5% anomalies clustering at 5 PM
  • Effect Size Analysis: Congestion has d=-2.12 (massive impact)
  • SHAP Interpretability: Proved congestion is primary driver, not signal strength

