This document highlights key code sections that demonstrate the technical strengths and methodological patterns implemented in this data science portfolio project.
Telecom QoE Analytics is a comprehensive Data Science Practice project demonstrating end-to-end analytics capability from raw data profiling to strategic insights. The system implements a six-phase analytics pipeline with statistical rigor, SHAP interpretability, and production-ready ML models.
File: Notebook implementation
Lines: Pipeline stages
The project implements a structured six-phase pipeline: (1) Data Profiling & EDA, (2) Statistical Analysis & Causal Inference, (3) ML Regression, (4) ML Classification, (5) Unsupervised Learning & Anomaly Detection, (6) Executive Summary & Strategic Insights.
Why it's notable:
- Clear progression from raw data to business value
- Demonstrates end-to-end data science thinking
- Each phase builds on previous findings
- Prioritizes interpretability and actionability
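The phase structure above can be sketched as a simple orchestration function. All function names here are hypothetical stand-ins for the project's notebook stages; each stub returns a findings dict that later phases consume, mirroring how each phase builds on previous findings.

```python
# Hypothetical orchestration of the six-phase pipeline; the stubs stand in
# for the project's notebook stages and are NOT the actual implementation.
def profile_and_eda(df):             return {"n_rows": len(df)}
def statistical_analysis(df, eda):   return {"primary_driver": "congestion"}
def train_regression(df, stats):     return {"model": "regressor"}
def train_classification(df, stats): return {"model": "classifier"}
def unsupervised_analysis(df):       return {"segments": 3}
def executive_summary(*findings):    return {"phases_completed": len(findings)}

def run_pipeline(df):
    eda = profile_and_eda(df)                  # 1. Data Profiling & EDA
    stats = statistical_analysis(df, eda)      # 2. Statistical Analysis
    reg = train_regression(df, stats)          # 3. ML Regression
    clf = train_classification(df, stats)      # 4. ML Classification
    seg = unsupervised_analysis(df)            # 5. Unsupervised & Anomalies
    return executive_summary(eda, stats, reg, clf, seg)  # 6. Exec Summary

summary = run_pipeline([{}] * 10)  # stand-in dataset
```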
File: Statistical analysis notebooks
Lines: ANOVA and Cohen's d calculations
The system implements hypothesis testing (one-way ANOVA) and effect-size analysis (Cohen's d) to move beyond raw correlations toward statistically grounded causal claims.
Why it's notable:
- ANOVA confirms QoE differences between segments
- Effect size (Cohen's d) quantifies impact magnitude
- Identified congestion as primary driver (d=-2.75)
- Provides statistical evidence for recommendations
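A minimal sketch of this analysis pattern, using synthetic data rather than the project's dataset: one-way ANOVA via `scipy.stats.f_oneway` tests whether mean QoE differs between congestion segments, and a pooled-standard-deviation Cohen's d quantifies the magnitude. The segment means and spreads below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
# Synthetic QoE scores for two congestion segments (illustrative values,
# not the project's data).
qoe_low_congestion = rng.normal(loc=4.2, scale=0.3, size=200)
qoe_high_congestion = rng.normal(loc=3.0, scale=0.4, size=200)

# One-way ANOVA: does mean QoE differ between segments?
f_stat, p_value = f_oneway(qoe_low_congestion, qoe_high_congestion)

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Negative d: higher congestion is associated with lower QoE.
d = cohens_d(qoe_high_congestion, qoe_low_congestion)
print(f"F={f_stat:.1f}, p={p_value:.2e}, d={d:.2f}")
```

A large-magnitude d (conventionally |d| > 0.8) is what justifies calling a driver "primary" rather than merely statistically significant.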
File: Model interpretability notebooks
Lines: SHAP calculations and visualizations
The system uses SHAP (SHapley Additive exPlanations) for game-theoretic feature attribution, showing that congestion, rather than signal strength, is the primary QoE driver.
Why it's notable:
- Game-theoretic guarantees of consistency
- Shows congestion (not signal strength) is the primary driver
- Provides explainability for business stakeholders
- Directly influences strategic recommendations
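The game-theoretic idea behind SHAP can be illustrated without the `shap` library by computing exact Shapley values by brute-force coalition enumeration on a tiny hypothetical linear model (the project itself applies SHAP to its trained tree models; the feature names and weights below are assumptions for illustration only).

```python
from itertools import combinations
from math import factorial

import numpy as np

# Hypothetical 3-feature instance and linear "QoE model" for illustration.
features = ["congestion", "signal_strength", "latency"]
x = np.array([0.9, -0.2, 0.4])          # instance to explain
baseline = np.zeros(3)                   # reference ("feature missing") values
weights = np.array([-2.0, 0.5, -0.8])    # assumed linear model coefficients

def model(v):
    return float(weights @ v)

def shapley(i):
    """Exact Shapley value: weighted average marginal contribution of
    feature i over every coalition of the remaining features."""
    others = [j for j in range(len(x)) if j != i]
    n = len(x)
    total = 0.0
    for size in range(n):
        for coalition in combinations(others, size):
            with_i, without_i = baseline.copy(), baseline.copy()
            for j in coalition:
                with_i[j] = without_i[j] = x[j]
            with_i[i] = x[i]
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += w * (model(with_i) - model(without_i))
    return total

phi = [shapley(i) for i in range(len(x))]
# Efficiency property: attributions sum to model(x) - model(baseline).
```

The consistency and efficiency guarantees mentioned above follow from this formulation: the attributions always sum exactly to the gap between the model's output and the baseline output.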
File: Model training notebooks
Lines: XGBoost and LightGBM implementations
The system achieves near-perfect performance with XGBoost (R² = 0.9997) and LightGBM (ROC-AUC = 1.00) while handling class imbalance.
Why it's notable:
- XGBoost Regression: Test MAE 0.0097, R² 0.9997
- LightGBM Classification: ROC-AUC 1.00, Precision 0.97, Recall 1.00
- Class imbalance handling for minority 'Low QoE' class
- Models ready for CEM dashboard deployment
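The imbalance-handling pattern can be sketched as follows. This uses scikit-learn's gradient boosting and a synthetic imbalanced dataset so the example runs without extra packages; the project itself uses XGBoost/LightGBM, which expose the same idea through `scale_pos_weight` / `class_weight`. The dataset parameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in for the QoE task: the 10% minority class plays the
# role of 'Low QoE' (illustrative data, not the project's dataset).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Reweight training samples so the minority class is not ignored
# (equivalent in spirit to LightGBM's class_weight / scale_pos_weight).
sw = compute_sample_weight("balanced", y_tr)
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_tr, y_tr, sample_weight=sw)

# ROC-AUC is threshold-independent, so it remains informative even when
# the positive class is rare.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Evaluating with ROC-AUC plus per-class precision/recall, as the bullets above report, is the standard way to verify that reweighting actually recovered minority-class recall.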