Notable Code: Telecom QoE Analytics

This document highlights key code sections that demonstrate the technical strengths and methodological patterns implemented in this data science portfolio project.

Overview

Telecom QoE Analytics is a comprehensive Data Science Practice project demonstrating end-to-end analytics capability from raw data profiling to strategic insights. The system implements a six-phase analytics pipeline with statistical rigor, SHAP interpretability, and production-ready ML models.


1. Six-Phase Analytics Pipeline

File: Notebook implementation
Lines: Pipeline stages

The project implements a structured six-phase pipeline: (1) Data Profiling & EDA, (2) Statistical Analysis & Causal Inference, (3) ML Regression, (4) ML Classification, (5) Unsupervised Learning & Anomaly Detection, (6) Executive Summary & Strategic Insights.

Why it's notable:
- Clear progression from raw data to business value
- Demonstrates end-to-end data science thinking
- Each phase builds on previous findings
- Prioritizes interpretability and actionability
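The phase progression can be sketched as an ordered pipeline in which each stage reads and extends a shared context, so later phases build on earlier findings. Everything below (the function names, the `context` dict, the placeholder return values) is a hypothetical illustration, not code from the project notebooks:

```python
# Minimal sketch of a six-phase analytics pipeline (hypothetical names).
# Each phase takes and returns a shared context dict, so each phase can
# build on the findings of the previous one.

def profile_data(ctx):          # Phase 1: Data Profiling & EDA
    ctx["profile"] = {"rows": len(ctx["data"])}
    return ctx

def statistical_analysis(ctx):  # Phase 2: ANOVA, effect sizes
    ctx["drivers"] = ["congestion"]          # placeholder finding
    return ctx

def train_regression(ctx):      # Phase 3: ML Regression
    ctx["regressor"] = "xgboost-model"       # placeholder artifact
    return ctx

def train_classifier(ctx):      # Phase 4: ML Classification
    ctx["classifier"] = "lightgbm-model"     # placeholder artifact
    return ctx

def detect_anomalies(ctx):      # Phase 5: Unsupervised / anomaly detection
    ctx["anomalies"] = []
    return ctx

def executive_summary(ctx):     # Phase 6: Strategic insights
    ctx["summary"] = f"Primary driver: {ctx['drivers'][0]}"
    return ctx

PIPELINE = [profile_data, statistical_analysis, train_regression,
            train_classifier, detect_anomalies, executive_summary]

def run_pipeline(data):
    ctx = {"data": data}
    for phase in PIPELINE:
        ctx = phase(ctx)
    return ctx

result = run_pipeline([1, 2, 3])
```

The ordered list of phase functions makes the progression explicit and keeps each phase independently testable.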


2. Statistical Rigor with ANOVA and Effect Size

File: Statistical analysis notebooks
Lines: ANOVA and Cohen's d calculations

The system pairs hypothesis testing (one-way ANOVA) with effect-size analysis (Cohen's d), moving beyond raw correlations toward evidence of which factors actually drive QoE.

Why it's notable:
- ANOVA confirms QoE differences between segments
- Effect size (Cohen's d) quantifies impact magnitude
- Identified congestion as primary driver (d=-2.75)
- Provides statistical evidence for recommendations
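Both statistics can be computed from first principles with NumPy. The project notebooks may well use `scipy.stats` instead; this self-contained sketch (with made-up sample data) only shows the underlying math:

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across k groups."""
    all_x = np.concatenate(groups)
    grand_mean = all_x.mean()
    k, n = len(groups), len(all_x)
    # Between-group sum of squares: variation of group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: variation inside each group
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using the pooled std."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Illustrative (fabricated) QoE scores for congested vs. uncongested cells
congested = np.array([2.1, 2.4, 2.0, 2.3])
clear     = np.array([4.0, 4.2, 3.9, 4.1])
f_stat = one_way_anova_f(congested, clear)   # large F => group means differ
d = cohens_d(congested, clear)               # negative d => congestion lowers QoE
```

A negative d of large magnitude, as in the reported d = -2.75, indicates congestion depresses QoE by far more than the within-group noise.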


3. SHAP Interpretability

File: Model interpretability notebooks
Lines: SHAP calculations and visualizations

The system uses SHAP (SHapley Additive exPlanations) for game-theoretic feature attribution, showing that congestion, rather than signal strength, is the primary QoE driver.

Why it's notable:
- Game-theoretic guarantees of consistency
- Shows that congestion, not signal strength, is the primary driver
- Provides explainability for business stakeholders
- Directly influences strategic recommendations
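The game-theoretic idea behind SHAP can be illustrated by computing exact Shapley values on a toy model: average each feature's marginal contribution over all coalitions, with absent features replaced by a background (mean) value. This brute-force sketch is purely illustrative; the project itself would use the `shap` library, and the toy weights below are invented:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values by enumerating all feature coalitions.

    predict: callable on a 1-D feature vector
    x: the instance to explain
    background: reference values substituted for 'absent' features
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight of a coalition S: |S|! (n-|S|-1)! / n!
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i, without_i = background.copy(), background.copy()
                for j in subset:
                    with_i[j] = x[j]
                    without_i[j] = x[j]
                with_i[i] = x[i]
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi

# Toy linear "QoE model" (invented weights): congestion hurts QoE more
# than signal strength helps it.
weights = np.array([-2.0, 0.5])            # [congestion, signal_strength]
predict = lambda v: float(v @ weights)
x = np.array([1.0, 1.0])                    # instance to explain
background = np.array([0.0, 0.0])           # assumed dataset mean
phi = shapley_values(predict, x, background)
```

The additivity guarantee holds by construction: the attributions sum exactly to the gap between the model's prediction for `x` and its prediction for the background.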


4. Production-Ready Models

File: Model training notebooks
Lines: XGBoost and LightGBM implementations

The system achieves near-perfect performance with XGBoost (R2=0.9997) and LightGBM (ROC-AUC=1.00) while handling class imbalance.

Why it's notable:
- XGBoost Regression: Test MAE 0.0097, R2 0.9997
- LightGBM Classification: ROC-AUC 1.00, Precision 0.97, Recall 1.00
- Class imbalance handling for minority 'Low QoE' class
- Models ready for CEM dashboard deployment
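A common way to handle a minority 'Low QoE' class in LightGBM is to up-weight positives by the negative/positive ratio via `scale_pos_weight`. The snippet below sketches that computation with fabricated labels; the surrounding parameter values are illustrative defaults, not the project's tuned settings:

```python
import numpy as np

# Hypothetical labels: 1 = 'Low QoE' (rare positive class), 0 = acceptable QoE.
y = np.array([0] * 950 + [1] * 50)

# Up-weight the minority class by the negative/positive ratio.
scale_pos_weight = (y == 0).sum() / (y == 1).sum()   # 950 / 50 = 19.0

params = {
    "objective": "binary",
    "metric": "auc",
    "scale_pos_weight": scale_pos_weight,  # LightGBM's built-in imbalance knob
    "learning_rate": 0.05,                 # illustrative, not tuned, values
    "num_leaves": 31,
}
# Training would then look like:
#   booster = lightgbm.train(params, lightgbm.Dataset(X_train, label=y_train))
```

Weighting the loss this way pushes the classifier toward high recall on the rare 'Low QoE' class, consistent with the recall of 1.00 reported above.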


Architecture Highlights

Six-Phase Pipeline

  1. Data Profiling: Schema validation and QoE distribution
  2. Statistical Analysis: ANOVA and effect size (Cohen's d)
  3. ML Regression: XGBoost with Optuna tuning
  4. ML Classification: LightGBM with class imbalance handling
  5. Anomaly Detection: STL decomposition and Isolation Forest
  6. Executive Summary: Strategic recommendations
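The anomaly-detection phase (stage 5 above) follows a decompose-then-score pattern. A drastically simplified NumPy analogue is shown below: remove the trend with a centered rolling median, then flag residuals beyond three standard deviations. The real pipeline uses STL decomposition and Isolation Forest; this sketch, with synthetic data, only conveys the idea:

```python
import numpy as np

def flag_anomalies(series, window=11, n_sigmas=3.0):
    """Detrend with a centered rolling median, then flag large residuals."""
    half = window // 2
    padded = np.pad(series, half, mode="edge")
    trend = np.array([np.median(padded[i:i + window])
                      for i in range(len(series))])
    residual = series - trend
    threshold = n_sigmas * residual.std()
    return np.where(np.abs(residual) > threshold)[0]

# Synthetic hourly QoE score: flat baseline with one congestion-like drop.
qoe = np.full(100, 4.0)
qoe[60] = 1.0                      # sudden degradation event
anomalies = flag_anomalies(qoe)    # -> index 60 flagged
```

The rolling median plays the role of STL's trend component, and the residual threshold stands in for Isolation Forest's outlier score.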

Design Patterns Used

  1. Structured Pipeline Pattern: Clear phase progression
  2. Statistical Rigor Pattern: Hypothesis testing and effect size
  3. Interpretability Pattern: SHAP for model explainability
  4. Business Translation Pattern: Technical findings to strategic recommendations

Technical Strengths Demonstrated