My Projects
Employee Attrition Risk Assessment
Objective: Developed a predictive model to assess employee attrition risk across departments and identify key factors contributing to turnover.
Dataset Preparation:
- Analyzed and preprocessed a dataset of 10,000+ employee records, including demographics, job performance metrics, and working hours.
- Addressed missing values, normalized continuous variables, and encoded categorical features for model training.
- Conducted exploratory data analysis (EDA) to uncover patterns and correlations in employee turnover.
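The preprocessing steps above can be sketched as follows. This is a minimal, standard-library illustration, not the project's actual pipeline: the records, the column names (monthly_hours, department), and the choice of mean imputation are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical toy records; None marks a missing value.
records = [
    {"monthly_hours": 160.0, "department": "sales"},
    {"monthly_hours": None, "department": "support"},
    {"monthly_hours": 240.0, "department": "sales"},
]

def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Scale to zero mean and unit variance (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, sorted by category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

hours = standardize(impute_mean([r["monthly_hours"] for r in records]))
categories, dept_encoded = one_hot([r["department"] for r in records])

# One feature row per employee: scaled hours followed by department indicators.
features = [[h] + row for h, row in zip(hours, dept_encoded)]
```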
Model:
- Built a Logistic Regression model to predict the likelihood of employee attrition.
- Applied feature engineering to identify significant predictors such as underutilization (working fewer than 250 hours/month).
- Optimized the model with a learning rate of 0.01 and L2 regularization to prevent overfitting.
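A learning rate together with L2 regularization suggests gradient-based training, which can be sketched as below. Only the learning rate of 0.01 and the use of an L2 penalty come from the project description; batch gradient descent, the penalty strength (l2=0.1), the epoch count, and the toy data are assumptions for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.01, l2=0.1, epochs=2000):
    """Batch gradient descent on L2-penalised log loss. The bias is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Mean log-loss gradient plus the L2 shrinkage term.
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the positive class (attrition)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: one standardised feature, label 1 = left the company.
X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
```

The L2 term shrinks each weight toward zero on every step, which keeps the weights finite even on separable data like this toy set.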
Results:
- Achieved 85% accuracy on the validation dataset, and tracked precision and recall for the high-risk employee class.
- Identified actionable insights, such as the impact of workload imbalance and job satisfaction, to improve retention strategies.
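Precision and recall for the high-risk class depend on the decision threshold, as the sketch below shows. The labels, probabilities, and thresholds are made-up values for illustration and are not project results.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical validation labels (1 = high attrition risk) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.45]

default_pr = precision_recall(y_true, classify(probs, 0.5))
lowered_pr = precision_recall(y_true, classify(probs, 0.35))
```

Lowering the threshold in this toy set catches every high-risk case at the cost of more false alarms, which is the usual trade-off when recall on the positive class matters most.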
Skills & Tools: Python, Logistic Regression, Feature Engineering, Tableau, Power BI, Statistical Analysis, ETL Workflows.

Speech Recognition Using Hidden Markov Models (HMM)
Objective: Designed a speech recognition system to process and decode speech signals using Hidden Markov Models, enabling efficient recognition of spoken commands.
Dataset Preparation:
- Collected and preprocessed a dataset of audio files, including noise filtering and segmentation into phonemes.
- Extracted Mel-frequency cepstral coefficients (MFCCs) as input features for acoustic modeling.
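MFCC extraction relies on a filterbank spaced evenly on the mel scale. The sketch below shows only that spacing step, using one common mel formula (2595 * log10(1 + f/700)); the filter count and frequency range are assumptions, and a full MFCC pipeline (framing, FFT, log energies, DCT) is omitted.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one widely used convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters triangular filters, evenly spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Hypothetical settings: 10 filters covering 0 to 8000 Hz.
centers = mel_center_frequencies(n_filters=10, f_min=0.0, f_max=8000.0)
```

Because the spacing is uniform in mels, the centers are packed closely at low frequencies and spread out at high frequencies.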
Model:
- Implemented a Hidden Markov Model (HMM) for acoustic sequence modeling and speech decoding.
- Trained the HMM with supervised learning using labeled speech data to map audio features to text.
- Fine-tuned the model for improved recognition of short phrases and isolated words.
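One common way to recognize isolated words with HMMs is to keep one model per word and pick the model that gives the observation sequence the highest likelihood. The sketch below assumes that setup, and it assumes discrete observation symbols (for example, vector-quantized MFCC frames); the two word models and their parameters are hypothetical and hand-set, not trained.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-emission HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescaling each step avoids underflow on long sequences.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

def recognize(obs, models):
    """Return the word whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda word: log_likelihood(obs, *models[word]))

# Hypothetical left-to-right, two-state models over symbols {0, 1}.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.8, 0.2]]),
}

word = recognize([0, 0, 1, 1], models)
```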
Results:
-
Achieved 90% accuracy in recognizing isolated words and 80% accuracy for short phrases on the test set.
-
Demonstrated real-time recognition capability with minimal latency for basic spoken commands.
Skills & Tools: Python, Hidden Markov Models, MFCC, Speech Signal Processing, NumPy, SciPy, Acoustic Modeling.

Disease Prediction: Leveraging Support Vector Machines (SVM) and K-Nearest Neighbors (KNN)
Objective: Developed predictive models to classify disease outcomes based on patients' symptoms, demographics, and health indicators, aiding in accurate diagnosis and personalized treatment planning.
Dataset Preparation:
- Processed and cleaned a dataset of patient records, including handling missing data, normalizing features, and encoding categorical variables.
- Conducted exploratory data analysis (EDA) to identify key features influencing disease outcomes.
Model:
- Trained a Support Vector Classification (SVC) model to separate classes using hyperplane-based decision boundaries.
- Implemented a K-Nearest Neighbors (KNN) algorithm for proximity-based classification, leveraging neighborhood patterns for prediction.
- Tuned hyperparameters for both models (e.g., kernel type for SVC, number of neighbors for KNN) to maximize accuracy.
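The KNN half of this work can be sketched in plain Python as below; the SVC half is not shown. Choosing the number of neighbors by leave-one-out accuracy is an assumption made for the example (the project lists Scikit-learn, where a grid search with cross-validation would be the more typical route), and the two-cluster data is invented.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

def best_k(X, y, candidates):
    """Candidate k with the highest leave-one-out accuracy (first one wins ties)."""
    return max(candidates, key=lambda k: loo_accuracy(X, y, k))

# Hypothetical normalised features for two well-separated outcome classes.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
     (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

chosen_k = best_k(X, y, candidates=[1, 3, 7])
```

With k=7 every held-out point is outvoted by the other class, which shows why too large a neighborhood can hurt on small datasets.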
Results:
- Achieved 85% accuracy using SVC and 82% accuracy with KNN on the test dataset.
- Demonstrated improved precision and recall for high-risk disease categories, enhancing clinical decision-making reliability.
Skills & Tools: Python, Scikit-learn, SVM, KNN, Data Preprocessing, EDA, Hyperparameter Tuning, Statistical Analysis.

Customer Attrition Analysis
Objective: Built a data-driven solution to analyze and predict customer attrition, enabling businesses to identify and retain at-risk customers.
Dataset Preparation:
- Collected and processed a dataset of customer transactional records, demographics, and interaction logs.
- Designed and implemented ETL (Extract, Transform, Load) pipelines to clean, transform, and store data in a structured format for seamless machine learning integration.
- Addressed data inconsistencies, handled missing values, and ensured data quality through rigorous validation techniques.
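An extract-transform-load flow of this kind can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the CSV layout, the customers table, the use of SQLite, and the cleaning rules (skip duplicate IDs and malformed dates, fill missing spend with 0.0).

```python
import csv
import io
import sqlite3
from datetime import date

# Hypothetical raw export; column names and values are illustrative.
RAW_CSV = """customer_id,signup_date,monthly_spend
1,2023-01-15,49.90
2,2023-02-01,
2,2023-02-01,
3,not-a-date,20.00
4,2023-03-10,75.50
"""

def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop duplicate IDs and malformed dates; fill missing spend with 0.0."""
    seen, clean = set(), []
    for row in rows:
        cid = int(row["customer_id"])
        try:
            date.fromisoformat(row["signup_date"])
        except ValueError:
            continue  # validation failure: skip the row
        if cid in seen:
            continue
        seen.add(cid)
        spend = float(row["monthly_spend"]) if row["monthly_spend"] else 0.0
        clean.append((cid, row["signup_date"], spend))
    return clean

def load(rows, conn):
    """Write cleaned rows into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER PRIMARY KEY, "
        "signup_date TEXT NOT NULL, "
        "monthly_spend REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```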
Model:
- Conducted exploratory data analysis (EDA) to identify significant features influencing customer retention and attrition.
- Utilized logistic regression and decision tree models to predict churn probabilities, achieving high accuracy on validation data.
- Performed feature importance analysis to provide actionable insights for improving customer retention strategies.
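Decision trees rank features by how much a split on each one reduces impurity. The sketch below is a simplified, single-split version of that idea using Gini impurity; it is not the project's analysis, and the feature names (support_calls, tenure_months) and values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split_gain(values, labels):
    """Largest Gini impurity decrease over all thresholds on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

# Hypothetical customers: label 1 = churned.
churned = [0, 0, 0, 0, 1, 1, 1, 1]
support_calls = [0, 1, 1, 2, 5, 6, 7, 8]
tenure_months = [3, 30, 12, 24, 6, 28, 10, 20]

gains = {
    "support_calls": best_split_gain(support_calls, churned),
    "tenure_months": best_split_gain(tenure_months, churned),
}
# Features ordered from most to least informative under this measure.
ranking = sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

A full tree sums these impurity decreases over every node where a feature is used, so real importances can differ from this one-split view.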
Results:
- Optimized database queries and storage, reducing data retrieval time by 40%.
- Delivered a predictive model that identified at-risk customers with 85% accuracy, empowering stakeholders to develop targeted retention campaigns.
Skills & Tools: Python, SQL, ETL Pipelines, Data Preprocessing, EDA, Relational Databases, Machine Learning, Tableau, Power BI.