My Projects

Employee Attrition Risk Assessment

Objective: Developed a predictive model to assess employee attrition risk across departments and identify key factors contributing to turnover.

 

Dataset Preparation:

  • Analyzed and preprocessed a dataset of 10,000+ employee records, including demographics, job performance metrics, and working hours.

  • Addressed missing values, normalized continuous variables, and encoded categorical features for model training.

  • Conducted exploratory data analysis (EDA) to uncover patterns and correlations in employee turnover.
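
The preparation steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the records, the column names (age, monthly_hours, department), and the choice of mean imputation and z-score scaling are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "monthly_hours": 160.0, "department": "sales"},
    {"age": None, "monthly_hours": 240.0, "department": "engineering"},
    {"age": 45, "monthly_hours": None, "department": "sales"},
    {"age": 29, "monthly_hours": 200.0, "department": "support"},
]


def impute_mean(rows, column):
    """Replace missing values in a numeric column with the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill


def standardize(rows, column):
    """Rescale a numeric column to zero mean and unit variance (z-score)."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[column] = (r[column] - mu) / sigma if sigma else 0.0


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator per category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for c in categories:
            r[f"{column}_{c}"] = 1 if value == c else 0


for col in ("age", "monthly_hours"):
    impute_mean(records, col)
    standardize(records, col)
one_hot(records, "department")
```

In a real workflow the imputation and scaling statistics would normally be computed on the training split only and then reused on validation data, so that no information leaks across the split.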

Model:

  • Built a Logistic Regression model to predict the likelihood of employee attrition.

  • Engineered features and identified significant predictors such as underutilization (working fewer than 250 hours per month).

  • Trained the model with a learning rate of 0.01 and applied L2 regularization to reduce overfitting.
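
A logistic regression trained with a learning rate of 0.01 and an L2 penalty might look like the sketch below. Only the learning rate and the 250-hour threshold come from the description above. The batch gradient descent loop, the penalty strength (l2=0.1), the epoch count, and the toy data are assumptions chosen for illustration.

```python
import math

UNDERUTILIZED_HOURS = 250  # threshold quoted in the project description


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Predicted attrition probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, learning_rate=0.01, l2=0.1, epochs=2000):
    """Batch gradient descent on log loss with an L2 penalty on the weights.

    l2 and epochs are assumed values, not taken from the project.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The L2 term shrinks each weight toward zero; the bias is not penalized.
            w[j] -= learning_rate * (grad_w[j] / n + l2 * w[j])
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical toy data: monthly hours and whether the employee left.
monthly_hours = [150, 180, 200, 230, 240, 260, 270, 280, 300, 310]
left_company = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

# Engineered feature: 1.0 when the employee is below the hours threshold.
X = [[1.0 if h < UNDERUTILIZED_HOURS else 0.0] for h in monthly_hours]
weights, bias = train_logistic(X, left_company)

risk_under = predict_proba(weights, bias, [1.0])
risk_other = predict_proba(weights, bias, [0.0])
```

On this toy data the underutilization flag receives a positive weight, so risk_under comes out higher than risk_other.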

Results:

  • Achieved 85% accuracy on the validation dataset and evaluated precision and recall with a focus on detecting high-risk employees.

  • Identified actionable insights, such as the impact of workload imbalance and job satisfaction, to improve retention strategies.
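
Precision and recall for the high-risk class can be computed as in the sketch below. The description does not say how the metrics were tuned, so the idea of lowering the decision threshold to catch more at-risk employees is an assumption, and the labels and probabilities are made up.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels at a given threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (high-risk) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical validation labels and model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.4, 0.35, 0.8, 0.6, 0.1, 0.45, 0.2]

default_metrics = precision_recall(y_true, classify(probs, threshold=0.5))
lenient_metrics = precision_recall(y_true, classify(probs, threshold=0.3))
```

With these numbers, lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 1.0 while precision stays at 2/3, which illustrates why accuracy alone is not enough when missing a high-risk employee is costly.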

Skills & Tools: Python, Logistic Regression, Feature Engineering, Tableau, Power BI, Statistical Analysis, ETL Workflows.

Speech Recognition Using Hidden Markov Models (HMM)

Objective: Designed a speech recognition system to process and decode speech signals using Hidden Markov Models, enabling efficient recognition of spoken commands.

Dataset Preparation:

  • Collected and preprocessed a dataset of audio files, including noise filtering and segmentation into phonemes.

  • Extracted Mel-frequency cepstral coefficients (MFCCs) as input features for acoustic modeling.
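
MFCC extraction for a single frame can be sketched with the standard library alone. The frame length, sample rate, filter count, and number of coefficients are assumed values, and the direct DFT is used only to keep the example self-contained. A real pipeline would more likely use an FFT such as numpy.fft.rfft, and would add framing across the whole signal.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window a frame and return power at each non-negative DFT bin."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    filters = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            row[k] = (k - left) / (centre - left)    # rising edge
        for k in range(centre, right):
            row[k] = (right - k) / (right - centre)  # falling edge
        filters.append(row)
    return filters


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Log mel-filterbank energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_e = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
    return [sum(le * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, le in enumerate(log_e))
            for c in range(n_coeffs)]


# Hypothetical input: one 256-sample frame of a 440 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(256)]
coeffs = mfcc(frame, SAMPLE_RATE)
```

Each frame yields a short coefficient vector, and the sequence of vectors over time is what an acoustic model consumes.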

Model:

  • Implemented a Hidden Markov Model (HMM) for acoustic sequence modeling and speech decoding.

  • Trained the HMM with supervised learning using labeled speech data to map audio features to text.

  • Fine-tuned the model for improved recognition of short phrases and isolated words.
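
Decoding with an HMM is commonly done with the Viterbi algorithm, sketched below for discrete observations. The description does not state which decoder was used, so this is an assumed choice. The two states, the codebook symbols (standing in for quantized MFCC vectors), and all probabilities are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability).

    Works in the log domain to avoid underflow on long sequences.
    Assumes all probabilities used are non-zero; zero entries would
    need smoothing before taking logs.
    """
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    backpointers = []
    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev = max(states, key=lambda p: best[p] + math.log(trans_p[p][s]))
            step[s] = (best[prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best = step
        backpointers.append(pointers)
    # Trace the best path backwards from the most likely final state.
    last = max(states, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical two-state model over three codebook symbols.
states = ["onset", "vowel"]
start_p = {"onset": 0.6, "vowel": 0.4}
trans_p = {"onset": {"onset": 0.7, "vowel": 0.3},
           "vowel": {"onset": 0.4, "vowel": 0.6}}
emit_p = {"onset": {"c0": 0.5, "c1": 0.4, "c2": 0.1},
          "vowel": {"c0": 0.1, "c1": 0.3, "c2": 0.6}}

path, log_prob = viterbi(["c0", "c1", "c2"], states, start_p, trans_p, emit_p)
```

For this toy model the decoded path is onset, onset, vowel. For isolated-word recognition, one common design is to train one such model per word and pick the word whose model scores the utterance highest.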

Results:

  • Achieved 90% accuracy in recognizing isolated words and 80% accuracy for short phrases on the test set.

  • Demonstrated real-time recognition capability with minimal latency for basic spoken commands.

Skills & Tools: Python, Hidden Markov Models, MFCC, Speech Signal Processing, NumPy, SciPy, Acoustic Modeling.

Disease Prediction: Leveraging Support Vector Machines (SVM) and K-Nearest Neighbors (KNN)

Objective: Developed predictive models to classify disease outcomes based on patients' symptoms, demographics, and health indicators, aiding in accurate diagnosis and personalized treatment planning.

Dataset Preparation:

  • Processed and cleaned a dataset of patient records, including handling missing data, normalizing features, and encoding categorical variables.

  • Conducted exploratory data analysis (EDA) to identify key features influencing disease outcomes.

Model:

  • Trained a Support Vector Classification (SVC) model to separate classes using hyperplane-based decision boundaries.

  • Implemented a K-Nearest Neighbors (KNN) algorithm for proximity-based classification, leveraging neighborhood patterns for prediction.

  • Tuned hyperparameters for both models (e.g., kernel type for SVC, number of neighbors for KNN) to maximize accuracy.
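
The KNN side of the tuning step can be sketched as follows, choosing the number of neighbors by hold-out accuracy. The project used Scikit-learn, so this plain-Python version is only an illustration of the idea. The SVC model is not shown, and the two-feature points, labels, and candidate values of k are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Share of validation points classified correctly for a given k."""
    hits = sum(1 for x, y in validation if knn_predict(train, x, k) == y)
    return hits / len(validation)


# Hypothetical, already-normalized features with a 0/1 outcome label.
train = [
    ([1.0, 1.0], 0), ([1.5, 2.0], 0), ([2.0, 1.0], 0),
    ([5.0, 5.0], 1), ([6.0, 5.5], 1), ([5.5, 6.0], 1),
    ([4.5, 5.0], 0),  # a noisy point sitting inside the class-1 cluster
]
validation = [
    ([1.2, 1.5], 0), ([5.8, 5.2], 1), ([2.2, 1.8], 0), ([4.6, 5.2], 1),
]

# Odd values of k avoid tied votes in a two-class problem.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

Here k=1 is misled by the noisy training point and scores 0.75, while k=3 and k=5 both reach 1.0, so the search settles on k=3. Because the vote depends on Euclidean distance, the feature normalization described under Dataset Preparation matters for this model.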

Results:

  • Achieved 85% accuracy using SVC and 82% accuracy with KNN on the test dataset.

  • Demonstrated improved precision and recall for high-risk disease categories, enhancing clinical decision-making reliability.

Skills & Tools: Python, Scikit-learn, SVM, KNN, Data Preprocessing, EDA, Hyperparameter Tuning, Statistical Analysis.

Customer Attrition Analysis

Objective: Built a data-driven solution to analyze and predict customer attrition, enabling businesses to identify and retain at-risk customers.

Dataset Preparation:

  • Collected and processed a dataset of customer transactional records, demographics, and interaction logs.

  • Designed and implemented ETL (Extract, Transform, Load) pipelines to clean, transform, and store data in a structured format for seamless machine learning integration.

  • Addressed data inconsistencies, handled missing values, and ensured data quality through rigorous validation techniques.
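
An ETL pipeline of the kind described above can be sketched with the standard csv and sqlite3 modules. The raw data, the column names, the customers table, and the validation rules (drop duplicates, reject rows with unparseable spend, keep missing ages as NULL) are assumptions for the example, not the project's actual schema.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with a duplicate, a missing age, a bad value,
# and inconsistent segment spelling.
RAW_CSV = """customer_id,age,monthly_spend,segment
1,34,120.50,Retail
2,,80.00,retail
2,,80.00,retail
3,51,not_available,Wholesale
4,29,45.25, RETAIL
"""


def extract(text):
    """Read raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Deduplicate, validate, and normalize rows."""
    cleaned, seen = [], set()
    for row in rows:
        customer_id = int(row["customer_id"])
        if customer_id in seen:  # drop repeated customer ids
            continue
        seen.add(customer_id)
        try:
            spend = float(row["monthly_spend"])
        except ValueError:
            continue  # reject rows that fail numeric validation
        age = int(row["age"]) if row["age"].strip() else None
        cleaned.append((customer_id, age, spend, row["segment"].strip().lower()))
    return cleaned


def load(rows, connection):
    """Write cleaned rows into a structured table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER PRIMARY KEY, age INTEGER, "
        "monthly_spend REAL NOT NULL, segment TEXT NOT NULL)"
    )
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

Keeping the three stages as separate functions makes each one easy to test on its own, and the NOT NULL constraints give the database a second line of defense against bad rows.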

Model:

  • Conducted exploratory data analysis (EDA) to identify significant features influencing customer retention and attrition.

  • Utilized logistic regression and decision tree models to predict churn probabilities, achieving high accuracy on validation data.

  • Performed feature importance analysis to provide actionable insights for improving customer retention strategies.
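
One simple way to rank features is by the best Gini impurity reduction a single split on each feature can achieve, which is the quantity a decision tree accumulates when it reports feature importance. The description does not say which importance method was used, so this is an assumed simplification, and the feature names and values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_gini_gain(values, labels):
    """Largest impurity reduction from any single threshold split."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


# Hypothetical customer features and churn labels.
churned = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "support_calls": [5, 4, 6, 1, 0, 2, 5, 1],
    "tenure_months": [12, 30, 8, 24, 36, 10, 20, 15],
}

importance = {name: best_gini_gain(vals, churned) for name, vals in features.items()}
ranking = sorted(importance, key=importance.get, reverse=True)
```

In this toy data support_calls separates churners perfectly (gain 0.5), so it ranks above tenure_months.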

Results:

  • Optimized database queries and storage, reducing data retrieval time by 40%.

  • Delivered a predictive model that identified at-risk customers with 85% accuracy, empowering stakeholders to develop targeted retention campaigns.
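
The description does not say how the queries were optimized. One common technique is to index the column used for lookups, sketched below with SQLite. The interactions table, its columns, and the index name are hypothetical, and the example shows only the change in query plan, not the 40% figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, channel TEXT)"
)
# Hypothetical interaction log: 1000 rows spread over 100 customers.
conn.executemany(
    "INSERT INTO interactions (customer_id, channel) VALUES (?, ?)",
    [(i % 100, "email" if i % 2 else "phone") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's description of how it would execute a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


QUERY = "SELECT channel FROM interactions WHERE customer_id = ?"

plan_before = query_plan(conn, QUERY, (42,))  # full table scan
conn.execute(
    "CREATE INDEX idx_interactions_customer ON interactions (customer_id)"
)
plan_after = query_plan(conn, QUERY, (42,))   # lookup through the index
```

Comparing plan_before and plan_after shows the query switching from scanning every row to searching the index, which is the kind of change that typically reduces retrieval time on large tables.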

Skills & Tools: Python, SQL, ETL Pipelines, Data Preprocessing, EDA, Relational Databases, Machine Learning, Tableau, Power BI.
