My Projects

Employee Attrition Risk Assessment

Objective: Developed a predictive model to assess employee attrition risk across departments and identify key factors contributing to turnover.

Dataset Preparation:

Analyzed and preprocessed a dataset of 10,000+ employee records, including demographics, job performance metrics, and working hours.
Addressed missing values, normalized continuous variables, and encoded categorical features for model training.
Conducted exploratory data analysis (EDA) to uncover patterns and correlations in employee turnover.
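
The preprocessing steps above (imputing missing values, normalizing continuous variables, encoding categorical features) can be sketched with the standard library alone. This is a minimal illustration, not the project's pipeline: the column names and toy values are hypothetical, and mean imputation with z-score scaling is assumed because the text does not say which methods were used.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against constant columns
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Return the sorted category names and one indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy columns, not the project's data.
monthly_hours = [160, None, 240, 200]
departments = ["sales", "hr", "sales", "it"]

hours_filled = impute_mean(monthly_hours)
hours_scaled = standardize(hours_filled)
dept_names, dept_encoded = one_hot(departments)
```

In practice a library such as scikit-learn would usually handle these steps; the plain-Python version only makes each transformation explicit.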

Model:

Built a Logistic Regression model to predict the likelihood of employee attrition.
Applied feature engineering to identify significant predictors such as underutilization (working fewer than 250 hours/month).
Optimized the model with a learning rate of 0.01 and L2 regularization to prevent overfitting.
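
As an illustration of the setup described above, the sketch below trains a logistic regression by batch gradient descent with a learning rate of 0.01 and an L2 penalty. Only the learning rate, the L2 penalty, and the 250-hour underutilization threshold come from the text; the penalty strength, epoch count, feature layout, and toy data are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def underutilized(hours, threshold=250):
    """Engineered flag: 1 if monthly hours fall below the threshold."""
    return 1 if hours < threshold else 0


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.01, l2=0.1, epochs=2000):
    """Batch gradient descent on mean log loss plus an L2 penalty on w."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The l2 * w[j] term shrinks weights toward zero (bias is not penalized).
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


# Hypothetical rows: [underutilized flag, satisfaction score in 0..1].
X = [[underutilized(h), s] for h, s in
     [(180, 0.2), (200, 0.3), (260, 0.8), (270, 0.9), (150, 0.1), (255, 0.7)]]
y = [1, 1, 0, 0, 1, 0]  # 1 = left the company

w, b = train_logistic(X, y)
p_high = predict_proba(w, b, [1, 0.2])  # underutilized, low satisfaction
p_low = predict_proba(w, b, [0, 0.9])   # fully utilized, high satisfaction
```

On this toy data the weight on the underutilization flag comes out positive, so the underutilized employee receives the higher attrition probability.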

Results:

Achieved 85% accuracy on the validation dataset, with precision and recall evaluated specifically for the high-risk employee class.
Identified actionable insights, such as the impact of workload imbalance and job satisfaction, to improve retention strategies.
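
To make the metrics above concrete, the following sketch computes accuracy, precision, and recall for the positive (high-risk) class and shows how a lower decision threshold trades precision for recall. The labels, probabilities, and thresholds are hypothetical; the text does not state how the metrics were tuned, so threshold adjustment is only one plausible approach.

```python
def apply_threshold(probabilities, threshold):
    """Label a row high-risk (1) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def classification_metrics(y_true, y_pred):
    """Accuracy plus precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical validation labels and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

default_metrics = classification_metrics(y_true, apply_threshold(probs, 0.5))
recall_focused = classification_metrics(y_true, apply_threshold(probs, 0.35))
```

Lowering the threshold from 0.5 to 0.35 catches the third at-risk employee in this toy set, raising recall from 2/3 to 1.0.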

Skills & Tools: Python, Logistic Regression, Feature Engineering, Tableau, Power BI, Statistical Analysis, ETL Workflows.

Speech Recognition Using Hidden Markov Models (HMM)

Objective: Designed a speech recognition system to process and decode speech signals using Hidden Markov Models, enabling efficient recognition of spoken commands.

Dataset Preparation:

Collected and preprocessed a dataset of audio files, including noise filtering and segmentation into phonemes.
Extracted Mel-frequency cepstral coefficients (MFCCs) as input features for acoustic modeling.
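
For reference, MFCC extraction for a single frame can be sketched as: window the frame, take the power spectrum, apply triangular filters spaced on the mel scale, take logs, and apply a DCT. The version below is a slow, dependency-free illustration. The frame length, sample rate, filter count, and coefficient count are assumed values, and the naive DFT stands in for the FFT a real pipeline would use.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame: log mel energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energy = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                  for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energy))
            for c in range(n_coeffs)]


# Hypothetical input: one 256-sample frame of a 440 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(256)]
coeffs = mfcc(frame, SAMPLE_RATE)
```

Each frame yields one short coefficient vector, and the sequence of such vectors over time is what an acoustic model consumes.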

Model:

Implemented a Hidden Markov Model (HMM) for acoustic sequence modeling and speech decoding.
Trained the HMM with supervised learning using labeled speech data to map audio features to text.
Fine-tuned the model for improved recognition of short phrases and isolated words.
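
Decoding with an HMM is commonly done with the Viterbi algorithm, which finds the most likely hidden state sequence for a series of observations. The sketch below uses a tiny discrete HMM with two hypothetical states and made-up probabilities. A real acoustic model would emit continuous MFCC vectors (for example through Gaussian mixtures) rather than the two symbols assumed here.

```python
import math


def to_log(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # best[t][s]: log-probability of the best path ending in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):  # follow back-pointers to the start
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical model: silence vs. speech, observed as low/high frame energy.
states = ["sil", "speech"]
log_start = to_log({"sil": 0.8, "speech": 0.2})
log_trans = {"sil": to_log({"sil": 0.7, "speech": 0.3}),
             "speech": to_log({"sil": 0.2, "speech": 0.8})}
log_emit = {"sil": to_log({"low": 0.9, "high": 0.1}),
            "speech": to_log({"low": 0.2, "high": 0.8})}

path, log_prob = viterbi(["low", "high", "high"], states,
                         log_start, log_trans, log_emit)
```

Working in log space avoids numeric underflow on long utterances, where products of many small probabilities would otherwise round to zero.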

Results:

Achieved 90% accuracy in recognizing isolated words and 80% accuracy for short phrases on the test set.
Demonstrated real-time recognition capability with minimal latency for basic spoken commands.

Skills & Tools: Python, Hidden Markov Models, MFCC, Speech Signal Processing, NumPy, SciPy, Acoustic Modeling.

Disease Prediction: Leveraging Support Vector Machines (SVM) and K-Nearest Neighbors (KNN)

Objective: Developed predictive models to classify disease outcomes based on patients' symptoms, demographics, and health indicators, aiding in accurate diagnosis and personalized treatment planning.

Dataset Preparation:

Processed and cleaned a dataset of patient records, including handling missing data, normalizing features, and encoding categorical variables.
Conducted exploratory data analysis (EDA) to identify key features influencing disease outcomes.

Model:

Trained a Support Vector Classification (SVC) model to separate classes using hyperplane-based decision boundaries.
Implemented a K-Nearest Neighbors (KNN) algorithm for proximity-based classification, leveraging neighborhood patterns for prediction.
Tuned hyperparameters for both models (e.g., kernel type for SVC, number of neighbors for KNN) to maximize accuracy.
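
The KNN side of this comparison is simple enough to sketch without a library: classify a point by majority vote among its k nearest training points, then pick k by validation accuracy. The feature vectors, labels, and candidate values of k below are hypothetical. The SVC is not reproduced because it needs a dedicated solver; it would normally come from a library such as scikit-learn.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points closest to x (Euclidean)."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def best_k(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the highest validation accuracy."""
    return max(candidates,
               key=lambda k: accuracy(train_X, train_y, val_X, val_y, k))


# Hypothetical two-feature patient records (already normalized).
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["low", "low", "low", "high", "high", "high"]
val_X = [(1.5, 1.5), (6.5, 6.5)]
val_y = ["low", "high"]

chosen_k = best_k(train_X, train_y, val_X, val_y, candidates=[1, 3, 5])
prediction = knn_predict(train_X, train_y, (1.5, 1.5), 3)
```

Because KNN relies on raw distances, the feature normalization mentioned under Dataset Preparation matters: an unscaled feature with a large range would dominate the vote.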

Results:

Achieved 85% accuracy using SVC and 82% accuracy with KNN on the test dataset.
Demonstrated improved precision and recall for high-risk disease categories, enhancing clinical decision-making reliability.

Skills & Tools: Python, Scikit-learn, SVM, KNN, Data Preprocessing, EDA, Hyperparameter Tuning, Statistical Analysis.

Customer Attrition Analysis

Objective: Built a data-driven solution to analyze and predict customer attrition, enabling businesses to identify and retain at-risk customers.

Dataset Preparation:

Collected and processed a dataset of customer transactional records, demographics, and interaction logs.
Designed and implemented ETL (Extract, Transform, Load) pipelines to clean, transform, and store data in a structured format ready for model training.
Addressed data inconsistencies, handled missing values, and ensured data quality through rigorous validation techniques.
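
A minimal sketch of such an ETL flow, using only the standard library: extract rows from CSV text, transform them (drop duplicates, convert types, turn unparseable values into NULL), and load them into a relational table. The schema, column names, and sample rows are hypothetical, and SQLite stands in for whatever database the project used.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with a duplicate row and bad values.
RAW_CSV = """customer_id,age,monthly_spend,churned
1,34,120.50,no
2,,80.00,yes
2,,80.00,yes
3,45,not_available,no
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Deduplicate by customer_id and coerce fields to clean types."""
    seen, clean = set(), []
    for row in rows:
        customer_id = int(row["customer_id"])
        if customer_id in seen:
            continue  # keep only the first record per customer
        seen.add(customer_id)
        age = int(row["age"]) if row["age"] else None
        try:
            spend = float(row["monthly_spend"])
        except ValueError:
            spend = None  # unparseable values become NULL
        churned = 1 if row["churned"] == "yes" else 0
        clean.append((customer_id, age, spend, churned))
    return clean


def load(rows, connection):
    """Create the target table if needed and insert the cleaned rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER PRIMARY KEY, age INTEGER, "
        "monthly_spend REAL, churned INTEGER NOT NULL)")
    connection.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Keeping the three stages as separate functions makes each one testable on its own, which is one straightforward way to support the validation checks described above.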

Model:

Conducted exploratory data analysis (EDA) to identify significant features influencing customer retention and attrition.
Used logistic regression and decision tree models to predict churn probabilities, achieving high accuracy on validation data.
Performed feature importance analysis to provide actionable insights for improving customer retention strategies.
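
One common way to score feature importance with tree-based models is the reduction in Gini impurity a feature achieves when used as a split. The sketch below ranks features by their best single-threshold split, which is the criterion a decision tree applies at each node. The text does not say which importance measure was used, so this is an assumed technique, and the feature names and values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_gain(values, labels):
    """Largest impurity reduction from one threshold on a single feature."""
    parent = gini(labels)
    best = 0.0
    for threshold in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left)
                    + len(right) * gini(right)) / len(labels)
        best = max(best, parent - weighted)
    return best


def rank_features(columns, labels):
    """Sort features by impurity reduction, most informative first."""
    gains = {name: best_split_gain(values, labels)
             for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical customer features and churn labels (1 = churned).
columns = {
    "tenure_months": [12, 3, 14, 2, 30, 9],
    "support_tickets": [0, 1, 5, 6, 0, 7],
}
churned = [0, 0, 1, 1, 0, 1]

ranking = rank_features(columns, churned)
```

In this toy data, support ticket volume separates churners perfectly while tenure barely helps, so it ranks first.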

Results:

Optimized database queries and storage, reducing data retrieval time by 40%.
Delivered a predictive model that identified at-risk customers with 85% accuracy, empowering stakeholders to develop targeted retention campaigns.
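
Indexing the columns used in frequent lookups is one typical way to reduce retrieval time. The sketch below shows the effect on SQLite's query plan for a hypothetical interactions table. The table, index name, and data are assumptions for illustration, and the example does not attempt to reproduce the 40% figure reported above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, channel TEXT)")
# Hypothetical data: 5,000 interaction rows spread over 500 customers.
conn.executemany(
    "INSERT INTO interactions (customer_id, channel) VALUES (?, ?)",
    [(i % 500, "email") for i in range(5000)])

QUERY = "SELECT COUNT(*) FROM interactions WHERE customer_id = ?"


def query_plan(connection):
    """Return SQLite's plan description for the lookup query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail


plan_before = query_plan(conn)  # full table scan
conn.execute(
    "CREATE INDEX idx_interactions_customer ON interactions (customer_id)")
plan_after = query_plan(conn)   # index lookup on customer_id
match_count = conn.execute(QUERY, (42,)).fetchone()[0]
```

Comparing `plan_before` with `plan_after` shows the planner switching from scanning every row to searching the index; the exact wording of the plan text varies between SQLite versions.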

Skills & Tools: Python, SQL, ETL Pipelines, Data Preprocessing, EDA, Relational Databases, Machine Learning, Tableau, Power BI.