About Me

Who Am I?

Hi, I'm Shashak Kumar Dubey. I am a passionate individual dedicated to the dynamic intersection of Data Science, Data Analysis, Machine Learning, and Artificial Intelligence. With a robust foundation in these domains and hands-on experience through multiple projects, I've delved deep into leveraging data to derive actionable insights and solve real-world challenges.

My journey in this field has been marked by a series of diverse projects that have honed my skills and expanded my understanding. From predictive analytics to exploratory data analysis and the implementation of machine learning models, I've actively applied these techniques to create impactful solutions. Having worked extensively on various projects, I possess a nuanced understanding of the complete data lifecycle - from data wrangling and preprocessing to model building, validation, and deployment. I am adept at utilizing a wide array of tools and technologies to extract valuable insights from complex datasets.

My passion for continuous learning drives me to explore the latest advancements in this rapidly evolving field. I am eager to contribute my skills and insights to new challenges and collaborate with like-minded individuals to drive innovation in data-driven solutions.

Data Analyst

Data Science

Machine Learning

Artificial Intelligence

What I Do

Here are some of my areas of expertise

Data Analysis

Extracting insights and patterns from data to drive informed decisions.

Data Visualization

Developing visual data representations for easy interpretation.

Machine Learning

Building predictive models and algorithms to derive actionable insights.

Artificial Intelligence

Leveraging AI techniques to simulate intelligent behaviors.

CNN Applications

Implementing CNN for advanced image recognition tasks.

SQL Integration

Leveraging SQL for data optimization and precise analysis.

DSA Questions
Technologies
Projects
Online Platform Presence
My Specialty

My Skills

Exploring the world of data is my passion. I enjoy diving into data with tools like Python to uncover meaningful insights, solve problems, and make informed decisions.

Python

93%

Data Analyst

90%

Data Science

90%

Power BI

80%

Machine Learning

85%

PostgreSQL

80%

Artificial Intelligence

70%

Excel

75%
Education

Education

Pursuing a Bachelor of Technology (B.Tech) in Information Technology (IoT) at Madhav Institute of Technology & Science (MITS), Gwalior

  • Started a tech journey at MITS in November 2020, set to finish by May 2024.
  • Active involvement in technology-related extracurricular activities.

Completed senior secondary education under the CBSE (Central Board of Secondary Education) curriculum in 2019 with a focus on Physics, Chemistry, and Mathematics (PCM) from Emmanuel Mission Senior Secondary School, Jhalawar, Rajasthan.

Completed Class 10 at Kendriya Vidyalaya S.E.C.L Dhanpuri, Shahdol under the CBSE (Central Board of Secondary Education) board in 2017.

My Work

Recent Work

Indian Traffic Sign Recognition using CNN

  • Developed an Indian Traffic Sign Recognition system using Convolutional Neural Networks (CNN), employing transfer learning with various pre-built models including InceptionV3, InceptionResNetV2, VGG16, ResNet50, MobileNet, among others.
  • Implemented robust data augmentation techniques using ImageDataGenerator to enrich the dataset, enhancing the model's ability to generalize across diverse traffic sign variations.
  • Applied advanced pooling techniques to efficiently reduce feature sizes, optimizing computational efficiency without compromising the quality of extracted features.
  • Fine-tuned the model with the Adam optimizer, reaching a validation accuracy of 50% and a test accuracy of 70% on Indian traffic signs across varying conditions.
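
To illustrate how pooling reduces feature sizes, here is a minimal sketch of 2x2 max pooling in plain Python. It is not the project's code (a CNN framework would normally supply the pooling layer); the function name and the 4x4 feature map are hypothetical values chosen for the illustration.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by taking the max of each
    non-overlapping 2x2 window (stride 2, no padding)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    # Step two rows and two columns at a time; a trailing odd
    # row or column is dropped, as with "valid" padding.
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map (illustrative values only).
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # a 2x2 map: one quarter of the original values
```

Each output value keeps only the strongest activation in its window, so the map shrinks by a factor of four while the most prominent features are retained.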

Diamond Price Prediction using Pipelines

  • Developed a Diamond Price Prediction model utilizing data visualization techniques and meticulous data preprocessing, including outlier handling and efficient label encoding.
  • Employed streamlined pipelines for data scaling and evaluated multiple machine learning algorithms such as Linear Regression, Decision Tree, Random Forest, and XGBoost. XGBoost emerged as the best model, with a negative root mean squared error score of -588.3831 (that is, an RMSE of about 588.38; the score is negated so that higher is better).
  • Emphasized the significance of data preparation, feature engineering, and algorithm selection in achieving accurate diamond price predictions, highlighting the critical role of streamlined processes in successful predictive modeling.
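
The negative RMSE figure can look odd at first, so here is a small sketch of the idea using only the standard library. Some toolkits (scikit-learn, for instance, has a scoring option named `neg_root_mean_squared_error`) negate RMSE so that "higher is better" holds for every metric; whether this project used that exact option is an assumption. The model names and price values below are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def neg_rmse(y_true, y_pred):
    """Negated RMSE, so that a larger (closer to zero) score is better."""
    return -rmse(y_true, y_pred)


# Hypothetical actual prices and predictions from two candidate models.
actual = [500.0, 1500.0, 3000.0, 4200.0]
predictions = {
    "model_a": [700.0, 1300.0, 3400.0, 4000.0],
    "model_b": [550.0, 1450.0, 3100.0, 4100.0],
}

scores = {name: neg_rmse(actual, preds) for name, preds in predictions.items()}
# Picking the maximum score selects the model with the smallest RMSE.
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

Read this way, a score of -588.3831 simply means the predictions were off by roughly 588 price units in the RMSE sense.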

Malaria Detection using CNN

  • Developed a Malaria Detection system utilizing Convolutional Neural Networks (CNN) to analyze a dataset comprising two types of images—infected and uninfected cells.
  • Conducted rigorous preprocessing of image data and implemented CNN architecture to accurately classify between infected and uninfected cells, achieving a notable accuracy of 65%.
  • Demonstrated the potential of CNN in medical image analysis by successfully detecting malaria-infected cells, showcasing its applicability in disease identification and medical diagnostics.

Resume Selector using NLP

  • Conceptualized and developed a Resume Selector system utilizing Machine Learning and NLP methodologies to proficiently categorize diverse resumes across fields like Data Science, Business Analyst, and DevOps Engineer.
  • Led comprehensive Exploratory Data Analysis (EDA) and orchestrated robust data cleaning strategies, leveraging NLP tools such as TfidfVectorizer for text representation and precise label encoding, ensuring optimal data preparation for classification.
  • Achieved exceptional performance metrics with RandomForestClassifier, securing a remarkable 99.4% accuracy and an outstanding r2_score of 99.6, underscoring the system's efficacy in precise resume categorization.
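
As a rough picture of what TF-IDF text representation does, the sketch below computes TF-IDF vectors in plain Python. It is not the project's code and not the `TfidfVectorizer` implementation: it assumes simple whitespace tokenization, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalization, which is one common variant. The three resume snippets are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one L2-normalized
    TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed inverse document frequency (assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights))
        vectors.append([w / norm if norm else 0.0 for w in weights])
    return vocabulary, vectors


# Hypothetical resume snippets (illustrative only).
resumes = [
    "python machine learning models",
    "docker kubernetes pipelines",
    "python sql dashboards",
]
vocabulary, vectors = tfidf_vectors(resumes)
first = dict(zip(vocabulary, vectors[0]))
print(first["python"], first["machine"])
```

Because "python" appears in two of the three snippets, it receives a lower weight than "machine", which appears in only one. Down-weighting common terms in this way is what lets a classifier focus on the words that distinguish one resume category from another.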

Salary Prediction using Linear Regression

  • Built a Salary Prediction model using Linear Regression on a dataset with the columns 'Name', 'JobTitle', 'AgencyID', 'Agency', 'HireDate', 'AnnualSalary', and 'GrossPay', estimating salaries from job-related factors.
  • Leveraged advanced visualization libraries like Matplotlib and Seaborn to craft an array of insightful data visualization plots, empowering comprehensive exploration and nuanced understanding of the dataset's intricate relationships and patterns.
  • Achieved an R² score of 0.96 and a mean absolute error of 2473.36, indicating that the model predicts salaries closely from job-related attributes.
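
For readers unfamiliar with the two metrics quoted above, here is a minimal standard-library sketch of how R² and mean absolute error are computed. The salary values are hypothetical and unrelated to the project's dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Undefined (division by zero) when all targets are identical."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical annual salaries and model predictions.
actual = [40000, 50000, 60000, 70000]
predicted = [42000, 49000, 61000, 68000]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(mae, r2)
```

MAE is expressed in the same units as the target (here, salary), while R² is unitless and measures how much of the variance in the target the model explains.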

Adventure Works Sales Dashboard using Power BI

  • Created a dynamic Power BI Sales Dashboard: Developed a comprehensive dashboard with four distinct pages—Exec, Spatial Info, Product Detail, and Customer Detail—providing in-depth insights into sales performance, geographic trends, product analysis, and customer segmentation for Adventure Works Bike Shop.
  • Visualized complex data effectively: Transformed raw sales data into intuitive and visually compelling representations, enabling executives and stakeholders to make informed decisions quickly and efficiently.
  • Improved decision-making through actionable insights: Empowered decision-makers by presenting crucial metrics and trends, facilitating strategic planning and targeted actions based on customer behavior, product performance, and geographical sales distribution.

InstaDBMirror

  • Created InstaDBMirror, a robust relational database project resembling a social media platform akin to Instagram. The database includes comprehensive modules for user management, post interactions, hashtag functionalities, and more, demonstrating a keen focus on intricate database design.
  • Exhibited adeptness in SQL query development and database management, crafting a structured and realistic dataset tailored for seamless integration with frontend applications. This project showcased a meticulous approach to database design, ensuring scalability and optimal performance.
  • Designed a well-structured database schema, prioritizing data integrity and facilitating efficient data analysis through proficiently crafted SQL queries. The project underscored the ability to answer complex data-related inquiries, showcasing a strong foundation in database management and optimization.

Global Terrorism Analysis

  • Analyzed a comprehensive dataset on global terrorism, exploring variables like year, month, country, attack type, casualties, etc. Employed Matplotlib and Seaborn libraries to create insightful visualizations, summarizing terrorist activities by region, weaponry, casualties by country, and attack frequency across regions.
  • Leveraged data analytics skills to gain profound insights into global terrorism dynamics. Through meticulous analysis and visualization, identified trends, patterns, and critical concerns, contributing to a comprehensive understanding of worldwide terrorist activities.
  • Utilized advanced data visualization techniques to effectively communicate trends and patterns, enabling a deeper understanding of global terrorism dynamics and aiding in the identification of key areas for further examination and concern.

Uber Drives Analysis

  • Conducted an in-depth analysis of Uber drives dataset, meticulously examining parameters such as 'START_DATE', 'END_DATE', 'CATEGORY', 'START', 'STOP', 'MILES', and 'PURPOSE'. Employed rigorous data cleaning techniques and applied feature engineering to refine the dataset, ensuring its readiness for detailed analysis.
  • Leveraged the powerful capabilities of Matplotlib and Seaborn libraries for data visualization, skillfully creating a range of visual representations that unveiled key insights into Uber drives. Through these visualization techniques, demonstrated adeptness in analyzing trends, patterns, and correlations within the dataset, showcasing a nuanced understanding of data analysis methodologies.

Wine Quality Prediction

  • Processed a dataset with 13 features including parameters like volatile acidity, residual sugar, etc., ensuring data cleanliness by removing missing values. Applied feature scaling techniques for optimal normalization and employed diverse visualization plots using Matplotlib and Seaborn libraries to comprehensively explore the dataset's characteristics.
  • Trained a Logistic Regression model to predict wine quality based on the dataset's features, achieving an accuracy rate of 76%. This highlighted the model's proficiency in discerning wine quality from the provided dataset, showcasing its predictive capabilities.
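
The write-up does not say which feature scaling technique was applied, so the sketch below assumes standardization (zero mean, unit variance), one common choice before fitting Logistic Regression. It uses only the standard library, and the volatile acidity values are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical volatile acidity readings (illustrative only).
volatile_acidity = [0.2, 0.4, 0.6, 0.8]
scaled = standardize(volatile_acidity)
print(scaled)
```

Scaling each feature this way keeps columns with large numeric ranges from dominating those with small ranges when the model is fitted.
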
Experience

Work Experience

Data Science Intern at Code Clause, July 2023 - Sep 2023

  • Applied data analysis, ML, and NLP to identify key trends, generate insights, and optimize business strategies.
  • Developed and implemented data-driven solutions to optimize business strategies for Walmart sales, flight price prediction, and resume screening, leveraging data science, machine learning, and NLP technologies.

Open Source Developer at Hacktoberfest 2022

  • Successfully completed the Hacktoberfest challenge by making a minimum of four pull requests to different open-source repositories, contributing to the growth and improvement of the open-source community.
  • Collaborated with developers and contributors from around the world to enhance the functionality and usability of open-source software during the Hacktoberfest event.
Experience

Certifications

Python for Data Science and Machine Learning Bootcamp by UDEMY

I gained hands-on experience in utilizing Python for Data Science and Machine Learning through this comprehensive bootcamp. Explored essential libraries such as Pandas, NumPy, and Scikit-Learn, enabling me to manipulate data effectively, build predictive models, and derive valuable insights. The bootcamp also covered advanced topics like data visualization and model deployment, enhancing my proficiency in leveraging Python for data-driven solutions.

Supervised Machine Learning: Regression and Classification by STANFORD UNIVERSITY

Explored supervised machine learning, focusing on regression and classification models. Mastered algorithms like Linear Regression, Decision Trees, and Logistic Regression for accurate predictions and data classification. Hands-on experience in feature engineering and model tuning.

Using Python to Interact with the Operating System by GOOGLE

Utilized Python extensively to engage with the operating system, delving into Linux and Bash shell scripting, alongside diverse Python libraries. This encompassed a broad spectrum of tasks, facilitating efficient system operations, and enabling robust task automation.

PostgreSQL by UDEMY

Extensive hands-on experience in PostgreSQL, involving database management, query optimization, and data manipulation. Proficient in crafting complex queries, ensuring efficient data retrieval, and optimizing database performance for seamless operations.

Python by UDEMY

Proficient in Python programming language, leveraging its versatility in various domains such as data analysis, machine learning, and automation. Skilled in developing robust applications, conducting data manipulation, and implementing machine learning models using Python libraries.

Get in Touch

Contact

Flat No. C309, MV Royal Homes, Srirampura, Jakkur Post, Bengaluru