Name: Harsh Gupta

Job Role: Data Scientist

Experience: 1+ years

Address: Balrampur, U.P., India

Skills

Python 95%
Data Visualization 90%
Databases 80%
Machine Learning 90%
Deep Learning 90%
NLP 85%
Large Language Models 80%
Retrieval Augmented Generation (RAG) 50%

About Me

I'm passionate about applying artificial intelligence and machine learning to solve real-world problems. With a strong foundation in mathematics and statistics, I enjoy diving deep into data and uncovering insights that can drive meaningful results. Additionally, I excel in transforming complex datasets into compelling narratives that inform strategic decision-making and drive actionable outcomes.

  • Profile: Data Scientist & NLP Engineer
  • Programming Languages: Python, C/C++, PHP & JavaScript
  • Data Preprocessing: SQL, Pandas, NumPy & SciPy
  • Data Visualization: Power BI, Matplotlib, Seaborn & Plotly
  • Machine Learning: Scikit-learn, Supervised and Unsupervised Learning, Evaluation Metrics, Hyperparameter Tuning & MLDLC
  • Deep Learning: TensorFlow, Keras, Transformers & Large Language Models
  • NLP: spaCy, NLTK & Gensim
  • Generative AI: LangChain, Hugging Face, RAG & Vector Databases
  • Deployment: Streamlit, Heroku, Hugging Face Spaces, Docker & AWS
  • MLOps: MLflow

Resume

Experience


June 2023 - July 2023

Machine Learning Trainee

Analytics Vidhya

Analytics Vidhya is a data science community and learning platform.

  • Developed a Laptop Price Predictor that uses a Random Forest Regressor to predict prices from laptop features such as RAM, memory, and processor.
  • Used Scikit-learn, Pandas, NumPy, regular expressions, Streamlit, Matplotlib, and Seaborn.
  • Implemented Pipelines to streamline data processing and model building.
  • Achieved an R² score of 0.90 with the Random Forest Regressor.



Education


2024-Present

Master of Computer Applications

Jawaharlal Nehru University (JNU)

2021-2024

Bachelor of Computer Applications

Siddharth University

Throughout my BCA, I honed my analytical and problem-solving skills, particularly in soft computing and genetic algorithms, building a solid skill set for data science. Driven by a keen interest in natural language processing (NLP), I gained practical experience with convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer models, as well as with Retrieval-Augmented Generation (RAG). Working with large language models (LLMs) and model fine-tuning further strengthened my ability to tackle complex analytical problems.

CGPA: 8.256

2020-2021

Intermediate

St. Xavier's High School (PCM with CS)

In my Class 12 Computer Science elective, I developed a profound interest in programming languages, algorithms, and data structures. This not only equipped me with practical skills in problem-solving and software development but also ignited my passion for the field of computer science. It laid a solid foundation, inspiring me to pursue further studies and explore professional opportunities in this dynamic and evolving domain.

Percentage: 93%

2019-2020

High School

Vidyagyan Leadership Academy (Elective: CS)

I completed my primary schooling through a scholarship program that covered all expenses, after clearing a two-stage state-level entrance exam. In high school, I scored 96.2% in the CBSE board exams, with a perfect 100/100 in mathematics.

Percentage: 96.2%

Projects

Below are some of my projects in Machine Learning, Deep Learning, Data Analysis, NLP, and Retrieval Augmented Generation (RAG).

Laptop price predictor

  • Developed a Laptop Price Predictor that uses a Random Forest Regressor to predict prices from laptop features such as RAM, memory, and processor.
  • Used Scikit-learn, Pandas, NumPy, regular expressions, Streamlit, Matplotlib, and Seaborn.
  • Implemented Pipelines to streamline data processing and model building.
  • Achieved an R² score of 0.90 with the Random Forest Regressor.
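
For reference, the R² score reported above measures the share of variance in the target that the model explains. A minimal sketch of the computation in plain Python follows; the prices and predictions are made-up illustration values, not data or results from this project.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation from the mean of the targets.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical laptop prices and model predictions (illustration only).
actual = [40000, 55000, 70000, 90000]
predicted = [42000, 53000, 73000, 86000]
score = r2_score(actual, predicted)
print(f"R^2 = {score:.3f}")
```

A score of 1.0 means the predictions match the targets exactly, while a score of 0 means the model does no better than always predicting the mean price.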

Email/SMS Spam Classifier

  • Developed an ML model that classifies SMS and email messages as spam, using text vectorization and natural language processing.
  • Achieved an accuracy of 0.98 and a precision of 0.991 using Multinomial Naive Bayes (MNB) with TF-IDF features.
  • Used Scikit-learn, Pandas, NumPy, NLTK, Matplotlib, Seaborn, WordCloud, etc.
  • Gained experience with NLP concepts such as tokenization, stop-word removal, stemming, and term frequency-inverse document frequency (TF-IDF).
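
The idea behind Multinomial Naive Bayes can be sketched in a few lines of plain Python. This is a simplified illustration, not the project's code: it uses raw word counts with add-one (Laplace) smoothing instead of TF-IDF features and Scikit-learn, and the four training messages are invented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(messages):
    """messages: list of (text, label) pairs. Returns the fitted counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in messages:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior plus the sum of smoothed log likelihoods of each word.
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data (illustration only).
toy = [
    ("win a free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("please send the report today", "ham"),
]
model = train(toy)
print(predict("free prize offer", model))
print(predict("lunch meeting today", model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word from zeroing out a class.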

Book Recommender System

  • Developed a book recommendation system using collaborative filtering and cosine similarity.
  • Used libraries such as Scikit-learn, Pandas, NumPy, Matplotlib, and Seaborn.
  • Explored the main types of recommendation system: content-based, collaborative filtering, popularity-based, and hybrid.
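
As a rough illustration of item-based collaborative filtering with cosine similarity, the sketch below compares books by the ratings they received from the same users. The ratings table and the `most_similar` helper are hypothetical and are not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a book with no ratings is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical ratings: each book maps to ratings from the same
# four users, in the same order (0 = not rated).
ratings = {
    "Book A": [5, 4, 0, 1],
    "Book B": [4, 5, 0, 2],
    "Book C": [0, 1, 5, 4],
}


def most_similar(title, ratings, top_n=1):
    """Return the top_n (title, similarity) pairs closest to `title`."""
    target = ratings[title]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in ratings.items()
        if other != title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


print(most_similar("Book A", ratings))
```

Books rated similarly by the same users end up with a similarity close to 1, so a reader who liked one is recommended the other.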

Sentiment Analysis on movie reviews

  • Explored Embeddings and utilized self-trained Word2Vec models.
  • Used Keras, Pandas, spaCy, Gensim, NLTK, Seaborn, Matplotlib, etc.
  • Achieved an accuracy score of 0.89.

Next word prediction using LSTM

  • Explored SimpleRNN, LSTM, GRU and Bidirectional LSTM in Keras.
  • Used the Keras Tokenizer, Embedding layers, Batch Normalization, and Dropout.
  • Achieved an accuracy score of 0.82.

Result Analysis System

  • Utilized Streamlit, Plotly, Matplotlib, and Pandas for dynamic and interactive visualizations of examination results.
  • Benchmarks individual student performance against class averages and peers, by SGPA or rank, to identify areas for improvement.

Auto ML

Welcome to my user-friendly Machine Learning (ML) model creation platform designed for individuals with limited or no prior ML experience. This platform empowers users to effortlessly create ML models through an intuitive interface, drag-and-drop functionality, and pre-built templates for various ML tasks such as classification, regression, and clustering.

Automated CUET Score Checker

In CUET, checking scores manually can be a tedious task, involving matching each answer with the original sheet. Introducing the Automated CUET MCA Score Checker - a convenient solution to simplify the score-checking process.

OpenAI Whisper Automatic Speech Recognition (Hindi)

This is a project focused on building a robust speech recognition system for the Hindi language. It leverages OpenAI's Whisper model to convert spoken Hindi into accurate text, facilitating applications in transcription, voice commands, and more. This project aims to improve accessibility and efficiency for Hindi speakers in technology-driven contexts.

Sentiment Analysis using BERT

  • Utilized BERT, a leading NLP model, to conduct accurate sentiment analysis.
  • Contextual Understanding: Leveraged BERT's contextual awareness to capture nuanced meanings, surpassing conventional methods.
  • Actionable Insights: Provided valuable insights into customer feedback and social sentiment, facilitating data-driven decision-making processes.

Transformer-based (Decoder-only) Language Model from Scratch

This repository explores building a character-level transformer decoder in PyTorch, similar to GPT, with a focus on understanding the individual components. My goal is to build a deep understanding of transformers and to test whether character-level modeling improves the handling of unseen words. The code supports hyperparameter tuning and customizable experiments.
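
To show why a character-level vocabulary can represent words that never appeared in training, here is a minimal tokenizer sketch in plain Python. It is not the repository's code; the tiny corpus and the `encode`/`decode` helper names are assumptions made for illustration.

```python
# Tiny illustrative corpus; a real model would train on far more text.
corpus = "hello world"

# The vocabulary is the set of distinct characters in the corpus.
chars = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character


def encode(text):
    """Map a string to a list of integer ids, one per character."""
    return [stoi[ch] for ch in text]


def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(itos[i] for i in ids)


# "hold" never appears in the corpus, but all of its characters do,
# so it can still be encoded and decoded without an unknown token.
ids = encode("hold")
print(ids, decode(ids))
```

A character missing from the corpus would still raise a `KeyError` here, so the benefit holds only for new words built from characters the vocabulary already contains.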

Document QnA using Llama3 and Groq

Document QnA is a web app that lets users upload multiple documents and ask questions about their content. It uses Llama3, the Groq API, LangChain, FAISS, and Google PaLM embeddings to identify relevant documents and provide answers with page numbers. The interface is built with Streamlit.

Blog Generation using Llama2

  • Developed a custom blog generator using Llama2, LangChain, and Hugging Face.
  • Designed a Streamlit interface that lets users specify the topic, word count, and audience type for personalized blog content.

Movie Name Guessing using Plot with RAG

An AI-powered application that guesses movie titles from plot summaries. Built using LangChain, the Google PaLM LLM, CSVLoader, RetrievalQA, Google PaLM embeddings, and FAISS. Deployed on Streamlit, where you can enter a plot summary and receive a predicted movie title.

Chat with databases using RAG

  • Deployed an end-to-end RAG application with monitoring, using LangChain and LangSmith.
  • Used Google PaLM for natural language understanding.
  • Employed ChromaDB, a vector database, in the retrieval pipeline.
  • Optimized query generation using FewShotPromptTemplate and create_sql_query_chain.

House Price Prediction with Experiment Tracking using MLflow

  • Developed a house price prediction model using linear regression, predicting prices based on features like location, total square feet, number of bathrooms, and bedrooms (BHK).
  • Experiment Tracking: Implemented MLflow for tracking experiments, streamlining the development process and providing deeper insights into model performance.
  • Hyperparameter Tuning: Utilized GridSearchCV and Hyperopt to fine-tune models, optimizing performance and achieving better results.

Titanic Survival Prediction

Using machine learning techniques, I developed a model to predict passenger survival aboard the Titanic. Analyzing the passenger data gave me insight into the factors that influenced survival.

Iris Flower Classification

Implemented a classification model to accurately classify Iris flower species based on their features. This project honed my skills in data preprocessing, model selection, and evaluation, demonstrating proficiency in classification algorithms.

Credit Card Fraud Detection

Developed a fraud detection system that uses machine learning algorithms to identify fraudulent transactions, applying techniques such as anomaly detection and feature engineering.

Myfiglet

  • Created a Python library enabling effortless integration and display of FIGlet fonts within Python CLI applications.
  • This is a simple and self-contained library that doesn't require external dependencies. It enhances the visual appeal of various programs and enables the addition of colors to figlet fonts with the color parameter.

Py4Math

  • Created a Python library to streamline and enhance coding workflows.
  • Its search() function lets users look up formulas quickly. The module currently includes over 50 mathematical formulas and is under ongoing development.

More projects on GitHub, Kaggle & PyPI

I enjoy writing blogs. Check out my blog posts on

medium.com


Contact

Below are the details to reach out to me!

Address

Balrampur, Uttar Pradesh, India

Contact Number

+91 8303761573

Email Address

harshnkgupta@gmail.com