Umang Mehta

Machine Learning Engineer & Data Scientist working on Knowledge Graphs, Big Data Engineering, Deep Learning, ML Ops with Natural Language Processing & Understanding for Conversational AI & Predictive Analytics.

I graduated with a Master's Degree in Data Science and aspire to work in the fields of Deep Learning, Machine Learning (ML), Artificial Intelligence (AI) and Cognitive Science & Engineering, with applications involving Speech Recognition, Voice Interaction, Natural Language Processing (NLP) and Understanding (NLU). I am a Machine Learning Engineer and Data Scientist with 4.5 years of active research and development experience in ML, Neural Networks, NLP, Social Media Mining, Opinion Mining, Conversational AI, Information Extraction (IE), Knowledge Graphs, Predictive Analytics, Big Data Engineering and ML Ops with Python & Scala. I also have 5 years of experience as a Full-Stack Software Engineer and Team Mentor with expertise in Database Design, Java Programming and Web Development.

General Info

Highlights

90%

Java, Python & Scala

Object Oriented & Functional Programming.

85%

Machine Learning, Deep Learning & ML Ops

With Natural Language Processing

5

Years of Work Experience in Software and Data Engineering

4.5

Years of Work Experience in ML, NLP, AI, Big Data and ML Ops

Skills

Programming Languages:

Java

Python

Scala

Javascript

Typescript

C/C++

Relational Database:

MySQL

PostgreSQL

NoSQL Database:

AWS DynamoDB

Dgraph

MongoDB

Neo4J

JanusGraph

Apache Solr

Machine Learning, Deep Learning, Big Data:

TensorFlow

PyTorch

Keras

Deeplearning4J (DL4J)

Apache Spark

SciKit-Learn

Apache Hadoop

NLP Toolkits:

NLTK

Stanford CoreNLP

Spacy

OpenNLP

Gensim

Data & ML Ops:

AWS Sagemaker

AWS Step Function

AWS Elastic MapReduce (EMR)

AWS Glue

Databricks

MLFlow

Apache Airflow

Data Warehouse:

Snowflake

AWS Redshift

Apache Hive

Data Visualization:

Plotly

Tableau

Matplotlib

Seaborn

Servers:

Apache2

Apache Tomcat 7 & 8

WildFly

NodeJS

Netty

Openfire

OS & Cloud Services:

Linux

Amazon Web Services (AWS)

Google Cloud Services

VCS & DevOps:

Git

Subversion (SVN)

AWS EC2

AWS Lambda

AWS ECS

AWS Batch

Docker

Kubernetes

Languages

  • English
  • Hindi
  • Gujarati
  • Marathi

Extra-Curricular

Won 2 consecutive Annual Fall Chili Cook-offs in 2017 and 2018 organized by SICE at Indiana University.

Organized Bollywood Quiz as Event Head in Symphony, the Annual Cultural Festival of KJSCE in 2011.

Volunteered in the Infra and Logistics Team for Technical and Cultural Festivals in KJSCE from 2009 to 2011.

As a hobby writer, had poems published in Kshitij, the Annual College Magazine of KJSCE, and technology articles published in the Technical Magazine of the Computer Society of India, KJSCE Chapter.

Work Experience

  • May 2023-Present

    Comcast

    Machine Learning Engineer

    Developed workflows using Databricks components like Workflows, Unity Catalog, Delta Tables, Feature Store, MLFlow as building blocks for creating pipelines with stages for feature engineering, training, validation, scoring & publishing for Machine Learning Models.

    Used Terraform as IaC to deploy Databricks infrastructure resources to different environments in the CI/CD pipeline.

    Built CI/CD pipelines for ML code using Concourse and designed a standard workflow that is now used by the entire organization for all ML projects.

    Deployed a Model Endpoint, using MLFlow and Docker to extract the model from the Databricks Model Registry and Terraform to deploy it to AWS ECS.


    Technologies and Tools: Python, Databricks, Spark, MLFlow, AWS S3, AWS Athena, AWS Glue, AWS ECS, Terraform, Docker, Git, Linux

  • Nov 2021-April 2023

    Audible - An Amazon Company

    SDE - Machine Learning Engineering

    Developed workflows using AWS components like Sagemaker, Lambda, Batch, Glue, EMR & Step Functions as building blocks for creating pipelines with stages for training, validation, scoring, publishing, load testing & deployment to production for Machine Learning Models.

    Worked on the Frontend with React and the Backend with FastAPI & AWS DynamoDB for the platform that hosts Machine Learning pipelines built from the stages mentioned above.

    Used AWS Cloud Development Kit (CDK) to deploy and manage Cloud Resources on AWS.

    Worked with Data Scientists to make model code production-ready and deploy it as containers to the AWS ECR repository for AWS Sagemaker to run Training and Inference.

    Overall, my work on the Machine Learning Platform reduced the engineering effort required of Data Science by 75%.

    Collaborated with Data Scientists to develop and deploy multiple Machine Learning Models including Predictive Analytics, Text Classification, Anomaly Detection and Reinforcement Learning.


    Technologies and Tools: Scala, Python, Java, Spark, AWS S3, AWS EMR, AWS Lambda, AWS ECR, AWS Batch, AWS Sagemaker, AWS Step Function, AWS CDK, Git, Linux

  • Jan 2020-Oct 2021

    Capital One

    Data Engineer

    Developed a Data Processing Pipeline using Apache Spark and Scala to process Credit Card Requests in batches, joining data from other sources and APIs and making the final output available to the Card Embossing Process. The pipeline will replace the existing mainframe system, making the process 70% more efficient.

    Integrated Streaming Data Pipeline using Apache Kafka as an alternative option to batch processing in various data pipelines.

    Built data pipelines for data transfer and warehousing using Enterprise File Gateway, Snowflake, Databricks and Apache Spark, so that incoming data from external sources can be used for analytics purposes such as Anti-Money Laundering and Fraud Detection using Anomaly Detection.

    Built serverless functions using Python to spin up a transient AWS EMR cluster to run the data pipelines.

    Used AWS Lambda, AWS EMR, AWS CloudFormation and AWS S3 for production deployment.


    Technologies and Tools: Scala, Java, Spark, Kafka, Git, AWS EC2, AWS S3, Linux, Spring Boot

  • June 2019-Oct 2019

    Hello Nesh Inc. (Nesh)

    Data Scientist - NLP & AI

    Built a Conversational AI agent in Python for the Oil and Gas domain.

    Developed a Knowledge Extraction pipeline for public documents with Semantic Extraction using NER and Constituency Parsing with Rasa NLU & Spacy, and built a Knowledge Graph using Dgraph and GraphQL to represent the extracted knowledge.

    Extracted topics from public documents using Gensim & TextRazor and linked them to entities in Knowledge Graph.

    Conceptualized and built a PoC of a Diagnostic Analysis and Predictive Analytics Engine for Oil Well Failure using E-M, MLE and MAP.

    Trained and deployed text classifiers with Word Embeddings, LSTM and BERT using TensorFlow and Keras on AWS Sagemaker.


    Technologies and Tools: Python, Java, Rasa, TensorFlow, Keras, SciKit-Learn, BERT, Stanford CoreNLP, Spacy, TextRazor, Gensim, Dash by Plotly, NumPy, SciPy, Pandas, Flask, Dgraph, Docker, Kubernetes, Nginx, Git, AWS EC2, AWS S3, AWS RDS, AWS Lambda, AWS Sagemaker, AWS DynamoDB, Gremlin, JanusGraph, Linux, Javascript, NodeJS

  • June 2018-May 2019

    Kelley School of Business, Indiana University Bloomington

    Graduate Research Assistant

    Worked under Prof. Matthew Josefy utilizing ML and NLP for research on Strategy and Entrepreneurship.

    Researched and implemented NLP methods to extract relevant information from SEC Filings.

    Developed and implemented models using ML and NLP to analyze the business models and board leadership structures of companies.

    Built Text Classifiers using NLTK and SciKit-Learn with around 90% accuracy measured with 10-Fold Cross Validation.


    Technologies and Tools: Python, Java, NumPy, SciKit-Learn, Pandas, Stanford CoreNLP, Spacy, Git, TensorFlow, Keras, BeautifulSoup, Windows

  • Feb 2018-May 2019

    Ariadata Inc. (Aridat)

    Chief NLP Research Engineer

    Built an analytics engine to determine the critical reception of an artist's work based on chatter on social media.

    Built a Sentiment Classifier using Naïve Bayes and Multiclass Logistic Regression to classify tweets from artists as +1, 0 and -1 and implemented metrics to analyze sentiment distribution over different demographics.

    Led and advised on the research and implementation of advanced NLP methods to improve the efficiency of the Sentiment Classifier and add new functionalities to improve the analytics provided to the artists.


    Technologies and Tools: Python, Java, Numpy, Scikit-Learn, Pandas, Stanford CoreNLP, NLTK, Afinn, TextBlob, MongoDB, Git, Matplotlib, Plotly, TensorFlow, Keras, Linux

  • June 2014-July 2017

    Vitruvian Technologies Pvt. Ltd. (RealtyRedefined)

    Senior Developer & Team Mentor

    Developed functionalities for web-based ERP and CRM systems in the domain of Real Estate.

    Collaborated in a team of 12 for project development including Java Programming, Data Structure & Database Design, Web Design & Development and Unit Testing.

    Led, trained and mentored a sub-team of 5 throughout the development of the projects. Contributed to the firm's proprietary Core Framework used for project development.


    Technologies and Tools: Java, Scala, Groovy, Spring Framework, Hibernate ORM, MySQL, Apache Solr, ElasticSearch, HTML 5, CSS 3, Javascript, AngularJS, UnderscoreJS, Bootstrap, AJAX, Jquery, PHP, Laravel Framework, Play Framework, Git, SVN, AWS EC2, AWS S3, AWS RDS, AWS Route53, AWS Cloud CDN, Linux

  • June 2013-May 2014

    Algonation

    Co-founder & Developer

    Developed web portals and mobile apps for small and medium scale enterprises.

    Built server software for TCP Layer Protocols customized for cloud-based industrial requirements.

    Developed standalone and distributed software for parts of manufacturing production lines.

    Mentored and trained groups of 3-4 undergraduate interns for developing industry level projects.


    Technologies and Tools: Java, PHP, HTML 5, CSS 3, Bootstrap, Javascript, AJAX, Jquery, RabbitMQ, Netty, JavaFX, Openfire, Smack, XMPP, MySQL, Linux, Windows, Google Cloud Services

  • June 2013-May 2014

    Research Innovation Incubation Design Laboratory (Riidl)

    Software Engineer Intern

    Developed web-based ERP application for educational institutes using HTML, CSS, Bootstrap, Javascript and PHP.

    Designed the data structures & schema and managed the database transactions using MySQL.

    Built a mobile app for the ERP using the Java Android SDK for Android phones and Java ME for Java-based feature phones.

    Deployed the ERP on a hosting service using cPanel.


    Technologies and Tools: Java, PHP, HTML, CSS, Bootstrap, Javascript, MySQL, Java ME, Android SDK, cPanel

Education

  • May 2019

    Master's Degree in Data Science

    Luddy School of Informatics, Computing and Engineering
    Indiana University, Bloomington

    Major Courses: Machine Learning, Data Mining, Deep Learning Systems, Advanced Natural Language Processing, Elements of Artificial Intelligence, Algorithms Design and Analysis, High Performance Big Data Systems, Advanced Database Concepts.

  • May 2013

    Bachelor's Degree in Computer Engineering

    K J Somaiya College of Engineering (KJSCE)
    Mumbai University, Mumbai

Projects

  • Aug 2018-Dec 2018

    Neural Conversation Model

    Seq2Seq Learning with LSTM/RNN

    Built a Proof of Concept (PoC) of the Sequence-to-Sequence (Seq2Seq) Learning process for a Neural Conversation Model using Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) with the Deeplearning4J (DL4J) library.

    Trained the Model on the Cornell Movie-Dialog Corpus.

    Experimented with High-Performance Computing (HPC) optimization options in DL4J.

    Experimented with Big Data System coupling with DL4J for Hadoop and Spark.


    Technologies and Tools: Java, DL4J, ND4J, Hadoop, Spark, Intel DAAL, Git, Linux

  • May 2018-May 2019

    OpenIE

    Open Domain Information Extraction

    Worked on this research project under Prof. Damir Cavar, computational linguistics faculty at IU, processing unstructured text to extract data, knowledge, entities and relations and to map out event information.

    Enabled semantic search with concept abstraction and the linking of concepts to concepts in large knowledge graphs like YAGO, DBPedia and Microsoft Concept Graph.


    Technologies and Tools: Python, Java, Numpy, Scikit-Learn, Pandas, Stanford CoreNLP, Spacy, OpenNLP, NLTK, Neo4J, Django, Git, WildFly, Linux

  • May 2018-May 2019

    Speech Prosody and Pragmatics

    Detecting Prosody and Pragmatics of spoken language

    Worked under Prof. Damir Cavar, computational linguistics faculty at IU, on this research project focusing on prosody, intonation contour detection, and focus and stress pattern analysis for the processing of semantic and pragmatic aspects of spoken language.


    Technologies and Tools: Python, Java, Numpy, Scikit-Learn, Pandas, Google Cloud Speech API, Git, Linux

  • Jan 2018-April 2018

    Twitter Sentiment Analysis

    Data Mining and Social Media Mining Mini Project

    Developed a Data Preprocessing module using NLTK with steps involving POS Tagging, Stop Words Removal, Stemming and Lemmatization, Negation Handling, N-gram and Sentiment Scoring using AFINN and TextBlob.

    Built Sentiment Classifier using Naïve Bayes and Logistic Regression to classify tweets as +1, 0 and -1 with 87.89% accuracy.


    Technologies and Tools: Python, Numpy, Scikit-Learn, Pandas, NLTK, MongoDB, Git, Linux

Leadership

  • Feb 2018-Feb 2019

    Data Science Club at Indiana University

    President & Treasurer

    Improved the foundational structure of the Club Leadership.

    Organized a hackathon for analyzing the opioid crisis in collaboration with SPEA at Indiana University.

    Spearheaded the initiative for a monthly Newsletter and a semester-wise e-Magazine.

  • June 2011-May 2012

    Students’ Council, K J Somaiya College of Engineering

    Creative Head

    Organized Technical and Cultural Festivals in KJSCE as Creative Head of the Organizing Committee.

    Headed the Design Team for Kshitij, the Annual College Magazine of KJSCE.

Hobbies

Cooking

Reading

Writing

Movies

Music

Painting