Umang Mehta

Data Scientist & AI Research Engineer working on Knowledge Graphs, Big Data Engineering, Deep Learning with Natural Language Processing & Understanding for Conversational AI & Predictive Analytics.

I graduated with a Master's Degree in Data Science and aspire to work in Deep Learning, Machine Learning (ML), Artificial Intelligence (AI) and Cognitive Science & Engineering, with applications involving Speech Recognition, Voice Interaction, Natural Language Processing (NLP) and Natural Language Understanding (NLU). I am a Data Scientist and AI Research Engineer with 3 years of active research and development experience in ML, Neural Networks, NLP, Social Media Mining, Opinion Mining, Conversational AI, Information Retrieval (IR), Knowledge Graphs, Predictive Analytics, Big Data Engineering, Python & Scala. I also have 5 years of experience as a Software Developer and Team Mentor with expertise in Database Design, Java Programming and Web Development.

General Info



Java, Javascript & Python

Object Oriented & Functional Programming.


Machine Learning & Deep Learning

With Natural Language Processing


Years of Work Experience in Software and Data Engineering


Years of Work Experience in ML, NLP, AI and Big Data


Programming Languages:






Relational Database:



NoSQL Database:





Apache Solr

Machine Learning, Deep Learning, Big Data:



Deeplearning4J (DL4J)

Apache Spark


Apache Hadoop

NLP Toolkits:


Stanford CoreNLP






Apache Tomcat 7 & 8





OS & Cloud Services:


Amazon Web Services (AWS)

Google Cloud Services

VCS & DevOps:


Subversion (SVN)


AWS Lambda

AWS Sagemaker




  • English
  • Hindi
  • Gujarati
  • Marathi


Won 2 consecutive Annual Fall Chili Cook-offs in 2017 and 2018 organized by SICE at Indiana University.

Organized Bollywood Quiz as Event Head in Symphony, the Annual Cultural Festival of KJSCE in 2011.

Volunteered in the Infra and Logistics Team for Technical and Cultural Festivals in KJSCE from 2009 to 2011.

As a hobby writer, had my poems published in Kshitij, the Annual College Magazine of KJSCE, and my technology articles published in the Technical Magazine of the Computer Society of India, KJSCE Chapter.

Work Experience

  • Jan 2020-Present

    Capital One

    Data Engineer

    Developing a data processing pipeline in Apache Spark and Scala that processes credit card requests in batches, joins data from other sources and APIs, and makes the final output available to the card embossing process.

    Integrating a streaming data pipeline using Apache Kafka as an alternative to batch processing in various data pipelines.

    Technologies and Tools: Scala, Java, Spark, Kafka, Git, AWS EC2, AWS S3, Linux, Spring Boot

  • June 2019-Oct 2019

    Hello Nesh Inc. (Nesh)

    Data Scientist - NLP & AI

    Built a Conversational AI agent in Python for the Oil and Gas domain.

    Developed a Knowledge Extraction pipeline for public documents, with Semantic Extraction using NER and Constituency Parsing with Rasa NLU & Spacy, and built a Knowledge Graph using Dgraph and GraphQL to represent the extracted knowledge.

    Extracted topics from public documents using Gensim & TextRazor and linked them to entities in Knowledge Graph.

    Conceptualized and built a PoC of a Diagnostic Analysis and Predictive Analytics Engine for Oil Well Failure using Expectation-Maximization (E-M), Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP) estimation.

    Trained and deployed text classifiers with Word Embeddings, LSTM and BERT using TensorFlow and Keras on AWS Sagemaker.

    Technologies and Tools: Python, Java, Rasa, TensorFlow, Keras, SciKit-Learn, BERT, Stanford CoreNLP, Spacy, TextRazor, Gensim, Dash by Plotly, NumPy, SciPy, Pandas, Flask, Dgraph, Docker, Kubernetes, Nginx, Git, AWS EC2, AWS S3, AWS RDS, AWS Lambda, AWS Sagemaker, AWS DynamoDB, Gremlin, JanusGraph, Linux, Javascript, NodeJS

  • June 2018-May 2019

    Kelley School of Business, Indiana University Bloomington

    Graduate Research Assistant

    Worked under Prof. Matthew Josefy, applying ML and NLP to research on Strategy and Entrepreneurship.

    Researched and implemented NLP methods to extract relevant information from SEC Filings.

    Developed and implemented ML and NLP models to analyze the business models and board leadership structures of companies.

    Built Text Classifiers using NLTK and SciKit-Learn with around 90% accuracy measured with 10-Fold Cross Validation.

    Technologies and Tools: Python, Java, NumPy, SciKit-Learn, Pandas, Stanford CoreNLP, Spacy, Git, TensorFlow, Keras, BeautifulSoup, Windows

  • Feb 2018-May 2019

    Ariadata Inc. (Aridat)

    Chief NLP Research Engineer

    Built an analytics engine to determine the critical reception of an artist's work based on chatter on social media.

    Built a Sentiment Classifier using Naïve Bayes and Multiclass Logistic Regression to classify tweets from artists as +1, 0 and -1, and implemented metrics to analyze sentiment distribution over different demographics.

    Led and advised on the research and implementation of advanced NLP methods to improve the efficiency of the Sentiment Classifier and add new functionality to improve the analytics provided to the artists.

    Technologies and Tools: Python, Java, Numpy, Scikit-Learn, Pandas, Stanford CoreNLP, NLTK, Afinn, TextBlob, MongoDB, Git, Matplotlib, Plotly, TensorFlow, Keras, Linux

  • June 2014-July 2017

    Vitruvian Technologies Pvt. Ltd. (RealtyRedefined)

    Senior Developer & Team Mentor

    Developed functionalities for web-based ERP and CRM systems in the domain of Real Estate.

    Collaborated in a team of 12 for project development including Java Programming, Data Structure & Database Design, Web Design & Development and Unit Testing.

    Led, trained and mentored a sub-team of 5 throughout the development of the projects. Contributed to the firm's proprietary Core Framework used for project development.

    Technologies and Tools: Java, Scala, Groovy, Spring Framework, Hibernate ORM, MySQL, Apache Solr, ElasticSearch, HTML 5, CSS 3, Javascript, AngularJS, UnderscoreJS, Bootstrap, AJAX, Jquery, PHP, Laravel Framework, Play Framework, Git, SVN, AWS EC2, AWS S3, AWS RDS, AWS Route53, AWS Cloud CDN, Linux

  • June 2013-May 2014


    Co-founder & Developer

    Developed web portals and mobile apps for small and medium scale enterprises.

    Built server software for TCP Layer Protocols customized for cloud-based industrial requirements.

    Developed standalone and distributed software for parts of manufacturing production lines.

    Mentored and trained groups of 3-4 undergraduate interns for developing industry level projects.

    Technologies and Tools: Java, PHP, HTML 5, CSS 3, Bootstrap, Javascript, AJAX, Jquery, RabbitMQ, Netty, JavaFX, Openfire, Smack, XMPP, MySQL, Linux, Windows, Google Cloud Services

  • June 2013-May 2014

    Research Innovation Incubation Design Laboratory (Riidl)

    Software Engineer Intern

    Developed a web-based ERP application for educational institutes using HTML, CSS, Bootstrap, Javascript and PHP.

    Designed the data structures & schema and managed the database transactions using MySQL.

    Built a mobile app for the ERP using the Java-based Android SDK for Android phones and Java ME for Java-based feature phones.

    Deployed the ERP on a hosting service using cPanel.

    Technologies and Tools: Java, PHP, HTML, CSS, Bootstrap, Javascript, MySQL, Java ME, Android SDK, cPanel


  • May 2019

    Master's Degree in Data Science

    Luddy School of Informatics, Computing and Engineering
    Indiana University, Bloomington

    Major Courses: Machine Learning, Data Mining, Deep Learning Systems, Advanced Natural Language Processing, Elements of Artificial Intelligence, Algorithms Design and Analysis, High Performance Big Data Systems, Advanced Database Concepts.

  • May 2013

    Bachelor's Degree in Computer Engineering

    K J Somaiya College of Engineering (KJSCE)
    Mumbai University, Mumbai


  • Aug 2018-Dec 2018

    Neural Conversation Model

    Seq2Seq Learning with LSTM/RNN

    Built a Proof of Concept (PoC) of a Sequence-to-Sequence (Seq2Seq) learning process for a Neural Conversation Model using Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) with the Deeplearning4J (DL4J) library.

    Trained the model on the Cornell Movie-Dialogs Corpus.

    Experimented with High-Performance Computing (HPC) optimization options in DL4J.

    Experimented with coupling DL4J with the Big Data systems Hadoop and Spark.

    Technologies and Tools: Java, DL4J, ND4J, Hadoop, Spark, Intel DAAL, Git, Linux

  • May 2018-May 2019


    Open Domain Information Extraction

    Worked on this research project under Prof. Damir Cavar, computational linguistics faculty at IU, processing unstructured text to extract data, knowledge, entities and relations and to map out event information.

    Enabled semantic search through concept abstraction and by linking extracted concepts to concepts in large knowledge graphs such as YAGO, DBpedia and Microsoft Concept Graph.

    Technologies and Tools: Python, Java, Numpy, Scikit-Learn, Pandas, Stanford CoreNLP, Spacy, OpenNLP, NLTK, Neo4J, Django, Git, WildFly, Linux

  • May 2018-May 2019

    Speech Prosody and Pragmatics

    Detecting Prosody and Pragmatics of spoken language

    Worked on this research project under Prof. Damir Cavar, computational linguistics faculty at IU, focusing on prosody, intonation contour detection, and focus and stress pattern analysis for processing the semantic and pragmatic aspects of spoken language.

    Technologies and Tools: Python, Java, Numpy, Scikit-Learn, Pandas, Google Cloud Speech API, Git, Linux

  • Jan 2018-April 2018

    Twitter Sentiment Analysis

    Data Mining and Social Media Mining Mini Project

    Developed a Data Preprocessing module using NLTK with steps involving POS Tagging, Stop Words Removal, Stemming and Lemmatization, Negation Handling, N-gram and Sentiment Scoring using AFINN and TextBlob.

    Built a Sentiment Classifier using Naïve Bayes and Logistic Regression to classify tweets as +1, 0 and -1 with 87.89% accuracy.

    Technologies and Tools: Python, Numpy, Scikit-Learn, Pandas, NLTK, MongoDB, Git, Linux


  • Feb 2018-Feb 2019

    Data Science Club at Indiana University

    President & Treasurer

    Improved the foundational structure of the Club Leadership.

    Organized a hackathon on analyzing the opioid crisis in collaboration with SPEA at Indiana University.

    Spearheaded the initiative for a monthly Newsletter and a semester-wise e-Magazine.

  • June 2011-May 2012

    Students’ Council, K J Somaiya College of Engineering

    Creative Head

    Organized Technical and Cultural Festivals in KJSCE as Creative Head of the Organizing Committee.

    Headed the Design Team for Kshitij, the Annual College Magazine of KJSCE.