Shreyas Prashant Kulkarni, Recent Data Science Graduate

How about some positive vibes from old Mr. Cranky?

I get tons of unsolicited emails, and normally they make me cranky; some have even become blog posts. Today, though, I want to post about one that for some reason touched me, maybe because the writer, Shreyas Prashant Kulkarni, did a modicum of research and included what he found in his letter. Mr. Kulkarni sent me an email looking for employment. Unfortunately, I’m not hiring, and I told him so. I also explained that my blog gets about 800 unique visitors a day, most of them technology execs and most of them in the DC area.

I then offered to post his letter and resume, and he thought that would be a good idea. Managers hiring Tech Talent… here’s a guy who may be worth checking out.

This is not an endorsement of Shreyas. I have not verified the quality of the young man’s character or work. I’m simply passing on this information. Here’s hoping that a few good deeds can outweigh my crankiness.

His Letter

Hello friends at Driven Forward LLC.

I hope you are having great start of the week.

I’m Shreyas, a recent graduate of Data Science from University at Buffalo. I have also completed bachelors in Computer Science from India before that. I am reaching out to you because I am looking for full-time opportunities in data science/analytics or relevant fields. I believe, Driven Forward LLC. is making a great contribution in the field of Consulting and E-Commerce using data science and I would love to push that forward with my data skills and make business impact

I am proficient in Python, R, and MySQL. I have recently worked as a Graduate research assistant at my university where I optimized deep learning algorithms to reduce their computation time by up to 70 times. I was fortunate to work as a research intern in Froot research and as a software developer intern in GS Lab for more than a year which gave me industry exposure in data science field. During my graduate program, I have mastered my concepts of analysis, statistics, and probability and have used a variety of supervised and unsupervised machine learning algorithms on diverse datasets. I have also worked on big data tools like Hadoop and Spark.

As Drivenforward is a website that deals with coaching and training in order to help become better at business, I am really excited to know more about Driven Forward LLC. and want to explore opportunities to work with you and your incredible team. I would love to chat with you or anyone in your team that you think might be interested in connecting, anytime according to convenience. I am also attaching my resume to know more about my projects and experience. Looking forward to hear back from you soon.

Warm regards,
LinkedIn | Github
Contact no. – (716) 598-8705

His Resume

Shreyas Kulkarni
+1(716)598-8705
EDUCATION
University at Buffalo, State University of New York, Aug 2017 – Feb 2019
     Master of Science (M.S.) in Data Science
Pune Institute of Computer Technology, Pune, July 2013 – June 2017
     Bachelor of Engineering (B.E.) in Computer Science

RELEVANT WORK EXPERIENCE
  • University at Buffalo | Graduate Research Assistant Sept 2018 – Feb 2019
    • Technologies and languages – Java, Weka tools, Python, Peersim, Pandas, deeplearning4j
    • Worked on an independent research project with Prof. Dutta to optimize computation time of deep learning algorithms.
    • Analyzed 10+ research papers to pinpoint and implement GADGET (a gossip-based sub-gradient solver) algorithm for distributed models.
    • Built a gossip-based, horizontally distributed model in Java using the PeerSim simulator, which performed 70 times better than the centralized perceptron while maintaining equivalent accuracy across a variety of datasets and network topologies.
    • Wrote various supporting scripts in Python for data segmentation, file format transformation, and efficient testing of results.
  • Froot Research | Data science Research Intern June 2018 – Aug 2018 
    • Technologies and languages- Python, Apache spark, Association rule mining, MySQL, Pandas, NumPy, WARMR, ILP
    • Assembled an algorithm to bucketize the integer data based on the distribution along the column to generate the frequent patterns.
    • Researched various methods and generated frequent patterns for multi-relational database using virtual join efficiently with PySpark.
    • Detected anomalies in generated association rules over certain time periods for discovering newer patterns.
    • Achievement: Created pip installable python package for bucketing algorithm which converts integer array to categorical buckets.
  • GS Lab | Software Development Intern July 2016 – June 2017 
    • Technologies and languages- RabbitMQ, mongoDB, Spark, Elasticsearch, Kibana, J48, Java, Python, Javascript, HTML, weka tools
    • Developed a generic system, which can be integrated into backend of any online platform for analysis and recommendations.
    • Created a pipeline that enabled vendors to create various visualizations to customize their platform, using user-specific click data.
    • Enhanced user experience by recommending products based on user data and attributes demanded by vendors with 80% accuracy.
    • Publications: Published the paper “Generic user event activity analysis and Prediction” in the IRJET journal, based on this project.

SKILLS
  • Competencies: Python (Pandas, NumPy, Matplotlib, PyMySQL, TensorFlow, scikit-learn, Keras, MLLib, NLTK, SciPy), R, Machine learning, deep learning, MySQL, Natural language processing, Java, MATLAB, Statistics, Probability, A/B test, time series, Scala, C, C++, JS, mongoDB
  • Tools: Apache spark, Hadoop, R Studio, Tableau, Anaconda, Elasticsearch, Kibana, Weka, Android, Rabbitmq, GCP, Eclipse, Latex, AWS
  • Certificates: Oracle Database Associate, Data structures and algorithms, NLP with Python for ML

SELECTED PROJECTS
  • Fraud transaction detection | Python, Pandas, NumPy, scikit-learn, Gridsearch, statsmodels, seaborn
    • Imported the credit card transactions in JSON format into Pandas dataframe, handled missing values and performed EDA to get insights.
    • Performed feature engineering on a dataset with a mixture of boolean, integer and categorical variables, using statistical significance calculated by Pearson’s correlation coefficient and Cramér’s V tests. Also handled imbalanced data using upsampling and downsampling.
    • Identified fraudulent transactions using grid search to tune random forest hyperparameters, giving more than 99% precision and 100% recall.
  • Anomaly detection in the website impressions | Python, Pandas, TBATS, time series, SARIMAX
    • Performed statistical and visual analysis on the unlabeled website data of a year for calculated CPM (cost per mille).
    • Compared various methods like SARIMA, SARIMAX, TBATS to notify abnormal behavior on a day for every website independently.
    • SARIMAX model in combination with statistical benchmark worked the best to filter out abnormalities based on the residual for every day.
  • Santander product recommendation | R, Association Rules, SVD, Hierarchical clustering, K-means, ggplot2, Apriori
    • Analyzed user data based on gender, nationalities, age, profession and drew out necessary conclusions for recommendations.
    • Performed low-rank approximation using SVD on training data of 30M users and recommended account types for users based on their personal details for test data of 3M rows. Also tried to cluster the data using different algorithms, which failed due to disparity.
    • Generated various association rules between user details and account details using apriori in R to get recommendations.
  • Text analysis and classification using Hadoop and Spark | Python, ArticleAPI, Tweepy, Pandas, NLTK, D3.js, MRJob, BS4, MLLib, TF-IDF
    • Scraped NYT articles and tweets on various topics via their respective developer APIs, using BeautifulSoup and pandas.
    • Leveraged Hadoop MapReduce to extract key factors using MRJob, cleaning data with NLTK in the mapper stage and counting words in the reducer stage.
    • Created word clouds using D3.js for the word count and co-occurrence of the top 10 words, from results obtained using another reducer for both article and tweet data. Also built a pipeline in PySpark which pre-processed the articles and vectorized them using TF-IDF.
    • Classified articles by topic using logistic regression, Random Forest, and Naive Bayes in PySpark, achieving almost 70% accuracy.
  • Data Scientists Analysis | R, regsubsets, glmnet, rf, ggplot2, caret, xgboost
    • Transformed the data to separate out respondents who participated in the survey (workers, students, etc.) based on various parameters.
    • Assessed supervised algorithms like OLS, subset selection, shrinkage methods, and random forest in R to predict salaries, using workers’ responses as the training set. Achieved 85% accuracy with random forest, concluding that it is the best method with the categorical variables.
    • Analyzed and pointed out different trends in the field of data science based on these responses by using ggplot2 visualizations.