Data Science professional with a Computer Science background, specialising in developing and orchestrating ML workflows.
My key areas of interest are Machine Learning, Deep Learning, GANs, CI, and deployments.
Used RetinaNet (a one-stage detector) to first create image embeddings and then detect 50 classes. All classes were day-to-day consumer goods such as food, tables, chairs, dresses, laptops, etc. Trained on the Open Images V2 dataset, which ships with human annotations; used Google BigQuery and some pre-processing techniques to create class-wise, human-annotated training data.
Technology Used: Keras (Tensorflow 1.14 Backend), RetinaNet, Google BigQuery, MLflow, DVC, Python, Open Images V2
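A minimal sketch of the class-wise grouping step that turns flat human annotations into per-class training sets (field names and records here are hypothetical; the real pipeline queried the Open Images V2 annotation tables via Google BigQuery):

```python
from collections import defaultdict

# Hypothetical annotation records; the real data came from the
# Open Images V2 human annotations queried via BigQuery.
annotations = [
    {"image_id": "img_001", "label": "Laptop"},
    {"image_id": "img_002", "label": "Chair"},
    {"image_id": "img_003", "label": "Laptop"},
]

def group_by_class(records):
    """Bucket annotated image ids by class label."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec["label"]].append(rec["image_id"])
    return dict(by_class)

class_wise = group_by_class(annotations)
print(class_wise["Laptop"])  # → ['img_001', 'img_003']
```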
The project was an internal POC demonstrating how to build production readiness into the development cycle itself. Applied a micro-service strategy, breaking every ML pipeline component out into its own container, and used Kubeflow Pipelines to orchestrate the complete workflow on GKE. This resulted in high reproducibility, portability, and scalability. The complete pipeline was automated, and a web app was used to initiate and monitor the workflow.
Technology Used: Python, Docker, Nvidia-Docker, GKE, GCR, Kubeflow, Kubeflow Pipeline, Gitlab
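The orchestration idea can be sketched in plain Python (the actual project used the Kubeflow Pipelines DSL on GKE; the step names and runner below are illustrative stand-ins, not the Kubeflow API): a pipeline is a set of containerised steps executed in dependency order.

```python
def run_pipeline(steps, deps):
    """Run steps in topological order of their dependencies.

    steps: mapping of step name -> callable (stand-in for launching a container)
    deps:  mapping of step name -> list of upstream step names
    """
    done, order = set(), []

    def visit(name):
        if name in done:
            return
        for upstream in deps.get(name, []):
            visit(upstream)
        steps[name]()          # in Kubeflow this would launch a container on GKE
        done.add(name)
        order.append(name)

    for name in steps:
        visit(name)
    return order

# Illustrative ML pipeline components (names are hypothetical).
steps = {
    "preprocess": lambda: None,
    "train": lambda: None,
    "evaluate": lambda: None,
}
deps = {"train": ["preprocess"], "evaluate": ["train"]}
print(run_pipeline(steps, deps))  # → ['preprocess', 'train', 'evaluate']
```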
Built on top of 'Orchestration of ML Workflows'. I developed generic design standards and protocols that were integrated into every project, covering 'Project Governance & Compliance', 'Git Compliance', 'Data Version Control', and 'Pipeline Version Control'. All ML apps were production-ready during the development cycle itself, saving about 60-70% of time and resources during the production cycle. Data version control enabled time travel across dataset versions, while project governance and Git compliance enabled common development practices and effective collaboration. All this made each project far more optimised.
Technology Used: Python, Docker, GKE, GCR, Kubeflow, Kubeflow Pipeline, Gitlab, Cookiecutter, DVC, MLflow
A completely automated, generic platform to retrain any given model with a new batch of data, based on CI principles. The pipeline works as follows: every week (or biweekly), human taggers label data, which is version-controlled using DVC. The data is then combined and used to retrain the model. A benchmarking container attached at the end checks the accuracy of the freshly trained model; in case of poor performance, data time travel is used to roll the data back.
Technology Used: Python, Docker, GKE, GCR, Kubeflow, DVC, MLflow, LabelImg
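The accept-or-roll-back gate at the end of the retraining pipeline can be sketched as follows (function names and the tolerance value are illustrative; in the real pipeline the benchmark ran in its own container and rollback used DVC data time travel):

```python
def benchmark_gate(new_accuracy, baseline_accuracy, tolerance=0.01):
    """Accept the freshly trained model only if it does not
    underperform the current model beyond a small tolerance."""
    return new_accuracy >= baseline_accuracy - tolerance

def retrain_cycle(train_fn, evaluate_fn, rollback_fn, baseline_accuracy):
    model = train_fn()            # retrain on the newly combined data batch
    acc = evaluate_fn(model)      # benchmarking step
    if benchmark_gate(acc, baseline_accuracy):
        return model, acc         # promote the new model
    rollback_fn()                 # e.g. check out an earlier data version via DVC
    return None, acc

# Illustrative run with stub functions standing in for the containers.
model, acc = retrain_cycle(
    train_fn=lambda: "model-v2",
    evaluate_fn=lambda m: 0.91,
    rollback_fn=lambda: None,
    baseline_accuracy=0.90,
)
print(model, acc)  # → model-v2 0.91
```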
An object-oriented deep-learning Python library covering implementations of several well-known algorithms across the neural-network domains of ANNs, CNNs, RNNs, GANs, etc. The library also includes image-processing utilities used in computer vision.
Technology Used: Python, Keras (Tensorflow 1.14 Backend), OpenCV, MLflow, Github
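A minimal sketch of the object-oriented design such a library might use (class names here are illustrative, not the library's actual API): layers share a common forward interface, and a model composes them by chaining forward passes.

```python
import numpy as np

class Layer:
    """Common interface every layer implements."""
    def forward(self, x):
        raise NotImplementedError

class Dense(Layer):
    """Fully connected layer: y = x @ W + b."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def forward(self, x):
        return x @ self.W + self.b

class ReLU(Layer):
    """Element-wise rectified linear activation."""
    def forward(self, x):
        return np.maximum(0, x)

class Sequential:
    """Compose layers by feeding each output into the next layer."""
    def __init__(self, layers):
        self.layers = layers

    def predict(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

model = Sequential([Dense(4, 8), ReLU(), Dense(8, 2)])
out = model.predict(np.ones((3, 4)))
print(out.shape)  # → (3, 2)
```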