Hello, I'm

Parag Gupta

DevOps Engineer • Cloud Architect • SRE

Dedicated to building resilient, scalable infrastructure and innovative DevOps solutions

About Me

DevOps Engineer and Cloud Architect with 9+ years of experience transforming infrastructure challenges into elegant, efficient solutions. I've architected systems that handle millions of transactions daily, including a 4000-core OpenStack cluster, and built AI-powered monitoring solutions that reduced incident response time by 40%.

My approach combines technical expertise with business awareness: I have delivered approximately $500K in cost savings while improving system reliability. I'm passionate about creating infrastructure that empowers teams rather than constraining them, whether using Kubernetes, Terraform, or custom Python automation.

What drives me is seeing the real-world impact of well-designed systems – from accelerating developer workflows to ensuring critical services remain available for users worldwide.

I bring deep expertise across cloud platforms (AWS, GCP, Azure, Alibaba Cloud), container orchestration, infrastructure as code, and security implementation – all applied with a focus on practical outcomes that deliver business value.

Languages

Python, Shell scripting

Unix

CentOS, Ubuntu

Virtualization & Networking

KVM, Libvirt, QEMU, iptables

Security

Security Baseline, Vulnerability Management, IAM

9+

Years of Experience

20+

Projects Completed

5+

Cloud Platforms

Key Achievements

Cost Optimization Impact

Saved approximately $500K through cloud cost optimizations, infrastructure improvements, and operational efficiencies across various roles.

Innovative Solutions

Received the "Rising Star Award" for developing Netbox, a custom tool to track resources and IP allocations in data centers, enhancing infrastructure management.

Publications

Authored technical blogs on Kubernetes, Django, and AWS CodeDeploy, providing insights on infrastructure automation and high-availability setups.

AI Innovation Patent

Developed an AI-powered AIOps system and filed a patent for it; the system performs root cause analysis by fusing multi-modal observability data (metrics, logs, and traces) with continuous learning capabilities.

Work Experience

Computer Scientist II / I

Adobe Systems
Dec 2021 - Present
  • Engineered an AI-powered Cortex metrics analyzer that automatically detected system anomalies and provided root cause analysis, reducing MTTR by 40%
  • Built an AI-powered platform that converts natural-language instructions into executable Ansible playbooks, with NLP-based intent parsing, automatic playbook generation, interactive validation, and integration with existing Ansible infrastructure
  • Implemented a PII detection and redaction solution for application logs, enhancing security compliance while maintaining system observability
  • Created modular Terraform components that automated cloud infrastructure provisioning, reducing environment deployment time from days to hours
  • Migrated 26 services to Ethos, Adobe's Cloud Foundation platform for managed Kubernetes, improving scalability and reducing operational overhead by 35%
  • Designed and implemented secure cloud deployments with comprehensive guardrails for Adobe Journey Optimizer
  • Built a Streamlit-based dashboard to audit configuration files, improving operational visibility by 65%
  • Automated Big Data component deployment using Python and Airflow, reducing manual intervention by 70%
  • Led tech stack optimization initiatives resulting in annual savings of $376,000 through resource consolidation
  • Developed disaster recovery solutions for revenue-impacting components, reducing potential downtime by 85%

Technologies Used

Python Streamlit AWS Azure Airflow Kubernetes

Site Reliability Engineer

Media.net
Feb 2021 - Dec 2021
  • Designed and deployed a reliable, scalable Kubernetes cluster (v1.22) in the company data center following industry best practices and security standards, enabling multiple teams to use it as a Container-as-a-Service (CaaS) platform
  • Optimized and benchmarked in-house Java applications and integrated them with instrumentation-based monitoring so that performance could be measured objectively
  • Revamped infrastructure monitoring to speed up cloud migration, predict inventory needs, scale effectively, and gain actionable insights from data
  • Introduced a service mesh into the tech stack to simplify service discovery in complex networking environments and streamline application deployment

Technologies Used

Kubernetes Docker Prometheus Grafana Istio

Senior Engineer, DataOps

Tokopedia
Oct 2019 - Feb 2021
  • Architected data engineering tech stack on GCP, enabling scalable analytics capabilities across the organization
  • Established highly available Airflow clusters supporting business intelligence requirements across multiple departments
  • Implemented cost optimization initiatives that reduced GCP expenses by 30-40%
  • Designed a hybrid (GCP + Alibaba Cloud) network architecture with robust security controls, including Vault, access management, and comprehensive logging
  • Configured Grafana and Prometheus monitoring for critical components, reducing alert response time by 60%
  • Reduced application downtime by automating the unit and integration test suites and publishing detailed reports to the development team

Technologies Used

GCP Alibaba Cloud Jenkins Airflow Vault

DevOps Engineer

Paytm
Aug 2018 - Aug 2019
  • Managed hybrid infrastructure (hardware + cloud) for Wallet and Payment Gateway systems processing millions of transactions daily
  • Migrated the QA environment from AWS to OpenStack, delivering a 45% cost reduction while maintaining performance
  • Implemented high availability architecture that eliminated single points of failure across critical payment systems
  • Automated infrastructure provisioning and configuration management using Salt and Ansible, reducing deployment errors by 85%
  • Configured and optimized the ELK stack to handle over 50 TB of log data, improving troubleshooting efficiency by 70%

Technologies Used

AWS Docker Kubernetes ELK Salt

DevOps Engineer

TO THE NEW
May 2017 - Jul 2018
  • Designed application architecture and containerization strategy for Grails and Angular applications
  • Implemented CI/CD pipelines with Jenkins, reducing deployment time by 75%
  • Orchestrated containerized workloads using Kubernetes, improving resource utilization by 40%
  • Developed technical content for Alibaba Cloud services
  • Managed routing and authentication using API Gateway
  • Automated backups, snapshots, and disaster recovery (DR) procedures
  • Established centralized logging using the EFK stack, plus performance monitoring and alerting with New Relic, Pingdom, and PagerDuty
  • Created cost optimization dashboards with Grafana, identifying savings opportunities that reduced cloud spending by 25%

Technologies Used

AWS OpenStack Ansible Docker Jenkins

Technical Skills

Programming Languages

Python
Shell Scripting
YAML/JSON

Unix & Virtualization

QEMU
KVM/Libvirt
CentOS/Ubuntu

Cloud Platforms

AWS
Google Cloud (GCP)
Azure
Alibaba Cloud
OpenStack

DevOps Tools

Kubernetes
Docker
Terraform
Ansible
SaltStack
Istio
ArgoCD

Security

Security Baseline
Identity & Access Management (IAM)
Vulnerability Management

Monitoring

Prometheus
Grafana
ELK Stack
Zabbix

Databases & Storage

MySQL
PostgreSQL
Redis
GlusterFS
Kafka

AI Tooling

Azure OpenAI
Streamlit
Pandas
MCP

Featured Projects

AIOps Root Cause Analysis (Patent Pending)

AI-powered system that fuses metrics, logs, and traces into unified structures for intelligent root cause analysis with continuous learning capabilities.

AI/ML Graph Neural Networks Bayesian Networks Patent

Kubernetes HA Cluster

High-availability Kubernetes cluster set up in a data center, with multiple control-plane (master) nodes, custom monitoring, and advanced security.

Kubernetes Docker HAProxy etcd

Infrastructure as Code

Reusable Terraform modules for multi-cloud deployments with automated testing and deployment pipelines.

Terraform AWS Azure GCP

Monitoring Solution

Comprehensive monitoring and alerting system for microservices with custom exporters and intelligent alerts.

Prometheus Grafana Alertmanager Kubernetes

Cortex VM Analyzer

AI-powered application that fetches VM metrics from Cortex and analyzes them, presenting the results through interactive visualizations.

Python Streamlit Azure OpenAI Docker

Ansible Automation Assistant

Natural language to Ansible playbook conversion platform with interactive editing and execution features.

Python Streamlit OpenAI Ansible

Cloud Cost Optimization

Tools and strategies for cloud cost management with automatic resource scheduling and right-sizing.

AWS CloudWatch Lambda Python

Get In Touch