abc

Site Reliability Engineer

LinkedIn | GitHub

About

Site Reliability Engineer with 5 years of experience building and maintaining scalable, highly available, fault-tolerant systems. Experienced in automating operational processes, tuning system performance, and leading incident response. Applies cloud technologies and modern DevOps practices to improve efficiency and system reliability.

Work Experience

Site Reliability Engineer

Leading Tech Solutions Inc.

Jan 2019 - Jan 2024

Engineered and maintained critical production systems, ensuring high availability and optimal performance for a rapidly growing SaaS platform serving over 1 million users.

  • Automated key operational tasks using Python and Ansible, reducing manual intervention by 40% and improving deployment efficiency by 25%.
  • Improved system uptime from 99.5% to 99.9% by implementing proactive monitoring and alerting with Prometheus and Grafana, reducing critical incidents.
  • Led incident response and post-mortem analysis for major outages, decreasing Mean Time To Resolution (MTTR) by 30% through root cause identification and preventative measures.
  • Reduced AWS infrastructure costs by 15% by rightsizing instances, implementing auto-scaling policies, and managing resource allocation.
  • Developed and maintained CI/CD pipelines using Jenkins and GitLab CI, enabling faster and more reliable software releases with a 99% success rate.

Education

Computer Science

University of Technology

GPA: 3.8/4.0

Sep 2015 - May 2019

Courses

  • Distributed Systems
  • Operating Systems
  • Network Security
  • Algorithms and Data Structures

Certificates

AWS Certified Solutions Architect – Associate

Amazon Web Services

Mar 2022

Certified Kubernetes Administrator (CKA)

Cloud Native Computing Foundation (CNCF)

Aug 2021

Projects

Automated Deployment Pipeline for Microservices

Jan 2022 - Jun 2022

Designed and implemented a fully automated CI/CD pipeline for deploying a suite of microservices to a Kubernetes cluster.

Languages

English

Skills

Programming Languages

  • Python
  • Go
  • Bash
  • Java

Cloud Platforms

  • AWS
  • Azure
  • Google Cloud Platform (GCP)

Containerization & Orchestration

  • Docker
  • Kubernetes
  • Helm

CI/CD & DevOps Tools

  • Jenkins
  • GitLab CI
  • GitHub Actions
  • Terraform
  • Ansible
  • Chef
  • Puppet

Monitoring & Logging

  • Prometheus
  • Grafana
  • Datadog
  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • Splunk
  • PagerDuty

Operating Systems & Networking

  • Linux (Ubuntu, CentOS, RHEL)
  • Networking (TCP/IP, DNS, HTTP)
  • Load Balancing
  • Firewalls

Databases

  • PostgreSQL
  • MySQL
  • MongoDB
  • Redis
  • Cassandra

Version Control

  • Git
  • GitHub
  • GitLab
  • Bitbucket

Methodologies

  • Agile
  • Scrum
  • ITIL
  • Site Reliability Engineering (SRE)
  • DevOps