About Me

Ni Yao - Site Reliability Engineer

Deliver reliability for applications through proactive SRE practices in architecture design, resilience engineering, observability, automated recovery, and CI/CD.

Experience

JP Morgan Chase

Senior Site Reliability Engineer | Vice President Feb 2023 – Present
  • Researched, consulted on, and implemented resiliency best practices for Kubernetes, Kafka, databases, certificate automation, and CI/CD across half a dozen Payments services.
  • Owned client relationships and delivered resiliency solutions.
  • Implemented observability for B2B Payments applications with daily cash flow exceeding 100 billion USD, reducing mean time to diagnosis tenfold.
  • Developed a production automation framework on AWS using Python, Lambda, Step Functions, and API Gateway to automate incident recovery, reduce toil in executions that require manual decisions, and automate Disaster Recovery / Sustained Resiliency events.

JP Morgan Chase

Site Reliability Engineer | Senior Associate Dec 2021 – Feb 2023
  • Researched, consulted on, and implemented resiliency best practices for Kubernetes and certificate automation for a high-volume payment application.
  • Built Grafana dashboards for performance and application monitoring.
  • Conducted incident RCAs for applications processing 100 billion USD daily.

Con Edison

Site Reliability Engineer | Associate Feb 2019 – Dec 2021
  • Architected a high-availability infrastructure for an MFT application handling 80% of company-wide data transactions with external vendors.
  • Reduced MTTR (Mean Time to Resolve) by 80% by implementing PagerDuty and runbooks that standardize resolutions for known issues.
  • Automated manual monthly tasks, reducing toil by 0.25 FTE.
  • Supported four critical applications with an RTO of under 1 hour.
  • Upgraded the SFTP security cipher, coordinating the change with over 600 external vendors and companies.

Con Edison

Integration Analyst | Analyst Jul 2017 – Feb 2019
  • Developed 15 features for backend stability and front-end point of entry.
  • Developed fixes for 6 frequently recurring bugs, reducing downtime by 75%.
  • Supported a critical application with an RTO of under 1 hour.

Skills

Programming Languages

  • Python
  • Java
  • C++
  • SQL

Core Skills

  • Infrastructure Resiliency
  • Performance Monitoring
  • Observability Dashboarding
  • Monitoring and Alerting
  • Production Automation
  • Kubernetes Containerization
  • Pipeline Deployment
  • RCA and Post-Mortem
  • Cyber Security (SSL)
  • Git
  • Agile Development

Tools & Technologies

  • AWS Cloud Services
  • Datadog / Dynatrace / Splunk
  • Grafana / Prometheus
  • Bitbucket / GitHub
  • Terraform
  • Jenkins / Spinnaker

Education

The City College of New York, The Grove School of Engineering

B.A. in Computer Engineering, 2016

Certifications

AWS Certified Solutions Architect – Associate

Contact

Location: New York, NY

Blog

Check out my latest articles on Site Reliability Engineering, Cloud Architecture, and Technical Leadership.