SUMMARY
Principal DevOps Engineer and Site Reliability Engineer with 20+ years of experience building and operating cloud infrastructure, high-availability production systems, and mission-critical platforms across enterprise, SaaS, and academic environments. Deep expertise in incident response, on-call operations, CI/CD, infrastructure as code, disaster recovery, and monitoring and observability. Proven success reducing MTTR, improving system reliability, and delivering measurable business impact, including $400k+ annual cost savings. Hands-on experience supporting AI/ML platforms, GPU infrastructure, and LLM pipelines.
SKILLS
Cloud & Virtualization
AWS, Azure, OpenStack, Proxmox, VMware ESXi
Containers & CI/CD
Docker, Kubernetes, GitLab CI/CD, Jenkins
Infrastructure as Code (IaC)
Terraform, Ansible
Operating Systems
Linux (Red Hat, CentOS, Ubuntu, SUSE), Windows Server
Languages & Scripting
Python, Bash, PowerShell, PHP
AI & Data Systems
GPU Infrastructure (CUDA), LLM Architecture, AI Orchestration, Prompt Engineering, AI Data Ingestion, Data Modeling
Reliability & Observability
SRE, Incident Response, On-call, Zabbix, Nagios, PRTG
Networking & Identity
DNS, DHCP, LDAP, Active Directory, RADIUS
Databases
MongoDB, MySQL, MariaDB
Platform Engineering & Automation
CI/CD Pipelines, Infrastructure as Code, Automation, Production Systems
WORK EXPERIENCE
Downtown DevOps — Philadelphia, PA
IT services and consulting firm providing SRE, CI/CD, and infrastructure leadership across multiple organizations.
Principal DevOps Engineer / SRE (Founder, Contract)
03/2019 – Present
- Provided principal-level SRE and platform engineering leadership across multiple client organizations, serving as the escalation point for complex, multi-system production failures.
- Spearheaded incident response for critical outages, restoring service and driving long-term remediation.
- Led cross-functional teams to resolve complex issues, resulting in enhanced system reliability and performance across multiple organizations.
- Designed, operated, and maintained Proxmox clusters, GitLab, email systems, NAS platforms, and mixed Windows/Linux environments.
Tamman Inc — Philadelphia, PA
Digital accessibility solutions company (50+ employees, ~$4.6M revenue).
Senior DevOps Engineer / Technical Lead (Contract, Part-time)
09/2016 – 04/2025
- Built GPU-optimized CUDA servers for AI experimentation, supporting early-stage AI and ML workloads.
- Implemented CI/CD pipelines in GitLab for CUDA-based AI projects, streamlining development and deployment workflows.
- Built production, staging, demo, and development environments using Azure, Bitbucket, MongoDB Cloud, and Proxmox.
ReminderMedia — King of Prussia, PA
Relationship marketing technology company (300+ employees, ~$73M revenue).
Senior DevOps Engineer / Technical Lead (Contract)
12/2016 – 12/2024
- Optimized production systems by rewriting PHP code and tuning server performance, reducing processing time by over 2 hours per run and saving $400k+ annually.
- Developed fuzzy-logic matching software in Python to correlate identities across disparate data sources, generating 500,000+ new CRM leads.
- Built staging and QA environments that significantly reduced outages caused by flawed deployments and system patches.
- Automated PXE-based system provisioning in VMware ESXi environments using BIND, DHCP, GitLab, Python, and Bash.
- Implemented monitoring and data-integrity detection algorithms to identify issues before they affected customers.
Comcast (via NCS Technologies) — PA / NJ
Global media and technology company (180k+ employees).
SRE Manager / Principal Application Support Engineer
05/2014 – 12/2016
- Led enterprise-wide incident communications during major national outages, coordinating engineers, managers, and executives to minimize customer impact.
- Designed and built Bella Vista, an internal application that dramatically improved triage, analysis, and resolution of distributed system failures affecting 1M+ customers.
- Architected and operated real-time reliability dashboards for the Democratic National Convention (DNC) and NBC's Rio Olympics coverage, providing live system health and content availability metrics during global events.
Vincent Communications — Woodbury, NJ
Founder / Principal Software Engineer / Systems Administrator
11/2010 – 05/2014
- Automated UI layout and content generation via database-driven design using Perl, reducing development time by over 50%.
- Installed, configured, and maintained Red Hat Enterprise Linux and Windows Server environments.
Princeton University — Princeton, NJ
Systems Administrator / IT Support
10/1996 – 11/2010
- Modernized the Mathematics Department infrastructure by migrating legacy Sun systems to Dell hardware running Red Hat Linux, improving performance and reliability.
- Replaced legacy file servers with a NetApp appliance, improving NFS throughput and simplifying administration.
- Designed and implemented an enterprise trouble-ticketing system, reducing response times by 90% through automation and paging integration.
ADDITIONAL KEYWORDS
- Cloud Infrastructure
- Platform Engineering
- Site Reliability Engineering
- Infrastructure as Code
- Disaster Recovery
- Monitoring and Observability
- CI/CD Pipelines
- DevOps
- Containerization
- Automation
- Production Systems
- AI/ML Infrastructure
- GPU Computing
- LLM Platforms
- Cloud Migration
- System Administration
- Network Security
- Performance Tuning