Ask me how I can help you grow and continuously improve a world-class resilience engineering practice fueled by deep learning from incidents and chaos engineering. I love cultivating reliability in complex systems: resilience in software teams & robustness in their services. Half software engineering and half relationship building. Demonstrated ability to identify & deliver systemic improvements & culture change.
Principal Incident Analyst
Indeed
to
Impact.
Technical leadership of the team celebrated as an exemplar in learning from incidents at LFI Conf 2023.
Learning From Incidents.
Developed, practiced, and mentored teammates in cognitive interviewing, generating and synthesizing themes, facilitating productive retrospectives, and narrative composition and graphic representation.
We produced 30 internal incident reports digging deep into social and technical details all over the software ecosystem. Most reports featured interactions between systems managed by multiple teams. Many reports involved challenges between internal platforms, vendor platforms, or partner platforms.
We facilitated learning review meetings for each of the internal reports. Most of those meetings were attended by a diverse cross-section of the company, including many people who had not been direct responders.
Change Management.
Developed a change management plan, secured funding, and identified and hired the vendor for incident response training for 30-40% of the company, including all of Research and Development and cross-functional partners in Security, Legal, Operations, and Customer Support.
Crowd-sourced a list of over 50 migration projects and thereby catalyzed the creation of a tiger team and program to manage migrations for platforms, languages, and library upgrades.
Collaboration & Mentorship.
Coordinated messaging with a company-wide training program transforming the product management practice at the company to improve alignment between reliability and product.
Mentored 2 staff engineers and 2 senior engineers on my own team, and led the community of practice of principal site reliability engineers.
Advised many technical fellows, senior managers, directors, and some VPs on the nuances of our complex systems, especially through a sociotechnical lens.
Principal SRE—Resilience Engineering
Procore
to
Impact.
Founded the Incident Management & Resilience Engineering team. Defined mission, vision, and roadmap. Wrote an effective job description. Successfully recruited and hired a Principal Resilience Engineer to join the team.
Collaboration.
Developed relationships widely across the organization including diverse teams: Learning & Delivery, Product & Technology Excellence, Application Infrastructure, Software Delivery, Architecture Advisory Group, Security, Customer Success, Mobile Development and Product.
Learning From Incidents.
Engaged Adaptive Capacity Labs to train a cohort of a dozen people across the organization. Developed the business case. Identified participants. Supported logistics and communication. Supported contract negotiations.
Led or contributed to six learning reports surfacing a dozen themes. Organized and iterated on an information architecture for themes and reports.
Senior Software Engineer—Reliability
New Relic
to
Impact.
Pioneered chaos engineering practices to build skills and confidence across teams. Identified enabling infrastructure and features to allow safer experimentation in staging and production—e.g. traffic shaping, traffic mirroring, and advanced canary deployments. Balanced improvement of code & components and individual skills & teamwork.
Mentorship.
Actively coached across skill levels, from a college intern to lead engineers, including security, tech support, DevOps, and software devs.
Apprenticed myself to a shihan in programming, mentoring, and community organizing to study closely how he supports and encourages growth and independence.
Cross-team Coordination.
Led chaos engineering experiments (including the first in production): developing experience across the organization, negotiating organizational buy-in, and synthesizing cross-team expertise into concrete experiment design.
Resolved gnarly, entangled design problems, integrating tooling with production systems and requiring buy-in from multiple teams.
Thought Leadership.
Developed professional relationships with leaders in learning from incidents and chaos engineering. Participated in SNAFUcatchers.com. Supported learning from incidents workshop at Velocity San José 2019.
Wrote extensively in internal blogs. Actively participated in informal Slack conversations. Gave formal presentations and lightning talks in many communities of practice: Reliability, Chaos Engineering, Kubernetes, Ruby, JavaScript, and others.
Code, Runbooks, Process, and Training.
Incident chat ops: for example, used TDD to bootstrap test scaffolding and implement a micro-rules engine that automated escalation and paging during incidents.
Social engineering and evangelism for changes to policy around incident tracking, metrics collection and reporting, team health, and incident retros.
Helped write and organize Service Level Objectives documentation and tooling.
Principal Software Engineer
Itron, Inc. (was Comverge)
to
Impact.
Applied diplomacy and role-modeling to streamline communications and increase collaboration between QA, Development, IT, and Solutions Delivery.
Increased productivity for 20 developers and 4 technicians. Reduced build times and deploy times while also increasing reliability.
Led adoption of Docker throughout the organization.
Leadership.
Fostered deep respect and trust from my team. The director received repeated requests from team members to remain on my team, and several senior devs expressed specific interest in joining me.
Nurtured emotional safety and courage. Established weekly retros and bi-weekly one-on-ones focused on social-emotional parts of software development. Spoke hard truths and admissions of failure. Gave specific and concrete applause frequently. Other managers followed my lead.
Mentorship.
Encouraged professional development in all levels of expertise: high-school intern, junior dev fresh out of a code school, intermediate dev transitioning from QA to engineering, and two senior developers. One exceeded her own goals in a well-received talk at Rocky Mountain Ruby 2015.
DevOps.
Reduced build times from 8 hours to 30 minutes by expanding the Jenkins cluster from 20 to 70 VMs, and by finding and fixing the worst offenders in the test suite.
Improved the whole software delivery life cycle. Instituted a standardized deploy process, team-wide runbooks, and semi-automated deploys that increased repeatability and reduced the time required from one month to a few hours.
Social and technical engineering of Docker adoption. Enabled early adopters to experiment. Deployed first containers into CI. Created first containerized tests and deploys. Designed and evangelized effective patterns. Planned, negotiated, persuaded, and executed deployment in stages from dev, to test, QA, UAT, and production.
Senior Software Engineer
Mobilecause
to
Impact.
Fostered a culture of continuous improvement through honest and courageous communication, by speaking hard truths, and applauding when others did the same.
Designed and implemented a customizable & embeddable donation form that became the company's flagship feature.
Emergent Design.
Refactored all outbound messaging into a delivery service while maintaining existing message flows. Only two weeks later, this abstraction enabled us to improve throughput just in time for a customer's campaign messaging over 330,000 numbers.
Created a foundation for modularized HTML, CSS, and JavaScript, enabling migration of the UI from home-grown CSS soup to Bootstrap.
Refactored one particularly messy controller with just a handful of method extractions. A few weeks later those changes enabled a colleague to clarify the routing and models and complete the cleanup without me.
Scale.
Profiled outbound messaging performance and throughput. Identified garbage collection and IO as primary bottlenecks.
Threaded message delivery for an immediate twelvefold increase in SMS message throughput. Introduced retries and fallbacks to ensure message logging and persistence and expand the ability to identify and diagnose failures.
Enabled message workers to run under JRuby as a strategic move to open other performance opportunities via JVM-based tools and languages.
Front-end Engineering.
Streamlined unit testing of JavaScript front-end code using Jasmine.
Created a dynamic and thoroughly customizable fundraising thermometer and pledge wall for use in high-profile gala events. Produced the SVG graphic for dynamic display and employed Ember.js for updates.
Methodology.
Full-stack agile methodologies including continuous integration, full-time pair-programming, retrospectives, etc.
Software Engineer
Pivotal Labs
to
Impact.
Initiated an informal usability analysis for Groupon's editorial tools via direct observation of editors and deal creators.
Consequently created dramatic improvements in deal creation and editorial workflows. Streamlined communications between sales and editorial teams by integrating with Salesforce.
Learning.
Mastered Ruby within a month. Immediately effective in maintaining and updating the existing Rails application for the Groupon Merchant Center team.
Scale.
Refactored Groupon's merchant analytics into an independent service-oriented architecture. Reduced query time for the largest deal analytics reports from 7 minutes to 15 seconds.
Impact.
Three students gave unsolicited feedback that my teaching changed their lives.
Increased typical class enrollment by 50%. Inspired many students to join Boulder Aikikai.
Fifth Degree Black Belt
Boulder Aikikai
January 2025
Bachelor of Environmental Design, emphasizing computer methods in design
College of Architecture and Planning, University of Colorado at Boulder