How To Build High-Performing Engineering Teams


Charity Majors, CTO, Honeycomb

Published June 18, 2022


There is a distinct gap opening up between the top engineering teams and the rest.

Elite engineering teams make up the top few percent and are making incredible gains year on year in velocity, reliability, and human compatibility, whilst the bottom 50% are losing ground.

The loss has nothing to do with engineering ability.

Take an engineer out of an elite-performing team and place them in the bottom 50%, and they become subpar too; take an engineer out of a mediocre team and embed them in an elite team, and they are pulling their weight within the year.

So how do you build these high-performing teams?

The most important things to focus on are observability and delivery time. We’ll talk about both of these and more.

The Well-Being of an Engineering Team


How many engineers have said, or heard someone say, something like this:

‘I will do whatever it takes to make our customers happy, even at the expense of my sleep, mental health, relationships, and quality of life. That’s the job.’

We take a lot of pride in our work and in doing things well, and often, especially in our 20s, this means sacrificing our well-being.

If you move on to management, the mindset changes:

‘Sometimes you just have to sacrifice your team to make your users happy.’

This is a big lie.

I’ve never seen a happy engineering team with miserable users over the long term, or vice versa.

Instead, engineers who are given the space to be creative and engaged tend to build better products and achieve better user satisfaction.

All in all, it is more fun to be a good engineer than a bad one, more fun to do a good job for your users, more fun to delight them, and more fun to do your work when you’re awake and engaged because you’re doing interesting stuff.

This may be obvious, but ultimately what all this boils down to is that you want to be on a high-performing team. However, a lot of us don’t stop to think of our work in these terms.

We get so caught up in the muck day to day that we often fail to analyze the overall health of the team. If we truly care about the well-being of our engineering teams, we need to learn how to step back.

You Cannot Hire Your Way Out


When teams are underperforming, the first tool they reach for is hiring, but you cannot hire your way out of this problem.

Adding more engineers to your team does not make it better. It usually makes it worse, because each new engineer adds more points of contention: for example, more engineers are now competing for the same work and the same resources.

This issue has a lot to do with tools, telemetry, and keeping up with what’s new in tech. This is not to say that the newest is always better, but it is imperative to keep in mind that things are constantly changing.

Teams become less effective over time because they are constantly accumulating surface area, bugs, responsibilities, and things they have to be on call for, so productivity keeps getting worse.

The way to get ahead of this is to regularly look for step functions: opportunities to make your team 10%, 20%, or 30% more efficient, so that you keep up and stay ahead.

How Well Does Your Team Perform?

This is not the same thing as asking how good you are at engineering.

High-performing teams spend their time doing interesting things that materially move the business forward, whereas lower-performing teams spend their time on the work they have to do before they can get to the work they want to do, the work that would make them high performing.

The best way to know whether your team is high performing is to measure it with the DORA (DevOps Research and Assessment) metrics.

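They ask questions such as:

How often do you deploy? (deployment frequency)
How long does it take for a change to reach production? (lead time for changes)
How often do deployments fail? (change failure rate)
How long does it take to restore service after a failure? (time to restore service)
How often does your team get alerted outside of work hours? (not one of the four DORA metrics, but it correlates closely with burnout)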
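The questions above are easy to answer roughly from data most teams already have. The sketch below assumes a hypothetical `Deployment` record (commit time, deploy time, whether the deploy failed, when service was restored) and a list of alert timestamps; the field names, the 09:00 to 18:00 working day, and the simplification that restore time is counted from the failing deploy are all assumptions, not the official DORA definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt the fields to whatever your CI/CD system exports.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did this deploy cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Rough DORA-style metrics for deployments made in a window of `window_days` days."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Simplification: restore time is measured from the failing deploy, not from detection.
    restore_times = [hours(d.restored_at - d.deployed_at)
                     for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_restore_hours": median(restore_times) if restore_times else 0.0,
    }


def after_hours_alerts(alert_times: list[datetime], start_hour: int = 9, end_hour: int = 18) -> int:
    """Count alerts that fire on weekends or outside an assumed start_hour..end_hour working day."""
    return sum(1 for t in alert_times
               if t.weekday() >= 5 or not start_hour <= t.hour < end_hour)


if __name__ == "__main__":
    base = datetime(2022, 6, 1, 10, 0)
    deploys = [
        Deployment(base, base + timedelta(minutes=30)),
        Deployment(base, base + timedelta(hours=2), failed=True,
                   restored_at=base + timedelta(hours=2, minutes=45)),
        Deployment(base, base + timedelta(hours=1)),
    ]
    print(dora_summary(deploys, window_days=1))
```

Medians are used instead of means so that one unusually slow deploy or outage does not hide the typical experience; tracking the same numbers week over week matters more than any single snapshot.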

There is a huge gap between the ‘elite teams’ and the rest of us. It is the difference between deploying in under an hour and deploying in a day, and the difference between restoring service in less than one hour and taking more than six months.

Looking at this over the years, more and more teams are muscling their way into the elite, yet most of us are losing ground.

It pays to be on a high-performing team, because teams that get to iterate on and improve what they build many times a day gain a real competitive advantage.

How Do We Build These High Performing Teams?

There is a wide gap between the elite teams and the other 75%. So how do we build higher-performing teams?

Figure: Building high-performing teams (source: codefresh.io)

Do Not Simply Hire the Smartest People You Can

‘You want to hire everyone you can from MIT and Stanford, and aggressively poach as much talent as you can.’

That’s how most recruiting organizations seem to approach hiring, which is backward, because great engineers don’t make great teams.

If you hire the best engineer in the world to come and join you but it takes you three days to ship a change, it’s still going to take them three days to ship a change.

It’s the other way around - great teams are what make great engineers.

The thing that makes a Google engineer great is typically not the skills they bring in but the fact that they get to work on one of the world’s largest, most complicated, and most interesting systems for a few years.

Think of how much you learn when you’re on a good team: you absorb defaults, best practices, and skills that have taken millions of engineering hours to evolve and arrive at.

Productivity tends to rise or fall to match that of the team you join, because it depends not only on your knowledge of algorithms and data structures but also on the sociotechnical system you participate in.

Acknowledge the Fundamental Attribution Error

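The ‘fundamental attribution error’ is a term from social psychology for our tendency to over-attribute outcomes to individual ability and character while underestimating how much our environment and the system around us contribute to them.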

People tend to attribute most results to their individual capacity and very little to the system around them. They might estimate that success is 80% them and 20% the system, when it is closer to the reverse: 80 or 90% system and 10 or 20% them.

The smallest unit of software delivery is a team. Therefore, if you are in technical leadership, whether as an individual contributor (IC), senior engineer, manager, or director, you must pay close attention to all the feedback loops at the heart of the sociotechnical system.

Take Note of CI/CD

Feedback loops, such as code review, are critical.

CI/CD, which is probably the single largest, most all-encompassing, and most powerful feedback loop of all, is not a complicated system.

It is a complex system, because its outputs feed back into itself: making a change upstream tends to have a lot of ripple effects that percolate down.

Instead of improving something linearly, you can make exponential improvements if you find the right lever.

It could be argued that fear around deployments is the single largest source of technical debt in almost every organization.

Fear around deployments, nervousness, or just the sheer amount of time that gets spent delivering software and dealing with releases can be crippling.

‘Shipping is the heartbeat of your company. Shipping new code should be as small, as ordinary, and as regular as a heartbeat.’

Put simply, shipping underpins your entire organization. It should be regular and not something you have to think about: it happens automatically, so if you do nothing, your code gets deployed, and you would have to actively do something to stop it, just as you would to stop your heart from beating.

CI/CD is how we get there.


If you were asked, ‘Do you do CI/CD?’, you would probably say, ‘Yes, absolutely we do.’

However, if we start probing more deeply, the answer quickly becomes ‘well, we definitely run CI.’ But CI should count only as the appetizer.

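The entire point of CI is to clear the path for CD, and by CD we do not mean continuous delivery, where every change is ready to deploy but a human still decides when to ship it. Continuous deployment, where every change that passes CI is deployed automatically, is what will change your life.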

Continuous deployment holds the feedback loop that has the power to transform your team. CI/CD is the beating heart of your company and the core feedback loop of software delivery.

More than any other variable, the speed, coverage, and cadence of your CI/CD pipeline set the bar for how high-performing your team can be, and the job of engineering leadership is to tune this feedback loop.

Build Good Feedback Loops

You’ll notice that in the feedback loop example below, each build contains only one engineer’s change set: you are building an artifact with your changes and nobody else’s. There is no manual QA, and the process runs the same predictable way every time.

An engineer merges a change to master.
Any user-visible changes are hidden safely behind a feature flag.
The merge triggers CI to run tests and build an artifact.
The artifact is then deployed (or canaried) to production, or deployed to staging and later promoted to production.
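The loop above can be sketched in a few lines. This is a toy model, not a real CI system: `on_merge`, `run_tests`, `deploy`, the in-memory `FLAGS` store, and the `new_checkout` flag are all hypothetical names chosen for illustration. It shows the two properties the text stresses: each artifact carries exactly one engineer’s change set, and nothing stands between a green build and production except the tests.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    # One artifact per merge: this engineer's change set and nobody else's.
    commit: str
    author: str


# Hypothetical in-memory flag store; real teams would typically use a flag service.
FLAGS: dict[str, bool] = {}


def flag_enabled(name: str) -> bool:
    # Unknown flags default to off, so new user-visible code ships dark.
    return FLAGS.get(name, False)


def on_merge(commit: str, author: str,
             run_tests: Callable[[str], bool],
             deploy: Callable[[Artifact], None]) -> bool:
    """Hypothetical merge hook: run CI, build one artifact, deploy with no manual gate."""
    if not run_tests(commit):
        return False                      # red build: nothing ships
    artifact = Artifact(commit=commit, author=author)
    deploy(artifact)                      # continuous deployment: no human approval step
    return True


def checkout_page(user: str) -> str:
    # Application code: the new behavior is merged and deployed but hidden until the flag flips.
    if flag_enabled("new_checkout"):      # hypothetical flag name
        return f"new checkout for {user}"
    return f"old checkout for {user}"
```

For example, `on_merge("abc123", "alice", run_tests=lambda c: True, deploy=shipped.append)` appends one artifact to a `shipped` list. Under continuous delivery, the `deploy` call would instead publish the artifact and wait for someone to release it; continuous deployment is precisely the removal of that wait, and the feature flag is what makes removing it safe.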

This is the way to build software and great teams.

Practice Observability Driven Development


Observability-driven development means you instrument your code as you write it, let it deploy automatically, and then look at it in production through the lens of the instrumentation you just wrote.

You should ask yourself: ‘Is it doing what I expected it to do, and does anything else look weird?’

If engineers simply look at their changes in production this way, they will find the majority of bugs and fix them much faster. This step is key, and people often miss it.
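Here is a minimal sketch of what ‘instrument as you go, then look at it’ can mean, using only the Python standard library. The handler, the `build_id` field, and the in-memory `EVENTS` list standing in for a telemetry backend are all assumptions for illustration. Each request emits one wide event with its context, outcome, and duration, and a small query compares the error rate of the build you just shipped with earlier builds.

```python
from __future__ import annotations

import time
from collections import defaultdict

# Stand-in for a telemetry backend (hypothetical); each entry is one wide event.
EVENTS: list[dict] = []


def emit(event: dict) -> None:
    EVENTS.append(event)


def handle_checkout(user_id: str, cart_size: int, build_id: str) -> int:
    """Hypothetical request handler, instrumented while it is being written."""
    start = time.perf_counter()
    event = {"name": "checkout", "user_id": user_id,
             "cart_size": cart_size, "build_id": build_id}
    try:
        if cart_size < 0:
            raise ValueError("negative cart size")
        total = cart_size * 10            # placeholder business logic
        event["status"] = "ok"
        return total
    except Exception as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        # One event per request, emitted whether the request succeeded or failed.
        event["duration_ms"] = (time.perf_counter() - start) * 1000
        emit(event)


def error_rate_by_build(events: list[dict]) -> dict[str, float]:
    """'Look at it': compare the error rate of the build you just shipped with earlier builds."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # build_id -> [errors, total]
    for e in events:
        c = counts[e["build_id"]]
        c[1] += 1
        if e["status"] == "error":
            c[0] += 1
    return {build: errors / total for build, (errors, total) in counts.items()}
```

Recording the build identifier on every event is what makes the ‘does anything look weird?’ question cheap to answer right after a deploy: a new build whose error rate or duration departs from the previous one stands out immediately.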

The cost of finding and fixing bugs goes up exponentially from the moment you write them. Having a predictable interval between merging and deploying is what lets you hook into the body’s reward system: when you merge your code, you know it will soon be live, and your mind won’t feel like you’re done until you’ve looked at it in production.

Quality of Life

You must remember that high-performing teams get to spend most of their time doing interesting things and lower-performing teams do not. This is a quality of life issue.

Software engineering has the potential to be a creative, exciting, and fulfilling career where you spend 4, maybe 5, hours a day solving puzzles.

It’s not realistic to expect engineers to give more than 4-5 hours a day of real intense engineering consistently. However, it is critical to ensure that those 4 or 5 hours a day move the business forward.

Output is what matters. As Dan Pink argues in his work on motivation, what people care about is autonomy, mastery, and purpose. This is a quality-of-life issue, which means it is an ethical issue.

How Much Is Your Fear of Continuous Deployment Costing You?

As a rule of thumb, if it takes X engineers to build and run your software system with a 15-minute delivery interval, it will take twice as many people if your delivery interval is hours. If it takes you days to deploy, you will need twice as many again, and if it takes weeks, double that once more.
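The rule of thumb above is simple doubling arithmetic, N = X · 2^k, where k counts the steps from a roughly 15-minute interval up to hours, days, or weeks. The sketch below just encodes that heuristic; it is not a measured model, and the interval labels are illustrative.

```python
# Rule-of-thumb model from the text, not a measured law: each step up in
# delivery interval (minutes -> hours -> days -> weeks) doubles the headcount.
INTERVAL_DOUBLINGS = {"minutes": 0, "hours": 1, "days": 2, "weeks": 3}


def engineers_needed(base_engineers: int, interval: str) -> int:
    """Engineers needed if `base_engineers` suffice with a ~15-minute delivery interval."""
    return base_engineers * 2 ** INTERVAL_DOUBLINGS[interval]


if __name__ == "__main__":
    for interval in INTERVAL_DOUBLINGS:
        print(f"{interval:>7}: {engineers_needed(10, interval)} engineers")
```

With X = 10, this prints 10, 20, 40, and 80 engineers for minutes, hours, days, and weeks respectively: a team that deploys weekly needs eight times the headcount of one that ships every 15 minutes to do the same work.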

The cost of supporting the software goes up not linearly but exponentially, and if you’re clamoring for headcount, this can be a challenge.

If you’re a manager who has too much to do and not enough cycles, what would you do with twice as many engineering cycles? What would you build? A great thing to do would be to go fix your build pipeline.

The CTO has one job and it is to ensure that every engineer gets to spend as much of their productive time as possible on the highest value work possible.

These small savings of minutes to hours, and these small changes, add up and multiply. Improving takes time because it is an investment, yet we keep telling ourselves we are too busy to improve.