HugOps During Downtime: Building Empathetic Teams

Culture

Aimee Pearcy Technical Writer

Published
March 29, 2022

What is HugOps?

While DevOps focuses on software, HugOps focuses on the people behind the software.

HugOps is a way to show empathy and appreciation for the real people who are involved in building, shipping, and running software.

It’s a way to acknowledge and celebrate those – the Site Reliability Engineers (SREs), SysAdmins, Engineers, and Support Staff – who are working tirelessly behind the scenes to keep the services that we rely on running smoothly.

Why Is Empathy Important for Improving DevOps?

At its core, HugOps is about empathy. Empathy and community care are both vital in every aspect of our lives – and software engineering is no exception.

The behind-the-scenes of software development and maintenance are more complex and stressful than many people realize. Meanwhile, companies are more dependent on uptime than ever before.

The financial costs – and the social repercussions – of downtime are rising each year. For global platforms like Facebook, which currently has almost three billion users relying on it for communication, an outage can have huge consequences.

Sharing the burden with a community can help the team responsible for fixing the outage to manage this stress better.

How Can I Improve Organizational Empathy & HugOps?

When things go wrong, it can be difficult to know what to do to make it better – especially if you’re also facing high levels of stress.

No matter how much you prepare your team in advance, it can seem like all of the fuzzy warmth and empathy you spent months building up with your team members evaporates in a matter of seconds.

Thankfully, there are some steps you can take in advance to improve organizational empathy and promote HugOps.

1. Learn to Communicate

Operations can be a highly stressful environment that is filled with conflicts, so it is vital that team members are not punished for making mistakes.

Four components of better communication are observing without judging, expressing feelings, expressing and clarifying needs, and making specific requests based on these feelings and needs.

2. Collaborate

Rather than piling extra stress on one another and increasing the pressure to get things done, the technical community should work to share their burdens.

Increasing the social ties and relationships between team members can create a sense of shared responsibility. Team members must not point the finger and blame others when things go wrong. Instead, each incident should be treated as a learning opportunity, and developers should be encouraged to support operations teams when they need it.

If you notice that one staff member has taken a lot of the burden, or has been involved in too many on-call incidents in a single shift, then you should step back and re-evaluate. It is also important to make sure that each staff member has the correct training so that they know how to respond to emergency incidents.
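One way to notice an uneven on-call load is to count incidents per person per shift. The short Python sketch below is only an illustration: the incident log, the names in it, and the threshold of two incidents per shift are all made-up assumptions, so swap in whatever data and limits make sense for your own team.

```python
from collections import Counter

# Hypothetical incident log: one (responder, shift) pair per incident.
# The names, shifts, and counts are illustrative, not real data.
incidents = [
    ("alice", "2022-03-21-night"),
    ("alice", "2022-03-21-night"),
    ("alice", "2022-03-21-night"),
    ("bob", "2022-03-21-night"),
    ("carol", "2022-03-22-day"),
]

# Assumed threshold; choose a value that suits your own team.
MAX_INCIDENTS_PER_SHIFT = 2


def overloaded_responders(incident_log, limit=MAX_INCIDENTS_PER_SHIFT):
    """Return {(responder, shift): count} for anyone above the limit."""
    counts = Counter(incident_log)
    return {key: n for key, n in counts.items() if n > limit}


if __name__ == "__main__":
    for (person, shift), n in overloaded_responders(incidents).items():
        print(f"{person} handled {n} incidents during {shift}: time to re-evaluate")
```

A count like this is a prompt for a conversation about sharing the load, not a score to hold against anyone.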

3. Align on Common Values

As a rule, the collaboration between multiple departments goes much more smoothly if everyone is trying to achieve the same thing. It means that more time can be spent prioritizing tasks, instead of arguing about fundamental issues.

Making sure that your team is aligned on common values and understands why they are doing something, and how it fits in with the overall project, is also vital when it comes to motivating team members to complete work to a high standard.

4. Mind Your Medium

Given the rise in remote working over the past couple of years, we’re relying on software such as Slack, Discord, and Microsoft Teams more than ever when it comes to communicating with our fellow team members.

Video calls can be inconvenient, so it can be tempting to stick to text-based communication where possible. But this can make the tone of a conversation difficult to get across, and raises questions such as: “Is my team mad at me about this? Is this my fault?”

If in doubt, you should over-communicate. If there’s still room for ambiguity, maybe you should hop on a video call.

5. Build Inclusive Relationships

Remember, your team members are more than little icons on your screen – they are real people.

One of the key features of a great team is that you all like each other. There are multiple ways to facilitate this, and the most successful way largely depends on the people involved in your team. It could include assigning new team members a ‘buddy’ from a different team when they first join the company, or even organizing an off-site social to encourage people to hang out together away from their laptops.

You should also pay attention to your team members’ communication styles. If you notice that someone remains silent in large group meetings, but is overflowing with ideas in one-on-one meetings, you’ll want to make sure you give them a chance to shine.

6. Measure and Improve

It is important to keep track of each team member’s workload, and to consistently measure the outcome of each incident. If you identify that improvements need to be made, then they should be implemented as soon as possible.

When these improvements have been identified, team members should be given the time and tools they need to improve the services and the incident response. Ask for feedback regularly, and make sure you act on it.

7. Prevent Incident Burnout

Whenever an incident occurs, the team should take the time to fully analyze it and figure out what led to it. This can help the team to prevent similar incidents in the future, or to recover from them quickly if they do occur.

If these steps are not taken, teams can end up repeatedly making the same mistakes and fixing the same problems over and over again. This can lead to burnout, which can quickly derail your entire team and hinder any progress.

8. Shift the Responsibility Left

‘Shifting the responsibility left’ means holding the systems and processes in place responsible when things go wrong, rather than blaming individuals, and building manageable, reliable software from the start.

Engineers should work to follow secure coding practices and apply modern principles to ensure the service is robust.

This can help teams make the shift away from reacting to incidents – which is often stressful and intense – and towards better monitoring and proactive reliability in the first place.

HugOps in Practice

Dear intern:

We've all screwed something up,
So welcome to the club.

"That Sinking Feeling (The #HugOps Song)" pic.twitter.com/KmwgSfx33h
— Forrest Brazeal (@forrestbrazeal) July 7, 2021

The role of an Operations Engineer is a stressful one. Team members do essential work day in and day out, and it typically goes unnoticed – until the moment something goes wrong.

It’s important to remember that nobody makes mistakes on purpose and that getting things wrong is an essential part of the learning curve. So the next time someone in your team messes up, cut them some slack and show empathy. After all, the systems and procedures are ultimately to blame, rather than individuals.

Here are some examples of #HugOps in practice from people who understand the effort that goes into running these services and keeping them going:

And #HugOps to @Shopify pic.twitter.com/lGmP9L0hZy
— Chris Short (He/Him/His) 🇺🇸🇺🇦 (@ChrisShort) March 22, 2022