’Tis the season… for holiday on-call. Christmas is right around the corner, and while most of your team will hopefully take the opportunity to disconnect from work and reconnect with loved ones, one or two of your folks will be stuck awkwardly carting their laptops around from festivity to festivity.
Holiday on-call is rough. This is true even if production fails to burn down to the ground and those laptops stay firmly shut. Just knowing that you need to be available to log in within 15 minutes of a page and then potentially sacrifice the next five hours to firefighting is an unpleasant way to spend the holiday.
Holiday on-call has been a necessary evil every place I’ve worked. There are things we can do as managers and teams, though, to make it better.
Making it better
Covering holiday on-call yourself (assuming you are the manager) is certainly a nice thing to do if you find it to be sustainable. It’s also very simple to implement: just put yourself in the schedule. I used to cover Christmas for my team without much thought because I don’t observe the holiday. This changed after I had kids, though; juggling production fires with childcare is a recipe for pain. Still, asking one of your engineers to cover holiday on-call for a historically unstable system can feel like that episode of Next Gen where Troi tells La Forge to get in the Jefferies tube.
So what can we do to make things better?
Set things up so that your on-call engineer can only be paged for truly critical events. If your business quiets down over the holiday period, make sure that fact is reflected in on-call. For example, if you use an incident severity tiering system, then you may decide, in collaboration with neighboring teams, that the on-call engineer is expected to respond to Sev1 and Sev2 incidents but that Sev3s will wait until after the holiday.
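If your paging rules live in code or config, a holiday policy like that could be written down in a few lines. Here is a minimal sketch in Python. The severity levels mirror the example above; `PAGE_THRESHOLD` and `should_page` are names made up for illustration and don't come from any particular paging tool.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means more urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


# Hypothetical policy: the least urgent severity that still pages in each period.
PAGE_THRESHOLD = {
    "normal": Severity.SEV3,   # everything pages during a regular week
    "holiday": Severity.SEV2,  # Sev3s wait until after the break
}


def should_page(severity: Severity, period: str) -> bool:
    """Return True if an incident of this severity should page the on-caller."""
    return severity <= PAGE_THRESHOLD[period]
```

With this table, `should_page(Severity.SEV3, "holiday")` is false, so a Sev3 would sit in a queue until the team is back.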
The “in collaboration with neighboring teams” bit is important. If your on-call engineer is stuck, they need to have access to the people who can help them get unstuck. This is tricky during the holidays. Make sure they are able to page the on-call engineers from related teams and that the escalation path is clear. Your definitions of “pageable” should be consistent, and incident response times should make sense across teams. If your on-caller is expected to log on within 30 minutes of a Sev2 but the on-caller for the team they need help from has an SLA of 24 hours, then your on-caller is going to bear the brunt of it while things get Escalated.
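One way to catch that kind of mismatch before the holiday is to put each team's response targets side by side and compare them. The sketch below assumes you can gather those targets into a simple dictionary. The team names, the numbers, and `find_mismatches` are all hypothetical.

```python
from datetime import timedelta

# Hypothetical holiday response targets, per team and severity.
RESPONSE_TARGETS = {
    "checkout": {"sev1": timedelta(minutes=15), "sev2": timedelta(minutes=30)},
    "payments": {"sev1": timedelta(minutes=15), "sev2": timedelta(hours=24)},
}


def find_mismatches(targets, team, depends_on):
    """List severities where `depends_on` responds more slowly than `team`.

    A missing target on the dependency side is reported too, since it
    means there is nobody to page at that severity.
    """
    mismatches = []
    for severity, own_target in targets[team].items():
        dep_target = targets[depends_on].get(severity)
        if dep_target is None or dep_target > own_target:
            mismatches.append((severity, own_target, dep_target))
    return mismatches


for severity, own, dep in find_mismatches(RESPONSE_TARGETS, "checkout", "payments"):
    print(f"{severity}: we respond within {own}, our dependency within {dep}")
```

Anything this prints is a conversation to have with the neighboring team before the break, not during the incident.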
Additionally, work with your on-callers to make sure they aren’t stuck at home, chained to the wifi. My previous employer used to pass around a hotspot device so folks could roam. These days, of course, you can just tether to your phone, which is handy if your company pays for employee data plans. If your on-caller doesn’t often work remotely, nudge them to confirm that they have everything they need to access your network away from the office.
Finally, make sure that holiday on-call is well distributed over the course of the year. Christmas is the next holiday in the US but what about all the other days folks are typically OOO? Sometimes holidays seem to fall in just such a way that two or three engineers always end up stuck with on-call. Set aside some time to sort schedules out with the team and make sure that doesn’t happen. Keep in mind that folks book vacation far in advance so do this well ahead of the next holiday.
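If you want a starting point for that conversation, a simple greedy pass can propose a schedule that spreads the load: for each holiday, pick whoever is available and has covered the fewest holidays so far. This is a rough sketch, not a real scheduling tool. The names, holidays, and `assign_holidays` are invented for the example, and the result should be treated as a draft for the team to adjust.

```python
def assign_holidays(holidays, engineers, unavailable=None):
    """Greedily assign each holiday to the available engineer with the
    fewest holiday shifts so far. Ties go to whoever is listed first.

    `unavailable` maps an engineer to the set of holidays they cannot cover.
    """
    unavailable = unavailable or {}
    load = {engineer: 0 for engineer in engineers}
    schedule = {}
    for holiday in holidays:
        candidates = [
            e for e in engineers if holiday not in unavailable.get(e, set())
        ]
        if not candidates:
            raise ValueError(f"nobody is available for {holiday}")
        pick = min(candidates, key=lambda e: load[e])
        schedule[holiday] = pick
        load[pick] += 1
    return schedule


# Hypothetical roster: Ana has already booked Thanksgiving off.
draft = assign_holidays(
    ["Thanksgiving", "Christmas", "New Year's Day", "Memorial Day"],
    ["ana", "ben", "chen"],
    unavailable={"ana": {"Thanksgiving"}},
)
```

Nobody is assigned a second holiday until everyone who is available has covered one, which is the property you are after. Because ties go to whoever is listed first, rotating the roster order from year to year keeps the same person from always landing the first pick.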
Recognition
Appreciate your on-call engineer. Some companies do this explicitly through compensation or floating holidays but not all. This is a place where you can get creative. Sometimes it looks like a favorite snack shipped to their home. We did a lot of cookie-based gratitude at a previous company. People were feeling pretty snacked out at one point so once I tried small personal donations to the on-callers’ favorite local charities instead. You don’t need food or funds to show appreciation, though: maybe your on-caller would like a hackathon day or two. Play around with what works best for you and your team. The goal is to find a sustainable option that resonates with your on-call engineers.
Last two cents
Of course the best gift you can give your holiday on-caller is a stable system. A past employer recognized that business was very slow this time of the year and earmarked the week leading up to the break for “liberally fixing sh*t.” You might not want to release a potentially breaking “fix” right before the holiday[1] but you can certainly work toward better monitoring, higher fidelity alerting, reduced toil, and richer documentation. Why not invest in a smoother holiday on-call if you can?
Wishing you holiday cheer and a quiet phone,
Nuts and Bolts
What are the things that you, your manager or your team have done to make holiday on-call better? Please share tips in the comments below!
[1] For the record, I think we had a policy forbidding non-emergency pushes until after the holiday break.