How do we interview for Cloud Engineering positions?

We are VirtusLab – a rapidly developing software engineering company. We offer services spanning from cloud infrastructure to reactive systems and data science. We also invest in R&D and provide substantial support to the IT community by organising conferences, meetup groups, and contributing to open source efforts.


I’m Bartek Antoniak, a Space Owner of the Cloud area at VirtusLab. You will probably meet me during the recruitment process 😉. I wanted to share this article to help you understand who we are and what we value, so that you feel well-prepared for your interview.

In this text, we’ll shed light on Cloud Engineering interviews at VirtusLab. We won’t focus on the process itself, which is much the same as at other companies. Instead, we’ll show you how we evaluate candidates, which cloud engineering abilities we look for and how to prepare for the interview. To some degree, this applies to everyone from junior to principal roles. Don’t be discouraged if something is completely new to you. We design interviews to reduce formalism and unlock your potential through friendly conversation.

Candidate Profile

Everything starts with understanding your work ethic. We have identified the key aspects that guide us during the interview.

  • Understanding the problem space – within the current project and business domain, and explaining it in simple terms. Effective communication and common sense play an important role in working together efficiently as a team.
  • Proposing solutions or improvements to existing problems – we welcome any opinion, because nothing is perfect. We look for a track record of improving processes, technology or working methods.
  • Individual contributor vs leadership role – understand what type of role fits you best. Individual contributors might expect more technical topics. Managers will be asked more about team dynamics, communication and decision making.
  • Learning technology fast on your own – we want to know if you understand the cloud-native landscape and current trends in technology.
  • Working unsupervised with good results – depending on your experience, we expect a reasonable level of autonomy.
  • Reasoning about complex systems – unless you are interviewing specifically for a role that requires deep knowledge, this is not mandatory. However, sooner rather than later, you will need to develop these skills.


Engineering

When it comes to the technical side of things, some principles remain the same regardless of the current technology stack. Instead of walking through a typical checklist, we ask many open-ended questions in the following areas:

  • Understanding best practices and writing declarative, modular code – both from a programming and an infrastructure point of view. A large Terraform codebase can become messy if we do not follow some guidelines.
  • Thinking about automation and a design-first approach – building and operating infrastructure in production requires solid design and a high level of automation. We do our best to automate our own work so we can focus on problems more important than repetitive tasks.
  • Understanding life-cycle management – nowadays, we mainly deal with complex distributed systems. This requires an exceptional understanding of how the system behaves and how to manage it properly. 
  • Understanding security risks and how to mitigate them – we pay a lot of attention to security by design.

This approach helps us open more room for discussion outside of the current project space.
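To make the "declarative" idea from the list above more concrete, here is a minimal sketch of the underlying pattern: describe the desired state, compare it with the actual state and derive the changes needed. This is only an illustration of the principle, not how Terraform or any other tool is implemented. The names `Change` and `plan` and the resource dictionaries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A single step needed to move actual state towards desired state."""
    action: str  # "create", "update" or "delete"
    name: str    # hypothetical resource name


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[Change]:
    """Compare desired and actual resources and return the required changes.

    Both arguments map a resource name to its configuration (illustrative
    structure, assumed for this sketch).
    """
    changes = []
    # Sort the names so the resulting plan is deterministic.
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(Change("create", name))
        elif name not in desired:
            changes.append(Change("delete", name))
        elif desired[name] != actual[name]:
            changes.append(Change("update", name))
    return changes


if __name__ == "__main__":
    desired = {"vnet": {"cidr": "10.0.0.0/16"}, "vm": {"size": "small"}}
    actual = {"vm": {"size": "large"}, "old": {}}
    for change in plan(desired, actual):
        print(change.action, change.name)
```

Running the plan against a state that already matches the desired one returns an empty list. That property, where applying the same description twice changes nothing, is what makes declarative code easier to reason about and automate.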

Below are some of the technologies we use on a daily basis. They may vary depending on the specific project.

[Image: technologies we use on a daily basis]

System Design

This part is all about exploratory problems. Besides understanding the underlying technology, we need to consider limitations and find optimal solutions. Reaching the very end of the problem is not important. We’d rather get a sense of what it is like to work with you. Some of the things worth considering:

  • Asking clarification questions – try to understand the problem before jumping straight into the implementation phase. Some people start with technology without understanding the bigger picture.
  • Proposing a solution – share your opinion on how this might work. We collaborate as much as possible at this stage.
  • Considering the build vs buy approach – in the complex technology world, sometimes it might be better to pay extra money instead of reinventing the wheel.
  • Ability to tie things together – and see the bigger picture, e.g. how the solution integrates with the monitoring stack or network architecture.
  • Service management aspects – everything from backup, restore, upgrades, patches, to release cycle.

Practical Examples

The first task is more development-oriented. It suits candidates who prefer solution design work, which requires a deep understanding of the cloud-native technology landscape.

Propose an architecture and technology to solve inter-service communication problems (high latency, complexity, different standards of exposing endpoints) at scale. Take into consideration the following aspects:

  • Provide north-south and east-west communication
  • Workloads are running both on Kubernetes and on standalone VMs in Azure or any other cloud
  • 20+ teams with 100+ services to communicate
  • It needs to be secure
  • It needs to be monitored
  • Applications may use different technology stacks (Java, Scala, Go)

Stretch goal: extend this model to multi-cloud environments Azure/AWS/GCP.

The next task is more suitable for Site Reliability Engineers (SRE) and operations roles, as well as for DevOps Engineers who work closely with application teams on automating workflows.

Propose a high-level design for incident management workflow in Azure/AWS/GCP for a large number of microservices (100+). Take into consideration the following aspects:

  • Workloads are running both on Kubernetes (managed) and on VMs.
  • It needs to be fully automated.
  • It needs to support on-call duties.

Additionally, try to answer the following questions:

  • What technology and tooling does it use?
  • How would you check uptime, notifications and past incidents?
  • What metrics are crucial to measure?
  • How would you design alerts and severity levels?

Stretch goal: extend this model to multi-cloud environments Azure/AWS/GCP.
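As one possible starting point for the question about alerts and severity levels, here is a minimal sketch that maps an error-budget burn rate to a severity. It is not a model answer, only an example of the kind of reasoning we like to discuss. The function names, the severity labels and the thresholds (14.4, 6 and 1) are assumptions chosen for illustration. In a real design they would depend on the SLO window and on the on-call policy.

```python
def burn_rate(errors: int, total: int, slo: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget is spent exactly at the rate the SLO
    allows. Higher values mean the budget runs out sooner.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    if total == 0:
        return 0.0  # no traffic, so nothing is burning the budget
    error_rate = errors / total
    error_budget = 1.0 - slo
    return error_rate / error_budget


def severity(rate: float) -> str:
    """Map a burn rate to a severity level (illustrative thresholds)."""
    if rate >= 14.4:
        return "SEV1"  # assumed policy: page the on-call engineer immediately
    if rate >= 6.0:
        return "SEV2"  # assumed policy: page during working hours
    if rate >= 1.0:
        return "SEV3"  # assumed policy: open a ticket
    return "OK"


if __name__ == "__main__":
    # 20 failed requests out of 1000 against a 99.9% availability SLO.
    rate = burn_rate(errors=20, total=1000, slo=0.999)
    print(severity(rate))
```

Tying severity to the error budget rather than to raw error counts keeps alerts comparable across 100+ services with very different traffic volumes, which is one way to approach the scale mentioned in the task.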

Most candidates who fail the interview do so because their understanding of the technology landscape is very shallow. They reason about a single technology without first trying to understand the problem space.

Closing Words

We do not expect you to meet all of the above points. A good understanding of some of these areas and a willingness to develop expertise in others are sufficient. We are not concerned with your education or any other formality. What we care about is your passion, knowledge and experience.

Thank you for taking the time to read our article. If you found it interesting, we would be more than happy to talk with you. See our open positions in the cloud area:

Written by Bartek Antoniak, Jan 20, 2022