How to evaluate cloud-native technology and build trust

This article presents cloud-native solutions that successfully passed evaluation and how you can evaluate cloud-native technology yourself.


Evaluation of cloud-native technology is a worthwhile effort. This article presents a selection of cloud-native solutions that successfully passed evaluation and worked efficiently in production. It also describes how you can evaluate technology and tools yourself. By evaluating solutions for your business, you can be confident that your selected technology will meet your needs and work reliably.

The evaluation procedure displayed here represents the unique goals and production methodology of VirtusLab. Therefore, you’ll need to tailor the evaluation method using your own criteria.

Please remember that using specific technologies to tackle problems is a part of a larger strategy and does not solve all issues. At VirtusLab, we combine well-informed technology choices with rigour in execution and clear communication.

Evaluating cloud-native tech is a vital part of building modern systems

Establishing a “good enough” grasp of cloud-native technologies and tools is incredibly challenging these days. 

Firstly, the tech stack used in modern application development and operations is highly complex and depends on multiple solution providers. Secondly, there are now more solutions than ever, and the pace of cloud-native technical invention keeps increasing.

A thorough evaluation method is vital since cloud-native technologies provide a launching pad for innovation and competitive advantages. Missing these innovations and competitive benefits might put your leading position at stake.

What can we do about it? We can develop the capacity to continuously evaluate cloud-native technology and tools. Ongoing evaluation is necessary as, month after month, developers release new products into the cloud-native ecosystem. 

Evaluation starts when a new solution is released; then it repeats at regular intervals throughout the solution’s life-cycle. Evaluation only halts when a solution is adopted within the tech stack used for live projects or when evaluation reveals that it is too limited for practical use in a production setting.

In short, ongoing tech evaluation helps you discern whether a solution is:

  • Suited to your needs and ready for use today
  • Promising but not yet in a ‘ready to use’ condition
  • Limited in too many ways, so neither suitable nor ready to use.

Cloud-native technology evaluation sorts technologies into three categories: ready for use, not yet ready for use, and too many limitations.

Whenever evaluation accurately assigns one of these three categories, risk diminishes. The risk in question is that your business burns time and money integrating new solutions whose capabilities were overestimated, or which are not ready to work reliably in an operational setting.
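As a rough illustration, the three-way sorting described above could be expressed in a few lines of Python. This is a minimal sketch, not VirtusLab’s actual procedure: the `EvaluationResult` fields and the rules inside `classify` are assumptions invented for the example, and your own criteria would replace them.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    READY = "ready for use"
    NOT_YET_READY = "not yet ready for use"
    TOO_LIMITED = "too many limitations"


@dataclass
class EvaluationResult:
    # All three fields are hypothetical inputs chosen for this sketch.
    meets_key_requirements: bool  # does it cover the functionality we need?
    production_ready: bool        # is it stable enough for live projects today?
    blocking_limitations: int     # limitations with no acceptable workaround


def classify(result: EvaluationResult) -> Verdict:
    """Map one evaluation result onto the three categories."""
    if result.blocking_limitations > 0 or not result.meets_key_requirements:
        return Verdict.TOO_LIMITED
    if result.production_ready:
        return Verdict.READY
    return Verdict.NOT_YET_READY


def needs_reevaluation(verdict: Verdict) -> bool:
    """Only promising-but-unready solutions stay in the evaluation cycle."""
    return verdict is Verdict.NOT_YET_READY
```

Only the middle category keeps a solution in the evaluation cycle, which mirrors the point that evaluation halts once a solution is either adopted or found too limited.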

Proven cloud-native technologies

This section shows a tech stack proven in business, since each part of it has moved from evaluation to production. This tech stack comes from VirtusLab, so the selections reflect the company’s deep involvement in cloud-native engineering. In addition, you can find notes in the sections below that explain each technology’s use, the situation it works best in, and the benefits it provides.

A diagram showing cloud-native technologies that have successfully passed through evaluation and have proven themselves in multiple live projects.

Cloud-native technology: Public cloud

The general benefits of today’s public cloud are already familiar to most of us…

  • A flexible pay-as-you-go pricing model
  • Compliance with standards, certifications, data protection, etc
  • Reliability and performance
  • Support is always available.

The major providers in comparison

Despite these general benefits, each public cloud offering has its own strengths and weaknesses. As a result, there’s no obvious recommendation of one provider to suit everyone. Consequently, many organisations adopt a multi-cloud strategy so they can choose the best provider for each project’s needs. Before opting for a multi-cloud strategy, let’s see what the largest providers have to offer. After all, using a single provider may prove sufficient while also keeping things simple, in one respect at least.

  • Microsoft Azure – This cloud has matured a lot in the past few years. It’s become more stable and it also provides a lot of managed services, monitoring and security tooling. Microsoft’s offering and engagement model is suitable for large enterprises.
  • Google Cloud Platform (GCP) – Bleeding-edge technology. Google is innovating in the Kubernetes ecosystem, releasing new cloud-native technologies quickly and often. It also provides solid edge infrastructure support with Anthos.
  • Amazon Web Services (AWS) – Main player in the market, mature cloud and good automation capabilities (CloudFormation), serverless leaders (AWS Lambda). AWS works well for rapid prototyping and startups.

Public cloud evaluation needs to take into consideration:

  • Regulatory compliance requirements and security policies
  • Current technology domain, whether this is data engineering, eventing or simple microservices architecture
  • Organisation size and its current skills in this area
  • High-availability and networking requirements.

Possible issues with public cloud

If the evaluation is missing or faulty, the consequences of choosing the wrong provider may include:

  • A longer time to migrate to the new cloud service
  • Vendor lock-in
  • Technology and services that are not matched closely to current business goals.

Cloud-native technology: Automation and infrastructure provisioning

When building scalable cloud-native infrastructure, automation technology helps make resource management in a cloud environment much easier, especially when used with GitOps to create a robust end-to-end solution. Furthermore, a high level of automation can be achieved by putting a strong emphasis on a single source of truth for automation.

Main approaches in automation and infrastructure provisioning

Here are the main approaches to automation and infrastructure provisioning, along with examples of technology solutions that have passed our evaluation.

  • Infrastructure as Code (IaC): Terraform, CloudFormation
    • Terraform is the leading solution here. It provides a declarative method for building infrastructure, which makes it easy to modularise and test codebases.
    • CloudFormation is IaC built into AWS which means it is always up to date with AWS upstream changes, new APIs and so on. Other differentiators are built-in state management and support for nested stacks.
  • Infrastructure as Software (IaS): Pulumi – Pulumi demonstrates a relatively new practice when it comes to the automation of infrastructure. Its defining characteristics are:
    • IaS is the most natural approach to tackling infrastructure complexity for anyone who knows how to write software.
    • If we can model the system as a graph of resources and use APIs to manipulate those resources, then we can use programming languages to build this system.
    • IaS is one level above IaC in terms of expressibility.

Everyone who uses IaC has to start programming at some point. In practice, IaC work consists of a mixture of scripting languages, build tools and IaC Domain Specific Languages (such as the one Terraform uses). A more scalable approach would be to just use a general-purpose programming language from the start – in other words, the approach used in IaS.
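To make the “graph of resources” idea concrete, here is a small sketch in plain Python. It does not use Pulumi’s API: the resource names and the `provision` helper are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the dependency resolution that a real tool performs.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each name maps to the resources it depends on.
resources = {
    "network": set(),
    "subnet": {"network"},
    "cluster": {"subnet"},
    "database": {"subnet"},
    "app": {"cluster", "database"},
}


def provisioning_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def provision(graph: dict[str, set[str]], create) -> None:
    """Call create(name) for each resource in dependency order."""
    for name in provisioning_order(graph):
        create(name)


# Usage: record the order instead of calling a real cloud API.
created = []
provision(resources, created.append)
```

Because the infrastructure is ordinary data and functions, it can be looped over, unit-tested and refactored like any other code, which is the expressiveness argument made above.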

Issues with automation and infrastructure provisioning

These are a few of the issues to consider when evaluating automation technology:

  • Managing the consistency of cloud infrastructure between the intended state, defined via Infrastructure as Code, and the actual state of the system.
  • Handling edge cases and backward-incompatible changes, especially when operating production-grade infrastructure with live traffic.
  • Interoperability with other technology in this area.
  • Whether the technology keeps up with upstream changes and APIs released by cloud providers.

Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:

  • Development and staging environments that are not fully compatible with the production environment – the environment inconsistency problem
  • Friction in application delivery, slowing everything down
  • No support for the automation capabilities of existing technology choices.
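The first evaluation issue listed in this section, consistency between intended and actual state, is often called drift. Below is a minimal sketch of drift detection, assuming both states can be read as plain dictionaries keyed by resource name. Real tools query cloud APIs and state files instead; the resource names and attributes here are invented for illustration.

```python
def detect_drift(intended: dict, actual: dict) -> dict:
    """Report resources that are missing, unmanaged, or changed field by field."""
    missing = sorted(intended.keys() - actual.keys())    # declared but not deployed
    unmanaged = sorted(actual.keys() - intended.keys())  # deployed but not declared
    changed = {}
    for name in sorted(intended.keys() & actual.keys()):
        want, have = intended[name], actual[name]
        fields = {
            key: (want.get(key), have.get(key))          # (intended, actual)
            for key in sorted(want.keys() | have.keys())
            if want.get(key) != have.get(key)
        }
        if fields:
            changed[name] = fields
    return {"missing": missing, "unmanaged": unmanaged, "changed": changed}


# Hypothetical example data.
intended = {"vm-web": {"size": "small"}, "bucket-logs": {"versioning": True}}
actual = {"vm-web": {"size": "large"}, "vm-debug": {"size": "small"}}

drift = detect_drift(intended, actual)
```

When evaluating a tool, it is worth checking how it surfaces each of these three kinds of difference and what it does about them.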

Cloud-native technology: Container orchestration

Container orchestration is a key enabler of agility and short lead time to deployment for large software development projects. The best-known orchestrator, Kubernetes, provides one single entry-point to managing both infrastructure and applications. Staying true to open standards in technology, Kubernetes adds flexibility between different clouds, interoperability with other technologies, and lower entry barriers for developers as they only need to learn one tech.

Kubernetes-based container orchestration solutions

While Kubernetes is the same everywhere, extensions and addons tend to be unique to each public cloud provider, which limits interoperability and may create a challenge in multi-cloud environments.

Kubernetes-based container orchestration solutions that have passed our evaluation are:

  • Azure Kubernetes Service – Microsoft’s flavour of Kubernetes is cost-effective and has a lot of addons integrated with the wider MS Azure tech ecosystem.
  • Elastic Kubernetes Service – Kubernetes from AWS, the most popular and market-leading container orchestrator. Takes a hands-off approach giving flexibility and responsibility to the customer.
  • Google Kubernetes Engine – Google’s Kubernetes is a clear leader in terms of developer experience and number of features supported. Lags behind AWS and Azure in terms of adoption and usage.
  • Google Anthos – support in running Kubernetes on premises and on edge infrastructure.
  • Self-hosted Kubernetes – less popular, adds a lot of control and flexibility but also complexity and maintenance.

Possible issues with Kubernetes-based container orchestration

These are a few of the issues to consider when evaluating container orchestration solutions:

  • Integration with built-in and external cloud identity access management systems
  • Support for seamless and in-place upgrades of k8s version
  • Release and deprecation cycle
  • Enhanced security and reliability of Kubernetes itself

Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:

  • Knock-on effects across the stack. Kubernetes is the core component of much cloud infrastructure these days, so this choice implies other technology choices, and many projects start from here.
  • More hands-on work. Integrating with the monitoring stack and glueing technologies together requires extra effort and knowledge.
  • The features that each cloud provider offers can be very different so look carefully at what’s really needed.

Cloud-native technology: Monitoring and observability

These days modern software is highly distributed and complex to monitor. Humans can no longer reason about the full system status. Every system we operate in production is proactively monitored by automation. The amount of observability data might be overwhelming so the system must be scalable and provide meaningful insights, at the same time avoiding false positives. 

Monitoring and observability solutions

Here are the monitoring and observability solutions that have passed our evaluation:

  • Dashboards: Grafana – the leading solution in this category. It’s hard to find something better and more customizable.
  • Metrics: Prometheus, Thanos – the leading solution in this category. Thanos adds more advanced capabilities such as a global query view, high availability, data backup with history and cheap data access as its core features in a single binary.
  • Splunk – managed monitoring solution, often used for security SOC, SIEM, can handle large volumes of data, can create dashboards, alerts in a single place.

Possible issues with monitoring and observability

These are a few of the issues to consider when evaluating monitoring and observability technology:

  • Support for real time monitoring and handling large volumes of data
  • Single pane of glass for observability data, to monitor multiple systems from one place
  • Scraping metrics from different sources with no need for custom implementation
  • Integration with external support, alerting and on-call duty systems
  • Pull-based vs push-based monitoring
  • Self hosted monitoring vs SaaS

Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:

  • Egress data costs money, especially when sending large amounts of data between cloud regions; pull-based and push-based monitoring have different trade-offs here
  • Performance bottleneck or incomplete monitoring data leads to undetected/undiscovered incidents
  • Fragmentation of data, observability data is in various different places, hard to correlate events, reason about “big picture”
  • Blind spots / not enough insights.
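The pull-based vs push-based distinction mentioned in this section can be shown with two toy collectors. This is a simplified sketch with hypothetical class and function names, not the API of Prometheus or any other product.

```python
class PullCollector:
    """Pull model: the monitoring system asks each target for its metrics."""

    def __init__(self, targets):
        self.targets = targets  # name -> zero-argument function returning metrics
        self.samples = {}
        self.unreachable = []

    def scrape(self):
        self.unreachable = []
        for name, read_metrics in self.targets.items():
            try:
                self.samples[name] = read_metrics()
            except ConnectionError:
                # A failed scrape is itself a signal that the target is down.
                self.unreachable.append(name)


class PushCollector:
    """Push model: each application sends metrics when it chooses to."""

    def __init__(self):
        self.samples = {}

    def receive(self, name, metrics):
        self.samples[name] = metrics


def healthy():
    return {"requests_total": 42}


def down():
    raise ConnectionError("target not reachable")


pull = PullCollector({"api": healthy, "worker": down})
pull.scrape()

push = PushCollector()
push.receive("batch-job", {"rows_processed": 1000})
```

In this sketch a failed scrape tells the pull collector that a target is unreachable, whereas the push collector only knows about applications that have reported in. On the other hand, push can suit short-lived jobs that may finish before anything scrapes them. These are commonly cited trade-offs rather than a complete comparison.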

Cloud-native technology: Deployment

Technology leaders need to deliver software quickly and reliably to win in the market.

Deployment solution evaluation results will be of great interest to both business and technology leaders alike as there is a surprisingly strong correlation between organisational and technological performance. We see that, when compared to low performing organisations, the high performing organisations have:

  • 46 times more frequent code deployments
  • 440 times faster lead time from commit to deploy
  • 170 times faster mean time to recover from downtime
  • 5 times lower change failure rate (1/5 as likely for a change to fail)

These figures come from a study that shows organisational market performance and technical performance correlate very closely.
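If you want to track these four measures in your own organisation, they can be derived from a simple deployment log. The sketch below assumes a hypothetical `Deployment` record with commit, deploy and restore timestamps. The field names are illustrative, and this is one straightforward way to compute the numbers rather than the study’s methodology.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # when service was restored after a failure


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four delivery measures for a non-empty list of deployments."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failed deployment has a recorded restore time.
        "mean_time_to_recover": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Comparing these numbers before and after adopting a deployment tool gives the evaluation a measurable outcome instead of an impression.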

Deployment solutions

Bringing the focus back to the solutions, here are the deployment technologies that have passed our evaluation and gone on to prove themselves in active use:

  • Helm – de facto standard when it comes to k8s deployment
    • Simple templating engine
    • Advanced lifecycle hooks
    • The way to package k8s manifests / a k8s app
    • Helm charts can be published and stored either in a Git repository or a container registry
  • GitOps: ArgoCD – new and modern way of working with deployment and continuous deployment
    • Supports a high level of automation
    • Declarative configuration in Git – a single source of truth for the codebase and system state
    • Supports various plugins and extensibility, for example secret management with SOPS
    • Web UI which shows the entire system state, making it easy to see all k8s objects and their state
    • Provides CLI and k8s API
    • Automatically syncs Git repo with your cluster
    • Advanced notification features
  • GitHub Actions – an emerging trend 
    • Everything is close to the source code, one platform for everything
    • Community driven plugins ecosystem 
    • Supports automation bots for checking code quality and security
    • Easy external integration with other systems
  • Azure DevOps – a full CI/CD ecosystem
    • Azure native approach, works well with Microsoft Azure cloud
    • Supports self-hosted runners
    • Built-in secret management
  • Jenkins – old but still great, good plugin ecosystem, we run it in k8s
  • GitLab CI – full CI/CD ecosystem, a lot of integrations and plugins
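The GitOps idea of Git as the single source of truth, automatically synced with the cluster, boils down to a reconciliation step. Here is a minimal sketch, assuming the desired state from Git and the live cluster state are both plain dictionaries. The `sync` function and its `prune` flag are hypothetical names for this example, not ArgoCD’s implementation or API.

```python
def sync(desired: dict, cluster: dict, prune: bool = True) -> list[str]:
    """Make `cluster` match `desired` in place and return the actions taken."""
    actions = []
    for name, manifest in desired.items():
        if name not in cluster:
            cluster[name] = dict(manifest)
            actions.append(f"create {name}")
        elif cluster[name] != manifest:
            cluster[name] = dict(manifest)
            actions.append(f"update {name}")
    if prune:
        # Remove objects that exist in the cluster but are no longer in Git.
        for name in sorted(cluster.keys() - desired.keys()):
            del cluster[name]
            actions.append(f"delete {name}")
    return actions


# Hypothetical example data.
git_state = {"web": {"replicas": 3}, "api": {"replicas": 2}}
cluster = {"web": {"replicas": 1}, "old-job": {"replicas": 1}}

first = sync(git_state, cluster)   # brings the cluster in line with Git
second = sync(git_state, cluster)  # nothing left to do
```

The second call returning no actions is the property to look for: reconciliation should be safe to run repeatedly, which is what makes the “autopilot” style of operation possible.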

Possible issues with deployment

These are a few of the issues to consider when evaluating deployment technology:

  • Deployment software should enable us to act quickly
  • Deployment should be automated so that it is repeatable and predictable
  • Depending on a company’s expertise different approaches may be more suitable:
    • Traditional CI/CD approach – more predictable
    • GitOps (ArgoCD, Flux, Kubernetes Operators) – more towards autopilot mode
  • It should provide seamless integration with external artefact storage systems

Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:

  • Long lead time to deploy
  • Resistance to change, problematic rollback 
  • Not able to deploy application to multiple clouds
  • Potential security threats.

Cloud-native technology: Datastores and eventing

Cloud-native storage must be highly available and scalable, using a software architecture that can grow with your business. It must also support predictable performance/SLA, be highly consistent (read and write operations should return the correct data), and operate without delays. Finally, deployment of new storage options must be easy and fast.

Datastores and eventing solutions

Here are the datastores and eventing solutions that have passed our evaluation:

  • Kafka – the leading solution in the field of eventing / streaming, scalable and flexible
  • CockroachDB – a k8s native, distributed relational data store
  • MongoDB – a distributed document datastore, that can be run in k8s
  • Prometheus – a time series datastore for metrics.

Possible issues with datastores and eventing

These are a few of the issues to consider when evaluating datastores and eventing technology:

  • Understand the different data types:
    • Documents (XML, YAML, JSON)
    • Logs
    • Time series (metrics)
    • Media / streaming
    • Files / Blobs
  • Understand different storage capabilities according to workload 
    • Queue, NoSQL, SQL, KeyValue, Object
    • Consistency: Eventual vs Strong
    • Replication, encryption, snapshot, cloning
  • Interfaces to container runtime and orchestration – it should work with Kubernetes
  • Infrastructure automation for storage
  • Role based access control, granular access, protecting data in the cloud, monitor storage policy compliance 

Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:

  • Poor low-latency performance (QoS, IOPS) and resource quota problems – especially in a multi-tenant environment
  • Applications might not survive restarts and outages
  • Difficulty moving data and apps between public clouds.

How cloud-native technology evaluation works

The process of cloud-native technology evaluation always fits within a larger initiative. For example, when a company is well-connected with the technology ecosystem, it sees the current trends and how many projects use the tech under consideration. Additionally, being involved in the cloud-native community and partnering with other mature organisations allows companies to stay on top of technology development.

This is how it works:

A flow-diagram showing the three main steps of a cloud-native technology evaluation process: internal proof of concept, verification and deployment.

1. Internal proof of concept

Find out if using the solution makes sense or if the problems can be solved using technology that’s already in regular use in the company or industry.

Factors to take into account:

  • What are the benefits? Understand how such technology can contribute to business goals and technology strategy. 
  • How is the solution built? Including software architecture and internal dependencies.
  • Does it support the functionalities we need? Verify it against the key functional requirements.
  • Does it fit with the technology stack currently used? This includes licensing, integrations and service management aspects.

2. Verification

Review the summarised PoC results with an internal group of solutions architects and expert engineers and/or a cloud centre of excellence. Always get feedback from multiple sources to learn the different intentions of the people who will use the solution.

The technologies that succeed in evaluation move into a trial period. They are accompanied by internal design documents based on the knowledge gathered during the evaluation. The approach we follow relies on a standard way of Documenting Design Decisions using RFCs and ADRs.

Prepare a document according to these specifications:

  • Explain in one paragraph the problem space, context or the decision needed.
  • Why is this solution being implemented? What use cases does it support? What is the expected outcome?
  • Detailed design section
    • Explain the design for somebody without deep expertise in technology
    • Get into specifics and edge-cases and include examples
    • Explain trade-offs and the different possible solutions, including pros and cons.
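One way to keep such documents consistent is to generate the skeleton from a small data structure. The sketch below is only an illustration of the specification above: the `DesignDecision` class, its field names and the section headings are assumptions for this example, not an established RFC or ADR format.

```python
from dataclasses import dataclass, field


@dataclass
class DesignDecision:
    title: str
    context: str     # one paragraph: problem space or decision needed
    motivation: str  # why, which use cases, expected outcome
    design: str      # explained for readers without deep expertise
    examples: list[str] = field(default_factory=list)
    alternatives: dict[str, str] = field(default_factory=dict)  # option -> pros and cons

    def render(self) -> str:
        """Render the record as plain text, one section per heading."""
        sections = [
            ("Context", [self.context]),
            ("Motivation", [self.motivation]),
            ("Detailed design", [self.design]),
            ("Examples and edge cases", [f"- {e}" for e in self.examples]),
            ("Alternatives considered",
             [f"- {name}: {notes}" for name, notes in self.alternatives.items()]),
        ]
        lines = [self.title]
        for heading, body in sections:
            if body:  # skip list sections that have no entries
                lines += ["", heading, *body]
        return "\n".join(lines)


# Hypothetical usage.
doc = DesignDecision(
    title="Adopt tool X for deployments",
    context="Releases are manual and slow.",
    motivation="Shorter lead time for every team.",
    design="Each service declares its desired state in Git.",
    alternatives={"Keep current tool": "no migration cost, but manual syncs"},
)
text = doc.render()
```

A fixed skeleton makes it easier for reviewers to spot a missing trade-off section at a glance.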

3. Deploy cloud-native technology

Integrate the technology into the developer organisation. First, evaluate current project conditions as these can determine when introducing a new technology is most appropriate. Following this, provide guidance about how to manage the technology enablement process, which looks as follows:

  • Cloud Native Assessment. Work on a detailed proposal, follow up with some clarification questions, then decide on the initial statement of work.
  • Support model and pricing. Depending on the size of the organisation, it is worth considering engaging directly with the vendor.
  • Design and implement reference architecture. Pave the road and establish reusable patterns for other teams within the organisation.
  • Distribute support documentation. An overview of the usage patterns, examples and how-to guides.
  • Share knowledge and train team members. Create guardrails that keep the organisation on a safe path.
  • Measure business benefits and publish case studies.

The cloud-native technology to trust is clear

Now, you have a strategy that enables your company to choose specific, well-suited technologies you can depend upon to perform. Amongst others, the primary gains from this approach are:

  • A long-term sustainable technology strategy which supports innovation
  • A business that stays on top of the cloud-native technology curve
  • An up-to-the-minute yet reliable technological environment that attracts the best people.

Still find navigating the cloud-native landscape a challenge? Let’s talk.

Written on September 6, 2022 by

Bartek Antoniak, Head of Cloud