Delivering incremental process improvement in the cloud requires sufficient visibility of networks and applications for monitoring and management, particularly when workloads are less than static or predictable – and private cloud is no different.
Suppliers often recommend not only a continuous (even agile) process to achieve this, but accurate performance testing to expose issues earlier. However, Rob Rutherford, chief executive officer at managed services provider and consultancy QuoStar, points out that if workloads are sufficiently predictable, less performance testing should be needed.
“We used to do a lot more testing. You would get bottlenecks, particularly around disk and the like,” he says. “For automated load testing, you would definitely need to do it and keep doing it when running your own on-prem or if hosting in a private cloud.”
Some organisations rely on private cloud – perhaps because of data security requirements or applications that cannot be refactored for migration – and cannot migrate to public cloud or hybrid cloud in the near future.
For Rutherford, this means that avoiding blind spots, bottlenecks and other issues starts with good preparation, including developing a clear understanding of the cost of implementation and required system change down the track, as well as contractual obligations and service-level agreements (SLAs).
“We are often tasked with addressing performance issues without increasing pricing. This can prove arduous when the return on investment forecasted falls away from the realities of a complex workload boom,” he says.
After signing an order, unexpected adaptations to networks, applications or infrastructure can quickly wipe out returns, damaging cyber security and user experience into the bargain – at times requiring “drastic” decisions to get projects back on track, he says.
Over-resourcing might be required
Private cloud infrastructures need to build in the flexibility that allows users to scale up and out with ease – especially with an expected rise in infrastructure rationalisation and simplification after the pandemic, meant to facilitate digital transformation. This might even mean over-resourcing at the start in areas such as disk speed and input/output operations per second (IOPS), says Rutherford.
“Many environments take up more resources in the early stages of a heavy migration,” he adds. “Many throw RAM and processor power at an underperforming environment when disk speed is the bottleneck.”
Process improvement should be aligned tightly with key performance indicators (KPIs) such as availability and uptime, watching for exceptions in resource utilisation and defining the thresholds at which adding resources should be authorised. Holding regular meetings to consider incidents, current status and future demands or issues can be an excellent idea, says Rutherford.
“It’s all done on a case-by-case basis. Obviously, a lot of it’s going to revolve around the management of resource utilisation, continually alerting around memory, processor, network, and then any sort of spikes or anomalies and whether they’re one-off or likely to recur and cause non-conformance,” he says.
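The distinction Rutherford draws between one-off and recurring spikes can be illustrated with a minimal Python sketch. It assumes utilisation arrives as a simple list of percentage samples; the function names, the 90% threshold and the "three episodes means recurring" rule are hypothetical choices for illustration, not anything QuoStar has described.

```python
def breach_episodes(samples, threshold):
    """Group consecutive samples at or above the threshold into (start, end) index pairs."""
    episodes = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold and start is None:
            start = i  # a new breach episode begins
        elif value < threshold and start is not None:
            episodes.append((start, i - 1))  # the episode ended on the previous sample
            start = None
    if start is not None:
        episodes.append((start, len(samples) - 1))  # still breaching at the end
    return episodes


def classify(episodes, min_repeats=3):
    """Label the pattern 'none', 'one-off' or 'recurring' (assumed rule: 3+ episodes)."""
    if not episodes:
        return "none"
    return "recurring" if len(episodes) >= min_repeats else "one-off"


# Hypothetical CPU utilisation samples, in percent
cpu = [40, 95, 96, 41, 97, 40, 98]
episodes = breach_episodes(cpu, threshold=90)
print(episodes, classify(episodes))
```

In practice the threshold and the repeat count would be agreed case by case, in line with the SLAs and KPIs discussed above.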
He adds that if cost is really critical, it can make sense to minimise private cloud investment – only keeping what you really need in private cloud and moving the rest to public cloud or migrating to a hybrid infrastructure.
Hiren Pirekh, Northern Europe vice-president at cloud services provider OVHcloud, suggests also keeping up with innovations, new features, integrations and updates from key private cloud suppliers such as VMware. For cloud providers, this becomes a key way of delivering continuous process improvement to customers.
Examples include taking advantage of Tanzu to enable containerisation across hosted private cloud platforms to deliver greater scalability and utilisation, bringing the customer closer to serverless options and the benefits of public cloud.
“How things could differentiate, as well as looking at this continuous feedback to ensure the user experience is understood and captured, which can also aid the evolution of improvement,” says Pirekh.
Optimising workload management
When monitoring and managing performance, however, differing approaches are available. OVHcloud provides VMware’s vRealize suite to deliver automation, including dashboards, custom rules and custom metrics for management. This facilitates feedback on application response speed to determine maximum user load for a software application, and the creation of tags that can feed back on the health of virtual machines in situ.
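As a rough illustration of how response-speed feedback can be turned into a maximum user load, the Python sketch below takes load-test results and returns the highest tested user count that still meets a response-time target. It does not use vRealize or any VMware API; the data shape, function name and 500ms target are assumptions made for the example.

```python
def max_user_load(results, max_response_ms):
    """Return the largest tested user count that meets the response-time target.

    results: list of (concurrent_users, response_ms) pairs from load-test runs.
    Stops at the first user count that misses the target, so a lucky pass at a
    higher load is not counted. Returns None if even the lowest load fails.
    """
    supported = None
    for users, response_ms in sorted(results):
        if response_ms > max_response_ms:
            break
        supported = users
    return supported


# Hypothetical load-test measurements: (concurrent users, response time in ms)
runs = [(200, 480), (50, 120), (100, 180), (400, 1500)]
print(max_user_load(runs, max_response_ms=500))
```

Stopping at the first failure is a deliberately conservative design choice: it treats the application as unable to support any load beyond the point where it first misses the target.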
Pirekh says the user simply needs to decide how to partition resources to suit the workloads they want to support. And, of course, once enterprises understand what is happening, they are often much better placed to manage it as well.
Pirekh says OVHcloud’s approach delivers enough visibility to help predict future capacity and resourcing needs, through automation of different actions, specific dashboards that monitor particular aspects of the private cloud environment, and tailored alerts.
“You can create alerts to a pre-existing action, whether it’s to power business intelligence [BI], restart the host, to add more CPU, or to move a virtual machine to another host,” he says. “And the vSphere vCenter hypervisor allows you to understand exactly what your resources are, and you can provision or remove resources through this so you don’t need to log into a separate cloud manager.”
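Predicting future capacity needs, as Pirekh describes, can be approximated in its simplest form by fitting a straight line to past usage. The Python sketch below is one such approximation under stated assumptions: usage is sampled at regular intervals, growth is roughly linear, and the function name and figures are hypothetical. It is not how vRealize or vCenter calculate capacity.

```python
def periods_until_full(usage, capacity):
    """Estimate how many sampling periods remain until usage reaches capacity.

    Fits a least-squares straight line to the usage history (one sample per
    period) and extrapolates. Returns None if usage is flat or falling.
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two usage samples")
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth trend, so no projected exhaustion
    intercept = mean_y - slope * mean_x
    period_full = (capacity - intercept) / slope
    return max(0.0, period_full - (n - 1))  # measured from the latest sample


# Hypothetical monthly storage usage in TB against a 100TB allocation
print(periods_until_full([50, 55, 60, 65, 70], capacity=100))
```

A result like this could feed one of the tailored alerts mentioned above, for example by warning when the projected time to exhaustion drops below a procurement lead time.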
Performance testing too should be adapted to organisational requirements. Users should determine what they require and how continuous improvement will be achieved in relation to key parameters like speed, stability and scalability, and work back from there to develop a continuous testing strategy that will optimise workloads.
Once that is achieved, it becomes time to look for incremental opportunities for optimisation based on the improvements made so far, says Pirekh.
“But you’re paying for a block of resources dedicated to you – it can be easier to manage in the private cloud environment. You probably need less process of continuous improvement than you would in a public cloud, because that has a lot more that you need to monitor to ensure you’re getting the efficiency,” he says.
Chris Yates, vice-president of marketing at cloud services provider Platform.sh, suggests not making too many generalisations about private cloud infrastructure, due partly to the potential challenges for users around owning and operating their network.
“There is an assumption that if you own the entire stack from beyond the bare metal, down to the datacentres, the cooling, the procurement of equipment, and so on, then naturally you can see it,” says Yates. “Yet, at least with the customers we deal with, the challenge they have is that they’re ‘their own network’, if you will.”
At the same time, most systems today are “hugely distributed”, which means dealing with a lot of variation in, for example, systems, processes and performance. It couldn’t be less like a neat and tidy setup where the racks are all “perfect” and everything is interconnected.
“You can have the latest networking gear and yet have a server literally sitting under somebody’s desk and running something critical, but you don’t know what that is,” he says. “Do you have absolute control over that sort of thing? It can be under their desk but they have zero visibility; they might not even know it exists.”
Don’t count out public cloud
On the public cloud side, users might not have the bare metal or core infrastructure to own or manage, but they are benefiting from “standing on the shoulders of lots of people”, often receiving full tooling that is pre-instrumented or pre-linked and assists with performance management, he notes.
Whether public or private cloud, a performance management approach needs to be specific to customer needs, especially at the application level, and should enable continuous improvement of all the tooling, Yates points out, with every change to an application being logged and repeatable.
Platform.sh has recently invested in more application performance monitoring, acquiring developer Blackfire.io to help bridge gaps in its own cloud platform-as-a-service (PaaS) offerings in what Yates calls observability – being able to have systems infrastructure or applications instrumented to observe changes in performance and relate those to any externalities.
This is part of tracking and then optimising all changes in a technology portfolio or fleet of applications.
“We need to make sure we are looking at the performance in terms of changes in compute resource utilisation – like CPU and memory and what-not – then tying those to the things that influence changes, such as in the application code changes and updates that are performed, or things that are happening in individual regions or datacentres, at the infrastructure layer,” says Yates.
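The idea of tying utilisation changes to the events that may have caused them can be sketched in a few lines of Python. This is a simplified illustration, not Platform.sh's or Blackfire.io's method: it assumes timestamped utilisation samples and a log of change events, and the function names, window size and shift threshold are all hypothetical.

```python
from statistics import mean


def utilisation_shift(samples, change_time, window):
    """Mean utilisation in the window after a change minus the mean before it.

    samples: list of (timestamp, value) pairs. Returns None if either side
    of the change has no samples inside the window.
    """
    before = [v for t, v in samples if change_time - window <= t < change_time]
    after = [v for t, v in samples if change_time <= t < change_time + window]
    if not before or not after:
        return None
    return mean(after) - mean(before)


def suspect_changes(samples, changes, window, min_shift):
    """Return (label, shift) for each change followed by a rise of at least min_shift."""
    flagged = []
    for change_time, label in changes:
        shift = utilisation_shift(samples, change_time, window)
        if shift is not None and shift >= min_shift:
            flagged.append((label, shift))
    return flagged


# Hypothetical CPU samples (minute, percent) and a hypothetical change log
cpu = [(t, 40) for t in range(10)] + [(t, 70) for t in range(10, 20)]
changes = [(5, "config tweak"), (10, "release 2.1")]
print(suspect_changes(cpu, changes, window=5, min_shift=20))
```

A before-and-after comparison like this only suggests a link between a change and a shift in utilisation; confirming the cause would still need the logged, repeatable changes Yates describes.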
For Yates, the ultimate goal is to capture and be able to audit as much as possible to drive continuous improvement of all tooling, in conjunction with discovering faults in a timely manner and making enhancements to systems and processes wherever possible. The right approach can also surface an improvement in one area that can then be transferred elsewhere in the business.
“That traceability becomes really important for managing at scale,” says Yates.
Carl Lehmann, principal analyst at IT market watcher 451 Research, agrees. Private cloud users typically have strategic workloads that require certain levels of security – and they must have a suitable set of IT resources, expertise and relevant assets to maintain it to their requirements in a specific or controlled environment.
“They know exactly what workloads they will execute in the private cloud, and they know, potentially, its utilisation rates and how it needs to scale in private clouds. Depending on how they’re configured and set up, they don’t have instant access to the resources to scale like a public cloud option,” he says.
“They need to be cloud-native, modern, stable and predictable, and operate within predictable performance boundaries and budgetary constraints.”