Ford’s Willow Run Assembly Plant in Michigan was built to bring automotive mass production to warplane manufacture during World War Two. The production scale it enabled would prove a winning strategy in the end, but constant design changes requested by the government upset the finely tuned clockwork of the assembly line and contributed to high costs, production delays, and quality issues. Though the plant pursued continuous improvement, continuous change to airplane design proved counterproductive. Continuous improvement would come from the scale afforded by innovations in manufacturing, not from continuous design change.
The cloud services industry has something to learn from this pivotal moment in manufacturing history. Cloud services have made continuous change a possibility—something that wasn’t the case with the traditional model of deploying IT resources. Traditionally, IT infrastructure was simply deployed, then managed. With the cloud’s enablement of continuous change, firms need to ensure that change leads to improvement in cost, security, and scalability. Change for its own sake, or in pursuit of the wrong goal, can be costly and counterproductive.
Continuous improvement is about driving improvements in the general best practices of IT hygiene like driving down incident response time, automating incident remediation, and aggregating and correlating alerts. However, it’s also about more subtle practices of IT hygiene that sometimes get overlooked—namely the strengthening of the relationship between IT staff and business leadership and reorienting the purpose of that relationship toward business outcomes. Finally, continuous improvement is about creating a safe environment in which firms can pursue true innovation in addition to incremental improvement and organizational realignment.
Balancing innovation and governance
The power of the hyperscale cloud is in unleashing scale and innovation. Hyperscalers release tens to hundreds of new tools and services each quarter, and firms should want their staff to be creative in activities like writing new Lambda and Azure functions. However, we know from the world of biology that uncontrolled growth is disastrous to an organism. Likewise, uncontrolled scaling and innovation expose IT systems to unforeseen security gaps and cost overruns from misconfigurations and experimental scripts and code.
As such, continuous improvement is characterized by a careful balancing act between cost, security, risk, and the freedom to be creative, try new things, use new tools, and support that growth within the business.
Achieving the balance
This balancing act is not something that happens by accident. When firms are able to achieve it, it’s invariably the result of beginning with a carefully crafted operational model. This is not a fixed architectural drawing for the finished structure, but a framework guiding ongoing construction and renovation. This operational model should support the kind of next-generation AI ops, features, and functions that drive continuous improvement across a firm’s IT landscape.
A key part of the operational model is building in controls for cost. Continuous change offers the opportunity for building more economical systems, but the experimentation required to find new cost-effective processes can itself lead to unintended rises in cost.
Building in cost anomaly detection is critical. These processes monitor and analyze a firm’s spend to ensure that costly and risky unintended events are not arising within the environment—events like unwanted actors consuming resources excessively or transferring data out. It’s also about creating budgets and policies that protect the business from a misconfiguration or unforeseen error, such as a function running over and over again on a serverless PaaS service.
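The detection side of this can be surprisingly simple to prototype. As one illustrative sketch (not any particular hyperscaler’s service—the spend figures and thresholds here are invented for the example), a baseline-and-deviation check over daily spend can flag the kind of runaway-function spike described above:

```python
from statistics import mean, stdev

def flag_cost_anomalies(daily_spend, window=14, threshold=3.0):
    """Flag days whose spend deviates more than `threshold` standard
    deviations from the trailing `window`-day baseline."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        baseline = daily_spend[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(daily_spend[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# Hypothetical data: a steady ~$100/day baseline, then a sudden spike
# (e.g. a serverless function stuck in a retry loop).
spend = [100.0, 102.0, 98.0, 101.0, 99.0, 103.0, 97.0,
         100.0, 101.0, 99.0, 102.0, 98.0, 100.0, 101.0, 450.0]
print(flag_cost_anomalies(spend))  # → [14]
```

In practice a firm would feed this from its billing export and wire the flagged days into budget alerts, but the principle—establish a baseline, then alert on deviation—is the same.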
Firms need the appropriate operational model and a controls solution that creates value. By setting out into the cloud with these tools already in hand, a firm can control cost, measure and manage risk, and create platforms that scale and allow the business to grow. When these three boxes are checked, a firm can be confident its cloud IT deployments are creating benefits for the business.
Start with the operational model
To avoid ending up with a Willow Run scenario of cost overruns and hampered scaling, the operational model must be in place before the move to a hyperscale cloud begins. Applying the legacy model and way of thinking to the cloud, even at the adoption phase, is the entry point to that trio of disasters we’re seeking to avoid.
- Start with a Cloud Adoption Framework or similar process to build a solid foundation
- Study and understand a well-architected model (regardless of the hyperscale cloud in use)
- Understand and codify the business problem to be solved
On a more granular level, examine each of the functions in the legacy process and assess how these functions can be modernized for the hyperscale cloud environment. For example, a legacy model may generate periodic alerts that are responded to by a person. How can that process be revised to be driven by AI ops?
An effective AI ops model leverages an ITSM platform that allows a team to observe, engage, and act appropriately and measure responses and engagements over time, offering valuable information on how processes can be improved. This information in turn fuels the automation platform for action. This kind of AI ops model is critical to a cloud model because things move quickly, things change quickly, and things scale quickly.
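To make the observe/engage/act loop concrete, here is a minimal sketch—assuming hypothetical alert types and playbooks, not a real ITSM platform’s API—of an alert handler that acts automatically on known patterns, escalates the rest to a human, and records response times so the process can be measured and improved over time:

```python
import time
from collections import defaultdict

# Hypothetical remediation playbooks keyed by alert type.
PLAYBOOKS = {
    "disk_full": lambda alert: f"pruned logs on {alert['host']}",
    "service_down": lambda alert: f"restarted service on {alert['host']}",
}

class AIOpsLoop:
    """Observe incoming alerts, act via a playbook when one exists,
    engage a human otherwise, and keep metrics for improvement."""

    def __init__(self):
        self.metrics = defaultdict(list)   # alert type -> response times
        self.escalations = []              # alerts awaiting human review

    def handle(self, alert):
        start = time.monotonic()
        playbook = PLAYBOOKS.get(alert["type"])
        if playbook:
            result = playbook(alert)        # act: automated remediation
        else:
            self.escalations.append(alert)  # engage: route to a person
            result = "escalated"
        # Measured response times feed back into process improvement.
        self.metrics[alert["type"]].append(time.monotonic() - start)
        return result
```

A real deployment would pull alerts from a monitoring pipeline and push escalations into the ITSM queue; the point is that every alert either triggers automation or is measured on its way to a human.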
Real world successes in continuous improvement
A Protera client came to us with a slew of file servers scattered across a public cloud environment. Each server had data on it, and it wasn’t clear which data was on which server. Each file server had users with access to AD and operating systems that needed to be patched—each was a different device that needed to be managed.
Protera is kicking off a project to consolidate that file server landscape into a cloud-based enterprise file system, giving our client a file storage platform with near-unlimited scale. It minimizes the threat landscape by reducing the diversity of operating systems exposed as potential points of compromise for the business. And by consolidating resources onto a platform that intelligently tiers storage, allocating infrequently used file data to cheaper capacity-tier storage, the system is reducing our client’s monthly cloud spend.
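The tiering logic behind that cost reduction can be sketched simply. Assuming invented tier names and age thresholds for illustration (real enterprise file services expose this as configurable lifecycle policy rather than code), the decision is just: the longer a file goes untouched, the cheaper the tier it lands in:

```python
from datetime import datetime, timedelta

# Hypothetical tiers: files untouched past each threshold move to
# progressively cheaper storage.
TIERS = [
    (timedelta(days=0), "hot"),
    (timedelta(days=30), "cool"),
    (timedelta(days=180), "archive"),
]

def choose_tier(last_accessed, now=None):
    """Return the cheapest tier whose age threshold the file has passed."""
    now = now or datetime.utcnow()
    age = now - last_accessed
    tier = TIERS[0][1]
    for threshold, name in TIERS:
        if age >= threshold:
            tier = name
    return tier
```

Run over a consolidated landscape, a policy like this quietly migrates the long tail of stale file data off premium storage without anyone managing individual servers.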
Completing the checklist: security, cost optimization, and scalability
By identifying one small, disparate section of the business to focus on, our client not only adopted continuous improvement quickly but also realized improvement across all three verticals of benefit.
Our philosophy for creating the right operating model and driving continuous improvement is this: if you focus on addressing security, cost, and scalability, you can build a cloud environment that adds value to the business. The cloud’s capacity for continuous change is powerful, and the right operating model can make the difference between that power driving value and driving chaos and uncertainty.