It’s a complex topic to cover – we’ll be talking about it on the webcast today, but I have questions…
With cloud, on-premises, and the different ways of implementing both, do you find that assessing your environment has gotten a whole lot more complex?
I know with our systems, it sure has. Sure, the basic pieces are there – security, compliance, performance, all of that. But the bits and pieces needed to truly assess and monitor have gotten significantly more involved and intricate.
We’ve been doing a bit of an automation sweep through our own systems. This has meant re-architecting a lot of what we do, re-determining the resources that are needed, and so on. At the same time, we’re automating nearly anything that can be automated: from scaling up and down based on expected and actual volume, to predicting load, to backup, recovery, and warm standby options.
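To give a sense of the flavor of that scaling automation, here’s a minimal sketch of the decision logic – the function name, the per-instance throughput, and the headroom factor are hypothetical placeholders, not our actual numbers:

```python
import math

def desired_instances(expected_rps: float, actual_rps: float,
                      per_instance_rps: float = 500.0,
                      headroom: float = 1.25) -> int:
    """Size the pool to whichever signal is larger, plus safety headroom."""
    # Trust the bigger of the forecast ("expected") and the live metric
    # ("actual") so a forecast miss doesn't leave us under-provisioned.
    demand = max(expected_rps, actual_rps)
    return max(1, math.ceil(demand * headroom / per_instance_rps))

# Example: the forecast says 1,800 req/s, but we're actually seeing 2,400.
print(desired_instances(1800, 2400))  # -> 6 instances
```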
We’re finding that our cloud configurations of seemingly simple, discrete pieces have made things more complex. All of the moving parts make it harder to model and get real data on the big picture, rather than just the performance of a given piece.
As an example, we were modeling our database requirements to figure out sizing for an Azure database solution. The trouble was that we were stumbling through a maze of automated processes that could kick in at times we weren’t aware of (they fell victim to “set it and forget it” – they were forgotten, alright!), along with web traffic patterns that can be… challenging to model. It used to be easier to model a SQL Server instance based on performance counters, with better control over the things touching the system, but now, well, it’s not as clear a picture.
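For context, here’s roughly what that older counter-based sampling looks like – a sketch assuming pyodbc and a standard ODBC driver, with placeholder connection details. The DMV is real SQL Server, but note that its “per second” counters are cumulative, so you diff two samples:

```python
import time
import pyodbc

# Placeholder connection string -- swap in your own server and auth.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=master;Trusted_Connection=yes;"
)

QUERY = """
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec';
"""

def sample() -> int:
    # 'Batch Requests/sec' is a cumulative counter in this DMV.
    return conn.cursor().execute(QUERY).fetchone()[0]

first = sample()
time.sleep(10)          # sample interval, in seconds
second = sample()
print(f"Batch requests/sec ~ {(second - first) / 10:.1f}")
```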
We found ourselves in a strange situation: extending the timeframes for looking at performance counters in an attempt to capture a more statistically significant picture of the usage. Sounds all well and good, but the more time we added, the more spikes and anomalies came into the model, and the less specific the day-to-day modeling became. It presented a significant challenge to determining what we needed to know.
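A toy illustration of that windowing effect – the load numbers here are invented, but they show how a longer window lets a handful of spikes drag the max (and the mean) away from the day-to-day picture, while a high percentile stays more stable:

```python
import random
import statistics

random.seed(42)
# 30 days of "typical" hourly load, plus a handful of big spikes --
# the anomalies that creep into the model as the window grows.
load = [random.gauss(500, 50) for _ in range(30 * 24)]
for i in random.sample(range(len(load)), 5):
    load[i] *= 8

for days in (7, 14, 30):
    window = load[: days * 24]
    p95 = statistics.quantiles(window, n=100)[94]  # 95th percentile
    print(f"{days:>2}d  mean={statistics.mean(window):7.1f}  "
          f"p95={p95:7.1f}  max={max(window):7.1f}")
```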
We’re starting to see this across applications as well – where we have APIs that plug into a web-based solution, APIs that may be accessed from third-party sites or from applications, portals, devices, and the like that we don’t even have control over, let alone predictive information about.
I love a good challenge as much as the next person, but we have found ourselves digging in on creating solutions that respond to load rather than over-extending capacity. We’re more focused on scaling and production response and less on “waiting capacity.”
I actually think this is a good thing, though it’s more technologically challenging as we create new scaling models and figure out what happens when things expand and contract automatically. It’s brought to light how much less control we may actually have when it comes to “touches” on our systems.
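One concrete lesson from the expand-and-contract work: you need hysteresis so the system doesn’t flap. A minimal sketch of that idea, with hypothetical thresholds and cooldown:

```python
import time

class ScaleController:
    """Scale up eagerly, scale down reluctantly, and never flap."""

    def __init__(self, up_at=0.80, down_at=0.40, cooldown_s=300):
        self.up_at, self.down_at = up_at, down_at
        self.cooldown_s = cooldown_s
        self.last_change = float("-inf")  # allow the first decision

    def decide(self, utilization: float, instances: int) -> int:
        # Refuse to change anything while a recent change settles.
        if time.monotonic() - self.last_change < self.cooldown_s:
            return instances
        if utilization > self.up_at:
            self.last_change = time.monotonic()
            return instances + 1            # expand eagerly
        if utilization < self.down_at and instances > 1:
            self.last_change = time.monotonic()
            return instances - 1            # contract reluctantly
        return instances

ctl = ScaleController()
print(ctl.decide(utilization=0.92, instances=4))  # -> 5 (scale up)
```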
Almost makes you miss the days of the siloed database. Almost. But the cooler stuff we’re doing, and the applications and services being born from this whole effort, are a lot of fun and genuinely helpful.
It’s just been very eye-opening! What do you see as you review your systems?