It’s been an interesting ride optimizing SQL Server and figuring out the best use cases for the different flavors we’ve come to rely on, and we’re seeing those choices pay off in terms of management, performance, and cost savings.
The “flavors” I’m speaking of? AWS RDS for SQL Server and Azure SQL DB, specifically.
Our event platform is built on these technologies, and we’ve put both through not just testing but real production experience. I’ve written before about the choice to move to AWS for RDS and SQL Server, and we’ve come away with some key takeaways from RDS.
First, the instances offered are solid options (we run Web Edition and somewhere between Large and 4XL instance sizes, depending on what’s going on). The uptimes are excellent, the scaling options are good (more on that in a minute), and the ability to move the instance size up and down is quite nice. In addition, the fact that it’s running straight-up SQL Server has its benefits as well (again, more in a minute).
Second, the performance has been quite good, and the compatibility with a “grow your own” SQL Server environment is outstanding. Very few features are missing, and those that are can generally be worked around. This makes migrating to SQL Server on RDS quite straightforward and reliable, and it lessens the impact on development and support. I believe, too, that you end up being able to get a little “closer” to the instances and see what’s happening, very much like a dedicated SQL Server. It doesn’t feel like there’s an administrative layer between you and the server.
That said, we’ve moved most of our production workloads to Azure SQL DB. Why? Great question. Really, for only one reason: better scaling management. With Azure SQL DB (“Azure”), specifically its serverless compute tier, you allocate “up to X” vCores, and Azure manages the compute as workload demands ebb and flow. In addition, the one-second billing increment really helps you maximize your budget.
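To make that concrete, here’s a minimal sketch of capping a serverless database at 8 vCores from plain T-SQL, assuming Python with pyodbc and the ODBC Driver 18 for SQL Server; the server, database, and credential names are placeholders, not our real environment.

```python
# Minimal sketch: cap an Azure SQL DB serverless database at 8 vCores via T-SQL.
# Assumes pyodbc and ODBC Driver 18 are installed; "myserver", "EventDb", and the
# environment-variable credentials are placeholders for illustration only.
import os
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:myserver.database.windows.net,1433;"
    "DATABASE=master;"
    f"UID={os.environ['SQL_ADMIN_USER']};PWD={os.environ['SQL_ADMIN_PASSWORD']}",
    autocommit=True,  # ALTER DATABASE can't run inside a transaction
)

# GP_S_Gen5_8 = General Purpose tier, Serverless compute, Gen5 hardware, 8 vCore max.
# Azure then scales between the database's configured minimum and this cap,
# and compute is billed per second of actual usage.
conn.execute("ALTER DATABASE [EventDb] MODIFY (SERVICE_OBJECTIVE = 'GP_S_Gen5_8');")
```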
With AWS, you can scale the instance size up and down in a pretty straightforward way. It takes about 10 minutes soup-to-nuts to complete a scaling activity, with surprisingly little downtime: the database is actually unavailable for only about 15-30 seconds, often less. It’s quite remarkable.
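The resize itself is kicked off with a single API call. A hedged sketch with boto3, assuming AWS credentials are already configured; the instance identifier and target class are placeholders:

```python
# Minimal sketch: resize an RDS SQL Server instance with boto3.
# Assumes AWS credentials are already configured; the instance identifier and
# target class are placeholders, not our real names.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_instance(
    DBInstanceIdentifier="event-platform-sqlserver",
    DBInstanceClass="db.m5.4xlarge",  # the size you land on is the size you pay for
    ApplyImmediately=True,            # otherwise the change waits for the next maintenance window
)
```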
The catch is that once you’ve up/down-scaled, that’s the instance size you’re on. It’s what you’re paying for. It doesn’t matter if things ease up and you’re only using 20% of the CPU; you’re still paying for that instance size.
With Azure, usage and the associated billing rise and fall with the utilization of the database. In real time, it scales up and down to optimize how the resources are used.
This has afforded better scaling response, better budget management, and more flexibility in deploying additional resources, offering new options, and building out additional database services for various client environments. It’s been ideal for spiky traffic, and because we can see and isolate very specific usage data, we’ve learned more about the particular performance draws on the systems we support.
The auto-tuning has been interesting as well, and I’ll be writing more about that shortly. It has shown where a development resource had missed some opportunities for optimization, and, incredibly, when it has been wrong about a suggestion, it has automatically reversed its own optimization steps. Very cool, and really helpful to have that validation step built in.
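If you want to watch what it’s applying (and reverting), Azure surfaces those recommendations through a DMV. Another small sketch, again assuming pyodbc with placeholder connection details:

```python
# Minimal sketch: list Azure SQL DB automatic-tuning recommendations, including
# ones the engine applied and later reverted. Connection details are placeholders.
import os
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:myserver.database.windows.net,1433;"
    "DATABASE=EventDb;"
    f"UID={os.environ['SQL_USER']};PWD={os.environ['SQL_PASSWORD']}",
)

rows = conn.execute(
    "SELECT name, type, reason, score, state, details "
    "FROM sys.dm_db_tuning_recommendations"
).fetchall()

for r in rows:
    # 'state' is a JSON document; its currentValue moves through states such as
    # Active, Verifying, Success, and Reverted, which is the validation step above.
    print(r.name, r.type, r.reason, r.score, r.state)
```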
There isn’t a single “you should do THIS” type of recommendation, but there are clear use cases where you can consider different options, different providers, and lean hard on unique offerings within a provider. Of course, Azure also has instance-based SQL Server services, but the real power, in this particular environment, has proved to be the more interactive scaling and control.
There have been some oddities with Azure. We’ve seen hiccups in activity reporting, and I wish some of the “top query” functionality showed just a bit more information. The biggest surprise is not being able to use fully-qualified names to reference objects across databases. I understand what’s happening behind the scenes, and that’s what drives the restriction: the other database isn’t necessarily on the same machine, so you can’t assume you can reach it that way. But it can make a spot recovery of a table, or something along those lines, quite challenging.
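One option worth knowing about (not something we lean on, just a workaround to consider) is Azure SQL’s elastic query feature, where you define an external table in one database that proxies a table in another. A hedged sketch, with every name, credential, and column a placeholder, and assuming the database master key already exists:

```python
# Minimal sketch: Azure SQL "elastic query" as one workaround for the missing
# three-part (database.schema.object) cross-database names. Every name, credential,
# and column here is a placeholder, and the database master key is assumed to exist.
import os
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:myserver.database.windows.net,1433;"
    "DATABASE=EventDb;"
    f"UID={os.environ['SQL_USER']};PWD={os.environ['SQL_PASSWORD']}",
    autocommit=True,
)

# A credential used to reach the other database on the logical server.
conn.execute(
    "CREATE DATABASE SCOPED CREDENTIAL OtherDbCred "
    f"WITH IDENTITY = 'elastic_reader', SECRET = '{os.environ['ELASTIC_SECRET']}';"
)

# An external data source that points at the other database.
conn.execute(
    "CREATE EXTERNAL DATA SOURCE OtherDbSrc WITH ("
    "  TYPE = RDBMS,"
    "  LOCATION = 'myserver.database.windows.net',"
    "  DATABASE_NAME = 'OtherDb',"
    "  CREDENTIAL = OtherDbCred"
    ");"
)

# A local "proxy" table definition that maps to OtherDb's dbo.Orders.
conn.execute(
    "CREATE EXTERNAL TABLE dbo.Orders_remote (OrderId INT, CustomerId INT) WITH ("
    "  DATA_SOURCE = OtherDbSrc,"
    "  SCHEMA_NAME = 'dbo',"
    "  OBJECT_NAME = 'Orders'"
    ");"
)

# Now the query spans databases without a three-part name.
for row in conn.execute("SELECT TOP (10) * FROM dbo.Orders_remote;"):
    print(row)
```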
Overall, we’re still digging into what belongs where, and we’ve seen really strong performance when creating multi-provider solutions. We’re continuing to test, run production workloads, and tease out the performance metrics, options, and differences between “SQL Server as a Service” and “SQL Server as an Instance” (my terms). Just be sure to understand which options best fit your specific usage scenarios.