SQL Server Triage – Lessons Learned

Something to Remember
I see time and time again where people have forgotten how to triage a SQL Server "emergency." Cool heads and thoughtful responses must prevail. You have to stand back and figure out what’s going on before you take action. I realize it’s "not natural," since you’re often in a mode where the big goal is simply to fix it *now*, and fast.

These are very, very dangerous situations. Knee-jerk decisions and half-informed analysis of the symptoms can do far more damage than the original problem.

A couple of recent issues show that this happens nearly every time you have to fix an unknown issue with SQL Server. The users are yelling, management is yelling, the client or customers are yelling… it’s just not a lot of fun. This is the time when you must step up and take the leadership role. It sounds corny, sure. I get it. But you must be the one to essentially say "Stop, Look and Listen" to what’s going on.

– Pull stats – CPU, disk, trending information
– Take notes – is it happening at particular times of the day? What else do you know?
– Profile it – DURING THE TIME WHEN THE ISSUE OCCURS*
– Seek out the root issue, not the symptoms
– Stop, look at the information you’ve gathered, look at the symptoms, and think through the possible issues.

* We recently had an issue where the system was churning and we were working out what to do next to address performance. We needed a picture of what was happening and what was impacting the server. The client didn’t want to run a trace during the problem times; they wanted to wait until later, when the system was under a smaller load. A trace taken then will tell you nothing, or at least a lot less, about what’s going on. You need valid information from the times the issues are occurring.
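As a rough illustration of the "take notes" and "profile it during the problem" points, here is a minimal Python sketch that groups timestamped samples by hour of day to show when the load actually peaks. Everything in it is an assumption made for illustration: the function names `mean_by_hour` and `busiest_hour` are hypothetical, and the CPU figures are made up rather than taken from any real server.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by_hour(samples):
    """Group (timestamp, value) samples by hour of day and average each group."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def busiest_hour(samples):
    """Return the hour of day with the highest average value."""
    averages = mean_by_hour(samples)
    return max(averages, key=averages.get)


# Made-up CPU percentages, for illustration only.
samples = [
    (datetime(2024, 1, 8, 9, 0), 35.0),
    (datetime(2024, 1, 8, 9, 30), 40.0),
    (datetime(2024, 1, 8, 14, 0), 92.0),
    (datetime(2024, 1, 8, 14, 30), 96.0),
    (datetime(2024, 1, 8, 22, 0), 12.0),
]

for hour, avg in mean_by_hour(samples).items():
    print(f"{hour:02d}:00  avg CPU {avg:.1f}%")
print("Hour to capture a trace in:", busiest_hour(samples))
```

With numbers like these, a trace captured in the quiet evening hour would show almost none of the afternoon load, which is the point of the footnote above.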

Take small, singular, measured steps. Small so you know exactly what you did. Measured so you know what the impact should be and can confirm the impact of the step. Then, move on to the next one. In a perfect world, if you execute a step and it doesn’t work, back it out. Take it out of the equation as you continue to try things to solve the issue at hand. If you can, test one thing, and one thing only, at a time as you work toward a fix.
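The one-step-at-a-time approach can be sketched as a small loop. This is only a sketch under stated assumptions: `try_steps` is a hypothetical helper, the "system" is a plain dictionary standing in for a real server, and the two changes and their latency effects are invented for illustration.

```python
def try_steps(steps, measure, tolerance=0.0):
    """Apply one step at a time; keep it only if the measurement improves.

    steps:   list of (name, apply, back_out) tuples of callables.
    measure: callable returning a number where lower is better.
    """
    log = []
    baseline = measure()
    for name, apply, back_out in steps:
        apply()                      # one change, and only one
        after = measure()            # confirm the impact of that change
        if after < baseline - tolerance:
            log.append((name, baseline, after, "kept"))
            baseline = after
        else:
            back_out()               # take it out of the equation
            log.append((name, baseline, after, "backed out"))
    return log


# Stand-in for a real system: a single made-up latency figure.
state = {"latency_ms": 100}


def change(delta):
    """Build a callable that shifts the fake latency by delta."""
    def _apply():
        state["latency_ms"] += delta
    return _apply


steps = [
    ("hypothetical change A", change(-40), change(+40)),
    ("hypothetical change B", change(+10), change(-10)),
]

log = try_steps(steps, lambda: state["latency_ms"])
for entry in log:
    print(entry)
```

The log records what was tried, what the measurement was before and after, and whether the step stayed in place, so you always know exactly which changes are still in effect.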
