Wednesday 30 January 2013

Adventures with Powershell, first steps...

I keep hearing about this mysterious product that is now bundled with SQL Server in various guises called Powershell.

I've made the occasional foray in it's direction to find a DOS prompt with all the functionality that should be in a DOS prompt, in fact, something surprisingly like a Unix command line but in DOS, off course I'll probably find a horse's head next to me in the morning for suggesting such heresy.

Anyway, to date I found no practical purpose for me or the environment that I manage because everything it seems to offer I can do already via SQL Agent without having to learn arcane Powershell scripting and magic words.

But no, I have one nagging issue.

Prior to some recent data centre activity we DBA types had configured a job that collected, on a daily basis, stats from each SQL server and stored them at a single point. This was entirely hand carved using t-sql, ssis & the SQL agent & worked beautifully on SQL Server 2000, 2005 & 2008R2. The only flaw was that the first method coded, and due to time constraints never altered, the code worked in an entirely linear and fault intolerant fashion - so if server A was uncontactable the SSIS step would fail and the whole process failed for the day. This would have been fine if it wasn't for service companies who seem to make it their mission to make networks about as unreliable and poorly performing as possible to keep themselves in work while not having to tax themselves too hard to be forced into remedying the situation.

So, four years of living with an unreliable but generally working situation became the accepted norm due to massive volumes of actual work - why bother monitoring or checking when there are dozens of calls a day telling us how bad things are.

This cannot continue.

Enter the next service provider who promises everything will be fabulous, they look after these babies in their sleep. Contract signed and responsibility for managing services on Windows servers duly ensconced with said service provider who has to tools to manage the situation.

Said tools, yours truly & colleagues, don't cost the service provider a penny but do provide a regular parade of helpdesk calls along the lines of "service x isn't working, the users are jumping on our heads, why didn't you notice as they've started wearing studs". This is of course good for the service desk because it keeps their volume of calls up whilst they can file under first time fix because the same said tools have had to resort to managing the services themselves.

This toolset is actually pretty good because you can find extensions of the toolset sitting in other teams who merrily report directly to the biggest tools that server x is down again, naturally they won't report it to the helpdesk because they don't actually help and as a change to the environment (bringing server back up) is a change request and investigating why it went down is complex enough to be handed to third line but because it mentions "database" in the text somewhere they shunt it into the DBA queue which puts us all in limbo because the DBA queue is not actively monitored since the same service provider decided that staff of the client in non application support roles can't use the ticketing system.

I'm feeling better now.

So the fundamental problem is that we need to know when servers or services are down before anyone from the user community has a chance to nip home for their stilettos. We cannot rely on the service company for anything. What we need is a means of checking server status and service status whilst minimising the need to connect to SQL boxes but still keeping control of the entire process within the SQL world so that we can stop anyone else scuppering us further.

Enter Powershell (I got there in the end).

There is an interesting article I came across recently that introduces something very close to what I need written by Jugal Shah on MSSQLTIPS.

The principle works nicely - Powershell appears rather wordy so performance isn't great (30 seconds to check 4 servers) so it's not going to running every few seconds but anything around an hour will do fine for our purposes and I suspect there must be some ways to reduce the volumes of text and hence run times.

What it doesn't do is put any control and coordination around the principle that we would expect in our environment, so next time I'll post our implementation which works in our environment.

No comments:

Post a Comment