4/13/11

Time (or Space) is running out.. on Exchange ?!

Hello Again,

This time I'm going to talk about an issue that has happened to me a few times (to my misfortune). Hopefully, it'll help you deal with said issue in a more relaxed fashion and save you some trouble.
Picture the following scene: your favorite monitoring system alerts that space is running low on the drive that stores your Exchange transaction logs (for a specific storage group), but nobody notices, and the free space keeps shrinking.
This usually happens when your Exchange server isn't being backed up in time, since a successful full backup is what purges the committed transaction logs.
One way to prevent this from happening is to make your monitoring system alert in a proper way, so that this kind of thing can't be overlooked. Another way is to schedule backups of the Exchange server more frequently (or to schedule one at all, if you didn't think about it earlier).
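If your monitoring system lets you plug in custom checks, the free-space alert could look roughly like the sketch below. This is only an illustration: the function names and the 20% / 10% thresholds are my own assumptions, not something taken from Exchange or from any particular monitoring product, so tune them to your environment.

```python
import shutil


def free_fraction(free_bytes, total_bytes):
    """Return the share of the drive that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def alert_level(free_bytes, total_bytes, warn_at=0.20, critical_at=0.10):
    """Map free space to an alert level. Thresholds are assumed defaults."""
    fraction = free_fraction(free_bytes, total_bytes)
    if fraction <= critical_at:
        return "CRITICAL"
    if fraction <= warn_at:
        return "WARNING"
    return "OK"


def check_drive(path):
    """Read real usage numbers for the drive that holds `path`."""
    usage = shutil.disk_usage(path)
    return alert_level(usage.free, usage.total)


if __name__ == "__main__":
    # "." is a placeholder; point it at your log drive, e.g. "E:/".
    print(check_drive("."))
```

Having two levels is the point here: the WARNING gives you time to kick off a backup, and the CRITICAL should be loud enough that nobody can miss it.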
Now, free space on your "log drive" is reaching its critical mark. Once it hits zero available space, Exchange will automatically dismount all the stores of the group in question. But fear not, there is a way to deal with it.
First, you can try a manual backup of the Exchange server (with your favorite backup manager or even ntbackup). You might still have enough time to save the day.
Second, or should I say if time is of the essence, you'll need to resort to extreme measures: you'll need to delete all the transaction logs of the group in question. "But wait, won't that affect me in some horrible way?" Rest assured, dear reader: follow these steps and everything will be OK.
  1. Navigate to the MDBDATA folder on the drive in question. While in the folder, select the first 3-4 days of logs "on record" and copy them to a location that can contain them (e.g. another drive that has lots of free space).
  2. When you're finished copying, open a new Notepad instance and do the following - 
    • Open the Exchange System Manager
    • Navigate to the Group in question
    • For each Store, locate the .edb file location and copy it to a new line in the recently opened Notepad instance.
  3. Now, for each line in your Notepad instance add the following in front of the line - eseutil /mh
  4. After completing these steps you are prepared to dismount all the stores of the group, but be advised, this action will temporarily disconnect all mailboxes connected to these stores (on the bright side, it would've happened anyway, if not now then later). You can now dismount all the stores of the group in question.
  5. After you're done dismounting, open a CMD prompt, navigate to the folder where the Exchange server is installed, copy all the content of your Notepad instance and paste it into the CMD prompt. For each line that runs, you should get a "Clean Shutdown" in the "Shutdown State" line. (if this is not the case, you now have a serious problem, and you can address it using this post)
  6. Navigate to the MDBDATA folder from step 1 and remove the files you backed up earlier.
  7. Now, select the rest of the files and cut them. Create a new folder and paste them into that folder.
  8. Now mount all the stores (and pray it all works as it should).
  9. Once the stores are mounted you can delete the new folder you created earlier and the files that you backed up in step 1.
  10. Done.
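If you have many stores, the Notepad part of the procedure (prefixing each .edb path with eseutil /mh and then reading the results) can be scripted. The following is a minimal sketch, not a finished tool: the function names are mine, the .edb paths are hypothetical examples, and the sample output string is only an illustration of the text to look for, not a full eseutil header dump. It only builds and checks strings; it does not run eseutil or touch any file.

```python
def build_eseutil_commands(edb_paths):
    """Prefix every non-empty database path with 'eseutil /mh'.

    Paths are quoted so that folders with spaces survive the CMD prompt.
    """
    commands = []
    for path in edb_paths:
        path = path.strip()
        if path:  # skip blank lines copied over from Notepad
            commands.append('eseutil /mh "{}"'.format(path))
    return commands


def is_clean_shutdown(eseutil_output):
    """True only if the output mentions a clean shutdown and no dirty one."""
    text = eseutil_output.lower()
    return "clean shutdown" in text and "dirty shutdown" not in text


if __name__ == "__main__":
    # Hypothetical store locations; replace with the ones you collected
    # from the Exchange System Manager.
    stores = [
        r"D:\Exchsrvr\MDBDATA\priv1.edb",
        r"D:\Exchsrvr\MDBDATA\pub1.edb",
    ]
    for command in build_eseutil_commands(stores):
        print(command)

    # Illustrative snippet of what a healthy result should contain.
    print(is_clean_shutdown("State: Clean Shutdown"))
```

Paste the printed lines into the CMD prompt exactly as you would have pasted the Notepad content. If any store does not report a clean shutdown, stop there and do not remove any logs.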
To make sure this does not happen again, do as I mentioned earlier: make sure the backup plan for your Exchange server is sound and actually runs as scheduled.
Hopefully, you'll never have to resort to these measures, but just in case you do, I hope this post helps you.

Best Wishes, 
Dani.

p.s.
If you have any thoughts on the subject, leave a comment and I'll be sure to reply :)

4/11/11

NetApp, W32Time and stuff between them

Hello World,

This time I'm going to talk about an issue that surfaced recently. One of our storage guys claimed that his NetApp machines aren't getting a good TimeSync service from our Domain, thus drifting away in time and getting to the point where they can no longer co-operate with our domain due to an exceeded time skew.
He also claimed that he is sure that this happens due to the fact that most of our DCs don't have SP2 installed.

It seemed kinda strange, getting schooled by some storage guy, and even stranger was that I hadn't noticed any time sync problems anywhere in the domain.
I've decided to look into it and found some inconsistencies in his words.
First - There is no issue resolved regarding Time Synchronization in the release notes of Windows Server 2003 Service Pack 2. So there's no way that time sync isn't working for him because of that. 
Second - I found out that this doesn't happen in other domains in the forest, even though we have similar conditions there. Add to that the fact that, apparently, setting the clock on each NetApp machine manually is too much for one man to do (we have at least one NetApp machine in each site and we have lots of sites). 

All these things didn't add up, so we've decided to apply some best practices (courtesy of NetApp), and now storage guys are reporting no issues. I'll lay out some of them (only the general ones) for you to know :)

  • First and foremost - NetApp machines are site aware. As long as their subnet is defined under the AD Site configuration they are able to locate "Favored" DCs all on their own. In our case, we had a preferred DC manually defined in each machine, and it was the same one for all the machines, even if the site was connected over a low-bandwidth WAN link. We removed the manual records, seeing that we have a perfectly good site configuration.
  • When defining the "Time Authority" for your NetApp machines, be sure to specify the FQDN of your domain in order for the site integration to work properly.
  • NetApp can do Time Synchronization over 2 different protocols, one of them being NTP. The best practice for any domain would be - all the DCs syncing with the PDCE in the domain, all the PDCEs in the forest syncing via NTP with an authoritative DC in the forest, and this DC syncing (also via NTP) with some external time source (that's my personal opinion).
  • Make sure that the time daemon is online on each machine (you'll find out that in some cases, it's switched off for no particular reason).
  • As a best practice you should - "Set the timed window for adding a random offset within 5 minutes of the actual time update/verification. This way not all the systems are talking to the time server at exactly the same time every hour." - this can save you unnecessary timeouts.
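To show what the random offset in the last point buys you, here is a small sketch of the idea in Python. It is not NetApp's implementation, just an illustration: the function name is made up, and I'm assuming the offset is added after the scheduled time (between 0 and 5 minutes) rather than spread on both sides of it.

```python
import random
from datetime import datetime, timedelta


def next_sync_time(scheduled, window_minutes=5, rng=random):
    """Return the scheduled time plus a random offset inside the window.

    `rng` can be the random module or a random.Random instance,
    which makes the result reproducible when testing.
    """
    offset_seconds = rng.uniform(0, window_minutes * 60)
    return scheduled + timedelta(seconds=offset_seconds)


if __name__ == "__main__":
    top_of_hour = datetime(2011, 4, 11, 12, 0, 0)
    # Three machines that would otherwise all hit the time server at 12:00:00.
    for machine in ("filer-a", "filer-b", "filer-c"):  # hypothetical names
        print(machine, next_sync_time(top_of_hour))
```

Each machine ends up with a slightly different sync time, so the time server sees the requests spread over the window instead of arriving in one burst.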
I think that's all for now. If I reach any further conclusions, I'll be sure to post them here.

    in hope of better time sync results, 
    Dani. :D

    Reference : Windows File Services Best Practices with NetApp Storage Systems (Downloadable technical reference from the Network Appliance website).