4/15/11

When stations play Hide and Seek with DCs

Hey guys,

I hope to keep this post short, because it's not really something super complex. 
By the way, I'll be on vacation this week, so don't expect anything out of the ordinary - the muse usually comes to me at work :)

Today I'd like to talk about how a station chooses (or rather locates) a DC to communicate with. It's been on my mind this week, because I've had an opportunity to be a part of a technical job interview and the guy we interviewed didn't seem to know how this process works. I'd like to dedicate this post to him. 

To start off, I just want to point out that Microsoft probably has a more thorough explanation in their TechNet library, so I'm just going to simplify it a bit for those who want to get through a job interview successfully and don't have the patience to read a lot of complex (and sometimes good-for-nothing) technical terms.
So, first of all, the name of the process that commences the "search" is "DC Locator" (no surprises here, I hope). The DC Locator works its "charm" through the Netlogon service, which means that if Netlogon is down locally, you won't be able to find a DC to authenticate with (but you'll probably get an error that extensively explains this).
Next, your nifty DC Locator will try to query a DNS server, and as you've probably guessed, if you don't have a connection with the DNS servers, you won't be able to locate a DC. As for the DNS query itself: if this is the first time your station attempts to query the DNS, it will first query the domain name and save it in the Netlogon cache for future use (to save you some time on your next logon, hopefully).
Now, the DC Locator will query the DNS server of its choice for a DC. It does so through the _msdcs zone, and it'll prefer a DC in the same subnet. Once a DC has been found, the "client" will establish communication with it using LDAP (Lightweight Directory Access Protocol) in order to gain access to Active Directory. The DC will identify the site that the "client" belongs to using the client's IP subnet. If the current DC isn't the optimal choice (i.e. not a DC in the closest site), it will return the name of the client's optimal site. If the client has already failed to communicate with a DC in that site, it will continue "working" with the current DC; otherwise it will query the DNS again with a site-specific query. Once the client establishes a connection, it will "cache" the info for future use by Netlogon. If the client cached a non-optimal DC entry, it will flush its cache after 15 minutes and reattempt this whole process from the top when needed.
Any further actions will include: logon, authentication, etc.
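
If it helps to see the flow above in a more concrete form, here's a small Python sketch. It only builds the SRV record names a client would look up under the _msdcs zone and mimics the "stay or re-query" decision described above; it does not talk to DNS or to a DC. The function names and the sample domain and site names are made up for illustration.

```python
def dc_locator_srv_names(domain, site=None):
    """Return the SRV record names to look up, most specific first.

    With a known site, the site-specific name comes first; the
    domain-wide name is always included as the fallback.
    """
    domain = domain.strip(".").lower()
    names = []
    if site:
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")
    return names


def next_step(current_dc_site, client_site, failed_sites):
    """Decide what the client does once the DC has reported the client's site.

    Returns "keep" to stay with the current DC, or "requery" to go back
    to DNS with a site-specific query.
    """
    if current_dc_site.lower() == client_site.lower():
        return "keep"  # already talking to a DC in the optimal site
    if client_site.lower() in {s.lower() for s in failed_sites}:
        return "keep"  # the optimal site was already tried and failed
    return "requery"


if __name__ == "__main__":
    # "example.com", "HQ" and "Branch" are hypothetical names.
    for name in dc_locator_srv_names("example.com", site="HQ"):
        print(name)
    print(next_step("Branch", "HQ", failed_sites=[]))
```

The point of keeping both names in the list is the same as in the text: the site-specific query is the preferred one, and the domain-wide query is what the client falls back on when it doesn't know its site yet.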

Well, it turned out longer than I expected, so I hope it'll still simplify the whole process for you.

Good luck on your job interviews,
Dani ;)

Technical Reference : How Domain Controllers Are Located in Windows(Technet)

4/13/11

Time (or Space) is running out.. on Exchange ?!

Hello Again,

This time I'm going to talk about an issue that has happened to me a few times (to my misfortune). Hopefully, it'll help you deal with said issue in a more relaxed fashion and save you some trouble.
Picture the following scene - your favorite monitoring system alerts that space is running low on the drive that stores your Exchange transaction logs (for a specific group), but nobody notices, and it keeps running out.
This usually happens when your Exchange Server isn't being backed up in time.
One way to prevent this from happening is to make your monitoring system alert in a proper way, so there's no way this kind of thing can be overlooked. Another way is to schedule a backup of the Exchange server more frequently (or schedule one at all, if you didn't think about it earlier).
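
As a rough illustration of the "alert in a proper way" idea, here's a minimal Python sketch that checks free space on a drive and maps it to an alert level. The 20% and 10% thresholds and the function names are my own assumptions, not something taken from a specific monitoring product; tune them to how fast your logs actually grow.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def alert_level(free_frac, warn=0.20, critical=0.10):
    """Map a free-space fraction to an alert level.

    The default thresholds are hypothetical examples.
    """
    if free_frac <= critical:
        return "critical"
    if free_frac <= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # "." stands in for the drive that stores the transaction logs.
    print(alert_level(free_fraction(".")))
```

Having two levels is the whole point: the "warning" should give you enough time to run a backup calmly, and the "critical" should be loud enough that nobody can miss it.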
Now, free space on your "log drive" is reaching its critical mark. Once it hits zero available space, Exchange will automatically dismount all the stores of the group in question, but fear not - there is a way to deal with it.
First, you can try a manual backup of the Exchange server (with your favorite backup manager or even ntbackup) - you might still have enough time to save the day.
Second, or should I say if time is of the essence, you'll need to resort to extreme measures - you'll need to delete all the transaction logs of the group in question. "But wait, won't that affect me in some horrible way?" - Rest assured, dear reader, follow these steps and everything will be OK.
  1. Navigate to the MDBDATA folder on the drive in question. While in the folder, select the first 3-4 days of logs "on record" and copy them to a location that can contain them (i.e. another drive that has lots of free space).
  2. When you're finished copying, open a new Notepad instance and do the following -
    • Open the Exchange System Manager
    • Navigate to the group in question
    • For each store, locate the .edb file location and copy it to a new line in the recently opened Notepad instance.
  3. Now, for each line in your Notepad instance, add the following in front of the line - eseutil /mh
  4. After completing these steps you are prepared to dismount all the stores of the group, but be advised, this action will temporarily disconnect all mailboxes connected to these stores (on the bright side, it would've happened anyway, if not now then later). You can now dismount all the stores of the group in question.
  5. After you're done dismounting, open a CMD prompt, navigate to the folder where the Exchange server is installed, copy all the content of your Notepad instance and paste it into the CMD prompt. For each line that runs, you should get a "Clean Shutdown" in the "Shutdown State" line. (If this is not the case, you now have a serious problem, and you can address it using this post.)
  6. Navigate to the MDBDATA folder from step 1 and remove the files you backed up earlier.
  7. Now, select the rest of the files and cut them. Create a new folder and paste them into that folder.
  8. Now mount all the stores (and pray it all works as it should).
  9. Once the stores are mounted, you can delete the new folder you created earlier and the files that you backed up in step 1.
  10. Done.
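
If you'd rather not build the eseutil lines by hand in Notepad, the same idea can be sketched in a few lines of Python. This is only a helper for preparing and checking text: it does not run eseutil or touch the stores. The function names and the sample .edb path are hypothetical, and the exact wording of the header dump is an assumption based on the "Clean Shutdown" state mentioned above, so check it against what your own server prints.

```python
def eseutil_commands(edb_paths):
    """Prefix each non-empty .edb path with 'eseutil /mh'.

    Paths are wrapped in double quotes so that spaces survive cmd.exe.
    """
    commands = []
    for raw in edb_paths:
        path = raw.strip().strip('"')
        if path:
            commands.append(f'eseutil /mh "{path}"')
    return commands


def is_clean_shutdown(header_dump):
    """Return True if the pasted header dump reports a clean shutdown.

    Looks at the first line that mentions both "state" and "shutdown";
    the exact output format is an assumption.
    """
    for line in header_dump.splitlines():
        lowered = line.lower()
        if "state" in lowered and "shutdown" in lowered:
            return "clean shutdown" in lowered
    return False


if __name__ == "__main__":
    # Hypothetical path; use the ones you copied from Exchange System Manager.
    for command in eseutil_commands([r"D:\Exchsrvr\MDBDATA\priv1.edb"]):
        print(command)
```

Anything that doesn't come back as a clean shutdown should stop you right there, before you move or delete a single log file.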
To make sure this does not happen again, do as I mentioned earlier: make sure your backup plan for the Exchange server is planned right and executed as scheduled.
Hopefully, you'll never have to resort to these measures, but just in case you do, I hope this post helps you.

Best Wishes, 
Dani.

p.s.
If you have any thoughts on the subject, leave a comment and I'll be sure to reply :)

4/11/11

NetApp, W32Time and stuff between them

Hello World,

This time I'm going to talk about an issue that surfaced recently. One of our storage guys claimed that his NetApp machines aren't getting a good TimeSync service from our Domain, thus drifting away in time and getting to the point where they can no longer co-operate with our domain due to an exceeded time skew.
He also claimed that he is sure that this happens due to the fact that most of our DCs don't have SP2 installed.

It seemed kinda strange, getting schooled by some storage guy, and even stranger was that I hadn't noticed any problems with time syncing anywhere in the domain.
I've decided to look into it and found some inconsistencies in his words.
First - There is no issue resolved regarding Time Synchronization in the release notes of Windows Server 2003 Service Pack 2. So there's no way that time sync isn't working for him because of that. 
Second - I found out that this doesn't happen in other domains in the forest, even though we have similar conditions in those domains. Add to that the fact that, apparently, setting the clock on each NetApp machine manually is too much for one man to do (we have at least one NetApp machine in each site, and we have lots of sites).

All these things didn't add up, so we've decided to apply some best practices (courtesy of NetApp), and now storage guys are reporting no issues. I'll lay out some of them (only the general ones) for you to know :)

  • First and foremost - NetApp machines are site aware. As long as their subnet is defined under the AD Site configuration, they are able to locate "Favored" DCs all on their own. In our case, we had a preferred DC manually defined on each machine, and it was the same one for all the machines, even if the site was connected over a low-bandwidth WAN link. We removed the manual records, seeing that we have a perfectly good site configuration.
  • When defining the "Time Authority" for your NetApp machines, be sure to specify the FQDN of your domain in order for the site integration to work properly.
  • NetApp can use Time Synchronization in 2 different protocols, one of them being NTP. The best practice for any domain would be - all the DCs syncing with the PDCE in the domain, all the PDCEs in the forest syncing via NTP with an authoritative DC in the forest, and this DC syncing (also via NTP) with some external time source (that's my personal opinion).
  • Make sure that the time daemon is online on each machine (you'll find that in some cases, it's switched off for no particular reason).
  • As a best practice you should - "Set the timed window for adding a random offset within 5 minutes of the actual time update/verification. This way not all the systems are talking to the time server at exactly the same time every hour." - this can save you unnecessary timeouts.
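
To show what the random offset from the last bullet buys you, here's a minimal Python sketch. It is not NetApp configuration, just the scheduling idea: each machine adds its own random delay of up to 5 minutes to the hourly sync, so they don't all hit the time server in the same second. The function name and the parameters are hypothetical.

```python
import random
from datetime import datetime, timedelta


def next_sync_time(base, interval=timedelta(hours=1),
                   window=timedelta(minutes=5), rng=random):
    """Return the next sync time: base + interval + a random offset.

    The offset is drawn uniformly from 0 up to `window`, so machines
    that share the same base time spread out across the window.
    """
    offset_seconds = rng.uniform(0, window.total_seconds())
    return base + interval + timedelta(seconds=offset_seconds)


if __name__ == "__main__":
    base = datetime(2011, 4, 11, 12, 0, 0)
    for machine in ("filer-a", "filer-b", "filer-c"):  # hypothetical names
        print(machine, next_sync_time(base))
```

With lots of sites and at least one machine in each, that small spread is the difference between one burst of requests every hour and a gentle trickle.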
I think that's all for now. If I have any further conclusions, I'll be sure to post them here.

    in hope of better time sync results, 
    Dani. :D

    Reference : Windows File Services Best Practices with NetApp Storage Systems (Downloadable technical reference from the Network Appliance website).