Nuts and Bolts Troubleshooting Networks Lead image: Lead Image © Fernando Gregory, 123RF.com

Resolving problems with DNS, Active Directory, and Group Policy

Escaping the Trap

Upgrading domain controllers or installing new servers can cause problems with name resolution, Active Directory replication, and Group Policy. A coordinated approach can isolate these errors in Windows Server 2008 or newer. By Thomas Joos

Incorrect name resolution and network connectivity are the most common problems on networks. Therefore, if a service on one or more computers fails or if connectivity and performance problems occur, always check first whether name resolution and the connection between the servers and clients are working properly. At the command prompt, nslookup checks whether the name of the server can still be resolved on the computers involved, which often helps narrow down errors. All participating servers and clients need to be able to resolve one another's addresses.

Name Resolution and Network Connectivity

Name resolution plays an important role: The nslookup command must return the server IP address correctly. However, this does not work in nested structures until you have configured Domain Name System (DNS) servers for the subdomain on the subdomain controller and the DNS server has registered. To use a DNS server other than the one configured on the local machine, run nslookup <host> <server> at the command line as follows:

nslookup dc02.microsoft.com dc01.contoso.com

Here, nslookup attempts to resolve host dc02.microsoft.com using server dc01.contoso.com. Instead of the second entry, you could specify the IP address.

If you enter a DNS server with its fully qualified domain name (FQDN) as the server entry, the DNS server used by the local computer only needs to resolve the server dc01.contoso.com, not the host dc02.microsoft.com. The DNS server dc01.contoso.com in turn can then resolve the host dc02.microsoft.com, and you will not see an error.

In other words, you can use the nslookup tool to reveal in great detail the weak points of your DNS resolution. To query multiple hosts one after another, use

nslookup - <server>

where the hyphen starts interactive mode and <server> is the name or the IP address of the DNS server you want to query; for example:

nslookup - 10.0.0.11

You can also combine the two options, if necessary.

Even if you start nslookup so that it uses the remote server 10.0.0.11 rather than the locally configured DNS server, you can still enter <host> <server> at the nslookup prompt to query yet another DNS server.

The nslookup tool launches at the command prompt and is configured to use DNS server 10.0.0.11 for name resolution. It checks whether the locally configured DNS server can resolve IP address 10.0.0.11 to a server name in its reverse lookup zone (Figure 1, area 1). Because that works, the output shows DNS server 10.0.0.11 as the Default Server with FQDN dc01.contoso.com. An error message at this point (e.g., if the server name for 10.0.0.11 is unknown) would mean that the DNS server configured in the IP settings of the local computer cannot resolve the server name in its reverse lookup zone. In this case, you need to check the configuration of the reverse lookup zone and ensure that all pointers are entered correctly. Consistent name resolution with DNS not only involves resolving server names to IP addresses (forward), but also IPs to server names (reverse).

Figure 1: Diagnosing DNS problems with nslookup lets you narrow down the potential causes of an error.
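
The consistency between forward and reverse lookups can also be checked in a script. The following Python sketch is only an illustration and not part of the original procedure: It uses the operating system's resolver (i.e., the DNS servers entered in the IP settings) rather than a specific server as nslookup <host> <server> does, and the function name check_forward_reverse is hypothetical.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare.

    Returns (ip, reverse_name, consistent). The two resolver functions
    are parameters so the logic can be exercised without a network.
    """
    ip = forward(hostname)
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        reverse_name = reverse(ip)[0]
    except (socket.herror, socket.gaierror):
        # No usable pointer (PTR) record in the reverse lookup zone
        return ip, None, False
    same = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ip, reverse_name, same
```

Calling check_forward_reverse("dc01.contoso.com") on a machine in the domain would return the IP address, the name found in the reverse lookup zone, and a flag that is False if the pointer is missing or points to a different name.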

The next command (Figure 1, area 2) tries to get the server with IP address 10.0.0.13 to resolve hostname dc02.microsoft.com, which it cannot. In this case, a problem exists on the server at 10.0.0.13, which fails to identify the microsoft.com zone. Either you need to check the Forwarders tab in DNS Manager for server 10.0.0.13 to see whether you need to set up forwarding to microsoft.com, or you need to create a secondary zone for microsoft.com on server 10.0.0.13 if you want the server to resolve hostnames for the microsoft.com zone. Next, an attempt is made to resolve the same server (dc02.microsoft.com) using the default server of the nslookup command line (area 3). The default server can easily resolve the server name, which shows that this configuration is fine.

At this point, name resolution from the parent to the subdomain is established. However, name resolution still needs to be set up from the child to the parent domain, and forwarding is the best option:

1. Start by right-clicking Conditional Forwarders in the DNS Manager snap-in.

2. In the context menu, select New Conditional Forwarder and enter the parent DNS domain.

3. Enter the IP address of a DNS server in the parent domain. If multiple DNS servers are responsible for name resolution in the parent domain, you need to enter all of them.

4. You do not need to repeat this action on each DNS server in the subdomain if you replicate the records on the DNS server for the subdomain. For this to work, the subdomain first needs to be created.

Check whether the domain controllers (DCs) in the parent domain can be resolved from the subdomain. In the example in Figure 2, dc-berlin.de.contoso.int is a subdomain controller and dc01.contoso.int is a DC in the parent domain contoso.int. Make sure that name resolution between the subdomains works if you use multiple subdomains. Fully configured name resolution is essential to ensure that Active Directory (AD) replication works and, thus, that the server-based services in the domain also work.

Figure 2: Name resolution must be ensured for a network.

Also use ping to test whether computers on the network can communicate with one another. Note that the Internet Control Message Protocol (ICMP) on which the ping command relies is blocked on many networks. If name resolution and network connections are working well and performance is good, many causes of failure are already excluded. Now launch an unlimited ping test

ping <IP address> -t

to determine whether network packets are lost or the response time for some packets is becoming longer. If you have name resolution problems, also ensure that the correct DNS server is entered in the network settings of the computers. The fastest way to launch the network connection configuration tool is to use ncpa.cpl at the command line.
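
If you save the output of a long ping run to a file, a few lines of Python can summarize packet loss and response times. This is a minimal sketch under stated assumptions: It expects the English-language output format of the Windows ping command ("Reply from ...", "Request timed out."), and the function name summarize_ping is hypothetical.

```python
import re

# Matches "time=12ms" as well as "time<1ms" in English Windows ping output
_TIME = re.compile(r"time[=<](\d+)ms")


def summarize_ping(lines):
    """Count replies and timeouts and collect round-trip times in ms."""
    sent = lost = 0
    times = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("Request timed out"):
            sent += 1
            lost += 1
        elif line.startswith("Reply from"):
            sent += 1
            match = _TIME.search(line)
            if match:
                # "time<1ms" is recorded as 1, i.e., as an upper bound
                times.append(int(match.group(1)))
            else:
                # e.g., "Reply from ...: Destination host unreachable."
                lost += 1
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "max_ms": max(times) if times else None,
        "avg_ms": sum(times) / len(times) if times else None,
    }
```

A rising max_ms value or a nonzero loss_pct over a longer run points to the connectivity problems described above.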

Optimizing Reverse Zones and IPv6 Settings

Working with a reverse zone makes sense in terms of stability. It can resolve the IP address of a computer to a server name, which in turn helps avoid problems with name resolution. You can create reverse lookup zones with a wizard in the DNS Manager tool, just as for primary DNS zones. After you create an IPv4 zone and enter the subnet you want the zone to cover, register the servers one after another in the zone. You can accelerate the process with ipconfig /registerdns. In exceptional cases, the reverse lookup zone update might not work, in that the server is available in the forward zone, but not in the reverse zone. In this case, you can just manually add the server record. To do this, you only need to create a new pointer.
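
To see which names are involved when you create a pointer manually, the following Python sketch derives the pointer (PTR) record name for an address and the name of the IPv4 reverse lookup zone for a subnet. It is an illustration only; the function names are hypothetical, and only /8, /16, and /24 networks are handled, because these map directly to zone names.

```python
import ipaddress


def ptr_name(ip):
    """Return the owner name of the PTR record for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_zone_name(network):
    """Return the reverse lookup zone name for an IPv4 /8, /16, or /24."""
    net = ipaddress.ip_network(network)
    if net.version != 4 or net.prefixlen not in (8, 16, 24):
        raise ValueError("only IPv4 /8, /16, and /24 networks are handled")
    # Keep the network octets and reverse their order
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"
```

For example, the server 10.0.0.11 in the subnet 10.0.0.0/24 needs a pointer named 11.0.0.10.in-addr.arpa in the zone 0.0.10.in-addr.arpa.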

On the servers, you will also want to enable automatic IP address retrieval in the IPv6 settings; otherwise, the ::1 record (loopback address) will cause an error during local IP address validation (Figure 3). This generally does not affect stability, but it is not ideal for tests or for displaying the correct name. By default, Windows Server 2012 R2, for example, tries to resolve the name to the IPv6 address, resulting in undesirable messages.

Figure 3: Error messages occur during name resolution because of a less than optimal IPv6 entry for the network connection of DNS servers.

Regardless of whether you are using IPv6, you will want to set the DNS server network connection entry in the IPv6 setting to Obtain an IP address automatically. In the IPv4 settings, either enter the IP address of the local server or the IP address of another DNS server on the network. You should always use an IP address of another DNS server on the network as the primary DNS server and your own as the secondary address. If the DNS cache on the computer contains an incorrect record that you have already corrected, you can delete it with ipconfig /flushdns.

When a Windows client launches, it automatically registers with the DNS when the local services (NetLogon service and DNS client) are started. Because you will not want to restart the two services or the entire server to fix problems, you can run ipconfig /registerdns at the command prompt to refresh the DNS records manually.

Repairing DNS Records

The nslookup tool also lets you verify AD SRV locator resource records. Clients can ask the DNS which host on the network is responsible for each server-based service. AD relies heavily on these SRV DNS records. For this reason, it makes sense to verify the records with nslookup. Each domain controller (DC) in the AD has a CNAME in addition to its host A record (e.g., dc01.contoso.com). The CNAME is based on the GUID of the directory system agent (DSA) object of the DC's NT directory services (NTDS) settings and is registered as an alias (CNAME) record in the DNS under the _msdcs node of the domain zone.

DCs do not attempt to resolve their replication partners with the traditional host A record but, instead, use the CNAME. If a DC's CNAME fails to resolve, a DC tries to find a host A record. If this also fails, the DC tries to resolve the NetBIOS name, either via broadcast or a WINS server. Each DC needs a unique CNAME, which in turn points to its host A record. In case of replication problems, check that these entries are present.
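
The record names to check follow a fixed pattern, which the following Python sketch builds together with matching nslookup calls. It is a hedged illustration: The function names are hypothetical, the sketch assumes a single-domain forest (so the GUID-based CNAME lives under _msdcs of the same domain), and it only assembles the commands without running them.

```python
def dc_locator_names(domain, dsa_guid=None):
    """Build the DNS names of the DC locator records for a domain."""
    domain = domain.strip(".").lower()
    # SRV record that lists the domain controllers of the domain
    names = {"srv_ldap_dc": "_ldap._tcp.dc._msdcs." + domain}
    if dsa_guid:
        # GUID-based alias that replication partners resolve
        names["cname_dsa"] = dsa_guid.lower() + "._msdcs." + domain
    return names


def nslookup_commands(domain, dsa_guid=None):
    """Return nslookup argument lists that query the locator records."""
    names = dc_locator_names(domain, dsa_guid)
    commands = [["nslookup", "-type=SRV", names["srv_ldap_dc"]]]
    if "cname_dsa" in names:
        commands.append(["nslookup", "-type=CNAME", names["cname_dsa"]])
    return commands
```

For contoso.int, the first command is nslookup -type=SRV _ldap._tcp.dc._msdcs.contoso.int; if it returns no records, the DCs have not registered their locator records in the zone.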

All the AD SRV records are also stored in the %WinDir%\System32\config\netlogon.dns file, which you can view with an editor. If records that AD requires are missing in DNS zones, the most useful thing to do is run the dcdiag /fix command. The tool attempts to build missing entries from the netlogon.dns file; then, name resolution should work again.

Now, the DNS records should update fairly quickly. If the dynamic update still does not work, check the zone properties to discover whether or not dynamic updating is enabled. If you also want workstations and servers that are not members of the forest to register dynamically with the zone, you can enable the Nonsecure and secure option in the General tab of the DNS Manager Properties dialog for the zone.

If Domain Controllers Cannot Be Found

If clients or servers are receiving a message telling them that the DC cannot be reached, you should first use ping on the affected computers to test whether a connection to the IP address of the server works. If this works, make sure the IP address of a DNS server that can resolve the DC is set in the network settings of the servers. In the network settings of the DCs themselves, the DNS servers must be defined such that name resolution works.

If name resolution is still not working after completing these basic checks, the DC records may be missing in the DNS zones. You will find these settings below _msdcs on the DNS servers. On the DCs, you will discover such errors fastest by typing dcdiag at the command prompt. Also use nltest /dsgetsite to check that the DC is assigned to the correct AD site (Figure 4). Typing

nltest /dclist:<NetBIOS name of domain>

displays a list of all DCs in the corresponding domain. The entries should be listed as FQDNs. The command

nltest /dsgetdc:<NetBIOS name of domain>

is also important and lists the name, IP address, GUID, AD FQDN, and other information. All information should be free of errors.

Figure 4: Parameters to the nltest command let you check the status of DCs.

Use net stop netlogon and then net start netlogon to restart the NetLogon service on the new DC. On startup, the service attempts to re-register the data from the netlogon.dns file. If it encounters problems, you will find an entry in Administrative Tools | Services | System Event Notification Service Properties that can help you determine the problem.

Also, nltest /dsregdns often helps with DNS registration problems. If re-registration does not work by restarting the NetLogon service, delete the DNS _msdcs zone and delegation. The next time you restart the NetLogon service, it reads the data from netlogon.dns, recreates the _msdcs zone, and writes the entries back into the zone. You can then use dcdiag to see whether the problems are fixed or perform an extended test with dcdiag /v.

Checking DCs

To troubleshoot AD replication, first make sure that name resolution is working on the network, because it is a basic precondition for AD replication. The dcdiag /v command lets you perform a thorough check of AD. If errors appear here, you may well have found the cause of the replication errors. Enter the error in a search engine for information on troubleshooting. dcdiag /a checks all DCs at the current AD site, whereas dcdiag /e checks all the DCs in the forest. To display only the errors, use dcdiag /q. If you want to test only a single DC over the network, use:

dcdiag /s:<name of DC>

If you see any errors, you should first restart the server and then see which entries are in the Event Viewer and whether all services started (e.g., the system services for the DNS Server and AD).

Review all the errors in dcdiag and search for them on the Internet. Typing

dcdiag /v >c:\temp\<filename>.txt

redirects all the data to a text file, which you can then search for errors and copy from.
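
Searching a long dcdiag log by hand is tedious, so a short script can list only the failed tests. The following Python sketch is an assumption-laden illustration: It expects the English-language summary lines that dcdiag prints after each test (a row of dots followed by the target, "passed test" or "failed test", and the test name), and the function name failed_tests is hypothetical.

```python
import re

# Summary line, e.g. "......................... DC01 failed test Advertising"
_RESULT = re.compile(r"\.{5,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(lines):
    """Return (target, test_name) pairs for every failed dcdiag test."""
    failures = []
    for line in lines:
        match = _RESULT.search(line)
        if match and match.group(2) == "failed":
            failures.append((match.group(1), match.group(3)))
    return failures
```

You would pass the lines of the redirected text file to failed_tests() and then look up each reported test name in the verbose section of the same file.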

The various advertising tests and flexible single-master operations (FSMO) role tests need to work correctly in all cases. By using nslookup and ping, you can test name resolution and communications between the DCs. The repadmin /showreps command shows the DC replication setup. If individual DCs cannot replicate, you will quickly see which DC is the root cause. Typing

repadmin /showreps >c:\<filename>.txt

lets you redirect the data to a text file, and

repadmin /showreps * /csv > c:\<filename>.csv

redirects output to a CSV file. You can import this file into Excel, for example, for better troubleshooting.
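
Instead of Excel, you can also filter the CSV file with a script. The following Python sketch lists only the replication links that report failures. It is a minimal illustration under stated assumptions: The column names ("Number of Failures", "Source DSA", "Destination DSA", "Naming Context") are assumed to match the header row that repadmin writes, so check them against your own file, and the function name failing_links is hypothetical.

```python
import csv
import io


def failing_links(csv_text):
    """Return (source, destination, naming_context, failures) tuples
    for all rows whose failure counter is greater than zero."""
    reader = csv.DictReader(io.StringIO(csv_text))
    failures = []
    for row in reader:
        count = (row.get("Number of Failures") or "0").strip()
        if count.isdigit() and int(count) > 0:
            failures.append((
                row.get("Source DSA"),
                row.get("Destination DSA"),
                row.get("Naming Context"),
                int(count),
            ))
    return failures
```

An empty result means that no link in the file reports failures; otherwise, the source and destination columns show which pair of DCs to examine first.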

Use the Active Directory Sites and Services Microsoft Management Console (MMC) to check Sites | <name of site> | Servers and see whether all your DCs are listed. You will find an NTDS Settings entry below each DC. Click on this to see the replication connections to other DCs in the window to the right. The connections are generated automatically. You can start replication manually from the context menu (Replicate Now). If an error is displayed, you need to check why the DCs are unable to communicate.

Additionally, you will want to check that all the DCs are registered properly in the AD. To do this, use nltest /dclist:contoso. Check the individual DCs to see whether they know their own locations with nltest /dsgetsite.

Make sure that all DCs are shown with their DNS names. If this is not the case, check whether the correct DNS server is entered on the DC, whether the DNS server has a record for the server and its IP address, and finally, whether name resolution works with nslookup.

The Knowledge Consistency Checker (KCC) connects to the DCs at the different sites and automatically creates a replication topology based on the defined schedules and site links. If a replication connection does not work, read the server GUID for each server with repadmin /showreps. Each server displays the DSA object GUID in the window. You need these GUIDs to add a connection with the repadmin /add command. The domain name in this example is contoso.int, and the server GUIDs for the two DCs appear in the repadmin /add command below.

Go to Active Directory Sites and Services and delete all the connection objects. Next, create a new connection from the defective DC to a working DC:

repadmin /add "cn=configuration, dc=contoso,dc=int" e8b4bce7-13d4-46bb-b521-8a8ccfe4ac06._msdcs.contoso.int d48b4bce7-13d4-444bb-b521-a8ccfe4ac06._msdcs.contoso.int

Use the appropriate server GUIDs and domain names for your environment. During this procedure, error 8441 (distinguished name already exists) may appear. In this case, the connection already exists. Perform a full replication of the connection you created with:

repadmin /sync cn=configuration,dc=contoso,dc=int DC1 e8b4bce7-13d4-46bb-b521-8a8ccfe4ac06 /force /full

Then, in the Active Directory Sites and Services snap-in, make sure that automatically generated connection objects from the defective machine to a functioning DC again exist and make sure that replication works in all directions. Also check whether the individual FSMO roles are known on the network. You can view these in a single action using:

netdom query fsmo

For an individual view, use the following commands:

Troubleshooting Problems with Group Policy

Group Policy can fail for a variety of reasons, so you must investigate where the problem lies step by step. Preferably, you have created different group policies for the different settings and linked them to the appropriate organizational unit (OU) or the whole domain. The following points will help you with this investigation:

The Windows Resultant Set of Policy (RSoP) MMC snap-in provides a graphical interface and evaluates the applied policy. You can view the RSoP on a workstation from the MMC with File | Add/Remove Snap-In | Resultant Set of Policy. Another way is by typing rsop.msc in the search box of the Start menu.

If group policies are not applied correctly to individual computers, use the free Microsoft Group Policy Log View tool [1] to home in on the error. Install the tool on the computer that you want to analyze; then, open a command prompt with administrator privileges and change to the directory in which you installed the tool. Use the following command to monitor Group Policy:

gplogview -o gpevents.txt

The tool parses all the Group Policy entries and writes a text file in which the GPO errors are collected. You can also run the tool on any computer using a logon script. If the logon script saves the file with the evaluation results on a share, you can monitor the use of Group Policy on multiple computers in a targeted way. In this case, do not just save the evaluation file on the network, but add the hostname of the evaluated machine to the file name:

gplogview -o \\<server>\<share>\<computername>-gpevent.txt

You can also create an HTML file as a report:

gplogview -h -o \\<server>\<share>\<computername>-gpevent.html
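
If your logon script is written in Python rather than as a batch file, the per-computer file name can be assembled as in the following sketch. It is an illustration only: The function name gplogview_command and the server and share names in the example are hypothetical, the sketch uses only the -o and -h options shown above, and it builds the command without running it.

```python
import socket


def gplogview_command(server, share, computer=None, html=False):
    """Build the gplogview call that writes a per-computer report
    to \\\\<server>\\<share>\\<computername>-gpevent.<txt|html>."""
    # Fall back to the local hostname if no computer name is given
    computer = computer or socket.gethostname()
    extension = "html" if html else "txt"
    target = "\\\\{}\\{}\\{}-gpevent.{}".format(
        server, share, computer, extension)
    args = ["gplogview"]
    if html:
        args.append("-h")  # HTML report instead of plain text
    args += ["-o", target]
    return args
```

For example, gplogview_command("fs01", "gplogs", "PC01", html=True) yields the arguments for gplogview -h -o \\fs01\gplogs\PC01-gpevent.html, which you could then pass to subprocess.run(). In a batch logon script, the %COMPUTERNAME% environment variable serves the same purpose.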

The tool also uses color highlighting in the HTML report. The redder the entry in the Activity Id field, the more serious the error. The tool can also monitor the application of Group Policy in real time: Open a command prompt with administrator privileges and start real-time tracking with gplogview -m.

The tool now monitors the local machine for the application of Group Policy. Opening a second command line and running gpupdate /force brings up a Group Policy Log View window, in which you will see the policy evaluation.

Conclusions

The problems discussed in this article can affect any infrastructure, even after seemingly minor changes. Because these problems can have serious effects on the well-being of the entire infrastructure, I tried here to provide quick fixes for the administrator. At the same time, it is conceivable that less serious problems have affected your network for a long time. Although they might not slow down the environment so that IT managers would notice, they could still cause suboptimal IT operations.