Lecture11 - Domain Name System
Goal: Design a system to look up domain names that can scale to the planet-wide internet and handle
queries on billions of objects.
The Internet Domain Name System (DNS) is the naming system for nodes on the Internet. It
associates human-friendly names with numeric IP addresses and other information about
that node.
Introduction
The Internet Domain Name System (DNS) is the distributed system that enables the lookup of
hundreds of millions of domain names. It’s an application-specific implementation, not a
generic object store, but it is a collection of software that is used every time we access a web
page, send email, or send a packet to any system on the Internet.
A global non-profit organization called ICANN, or the Internet Corporation for Assigned
Names and Numbers, is responsible for managing IP addresses, autonomous system
numbers that are used for routing, and the domain name system.
The Internet Assigned Numbers Authority, the IANA, is a department within ICANN that
is responsible for assigning IP addresses and managing top-level domains. The IANA
allocates chunks of the IP address space to five organizations called Regional Internet
Registries (RIR).
These cover large geographic areas.
For instance, ARIN is the American Registry for Internet Numbers and covers the U.S. and
Canada.
In the early days of the ARPANET, computer names formed a flat namespace: each name had to be unique and there was no
concept of domains or any form of hierarchy. Machines had names such as UCBVAX for a
certain Vax computer at UC Berkeley or DECWRL for a computer at Digital Equipment
Corporation’s Western Research Lab.
If you had a system on the internet, you would periodically download the latest copy of
the hosts.txt file from SRI-NIC via FTP. It was a text file that contained the names of all the
computers on the ARPANET and their corresponding IP addresses. By searching this file,
programs could look up the address corresponding to a specific machine name.
This worked well when there weren’t a lot of hosts on the Internet. Until around 1990, the
Internet was accessible only to companies and universities working on Department of
Defense projects. As the number of hosts on the Internet grew, the system didn’t scale.
Asking people to download a file containing all the hosts on the Internet no longer worked:
the file would get huge and the information within it would change too frequently.
1. Country code domains contain two-letter country code names, such as de for Germany, es for
Spain, uk for the United Kingdom, and ke for Kenya.
2. Internationalized domain name (IDN) country code top-level domains are top-level
domains that are displayed in their native language. For example, .中国 for China, .ελ for
Greece, and .پاکستان for Pakistan.
3. Finally, generic top-level domains include traditional ones like .com, .edu, and .org and all
the newer ones like .party, .audio, and so on. These domains also include names in different
languages.
Currently, there are 1,589 top-level domains. The Internet Assigned Numbers Authority
(IANA) delegates the management of various domains to different organizations. Each top-
level domain has an administrator who is in charge of it. The IANA itself only keeps track of
the root servers. These root servers tell you who to contact for information about top-level
domains.
Shared registration
Domain name allocation and management is done through a system of shared registration.
The domain name registry is the master database of all domain names that are registered
under a top-level domain.
The domain name registry operator is the company that is in charge of this database. These
operators run a NIC – a network information center – that tracks information about specific
domains. The list of registry operators can be found at icann.org.
Then there’s the domain name registrar. This is the company that you use to register a
domain name. There can be many registrars for each top-level domain and each registrar can
handle registrations for multiple top-level domains. The registrars consult and update the
master database that’s managed at the Registry Operator’s NIC. The database of domain
name registrars can be found at iana.org.
Currently, 2,661 registrars provide registration services for various domains. Of these, 1,202
are registrars for DropCatch.com, which is a collector of expiring domains. DropCatch has so
many registrars because the domain name registries allow each registrar to contact them
only at a limited frequency. Spreading its queries across many registrars allows DropCatch
to check registries essentially constantly and pick up domain names that have just expired.
The registrar you choose becomes the designated registrar for your domain. It’s the company
you go through for any changes since you cannot contact the registry directly. The registry
operator keeps the central registry database for the top-level domain. Only the designated
registrar, the company you registered your domain name with, can make changes for that
domain name unless you invoke a domain transfer to another registrar.
For example, the company Namecheap is the designated registrar for the domain
poopybrain.com and Verisign is the registry operator for the .com top-level domain. This
means that Namecheap sends information about poopybrain.com to Verisign.
We need a way to be able to resolve human-friendly domain names into IP addresses that
software can use to send and receive data.
Original solution
The original solution, as we saw, was to download the file containing the list of all computer
names on the Internet along with their corresponding addresses onto your own system.
Then, local software on your system can search for a name and find the address.
This was the system in place throughout the 1970s and 1980s. The file would be downloaded
via FTP from the Network Information Center (NIC) at the Stanford Research Institute (SRI).
Of course, this solution did not scale to millions of hosts on the Internet. Not only would the
file get big but there’s also a lot of churn in the data. Hosts are constantly being added and
deleted and many addresses are frequently changing.
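The flat-file lookup described above can be sketched in a few lines of Python. This is only an illustration: the file layout (an address followed by a name and optional aliases), the addresses, and the function names are assumptions made for the example, not the actual hosts.txt format.

```python
# Hypothetical excerpt in a simplified "address name [aliases...]" layout.
# The addresses are made up and the real hosts.txt format was different.
HOSTS_TEXT = """\
# address      name      aliases
10.0.0.73      SRI-NIC   NIC
10.2.0.78      UCBVAX
"""

def parse_hosts(text):
    """Build a case-insensitive name -> address table from the file text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table[name.upper()] = address
    return table

def lookup(table, name):
    """Return the address for a name, or None if the host is not listed."""
    return table.get(name.upper())

table = parse_hosts(HOSTS_TEXT)
print(lookup(table, "ucbvax"))
```

Every host needs the complete table before it can look up anything, which is the scaling problem: each added or changed host means every other host has to download the file again.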
DNS provides…
DNS servers answer queries for various types of information about domain names. Some of
the data they provide includes:
Addresses
Perhaps most importantly, they give us an IP address that corresponds to a
name.
Aliases
They can also provide aliases. These are called canonical name records, where
you specify that one name really refers to another name.
Name servers
They identify name servers. These are other DNS servers that tell you where to
go for more information about that domain.
Mail servers
They give you names of mail servers for that domain.
Text data
They can provide arbitrary other data in text records.
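As a rough sketch, the kinds of data listed above can be modeled as a table keyed by name and record type. The type codes A, CNAME, NS, MX, and TXT are the standard DNS names for address, alias, name server, mail server, and text records. The zone contents, the query function, and the alias limit are invented for illustration.

```python
# Toy zone data: (name, record type) -> list of values.
# All names and addresses are made up (example.com, 192.0.2.0/24).
RECORDS = {
    ("example.com", "A"): ["192.0.2.10"],
    ("www.example.com", "CNAME"): ["example.com"],
    ("example.com", "NS"): ["ns1.example.com", "ns2.example.com"],
    ("example.com", "MX"): ["mail.example.com"],
    ("example.com", "TXT"): ["hello from example.com"],
}

def query(name, rtype, max_aliases=8):
    """Return the values for (name, rtype), following CNAME aliases if needed."""
    for _ in range(max_aliases):
        values = RECORDS.get((name, rtype))
        if values is not None:
            return values
        alias = RECORDS.get((name, "CNAME"))
        if alias is None:
            return []            # no such record and no alias to follow
        name = alias[0]          # retry the query under the canonical name
    return []

print(query("www.example.com", "A"))   # resolved through the alias
print(query("example.com", "MX"))
```

The alias limit is a design choice for the sketch: it keeps a misconfigured loop of aliases from running forever.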
DNS servers enable load distribution because you can have lots of name servers that can
handle queries for the same domain. DNS servers cache previous lookups to return
responses faster the next time someone looks up the same domain name.
They can also provide a list of IP addresses for a given domain name. This allows the client
to contact any one of several IP addresses to find available servers or to do load balancing.
Some DNS servers shuffle that list of IP addresses for successive queries so that different
clients will likely choose different addresses even if they use a simple approach such as
choosing the first address.
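One simple way to vary the order of the returned addresses is to rotate the list by one position on each query. The sketch below assumes that approach. Real servers may shuffle randomly or use other policies, and the class name and addresses are made up for the example.

```python
from collections import deque

class RoundRobinAnswers:
    """Rotate a name's address list by one position on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        result = list(self._addresses)
        self._addresses.rotate(-1)   # the next query starts with the next address
        return result

pool = RoundRobinAnswers(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

# Four clients that each naively pick the first address in the reply.
first_choices = [pool.answer()[0] for _ in range(4)]
print(first_choices)
```

Each client still receives the full list, so it can fall back to another address if its first choice is unreachable.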
Each top-level DNS server knows about the DNS servers for each domain immediately
beneath it: the edu DNS servers will know about the DNS servers
for rutgers.edu, columbia.edu, nyu.edu, and so on.
Descending deeper into the hierarchy, DNS servers are responsible for names within
individual organizations.
Authoritative servers
DNS has a concept of zones and authoritative servers. A zone is just a group of machines
under a node in the domain tree that’s managed by one entity. For instance, rutgers.edu is a
zone.
An authoritative name server is the DNS server that is configured for that zone rather than
some other DNS server that might have cached information about that zone.
Suppose you want to contact a system at Rutgers. You need its address. That’s handled by a
DNS server that Rutgers administers. How do we find it?
The domain registry helps us here. When you register a domain with a domain registrar, you
provide it with the addresses of DNS servers that can answer queries about the domain. The
domain registrar stores this information at the domain registry.
We know that the information about some computer in Rutgers is sitting in a DNS server
that Rutgers administers. That doesn’t help us if we don’t know how to get to that DNS
server. To find the server we need, we can start at the root of the DNS hierarchy.
Root name servers can tell you the addresses of DNS servers responsible for all the top-level
domains. If you ask any root DNS server about a particular name, it will
provide the addresses of the DNS servers that are responsible for that name’s top-level domain.
The root servers have names like A.ROOT-SERVERS.NET, B.ROOT-SERVERS.NET, and so on.
...
DNS Query types
There are two ways queries are done via DNS: iterative resolution and recursive resolution.
Iterative resolution
With iterative resolution, a DNS server returns either an answer or a referral to another DNS
server.
A referral is a message that tells you about a DNS server at a lower level in the domain
hierarchy. The DNS client must process these referrals by submitting queries to those
servers.
The advantage of iterative resolution is that each component is stateless. It either has an
answer, provides a referral, or it fails the query.
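The referral-following loop can be sketched with a toy in-memory hierarchy. Everything here is an assumption made for illustration: the server labels, the address, and the ask and resolve_iteratively functions stand in for real network queries.

```python
# Toy in-memory "servers". Every name and address is made up for illustration.
SERVERS = {
    "root": {"referrals": {"edu": "edu-server"}, "answers": {}},
    "edu-server": {"referrals": {"rutgers.edu": "rutgers-server"}, "answers": {}},
    "rutgers-server": {"referrals": {}, "answers": {"www.rutgers.edu": "192.0.2.80"}},
}

def ask(server, name):
    """One stateless query: ('answer', addr), ('referral', server), or ('fail', None)."""
    data = SERVERS[server]
    if name in data["answers"]:
        return ("answer", data["answers"][name])
    for suffix, next_server in data["referrals"].items():
        if name == suffix or name.endswith("." + suffix):
            return ("referral", next_server)
    return ("fail", None)

def resolve_iteratively(name, start="root", max_steps=10):
    """Client-side loop: follow referrals until an answer or a failure."""
    server, path = start, []
    for _ in range(max_steps):
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        if kind == "fail":
            return None, path
        server = value           # a referral: ask the next server down
    return None, path

print(resolve_iteratively("www.rutgers.edu"))
```

Note that ask keeps nothing between calls. All of the state, including the list of servers visited, lives in the client's loop.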
Recursive resolution
Recursive name resolution isn’t a great name because we’re not really using recursion.
Recursive resolution means that a name server is willing to take on the responsibility of fully
resolving the name so the client doesn’t have to deal with referrals. Basically, it does a
sequence of iterative resolutions until it finds a name server that gives it the answer or it
gives up if it’s unable to find one.
The DNS server never sends back referrals to the client that made the request. Instead, it will
query all the needed DNS servers to find the domain name, handle the referrals itself, and
then return either the answer or a failure to the client that made the query.
The good part about recursive resolution is that the client doesn’t need to deal with referrals
and DNS servers can cache all the intermediate results they discovered to make query
resolution quicker in the future.
While recursive resolution makes life easier for the process that is making the request, the
disadvantage of this approach is that the name server has more work to do. It may have to
issue multiple queries and process responses to resolve the domain name, maintaining the
context of the query until the response is sent.
Top-level DNS servers handle only iterative queries. They want to remain stateless,
handle simple local lookups, and be able to support a heavy query volume with minimal
effort.
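A recursive resolver can be sketched as a server that runs the iterative steps on the client's behalf and caches what it learns along the way. The delegation data, addresses, and class below are hypothetical, and the sketch ignores details such as how long cached entries may be kept.

```python
# Toy delegation data; all names and addresses are invented for illustration.
AUTHORITY = {
    "root": {"edu": "edu-server"},
    "edu-server": {"rutgers.edu": "rutgers-server"},
}
ADDRESSES = {"rutgers-server": {"www.rutgers.edu": "192.0.2.80",
                                "mail.rutgers.edu": "192.0.2.25"}}

class RecursiveResolver:
    def __init__(self):
        self.cache = {}          # final answers: name -> address
        self.zone_cache = {}     # intermediate results: zone -> server
        self.queries_sent = 0    # queries issued to other servers

    def _closest_known_server(self, name):
        """Start from the deepest cached zone for this name, else the root."""
        best = ("", "root")
        for zone, server in self.zone_cache.items():
            if name.endswith("." + zone) and len(zone) > len(best[0]):
                best = (zone, server)
        return best[1]

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]
        server = self._closest_known_server(name)
        while True:
            self.queries_sent += 1
            if name in ADDRESSES.get(server, {}):
                self.cache[name] = ADDRESSES[server][name]
                return self.cache[name]
            referral = next(((zone, nxt)
                             for zone, nxt in AUTHORITY.get(server, {}).items()
                             if name.endswith("." + zone)), None)
            if referral is None:
                return None      # the client gets a failure, never a referral
            self.zone_cache[referral[0]] = referral[1]
            server = referral[1]

resolver = RecursiveResolver()
print(resolver.resolve("www.rutgers.edu"), resolver.queries_sent)
print(resolver.resolve("mail.rutgers.edu"), resolver.queries_sent)
```

The first lookup walks the whole hierarchy. The second starts at the cached server for rutgers.edu and needs only one more query, which is the benefit of caching intermediate results.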
Resolvers in action
Most computers run a service called a DNS stub resolver. This is a mini DNS server that
stores and checks cached lookups so that the computer does not have to waste time
contacting a remote service each time it needs to find the address of google.com or any other
frequently accessed domains. Prior to issuing a remote query, the stub resolver also checks a
local hosts file (named hosts, without an extension, on Windows systems) to see if there are
any pre-configured name-to-address mappings.
If an answer cannot be found in the cache or in the hosts file, the stub resolver then contacts a
DNS server, often one provided by the ISP or a public DNS server such as Cloudflare (1.1.1.1),
Google Public DNS (8.8.8.8), Quad9 (9.9.9.9), OpenDNS (208.67.222.222) or one of several
other free DNS services.
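The lookup order of a stub resolver can be sketched as follows. This is a simplified model, not how any particular operating system implements it. The class, the order in which the cache and hosts entries are checked, and the fake_remote stand-in for a real network query are all assumptions, and cache expiration is omitted.

```python
class StubResolver:
    """Check the local cache, then the hosts entries, then a remote DNS server."""

    def __init__(self, hosts_entries, remote_lookup):
        self.cache = {}
        self.hosts = dict(hosts_entries)     # pre-configured name -> address
        self.remote_lookup = remote_lookup   # callable(name) -> address or None

    def resolve(self, name):
        name = name.lower().rstrip(".")
        if name in self.cache:
            return self.cache[name], "cache"
        if name in self.hosts:
            return self.hosts[name], "hosts"
        address = self.remote_lookup(name)
        if address is not None:
            self.cache[name] = address       # remember it for next time
        return address, "remote"

# Stand-in for a query to a remote DNS server; a real stub resolver
# would send a network request here. The name and address are made up.
def fake_remote(name):
    return {"example.com": "192.0.2.44"}.get(name)

stub = StubResolver({"printer.local": "192.0.2.9"}, fake_remote)
print(stub.resolve("example.com"))    # answered by the remote server
print(stub.resolve("example.com"))    # answered from the cache
print(stub.resolve("printer.local"))  # answered from the hosts entries
```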