Appendix 1: networking basics

I suppose you could say that the core topic of the book has now been addressed and from here on in we're just wrapping up some loose ends. Perhaps "loose ends" isn't quite the right phrase. Let's try "bonus material" instead.
I know. This book was supposed to be about learning new technology and not the technology itself. But I thought I'd throw in some basic backgrounders for three of the biggest "big tent" technologies of them all: networking, Linux, and Amazon Web Services. No matter what tools you end up learning and working with, it's hard to imagine going through a full IT or dev career without at some point coming face to face with the members of that club.
So we begin with networking, the glue that holds everything together. And I mean everything. Forget the IT world: if networks failed, our banking, transportation, health, and industrial systems would disappear along with them. If you want to build your corner of the modern world, you'd better understand how it will connect to everything else.
The contents of this chapter, made available through the kind permission of Manning Publications, come from chapter 14 of my Linux in Action and chapter 5 from Learn Amazon Web Services in a Month of Lunches.

Understanding TCP/IP addressing

A network's most basic unit is the humble Internet Protocol (IP) address, at least one of which must be assigned to every connected device. Each address must be unique throughout the entire network, otherwise message routing would descend into chaos.
For decades, the standard address format followed the IPv4 protocol: each address is made up of four 8-bit octets, for a total of 32 bits (don't worry if you don't understand how to count in binary). Each octet must be a number between 0 and 255. Here's a typical (fake) example:
154.39.230.205
The maximum theoretical number of addresses that can be drawn from the IPv4 pool is just over 4 billion (256^4). Once upon a time, that seemed like a lot. But as the internet grew far beyond anyone's expectations, there clearly weren't going to be enough unique addresses in the IPv4 pool for all the countless devices seeking to connect.
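If you'd like to see those 32 bits for yourself, here's a minimal sketch using Python's standard ipaddress module. It takes the fake address from above, splits it into its four octets, and prints each one as eight binary digits. The variable names are mine, chosen only for illustration.

```python
import ipaddress

# The sample (fake) address from the text, parsed by the standard library.
addr = ipaddress.IPv4Address("154.39.230.205")

# Split the dotted form into its four octets (each between 0 and 255).
octets = [int(part) for part in str(addr).split(".")]

# Show each octet as 8 binary digits: 4 octets x 8 bits = 32 bits.
binary = ".".join(f"{octet:08b}" for octet in octets)

# Size of the whole IPv4 pool: 256 possible values in each of 4 octets.
pool_size = 256 ** 4

print(octets)
print(binary)
print(pool_size)
```

The last number printed is the "just over 4 billion" figure: 256^4 is the same thing as 2^32.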
Four billion possible addresses sounds like a big number until you consider that there are currently more than 1 billion Android smart phones in use—that's in addition to all the millions of servers, routers, PCs, and laptops, not to mention Apple phones. There's a good chance your car, refrigerator, and home-security cameras also have their own network-accessible addresses, so something obviously had to give.
Two solutions to the impending collapse of the internet addressing system (and the end of life as we know it) were proposed: IPv6, which is an entirely new addressing protocol, and Network Address Translation (NAT). IPv6 provides a much larger pool of addresses but, since it's still not all that widely deployed, I'll focus on NAT.

NAT addressing

The organizing principle behind NAT is both simple and brilliant: rather than assign a unique, network-readable address to every device in your home or business, why not have all of them share the single public address that's used by your router?
But how will traffic flow to and from your local devices? Through the use of private addresses. And if you want to divide network resources into multiple subgroups, how can everything be effectively managed? Through network segmentation.
Here's how it works. When a browser on one of the laptops connected to your home WiFi visits a site, it does so using the public IP address that's been assigned to the DSL modem/router provided by your ISP. Any other devices connecting through the same WiFi network use that same address for all their browsing activity (see figure 7.1).
A typical NAT configuration, showing how multiple local devices - each with its own private address - can all be represented by a single public IP address


In most cases, the router uses the Dynamic Host Configuration Protocol (DHCP) to assign unique private (NAT) addresses to each local device - but they're unique only in the local environment. That way, all local devices can enjoy full, reliable communication with their local peers. This works just as well for large enterprises, many of which use tens of thousands of NAT IP addresses, all behind a single public IP.
The NAT protocol sets aside three IPv4 address ranges that may only be used for private addressing:
10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
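As a rough check on the size of that private pool, here's a small sketch that assumes the three ranges are the ones reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16). The slash suffixes are CIDR notation, which is explained a little later in this chapter. The variable names are hypothetical.

```python
import ipaddress

# Assumption: the three private ranges are those reserved by RFC 1918.
private_ranges = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Print the first and last address of each range, plus its size.
for net in private_ranges:
    print(net[0], "to", net[-1], "-", net.num_addresses, "addresses")

# Add up the three ranges to get the size of the whole private pool.
total_private = sum(net.num_addresses for net in private_ranges)
print(total_private)
```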
Local network managers are free to use any and all of those addresses (there are more than 17 million of them) any way they like. But addresses are usually organized into smaller network (or subnet) blocks whose host network is identified by the octets to the left of the address, leaving octets to the right available for assigning to individual devices.
For example, you might choose to create a subnet on 192.168.1, which would mean all the addresses in this subnet would start with 192.168.1 (the network portion of the address) and end with a unique, single-octet device address between 2 and 254. One PC or laptop on that subnet might therefore get the address 192.168.1.4, and another could get 192.168.1.48.
Following networking conventions, DHCP servers generally don't assign the numbers 0, 1, and 255 to the final octet of a network device's IP address.
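To make that convention concrete, here's a toy sketch that lists the final-octet values left over for devices on the 192.168.1 subnet once 0, 1, and 255 are skipped. It is not how a real DHCP server is implemented; it only illustrates the arithmetic, and every name in it is made up for the example.

```python
# The network portion from the example in the text.
network_part = "192.168.1"

# Final-octet values that, by convention, aren't handed out to devices.
reserved_by_convention = {0, 1, 255}

# Every remaining value of the fourth octet is available for a device.
assignable = [
    f"{network_part}.{host}"
    for host in range(256)
    if host not in reserved_by_convention
]

print(assignable[0], "through", assignable[-1])
print(len(assignable), "device addresses available")
```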
Following through with that example, you might subsequently want to add a parallel - but separate - network subnet using 192.168.2. In this case, not only are 192.168.1.4 and 192.168.2.4 two separate addresses, available to be assigned to two distinct devices, but - because they're on separate networks - the two might not even have access to each other (see figure 7.2).
Devices attached to two separate NAT subnets in the 192.168.x network range


Because it's critically important to make sure systems know what kind of subnet a network address is on, we need a standard notation that can accurately communicate which octets are part of the network and which are available to be used for devices. There are two commonly used standards: Classless Inter-Domain Routing (CIDR) notation and netmask. Using CIDR, the first network in the previous example would be represented as 192.168.1.0/24: the /24 tells you that the first three octets (8*3=24) make up the network portion, leaving only the fourth octet for device addresses. The second subnet, in CIDR, would be described as 192.168.2.0/24.
These same two networks could also be described through a netmask of 255.255.255.0. That means all 8 bits of each of the first three octets are used by the network, but none of the fourth.
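Here's a short sketch, again using Python's standard ipaddress module, that describes the two example subnets in CIDR notation, prints the matching netmask, and checks which subnet a given device address belongs to. The variable names (subnet_1, laptop, and so on) are only illustrative.

```python
import ipaddress

# The two example subnets, written in CIDR notation.
subnet_1 = ipaddress.ip_network("192.168.1.0/24")
subnet_2 = ipaddress.ip_network("192.168.2.0/24")

# Two device addresses that share a final octet but sit on different subnets.
laptop = ipaddress.ip_address("192.168.1.4")
other_device = ipaddress.ip_address("192.168.2.4")

# The same network expressed both ways: prefix length and netmask.
print(subnet_1.prefixlen, subnet_1.netmask)

# Membership tests: each device belongs to exactly one of the subnets.
print(laptop in subnet_1, laptop in subnet_2)
print(other_device in subnet_1, other_device in subnet_2)

# The two subnets share no addresses at all.
print(subnet_1.overlaps(subnet_2))
```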
You don't have to break up the address blocks exactly this way. If you knew you weren't likely to ever require many network subnets in your domain, but you anticipated the need to connect more than 255 devices, you could choose to designate only the first two octets (192.168) as network addresses, leaving everything between 192.168.0.0 and 192.168.255.255 for devices. In CIDR notation, this would be represented as 192.168.0.0/16 and have a netmask of 255.255.0.0.
Nor do your network portions need to use complete (8-bit) octets. Part of the range available in a particular octet can be dedicated to addresses used for entire networks (such as 192.168.14.x), with the remainder left for devices (or hosts, as they're more commonly called). This way, you could set aside all the bits of the subnet's first two octets (192 and 168), plus the first four bits of the third octet, as the network portion. This could be represented as 192.168.0.0/20 or with the netmask 255.255.240.0.
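To see what those wider blocks actually cover, here's a sketch that asks Python's ipaddress module about the /16 and /20 examples. The names wide and partial are mine, and the two test addresses are arbitrary picks for illustration.

```python
import ipaddress

# The two wider blocks described in the text.
wide = ipaddress.ip_network("192.168.0.0/16")
partial = ipaddress.ip_network("192.168.0.0/20")

# For each block: netmask, first and last address, and total size.
for net in (wide, partial):
    print(net, net.netmask, net[0], "to", net[-1], net.num_addresses)

# With /20, only part of the third octet's range belongs to the block.
inside = ipaddress.ip_address("192.168.14.7")
outside = ipaddress.ip_address("192.168.16.1")
print(inside in partial, outside in partial)
```

With a /20 prefix, 12 of the 32 bits are left over for hosts, which is why the third octet only runs from 0 to 15 inside this block.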
Where did I get these notation numbers? Most experienced admins use their binary counting skills to work it out for themselves. But for a short chapter like this, that's a bit out of scope - and unnecessary for the normal work you're likely to encounter. If you do need the numbers, there are many online subnet calculators that will do the calculation for you.
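For the curious, the binary arithmetic is short enough to sketch in a few lines of Python. The helper below, prefix_to_netmask, is a hypothetical function of my own (it isn't part of any library): it writes the prefix length as that many 1 bits followed by 0 bits, then reads the result back one octet at a time.

```python
def prefix_to_netmask(prefix_len):
    """Turn a CIDR prefix length (0-32) into a dotted netmask string."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    # Start with 32 one-bits, then shift zeros in from the right so that
    # only the leftmost prefix_len bits remain set.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    # Peel off the four octets, most significant first.
    octets = [(mask >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


for prefix in (24, 16, 20):
    print(f"/{prefix}", prefix_to_netmask(prefix))
```

The /20 case is the interesting one: the third octet keeps only its first four bits, and 11110000 in binary is 240.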

Understanding the Domain Name System (DNS)

As your website grows and more people discover it, I’m sure you won’t be satisfied with having to identify it by its IP address. A nice, easy-to-remember name like, say, best-site-ever.com will work much better. Let’s learn how that works.
Under all those bright, cheerful web pages with external links displayed as softly chiseled 3D boxes and identified by catchy, easy-to-remember names, it's all about numbers. There's no real place called google.com or wikipedia.org; rather, they're 172.217.3.142 and 208.80.154.224. The software that does all the work connecting us to the websites we know and love recognizes only numeric IP addresses.
The tool that translates back and forth between text-mad humans and our more digitally oriented machines is called the domain name system (DNS). Domain is a word often used to describe a distinct group of networked resources - in particular, resources identified by a unique name like, oh, I don’t know, bootstrap-it.com. As shown in figure 7.3, whenever you enter a text address in your browser, the services of a DNS server are invariably - and invisibly - sought.
DNS address query for stuff.com, and the reply containing a (fictional) IP address


The first stop is usually a local index of names and their associated IP addresses, stored in a file that's automatically created by the OS on your computer. If that local index has no answer for this particular translation question, the request is forwarded to a designated public DNS server that maintains a much more complete index and can connect you to the site you’re after. Well-known public DNS servers include those provided by Google - which uses the deliciously simple 8.8.8.8 and 8.8.4.4 addresses - and OpenDNS.
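You can watch a name-to-address translation happen from a program, too. The sketch below uses Python's standard socket module to ask the operating system's resolver for the IPv4 addresses behind a name. The helper resolve_ipv4 is a hypothetical name of my own, and I've used localhost as the example because it's normally answered from the local index without any network traffic.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the system resolver finds for a name."""
    # getaddrinfo consults the OS resolver, which typically checks local
    # sources first and then the configured DNS server.
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result ends with a (address, port) pair; keep the unique addresses.
    return sorted({sockaddr[0] for *_, sockaddr in results})


print(resolve_ipv4("localhost"))
```

Passing a public name instead of localhost should return whatever addresses your configured DNS server reports at that moment, so the output will vary from one network and one day to the next.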
Until something breaks, you normally won't spend a lot of time thinking about DNS servers - unless, of course, you want your customers to be able to access your website by its plain-text name. For that to happen, you'll have to reserve the name you'd like with a domain name registrar. The job of a registrar is to update the indexes used by the big DNS servers so that translation requests from anyone on the internet can be quickly satisfied.
Once you’re over that critical hurdle, you can go back to ignoring DNS.