### 3 Introduction To Berkeley Sockets

#### 3.1 Basic Concepts

##### 3.1.1 Binary versus Text-Oriented Protocols

Before they can exchange information across the network, hosts have a fundamental choice to make. They can exchange data either in binary form or as human-readable text. The choice has far-reaching ramifications.

To understand this, consider exchanging the number 1984. To exchange it as text, one host sends the other the string 1984, which, in the common ASCII character set, corresponds to the four hexadecimal bytes 0x31 0x39 0x38 0x34. These four bytes will be transferred in order across the network, and (provided the other host also speaks ASCII) will appear at the other end as "1984".

However, 1984 can also be treated as a number, in which case it can fit into the two-byte integer represented in hexadecimal as 0x7C0. If this number is already stored in the local host as a number, it seems sensible to transfer it across the network in its native two-byte form rather than convert it into its four-byte text representation, transfer it, and convert it back into a two-byte number at the other end. Not only does this save some computation, but it uses only half as much network capacity.

Unfortunately, there's a hitch. Different computer architectures have different ways of storing integers and floating point numbers. Some machines use two-byte integers, others four-byte integers, and still others use eight-byte integers. This is called word size. Furthermore, computer architectures have two different conventions for storing integers in memory. In some systems, called big-endian architectures, the most significant part of the integer is stored in the first byte of a two-byte integer. On such systems, reading from low to high, 1984 is represented in memory as the two bytes:

0x07    0xC0
low  -> high


On little-endian architectures, this convention is reversed, and 1984 is stored in the opposite orientation:

0xC0    0x07
low  -> high


These architectures are a matter of convention, and neither has a significant advantage over the other. The problem comes when transferring such data across the network, because this byte pair has to be transferred serially as two bytes. Data in memory is sent across the network from low to high, so for big-endian machines the number 1984 will be transferred as 0x07 0xC0, while for little-endian machines the numbers will be sent in the reverse order. As long as the machine at the other end has the same native word size and byte order, these bytes will be correctly interpreted as 1984 when they arrive. However, if the recipient uses a different byte order, then the two bytes will be interpreted in the wrong order, yielding hexadecimal 0xC007, or decimal 49,159. Even worse, if the recipient interprets these bytes as the top half of a four-byte integer, it will end up as 0xC0070000, or 3,221,684,224. Someone's anniversary party is going to be very late.

Because of the potential for such binary chaos, text-based protocols are the norm on the Internet. All the common protocols convert numeric information into text prior to transferring them, even though this can result in more data being transferred across the net. Some protocols even convert data that doesn't have a sensible text representation, such as audio files, into a form that uses the ASCII character set, because this is generally easier to work with. By the same token, a great many protocols are line-oriented, meaning that they accept commands and transmit data in the form of discrete lines, each terminated by a commonly agreed-upon newline sequence.

A few protocols, however, are binary. Examples include Sun's Remote Procedure Call (RPC) system, and the Napster peer-to-peer file exchange protocol. Such protocols have to be exceptionally careful to represent binary data in a common format. For integer numbers, there is a commonly recognized network format. In network format, a "short" integer is represented in two big-endian bytes, while a "long" integer is represented with four big-endian bytes. Perl's pack() and unpack () functions provide the ability to convert numbers into network format and back again.

Floating point numbers and more complicated things like data structures have no commonly accepted network representation. When exchanging binary data, each protocol has to work out its own way of representing such data in a platform-neutral fashion.

##### 3.1.2 Berkeley Sockets

Berkeley sockets are part of an application programming interface (API) that specifies the data structures and function calls that interact with the operating system's network subsystem. Berkeley sockets are part of an API, not a specific protocol, which defines how the programmer interacts with an idealized network.

#### 3.2 The Anatomy of a Socket

A socket is an endpoint for communications, a portal to the outside world that we can use to send outgoing messages to other processes, and to receive incoming traffic from processes interested in sending messages to us.

To create a socket, we need to provide the system with a minimum of three pieces of information.

##### 3.2.1 The Socket's Domain

The domain defines the family of networking protocols and addressing schemes that the socket will support. The domain is selected from a small number of integer constants defined by the operating system and exported by Perl's Socket module. There are only two common domains

Table 3.1. Common Socket Domains
Constant Description
AF_INET The Internet protocols
AF_UNIX Networking within a single host

In addition to these domains, there are many others including AF_APPLETALKAF_IPX, and AF_X25, each corresponding to a particular addressing scheme. AF_INET6, corresponding to the long addresses of TCP/IP version 6, will become important in the future, but is not yet supported by Perl. The AF_ prefix stands for "address family." In addition, there is a series of "protocol family" constants starting with the PF_ prefix.

##### 3.2.2 The Socket's Type

The socket type identifies the basic properties of socket communications.

Table 3.2. Constants Exported by Socket
Constant Description
SOCK_STREAM A continuous stream of data
SOCK_DGRAM Individual packets of data

Perl fully supports the SOCK_STREAM and SOCK_DGRAM socket types. SOCK_RAW is supported through an add-on module named Net::Raw.

##### 3.2.3 The Socket's Protocol

Like the domain and socket type, the protocol is a small integer. However, the protocol numbers are not available as constants, but instead must be looked up at run time using the Perl getprotobyname() function.

Table 3.3. Some Socket Protocols
Protocol Description
tcp Transmission Control Protocol for stream sockets
udp User Datagram Protocol for datagram sockets
icmp Internet Control Message Protocol
raw Creates IP packets manually

The TCP and UDP protocols are supported directly by the Perl sockets API. You can get access to the ICMP and raw protocols via the Net::ICMP and Net::Raw third-party modules.

The allowed combinations of socket domain, type, and protocol are few. SOCK_STREAM goes with TCP, and SOCK_DGRAM goes with UDP. Also notice that the AF_UNIX address family doesn't use a named protocol, but a pseudoprotocol named PF_UNSPEC (for "unspecified").

Table 3.4. Allowed Combinations of Socket Type and Protocol in the INET and UNIX Domains

Domain Type Protocol
AF_INET SOCK_STREAM tcp
AF_INET SOCK_DGRAM udp
AF_UNIX SOCK_STREAM PF_UNSPEC
AF_UNIX SOCK_DGRAM PF_UNSPEC

##### 3.2.4 Datagram Sockets

Datagram-type sockets provide for the transmission of connectionless, unreliable, unsequenced messages. The UDP is the chief datagram-style protocol used by the Internet protocol family.

As the diagram in Figure 3.2 shows, datagram services resemble the postal system. Like a letter or a telegram, each datagram in the system carries its destination address, its return address, and a certain amount of data. The Internet protocols will make the best effort to get the datagram delivered to its destination.

Figure 3.2. Datagram sockets provide connectionless, unreliable, unsequenced transmission of message

There is no long-term relationship between the sending socket and the recipient socket: A client can send a datagram off to one server, then immediately turn around and send a datagram to another server using the same socket. But the connectionless nature of UDP comes at a price. Like certain countries' postal systems, it is very possible for a datagram to get "lost in the mail." A client cannot know whether a server has received its message until it receives an acknowledgment in reply. Even then, it can't know for sure that a message was lost, because the server might have received the original message and the acknowledgment got lost!

Datagrams are neither synchronized nor flow controlled. If you send a set of datagrams out in a particular order, they might not arrive in that order. Because of the vagaries of the Internet, the first datagram may go by one route, and the second one may take a different path. If the second route is faster than the first one, the two datagrams may arrive in the opposite order from which they were sent. It is also possible for a datagram to get duplicated in transit, resulting in the same message being received twice.

Because of the connectionless nature of datagrams, there is no flow control between the sender and the recipient. If the sender transmits datagrams faster than the recipient can process them, the recipient has no way to signal the sender to slow down, and will eventually start to discard packets.

Although a datagram's delivery is not reliable, its contents are. Modern implementations of UDP provide each datagram with a checksum that ensures that its data portion is not corrupted in transit.

##### 3.2.5 Stream Sockets

The other major paradigm is stream sockets, implemented in the Internet domain as the TCP protocol. Stream sockets provide sequenced, reliable bidirectional communications via byte-oriented streams. As depicted in Figure 3.3, stream sockets resemble a telephone conversation. Clients connect to servers using their address, the two exchange data for a period of time, and then one of the pair breaks off the connection.

Figure 3.3. Stream sockets provide sequenced, reliable, bidirectional communications

Reading and writing to stream sockets is a lot like reading and writing to a file. There are no arbitrary size limits or record boundaries, although you can impose a record-oriented structure on the stream if you like. Because stream sockets are sequenced and reliable, you can write a series of bytes into a socket secure in the knowledge that they will emerge at the other end in the correct order, provided that they emerge at all ("reliable" does not mean immune to network errors).

TCP also implements flow control. Unlike UDP, where the danger of filling the data-receiving buffer is very real, TCP automatically signals the sending host to suspend transmission temporarily when the reading host is falling behind, and to resume sending data when the reading host is again ready. This flow control happens behind the scenes and is ordinarily invisible.

Although it looks and acts like a continuous byte stream, the TCP protocol is actually implemented on top of a datagram-style service, in this case the low-level IP protocol. IP packets are just as unreliable as UDP datagrams, so behind the scenes TCP is responsible for keeping track of packet sequence numbers, acknowledging received packets, and retransmitting lost packets.

##### 3.2.6 Datagram versus Stream Sockets

With all its reliability problems, you might wonder why anyone uses UDP. The answer is that most client/server programs on the Internet use TCP stream sockets instead. In most cases, TCP is the right solution for you, too.

There are some circumstances, however, in which UDP might be a better choice. For example, time servers use UDP datagrams to transmit the time of day to clients who use the information for clock synchronization. If a datagram disappears in transit, it's neither necessary nor desirable to retransmit it because by the time it arrives it will no longer be relevant.

UDP is also preferred when the interaction between one host and the other is very short. The length of time to set up and take down a TCP connection is about eightfold greater than the exchange of a single byte of data via UDP (for details, see [Stevens 1996]). If relatively small amounts of data are being exchanged, the TCP setup time will dominate performance. Even after a TCP connection is established, each transmitted byte consumes more bandwidth than UDP because of the additional overhead for ensuring reliability.

Another common scenario occurs when a host must send the same data to many places; for example, it wants to transmit a video stream to multiple viewers. The overhead to set up and manage a large number of TCP connections can quickly exhaust operating system resources, because a different socket must be used for each connection. In contrast, sending a series of UDP datagrams is much more sparing of resources. The same socket can be reused to send datagrams to many hosts.

Whereas TCP is always a one-to-one connection, UDP also allows one-to-many and many-to-many transmissions. At one end of the spectrum, you can address a UDP datagram to the "broadcast address," broadcasting a message to all listening hosts on the local area network. At the other end of the spectrum, you can target a message to a predefined group of hosts using the "multicast" facility of modern IP implementations. These advanced features are covered in Chapters 20 and 21.

The Internet's DNS is a common example of a UDP-based service. It is responsible for translating hostnames into IP addresses, and vice versa, using a loose-knit network of DNS servers. If a client does not get a response from a DNS server, it just retransmits its request. The overhead of an occasional lost datagram outweighs the overhead of setting up a new TCP connection for each request. Other common examples of UDP services include Sun's Network File System (NFS) and the Trivial File Transfer Protocol (TFTP). The latter is used by diskless workstations during boot in order to load their operating system over the network. UDP was originally chosen for this purpose because its implementation is relatively small. Therefore, UDP fit more easily into the limited ROM space available to workstations at the time the protocol was designed.

For the UNIX domain, which can be used only between two processes on the same host machine, addresses are simply paths on the host's filesystem, such as /usr/tmp/log. For the Internet domain, each socket address has three parts: the IP address, the port, and the protocol.

($a,$b,$c,$d)      = split //./, '18.157.0.125';
$packed_ip_address = pack 'C4',$a,$b,$c,$d; ($a,$b,$c,$d) = unpack 'C4',$packed_ip_address;