🌐
Computer Networks: A System Approach

Chapter 1: Foundation

1.2 Requirements

To establish a computer network for everyone, we should consider these stakeholders:
  1. An application programmer need the guarantee that each message will be delivered withour error
  1. A network operator want a system that is easy to administer and manage.
  1. A network designer cares about the efficiency and performances.
 
Scalable Connectivity
We call a physical medium a link, and we often refer to the computers as nodes. There are special nodes called switches which forwards data received on one link out on another.
💡
What is the key difference between circuit network and packet network?
Two most common switched networks are circuit network (like telephone) and packet network. Packet networks send blocks of data(packet), and use a strategy called store-and-forward. Switchers will store the packet in their memory then send out at proper time.
We can connect different networks together to constitute an internet, and this is the way how internet became scalable.
A node that is connected to two or more networks is commonly called a router or gateway, and it plays much the same role as a switch—it forwards messages from one network to another.
But connection between each nodes doesn’t mean we have provide host-host connectivity, the final requirement is that each node could recognize and say the nodes on the network it wants to communicate with, which is called addressing.
In addition to node-specific addressing, another requirement of a network is that it supports multicast and broadcast.
Cost-Effective Resource Sharing
For several nodes sharing one physical link efficiently, we use a method called statistical multiplexing. Data is transmitted from each flow on demand block by block. And each block has a limited size which referred to a packet . In this way the link will not be idle at any time and each node has the chance to send message.
Reliable Message Delivery
Reliable message delivery is one of the most important functions that a network could provide.There could be 3 general cases of failure:
  1. bit errors such as lightning strikes.
  1. A complete packet lost.
  1. A physical link is cut.
A network should provide methods to recognize failure and restore them.
Manageability
Network management has historically been a human-intensive aspect of networking, and while it is unlikely we’ll get people entirely out of the loop, it is increasingly being addressed by automation and self-healing designs.

1.3 Architecture

Layering and Protocals
Layering means abstraction, hiding the implementation details behind a well-defined interface. Layering decomposes the netowrk into more manageable components and provide a more modular design(so we can only need to modify the function at one layer and resue others).
The asbtract objects that make up the layers are called protocols. A protocol is built on low-level layers and provides two different interfaces:
  1. service interface for upper level layers to use
  1. peer interface for its counterpart on another machine to communicate
The Set of rules governing the form and content of a protocal graph is called network architecture.
Encapsulation
high-level message is uninterpreted to low-level protocol. RRP(reply-response-protocol) will send header (the control information) with message body to the peer. This process of encapsulation is then repeated at each level of the protocol graph.
Multiplexing and Demultiplexing
The header that RRP attaches to its messages contains an identifier that records the application to which the message belongs to, which is called the demultiplexing key, or demux key
nearly every protocol implements this mechanism. For example, HHP has its own demux key to determine which messages to pass up to RRP and which to pass up to MSP.
7-Layer OSI Model
Open Systems Interconnection (OSI) architecture
  1. physical layer handles the transmission of raw bits
  1. data link layer collects a stream of bits into a larger aggregate called a frame, Network Adaptors implement this level.
  1. network layer handles routing among node within a packet-switched network
  1. transport layer implements a process-to-process channel, in which the unit of data exchanged is commonly called a message.
higher layers typically run only on the end hosts and not on the intermediate switches or routers.
  1. Finally, the session layer provides a name space that is used to tie together the potentially different transport streams that are part of a single application.
  1. the presentation layer is concerned with the format of data exchanged between peers—for example, whether an integer is 16, 32, or 64 bits long
  1. Application layer protocols include things like the Hypertext Transfer Protocol (HTTP)
Internet Architecture
A simpler stack of 7-layer OSI model is oftern used like above.
The Internet Architecture has three features that are worth highlighting:
  1. The Internet architecture does not imply strict layering, the application is free to bypass the defined transport layers
  1. The whole structure is an hourglass shape— wide at the top, narrow in the middle, and wide at the bottom. This shape actually reflects the central philosophy of the architecture. That is, IP serves as the focal point for the architecture—it defines a common method for exchanging packets among a wide collection of networks.
  1. In order for a new protocal to be officially included in the architecture, there must be both a protocol specification and at least one representative implementations of the specification.
We reject kings, presidents, and voting. We believe in rough consensus and running code. (David Clark)
Why network layer(IP) is so important? Why we need this hourglass shape architecture? Of these three attributes of the Internet architecture, the hourglass design philosophy is important enough to bear repeating. The hourglass’s narrow waist represents a minimal and carefully chosen set of global capabilities that allows both higher-level applications and lower-level communication technologies to coexist, share capabilities, and evolve rapidly. The narrow-waisted model is critical to the Internet’s ability to adapt to new user demands and changing technologies.
💡
The complexity of the combination from low level to high level will explode without this narrow-waisted model. Each lower level implementation must provide interface to network layer, and each higher level application should only use IP protocol. It did reduce the complexity of the entire system and cut the relation between application layer and physical layer. Of course we can choose other layer to achieve this, but host-to-host connection is the most repeatable layer in the architecture, which is best for this purpose.

1.5 Performance

Two fundamental way to measure performances: bandwidth(throughput) and latency(delay).
As the figure above shows, bandwidth means the bits transmitted by second. Ona 10-Mnps network, it takes 0.1 mircrosecond to transmit each bit.
The second performance metric, latency, corresponds to how long it takes a message to travel from one end of a network to the other.
Delay × Bandwidth Product
The delay × bandwidth product is important to know when constructing high-performance networks because it corresponds to how many bits the sender must transmit before the first bit arrives at the receiver.
The sender can send up one RTT × bandwidth worth of data before hearing from the receiver that all is well.
For a TCP connection, it means the data size limit you can send for one time, only after you receive the acknowledge signal from receiver, you can send another data unit. Sending data less than this size means you are idle while waiting for acknowledge signal.
The variation in latency in called jitter.
For online streaming video, if they know the upper and lower bound on the latency that a packet can experience, it can delay the tima at which it starts palying back the video long enough to ensure that in thefuture it will always have a frame to dispaly when it needs it.

Continued: End-to-End Arguments in System Design

For a communication system, some types of network functionality can only be correctly implemented end-to-end like Reliability, security, etc. Because of this, end hosts can must satisfy the requirement without network’s help unless there is extra performance required.
  • Solution 1 is incomplete, Recevier has to do the check anyway if the memory is corrputed during switching or receiving
  • Solution 2 is complete, and there is no need for reliability from lower layers.
Is this still valid? What about denial of Service? What about Privacy against Intrusion?

Chapter 2: Direct Links

2.1 Technology Landscape

2.2 Encoding

The network adaptor contains a signalling component that actually encodes bits into signals at the sending node and decodes signals into bits at the receiving node
We can easily map the data value 1 onto the high signal and the data value 0 onto the low signal, which is called non-return to zero(NRZ).But this leads to 2 problems, continous 1 and continuous 0s:
  1. Hard to detect 1s and 0s if continuing 1
  1. Sychronize the clock can be hard (we use the transition from 1 to 0 or 0 to 1 to denote the clock)
We can use NRZI(non-return to zero inverted) to sove the continus 1s: make a transition from the current signal to encode a 1, and stay at the current signal to encode a 0. An alternative, called Manchester encoding, does a more explicit job of merging the clock with the signal by transmitting the exclusive OR of the NRZ-encoded data and the clock.
The problem is Manchester encoding double the rate so it is 50% efficient.
Another method is called 4B/5B, which insert an extra bit into the bit stream for every 4 bits, so as to break up long sequences of 0s or 1s. Conbined with NRZI, they can solve continuous 1s and 0s togther. Note that 4B/5B encoding results in 80% efficiency.

2.3 Framing

Recognizing exactly what set of bits constitutes a frame—that is, determining where the frame begins and ends—is the central challenge faced by the adaptor.
Byte-Oriented Protocols PPP
PPP use byted-oriented framing which use special character known as sentinel characters to indicate where frames start and end.
If the end of the frame or count field is corrupted, you can not detect the end of frame correctly.
Bit-Oriented Protocols HDLC
HDLC denotes both the beginning and the end of a frame with the distinguished bit sequence 01111110. On the sending side, any time five consecutive 1s have been transmitted from the body of the message, the sender inserts a 0 before transmitting the next bit. For any sequence has more than 6 1s continuously will be detected as error.
Both Bit-Oriented Protols and Byte-Oriented Protocols must send the count field which is dangerous if corrupted, the next method will fix this problem.(because of bit stuffing ,they can not control the size of a frame)
Clock-Based Framing SONET
No bit-stuffing is used, so that a frames’s length does not depend on the data being sent.
It is arranged as 9 rows of 90 bytes each, and the first 3 bytes of each row are overhead, with the rest being available for data that is being transmitted over the link. The first 2 bytes of the frame contain a special bit pattern, and it is these bytes that enable the receiver to determine where the frame starts.
When the special pattern turns up in the right place enough times, the receiver concludes that it is in sync and can then interpret the frame correctly.
SONET also use NRZ, the simple encoding. To ensure that there are plenty of transitions to allow the receiver to recover the clock, the payload bytes are scrambled, by calculating the exclusive OR of the data to be transimitted and by the use of a well-known bit pattern which is 127 long , has plenty of transitions form 1 to 0.
One of the reason SONET is designed, because it need to transfer to different rates easily like above.

2.4 Error Detection

Internet CheckSum Algorithm
Compared to our repetition code, this algorithm scores well for using a small number of redundant bits—only 16 for a message of any length—but it does not score extremely well for strength of error detection. For example, a pair of single-bit errors, one of which increments a word and one of which decrements another word by the same amount, will go undetected.
Cyclic Redundancy Check
This Algorithm use polynomial arithmetic to detect errors. By selecting different function , We can detext different types of errors easily. And it can be implemented by XOR gates easily.

2.5 Reliable Transmission

Stop-and-Wait
Such a stop-wait method can make sure all the messages arrive in order. But it can not make full use of the links capacity, so we use sliding window.
Sliding Window
For sender:
  1. every time receive a ACK , LAR += 1, so it will send a new frame, LFS+=1
For receiver:
  1. Any Seqnum larger than LAF(largest acceptable frame) or less than LFR(last frame received) will be discarded.
  1. Let SeqNumToAck denote the largest sequence number not yet acknowledged, such that all frames with sequence numbers less than or equal to SeqNumToAck have been received. The receiver acknowledges the receipt of SeqNumToAck, even if higher numbered packets have been received. This acknowledgment is said to be cumulative. It then sets LFR = SeqNumToAck and adjusts LAF = LFR + RWS.
Finite Sequence Numbers and Sliding Windows
We can only use a finite sequence number to denote the frames. To assure the sequence number should statisfy the following rules:
SWS < (MaxSeqNum + 1) / 2 when RWS=SWS
so as to avoid any sequence Number will not get confused with different incarnations.

2.6 Multi-Access Networks

Ethernet is the dominating Mult-access network in LAN.
transceiver, a small device directly attached to the tap, detected when the line was idle and drove the signal when the host was transmitting. It also received incoming signals. The transceiver, in turn, connected to an Ethernet adaptor, which was plugged into the host.
Multiple Ethernet segments can be joined together by repeaters or (hub), repeating the signals and passing it further.
Access Control
Each host on an Ethernet—in fact, every Ethernet host in the world—has a unique Ethernet address(64-bit MAC Address).
To summarize, an Ethernet adaptor receives all frames and accepts
  • Frames addressed to its own address(unicast)
  • Frames addressed to the broadcast address (broadcast consists of all 1s)
  • Frames addressed to a multicast address, if it has been instructed to listen to that address (the first bit set to 1)
  • All frames, if it has been placed in promiscuous mode
Transmitter Algorithm
When the adaptor has a frame to send and the line is idle, it transmits the frame immediately; there is no negotiation with the other adaptors.
When an adaptor has a frame to send and the line is busy, it waits for the line to go idle and then transmits immediately.
There is no central contol so it is possible for two adaptors to begin transmitting at the same time which is called collide.
At the moment an adaptor detects that its frame is colliding with another, it first makes sure to transmit a 32-bit jamming sequence and then stops the transmission.
In the figure above, A first send a message to B at time t, and reach B at time t + d(delay). In (c), B realize the collision and send a runt frame (short frame to tell collision). If the message is not long enough, the runt frame will arrive A after A finish sending the frame and A will not detect the collision.
To assure A could detect the collision, the frame size should be at least 512 bits(According to longest distance between A and B, the round-trip delay)
Considering that a maximally configured Ethernet is 2500 m long, and that there may be up to four repeaters between any two hosts, the round-trip delay has been determined to be 51.2 μs, which on a 10-Mbps Ethernet corresponds to 512 bits.

Chapter 3 Internetworking

3.1 Switching Basics

A switch’s primary job is to receieve incoming packets on one of its links and to transmit them on some other link. This function is sometimes referred to as either switching or forwarding.
address is the identifier used to identify the end nodes. No two nodes on a network have the same address.
How does the switch decide which output link to place each packet on? There are 3 approaches:
  • datagram or connectionless approach
  • virtual circuit or connection-oriented approach
  • Source Routing
Datagrams
The idea behind datagrams is simple: you just include the destination address in your packet so that the swtich can decide how to get to it.
For each switch, it will maintain a forwarding table which record the port it should use to reach the destination.
Virtual Circuit Switching
 
Virtual Circuit is referred to as a connection-oriented model. We can think of this as a two-stage process.
First step is called “connection setup”. The administrator will create a new virtual connection from host A to host B. And use VCI to identify this path. Each switch will maintain a VC table contains:
  1. A VCI that identifies the connection at this switch.
  1. An incoming interface on which packets for this VC arrive
  1. An outgoing interface in which packets for this VC leave
  1. A potentially different VCI that will be used for outgoing packets
The VCI in different switches can be different. We just need make sure same VCI will not conflict in the same switch.
Once the tables have been set up, the data transfer phase can proceed.
For the example above, a packet turn in with VCI 11 will be sent out with VCI 7 through interface 2.
When host A no longer wants to send data to host B, it tears down the connection by sending a teardown message to switch 1. The switch removes the relevant entry from its table and forwards the message on to the other switches in the path, which similarly delete the appropriate table entries.
 
Some notes about virtual circuit switching:
  • it doesn’t contain full address so as to reduce the data packet and speed up the look up
  • setup connection takes time. So there is at least one RTT of delay before data is sent.
One of the most common applications of virtual circuits for many years was the construction of virtual private networks (VPNs)
Source Routing
The packet contains all the information about the network topology that is required to switch a packet across the network is provided by the source host.

3.2 Swtiched Ethernet

Ethernet has its own limits: distance less than 2500m and no more than two repeaters between any pair of hosts. If we want to build a scalable network, we should connect the ethernets with nodes.
The simplest variants, we can just put a node which differ from a repeater (only operates on bits), Instead fully implement the Ethernet’s collision detection and media acces protocols on each interface.
All the bridge can maintain a forwarding table to forward the frame to specified part.
At first, the table is empty. Once the switch got a message from A on port 1, it can update the table (A, 1). If it doesn’t know where the destination is, it will send the frame to all the neighbors. A timeout is associated with each entry and the bridge discards the entry after a specified period of time.
Spanning Tree Algorithm
If there is a loop in the network, the frames will potentially get forwarded forever between switches.
So an algorithm to extract the spanning tree of original netowrk is very important.
Every node will be assigned with a unique number.
 
L2-based network do not scale because broadcast does not scale. To forward message and setup the network, each switch will send messages to all the neighbors nearby.
Another limitation of L2-based netowrk is lack of support for heterogeneity. So they can only use broadcast because MAC address differences a lot from each other.

3.3 Internet(IP)

3.3.2 Service Model

The philosophy used in defining the IP service model, therefore, was to make it undemanding enough that just about any network technology that might turn up in an internetwork would be able to provide the necessary service.
The IP service model can be thought of as having two parts: an addressing scheme which provides a way to identify all hosts in the internetwork, and a datagram connectionless model of data delivery.
Datagram Delivery
The IP datagram service is called best effort means that if something goes wrong, the network will do nothing — it made its best effort.
Packet Format
For the data at inernetworking layer and above, packets are designed to align on 32-bit boundaries to simplify the task of processing them in software.
  • Version: IPV4/V6
  • HLen: Header Length, default 5 words
  • TOS: type of service (e.g. low delay sevice type need a special queue)
  • Length: The length of the datagram maximum 65536(2^32)
  • TTL : time to live. A count number which record the hop. Each router decrement by 1, default 64. If TTL=0, which shows it waste too many resources and will be discarded.
  • Protocol : high-level protocol TCP(6), UDP(17)
  • CheckSum: Sum of IP headers
  • Source Addr: Address
  • Destination Addr : destination address
Fragmentation and Reassembly
Every netowrk type has a maximum transmission unit(MTU), which is the largest IP datgram that it can carry in a frame.
If the packet is larger than the MTU of a router, the router will conver it into several fragmentations. These fragmentations will be reassemblied in the destination host.
We use Ident and Offset to indicate the fragmentations.
IP reassembly is far from a simple process which cost the host time to wait and even give up. Ip fragmentation is generally considered a good thing to avoid.

3.3.3 Global Address

Addressing scheme is necessary for a global internet. Ethernet addresses are unique but they are also flat which means they have no structure and provide very feww clues to routing protocols. In contrast IP addresses are hierarchical by which they are made up of several parts that correspond to som e sort of hierarchy in the internetwork.
All the IP address has two parts, network part and host part.

3.3.4 Datagram Forwarding in IP

For any node, whether it is a host or a router, it first tries to establish whether it is connected to the same physical network as the destination.
If the node is not connected to the same physical network as the destination node, then it needs to send the datagram to a router which is called next hop router
Normally, there is also a default router that is used if none of the entries in the table matches the destination’s network number.
By aggregating the addresses through network part, routers only need to store local routing tables and a network table. This Hierarchical aggregation reduce the information stored in each node and achieve scalability.

3.3.5 Subnetting and Classless

Classes ABC helps build scalable internetworks. But it wastes a lot of addresses if the customer wants more than 255 hosts (Class C) but far more fewer than 65536 addresses.
Subnetting is one way to solve this problem. A single network number can be shared among multiple networks with a subnet mask.
AND the subnet mask with address will get a subnet number which shows if the hosts are in the same physical netowrk.
To support subnetting, the table must now hold entries of the form (SubnetNumber, SubnetMask, NextHop).
The router will try these masks one by one to find the interface.
Classless Addressing
Subnetting can not solve the problem we mentioned before that The waste of B class addresses. We need a method to aggregate networks together and assign carefully.
Instead of handing out 16 addresses at random, we can hand out a block of contiguous class C addresses. Suppose we assign the class C network numbers from 192.4.16 through 192.4.31.
CIDR requires a new type of notation to represent network numbers, or prefixes as they are known, because the prefixes can be of any length. The convention is to place a /X after the prefix, where X is the prefix length in bits.
The x number is just like the subnet mask to show how many prefix bits of addresses can be seen as a group.
In a forward table, the same address may match different interfaces. For example, we might find both 171.69 (a 16-bit prefix) and 171.69.10 (a 24-bit prefix) in the forwarding table of a single router. In this case, a packet destined to, say, 171.69.10.5 clearly matches both prefixes. The rule in this case is based on the principle of “longest match”; that is, the packet matches the longest prefix, which would be 171.69.10 in this example.

3.3.6 Address Translation (ARP)

The packet itself only contains IP address but not MAC address of the destination. So we need a method to translate IP address to physical address of the host.
A more general solution would be for each host to maintain a table of address pairs; that is, the table would map IP addresses into physical addresses. While this table could be centrally managed by a system administrator and then copied to each host on the network, a better approach would be for each host to dynamically learn the contents of the table using the network. This can be accomplished using the Address Resolution Protocol (ARP).
The goal of ARP is to enable each host on a network to build up a table of mappings between IP addresses and link-level addresses.If a host wants to send an IP datagram to a host (or router) that it knows to be on the same network (i.e., the sending and receiving nodes have the same IP network number), it first checks for a mapping in the cache. If no mapping is found, it needs to invoke the Address Resolution Protocol over the network. It does this by broadcasting an ARP query onto the network. This query contains the IP address in question (the target IP address).

3.3.7 Host Configuration(DHCP)

It is not possible for IP address to be configured by manufacturer since they should show the information about network topology. For this reason, IP address need to be reconfigurable.
For these reasons, automated configuration methods are required. The primary method uses a protocol known as the Dynamic Host Configuration Protocol (DHCP).To contact a DHCP server, a newly booted or attached host sends a DHCPDISCOVER message to a special IP address (255.255.255.255) that is an IP broadcast address.
While discussions of scaling often focus on keeping the state in network devices from growing too fast, it is important to pay attention to the growth of network management complexity.

3.3.8 Error Reporting(ICMP)

IP is always configured with a companion protocol, known as the Internet Control Message Protocol (ICMP), that defines a collection of error messages that are sent back to the source host whenever a router or host is unable to process an IP datagram successfully
One of the most useful control messages, called an ICMP-Redirect, tells the source host that there is a better route to the destination. host A send a packet to R1. If R1 find a better path in R2 for host B, it will send back an ICMP-Redirect to the host, instructing it to use R2 for all future datagrams addressed to that destination.
ICMP also provides the basis for two widely used debugging tools, ping and traceroute.

3.3.9 Virtual Netowrks and Tunnels

Our data packets are very vulnerable while passing among the routers. Anyone suspicious can penerate the routers and extract the packets.Virtual Private Network solves this problem by creating a tunnel to send the data.
First it will build a tunnel from router of host and router of destination (they all need support VPN). Then All the data packet will be encapsulated with a new header with the IP of destination router. The original IP payload and header will be encrypted for security. When R2 receive the packet, and find the destination is itself. It will decrypt the data and send it to real destination.

3.4 Routing

Forwarding consists of receiving a packet, looking up its destination address in a table, and sending the packet in a direction determined by that table.
Routing is the process by which forwarding tables are built. It depends on complex distributed algorithms, and is often referred to as the network’s control plane
Distance Vector(RIP)
Distance Vector algorithm calculates the routines by sharing all the information with neighbors. Everyone get the forwarding tables from the neighbor, update its own one and sending it to its own neighbors. A loop problem could exist in this process.
Link State(OSPF)
The basic idea behind link-state protocols is very simple: Every node knows how to reach its directly connected neighbors. It can just disseminate the totality of this knowledfe to every node, then every node will have enough knowledge of the network. So they can build the shortest path table by themselves.
Metrics
Previous Algorithms take for granted that each node knows the cost to reach its neighbors. In reality, we use Metrics to capture the cost between links.
Original Metrics algorithm took account of the delay, bandwidth and queue load but had a lot of problems.Under light load, it worked reasonably well, since the two static factors of delay dominated the cost. Under heavy load, however, a congested link would start to advertise a very high cost. This caused all the traffic to move off that link, leaving it idle, so then it would advertise a low cost, thereby attracting back all the traffic, and so on.
A third approach addressed these problems. The major changes were to compress the dynamic range of the metric considerably, to account for the link type, and to smooth the variation of the metric with time. They limit the dynamic range between the most expensive link and the least expensive link. Use average utilization to avoid temporal changes.

Chapter 4 Advanced Internetworking

4.1 Global Internet

Routing Areas
As a first example of using hierarchy to scale up the routing system, we’ll examine how link-state routing protocols (such as OSPF and IS-IS) can be used to partition a routing domain into subdomains called areas.
A router that is a member of both the backbone area and a nonbackbone area is an area border router (ABR), such as R1, R2 and R3.
The link-state advertisements of routers that are not area border routers do not leave the area in which they originated. The information from non-backbone area should be sent to a ABR, then transitted to other areas.
When dividing a domain into areas, the network administrator makes a tradeoff between scalability and optimality of routing. The use of areas forces all packets traveling from one area to another to go via the backbone area, even if a shorter path might have been available. For example, even if R4 and R5 were directly connected, packets would not flow between them because they are in different nonbackbone areas.

4.1.2 Interdomain Routing (BGP)

Chapter 5 End-to-End Protocols

5.1 Simple Demultiplexor(UDP)

User Datagram Protocol(UDP) is the simplest possible transport protocol that extends the host-to-host delivery service of network.
The header of transport layer protocol contains an indentifier(port) for both sender and receiver of the message to identify different processes.
UDP doesn’t implement flow control or reliable/ordered delivery. It does ensures the correctness of the messgae by the use of a checksum which takes the UDP header, the contents of the message body and three fields from IP header?

5.2 Reliable Byte Stream(TCP)

Chapter 6 Congestion Control

6.1 Issues in Resource Allocation

By resource allocation, we mean the process by which network elements try to meet the competing demands that applications have for network resources—primarily link bandwidth and buffer space in routers or switches.
We use the term congestion control to describe the efforts made by network nodes to prevent or respond to overload conditions.
Network Model
  • Router-Centric vs Host-Centric: based on router or host congestion control
  • Reservation-Based versus Feedback-Based: In Reservation-Based system, host asks the network for a certain amount of capacity. For a feekback-based sytem, the host begins sending data without initial capacity and adjust their sending rate according to the feedback they receive.
  • Window-Based vs Rate-Based.
Evaluation Criteria
Power = Thoughput / Delay
The problem with this strategy is that increasing the number of packets in the network also increases the length of the queues at each router. Longer queues, in turn, mean packets are delayed longer in the network
Fair Resource Allocation:

6.2 Queuing Disciplines

FIFO
First-In-First-Out Queue is the simplest queue. It is also a drop policy that packets arrive at the tail end of the FIFO are dropped.
Fair Queueing
The entire congestion-control mechanism is implemented at the sources and FIFO queuing does not provide a means to police how well the sources adhere to this mechanism, it is possible for an ill-behaved source (flow) to capture an arbitrarily large fraction of the network capacity.
Fair Queueing maintains a separate queue for each flow and transmit these queues in a sort of round-robin.
However, the length of packets are different. What we want is a bit-based round-robin. Thus, for each flow, we calculate the for the arring time of reach packet plus packet lengthand the next packet to transimit is always the packet that has the lowest timestamp. When the other packet is transmitting, a later packet even with lower timestamp can not preempt.

6.3 TCP Congestion Control

6.3.1 Additive Increase/ Multiplicative Decrease

TCP maintains a new state variable for each connection, called CongestionWindow , which is used by the source to limit how much data it is allowed to have in transit at a given time.
Multiplicative Decrease means once detected timeout packet, the congestion window will decrease to half of previous size.
Every time the source successfully sends a CongestionWindow ’s worth of packets—that is, each packet sent out during the last round-trip time (RTT) has been ACKed—it adds the equivalent of 1 packet to CongestionWindow .That is, rather than incrementing CongestionWindow by an entire MSS bytes each RTT, we increment it by a fraction of MSS every time an ACK is received.

6.3.2 Slow Start

There are actually two different situations in which slow start runs.
The first is at the very beginning of a connection, at which time the source has no idea how many packets it is going to be able to have in transit at a given time. In this situation, slow start continues to double CongestionWindow each RTT until there is a loss, at which time a timeout causes multiplicative decrease to divide CongestionWindow by 2.
After that, TCP introduces a temporary variable to store the target window, typically called CongestionThreshold , that is set equal to the CongestionWindow  value that results from multiplicative decrease. The slow start will only last until reach CongestionThreshold.

6.3.3 Fast Retransmit and Fast Recovery

Fast retransmit is a heuristic that sometimes triggers the retransmission of a dropped packet sooner than the regular timeout mechanism. The fast retransmit mechanism does not replace regular timeouts; it just enhances that facility.
This second transmission of the same acknowledgment is called a duplicate ACK . When the sending side sees a duplicate ACK, it knows that the other side must have received a packet out of order, and retransmit the missing packet(ACK+1).
When the fast retransmit mechanism signals congestion, rather than drop the congestion window all the way back to one packet and run slow start, it is possible to use the ACKs that are still in the pipe to clock the sending of packets. This mechanism, which is called fast recovery, effectively removes the slow start phase that happens between when fast retransmit detects a lost packet and additive increase begins.

6.3.4 TCP CUBIC

CUBIC’s primary goal is to support networks with large delay × bandwidth products, which are sometimes called long-fat networks. Such networks suffer from the original TCP algorithm requiring too many round-trips to reach the available capacity of the end-to-end path.
One important aspect of CUBIC’s approach is to adjust its congestion window at regular intervals, based on the amount of time that has elapsed since the last congestion event (e.g., the arrival of a duplicate ACK), rather than only when ACKs arrive (the latter being a function of RTT).
The idea is to start fast but slow the growth rate as you get close to  , be cautious and have near-zero growth when close to , and then increase the growth rate as you move away from .