01. TCP/IP overview
Let's see how the task «organize data transfer between two arbitrary computers» can be solved from scratch.
To solve such a complex task, we must divide it into subtasks and solve them independently.
1. Media
What data transmission media can we use?
- Two wires
- Four twisted pairs
- Coaxial cable
- Radio
- A truck bed full of HDDs or Blu-ray discs
1.1 Data representation
If binary, what is 0 and what is 1?
- COM port two-wire: zero/non-zero potential
- Twisted pair: low-voltage differential signaling
- Coaxial: wave modulation
- ...
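Why differential signaling suits twisted pair can be shown with a minimal sketch. It assumes an idealized pair where interference shifts both wires by exactly the same amount, and the swing `V` is a made-up illustrative value; real line codes are considerably more involved.

```python
import random

V = 0.35  # hypothetical signal swing in volts, for illustration only


def transmit(bit):
    """Drive the two wires with opposite voltages: (+V, -V) for 1, (-V, +V) for 0."""
    return (V, -V) if bit else (-V, V)


def add_common_noise(pair, noise):
    """Idealized interference: it shifts both twisted wires by the same amount."""
    plus, minus = pair
    return plus + noise, minus + noise


def receive(pair):
    """The receiver looks only at the difference between the two wires."""
    plus, minus = pair
    return 1 if plus - minus > 0 else 0


bits = [1, 0, 1, 1, 0, 0, 1]
rng = random.Random(42)
received = [receive(add_common_noise(transmit(b), rng.uniform(-5.0, 5.0))) for b in bits]
print(received == bits)  # True: the common noise cancels out in the subtraction
```

A single wire measured against ground, as in the COM-port case, would see the full noise instead.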
1.2 Data encoding
Representation is not enough:
- Frequency (how many bits per second?)
- Distortion and error detection
- Error compensation (the simplest tool is the so-called parity check, which only detects errors)
- E.g. if zero potential means 0, how do we detect whether it's actually 0 or the wire is just cut?
- Adaptation/optimization to the media
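The parity check mentioned above fits in a few lines. This is a minimal sketch of even parity over a list of bits (the function names are illustrative): it catches any single flipped bit, but it cannot tell which bit flipped and it misses an even number of flips.

```python
def parity_bit(bits):
    """Even parity: the extra bit makes the total number of 1s even."""
    return sum(bits) % 2


def send(bits):
    return bits + [parity_bit(bits)]


def check(frame):
    """True if the frame still contains an even number of 1s."""
    return sum(frame) % 2 == 0


data = [1, 0, 1, 1, 0, 0, 1]
frame = send(data)
print(frame, check(frame))  # [1, 0, 1, 1, 0, 0, 1, 0] True

corrupted = frame.copy()
corrupted[2] ^= 1           # one bit flipped by interference
print(check(corrupted))     # False: single-bit error detected

double = frame.copy()
double[0] ^= 1
double[1] ^= 1
print(check(double))        # True: two flipped bits go unnoticed
```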
2. Interface
How do computers use the selected media?
At this point we must choose one of these:
Circuit-switching network
- N nodes, M channels
- Two nodes hold a channel
- Transmit all data
- Release the channel
- Disadvantage: M is always < N*(N-1)/2 (the number of node pairs), so not every pair of nodes can hold a channel at the same time
Packet-switching network
- One shared medium, N nodes
- Data is divided into relatively small packets
- A node can send one packet at a time
- Disadvantage: there's no guaranteed data transfer rate; it depends on how loaded the medium is
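The N*(N-1)/2 bound for circuit switching grows quadratically, which is why a channel for every possible pair is never provided. A tiny sketch of the count (the function name is illustrative):

```python
def pairs(n):
    """Number of distinct node pairs that might each want a dedicated channel."""
    return n * (n - 1) // 2


for n in (2, 10, 100, 1000):
    print(n, pairs(n))
# 2 1
# 10 45
# 100 4950
# 1000 499500
```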
Suppose we've selected packet switching.
- ⇒ We need to specify source and destination addresses, since more than two hosts may be connected to the medium
2.1 Transfer convention
What a data packet must contain to be successfully sent from one node to another:
- How to distinguish a packet from non-packet signal/noise
- How to detect data corruption
- How to identify sender and recipient
2.2 Media utilization discipline
- How to send a packet over a shared medium while other nodes may be sending at the same time?
- How to avoid receiving/eavesdropping on packets addressed to other nodes?
Example: Ethernet
(2.1) Preamble (an additional fixed 0/1 sequence) to detect the start of a packet
(2.1) Frame check sequence (FCS, a CRC-32 checksum) to verify packet integrity
(2.1) MAC addresses to identify nodes
Ethernet packets are called frames
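The three (2.1) conventions can be sketched as building and parsing a frame at the byte level. This is a simplified layout (no VLAN tag, bytes rather than the bit-level transmission order) and the MAC addresses are made-up examples; it relies on Ethernet's FCS being the same CRC-32 that `zlib.crc32` computes.

```python
import struct
import zlib

# 7 bytes of alternating bits plus the start frame delimiter, as they appear in memory
PREAMBLE = b"\x55" * 7 + b"\xd5"
MIN_PAYLOAD = 46  # shorter payloads are padded with zeros


def mac(text):
    """'02:00:00:00:00:01' -> 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def build_frame(dst, src, ethertype, payload):
    body = mac(dst) + mac(src) + struct.pack("!H", ethertype)
    body += payload.ljust(MIN_PAYLOAD, b"\x00")
    fcs = struct.pack("<I", zlib.crc32(body))  # CRC-32 over addresses, type, payload
    return PREAMBLE + body + fcs


def parse_frame(frame):
    if not frame.startswith(PREAMBLE):
        raise ValueError("no preamble: noise, not a frame")
    body, fcs = frame[len(PREAMBLE):-4], frame[-4:]
    if struct.unpack("<I", fcs)[0] != zlib.crc32(body):
        raise ValueError("FCS mismatch: frame corrupted")
    (ethertype,) = struct.unpack("!H", body[12:14])
    return body[0:6].hex(":"), body[6:12].hex(":"), ethertype, body[14:]


frame = build_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", 0x0800, b"hello")
print(parse_frame(frame)[:3])  # ('ff:ff:ff:ff:ff:ff', '02:00:00:00:00:01', 2048)

damaged = bytearray(frame)
damaged[20] ^= 0x01  # flip one bit in transit
try:
    parse_frame(bytes(damaged))
except ValueError as err:
    print(err)  # FCS mismatch: frame corrupted
```

A real network card additionally drops frames whose destination MAC is neither its own nor a broadcast/multicast address it listens to, which is the usual answer to the eavesdropping question in 2.2.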
(2.2) Carrier-sense multiple access with collision detection (CSMA/CD)
- A node can sense whether the medium is free, so a packet can be sent
- While sending, a node can detect a collision (another device sending a packet at the same time)
If the medium is busy or a collision occurs, stop transmitting, wait a random timeout and retry
On repeated busy/collision events, wait longer before retrying
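In classic Ethernet the "wait longer" rule is truncated binary exponential backoff. A minimal sketch, with hypothetical function names (real hardware does this inside the network card) and the 10 Mbit/s slot time:

```python
import random

SLOT_TIME_US = 51.2  # 512 bit times on 10 Mbit/s Ethernet
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions
MAX_ATTEMPTS = 16    # after 16 collisions in a row the frame is dropped


def backoff_delay(collisions, rng=random):
    """Truncated binary exponential backoff: after the n-th collision of the same
    frame, wait k slot times with k uniform in 0 .. 2**min(n, 10) - 1."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, frame dropped")
    k = rng.randrange(2 ** min(collisions, BACKOFF_LIMIT))
    return k * SLOT_TIME_US


def rounds_until_resolved(rng):
    """Two stations collided; count collisions until they pick different delays."""
    collisions = 1
    while backoff_delay(collisions, rng) == backoff_delay(collisions, rng):
        collisions += 1  # same delay again -> they collide again and wait longer
    return collisions


rng = random.Random(7)
print([rounds_until_resolved(rng) for _ in range(10)])
```

Doubling the range after each collision makes repeated clashes increasingly unlikely, at the cost of the unpredictable transmission time mentioned below.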
Carrier-sense multiple access with collision detection is smart, but makes transmission time unpredictable.
A classic shared-medium Ethernet network works normally as long as it is busy no more than about 30% of the time (and idle the other 70%). If the load exceeds 50%, the network is considered inoperable.
Example: Token Ring
Not much to say: it's ancient technology.
- In simple words:
- The medium is like a ring railway with nodes as stops
- A train (the token) runs around the ring
- When the train stops (a node gets the token), the node
- can detach a car bound for this stop (receive a packet addressed to this node)
- and attach a car bound for another stop (send a packet to another node)
This policy guarantees a bounded delivery time, but it is slow.
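The railway analogy can be turned into a toy simulation. This models the analogy only, not the actual IEEE 802.5 protocol, and the node names and queue layout are assumptions for illustration.

```python
from collections import deque


def token_ring(nodes, outbox, rounds):
    """Toy model of the railway analogy (not the real IEEE 802.5 protocol).
    nodes: stops in ring order; outbox: node -> deque of (destination, payload).
    Returns delivered packets as (source, destination, payload)."""
    delivered = []
    train = []  # cars currently on the train: (destination, source, payload)
    for _ in range(rounds):
        for node in nodes:  # the token visits every stop in turn
            for car in [c for c in train if c[0] == node]:
                train.remove(car)  # detach cars addressed to this stop
                delivered.append((car[1], node, car[2]))
            if outbox.get(node):  # attach at most one car per visit
                dst, payload = outbox[node].popleft()
                train.append((dst, node, payload))
    return delivered


outbox = {"A": deque([("C", "a1"), ("B", "a2")]), "C": deque([("A", "c1")])}
print(token_ring(["A", "B", "C", "D"], outbox, rounds=2))
# [('A', 'C', 'a1'), ('C', 'A', 'c1'), ('A', 'B', 'a2')]
```

Because each stop attaches at most one car per pass, a node with k queued packets gets all of them onto the train within k passes of the token: predictable, but slow.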
Invariants
- Encapsulation
All higher-level data is split into pieces (this is called fragmentation), each small enough to be the payload of a current-level packet (a level 2 packet is a frame). Each piece is then wrapped in packet metadata (e.g. for Ethernet: MAC addresses, type, checksum etc.) and sent.
- Independence
- There's (theoretically) no dependence on the lower-level implementation, e.g. Ethernet frames are the same whether the medium is twisted pair or coaxial cable.
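Both invariants can be illustrated with a toy sketch. The `Packet` fields, the sequence number and the tiny 8-byte payload limit are hypothetical, chosen only to make the split visible; real headers (Ethernet, IP) look different.

```python
from dataclasses import dataclass

MAX_PAYLOAD = 8  # hypothetical, tiny limit so the fragmentation is visible


@dataclass
class Packet:
    src: str
    dst: str
    seq: int       # piece number, assumed here so the receiver can reorder
    payload: bytes


def encapsulate(data, src, dst, limit=MAX_PAYLOAD):
    """Fragment `data` into payload-sized pieces and wrap each in metadata."""
    return [Packet(src, dst, seq, data[offset:offset + limit])
            for seq, offset in enumerate(range(0, len(data), limit))]


def decapsulate(packets):
    """Strip the metadata and join the payloads back in order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


packets = encapsulate(b"higher-level data", "host-A", "host-B")
print([p.payload for p in packets])  # [b'higher-l', b'evel dat', b'a']
print(decapsulate(packets[::-1]))    # b'higher-level data'
```

Nothing in `encapsulate` knows what carries the bytes afterwards, which is the independence invariant in miniature.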
3. Global network
The task of data transfer is solved, but:
- only locally (within a single medium)
- without any data integrity / transmission control / data interpretation
So we continue. The next subtask is to unite all local networks (sets of nodes sharing a medium) into a global network.
- Router
- a node belonging to two or more local networks that can forward packets from one network to another
- Route
- a chain of routers leading from sender to recipient
- Routing
- the process of determining this chain
- Host
- a node of the global network
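Routing as "determining the chain" can be sketched as a breadth-first search over a known topology. The topology and names are hypothetical, and BFS is just one simple way to find a route; real routing protocols work differently and, as 3.2 points out, no host actually holds the full map.

```python
from collections import deque

# Hypothetical topology: each router joins two or more local networks.
ROUTERS = {
    "R1": {"net-A", "net-B"},
    "R2": {"net-B", "net-C"},
    "R3": {"net-A", "net-D"},
    "R4": {"net-D", "net-C"},
    "R5": {"net-C", "net-E"},
}


def find_route(src_net, dst_net):
    """Breadth-first search: the chain of routers with the fewest hops from
    src_net to dst_net, [] if both are the same network, None if unreachable."""
    queue = deque([(src_net, [])])
    seen = {src_net}
    while queue:
        net, chain = queue.popleft()
        if net == dst_net:
            return chain
        for router, nets in ROUTERS.items():
            if net in nets:
                for nxt in nets - seen:
                    seen.add(nxt)
                    queue.append((nxt, chain + [router]))
    return None


print(find_route("net-A", "net-E"))  # ['R1', 'R2', 'R5']
print(find_route("net-A", "net-A"))  # []: same network, no routing needed
```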
3.1 Identifying and routing
- All hosts should have unique identifiers
⇒ interface-level addressing (e.g. MAC addresses) is inappropriate (for example, a host may not have one)
When sending a packet from one host to another that is not connected to the same medium, there must be an algorithm to determine which router to send the packet to instead of the recipient (which is not directly reachable over the medium)
3.2 Dynamic connectivity
No host can bear a full routing map of global network:
- It changes every second
- Anybody can change a route without informing everyone else
But some information about what is connected to what is critical. The question is: what is «what»?
E.g.: the Internet
- (3.1) Addressing:
every host has an (almost) unique IP address
There are some address ranges that can be non-unique, for use in an intranet (a set of local networks under a single administration)
- (3.1) Routing:
- every IP address is assigned to a network interface connected to a medium
Classic (topological): every IP address is divided into a network address and a node address.
- If the network addresses of two nodes are identical, they are treated as connected to a single medium, so no routing is needed.
If the network addresses of two nodes differ, there must be a routing table entry pointing to a router that can deliver the packet to the recipient. The shorter the network address, the longer the node address, and the more nodes can belong to this network. An IPv4 network address of 0 bits (0.0.0.0/0) means the whole Internet.
- Typical network setup:
- One local network
- One default router for all recipients not belonging to the local network
- Non-classic (source based / policy based) routing
- (3.2) Exterior dynamic routing:
A set of local networks under a single administration forms an autonomous system (AS)
All ASes should announce their reachability and other properties to each other (for example, via the Border Gateway Protocol, BGP) and update their routing tables accordingly
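The classic topological rule and the typical setup (one local network, one default router) can be sketched with Python's standard `ipaddress` module. The addresses, the extra intranet route and the function name are hypothetical; choosing the most specific matching entry (longest prefix match) is the standard tie-breaking rule when several routes match.

```python
import ipaddress

# Hypothetical host setup: one local network, one default router,
# plus one extra route into a private intranet range.
LOCAL_NET = ipaddress.ip_network("192.168.1.0/24")
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "192.168.1.254"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),  # 0-bit network: the whole Internet
]


def next_hop(dst):
    """Where to send a packet addressed to `dst`."""
    addr = ipaddress.ip_address(dst)
    if addr in LOCAL_NET:
        return str(addr)  # same network address: deliver directly over the medium
    matching = [(net, gw) for net, gw in ROUTES if addr in net]
    _, gateway = max(matching, key=lambda route: route[0].prefixlen)  # most specific wins
    return gateway


print(next_hop("192.168.1.17"))  # 192.168.1.17
print(next_hop("10.20.30.40"))   # 192.168.1.254
print(next_hop("8.8.8.8"))       # 192.168.1.1
# Shorter network part, more possible nodes:
print(LOCAL_NET.num_addresses, ROUTES[0][0].num_addresses)  # 256 16777216
```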
4. Data flows
So data can be transferred between any two hosts of the global network. Now let's make sure it was transferred without corruption (or decide not to bother) and see what can be done to avoid incidents.
4.1 Organize data into a flow
- Integrity: is data corruption, reorder, duplicate or loss permitted?
- Confirmation: does data transfer need to be confirmed?
4.2 Flow control
- Separate multiple data flows between the same two hosts
- Control the bandwidth of a flow according to the capabilities of the hosts and the data channel
E.g. TCP/IP
TCP: stream model
- Bi-directional
- Needs to establish a connection
- Needs to acknowledge received packets
- Keeps packets in order, verifies their checksums, and reports an error if something goes wrong
- Retransmits every unacknowledged (lost or corrupted) packet
UDP: datagram model
- One-way
- No connection
- No confirmation
- No ordering
- Only a checksum for error detection (optional in IPv4, mandatory in IPv6); no error correction
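TCP's ordering and retransmission guarantees rest on sequence numbers (byte offsets in the stream) and cumulative acknowledgments. A toy receiver sketch, assuming the stream starts at offset 0 (real TCP starts from a random initial sequence number) and that segments never partially overlap:

```python
def receive_segments(segments):
    """Toy TCP-style receiver. `segments` is a list of (seq, data) in arrival
    order, seq being the byte offset of data in the stream. Returns the bytes
    delivered to the application, in order, and the cumulative ACK number."""
    buffer = {}    # out-of-order segments waiting for the gap to be filled
    expected = 0   # next byte offset the receiver needs
    stream = b""
    for seq, data in segments:
        if seq < expected:
            continue  # duplicate of data already delivered: drop it
        buffer[seq] = data
        while expected in buffer:  # deliver everything that is now contiguous
            chunk = buffer.pop(expected)
            stream += chunk
            expected += len(chunk)
    return stream, expected  # `expected` doubles as the cumulative ACK


print(receive_segments([(0, b"Hel"), (6, b"wor"), (3, b"lo "), (3, b"lo "), (9, b"ld")]))
# (b'Hello world', 11)
print(receive_segments([(0, b"Hel"), (6, b"wor")]))
# (b'Hel', 3): bytes from offset 3 are missing, so the sender must retransmit them
```

A UDP receiver, by contrast, would just hand each datagram to the application as it arrives, reordered or missing ones included.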
Principles:
- Asymmetric host roles:
- Client: initiates a connection
- Server: accepts a connection (and serves through it)
- Full connection ID: 4 numbers:
- server IP
- server port (dedicated to the service, e.g. 443 for HTTPS, 22 for SSH etc.)
- client IP
- client port (random, usually picked by the OS from its ephemeral port range)
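The four numbers of a connection can be observed directly with Python's `socket` module over loopback. In this sketch port 0 lets the OS choose a free server port, standing in for a well-known port such as 443.

```python
import socket

# Server side: bind and listen. Port 0 lets the OS pick any free port; it stands
# in for a well-known port such as 443, which usually needs extra privileges.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()

# Client side: initiates the connection; the OS assigns its ephemeral port.
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

server_ip, server_port = client.getpeername()
client_ip, client_port = client.getsockname()
print("connection:", (server_ip, server_port, client_ip, client_port))

# The server's accepted socket sees the very same four numbers, mirrored.
print(conn.getsockname() == (server_ip, server_port))  # True
print(conn.getpeername() == (client_ip, client_port))  # True

for s in (client, conn, server):
    s.close()
```

A second client connecting to the same server port would differ only in its client port, which is how several flows between the same two hosts are kept apart.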
5. Interpretation of data
So the data is delivered. But what is it?
5.1 Associate a connection with an interpreter
- By dedicated port
Not always (remember the sugon SSH port? ☺)
- A TCP flow can be treated like simple stream I/O
- ⇒ the interpreter need not know anything about the network at all!
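A small sketch of a network-agnostic interpreter: the function (a hypothetical example) only iterates over a text stream, so it works the same on a file, an `io.StringIO`, or a TCP socket wrapped with `sock.makefile("r")`.

```python
import io


def count_lines_and_words(stream):
    """A toy 'interpreter' that only knows it receives a text stream."""
    lines = words = 0
    for line in stream:
        lines += 1
        words += len(line.split())
    return lines, words


print(count_lines_and_words(io.StringIO("GET / HTTP/1.1\nHost: example.com\n")))
# (2, 5)
```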
5.2 Interpret
- Mostly outside the OS's responsibility (HTTP server, SQL server, …)
- Exception: DNS
- Optional: network filesystems etc.