TCP
TCP protocol
Transmission Control Protocol
-
Source port number: This is the port number of the sending TCP.
-
Destination port number: This is the port number of the destination TCP.
-
Sequence number: This is the sequence number for this segment. This is the offset of the first byte of data in this segment within the stream of data being transmitted in this direction over the connection.
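In relative terms, this is the amount of data already sent in this direction, in bytes, not counting the data in this segment. SYN and FIN each count as 1. A segment that carries only an ACK consumes no sequence numbers.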
-
Acknowledgement number: If the ACK bit (see below) is set, then this field contains the sequence number of the next byte of data that the receiver expects to receive from the sender.
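In relative terms, this is the amount of data received so far from the peer, in bytes. The peer's SYN and FIN again count as 1 each.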
-
Header length: This is the length of the header, in units of 32-bit words. Since this is a 4-bit field, the total header length can be up to 60 bytes (15 words). This field enables the receiving TCP to determine the length of the variable-length options field and the starting point of the data.
-
Reserved: This consists of 4 unused bits (must be set to 0).
-
Control bits: This field consists of 8 bits that further specify the meaning of the segment:
– CWR: the congestion window reduced flag.
– ECE: the explicit congestion notification echo flag. The CWR and ECE flags are used as part of TCP/IP’s Explicit Congestion Notification (ECN) algorithm. ECN is a relatively recent addition to TCP/IP and is described in RFC 3168 and in [Floyd, 1994]. ECN is implemented in Linux from kernel 2.4 onward, and enabled by placing a nonzero value in the Linux-specific /proc/sys/net/ipv4/tcp_ecn file.
– URG: if set, then the urgent pointer field contains valid information.
– ACK: if set, then the acknowledgement number field contains valid information (i.e., this segment acknowledges data previously sent by the peer).
– PSH: push all received data to the receiving process. This flag is described in RFC 793 and in [Stevens, 1994].
– RST: reset the connection. This is used to handle various error situations.
– SYN: synchronize sequence numbers. Segments with this flag set are exchanged during connection establishment to allow the two TCPs to specify the initial sequence numbers to be used for transferring data in each direction.
– FIN: used by a sender to indicate that it has finished sending data.
-
Multiple control bits (or none at all) may be set in a segment, which allows a single segment to serve multiple purposes. For example, we’ll see later that a segment with both the SYN and the ACK bits set is exchanged during TCP connection establishment.
-
Window size: This field is used when a receiver sends an ACK to indicate the number of bytes of data that the receiver has space to accept.
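The window size depends on the receive buffer. It shrinks as more data arrives and grows again as the application reads data out of the buffer.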
-
Checksum: This is a 16-bit checksum covering both the TCP header and the TCP data.
The TCP checksum covers not just the TCP header and data, but also 12 bytes usually referred to as the TCP pseudoheader. The pseudoheader consists of the following: the source and destination IP address (4 bytes each); 2 bytes specifying the size of the TCP segment (this value is computed, but doesn’t form part of either the IP or the TCP header); 1 byte containing the value 6, which is TCP’s unique protocol number within the TCP/IP suite of protocols; and 1 padding byte containing 0 (so that the length of the pseudoheader is a multiple of 16 bits). The purpose of including the pseudoheader in the checksum calculation is to allow the receiving TCP to double-check that an incoming segment has arrived at the correct destination (i.e., that IP has not wrongly accepted a datagram that was addressed to another host or passed TCP a packet that should have gone to another upper layer). UDP calculates the checksum in its packet headers in a similar manner and for similar reasons.
-
Urgent pointer: If the URG control bit is set, then this field indicates the location of so-called urgent data within the stream of data being transmitted from the sender to the receiver. We briefly discuss urgent data in Section 61.13.1.
-
Options: This is a variable-length field containing options controlling the operation of the TCP connection.
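- Window scale: sent in the SYN segment. The option value is used as a power of 2: multiplying the receive window field in the TCP header by 2 raised to that power gives the actual TCP receive window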
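- Maximum segment size: sent in the SYN segment. It announces the largest amount of data, in bytes, that the sender of the option is willing to receive in a single segment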
-
Data: This field contains the user data transmitted in this segment. This field may be of length 0 if this segment doesn’t contain any data (e.g., if it is simply an ACK segment).
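A minimal Python sketch can tie the fields above to the wire format. It makes these assumptions:

- It unpacks only the fixed 20-byte header with `struct`.
- It decodes the 8 control bits in the order listed above.
- It verifies the checksum over the 12-byte pseudoheader.
- It applies a window-scale shift.
- Options are left unparsed.

The helper names (`checksum16`, `pseudoheader`, `parse_tcp_header`, `effective_window`) and the addresses and ports in the usage lines are hypothetical.

```python
import struct

# Control bits from most to least significant, in the order listed above.
FLAG_NAMES = ["CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN"]


def checksum16(data: bytes) -> int:
    """Internet checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudoheader(src_ip: bytes, dst_ip: bytes, tcp_len: int) -> bytes:
    # 12 bytes: source IP (4), destination IP (4), zero (1), protocol 6 (1), TCP length (2)
    return src_ip + dst_ip + struct.pack("!BBH", 0, 6, tcp_len)


def parse_tcp_header(segment: bytes) -> dict:
    src, dst, seq, ack, offset_byte, flags, window, csum, urg = struct.unpack(
        "!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4  # header length is counted in 32-bit words
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [name for i, name in enumerate(FLAG_NAMES) if flags & (0x80 >> i)],
        "window": window, "checksum": csum, "urgent_ptr": urg,
        "options": segment[20:header_len],  # left unparsed in this sketch
        "data": segment[header_len:],
    }


def effective_window(window_field: int, shift: int) -> int:
    """Window scale option: actual receive window = window field * 2**shift."""
    return window_field << shift


# Usage with made-up addresses and ports: build a bare SYN, fill in its checksum, parse it.
src_ip, dst_ip = bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])
hdr = struct.pack("!HHIIBBHHH", 54321, 80, 1000, 0, 5 << 4, 0x02, 64240, 0, 0)
csum = checksum16(pseudoheader(src_ip, dst_ip, len(hdr)) + hdr)
seg = hdr[:16] + struct.pack("!H", csum) + hdr[18:]  # checksum sits at bytes 16-17

info = parse_tcp_header(seg)
print(info["flags"], info["header_len"])  # ['SYN'] 20
# A receiver re-summing pseudoheader + segment (checksum included) gets 0 if intact.
print(checksum16(pseudoheader(src_ip, dst_ip, len(seg)) + seg))  # 0
print(effective_window(64240, 7))  # 8222720
```

The receiver check works because adding the stored checksum back into the one's-complement sum yields 0xFFFF, whose complement is 0.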
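TCP state transition diagram
C: client, S: server
In the seq/ack values:
- c is the random initial sequence number generated by the client
- s is the random initial sequence number generated by the server
- req is the number of request bytes
- res is the number of response bytes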
sequenceDiagram
participant C
participant S
Note right of S: LISTEN
rect rgb(191, 223, 255)
C ->> S: SYN seq=c,ack=0
Note left of C: SYN_SENT
S ->> C: ACK + SYN seq=s,ack=c+1
Note right of S: SYN_RECV
C ->> S: ACK seq=c+1,ack=s+1
end
Note right of S: ESTABLISHED
Note left of C: ESTABLISHED
rect rgb(191, 223, 255)
S ->> C: ACK [window update] seq=s+1,ack=c+1
C ->> S: ACK seq=c+1,ack=s+1
end
rect rgb(191, 223, 255)
C ->> S: PSH [request] seq=c+1,ack=s+1
S ->> C: ACK seq=s+1,ack=c+req+1
end
rect rgb(191, 223, 255)
S ->> C: PSH [response] + ACK seq=s+1,ack=c+req+1
C ->> S: ACK seq=c+req+1,ack=s+res+1
end
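Note over S, C: From here on the state transitions do not depend on the client/server roles, only on active close (the side that sends FIN first) versus passive close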
rect rgb(191, 223, 255)
C ->> S: FIN + ACK seq=c+req+1,ack=s+res+1
Note left of C: FIN_WAIT1
Note right of S: CLOSE_WAIT
S ->> C: ACK seq=s+res+1,ack=c+req+2
Note left of C: FIN_WAIT2
C -> S: [delay]
S ->> C: FIN + ACK seq=s+res+1,ack=c+req+2
Note right of S: LAST_ACK
Note left of C: TIME_WAIT
C ->> S: ACK seq=c+req+2,ack=s+res+2
Note right of S: CLOSED
C -> S: 2MSL
Note left of C: CLOSED
end
Q&A
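What seq and ack mean
The seq and ack values here are relative values, i.e., offsets from each side's initial sequence number.
For either side (the client, say), seq is the amount of data it has already sent, not counting the data in the current segment. ack is the amount of data it has received from the peer. SYN and FIN each count as 1.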
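A small Python simulation can check the seq/ack arithmetic in the diagram. It uses these arbitrary values:

- client ISN c=1000 and server ISN s=5000
- req=100 request bytes and res=300 response bytes

Each side counts its own consumed sequence space, with SYN and FIN worth 1 each. It acknowledges the peer's ISN plus what it has received so far. The `Endpoint`/`transmit`/`run` names are hypothetical. The sketch ignores loss, retransmission and windows, and it omits the diagram's window-update exchange because that exchange does not move seq or ack.

```python
class Endpoint:
    """One side of the connection, tracking sequence space in bytes."""

    def __init__(self, isn: int):
        self.isn = isn          # random initial sequence number (c or s in the diagram)
        self.sent = 0           # own sequence space consumed so far
        self.peer_isn = None    # learned from the peer's SYN
        self.received = 0       # peer sequence space received so far

    def next_seq(self) -> int:
        return self.isn + self.sent

    def ack_value(self) -> int:
        # Before the peer's SYN arrives there is nothing to acknowledge (ack=0 in the diagram).
        return 0 if self.peer_isn is None else self.peer_isn + self.received


def transmit(src: Endpoint, dst: Endpoint, flags: str, nbytes: int = 0) -> tuple:
    """Send one segment src -> dst and return (flags, seq, ack) as it would appear on the wire."""
    seq, ack = src.next_seq(), src.ack_value()
    # Data bytes consume sequence space; SYN and FIN consume 1 each; a bare ACK consumes none.
    consumed = nbytes + ("SYN" in flags) + ("FIN" in flags)
    src.sent += consumed
    if "SYN" in flags:
        dst.peer_isn = seq
    dst.received += consumed  # no loss or reordering in this sketch
    return (flags, seq, ack)


def run(c=1000, s=5000, req=100, res=300):
    client, server = Endpoint(c), Endpoint(s)
    return [
        transmit(client, server, "SYN"),
        transmit(server, client, "SYN+ACK"),
        transmit(client, server, "ACK"),
        transmit(client, server, "PSH+ACK", req),   # request
        transmit(server, client, "ACK"),
        transmit(server, client, "PSH+ACK", res),   # response
        transmit(client, server, "ACK"),
        transmit(client, server, "FIN+ACK"),        # active close
        transmit(server, client, "ACK"),
        transmit(server, client, "FIN+ACK"),        # passive close
        transmit(client, server, "ACK"),
    ]


trace = run()
for flags, seq, ack in trace:
    print(f"{flags:8} seq={seq} ack={ack}")
```

The final ACK comes out as seq=1102, ack=5302, which is c+req+2 and s+res+2, the same as the last ACK in the diagram.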
Why is the maximum number of bytes sent in a single TCP packet 16388?
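Is there a TCP-like protocol that preserves message boundaries?
SCTP (Stream Control Transmission Protocol): similar to TCP, but it preserves message boundaries.
Its syscalls are as follows: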
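What is a socket pair
The source IP and destination IP are stored in the IP header.
socket = IP address + port
socket pair: the 4-tuple made up of the client IP address, client port, server IP address, and server port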
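The Python sketch below shows a socket pair concretely. It opens a loopback connection and reads the 4-tuple from each end with `getsockname()` and `getpeername()`. The use of 127.0.0.1 and a kernel-chosen port are assumptions made so the demo is self-contained.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server, _ = listener.accept()

# socket pair as seen from each end: (local IP, local port, remote IP, remote port)
client_pair = client.getsockname() + client.getpeername()
server_pair = server.getsockname() + server.getpeername()
print("client view:", client_pair)
print("server view:", server_pair)

for s in (client, server, listener):
    s.close()
```

The two views are mirror images of the same 4-tuple. The accepted socket reuses the listening port, and the client's port is an ephemeral one chosen by the kernel. This 4-tuple is a different thing from the `socketpair()` system call, which creates two already-connected sockets (typically in the Unix domain).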
Why 2MSL
MSL (maximum segment lifetime): the assumed maximum time a TCP segment can survive in the network
BSD systems use 30 seconds; RFC 1122 uses 2 minutes
twice the MSL: one MSL for the final ACK to reach the peer TCP, plus a further MSL in case a further FIN must be sent.
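Trying to bind to an IP+port that still has a connection in TIME_WAIT fails with EADDRINUSE.
It is not advisable for a server to forcibly cut TIME_WAIT short. TIME_WAIT prevents a reused address from receiving stale segments that belong to the previous connection.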
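On typical systems, a restarted server avoids EADDRINUSE without cutting TIME_WAIT short by setting `SO_REUSEADDR` before `bind()`. The Python sketch below assumes the helper name `make_listener` and its defaults (loopback address, kernel-chosen port, backlog 128); all three are hypothetical.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0, backlog: int = 128) -> socket.socket:
    """Hypothetical helper: create a listening TCP socket with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(). It lets bind() succeed while earlier connections
    # on this port are still in TIME_WAIT; it does not shorten TIME_WAIT itself.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))      # port 0 asks the kernel for any free port
    sock.listen(backlog)
    return sock


srv = make_listener()
reuse = srv.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
addr = srv.getsockname()
print("listening on", addr, "SO_REUSEADDR =", reuse)
srv.close()
```

The old connections still sit out their full TIME_WAIT, so the protection against stale segments described above is kept. The option only removes the bind-time conflict.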
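The difference between CLOSE_WAIT and TIME_WAIT
Closing a TCP connection takes a four-way handshake. For a client-initiated close, the segments are: Client FIN, Server ACK, Server FIN, Client ACK. Here:
- When the server receives the Client FIN and ACKs it, the server enters CLOSE_WAIT
- The server application still has to finish sending whatever it has buffered and then call close(). This is why there is a gap between the Server ACK and the Server FIN
- After the Server FIN and the Client ACK, the client is in TIME_WAIT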
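The Python sketch below runs this sequence over loopback. It differs from a plain close() in one respect, which is an assumption of the demo: the client uses `shutdown(SHUT_WR)` rather than `close()` to send its FIN. A half-close lets the client still read what the server sends while sitting in CLOSE_WAIT. If the client had fully closed, data arriving afterwards would typically draw an RST. The payloads `b"request"` and `b"response to ..."` are made up.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
server, _ = listener.accept()

client.sendall(b"request")
client.shutdown(socket.SHUT_WR)   # Client FIN: client -> FIN_WAIT1/FIN_WAIT2, server -> CLOSE_WAIT

request = b""
while chunk := server.recv(1024):  # recv() returns b"" once the client's FIN has arrived
    request += chunk

# Server is in CLOSE_WAIT: it can still send what it has left before its own FIN.
server.sendall(b"response to " + request)
server.close()                     # Server FIN: server -> LAST_ACK

response = b""
while chunk := client.recv(1024):  # b"" here means the server's FIN arrived
    response += chunk
# The kernel has ACKed the server's FIN, so the client side is now in TIME_WAIT.
client.close()
listener.close()
```

If the server code never reached `server.close()`, that socket would stay in CLOSE_WAIT indefinitely. That is the situation described next.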
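Normally a server spends very little time in CLOSE_WAIT. A large number of CLOSE_WAIT sockets means the passive-closing side is not sending its FIN promptly. Common causes include:
- Program bugs: if the code forgets to close the corresponding socket, no FIN is ever sent and CLOSE_WAIT sockets accumulate. Alternatively, sloppy code (an infinite loop, for example) means that even a close() written later is never reached.
- Slow responses or timeouts set too short: if the two sides are mismatched, one side loses patience and times out while the other is still busy with time-consuming logic, so close() is delayed. Slow responses are the primary problem, but seen from the other side, the timeout may also be set too short.
- BACKLOG too large: this refers not to the SYN backlog but to the accept backlog. With a large backlog, a sudden burst of traffic can outpace the application even when responses are not slow. Surplus requests are then closed by the peer while still waiting in the queue.
If "netstat -ant" or "ss -ant" shows many CLOSE_WAIT connections, look at the "Recv-Q" and "Local Address" fields in the output. "Recv-Q" is usually non-zero, meaning the application has not yet read the data. "Local Address" shows which address and port has the problem. We can use "lsof -i: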
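A Python sketch can summarize that check. It groups CLOSE-WAIT sockets by local address and adds up their Recv-Q. It assumes the default column layout of `ss -ant` (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port); note that ss spells the state with a hyphen, unlike netstat. The sample text and the function name `close_wait_by_local` are hypothetical.

```python
from collections import defaultdict

# Hypothetical `ss -ant` output; the addresses and queue sizes are invented.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:8080        0.0.0.0:*
ESTAB      0      0      10.0.0.5:8080       10.0.0.9:51234
CLOSE-WAIT 517    0      10.0.0.5:8080       10.0.0.9:51240
CLOSE-WAIT 1      0      10.0.0.5:8080       10.0.0.7:40112
CLOSE-WAIT 0      0      10.0.0.5:9090       10.0.0.8:33000
"""


def close_wait_by_local(ss_output: str) -> dict:
    """Group CLOSE-WAIT sockets by local address: {local: (socket count, total Recv-Q bytes)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in ss_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        # Columns: State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port
        if len(cols) >= 5 and cols[0] == "CLOSE-WAIT":
            entry = stats[cols[3]]
            entry[0] += 1
            entry[1] += int(cols[1])  # bytes the application has not read yet
    return {local: tuple(v) for local, v in stats.items()}


print(close_wait_by_local(SAMPLE))
```

On a live Linux host, the input could come from `subprocess.run(["ss", "-ant"], capture_output=True, text=True).stdout`. A local address with many CLOSE-WAIT sockets and a growing Recv-Q points at the application behind that port.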
Sample packets
Example 1: a simple HTTP request and response