Connection Reset by peer means the remote side is terminating the session. This error is generated when the OS receives notification of TCP Reset (RST) from the remote peer.
Understanding Connection Reset by peer
Connection reset by peer means the TCP stream was abnormally closed from the other end. A TCP RST was received and the connection is now closed. This occurs when a packet is sent from our end of the connection but the other end does not recognize the connection; it will send back a packet with the RST bit set in order to forcibly close the connection.
“Connection reset by peer” is the TCP/IP equivalent of slamming the phone back on the hook. It’s more polite than merely not replying, leaving one hanging. But it’s not the FIN-ACK expected of the truly polite TCP/IP.
Understanding RST TCP Flag
RST is used to abort connections. It is very useful to troubleshoot a network connection problem.
RST (Reset the connection). Indicates that the connection is being aborted. For active connections, a node sends a TCP segment with the RST flag in response to a TCP segment received on the connection that is incorrect, causing the connection to fail.
The sending of an RST segment for an active connection forcibly terminates the connection, causing data stored in send and receive buffers or in transit to be lost. For TCP connections being established, a node sends an RST segment in response to a connection establishment request to deny the connection attempt. The sender will get Connection Reset by peer error.
Check network connectivity with ping command
Ping the remote host we were connected to. If it doesn’t respond, it might be offline or there might be a network problem along the way. If it does respond, this problem might have been a transient one (so we can reconnect now)
Check remote service port is open
A port is a logical entity which acts as a endpoint of communication associated with an application or process on an Linux operating system. We can use some Linux commands to check remote port status.
Check application log on remote server
For example, if the error is related with SSH. we can debug this on the remote server from sshd logs. The log entries will be in one of the files in the /var/log directory. SSHD will be logging something every time it drops our session.
Oct 22 12:09:10 server internal-sftp: session closed for local user fred from [192.0.2.33]
Check related Linux kernel parameters
Kernel parameter is also related to Connection Reset by peer error. The keepalive concept is very simple: when we set up a TCP connection, we associate a set of timers. Some of these timers deal with the keepalive procedure. When the keepalive timer reaches zero, we send our peer a keepalive probe packet with no data in it and the ACK flag turned on.
we can do this because of the TCP/IP specifications, as a sort of duplicate ACK, and the remote endpoint will have no arguments, as TCP is a stream-oriented protocol. On the other hand, we will receive a reply from the remote host (which doesn’t need to support keepalive at all, just TCP/IP), with no data and the ACK set.
If we receive a reply to we keepalive probe, we can assert that the connection is still up and running without worrying about the user-level implementation. In fact, TCP permits us to handle a stream, not packets, and so a zero-length data packet is not dangerous for the user program.
we usually use tcp keepalive for two tasks:
- Checking for dead peers
- Preventing disconnection due to network inactivity
Check Application heartbeat configuration
Connection Reset by peer error is also related to the application. Certain networking tools (HAproxy, AWS ELB) and equipment (hardware load balancers) may terminate “idle” TCP connections when there is no activity on them for a certain period of time. Most of the time it is not desirable.
We will use rabbitmq as an example. When heartbeats are enabled on a connection, it results in periodic light network traffic. Therefore heartbeats have a side effect of guarding client connections that can go idle for periods of time against premature closure by proxies and load balancers.
With a heartbeat timeout of 30 seconds the connection will produce periodic network traffic roughly every 15 seconds. Activity in the 5 to 15 second range is enough to satisfy the defaults of most popular proxies and load balancers. Also see the section on low timeouts and false positives above.
Check OS metric on peer side
Connection Reset by peer can be triggered by a busy system. we can setup a monitoring for our Linux system to the metrics like CPU, memory, network etc. If the system is too busy, the network will be impacted by this.