Keepalive is a TCP feature intended to tell live connections from dead ones among those that appear to be idle, keeping the live ones open and closing the dead ones.
  • If keepalive is enabled on a given connection, after the connection remains idle for a period of time (configurable), a heartbeat probe is sent to the remote node.
  • If the remote node does not acknowledge the probe within a time interval (configurable), the probe has failed.
  • If too many probes fail (configurable), the connection is closed.

The kernel parameters that govern this are kept in /proc/sys/net/ipv4. Their default values (the two times are in seconds; the probe count is a plain number) seem sensible to me, but might be excessive for some situations:
cat /proc/sys/net/ipv4/tcp_keepalive_time
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
cat /proc/sys/net/ipv4/tcp_keepalive_probes

With the default settings (7200, 75, and 9, respectively), the kernel's TCP/IP stack takes no action until a connection has been apparently idle for 2 hours.
Then probes are sent. For each probe, we listen for a response for 75 seconds.
A total of 9 probes must be sent and fail before the connection is closed.

The total elapsed idle time is 2 hours, 11 minutes, and 15 seconds.
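
The arithmetic behind that figure can be sketched in a few lines of shell. This is only an illustration: the function names keepalive_timeout and fmt_hms are made up for this example, and the values passed in are the defaults quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not a standard tool): worst-case seconds before the
# kernel closes a dead, idle connection.
# Arguments: idle time (s), probe interval (s), number of probes.
keepalive_timeout() {
    echo $(( $1 + $2 * $3 ))
}

# Hypothetical helper: render a number of seconds as "Hh Mm Ss".
fmt_hms() {
    echo "$(( $1 / 3600 ))h $(( $1 % 3600 / 60 ))m $(( $1 % 60 ))s"
}

# Defaults: 7200 s idle, 75 s per probe, 9 probes.
fmt_hms "$(keepalive_timeout 7200 75 9)"   # prints 2h 11m 15s
```

On a live system the three arguments could instead be read from the files under /proc/sys/net/ipv4 shown earlier, e.g. "$(cat /proc/sys/net/ipv4/tcp_keepalive_time)" for the first one.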

More aggressive settings may be useful to some people in two ways: preventing active service sessions from being artificially terminated (by keeping them alive), and cleaning up half-closed connections. The downside is a few additional packets, which might have been more of an issue in days past. Note that probing only starts if the connection appears to be idle; it does not occur during normal, active connections. Also note that this only applies to connections for which you have enabled the keepalive option.

Nearly every service operates behind a firewall. Depending on how it is configured, a firewall may terminate an apparently idle connection after as little as 15 minutes on a busy host. On the other hand, the default "close_wait" time on a stateful Linux iptables firewall is 3 days (last time I looked).

So, the default values may be fine for 95% of use cases, but modified settings may be beneficial to those whose connections are inexplicably dying or those whose firewalls are overloaded by tracking dead connections (e.g. from "hang-up call"-style probes from script kiddies).

These would be set like any other kernel parameter (these are example values only, not suggested values):
echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 45 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 4 > /proc/sys/net/ipv4/tcp_keepalive_probes

The changes would be made persistent with an addition such as the following to /etc/sysctl.conf:
## Keepalive at 10 minutes; terminate idle at 13 minutes

# start probing for heartbeat after 10 idle minutes (default 7200 sec)
net.ipv4.tcp_keepalive_time = 600

# close connection after 4 unanswered probes (default 9)
net.ipv4.tcp_keepalive_probes = 4

# wait 45 seconds for response to each probe (default 75)
net.ipv4.tcp_keepalive_intvl = 45

This is nice; you very rarely see settings like this explained so clearly, along with recommended values.

Any more tips like this are quite welcome, thanks.
