November 16th, 2010
Modify system keepalive times in Linux
Because I always forget how.
In any enterprise-level application environment, you’ll find that your tiers are segregated by firewalls.
In some cases, you may see this type of architecture
FIREWALL -> WEB -> FIREWALL -> APP -> FIREWALL -> DB
or even
FIREWALL -> WEB -> FIREWALL -> APP/DB
In both designs, which are somewhat similar, you may run into keepalive issues.
Keepalives are essentially messages sent between two devices at a specified interval to verify the state of the connection between them. If enough of these messages go unacknowledged, the transmitting device assumes the connection is down and stops using it. For a TCP connection that means the kernel closes it and the application has to open a new one, because a dead connection usually doesn’t re-establish itself.
Keepalives are essential in environments where you’re using connection pools. Web servers may use a connection pool to talk to an application server like Tomcat or WebLogic. Application servers frequently use database connection pools to keep performance optimal.
Most connection pools have a keepalive setting, so you should leverage it when you can. Some do not. Mod_weblogic, for example, doesn’t have its own keepalive value. It can be enabled or disabled, but by default it uses the system keepalive time, which on RHEL/CentOS systems is set to 7200 seconds (two hours).
To check your current system keepalive settings:
# sysctl -a | grep net.ipv4.tcp_keepalive
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl is the number of seconds between successive keepalive probes once probing has started.
net.ipv4.tcp_keepalive_probes tells your system how many unacknowledged keepalive probes to send before considering the connection to be dead.
net.ipv4.tcp_keepalive_time tells your system how many seconds a connection must sit idle after the last packet before the first keepalive probe is sent. This is the biggie!
All three only apply to sockets that have keepalive enabled (the SO_KEEPALIVE socket option).
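Put together, the three values determine roughly how long an idle connection to a dead peer lingers before the kernel gives up on it: the idle time, plus the number of probes multiplied by the interval between them. A minimal sketch of that arithmetic in shell (the function name keepalive_detect_seconds is made up for illustration):

```shell
#!/bin/sh
# Rough worst-case time, in seconds, before an idle connection to a dead
# peer is declared dead: idle time + (probes * interval between probes).
# keepalive_detect_seconds is a hypothetical helper, not a system command.
keepalive_detect_seconds() {
    idle=$1      # net.ipv4.tcp_keepalive_time
    intvl=$2     # net.ipv4.tcp_keepalive_intvl
    probes=$3    # net.ipv4.tcp_keepalive_probes
    echo $(( idle + intvl * probes ))
}

# Defaults shown above: 7200 + 75 * 9
keepalive_detect_seconds 7200 75 9    # prints 7875
```

With the defaults, that works out to a little over two hours and eleven minutes before a dead connection is noticed.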
I don’t understand why 7200 seconds was chosen. In my environment, the firewall can drop idle connections after one hour, and sometimes even sooner depending on how big the connection table gets (I’m looking at you, Check Point).
So I normally trim these down so that the keepalive time is shorter and the number of probes is higher. The interval is also reduced a bit, but that’s not really important. You would normally make these changes on the server that initiates the connection: a web server or an application server, and sometimes a DB server, but not always.
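The rule of thumb is that the keepalive time has to be shorter than the firewall’s idle timeout, otherwise the first probe is sent after the firewall has already forgotten the connection. A small sketch of that check, assuming you know the firewall timeout in seconds (the function name is made up, and 3600 is just the one-hour figure mentioned above):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the first keepalive probe would be sent
# before the firewall's idle timeout expires.
keepalive_beats_firewall() {
    keepalive_time=$1     # seconds, net.ipv4.tcp_keepalive_time
    firewall_timeout=$2   # seconds, assumed firewall idle timeout
    [ "$keepalive_time" -lt "$firewall_timeout" ]
}

# Default keepalive time against a one-hour firewall timeout.
if keepalive_beats_firewall 7200 3600; then
    echo "ok: probes start before the firewall times out"
else
    echo "too late: the firewall drops the idle connection first"
fi
```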
In /etc/sysctl.conf, add these lines (or modify them if they’re already there):
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
net.ipv4.tcp_keepalive_time = 300
To put these settings into effect, run:
sysctl -p /etc/sysctl.conf
Now recheck with sysctl -a | grep net.ipv4.tcp_keepalive to confirm the new values.
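The same values are also exposed as files under /proc/sys/net/ipv4, which is handy in scripts. A minimal sketch (show_keepalive is a made-up helper, and the optional directory argument exists only so it can be pointed somewhere else for testing):

```shell
#!/bin/sh
# Print the three keepalive settings by reading them straight from /proc.
# show_keepalive is a hypothetical helper; the directory defaults to the
# standard Linux location and can be overridden with the first argument.
show_keepalive() {
    dir=${1:-/proc/sys/net/ipv4}
    for name in intvl probes time; do
        printf 'net.ipv4.tcp_keepalive_%s = %s\n' "$name" "$(cat "$dir/tcp_keepalive_$name")"
    done
}

show_keepalive
```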
Once set, you will need to restart your webserver or app server so it sees the new settings. This allows you to start with a fresh set of connections that you can actually monitor using netstat.
You should be able to corroborate the ports, state and number of connections on both ends of the connection, which tells you that things are A-OK!
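One way to see whether keepalive is actually active on those connections is the timer column that netstat -o prints (ss -o shows similar information). A rough sketch, assuming the usual Linux net-tools column layout where the state is the sixth field and the timer name the seventh; count_keepalive_conns and port 7001 are placeholders for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -tno` output on stdin and counts
# ESTABLISHED connections to the given remote port whose timer is "keepalive".
# Assumes the net-tools layout: $5 = foreign address, $6 = state, $7 = timer.
count_keepalive_conns() {
    awk -v port="$1" '
        $6 == "ESTABLISHED" && $5 ~ (":" port "$") && $7 == "keepalive" { n++ }
        END { print n + 0 }
    '
}

# Example (7001 is a placeholder; use your app server's port):
#   netstat -tno | count_keepalive_conns 7001
```

Running it on both ends and comparing the counts is a quick sanity check; a connection whose timer shows "off" is not being kept alive by the kernel at all.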
Hope this helps.


