Read all of these before you start:
- A load balancer is used to distribute requests to one of a number of Apache web servers. It doesn't matter whether the load balancer is sticky or not, as mod_jk will select the correct tomcat (or the next one if the main one isn't there).
- Apache 1.3 or 2.2 is the webserver, and communicates with the tomcats v5.5 via mod_jk. Apache can decide which tomcat to send to depending on the session ID. Later versions may work differently.
- Multiple tomcats receive requests from multiple Apaches, with either TCP ring or DB persistence.
- There is no complete documentation. A basic setup is fairly straightforward, but is not suitable for production. In-depth knowledge and testing are required.
- This document was created from memory and a quick lookup of the docs. There will be some small items missing, which I'll update when I have time. If you see anything, please email me: simon_hohbs AT hotmail.com
- When you set up session clustering, the tomcat cluster configs are instance specific. That means that if the administrator copies these configs from another server, e.g. from test to live, or from another live server, clustering will fail.
- From the documentation and from experience, using TCP ring at least, TOMCAT DOESN'T SUPPORT HOT DEPLOY WHERE YOU TAKE THEM DOWN ONE AFTER THE OTHER. Tomcat can fail a session over to a second tomcat, but doesn't fail back! In fact, by default, if one tomcat goes down and a second takes over, then when the first comes back up it invalidates the session on both! The best that you can manage using the tomcat session clustering is to fail over to the second one and stay there (using an extra config param). Check out this excerpt from the tomcat cluster guide:
1. TomcatA starts up
2. TomcatB starts up (Wait that TomcatA start is complete)
3. TomcatA receives a request, a session S1 is created.
4. TomcatA crashes
5. TomcatB receives a request for session S1
6. TomcatA starts up
7. TomcatA receives a request, '''''invalidate is called on the session (S1)'''''
8. TomcatB receives a request, for a new session (S2)
9. TomcatA The session S2 expires due to inactivity.
- If you let tomcat do the war unpacking, it can take a long time, during which the server is effectively down. We always unpacked all the wars by script during the actual tomcat downtime.
- The jsessionid can be passed by the webserver in the URL or in the post. If it is in the URL, then when people send links to other people, they are in danger of giving the other person a logged-in session. With the wicket framework, it was not possible to configure it to stop using the URL as well as the post, so we set up an Apache rule to remove it from the URL.
- We had to switch off the default dirty session optimization using useDirtyFlag="false", as some places in the code of one of our dependencies (wicket) were manipulating the session data outside of the official session accessor methods. Switching it off significantly increases the number of session writes, so using the dirty flag would be an advantage if possible. This was reported to be fixed in a later version of wicket.
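To make the dirty-flag problem concrete, here is a toy model in Java. It is not Tomcat's implementation: `ToySession`, `DirtyFlagDemo` and every method on them are hypothetical names invented for illustration. It assumes the optimization only treats a session as changed when an accessor such as `setAttribute` is called, so a change made directly to an object already stored in the session goes unnoticed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a replicated session; NOT Tomcat's real session class. */
class ToySession {
    private final Map<String, Object> attributes = new HashMap<>();
    private final boolean useDirtyFlag;
    private boolean dirty = false;
    private int replicationCount = 0;

    ToySession(boolean useDirtyFlag) {
        this.useDirtyFlag = useDirtyFlag;
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty = true; // only the accessor marks the session as changed
    }

    /** End of a request: replicate if dirty, or always when the flag is off. */
    void endRequest() {
        if (!useDirtyFlag || dirty) {
            replicationCount++;
        }
        dirty = false;
    }

    int getReplicationCount() {
        return replicationCount;
    }
}

class DirtyFlagDemo {
    /** Runs two requests; the second changes a stored list in place. */
    static int replicationsAfterInPlaceChange(boolean useDirtyFlag) {
        ToySession session = new ToySession(useDirtyFlag);
        session.setAttribute("basket", new ArrayList<String>());
        session.endRequest(); // request 1: replicated in both modes

        @SuppressWarnings("unchecked")
        List<String> basket = (List<String>) session.getAttribute("basket");
        basket.add("item-1"); // in-place change, no setAttribute call
        session.endRequest(); // request 2: replicated only when the flag is off
        return session.getReplicationCount();
    }

    public static void main(String[] args) {
        System.out.println("useDirtyFlag=true:  " + replicationsAfterInPlaceChange(true));
        System.out.println("useDirtyFlag=false: " + replicationsAfterInPlaceChange(false));
    }
}
```

In this model the in-place change is lost with the flag on (one replication) and picked up with it off (two replications), at the cost of a write on every request.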
- Compile or download the binary for mod_jk for your OS and apache version.
- Create your worker.properties. Here is one of mine:
# List the workers name
# First worker
# Second worker
# Load Balancer worker
- Added connection_pool_timeout=120. This is not strictly necessary, but would drop unused connections (and hence threads on tomcat). This would be useful after a failover and bring-up scenario, I believe. From the docs: "Each child could open an ajp13 connection if it have to forward a request to Tomcat, creating a new ajp13 thread on Tomcat side. The problem is that after an ajp13 connection is created, the child won't drop it until killed. And since the webserver will keep its childs/ threads running to handle high-load, even it the child/thread handle only static contents, you could finish having many unused ajp13 threads on the Tomcat side."
- Added socket_timeout=20 (s) to the workers. This also helps stop a non-responding tomcat from holding the jk connectors open forever and killing Apache. From the docs: "Socket timeout in seconds used for the communication channel between JK and remote host. If the remote host does not respond inside the timeout specified, JK will generate an error, and retry again. If set to zero (default) JK will wait for an infinite amount of time on all
- Added retries=1 to the workers. If there is a communication error with tomcat, by default it retries 2 times. If this is the credit card submission page, we don't want that. 1 = the first try only (i.e. no retries, I assume).
- Added connect_timeout=20000 to the workers. This should "fix" the problem where apache "hangs" waiting for a hung tomcat. "Connect timeout property told webserver to send a PING request on ajp13 connection after connection is established. The parameter is the delay in milliseconds to wait for the PONG reply. The default value zero disables the timeout (infinite timeout). This features has been added in jk 1.2.6 to avoid problem with hung Tomcat's and require ajp13 ping/pong support which has been implemented on Tomcat 3.3.2+, 4.1.28+ and 5.0.13+. Disabled by default."
- Add the jk stuff to your Apache conf file, e.g.
# Tomcat Connector
LoadModule jk_module /export/www/libexec/mod_jk.so
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkMount / loadbalancer
JkMount /*/ loadbalancer
JkMount /* loadbalancer
JkUnMount /flash/* loadbalancer
JkUnMount /_include/* loadbalancer
JkUnMount /images/* loadbalancer
JkUnMount /media/* loadbalancer
JkUnMount /flv/* loadbalancer
JkUnMount /wallpapers/* loadbalancer
JkUnMount /static/* loadbalancer
JkUnMount /landing.txt loadbalancer
- Enable the JK manager access from inside the network only, e.g.:
Deny from all
Allow from 172.30
- Add the valve to your server.xml (note that each tomcat will be different!):
enableLookups="false" redirectPort="8443" protocol="AJP/1.3" />
<Engine name="Catalina" defaultHost="localhost" jvmRoute="worker1">
<Host name="localhost" appBase="webapps"
<Valve className="org.apache.catalina.cluster.session.JvmRouteBinderValve" enabled="true" sessionIdAttribute="takeoverSessionid"/>
<ClusterListener className="org.apache.catalina.cluster.session.JvmRouteSessionIDBinderListener" />
- I removed .html from the Valve filter, as we have html pages that run through tomcat and therefore require sessions.
- sessionIdAttribute="takeoverSessionid" is required if you want the session to permanently move to the new tomcat. If you don't do this, when the old one comes back up, it immediately invalidates the session.
- This deployment uses TCP ring.
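The routing that jvmRoute and the JvmRouteBinderValve rely on can be pictured with a small Java sketch. It assumes the usual convention that tomcat appends a dot plus its jvmRoute to the session ID and that mod_jk reads that suffix to pick the worker. The class and method names (`SessionRoute`, `routeOf`, `rebind`) are hypothetical, and the real valve does considerably more than a string replacement.

```java
/** Toy helpers for session IDs of the form "<id>.<jvmRoute>"; not Tomcat or mod_jk code. */
class SessionRoute {
    /** Returns the route suffix after the last '.', or null if the ID carries none. */
    static String routeOf(String sessionId) {
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null;
        }
        return sessionId.substring(dot + 1);
    }

    /** Returns the same ID re-pointed at newRoute, as a permanent takeover would need. */
    static String rebind(String sessionId, String newRoute) {
        int dot = sessionId.lastIndexOf('.');
        String base = (dot < 0) ? sessionId : sessionId.substring(0, dot);
        return base + "." + newRoute;
    }

    public static void main(String[] args) {
        String id = "A1B2C3.worker1";              // session created on worker1
        System.out.println(routeOf(id));           // worker1
        System.out.println(rebind(id, "worker2")); // A1B2C3.worker2
    }
}
```

Once the suffix points at the surviving worker, later requests stick to it even after the original tomcat returns, which is the "fail over and stay there" behaviour described above.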
Your web app.
- Yes, you have to change your webapp. Make sure your web.xml has the <distributable/> element, or set distributable="true" on your <Context>.
- All your session attributes must implement java.io.Serializable. This also applies to any frameworks you use.
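A quick way to catch non-serializable session attributes before they reach the cluster is to round-trip them through Java serialization in a test. The sketch below uses only the JDK. `UserPrefs` and `SerializationCheck` are hypothetical names made up for the example, not classes from our application.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Example session attribute; every field type must itself be serializable. */
class UserPrefs implements Serializable {
    private static final long serialVersionUID = 1L;
    final String language;
    final int pageSize;

    UserPrefs(String language, int pageSize) {
        this.language = language;
        this.pageSize = pageSize;
    }
}

class SerializationCheck {
    /** Serializes and deserializes the object, much as replication or persistence would. */
    static Object roundTrip(Object attribute) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attribute); // throws NotSerializableException if unsuitable
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Returns true if the attribute survives a round trip, false otherwise. */
    static boolean isReplicable(Object attribute) {
        try {
            roundTrip(attribute);
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        UserPrefs copy = (UserPrefs) roundTrip(new UserPrefs("en", 25));
        System.out.println(copy.language + " " + copy.pageSize);
        System.out.println(isReplicable(new Object())); // plain Object is not Serializable
    }
}
```

Running a check like this over every object your code and your frameworks put into the session is cheaper than discovering the problem during a failover.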
- Make sure you set up jkstatus, aka jkmanager (the window to mod_jk on Apache), on a non-external URL, and make sure you can hit each Apache with it, even through your load balancer. You can set up automated monitoring on this (e.g. using nagios). Using jkstatus, you can take tomcats up and down quickly to aid scenario testing, without touching the tomcats.
- Use JMX on your test tomcat boxes. This can tell you the exact state of the tomcat cluster, valves and allow you to drill down into the session objects.
- Don't forget the distributable flag in your web.xml
- If the web application fails, either with a page not found or some server error, you have to decide what you want mod_jk on Apache to do. Whether it fails the session over and shuts down access to the tomcat, retries, or gives up is very important.
- In each tomcat's server.xml, you have to make sure it has the correct (and different) jvmRoute="workername".
- Make sure that all the servers are synced with NTP, ideally to UTC.
Conclusion and experiences.
- Setting up load balancing and fail over requires an intimate knowledge of every parameter of mod_jk and of tomcat's valve and cluster. For example, the defaults will cause Apache to fail if a tomcat hangs but keeps the connection open.
- Setting up session replication is tricky, and easy to break if your sys admins are not up to speed on the consequences of any changes they make to any of the configs. E.g. copying a server.xml from one tomcat to another will break the session management but not give any errors.
- With TCP ring at least, Tomcat could not be configured (and is documented as such) to support hot deployment, where you take one down, update it, bring it back up, take the other down, etc. It loses the session.
- For just failover, TCP ring works well and is reasonably easy to configure for a small number of tomcats (e.g. <5).
- DB replication using mysql can support a large number of tomcats and users (10,000 sessions), but does crash with corrupt db at least a couple of times a year. A mysql admin is required to check the db size and health periodically. The indexing scheme is important.
- There are a number of options for when to persist session data. Immediate write-through is the safest option, and also the slowest and highest in bandwidth. We had issues with some data not being persisted when using the dirty policy.
- Resin has far superior session fail over and hot deployment.
- You don't need to use tomcat's session handling. You can do it yourself with a cookie, storing the data yourself in the DB. But make sure you write through, i.e. every change to a variable which needs saving should be saved immediately.
- I didn't have time to try the built-in cluster deployment, by which you deploy to one server and it deploys to the others automatically. I would assume this loses sessions, but it is worth trying.
- Monitoring with jkstatus on live, and jkstatus and JMX on your test setup is essential.
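The write-through idea for hand-rolled session handling, mentioned in the list above, can be sketched in Java as follows. This is a toy under stated assumptions: `BackingStore` stands in for whatever database access layer you use, `InMemoryStore` exists only so the example runs without a database, and none of these names come from our code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical persistence interface; a real one would talk to the DB. */
interface BackingStore {
    void save(String sessionId, String key, String value);
    String load(String sessionId, String key);
}

/** In-memory stand-in so the sketch runs without a database. */
class InMemoryStore implements BackingStore {
    private final Map<String, String> rows = new HashMap<>();
    int writes = 0;

    public void save(String sessionId, String key, String value) {
        rows.put(sessionId + "/" + key, value);
        writes++;
    }

    public String load(String sessionId, String key) {
        return rows.get(sessionId + "/" + key);
    }
}

/** Session keyed by an ID taken from a cookie; every put is persisted immediately. */
class WriteThroughSession {
    private final String sessionId;
    private final BackingStore store;

    WriteThroughSession(String sessionId, BackingStore store) {
        this.sessionId = sessionId;
        this.store = store;
    }

    void put(String key, String value) {
        store.save(sessionId, key, value); // write through: no local-only state
    }

    String get(String key) {
        return store.load(sessionId, key);
    }
}

class WriteThroughDemo {
    public static void main(String[] args) {
        InMemoryStore db = new InMemoryStore();
        new WriteThroughSession("abc123", db).put("basketTotal", "42.00");
        // A different server handling the same cookie sees the value at once.
        System.out.println(new WriteThroughSession("abc123", db).get("basketTotal"));
    }
}
```

Because nothing is held only in server memory, any server that receives the cookie can serve the request, at the price of one store write per change.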