This past week has been one heck of a roller coaster. Maintaining a web application is one thing; maintaining a web application that crashes every 10 minutes is a totally different thing.
This particular application is built in Java and uses Spring and Hibernate to store and retrieve data from a MySQL database. We use c3p0 for connection pooling.
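For context, c3p0 in a Spring/Hibernate stack is usually tuned through Hibernate's passthrough properties. The values below are illustrative rather than what we actually ran; the two `c3p0.*` debug settings are worth knowing about because they flag connections that are checked out and never returned:

```properties
# hibernate.properties: illustrative values, not our production settings
hibernate.c3p0.min_size=5
hibernate.c3p0.max_size=50
# seconds an idle connection is kept before being discarded
hibernate.c3p0.timeout=300
hibernate.c3p0.max_statements=100
hibernate.c3p0.idle_test_period=120

# c3p0.properties: debug aids for hunting leaked (never-returned) connections
c3p0.unreturnedConnectionTimeout=60
c3p0.debugUnreturnedConnectionStackTraces=true
```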
Everything was going smoothly until a week ago, when we pushed a few updates that restricted the workflow of certain processes per the client's request. Then the crashes began: within 10 to 20 minutes the application would hit max connections and lock out all users.
MySQL goes away, then the application promptly goes numb, taking Glassfish down with it. What to do!
The immediate thing I thought of was to increase the memory allocated to the `JVM` to salvage Glassfish and my application, but the application still crashed. At least Glassfish no longer froze, so I could log in to the admin console and check the logs.
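On Glassfish, heap settings live in the `java-config` section of `domain.xml` (or can be set with `asadmin create-jvm-options`). A sketch with example values only; the heap-dump flag is handy for post-mortems when the JVM does run out of memory:

```xml
<!-- domain.xml, java-config section: values are examples, size to your hardware -->
<jvm-options>-Xms512m</jvm-options>
<jvm-options>-Xmx2048m</jvm-options>
<jvm-options>-XX:+HeapDumpOnOutOfMemoryError</jvm-options>
```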
There was a spike in the number of connections just before each crash, so naturally my next target was optimizing the database. Here is where I went round in circles: query cache, table cache, max connections and buffer pool size. Mysqltuner did a great job with its recommendations, but since I could not keep the application running for more than 24 hours straight, they were of limited use. Sometimes they bought me an extra 20 minutes or so; sometimes the application went down in less than 5 minutes. (I learned a lot about the power of the InnoDB storage engine here.)
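For reference, these are the knobs that round trip usually touches, expressed as a `my.cnf` fragment. The values are placeholders, not recommendations; size the buffer pool to your RAM and working set, and note that the query cache only exists in MySQL 5.x:

```ini
[mysqld]
# Illustrative values only
innodb_buffer_pool_size = 1G
max_connections         = 200
table_open_cache        = 400
query_cache_size        = 64M

# Capture the queries mysqltuner complains about
slow_query_log          = 1
long_query_time         = 2
```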
That left one more target, where I should have started: the application itself. By this point I had installed and tested several server monitors and performance agents, and finally settled on **Monitis.com**. The main indicators I followed were:
* Overall system memory
* Java memory utilization
* MySQL memory utilization
* Threads connected to MySQL
* Slow queries
A lot of staring at graphs and tiptoeing through log files followed. I found lists and queries that needed to be optimized. Old code that looped through Hibernate Sets was thrown out of the window in favour of Projections and direct findByQuery alternatives where possible.
It is best to let the database handle as much of the calculation and comparison as possible; it is far faster, especially when the columns involved are indexed.
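To illustrate the difference, here is a stdlib-only sketch. Hibernate's actual Criteria/Projections calls need a live `SessionFactory`, so they appear only in comments; the entity name `Invoice` and its columns are made up for this example:

```java
import java.util.List;
import java.util.Map;

public class ProjectionSketch {

    // Simulated rows of a hypothetical invoice table.
    static final List<Map<String, Object>> ROWS = List.of(
            Map.of("id", 1L, "customer", "acme", "amount", 120.0),
            Map.of("id", 2L, "customer", "acme", "amount", 80.0),
            Map.of("id", 3L, "customer", "globex", "amount", 40.0));

    // Old style: load full entities, e.g. session.createCriteria(Invoice.class).list(),
    // then loop in Java. Every column, blobs included, rides along for each row.
    static double sumInJava(String customer) {
        double total = 0;
        for (Map<String, Object> row : ROWS) {
            if (customer.equals(row.get("customer"))) {
                total += (double) row.get("amount");
            }
        }
        return total;
    }

    // Projection style: ask the database for the answer, roughly
    //   criteria.add(Restrictions.eq("customer", customer))
    //           .setProjection(Projections.sum("amount"));
    // so only a single number crosses the wire. Simulated here with a stream.
    static double sumViaProjection(String customer) {
        return ROWS.stream()
                .filter(r -> customer.equals(r.get("customer")))
                .mapToDouble(r -> (double) r.get("amount"))
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumInJava("acme"));        // 200.0
        System.out.println(sumViaProjection("acme")); // 200.0
    }
}
```

The results are identical, but in the real application the projection version never materializes the entities at all, which is exactly what keeps connections from being held open while Java churns through Sets.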
These improvements made a significant difference to the application's performance, but it was still crashing after a while: not as frequently now, but still crashing. I had cured almost all, if not all, of the symptoms, but the cause remained elusive.
So I went back to the logs, this time not combing through the entire files but just the 5 minutes before and after the connections spiked. What I realized is that every time connections spiked (rarely now, since the other contenders had been neutralized), one particular table was being extensively accessed, and interestingly enough we had added a blob column to that table to handle file attachments. I immediately suspected that checked-out connections were getting held up there: every scan of that table had to wade through the large blob data, even when what the application wanted was not the attachment at all.
My immediate thought was to rework the application to save attachments to file, but that would be a headache for already existing attachments. I did some Googling about indexing blob columns, but that did not look like it would solve the issue, since table scans would still spend time on the blobs. Then I saw a suggestion to keep attachments in a separate table, and I immediately knew that was the savior. B)
Implementing that brought my graphs and logs down to a slow, normal, wavy rhythm, and that is when I knew I had nailed it. What separating the attachment column into its own table does is simple: the new table, holding the blob column and a foreign key back to the original table, is only touched when it is actually needed, which is only during insert and file view. All the data in the original table can be accessed as frequently as necessary without scanning through blobs.
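A sketch of the migration, with hypothetical table and column names (`document`, `attachment`); adapt to your own schema:

```sql
-- New narrow table: just the blob plus a foreign key back to the parent row.
CREATE TABLE document_attachment (
    document_id BIGINT NOT NULL PRIMARY KEY,
    attachment  LONGBLOB NOT NULL,
    CONSTRAINT fk_document_attachment
        FOREIGN KEY (document_id) REFERENCES document (id)
) ENGINE=InnoDB;

-- Move the existing blobs over, then drop the wide column.
INSERT INTO document_attachment (document_id, attachment)
SELECT id, attachment FROM document WHERE attachment IS NOT NULL;

ALTER TABLE document DROP COLUMN attachment;
```

On the Hibernate side, the new table maps naturally as a lazily fetched association, so the blob is only loaded when the code explicitly asks for the file.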
Important things to note
- The issue is almost always in your application; optimizing the other systems helps, but only postpones the inevitable.
- Keep your blobs away from frequently accessed tables
- Make your logs friendly; you never know when you will need them. Log4j does a great job for me.
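A minimal log4j 1.x setup along the lines of what I mean: timestamped, rolled into manageable files so they stay greppable when you are diffing the minutes around a spike. The path and sizes are examples only:

```properties
# log4j.properties: illustrative setup
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/var/log/myapp/app.log
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
```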
Finally, let's meet at [sub]string.devops.co.ke for more dev-oriented discussions.