Session management is on of the most mission critical services
the application server provides to our applications. Managing user centric data on the server, and tying that data to
a web based client, makes it possible to create a Servlet and JSP application
that maintains meaningful state over an Http network.
From a performance perspective, the session management features
of a J2EE application server do not come for free. In fact, poorly managed sessions, or lazy programmers placing far
too much information into an HttpSession can ground your applications to a
halt.
Poorly managed HttpSessions are the second most common bottleneck
in a J2EE application. What is the most
common performance bottleneck in J2EE applications, you wonder? Bar none, poorly managed database
connections represent the biggest performance bottleneck in a J2EE application.
What is the best way to address the performance problem
associated with database connection? The best solution is to use a connection
pool, and manage that connection pool effectively.
What is a DataSource?
From a J2EE perspective, a DataSource is just another name for a
pool of database connections. For us, a DataSource is a connection pool.
Why do we need to use DataSources?
Connecting to a database, maintaining that connection, and then
subsequently tearing that connection down is a highly resource intensive
activity. Interacting with a database is one of the most significant
performance cost your applications will have.
|
|
Performance specialist
say that 80% of all performance gains can be achieved by fine tuning your use
of database connections.
Of course, the same
group of people also say that 80% of all performance gains are achieved by
fine tuning the type of data you place into your HttpSessions, so you’ve got
to take these types of statements with a grain of salt.
The point remains.
Connecting to a database has a high cost associated with it.
|
With a typical J2EE application, almost every user is going to
interact with your database at some point in time, and those interactions are
likely going to be very short and to the point. The client will grab data from your database, surf your site for
a while, and then maybe save some data to your database. But each client will by no means need a
permanent or extended connection to your data.
The Database Connectivity Problem
Now here’s the problem: Setting up and tearing down database
connections is extremely resource intensive.
Instead of creating a new connection every time a client needs
access to database, a developer should set up a collection of connections. One
hundred client may be accessing our site, but we may be able to service all one
hundred clients with only ten database connections. To do this, we set up a
pool of semi-permanent database connections, and when a client needs a
connection, we just give then one out of the collection.
Rather than creating and destroying a database connection every
time a client wants to access our database, we just create a small pool of
connections, and grab a connection from the pool when a client needs one.
Connection pooling is incredibly efficient, and it frees up a
bunch of resources that can then be used to address all of the other
bottlenecks in your application.
How do I create a connection pool?
A few years ago, if you wanted to pool your database connections, you had to either write your own
code, or download some code off of the internet that implemented pooling. You
then used that connection pooling code in your applications.
Of course, databases are very complicated things; far too
complicated for us lowly Java developers to try and write connection pooling
code for. As a result, the custom pooling code, or even the code that may have
been downloaded off the internet, usually resulted in stale connections, and
all sorts of cryptic SQLExceptions.
To address both the need for connection pooling, and the need for
a good implementation of connection pooling, open source projects like Struts
and Cocoon built connection pooling right into their framework. These
implementations usually work pretty well.
Of course, not everybody is using Struts and Cocoon, but every
application certainly needs them, so the J2EE community brought the concept of
connection pooling into the specification. Now, all J2EE compliant application
servers must implement a connection pooling service that can be created and
configured by an administrator, and subsequently accessed programmatically
through the JDBC extensions API.
I am using WebSphere and Struts.
Should I use the pooling facilities of Struts or WebSphere?
While open source projects such as Cocoon and Struts provide
connection pooling facilities, if you are using WebSphere, you really should be
using the connection pooling facilities of WebSphere.
First of all, connection pooling is a significant competition
point as far as J2EE application servers go, and a big part of having the best
J2EE application server on the market is having a very efficient connection
pooling mechanism. IBM puts a lot of work into making their connection pools as
efficient as possible.
Secondly, if you are using a DB2 database, IBM has built all
sorts of wizardry and magic into their DB2 connection pools. Believe it or not,
but IBM knows a lot more about DB2 than you, me, or even those clever cucumbers
at open source projects like Struts and Cocoon.
Thirdly, WebSphere monitoring tools such as Resource Analyzer can
track WebSphere connection pools in much more detail than they can custom
connection pools. Monitoring various properties of connection pools is pivotal
in optimizing their configuration.
Fourthly, if you have a problem with your WebSphere connection
pool, you can get fast, quality support from IBM. This won’t likely be the case
with open source connection pools.
Finally, you paid for WebSphere, so why not use it? IBM works
pretty darn hard to get their connection pooling to work right. Take advantage
of all of that hard work.
What is a JDBC provider?
In order for WebSphere to connect to a database, it needs some
software provided by the software vendor that makes that database . This
database connection software is known as a database driver.
IBM provides a set of database drivers that makes it possible to
connect to db2. These database drivers are zipped up in a file named
db2java.zip. Oracle and Microsoft also provide drivers that make connecting to
their databases possible.
A JDBC provider states the name of your database driver, the type
of database to which that driver connects, and where that driver is stored on
your file system.
Before you can create a connection pool, you must first register
a JDBC provider through which WebSphere can establish connections to your
database.
What properties can I configure for
a connection pool?
Administratively setting up a connection pool, and having your
developers simply access that connection pool simply isn’t good enough. If you
want to optimize performance, you must tweak your connection pool. Here’s a
look at those tweakable parameters:
Minimum Connections:
This setting is the lower limit on the
number of connections that will exist in your pool. This is also the number of connections that will be created when
the server starts up for the first time. Keep this number as low as possible,
since lazy initialization, which implies initializing resources only when they
are first needed, is a preferred design practice.
Maximum Connections:
This is the maximum number of connections
that can ever exist in the pool.
You may have a license restriction on the
number of connections your database can maintain. In that case, you maximum
number of connections might equal the number of licenses you have.
From a performance standpoint, you should
have done some type of load testing on your database, and from this testing,
come to a conclusion about what the maximum number of connections is that your
backend can maintain before a significant degradation in performance is
observed. The maximum connections parameter should then be set to this upper
performance limit. After all, what’s the point of supporting one more user, if
it destroys the experience of the thousand users already using your site?
Developers should prepare for handling
the exceptions that get thrown with the connection limit has been reached. When
this happens, new users who require a database connection should be given a
polite message about traffic being excessive. Politely ask them to come back
later, and they will.
Unused Timeout:
After a connection has splashed around in
the pool for a certain amount of time without actually being used, it’s obvious
that the connection isn’t needed anymore. The unused timeout setting dictates
how long a connection can fool around in the pool before it’s told to towel off
and go home.
Aged Timeout:
Aged timeout is how long a connection can
exist before it’s removed from the pool. The connection may be perfectly
healthy when it reaches its aged timeout, but it will be destroyed regardless.
Quite often, firewalls and proxies will
have a maximum amount of time that a connection can exist through them. Setting the aged timeout to be less than the
firewall or proxy timeout will help you avoid getting mysterious stale
connections.
Reap Time:
Reap time represents how often the
connection pooling thread, that times out old connections, and removes unused
connections from the pool, will run.
Connection Timeout:
If all the connections in the pool are in
use, the connection timeout represents how long the DataSource will let a
client wait for a connection to become available. If this time limit is hit, a ConnectionTimeoutException, a
subclass of SQLException, is unapologetically thrown back to the client.
Purge Policy
When a stale connection, or fatal
connection error happens, you have the option of terminating the failing
connection, or purging every database connection in the pool.
When accessing a connection pool,
my developers are providing unique usernames and passwords. How does this
effect pooling?
This completely defeats the process of database pooling.
All connections in a pool must have the same access rights
associated with them. If you create a
new connection with a different username and password, it will exist in its own
separate pool. If every connection is given its own username and password, each
connection is a pool unto itself.
Be careful when using usernames and passwords to create
DataSources. To track database access, using a logging mechanism would be a
better option than matching each connection with a username and password.
How does a connection pool work?
When your server starts up, the connection pool process creates a
number of connections to your database; this is based upon the minimum number
of connections specified in the DataSource properties window.
Each time a client asks for a connection, the DataSource gives
the client one from the pool. When a client is done with the connection, the
connection is returned to the pool, rather than being destroyed.
If the number of connections in use equals the number of
connections in the pool, the DataSource throws a new connection into the pool,
just in case a client makes a request for one.
The DataSource will keep throwing connections into the pool until the
number of connections in use stabilizes, or the maximum number of connections
allowed is hit.
If the maximum number of connections is hit, and a client asks
for another connection, the client will wait the number of seconds specified by
the wait timeout property of the DataSource. If none of the existing
connections are returned to the pool in that time, the client will be thrown a
ConnectionWaitTimeoutException.
Every once in a while, as specified by the reap time, the
DataSource will poll each of the connections and ask them how old they are and
how long it has been since they were last used by a client. If any connections have either hit the aged
timeout, or the unused timeout, those connections are removed from the pool.
If database traffic dies down, connections will continue to be
removed from the pool, due to the unused timeout, until the minimum number of
connections is met. The number of connections will remain at that minimum
number until database activity increases.
This is the ying and the yang associated with managing database
connections through a DataSource.