What is a Session?
A session, or more accurately, an HttpSession, is probably the
most important object in the entire Servlet API. Its importance cannot be
overstated.
First of all, an HttpSession object has nothing to do with a
Session EJB, so let’s get that association out of your mind right away. It is a
completely different monster. An HttpSession is a server-managed component
completely independent of the EJB spec.
The Session Issue
Here’s the issue: HTTP, the protocol we use to surf
the Internet and browse websites, is stateless. Once a web server is done handling a client request, it
completely forgets about what it just did, kinda like that fish in Finding
Nemo.
HTTP treats every request as a brand spanking new,
unassociated request, regardless of how many client-server interactions may
have happened with a particular client in the past.
A web server, using HTTP alone, doesn’t maintain any
meaningful state with a client.
What is the impact of a stateless protocol on our applications?
A stateless protocol, such as HTTP, causes all sorts of problems
for our web-based applications. For
example, if someone surfs to our site and sees a product they want to buy, we
want to keep track of that information until the user decides to check out.
Or perhaps a user is taking an online exam: we would want to keep
track of every answer the user has provided.
HTTP provides no mechanism for keeping track of a user’s
actions from one request to the next.
That’s where the Servlet API, and more specifically, the HttpSession,
comes in.
Canada and the HTTP protocol have one thing in common: they are both stateless.
Canada has ten provinces and three relatively empty territories, but no
states. Of course, they have been eyeing Michigan for a while.
What is the purpose of the HttpSession?
The HttpSession adds state to a stateless, web-based interaction
with a server. When a client makes a call to an application server, a Servlet
developer can programmatically create and associate an HttpSession with that
client. The session can then be used to keep track of all sorts of information
about the user.
If the user tells us their favorite color, we can store that
information in their session. If the
user gives us their address and phone number, we can store that in their
session. If they’re taking an online
exam, we can put the answer to every question they’ve been given into the
session as well. We can then go back
into that session object, at any point in time, and pull that information out.
So when a user is done taking an online exam, we can go into
their session and find out which questions they got correct, and which
questions they got wrong. If the client
is picking out books, or other products they want to purchase, when they click
‘check out,’ we can go into their session, process their order, and tell them
how much their purchase will be.
HttpSessions add state to a stateless protocol, and they are
pivotal in making online applications work.
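To make the online exam example concrete, here is a minimal sketch in plain Java. SimpleSession is a hypothetical stand-in that only mimics the setAttribute and getAttribute methods of the real HttpSession; in a Servlet you would obtain the real object from the request instead. The attribute names and the answer key are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a session: a bag of named attributes kept on the server.
class SimpleSession {
    private final Map<String, Object> attributes = new HashMap<>();

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }
}

class ExamDemo {
    // Counts how many answers stored in the session match the answer key.
    static int score(SimpleSession session, Map<String, String> answerKey) {
        int correct = 0;
        for (Map.Entry<String, String> entry : answerKey.entrySet()) {
            Object given = session.getAttribute(entry.getKey());
            if (entry.getValue().equals(given)) {
                correct++;
            }
        }
        return correct;
    }

    public static void main(String[] args) {
        SimpleSession session = new SimpleSession();
        // Each request stores one more piece of information about the user.
        session.setAttribute("favoriteColor", "blue");
        session.setAttribute("q1", "B");
        session.setAttribute("q2", "D");

        Map<String, String> answerKey = new HashMap<>();
        answerKey.put("q1", "B");
        answerKey.put("q2", "A");

        // When the exam is done, we go back into the session and mark it.
        System.out.println("Correct answers: " + score(session, answerKey));
    }
}
```

The point is simply that the answers pile up on the server between requests, and the marking happens later by reading them back out.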
How does WebSphere map a session to a client?
So, we’ve got this miraculous component that creates a stateful
experience out of a stateless protocol.
The question is, how does WebSphere do it?
Well, when an application server creates a session for a user, a
crazy, unique number is generated that immediately gets associated with that
user. Then, any time a developer puts
information into a user’s session, the information is automatically associated
with the client’s unique id. This all
happens securely on the server.
In order to tie a client’s browser to their session, WebSphere
plants a transient cookie on the client’s machine. This transient cookie contains that
crazy unique number, the session id, that was generated when the user’s session was created.
On every subsequent request made by the client to the
server, the cookie containing the session id is sent back across the network,
and voilà, the server ties the user to their session.
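The id-and-cookie handshake described above can be sketched in a few lines of plain Java. This is not how WebSphere implements it internally; SessionRegistry is a hypothetical class, and the cookie name JSESSIONID is assumed here because it is the conventional name from the Servlet specification.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side registry: session id -> that client's attributes.
class SessionRegistry {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Creates a session and returns the hard-to-guess id that goes into the cookie.
    String createSession() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder id = new StringBuilder();
        for (byte b : bytes) {
            id.append(String.format("%02x", b));
        }
        sessions.put(id.toString(), new ConcurrentHashMap<>());
        return id.toString();
    }

    // Finds the session named by a Cookie request header, or null if there is none.
    Map<String, Object> resolve(String cookieHeader) {
        if (cookieHeader == null) {
            return null;
        }
        for (String pair : cookieHeader.split(";")) {
            String[] nameAndValue = pair.trim().split("=", 2);
            // Assumed cookie name; only the id travels, the data stays on the server.
            if (nameAndValue.length == 2 && nameAndValue[0].equals("JSESSIONID")) {
                return sessions.get(nameAndValue[1]);
            }
        }
        return null;
    }
}
```

Notice that the browser only ever holds the id. Everything stored in the session stays in the server-side map.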
What is a transient cookie?
WebSphere plants a transient cookie on the client machine through
the client’s web browser.
A transient cookie is maintained in the memory of the
browser, and will never actually be written to a text file. All other types of
cookies actually get stored persistently on the user’s hard drive.
Because the cookie is resident in memory, when a client closes
their browser, the cookie is lost, and with it the client’s link to their session. The session itself lingers on the server until it times out.
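On the wire, the difference between a transient cookie and a persistent one comes down to whether the server sends a lifetime with it. The sketch below builds the two header values by hand; CookieHeaders and its parameters are hypothetical names, and a real application server generates this header for you.

```java
// Hypothetical helper that builds the value of a Set-Cookie response header.
class CookieHeaders {
    // A negative maxAgeSeconds means "transient": no Max-Age attribute is sent,
    // so the browser is expected to keep the cookie in memory for this browser session only.
    static String setCookie(String name, String value, int maxAgeSeconds) {
        StringBuilder header = new StringBuilder(name)
                .append('=')
                .append(value)
                .append("; Path=/");
        if (maxAgeSeconds >= 0) {
            // A lifetime tells the browser to store the cookie persistently.
            header.append("; Max-Age=").append(maxAgeSeconds);
        }
        return header.toString();
    }
}
```

Calling setCookie("JSESSIONID", "123xyz", -1) gives a transient cookie, while setCookie("theme", "dark", 3600) gives one that the browser may write to disk and keep for an hour.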
Some clients see cookies as a potential security loophole,
even though a cookie is just a small piece of text that the browser stores and sends back to the site that set it. To appease the security conscious, users can
configure their web browsers to accept transient cookies, but not cookies that
actually get written to the hard drive.
As a result, transient cookies let you associate sessions
with more clients than cookies that persist to a user’s hard drive would.
If cookies are turned off, will my applications stop working?
Yes, if cookies are turned off, your WebSphere applications will
stop working - with the default WebSphere configuration at least. Cookies are
the preferred mechanism for storing the session id on a client machine.
If your clients turn cookies off, your applications won’t work.
This may seem detrimental to your architecture, but let me ask you a question:
what are you losing if you can’t service clients that are afraid of cookies?
The fact is, if a customer is afraid that a simple little
text file called a cookie is going to wreak havoc on their computer, then what
are the odds that you are going to get that paranoid user’s email address, home
address, or credit card number? I’d say
the odds are somewhere between zero and negative none.
A fear of cookies is irrational.
You really shouldn’t worry about not being able to service these
irrational clients.
Are there other options for managing session ids?
WebSphere provides three mechanisms for tying a user to their session:
• Cookies
• URL rewriting
• SSL ID tracking
What is URL rewriting?
URL rewriting is a great session id mapping option because,
unlike cookies, it can’t be turned off.
This may sound appealing at first, but it is by no means a panacea for
session id tracking.
When you use URL rewriting, the unique session id of a client
gets added to the end of every link on a web page. So instead of a user going to:
• www.ibm.com/index.html
they go to:
• www.ibm.com/index.html?sessionid=123xyz
As you can see, the user’s session id gets attached to the URL
that is used when making a subsequent request back to the server.
The good news is that by adding the session id to every link, URL
rewriting always works and it can’t be turned off. The bad news? Well, the
session id must be added to every link on every
web page on every part of your website.
How do you implement URL rewriting?
How do you add that special id to the end of every URL or link a
user might click on? Well, you have to
encode the URL, which means writing Java code wherever you see a hyperlink, and
turning every single page in your website into a JSP. After all, Java code won’t work if it’s in a regular HTML page.
So, implementing URL rewriting means extra work for the web
development team, and it’s going to place a much greater load on your
application server.
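As a rough illustration of what "encoding the URL" means, here is a sketch that follows the ?sessionid=123xyz form used in the example above. UrlRewriter is a hypothetical helper, not a WebSphere class. In a real Servlet or JSP you would call HttpServletResponse.encodeURL on each link and let the container decide the exact format, which is typically a ;jsessionid= path parameter rather than a query parameter.

```java
// Hypothetical helper that mimics what URL encoding does for session tracking.
class UrlRewriter {
    // Appends the session id as a query parameter, using '&' when the URL
    // already has a query string. Fragments (#...) are not handled in this sketch.
    static String encode(String url, String sessionId) {
        if (sessionId == null || sessionId.isEmpty()) {
            return url; // no session yet, nothing to add
        }
        char separator = url.contains("?") ? '&' : '?';
        return url + separator + "sessionid=" + sessionId;
    }
}
```

Every link on every generated page has to pass through a call like this, which is exactly why static HTML pages no longer fit into the picture.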
Think about it. When you use URL rewriting, you cannot allow
your users to navigate to a single static HTML page within your website.
HTML pages are not dynamic, and if your user navigates to a static
HTML page, their session id will be lost. If you have a significant amount of
static HTML content, your application server is going to have to assume the job
of generating all of that static content and delivering it to the client with
the session id embedded at the end of every internal link that might appear on
the web page. That’s a lot of extra
work for the server.
How does URL rewriting affect the client?
URL rewriting makes client side navigation much less pleasant.
Imagine a user is about to buy a big ticket item off your
website, but just before they do, they go to yahoo.com to check their stock
portfolio. After seeing how well their
IBM stock is doing, they then type in your web site address to make their
purchase, only to find out that all of the information they had provided about
their big order has been destroyed.
That has happened because when they typed your address into the browser,
there was no reference to their session id.
If you were using cookies, your client wouldn’t run into this problem.
What is Secure Sockets Layer (SSL) ID tracking?
Because an SSL connection already carries a unique id that identifies the encrypted conversation
between the client and the server, the application server can piggyback off
this SSL id to keep track of the client’s session.
The two drawbacks to this approach are that SSL must be
enabled, and that SSL ID tracking is only supported by the IBM HTTP Server
(IHS) and the SunOne web server.
If you have a highly secure website, this might be a very
feasible option for maintaining session ids. Furthermore, we’ll probably see
more web servers supported in the future.
What is Session affinity?
The J2EE spec requires an application server to implement session
affinity, also known as ‘sticky sessions.’
Implementing workload management means setting up multiple Java
Virtual Machines that can handle client requests. In a workload-managed
environment, if a client requests a JSP or a Servlet, there may be several JVMs
capable of handling the request.
Session affinity states that when a server/JVM creates a session
for a user, every subsequent request from that user must be directed back to
the server or JVM that created it. A user is tied to the server that
created their session, thus the term ‘sticky sessions.’
Managing session data is quite a big job for the application
server. Ensuring that a user is always
sent back to the JVM that created the user’s session makes sure that session
data can be pulled out of memory quickly and efficiently.
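Sticky routing is easy to picture in code. The sketch below is a toy router, not the WebSphere plug-in: StickyRouter and the JVM names are hypothetical, and it assumes the first request of a session is handed out round-robin.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical router that always sends a session back to the JVM that created it.
class StickyRouter {
    private final List<String> servers;
    private final Map<String, String> owner = new HashMap<>(); // session id -> server
    private int next = 0;

    StickyRouter(List<String> servers) {
        this.servers = servers;
    }

    // The first request for a session picks a server round-robin;
    // every later request with the same session id returns to that server.
    synchronized String route(String sessionId) {
        return owner.computeIfAbsent(sessionId, id -> {
            String chosen = servers.get(next);
            next = (next + 1) % servers.size();
            return chosen;
        });
    }
}
```

With two JVMs, sessions s1 and s2 land on different servers, and any later request for s1 goes straight back to the JVM that already holds its data in memory.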
What are the implications of session affinity on workload management and failover?
According to session affinity, if a JVM holding a given user’s
session information goes down, all of that user’s session information is lost.
It’s kinda weird that they’d have something in the J2EE spec that seems to work
against the whole idea of workload management and failover, but believe it or
not, it’s in there.
How can we provide session failover with WebSphere?
Fortunately, our beloved WebSphere goes over and above the spec
by providing several options for persisting a session beyond the JVM in which
it was created. The two supported mechanisms for providing session failover are:
• Memory-to-memory session replication
• Database persistence
Until WebSphere 5, using a centralized database to store session
information was the only option available for persisting sessions. By and
large, persisting sessions works pretty well.
How do persistent sessions work?
The basic idea behind persistent sessions is that whenever a
session is created, the application server writes the session id to a special
‘session database.’ Any information subsequently stuffed into a user’s session
gets added to the persistent session database.
If the server that initially created the user’s session fails, any
other server in the cluster can pick up from where the failed server left
off. If one JVM goes down, any
alternate server can look up the client’s information in the session database
using the session id that gets passed back to the server through the client’s
cookie. A redundant server can then
pull any required session information out of the central persistent database.
Any problems on the server are completely transparent: if something does go
wrong, the client never even knows there was a problem.
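The failover story can be sketched with two toy servers sharing one store. Everything here is hypothetical: SessionDatabase is an in-memory map standing in for the real session database, and AppServer writes through to it on every update, which is only one of several possible write strategies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the central session database shared by the cluster.
class SessionDatabase {
    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    void save(String sessionId, String name, String value) {
        rows.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>()).put(name, value);
    }

    String load(String sessionId, String name) {
        Map<String, String> row = rows.get(sessionId);
        return row == null ? null : row.get(name);
    }
}

// Hypothetical application server with its own in-memory copy of session data.
class AppServer {
    private final SessionDatabase database;
    // Local copy; this is what disappears if the JVM goes down.
    private final Map<String, Map<String, String>> memory = new HashMap<>();

    AppServer(SessionDatabase database) {
        this.database = database;
    }

    void put(String sessionId, String name, String value) {
        memory.computeIfAbsent(sessionId, id -> new HashMap<>()).put(name, value);
        database.save(sessionId, name, value); // write through to the shared store
    }

    String get(String sessionId, String name) {
        Map<String, String> local = memory.get(sessionId);
        if (local != null && local.containsKey(name)) {
            return local.get(name);
        }
        // Not in local memory: another server created this session, so ask the database.
        return database.load(sessionId, name);
    }
}
```

If server A stores a shopping cart under session id 123xyz and then dies, server B can still answer get("123xyz", "cart") because the id arrives in the client's cookie and the data is waiting in the shared store.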
Of course, this scenario isn’t without its drawbacks. First of all, persisting sessions requires
quite a few database writes, which is resource intensive. Persisting session
information to a database is going to slow your applications down a tad.
Furthermore, the database becomes a single point of failure
itself. If maintaining a robust
application server is important to you, you had better either use DB2, which we all
know never fails, or else cluster your database servers.
What is ‘memory to memory’ session replication?
If persisting sessions to a centralized database isn’t up your
alley, WebSphere 5 presents a new option for session failover: memory-to-memory
session replication.
The idea behind memory replication is that if servers are
clustered to support an application, you can configure the application servers
in the cluster to pass session information back and forth from one JVM to the
other. Each application server keeps a copy of the session data being used by
the other servers in the cluster.
Memory to memory session replication is said to be more efficient
than writing to a centralized database, although it does present a few
potential bottlenecks itself.
What are the drawbacks of memory to memory session replication?
Memory to memory replication does indeed have its drawbacks.
First of all, network traffic may become large if session information is being
replicated across a large number of application servers.
Secondly, a significant amount of memory is going to be consumed
in order to implement session failover.
For instance, imagine you have five servers, and each server
maintains a gig of session data. That gig must be replicated on each of the
other four servers, so each server would need five gigs of memory: one gig for
its own current session data, and four gigs of failover data, one for each of the other four
servers in the cluster. That’s a lot of memory!
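The arithmetic in the five-server example generalizes easily. The sketch below assumes every server replicates to every other server and that each one holds the same amount of session data; ReplicationMemory is a hypothetical name used only for this calculation.

```java
// Hypothetical calculator for the memory cost of replicating to every peer.
class ReplicationMemory {
    // Each server holds its own session data plus one copy of every peer's data.
    static int gigsPerServer(int servers, int gigsOfSessionDataPerServer) {
        int own = gigsOfSessionDataPerServer;
        int copiesOfPeers = (servers - 1) * gigsOfSessionDataPerServer;
        return own + copiesOfPeers;
    }

    // Total memory spent on session data across the whole cluster.
    static int gigsAcrossCluster(int servers, int gigsOfSessionDataPerServer) {
        return servers * gigsPerServer(servers, gigsOfSessionDataPerServer);
    }
}
```

For five servers with one gig each, gigsPerServer(5, 1) is 5 and gigsAcrossCluster(5, 1) is 25, so the cluster spends 25 gigs of memory to protect 5 gigs of real session data.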
Of course, those clever sausages at WebSphere do provide a
mechanism to address the memory and network traffic dilemma presented by
memory-to-memory session replication.
How can the use of in-memory session replication be optimized for performance?
First of all, WebSphere gives you the ability to configure how
often replication occurs. Replicating less often can minimize the amount of network traffic
created, although it also means more recent session data may be lost when a server fails
and another takes over.
Secondly, WebSphere allows you to set up a master and slave
relationship between application servers. If you had five servers, you could
add two servers to be dedicated session masters. Session data would be replicated to these two masters, and these
two masters only. This would significantly reduce network traffic and the
memory required to support in memory replication.
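To see why dedicated session servers help, compare the two layouts with some back-of-the-envelope numbers. This sketch rests on simplifying assumptions that are not spelled out in the text: each application server holds the same amount of session data, application servers keep only their own data in the dedicated layout, and each dedicated server keeps a copy of everything. ReplicationTopologies is a hypothetical name.

```java
// Hypothetical comparison of replicating to every peer versus to dedicated servers.
class ReplicationTopologies {
    // Every application server holds its own data plus a copy of every peer's data.
    static int fullMeshGigs(int appServers, int gigsEach) {
        return appServers * appServers * gigsEach;
    }

    // Application servers hold only their own data; each dedicated server holds all of it.
    static int dedicatedGigs(int appServers, int dedicatedServers, int gigsEach) {
        int onAppServers = appServers * gigsEach;
        int onDedicatedServers = dedicatedServers * appServers * gigsEach;
        return onAppServers + onDedicatedServers;
    }

    // Copies sent over the network for a single session update.
    static int fullMeshCopiesPerUpdate(int appServers) {
        return appServers - 1;
    }

    static int dedicatedCopiesPerUpdate(int dedicatedServers) {
        return dedicatedServers;
    }
}
```

Under these assumptions, five application servers with one gig each need 25 gigs in total when everyone replicates to everyone, but only 15 gigs with two dedicated servers, and each update is copied to two machines instead of four.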
What is the default mechanism for managing session failover when you set up a cluster?
By default, a cluster provides no session failover. Session failover must be configured
explicitly through the WebSphere Administrative Console.
Setting up database persistence is a bit of work, but it’s not
too painful as long as your application servers can all connect to a common
JDBC database.
A big benefit to in-memory session replication, on the other
hand, is the fact that it can be easily set up and configured. On new
installations, in-memory session replication seems to be the failover mechanism
of choice.
It must be noted that session failover doesn’t just happen
by default. It must be configured
explicitly, and it must be tested as well. Don’t wait until your application
server fails to find out if your session replication is working properly.