Read on for summaries of the sessions I attended on Day Two of JavaOne 2010.
Code Generation on the JVM - Writing Code that Writes Code
Traditional code generation techniques in Java are painful -- some examples include RMI stubs, JavaBean generation and WSDL generation. This session covered some alternatives to compile-time code generation which are gaining in popularity.
Project Lombok is essentially a pre-processor for Java source code which allows using (or abusing?) annotations to generate code such as property getters / setters and hashCode(), equals() and toString() methods. It works by manipulating the Java source AST during compile time to inject bytecode into the generated class files. It enjoys fairly good tool support within Eclipse, and can save a lot of typing.
Groovy takes a similar approach, but because all bytecode is generated at runtime, it is not dependent on special compilers or tools to function properly. Groovy supports code-generating annotations such as @Delegate, which enables easy composition of helper classes by generating bridge methods, @Lazy, which creates lazily instantiated properties, and @Immutable, which guarantees that an object's state cannot be changed.
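To make the generated code concrete, here is a minimal hand-written Java sketch of roughly what annotations like @Delegate and @Lazy save you from typing. The Playlist class and its members are hypothetical names invented for illustration. This is not Groovy's actual generated output, only an approximation of the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; all names are illustrative only.
class Playlist {
    // The helper object we compose with.
    private final List<String> tracks = new ArrayList<>();

    // Bridge methods: the kind of forwarding code @Delegate would generate.
    public boolean add(String track) { return tracks.add(track); }
    public int size() { return tracks.size(); }
    public String get(int index) { return tracks.get(index); }

    // Lazily instantiated property: roughly the pattern behind @Lazy.
    private String summary;   // stays null until first requested
    int summaryBuilds = 0;    // counts how often the value is computed

    public String getSummary() {
        if (summary == null) {
            summaryBuilds++;
            summary = size() + " track(s)";
        }
        return summary;
    }

    public static void main(String[] args) {
        Playlist p = new Playlist();
        p.add("Intro");
        p.add("Outro");
        System.out.println(p.getSummary());
        System.out.println("computed " + p.summaryBuilds + " time(s)");
    }
}
```

Note that the lazy value is computed once and then cached, so it will not reflect tracks added afterwards. That is the usual trade-off with lazily initialized properties.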
This session turned out to be essentially an advertisement for Groovy.
Enterprise Service Bus - Lessons from the Field
This session was a joint presentation by Cisco and the NFL detailing some of the lessons they have learned by deploying ESB technology for NFL.com.
Some common misconceptions about ESBs are that an ESB is equivalent to JMS, and that an ESB always constitutes a well-defined product from a vendor. Neither is true; JMS is one potential component of an ESB, and ESB solutions are almost always made up of multiple components from multiple vendors.
Keep it simple
A common pitfall in ESB design and usage is creating too many endpoints. Each new endpoint adds additional (and often exponential) complexity to a system, and at the very least adds more moving parts and therefore more things which can go wrong.
Pipelining for performance
NFL uses Mule as their primary ESB product, and takes advantage of Staged Event Driven Architecture (SEDA) to effectively pipeline requests by splitting each step into a separate queue. This works the same way that CPU pipelining does, in that you gain effective parallelism simply by processing multiple stages of multiple requests simultaneously.
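The staged approach can be sketched in plain Java with one queue and one thread per stage. This is not Mule's API or the NFL's configuration. The stage names, the string payloads and the StagedPipeline class are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical two-stage pipeline; each stage has its own input queue and thread.
class StagedPipeline {
    private static final String POISON = "\u0000STOP"; // sentinel that shuts a stage down

    // Starts a thread that takes items from 'in', applies 'work', and puts results on 'out'.
    static Thread stage(String name, BlockingQueue<String> in, BlockingQueue<String> out,
                        Function<String, String> work) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();
                    if (item.equals(POISON)) { out.put(POISON); return; }
                    out.put(work.apply(item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.start();
        return t;
    }

    static List<String> run(List<String> requests) {
        BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
        BlockingQueue<String> validated = new LinkedBlockingQueue<>();
        BlockingQueue<String> transformed = new LinkedBlockingQueue<>();
        List<String> results = new ArrayList<>();
        try {
            // While "transform" handles request N, "validate" can already work on request N+1.
            Thread validate = stage("validate", incoming, validated, String::trim);
            Thread transform = stage("transform", validated, transformed, String::toUpperCase);

            for (String r : requests) incoming.put(r);
            incoming.put(POISON);

            while (true) {
                String item = transformed.take();
                if (item.equals(POISON)) break;
                results.add(item);
            }
            validate.join();
            transform.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" score update ", " roster change ")));
    }
}
```

Because each stage is a single thread reading from a FIFO queue, ordering is preserved. A real deployment would more likely give each stage a thread pool and bounded queues to apply back-pressure.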
What NOT to do
Don't use an ESB as an indirection layer or simple passthrough proxy. While tempting, this just adds another failure point and extra complexity to your system.
Also, don't use an ESB as a cron daemon (especially for large, monolithic jobs). Instead, decompose your problem into smaller components. There's nothing worse than finding out your daily batch job is taking 25 hours to complete and you can't scale...
Finally, don't use an ESB as an application glue layer. There are much simpler, scalable tools for that.
What TO do
Use your ESB for data integration (validation, routing, augmentation, transformation, and service invocation).
To validate or not
Validation of input data can be done strictly (using XML schema) or relaxed (ignoring unknown values). Strict validation might seem like the way to go, but even simple changes in data can cause serious problems downstream if it's done too strictly.
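The difference between the two modes can be illustrated with a small Java sketch. It uses a plain Map as a stand-in for an XML message rather than real schema validation, and the field names and FieldValidator class are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical validator; a Map stands in for a parsed message.
class FieldValidator {
    // Assumed set of fields the downstream consumer understands.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("player", "team", "score"));

    // Strict: any unknown field rejects the whole message.
    static Map<String, String> strict(Map<String, String> msg) {
        for (String key : msg.keySet()) {
            if (!KNOWN.contains(key)) {
                throw new IllegalArgumentException("Unknown field: " + key);
            }
        }
        return msg;
    }

    // Relaxed: unknown fields are dropped, known ones pass through.
    static Map<String, String> relaxed(Map<String, String> msg) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : msg.entrySet()) {
            if (KNOWN.contains(e.getKey())) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> msg = new LinkedHashMap<>();
        msg.put("player", "Smith");
        msg.put("nickname", "Smitty"); // a field added upstream without notice
        System.out.println("relaxed: " + relaxed(msg));
        try {
            strict(msg);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

Here a harmless new field upstream is enough to make the strict path reject the entire message, which is the kind of downstream breakage described above.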
Event model - Poll vs. Push
Polling is easy, but often non-performant and has high latency. Push is faster, but much more complex. Push also requires guaranteed message delivery, and third party integration can be difficult. Basically, use the best tool for the job, but don't over-engineer it.
As ESB technology grows within an enterprise, security needs to be paramount, not just an afterthought, especially if it will be interacting with endpoints which are exposed to the outside world.
Make sure your long-running processes are broken up and use checkpointing to allow failed processes to resume processing without completely starting over.
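As a minimal sketch of checkpointing, the Java class below records progress after each completed item so that a rerun picks up where the failed run stopped. The CheckpointedJob class is hypothetical, and the in-memory field is an assumption standing in for durable storage such as a file or database row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical job that can resume after a failure.
class CheckpointedJob {
    // Stand-in for durable storage; a real job would persist this value.
    private int checkpoint = 0;               // index of the next unprocessed item
    final List<String> processed = new ArrayList<>();

    // Processes items starting at the last checkpoint.
    // failAt simulates a crash at that index (-1 means never fail).
    void run(List<String> items, int failAt) {
        for (int i = checkpoint; i < items.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("Simulated failure at item " + i);
            }
            processed.add(items.get(i));
            checkpoint = i + 1; // record progress only after the item succeeds
        }
    }

    int checkpoint() { return checkpoint; }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        CheckpointedJob job = new CheckpointedJob();
        try {
            job.run(items, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", resuming from " + job.checkpoint());
        }
        job.run(items, -1); // the second run skips the items already completed
        System.out.println(job.processed);
    }
}
```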
Comet and WebSocket Application Scaling
I was really looking forward to this session just to learn a bit more about WebSocket support in HTML5. It started out with some brief definitions of what Comet applications are (server push) and some ways to implement Comet support in applications.
Rich user interfaces often have a need for server-side events. AJAX applications which respond to client-side events, such as button pushes, are fairly straightforward to develop and don't really have a major impact on server application architecture, but handling events which originate outside the client and need to be pushed down in a timely manner is more difficult.
Polling is very easy, but forces a trade-off between latency and server load: a long polling interval means high latency, while a short interval or a large number of connected clients means heavy load. High-traffic polling is almost indistinguishable from a denial-of-service attack on an application server. In one example, if 1000 clients are connected, and we poll at 20 second intervals, we get 50 requests per second, which is manageable. However, this gives a 10 second average latency, and 20 second maximum latency, which may be undesirable. To get low latency (less than 1 second), we could increase our polling frequency to once every 0.5 seconds, but this now results in 2000 requests per second, likely leaving our server CPU or I/O bound.
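The arithmetic behind these numbers is simple enough to check in a few lines of Java. The PollingMath class is a hypothetical helper, and the average latency figure assumes events arrive uniformly at random within a polling interval.

```java
// Hypothetical helper for the polling trade-off described above.
class PollingMath {
    // Each client sends one request per interval.
    static double requestsPerSecond(int clients, double intervalSeconds) {
        return clients / intervalSeconds;
    }

    // Assuming events arrive uniformly within an interval, the average wait is half of it.
    static double averageLatencySeconds(double intervalSeconds) {
        return intervalSeconds / 2.0;
    }

    // In the worst case an event arrives just after a poll and waits a full interval.
    static double maxLatencySeconds(double intervalSeconds) {
        return intervalSeconds;
    }

    public static void main(String[] args) {
        int clients = 1000;
        for (double interval : new double[] {20.0, 0.5}) {
            System.out.println("interval=" + interval + "s"
                    + " load=" + requestsPerSecond(clients, interval) + " req/s"
                    + " avgLatency=" + averageLatencySeconds(interval) + "s"
                    + " maxLatency=" + maxLatencySeconds(interval) + "s");
        }
    }
}
```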
Long polling is the process of holding a connection open to the server until an event occurs. This tends to be less resource intensive than polling, but now has the consequence of keeping many open sockets and processing threads tied up on the server. This is important as on a 64-bit JVM, the default stack size is 1MB, which can easily cause large amounts of memory to be consumed on idle threads. Using our 1000 client example, we would have 50 requests per second with low latency, but could potentially use over a gigabyte of RAM holding stacks for idle threads.
Apache doesn't scale to limits like this. Servlet containers typically don't either. Basically, a new server application architecture is required. Jetty pioneered this with continuations (originally designed for an ActiveMQ web transport), and Dojo developed cometd. The Servlet 3.0 specification standardized the approach as asynchronous servlets.
Using these new techniques, long polling gets much more efficient as threads are released into the pool until data is available to send. Servers are generally limited by OS limits on concurrent connections now, instead of RAM, CPU, or I/O, making them much more scalable.
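The core idea, parking a request as a small object instead of a blocked thread, can be sketched without a servlet container. The example below is not the Servlet 3.0 or Jetty API. It uses java.util.concurrent.CompletableFuture (Java 8+) as a stand-in, and the EventBroker class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical broker: a waiting long-poll request is an entry in a list, not a thread.
class EventBroker {
    private final List<CompletableFuture<String>> waiting = new ArrayList<>();

    // Called when a client long-polls; returns immediately with a pending future,
    // so the calling thread can go back to the pool.
    synchronized CompletableFuture<String> awaitEvent() {
        CompletableFuture<String> pending = new CompletableFuture<>();
        waiting.add(pending);
        return pending;
    }

    // Called when a server-side event occurs; completes every parked request
    // and returns how many were notified.
    int publish(String event) {
        List<CompletableFuture<String>> toNotify;
        synchronized (this) {
            toNotify = new ArrayList<>(waiting);
            waiting.clear();
        }
        for (CompletableFuture<String> f : toNotify) {
            f.complete(event);
        }
        return toNotify.size();
    }

    synchronized int parkedCount() { return waiting.size(); }

    public static void main(String[] args) {
        EventBroker broker = new EventBroker();
        // Park 1000 "requests" from a single thread; none of them holds a stack.
        for (int i = 0; i < 1000; i++) {
            broker.awaitEvent();
        }
        System.out.println("parked: " + broker.parkedCount());
        System.out.println("notified: " + broker.publish("touchdown"));
    }
}
```

In a real container, the callback attached to each pending request would write the response and close the connection, after which the client immediately issues its next long poll.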
WebSockets are the new (soon-to-be) standard for performing complex communications over HTTP. Essentially, WebSockets allow an HTTP connection to be upgraded to a standard socket connection (with some limitations), allowing for efficient bi-directional transfer of information between client and server. Unfortunately, the specification is still in flux, and likely won't ever be fully supported by all intermediate routers, so Comet will still need to be used as a fallback in many scenarios.
Don't reinvent the wheel
Speedy Scripting - Productivity and Performance
This session was a general overview of scripting languages on the JVM, and their various performance characteristics.
Scripting languages have some benefits over traditional compiled languages (simpler, less boilerplate code, etc.). Compiled languages have benefits too, such as better type checking, error checking and performance.
However, performance is not the only concern in language design. Productivity, maintainability, and correctness matter too. Java itself may be verbose, but static typing does catch a lot of bugs that would otherwise go unnoticed. Scripting languages can be faster to market, but maintainability can suffer if they are overused.
This session was an informal Q&A on OpenJDK, the open source components of the JDK. Some interesting facts came out of this session, such as the statistic that 98% of the code in OpenJDK is shared with the official branch. The remaining 2% consists of some font rendering code which Sun purchased years ago, and some CORBA and SNMP code. The official JDK nightly builds are first pulled from the OpenJDK repository, merged with some code in a non-free repository, and then built from source. This means the official JDK is really OpenJDK plus some extra bits, so using OpenJDK in a production environment, especially if it gets regular patching from a distribution, is pretty close to standard Java.
Scaling data processing with Java in the cloud
This session was a case study by WeatherBill on moving from a hosted data center to a cloud. WeatherBill is a weather-based insurance company which sells policies against poor weather to farmers and vacationing tourists. This naturally involves a lot of statistical calculations on some large data sets. As WeatherBill grew, they quickly ran out of capacity on their hosted servers and turned to Amazon EC2 for help using Hadoop. This proved ideal, as a very large cloud could be provisioned on-demand to handle forecasting tasks, and torn down when no longer needed. As an example, the first trial run of around 100 CPU cores cost only $500 to execute, far less than an always-on 24/7 local cluster would cost.
JMS - Time for 2.0?
This was an exploratory session hosted by the JMS specification leads asking what changes, if any, should be made in a future JMS 2.0 specification, since JMS 1.1 was last updated in 2003. Topics under discussion included API changes, updates to descriptions of newer messaging technologies, and better integration with modern Java SE APIs (JMS predates generics and does not make use of Collections).
While this was an interesting session, it's clear that there is no roadmap currently for JMS 2.0, and it will probably take at least a few years to put together anything resembling a specification. Still, it was nice to see that there is some life there after all, as the session was very well attended.
That's it for Day Two of JavaOne. Check back later for more updates.