randomCoder

Site Update - 5.0

2022-12-05T15:07:26-06:00

Release notes:

Replaced Spring with Jersey 3.1.0
Updated to Jetty 11.0.12
Removed Hibernate
Updated to Thymeleaf 3.1.0

This release is a major refactoring. Attempting to upgrade to the final release of Spring 6 caused some integration issues, mostly around Spring being much more opinionated about how nested contexts are used. So, it gave me an excuse to do something I've toyed with for a while, which is use JAX-RS (Jersey) to serve all the content. I've used Jersey for years as my preferred REST API framework on many other projects, and it's always seemed a little cleaner than Spring for such things.

This did turn into a much larger refactoring though, as removing Spring necessitated removal of Spring Data and Spring Security as well, which meant rewriting the data access layer. I'm not 100% happy performance-wise with where it's currently at, but it's functional and gives me a foundation to build on.

Update: Adding Caffeine for caching of commonly-requested database objects has made a big difference in the page render times. I'm now seeing TTFB (time-to-first-byte) times on the home page of around 50-60ms + ping time, which isn't bad.

Site Update - 4.2

2022-04-08T10:33:02-05:00

Release notes:

Updated to Spring Framework 6.0.0-M3
Updated to Jetty 11
Updated to Hibernate 6.0
Updated to Thymeleaf 3.1.0-M3
Updated compiler settings to Java 17

The site is now running with a Java module-path rather than a class-path, with no additional "--add-opens" clauses required under Java 18.

This took far longer than expected (four years), and still required bleeding-edge versions of several dependencies to work. I even needed to patch around a bug in Thymeleaf that only occurs when running on the module path.

A couple of lessons learned from this:

Migrating to newer Java releases is getting easier all the time.
Migrating to modules is harder, as libraries are often not tested with the module path.
The module system really needs a wildcard package export syntax. I import a lot of resources (web content) and having to explicitly open every sub-package is a huge pain.

Site Update - 4.0

2018-07-27T17:32:33-05:00

Release notes:

Updated to Spring Framework 5
Updated to Hibernate 5.3
Updated to Jetty 9.4.11
Updated to Thymeleaf 3.0
Updated compiler settings to Java 10

This release is all about updating to Java 10. I've been putting this off due to the large amount of changes required (mostly Java 9+).

I'm still using classpath mode (including "--add-opens java.base/java.lang=ALL-UNNAMED --illegal-access=warn" since Spring still uses cglib) -- I suspect converting to modules is going to be a lengthy process due to the number of third-party dependencies which haven't been updated with module info.

This also paves the way for future Java 11 support (since we're now two months away and counting). Hopefully since Java 11 will be an LTS release we'll see a larger push to support the Java module system within the community.

HTTP/2 Live

2016-04-21T09:37:23-05:00

HTTP/2 support is now live!

Since my last post, I have been investigating alternatives to enabling HTTP/2 support on randomCoder. My first attempts involved using HAProxy as the public front-end and SSL termination point for this site (and others I host). However, since HAProxy does not (yet) support HTTP/2 natively, proxying was limited to TCP mode for HTTPS connections.

My plan was to use SNI to redirect to one of several backends depending on the requested site. This seemed to work; however, there is a (IMO) unfortunate behavior of HTTP/2 which allows browsers to re-use secure connections to a website for multiple hosted domains, provided the SSL certificate returned is valid for all of them. This caused requests for domain A to sometimes be routed to domain B's backend, resulting in incorrect content being generated. As I currently have a multi-domain certificate and hosting all of the sites using the same backend is near impossible, this seemed like a show-stopper.

Fast forward to last week, and Let's Encrypt is now out of beta. I have been following this project for several months now, and had thought about replacing my single multi-domain SSL cert with one per domain, thereby forcing browsers to make a separate connection to HAProxy for each one. Unfortunately, another issue cropped up - Let's Encrypt certificates are not trusted by any shipping version of Java, making building API sites to be consumed by Java clients difficult.

Enter Nghttpx , an HTTP/2 native proxy (part of the Nghttp2 project). Nghttpx can be configured to listen on both secure and non-secure ports, speaking both HTTP/1.1 and HTTP/2, and proxy to multiple backends. This is a solution I had evaluated earlier, but at the time (version 1.7.x), backend protocol (h1 or h2) was configured globally, so all sites had to be one or the other. Version 1.9 is now out, and supports configuring these options per-backend. Finally, I have a solution to enable HTTP/2 publicly.

HAProxy is now out of the mix, and all front-end connections hit Nghttpx first. The backends all have either Nginx 1.10.x or Jetty 9.x backends, so we can connect to them via h2c (HTTP/2 plaintext). All of this is seamless to the user, and allows us to do things like enable HTTP/2 server push via Link: headers (as well as gives me a platform for playing with HTTP/2 related technologies such as gRPC .

I do have one hitch (so far): I am using Nghttpx's mruby support to do things like force redirect to SSL and canonicalize domain names at the proxy level. Everything was working well last night, but this morning every page load resulted in an exception while executing the mruby code (TypeError: expected String). Strangely, restarting Nghttpx fixed it with no changes to the script. If anyone knows why this might be occurring, please let me know...

Update: I have (hopefully) resolved the Nghttpx errors by disabling mruby... Since the scripting was only being used for HTTPS and hostname redirects, I have modified the backends to do this work directly.

Site Update - 3.2

2015-10-22T01:39:04-05:00

Release notes:

Updated to Spring Framework 4
Updated to Hibernate 5
Updated to Jetty 9.3.5
Converted from JSP to Thymeleaf
Updated compiler settings to Java 8

Most of these changes should be invisible to casual users. The biggest changes were made to allow the use of HTTP/2 on this site (once I switch to using HAProxy 1.6 instead of Apache as a reverse proxy).

I am currently converting a few other sites which are also hosted here to supporting HTTP/2. Once that is completed, I will be able to run the entire infrastructure over secure HTTP/2. In addition, I am testing out Docker based deployments of all the site components. More details to come.

Site Update - 3.1

2013-06-20T13:34:41-05:00

Release notes:

New feature: Gravatar avatar images on comments and article postings

Site Update - 3.0.6

2013-06-19T19:12:50-05:00

Release notes:

Converted all javascript UI code to JQuery, eliminating about 15 source files

Site Update - 3.0.3

2013-06-17T19:17:02-05:00

Release notes:

Began converting javascript from Behaviour.js (does anyone still use that?) to JQuery
Started adding ajax calls for admin tasks -- finished comment approval / disapproval / deletion

Site Update - 3.0

2013-06-17T13:43:05-05:00

Release notes:

New Feature: Allow articles to be closed for commenting.

This should hopefully reduce the amount of time I spend cleaning up after spammers. I have started with closing comments for anything older than 30 days. This should at least let me concentrate on page 1 and not have to go back to older stuff.

Site Update - 2.9

2013-06-17T09:47:23-05:00

Release notes:

Completed migration to Spring MVC 3.2
Converted data access to Spring Data JPA
Updated to new randomcoder-taglibs version (supports Spring Pageable implementation)
Built against Java 7

Upcoming changes:

Convert unit tests to JUnit 4
Expand unit test coverage
Update source to take advantage of Java 7 features
Refactor data access methods and entities to minimize N+1 selects

Site Update - 2.8

2013-06-12T13:56:24-05:00

Release notes:

Finally converted security from the truly ancient ACEGI to Spring Security 3.1. The configuration is much simpler now.
Began migration to Spring MVC 3.2; roughly one third of the controllers are complete.
Updated to Hibernate 4.2. This is a temporary stop on the way to using JPA and Spring Data.
Removed the old Hibernate OpenSessionInView pattern. Unfortunately this required wrapping almost all data calls with Hibernate.initialize() due to lazy collections. Another TODO...

Site Update - 2.7

2013-06-09T11:47:21-05:00

Release notes:

Hosting site on Jetty instead of Tomcat now.

Site Update - 2.6

2013-06-04T18:56:55-05:00

Release notes:

Moved to using HTTPS throughout the entire site.
Removed CardSpace support.

Site Update - 2.5.8

2013-06-04T14:26:09-05:00

This is the first code update in a while... New in this release:

Moved all code repositories to Git. These can be browsed via GitHub .
Updated Download module and Links section to reflect new locations for all resources.
Updated the site to Maven 3. There was some significant bitrot in the codebase that needed to be cleared up to get this working.

Other upcoming changes include:

Move from deployment as a Tomcat WAR file to a standalone app embedding Jetty. This should result in faster restart times and decreased resource utilization.
Remove CardSpace support as it has been deprecated by Microsoft for some time now and never really gained wide adoption.
Update libraries to more current versions. For example, update to Spring 3.2 (using 2.0 currently).
Move to using HTTPS throughout the entire site.

Domain name change

2010-12-23T08:21:42-06:00

Effective immediately, randomCoder() has a new home! You may or may not have noticed a redirect on this site to http://randomcoder.org/. This domain has recently become available, and since the focus of this site is very much community-oriented, a .org seemed more appropriate than a .com.

JavaOne 2010 - Thursday

2010-09-26T19:47:26-05:00

Read on for a summary of the final day of JavaOne 2010.

The Cassandra Distributed Database

Relational databases don't scale. BTrees are slow, and require read-before-write semantics. They are fast until indexes no longer fit in RAM, and then become very, very slow. Traditional scale-up of relational databases is almost entirely vertical, and expensive.

A common way to deal with this problem is to add a caching layer. However, this causes its own problems, such as lack of transparency, stale data on frequently written data, and the "cold cache" problems. Replication also helps scalability, but once write capacity is exceeded, this approach begins to break down. Finally, sharding can be used to spread the load horizontally. However, this increases complexity exponentially, and rebalancing can be very painful.

eBay coined an acronym called BASE which was meant to be a play on ACID, basically assuming that things won't always be consistent. If your application can work within those constraints, then you can scale much better.

Apache Cassandra is a "NoSQL" database which supports automatic replication, is application transparent, and is optimized for fast writes and fault tolerance.

NoSQL Myths

NoSQL is for people who don't understand SQL... Reality: ACID only scales so far, and once you start caching, replicating, etc., you're basically giving it up anyway.

NoSQL is not new; we've had key-value stores for years... Reality: Modern NoSQL stores have virtually nothing in common with things like Berkeley DB.

Only huge sites need to care about scalability... Reality: Lots of small sites grow quickly, and many, many companies already need this.

NoSQL is only appropriate for non-important data... Reality: Somwhat true, if you're using a database which does not provide durability (Cassandra does).

Cassandra architecture

Apache Cassandra supports automatic rebalancing when new nodes are added. Writes go to both the old and new nodes during this time, which avoids complex recovery logic. The replication strategy is pluggable, allowing strategies which are optimized for single datacenter vs. multiple datacenter availability. Consistency is tunable to use single lookup, quorum, or all available nodes which contain a key. In addition, the number of replicas to make synchronously vs. asynchronously can be set. For example, using synchronous replication of 3 copies and using quorum lookups virtually guarantees that readers see consistent data, but is less performant than other strategies.

Monitoring is supported via JMX, and many parameters are tunable at runtime. There are a wide variety of statistics available.

The Cassandra data model can be described as loosely schema-less. Each "column group" (or table) is implemented as rows of sparse arrays, containing both a column name and the data within it. This allows on-the-fly addition of new colums without schema changes, at the cost of some disk space. In modern systems, disk is cheap, but I/O is not. All access to rows is done via primary key, so the system relies on writing out data to multiple column families at once to allow for multiple "materialized views" of the underlying data.

Several APIs are available for communicating with Cassandra. At the lowest level is Thrift, followed by Hector (similar to JDBC), and finally a new library called Kundera which is similar in concept to JPA. Hadoop integration via Pig is also possible.

As for when to use Cassandra, the idea use case is probably for systems which are already using a SQL database plus something like memcached. In such a case, Cassandra is probably simpler to manage.

Practical Big Data Processing with MapReduce and Hadoop

This session covered some basic concepts of MapReduce, and showed a demo of Karmasphere Studio, a Netbeans-based tool for doing high-level Hadoop development.

The important thing to understand about data processing is that the critical time is between problem definition and getting the answer. The actual time spent computing the answer might only account for 25% of this, so finding faster ways to get the computation going is a big win.

Why do we need parallel processing? Because while Moore's law grows, data sizes are growing at about the same rate, and most algorithms don't scale linearly, but are at least O(n log n).

Parallel processing is inherently less efficient (per CPU) than serial, but you can't buy time, and CPUs are cheap. So what makes things slow? Synchronization. The answer then, is to eliminate synchronization, which is what MapReduce does.

In the simplest case, independent data sets allow for naive parallelism with linear scalability, but high latency. A classic example of a naively parallel algorithm is raytracing. Each pixel can be mapped independently, and therefore the algorithm scales linearly to 1 pixel per CPU.

Enter MapReduce, which splits processing into separate map, shuffle, sort, and reduce phases, allowing complex algorithms to be separated into much smaller, independent units of work. Not everything is easily translated however, and it's not unusual to see 50-60 chained MapReduce jobs to implement a particular algorithm. This is where tools like Karmasphere Studio are useful. They can be used to generate the boilerplate code necessary to setup Hadoop MapReduce jobs as well as provide simulations of a Big Data processing run and provide realtime feedback on potential inefficiencies and bottlenecks.

Building Enterprise Web Applications with Spring 3.0 and Spring 3.0 MVC

This session was a very fast-paced walkthrough of building a complex enterprise web application using Spring. I won't go through the entire demo, but it did illustrate some of the key concepts of working with Spring DI, Spring MVC, and Spring Security. Most of this wasn't new, and is fairly readily available in the Spring documentation, but was interesting anyway.

Top 10 Lessons Learned from Deploying Hadoop in a Private Cloud

This session was a retrospective on deploying complex Big Data to a private Hadoop cloud by OpenLogic, Inc. OpenLogic provides a service which allows companies to check their source code for copy and paste violations by checking it against a huge library (hundreds of thousands) of open source packages. This service is valuable, since there are potentially unwanted licensing implications for companies which inadvertently ship open source code as part of their product.

The OpenLogic stack consistes of a web client, Nginx web server, Ruby on Rails, MySQL, Redis, Solr, Stargate, and HBase. The Hadoop cluster (which runs HBase) contains over 100 CPU cores and 100 TB of disk. Each Hadoop node is brought up anonymously from the same template, allowing easy scale out without administration problems.

Why not Amazon?

Amazon is great for burst traffic, but extremely expensive for long term storage.

Configuration is key

Hadoop has many moving parts, and details matter. Tuning of operating system parameters such as number of allowed open files and process limits can be critical. Be sure to use a known compatible combination of Java VM, Hadoop, and HBase. Be sure to change only one parameter at a time when debugging, and be sure to read the mailling lists (and the code).

Commodity hardware != old PC or desktop

Hadoop hardware should be a rack mount server (but don't bother with RAID). Use enterprise drives, as the vibration in a rack-mount environment will cause a lot of premature drive failures otherwise. DO expect ugly hardware issues, just due to the sheer volume of hardware.

OpenLogic uses servers with dual-quad core processors, 32-64GB RAM, 6x2TB enterprise hard drives, and does RAID-1 on 2 drives for the boot partitions, and allocates the remaining space to Hadoop.

Bandwidth is key

Use dual gigabit NICs (or 10GigE if possible). HBase needs at least 5 nodes to really perform well. It also depends on ZooKeeper which requires low-latency connections. Be careful of low RAM scenarios.

BigData takes a long time...

To do anything. It can also be hard to test, but it's important not to skimp on testing. Backups are also difficult, and might require a second Hadoop cluster or public cloud.

Loading data

Don't use a single machine. The I/O bottleneck will assure that it doesn't complete for a very long time. Use MapReduce jobs to partition out the data load if possible, and turn off the write-ahead log in HBase during inital data imports. Also, avoid storing large (greater than 5 MB) values in HBase. Rows and columns are essentially free, so use them! Finally, avoid committing excessivly when using Solr.

Getting data out

HBase is NoSQL (so think hash table). Use Solr for fast index lookups of data.

Solr

Solr is a search engine based on Lucene that uses Hadoop to store indexes. It has automatic sharding and asynchronous replication, so it is fault tolerant and performant. OpenLogic indexes billions of lines of code, with 20+ fields indexed per source file. HAProxy is used to front Solr to balance writes, and on reads from slaves.

BigData is hard

Expect to learn a lot. You will get it wrong on the first, second, and probably third try. Try to find alternate ways to model your data.

Scripting languages can help

MapReduce jobs are simpler to write, and the HBase shell is JRuby.

Using Public Clouds (Amazon EC2)

This isn't really practical financially. 100 TB of storage on EBS runs about $120,000 per year, and 20 super huge CPU instances adds another $175,000 per year. This works out to about six times what it costs OpenLogic to host their own cloud.

Open Source is key

None of this would be possible without a large amount of open source software: Hadoop, HBase, Solr, Tomcat, Zookeeper, etc.

Expect things to fail... a lot

Hardware, software, and your own code and data can fail in unpredictable and unforseen ways. Monitoring matters!

Conclusion and Wrap Up

JavaOne 2010 was an intense four days, but overall I'd say it was a positive experience. The landscape has definitely changed since I last attended in 2002, and it's good to know that Oracle is committed to keeping the platform alive and moving forward.

JavaOne 2010 - Wednesday

2010-09-26T19:04:01-05:00

More tales from JavaOne 2010 - Wednesday sessions...

JUnit Kung Fu

While JUnit has become the de facto standard for unit testing in Java, there are still many things to learn, especially with some of the new features in JUnit 4.7 and 4.8.

Test naming

Unit test names can convey a lot of useful information during testing if done right. A good test name documents the expected behavior of the test. Example: instead of BankAccountTest.testDebit(), name your test WhenABankAccountIsModified.balanceShouldDecreaseOnDebit(). NOTE: Not sure I totally agree with this one...

Test structure

Tests should be organized into three basic parts: inputs and expected outputs, action, and test assertions. Tests should be treated as production (not throwaway) code. This means refactoring regularly and keeping them clean and concise.

Hamcrest

Hamcrest is an assertion library for JUnit that makes some common assertions easier to express and read. For example, in JUnit, you might say assertEquals(10000, calculatedTax). In Hamcrest, the equivalent expression would be assertThat(calculatedTax, is(10000)), which is easier to read. It also includes more powerful matchers, and can be extended to provide your own, allowing for tests that really do follow the best practice of having only one assertion.

Parameterized tests

Parameterized tests are unit tests which are used for testing several distinct cases. To use them, you need a table of test data, and a unit tests which has parameters for each column in the table. Annotate your test class with @RunWith(Parameterized.class), and annotate a static method with @Parameters which returns test data as a collection of object arrays.

JUnit Rules

JUnit has some new functionality which involves declaring class variables with the @Rule annotation. These are snippets of code which are executed before and after each test, and augment the test framework with additional functionality. For example, the TemporaryFolder rule is used to automate the creation and deletion of temporary folders for unit tests. The Timeout rule can be used to enforce a maximum execution time for each test (useful for integration tests that may hang), and the Watchman rule allows custom code to be run on each test success or failure, which can be used to integrate with external reporting systems.

JUnit Categories

Categories are a new feature in JUnit, and are not quite ready for production use. The allow the creation of a custom hierarchy of interfaces to mark tests as belonging to particular categories. An annotation on the class (or test method) of @Category(MyInterface.class) is used to mark the test(s). Currently, this feature needs a Test Suite defined to mark tests for execution or not based on category, but should allow external tooling such as Maven and Eclipse much greater functionality in the future (Run all tests marked as IntegrationTest, for example).

Parallel tests

This is a new feature, which requires Maven and the SureFire plugin version 2.5. Simply declare a dependency on SureFire 2.5, and set a configuration item to indicate the desired level of parallelism (methods or classes). Parallel=classes is recommended for existing tests.

Other tools

Infinitest - runs unit tests after every save in Eclipse

Mockito - mocking library, tends to be less formal than EasyMock

Project Lambda - To Multicore and Beyond

This session was an overview of some of the new support for lambda expressions that will ship with Java 8 in late 2012.

Multicore processing is here, and we've hit the wall with clock cycles, so we need to adapt and work with what we have. To better support parallelism, we need good support at the library level so that people will actually use it. One of the obvious first choices is parallel operations on collections (filter, sort, map/reduce). Unfortunately, writing and using parallel libraries is clunky without some language support.

One of the biggest barriers to parallel processing is the common for loop. It is inherently serial, stateful, and often has side effects. A good parallel framework should be able to do internal iteration (exploiting parallelism), be easier to read, and support immutable collections easily.

To support this without very complex code, Java will be getting lambda expressions. A lambda expression can be thought of as shorthand for a SAM (simple abstract method) type. The Project Lambda designers recognized that Java already has a construct for functional types, namely simple abstract method (SAM) classes. These are interfaces (or abstract classes) which have a single, abstract method. Some common examples include Runnable, Comparable, etc. For compatibility with existing APIs and to avoid extending the type system further, lambda expressions will implicitly resolve to an instance of a SAM type. For example, given a Predicate class with a single method boolean eval(T), the following code will declare a lambda expression which returns true for a student who graduated in 2000 and assign it to it's corresponding SAM type:

Predicate p = #{ Student s -> s.gradYear == 2000 };

Notice that the name of the method is not specified, and is not needed. Type inference is also supported, so if the "Student" type can be inferred, the lambda expression can be shortened to:

#{ s -> s.gradYear == 2000 }

Library support has yet to be finalized, but it is likely that Java will get some starter interfaces: Predicate, Filter, Extractor, Mapper, Reducer, etc.

Method references

It is expected that many lambda expressions will take the form:

#{ Person p -> p.getLastName() }

To support this common syntax, method references are supported, and can be used to shorten this expression to:

#Person.getLastName

This syntax allows the compiler to reference a method in the target type and call it on the given object automatically:

list.sortBy(#Person.getLastName);

Finally, to better support this use within collections, it would be helpful to have some new methods on the collections classes such as sortBy(), map(), filter(), etc. But, since the collections are defined in terms of interfaces, we can't extend them without breaking compatibility with third-party collection implementations. To solve this, another new language feature has been proposed which would allow interfaces to contain default implementations which defer to static methods elsewhere. This would allow these methods to be added to Collection, and defer to Collections.sortBy(), etc. This gives us the best of both worlds: a functional improvement to the language, and a backwards-compatible syntax so that existing code doesn't break. It is expected that most Collection implementations would fully implement the new methods, for reasons of performance.

This does bring up a few interesting questions though. With the new "super" interfaces, what happens if a class implements two interfaces, each of which has a default implementation of a given method? This sounds like the classic diamond problem of multiple inheritance. Also, will it be possible to extend built-in classes (via proxy perhaps) to support user-defined methods? This could be a clever way to support duck-typing, etc.

Extracting real value from Hadoop

This was a hands-on lab (the only one I had for the week), and covered some basic tasks such as writing a simple MapReduce job, interacting with the filesystem, etc.

One thing I hadn't worked with before was Hive. This is a system which allows you to map your data in HDFS onto virtual "tables" in Hive, and perform SQL queries on them. It's not very fast, but seems extremely powerful. This might be a very viable way to do ad-hoc queries of large Hadoop data sets vs. writing one-off MapReduce jobs.

Bridging Transactions from Java EE to .NET

I was hoping for a good example of two-phase commit using multiple languages, but unfortunately this talk didn't deliver. Some background was given on what two-phase commit is, the various standards on the .NET and Java sides (IEnlistmentNotification vs. XAResource), and finally interop strategies.

The conclusion seemed to imply that getting this working reliably is nearly impossible, and that in most cases you don't really need to do it anyway, even if you think you do.

JavaOne 2010 - Tuesday

2010-09-26T18:19:58-05:00

Read on for summaries of the sessions I attended on Day Two of JavaOne 2010.

Code Generation on the JVM - Writing Code that Writes Code

Traditional code generation techniques in Java are painful -- some examples include RMI stubs, JavaBean generation and WSDL generation. This session covered some alternatives to compile-time code generation which are gaining in popularity.

Project Lombok is essentially a pre-processor for Java source code which allows using (or abusing?) annotations to generate code such as property getters / setters and hashCode(), equals() and toString() methods. It works by manipulating the Java source AST during compile time to inject bytecode into the generated class files. It enjoys fairly good tool support within Eclipse, and can save a lot of typing.

Groovy takes a similar approach, but because all bytecode is generated at runtime, it is not dependent on special compilers or tools to function properly. Groovy supports code-generating annotations such as @Delegate, which enables easy composition of helper classes by generating bridge methods, @Lazy, which creates lazily instantiated properties, and @Immutable, which guarantees that an object's state cannot be changed.

This session turned out to be essentially an advertisement for Groovy.

Enterprise Service Bus - Lessons from the Field

This session was a joint presentation by Cisco and the NFL detailing some of the lessons they have learned by deploying ESB technology for NFL.com.

Some common misconceptions about ESBs are that ESB is equivalent to JMS, and that an ESB always constitutes a well-defined product from a vendor. Neither are true; JMS is one potential component of an ESB, and ESB solutions are almost always made up of multiple components from multiple vendors.

Keep it simple

A common pitfall in ESB design and usage is creating too many endpoints. Each new endpoint adds additional (and often exponential) complexity to a system, and at the very least adds more moving parts and therefore more things which can go wrong.

Pipelining for performance

NFL uses Mule as their primary ESB product, and takes advantage of Staged Event Driven Architecture (SEDA) to effectively pipeline requests by splitting each step into a separate queue. This works the same way that CPU pipelining does, in that you gain effective parallelism simply by processing multiple stages of multiple requests simultaneously.

What NOT to do

Don't use an ESB as an indirection layer or simple passthrough proxy. While tempting, this just adds another failure point and extra complexity to your system.

Also, don't use an ESB as a cron daemon (especially for large, monolithic jobs). Instead, decompose your problem into smaller components. There's nothing worse than finding our your daily batch job is taking 25 hours to complete and you can't scale...

Finally, don't use an ESB as an application glue layer. There are much simpler, scalable tools for that.

What TO do

Use your ESB for data integration (validation, routing, augmentation, transformation, and service invocation).

To validate or not

Validation of input data can be done strictly (using XML schema) or relaxed (ignoring unknown values). Strict validation might seem like the way to go, but even simple changes in data can cause serious problems downstream if it's done too strictly.

Event model - Poll vs. Push

Polling is easy, but often non-performant and has high latency. Push is faster, but much more complex. Push also requires guaranteed message delivery, and third party integration can be difficult. Basically, use the best tool for the job, but don't over-engineer it.

Security

As ESB technology grows within an enterprise, security needs to be paramount, not just an afterthough, especially if it will be interacting with endpoints which are exposed to the outside world.

Scalability

Make sure your long-running processes are broken up and use checkpointing to allow failed processes to resume processing without completely starting over.

Comet and WebSocket Application Scaling

I was really looking forward to this session just to learn a bit more about WebSocket support in HTML5. It started out with some brief definitions of what Comet applications are (server push) and some ways to implement Comet support in applications.

Rich user interfaces often have a need for server-side events. AJAX applications which respond to client-side events, such as button pushes, are fairly straightforward to develop and don't really have major impact on server application architecture, but handling events which originate outside the client and need to be pushed down in a timely manner are more difficult.

Polling

Polling is very easy, but tends to have high latency, especially when the polling interval is short, or the number of connected clients is high. High-traffic polling is almost indistinguishable form a denial-of-service attack on an application server. In one example, if 1000 clients are connected, and we poll at 20 second intervals, we get 50 requests per second, which is manageable. However, this gives a 10 second average latency, and 20 second maximum latency, which may be undesirable. To get low latency (less than 1 second), we could increasing our polling frequency to once every 0.5 seconds, but this now results in 2000 requests per second, likely leaving our server CPU or I/O bound.

Long polling

Long polling is the process of holding a connection open to the server until an event occurs. This tends to be less resource intensive than polling, but now has the consequence of keeping many open sockets and processing threads tied up on the server. This is important as on a 64-bit JVM, the default stack size is 1MB, which can easily cause large amounts of memory to be consumed on idle threads. Using our 1000 client example, we would have 50 requests per second with low latency, but could potentially use over a gigabyte of RAM holding stacks for idle threads.

Deployment issues

Apache doesn't scale to limits like this. Servlet containers typically don't either. Basically, a new server application architecture is required. Jetty pioneered this with asynchronous servlets (originally designed for an ActiveMQ web transport), and Dojo developed cometd. The Servlet 3.0 specification standardized this as web continuations.

Using these new techniques, long polling gets much more efficient as threads are released into the pool until data is available to send. Servers are generally limited by OS limits on concurrent connections now, instead of RAM, CPU, or I/O, making them much more scalable.

Enter WebSockets

WebSockets are then new (soon-to-be) standard for performing complex communications over HTTP. Essentially, WebSockets allow an HTTP connection to be upgraded to a standard socket connection (with some limitations), allowing for efficient bi-directional transfer of information between client and server. Unfortunately, the specification is still in flux, and likely won't ever be fully supported at all intermediate routers, so Comet will still need to be used as a fallback in many scenarios.

Don't reinvent the wheel

Doing all of this yourself is hard . Most developers wouldn't think of writing to XHR directly for AJAX applications today; it's a horrlbe interface and is better left to JavaScript abstractions to get things right. Comet programming is the same way, and while the frameworks are still being developed, some, such as Cometd Bayeux , are coming along nicely. Using a framework also allows the programmer to be insulated from browser quirks and environmental issues such as lack of availability of end-to-end WebSocket support.

Speedy Scripting - Productivity and Performance

This session was a general overview of scripting languages on the JVM, and their various performance characteristics.

Scripting languages have some benefts over traditional compiled languages (simpler, less boilerplate code, etc.). Compiled languages have benefits too, such as better type checking, error checking and performance.

However, performance is not the only concern in language design. Productivity, maintainability, and correctness are too. Java itself may be verbose, but static typinc does catch a lot of bugs that would otherwise go unnoticed. Scripting languages can be faster to market, but can suffer from maintainability if overused.

OpenJDK BOF

This session was an informal Q&A on OpenJDK, the open source components of the JDK. Some interesting facts came out of this session, such as the statistic that 98% of the code in OpenJDK is shared with the official branch. The remaining 2% consists of some font rendering code which Sun purchased years ago, and some Corba and SNMP code. The official JDK nightly builds are first pulled from the OpenJDK repository, merged with some code in a non-free repository, and then built from source. This means the official JDK is really OpenJDK+some bits, so using OpenJDK in a production environment, especially if it gets regular patching from a distribution, is pretty close to standard Java.

Scaling data processing with Java in the cloud

This session was a case study by WeatherBill on moving from a hosted data center to a cloud. WeatherBill is a weather-based insurance company which sells policies against poor weather to farmers and vacationing tourists. This naturally involves a lot of statistical calculations on some large data sets. As WeatherBill grew, they quickly ran out of capacity on their hosted servers and turned to Amazon EC2 for help using Hadoop. This proved ideal, as a very large cloud could be provisioned on-demand to handle forecasting tasks, and torn down when no longer needed. As an example, the first trial run of around 100 cpu cores cost only $500 to execute, far less than an always-on 24/7 local cluster would take.

JMS - Time for 2.0?

This was an exploratory session hosted by the JMS specification leads asking what, if any changes, should be made to a future JMS 2.0 specification, since JMS 1.1 was last updated in 2003. Topics under discussion included API changes, updates to descriptions of newer messaging technologies, and better integration with modern JavaSE APIs (JMS predates generics and does not make use of Collections).

While this was an interesting session, it's clear that there is no roadmap currently for JMS 2.0, and it will probably take at least a few years to put together anything resembling a specification. Still, it was nice to see that there is some life there after all, as the session was very well attended.

That's it for Day Two of JavaOne. Check back later for more updates.

JavaOne 2010 - Sunday and Monday

2010-09-26T17:30:48-05:00

JavaOne 2010 is over now, and I intend to post a write-up of each session I attended throughout the week. Read on for my impressions of the opening keynote Sunday and the Monday sessions.

Introduction and Opening Keynote

JavaOne and OpenWorld were really two separate events this year, despite being marketed as a combined conference. Those of us who are JavaOne attendees are segregated off by ourselves half a mile from the Moscone convention center at the Hilton. When I tried to get into the opening keynote, I was promptly denied entry, and told that "you people" need to go over to the Hilton and watch it via closed-circuit TV instead. Fortunately, I arrived early enough to make the trek back up the hill and still get a decent seat.

They did at least have the room catered, which turned out to be fortunate since Larry Ellison's keynote address went nearly an hour long. Those watching from Moscone were not so lucky; they had to sit through the whole thing. Oracle definitely doesn't have the same flair for putting on interesting events like Apple does. Perhaps Larry and Steve need to spend some more time together?

As for the content of the keynote, it started with a short introduction, followed by a nearly hour-long advertisement by HP. I wonder how much they paid Oracle for an effectively captive audience? At about 6:45 Larry finally got on stage and started yet another hour-long demo on Exalogic, Oracle's new "cloud-in-a-box" product. I can't say this was particularly interesting, especially since it's all been done before. At least the twitter feeds for #javaone10 and #oow10 were fun to read, even if they were a bit harsh.

When the keynote finally ended, they gave away 10 Kindles and 10 Blu-Ray players to people who actually sat through the whole thing. Unfortunately, since you had to be present to win, it took another hour to draw enough tickets to give away all the prizes. I'm certain they drew at least 250.

The first of 25 sessions I would attend this week was finally over. Hopefully, the rest of the week would be an improvement.

JDK 7 and JavaSE 7

Mark Reinhold gave a very interesting presentation on the future of Java 7 Monday morning, full of technical details and a bit of humor. As has been reported widely by this point, Java 7 will ship in mid-2011, but with a reduced feature set from what was originally intended. Java 8 will ship in late 2012 with the remaining features which didn't make Java 7. This plan was received fairly well by the audience, as I think most of us at this point just want to see something ship before we all retire or start coding .NET 100% of the time.

Some things that will make Java 7 include several small language changes from Project Coin (more on that later), better multi-core support via the Fork-Join framework, and several performance and productivity features from JRockit. Things that wo't make the cut include the Project Jigsaw modularity work and the closure support from Project Lambda. Both of these features still need considerable work before they are ready to include in a Java JSR.

Speaing of JSRs, Oracle has committed to delivering a full set of JSRs for Java 7 and Java 8 to the JCP, ending speculation that the future of Java might be in a perpetual JDK build without a formal specification. Overall, I think this is very good news. It adds some clarity to a process which has become increasingly clouded.

Project Coin

Project Coin is the umbrella project for a wide variety of small languages changes designed to make the Java platform easier for developers to use on a daily basis. This presentation covered about a half-dozen new features which will hopefully be included in Java 7.

Strings in switch

Quite simply, the switch statement now supports using strings in addition to the already support integers and enums.

Binary literals and underscores in literals

This isn't going to change anyone's life, but does make things a bit easier to read when dealing with low-level constants:

int x = 0b1100_0001;

int oneBillion = 1_000_000_000;

Generic type inference (diamond operator)

Instead of typing this:

Map map = new HashMap();

this works:

Map map = new HashMap<>();

This isn't a major win with something this simple, but once the generic parameters become more complex, it makes things much more readable.

Multi-catch

You can now catch several distinct exception types at once:

catch (final IOException | ClassCastException e) {...}

This should save considerably on duplicated catch blocks. In addition, the compiler will now be smart enough to determine that rethrowing a caught exception in this context now works more appropriately:

try { ... } catch (final Exception e) { log(e); throw e; }

This will not automatically cause your method to need to throw Exception, as the compiler can determine which exceptions could have been caught, and because of the final modifier is smart enough to know that no new exception types are being introduced. This will make checked exceptions much less of a pain to deal with.

Try-with-resources (or automatic resource management)

This is, IMO, the most significant change in Project Coin. Those of you who have used C#'s using keyword will no doubt find this familiar:

try (InputStream in = ...; OutputStream out = ...) { ... }

This is essentially shorthand for a try/finally block which closes the streams. It properly handles partial initialization, and captures suppressed exceptions thrown by the close() methods (available via a new helper method on Exception). Under the hood, this functionality is available via a new interface, AutoCloseable, which is a new superinterface of Closeable, but declared to throw Exception instead of IOException. Since interface implementations can narrow the scope of declared thrown exceptions, this allows any class with a close() method to be easily retrofitted to implement AutoCloseable. As of the current JDK builds, JDBC interfaces have been updated to include this functionality as well, greatly simplifying cleanup of database resources.

Tuning SOA Infrastructure

I was expecting this session to be a general overview of common performance issues encountered in SOA, but it turned out to be mostly a WebLogic tuning session. Since we don't use this anywhere, it didn't seem very applicable.

There were a few useful (though obvious) bits of advice, such as don't send huge messages through JMS (who knew?), and avoid calling asynchronous services from synchronous code (such as web UIs). This anti-pattern, while common, causes all sorts of issues such as timeouts, extra overhead, stuck threads, etc. It also significantly complicates fault handling.

Maven 3.x

This session talked about a lot of new features which are coming in Maven 3.0, which is going to be available soon. The current roadmap calls for Maven 3.0 to be released on or about October 1, with m2eclipse 1.0 arriving 6-8 weeks later.

Much of the work for Maven 3 has been under the covers, doing such things as unifying the plugin model used by Maven and several other projects Sonatype works on, such as Nexus and Tycho. The Maven dependency resolution code has been abstracted out into a standalone library, called Aether, which can be used in other applications to automatically fetch Maven artifacts from repositories. This could be an interesting approach to server deployment.

User-visible features include the Maven shell, which provides a very fast, reactive interface to building projects, and Maven polyglot, which will soon allow writing Maven project files in other languages.

JavaOne Keynote

The JavaOne keynote address Monday night was actually fairly good. Doug Fischer from Intel gave a short presentation on Java performance improvements on the latest generation of Intel processors and then it was on to Thomas Kurian from Oracle. Much of the information presented was also a part of the Java 7 session earlier in the day (in fact I think they worked from some of the same slides).

The keynote concluded with Kurian asking everyone to put on the T-shirts we had been given upon entering, which stated simply "I am the future of Java". The message was clear; developers are the heart and soul of the Java community and Oracle knows it.

JavaSE Q&A BOF

The first BOF (Birds of a Feather) session I attended, the JavaSE Q&A was pretty interesting, as many of the lead developers on the project were present to answer questions. There were a few general questions related to garbage collection, but more on the future of the platform and what will become of JRockit now that Oracle owns both of the major JVM implementations...

Neither product is going away, at least in the short term, but since there are now more developers at Oracle who are familiar with the HotSpot code base, that will become the long term product, with major features and performance improvements ported to it from JRockit. This is good news for the Java development community at large, as we will gain some new tools (specifically the Mission Control product for JRockit) in the official JDK.

There were also some questions on the future of OpenJDK. In short, no major changes are expected, as the Oracle management appears to understand the value of maintaining that as open source.

CXF vs. Axis BOF

The second BOF of the evening, this turned out to be a tech demo illustrating a common Hello World service using both CXF and Axis, two leading web service frameworks. Axis has been around a lot longer, but is starting to show its age, while CXF is the newcomer, but has a cleaner, less intrusive design.

Conclusion: Use CXF.

DI Flavors in Spring 3 BOF

In the final BOF of the evening, we were treated to a demo of some of the new configuration flavors available in Spring 3. In addition to the traditional XML-based configuration, Spring 3 can be configured using custom namespaces, annotations (both custom and JSR-based), and even pure Java code with no XML at all. The presenter gave several good use cases for each approach, and even gave examples of combining the different methods to achieve more flexible configuration.

10:30pm, and day one of JavaOne had finally come to a close. Stay tuned for more summaries of JavaOne 2010!

IE 9 Beta First Impressions

2010-09-16T20:09:51-05:00

So... I just finished installing the beta of IE 9 in a Windows 7 VM and so far, I'm struck by a few things...

First, it's basically the same UI as early Chrome builds (minimalistic), but in Microsoft Blue ^TM instead of Google Blue ^TM . There are some minor differences, like the fact that the tabs go on the same line as the address bar, and confirmation dialogs pop up from the bottom instead of down from the top.

Second, performance, while decent, is nowhere near what I would have expected given the hype this browser has received. I tried out several of the HTML5 demos on the Microsoft IE9 Demo Site in both Chrome 6 and IE9 and neither one could manage more than a paltry 10fps on most of them, and this is on a fairly fast Mac Pro. To be fair, IE9 was probably somewhat hampered by running in a VM, but does Microsoft really expect us to believe that by the time IE9 ships, these demos will be smooth as butter? Seems like they may be a bit ambitious...

Finally, standards compliance. There's still some bugs, but this is so much further ahead than where IE8 (or any previous Microsoft product) was, that I'm fairly certain pigs are going to fly and cats and dogs are going to start having little barking fuzzy offspring... I never would have believed it possible. Too bad that it's still going to take at least a decade until we can be sure that IE6, IE7, and even IE8 will finally be a distant memory, especially since IE9 will not run on XP or Vista.

In short, I welcome Microsoft back to the web, and at the very least, this will someday make us web developers' lives easier. And, it should at least make the web a bit less of an ugly place for users too (especially the IE6 crowd).