Building a tag cloud in Java

Posted by ccondit on 11/2/06 @ 1:14 PM :: Updated by ccondit on 6/27/13 @ 8:22 PM

You've seen them. Maybe you like them, maybe not, but tag clouds are here to stay (at least until someone invents something better). This article details how the tag cloud for this web site was created.

This site uses Hibernate (with annotations) and Spring extensively. This article makes use of those frameworks, but the same general concepts apply no matter which frameworks you choose to use (or not use).

The first step in building a tag cloud is mapping Articles to Tags (and vice versa, for reasons that will become clear in a moment). The relevant code in Article.java is as follows:

private List<Tag> tags;

@ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE })
@JoinTable(
  name = "article_tag_link",
  joinColumns = { @JoinColumn(name = "article_id") },
  inverseJoinColumns = @JoinColumn(name = "tag_id"))
@OrderBy("displayName")
public List<Tag> getTags()
{
  return tags;
}

It might not be immediately obvious, but we need to map Tag objects to Article objects as well, in order to make our Hibernate queries easier to write. The following code in Tag.java accomplishes this:

private transient List<Article> articles;

@ManyToMany(
  cascade = { CascadeType.PERSIST, CascadeType.MERGE },
  mappedBy = "tags")	
@OrderBy("creationDate DESC")
public List<Article> getArticles()
{
  return articles;
}

There are a few points to note about this code. The @ManyToMany and @JoinTable annotations set up the link table containing the primary keys for each entity (in this case article_tag_link). Many-to-many relationships in Hibernate need an owning side. The @JoinTable entry in Article.java and the mappedBy entry in Tag.java establish the Article as the owning side. In fact, the articles collection in the Tag class is effectively read-only; no changes there will be persisted to the database.

Also, since the articles collection is not necessary for updates, we mark the field as transient so that Java serialization of a Tag does not pull in all of its articles as well.
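
Because only the owning side is persisted, code that tags an article has to go through Article.getTags(). The following is a minimal sketch of a helper that does so while also keeping the in-memory inverse collection consistent. The classes are simplified, annotation-free stand-ins for the real entities, and SketchArticle, SketchTag, and TagLinker are hypothetical names that do not appear in the site's code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the real entities (no Hibernate annotations).
class SketchArticle
{
  private final List<SketchTag> tags = new ArrayList<SketchTag>();
  public List<SketchTag> getTags() { return tags; }
}

class SketchTag
{
  private final List<SketchArticle> articles = new ArrayList<SketchArticle>();
  public List<SketchArticle> getArticles() { return articles; }
}

// Hypothetical helper, not part of the original code.
final class TagLinker
{
  private TagLinker() {}

  public static void tag(SketchArticle article, SketchTag tag)
  {
    // Owning side: with the real mapping, this is the change Hibernate persists.
    if (!article.getTags().contains(tag))
      article.getTags().add(tag);

    // Inverse side: updated only so the in-memory objects agree with each other.
    if (!tag.getArticles().contains(article))
      tag.getArticles().add(article);
  }
}
```

Adding an article to tag.getArticles() alone would leave the link table untouched, which is why the helper always starts from the article.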

Now that our relationship is defined, we can create some named Hibernate queries to give us some useful information for our tag cloud. From Tag.java:

@NamedQueries
({
  @NamedQuery(
    name = "Tag.All",
    query = "from Tag t order by t.displayName"),
  @NamedQuery(
    name = "Tag.CountAll",
    query = "select count(t.id) from Tag t"),	
  @NamedQuery(
    name = "Tag.ByName",
    query = "from Tag t where t.name = ?"),
  @NamedQuery(
    name = "Tag.AllTagStatistics",
    query = "select t, t.articles.size from Tag t "
          + "order by t.displayName"),
  @NamedQuery(
    name = "Tag.MostArticles",
    query = "select max(t.articles.size) from Tag t")
})

To make use of these queries, we define a few DAO interfaces:

public interface TagDaoBase
{
  public List<TagStatistics> queryAllTagStatistics();
  public List<TagStatistics>
    queryAllTagStatisticsInRange(int start, int limit);
  public int queryMostArticles();
}

public interface TagDao extends
  GenericDao<Tag, Long>, TagDaoBase
{
  public Tag findByName(String name);
  public List<Tag> listAll();
  public List<Tag> listAllInRange(int start, int limit);
  public int countAll();
}

public class TagStatistics implements Serializable
{
  private Tag tag;
  private int articleCount;

  public TagStatistics() {}

  public TagStatistics(Tag tag, int articleCount)
  {
    this.tag = tag;
    this.articleCount = articleCount;
  }

  public Tag getTag() { return tag; }
  public void setTag(Tag tag) { this.tag = tag; }

  public int getArticleCount() { return articleCount; }
  public void setArticleCount(int articleCount)
  {
    this.articleCount = articleCount;
  }	
}

We split our DAO into two interfaces because our use of GenericDao allows us to avoid implementing the methods in TagDao directly (they will be provided automatically by Spring using interceptors, which is beyond the scope of this article). However, this approach will not work with the methods in TagDaoBase, so those need a hand-written implementation.

The implementation of these methods resides in TagDaoImpl.java:

public class TagDaoImpl
  extends HibernateDao<Tag, Long>
  implements TagDaoBase
{
  public TagDaoImpl() { super(Tag.class); }

  public List<TagStatistics> queryAllTagStatistics()
  {
    return queryAllTagStatisticsInRange(0, 0);
  }

  public List<TagStatistics> queryAllTagStatisticsInRange(
    int start, int limit)
  {
    Query query = getSession()
      .getNamedQuery("Tag.AllTagStatistics");
		
    if (start > 0) query.setFirstResult(start);
    if (limit > 0) query.setMaxResults(limit);
		
    List results = query.list();
		
    List<TagStatistics> tagStats
      = new ArrayList<TagStatistics>(results.size());
		
    for (Object result : results)
    {
      Object[] data = (Object[]) result;			
      Tag tag = (Tag) data[0];
      int articleCount = ((Number) data[1]).intValue(); 
      tagStats.add(new TagStatistics(tag, articleCount));
    }
		
    return tagStats;
  }

  public int queryMostArticles()
  {
    return ((Number) getSession()
      .getNamedQuery("Tag.MostArticles")
      .uniqueResult())
      .intValue();
  }
}

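The queryAllTagStatisticsInRange call does most of the work; it runs the appropriate Hibernate query and then converts each resulting tuple into a TagStatistics object. Passing 0 for both start and limit, as queryAllTagStatistics does, skips the paging calls and returns every row.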

We now have all the information necessary to build our tag cloud: the tags themselves, the number of articles per tag, and the maximum number of articles that any one tag has. We could call the DAO methods directly from our presentation layer, but there is still some boilerplate code we would need to implement the full tag cloud. Since we will probably want to show this on several different pages, let's take things a bit further:

public interface TagBusiness
{
  public List<TagCloudEntry> getTagCloud();
}

public class TagBusinessImpl implements TagBusiness
{
  private TagDao tagDao;

  // ...

  public List<TagCloudEntry> getTagCloud()
  {
    List<TagStatistics> tagStats = tagDao.queryAllTagStatistics();
    int mostArticles = tagDao.queryMostArticles();

    List<TagCloudEntry> cloud
      = new ArrayList<TagCloudEntry>(tagStats.size());
	
    for (TagStatistics tag : tagStats)
    {
      if (tag.getArticleCount() > 0)
        cloud.add(new TagCloudEntry(tag, mostArticles));
    }
    return cloud;
  }
}

public class TagCloudEntry extends TagStatistics
{
  private int scale;

  public TagCloudEntry() { super(); }

  public TagCloudEntry(
    TagStatistics stat, int maximumArticleCount)
  {
    super(stat.getTag(), stat.getArticleCount());
		
    if (maximumArticleCount <= 0)
      scale = 0;
    else
      setScale((getArticleCount() * 10) / maximumArticleCount);
  }
	
  public int getScale() { return scale; }
  public void setScale(int scale)
  {
    if (scale < 0) scale = 0;
    if (scale > 9) scale = 9;
    this.scale = scale;		
  }
}

The getTagCloud() method and the new TagStatistics subclass, TagCloudEntry, provide us with a simple list of tag cloud entries. Each entry gives us access to the underlying Tag object and to a new property, scale. This is an integer from 0 to 9 inclusive which gives the popularity of this particular tag relative to the most-used tag.
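
To make the scale arithmetic concrete, here is a stand-alone copy of the calculation from the TagCloudEntry constructor and setScale. ScaleSketch is a hypothetical class name used only for this illustration; the logic is meant to mirror the code above, not replace it.

```java
// Stand-alone copy of the scale arithmetic from TagCloudEntry
// (hypothetical class, for illustration only).
final class ScaleSketch
{
  private ScaleSketch() {}

  public static int scale(int articleCount, int maximumArticleCount)
  {
    // No articles anywhere: everything gets the smallest size.
    if (maximumArticleCount <= 0) return 0;

    // Integer division, so the result is truncated toward zero.
    int scale = (articleCount * 10) / maximumArticleCount;

    // Clamp to 0..9; the most popular tag computes 10 and lands on 9.
    if (scale < 0) scale = 0;
    if (scale > 9) scale = 9;
    return scale;
  }
}
```

With a maximum of 12 articles, a tag with 1 article gets scale 0 (10 / 12 truncates to 0), a tag with 6 articles gets scale 5, and the tag with all 12 computes 10 and is clamped to 9.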

In our presentation layer, we simply call TagBusiness.getTagCloud(), place the result in the request attributes (under the name tagCloud), and render our tag cloud:

<!-- tagcloud.jsp -->
<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<%@taglib uri="http://randomcoder.com/tags-escape" prefix="rcesc" %>
<div class="sectionHeading">Tags</div>
<div class="sectionContent" align="right">
  <div class="tagCloud">
    <c:forEach
        var="tagCloudEntry"
        items="${tagCloud}"
        varStatus="status">
      <c:url var="tagLink"
        value="/tags/${rcesc:urlencode(tagCloudEntry.tag.name)}" />
      <c:set var="tagClass" value="cloud${tagCloudEntry.scale}" />
      <c:if test="${status.index > 0}">::</c:if>
      <a rel="tag" class="tag ${tagClass}" href="${tagLink}">
        <c:out value="${tagCloudEntry.tag.displayName}" />
      </a>
    </c:forEach>
  </div>
</div>

This gives us links with class names like cloud0, cloud1, ... cloud9. Adding a little CSS provides us with some visual distinction:

.cloud0 { font-size: 1.00em; }
.cloud1 { font-size: 1.10em; }
.cloud2 { font-size: 1.20em; }
.cloud3 { font-size: 1.30em; }
.cloud4 { font-size: 1.40em; }
.cloud5 { font-size: 1.50em; }
.cloud6 { font-size: 1.60em; }
.cloud7 { font-size: 1.70em; }
.cloud8 { font-size: 1.80em; }
.cloud9 { font-size: 1.90em; }

Using em for our unit of measurement allows the links to be sized relative to the current font size in the document.
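
The rcesc:urlencode function used in tagcloud.jsp comes from this site's own tag library, whose source is not shown in this article. As a rough idea of what such a function might do, here is a minimal sketch that percent-encodes a tag name as UTF-8 for use in a URL path. UrlEncodeSketch and its urlencode method are assumptions for illustration and may well differ from the real implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for the function behind rcesc:urlencode.
final class UrlEncodeSketch
{
  private UrlEncodeSketch() {}

  public static String urlencode(String value)
  {
    if (value == null) return "";
    try
    {
      // URLEncoder targets form data and writes spaces as '+';
      // inside a URL path a space should be %20 instead.
      return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e)
    {
      // UTF-8 support is required on every Java platform.
      throw new IllegalStateException(e);
    }
  }
}
```

The class is left package-private to keep the sketch compact. A function exposed to JSP EL would need to be a public static method of a public class, declared in a tag library descriptor.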

And there you have it: a simple tag cloud. Since this code is in use on this web site, you can download all of it from the randomCoder subversion repository.

article selection

Posted by palmand on 5/22/07 @ 3:12 PM :: #970

How would you modify your query if Article had a boolean property "shared"? I would like to build a cloud that includes only the articles whose "shared" property is true.

Re: article selection

Posted by ccondit on 6/18/07 @ 7:48 PM :: #1358

If you want to restrict the tag cloud to only shared articles, you would simply modify each of the queries to add " and t.shared = true". Keep in mind, this is example code, and probably would need some modification to be used in a real-life situation.

Thanx!

Posted by IGC on 11/14/08 @ 7:03 AM :: #5175

Thanks a lot for sharing. I've been browsing hundreds of webpages looking for a solution like this that would let me code my own tag clouds on JSP/Java. Your code will be very helpful.

JavaFX

Posted by Kris on 12/19/08 @ 7:03 AM :: #5601

It would be interesting to see how to render this with JavaFX. Swing seems pretty obvious to me: make a Panel with the tag model behind it. The shifting of the cloud would be a little interesting though.

"rcesc" tag library

Posted by Aniruddha Deshpande on 10/16/09 @ 2:40 PM :: #8603

This article references a tag library with prefix "rcesc". It is not clear from the article where and how do I get this custom tag library.

"rcesc" is not clear

Posted by jed on 7/15/10 @ 3:54 PM :: #10048

Please can you tell us, or clarify, the use of "rcesc"?
