The goal of the cloud visualization algorithm was to provide an aesthetically pleasing visualization of Tag cloud. As packing shapes is always a problem with a high complexity I needed to come up with a solution that is also fast. The desired layout was inspired by Wordle, but as Wordle is unfortunately closed source I needed to build the algorithm myself.

The algorithm

The resulting algorithm is fairly simple. The intuitive idea is to pack the words as close as possible to the center while they may not overlap. For simplicity we assume that we do not try to place the real "shape" of the rendered words but just the bounding box, which is a rectangle. So we break this problem down to placing rectangles in a plane.

As the algorithm needs to be fast we can't come up with the optimal solution which would be NP-hard but rather use a greedy approach to produce a reasonable result. The core idea is to remember the remaining rectangular spaces in the plane and place each rectangle in the space that is closest to the center.

Input is a list of words with each having a specific font size. The set of remaining spaces is initially an infinite rectangle covering the whole plane, later on the set is always sorted by increasing distance to the origin.

  • For each word in the list get the bounding box of the word.
    • Walk over the ordered remaining spaces until the bounding box would fit into the current one.
    • Align the bounding box within this space as close to the center as possible.
    • Walk over the ordered remaining spaces. For each space that is overlapping with the placed bounding box, remove this space and compute the up to four new spaces that result from cutting out the bounding box (top, right, bottom, left).
    • Add these new spaces to the remaining spaces.
    • Draw the word into the placed bounding box.

Possible Improvements

  • Ordering the list of incoming words by descending font size.
  • Rotating every second word.
  • Dramatic speed improvement: After all new spaces are computed remove those spaces that are completely contained within another space.
  • Kick out spaces that are smaller than a threshold.
  • Instead of using the bounding box for the complete word, use the bounding boxes of each character.
  • Add some margin to each bounding box so there is some space between the words.
  • Of course the whole algorithm can also be implemented by using a force directed approach.




Updated: 12/15/2015 23:12