An Open Letter to Jeff Bezos

Dear Jeff,

At Meteorite BI we are a small software vendor who loves to write open source software and enable other businesses to gain value from their data, much in the same way you like to empower your users with the amazing array of tools you offer in the Amazon Cloud.

We recently asked our users to fill in a questionnaire, and one of your engineers kindly told us that you are using Saiku in a number of departments in your company, which is great to hear; I really hope they find it useful. What would be even better, though, is if your engineers also took part in the community side of open source instead of just consuming it. I'm sure we aren't the only developers whose software your company runs internally for business purposes, and part of the whole open source ecosystem is about building great products with an even greater community.

So Jeff, I understand your developers can't help on every project they use, but could you please send out a memo to the engineers who use open source software and remind them to take a few minutes in their day to give back to the small projects like ours that rely on support from large businesses like yours to grow and prosper, just as you have? Fix a bug, implement a cool new feature, answer a mailing list question, update the docs or translate them into another language; it all helps!

Thanks a lot.


Accessing your beans in a ServerEndpoint class using Apache Karaf

It's been a while since I wrote a blog post, and even longer since I wrote a technical one (of sorts), but here we go. I've been wanting an excuse to play with WebSockets for a while and I've finally got one: we're designing a new product, so hey, why not use some "new tech".

We had already decided to use OSGi, and so we chose Apache Karaf, which is a great OSGi framework. I was chatting to the guys on the #apache-karaf IRC channel last week and mentioned that I was planning to use CXF, at which point I was told there was WebSocket support in Pax Web and Jetty, so why not use that, as it would probably involve less code? Great, I thought. So this week I tried to get it working: I created a test bundle and started adding stuff to it.

Then it came to wiring it all up. I needed to inject a bean into my endpoint to look up some stuff; not an uncommon scenario, but it would appear it wasn't really a scenario that was considered when the WebSocket spec was drawn up, and they neglected to define how endpoint scoping should work (or something like that). The rest of my bundle was wired up with Blueprint, so I enquired how you would do it, because the WebSocket class is annotated with @ServerEndpoint and you don't define it anywhere. "Use Pax CDI" was the response. Okay then, how do I do that? So I wired it all up with @Inject annotations and worked out how to bootstrap Pax CDI. But my bean was always null, so I figured CDI wasn't working properly, and I spent a long time trying to figure that out, to the point where I tested the same class as a servlet and it worked absolutely perfectly. Weird, I thought.

It turns out, if you Google for it, that without EJB support you can't really do it. Bummer. So how do you inject dependencies into your WebSocket endpoint? Surely it's possible? Well, sort of. In the end we came up with this plan: Karaf has excellent JNDI support, and while I'm used to using JNDI for data sources, you can reference OSGi services in the same way. So why not publish the service under a JNDI name and get it that way?

So in your service you can add something like this to the constructor:

// imports needed at the top of the service class
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// in the constructor: bind this service instance under a JNDI name
try {
    Context context = new InitialContext();
    context.bind("fmclient", this);
} catch (NamingException e) {
    e.printStackTrace();
}


This simple blob will register your bundle service as a JNDI resource. Finally, in the WebSocket endpoint, all we have to do is:

Context context = new InitialContext();

FileManagerClient myBean = (FileManagerClient) context.lookup("fmclient");

This will look up your JNDI-bound service and allow you to access it from the ServerEndpoint. So after much hacking and head scratching, and some grumbling at anyone who would listen (I apologise for that), we have beans in WebSocket classes with minimal fuss. You can find the full code here.
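
For completeness, here is a rough sketch of how the pieces hang together inside the endpoint class itself; the class name and endpoint path below are illustrative rather than the actual project code:

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/filemanager")
public class FileManagerEndpoint {

    private FileManagerClient fileManager;

    @OnOpen
    public void onOpen(Session session) {
        try {
            // look up the service instance that bound itself under "fmclient"
            Context context = new InitialContext();
            fileManager = (FileManagerClient) context.lookup("fmclient");
        } catch (NamingException e) {
            e.printStackTrace();
        }
        // fileManager is now available for the lifetime of this websocket session
    }
}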

Selling Open Source Software

We had a management meeting a week or so ago, the details of which I shan't go into, but the crux of the matter, as ever, is that making money from open source software is hard, especially when you work in a small company. Luckily we run the consultancy as well, which gives us a second revenue stream.

People have this conception that because the software is "free" there isn't a requirement to pay for it, or they may want to but they'll get round to it one day and in the meantime just continue asking the odd question on the forum, or sending a random support request to the clearly non-support-centric info@ email address. (I know what it's like, I've had a few of those bosses, or "Freetards" as one open core BI company likes to refer to them.)

This leaves an open source development company in a bit of a conundrum: do they go for bulk and try to sign up as many of these "little fish" as possible to generate enough revenue to pay the developers, or do they go after a few "big accounts"? The smaller companies are certainly easier to talk to, but usually they have little to no budget; they are using open source software because they think it will save them money in the long run (and they are probably right), and getting money out of them is like getting blood out of a stone. The larger companies certainly have the money, but the bureaucracy and paperwork take forever, and by the time you get a contract signed you've wasted hundreds of man-hours chasing them up. Or, even worse, you waste those man-hours and then they decide to go elsewhere.

Employing marketing and sales people would certainly help, as they have a background in selling things and would free up the developers to go and develop. But you also need to be able to pay the marketing person; it's a bit of a vicious circle.

We have a thriving community using Saiku: we know we have thousands of daily users, we have people on the IRC channel 24/7 and plenty of questions are asked each day on the forums. And those open source fanatics are correct, money isn't everything, but support from the community is, and whilst some people do help out with answering questions and the like, in the three and a bit years Saiku has been on GitHub we've had 91 code contributions from the community; that's just over one external contribution a fortnight.

There comes a time when you have to ask yourself: what's the point? The point, all those years ago, was to remember how to program and to create a tool that wasn't as rubbish as JPivot. Well, we managed that, and I feel all the better for it.

I can't help but feel some irony with our project: we set out to create something easy to use, and we achieved it. So when you talk to people about Saiku and ask them why they don't want to pay for support or sponsorship, their response is often "we don't need it".

So what next? There is nothing I would like more than to keep Saiku completely open, make a competitive salary doing full-time Saiku development work with Paul, and take over the world of analytics with Saiku. But let's be realistic: do we continue putting time and effort into a tool which people love to use but don't want to pay for? Do we change the licensing model? Get outside investment? Or do we release it to the community and go and do something else that makes us a decent living?

At the end of the day, we all want to make a living, and there comes a point where you have to make a decision. Do you listen to your head, or your heart?

Saiku 2.5 released to the wild.

So after many months of us ignoring our mantra of release early and often, we have finally got around to packaging up and distributing Saiku 2.5, hurrah!

Many of you will already be using Saiku 2.5 because it's been pretty stable for a while now, but for those of you who have yet to upgrade, please do; Saiku 2.5 has a very extensive list of new features.

New Features

  • Charts: new chart types (heatgrid, area, dot and others), improved settings, upgrade to CCC2, improved results, basic chart export
  • Spark Lines / Spark Bars
  • Top count (and other limit functions), sorting, filter methods in drag and drop query model
  • Improved parent child hierarchy support (fixed major bugs, show children)
  • Repository improvements (search, sorting, permissions)
  • Direct export endpoints to XLS, CSV and JSON (including parameters, all with one HTTP call)
  • Improved encoding (use caption, but works with name attribute now as well)
  • Improved i18n: fixed Chrome issues, more options, more languages (Croatian, Hungarian, German and more)
  • Experimental PDF export
  • Use last result for selections if possible (e.g. filter on 2003 and select all months, month dialog will show only months for 2003)
  • MDX editor now uses ACE (run a query with Cmd+Enter / Ctrl+Enter), paving the way for auto-completion and MDX syntax highlighting
  • Open / save dialog for queries
  • Performance tuning
  • Save chart / table options (persist viz)
  • Datasource processor (modify the datasource definition before the connection is established, e.g. role-based access to a datasource) and connection processor (modify the existing connection object, e.g. set a role programmatically)
  • Improved Mondrian sharing and plugin path conventions (drop a plugin into solutions/saiku/plugins and it will be picked up by the UI, so you can update the plugin easily without losing your extra plugins)
  • Run a Saiku query in the Pentaho scheduler to generate XLS output

New website

On top of the new software release we have also released a brand new version of our website, and it has moved house! As part of our ongoing marketing overhaul we have decided to integrate the Analytical Labs brand into our Meteorite BI group. This doesn't mean the software is changing in any way; we just want to make it clearer to people that we offer more than just free software, and that if they require support or help with Saiku on a commercial level then we are available to do that. Redirects will remain in place for the foreseeable future, but do update your bookmarks.


We now offer a number of different Saiku-related support packages. Whilst Saiku is open source, and liberally licensed at that, we have to earn a living, and one way of doing this is by providing support to businesses who like to have a nice cushion to land on when things don't go quite as they expect. We offer both UK-hours support packages and 24/7/365 packages depending on the level required; these are detailed on our new website, and should you want to discuss options, feel free to email us.


Along with support contracts, we have also added a new sponsor page; free software is nice, but someone has to write it. We ask that companies using Saiku in a commercial environment who don't require support consider a sponsorship option as a way of supporting us and keeping Saiku development going. Also, if you think Saiku is missing some vital feature, we have a number of sponsorship options for feature requests; again, if you want to talk to us about that, drop us an email.


Lastly, as ever, we believe in fostering and nurturing a great Saiku-related community, whether it be for Saiku OLAP or Saiku Reporting. If you find a bug please report it to the issue trackers on GitHub (OLAP, Reporting), and if you have any technical questions, please feel free to swing by our forum.

Thanks a lot for the support over the years and we hope that you all enjoy the great new features in Saiku Analytics 2.5!

Trout, out.

Processing MongoDB data using Pentaho Data Integration

So, having read the blog post by Ross Lawley, and having had a chat with Matt Asay when he was in London about the state of ETL tools when it comes to transforming random datasets for document stores, I thought I'd take a stab at doing a PDI version of the same blog post, to save people who don't know Python from having to do data transformation via a script (programmers love scripting; us data guys like dragging and dropping…).

Anyway, this being a fresh box, I thought we'd step through it from the top, so first you need to install MongoDB if you don't have it. Linux users can use one of the many repositories, or you can install the tarball. I like bleeding edge and I don't like having to mess around with repositories and dependencies, so I went with the tarball.

curl <mongodb download url> > mongodb.tgz
tar xvfz mongodb.tgz -C /opt/mongo
cd /opt/mongo/bin
./mongod    # start the server (default dbpath is /data/db)

Once the mongo server is up and running you can then test the shell with:

./mongo

Once I had Mongo running I set to work on a couple of transformations; I wanted to prove the ability to get data both into and out of MongoDB. Although I guess in the real world there would be far more data extraction and analysis, I thought it would be good to demonstrate both.

So first up I got the same dataset Ross was using:


Then I set about processing it with Pentaho Data Integration.

As you can see from this screenshot:


The process was reasonably painless apart from the lack of real support for dynamic key/value pairs within PDI. I get the feeling this is possible using the Metadata step, but I was having issues trying to map all the keys to fields, so I gave up in the end and just selected a few. I also wanted to do this with no code; I nearly succeeded, but my denormalization skills let me down slightly and I needed to clone some fields where the ID and coordinates were null. Still, I was close enough.

Once all that was done I then configured the MongoDB output step to create the same data structure as Ross:


As you can see I make use of some Array stuff with the coordinates.

I then hit run and waited a few seconds.

Thankfully Ross had put some test queries in his post, so I checked my data load with them; amazingly enough they worked just fine, including the geospatial stuff, so I must have got the structure correct :)

Finally, I wanted to show data extraction. I thought about postcode lookups or something similar, but as the Post Office likes to sell that data for a lot of money, I decided to just extract my data and render it to a spreadsheet.


As you can see from this transformation, there isn't much to it; it just replicates the pub-name demo query, but in ETL land:

     {"_id": "$name",
      "value": {"$sum": 1}
  {"$sort": {"value": -1}},
  {"$limit": 5}

You get the data using a MongoDB Input step, parse it with the very easy to use JSON Input step, and then ETL to your heart's content.

Demo transformations are attached to this blog post.


Installing CTools on Pentaho BI Server hosted in Meteorite Cloud

Until we get time to integrate the Ctools installer into Meteorite Cloud so people can pick various options at build time, there are a couple of ways to use Ctools products on Meteorite Cloud.

Option 1: Pentaho Market place

Pentaho Marketplace is a relatively new addition to the Pentaho stack and allows people to install plugins from the marketplace repository by clicking on a few buttons.


But, like many, you may want to run the development versions of these products to get bleeding edge features or customisations applied. To do so you need option 2.

Option 2: Ctools installer

Using the CTools installer isn't much different from usual, apart from some minor changes which I shall describe below.

First of all you need to spin up your Meteorite Cloud Pentaho server:

Then, once it has built, you need to check out the repository, for example:

git clone

The first clone will be a sizeable 180MB, so you may have to wait a while, but going forward things will be much faster as you will just be committing changes back to the repository.

Next we need to get a copy of the CTools installer; this is done by cloning its git repository:

git clone

Now we can install the Ctools components to our Pentaho solutions directory:

./ctools-installer.sh -s /home/user/Projects/pentaho-solutions/pentaho-solutions -b dev -y

This tells the ctools-installer script to install the packages into our freshly cloned pentaho-solutions repository (-s), using the development branch (-b dev) and, for speed, accepting all prompts with yes (-y). It should then run. CDC is currently not supported, as it wants access to the Pentaho webapp, which we currently do not provide (this is in the pipeline).

Once the packages are installed you can then move back to the pentaho-solutions directory. If you run

git status

you should see lots of new and modified files, which confirms the installation was a success. Next we need to commit these files back to the server; to do this, run:

git add *
git commit -a -m "new ctools installation"
git push

Again, because of the number of changes the installer has made, the upload may take a little while. Grab a cuppa whilst you wait.
Finally we need to restart the Pentaho BI server; this is done in the Meteorite Cloud node control panel:

Click on Restart BI server.


Give it a minute or two to restart and then log back into the Pentaho BI Server.
Once you are back in you should now see something like the following:


Ctools components are now installed for ultimate Pentaho BI server hackability.

My experiences with Play Framework V2

As I have a few minutes to spare I thought I would write a piece about my experiences with the Play Framework version 2.

For those of you who don't know what the Play Framework is, where have you been? Originally based on the RoR paradigm, Play Framework offers a flexible, REST-based web development environment for Java and Scala.

The good:

  • Scala (or Java)
  • Flexible data persistence
  • Hot reloading
  • Plugins
  • Dependency management (SBT)
  • Debug mode
  • Scala console
  • Akka integration
  • Memcache based caching layer

The bad:

  • Clunky templating
  • No easy way of overriding the routing (I'll explain later)
  • Deployment to production

The ugly:

  • SBT

So that's a brief list of the things that spring to mind; I think I'll explain by working backwards through them.

SBT, where to start with SBT? It's an acronym for Simple Build Tool, which is a blatant lie. If you want to understand SBT configuration files, you need a PhD. For example, syntax like += and := is hardly the easiest to decipher for someone new to the framework. I appreciate it's going for minimalism, unlike the rather verbose Ant and Maven file formats, but at the same time the learning curve for SBT is quite a bit higher; give me Maven any day of the week.

Deployment to production: this I can't really blame on Play Framework, since it's ahead of its time, but it does make life harder. With Play Framework 1.x it was easy to create a WAR file and deploy it to your favourite servlet container; 2.0 makes use of non-blocking IO and WebSockets for some communication, and servlet containers built against the older specs (prior to 3.0) do not support these standards. There is a plugin that will generate a WAR file for you, but if some stuff doesn't work, what's the point? Similarly, Apache HTTPD does not support WebSockets, so if you want to put a proxy in front of your application you are forced to use Nginx or similar.

Clunky templating: Play Framework version 1.x made use of Groovy for its templates; you passed in variables and the templating engine turned them into HTML blobs at render time. It worked quite well, but it wasn't typesafe. Developer X would change backend code Y but not update template Z, and boom, you deployed to production only to find it not working. In Play Framework version 2.x they have overcome this and other issues by using Scala-based templates: backing classes are generated at compile time to pass in the required objects, so typesafety is ensured. But it's a pain in the ass. Getting the server happy with the imports, the parameters declared at the top of the template and so on is a never-ending cycle of trial and error; one day it will work, another day not. I know a lot of this is down to dev mode trying to construct these classes on the fly, but it's a pain. Also, make one small change to any part of the view and the class needs recompiling and reloading, which is time consuming.

So with the clunky templating in mind, I moved my UI to AngularJS, which was fantastic, but Play Framework's routing got in the way. Again, not specifically a Play Framework issue, but something that is tedious nonetheless. I wanted to ship my HTML and JavaScript code in the same bundle to make development, testing and deployment easier, and I also wanted to make use of AngularJS's HTML5 mode, which gets rid of the hashbang and makes the URL look nice. But annoyingly, of course, whenever a route went to a non-hashbang URL, Play took that as a requirement to render something; its own routing mechanism took over, tried to find the required route, and failed. It would be cool to be able to tell it to ignore a specific routing subset entirely, so you could just plumb in Angular or another framework of your choice.

So, with those gripes out of the way, what's good about Play Framework version 2.x?

Memcache-based caching, out of the box and with no extra effort, allows you to use a proper caching mechanism for your application. On top of that, because of how memcached distributes, if you use a clustered setup you can easily add another node and every instance sees the same cache.
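
To give a flavour, the cache API is about as simple as it gets. This is a rough sketch using Play's Java cache API (I was writing Scala, but the calls mirror each other), with a made-up key and value purely for illustration; the memcached plugin sits behind the same calls:

import play.cache.Cache;

// store a value for an hour (the expiration is given in seconds)
Cache.set("user.count", 42, 3600);

// read it back; returns null if the entry has expired or been evicted
Integer count = (Integer) Cache.get("user.count");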

Akka integration: I had issues getting my head around Akka at first, so much so that I went and bought a book about it. But Akka is great; it allows you to create distributed job management clusters with minimal effort. Once I had an understanding of the code and how to pass objects around, creating complex Akka jobs to manage nodes and clusters became pretty trivial. The only downside was that the out-of-the-box scheduler didn't allow cron-style schedules for jobs, so I found a third-party library that adds this using the Quartz scheduler.
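
As a taste of what that looks like, here is a minimal actor sketched with the Akka Java API (Akka 2.2-era; the exact Props factory methods vary between versions, and the actor and message names here are made up for illustration):

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// an actor that handles simple string job messages
public class JobWorker extends UntypedActor {
    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            System.out.println("Running job: " + message);
        } else {
            unhandled(message);
        }
    }
}

// elsewhere: create an actor system, spin up a worker and send it a job
ActorSystem system = ActorSystem.create("jobs");
ActorRef worker = system.actorOf(Props.create(JobWorker.class), "worker");
worker.tell("nightly-report", ActorRef.noSender());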

For those of us new to Scala, the console mode is very useful for testing out syntax and Scala functions, and it's quick and easy to get to from the Play prompt:

play debug
console

And you are in.

Debug mode: rather than remembering those CATALINA_OPTS flags to get the JVM listening for a debugger, just run play debug and all you have to do is connect your favourite IDE and debug away.

Dependency management: of course, whilst SBT is annoying, having dependency management set up for you and running out of the box on a new project is always nice.

Plugins: a mainstay of Play Framework version 1, and they haven't gone away in version 2. Plugins provide a great way to get additional functionality with minimal effort; add a dependency to SBT and install the plugin, as simple as that. SecureSocial, for example, is a great authentication tool.

Hot reloading: gone are the days of having to reload your JVM every time you change a line or two of code. Make your changes and refresh the browser; a few seconds later your changes have appeared. Of course, with AngularJS on the UI, any HTML or JavaScript changes were instantaneous.

In version 1, when it came to data persistence you really only had one answer: JDBC and SQL data stores. In version 2 they have gone for ultimate flexibility, and it has really paid off. I chose to use Squeryl as my data access layer, but that is by no means the (A)norm (sorry, terrible joke). There is nothing stopping you using a NoSQL data store, and I would be sorely tempted to try out MongoDB as a Play Framework backing store on my next project, as its document storage engine lends itself to just this sort of scenario.

Now my final pro: Scala. When I started using Play Framework 2.x I was a Java man; whilst I have used other languages over the years, my predominant language is Java. But, never one to shy away from a challenge, and acknowledging that while Play supports Java it is mostly written in Scala, I decided to completely rewrite Meteorite Cloud in Scala to conform, and that is by far the best decision I have made on this project. The number of lines of code I managed to throw away thanks to Scala's concise nature and syntax was amazing. Whilst I still use Java on a daily basis, the fact that Scala is so easily used alongside Java classes will make me really consider it every time I start a new project.

So there we go, my overview of Play Framework 2.x. Feel free to leave a comment or two if you agree, disagree or think I've missed something.

Trout, out.