Programmatically administering users in Apache Karaf

So this is one of those bookmark-type blog posts that I’m just putting out there because I couldn’t find an example anywhere else and it took some figuring out.

I wanted to administer the users in Karaf programmatically instead of (or alongside) people logging into the console and typing stuff in, so I wanted to know how to get access to the BackingEngine that powers the user management within Karaf. This is how I did it (along with a bit of help from jbonofre on the #karaf IRC channel).

In my utility class I add:

private BackingEngine getEngine() {
    // Walk the login modules configured for the JAAS realm and build an engine from the first one
    for (AppConfigurationEntry entry : realm.getEntries()) {
        String moduleClass = (String) entry.getOptions().get(ProxyLoginModule.PROPERTY_MODULE);
        if (moduleClass != null) {
            return backingEngineService.getEngineFactories().get(1).build(entry.getOptions());
        }
    }
    return null;
}

With that done I can then add a couple of setters and get the services via blueprint like:

<!-- JAAS Realm for user management -->
<reference id="realm" interface="org.apache.karaf.jaas.config.JaasRealm"/>

<!-- Backing Engine Service -->
<bean id="engineService" class="org.apache.karaf.jaas.modules.BackingEngineService">
    <property name="engineFactories" ref="engineFactories"/>

<!-- Backing Engine Factories -->
<reference-list id="engineFactories" interface="org.apache.karaf.jaas.modules.BackingEngineFactory" availability="optional"/>
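The “couple of setters” mentioned above are just the usual blueprint-friendly ones; here is a minimal sketch, assuming the field names used in getEngine() and the JaasRealm / BackingEngineService interfaces referenced in the blueprint:

// In the utility class; the types are org.apache.karaf.jaas.config.JaasRealm
// and org.apache.karaf.jaas.modules.BackingEngineService
private JaasRealm realm;
private BackingEngineService backingEngineService;

public void setRealm(JaasRealm realm) {
    this.realm = realm;
}

public void setBackingEngineService(BackingEngineService backingEngineService) {
    this.backingEngineService = backingEngineService;
}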

With that done I can now edit users. Clearly the get(1) needs adjusting to be dynamic, but the premise works.
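To give an idea of what editing users looks like, here is a rough sketch of calling the engine, assuming the standard BackingEngine methods (addUser, addRole, listUsers); the username, password and role are made-up examples:

// UserPrincipal comes from org.apache.karaf.jaas.boot.principal;
// "alice", "s3cret" and "manager" are purely illustrative values
BackingEngine engine = getEngine();
if (engine != null) {
    engine.addUser("alice", "s3cret");   // create (or update) a user
    engine.addRole("alice", "manager");  // grant the user a role
    for (UserPrincipal user : engine.listUsers()) {
        System.out.println(user.getName());
    }
}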

Hope it helps anyone in the same situation!

A message to all the Pentaho Consultants out there who use Saiku :)

I sent this email to a bunch of Pentaho Consultants today after some discussions we had yesterday with some of them. I will no doubt have missed a bunch of people, so here’s a public version for everyone to flame….

Hi guys

After some feedback from a few of you, I figured I’d email you all directly to try and explain the situation and where we currently stand with Saiku, especially the CE version. I’ve also included everyone’s email address in the TO field; I hope you don’t mind, we’re all friends (I think), and it gives people the opportunity to REPLY ALL if they would like to have a public discussion about the below. Similarly, if you want to take this email offline, just reply to me and I’ll be happy to discuss it with you.

As you are all aware, the CE version is Apache licensed and very permissive, which means you and your clients are all free to do pretty much whatever you like with it, and that is the way we like it. Those of you who know me well will also know that I don’t like the CE & EE model and would rather everything just be open source and free to use, with us making money off support and sponsorship instead of license sales, but sadly that isn’t the case. Before we launched EE we tried offering support and no one was interested because Saiku “just worked”; we aim to make life simple for people, and that is probably the biggest downfall of the application, because the customers you work with then aren’t interested in investing: “it’s working, why pay for support?”. I realise that I have no god-given right to make money from an Apache licensed project, but similarly, I’m also not duty bound to support the open source core if I choose not to, though that isn’t something I’d want.

So that leads us to where we are today. Saiku standalone generates a decent amount of leads and we are finally seeing a return on it, but Saiku CE & EE for the BI server generate next to nothing in terms of useful leads or revenue, yet they take up a lot of time and energy to maintain, especially now that we have to support Pentaho 5 & Pentaho 6 in different plugins.

If you want to see Saiku CE remain in the BI server, we’re going to need the help of the Pentaho consultants to make that happen, because currently we are putting lots of effort into software that we then just give away for nothing, with no chance of recouping any revenue to pay myself or Breno. That’s not to say Saiku would go away from the platform, but if things don’t change there is a large chance it would become an EE-only plugin, because at least then, if people used it, it would generate revenue for us.

Which leads me on to my next point, and one that some people didn’t seem to realise: we are cheap! We are cheap on purpose. Most Pentaho BI server installations will be small, departmental installations. Saiku is priced per user per year to keep the cost down whilst generating revenue to continue developing both versions. Harris said to me yesterday, “but my users don’t need a cube designer or dashboards”; well, that’s fine, but it doesn’t mean they can’t use EE, or just use CE and pay for licenses anyway. Purchasing EE licenses also means the users get product support, quicker bug fix releases and so on, so it does have its benefits.

If they really don’t want to pay, just suggest they make a donation to the project; we happily accept contributions via credit card, PayPal, bank transfer, or whatever, or ask them to offer some support to the community. Just because your client chooses to use open source software shouldn’t exclude them from contributing. I spoke to someone who consults for the Belgian government a few weeks back and they said they couldn’t sell licenses because “they have an open source only policy”; well, that’s nice for them, but it doesn’t mean they shouldn’t support the open source projects they use. Look at Mozilla, they have realised this and are now giving back.

As resellers you are also entitled to a 20% rebate on any completed deal, we’re not trying to take the money and run, we want to incentivise you guys to work with us to get Saiku into as many organisations as possible.

If every one of those 225,000 users gave 50 cents we clearly wouldn’t need EE!

To make it clear, here are the list prices per user per year in USD, EUR and GBP:

1-5 users      $300      €240      £180
6-10 users     $260      €205      £160
11-50 users    $230      €180      £140
51-99 users    $210      €165      £125
100+ users     $180      €140      £110

Unlimited      $35,000   €28,000   £22,000

Also, I’m not saying people have to give financially. If there were 50 regular committers, I could let the project tick over without spending hours every day maintaining the codebase, because other people in the community would be there to help with the chores. If you make changes, push stuff back upstream; if you want to be a release manager, coder, UI guy, whatever, let me know and we’ll find a role for you in the community. The code is easy to build and package; we certainly make life as easy as possible for everyone.

I know most of you install Saiku and then spend 5 minutes hiding the splash screen and removing the CE banners. Please remember that in doing so you are removing our link with the people who control the budgets and who are likely to have a say in sponsoring our project. So if you do remove those items, please offset it by having a chat with your client and discussing the prospect of contributing money or resource to the project, because as I said yesterday, the other option is JPivot or Analyzer, and I suspect that will make you, me and your end users very sad indeed.

Thanks for your time.


An Open Letter to Jeff Bezos

Dear Jeff

At Meteorite BI we are a small software vendor who love to write open source software and enable other businesses to gain value from their data, much in the same way you like to empower your users with the amazing array of tools you offer in the Amazon Cloud.

We recently asked our users to fill in a questionnaire, and one of your engineers kindly told us that you are using Saiku in a number of departments in your company, which is great to hear; I really hope they find it useful. What would be even better, though, is if your engineers also took part in the community side of open source instead of just consuming it. I’m sure we aren’t the only developers whose software your company runs internally for business purposes, and part of the whole open source ecosystem is about building great products with an even greater community.

So Jeff, I understand your developers can’t help on every project they use, but could you please send out a memo to your engineers who use open source software and remind them to take a few minutes in their day to give back to the small projects like ours, which rely on support from large businesses like yours to help us grow and prosper just like you have? Fix a bug, implement a cool new feature, answer a mailing list question, update the docs or translate them into another language; it all helps!

Thanks a lot.


Accessing your beans in a ServerEndpoint class using Apache Karaf

It’s been a while since I wrote a blog post, and even longer since I wrote a technical (of sorts) one, but here we go. I’ve been wanting an excuse to play with WebSockets for a while and I’ve finally got one: we’re designing a new product, and hey, why not use “new tech”.

We had already decided to use OSGi, and so we chose Apache Karaf, which is a great OSGi framework. I was chatting to the guys on the #apache-karaf IRC channel last week and said I was planning to use CXF, at which point I was told there was support in PAX-Web and Jetty, so why not use that as it would probably involve less code. Great, I thought. So this week I tried to get it working: I created a test bundle and started adding stuff to it.

Then it came to wiring it all up. I needed to inject a bean into my endpoint to look up some stuff, not an uncommon scenario, but it would appear it wasn’t really a scenario that was considered when the WebSocket spec was drawn up, and they neglected to define scope status (or something like that). The rest of my bundle was wired up with Blueprint, so I enquired as to how you would do it, because the websocket class is annotated with @ServerEndpoint and you don’t define it anywhere. “Use PAX CDI” was the response. Okay then, how do I do that? So I wired it all up with the @Inject annotations and worked out how to bootstrap PAX CDI. But my bean was always null, so I figured CDI wasn’t working properly, and I spent a long time trying to figure that out, to the point where I tested the same class as a servlet and it worked absolutely perfectly. Weird, I thought.

Turns out, if you google for it, without EJB support you can’t really do it. Bummer. So how do you inject data into your WebSocket endpoint? Surely it’s possible? Well, sorta. In the end we came up with this plan: Karaf has excellent JNDI support, and while I’m used to using JNDI for data sources, you can reference OSGi services in the same way. So why not publish a service under a JNDI name and get it that way?

So in your service you can add something like this to the constructor:

// Requires javax.naming.Context, javax.naming.InitialContext and javax.naming.NamingException
try {
    // Bind this service instance into JNDI so the websocket endpoint can look it up
    Context context = new InitialContext();
    context.bind("fmclient", this);
} catch (NamingException e) {
    e.printStackTrace();
}

This simple block will register your bundle service as a JNDI resource. Finally, in the websocket endpoint all we have to do is:

Context context = new InitialContext();
FileManagerClient myBean = (FileManagerClient) context.lookup("fmclient");

And this will look up your JNDI service and allow you to access it from the ServerEndpoint. So after much hacking and head scratching, and some grumbling at anyone who would listen (I apologise for that), we have beans in websocket classes with minimal fuss. You can find the full code here.
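For reference, here is a minimal sketch of what the endpoint side can look like, assuming the javax.websocket API; the class name and endpoint path are made up, and FileManagerClient is the same service bound under "fmclient" earlier:

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/filemanager")
public class FileManagerEndpoint {

    private FileManagerClient fileManager;

    @OnOpen
    public void onOpen(Session session) {
        try {
            // Look up the OSGi service that registered itself in JNDI
            fileManager = (FileManagerClient) new InitialContext().lookup("fmclient");
        } catch (NamingException e) {
            e.printStackTrace();
        }
    }
}

Because the lookup happens at runtime rather than at injection time, it neatly sidesteps the whole CDI/scoping problem.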

Selling Open Source Software

We had a management meeting a week or so ago, the details of which I shan’t go into, but the crux of the matter, as ever, is that making money from open source software is hard, especially when you work in a small company. Luckily we run the consultancy as well, which gives us a second revenue stream.

People have this notion that because the software is “free” there isn’t a requirement to pay for it, or they may want to but they’ll get round to it one day, and in the meantime they just continue asking the odd question on the forum, or sending a random support request to the clearly non-support-centric info@ email address. (I know what it’s like, I’ve had a few of those bosses, or “Freetards” as one open core BI company likes to refer to them.)

This leaves an open source development company in a bit of a conundrum: do they go for bulk and try to sign up as many of these “little fish” as possible to generate enough revenue to pay the developers, or do they go after a few “big accounts”? The smaller companies are certainly easier to talk to, but usually they have little to no budget; they are using open source software because they think it will save them money in the long run (and they are probably right), and getting money out of them is like getting blood out of a stone. The larger companies certainly have the money, but the bureaucracy and paperwork take forever, and by the time you get a contract signed you’ve wasted hundreds of man hours chasing them up. Or even worse, you waste those man hours and then they decide to go elsewhere.

Employing marketing and sales people would certainly help, as they have a background in selling things and it would free up the developers to go and develop. But you also need to be able to pay the marketing person; it’s a bit of a vicious circle.

We have a thriving community using Saiku: we know we have thousands of daily users, we have people on the IRC channel 24/7 and plenty of questions asked each day on the forums. And those open source fanatics are correct, money isn’t everything, but support from the community is, and whilst some people do help out with answering questions and the like, in 3 and a bit years of Saiku being on GitHub we’ve had 91 code contributions from the community; that’s just over 1 external contribution a fortnight.

There comes a time when you have to ask yourself: what’s the point? The point, all those years ago, was to remember how to program and to create a tool that wasn’t as rubbish as JPivot. Well, we managed that and I feel all the better for it.

I can’t help but feel some irony with our project: we set out to create something easy to use, and we achieved it. So when you talk to people about Saiku and ask them why they don’t want to pay for support or sponsorship, their response is often “we don’t need it”.

So what next? There is nothing I would like more than to keep Saiku completely open, make a competitive salary doing full time Saiku development work with Paul, and take over the world of analytics with Saiku. But let’s be realistic: do we continue putting time and effort into a tool which people love to use but don’t want to pay for? Do we change the licensing model? Get outside investment? Or do we release it to the community and go do something else that makes us a decent living?

At the end of the day, we all want to make a living, and there comes a point where you have to make a decision. Do you listen to your head, or your heart?

Saiku 2.5 released to the wild.

So after many months of us ignoring our mantra of release early and often, we have finally got around to packaging up and distributing Saiku 2.5, hurrah!

Many of you will already be using Saiku 2.5 because it’s been pretty stable for a while now, but for those of you who have yet to upgrade, please do; we have a very extensive feature list for Saiku 2.5.

New Features

  • Charts: new chart types (heatgrid, area, dot and others), improved settings, upgrade to ccc2, improved results, basic chart export
  • Spark lines / spark bars
  • Top count (and other limit functions), sorting and filter methods in the drag and drop query model
  • Improved parent-child hierarchy support (fixed major bugs, show children)
  • Repository improvements (search, sorting, permissions)
  • Direct export endpoints to XLS, CSV, JSON (including parameters – all with 1 HTTP call)
  • Improved encoding (uses the caption, but works with the name attribute now as well)
  • Improved i18n – fixed Chrome issues, more options, new languages: Croatian, Hungarian, German and more
  • Experimental PDF export
  • Use the last result for selections if possible (e.g. filter on 2003 and select all months, and the month dialog will show only months for 2003)
  • MDX editor now uses ACE (run a query with cmd + enter / ctrl + enter), in preparation for auto completion / MDX syntax highlighting
  • Open / save dialog for queries
  • Performance tuning
  • Save chart / table options (persist viz)
  • Datasource processor (modify the datasource definition before the connection is established, e.g. role access to a datasource) and connection processor (modify the existing connection object, e.g. programmatic roles)
  • Improved Mondrian sharing and plugin path conventions (drop a plugin into solutions/saiku/plugins and it will be picked up by the UI; easily update the plugin without losing extra plugins)
  • Run Saiku queries in the Pentaho scheduler to generate XLS

New website

On top of the new software release we have also released a brand new version of our website, and it has moved house! As part of our ongoing marketing overhaul we have decided to integrate the Analytical Labs brand into our Meteorite BI group. This doesn’t mean the software is changing in any way; we just want to make it clearer to people that we offer more than just free software, and that if they require support or help with Saiku on a commercial level then we are available to do that. Redirects will remain in place for the foreseeable future, but update your bookmarks to


We now offer a number of different Saiku related support packages. Whilst Saiku is open source, and liberally licensed at that, we have to earn a living, and one way of doing this is by providing support resources to businesses who like to have a nice cushion to land on when things don’t go quite as they expect. We offer both UK hours support packages and 24/7/365 packages depending on the level required; these are detailed on our new website, and should you want to discuss options feel free to email us at


Along with support contracts, we have also added a new sponsor page. Free software is nice, but someone has to write it. We ask that if companies are using Saiku in a commercial environment but don’t require support, they consider a sponsorship option as a way of supporting us and continuing Saiku development. Also, if you think Saiku is missing some vital feature, we have a number of sponsorship options for feature requests; again, if you want to talk to us about that, mail us at


Lastly, as ever, we believe in fostering and nurturing a great Saiku related community, whether it be for Saiku OLAP or Saiku Reporting. If you find a bug please report it to the issue trackers on GitHub (olap, reporting), and if you have any technical questions, please feel free to swing by our forum at

Thanks a lot for the support over the years and we hope that you all enjoy the great new features in Saiku Analytics 2.5!

Trout, out.

Processing MongoDB data using Pentaho Data Integration

So, having read the blog post by Ross Lawley and having had a chat with Matt Asay when he was in London about the state of ETL tools when it comes to transforming random datasets for document stores, I thought I’d take a stab at doing a PDI version of the same blog post, to save people who don’t know Python having to do data transformation via a script (programmers love scripting, us data guys like dragging and dropping…).

Anyway, this being a fresh box, I thought we’d step through it from the top, so first you need to install MongoDB if you don’t have it. You can use one of the many repositories for Linux users or you can install the tarball. I like bleeding edge, and I don’t like having to mess around with repositories and dependencies, so I went with the tarball.

curl > mongodb.tgz
tar xvfz mongodb.tgz -C /opt/mongo
cd /opt/mongo/bin

Once the mongo server is up and running you can then test the shell with:


Once I had mongo running I then set to work on a couple of transformations; I wanted to prove the ability to get data both into and out of MongoDB. Although I guess in the real world there would be far more data extraction and analysis, I thought it would be good to demonstrate both.

So first up I got the same dataset Ross was using:


Then I set about processing it with Pentaho Data Integration.

As you can see from this screenshot:


The process was reasonably painless, apart from the lack of real support for dynamic key pairs within PDI. I get the feeling this is possible using the Metadata step, but I was having issues trying to map all the keys to fields, so I gave up in the end and just selected a few. I also wanted to do this with no code; I nearly succeeded, but my denormalization skills let me down slightly and I needed to clone some fields where the ID and coordinates were null. Still, I was close enough.

Once all that was done I then configured the MongoDB output step to create the same data structure as Ross:


As you can see I make use of some Array stuff with the coordinates.

I then hit run and waited a few seconds.

Thankfully, Ross had put some test queries on his website, so I checked my data load with those queries; amazingly enough they worked just fine, including the geospatial stuff, so I must have got the structure correct :)

Finally I wanted to show data extraction. I thought about postcode lookups or something similar, but as the Post Office likes to sell that stuff for a lot of money, I decided to just extract my data and render it to a spreadsheet.


As you can see from this transformation, there isn’t much to it; it just replicates the pub name demo query, but in ETL land:

     {"_id": "$name",
      "value": {"$sum": 1}
  {"$sort": {"value": -1}},
  {"$limit": 5}

You get the data using a MongoDB input step, then parse it using the very easy to use JSON input step, and then ETL to your heart’s content.
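If you ever want to sanity-check the result outside PDI, the same top-five aggregation can also be run from Java; a minimal sketch using the MongoDB Java driver, where the “demo” database and “pubs” collection names are assumptions rather than anything from the original transformation:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.Arrays;

public class TopPubNames {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // Assumed database/collection names; adjust to wherever the pub data was loaded
            MongoCollection<Document> pubs = client.getDatabase("demo").getCollection("pubs");
            // Group by pub name, count occurrences, sort descending and keep the top 5
            for (Document doc : pubs.aggregate(Arrays.asList(
                    new Document("$group", new Document("_id", "$name")
                            .append("value", new Document("$sum", 1))),
                    new Document("$sort", new Document("value", -1)),
                    new Document("$limit", 5)))) {
                System.out.println(doc.toJson());
            }
        }
    }
}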

Demo transformations are attached to this blog post.