Selling Open Source Software

We had a management meeting a week or so ago, the details of which I shan’t go into, but the crux of the matter, as ever, is that making money from open source software is hard, especially when you work in a small company. Luckily we run the consultancy as well, which gives us a second revenue stream.

People have this conception that because the software is “free” there isn’t a requirement to pay for it; or they may want to pay, but they’ll get round to it one day and in the meantime just continue asking the odd question on the forum, or sending a random support request to the clearly non-support-centric info@ email address. (I know what it’s like, I’ve had a few of those bosses, or “Freetards” as one Open Core BI company likes to refer to them.)

This leaves an open source development company in a bit of a conundrum: do they go for bulk and try to sign up as many of these “little fish” as possible to generate enough revenue to pay the developers, or do they go after a few “big accounts”? The smaller companies are certainly easier to talk to, but they usually have little to no budget; they are using open source software because they think it will save them money in the long run (and they are probably right), and getting money out of them is like getting blood out of a stone. The larger companies certainly have the money, but the bureaucracy and paperwork take forever, and by the time you get a contract signed you’ve wasted hundreds of man-hours chasing them up. Or, even worse, you waste those man-hours and then they decide to go elsewhere.

Employing marketing and sales people would certainly help, as they have a background in selling things and would free up the developers to go and develop. But you also need to be able to pay the marketing person; it’s a bit of a vicious circle.

We have a thriving community using Saiku: we know we have thousands of daily users, we have people on the IRC channel 24/7, and people ask plenty of questions each day on the forums. And those open source fanatics are correct, money isn’t everything, but support from the community is, and whilst some people do help out with answering questions and the like, in three and a bit years of Saiku being on GitHub we’ve had 91 code contributions from the community. That’s just over one external contribution a fortnight.

There comes a time when you have to ask yourself, what’s the point? The point all those years ago was to remember how to program and to create a tool that wasn’t as rubbish as JPivot. Well we managed that and I feel all the better for it.

I can’t help but feel some irony with our project: we set out to create something easy to use, and we achieved it. So when you talk to people about Saiku and ask them why they don’t want to pay for support or sponsorship, their response is often “we don’t need it”.

So what next? There is nothing I would like more than to keep Saiku completely open, make a competitive salary doing full-time Saiku development work with Paul, and take over the world of analytics with Saiku. But let’s be realistic: do we continue putting time and effort into a tool which people love to use but don’t want to pay for? Do we change the licensing model? Get outside investment? Or do we release it to the community and go and do something else that makes us a decent living?

At the end of the day, we all want to make a living, and there comes a point where you have to make a decision. Do you listen to your head, or your heart?

Saiku 2.5 released to the wild.

So after many months of us ignoring our mantra of release early and often, we have finally got around to packaging up and distributing Saiku 2.5, hurrah!

Many of you will already be using Saiku 2.5 because it’s been pretty stable for a while now, but for those of you who have yet to upgrade, please do; we have a very extensive feature list for Saiku 2.5.

New Features

  • Charts: new chart types (heatgrid, area, dot and others), improved settings, upgrade to CCC2, improved results, basic chart export
  • Spark Lines / Spark Bars
  • Top count (and other limit functions), sorting, filter methods in drag and drop query model
  • Improved parent child hierarchy support (fixed major bugs, show children)
  • Repository improvements (search, sorting, permissions)
  • Direct export endpoints for XLS, CSV and JSON (including parameters, all with one HTTP call)
  • Improved encoding (use caption, but works with name attribute now as well)
  • Improved i18n: fixed Chrome issues, more options, additional languages (Croatian, Hungarian, German and more)
  • Experimental PDF export
  • Use last result for selections if possible (e.g. filter on 2003 and select all months, month dialog will show only months for 2003)
  • MDX editor now uses ACE (run query with Cmd+Enter / Ctrl+Enter), preparing for auto-completion and MDX syntax highlighting
  • Open / save dialog for queries
  • Performance tuning
  • Save chart / table options (persist viz)

Standalone:

  • Datasource processor (modify the datasource definition before the connection is established, e.g. role-based access to a datasource), connection processor (modify the existing connection object, e.g. set a role programmatically)

Plugin:

  • Improved Mondrian sharing and plugin path conventions (drop plugins into solutions/saiku/plugins and they will be picked up by the UI, so you can easily update the Saiku plugin without losing extra plugins)
  • Run a Saiku query in the Pentaho scheduler to generate XLS output

New website

On top of the new software release we have also released a brand new version of our website, and it has moved house! As part of our ongoing marketing overhaul we have decided to integrate the Analytical Labs brand into our Meteorite BI group. This doesn’t mean the software is changing in any way; we just want to make it clearer to people that we offer more than just free software, and that if they require support or help with Saiku on a commercial level, we are available to provide it. Redirects will remain in place for the foreseeable future, but update your bookmarks to http://saiku.meteorite.bi

Support

We now offer a number of different Saiku-related support packages. Whilst Saiku is open source, and liberally licensed at that, we have to earn a living, and one way of doing this is by providing support to businesses that like to have a nice cushion to land on when things don’t go quite as they expect. We offer both UK-hours support packages and 24/7/365 packages depending on the level required; these are detailed on our new website, and should you want to discuss options, feel free to email us at info@meteorite.bi.

Sponsor

Along with support contracts, we have also added a new sponsor page. Free software is nice, but someone has to write it. We ask that companies using Saiku in a commercial environment who don’t require support consider a sponsorship option as a way of supporting us and keeping Saiku development going. And if you think Saiku is missing some vital feature, we also have a number of sponsorship options for feature requests; again, if you want to talk to us about that, mail us at info@meteorite.bi.

Help

Lastly, as ever, we believe in fostering and nurturing a great Saiku-related community, whether it be for Saiku OLAP or Saiku Reporting. If you find a bug, please report it to the issue trackers on GitHub (OLAP, reporting), and if you have any technical questions, please feel free to swing by our forum at http://ask.analytical-labs.com

Thanks a lot for the support over the years and we hope that you all enjoy the great new features in Saiku Analytics 2.5!

Trout, out.

Processing MongoDB data using Pentaho Data Integration

So, having read http://blog.mongodb.org/post/56876800071/the-most-popular-pub-names by Ross Lawley, and having had a chat with Matt Asay when he was in London about the state of ETL tools when it comes to transforming random datasets for document stores, I thought I’d take a stab at doing a PDI version of the same blog post, to save people who don’t know Python from having to do their data transformation via a script (programmers love scripting; us data guys like dragging and dropping…).

Anyway, this being a fresh box, I thought we’d step through it from the top, so first you need to install MongoDB if you don’t have it. Linux users can use one of the many repositories, or you can install the tarball. I like bleeding edge and I don’t like having to mess around with repositories and dependencies, so I went with the tarball.

curl http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.4.5.tgz > mongodb.tgz
mkdir -p /opt/mongo /data/db   # mongod stores its data in /data/db by default
tar xvfz mongodb.tgz -C /opt/mongo --strip-components=1   # drop the tarball's top-level directory
cd /opt/mongo/bin
./mongod

Once the mongo server is up and running you can then test the shell with:

./mongo

Once I had mongo running I set to work on a couple of transformations; I wanted to prove the ability to get data both into and out of MongoDB. Although I guess in the real world there would be far more data extraction and analysis, I thought it would be good to demonstrate both.

So first up I got the same dataset Ross was using:

wget "http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]"   # quote the URL so the shell doesn't expand * and []

Then I set about processing it with Pentaho Data Integration.

As you can see from this screenshot:

[Screenshot: the input transformation]

The process was reasonably painless, apart from the lack of real support for dynamic key pairs within PDI. I get the feeling this is possible using the Metadata step, but I was having issues trying to map all the keys to fields, so I gave up in the end and just selected a few. I also wanted to do this with no code; I nearly succeeded, but my denormalization skills let me down slightly and I needed to clone some fields where the ID and coordinates were null. Still, I was close enough.

Once all that was done I then configured the MongoDB output step to create the same data structure as Ross:

[Screenshot: MongoDB output step configuration]

As you can see I make use of some Array stuff with the coordinates.

I then hit run and waited a few seconds.

Helpfully, Ross had put some test queries on the website, so I checked my data load with those; amazingly enough they worked just fine, including the geospatial stuff, so I must have got the structure correct :)
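
If you want to sanity-check your own load from the mongo shell, something along these lines should do it. This is only a rough sketch: it assumes the collection is called pubs and that you stored the coordinates as a GeoJSON point in a field called location, so adjust the names and coordinates to match your own structure:

db.pubs.count()
db.pubs.findOne()
db.pubs.ensureIndex({"location": "2dsphere"})
db.pubs.find({"location": {"$near": {"$geometry": {"type": "Point", "coordinates": [-0.12, 51.51]},
                                     "$maxDistance": 1000}}}).limit(5)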

Finally I wanted to show data extraction. I thought about postcode lookups or something similar, but as the Post Office likes to sell that stuff for a lot of money, I decided to just extract my data and render it to a spreadsheet.

[Screenshot: the reporting transformation]

As you can see from this transformation there isn’t much to it; it just replicates the pub name demo query, but in ETL land:

db.pubs.aggregate([
  {"$group":
     {"_id": "$name",
      "value": {"$sum": 1}
     }
  },
  {"$sort": {"value": -1}},
  {"$limit": 5}
]);

You get the data using a MongoDB input step, parse it using the very easy to use JSON input step and then ETL to your heart’s content.

Demo transformations are attached to this blog post.


Installing CTools on Pentaho BI Server hosted in Meteorite Cloud

Until we get time to integrate the Ctools installer into Meteorite Cloud so people can pick various options at build time, there are a couple of ways to use Ctools products on Meteorite Cloud.

Option 1: Pentaho Market place

Pentaho Marketplace is a relatively new addition to the Pentaho stack and allows people to install plugins from the marketplace repository by clicking on a few buttons.

[Screenshot: Pentaho Marketplace]

But, like many, you may want to run the development versions of these products to get bleeding edge features or customisations applied. To do so you need option 2.

Option 2: Ctools installer

Using the Ctools installer on Meteorite Cloud isn’t much different from using it anywhere else, apart from some minor changes which I shall describe below.

First of all you need to spin up your Meteorite Cloud Pentaho server.

Then, once it has built, you need to check out the solutions repository, for example:

git clone git@mydomain.meteoritecloud.com:pentaho-solutions

The first checkout will be a sizeable 180MB, so you may have to wait a while, but going forward things will be much faster as you will only be pushing your changes back to the repository.

Next we need to get a copy of the Ctools installer. This is done by cloning its git repository:

git clone https://github.com/pmalves/ctools-installer.git

Now we can install the Ctools components to our Pentaho solutions directory:

./ctools-installer.sh -s /home/user/Projects/pentaho-solutions/pentaho-solutions -b dev -y

This tells the ctools-installer script to install the packages into our freshly cloned pentaho-solutions repository, using the development branch and, for speed, accepting all prompts with Yes. It should then run. CDC is currently not supported, as it wants access to the Pentaho webapp, which we currently do not provide (this is in the pipeline).

Once the packages are installed you can then move back to the pentaho-solutions directory. If you run

git status

you should then see lots of new and modified files; this confirms that the installation was a success. Next we need to commit these files back to the server. To do this, run:

git add *
git commit -a -m "new ctools installation"
git push

Again, because of the number of changes the installer has made, the upload may take a little while. Grab a cuppa whilst you wait.
Finally we need to restart the Pentaho BI server. This is done in the Meteorite Cloud node control panel:

https://mydomain.meteoritecloud.com/controlpanel/m/#info

Click on Restart BI server.

[Screenshot: Restart BI server button]

Give it a minute or two to restart and then log back into the Pentaho BI Server.
Once you are back in you should now see something like the following:

[Screenshot: Ctools components in the Pentaho User Console]

Ctools components are now installed for ultimate Pentaho BI server hackability.

My experiences with Play Framework V2

As I have a few minutes to spare I thought I would write a piece about my experiences with the Play Framework version 2.

For those of you who don’t know what the Play Framework is: where have you been? Originally based on the RoR paradigm, Play Framework offers a flexible, REST-based web development environment for Java and Scala.

The good:

  • Scala (or Java)
  • Flexible data persistence
  • Hot reloading
  • Plugins
  • Dependency management (SBT)
  • Debug mode
  • Scala console
  • Akka integration
  • Memcache based caching layer

The bad:

  • Clunky templating
  • No easy way of overriding the routing (I’ll explain later)
  • Deployment to production

The ugly:

  • SBT

So that’s a brief list of things that spring to mind; I think I’ll explain by working backwards through them.

SBT, where to start with SBT? It’s an acronym for Simple Build Tool, which is a blatant lie. If you want to understand SBT configuration files, you need a PhD. For example, syntax like += and := is hardly the easiest to decipher for someone new to the framework. I appreciate it’s going for minimalism, unlike the rather verbose Ant and Maven file formats, but at the same time the learning curve for SBT is quite a bit higher; give me Maven any day of the week.
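
For the uninitiated, this is roughly what that settings syntax looks like in a plain build.sbt; := assigns a setting, += appends to one. The project name and the dependency are made up purely for illustration:

name := "my-play-app"

version := "1.0-SNAPSHOT"

// append a library to the existing dependency list
libraryDependencies += "joda-time" % "joda-time" % "2.3"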

Deployment to production: this I can’t really blame on Play Framework, since it’s ahead of its time, but it does make life harder. With Play Framework 1.x it used to be easy to create a WAR file and deploy it to your favourite servlet container; 2.0 makes use of non-blocking IO and websockets for some communication, and the old servlet container spec (up to 3.0) does not support these standards. There is a WAR plugin that will generate a WAR for you, but if some stuff doesn’t work, what’s the point? Similarly, Apache HTTPD does not support websockets, so if you want to put a proxy in front of your application, you are forced to use Nginx or similar.

Clunky templating: Play Framework version 1.x made use of Groovy for its templates; you passed in variables and the templating engine turned them into HTML blobs at render time. It worked quite well, but it wasn’t typesafe. Developer X would change backend code Y but not update template Z, and boom, you deployed to production only to find it not working. In Play Framework version 2.x they have overcome this and other issues by using Scala-based templates: backing classes are generated at compile time to pass in the required objects, so type safety is ensured. But it’s a pain in the ass; getting the server happy with the imports, the parameters declared at the top of the template and so on is a never-ending cycle of trial and error, and one day it will work, another day not. I know a lot of this is to do with dev mode trying to construct these classes on the fly, but it’s a pain. Also, make one small change to any part of the view and the class needs recompiling and reloading, which is time consuming.
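
For anyone who hasn’t seen one, a Play 2 Scala template looks roughly like this; the parameter list on the first line is what gets compiled into a backing class and gives you the type safety. The template and parameter names here are just for illustration, and @main assumes the standard wrapper template the Play seed generates:

@(title: String, pubs: List[String])

@main(title) {
  <ul>
    @for(pub <- pubs) {
      <li>@pub</li>
    }
  </ul>
}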

So, with the clunky templating in mind, I moved my UI to AngularJS, which was fantastic, but Play Framework’s routing got in the way. Again, not specifically a Play Framework issue, but something that is tedious nonetheless. I wanted to ship my HTML and JavaScript code in the same bundle to make development, testing and deployment easier, and I also wanted to make use of AngularJS’s HTML5 mode, which gets rid of the hashbang and makes the URLs look nice. But, annoyingly, whenever a route pointed at a non-hashbang URL, Play took that as a requirement to render something, its own routing mechanism took over, tried to find the required route, and failed. It would be cool to be able to tell it to ignore a specific routing subset entirely, so you could just plumb in Angular or another framework of your choice.
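
One workaround worth sketching (controller, API and template names below are invented for illustration, not what Play provides out of the box) is a catch-all route at the bottom of conf/routes: list your real API and asset routes first, then hand everything else back to the page that bootstraps Angular, so HTML5-mode URLs resolve:

# conf/routes - order matters, first match wins
GET     /api/pubs            controllers.Api.pubs
GET     /assets/*file        controllers.Assets.at(path="/public", file)

# anything else goes to the Angular entry point
GET     /*path               controllers.Application.index(path)

// app/controllers/Application.scala (sketch - assumes an index.scala.html that bootstraps Angular)
package controllers

import play.api.mvc._

object Application extends Controller {
  def index(path: String) = Action {
    Ok(views.html.index())   // always hand back the Angular entry page
  }
}

The obvious downside is that unknown URLs (including missing assets) get the Angular page rather than a 404, so Angular has to handle its own “not found” routing.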

So, with those gripes out of the way, what’s good about Play Framework version 2.x?

Memcache-based caching, out of the box with no extra effort, lets you use a proper caching mechanism for your application. On top of that, because of how memcache distributes, if you use a clustered setup you can easily add another node and every node sees the same cache.
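
The cache API itself is pleasantly small. A minimal sketch using the standard Scala Cache API, with a made-up pubCount() helper standing in for the expensive bit:

import play.api.Play.current
import play.api.cache.Cache

// cache the result of an expensive lookup for ten minutes
val count: Long = Cache.getOrElse[Long]("pub.count", expiration = 600) {
  pubCount()   // hypothetical helper - whatever expensive call you want cached
}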

Akka integration: I had issues getting my head around Akka at first, so much so that I went and bought a book about it. But Akka is great; it allows you to create distributed job management clusters with minimal effort. Once you have an understanding of the code and how to pass objects around, creating complex Akka jobs to manage nodes and clusters becomes pretty trivial. The only downside is that the out-of-the-box scheduler doesn’t allow cron-style schedules for jobs, so I found a third-party library that lets me do that using the Quartz scheduler.
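
As a flavour of the built-in (non-cron) scheduling, here is a small sketch using the actor system Play exposes; the NodeChecker actor and its message are invented for the example:

import akka.actor.{Actor, Props}
import play.api.Play.current
import play.api.libs.concurrent.Akka
import play.api.libs.concurrent.Execution.Implicits._
import scala.concurrent.duration._

// invented worker actor for illustration
class NodeChecker extends Actor {
  def receive = {
    case "check" => println("checking cluster nodes...")
  }
}

val checker = Akka.system.actorOf(Props[NodeChecker], name = "node-checker")

// send "check" every 5 minutes, starting 10 seconds after boot -
// fixed intervals only, hence the need for a Quartz-based library for cron expressions
Akka.system.scheduler.schedule(10.seconds, 5.minutes, checker, "check")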

For those of us new to Scala the console mode is very useful for testing out syntax and Scala functions, and it’s quick and easy to get to with Play:

play debug

console

And you are in.

Debug mode: rather than remembering those CATALINA_OPTS settings to get the JVM listening for a debugger, just run play debug, connect your favourite IDE and debug away.

Dependency management: of course, whilst SBT is annoying, having a dependency management framework set up for you and running out of the box on a new project is always nice.

Plugins: a mainstay of Play Framework version 1, and they haven’t gone away in version 2. Plugins provide a great way to get additional functionality with minimal effort: add a dependency to SBT and install the plugin, as simple as that. SecureSocial is a great authentication tool.

Hot reloading: gone are the days of having to reload your JVM every time you change a line or two of code. Make your changes and refresh the browser, and a few seconds later your changes have appeared. Of course, with AngularJS on the UI, any HTML or JavaScript changes were instantaneous.

In version 1, when it came to data persistence, you really only had one answer: JDBC and SQL data stores. In version 2 they have gone for ultimate flexibility, and it has really paid off. I chose to use Squeryl as my data access layer, but that is by no means the (A)norm (sorry, terrible joke); there is nothing stopping you using a NoSQL data store, and I would be sorely tempted to try out MongoDB as a Play Framework backing store on my next project, as its document storage engine lends itself to just this sort of scenario.
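
To give a feel for why Squeryl is pleasant to work with, here is a rough sketch of a schema and a type-checked query. The Pub entity and table are invented for the example, and it assumes the SessionFactory has already been wired up (in a Play app, typically in Global.onStart):

import org.squeryl.Schema
import org.squeryl.PrimitiveTypeMode._

// invented entity for illustration
class Pub(val id: Long, val name: String)

object AppDB extends Schema {
  val pubs = table[Pub]("pubs")
}

// a type-checked query: misspell a field and it fails at compile time, not in production
transaction {
  val redLions = from(AppDB.pubs)(p => where(p.name === "The Red Lion") select (p))
  redLions.foreach(p => println(p.id + " " + p.name))
}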

Now my final pro: Scala. When I started using Play Framework 2.x I was a Java man; whilst I have used other languages over the years, my predominant language is Java. But, never one to shy away from a challenge, and acknowledging that Play, whilst it supports Java, is mostly written in Scala, I decided to completely rewrite Meteorite Cloud in Scala to conform, and that is by far the best decision I have made on this project. The number of lines of code I managed to throw away thanks to Scala’s concise nature and syntax was amazing. Whilst I still use Java on a daily basis, the fact that Scala is so easily used alongside Java classes will make me seriously consider it every time I start a new project.

So there we go, my overview of Play Framework 2.x, feel free to leave a comment or two if you agree, disagree or think I’ve missed something.

Trout, out.

Pentaho release an Adaptive Big Data Layer for PDI

Cross posted from my blog at http://meteoriteconsulting.com/blog to prove that I don’t just publish negative Pentaho stuff ;)

Today Pentaho have released their Adaptive Big Data Layer for Pentaho Data Integration. This EE plugin allows you to abstract your ETL and data processing workflow from your NoSQL or Hadoop data source. With the ever-changing line-up of Big Data sources, you may be wary of picking a certain variant, fearing lock-in to the product and an inability to switch because of other systems’ reliance on it. The Adaptive Big Data Layer from Pentaho Corp is designed to alleviate these issues, making it easier to switch from one Hadoop vendor to another and minimising the effort required to adapt your ETL to its nuances. Whilst not quite data virtualization, this is along a similar line, allowing users to abstract their data somewhat from the differing sources.

Is this game changing, as their Twitter feed seemed to suggest? I don’t know. But if you are a big data user, easier integration and deployment must certainly be of interest when it comes to connectivity with your reporting software.

At the same time Pentaho announced Pentaho Labs, which is designed to be a hotbed of data visualisation and interactivity, taking the Pentaho stack marching further into the 21st century. This will cover many aspects of data analysis, including visualisation, real-time data analysis, predictive analysis and more. Whilst I think their Analyzer interface is clunky, the visual features that Analyzer gives users access to are certainly very impressive, and hopefully the Pentaho Labs think tank will help drive development in this sector.

– Tom Barber

(Meteorite.bi Technical Director)

Why Pentaho need to trim the crud or catch up (quickly)

There is no disguising the fact that Pentaho Corp is trying to get bought (or, as pointed out, to IPO): the marketing department jumps on every bandwagon that passes by, and new features keep appearing, then go nowhere when the next new thing comes along and development effort needs shifting.

First of all I’ll start with Pentaho Data Integration. The product is fantastic; I’d use it over Talend every single day of the year, its ease of use and pluggability are amazing, and it’s run by Matt Casters, who knows pretty much everything there is to know about ETL. If Pentaho was just an ETL product, it would easily be up there with the best in class, but it isn’t.

Along with PDI, there is the BI server, used by thousands of companies throughout the world to visualize their data. The BI server, whilst infinitely flexible, has many issues. It’s bloated: after years of tacking stuff on to extend the functionality without breaking the old stuff, there is a lot of crud. Its UI is dated and inflexible; when you look at the competitors out there (Yellowfin, for example), the UI looks like it was built in the ’90s, and to make matters worse it’s written in GWT, which makes it pretty hard to customize without ripping the backend to pieces. Most annoying of all, the design makes upgrades drawn out and hard work (this may be changing in V5).

I went to the Pentaho London User Group last night. There were four presentations, two about the BI server side of things, and not once did we see a GWT interface anywhere. There has to be a reason for this? Oh yeah, it’s hard to customize and clunky. Both presentations relied heavily on CTools to provide an alternative, cleaner interface.

So why does Pentaho Corp not take a step back and ask themselves where they want to excel? This would aid the sale of the company, because it would provide a clear direction in which they want to go and help target potential buyers.

I understand their desire to keep both parts of the stack running, but why all the extra crud? If you take a look at Yellowfin, their updates are easy to install, the UI is clean, responsive and easy to navigate, and customisations are easy to apply and maintain. Why does a Pentaho update take a BI guy hours (CBF excluded)?

Pentaho recently bought Webdetails for an undisclosed sum. This is fantastic news for Pedro Alves et al. at Webdetails, and they have built a fantastic stack, but the tools they have built have been deliberately targeted at the BI developer, not Joe User.

The rate at which companies like Tableau have expanded proves there is market demand for analytics, but end users want stuff that is easy to install, configure and maintain, and doesn’t take a PhD to run.

I can’t help but feel that some of the other BI companies have expanded into overseas markets and gained such a strong foothold that, unless Pentaho do something to change the same old same old, they will be rapidly left behind in the scramble to be bought by one of the larger companies which they so desperately crave. Either make your software best in class, or ditch it; don’t leave it stagnating in a pool of malaise.