
Why we decided to open source our technology

Squid Solutions has launched Bouquet, the Open-Source Analytics Platform.

Based in Paris and San Francisco, Squid helps publishers grow usage on their digital platforms thanks to advanced analytics. We recently packaged ten years of experience in analytics into a new technology: Bouquet. Here’s why we chose to open source it.

Big Data Is Open Source

Let’s face it: the big data tech stack is open source.

It started with the hardware. As data accumulated with the economy going online, companies no longer paid inflated prices for proprietary systems. Commodity hardware was virtualized and put to work in parallel, delivering huge computing power at a fraction of the cost.

Moving up the tech stack, major Internet companies developed Hadoop, a new distributed file system and processing framework, to work in that distributed environment. Market demand, and perhaps the need to train and recruit capable engineers, led them to open source it. Companies could now store and process massive amounts of information at a reasonable cost.

Fast-forward a few years to where we are now. Hadoop became the industry standard to support big data applications. Companies want to analyze the information in their Hadoop cluster. New querying frameworks appeared to support SQL, the language of analytics: Hive, Tez, Drill, Tajo, Phoenix, Presto, Impala, Spark — all open source.

So we are now at a point where users can query an extremely powerful stack of open-source technology using their beloved SQL. What’s next? An open-source SQL querying engine of course! Ta-da! Here comes Bouquet.

Lots of open-source front ends . . .

Bouquet links open-source querying frameworks with a bunch of data visualization frameworks. We use D3.js a lot and are looking into Vega. Plotly recently announced that Plotly.js was going open source. Other, more specialized open-source libraries include Leaflet for displaying interactive maps and Dygraphs for complex time series.

. . . but only a few open-source back ends

How do you connect these beautiful visualizations to big data? How do you make them interact with complex data schemas? There’s a gap here, right now, for open-source analytics back-end engines that scale for big data analytics.

Let’s take a look at two of those engines as an example.

       1. One is Mondrian. It is Pentaho’s OLAP server, initially released in 2002. It’s the engine behind traditional business intelligence as we knew it before Hadoop. It faces the traditional limitation of the OLAP cube: it does not scale. Maybe it is a sign of the times that Pentaho and Jaspersoft, the two flagship companies in open-source BI, were recently acquired (2014–2015).

       2. Another open-source engine is Elasticsearch. It’s not a relational engine: rather, it uses a completely different technology based on indexing. It has been very well received by the dev community, but you need to move data into its server, which is exactly what you’re trying to avoid if you’re running massive Hadoop clusters.

So after a couple of years of being eclipsed by MapReduce, SQL is back big time. There is a huge need for a highly scalable open-source relational engine. That’s where Bouquet comes into play.

The Value Is in the Apps

When I think of big data, I think of very cool applications like search, maps, social media, recommendation engines. The value is in the applications — not in the underlying technology.

That’s another great reason for open source. It’s what developers need to build their applications. They need transparency and the ability to hack their way to what they want to do. They need to reuse components that were written by other developers to go fast.

Developers need standards to avoid reading the f*****g manual. Only open source can provide that in a way that is driven by peer adoption and the market.

Squid Solutions and the team working on Bouquet are fully committed to this vision. We’re a founding member of ODPi, the Open Data Platform initiative, and look forward to committing Bouquet to that initiative.

Bouquet: Powerful SQL

SQL is like a reborn technology. It’s been acknowledged by the big data community as a great way to interact with data for analytics. For one thing, there are a bunch of engineers highly trained in SQL and it’s great to use their skills. Also, there are a bunch of SQL tools out there and it made sense to provide some kind of compatibility. Last but not least, SQL is built on relational algebra, which is a really efficient way to do analytics.

Bouquet is a dynamic query builder that automatically generates optimized SQL code. Optimized means it’s designed to generate queries that run in parallel on distributed computing systems, so they benefit from the performance of a horizontally scalable architecture.
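To make the idea of a dynamic query builder concrete, here is a minimal sketch in Python. It is not Bouquet’s code or API: the `Query` class, the `build_sql` function, and the `sales` table are hypothetical names made up for illustration. It only shows how a choice of dimensions, metrics, and filters can be turned into a single aggregate SQL statement.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Query:
    """Hypothetical description of an analysis (not Bouquet's real model)."""
    table: str
    dimensions: List[str] = field(default_factory=list)   # columns to group by
    metrics: Dict[str, str] = field(default_factory=dict)  # alias -> aggregate expression
    filters: List[str] = field(default_factory=list)       # trusted SQL predicates


def build_sql(q: Query) -> str:
    """Assemble one aggregate SELECT statement from the query description."""
    select_parts = list(q.dimensions)
    select_parts += [f"{expr} AS {alias}" for alias, expr in q.metrics.items()]
    sql = f"SELECT {', '.join(select_parts)} FROM {q.table}"
    if q.filters:
        sql += " WHERE " + " AND ".join(q.filters)
    if q.dimensions:
        sql += " GROUP BY " + ", ".join(q.dimensions)
    return sql


example = Query(
    table="sales",
    dimensions=["country"],
    metrics={"revenue": "SUM(amount)"},
    filters=["year = 2015"],
)
sql = build_sql(example)
print(sql)
```

A real builder would also validate identifiers, bind filter values as parameters, and adapt the SQL dialect to the target engine. This toy version concatenates trusted strings only.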

You can also call that “in-database analytics” because the computation runs inside the database, instead of running a massive SELECT * and then computing in memory. If you need speed, Bouquet already provides a scalable cache using Redis and will soon support Spark.
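The difference between the two approaches can be sketched with Python’s built-in sqlite3 module. SQLite stands in here for whatever engine sits on top of the data, and the `sales` table and its rows are invented for the example. Both approaches give the same totals, but the second one ships only one row per group to the application.

```python
import sqlite3
from collections import defaultdict

# An in-memory SQLite database stands in for the real analytics engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("FR", 10.0), ("US", 20.0), ("FR", 5.0)],
)

# Approach 1: pull every row out of the database and aggregate in the app.
totals_in_memory = defaultdict(float)
for country, amount in conn.execute("SELECT * FROM sales"):
    totals_in_memory[country] += amount

# Approach 2: in-database analytics, the engine does the aggregation
# and returns one row per country.
totals_in_db = dict(
    conn.execute("SELECT country, SUM(amount) FROM sales GROUP BY country")
)

print(dict(totals_in_memory), totals_in_db)
```

With three rows the cost is the same either way. With billions of rows on a distributed engine, the first approach moves the whole table over the network, while the second lets the engine aggregate in parallel where the data lives.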

You got it: Bouquet replaces the good old OLAP cube that huffs and puffs on massive Hadoop systems. It provides similar slicing and dicing, but for big data.

How is that possible when dealing with millions of products, articles, clients? By using search and facets! So guess what: we embedded Elasticsearch in Bouquet to provide a very powerful way to browse through huge amounts of metadata, find what you’re looking for, and run the analytics in SQL.
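Here is a rough sketch of the search-and-facets idea in plain Python. It does not use Elasticsearch and does not reflect how Bouquet integrates it: the `products` list, the `search` and `facet_counts` helpers, and the `sales` table are all hypothetical. The point is the flow: search the metadata, count hits per facet, then hand the selected values to a SQL query as parameters.

```python
from collections import Counter

# Made-up product metadata; in practice this would live in a search index.
products = [
    {"name": "Intro to SQL", "category": "book", "language": "en"},
    {"name": "SQL pour tous", "category": "book", "language": "fr"},
    {"name": "SQL Weekly", "category": "magazine", "language": "en"},
    {"name": "Hadoop in Practice", "category": "book", "language": "en"},
]


def search(items, text):
    """Naive case-insensitive substring search on the product name."""
    return [item for item in items if text.lower() in item["name"].lower()]


def facet_counts(items, facet):
    """Count how many hits fall under each value of the given facet."""
    return Counter(item[facet] for item in items)


hits = search(products, "sql")
by_category = facet_counts(hits, "category")

# The user picks the "book" facet; the matching names become SQL parameters.
selected = [h["name"] for h in hits if h["category"] == "book"]
placeholders = ", ".join("?" for _ in selected)
sql = (
    "SELECT product, SUM(amount) FROM sales "
    f"WHERE product IN ({placeholders}) GROUP BY product"
)
print(by_category, sql, selected)
```

The metadata search narrows millions of items down to a handful of values, and only then does the analytics query run in SQL against the full data.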

To sum up: Bouquet uses formal mathematics and the rules of relational algebra to dynamically generate queries that run fast and that scale, opening up the possibility for end users to interact with massive amounts of data.

Analytics for Everyone

What’s all this for? Our goal is to put data in everyone’s hands within the company. We agree with the folks who say that they don’t have a business intelligence team in the company. Everybody in a company should be using analytics: production, sales, product management, logistics, customer support, designers, developers, customers and partners, even the CFO and top management.

But these people have jobs to do. They don’t have time to adapt to new analytics tools. Analytics need to adapt to them: they must be provided in a form and shape that supports the end users’ train of thought.

And who can do that? Developers building apps for them. At the end of the day, Bouquet’s role is to give developers the flexibility to build their own apps for their own users. Bouquet provides an early-stage SDK with components to build analytics apps. It will grow, and we hope the community will contribute to it. Eventually, enough pieces will be out there to make designing a new app fast and affordable enough for all the desperate end users out there to get their data.

How does that sound?

Try Bouquet