From database optimisation to data science (part 3, a Cassandra example)

In this series of articles I try to explain some tuning techniques and the problems we still face today, while proposing a new way of thinking that is closer to a data scientist's approach.

If you have read part 1 and part 2, which covered Oracle, SQL Server and MongoDB, this part applies the same example to Cassandra.

Once again we use the same library example: a shop selling books and DVDs.

The figures are: 10 000 articles in total, of which

50% are books

50% are DVDs

50% are in English

50% are in French

So, what fraction of the rows has the language French and the product type DVD?

And what fraction of the rows has the language English and the product type DVD?
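When an optimizer only has per-column statistics, a common way to answer these questions is to assume the columns are independent and multiply their selectivities: 0.5 × 0.5 = 0.25, i.e. about 2 500 rows, whichever language is asked for. The minimal sketch below illustrates this; the joint distribution `HYPOTHETICAL_COUNTS` is invented for illustration (it is not the article's data) and only respects the stated 50/50 marginals while making product type and language correlated, so the real fractions differ from the estimate. It does not claim to show how Cassandra computes its own estimates.

```python
from collections import Counter

# Hypothetical joint distribution (NOT taken from the article): it keeps the
# stated marginals (50% DVDs, 50% French) but correlates the two columns.
HYPOTHETICAL_COUNTS = Counter({
    ("DVD", "French"): 4_500,
    ("DVD", "English"): 500,
    ("Book", "French"): 500,
    ("Book", "English"): 4_500,
})


def marginal(counts, position, value):
    """Fraction of rows whose column at `position` equals `value`."""
    total = sum(counts.values())
    return sum(n for key, n in counts.items() if key[position] == value) / total


def independence_estimate(counts, product, language):
    """Estimated fraction assuming independent columns: P(product) * P(language)."""
    return marginal(counts, 0, product) * marginal(counts, 1, language)


def actual_fraction(counts, product, language):
    """True fraction of rows matching both predicates."""
    return counts[(product, language)] / sum(counts.values())


for lang in ("French", "English"):
    est = independence_estimate(HYPOTHETICAL_COUNTS, "DVD", lang)
    real = actual_fraction(HYPOTHETICAL_COUNTS, "DVD", lang)
    print(f"DVD & {lang}: estimated {est:.2f}, actual {real:.2f}")
```

With this made-up data the estimate is 0.25 for both queries, while the actual fractions are 0.45 and 0.05: one large and one small fraction hidden behind the same number.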

Without going any further, it seems that Cassandra's estimate is the same in both cases, for the large fraction and for the small one.

This looks even worse than MongoDB. However, by default Cassandra divides the workload into ranges, and exactly how this works still needs to be investigated.
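To make the idea of ranges a little more concrete, here is a minimal sketch assuming the default Murmur3Partitioner, whose token space runs from -2^63 to 2^63 - 1. The helper `split_token_ring` is hypothetical: it simply cuts that ring into equal, contiguous, inclusive slices. It is not how Cassandra itself distributes work (with virtual nodes, each node owns many smaller ranges), only an illustration of what "dividing into ranges" means.

```python
# Token bounds of the Murmur3Partitioner (signed 64-bit range).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n: int) -> list[tuple[int, int]]:
    """Hypothetical helper: cut the token ring into n contiguous inclusive ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # The last range absorbs any remainder so the whole ring is covered.
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


for start, end in split_token_ring(4):
    print(start, end)
```

Each slice could then be scanned independently, which is one way a full-table read can be spread across several workers.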
