Cassandra denormalisation

Here we remind the usual relational model took from Oracle database exemple:

create table dept(
deptno number(2,0),
dname varchar2(14),
loc varchar2(13),
constraint pk_dept primary key (deptno)
)

create table emp(
empno number(4,0),
ename varchar2(10),
job varchar2(9),
mgr number(4,0),
hiredate date,
sal number(7,2),
comm number(7,2),
deptno number(2,0),
constraint pk_emp primary key (empno),
constraint fk_deptno foreign key (deptno) references dept (deptno)
)

Since Cassandra does not support joins, we will simply add the dname column to the EMP cassandra table (no joins, so foreign keys…)

create table emp(
empno int primary key,
ename text,
job text,
mgr int,
hiredate date,
sal decimal,
comm decimal,
dname text
);

And the table is created !

desc emp

CREATE TABLE my_keyspace.emp (
empno int PRIMARY KEY,
comm decimal,
dname text,
ename text,
hiredate date,
job text,
mgr int,
sal decimal
) WITH bloom_filter_fp_chance = 0.01
AND caching = {‘keys’: ‘ALL’, ‘rows_per_partition’: ‘NONE’}
AND comment = ”
AND compaction = {‘class’: ‘org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy’, ‘max_threshold’: ’32’, ‘min_threshold’: ‘4’}
AND compression = {‘chunk_length_in_kb’: ’64’, ‘class’: ‘org.apache.cassandra.io.compress.LZ4Compressor’}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = ’99PERCENTILE’;

We are ready to insert data in the denormalized table.

But let’s try to insert the first row (one may ask, how do you insert a date in Cassandra, but it is simple, date functions are not mandatory, but can be used, we will see this later)

insert into emp (empno,ename,job,mgr,hiredate,sal,comm,dname) values (1,’BLAKE’,’MANAGER’,0,’2020-01-01′,1000,100,’DALLAS’);

It worked.

But, another question we may ask is that does Cassandra supports the “schema on read” concepts, or, simply said, can we insert the rows with different structures just like we inserted documents with different structures in the MongoDB collection:

MongoDB and employees & departments

cqlsh:my_keyspace> insert into emp (empno,ename,job,mgr,hiredate,sal,comm,dname) values (1,’BLAKE’,’MANAGER’,0,’2020-01-01′,1000,100,’DALLAS’,100);
InvalidRequest: Error from server: code=2200 [Invalid query] message=”Unmatched column names/values”

Hmm, ok maybe this is not working because we did not mention the new column name? Let’s try again:

cqlsh:my_keyspace> insert into emp (empno,ename,job,mgr,hiredate,sal,comm,dname,test) values (1,’BLAKE’,’MANAGER’,0,’2020-01-01′,1000,100,’DALLAS’,100);
Traceback (most recent call last):
File “C:\Users\dragu\Downloads\apache-cassandra-3.11.8-bin\apache-cassandra-3.11.8\bin\cqlsh.py”, line 850, in get_input_line
self.lastcmd = raw_input(prompt).decode(self.encoding)
File “c:\python27\lib\encodings\utf_8.py”, line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0xef in position 109: invalid continuation byte

Still nothing. And if we try without column names? Maybe Cassandra will “understand” us and simply add the new column?

insert into emp values (1,’BLAKE’,’MANAGER’,0,’2020-01-01′,1000,100,’DALLAS’,100);
Traceback (most recent call last):
File “C:\Users\dragu\Downloads\apache-cassandra-3.11.8-bin\apache-cassandra-3.11.8\bin\cqlsh.py”, line 850, in get_input_line
self.lastcmd = raw_input(prompt).decode(self.encoding)
File “c:\python27\lib\encodings\utf_8.py”, line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0xef in position 58: invalid continuation byte

So no, obviously, it is not working, Cassandra needs the predefined table structure.

Of course, if later, we want to change the department’s name, then we will to update all the rows of the emp table, which is not necessary with the relational model, because if one department in relational model changes the name, only one row in the DEPT table will be updated, and all the rows from the EMP table can be joined to it with the foreign key.

Practical Oracle,SQL Server, other databases and Big Data & AI

A little about everything, Databases, Big Data, Data science, from very simple to more complex topics..

Cassandra denormalisation

Leave a comment Cancel reply

Partager:

Related

Leave a comment Cancel reply