Online vibration analysis #Post4
Dear Openadaptronik blog readers,
I have already written a couple of posts on the topic “Defining infrastructure of a platform for mechanical vibration analysis”. In my previous post I mentioned that, alongside requirements engineering, I am looking through the options for the right database. This post is all about that. It is going to be a bit different from the previous ones: it goes into more technical detail in order to choose a database for our project.
NoSQL versus Relational Database (for sensor data)
The central concept in a relational database is the relation between multiple tables, each with its own schema. That makes it possible to relate data of completely different structure with each other. The problem is that when reading highly “normalized” data (which means the data has been split across many tables to avoid redundancy), a lot of joins are necessary to assemble the required data.
In the case of sensor data, at least two tables are necessary: one that holds the information about the sensor recording itself (metadata) and another that contains the recorded data, which can be a lot of data depending on how long the sensor was recording. Depending on the sensor, the data could have a very different structure, so either a lot of tables would be necessary (one for each sensor type), or a very complicated database design would be needed to model such a “dynamic schema”.
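To make the two-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (recording, sample), the columns and the values are my own assumptions for illustration, not a schema from our project.

```python
import sqlite3

# In-memory database, only for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table for the metadata of a recording,
# one table for the recorded data points.
conn.executescript("""
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    sensor_type TEXT NOT NULL,
    sample_rate_hz REAL NOT NULL
);
CREATE TABLE sample (
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    t REAL NOT NULL,
    value REAL NOT NULL
);
""")

conn.execute("INSERT INTO recording VALUES (1, 'accelerometer', 1000.0)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, 0.000, 0.12), (1, 0.001, 0.15), (1, 0.002, 0.11)],
)

# Reading a recording together with its metadata needs a join.
rows = conn.execute("""
    SELECT r.sensor_type, s.t, s.value
    FROM recording AS r
    JOIN sample AS s ON s.recording_id = r.id
    WHERE r.id = 1
    ORDER BY s.t
""").fetchall()

print(rows)
```

A second sensor type with different fields (say, a temperature probe with a calibration offset) would need either extra columns that stay empty for other sensors or another table, which is exactly the design problem described above.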
In NoSQL databases we are not limited by a fixed schema, and we do not need relations between tables to represent a relation between different kinds of data. In these databases we can model relations between data with a data hierarchy, or in NoSQL jargon, subdocuments. This is the main conceptual difference between the two: one is table based, the other is document based. The best part is that a document can be created without a predefined structure, and fields can be added to it later as you need them. A table, in contrast, needs a predefined structure.
That means we can combine different kinds of data from a variety of sensors in a single table, or, as it is called in NoSQL, a collection. This kind of schema-less storage also makes it very easy to add new sensor types later without having to change the structure of the database. We expect the data to grow rapidly, and the database does not need a clear schema definition up front, which makes it very flexible to work with.
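As a rough sketch of what “different kinds of data in one collection” looks like, here are two documents with different fields stored side by side. A plain Python list stands in for the collection, and all field names and values are hypothetical.

```python
# A plain list stands in for a document collection (illustration only).
collection = []

# An accelerometer recording: metadata lives in a "sensor" subdocument.
collection.append({
    "sensor": {"type": "accelerometer", "unit": "m/s^2"},
    "sample_rate_hz": 1000,
    "samples": [0.12, 0.15, 0.11],
})

# A temperature recording with a different shape: extra fields,
# no sample rate, and its own calibration subdocument.
collection.append({
    "sensor": {"type": "thermocouple", "unit": "degC", "junction": "K"},
    "interval_s": 5,
    "samples": [21.4, 21.6],
    "calibration": {"offset": -0.3},
})

# Each document carries its own set of top-level fields.
field_sets = [set(doc) for doc in collection]
for fields in field_sets:
    print(sorted(fields))
```

Adding a third sensor type later only means appending a document with whatever fields that sensor needs; nothing about the existing documents has to change.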
Retrieving or querying the data of a specific sensor recording is also very easy, because we do not have to look in a different table for the data. Instead, we can simply look for the data in the subdocuments. That is what makes a non-relational database so beneficial in our case. There could be billions of data points from one experiment measurement, so performing join operations is going to be costly.
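The idea of looking inside subdocuments can be sketched in a few lines of plain Python. MongoDB addresses nested fields with dot notation (for example "sensor.type"); the helpers get_path and find below are hypothetical stand-ins that I wrote to mimic that idea, not part of any MongoDB driver.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'sensor.type' into nested dicts.

    Returns None if any part of the path is missing.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(collection, path, expected):
    """Return all documents whose value at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]


# Hypothetical recordings; metadata and data live in the same document.
recordings = [
    {"sensor": {"type": "accelerometer"}, "samples": [0.12, 0.15, 0.11]},
    {"sensor": {"type": "thermocouple"}, "samples": [21.4, 21.6]},
]

hits = find(recordings, "sensor.type", "accelerometer")
print(hits[0]["samples"])
```

Because the samples sit in the same document as the metadata, one lookup returns everything about the recording and no join is involved.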
Scalability is another factor that comes into play when we talk about sensor data storage. Chances are that the data will eventually become huge, depending on how long the measurements run. A non-relational database such as MongoDB can scale horizontally, that is, by distributing the data across several machines. Relational databases, on the other hand, typically scale vertically, that is, by moving to a more powerful machine. Here is a well-illustrated graphical explanation of vertical vs. horizontal scaling.
In conclusion, from every aspect we considered, a non-relational database seems to be the best choice in our case. There are plenty of them, but, call it an easy pick or a personal preference, I will go forward with MongoDB.
If you have any questions, suggestions or recommendations, please feel free to make your voice heard in the comment section.
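To illustrate the idea behind horizontal scaling, here is a small sketch that spreads recordings over several “shards” by hashing their id. This is only a toy model under my own assumptions (three shards, ids of the form recording-N, a hypothetical shard_for helper); a real MongoDB cluster is configured through its own sharding mechanism, not through code like this.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 3  # assumed number of machines
shards = {i: [] for i in range(NUM_SHARDS)}

# Distribute twelve hypothetical recordings across the shards.
for n in range(12):
    recording_id = f"recording-{n}"
    shards[shard_for(recording_id, NUM_SHARDS)].append(recording_id)

for index, ids in shards.items():
    print(index, ids)
```

The same id always lands on the same shard, so a lookup only has to ask one machine, and each machine holds just a part of the total data.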
Till my next post!