MongoDB is a key value database

Relational versus NoSQL

Besides how they are distributed, which was covered in the second part of this series, another essential difference between NoSQL and relational databases is their data model. In the relational world you always work within a fixed table structure whose contents are accessed through a query language (SQL = Structured Query Language). Empty fields are not simply left out; they hold the value NULL.

An SQL query for the example table could be something like this: "Give me all the data records in the table where the first name = Max and the last name = Mustermann." The result would be: "Max; Mustermann; 35; m; Musterhausen". To evaluate or further process the extracted data, you only need to know what the individual values mean in business terms. It is important that the relational world tries to avoid data redundancy wherever possible. Ideally, each piece of information is stored only once, in one table. Connections between data in different tables are then established through table relationships. This procedure is also known as "normalization".
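The query described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `persons`, the column names, the reading of "35" as an age, and the second record are assumptions, since the original example table is not reproduced here.

```python
import sqlite3

# In-memory database; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons ("
    "first_name TEXT, last_name TEXT, age INTEGER, gender TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [
        ("Max", "Mustermann", 35, "m", "Musterhausen"),
        # Hypothetical second record: missing fields are stored as NULL.
        ("Erika", "Musterfrau", None, "f", None),
    ],
)

# "Give me all records where first name = Max and last name = Mustermann."
rows = conn.execute(
    "SELECT * FROM persons WHERE first_name = ? AND last_name = ?",
    ("Max", "Mustermann"),
).fetchall()

# The fixed table structure still returns every column for Erika;
# the gaps come back as NULL (None in Python).
erika = conn.execute(
    "SELECT age, city FROM persons WHERE first_name = ?", ("Erika",)
).fetchone()

print(rows)
print(erika)
```

Every row has the same five columns, whether or not a value is known.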

NoSQL databases (Not Only SQL), however, manage their data differently. The best known concepts are:

  • Key-value
  • Column-oriented (column-based)
  • Document-oriented
  • Graph

Key-Value and Document-oriented

Of the four technologies mentioned, the concepts of key-value and document-oriented databases come closest to each other. Both work with a key to identify associated data (values). The difference between the two approaches is essentially how these key-value pairs are technically managed.

In document-oriented technologies, the data lives in a so-called document. Put very simply, the key can be thought of as a signpost to a box of information (the document). If you have the signpost and follow it, you quickly reach that box. The best-known representative of this category is MongoDB, which exposes its data in JSON format.
So there is no table structure in the relational sense that can simply be queried with SQL. Structure is created exclusively through the key, so the query path ideally always leads from the key to the data record. Ideally, you are never forced to search the contents of the records to find a piece of information. In our example, that would be a search for Max Mustermann without knowing that he is stored under the key "00001". In the worst case, that means opening all the "boxes" and searching through their contents. With many millions of boxes, this can take a long time, even on a powerful computer. For this case, key-value databases also offer the option of creating an index, i.e. an additional reference to certain information. As with all other databases, however, it makes no sense to index all of a system's data permanently and completely, since this hurts performance and costs computing power and memory.
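The three access paths just described (lookup by key, searching every "box", and an additional index) can be sketched with a plain Python dictionary standing in for the store. This is not the API of MongoDB or any other product; the records and function names are hypothetical.

```python
# A dictionary stands in for the key-value / document store (assumption).
store = {
    "00001": {"first_name": "Max", "last_name": "Mustermann", "city": "Musterhausen"},
    "00002": {"first_name": "Erika", "last_name": "Musterfrau"},
}


def get_by_key(key):
    """Follow the signpost: one direct lookup, independent of store size."""
    return store.get(key)


def scan_by_name(first_name, last_name):
    """Open every box: the cost grows with the number of records."""
    return [
        key
        for key, doc in store.items()
        if doc.get("first_name") == first_name and doc.get("last_name") == last_name
    ]


def build_name_index():
    """Additional reference from a name to the keys holding it.

    The index costs extra memory and has to be maintained on every write.
    """
    index = {}
    for key, doc in store.items():
        name = (doc.get("first_name"), doc.get("last_name"))
        index.setdefault(name, []).append(key)
    return index


name_index = build_name_index()
print(get_by_key("00001"))
print(scan_by_name("Max", "Mustermann"))
print(name_index[("Max", "Mustermann")])
```

The scan and the index return the same keys; the difference is that the index pays its cost up front and on every later write, which is why indexing everything is rarely worthwhile.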

In contrast to the relational table, the values assigned to a key can differ structurally from record to record. This is not possible in the relational world, since the table structure in a relational system is always the same and gaps are filled with NULL. At first glance, this aspect does not seem particularly exciting, but with large amounts of data it can become relevant for the underlying infrastructure (costs and performance).

One advantage of key-value stores is that they can generally be set up behind an application for a PoC with little effort. Because developers can, for example, define the structure of a MongoDB JSON document themselves as needed, this form of data management offers great flexibility in developing and building solutions. The disadvantage, as with a drug, usually only shows up later, once you are hooked. A key-value PoC is often convincing because it can be set up quickly and flexibly, practically without database expertise. But exactly that can be the problem, because there is no natural "last line of defense" between the data model and development. If application development keeps tinkering with the data structures in the backend for long enough, the system is guaranteed to end up broken at some point. Picture it like this: every day, additional information is added to the JSON structure (in our example: house number, postcode, etc.). Records written in the past are not updated automatically. Ergo: the data model diverges over time. And since the requirements for a solution usually keep changing, it is very likely that there is hardly any large key-value installation whose data model has not become almost unusable in at least one place. Therefore: just because something is quick and easy to build does not mean it makes sense in the medium to long term.
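How a data model drifts apart can be shown with two documents written at different times. The following is a minimal sketch using plain dictionaries; the field names and values are assumptions, and the check at the end is just one conceivable way to make the divergence visible.

```python
# An older document, written before the address fields existed (hypothetical).
old_doc = {"first_name": "Max", "last_name": "Mustermann", "city": "Musterhausen"}

# A newer document, written after house number and postcode were added.
new_doc = {
    "first_name": "Erika",
    "last_name": "Musterfrau",
    "city": "Musterhausen",
    "house_number": "12",
    "postcode": "12345",
}

documents = {"00001": old_doc, "00002": new_doc}


def distinct_structures(docs):
    """Return the set of field combinations that occur in the store."""
    return {frozenset(doc) for doc in docs.values()}


def missing_fields(docs, required):
    """Report, per key, which fields the application now expects but are absent."""
    report = {}
    for key, doc in docs.items():
        missing = sorted(required - set(doc))
        if missing:
            report[key] = missing
    return report


print(len(distinct_structures(documents)))
print(missing_fields(documents, {"house_number", "postcode"}))
```

Nothing in the store itself rejects the older document. Unless the application checks or migrates it, every reader has to cope with both shapes.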

Column-oriented databases

Column-based systems are an alternative to the key-value approach just described. Column-oriented means that, at first glance, data is stored in a very similar way to a key-value store. Technically, however, it is managed by column and/or partition, on disk or in memory. The aim is to optimize write and/or read access. A column-oriented database usually works with a unique key, which can in turn be composed of several values. For example, it is possible to calculate a record's identifying key from the raw data yourself.
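Calculating an identifying key from the raw data could look roughly like this. The choice of fields, the separator, and the use of SHA-256 are assumptions made for illustration, not the mechanism of any particular database.

```python
import hashlib


def composite_key(record, key_fields):
    """Derive a deterministic key from selected raw values of a record.

    The same input values always yield the same key, so the key can be
    recomputed from the raw data instead of being looked up.
    """
    raw = "|".join(str(record[field]) for field in key_fields)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Hypothetical record and key fields.
person = {"first_name": "Max", "last_name": "Mustermann", "city": "Musterhausen"}
key = composite_key(person, ["last_name", "first_name", "city"])
print(key)
```

Which fields go into the key, and in what order, is a modeling decision, because it determines which records count as "the same".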

Systems of this type are excellent when it comes to managing consistent data. Once implemented correctly, they are hard to beat for speed at large volumes. However, the data model has to be designed around the desired queries. Changes to the data structure caused by modified queries are laborious at the technical level, since a table is usually modeled for each business query. Incidentally, this paradigm is the opposite of relational "normalization", since identical data is deliberately stored redundantly. In contrast to key-value databases, such a system protects its data model by design. For this reason, Cassandra, for example, often loses a PoC against MongoDB and is less popular with typical developers, precisely because the initial implementation is more complex and you have to engage more deeply with the technology and the data model.
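The idea of "one table per query" with deliberately redundant data can be sketched in plain Python. This does not use Cassandra or its query language; the two lookup structures and the two queries they serve are hypothetical.

```python
from collections import defaultdict

# One structure per query, each keyed by what that query filters on (assumption).
persons_by_city = defaultdict(list)       # serves: "who lives in city X?"
persons_by_last_name = defaultdict(list)  # serves: "who has last name Y?"


def insert_person(person):
    """Write the same record redundantly into every query table."""
    persons_by_city[person["city"]].append(dict(person))
    persons_by_last_name[person["last_name"]].append(dict(person))


insert_person({"first_name": "Max", "last_name": "Mustermann", "city": "Musterhausen"})
insert_person({"first_name": "Erika", "last_name": "Musterfrau", "city": "Musterhausen"})

# Each query is a single lookup in the table that was modeled for it.
print(persons_by_city["Musterhausen"])
print(persons_by_last_name["Mustermann"])
```

A new kind of query, say by first name, would need a new structure and a backfill of all existing records. That is the effort the paragraph above refers to.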

A hasty decision for or against a particular database may only take its toll after 12 to 18 months of regular operation, for example when problems with key-value stores arise because nobody cared about the data model. Here too, ignorance is no protection against the consequences. The more convenient path after the PoC has often turned out to be the worse one, because not all facets of the technology were adequately assessed and understood.

Graph Database

Graph databases offer a completely different approach to storing and evaluating data.

They are often used to make relationships between data visible, which is why three-dimensional display models are often used to visualize results. Such databases are used primarily to analyze the relationships between entities and, for example, to identify patterns. In our simple example, we add a few properties to the information from the original table so that the purpose of such a database can be shown more clearly. Graph databases often draw on other database systems as the data source for their evaluations or visualizations. Use in, for example, the classic ERP area is rather uncommon, because these technologies are tools that focus essentially on analytical applications.
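What "analyzing relationships between entities" means can be hinted at with a very small graph held in memory. The entities, the relationship names, and the traversal below are assumptions for illustration and do not reflect any specific graph database or its query language.

```python
from collections import deque

# Each edge is (source, relationship, target); all data is hypothetical.
edges = [
    ("Max Mustermann", "LIVES_IN", "Musterhausen"),
    ("Erika Musterfrau", "LIVES_IN", "Musterhausen"),
    ("Max Mustermann", "KNOWS", "Erika Musterfrau"),
    ("Erika Musterfrau", "WORKS_AT", "Beispiel GmbH"),
]


def neighbors(node):
    """All entities directly connected to the node, ignoring edge direction."""
    result = set()
    for source, _relation, target in edges:
        if source == node:
            result.add(target)
        elif target == node:
            result.add(source)
    return result


def reachable(start, max_hops):
    """Breadth-first search: entities reachable within max_hops relationships."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen - {start}


print(neighbors("Musterhausen"))
print(reachable("Max Mustermann", 2))
```

Questions such as "who is connected to whom, and through what" are answered by following edges rather than by joining tables or looking up a single key.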

In the next article in this series, we will look at hybrid distribution concepts. The focus will be in particular on the systems known as NewSQL.