Structured Logging in a relational database - postgresql

How should a relational database layout be designed to capture structured logging?
Usecase 1
The output of sensors should be logged. Data: temperature and sensor-id.
Usecase 2
The duration of web requests should be logged. One entry for every request. Data: URL and duration
Common data
The two use cases are just examples. There could be many more. Each log entry should have a timestamp and a source-host column.
Relational
Please don't tell me to use NoSQL. This particular question is about a relational database layout. :-)
Our preferred database is PostgreSQL, but this should not matter here.

In a similar case I used, and would suggest, separate tables.
If you can use PostgreSQL, you can take advantage of inheritance.
In your case you can use a master_table for the common data and have the other tables inherit from it.
You wrote that you prefer PostgreSQL, so I assume you know this, but just in case: http://www.postgresql.org/docs/9.4/static/tutorial-inheritance.html
With this layout each table can have indexes specific to its own data.
Another way is to use a single table with the common data as columns and the other data in a json or hstore column, but this is close to having a NoSQL database, and it has no real advantages in this case except faster coding.
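A minimal sketch of the separate-tables layout, written in Python against the standard-library sqlite3 module only so that it runs anywhere. The table and column names (sensor_log, request_log, logged_at, source_host, duration_ms) are assumptions, not taken from the question. In PostgreSQL the shared columns could instead live in a parent table that the others inherit from.

```python
import sqlite3

# Columns shared by every log table (names are illustrative assumptions).
COMMON_COLUMNS = "logged_at TEXT NOT NULL, source_host TEXT NOT NULL"

conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE sensor_log ({COMMON_COLUMNS}, "
    "sensor_id TEXT NOT NULL, temperature REAL NOT NULL)"
)
conn.execute(
    f"CREATE TABLE request_log ({COMMON_COLUMNS}, "
    "url TEXT NOT NULL, duration_ms REAL NOT NULL)"
)

# Each table gets the index that suits its own queries.
conn.execute("CREATE INDEX sensor_log_by_sensor ON sensor_log (sensor_id, logged_at)")
conn.execute("CREATE INDEX request_log_by_url ON request_log (url, logged_at)")

conn.execute(
    "INSERT INTO sensor_log VALUES (?, ?, ?, ?)",
    ("2024-01-01T12:00:00Z", "host-a", "s1", 21.5),
)
conn.execute(
    "INSERT INTO request_log VALUES (?, ?, ?, ?)",
    ("2024-01-01T12:00:01Z", "host-b", "/index", 35.0),
)


def entries_from_host(host):
    """Return (kind, logged_at) for all entries from one host, across both tables."""
    rows = conn.execute(
        "SELECT 'sensor', logged_at FROM sensor_log WHERE source_host = ? "
        "UNION ALL "
        "SELECT 'request', logged_at FROM request_log WHERE source_host = ? "
        "ORDER BY 2",
        (host, host),
    )
    return rows.fetchall()
```

Queries that only need the common columns have to combine the tables (here with UNION ALL), which is the price paid for keeping the type-specific columns and indexes separate.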

Related

Save simple information for a database within Postgres

I have a multi-tenant application which will use the silo model to store data (each tenant gets its own database).
Because tenant names might not be unique, my databases are named with GUIDs: MyApp_[GUID].
Now I want to store simple but necessary information for each database, such as the tenant name and 3 to 5 other pieces of information.
Is there a simple way to write and read this data?
The only way I can think of is to create a special table for this with only 1 row, but that seems a bit wasteful.
If you're looking for a simpler solution than a table per database (and having to deal with the awkward constraint that it must have exactly one row), you could:
use a custom configuration parameter. You can change these with ALTER DATABASE. The downside is that you can only store strings, and that the settings might be overridden per session.
use a COMMENT on the database. The downside is that you can only store a single string per database; the advantage is that it is automatically shown in many lists of databases, such as psql's \l+ command.
add your own columns to the pg_database system table. You should not mess with that, so it's a spectacularly bad idea even if you know what you are doing, but in a relational model it's the closest to what you were asking for, so I mention it for completeness.
I don't really advocate any of these solutions. Although they do what you were asking for, there's probably a better solution to your actual problem. It might be as simple as a table of databases, possibly with a foreign key to pg_database, in an extra database shared by all tenants.
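A rough sketch of that last idea, a table of databases kept in one shared place. It uses Python's sqlite3 module as a stand-in for the shared PostgreSQL database, and the table name tenant_database and its columns (including contact_email) are hypothetical.

```python
import sqlite3

# Stand-in for the extra database shared by all tenants.
registry = sqlite3.connect(":memory:")
registry.execute(
    "CREATE TABLE tenant_database ("
    " database_name TEXT PRIMARY KEY,"  # e.g. MyApp_[GUID]
    " tenant_name TEXT NOT NULL,"       # does not have to be unique
    " contact_email TEXT)"              # hypothetical extra attribute
)


def register(database_name, tenant_name, contact_email=None):
    """Record the descriptive data for one tenant database."""
    registry.execute(
        "INSERT INTO tenant_database VALUES (?, ?, ?)",
        (database_name, tenant_name, contact_email),
    )


def tenant_name_for(database_name):
    """Look up the tenant name for a database, or None if it is unknown."""
    row = registry.execute(
        "SELECT tenant_name FROM tenant_database WHERE database_name = ?",
        (database_name,),
    ).fetchone()
    return row[0] if row else None
```

One row per database replaces the one-row table per database, and listing all tenants becomes a single query.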

How to have an active flag in NoSQL without transaction

This question is only for understanding purpose. This might be a noob question.
Assume that I have a tabular or document NoSQL database which does not support transactions, and that I have a locations table/document with an is_active column. A user can have multiple locations, but only a SINGLE location with is_active: true. Now consider these two cases:
The user wants to change the active location. How do we handle this, given that I need to set is_active to false for one row and to true for another?
The user wants to create a new location for himself and make that location the active one.
How do I handle this logic without a transaction in a NoSQL database? Do I need to model my tables in some other way?
Let's assume that:
I cannot use an SQL database
I should not use transaction support provided by DBs like Mongo
NOTE: These might be a lot of assumptions and might not be real-world use cases. But I am just trying to understand HOW we should model NoSQL databases.
This is a typical data modelling question for NoSQL systems. I won't put the whole theory here, but these links are worth checking: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-relational-modeling.html and https://cassandra.apache.org/doc/latest/cassandra/data_modeling/index.html
The key takeaway is that you don't want to recreate a normalized, SQL-like schema in a NoSQL system. In your particular case, you have to explore the access patterns for reads and writes, and that will help you define your data structures. For example, you may want to get rid of the "locations" table and keep the locations as properties of users. This is an almost classic use case in the NoSQL world.
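To make the "locations as properties of users" idea concrete, here is a small sketch with plain Python dicts standing in for documents. The field names (locations, active_location_id) are my own assumptions. The active location is stored as one field on the user instead of a flag on each location, so changing it is a single write to a single document. Many document stores apply a write to one document atomically, but that is something to verify for the store you use.

```python
import copy


def set_active_location(user_doc, location_id):
    """Return a new user document whose active location is location_id."""
    if location_id not in user_doc["locations"]:
        raise KeyError(location_id)
    updated = copy.deepcopy(user_doc)
    # One field holds the choice, so two locations can never both be active.
    updated["active_location_id"] = location_id
    return updated


def add_active_location(user_doc, location_id, location):
    """Return a new user document with the location added and made active."""
    updated = copy.deepcopy(user_doc)
    updated["locations"][location_id] = location
    updated["active_location_id"] = location_id
    return updated


user = {
    "user_id": "u1",
    "locations": {"home": {"city": "Oslo"}},
    "active_location_id": "home",
}
```

Both cases from the question then become "build the new document, write it once", with no second row that could be left in an inconsistent state.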

Best approach to implement inheritance in a data warehouse based on a postgres database

I am developing a multi-step data pipeline that should optimize the following process:
1) Extract data from a NoSQL database (MongoDB).
2) Transform and load the data into a relational (PostgreSQL) database.
3) Build a data warehouse using the Postgres database
I have manually coded a script to handle steps 1) and 2), which is an intermediate ETL pipeline. Now my goal is to build the data warehouse using the Postgres database, but I ran into a few doubts regarding the DW design. Below is the dimensional model for the relational database:
There are 2 main tables, Occurrence and Canonical, from which inherit a set of others (drawn in red and blue, respectively). Note that there are 2 child data types, ObserverNodeOccurrence and CanonicalObserverNode, that have an extra many-to-many relationship with another table.
I did some research on how inheritance should be implemented in a data warehouse and concluded that the best practice would be to merge each family of data types (super and child tables) into a single table. Doing this would imply adding extra attributes and a lot of null values. My new dimensional model would look like the following:
Question 1: Do you think this is the best approach to address this problem? If not, what would be?
Question 2: Any software recommendations for on-premise data warehouses? (on-premise is a must since it contains sensitive data)
Usually having fewer tables to join and denormalizing data will improve query performance for data warehouse queries, so they are often considered a good thing.
This would suggest your second table design. NULL values take up almost no space in a PostgreSQL table (a row containing NULLs only carries a small null bitmap), so you need not worry about that.
As described here there are three options to implement inheritance in a relational database.
IMO the only practical option for a data warehouse is Table-Per-Hierarchy, which merges all entities into one table.
The reason is not only the performance gained by saving the joins. In a data warehouse the historical view of the data is often important. Think about it: how would you model a change of subtype for some entity?
An important point is to define a discriminator column which uniquely identifies the source entity.
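A minimal sketch of Table-Per-Hierarchy with a discriminator column, using Python's sqlite3 module as a runnable stand-in for PostgreSQL. Only the names Occurrence and ObserverNodeOccurrence come from the question; the columns (occurrence_type, occurred_at, observer_node_id) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occurrence ("
    " occurrence_id INTEGER PRIMARY KEY,"
    " occurrence_type TEXT NOT NULL,"  # discriminator: which entity the row came from
    " occurred_at TEXT NOT NULL,"      # attribute shared by all subtypes
    " observer_node_id INTEGER)"       # subtype-only attribute, NULL for other rows
)
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
    [
        (1, "Occurrence", "2024-01-01", None),
        (2, "ObserverNodeOccurrence", "2024-01-02", 7),
    ],
)


def ids_of_type(occurrence_type):
    """Select one subtype by filtering on the discriminator, with no joins."""
    cur = conn.execute(
        "SELECT occurrence_id FROM occurrence "
        "WHERE occurrence_type = ? ORDER BY occurrence_id",
        (occurrence_type,),
    )
    return [row[0] for row in cur]
```

Subtype-specific columns are simply NULL for rows of other types, which is the trade-off the question already anticipated.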

Is Azure table or NoSQL in general not so good when updating data

I have only looked in Azure table, but it may well apply for other NoSQL databases as well.
Suppose I have an entity with the following properties:
First name - Last name - Hometown - Country
In Azure Table there is no concept of relations. So if I have thousands of entities and want to change every entity that has 'Canada' as its country to some other country, the store may have to go through all of them to find the ones with 'Canada' and change them.
I wonder: is NoSQL only beneficial when the data is static and does not change after it has been written? Or can this problem be solved in NoSQL databases?
NoSQL data stores have different advantages from SQL databases. Things like scalability or availability can be better in a NoSQL store like Azure Table, but there are tradeoffs. For example, you generally cannot query efficiently on arbitrary parts of a record, only on the keys.
In designing your schema for Azure Table you have to consider the use cases of your data layer and let them dictate the schema. In this example, if I expected to update all records for a given country, I would make the country part of the partition key or row key. That way the query for all entities in a given country is fast, and those entities can be updated quickly.
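A small in-memory sketch of that key design in Python. It is not the Azure SDK, just a dict addressed by (partition key, row key) to show why the country lookup no longer scans everything. The function names are made up for the example. As far as I know, key properties cannot be updated in place in Azure Table, so the sketch re-inserts the entities under the new key and drops the old partition.

```python
from collections import defaultdict

# In-memory stand-in for a table addressed by (PartitionKey, RowKey).
# Here PartitionKey is the country and RowKey is a person id.
table = defaultdict(dict)


def upsert(country, person_id, entity):
    """Insert or replace one entity inside its country partition."""
    table[country][person_id] = entity


def entities_in(country):
    """Read a single partition; other countries are never touched."""
    return dict(table.get(country, {}))


def rename_country(old, new):
    """Move every entity from the old partition to the new one."""
    moved = table.pop(old, {})
    for person_id, entity in moved.items():
        table[new][person_id] = {**entity, "Country": new}
    return len(moved)
```

The work done by rename_country is proportional to the number of entities in that one country, not to the size of the whole table.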

Amazon DynamoDB Item Size?

I'm just exploring the whole NoSQL concept. I've been playing around with Amazon's DynamoDB and I really like the concept. That said, I am not quite sure how the data should be separated. By this I mean: should I create a new table for related data, like you would in a relational database, or do I use a single table to store all of the application's data?
As an example, in a relational DB I might have a table called users and a table called users_details. I would then for example, create a 1:1 relationship between the two tables. With the NoSQL concept I could theoretically create two tables as well but it strikes me as more efficient to have all the data in a single table.
If that is the case then when do you stop? Is the idea to store all the application data for a given user in a single table?
First ask yourself: why did I separate the users from the user details in the RDBMS in the first place?
On a more general note, when talking about NoSQL you really shouldn't look at relationships between tables. You shouldn't think about "joining" information from different tables but rather prepare your data in a way that can be retrieved optimally.
In the user/user_details scenario, put all information in a users table, and query what you want by specifying the attributes to get.
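As an illustration only, here is the "one item per user, ask for the attributes you need" idea with a plain Python dict in place of the table. The attribute names and the get_user helper are hypothetical. In DynamoDB itself the attribute selection would roughly correspond to a projection expression on the read.

```python
# One item per user holds both the core and the "details" attributes.
users = {
    "u1": {
        "user_id": "u1",
        "name": "Ada",
        "email": "ada@example.com",
        "bio": "Likes databases",
    }
}


def get_user(user_id, attributes=None):
    """Fetch one user item, optionally restricted to the named attributes."""
    item = users.get(user_id)
    if item is None:
        return None
    if attributes is None:
        return dict(item)
    return {name: item[name] for name in attributes if name in item}
```

A call such as get_user("u1", ["name", "email"]) returns only those two attributes, so there is no second table to join against.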