FIWARE context broker storing all data to MongoDB

I have installed the FIWARE context broker (Orion) and I am sending data to it using the localhost:1026/v1/updateContext endpoint.
Everything is working properly and I am able to get and visualise the data being sent. However, since Orion is a broker service, only the latest entity state can be retrieved.
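For reference, a minimal sketch of the kind of update being sent (using Java 17's built-in HTTP client just for illustration; the entity id, type and attribute names are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UpdateContextExample {
    public static void main(String[] args) throws Exception {
        // NGSIv1 updateContext payload; entity id, type and attribute are placeholders.
        String payload = """
            {
              "contextElements": [
                {
                  "id": "Room1",
                  "type": "Room",
                  "attributes": [
                    { "name": "temperature", "type": "float", "value": "23.5" }
                  ]
                }
              ],
              "updateAction": "APPEND"
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:1026/v1/updateContext"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}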
Question: I need to automatically save the historical data to a MongoDB database. Orion saves only the latest two entries. STH and Cygnus are not doing the job for me, since they require a lot of configuration both for sending the data and for collecting and storing it.
Is there any way to automatically save all data being sent to Orion, and to group it by service IDs?
Thank you in advance.

I'm afraid that the only way to store historical data in FIWARE is through STH, QuantumLeap (incubated GE) or Cygnus.
Configuring them is not so difficult. Please follow these tutorials:
https://github.com/Fiware/tutorials.Historic-Context
https://github.com/Fiware/tutorials.Time-Series-Data
https://github.com/Fiware/tutorials.Short-Term-History
http://fiwaretourguide.readthedocs.io/en/latest/generating-historical-context-information-sth-comet/how-to-generate-the-history-of-Context-Information-using-STH-Comet/
http://fiwaretourguide.readthedocs.io/en/latest/storing-data-cygnus-mysql/how-to-store-data-cygnus-mysql/
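To give a rough idea of what the configuration boils down to: once Cygnus (or QuantumLeap) is running, you create a single subscription in Orion and every change of the matching entities is pushed to the sink and persisted. A minimal sketch (Java 17 HTTP client; the Cygnus host/port 5050, the Room type and the service headers are assumptions taken from the default tutorial setup):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubscribeForHistory {
    public static void main(String[] args) throws Exception {
        // NGSIv2 subscription: notify Cygnus (assumed at cygnus:5050/notify) about every
        // change of entities of type "Room"; adjust idPattern/type to your data model.
        String subscription = """
            {
              "description": "Persist all Room changes through Cygnus",
              "subject": {
                "entities": [ { "idPattern": ".*", "type": "Room" } ]
              },
              "notification": {
                "http": { "url": "http://cygnus:5050/notify" },
                "attrsFormat": "legacy"
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:1026/v2/subscriptions"))
                .header("Content-Type", "application/json")
                .header("Fiware-Service", "myservice")       // multi-tenant headers, if used
                .header("Fiware-ServicePath", "/mypath")
                .POST(HttpRequest.BodyPublishers.ofString(subscription))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // 201 on success
    }
}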

Precisely, orchestrating the persistence of historical data of context entities is the purpose of the Cygnus Generic Enabler. You can then use STH to store historical data for the most recent period of time, or choose some other alternative such as Cosmos for Big Data.
You can find examples of configuration files for persisting data to STH in the official Cygnus documentation. In addition, if you are familiar with MongoDB, the official documentation of the MongoDB sink includes examples of the different persistence configurations.
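If you want to verify what Cygnus has actually written, you can also inspect the MongoDB instance directly. A minimal sketch with the MongoDB Java driver (the connection string is an assumption, and the sth_ prefix is the usual default, configurable in the sink):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class InspectCygnusData {
    public static void main(String[] args) {
        // Point this at the MongoDB instance used by Cygnus/STH.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            for (String dbName : client.listDatabaseNames()) {
                // Databases written by the MongoDB/STH sinks are typically prefixed with "sth_".
                if (!dbName.startsWith("sth_")) {
                    continue;
                }
                MongoDatabase db = client.getDatabase(dbName);
                for (String collectionName : db.listCollectionNames()) {
                    // Print one sample document per collection to see the persisted structure.
                    Document sample = db.getCollection(collectionName).find().first();
                    System.out.println(dbName + " / " + collectionName + " -> " + sample);
                }
            }
        }
    }
}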
If you give me a little more information about how you are configuring Cygnus and STH, I could help you more.
Regards!

Solution
Playing around with the context broker, I altered the way that Orion stores the data in the auto-generated MongoDB. When you send data to Orion, the document _id is always built from the service path, the entity id and the type, so new data overwrites the previous entry. We need to change that by adding another, incremental element to the _id, so that each update is saved as a new document. I am not sure whether this is a clumsy solution, but it is definitely more scalable, since we don't need subscriptions.

Related

Spring batch with MongoDB and transactions

I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta-data, and another MongoDB where all the business data is stored. The relational DB still uses a DataSourceTransactionManager.
However, I don't think the Mongo writes are done within an active transaction with rollbacks. Here is the excerpt from the official Spring Batch documentation on MongoItemWriter:
A ItemWriter implementation that writes to a MongoDB store using an implementation of Spring Data's MongoOperations. Since MongoDB is not a transactional store, a best effort is made to persist written data at the last moment, yet still honor job status contracts. No attempt to roll back is made if an error occurs during writing.
However, this is no longer the case; MongoDB introduced multi-document ACID transactions in version 4.0.
How do I go about adding transactions to my writes? I could use @Transactional on my service methods when I use ItemWriterAdapter. But I still don't know what to do with MongoItemWriter... What is the right configuration here? Thank you.
I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored.
I invite you to take a look at the following posts to understand the implications of this design choice:
How to java-configure separate datasources for spring batch data and business data? Should I even do it?
How does Spring Batch transaction management work?
In your case, you have a distributed transaction across two data sources:
SQL datasource for the job repository, which is managed by a DataSourceTransactionManager
MongoDB for your step (using the MongoItemWriter), which is managed by a MongoTransactionManager
If you want technical meta-data and business data to be committed/rolled back in the scope of the same distributed transaction, you need to use a JtaTransactionManager that coordinates the DataSourceTransactionManager and MongoTransactionManager. You can find some resources about the matter here: https://stackoverflow.com/a/56547839/5019386.
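If full two-phase-commit coordination is more than you need, a common middle ground is to drive the chunk transaction of the Mongo-writing step with the MongoTransactionManager, so that writes within a chunk are rolled back on failure, while the SQL job repository keeps its own DataSourceTransactionManager. A minimal sketch, assuming Spring Batch 4.x, Spring Data MongoDB 3.x and a replica-set-enabled MongoDB 4.x (transactions require a replica set); the collection name and domain type are made up:

import java.util.List;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.data.MongoItemWriter;
import org.springframework.batch.item.data.builder.MongoItemWriterBuilder;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.data.mongodb.core.MongoTemplate;

@Configuration
public class MongoStepConfig {

    // Transaction manager for the business data (MongoDB 4.x + replica set required).
    @Bean
    public MongoTransactionManager mongoTransactionManager(MongoDatabaseFactory factory) {
        return new MongoTransactionManager(factory);
    }

    @Bean
    public MongoItemWriter<MyDocument> mongoItemWriter(MongoTemplate mongoTemplate) {
        return new MongoItemWriterBuilder<MyDocument>()
                .template(mongoTemplate)
                .collection("my_business_collection") // hypothetical collection name
                .build();
    }

    @Bean
    public Step mongoWriteStep(StepBuilderFactory steps,
                               MongoItemWriter<MyDocument> writer,
                               MongoTransactionManager mongoTransactionManager) {
        return steps.get("mongoWriteStep")
                .<MyDocument, MyDocument>chunk(100)
                .reader(new ListItemReader<>(List.of(new MyDocument()))) // placeholder reader
                .writer(writer)
                .transactionManager(mongoTransactionManager) // chunk transaction runs against MongoDB
                .build();
    }

    // Placeholder domain type for the sketch.
    public static class MyDocument {
    }
}

This is not a distributed transaction: the job repository meta-data is still committed separately by its own transaction manager, so for true all-or-nothing semantics across both stores you would still need the JTA setup described above.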
BTW, there is a feature request to use MongoDB as a job repository in Spring Batch: https://github.com/spring-projects/spring-batch/issues/877. When this is implemented, you could store both business data and technical meta-data in the same datasource (so no need for a distributed transaction anymore) and you would be able to use the same MongoTransactionManager for both the job repository and your step.

InfluxDB and relational metadata?

I am currently working on a project with InfluxDB, where I have to track and store several measurements. My issue: for all my measurements I also need to store some additional relational metadata (for reporting), which only changes once or twice a week. Moreover, I have some config data for my applications.
So my question: do you have any suggestions or experience with storing time-series and relational data like that? Should I try to store everything (even the relational data) within InfluxDB, or should I use an external relational DB for that?
Looking forward to your help!
Marko

How do I integrate MongoDB and Hazelcast without defining a specific schema?

I am aware of how to create POJO classes (Java), map them to the schema of the data in MongoDB, and create a connection with Spring Data. But if I don't have a specific schema and I want to use MongoDB as a back end for my cache in Hazelcast, how do I do that? In my use case specifically, I have a cache which needs to keep MongoDB updated with whatever updates it comes across.
Check this out:
https://github.com/hazelcast/hazelcast-code-samples/tree/master/hazelcast-integration/mongodb
Do note that this is sample code meant for reference purposes only; do not copy-paste it into your production system.
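To sketch the shape of that integration without POJO mapping: you can implement Hazelcast's MapStore directly against raw BSON documents, so no fixed schema is required. A minimal write-through sketch (assuming Hazelcast 4.x or newer and the MongoDB Java driver; connection string, database and collection names are made up):

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import com.hazelcast.map.MapStore;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

// Write-through MapStore that persists raw BSON documents, so no fixed schema is needed.
public class DocumentMapStore implements MapStore<String, Document> {

    // Connection string, database and collection names are assumptions for this sketch.
    private final MongoClient client = MongoClients.create("mongodb://localhost:27017");
    private final MongoCollection<Document> collection =
            client.getDatabase("cachedb").getCollection("cache_entries");

    @Override
    public void store(String key, Document value) {
        // Upsert so the cache entry always creates or overwrites the backing document.
        collection.replaceOne(Filters.eq("_id", key), value.append("_id", key),
                new ReplaceOptions().upsert(true));
    }

    @Override
    public void storeAll(Map<String, Document> map) {
        map.forEach(this::store);
    }

    @Override
    public void delete(String key) {
        collection.deleteOne(Filters.eq("_id", key));
    }

    @Override
    public void deleteAll(Collection<String> keys) {
        keys.forEach(this::delete);
    }

    @Override
    public Document load(String key) {
        return collection.find(Filters.eq("_id", key)).first();
    }

    @Override
    public Map<String, Document> loadAll(Collection<String> keys) {
        Map<String, Document> result = new HashMap<>();
        for (String key : keys) {
            Document doc = load(key);
            if (doc != null) {
                result.put(key, doc);
            }
        }
        return result;
    }

    @Override
    public Iterable<String> loadAllKeys() {
        // Returning null skips eager pre-loading of the whole collection at startup.
        return null;
    }
}

You would then register it on the map through a MapStoreConfig (write-through, or write-behind if some delay is acceptable) in the Hazelcast configuration.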

Retrieve historical data with Fiware Orion/Cygnus

I have FIWARE Orion installed with a Cygnus sink and a MongoDB back end.
What is a common method to retrieve the history of the data points stored in mongoDB? Can I do that using Cygnus?
The API of Cygnus doesn't give anything to retrieve the data:
http://fiware-cygnus.readthedocs.io/en/latest/cygnus-common/installation_and_administration_guide/management_interface_v1/index.html
I would like to have some API endpoints for historical data:
api/v1/history/entities...
Cygnus provides two sinks for MongoDB: one for raw data (I guess that is the one you are using) and another one for aggregated data. Both are aligned with STH Comet and its API.
Cygnus itself only provides a management API, since it is the tool in charge of putting Orion context data into MongoDB (or HDFS, MySQL, Carto, CKAN...). Once the data has been persisted, it is up to the persistence back end to provide means for exploiting it, or up to some tool wrapping the back end's native API, as the STH Comet API does.
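So, in practice, the history is queried through STH Comet rather than through Cygnus. A minimal sketch of a raw-data query against the STH Comet API (Java 17 HTTP client; the host, default port 8666, entity names and service headers are assumptions for the example):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SthRawDataQuery {
    public static void main(String[] args) throws Exception {
        // Last 10 raw samples of the "temperature" attribute of entity Room1 (type Room).
        String url = "http://localhost:8666/STH/v1/contextEntities"
                + "/type/Room/id/Room1/attributes/temperature?lastN=10";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Accept", "application/json")
                .header("Fiware-Service", "myservice")      // multi-tenant headers, if used
                .header("Fiware-ServicePath", "/mypath")
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // contextResponses with the historical values
    }
}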

Sync postgreSql data with ElasticSearch

Ultimately I want to have a scalable search solution for the data in PostgreSQL. My findings point me towards using Logstash to ship write events from Postgres to Elasticsearch; however, I have not found a usable solution. The solutions I have found involve using the jdbc input to query all data from Postgres on an interval, and the delete events are not captured.
I think this is a common use case, so I hope you could share your experience with me, or give me some pointers to proceed.
If you need to also be notified on DELETEs and delete the respective records in Elasticsearch, it is true that the Logstash jdbc input will not help. You'd have to use a solution based on the database's transaction log (binlog/WAL), as suggested here.
However, if you still want to use the Logstash jdbc input, what you could do is simply soft-delete records in PostgreSQL, i.e. create a new BOOLEAN column in order to mark your records as deleted. The same flag would then exist in Elasticsearch, and you can exclude soft-deleted records from your searches with a simple term query on the deleted field.
Whenever you need to perform some cleanup, you can delete all records flagged as deleted in both PostgreSQL and Elasticsearch.
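For illustration, such a query could look like the sketch below (plain HTTP against Elasticsearch; the index name, fields and local endpoint are assumptions):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SearchExcludingDeleted {
    public static void main(String[] args) throws Exception {
        // Bool query: match the user's search terms but filter out soft-deleted documents.
        String query = """
            {
              "query": {
                "bool": {
                  "must":     [ { "match": { "title": "elasticsearch" } } ],
                  "must_not": [ { "term":  { "deleted": true } } ]
                }
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/products/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}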
You can also take a look at PGSync.
It's similar to Debezium but a lot easier to get up and running.
PGSync is a change data capture tool for moving data from Postgres to Elasticsearch. It allows you to keep Postgres as your source of truth and expose structured, denormalized documents in Elasticsearch.
You simply define a JSON schema describing the structure of the data in Elasticsearch. Here is an example schema (you can also have nested objects):
{
  "nodes": {
    "table": "book",
    "columns": [
      "isbn",
      "title",
      "description"
    ]
  }
}
PGSync generates the queries for your documents on the fly, so there is no need to write queries as with Logstash. It also supports and tracks deletion operations.
It operates with both a polling and an event-driven model: the initial sync polls the database for changes since the last time the daemon was run, and thereafter event notifications (based on triggers and handled by pg_notify) capture changes to the database as they occur.
It has very little development overhead:
Create a schema as described above
Point PGSync at your Postgres database and Elasticsearch cluster
Start the daemon
You can easily create a document that includes multiple relations as nested objects. PGSync tracks any changes for you.
Have a look at the GitHub repo for more details. You can install the package from PyPI.
Please take a look at Debezium. It's a change data capture (CDC) platform which allows you to stream your data.
I created a simple GitHub repository which shows how it works.