Retrieve historical data with Fiware Orion/Cygnus

I have a Fiware Orion installed with a Cygnus sink and a MongoDB back-end.
What is a common method to retrieve the history of the data points stored in MongoDB? Can I do that using Cygnus?
The Cygnus API doesn't offer anything to retrieve the data:
http://fiware-cygnus.readthedocs.io/en/latest/cygnus-common/installation_and_administration_guide/management_interface_v1/index.html
I would like to have some api endpoints for historical data:
api/v1/history/entities...

Cygnus sinks for MongoDB (there are two sinks, one for raw data, which I guess is the one you are using, and another one for aggregated data) are aligned with STH Comet and its API.
Cygnus only provides a management API, since it is the tool in charge of putting Orion context data into MongoDB (or HDFS, MySQL, Carto, CKAN...). Once the data is persisted in the persistence backend, it is up to that backend to provide the means for exploiting its data, or up to some tool wrapping the backend's native API, as the STH Comet API does.
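For instance, provided Cygnus and STH Comet are configured with compatible database and collection prefixes, the history can be read back through STH Comet's v1 raw-data endpoint. A minimal sketch in Java, assuming STH Comet listens on its default port 8666; the entity type/id, attribute name and FIWARE service headers below are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SthRawDataQuery {
    public static void main(String[] args) throws Exception {
        // lastN=10 asks for the 10 most recent raw samples of the attribute.
        // Entity type/id, attribute name, service and service path are placeholders;
        // adjust them to whatever Cygnus is persisting for you.
        String url = "http://localhost:8666/STH/v1/contextEntities"
                + "/type/Room/id/Room1/attributes/temperature?lastN=10";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Fiware-Service", "myservice")
                .header("Fiware-ServicePath", "/myservicepath")
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.body()); // JSON document with the historical values
    }
}
```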

Related

Can persistent databases and in-memory database work together?

My requirement is to use two different (persistent) database sources and serve the data to a frontend. To make API responses faster, can I use an in-memory database like H2 or GemFire to store data that is known to be frequently accessed (e.g. populated by a cron job), and go to the persistent databases for the other calls? The challenge for me is transferring the data from the persistent store to the in-memory one, since Spring needs the same POJO twice with different annotations (e.g. @Document for Mongo, @Entity for H2/GemFire). As of now it does not make sense to manually go through each record of an array received from Mongo and save it in-memory.
You can utilize the Spring Framework's Cache Abstraction and different forms of caching patterns when using Apache Geode (or alternatively, VMware Tanzu GemFire). See a more general description about caching starting here.
NOTE: VMware Tanzu GemFire, up to 9.15, is built on Apache Geode, and the two are virtually interchangeable by swapping the bits.
I also provide many Samples of the different caching patterns in action, Guide and Source included, which are referenced in the relevant section of the reference documentation.
Finally, in the Spring Boot for Apache Geode project, I also have 2 test classes testing the Inline Caching Pattern, which you may refer to:
The first test class uses Apache Geode as a caching provider in Spring's Cache Abstraction to cache data from a database, HSQLDB (predecessor of H2) in this case.
The second test class is similar in that Apache Geode caches data from Apache Cassandra.
The primary backend data store used in Inline Caching is largely irrelevant, since the pattern uses the Spring Data Repository abstraction to interface with any data store supported by it (the individual stores are referred to as "modules" in the Spring Data portfolio; see here).
Caching is but 1 technique to keep relevant data in-memory in order to improve response times in a Spring (Boot) application using a backing data store, such as an RDBMS, as the primary System of Record (SOR). Of course, there are other approaches, too.
Apache Geode can even be configured for durability (persistence) and redundancy. In some cases, it might even completely replace your database for some function. So, you have a lot of options.
Hopefully this will get you started and give you more ideas.
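To make the pattern concrete, here is a minimal cache-aside sketch using Spring's cache abstraction over a Spring Data repository; the CustomerService, CustomerRepository and Customer names are hypothetical, and the actual caching provider (Geode, GemFire, or any other registered CacheManager) is configured separately, for example via Spring Boot's caching support.

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching // activates Spring's Cache Abstraction; the provider is plugged in via a CacheManager bean
class CachingConfig {
}

// Hypothetical domain type persisted in the backing (persistent) store.
class Customer {
    Long id;
    String name;
}

// Hypothetical Spring Data repository over the persistent store (Mongo, JPA, ...).
interface CustomerRepository extends CrudRepository<Customer, Long> {
}

@Service
class CustomerService {

    private final CustomerRepository customerRepository;

    CustomerService(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    // On a cache miss the repository (and therefore the backing database) is hit and
    // the result is stored in the "customers" cache; on a hit the database is skipped.
    @Cacheable("customers")
    public Customer findById(Long id) {
        return customerRepository.findById(id).orElse(null);
    }
}
```

With inline caching, as in the linked test classes, the repository access happens inside a CacheLoader/CacheWriter attached to the Geode region rather than around it, but the application-facing idea is the same: cache hits never touch the backing database.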

Spring batch with MongoDB and transactions

I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored. The relational DB still uses DataSourceTransactionManager.
However, I don't think the Mongo writes are done within an active transaction with rollbacks. Here is the excerpt from the official Spring Batch documentation on MongoItemWriter:
A ItemWriter implementation that writes to a MongoDB store using an implementation of Spring Data's MongoOperations. Since MongoDB is not a transactional store, a best effort is made to persist written data at the last moment, yet still honor job status contracts. No attempt to roll back is made if an error occurs during writing.
However, this is not the case any more; MongoDB introduced multi-document ACID transactions in version 4.0.
How do I go about adding transactions to my writes? I could use @Transactional on my service methods when I use ItemWriterAdapter, but I still don't know what to do with MongoItemWriter... What is the right configuration here? Thank you.
I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored.
I invite you to take a look at the following posts to understand the implications of this design choice:
How to java-configure separate datasources for spring batch data and business data? Should I even do it?
How does Spring Batch transaction management work?
In your case, you have a distributed transaction across two data sources:
SQL datasource for the job repository, which is managed by a DataSourceTransactionManager
MongoDB for your step (using the MongoItemWriter), which is managed by a MongoTransactionManager
If you want technical meta-data and business data to be committed/rolled back in the scope of the same distributed transaction, you need to use a JtaTransactionManager that coordinates the DataSourceTransactionManager and MongoTransactionManager. You can find some resources about the matter here: https://stackoverflow.com/a/56547839/5019386.
BTW, there is a feature request to use MongoDB as a job repository in Spring Batch: https://github.com/spring-projects/spring-batch/issues/877. When this is implemented, you could store both business data and technical meta-data in the same datasource (so no need for a distributed transaction anymore) and you would be able to use the same MongoTransactionManager for both the job repository and your step.
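As a starting point, here is a minimal sketch of the MongoDB side of that setup, assuming Spring Batch 4.x and a recent Spring Data MongoDB: the chunk-oriented step is given a MongoTransactionManager so the MongoItemWriter writes of each chunk run inside a MongoDB transaction and can be rolled back. The Person type, "people" collection and "mongoStep" names are hypothetical, and coordinating this with the job repository's DataSourceTransactionManager in a single distributed transaction would still require the JtaTransactionManager approach linked above.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.data.MongoItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.data.mongodb.core.MongoTemplate;

// Hypothetical domain type written to MongoDB.
class Person {
}

@Configuration
public class MongoStepConfig {

    // Transaction manager for the business data in MongoDB
    // (MongoDB transactions require version 4.x and a replica set).
    @Bean
    public MongoTransactionManager mongoTransactionManager(MongoDatabaseFactory mongoDatabaseFactory) {
        return new MongoTransactionManager(mongoDatabaseFactory);
    }

    @Bean
    public MongoItemWriter<Person> mongoItemWriter(MongoTemplate mongoTemplate) {
        MongoItemWriter<Person> writer = new MongoItemWriter<>();
        writer.setTemplate(mongoTemplate);
        writer.setCollection("people"); // hypothetical collection name
        return writer;
    }

    // The chunk-oriented step uses the Mongo transaction manager, so the writes of each
    // chunk happen inside a MongoDB transaction and are rolled back on failure.
    // StepBuilderFactory is provided by @EnableBatchProcessing; the reader is any
    // ItemReader<Person> bean defined elsewhere.
    @Bean
    public Step mongoStep(StepBuilderFactory stepBuilderFactory,
                          ItemReader<Person> reader,
                          MongoItemWriter<Person> mongoItemWriter,
                          MongoTransactionManager mongoTransactionManager) {
        return stepBuilderFactory.get("mongoStep")
                .<Person, Person>chunk(10)
                .reader(reader)
                .writer(mongoItemWriter)
                .transactionManager(mongoTransactionManager)
                .build();
    }
}
```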

PostgreSQL Store-and-Forward

I'm building a data collection RESTful API. External devices will POST JSON data to the database server, so my idea is to follow the store-and-forward approach.
At the moment of the POST, the API will store the raw JSON data in a table with a timestamp and a processed (true/false) field.
Later, when (if) the DB server is not loaded, some function, trigger, stored procedure, etc. will run. The idea is to process all the JSON data into suitable tables and fields for charting graphs/bars and mapping it over Google Maps later.
So how, and with what, should I implement this second step, so that it runs when the DB server is not loaded and is free to process the posted JSON data?
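One common way to implement the forward step from the application side (as an alternative to a DB-side trigger or a cron-driven stored procedure) is a scheduled job that runs in an assumed off-peak window, picks up unprocessed rows, transforms them into the reporting tables, and flags them as processed. A rough sketch using Spring's JdbcTemplate and @Scheduled; the raw_events table and its id, payload, received_at and processed columns are hypothetical:

```java
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class RawJsonForwarder {

    private final JdbcTemplate jdbcTemplate;

    public RawJsonForwarder(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Runs at 03:00 every night (an assumed off-peak window); requires
    // @EnableScheduling on a configuration class.
    @Scheduled(cron = "0 0 3 * * *")
    public void forwardStoredJson() {
        List<Map<String, Object>> rows = jdbcTemplate.queryForList(
                "SELECT id, payload FROM raw_events WHERE processed = false "
                        + "ORDER BY received_at LIMIT 1000");

        for (Map<String, Object> row : rows) {
            Long id = ((Number) row.get("id")).longValue();
            String payload = (String) row.get("payload");

            // Parse the JSON payload and insert it into the normalized
            // reporting tables used for charts and the map here.

            jdbcTemplate.update("UPDATE raw_events SET processed = true WHERE id = ?", id);
        }
    }
}
```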

REST API vs Sqoop

I was trying to import data from MySQL to HDFS.
I was able to do it with Sqoop, but this can also be done by fetching the data from an API.
My question is: when should a REST API be used to load data into HDFS instead of Sqoop?
Please specify some differences with use cases!
You could use Sqoop to pull data from MySQL into HBase, then put a REST API over HBase (on Hadoop)... That would not be much different from a REST API over MySQL.
Basically, you're comparing two different things. Hadoop is not meant to replace traditional databases or N-tier user-facing applications; it is just a more distributed, fault-tolerant place to store large amounts of data.
And you typically wouldn't use a REST API to talk to a database and then put those values into Hadoop, because that wouldn't be distributed and all database results would go through a single process.
Sqoop (SQL <=> Hadoop) is basically used for loading data from an RDBMS into HDFS.
It is a direct connection to the database, where you can append/modify/delete data in the table(s) using the sqoop eval command if privileges are not defined properly for the user accessing the DB from Sqoop.
But using a REST web services API, we can fetch data from various databases (both NoSQL and RDBMS) connected internally via code.
Consider calling a getUsersData RESTful web service with a curl command: it is specifically designed only to provide user data and doesn't allow appending/modifying/updating any part of the DB, irrespective of the database type (RDBMS/NoSQL).

FIWARE context broker storing all data to mongodb

I have installed the FIWARE context broker and I am sending data to it using the localhost:1026/v1/updateContext endpoint.
Everything is working properly and I am able to get and visualise the data being sent. As Orion is a broker service, only the latest entity value can be retrieved.
Question: I need to automatically save the historical data to a MongoDB database. Orion saves only the latest 2 entries. STH and Cygnus are not doing the job, since they require a lot of configuration both in sending data and in collecting, storing, etc.
Is there any way to automatically save all data being sent to Orion, and group it by service IDs?
Thank you in advance.
I'm afraid that the only way to store historical data in FIWARE is through STH, QuantumLeap (incubated GE) or Cygnus.
Configuring them is not so difficult. Please follow these tutorials:
https://github.com/Fiware/tutorials.Historic-Context
https://github.com/Fiware/tutorials.Time-Series-Data
https://github.com/Fiware/tutorials.Short-Term-History
http://fiwaretourguide.readthedocs.io/en/latest/generating-historical-context-information-sth-comet/how-to-generate-the-history-of-Context-Information-using-STH-Comet/
http://fiwaretourguide.readthedocs.io/en/latest/storing-data-cygnus-mysql/how-to-store-data-cygnus-mysql/
Precisely, orchestrating the persistence of historical data of context entities is the purpose of the Cygnus Generic Enabler. Then you can use STH to store historical data for the most recent period of time, or choose some other alternative such as Cosmos for Big Data.
In the official Cygnus documentation you can find examples of configuration files for persisting data for STH. In addition, if you are familiar with MongoDB, here is the official documentation of the MongoDB sink, with examples for the different persistence configurations.
If you give me a little more information about how you are configuring Cygnus and STH I could help you more.
Regards!
Solution
Playing around with the context broker, I altered the way Orion stores the data in the auto-generated MongoDB. When one sends data to Orion, the _id of the stored document is always composed of the service path, the entity id and the type; therefore new data gets overwritten. We need to change that by adding another, incremental element to the _id, so that new entries are saved instead. I am not sure whether this is a clumsy solution, but it is definitely more scalable since we don't need subscriptions.