Make Orion fetch data from Cosmos and publish - fiware-orion

I have set up a subscription between Orion ContextBroker and Cosmos BigData using Cygnus, and data is properly persisted in Cosmos when an update is made to Orion.
But I want to analyze the data in Cosmos and return the results to Orion, and finally access the result data in Orion from "outside".
How would one do this? Of course, I would like the solution I build to be as "automated" as possible, but mostly I just want to solve this problem.
Any advice is much appreciated!

As a general response (the question is also very general ;), what you need is a process that accesses the information stored in Cosmos (either using HDFS APIs such as WebHDFS or HttpFs, Hive queries, general MapReduce jobs on top of Hadoop, etc.) and then implements the client side of the NGSI API that Orion exposes, in order to inject context elements into Orion based on the information retrieved from Cosmos. The key operation for doing so in the Orion API is updateContext.
The degree of automation depends on how you implement that process; it can be as automated as you want.
EDIT: considering the comments on this answer, I will try to add more detail.
What I mean is to develop a piece of software (let's call it APOS, "A Piece Of Software") implementing the following behaviour:
1. APOS will grab data from Cosmos using any of the interfaces provided by Cosmos, i.e. WebHDFS/HttpFs, Hive, MapReduce jobs, etc.
2. APOS will process the data to produce some result.
3. APOS will inject that result into Orion, using the Orion REST API described in the Orion user manual. The updateContext operation is particularly useful for that task (a minimal sketch is shown at the end of this answer). From a client-server point of view, Orion is a server exposing a REST API and APOS is the client interacting with that server.
It is completely up to you how to implement this APOS and how to orchestrate the flow from 1 to 3 (e.g. it can run in batch mode every midnight, be triggered by user interaction on a web portal, etc.).
At the present moment, FI-WARE doesn't provide any generic enabler to convert Cosmos data to NGSI, given that each particular realization of steps 1 to 3 above is different and depends on the use case. However, note that there is a software component named Cygnus which implements the other direction: from NGSI to Cosmos.
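As an illustration of step 3, the following is a minimal sketch of APOS pushing a computed result into Orion through the NGSI v1 updateContext operation; the host, entity id/type and attribute names are invented for the example.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AposUpdateContext {

    public static void main(String[] args) throws Exception {
        // NGSI v1 updateContext payload; entity and attribute names are placeholders.
        String payload = "{"
                + "\"contextElements\": [{"
                + "  \"type\": \"CosmosResult\","
                + "  \"isPattern\": \"false\","
                + "  \"id\": \"DailyAggregate1\","
                + "  \"attributes\": [{\"name\": \"average\", \"type\": \"float\", \"value\": \"22.5\"}]"
                + "}],"
                + "\"updateAction\": \"APPEND\""
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://orion-host:1026/v1/updateContext"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}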

Related

Is this a good approach for api clone (api scraping)?

Currently I am working with a Node.js-based pipeline with backpressure in order to download all data from several API endpoints; some of these endpoints have data related to each other.
The idea is to refactor in order to build an application that is more maintainable than the current one and improves how the data is updated. The current approach is the following.
The first step (map) is to download all data from the endpoints and push it into several topics; some of this data is complex, and data from one endpoint is needed in order to retrieve data from another endpoint.
The second step (reduce) is to get that data from all topics and push only the data we need into a SQL database.
The questions are:
Could this be a good approach to the problem?
Would it be better to use Kafka Streams, so that KSQL can do the transforms and only a small microservice publishes into the database? (A sketch of that option follows below.)
The architecture schema is the one described in the steps above; real time is not necessary for this data.
Thanks
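A minimal sketch of the Kafka Streams option mentioned above: do the transforms in a Streams topology and leave only the publishing to SQL to a small service or JDBC sink connector reading the output topic. Topic names, serdes and the field projection are placeholders, not taken from the actual pipeline.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ReduceStep {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "api-clone-reduce");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("raw-endpoint-data");

        raw.filter((key, json) -> json != null)
           // keep only the fields the SQL model needs (placeholder projection)
           .mapValues(ReduceStep::projectNeededFields)
           .to("data-for-sql"); // a JDBC sink connector or small consumer persists this topic

        new KafkaStreams(builder.build(), props).start();
    }

    private static String projectNeededFields(String json) {
        // real code would parse the JSON and keep only the required columns
        return json;
    }
}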

CQRS pattern: using command for external api call

I am aware of the CQRS pattern, wherein a Query is used for reading data and a Command is used for updating data.
I have a peculiar case where the REST API is a POST, but it does not update the data directly; rather, it calls an external POST API of another system and passes the details along.
In such a case, which one holds true: use a Query or a Command?
Update: the system involves multiple DBs, though not necessarily different DBs for query and command.
Super simple: if you know for sure the call will not update or modify state or data, then it's a query; if it does (or might), then it's a command.
However, CQRS is often more about the physical structure of your system. You might have separate command and query databases... and that complicates the answer. It can have both logical and physical aspects.
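As an illustration of that distinction (the class name and endpoint URL are hypothetical): the handler below is a command because it changes state in the external system, even though it never writes to our own database; a query handler, by contrast, would only read and return data.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ForwardDetailsCommandHandler {

    private final HttpClient http = HttpClient.newHttpClient();

    // A command: it (potentially) modifies state in the other system and returns nothing to read.
    public void handle(String detailsJson) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://other-system.example.com/api/details"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(detailsJson))
                .build();
        http.send(request, HttpResponse.BodyHandlers.discarding());
    }
}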

hazelcast spring-data write-through

I am using Spring Boot and Spring Data/JPA with a Hazelcast client/server topology. In parts of my test application I am measuring the time taken for CRUD operations on the client side (the server is the one interacting with the relational DB). I configured the map (MapStore) to be write-behind by setting write-delay-seconds to 10.
Spring Data's save() returns the persisted entity. In the client app the application flow will therefore be blocked until the server returns the persisted entity.
I would like to know if there is an alternative in which the client does NOT have to wait for the entity to be persisted. I was under the impression that once new data is stored in the Map, persisting to the backend happens asynchronously, so the client app would NOT have to wait.
Map config in hazelcast.xml:
<map name="com.foo.MyMap">
    <map-store enabled="true" initial-mode="EAGER">
        <class-name>com.foo.MyMapStore</class-name>
        <write-delay-seconds>10</write-delay-seconds>
    </map-store>
</map>
@NeilStevenson I don't find your response particularly helpful. I asked in an earlier post where and how to generate the Map keys. You pointed me to the documentation, which fails to shed any light on this topic. The same goes for the Hazelcast (and other) examples.
The point of having the cache in the first place is to avoid hitting the database. When we add data (via save()), we also need to generate a unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, it's the Hazelcast client that generates these IDs, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions, NOT because of the ID.
That unfortunately is how it is meant to work, see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way.
Although you know it won't do this, there isn't a variant of save defined.
So the answer seems to be that this is not currently available in the general-purpose Spring repository definition. Why not raise a feature request with the Spring Data team?
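If waiting on the relational store is the concern, one workaround (not a Spring Data save() variant, only a sketch under the following assumptions) is to generate the key on the client and write straight to the IMap that has the write-behind MapStore configured; the database persist then happens asynchronously. MyEntity, the flake-ID generator name and the Hazelcast 4.x package names are assumptions, not taken from the question.

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class MyMapWriter {

    // Placeholder entity; the real com.foo entity would live elsewhere.
    public static class MyEntity implements java.io.Serializable {
        private Long id;
        private String name;
        public void setId(Long id) { this.id = id; }
        public Long getId() { return id; }
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    private final HazelcastInstance client;

    public MyMapWriter(HazelcastInstance client) {
        this.client = client;
    }

    public Long write(MyEntity entity) {
        // Generate the key (and Entity.Id) on the client, e.g. with a flake ID generator.
        long id = client.getFlakeIdGenerator("my-entity-ids").newId();
        entity.setId(id);

        IMap<Long, MyEntity> map = client.getMap("com.foo.MyMap");
        // set() does not return the previous value, so the client only waits for the
        // in-memory write; the relational store is updated later by the write-behind MapStore.
        map.set(id, entity);
        return id;
    }
}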

How does DSpace process a query in JSPUI?

How is a query processed in DSpace, and how is data managed between the front end and PostgreSQL?
As in every other webapp running in a servlet container like Tomcat, the file WEB-INF/web.xml controls how a query is processed. In the case of DSpace's JSPUI you'll find this file in [dspace-install]/webapps/jspui/WEB-INF/web.xml. The JSPUI defines several filters, listeners and servlets to process a request.
The filters are used to report that the JSPUI is running, to ensure that restricted areas can be seen by authenticated users or even by authenticated administrators only, and to handle content negotiation.
The listeners ensure that DSpace has started correctly. During its start DSpace loads the configuration, opens the database connections that it uses in a connection pool, lets Spring do its IoC magic and so on.
To begin with, the most important parts for seeing how a query is processed are the servlets and the servlet-mappings. A servlet-mapping defines which servlet is used to process a request with a specific request path: e.g. all requests to example.com/dspace-jspui/handle/* will be processed by org.dspace.app.webui.servlet.HandleServlet, while all requests to example.com/dspace-jspui/submit will be processed by org.dspace.app.webui.servlet.SubmissionController.
The servlets use their Java code ;-) and the DSpace Java API to process the request. You'll find most of it in the dspace-api module (see [dspace-source]/dspace-api/src/main/java/...) and some smaller parts in the dspace-services module ([dspace-source]/dspace-services/src/main/java/...). Within the DSpace Java API there are two important classes if you're interested in the communication with the database:
One is org.dspace.core.Context. The context contains information about whether and which user is logged in, an initialized and connected database connection (if all went well) and a cache. The methods Context.abort(), Context.commit() and Context.complete() are used to manage the database transaction. That is the reason why almost all methods that manipulate the database request a Context as a method parameter: it controls the database connection and the database transaction.
The other one is org.dspace.storage.rdbms.DatabaseManager. The DatabaseManager is used to handle database queries, updates, deletes and so on. All DSpaceObjects contain a TableRow object which holds the information about the object as stored in the database. Inside the DSpaceObject classes (e.g. org.dspace.content.Item, org.dspace.content.Collection, ...) the TableRow may be manipulated and the changes stored back to the database by using DatabaseManager.update(Context, DSpaceObject). The DatabaseManager provides several methods to send SQL queries to the database and to update, delete, insert or even create data in the database. Just take a look at its API or search for "SELECT" in the DSpace source to get an example.
In the JSPUI it is important to use Context.commit() if you want to commit the database state. If a request is processed and Context.commit() was not called, the transaction will be aborted and the changes get lost. If you call Context.complete(), the transaction will be committed, the database connection will be freed and the context is marked as finished. After you have called Context.complete(), the context cannot be used for a database connection any more.
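To make that concrete, here is a minimal sketch of the Context lifecycle and a database write, based on the pre-6.x API described above; the item id, metadata field and title value are placeholders.

import org.dspace.content.Item;
import org.dspace.core.Context;

public class ContextSketch {

    public void renameItem(int itemId, String newTitle) throws Exception {
        Context context = new Context();            // opens a pooled database connection
        try {
            Item item = Item.find(context, itemId); // loads the item's TableRow from the DB
            item.addMetadata("dc", "title", null, "en", newTitle);
            item.update();                          // stores the TableRow via DatabaseManager
            context.complete();                     // commits and frees the connection
        } catch (Exception e) {
            context.abort();                        // aborts the transaction, changes are lost
            throw e;
        }
    }
}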
DSpace is quite a huge project and a lot more could be written about its ORM, the initialization of the database and so on. But this should already help you to start developing for DSpace. I would recommend you read the "Architecture" part of the DSpace manual: https://wiki.duraspace.org/display/DSDOC5x/Architecture
If you have more specific questions you are always invited to ask them here on Stack Overflow or on our mailing lists (http://sourceforge.net/p/dspace/mailman/): dspace-tech (for any question about DSpace) and dspace-devel (for questions regarding the development of DSpace).
It depends on the version of DSpace you are running, along with your configuration.
In DSpace 4.0 or above, by default, the DSpace JSPUI uses Apache Solr for all searching and browsing. DSpace performs all indexing and querying of Solr via its Discovery module. The Discovery (Solr) based search/indexing classes are available under the "org.dspace.discovery" package.
In earlier versions of DSpace (3.x or below), by default, the DSpace JSPUI uses Apache Lucene directly. In these older versions, DSpace called Lucene directly for all indexing and searching. The Lucene based search/indexing classes are available under the "org.dspace.search" package.
In both situations, queries are passed directly to either Solr or Lucene (again depending on the version of DSpace). The results are parsed and displayed within the DSpace UI.
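For the DSpace 4.x/5.x Discovery case, a query looks roughly like the following sketch, using classes from the org.dspace.discovery package mentioned above; the query string is a placeholder and the exact method names should be checked against the API of your DSpace version.

import org.dspace.core.Context;
import org.dspace.discovery.DiscoverQuery;
import org.dspace.discovery.DiscoverResult;
import org.dspace.discovery.SearchUtils;

public class DiscoverySketch {

    public void search(Context context) throws Exception {
        DiscoverQuery query = new DiscoverQuery();
        query.setQuery("dc.title:thesis");   // Solr query string (placeholder)
        query.setMaxResults(10);

        // The SearchService wraps the Solr core that Discovery maintains.
        DiscoverResult result = SearchUtils.getSearchService().search(context, query);
        System.out.println("Total hits: " + result.getTotalSearchResults());
    }
}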

What is the best practice to handle Multitenant security in Breeze?

I'm developing an Azure application using this stack:
(Client) Angular/Breeze
(Server) Web API/Breeze Server/Entity Framework/SQL Server
With every request I want to ensure that the user actually has the authorization to execute that action using server-side code. My question is how to best implement this within the Breeze/Web API context.
Is the best strategy to:
Modify the Web API Controller and try to analyze the contents of the Breeze request before passing it further down the chain?
Modify the EFContextProvider and add an authorization test to every method exposed?
Move the security all into the database layer and make sure that a User GUID and Tenant GUID are required parameters for every query and only return relevant data?
Some other solution, or some combination of the above?
If you are using Sql Azure then one option is to use Azure Federation to do exactly that.
In very simplistic terms: if you have a TenantId column in a table that stores data from multiple tenants, then before you execute a query like SELECT Col1 FROM Table1 you execute a USE FEDERATION ... statement to restrict the query results to that particular TenantId only, and you don't need to add WHERE TenantId = @TenantId to your query.
USE FEDERATION example: http://msdn.microsoft.com/en-us/library/windowsazure/hh597471.aspx
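A sketch of that routing in Java/JDBC (for consistency with the other examples here); the federation, distribution and table names are placeholders, and the exact clause order should be checked against the example linked above, since SQL Azure Federations have since been retired.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FederationQuery {

    public void queryTenant(String jdbcUrl, String tenantGuid) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {

            // USE FEDERATION has to run as its own batch; validate tenantGuid before
            // concatenating it (it cannot be passed as a JDBC parameter).
            stmt.execute("USE FEDERATION TenantFederation (TenantId = '" + tenantGuid + "') "
                       + "WITH RESET, FILTERING = ON");

            // With FILTERING = ON the connection only sees this tenant's rows,
            // so no "WHERE TenantId = @TenantId" is needed here.
            try (ResultSet rs = stmt.executeQuery("SELECT Col1 FROM Table1")) {
                while (rs.next()) {
                    System.out.println(rs.getString("Col1"));
                }
            }
        }
    }
}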
Note that the use of SQL Azure Federations comes with lots of strings attached when it comes to building a DB schema; one of the best blogs I have found about it is http://blogs.msdn.com/b/cbiyikoglu/archive/2011/04/16/schema-constraints-to-consider-with-federations-in-sql-azure.aspx.