How to insert historical data into FIWARE (with correct dates)? - fiware-orion

I have a bunch of historical data (CSV) which I want to make accessible through sth-comet. The data is the history of water levels from multiple rivers. The data is not provided live, but more or less on a daily basis, and contains all the historic records for multiple days.
What I did so far was:
Convert the data into the NGSIv2 data model, with a dateObserved: DateTime and a waterlevel: Number attribute
Update/append the data into FIWARE Orion
Create a subscription for sth-comet for the entity type
Access the historical data in sth-comet (wrong time)
With this I now have the problem that the "rcvTime" is of course the time when sth-comet received the data. Is there a way to "overwrite" that attribute, or is there a better solution? I also looked at Cygnus for inserting data, but I think the underlying problem is the same.
I could not find any hint in the available documentation.

When using the Cygnus NGSIMongoSink and NGSISthSink, you can use TimeInstant metadata in the attributes to override the reception time with the time given in the metadata value.
Have a look at the NGSIMongoSink documentation:
By default, NGSIMongoSink stores the notification reception timestamp. Nevertheless, if (and only if) working in row mode and a metadata named TimeInstant is notified, then such metadata value is used instead of the reception timestamp. This is useful when wanting to persist a measure generation time (which is thus notified as a TimeInstant metadata) instead of the reception time.
or this similar fragment in NGSISTHSink documentation:
By default, NGSISTHSink stores the notification reception timestamp. Nevertheless, if a metadata named TimeInstant is notified, then such metadata value is used instead of the reception timestamp. This is useful when wanting to persist a measure generation time (which is thus notified as a TimeInstant metadata) instead of the reception time.
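For illustration, appending one historical sample with a TimeInstant metadata element could look like the following NGSIv2 batch operation (entity id, type, and values are made up):

POST /v2/op/update
Content-Type: application/json

{
  "actionType": "append",
  "entities": [
    {
      "id": "WaterLevel:river001",
      "type": "WaterLevelObserved",
      "dateObserved": { "type": "DateTime", "value": "2017-05-04T10:00:00.00Z" },
      "waterlevel": {
        "type": "Number",
        "value": 2.41,
        "metadata": {
          "TimeInstant": { "type": "DateTime", "value": "2017-05-04T10:00:00.00Z" }
        }
      }
    }
  ]
}

When Orion notifies Cygnus, the sinks described above should then persist 2017-05-04T10:00:00.00Z instead of the reception time (for NGSIMongoSink, remember this only applies in row mode).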

Related

Swift Firebase - Combining estimated ServerTimestamp with Codable Custom Objects

I have a messaging app, where I have a Chats collection in my Firebase Firestore database. I use a custom object which is Codable to read and write changes to firebase.
struct ChatFirebaseDO: Codable {
    @DocumentID var id: String?
    // ... {100 other fields} ...
    var lastMessageDate: Date
}
When a user sends a new message, I update this lastMessageDate with FieldValue.serverTimestamp().
I also have a listener which is listening for changes, and it immediately returns any update to me (whether that is a new Chat or an update to an existing one). However, if it is my own user that has created this new chat, it will be returned to me with a null timestamp.
From the docs I see this is intentional behaviour. It suggests that I replace the nulls with estimated timestamp values (perfect!), but I can't work out how to combine this with my custom objects.
To get the estimated timestamps, I need to do this:
diff.document.data(with: .estimate)
which returns a dictionary of fields.
But for my Codable custom objects to work, I have to use:
let messageDO = try diff.document.data(as: ChatFirebaseDO.self)
which uses a document (not a dictionary of data).
Is there a way I can (1) replace the nulls with estimated timestamps but (2) still have a document object I can use for my custom object transformation?
Perhaps it's a global setting I can apply, or something local to a single listener request. Or perhaps there is a way to use custom objects from a data dictionary and not just from the FIRDocument.
Thank you in advance!
If you're not encoding these chats for disk storage, then ask yourself why they're even Codable. That particular method is made for that purpose, so I'd argue you're using the wrong tool for the job, and one that also doesn't work here because of the timestamp conflict, which I imagine will be addressed in a future update to Firestore.
That said, the timestamps (which are tokens) only return nil when they haven't reached the server which means only from latency-compensated snapshots generated by the client (or only when the signed-in user posts). Therefore, you can provide your own estimate when the value is nil (which would be the current date and time), which would not only be accurate but would be overwritten by the subsequent snapshot anyway when it has a real value. It's not a pleasant workaround but it accomplishes exactly what the token does with its own estimate.
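A minimal sketch of that fallback, assuming lastMessageDate decodes as an optional (for example via the @ServerTimestamp property wrapper from FirebaseFirestoreSwift):

let chat = try diff.document.data(as: ChatFirebaseDO.self)
// Server hasn't assigned the timestamp yet (latency-compensated snapshot),
// so substitute the current local time; the next snapshot overwrites it.
let lastMessageDate = chat.lastMessageDate ?? Date()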
If you don't want to ditch Codable then you can ditch Firestore's timestamp, which I've personally done. I'm not a fan of the token system and I've replaced it with a basic Unix timestamp (an integer) that makes things much simpler: I don't have to worry about nil times, latency-compensated returns, or configuring snapshot data just to handle the value of a single field. If I had to guess, Firestore will eventually allow a global setting for timestamp behavior and expand the API so the Codable method can account for it as well. The TL;DR is that what you want doesn't yet exist natively in the Firestore SDK, unfortunately, and I'd consider making it a feature request on the Firestore iOS GitHub repo.
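If you do swap to an integer, the model and the write might look roughly like this (collection and field names are hypothetical, and db/chatId are assumed to exist):

struct ChatDO: Codable {
    @DocumentID var id: String?
    var lastMessageAt: Int   // seconds since the Unix epoch, set by the client
}

// The client stamps the time itself; there is no server token, so it is never nil.
let chat = ChatDO(id: nil, lastMessageAt: Int(Date().timeIntervalSince1970))
try db.collection("chats").document(chatId).setData(from: chat)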

Event Sourcing and dealing with data dependencies

Given a REST API with the following operations resulting in events posted to Kafka:
AddCategory
UpdateCategory
RemoveCategory
AddItem (refers to a category by some identifier)
UpdateItem
RemoveItem
Assume an environment where multiple users may use the REST API at the same time, and where all consumers must receive the same events. Consumers may be offline for an extended period of time (more than a day); new consumers may be added, and others removed.
The problems:
Event ordering (is the only workaround a single topic/partition?)
AddItem before AddCategory, invalid category reference.
UpdateItem before AddCategory, used to be a valid reference, now invalid.
RemoveCategory before AddItem, category reference invalid.
....infinite list of other concurrency issues.
Event Store snapshots for fast resync of restarted consumers
Should there be a compacted log topic for both categories and items, each entity keyed by its identifier?
Can the whole compacted log topic be somehow identified as an offset?
Should there be only one entry in the compacted log topic, with its data containing a serialized blob of all categories and items as of a given offset (this would require a single topic/partition)?
How should the handover be handled from replaying the rendered entities in the event store to the "live stream" of commands/events? Encode an offset in each item of the compacted log view, and use that offset to resume from the live event log?
Are there other systems that fit this problem better?
I will give you a partial answer based on my experience with event sourcing.
Event ordering (is the only workaround a single topic/partition?)
AddItem before AddCategory, invalid category reference.
UpdateItem before AddCategory, used to be a valid reference, now invalid.
RemoveCategory before AddItem, category reference invalid.
....infinite list of other concurrency issues.
All scalable event stores that I know of guarantee event ordering inside a partition only. In DDD terms, the event store ensures that the Aggregate is rehydrated correctly by replaying the events in the order they were generated. An Apache Kafka topic seems to be a good choice for that. While this is sufficient for the Write side of an application, it is harder for the Read side to use. Harder, but not impossible.
Given that the events are already validated by the Write side (because they represent facts that already happened), we can be sure that any inconsistency that appears in the system is due to the wrong ordering of events. Also, given that the Read side is eventually consistent with the Write side, the missing events will eventually reach our Read models.
So, first of all: in your case, AddItem before AddCategory, invalid category reference should in fact be ItemAdded before CategoryAdded (event names are phrased in the past tense).
Second, when ItemAdded arrives, you try to load the Category by ID, and if that fails (because of the delayed CategoryAdded event) you can create a NotYetAvailableCategory with its ID equal to the one referenced in the ItemAdded event and a title of "Not Yet Available, Please Wait a Few Milliseconds". Then, when the CategoryAdded event arrives, you just update all the Items that reference that category ID. So, the main idea is that you create temporary entities that are finalized when their events eventually arrive.
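A rough sketch of that idea on the Read side, with all class and event names invented for illustration:

import java.util.HashMap;
import java.util.Map;

// Minimal read-model projection: an ItemAdded that references a category
// which has not arrived yet gets a temporary placeholder category.
class CatalogProjection {
    record Category(String id, String title) {}
    record ItemAdded(String itemId, String categoryId) {}
    record CategoryAdded(String categoryId, String title) {}

    private final Map<String, Category> categories = new HashMap<>();
    private final Map<String, String> itemToCategory = new HashMap<>();

    void on(ItemAdded e) {
        // Delayed CategoryAdded? Register a placeholder to be finalized later.
        categories.computeIfAbsent(e.categoryId(),
                id -> new Category(id, "Not Yet Available"));
        itemToCategory.put(e.itemId(), e.categoryId());
    }

    void on(CategoryAdded e) {
        // Overwrites the placeholder; every item referencing this ID
        // now resolves to the real category.
        categories.put(e.categoryId(), new Category(e.categoryId(), e.title()));
    }
}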
In the case of CategoryRemoved before ItemAdded, category reference invalid: when the ItemAdded event arrives, you could check that the category was deleted (by having a ListOfCategoriesThatWereDeleted read model) and then take the appropriate action in your Item entity; what that action is depends on your business.

Storing custom temporary data in Sitecore xDB

I am using Sitecore 8.1 with xDB enabled (MongoDB). I would like to store the user-roles of the visiting users in the xDB, so I can aggregate on these data in my reports. These roles can change over time, so one user could have one set of roles at some point in time and another set of roles at a later time.
I could go and store these user roles as custom facets on the Contact entity, but as they may change for a user from visit to visit, I will lose historical data if I update the facet every time the user logs in (e.g., I will not be able to tell which roles a given user had at some given visit).
Instead, I could create a custom IElement for my facet data and store the roles along with a timestamp saying when the given roles were registered, but this model may be hard to handle during the reporting phase, where I would need to join the interaction data with the role data based on timestamps every time I generate a report.
Is it possible to store these custom data in the xDB in something else than the Contact collection? Can I store custom data in the Interactions collection? There is a property called Tracker.Current.Session.Interaction.CustomValues which sounds like what I need, but if I store data here, will I be able to perform proper aggregation/reporting on the data? Any other approaches I haven't thought about?
CustomValues
Yes, the CustomValues dictionary is what I would use in your case. This dictionary will get serialized to MongoDB as a nested document of every interaction (unless the dictionary is empty).
Also note that, since CustomValues is a member of the base class Sitecore.Analytics.Model.Entity, this dictionary is available in many other data classes of xDB. For example, you can store custom values in PageData and PageEventData objects.
Since CustomValues takes an object of any class, your custom data class needs some extra things for it to be successfully saved to and subsequently loaded from MongoDB:
It has to be marked as [Serializable].
It needs to be registered in the MongoDB driver like this:
using Sitecore.Analytics.Data.DataAccess.MongoDb;
// [...]
MongoDbObjectMapper.Instance.RegisterModelExtension<YourCustomClassName>();
This needs to be done only once per application lifetime - for example, in an initialize pipeline processor.
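Putting it together, a roles snapshot stored on the current interaction might look like this (the class name, dictionary key, and role values are just examples):

[Serializable]
public class UserRolesData
{
    public string[] Roles { get; set; }
    public DateTime RegisteredAt { get; set; }
}

// Somewhere after login, while the interaction is active:
Tracker.Current.Session.Interaction.CustomValues["UserRoles"] =
    new UserRolesData
    {
        Roles = new[] { "editor", "reviewer" },
        RegisteredAt = DateTime.UtcNow
    };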
Your own storage
Of course, you don't have to use Sitecore's API to store your custom data. The alternative would be to manually save the data to a custom MongoDB collection or an SQL table. You can then read that data in your aggregation processor, finding it by the ID of the currently processed interaction.
The benefit of this approach is that you can decide where and how your data is stored. The downside is extra work of implementing and maintaining this data storage.

Mule: after delivering a message, save the current timestamp for later use. What's the correct idiom?

I'm connecting to a third-party web service to retrieve rows from the underlying database. I can optionally pass a parameter like this:
http://server.com/resource?createdAfter=[yyyy-MM-dd hh:ss]
to get only the rows created after a given date.
This means I have to store the current timestamp (using #[function:datestamp:...], no problem) in one message scope and then retrieve it in another.
It also implies the timestamp should be preserved in case of an outage.
Obviously, I could use a subflow containing a file endpoint, saving to a designated file on a path. But, intuitively, based on my (very!) limited experience, that feels hackish.
What's the correct idiom to solve this?
Thanks!
The Object Store Module is designed just for that: to allow you to save bits of information from your flows.
See:
http://mulesoft.github.io/mule-module-objectstore/mule/objectstore-config.html
https://github.com/mulesoft/mule-module-objectstore/
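A rough sketch of the idiom with that module (the config name, key, and MEL expressions are illustrative; check the module docs for the exact attributes):

<objectstore:config name="timestampStore" persistent="true"/>

<!-- After a successful delivery, remember when this run happened -->
<objectstore:store config-ref="timestampStore" key="lastRun"
                   value-ref="#[server.dateTime.format('yyyy-MM-dd HH:mm')]"/>

<!-- On the next run, read it back (with a default for the very first execution) -->
<objectstore:retrieve config-ref="timestampStore" key="lastRun"
                      defaultValue-ref="#['1970-01-01 00:00']"/>

Because the store can be configured as persistent, the value survives a restart, which covers your outage requirement.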

Remove read data for authenticated user?

My requirement in DDS is as follows: I have many subscribers but a single publisher. Each subscriber reads data from DDS and checks whether the message is intended for that particular subscriber. Only if that check succeeds does it take the data and remove it from DDS. The message must remain in DDS until the authenticated subscriber has taken its data. How can I achieve this using DDS (in a Java environment)?
First of all, you should be aware that with DDS, a Subscriber is never able to remove data from the global data space. Every Subscriber has its own cached copy of the distributed data and can only act on that copy. If one Subscriber takes data, then other Subscribers for the same Topic will not be influenced by that in any way. Only Publishers can remove data globally for every Subscriber. From your question, it is not clear whether you know this.
Independent of that, it seems like the use of a ContentFilteredTopic (CFT) is suitable here. According to the description, the Subscriber knows the file name that it is looking for. With a CFT, the Subscriber can indicate that it is only interested in samples that have a particular value for the file_name attribute. The infrastructure will take care of the filtering process and will ensure that the Subscriber will not receive any data with a different value for the attribute file_name. As a consequence, any take() action done on the DataReader will contain relevant information and there is no need to check the data first and then take it.
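In the DDS Java API that could look roughly like this (the topic name and parameter value are invented, and exact signatures vary slightly per vendor):

// Derive a filtered topic from the existing one; %0 is bound to the
// subscriber's own file name.
ContentFilteredTopic cft = participant.create_contentfilteredtopic(
        "FilteredFileTopic",              // name of the filtered topic
        fileTopic,                        // the existing, unfiltered Topic
        "file_name = %0",                 // SQL-like filter expression
        new String[] { "'myfile.txt'" }); // expression parameters

// Create the DataReader on cft instead of fileTopic; take() then only
// returns samples whose file_name matches.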
The API documentation should contain more detailed information about how to use a ContentFilteredTopic.