Format sensor data as it comes into IBM Watson - raspberry-pi

I am still new to IBM Watson. Is there any way that I can format the sensor data that comes into IBM Watson? The issue I am facing right now is that the timestamp bundles the date and the time together, which poses problems when I try to create certain visualizations in data analytics and visualization software. It would make things easier for me to split the date and the time out of the timestamp. I am aware that the data is in JSON format.
In addition, I am using Node-RED, so do let me know if the formatting of the data should be done in Node-RED.
Here is my sample sensor data:
{
    "_id": "04691370-387e-11e8-8cd5-8b3f61628d0d",
    "_rev": "1-a4328ecd41d03b8e4ac86de06baf03d2",
    "deviceType": "RaspberryPi",
    "deviceId": "9074bd",
    "eventType": "event",
    "format": "json",
    "timestamp": "2018-04-05T11:04:12.583+08:00",
    "data": {
        "d": {
            "temperature": 19.5,
            "humidity": 44,
            "heatIndex": 18.65
        }
    }
}
Things that I am using:
Raspberry Pi 3 Model B
Raspbian for Robots (Dexter Industries)
GrovePi+
GrovePi DHT11, Light sensor, Sound sensor, UV sensor
Node-RED with all the GrovePi+ nodes, including nodes for IBM Watson
IBM Watson, IBM Watson IoT
Cloudant NoSQL DB
CData ODBC Driver for Cloudant
Microsoft Power BI (subject to change, depending on which software is easier to adopt)

This is just JSON data; there is nothing to stop you adding two new fields to the object (e.g. date and time).
It's probably simplest to do this in Node-RED with a function node containing something like the following:
// Split the ISO 8601 timestamp at the 'T' separator into separate fields
var timestamp = msg.payload.timestamp;
msg.payload.date = timestamp.substring(0, timestamp.indexOf('T'));
msg.payload.time = timestamp.substring(timestamp.indexOf('T') + 1);
return msg;
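With the sample event above, the function adds fields like these alongside the original timestamp (note that the time portion keeps the +08:00 offset):

"date": "2018-04-05",
"time": "11:04:12.583+08:00"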

Related

MongoDB schema and collections for monitoring sensors

I'm designing a monitoring application and hesitating about the MongoDB schema. The app will monitor 32 sensors. Each sensor will have multiple scalar values, but these are not the same between sensors (they could have different units, config, etc.). Scalar data will be pushed every minute. And for each sensor there will be one or more arrays (3200 values) to be pushed too. Each sensor could have a completely different config, which has to be stored as well (the config is complex, but does not change often at all), and of course an event log.
I don't know whether I should create separate collections in a flat DB, like:
Config
1D_data
2D_data
event_log
Or whether to create sub-collections for each sensor, with config/1D/2D in each of them.
The requests to display data will be "display all 1D data from one sensor" (to show a trend over time) and "show the 2D data at this time".
If it were only scalar values I'd choose the first solution, but I don't know whether the "big" 2D results fit that model as well.
Thanks for your advice!
It seems a little oversimplified at first but all the information can be stored in a single collection. There is no added value in having 32 different collections. Each document will have a shape similar to this:
{
    "device": "D1",
    "created": ISODate("2017-10-02T13:59:48.194Z"),
    "moreCommonMetadata": { whatever: 77 },
    "data": {
        "color": "red",
        "maxAmps": 10,
        "maxVolts": 240
    }
},
{
    "device": "D2",
    "created": ISODate("2017-10-02T13:59:48.194Z"),
    "moreCommonMetadata": { whatever: 77 },
    "data": {
        "bigArray": [23,34,45,56,7,78,78,78],
        "adj": [{foo:'bar', bin:'baz'}, {corn:'dog'}],
        "vers": 2
    }
}
Queries can easily sweep across the metadata, and the data field can be any shape; it is logically tied to the device field, i.e. some value of the device field will drive logic to manipulate the data field.
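For example, the "display all 1D data from one sensor" query from the question becomes a plain find over the common metadata. A minimal sketch for the mongo shell, assuming a hypothetical collection named readings and an index on the metadata fields:

// Hypothetical collection name; index the common metadata used for sweeping
db.readings.createIndex({ device: 1, created: 1 });

// Trend over time for one sensor; "data" can be any shape and is interpreted per device
db.readings.find(
    { device: "D1", created: { $gte: ISODate("2017-10-01T00:00:00Z") } }
).sort({ created: 1 });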
A fairly comprehensive example can be found at
http://moschetti.org/rants/lhs.html

Getting measurements into InfluxDB, NoSQL database

I have a measurement I want to persist in an InfluxDB database. The measurement itself consists of approx. 4000 measurement points which are generated by a microcontroller. The measurement points are floats and are generated periodically (every few minutes) with a constant frequency.
I am trying to build up some knowledge of NoSQL databases, and InfluxDB is my first try here.
The question is: how do I get these measurements into InfluxDB, assuming they arrive within an MQTT message (in JSON format)? How are the insert strings generated/handled?
{
    "begin_time_of_meas": "2020-11-19T16:02:48+0000",
    "measurement": [
        1.0,
        2.2,
        3.3,
        ...,
        3999.8,
        4000.4
    ],
    "device": "D01"
}
I have used Node-RED in the past and I know there is a plugin for InfluxDB, so I guess this would be a way. But I am quite unsure how the insert string is generated/handled for an array of measurement points. Every example I have seen so far handles only single-point measurements, like one temperature reading every few seconds or CPU load. Thanks for your help.
I've successfully used the influxdb plugin with a time precision of milliseconds. Not sure how to make it work for more precise timestamps, and I've never needed to.
It sounds like you have more than a handful of points arriving per second; send groups of messages as an array to the influx batch node.
In your case, it depends on what those 4000 measurements are and how it makes the most sense to group them. If the variables all measure the same point, something like the block below might work. A function node that takes the MQTT message and converts it to a block of messages like this would do the job (note that this function output could replace the join node); a sketch of such a function follows the example:
[{
    measurement: "microcontroller_data",
    timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
    tags: {
        device: "D01",
        point: "0001",
    },
    fields: {
        value: 1.0
    }
},
{
    measurement: "microcontroller_data",
    timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
    tags: {
        device: "D01",
        point: "0002",
    },
    fields: {
        value: 2.2
    }
},
...etc...
]
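A minimal sketch of such a function node (the one referred to above), assuming the incoming MQTT payload has already been parsed into an object (for example by a JSON node) with the shape shown in the question; the measurement name "microcontroller_data" is just a placeholder:

// Expand one MQTT message into an array of points for the influx batch node.
// One point per array element; the element index becomes the "point" tag.
var p = msg.payload;
var ts = new Date(p.begin_time_of_meas).getTime();   // millisecond precision

msg.payload = p.measurement.map(function (value, i) {
    return {
        measurement: "microcontroller_data",      // placeholder name
        timestamp: ts,
        tags: {
            device: p.device,
            point: ("000" + (i + 1)).slice(-4)    // "0001", "0002", ...
        },
        fields: { value: value }
    };
});
return msg;   // wire this straight to the influx batch node

Because the whole array is built from a single message, its output can go straight to the influx batch node without a join node.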
That looks like a lot of information to store, but measurement and tags values are basically header values that don't get written with every entry. The fields values do get stored, but these are compressed. The json describing the data to be stored is much larger than the on-disk space the storage will actually use.
It's also possible to have multiple fields, but I believe this will make data retrieval trickier:
{
    measurement: "microcontroller_data",
    timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
    tags: {
        device: "D01",
        point: "0001",
    },
    fields: {
        value_0001: 1.0,
        value_0002: 2.2,
        ...etc...
    }
}
Easier to code, but it would make for some ugly and inflexible queries.
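To illustrate with InfluxQL and the placeholder names above: with one field per point, any signal can be selected with the same query shape just by changing a tag value,

SELECT "value" FROM "microcontroller_data" WHERE "device" = 'D01' AND "point" = '0001'

whereas with the multi-field layout the field name itself has to change from query to query:

SELECT "value_0001" FROM "microcontroller_data" WHERE "device" = 'D01'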
You will likely have some more meaningful names than "microcontroller_data", or "0001", "0002" etc. If the 4000 signals are for very distinct measurements, it's also possible that there is more than one "measurement" that makes sense, e.g. cpu_parameters, flowrate, butterflies, etc.
Parse your MQTT messages into that shape. If the messages are sent one-at-a-time, then send to a join node; mine is set to send after 500 messages or 1 second of inactivity; you'll find something that fits.
If the json objects are grouped into an array by your processing, send directly to the influx batch node.
In the influx batch node, under "Advanced Query Options", I set the precision to ms because that's the default from Date().getTime().

IBM Watson IoT Platform - hex data

My question is about connecting loriot network server to the IBM Watson IoT Platform.
I have managed to connect loriot backend with the Watson IoT Platform and see some data coming through. However, the data are in a hexadecimal format. Any idea on how I can convert this hex data to be human readable?
If the data coming through to the Watson IoT Platform is in JSON format, but contains properties whose values are in hex, you can use the Data Management capabilities to convert the data in these events to Device State. The expression language used in the property mapping expressions includes an $unpack function that can be used to convert strings and hex octets to numeric values. When used in conjunction with the $substring function, you can extract specific strings from a large hex value and convert it to a number.
As an example, say you had the following inbound event:
{
    "propertyA": "valueA",
    "propertyB": "valueB",
    "data": "3b45940201000000010e4601"
}
... you could map values to properties on the device state using mapping expressions similar to the following:
$unpack($substring($event.data, 0, 8), "l32f")
$unpack($substring($event.data, 8, 2), "l8")
$unpack($substring($event.data, 10, 8), "l32")
The corresponding output of the three expressions above is:
2.1786381830505485E-37
1
16777216
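For intuition about what those expressions do, here is a rough Node.js equivalent of the three conversions (plain Buffer calls on the same hex string, not the Watson IoT expression language itself):

const data = Buffer.from("3b45940201000000010e4601", "hex");

console.log(data.readFloatLE(0));    // bytes 0-3 as a little-endian 32-bit float -> 2.1786381830505485e-37
console.log(data.readUInt8(4));      // byte 4 as an 8-bit integer                -> 1
console.log(data.readUInt32LE(5));   // bytes 5-8 as a little-endian 32-bit int   -> 16777216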
The Data Management capabilities are documented here:
https://console.bluemix.net/docs/services/IoT/GA_information_management/ga_im_device_twin.html#device_twins

Simplest way to go about transforming data from kafka

I am working on a project that pulls data from multiple DB sources using Kafka Connect. I then want to be able to transform the data into a specified JSON format and finally push that final JSON to an S3 bucket, preferably using Kafka Connect to keep my overhead down.
Here is an example of what the data currently looks like coming into Kafka (in Avro format):
{"tableName":"TABLE1","SchemaName{"string":"dbo"},"tableID":1639117030,"columnName":{"string":"DATASET"},"ordinalPosition":{"int":1},"isNullable":{"int":1},"dataType":{"string":"varchar"},"maxLength":{"int":510},"precision":{"int":0},"scale":{"int":0},"isPrimaryKey":{"int":0},"tableSizeKB":{"long":72}}
{"tableName":"dtproperties","SchemaName":{"string":"dbo"},"tableID":1745441292,"columnName":{"string":"id"},"ordinalPosition":{"int":1},"isNullable":{"int":0},"dataType":{"string":"int"},"maxLength":{"int":4},"precision":{"int":10},"scale":{"int":0},"isPrimaryKey":{"int":1},"tableSizeKB":{"long":24}}
It looks like this when converted to JSON:
{
    "tablename": "AS_LOOKUPS",
    "tableID": 5835333,
    "columnName": "SVALUE",
    "ordinalPosition": 6,
    "isNullable": 1,
    "dataType": "varchar",
    "maxLength": 4000,
    "precision": 0,
    "scale": 0,
    "isPrimaryKey": 0,
    "tableSize": 0,
    "sizeUnit": "GB"
},
{
    "tablename": "AS_LOOKUPS",
    "tableID": 5835333,
    "columnName": "SORT_ORDER",
    "ordinalPosition": 7,
    "isNullable": 1,
    "dataType": "int",
    "maxLength": 4,
    "precision": 10,
    "scale": 0,
    "isPrimaryKey": 0,
    "tableSize": 0,
    "sizeUnit": "GB"
}
My goal is to get the data to look like so:
{
    "header": "Database Inventory",
    "DBName": "DB",
    "ServerName": "server#server.com",
    "SchemaName": "DBE",
    "DB Owner": "Name",
    "DB Guardian": "Name/Group",
    "ASV": "ASVC1AUTODWH",
    "ENVCI": "ENVC1AUTODWHORE",
    "Service Owner": "Name/Group",
    "Business Owner": "Name/Group",
    "Support Owner": "Name/Group",
    "Date of Data": "2017-06-28 12:12:55.000",
    "TABLE_METADATA": {
        "TABLE_SIZE": "500",
        "UNIT_SIZE": "GB",
        "TABLE_ID": 117575457,
        "TABLE_NAME": "spt_fallback_db",
        "COLUMN_METADATA": [
            {
                "COLUMN_NM": "xserver_name",
                "DATE_TYPE": "varchar",
                "MAX_LENGTH": 30,
                "PRECISION": 0,
                "SCALE": 0,
                "IS_NULLABLE": 0,
                "PRIMARY_KEY": 0,
                "ORDINAL_POSITION": 1
            },
            {
                "COLUMN_NM": "xdttm_ins",
                "DATE_TYPE": "datetime",
                "MAX_LENGTH": 8,
                "PRECISION": 23,
                "SCALE": 3,
                "IS_NULLABLE": 0,
                "PRIMARY_KEY": 0,
                "ORDINAL_POSITION": 2
            }, ........
The header data will mostly be generic, but some of it, like the date, will need to be populated.
Initially my thought was that I could do everything using Kafka Connect, and that I could just create a schema for the way I want the data to be formatted. I am having a problem, though, with using a different schema with the connectors, and I'm not really sure whether it is even possible.
Another solution I thought about was using Kafka Streams and writing code to transform the data into what is needed. I'm not too sure how easy it is to do that with Kafka Streams.
And finally, a third solution I have seen is to use Apache Spark and manipulate the data with dataframes. But this will add more overhead.
I'm honestly not sure which route to go, or whether any of these solutions are what I'm looking for. So I am open to all advice on how to solve this problem.
Kafka Connect does have Single Message Transforms (SMTs), a framework for making minor adjustments to the records produced by a source connector before they are written into Kafka, or to the records read from Kafka before they are sent to sink connectors. Most SMTs are quite simple functions, but you can chain them together into slightly more complex operations. You can always implement your own Transformation with custom logic, but no matter what, each transform operates on a single record at a time and should never make calls out to other services. SMTs are only for basic manipulation of individual records.
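For example, a source connector configuration might chain a couple of the built-in transforms, say inserting a static field and renaming another (the transform aliases and field names here are placeholders, and this kind of per-record tweak is roughly the limit of what SMTs are intended for):

"transforms": "addHeader,rename",
"transforms.addHeader.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addHeader.static.field": "header",
"transforms.addHeader.static.value": "Database Inventory",
"transforms.rename.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.rename.renames": "tablename:TABLE_NAME"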
However, the changes you want to make are probably a bit more complex than what is suitable through SMTs. Kafka Streams seems like it is the best solution to this problem, since it allows you to create a simple stream processor that consumes the topic(s) produced by the source connector, alters (and possibly combines) the messages accordingly, and writes them out to other topic(s). Since you're already using Avro, you can write your Streams application to use Avro generic records (see this example) or with classes auto-generated from the Avro schemas (see this example).
You also mention that you have data from multiple sources, and chances are those are going to separate topics. If you want to integrate, join, combine, or simply merge those topics into other topics, then Kafka Streams is a great way to do this.
Kafka Streams apps are also just normal Java applications, so you can deploy them using the platform of your choosing, whether that's Docker, Kubernetes, Mesos, AWS, or something else. And, unlike Apache Spark, they don't require a running distributed platform.

Plain text view of a document in Couchbase GUI

I am using Couchbase and testing it on my local computer with ASP.NET. I've inserted some data into a sample document and I can read the data using the ASP.NET C# driver for Couchbase. The thing is that when I log in to the cluster management GUI and look at the document, I see lots of characters with no meaning; I can't actually see a text representation of the document that I've inserted. With MongoDB, BigCouch, and RavenDB the data is a plain and simple JSON document and it's easy to update a single document. Am I missing something here?
In my .NET application I have this code:
var client = new CouchbaseClient();
client.Store(StoreMode.Add, "aaa", "sample_data");
client.Dispose();
This is what I get in the console when I view the document:
"Y29tcGFyaXNvbl9pZDogMQ=="
This is a binary format, not JSON. I am using Couchbase 2.0 beta.
In Couchbase Server 2.0, if you store invalid JSON as the value in the key/value pair that you save, you will see a Base64 encoded version of the item you've saved. Since "sample_data" is not a valid JSON document, Couchbase Server treats it like a byte array. When you view the bytes, they're Base64 encoded. Instead, if you change your store method to something like the following:
client.Store(StoreMode.Add, "aaa", "{ \"message\" : \"sample_data\" }");
you'd then see the actual JSON document.
The Getting Started guide for the latest Beta of the Couchbase Client has more information on working with JSON and views with Couchbase Server 2.0 - http://www.couchbase.com/develop/net/next.
It's a bug in the Couchbase 2.0 GUI display. I now use couchbase-server-2.0.0-1723.x86_64 on RHEL 6.0 and create new documents with the Couchbase 2.0 GUI. When I insert the following JSON:
{
    "_id": "100",
    "name": "Thomas",
    "dept": "Sales",
    "salary": 5000
}
then after saving, it is listed as a Base64 string:
"eyJfaWQiOiIxMDAiLCJuYW1lIjoiVGhvbWFzIiwiZGVwdCI6IlNhbGVzIiwic2FsYXJ5Ijo1MDAwfQ=="
I can follow this post: base64 value display in GUI (Beta 2.0), to fix the JavaScript code in this path:
/opt/couchbase/lib/ns_server/erlang/lib/ns_server-2.0.0r_331_g08fb51b/priv/public/js
Finally, clear the browser cache and log in to the Couchbase 2.0 GUI again. The document will then display as:
"{"_id":"100","name":"Thomas","dept":"Sales","salary":5000}"
It's correct.
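As a quick sanity check (outside Couchbase entirely), the Base64 string listed above can be decoded to confirm it is just the stored document bytes, e.g. in Node.js:

const shown = "eyJfaWQiOiIxMDAiLCJuYW1lIjoiVGhvbWFzIiwiZGVwdCI6IlNhbGVzIiwic2FsYXJ5Ijo1MDAwfQ==";
console.log(Buffer.from(shown, "base64").toString("utf8"));
// -> {"_id":"100","name":"Thomas","dept":"Sales","salary":5000}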