Extract value from cloudant IBM Bluemix NoSQL Database - pyspark

How to Extract value from Cloudant IBM Bluemix NoSQL Database stored in JSON format?
I tried this code
def readDataFrameFromCloudant(host, user, pw, database):
    # Load the Cloudant database into a Spark DataFrame
    cloudantdata = spark.read.format("com.cloudant.spark"). \
        option("cloudant.host", host). \
        option("cloudant.username", user). \
        option("cloudant.password", pw). \
        load(database)

    # Register a temporary view so the data can be queried with SQL
    cloudantdata.createOrReplaceTempView("washing")
    spark.sql("SELECT * from washing").show()
    return cloudantdata

hostname = ""
user = ""
pw = ""
database = "database"

cloudantdata = readDataFrameFromCloudant(hostname, user, pw, database)
It is stored in this format:
{
  "_id": "31c24a382f3e4d333421fc89ada5361e",
  "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
  "d": {
    "count": 9,
    "hardness": 72,
    "temperature": 85,
    "flowrate": 11,
    "fluidlevel": "acceptable",
    "ts": 1502677759234
  }
}
I want the nested fields under d as top-level columns in the result. (The expected and actual outcomes were attached as screenshots.)

Create a dummy dataset for reproducing the issue:
cloudantdata = spark.read.json(sc.parallelize(["""
{
  "_id": "31c24a382f3e4d333421fc89ada5361e",
  "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
  "d": {
    "count": 9,
    "hardness": 72,
    "temperature": 85,
    "flowrate": 11,
    "fluidlevel": "acceptable",
    "ts": 1502677759234
  }
}
"""]))
cloudantdata.take(1)
Returns:
[Row(_id='31c24a382f3e4d333421fc89ada5361e', _rev='1-8ba1be454fed5b48fa493e9fe97bedae', d=Row(count=9, flowrate=11, fluidlevel='acceptable', hardness=72, temperature=85, ts=1502677759234))]
Now flatten:
flat_df = cloudantdata.select("_id", "_rev", "d.*")
flat_df.take(1)
Returns:
[Row(_id='31c24a382f3e4d333421fc89ada5361e', _rev='1-8ba1be454fed5b48fa493e9fe97bedae', count=9, flowrate=11, fluidlevel='acceptable', hardness=72, temperature=85, ts=1502677759234)]
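If you also want to query the flattened columns with SQL, here is a minimal sketch (the view name washing_flat is my own choice, not from the original answer):

# Register the flattened DataFrame as a temporary view and query it with SQL
flat_df.createOrReplaceTempView("washing_flat")
spark.sql("""
    SELECT temperature, hardness, flowrate
    FROM washing_flat
    WHERE fluidlevel = 'acceptable'
""").show()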
I tested this code in an IBM Data Science Experience notebook using Python 3.5 (Experimental) with Spark 2.0.
This answer is based on: https://stackoverflow.com/a/45694796/1033422

Related

How to move data to a location after doing a Databricks Merge

I would like to know how to go about moving the results of a Merge with Databricks to a location such as Azure SQL Database.
The following is a typical Databricks merge sample from:
https://learn.microsoft.com/en-us/azure/databricks/delta/merge
I would like to know how to send the results of the following Python merge to an Azure SQL Database
from delta.tables import *

deltaTablePeople = DeltaTable.forPath(spark, '/tmp/delta/people-10m')
deltaTablePeopleUpdates = DeltaTable.forPath(spark, '/tmp/delta/people-10m-updates')

dfUpdates = deltaTablePeopleUpdates.toDF()

deltaTablePeople.alias('people') \
  .merge(
    dfUpdates.alias('updates'),
    'people.id = updates.id'
  ) \
  .whenMatchedUpdate(set =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .whenNotMatchedInsert(values =
    {
      "id": "updates.id",
      "firstName": "updates.firstName",
      "middleName": "updates.middleName",
      "lastName": "updates.lastName",
      "gender": "updates.gender",
      "birthDate": "updates.birthDate",
      "ssn": "updates.ssn",
      "salary": "updates.salary"
    }
  ) \
  .execute()
First, you need to create a mount point in Databricks.
Please refer to this link for creating a mount point: https://learn.microsoft.com/en-us/azure/databricks/dbfs/mounts
Once you complete the merge operation, write the DataFrame to ADLS.
Follow this link: https://docs.delta.io/0.2.0/delta-batch.html
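If the target really is Azure SQL Database rather than ADLS, one option is to take the merged Delta table as a DataFrame and push it over JDBC. This is only a sketch: the server, database, table name and credentials below are placeholders, and the SQL Server JDBC driver must be available on the cluster.

# After the merge, take the up-to-date Delta table as a plain DataFrame
merged_df = deltaTablePeople.toDF()

# Write it to Azure SQL Database over JDBC, overwriting the target table
jdbc_url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;database=<your-db>"
merged_df.write \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", "dbo.people") \
    .option("user", "<username>") \
    .option("password", "<password>") \
    .mode("overwrite") \
    .save()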

Save DF with JSON string as JSON without escape characters with Apache Spark

I have a DataFrame which contains an id column and a JSON string column:
val df = Seq(
  (0, """{"device_id": 0, "device_type": "sensor-ipad", "ip": "68.161.225.1", "cca3": "USA", "cn": "United States", "temp": 25, "signal": 23, "battery_level": 8, "c02_level": 917, "timestamp": 1475600496}"""),
  (1, """{"device_id": 1, "device_type": "sensor-igauge", "ip": "213.161.254.1", "cca3": "NOR", "cn": "Norway", "temp": 30, "signal": 18, "battery_level": 6, "c02_level": 1413, "timestamp": 1475600498}""")
).toDF("id", "json")
I want to save this as JSON, with the json column written as a nested JSON object rather than as an escaped string.
When I run:
df.write.json("path")
It saves my json column as string:
{"id":0,"json":"{​\"device_id\": 0, \"device_type\": \"sensor-ipad\", \"ip\": \"68.161.225.1\", \"cca3\": \"USA\", \"cn\": \"United States\", \"temp\": 25, \"signal\": 23, \"battery_level\": 8, \"c02_level\": 917, \"timestamp\" :1475600496 }​"}
And what I need is:
{"id": 0,"json": {"device_id": 0,"device_type": "sensor-ipad","ip": "68.161.225.1","cca3": "USA","cn": "United States","temp": 25,"signal": 23,"battery_level": 8,"c02_level": 917,"timestamp": 1475600496}}
How can I achieve this? Please note that the structure of the JSON could be different for each row; it can contain additional fields.
You can use the from_json function to parse the JSON string column into a struct column:
// Get the schema of the JSON data
// (you can also define your own schema)
import org.apache.spark.sql.functions._
import spark.implicits._

val json_schema = spark.read.json(df.select("json").as[String]).schema
val resultDf = df.withColumn("json", from_json($"json", json_schema))
Output:
{"id":0,"json":{"battery_level":8,"c02_level":917,"cca3":"USA","cn":"United States","device_id":0,"device_type":"sensor-ipad","ip":"68.161.225.1","signal":23,"temp":25,"timestamp":1475600496}}
{"id":1,"json":{"battery_level":6,"c02_level":1413,"cca3":"NOR","cn":"Norway","device_id":1,"device_type":"sensor-igauge","ip":"213.161.254.1","signal":18,"temp":30,"timestamp":1475600498}}

plotting graph in sapui5 with pandas dataframe

pandas supports DataFrame-to-JSON conversion, so a DataFrame can be converted to JSON data as shown below (points 1 and 2 are just for reference and have nothing to do with sapui5).
1) For example:
import pandas as pd
df = pd.DataFrame([['madrid', 10], ['venice', 20],['milan',40],['las vegas',35]],columns=['city', 'temp'])
df.to_json(orient="records")
gives:
[{"city":"madrid","temp":10},{"city":"venice","temp":20},{"city":"milan","temp":40},{"city":"las vegas","temp":35}]
and
df.to_json(orient="split")
gives:
{"columns":["city","temp"],"index":[0,1,2,3],"data":[["madrid",10],["venice",20],["milan",40],["las vegas",35]]}
Since we have JSON data, this data could be used as input for the plot properties.
2) For the same JSON data I have created an API (running on localhost):
http://127.0.0.1:****/graph
The API in Flask (just for reference):
from flask import Flask
import pandas as pd

app = Flask(__name__)

@app.route('/graph')
def plot():
    df = pd.DataFrame([['madrid', 10], ['venice', 20], ['milan', 40], ['las vegas', 35]],
                      columns=['city', 'temp'])
    jsondata = df.to_json(orient='records')
    return jsondata

if __name__ == '__main__':
    app.run()
Postman result:
[
  {
    "city": "madrid",
    "temp": 10
  },
  {
    "city": "venice",
    "temp": 20
  },
  {
    "city": "milan",
    "temp": 40
  },
  {
    "city": "las vegas",
    "temp": 35
  }
]
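The same check can also be done from Python instead of Postman. A minimal sketch, assuming the Flask app is running locally (the port below is Flask's default and only stands in for the masked one above):

import requests

# Fetch the JSON from the local Flask endpoint and print it
response = requests.get("http://127.0.0.1:5000/graph")
print(response.json())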
3) How can I make use of this sample API to fetch data and then plot a sample graph of city vs. temp using sapui5?
I am looking for an example of how to do so, or any help on how to make use of APIs in sapui5.

Error: At least one schedule rule should be specified in schedules in the input JSON for API

I created an auto-scaling policy in the Bluemix UI, saved it, and then retrieved the policy using cf env. The policy was:
{
  "policyState": "ENABLED",
  "policyId": "",
  "instanceMinCount": 2,
  "instanceMaxCount": 5,
  "policyTriggers": [
    {
      "metricType": "Memory",
      "statWindow": 300,
      "breachDuration": 600,
      "lowerThreshold": 30,
      "upperThreshold": 80,
      "instanceStepCountDown": 1,
      "instanceStepCountUp": 1,
      "stepDownCoolDownSecs": 600,
      "stepUpCoolDownSecs": 600
    }
  ],
  "schedules": {
    "timezone": "(GMT +01:00) Africa/Algiers",
    "recurringSchedule": null,
    "specificDate": null
  }
}
I'm then trying to apply the policy from an IBM devops deploy stage:
curl https://ScalingAPI.ng.bluemix.net/v1/autoscaler/apps/xxxx/policy -X 'PUT' \
-H 'Content-Type:application/json' \
-H 'Accept:application/json' \
-H 'Authorization:Bearer *****' \
--data-binary @./autoscaling_policy.json \
-s -o response.txt -w '%{http_code}\n'
The response:
{"error" : "CWSCV6003E: Input JSON strings format error: At least one schedule rule should be specified in schedules in the input JSON for API: Create/Update Policy for App xxxxx."}
The workaround was to remove the schedules element:
{
"policyState": "ENABLED",
"policyId": "",
"instanceMinCount": 2,
"instanceMaxCount": 5,
"policyTriggers": [
{
"metricType": "Memory",
"statWindow": 300,
"breachDuration": 600,
"lowerThreshold": 30,
"upperThreshold": 80,
"instanceStepCountDown": 1,
"instanceStepCountUp": 1,
"stepDownCoolDownSecs": 600,
"stepUpCoolDownSecs": 600
}
]
}
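If you want to automate that workaround in the deploy stage rather than editing the file by hand, here is a small Python sketch (the file name matches the one in the curl command; the script simply drops the schedules block when it contains no rules):

import json

# Load the exported policy, remove a schedules block that has no rules,
# and write the cleaned policy back before the curl PUT
with open("autoscaling_policy.json") as f:
    policy = json.load(f)

schedules = policy.get("schedules") or {}
if not schedules.get("recurringSchedule") and not schedules.get("specificDate"):
    policy.pop("schedules", None)

with open("autoscaling_policy.json", "w") as f:
    json.dump(policy, f, indent=2)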
Question: Why did the UI not complain about the schedule, and why did it allow me to export an invalid schedule that the API call rejected?

Sending Sensu data to influx db fails

I have been trying to send data from Sensu to InfluxDB.
I created a DB for Sensu, and also updated it to listen on port 8090 in my case.
The user login looks fine on InfluxDB.
I configured almost everything similarly to this link:
https://libraries.io/github/nohtyp/sensu-influxdb
I am not having any success and am not seeing any data in the database.
Has anyone tried this?
You can also use a custom script in case the default configuration is not working. It gives you the option to write only the data you want to save. Before running the script, install the InfluxDB Python client (sudo apt-get install python-influxdb):
from influxdb import InfluxDBClient
import fileinput
import json
import datetime

# Read the Sensu event JSON from stdin (the handler pipes it in)
json_body = " "
for line in fileinput.input():
    json_body = json_body + line.replace('\n', ' ')
json_body = json.loads(json_body)

# Extract the fields to store
alert_in_ip = str(json_body["client"]["name"])
alert_in_ip = 'ip-' + alert_in_ip.replace('.', '-')
alert_type = json_body["check"]["name"]
status = str(json_body['check']['status'])
time_stamp = datetime.datetime.fromtimestamp(int(json_body["timestamp"])).strftime('%Y-%m-%d %H:%M:%S')

# Build the InfluxDB point and write it
json_body = [{
    "measurement": alert_type,
    "tags": {
        "host": alert_in_ip
    },
    "time": time_stamp,
    "fields": {
        "value": int(status)
    }
}]

client = InfluxDBClient('localhost', 8086, 'root', 'root', 'sensu')
client.write_points(json_body)
And call the above script from your handler.
For example:
"sendinflux": {
  "type": "pipe",
  "command": "echo $(cat) | /usr/bin/python /home/ubuntu/save_to_influx.py",
  "severities": ["critical", "unknown"]
}
Hope it helps!!
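As a quick local test of the script outside of Sensu, you can feed it a minimal event containing just the fields it reads. This is a sketch with made-up values, using the script path from the handler above, and it assumes InfluxDB is running on localhost:8086:

import json
import subprocess

# Minimal fake Sensu event with only the keys save_to_influx.py accesses
event = {
    "client": {"name": "10.0.0.12"},
    "check": {"name": "check-memory", "status": 2},
    "timestamp": 1502677759
}

# Pipe the event JSON into the handler script, as Sensu would
subprocess.run(
    ["/usr/bin/python", "/home/ubuntu/save_to_influx.py"],
    input=json.dumps(event).encode(),
    check=True
)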