I need to call an API to load data on an ongoing basis. The API returns different properties for each event. When I create the Swagger file it only has the properties from a sample return, but over time the source system can add more properties, and those will not be in the Swagger file.
Is there any way to dynamically recreate the Swagger file with the additional properties before each data load (roughly along the lines of the sketch after the sample return below)?
The Swagger file is generated by Informatica Cloud from a sample return while testing the connection.
The properties list has a different number of entries depending on the event type.
swagger file:
{"swagger" : "2.0",
"info" : {
"description" : null,
"version" : "1.0.0",
"title" : null,
"termsOfService" : null,
"contact" : null,
"license" : null
},
"host" : "<host>.com",
"basePath" : "/api",
"schemes" : [ "https" ],
"paths" : {
"/2.0" : {
"post" : {
"tags" : [ "events" ],
"summary" : null,
"description" : null,
"operationId" : "events",
"produces" : [ "application/json" ],
"consumes" : [ "application/json" ],
"parameters" : [ {
"name" : "script",
"in" : "query",
"description" : null,
"required" : false,
"type" : "string"
}, {
"name" : "Authorization",
"in" : "header",
"description" : null,
"required" : false,
"type" : "string"
} ],
"responses" : {
"200" : {
"description" : "successful operation",
"schema" : {
"$ref" : "#/definitions/events"
}
}
}
}
}
},
"definitions" : {
"events##properties" : {
"properties" : {
"$app_build_number" : {
"type" : "string"
},
"$app_version_string" : {
"type" : "string"
},
"$carrier" : {
"type" : "string"
},
"$lib_version" : {
"type" : "string"
},
"$manufacturer" : {
"type" : "string"
},
"$model" : {
"type" : "string"
},
"$os" : {
"type" : "string"
},
"$os_version" : {
"type" : "string"
},
"$radio" : {
"type" : "string"
},
"$region" : {
"type" : "string"
},
"$screen_height" : {
"type" : "number",
"format" : "int32"
},
"$screen_width" : {
"type" : "number",
"format" : "int32"
},
"Home Step Enabled" : {
"type" : "string"
},
"Number Of Lifetime Logins" : {
"type" : "number",
"format" : "int32"
},
"Sessions" : {
"type" : "number",
"format" : "int32"
},
"mp_country_code" : {
"type" : "string"
},
"mp_lib" : {
"type" : "string"
}
}
},
"events" : {
"properties" : {
"name" : {
"type" : "string"
},
"distinct_id" : {
"type" : "string"
},
"labels" : {
"type" : "string"
},
"time" : {
"type" : "number",
"format" : "int64"
},
"sampling_factor" : {
"type" : "number",
"format" : "int32"
},
"dataset" : {
"type" : "string"
},
"properties" : {
"$ref" : "#/definitions/events##properties"
}
}
}
}
}
sample return:
{
"name": "Session",
"distinct_id": "1234567890",
"labels": [],
"time": 1520072505000,
"sampling_factor": 1,
"dataset": "$event_data_set",
"properties": {
"$app_build_number": "900",
"$app_version_string": "1.9",
"$carrier": "AT&T",
"$lib_version": "2.0.1",
"$manufacturer": "Apple",
"$model": "iPhone10,6",
"$os": "iOS",
"$os_version": "11.2.6",
"$radio": "LTE",
"$region": "Florida",
"$screen_height": 667,
"$screen_width": 375,
"Number Of Lifetime Logins": 2,
"Session Length": "00h:00m:08s",
"Sessions": 43,
"mp_country_code": "US",
"mp_lib": "swift"
}
}
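What I have in mind is something like the step below, run before each load: take one fresh sample event, merge any property names that are not yet in the Swagger file into the events##properties definition, and then refresh the connection with the updated file. This is only a rough Python sketch of the merge part; the file names are placeholders, and how to push the regenerated file back into Informatica Cloud is exactly the part I'm asking about.
import json


def swagger_type(value):
    # map a JSON sample value to a Swagger property definition
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "number", "format": "int64"}
    if isinstance(value, float):
        return {"type": "number", "format": "double"}
    return {"type": "string"}


def merge_sample_into_swagger(swagger_path, sample_event):
    with open(swagger_path) as f:
        swagger = json.load(f)

    props = swagger["definitions"]["events##properties"]["properties"]
    for name, value in sample_event.get("properties", {}).items():
        # keep existing definitions, only add properties that are new
        props.setdefault(name, swagger_type(value))

    with open(swagger_path, "w") as f:
        json.dump(swagger, f, indent=2)


# placeholder file names: a freshly pulled sample event and the generated Swagger file
with open("sample_event.json") as f:
    merge_sample_into_swagger("events_swagger.json", json.load(f))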
I have a fairly simple CloudFormation template; I am trying to learn about them. I created one that should create two DynamoDB tables when I deploy the stack, but only one table gets created, not two. I am not sure what is wrong with my syntax. Pasting the JSON below.
"AWSTemplateFormatVersion" : "2010-09-09",
"Resources" : {
"resource1" : {
"Type" : "AWS::DynamoDB::Table",
"Properties" : {
"AttributeDefinitions" : [
{
"AttributeName" : "Name",
"AttributeType" : "S"
},
{
"AttributeName" : "Age",
"AttributeType" : "S"
}
],
"KeySchema" : [
{
"AttributeName" : "Name",
"KeyType" : "HASH"
},
{
"AttributeName" : "Age",
"KeyType" : "RANGE"
}
],
"ProvisionedThroughput" : {
"ReadCapacityUnits" : "5",
"WriteCapacityUnits" : "5"
},
"TableName" : "tablecloudformation3_1"
}
}
},
"Resources" : {
"resource2" : {
"Type" : "AWS::DynamoDB::Table",
"Properties" : {
"AttributeDefinitions" : [
{
"AttributeName" : "Name",
"AttributeType" : "S"
},
{
"AttributeName" : "Age",
"AttributeType" : "S"
}
],
"KeySchema" : [
{
"AttributeName" : "Name",
"KeyType" : "HASH"
},
{
"AttributeName" : "Age",
"KeyType" : "RANGE"
}
],
"ProvisionedThroughput" : {
"ReadCapacityUnits" : "5",
"WriteCapacityUnits" : "5"
},
"TableName" : "tablecloudformation3_2"
}
}
},
}
There are a few mistakes in the template. One was already pointed out by @MariaInesParnisari.
The others are a missing opening brace at the top and the unneeded closing/opening braces in the middle, where Resources is declared a second time. Since JSON objects can't repeat a key, only the last Resources block counts, which is why just one table was created.
I fixed the template and can confirm it works:
{
"AWSTemplateFormatVersion" : "2010-09-09",
"Resources" : {
"resource1" : {
"Type" : "AWS::DynamoDB::Table",
"Properties" : {
"AttributeDefinitions" : [
{
"AttributeName" : "Name",
"AttributeType" : "S"
},
{
"AttributeName" : "Age",
"AttributeType" : "S"
}
],
"KeySchema" : [
{
"AttributeName" : "Name",
"KeyType" : "HASH"
},
{
"AttributeName" : "Age",
"KeyType" : "RANGE"
}
],
"ProvisionedThroughput" : {
"ReadCapacityUnits" : "5",
"WriteCapacityUnits" : "5"
},
"TableName" : "tablecloudformation3_1"
}
},
"resource2" : {
"Type" : "AWS::DynamoDB::Table",
"Properties" : {
"AttributeDefinitions" : [
{
"AttributeName" : "Name",
"AttributeType" : "S"
},
{
"AttributeName" : "Age",
"AttributeType" : "S"
}
],
"KeySchema" : [
{
"AttributeName" : "Name",
"KeyType" : "HASH"
},
{
"AttributeName" : "Age",
"KeyType" : "RANGE"
}
],
"ProvisionedThroughput" : {
"ReadCapacityUnits" : "5",
"WriteCapacityUnits" : "5"
},
"TableName" : "tablecloudformation3_2"
}
}
}
}
More generally, the CloudFormation Linter can help catch these template issues faster with errors like:
E0000 Duplicate found "Resources" (line 35)
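The same duplicate-key problem can also be caught without cfn-lint: JSON parsers keep only the last occurrence of a repeated key, which matches only one table coming out of the original template. A small Python check along these lines (the file name is a placeholder) raises on the duplicate "Resources" instead of silently dropping it:
import json


def reject_duplicate_keys(pairs):
    # fail loudly instead of letting the last duplicate key win
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError("Duplicate key: %s" % key)
        obj[key] = value
    return obj


with open("template.json") as f:
    json.load(f, object_pairs_hook=reject_duplicate_keys)
print("no duplicate keys found")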
I have two MongoDB collections, Transactions and Users. This is a transaction example:
{ "_id" : ObjectId("5cdd391e1e4b8f0cb8e17d0f"), "txId" :
"6910dc01ff167d90e8fe249ce68a5149f82d099345473fab02916068570974bd",
"sender" : ObjectId("5cdbe473ca52557874005809"), "receiver" :
ObjectId("5cdd26a6b2d370061e8435d7"), "senderWalletId" :
ObjectId("5cdbe473ca5255787400580a"), "receiverWalletId" :
ObjectId("5cdd26a6b2d370061e8435d8"), "status" : "success", "type" :
"transfer", "amount" : 3000, "totalFee" : 40, "createdAt" :
ISODate("2019-05-16T10:19:10.809Z"), "updatedAt" :
ISODate("2019-05-16T10:19:10.809Z"), "__v" : 0 }
And this is a user example:
{ "_id" : ObjectId("5d010a140f0c30757f59fe18"), "role" : "client",
"status" : "active", "isPhoneVerified" : true, "personalDocuments" : [
], "email" : "example.com", "firstName" : "mm", "lastName" : "mm",
"phoneNumber" : "0000000", "password" : "$2a$10$.", "pushToken" :
"cSRdsgJXAc67k-PIKwHvslINb0kAaStVzmYPqeIH5oudVTqppHFjxbGEg2B-
1Xe8P0iTH0EYB9PHbKKey", "created_at" :
ISODate("2019-06-12T14:20:04.338Z"), "updated_at" :
ISODate("2019-06-17T13:33:41.613Z"), "__v" : 0, "verifyPhoneCode" :
null }
I've indexed the Transactions documents in ES using Logstash and the mongodb input plugin successfully. This is the mapping generated by ES for that:
"puretransactionsmodified" : {
"mappings" : {
"doc" : {
"properties" : {
"#timestamp" : {
"type" : "date"
},
"#version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"__v" : {
"type" : "long"
},
"amount" : {
"type" : "long"
},
"createdAt" : {
"type" : "date"
},
"host" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"log_entry" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"logdate" : {
"type" : "date"
},
"mongo_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"receiver" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"receiverWalletId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"sender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"senderWalletId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"status" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"total" : {
"type" : "long"
},
"totalFee" : {
"type" : "long"
},
"txId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"updatedAt" : {
"type" : "date"
}
}
}
}
}
}
When I tried to index both collections for some analysis using that same plugin:
input {
mongodb {
uri => 'mongodb://**.**.**.***:27017/db'
placeholder_db_dir => '/home/jhon/Desktop/userstransactions'
placeholder_db_name => 'logstash_sqlite.db'
collection => 'transactions'
batch_size => 202
parse_method => "simple"
}
mongodb {
uri => 'mongodb://**.**.**.***:27017/db'
placeholder_db_dir => '/home/jhon/Desktop/userstransactions'
placeholder_db_name => 'logstash_sqlite.db'
collection => 'wallets'
batch_size => 202
parse_method => "simple"
}
}
It did not work; all I had was the transactions. What I want is to have, under the receiver and sender fields of a transaction, the name of the receiver/sender as provided in the Users collection. I'm a newbie on this; I've also tried mapping it myself, but that did not work.
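Roughly, the document shape I'm after would come from an enrichment step like this before indexing. This is just a sketch of the idea, not my working setup: pymongo and the Elasticsearch Python client, the hosts, the collection and index names are all assumptions here, and only a few fields are copied.
from pymongo import MongoClient
from elasticsearch import Elasticsearch

mongo = MongoClient("mongodb://localhost:27017")   # placeholder host
db = mongo["db"]                                   # database name from the Logstash URI
es = Elasticsearch(["http://localhost:9200"])      # placeholder host

for tx in db["transactions"].find():
    sender = db["users"].find_one({"_id": tx["sender"]}) or {}
    receiver = db["users"].find_one({"_id": tx["receiver"]}) or {}

    doc = {
        "txId": tx["txId"],
        "amount": tx["amount"],
        "status": tx["status"],
        "createdAt": tx["createdAt"],
        # embed the user names directly on the transaction document
        "sender": {"id": str(tx["sender"]),
                   "name": "%s %s" % (sender.get("firstName", ""), sender.get("lastName", ""))},
        "receiver": {"id": str(tx["receiver"]),
                     "name": "%s %s" % (receiver.get("firstName", ""), receiver.get("lastName", ""))},
    }
    es.index(index="transactions_enriched", id=str(tx["_id"]), body=doc)
With the names embedded at index time there would be no join needed in ES at query time.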
I have set up Druid and was able to run the tutorial at Tutorial: Loading a file. I was also able to execute native JSON queries and get the results as described at http://druid.io/docs/latest/tutorials/tutorial-query.html. The Druid setup is working fine.
I now want to ingest additional data from a Java program into this datasource. Is it possible to send data into Druid using Tranquility from a Java program for a datasource created using batch load?
I tried the example program at https://github.com/druid-io/tranquility/blob/master/core/src/test/java/com/metamx/tranquility/example/JavaExample.java
But this program just keeps running and doesn't show any output. How can Druid be set up to accept data using the Tranquility Core APIs?
Following are the ingestion spec and the config file for Tranquility:
wikipedia-index.json
{
"type" : "index",
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : [
"channel",
"cityName",
"comment",
"countryIsoCode",
"countryName",
"isAnonymous",
"isMinor",
"isNew",
"isRobot",
"isUnpatrolled",
"metroCode",
"namespace",
"page",
"regionIsoCode",
"regionName",
"user",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }
]
},
"timestampSpec": {
"column": "time",
"format": "iso"
}
}
},
"metricsSpec" : [],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "day",
"queryGranularity" : "none",
"intervals" : ["2015-09-12/2015-09-13"],
"rollup" : false
}
},
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "quickstart/",
"filter" : "wikiticker-2015-09-12-sampled.json.gz"
},
"appendToExisting" : false
},
"tuningConfig" : {
"type" : "index",
"targetPartitionSize" : 5000000,
"maxRowsInMemory" : 25000,
"forceExtendableShardSpecs" : true
}
}
}
example.json (tranquility config):
{
"dataSources" : [
{
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"metricsSpec" : [
{ "type" : "count", "name" : "count" }
],
"granularitySpec" : {
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"type" : "uniform"
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"timestampSpec" : { "column": "time", "format": "iso" },
"dimensionsSpec" : {
"dimensions" : ["channel",
"cityName",
"comment",
"countryIsoCode",
"countryName",
"isAnonymous",
"isMinor",
"isNew",
"isRobot",
"isUnpatrolled",
"metroCode",
"namespace",
"page",
"regionIsoCode",
"regionName",
"user",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }]
}
}
}
},
"tuningConfig" : {
"type" : "realtime",
"windowPeriod" : "PT10M",
"intermediatePersistPeriod" : "PT10M",
"maxRowsInMemory" : "100000"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
],
"properties" : {
"zookeeper.connect" : "localhost"
}
}
I did not find any example of setting up a datasource on Druid that continuously accepts data from a Java program. I don't want to use Kafka. Any pointers on this would be greatly appreciated.
You need to create the data files with the additional data first and then run the ingestion task with the new fields. You can't edit the same record in Druid; it just gets overwritten by the new record.
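For the batch route, that can be driven from a program by writing the new events to a file and POSTing a fresh index task to the Overlord's task endpoint. A rough sketch of the idea (the Overlord address/port and the file paths assume a default quickstart setup, the event values are made up, and appendToExisting is switched on so the existing datasource is added to rather than replaced):
import json
import requests

# one new event; its timestamp must fall inside the spec's intervals
new_events = [{"time": "2015-09-12T12:00:00Z", "channel": "#en.wikipedia",
               "page": "SomePage", "user": "someone", "added": 1, "deleted": 0, "delta": 1}]

# 1) land the additional data where the local firehose can see it (baseDir is quickstart/)
with open("quickstart/additional-events.json", "w") as f:
    for event in new_events:
        f.write(json.dumps(event) + "\n")

# 2) reuse the batch spec above, point it at the new file, and append instead of replace
with open("wikipedia-index.json") as f:
    task = json.load(f)
task["spec"]["ioConfig"]["firehose"]["filter"] = "additional-events.json"
task["spec"]["ioConfig"]["appendToExisting"] = True

# 3) submit the task to the Overlord (default quickstart port assumed)
response = requests.post("http://localhost:8090/druid/indexer/v1/task", json=task)
print(response.json())  # returns the task id when the task is accepted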
I am a newbie in Druid, trying to load some very simple data in JSON format into Druid. The data contains just one dimension, one metric, and a timestamp. I have successfully been able to load data into Druid for a different dataset, but somehow I am getting errors for this dataset.
This is my index file:
{
"type" : "index",
"spec" : {
"dataSchema" : {
"dataSource" : "datatemplate",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : [
"Loc"
]
},
"timestampSpec" : {
"format" : "auto",
"column" : "Timestamp"
}
}
},
"metricsSpec" : [{"name" : "Qty","type" : "doubleSum","fieldName" : "Qty"}],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "day",
"queryGranularity" : "none",
"intervals" : ["2016-01-01T00:00:00Z/2030-06-30T00:00:00Z"],
"rollup" : true
}
},
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "datatemplate/",
"filter" : "datatemplate.json"
},
"appendToExisting" : false
},
"tuningConfig" : {
"type" : "index",
"targetPartitionSize" : 10000000,
"maxRowsInMemory" : 40000,
"forceExtendableShardSpecs" : true
}
}
}
Also here is my dataset in JSON format:
{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "2", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
I am indexing a data stream to Elasticsearch and I cannot figure out how to normalize the incoming data to make it index without error. I have a mapping type "getdatavalues", which is a metadata query. This metadata query can return very different-looking responses, but I'm not seeing the difference. The error I get:
{"index":{"_index":"ens_event-2016.03.11","_type":"getdatavalues","_id":"865800029798177_2016_03_11_03_18_12_100037","status":400,"error":"MapperParsingException[object mapping for [getdatavalues] tried to parse field [output] as object, but got EOF, has a concrete value been provided to it?]"}}
when performing:
curl -XPUT 'http://192.168.99.100:80/es/ens_event-2016.03.11/getdatavalues/865800029798177_2016_03_11_03_18_12_100037' -d '{
"type": "getDataValues",
"input": {
"deviceID": {
"IMEI": "865800029798177",
"serial-number": "64180258"
},
"handle": 644,
"exprCode": "200000010300140000080001005f00a700000000000000",
"noRollHandle": "478669308-578452",
"transactionID": 290
},
"timestamp": "2016-03-11T03:18:12.000Z",
"handle": 644,
"output": {
"noRollPubSessHandle": "478669308-578740",
"publishSessHandle": 1195,
"status": true,
"matchFilter": {
"prefix": "publicExpr.operatorDefined.commercialIdentifier.FoodSvcs.Restaurant.\"A&C Kabul Curry\".\"Rooster Street\"",
"argValues": {
"event": "InternationalEvent",
"hasEvent": "anyEvent"
}
},
"transactionID": 290,
"validFor": 50
}
}'
Here's what Elasticsearch has for the mapping:
"getdatavalues" : {
"dynamic_templates" : [ {
"strings" : {
"mapping" : {
"index" : "not_analyzed",
"type" : "string"
},
"match_mapping_type" : "string"
}
} ],
"properties" : {
"handle" : {
"type" : "long"
},
"input" : {
"properties" : {
"deviceID" : {
"properties" : {
"IMEI" : {
"type" : "string",
"index" : "not_analyzed"
},
"serial-number" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"exprCode" : {
"type" : "string",
"index" : "not_analyzed"
},
"handle" : {
"type" : "long"
},
"noRollHandle" : {
"type" : "string",
"index" : "not_analyzed"
},
"serviceVersion" : {
"type" : "string",
"index" : "not_analyzed"
},
"transactionID" : {
"type" : "long"
}
}
},
"output" : {
"properties" : {
"matchFilter" : {
"properties" : {
"argValues" : {
"properties" : {
"Interests" : {
"type" : "object"
},
"MerchantId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Queue" : {
"type" : "string",
"index" : "not_analyzed"
},
"Vibe" : {
"type" : "string",
"index" : "not_analyzed"
},
"event" : {
"properties" : {
"event" : {
"type" : "string",
"index" : "not_analyzed"
},
"hasEvent" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"hasEvent" : {
"type" : "string",
"index" : "not_analyzed"
},
"interests" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"prefix" : {
"type" : "string",
"index" : "not_analyzed"
},
"transactionID" : {
"type" : "long"
},
"validFor" : {
"type" : "long"
}
}
},
"noRollPubSessHandle" : {
"type" : "string",
"index" : "not_analyzed"
},
"publishSessHandle" : {
"type" : "long"
},
"status" : {
"type" : "boolean"
},
"transactionID" : {
"type" : "long"
},
"validFor" : {
"type" : "long"
}
}
},
"timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"type" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
Looks like the argValues object doesn't quite agree with your mapping:
"argValues": {
"event": "InternationalEvent",
"hasEvent": "anyEvent"
}
Either this:
"argValues": {
"event": {
"event": "InternationalEvent"
},
"hasEvent": "anyEvent"
}
Or this:
"argValues": {
"event": {
"event": "InternationalEvent"
"hasEvent": "anyEvent"
},
}
Would both seem to be valid.
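If the producer side can't be changed, the document could also be normalized just before indexing so that "event" always arrives in the object form the mapping expects. A rough sketch of that idea (the Python client, the direct host:port, and the trimmed document are my assumptions; your curl goes through a proxy at /es instead):
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://192.168.99.100:9200"])  # placeholder: direct ES host

# trimmed copy of the document from the curl above
doc = {
    "type": "getDataValues",
    "timestamp": "2016-03-11T03:18:12.000Z",
    "handle": 644,
    "output": {
        "matchFilter": {
            "argValues": {"event": "InternationalEvent", "hasEvent": "anyEvent"},
        },
        "transactionID": 290,
        "validFor": 50,
    },
}

arg_values = doc["output"]["matchFilter"]["argValues"]
if isinstance(arg_values.get("event"), str):
    # wrap the bare string so it matches the object mapping for argValues.event
    arg_values["event"] = {"event": arg_values["event"]}

es.index(index="ens_event-2016.03.11", doc_type="getdatavalues",
         id="865800029798177_2016_03_11_03_18_12_100037", body=doc)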