The scenario is that I want to set up a stock quote server and save the quote data into Druid. My requirement is to get the latest price of every stock with a single query.
But I notice that Druid query types such as timeseries only work on metric fields, not on dimension fields.
So I am considering making the price field one of the metrics, but without aggregating it.
How can I do that?
Any suggestions?
Here is my Tranquility config file:
{
  "dataSources" : {
    "stock-index-topic" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "stock-index-topic",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "timestampSpec" : {
                "column" : "timestamp",
                "format" : "auto"
              },
              "dimensionsSpec" : {
                "dimensions" : ["code","name","acronym","market","tradeVolume","totalValueTraded","preClosePx","openPrice","highPrice","lowPrice","latestPrice","closePx"],
                "dimensionExclusions" : [
                  "timestamp",
                  "value"
                ]
              },
              "format" : "json"
            }
          },
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "HOUR",
            "queryGranularity" : "SECOND"
          },
          "metricsSpec" : [
            {
              "name" : "firstPrice",
              "type" : "doubleFirst",
              "fieldName" : "tradePrice"
            },
            {
              "name" : "lastPrice",
              "type" : "doubleLast",
              "fieldName" : "tradePrice"
            },
            {
              "name" : "minPrice",
              "type" : "doubleMin",
              "fieldName" : "tradePrice"
            },
            {
              "name" : "maxPrice",
              "type" : "doubleMax",
              "fieldName" : "tradePrice"
            }
          ]
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "100000",
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1",
        "topicPattern" : "stock-index-topic"
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "localhost:2181",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "commit.periodMillis" : "15000",
    "consumer.numThreads" : "2",
    "kafka.zookeeper.connect" : "localhost:2181",
    "kafka.group.id" : "tranquility-kafka"
  }
}
I think you should make latestPrice a new numeric dimension; that would be much better from a performance and querying standpoint, considering how Druid works.
Metrics are meant to perform aggregation functions at their core, so they won't be helpful in your use case.
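For illustration, here is a minimal sketch of how a numeric dimension is declared in the dimensionsSpec, assuming a Druid version with numeric dimension support (0.9.2 or later); only the relevant part of the spec is shown:
"dimensionsSpec" : {
  "dimensions" : [
    "code",
    "name",
    { "type" : "float", "name" : "latestPrice" }
  ]
}
The latest price per stock can then be computed at query time, for example with a doubleLast aggregator grouped by code, rather than at ingestion time.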
I am a newbie in Druid, trying to load a very simple JSON dataset into it. The data contains just one dimension, one metric, and a timestamp. I was able to load a different dataset successfully, but somehow I am getting errors for this one.
This is my index file:
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "datatemplate",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "Loc"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "Timestamp"
          }
        }
      },
      "metricsSpec" : [
        { "name" : "Qty", "type" : "doubleSum", "fieldName" : "Qty" }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2016-01-01T00:00:00Z/2030-06-30T00:00:00Z"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "datatemplate/",
        "filter" : "datatemplate.json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 10000000,
      "maxRowsInMemory" : 40000,
      "forceExtendableShardSpecs" : true
    }
  }
}
Also here is my dataset in JSON format:
{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "2", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
I need to upload data to an existing datasource, and this has to be done on a daily basis. I guess some changes need to be made in the index file, but I am not able to figure them out. I tried pushing the data with the same datasource name, but the existing data was removed.
Any help would be appreciated.
Here is the ingestion JSON file:
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "mksales",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "Address",
              "City",
              "Contract Name",
              "Contract Sub Type",
              "Contract Type",
              "Customer Name",
              "Domain",
              "Nation",
              "Contract Start End Date",
              "Zip",
              "Sales Rep Name"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        { "type" : "count", "name" : "count" },
        { "name" : "Price", "type" : "doubleSum", "fieldName" : "Price" },
        { "name" : "Sales", "type" : "doubleSum", "fieldName" : "Sales" },
        { "name" : "Units", "type" : "longSum", "fieldName" : "Units" }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2000-12-01T00:00:00Z/2030-06-30T00:00:00Z"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "mksales/",
        "filter" : "mksales.json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 10000000,
      "maxRowsInMemory" : 40000,
      "forceExtendableShardSpecs" : true
    }
  }
}
There are two ways to append/update data in an existing segment: reindexing and delta ingestion.
With reindexing, you re-index the data for a segment's interval every time new data arrives in it (in your case, the interval is a day), and you need to supply all the files containing data for that day.
For delta ingestion you need to use an inputSpec of type="multi".
You can refer to the documentation for more details: http://druid.io/docs/latest/ingestion/update-existing-data.html
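For illustration, a minimal sketch of a delta-ingestion ioConfig in the style of that documentation page. It assumes the Hadoop batch indexer is used; the interval and the file path mksales/new-data.json are placeholders:
"ioConfig" : {
  "type" : "hadoop",
  "inputSpec" : {
    "type" : "multi",
    "children" : [
      {
        "type" : "dataSource",
        "ingestionSpec" : {
          "dataSource" : "mksales",
          "intervals" : ["2018-01-01/2018-01-02"]
        }
      },
      {
        "type" : "static",
        "paths" : "mksales/new-data.json"
      }
    ]
  }
}
The dataSource child re-reads the rows already stored in the mksales segments for that interval, and the static child adds the new file, so the re-indexed segment contains both the old and the new data.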
I am trying to load test MongoDB using JMeter. I am using a JSR223 Sampler with Groovy; I am able to connect, but for some reason the insert part is not working.
I need to insert the document below:
"cart" : {
"schema" : "http://dell.com/dcp/schemas/cart/3.0.0#",
"_id" : "s5ChQonvAUGKM6s2Yq8Z31",
"createdOn" : {
"DateTime" : ISODate("2018-03-07T06:54:01.242Z"),
"Ticks" : NumberLong(636560222412422269),
"Offset" : 330
},
"lastModifiedOn" : {
"DateTime" : ISODate("2018-03-07T06:54:01.245Z"),
"Ticks" : NumberLong(636560222412452266),
"Offset" : 330
},
"expiresOn" : {
"DateTime" : ISODate("2019-04-10T08:21:43.984Z"),
"Ticks" : NumberLong(636904813039840000),
"Offset" : 0
},
"commerceContext" : {
"region" : "us",
"country" : "US",
"language" : "en",
"currency" : "USD",
"segment" : "bsd",
"customerSet" : "rc1005388",
"accessGroup" : "DSA",
"companyNumber" : "08",
"businessUnitId" : "11",
"classCode" : "string",
"sourceApplicationName" : "OLRGCOMM"
},
"items" : [],
"shipments" : [],
"price" : {
"couponCodes" : []
},
"references" : [
{
"referenceId" : "8TOOOrdEJUeiGPTqWA226Q",
"referenceType" : "New Cart",
"referencedOn" : {
"DateTime" : ISODate("2018-03-07T06:54:01.239Z"),
"Ticks" : NumberLong(636560222412392112),
"Offset" : 330
},
"referenceCreatedBy" : "DCQO",
"targetSystem" : "DSP",
"target" : "string"
}
],
"validation" : {},
"properties" : {}
}
First of all, you need to get the MongoDB connection from the MongoDB Source Config element; it can be done as follows:
import com.mongodb.DB;
import org.apache.jmeter.protocol.mongodb.config.MongoDBHolder;
DB db = MongoDBHolder.getDBFromSource("mongodb source name", "database name");
Next you just need to call the DBCollection.insert() function, like:
db.getCollection('your collection name').insert(your DBObject payload here)
More information: How to Load Test MongoDB with JMeter
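Putting it together, a minimal JSR223/Groovy sketch that builds part of the cart document and inserts it. The source name, database name, and collection name 'carts' are placeholders; also note that shell helpers such as ISODate() and NumberLong() are not valid Groovy, so the corresponding values are built here as native Date and long types:
import com.mongodb.DB
import com.mongodb.BasicDBObject
import org.apache.jmeter.protocol.mongodb.config.MongoDBHolder

// The connection comes from the MongoDB Source Config element.
DB db = MongoDBHolder.getDBFromSource('mongodb source name', 'database name')

// Shell-only helpers (ISODate, NumberLong) are replaced with native types.
def createdOn = new BasicDBObject()
createdOn.put('DateTime', new Date())
createdOn.put('Ticks', 636560222412422269L)
createdOn.put('Offset', 330)

def cart = new BasicDBObject()
cart.put('schema', 'http://dell.com/dcp/schemas/cart/3.0.0#')
cart.put('_id', 's5ChQonvAUGKM6s2Yq8Z31')
cart.put('createdOn', createdOn)
cart.put('items', [])
cart.put('shipments', [])

// Insert the payload; the top-level document wraps the cart field.
db.getCollection('carts').insert(new BasicDBObject('cart', cart))
The remaining sub-documents (lastModifiedOn, commerceContext, references, and so on) would be built the same way with nested BasicDBObject instances.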
I'm pushing a Kafka stream into Druid via Tranquility.
The Kafka version is 0.9.1, Tranquility is 0.8, and Druid is 0.10.
Tranquility starts fine when no messages are produced, but as soon as the producer sends a message I get a JsonMappingException like this:
java.lang.IllegalArgumentException: Can not deserialize instance of java.util.ArrayList out of VALUE_STRING token
at [Source: N/A; line: -1, column: -1]
at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2774) ~[com.fasterxml.jackson.core.jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700) ~[com.fasterxml.jackson.core.jackson-databind-2.4.6.jar:2.4.6]
at com.metamx.tranquility.druid.DruidBeams$.makeFireDepartment(DruidBeams.scala:406) ~[io.druid.tranquility-core-0.8.0.jar:0.8.0]
at com.metamx.tranquility.druid.DruidBeams$.fromConfigInternal(DruidBeams.scala:291) ~[io.druid.tranquility-core-0.8.0.jar:0.8.0]
at com.metamx.tranquility.druid.DruidBeams$.fromConfig(DruidBeams.scala:199) ~[io.druid.tranquility-core-0.8.0.jar:0.8.0]
at com.metamx.tranquility.kafka.KafkaBeamUtils$.createTranquilizer(KafkaBeamUtils.scala:40) ~[io.druid.tranquility-kafka-0.8.0.jar:0.8.0]
at com.metamx.tranquility.kafka.KafkaBeamUtils.createTranquilizer(KafkaBeamUtils.scala) ~[io.druid.tranquility-kafka-0.8.0.jar:0.8.0]
at com.metamx.tranquility.kafka.writer.TranquilityEventWriter.<init>(TranquilityEventWriter.java:64) ~[io.druid.tranquility-kafka-0.8.0.jar:0.8.0]
at com.metamx.tranquility.kafka.writer.WriterController.createWriter(WriterController.java:171) ~[io.druid.tranquility-kafka-0.8.0.jar:0.8.0]
at com.metamx.tranquility.kafka.writer.WriterController.getWriter(WriterController.java:98) ~[io.druid.tranquility-kafka-0.8.0.jar:0.8.0]
at com.metamx.tranquility.kafka.KafkaConsumer$2.run(KafkaConsumer.java:231) ~[io.druid.tranquility-kafka-0.8.0.jar:0.8.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_67]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_67]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_67]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
And my kafka.json is:
{
  "dataSources" : {
    "stock-index-topic" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "stock-index-topic",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "timestampSpec" : {
                "column" : "timestamp",
                "format" : "auto"
              },
              "dimensionsSpec" : {
                "dimensions" : ["code","name","acronym","market","tradeVolume","totalValueTraded","preClosePx","openPrice","highPrice","lowPrice","tradePrice","closePx","timestamp"],
                "dimensionExclusions" : [
                  "timestamp",
                  "value"
                ]
              },
              "format" : "json"
            }
          },
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "DAY",
            "queryGranularity" : "none",
            "intervals" : "no"
          },
          "metricsSpec" : [
            {
              "name" : "firstPrice",
              "type" : "doubleFirst",
              "fieldName" : "tradePrice"
            },
            {
              "name" : "lastPrice",
              "type" : "doubleLast",
              "fieldName" : "tradePrice"
            },
            {
              "name" : "minPrice",
              "type" : "doubleMin",
              "fieldName" : "tradePrice"
            },
            {
              "name" : "maxPrice",
              "type" : "doubleMax",
              "fieldName" : "tradePrice"
            }
          ]
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "100000",
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1",
        "topicPattern" : "stock-index-topic"
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "localhost:2181",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "commit.periodMillis" : "15000",
    "consumer.numThreads" : "2",
    "kafka.zookeeper.connect" : "localhost:2181",
    "kafka.group.id" : "tranquility-kafka"
  }
}
I used kafka-console-consumer to check the data; it looks like this:
{"code": "399982", "name": "500等权", "acronym": "500DQ", "market": "102", "tradeVolume": 0, "totalValueTraded": 0.0, "preClosePx": 0.0, "openPrice": 0.0, "highPrice": 0.0, "lowPrice": 0.0, "tradePrice": 7184.7142, "closePx": 0.0, "timestamp": "2017-05-16T09:06:39.000+08:00"}
Any idea why? Thanks.
"metricsSpec" : [
{
"name" : "firstPrice",
"type" : "doubleFirst",
"fieldName" : "tradePrice"
},{
"name" : "lastPrice",
"type" : "doubleLast",
"fieldName" : "tradePrice"
}, {
"name" : "minPrice",
"type" : "doubleMin",
"fieldName" : "tradePrice"
}, {
"name" : "maxPrice",
"type" : "doubleMax",
"fieldName" : "tradePrice"
}
]
},
It's wrong. The documentation says:
First and Last aggregators cannot be used in an ingestion spec, and should only be specified as part of queries.
So the issue is solved.
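Since the documentation says first/last belong in queries, the same values can be obtained at query time after dropping them from the metricsSpec. A minimal sketch of a groupBy query, assuming the dataSource and column names from the config above (the interval is a placeholder):
{
  "queryType" : "groupBy",
  "dataSource" : "stock-index-topic",
  "granularity" : "all",
  "dimensions" : ["code"],
  "aggregations" : [
    { "type" : "doubleLast", "name" : "lastPrice", "fieldName" : "tradePrice" }
  ],
  "intervals" : ["2017-05-16T00:00:00+08:00/2017-05-17T00:00:00+08:00"]
}
This returns the last observed tradePrice per stock code within the queried interval.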
Below is a document from my database:
{
  "_id" : ObjectId("58635ac32c9592064471cf5b"),
  "agency_code" : "v5global",
  "client_code" : "whirlpool",
  "project_code" : "whirlpool",
  "date" : {
    "datetime" : 1464739200000.0,
    "date" : 1464739200000.0,
    "datejs" : ISODate("2016-06-01T00:00:00.000+0000"),
    "datetimejs" : ISODate("2016-06-01T00:00:00.000+0000"),
    "month" : NumberInt(5),
    "year" : NumberInt(2016),
    "day" : NumberInt(1)
  },
  "user" : {
    "promoter_id" : NumberInt(19),
    "promoter_name" : "Hira Singh Pawar",
    "empcode" : "519230"
  },
  "counter" : {
    "store_id" : NumberInt(4),
    "store_name" : "Maya Sales ",
    "chain_type" : "BS",
    "address" : "6 Filamingo Market , Hissar",
    "city" : "Hissar",
    "state" : "Faridabad",
    "region" : "North",
    "sap_code" : "N_Far_91103948_1",
    "unique_tp_code" : "91103948",
    "location" : "6"
  },
  "insertedon" : {
    "date" : 1464739200000.0,
    "datejs" : ISODate("2016-06-01T00:00:00.000+0000"),
    "datetimejs" : ISODate("2016-06-01T00:00:00.000+0000")
  },
  "insertedby" : "akshay",
  "manager" : {
    "manager_id" : NumberInt(5943),
    "manager_name" : "Sonu Singh"
  },
  "type" : "display",
  "data" : {
    "brand" : "whirlpool",
    "sku" : "60",
    "model_name" : "Icemagic Fresh",
    "sub_cat_name" : "DC",
    "cat_name" : "Refrigerator",
    "value" : NumberInt(1)
  },
  "IsDeleted" : false
}
I want to apply an aggregation that groups by city, state, and region, and if a counter has sold refrigerators I need that detail in my result; e.g., if a counter has sold 2 Whirlpool refrigerators, I want that reflected in the result.
A counter can also sell other things, like washing machines, so if it has sold 2 washing machines I want a result with { washingMachine: 2 }.
I have tried everything and nothing seems to be working here:
db.display_mop.aggregate(
  // Pipeline
  [
    // Stage 1
    { $match: { "project_code" : "whirlpool" } },
    // Stage 2
    {
      $group: {
        _id: {
          "userid" : "$user.promoter_id",
          "userName" : "$user.promoter_name",
          "usercode" : "$user.empcode",
          "storename" : "$counter.store_name",
          "address" : "$counter.address",
          "city" : "$counter.city",
          "state" : "$counter.state",
          "region" : "$counter.region"
        }
      }
    }
  ],
  // Options
  { allowDiskUse: true }
)
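For comparison, a sketch of how per-category counts could be produced with conditional sums inside $group. This is a hypothetical approach: it assumes data.cat_name distinguishes the product categories and that the washing-machine category is labeled "Washing Machine" (adjust both to the actual values in the data):
db.display_mop.aggregate([
  // Stage 1: restrict to the project
  { $match: { project_code: "whirlpool" } },
  // Stage 2: group by location and add up data.value per category
  {
    $group: {
      _id: {
        city: "$counter.city",
        state: "$counter.state",
        region: "$counter.region"
      },
      refrigerator: {
        $sum: { $cond: [ { $eq: [ "$data.cat_name", "Refrigerator" ] }, "$data.value", 0 ] }
      },
      washingMachine: {
        $sum: { $cond: [ { $eq: [ "$data.cat_name", "Washing Machine" ] }, "$data.value", 0 ] }
      }
    }
  }
], { allowDiskUse: true })
Each $cond contributes the document's data.value when the category matches and 0 otherwise, so every group ends up with one counter per category.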