How to construct an image URL from the Axomic OpenAsset REST API

I'd like to get the URL for an image that is stored in our OpenAsset server. I can curl a request to get the file's information:
curl -u myUsername:myPassword -X GET http://our.IP.address//REST/1/Files/11 | json_pp
and I can reverse-engineer the URL that their front end uses to show me an image:
our.IP.address/Serve/DirectImage/imageId.7235-defaultImageSizeId.1
But doing some kind of string formatting to build the URL feels hacky, e.g.:
"our.IP.address/Serve/DirectImage/imageId.{}-defaultImageSizeId.1".format(theImageID)
Is there a way to get a URL directly from the REST request? Is this the correct way to do it?
The OpenAsset REST API is still in beta, but its documentation is surprisingly good given that fact.

Starting with: curl -u username:password -X GET http://my.IP.add.ress//REST/1/Files/
{
"copyright_holder_id" : "0",
"download_count" : "1",
"original_filename" : "C990705_Colleges_011.tif",
"photographer_id" : "0",
"contains_video" : "0",
"md5_now" : "",
"category_id" : "1",
"caption" : "",
"md5_at_upload" : "90d661ec1...06b71",
"id" : "11",
"project_id" : "854",
"click_count" : "2",
"rotation_since_upload" : "0",
"alternate_store_id" : "0",
"duration" : "0",
"description" : "",
"created" : "0",
"filename" : "C990705_N1.tif",
"uploaded" : "20101202062201",
"contains_audio" : "0",
"user_id" : "12",
"access_level" : "2",
"rank" : "5"
}
This is the information about the original file as it was uploaded. To get to a downloadable file you need to go to the Sizes route.
curl -u username:password -X GET http://my.IP.add.ress//REST/1/Files/11/Sizes | json_pp
This gives a list of the possible sizes and formats.
[
  {
    "unc_root" : "//SYD-OA001/openasset/",
    "width" : "1383",
    "watermarked" : "0",
    "relative_path" : "Projects/C990705/C990705_N1_tif/C990705_N1_medium.jpg",
    "colourspace" : "RGB",
    "y_resolution" : "150",
    "height" : "666",
    "http_root" : "/Images/",
    "filesize" : "106363",
    "x_resolution" : "150",
    "recreate" : "0",
    "id" : "8",
    "quality" : "0",
    "file_format" : "jpg"
  },
  {
    "width" : "1383",
    "unc_root" : "//SYD-OA001/openasset/",
    "watermarked" : null,
    "relative_path" : "Projects/C990705/C990705_N1.tif",
    "colourspace" : "CMYK",
    "y_resolution" : "300",
    "height" : "666",
    "x_resolution" : "300",
    "filesize" : "3697734",
    "http_root" : "/Images/",
    "id" : "1",
    "quality" : "0",
    "file_format" : "tif"
  }
]
To get to the file you want, you'll need to search this list for the right size and format, and then concatenate the unc_root and the relative_path.
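As a rough illustration (not from the OpenAsset docs), a small Python sketch along these lines pulls the sizes and builds the path; the server address, credentials and the choice of the JPEG size are placeholders, and the http_root-based URL is my guess from the fields above:
# Minimal sketch: fetch the size records for a file and join
# unc_root + relative_path as described above.
import requests

BASE = "http://our.IP.address"
AUTH = ("myUsername", "myPassword")
file_id = 11

sizes = requests.get("{}/REST/1/Files/{}/Sizes".format(BASE, file_id), auth=AUTH).json()

# Pick the size/format you want, e.g. the medium JPEG rather than the original TIFF.
jpg = next(s for s in sizes if s["file_format"] == "jpg")

unc_path = jpg["unc_root"] + jpg["relative_path"]          # network (UNC) path
http_url = BASE + jpg["http_root"] + jpg["relative_path"]  # guess: HTTP-served copy

print(unc_path)
print(http_url)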

Related

Debezium MongoDB Kafka connector not producing some records in the topic as they are in MongoDB

In my MongoDB I have this data:
mongo01:PRIMARY> db.col.find({"_id" : ObjectId("5d8777f188fef5555b")})
{ "_id" : ObjectId("5d8777f188fef5555b"), "attachments" : [ { "name" : "Je", "src" : "https://google.co", "type" : "image/png" } ], "tags" : [ 51, 52 ], "last_comment" : [ ], "hashtags" : [ "Je" ], "badges" : [ ], "feed_id" : "1", "company_id" : 1, "message" : "aJsm9LtK", "group_id" : "106", "feed_type" : "post", "thumbnail" : "", "group_tag" : false, "like_count" : 0, "clap_count" : 0, "comment_count" : 0, "created_by" : 520, "created_at" : "1469577278628", "updated_at" : "1469577278628", "status" : 1, "__v" : 0 }
mongo01:PRIMARY> db.col.find({"_id" : ObjectId("5d285b4554e3b584bf97759")})
{ "_id" : ObjectId("5d285b4554e3b584bf97759"), "attachments" : [ ], "tags" : [ ], "last_comment" : [ ], "company_id" : 1, "group_id" : "00e35289", "feed_type" : "post", "group_tag" : false, "status" : 1, "feed_id" : "3dc44", "thumbnail" : "{}", "message" : "s2np1HYrPuFF", "created_by" : 1, "html_content" : "", "created_at" : "144687057949", "updated_at" : "144687057949", "like_count" : 0, "clap_count" : 0, "comment_count" : 0, "__v" : 0, "badges" : [ ], "hashtags" : [ ] }
I am using the Debezium MongoDB connector to get the MongoDB data into a Kafka topic.
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
  http://localhost:8083/connectors/ -d '{
  "name": "mongo_connector-4",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "mongo01/localhost:27017",
    "mongodb.name": "mongo_1",
    "collection.whitelist": "data.col",
    "key.converter.schemas.enable": false,
    "value.converter.schemas.enable": false,
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "transforms" : "unwrap",
    "transforms.unwrap.type" : "io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope",
    "transforms.unwrap.drop.tombstones" : "false",
    "transforms.unwrap.delete.handling.mode" : "drop",
    "transforms.unwrap.operation.header" : "true",
    "errors.tolerance" : "all",
    "snapshot.delay.ms":"120000",
    "poll.interval.ms":"3000",
    "heartbeat.interval.ms":"90000"
  }
}'
Now, while printing the topic in KSQL, I see that for some records the data comes with all columns (as it is in MongoDB), while for some records some columns are missing.
ksql> print 'mongo_1.data.col' from beginning;
Format:JSON
{"ROWTIME":1571148520736,"ROWKEY":"{\"id\":\"5d8777f188fef5555b\"}","attachments":[{"name":"Je","src":"https://google.co","type":"image/png"}],"tags":[51,52],"last_comment":[],"hashtags":[],"badges":[],"feed_id":"1","company_id":1,"message":"aJsm9LtK","group_id":"106","feed_type":"post","thumbnail":"","group_tag":false,"like_count":0,"clap_count":0,"comment_count":0,"created_by":520,"created_at":"1469577278628","updated_at":"1469577278628","status":1,"__v":0,"id":"5d8777f188fef5555b"}
{"ROWTIME":1571148520736,"ROWKEY":"{\"id\":\"5d285b4554e3b584bf97759\"}","badges":[],"hashtags":[],"id":"5d285b4554e3b584bf97759"}
Why is this happening and how can I resolve this issue?
PS: the only difference I found is that the two records have a different order of columns.
While searching about this issue, the only close thing I found is https://github.com/hpgrahsl/kafka-connect-mongodb, where they say something about post-processing and redacting fields that hold sensitive data. But as you can see, both of my records are similar and have no sensitive data (by sensitive data I mean encrypted data; maybe they meant something else).
Aren't the missing values the result of updates? Don't forget that the MongoDB connector provides a patch for updates, not after - https://debezium.io/documentation/reference/0.10/connectors/mongodb.html#change-events-value
If you need to construct the full after state in the case of MongoDB, you need to introduce a Kafka Streams pipeline that stores the insert event in a persistent store and then merges the patch with the original insert to create the final event.
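The sketch below is not Kafka Streams but a plain Python consumer (using confluent-kafka; broker address and group id are assumptions, the topic name comes from the question) that illustrates the same merge idea: keep the last full document per id and layer the partial update events produced by the unwrap SMT on top of it.
import json
from confluent_kafka import Consumer

# Illustrative sketch only: merge partial update events onto the original
# insert, keyed by document id.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "merge-demo",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["mongo_1.data.col"])

full_docs = {}  # id -> last known full document

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue
    doc = json.loads(msg.value())
    doc_id = doc.get("id")
    # Inserts carry every field; unwrapped updates may carry only the changed
    # fields, so layer them on top of what we already have.
    merged = full_docs.get(doc_id, {})
    merged.update(doc)
    full_docs[doc_id] = merged
    print(doc_id, merged)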

Need to count field based on value in mongodb

We have Mongo 3.2 and a collection named test5.
It has many fields and an array (undaries.couies.ZIPCodes.status).
The status field has a few values like Add1 and Add2.
I want to get a count based on status = Add1 or Add2.
"**undaries**" : [
{
"**couies**" : [
{
"**ZIPCodes**" : [
{
"ZIPCode" : "60349",
"city" : "Test",
"household" : "Test2",
"accounts" : "0",
"SD" : "Y",
"**status**" : "Add1",
"lastUpdateDate" : "2017-01-24T09:39:56.417Z",
"lastUpdateBy" : "Test"
},
{
"ZIPCode" : "60234",
"city" : "Test",
"household" : "test1",
"accounts" : "0",
"SD" : "Y",
"status" : "Add2",
"lastUpdateDate" : "2017-01-24T09:39:56.417Z",
"lastUpdateBy" : "Test"
},
{
"ZIPCode" : "60235",
"city" : "Test",
"household" : "test1",
"accounts" : "0",
"SD" : "Y",
"status" : "Add1",
"lastUpdateDate" : "2017-01-24T09:39:56.417Z",
"lastUpdateBy" : "Test"
}................
How do I get the total count of status based on its value?
Thanks & regards.
You may use the count() method and the $in operator
db.yourCollection.count({"undaries.couies.ZIPCodes.status":{$in : ["Add1", "Add2"]}})
count() is shorthand for .find({query}).count()
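For reference, a rough pymongo equivalent is sketched below (connection string and database name are assumptions). Note that this counts matching documents; if you need the count of individual ZIP code entries across the nested arrays, you'd need an aggregation with $unwind, also sketched here.
from pymongo import MongoClient

# Sketch assuming a local MongoDB and a database named "mydb".
coll = MongoClient("mongodb://localhost:27017")["mydb"]["test5"]

# Documents containing at least one ZIP code entry with status Add1 or Add2:
print(coll.count_documents(
    {"undaries.couies.ZIPCodes.status": {"$in": ["Add1", "Add2"]}}
))

# To count the individual ZIP code entries per status, unwind the arrays first:
pipeline = [
    {"$unwind": "$undaries"},
    {"$unwind": "$undaries.couies"},
    {"$unwind": "$undaries.couies.ZIPCodes"},
    {"$match": {"undaries.couies.ZIPCodes.status": {"$in": ["Add1", "Add2"]}}},
    {"$group": {"_id": "$undaries.couies.ZIPCodes.status", "count": {"$sum": 1}}},
]
print(list(coll.aggregate(pipeline)))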

How to insert data into druid via tranquility

By following the tutorial at http://druid.io/docs/latest/tutorials/tutorial-loading-streaming-data.html, I was able to insert data into Druid via the Kafka console.
The spec file looks as follows:
examples/indexing/wikipedia.spec
[
  {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "timestamp",
            "format" : "auto"
          },
          "dimensionsSpec" : {
            "dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [{
        "type" : "count",
        "name" : "count"
      }, {
        "type" : "doubleSum",
        "name" : "added",
        "fieldName" : "added"
      }, {
        "type" : "doubleSum",
        "name" : "deleted",
        "fieldName" : "deleted"
      }, {
        "type" : "doubleSum",
        "name" : "delta",
        "fieldName" : "delta"
      }],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE"
      }
    },
    "ioConfig" : {
      "type" : "realtime",
      "firehose": {
        "type": "kafka-0.8",
        "consumerProps": {
          "zookeeper.connect": "localhost:2181",
          "zookeeper.connection.timeout.ms" : "15000",
          "zookeeper.session.timeout.ms" : "15000",
          "zookeeper.sync.time.ms" : "5000",
          "group.id": "druid-example",
          "fetch.message.max.bytes" : "1048586",
          "auto.offset.reset": "largest",
          "auto.commit.enable": "false"
        },
        "feed": "wikipedia"
      },
      "plumber": {
        "type": "realtime"
      }
    },
    "tuningConfig": {
      "type" : "realtime",
      "maxRowsInMemory": 500000,
      "intermediatePersistPeriod": "PT10m",
      "windowPeriod": "PT10m",
      "basePersistDirectory": "\/tmp\/realtime\/basePersist",
      "rejectionPolicy": {
        "type": "messageTime"
      }
    }
  }
]
I start the realtime node via
java -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=examples/indexing/wikipedia.spec -classpath config/_common:config/realtime:lib/* io.druid.cli.Main server realtime
In the Kafka console, I paste and enter the following:
{"timestamp": "2013-08-10T01:02:33Z", "page": "Good Bye", "language" : "en", "user" : "catty", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}
Then I perform a query by creating select.json and running curl -X POST 'http://localhost:8084/druid/v2/?pretty' -H 'content-type: application/json' -d @select.json
select.json
{
  "queryType": "select",
  "dataSource": "wikipedia",
  "dimensions":[],
  "metrics":[],
  "granularity": "all",
  "intervals": [
    "2000-01-01/2020-01-02"
  ],
  "filter" : {
    "type":"and",
    "fields" : [
      { "type": "selector", "dimension": "user", "value": "catty" }
    ]
  },
  "pagingSpec":{"pagingIdentifiers": {}, "threshold":500}
}
I was able to get the following result.
[ {
  "timestamp" : "2013-08-10T01:02:33.000Z",
  "result" : {
    "pagingIdentifiers" : {
      "wikipedia_2013-08-10T00:00:00.000Z_2013-08-11T00:00:00.000Z_2013-08-10T00:00:00.000Z" : 0
    },
    "events" : [ {
      "segmentId" : "wikipedia_2013-08-10T00:00:00.000Z_2013-08-11T00:00:00.000Z_2013-08-10T00:00:00.000Z",
      "offset" : 0,
      "event" : {
        "timestamp" : "2013-08-10T01:02:33.000Z",
        "continent" : "North America",
        "robot" : "false",
        "country" : "United States",
        "city" : "San Francisco",
        "newPage" : "true",
        "unpatrolled" : "true",
        "namespace" : "article",
        "anonymous" : "false",
        "language" : "en",
        "page" : "Good Bye",
        "region" : "Bay Area",
        "user" : "catty",
        "deleted" : 200.0,
        "added" : 57.0,
        "count" : 1,
        "delta" : -143.0
      }
    } ]
  }
} ]
It seems that I have set up Druid correctly.
Now, I would like to insert data via an HTTP endpoint. According to How realtime data input to Druid?, it seems the recommended way is to use Tranquility.
I have the indexing service started via
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/overlord:lib/*: io.druid.cli.Main server overlord
conf/server.json looks like
{
  "dataSources" : [
    {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "wikipedia",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "format" : "json",
              "timestampSpec" : {
                "column" : "timestamp",
                "format" : "auto"
              },
              "dimensionsSpec" : {
                "dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
                "dimensionExclusions" : [],
                "spatialDimensions" : []
              }
            }
          },
          "metricsSpec" : [{
            "type" : "count",
            "name" : "count"
          }, {
            "type" : "doubleSum",
            "name" : "added",
            "fieldName" : "added"
          }, {
            "type" : "doubleSum",
            "name" : "deleted",
            "fieldName" : "deleted"
          }, {
            "type" : "doubleSum",
            "name" : "delta",
            "fieldName" : "delta"
          }],
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "DAY",
            "queryGranularity" : "NONE"
          }
        },
        "tuningConfig" : {
          "windowPeriod" : "PT10M",
          "type" : "realtime",
          "intermediatePersistPeriod" : "PT10M",
          "maxRowsInMemory" : "100000"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  ],
  "properties" : {
    "zookeeper.connect" : "localhost",
    "http.port" : "8200",
    "http.threads" : "8"
  }
}
Then, I start the server using
bin/tranquility server -configFile conf/server.json
I POST to http://xx.xxx.xxx.xxx:8200/v1/post/wikipedia with Content-Type set to application/json:
{"timestamp": "2013-08-10T01:02:33Z", "page": "Selamat Pagi", "language" : "en", "user" : "catty", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}
I get the following response:
{"result":{"received":1,"sent":0}}
It seems that Tranquility has received our data, but failed to send it to Druid!
I try to run curl -X POST 'http://localhost:8084/druid/v2/?pretty' -H 'content-type: application/json' -d @select.json, but I don't get the data I inserted via Tranquility.
Any idea why? Thanks.
This generally happens when the data you send is outside the window period. If you are inserting data manually, give the exact current timestamp (UTC) in milliseconds. This is easily done if you are using a script to generate the data. Make sure it is the current UTC time.
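For example, a small script along these lines (endpoint from the question; the rest is an illustrative assumption) stamps the event with the current UTC time so that it falls inside the window period. Since the spec's timestampSpec format is "auto", an ISO timestamp like the one in the question should be accepted:
import datetime
import json
import requests

# Sketch: post one event stamped with the current UTC time to the
# Tranquility HTTP endpoint from the question (host/port assumed local).
event = {
    "timestamp": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "page": "Selamat Pagi",
    "language": "en",
    "user": "catty",
    "added": 57,
    "deleted": 200,
    "delta": -143
}

resp = requests.post(
    "http://localhost:8200/v1/post/wikipedia",
    headers={"Content-Type": "application/json"},
    data=json.dumps(event),
)
print(resp.json())  # hoping for {"result": {"received": 1, "sent": 1}}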
It is extremely difficult to set up Druid to work properly with real-time data insertion.
The best bet I found is to use https://github.com/implydata . Imply is a set of wrappers around Druid to make it easy to use.
However, the real-time insertion in Imply is not perfect either. I experienced an OutOfMemoryException after inserting 30 million items via real-time ingestion, which caused data loss for the previously inserted 30 million rows.
The details regarding the data loss can be found here: https://groups.google.com/forum/#!topic/imply-user-group/95xpYojxiOg
An issue ticket has been filed: https://github.com/implydata/distribution/issues/8
Druid streaming windowPeriod is very short (10 minutes). Outside this period, your event will be ignored.
As you got {"result":{"received":1,"sent":0}}, your worker threads are working fine. Tranquility decides what data is sent to Druid based on the timestamp associated with the data.
This period is decided by the "windowPeriod" configuration. So if your type is realtime ("type":"realtime") and the window period is PT10M ("windowPeriod" : "PT10M"), Tranquility will send any data with a timestamp between t-10 and t+10 and will not send anything outside this period.
I disagree with the claims about insertion efficiency problems; we have been sending 3 million rows every 15 minutes since June 2016 and it has been running beautifully. Of course, we have stronger infrastructure sized for that scale.
Another reason for data not being inserted is the coordinator/overlord running out of memory.

How to specify 1 to many and many to 1 relations in json-ld document?

How can I specify one-to-many and many-to-one relations in JSON-LD?
For example :
{
  "@context" : {
    "@vocab" : "http://www.schema.org/",
    "@id" : "http://www.example.com/users/Joe",
    "name" : "name",
    "dob" : "birthDate",
    "age" : {
      "@id" : "http://www.example.com/users/Joe#age",
      "@type" : "Number"
    }
    "knows" : ["http://www.example.com/users/Jill", "http://www.example.com/users/James"]
  },
  "name" : "Joe",
  "age" : "24",
  "dob" : "12-Jun-2013"
}
This doesn't parse in the JSON-LD playground.
What is the valid and best way to specify relations like this either in json-ld or using Hydra?
You need to be careful what you put into the context and what you put into the body of the document. Simply speaking, the context defines the mappings to URLs while the body contains the actual data. Your example should thus look something like this:
{
  "@context" : {
    "@vocab" : "http://www.schema.org/",
    "dob" : "birthDate",
    "age" : {
      "@id" : "http://www.example.com/users/Joe#age",
      "@type" : "Number"
    },
    "knows": { "@type": "@id" }
  },
  "@id" : "http://www.example.com/users/Joe",
  "name" : "Joe",
  "age" : "24",
  "dob" : "12-Jun-2013",
  "knows" : [
    "http://www.example.com/users/Jill",
    "http://www.example.com/users/James"
  ]
}
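As a quick sanity check, you could run a document like this through any JSON-LD processor; here is a small sketch with the Python pyld library (the library choice is my assumption, not part of the original answer):
from pyld import jsonld

# Sketch: expand the corrected document to confirm it is processed as intended.
doc = {
    "@context": {
        "@vocab": "http://www.schema.org/",
        "dob": "birthDate",
        "age": {"@id": "http://www.example.com/users/Joe#age", "@type": "Number"},
        "knows": {"@type": "@id"}
    },
    "@id": "http://www.example.com/users/Joe",
    "name": "Joe",
    "age": "24",
    "dob": "12-Jun-2013",
    "knows": [
        "http://www.example.com/users/Jill",
        "http://www.example.com/users/James"
    ]
}

print(jsonld.expand(doc))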

MongoTemplate find not returning results

MongoTemplate not returning results.
Json:
{
  "_id" : ObjectId("51acf6ab0d5d46077ae12b2a"),
  "docType" : "row",
  "value" : {
    "entity" : {
      "#id" : "1111",
      "#version" : "3434",
      "name" : "XY XY",
      "listId" : 28,
      "listCode" : "BS",
      "entityType" : "03",
      "createdDate" : "09/12/2006",
      "lastUpdateDate" : "04/20/2011",
      "source" : "CCC",
      "address1" : "XXXXXXXX",
      "city" : "CITY",
      "country" : "CO",
      "countryName" : "COUNTRY NAME"
    }
  },
  "createdDate" : ISODate("2013-06-03T20:03:55.127Z"),
  "createdBy" : "Test"
}
Query:
queryy = new Query(Criteria.where("docType").is("row").andOperator(Criteria.where("value.entity.city").is("CITY")));
The above query doesn't return any results.
Am I missing something? Please help.
Try this:
query = new Query(Criteria.where("docType").is("row")
.and("value.entity.city").is("CITY"));
Your code doesn't need the andOperator function (as that translates to $and); it just needs multiple criteria chained with and().
Also, you can use query.toString() to see what the output of the query would be (and compare it to what you might have used in the MongoDB shell).
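The chained criteria translate to the filter {"docType" : "row", "value.entity.city" : "CITY"}. As a cross-check outside Spring, you could run the same filter directly against the collection; here is a small pymongo sketch (connection details and collection name are assumptions):
from pymongo import MongoClient

# Sketch: run the equivalent filter directly, independent of the Spring mapping.
coll = MongoClient("mongodb://localhost:27017")["mydb"]["mycollection"]

for doc in coll.find({"docType": "row", "value.entity.city": "CITY"}):
    print(doc["_id"], doc["createdBy"])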