Debezium MongoDB Kafka connector not producing some records in the topic as they appear in MongoDB - mongodb

In my MongoDB I have this data:
mongo01:PRIMARY> db.col.find({"_id" : ObjectId("5d8777f188fef5555b")})
{ "_id" : ObjectId("5d8777f188fef5555b"), "attachments" : [ { "name" : "Je", "src" : "https://google.co", "type" : "image/png" } ], "tags" : [ 51, 52 ], "last_comment" : [ ], "hashtags" : [ "Je" ], "badges" : [ ], "feed_id" : "1", "company_id" : 1, "message" : "aJsm9LtK", "group_id" : "106", "feed_type" : "post", "thumbnail" : "", "group_tag" : false, "like_count" : 0, "clap_count" : 0, "comment_count" : 0, "created_by" : 520, "created_at" : "1469577278628", "updated_at" : "1469577278628", "status" : 1, "__v" : 0 }
mongo01:PRIMARY> db.col.find({"_id" : ObjectId("5d285b4554e3b584bf97759")})
{ "_id" : ObjectId("5d285b4554e3b584bf97759"), "attachments" : [ ], "tags" : [ ], "last_comment" : [ ], "company_id" : 1, "group_id" : "00e35289", "feed_type" : "post", "group_tag" : false, "status" : 1, "feed_id" : "3dc44", "thumbnail" : "{}", "message" : "s2np1HYrPuFF", "created_by" : 1, "html_content" : "", "created_at" : "144687057949", "updated_at" : "144687057949", "like_count" : 0, "clap_count" : 0, "comment_count" : 0, "__v" : 0, "badges" : [ ], "hashtags" : [ ] }
I am using this Debezium MongoDB connector in order to get the MongoDB data into a Kafka topic.
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
  http://localhost:8083/connectors/ -d '{
  "name": "mongo_connector-4",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "mongo01/localhost:27017",
    "mongodb.name": "mongo_1",
    "collection.whitelist": "data.col",
    "key.converter.schemas.enable": false,
    "value.converter.schemas.enable": false,
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "transforms" : "unwrap",
    "transforms.unwrap.type" : "io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope",
    "transforms.unwrap.drop.tombstones" : "false",
    "transforms.unwrap.delete.handling.mode" : "drop",
    "transforms.unwrap.operation.header" : "true",
    "errors.tolerance" : "all",
    "snapshot.delay.ms": "120000",
    "poll.interval.ms": "3000",
    "heartbeat.interval.ms": "90000"
  }
}'
Now, while printing the topic in KSQL, I see that for some records the data came with all columns (as it was in MongoDB), while for some records some columns are missing.
ksql> print 'mongo_1.data.col' from beginning;
Format:JSON
{"ROWTIME":1571148520736,"ROWKEY":"{\"id\":\"5d8777f188fef5555b\"}","attachments":[{"name":"Je","src":"https://google.co","type":"image/png"}],"tags":[51,52],"last_comment":[],"hashtags":[],"badges":[],"feed_id":"1","company_id":1,"message":"aJsm9LtK","group_id":"106","feed_type":"post","thumbnail":"","group_tag":false,"like_count":0,"clap_count":0,"comment_count":0,"created_by":520,"created_at":"1469577278628","updated_at":"1469577278628","status":1,"__v":0,"id":"5d8777f188fef5555b"}
{"ROWTIME":1571148520736,"ROWKEY":"{\"id\":\"5d285b4554e3b584bf97759\"}","badges":[],"hashtags":[],"id":"5d285b4554e3b584bf97759"}
Why is this happening and how can I resolve this issue?
PS: the only difference I found is that the two records have a different order of columns.
While searching about this issue, the closest thing I found is https://github.com/hpgrahsl/kafka-connect-mongodb
where they talk about post-processing and redacting fields that contain sensitive data. But as you can see, both of my records are similar and have no sensitive data (by sensitive data I mean encrypted data; maybe they meant something else).

Aren't the missing values the result of updates? Don't forget that the MongoDB connector provides a patch field for updates, not an after field - https://debezium.io/documentation/reference/0.10/connectors/mongodb.html#change-events-value
If you need to reconstruct the full document after an update in the case of MongoDB, you need to introduce a Kafka Streams pipeline that stores the insert event in a persistent store and then merges each patch with the original insert to create the final event.
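As a rough illustration of that idea (not an actual Kafka Streams implementation), here is a minimal Python sketch using the confluent_kafka client and an in-memory dict in place of a persistent state store; the broker address and output topic name are assumptions, the input topic comes from the question.
# Sketch only: a Python consumer/producer and an in-memory dict stand in for a
# Kafka Streams topology with a persistent state store. The input topic is the
# one from the question; broker address and output topic are assumptions.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumption: local broker
    "group.id": "mongo-merge-sketch",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["mongo_1.data.col"])
producer = Producer({"bootstrap.servers": "localhost:9092"})

store = {}  # key -> last known full document

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue
    key = msg.key().decode("utf-8")
    event = json.loads(msg.value())
    # Inserts carry the full document; updates carry only the patched fields,
    # so merge each event into whatever we have already seen for this key.
    merged = {**store.get(key, {}), **event}
    store[key] = merged
    producer.produce("mongo_1.data.col.merged", key=key, value=json.dumps(merged))
    producer.poll(0)   # serve delivery callbacks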

Related

Kafka Connect: stream data from MongoDB to Elasticsearch

I am trying to stream data from MongoDB to Elasticsearch using Kafka Connect.
The data that is streamed into Kafka by the MongoDB connector is given below:
{
  "updatedAt" : {
    "$date" : 1591596275939
  },
  "createdAt" : {
    "$date" : 1362162600000
  },
  "name" : "my name",
  "_id" : {
    "$oid" : "5ee0cc7e0c3273f3d4a3c20f"
  },
  "documentId" : "mydoc1",
  "age" : 20,
  "language" : "English",
  "validFrom" : {
    "$date" : 978307200000
  },
  "remarks" : [
    "remarks"
  ],
  "married" : false
}
I have the below two problems while saving the data to Elasticsearch:
_id is an object, and I want to use the "documentId" key as _id in Elasticsearch instead
Dates are objects with a $date key, which I can't figure out how to convert to a normal date.
Can anyone please point me in the right direction regarding the above two issues? (See the sketch after the connector links below for one possible reshaping.)
Mongodb source config
{
  "tasks.max" : "5",
  "change.stream.full.document" : "updateLookup",
  "name" : "mongodb-source",
  "value.converter" : "org.apache.kafka.connect.storage.StringConverter",
  "collection" : "collection",
  "poll.max.batch.size" : "1000",
  "connector.class" : "com.mongodb.kafka.connect.MongoSourceConnector",
  "batch.size" : "1000",
  "key.converter" : "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schemas.enable" : "false",
  "value.converter.schemas.enable" : "false",
  "connection.uri" : "mongodb://connection",
  "publish.full.document.only" : "true",
  "database" : "databasename",
  "poll.await.time.ms" : "5000",
  "topic.prefix" : "mongodb"
}
Elastic sink config
{
  "write.method" : "upsert",
  "errors.deadletterqueue.context.headers.enable" : "true",
  "name" : "elasticsearch-sink",
  "connection.password" : "password",
  "topic.index.map" : "mongodb.databasename.collection:elasticindexname",
  "connection.url" : "http://localhost:9200",
  "errors.log.enable" : "true",
  "flush.timeout.ms" : "20000",
  "errors.log.include.messages" : "true",
  "key.ignore" : "false",
  "type.name" : "_doc",
  "key.converter" : "org.apache.kafka.connect.json.JsonConverter",
  "value.converter" : "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable" : "false",
  "value.converter.schemas.enable" : "false",
  "tasks.max" : "1",
  "batch.size" : "100",
  "schema.ignore" : "true",
  "schema.enable" : "false",
  "connector.class" : "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "read.timeout.ms" : "6000",
  "connection.username" : "elastic",
  "topics" : "mongodb.databasename.collection",
  "proxy.host" : "localhost",
  "proxy.port" : "8080"
}
Exception
Caused by: org.apache.kafka.connect.errors.DataException: MAP is not supported as the document id.
at io.confluent.connect.elasticsearch.DataConverter.convertKey(DataConverter.java:107)
at io.confluent.connect.elasticsearch.DataConverter.convertRecord(DataConverter.java:182)
at io.confluent.connect.elasticsearch.ElasticsearchWriter.tryWriteRecord(ElasticsearchWriter.java:291)
at io.confluent.connect.elasticsearch.ElasticsearchWriter.write(ElasticsearchWriter.java:276)
at io.confluent.connect.elasticsearch.ElasticsearchSinkTask.put(ElasticsearchSinkTask.java:174)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
... 10 more
Connector Links :
https://docs.mongodb.com/kafka-connector/master/kafka-source/
https://docs.confluent.io/current/connect/kafka-connect-elasticsearch
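As a rough sketch of the reshaping the two problems call for (using documentId as the Elasticsearch _id and flattening the $date wrappers), something like the following could be adapted. In a Connect pipeline this logic would normally live in SMTs or a stream processor rather than a standalone script, and everything here beyond the sample record's field names (the input file name, for instance) is a hypothetical illustration.
# Sketch only: reshape one MongoDB record (as shown above) so that "documentId"
# can serve as the Elasticsearch _id and "$date" wrappers become ISO 8601
# strings. Field names come from the sample record; the input file name and the
# rest is assumed for illustration, not connector behaviour.
import json
from datetime import datetime, timezone

def flatten_dates(value):
    # Recursively replace {"$date": <epoch millis>} with an ISO 8601 string.
    if isinstance(value, dict):
        if set(value.keys()) == {"$date"}:
            return datetime.fromtimestamp(value["$date"] / 1000, tz=timezone.utc).isoformat()
        return {k: flatten_dates(v) for k, v in value.items()}
    if isinstance(value, list):
        return [flatten_dates(v) for v in value]
    return value

with open("record.json") as f:          # hypothetical file holding the record above
    raw = json.load(f)

doc = flatten_dates(raw)
doc_id = doc.pop("documentId")          # use documentId as the ES document id
doc.pop("_id", None)                    # drop the {"$oid": ...} wrapper
print(doc_id)
print(json.dumps(doc, indent=2))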

How to do an aggregation query with pagination in Druid (Druid is a column-oriented, distributed data store)? Does it support queries with an offset?

I have searched but did not find anything, so I am posting here. Thanks in advance.
Yes, you can do pagination with select queries. See the Druid documentation link below for details - http://druid.io/docs/latest/querying/select-query.html
E.g., you can send the query below:
{
  "queryType": "select",
  "dataSource": "wikipedia",
  "descending": "false",
  "dimensions": [],
  "metrics": [],
  "granularity": "all",
  "intervals": [
    "2013-01-01/2013-01-02"
  ],
  "pagingSpec": { "pagingIdentifiers": {}, "threshold": 5 }
}
Result -
[{
"timestamp" : "2013-01-01T00:00:00.000Z",
"result" : {
"pagingIdentifiers" : {
"wikipedia_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9" : 4
},
"events" : [ {
"segmentId" : "wikipedia_editstream_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
"offset" : 0,
"event" : {
"timestamp" : "2013-01-01T00:00:00.000Z",
"robot" : "1",
"namespace" : "article",
"anonymous" : "0",
"unpatrolled" : "0",
"page" : "11._korpus_(NOVJ)",
"language" : "sl",
"newpage" : "0",
"user" : "EmausBot",
"count" : 1.0,
"added" : 39.0,
"delta" : 39.0,
"variation" : 39.0,
"deleted" : 0.0
}
}, {
"segmentId" : "wikipedia_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
"offset" : 1,
"event" : {
"timestamp" : "2013-01-01T00:00:00.000Z",
"robot" : "0",
"namespace" : "article",
"anonymous" : "0",
"unpatrolled" : "0",
"page" : "112_U.S._580",
"language" : "en",
"newpage" : "1",
"user" : "MZMcBride",
"count" : 1.0,
"added" : 70.0,
"delta" : 70.0,
"variation" : 70.0,
"deleted" : 0.0
}
}, {
"segmentId" : "wikipedia_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
"offset" : 2,
"event" : {
"timestamp" : "2013-01-01T00:00:00.000Z",
"robot" : "0",
"namespace" : "article",
"anonymous" : "0",
"unpatrolled" : "0",
"page" : "113_U.S._243",
"language" : "en",
"newpage" : "1",
"user" : "MZMcBride",
"count" : 1.0,
"added" : 77.0,
"delta" : 77.0,
"variation" : 77.0,
"deleted" : 0.0
}
}, {
"segmentId" : "wikipedia_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
"offset" : 3,
"event" : {
"timestamp" : "2013-01-01T00:00:00.000Z",
"robot" : "0",
"namespace" : "article",
"anonymous" : "0",
"unpatrolled" : "0",
"page" : "113_U.S._73",
"language" : "en",
"newpage" : "1",
"user" : "MZMcBride",
"count" : 1.0,
"added" : 70.0,
"delta" : 70.0,
"variation" : 70.0,
"deleted" : 0.0
}
}, {
"segmentId" : "wikipedia_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
"offset" : 4,
"event" : {
"timestamp" : "2013-01-01T00:00:00.000Z",
"robot" : "0",
"namespace" : "article",
"anonymous" : "0",
"unpatrolled" : "0",
"page" : "113_U.S._756",
"language" : "en",
"newpage" : "1",
"user" : "MZMcBride",
"count" : 1.0,
"added" : 68.0,
"delta" : 68.0,
"variation" : 68.0,
"deleted" : 0.0
}
} ]
}
} ]
The result comes with pagingIdentifiers, which you can pass in the next query (see the sketch below).
Please note it doesn't work with topN queries as of now.
Update -
For topN queries or aggregated results, there is currently no direct way of fetching paginated results, but you can use a bigger threshold and limit and exclude the previous results on the client side.
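A minimal Python sketch of that pagination loop, assuming a broker at localhost:8082 and the wikipedia dataSource from the example above (not a tested setup):
# Sketch only: page through a Druid select query by feeding the returned
# pagingIdentifiers into the next request. Broker URL and dataSource are
# assumptions based on the example above, not a running cluster.
import requests

BROKER = "http://localhost:8082/druid/v2"   # assumption: default broker port
query = {
    "queryType": "select",
    "dataSource": "wikipedia",
    "granularity": "all",
    "dimensions": [],
    "metrics": [],
    "intervals": ["2013-01-01/2013-01-02"],
    "pagingSpec": {"pagingIdentifiers": {}, "threshold": 5},
}

for _ in range(10):                          # fetch at most 10 pages
    result = requests.post(BROKER, json=query).json()
    if not result or not result[0]["result"]["events"]:
        break
    for e in result[0]["result"]["events"]:
        print(e["offset"], e["event"]["page"])
    # Pass the returned identifiers back in; depending on the Druid version you
    # may need to add 1 to each offset (or set "fromNext": true) to avoid
    # re-reading the last returned row.
    query["pagingSpec"]["pagingIdentifiers"] = result[0]["result"]["pagingIdentifiers"]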

Tranquility not sending data to Druid

I am evaluating Druid for my use case, which ingests CSV data through Tranquility in real time. The following is the server configuration:
{
"dataSources" : {
"audience" : {
"spec" : {
"dataSchema" : {
"dataSource" : "audience",
"parser" : {
"type" : "string",
"parseSpec":{
"format" : "csv",
"timestampSpec" : {
"column" : "timestamp"
},
"columns" : ["timestamp","partner_id","event_id","product_id","device_id","count"],
"dimensionsSpec" : {
"dimensions" : ["partner_id","event_id","product_id","device_id"]
}
}
},
"metricsSpec" : [{ "type" : "longSum", "name" : total, "fieldName" : "count" }],
"granularitySpec" : {
"segmentGranularity" : "HOUR",
"queryGranularity" : "HOUR",
"intervals" : [ "2013-08-31/2013-09-01" ]
}
},
"ioConfig" : {
"type" : "realtime"
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : "100000",
"intermediatePersistPeriod" : "PT10M",
"windowPeriod" : "PT10M"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
},
"properties" : {
"zookeeper.connect" : "localhost",
"druid.discovery.curator.path" : "/druid/discovery",
"druid.selectors.indexing.serviceName" : "druid/overlord",
"http.port" : "8200",
"http.threads" : "8"
}
}
Data is generated randomly by a Python script as:
1471336991,1,960,136,3ZLA7,1
1471336991,1,369,367,8MP2B,1
1471336991,2,544,550,C9ZG8,1
1471336991,1,135,394,XFX31,1
1471336991,2,590,552,VXMTL,1
1471336991,1,493,615,0C2HR,1
1471336991,2,435,710,HKYP0,1
1471336991,1,394,483,V2HP9,1
1471336991,2,441,376,J1LYO,1
The following command submits the data and returns {"result":{"received":1000,"sent":0}}:
python createData.py | curl -XPOST -H 'Content-Type: text/plain' --data-binary @- http://localhost:8200/v1/post/audience
Finally I was able to solve the problem. I was sending time to Druid in epoch format, but it expects ISO-8601 format. In Python one can easily get it by:
datetime.datetime.utcnow().isoformat()
Druid supports multiple time formats which can be specified in the "timestampSpec" property.
The Druid documentation lists the following timestamp formats: "iso, millis, posix, auto or any Joda time format."
For example, to send time in milliseconds:
"timestampSpec" : {
"column" : "timestamp",
"format" : "millis"
}
A couple of things:
Use the ISO 8601 datetime format.
Make sure the written timestamp is within +/- 10 minutes of the current time.
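A minimal Python sketch of building one such CSV row with an ISO 8601 UTC timestamp and posting it to the Tranquility endpoint from the question; the column values are made up and nothing here is a verified setup:
# Sketch only: build one CSV row matching the "columns" spec above, using an
# ISO 8601 UTC timestamp instead of epoch seconds, and post it to the
# Tranquility HTTP endpoint from the question. Values are illustrative.
import random
from datetime import datetime, timezone

import requests

row = ",".join([
    datetime.now(timezone.utc).isoformat(),   # timestamp, must fall inside windowPeriod
    str(random.randint(1, 2)),                # partner_id
    str(random.randint(1, 1000)),             # event_id
    str(random.randint(1, 1000)),             # product_id
    "3ZLA7",                                  # device_id
    "1",                                      # count
])
resp = requests.post(
    "http://localhost:8200/v1/post/audience",
    data=row,
    headers={"Content-Type": "text/plain"},
)
print(resp.json())   # once timestamps fall inside the window, "sent" should match "received"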

How to insert data into Druid via Tranquility

By following the tutorial at http://druid.io/docs/latest/tutorials/tutorial-loading-streaming-data.html, I was able to insert data into Druid via the Kafka console.
Kafka console
The spec file looks as follows:
examples/indexing/wikipedia.spec
[
{
"dataSchema" : {
"dataSource" : "wikipedia",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"timestampSpec" : {
"column" : "timestamp",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
"dimensionExclusions" : [],
"spatialDimensions" : []
}
}
},
"metricsSpec" : [{
"type" : "count",
"name" : "count"
}, {
"type" : "doubleSum",
"name" : "added",
"fieldName" : "added"
}, {
"type" : "doubleSum",
"name" : "deleted",
"fieldName" : "deleted"
}, {
"type" : "doubleSum",
"name" : "delta",
"fieldName" : "delta"
}],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "DAY",
"queryGranularity" : "NONE"
}
},
"ioConfig" : {
"type" : "realtime",
"firehose": {
"type": "kafka-0.8",
"consumerProps": {
"zookeeper.connect": "localhost:2181",
"zookeeper.connection.timeout.ms" : "15000",
"zookeeper.session.timeout.ms" : "15000",
"zookeeper.sync.time.ms" : "5000",
"group.id": "druid-example",
"fetch.message.max.bytes" : "1048586",
"auto.offset.reset": "largest",
"auto.commit.enable": "false"
},
"feed": "wikipedia"
},
"plumber": {
"type": "realtime"
}
},
"tuningConfig": {
"type" : "realtime",
"maxRowsInMemory": 500000,
"intermediatePersistPeriod": "PT10m",
"windowPeriod": "PT10m",
"basePersistDirectory": "\/tmp\/realtime\/basePersist",
"rejectionPolicy": {
"type": "messageTime"
}
}
}
]
I start the realtime node via
java -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=examples/indexing/wikipedia.spec -classpath config/_common:config/realtime:lib/* io.druid.cli.Main server realtime
In Kafka console, I paste and enter the following
{"timestamp": "2013-08-10T01:02:33Z", "page": "Good Bye", "language" : "en", "user" : "catty", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}
Then I perform a query by creating select.json and running curl -X POST 'http://localhost:8084/druid/v2/?pretty' -H 'content-type: application/json' -d @select.json
select.json
{
  "queryType": "select",
  "dataSource": "wikipedia",
  "dimensions": [],
  "metrics": [],
  "granularity": "all",
  "intervals": [
    "2000-01-01/2020-01-02"
  ],
  "filter" : {
    "type": "and",
    "fields" : [
      { "type": "selector", "dimension": "user", "value": "catty" }
    ]
  },
  "pagingSpec": { "pagingIdentifiers": {}, "threshold": 500 }
}
I was able to get the following result.
[ {
"timestamp" : "2013-08-10T01:02:33.000Z",
"result" : {
"pagingIdentifiers" : {
"wikipedia_2013-08-10T00:00:00.000Z_2013-08-11T00:00:00.000Z_2013-08-10T00:00:00.000Z" : 0
},
"events" : [ {
"segmentId" : "wikipedia_2013-08-10T00:00:00.000Z_2013-08-11T00:00:00.000Z_2013-08-10T00:00:00.000Z",
"offset" : 0,
"event" : {
"timestamp" : "2013-08-10T01:02:33.000Z",
"continent" : "North America",
"robot" : "false",
"country" : "United States",
"city" : "San Francisco",
"newPage" : "true",
"unpatrolled" : "true",
"namespace" : "article",
"anonymous" : "false",
"language" : "en",
"page" : "Good Bye",
"region" : "Bay Area",
"user" : "catty",
"deleted" : 200.0,
"added" : 57.0,
"count" : 1,
"delta" : -143.0
}
} ]
}
} ]
It seems that I have set up Druid correctly.
Now, I would like to insert data via an HTTP endpoint. According to "How realtime data input to Druid?", it seems the recommended way is to use Tranquility.
tranquility
I have the indexing service started via
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/overlord:lib/*: io.druid.cli.Main server overlord
conf/server.json looks like
{
"dataSources" : [
{
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"timestampSpec" : {
"column" : "timestamp",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
"dimensionExclusions" : [],
"spatialDimensions" : []
}
}
},
"metricsSpec" : [{
"type" : "count",
"name" : "count"
}, {
"type" : "doubleSum",
"name" : "added",
"fieldName" : "added"
}, {
"type" : "doubleSum",
"name" : "deleted",
"fieldName" : "deleted"
}, {
"type" : "doubleSum",
"name" : "delta",
"fieldName" : "delta"
}],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "DAY",
"queryGranularity" : "NONE"
}
},
"tuningConfig" : {
"windowPeriod" : "PT10M",
"type" : "realtime",
"intermediatePersistPeriod" : "PT10M",
"maxRowsInMemory" : "100000"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
],
"properties" : {
"zookeeper.connect" : "localhost",
"http.port" : "8200",
"http.threads" : "8"
}
}
Then, I start the server using
bin/tranquility server -configFile conf/server.json
I perform a POST to http://xx.xxx.xxx.xxx:8200/v1/post/wikipedia, with content-type equal to application/json:
{"timestamp": "2013-08-10T01:02:33Z", "page": "Selamat Pagi", "language" : "en", "user" : "catty", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}
I get the following response:
{"result":{"received":1,"sent":0}}
It seems that Tranquility has received our data but failed to send it to Druid!
I try to run curl -X POST 'http://localhost:8084/druid/v2/?pretty' -H 'content-type: application/json' -d @select.json, but I don't get the output I inserted via Tranquility.
Any idea why? Thanks.
This generally happens when the data you send is outside the window period. If you are inserting data manually, give the exact current timestamp (UTC) in milliseconds. It can easily be done if you are using a script to generate the data. Make sure it is the current UTC time.
It is extremely difficult to set up Druid to work properly with real-time data insertion.
The best bet I found is to use https://github.com/implydata . Imply is a set of wrappers around Druid to make it easy to use.
However, the real-time insertion in Imply is not perfect either. I experienced an OutOfMemoryException after inserting 30 million items via real-time ingestion. This caused data loss on the previously inserted 30 million rows.
The details regarding the data loss can be found here: https://groups.google.com/forum/#!topic/imply-user-group/95xpYojxiOg
An issue ticket has been filed : https://github.com/implydata/distribution/issues/8
Druid streaming windowPeriod is very short (10 minutes). Outside this period, your event will be ignored.
As you got {"result":{"received":1,"sent":0}}, your worker threads are working fine. Tranquility decides what data is sent to Druid based on the timestamp associated with the data.
This period is decided by the "windowPeriod" configuration. So if your type is realtime ("type":"realtime") and the window period is PT10M ("windowPeriod" : "PT10M"), Tranquility will send any data with a timestamp between t-10 and t+10 minutes, and not send anything outside this period.
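A minimal Python sketch of posting the same event with a current UTC timestamp so that it falls inside the PT10M window; the endpoint is the one from the question and nothing here is a verified setup:
# Sketch only: the same event as in the question, but stamped with the current
# UTC time so it falls inside the PT10M windowPeriod. Endpoint taken from the
# question; nothing here is a verified setup.
from datetime import datetime, timezone

import requests

event = {
    "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "page": "Selamat Pagi", "language": "en", "user": "catty",
    "unpatrolled": "true", "newPage": "true", "robot": "false",
    "anonymous": "false", "namespace": "article",
    "continent": "North America", "country": "United States",
    "region": "Bay Area", "city": "San Francisco",
    "added": 57, "deleted": 200, "delta": -143,
}
resp = requests.post("http://xx.xxx.xxx.xxx:8200/v1/post/wikipedia", json=event)
print(resp.json())   # expect "sent" to become 1 once the timestamp is current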
I disagree about the insertion efficiency problems; we have been sending 3 million rows every 15 minutes since June 2016 and it has been running beautifully. Of course, we have stronger infrastructure sized for that scale.
Another reason for events not being inserted is that the coordinator/overlord is running out of memory.

How to construct an image URL from the Axomic OpenAsset REST API

I'd like to get the URL for an image that is stored in our OpenAsset server. I can curl a request to get the file's information:
curl -u myUsername:myPassword -X GET http://our.IP.address//REST/1/Files/11 | json_pp
and I can reverse-engineer the URL that their front end uses to show me an image:
our.IP.address/Serve/DirectImage/imageId.7235-defaultImageSizeId.1
But trying to do some kind of string formatting to build the URL feels hacky. E.g.:
"our.IP.address/Serve/DirectImage/imageId.{}-defaultImageSizeId.1".format(theImageID)
Is there a way to get a URL directly from the REST request? Is this the correct way to do it?
The OpenAsset REST API is still in beta, so its documentation is surprisingly good given that fact.
Starting with: curl -u username:password -X GET http://my.IP.add.ress//REST/1/Files/
{
"copyright_holder_id" : "0",
"download_count" : "1",
"original_filename" : "C990705_Colleges_011.tif",
"photographer_id" : "0",
"contains_video" : "0",
"md5_now" : "",
"category_id" : "1",
"caption" : "",
"md5_at_upload" : "90d661ec1...06b71",
"id" : "11",
"project_id" : "854",
"click_count" : "2",
"rotation_since_upload" : "0",
"alternate_store_id" : "0",
"duration" : "0",
"description" : "",
"created" : "0",
"filename" : "C990705_N1.tif",
"uploaded" : "20101202062201",
"contains_audio" : "0",
"user_id" : "12",
"access_level" : "2",
"rank" : "5"
}
That is the information about the original file as it was uploaded. To get to a downloadable file, you need to go to the Sizes route.
curl -u username:password -X GET http://my.IP.add.ress//REST/1/Files/11/Sizes | json_pp
Which gives a list of the possible sizes and formats.
[
{
"unc_root" : "//SYD-OA001/openasset/",
"width" : "1383",
"watermarked" : "0",
"relative_path" : "Projects/C990705/C990705_N1_tif/C990705_N1_medium.jpg",
"colourspace" : "RGB",
"y_resolution" : "150",
"height" : "666",
"http_root" : "/Images/",
"filesize" : "106363",
"x_resolution" : "150",
"recreate" : "0",
"id" : "8",
"quality" : "0",
"file_format" : "jpg"
},
{
"width" : "1383",
"unc_root" : "//SYD-OA001/openasset/",
"watermarked" : null,
"relative_path" : "Projects/C990705/C990705_N1.tif",
"colourspace" : "CMYK",
"y_resolution" : "300",
"height" : "666",
"x_resolution" : "300",
"filesize" : "3697734",
"http_root" : "/Images/",
"id" : "1",
"quality" : "0",
"file_format" : "tif"
}
]
To get to the file you want, you'll need to search this list for the right size and format, and then concatenate the unc_root and the relative_path.
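A minimal Python sketch of that lookup, assuming the host and credentials from the examples above and picking the jpg rendition; whether the http_root-based URL is reachable depends on your OpenAsset setup:
# Sketch only: fetch the Sizes list for file 11 and build a path/URL from it,
# as described above. Host, credentials and the choice of the jpg rendition
# are assumptions taken from the examples, not a tested setup.
import requests

HOST = "http://my.IP.add.ress"            # placeholder host from the examples above
AUTH = ("myUsername", "myPassword")

sizes = requests.get(HOST + "//REST/1/Files/11/Sizes", auth=AUTH).json()

# Pick the medium jpg rendition rather than the original tif.
size = next(s for s in sizes if s["file_format"] == "jpg")

unc_path = size["unc_root"] + size["relative_path"]            # file-share path, per the answer
image_url = HOST + size["http_root"] + size["relative_path"]   # possible browser-facing URL
print(unc_path)
print(image_url)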