Partition JSON data using jq and then send REST query

I have a JSON file like the one below:
[
{
"field": {
"empID": "sapid",
"location": "India"
}
},
{
"field": {
"empID": "sapid",
"location": "India"
}
},
{
"field": {
"empID": "sapid",
"location": "India"
}
},
{
"field": {
"empID": "sapid",
"location": "India"
}
},
{
"field": {
"empID": "sapid",
"location": "India"
}
},
{
"field": {
"empID": "sapid",
"location": "India"
}
}
... up to 1 million
]
I have to use this JSON as the input for a REST request, for example:
curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d @temp.json
My server will not accept 1 million JSON objects at a time.
I am looking for an approach that extracts the first 500 objects from the main JSON and sends them in one REST query, then sends the next 500 objects in a second REST query, and so on.
Can you please suggest how I can achieve this with jq?

There's an intrinsic tradeoff here between space and time efficiency. In the following, the focus is on the latter.
Assuming that each call to curl must send a JSON array, a time-efficient solution can be constructed along the following lines:
< array.json jq -c '
def batch($n): range(0; length; $n) as $i | .[$i: $i+$n];
batch(500)
' | while read -r json
do
echo "$json" | curl -X POST -H "Content-Type: application/json" -d @- ....
done
Here .... signifies additional appropriate curl arguments.
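The batching filter can be checked on a small array before pointing it at the real file. The following is a minimal sketch, assuming jq 1.5 or later is installed; the shell function name batch is made up for illustration and is not part of jq.

```shell
# Split a JSON array read from stdin into sub-arrays of at most $1 elements,
# printing one compact sub-array per line (the last one may be shorter).
batch() {
  jq -c --argjson n "$1" 'range(0; length; $n) as $i | .[$i:$i+$n]'
}
```

Running printf '[1,2,3,4,5]' | batch 2 prints [1,2], [3,4] and [5] on separate lines, so each line is a valid JSON array that can be piped to curl on its own.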
GNU parallel
You might also want to consider using GNU parallel, e.g.:
< array.json jq -c '
def batch($n):
range(0; length; $n) as $i
| .[$i: $i+$n];
batch(500)
' | parallel --pipe -N1 curl -X POST -H "Content-Type: application/json" -d @- ....

You have not shared any hardware information about the system you are running this on. At a minimum, you need some form of multiprocessing to make this faster, instead of running the (1000000/500) curl requests one after another.
One way is to use GNU xargs, which can run several instances of a given process in parallel with the -P flag and controls how many lines of input each invocation reads with the -L flag.
To start with, you can do something like the following to make curl run on 500 lines at a time, with 20 such invocations in parallel. So at any given moment, approximately (500 * 20) lines of input are being processed. You can tune the numbers depending on the hardware capability of both the host and the server.
xargs -L 500 -P 20 curl -X POST -H "Content-Type: application/json" http://sample-url -d @- < <(jq -c 'range(0;length;500) as $i | .[$i: $i+500]' json)
Modified jq filter to pack the JSON payload as an array of objects (credit peak's answer). The earlier version jq -c '.[]' json might not work as the individual chunk of lines passed at a time doesn't represent a valid JSON.
Note: Not tested due to performance constraints.

Assuming you have this formatting, splitting can be done by unpacking the array and saving the desired number of objects to separate files, e.g.:
<input.json jq -c '.[]' | split -l500
Which creates xaa with the first 500 objects, xab with the next 500 objects, etc. If you want to repackage the objects in an array, use the -s option to jq, e.g.: jq -s . xaa.
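The split approach can be wrapped into two small helpers, sketched below under a few assumptions: the function names, the chunk_ file prefix and the URL argument are placeholders introduced here, and post_chunks has not been run against a real server.

```shell
# Read newline-delimited JSON objects from stdin and write them to files
# $1/chunk_aa, $1/chunk_ab, ... with at most $2 objects (lines) per file.
split_objects() {
  split -l "$2" - "$1/chunk_"
}

# Re-wrap every chunk file in directory $1 as a JSON array (jq -s) and
# POST it to the endpoint given as $2 (placeholder, supply your own URL).
post_chunks() {
  for f in "$1"/chunk_*; do
    jq -c -s . "$f" |
      curl -sS -X POST -H "Content-Type: application/json" -d @- "$2"
  done
}
```

A typical run would be: tmp=$(mktemp -d); jq -c '.[]' input.json | split_objects "$tmp" 500; post_chunks "$tmp" <your URL>. Keeping the chunks on disk makes it easy to retry a single failed request.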

If you want to do this from the shell, you could use jq to split your JSON and pass it to xargs to call curl for each object returned.
jq -c '.[]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
This will send one curl request for each object. However, if you want to send, for example, the first 500 objects in a single curl request, you can specify a subarray in the jq filter. To send all of your JSON objects you will then need to repeat the command with different offsets, as AFAIK jq has no built-in way to split the input into chunks of objects.
jq -c '.[0:500]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
jq -c '.[500:1000]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
jq -c '.[1000:1500]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
[...]
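Repeating the command by hand can be replaced with a loop over the slice offsets. This is only a sketch: chunk_offsets and send_slices are hypothetical helper names, the URL is a placeholder argument, and the payload is piped to curl on stdin instead of going through xargs. Note that jq re-reads the whole file for every slice, so this is likely to be slow for a million objects.

```shell
# Print the start offset of every chunk: 0, size, 2*size, ... below total.
# $1 = total number of objects, $2 = chunk size.
chunk_offsets() {
  start=0
  while [ "$start" -lt "$1" ]; do
    echo "$start"
    start=$((start + $2))
  done
}

# POST every slice of the array in file $1 to URL $2, $3 objects per request.
# The URL is a placeholder you must supply.
send_slices() {
  total=$(jq length "$1")
  for s in $(chunk_offsets "$total" "$3"); do
    jq -c --argjson s "$s" --argjson n "$3" '.[$s:$s+$n]' "$1" |
      curl -sS -X POST -H "Content-Type: application/json" -d @- "$2"
  done
}

# Offsets for 1200 objects in chunks of 500: prints 0, 500 and 1000.
chunk_offsets 1200 500
```

With these in place, send_slices temp.json <your URL> 500 would issue one request per 500-object slice.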

Related

How to add the column to Google Sheets using API and provide the name and type of the column in the same call?

So, what I could achieve using the Google Sheets API is being able to create a new column using the following curl based call
curl -v \
-H 'Authorization: Bearer ya29.GlxUB9K_96tyQFyQ64eaYOeImtJt32213zjosf6LW1Inv6MOqQCCodA7CycvL5EFKIpeX4dVEebS4rUl24U1J7euhMjqBZq0QEU7ZK1B64THQXNwBpDvTzoUT9hTRg' \
-H 'Content-Type: application/json' \
-d '{
"requests": [
{
"insertDimension": {
"range": {
"sheetId": 2052094881,
"dimension": "COLUMNS",
"startIndex": 0,
"endIndex": 1
}
}
}
]
}' \
https://sheets.googleapis.com/v4/spreadsheets/1mHrPXQILuprO4NdqTgrVKlGazvvzgCFqIphGdsmptD8:batchUpdate
While this call is useful, it does not get me all the way. The reason I wanted to add a column was to give it a name and a type (or format) for its values, and this call does neither.
Is there a way to create and add name and type to the column in a single API call?
Thanks a lot!

Why does DESCRIBE EXTENDED in Kafka KSQL return error ShowColumns not supported?

I have a simple KTABLE in KSQL called DIMAGE
When I run the following code
{
"ksql": "DESCRIBE EXTENDED DIMAGE ;"
}
I receive the following error
{
"@type": "generic_error",
"error_code": 40000,
"message": "Statement type `io.confluent.ksql.parser.tree.ShowColumns' not supported for this resource",
"stackTrace": []
}
I also receive a similar error message trying to describe a stream. I also receive the same error message if I remove the EXTENDED attribute.
You're using the wrong REST endpoint. If you use the query endpoint (/query), you'll get your error:
$ curl -s -X "POST" "http://localhost:8088/query" \
-H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
-d '{
"ksql": "DESCRIBE EXTENDED COMPUTER_T;"
}'
{"@type":"generic_error","error_code":40000,"message":"Statement type `io.confluent.ksql.parser.tree.ShowColumns' not supported for this resource","stackTrace":[]}
If you use the statement endpoint (/ksql), it works fine:
$ curl -s -X "POST" "http://localhost:8088/ksql" \
-H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
-d '{
"ksql": "DESCRIBE EXTENDED COMPUTER_T;"
}'|jq '.'
[
{
"@type": "sourceDescription",
"statementText": "DESCRIBE EXTENDED COMPUTER_T;",
"sourceDescription": {
"name": "COMPUTER_T",
"readQueries": [
{
"sinks": [
"COMP_WATCH_BY_EMP_ID_T"
],
"id": "CTAS_COMP_WATCH_BY_EMP_ID_T_0",
[...]
I've logged #2362 so that we can improve the UX of this.

Hbase REST call - getting junk characters " \x0A "

I'm trying to insert values into an HBase table using the HBase REST API. Below is the curl command I'm using:
curl -v -XPUT 'http://localhost:8080/emp/1/pers:name' -H "Accept: application/json" -H "Content-Type: application/json" --data '{ "Row": [ { "Cell": [ { "column": "cGVyczpuYW1lCg==", "$": "TXlOYW1lCg==" } ], "key": "MQo=" } ] }'
The call works fine and I get an "HTTP/1.1 200 OK". But when I look at the HBase table, instead of updating the value in row "1", the call creates a new row "1\x0A" and inserts the new value with the same junk characters:
1\x0A column=pers:name\x0A, timestamp=1437596697507, value=MyName\x0A
Has anyone seen something like this? Thanks in advance.
OK, I sorted this out.
\x0A is the escaped hexadecimal line feed, the equivalent of \n.
echo appends a trailing \n to its output, and base64 encodes that newline along with the text, so we need to pass -n to echo when we base64-encode the row key, column and value.
For ex:
echo -n MyName | base64
TXlOYW1l
echo MyName | base64
TXlOYW1lCg==
There is a difference between the two, and that's what caused my problem.
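To avoid the stray newline altogether, the payload can be built with printf, which emits no trailing newline unless asked to. A minimal sketch follows; b64 and hbase_payload are helper names invented for this example, and the JSON layout simply mirrors the request from the question.

```shell
# Base64-encode a string exactly as given, with no trailing newline added.
b64() {
  printf '%s' "$1" | base64
}

# Build the JSON body for the HBase REST PUT.
# $1 = row key, $2 = column (family:qualifier), $3 = value, all in plain text.
hbase_payload() {
  printf '{ "Row": [ { "key": "%s", "Cell": [ { "column": "%s", "$": "%s" } ] } ] }\n' \
    "$(b64 "$1")" "$(b64 "$2")" "$(b64 "$3")"
}

hbase_payload 1 pers:name MyName
```

This prints a body with "key": "MQ==", "column": "cGVyczpuYW1l" and "$": "TXlOYW1l", none of which carries the Cg== suffix that encoded the newline in the original request.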

Index parameterization in Cypher REST query

I have this query, which works well but without parametrization in the index lookup. emp is an index, and NUM_OFC_CA is an emp number (the key in the emp index). To keep things simple, I only want to return NME_CA.
curl -X POST http://xyzhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query -H "Content-Type: application/json" --data-binary '{
"query": "START ca=node:emp(\"NUM_OFC_CA: 997015\") RETURN distinct ca.NME_CA as `CA Name`",
"params": {
}
}'
How can I parametrize the above REST query? I have tried something like this:
curl -X POST http://xyzhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query -H "Content-Type: application/json" --data-binary '{
"query": "START ca=node:emp(\"NUM_OFC_CA: {num_ofc_ca}\") RETURN distinct ca.NME_CA as `CAName`",
"params": {
"num_ofc_ca": "997015"
}
}'
I am getting this error:
{
"message" : "org.apache.lucene.queryParser.ParseException: Cannot parse 'NUM_OFC_CA: {num_ofc_ca}': Encountered \" \"}\" \"} \"\" at line 1, column 23.\nWas expecting one of:\n \"TO\" ...\n <RANGEEX_QUOTED> ...\n <RANGEEX_GOOP> ...\n ",
"exception" : "BadInputException",
"stacktrace" : [ "org.neo4j.server.plugin.cypher.CypherPlugin.executeScript(CypherPlugin.java:61)", "java.lang.reflect.Method.invoke(Method.java:597)", "org.neo4j.server.plugins.PluginMethod.invoke(PluginMethod.java:57)", "org.neo4j.server.plugins.PluginManager.invoke(PluginManager.java:168)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:300)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)", "java.lang.reflect.Method.invoke(Method.java:597)" ]
}
Need help to resolve this issue.
Thanks!
If you need a plain index query, the syntax is as follows:
curl -X POST http://<host>:7474/db/data/ext/CypherPlugin/graphdb/execute_query -H "Content-Type: application/json" --data-binary '{
"query": "START ca=node:emp(NUM_OFC_CA = {num_ofc_ca}) RETURN distinct ca.NME_CA as `CAName`",
"params": {
"num_ofc_ca": "997015"
}
}'
For a general lucene index query, you need to parametrize the full query:
curl -X POST http://<host>:7474/db/data/ext/CypherPlugin/graphdb/execute_query -H "Content-Type: application/json" --data-binary '{
"query": "START ca=node:emp({num_ofc_ca_query}) RETURN distinct ca.NME_CA as `CAName`",
"params": {
"num_ofc_ca_query": "NUM_OFC_CA:997015"
}
}'

Parameterize collection: IN Operator WHERE clause Cypher REST

How to specify parameters in something like this - WHERE a.name IN ["Peter", "Tobias"]. I am trying to pass the collection after IN operator as a parameter in Cypher. I am using Cypher through REST API.
This is my example:
curl -X POST http://localhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query -H "Content-Type: application/json" --data-binary '{
"query": "start ca=node:ca({search_ca_query}) MATCH ca_club-[:has]-ca WHERE (ca_club.CA_CLUB IN {CA_CLUB}) RETURN distinct ca.NUM_OFC_CA, ca.NME_CA, ca_club.CA_CLUB",
"params": {
"search_ca_query": "NUM_OFC_CA:(\"000333\", \"111033\", \"222197\")",
"CA_CLUB": "[\"Driad\", \"No-Club\"]"
}
}'
I have also tried moving the square brackets into the query, but even that didn't work (i.e. I am not getting any error, but I am getting an empty list: "data" : [ ]).
Any suggestions on how to do this?
Your IN parameter needs to be a JSON list, not a string that contains a list:
curl -X POST http://localhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query -H "Content-Type: application/json" --data-binary '{
"query": "start ca=node:ca({search_ca_query}) MATCH ca_club-[:has]-ca WHERE (ca_club.CA_CLUB IN {CA_CLUB}) RETURN distinct ca.NUM_OFC_CA, ca.NME_CA, ca_club.CA_CLUB",
"params": {
"search_ca_query": "NUM_OFC_CA:(\"000333\", \"111033\", \"222197\")",
"CA_CLUB": ["Driad", "No-Club"]
}
}'
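When the club names come from shell variables, one way to make sure the parameter ends up as a real JSON array is to let jq assemble the request body. The sketch below assumes jq is available; cypher_payload is a hypothetical helper name, and the query text is copied from the request above.

```shell
# Build the Cypher request body.
# $1 = Lucene index query string, $2 = JSON array of club names.
# --arg passes $1 as a JSON string; --argjson passes $2 as parsed JSON,
# so CA_CLUB is emitted as an array rather than a quoted string.
cypher_payload() {
  jq -n -c --arg q "$1" --argjson clubs "$2" '{
    query: "start ca=node:ca({search_ca_query}) MATCH ca_club-[:has]-ca WHERE (ca_club.CA_CLUB IN {CA_CLUB}) RETURN distinct ca.NUM_OFC_CA, ca.NME_CA, ca_club.CA_CLUB",
    params: {search_ca_query: $q, CA_CLUB: $clubs}
  }'
}
```

The result of cypher_payload 'NUM_OFC_CA:("000333", "111033", "222197")' '["Driad","No-Club"]' can then be piped to curl with --data-binary @- in place of the inline body.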