Accessing parent level using Data::Visitor::Callback (Perl)

This is a long shot, but are there any Perl developers out there who know anything about Data::Visitor::Callback?
I have a complex data structure that I am traversing. Each time I find a hash that has a 'total' key, I need to build up a URL. I have some of the data that I need to create these URLs but some of the data comes from higher up in the structure.
I don't think I can access the levels above and that makes it impossible to build my URLs. I only just realised that I needed data from higher up the structure.
If I can't make Data::Visitor::Callback work for me, then it means rolling my own traversal code - which is a pain.
Data I am traversing is converted from the following JSON (the "count" keys are renamed to "total" as part of the conversion process):
[
  {
    "field": "field_name",
    "value": "A",
    "count": 647,
    "pivot": [
      {
        "field": "field_name",
        "value": "B",
        "count": 618,
        "pivot": [
          {
            "field": "field_name",
            "value": "C1",
            "count": 572
          },
          {
            "field": "field_name",
            "value": "C2",
            "count": 266
          },
          {
            "field": "field_name",
            "value": "C3",
            "count": 237
          }
        ]
      },
      ...
Once I get to the deepest level (C), I need both A and B values in order to construct my URLs.
Because Data::Visitor::Callback is acting on each leaf independently, I'm not sure that it 'knows' where in the structure it is.
All help very much appreciated.
Thanks.

Assuming the JSON you posted is in the variable $json_string, the following code uses recursion to store each node's parent under the hash key parent; that way you can access the parents from your code.
use strict;
use warnings;
use JSON;

my $data = decode_json($json_string);
add_parent_to_children($_) for @$data;

sub add_parent_to_children {
    my ( $node, $parent ) = @_;
    $node->{parent} = $parent if $parent;
    if ( $node->{pivot} ) {
        add_parent_to_children( $_, $node ) for @{ $node->{pivot} };
    }
}
Demo:
my $c3 = $data->[0]{pivot}[0]{pivot}[2];
print "$c3->{value}\n"; # prints C3
print "$c3->{parent}{value}\n"; # prints B
print "$c3->{parent}{parent}{value}\n"; # prints A

Related

How to implement a RESTful API for order changes on large collection entries?

I have an endpoint that may contain a large number of resources. They are returned in a paginated list. Each resource has a unique id, a rank field and some other data.
Semantically the resources are ordered with respect to their rank. Users should be able to change that ordering. I am looking for a RESTful interface to change the rank field in many resources in a large collection.
Reordering one resource may result in a change of the rank fields of many resources. For example consider moving the least significant resource to the most significant position. Many resources may need to be "shifted down in their rank".
The collection being paginated makes the problem a little tougher. There has been a similar question before about a small collection.
The rank field is an integer type. I could change its type if it results in a reasonable interface.
For example:
GET /my-resources?limit=3&marker=234 returns:
{
  "pagination": {
    "prevMarker": 123,
    "nextMarker": 345
  },
  "data": [
    {
      "id": 12,
      "rank": 2,
      "otherData": {}
    },
    {
      "id": 35,
      "rank": 0,
      "otherData": {}
    },
    {
      "id": 67,
      "rank": 1,
      "otherData": {}
    }
  ]
}
Considered approaches:
1) A PATCH request for the list.
We could modify the rank fields with a standard JSON Patch request, for example:
[
  {
    "op": "replace",
    "path": "/data/0/rank",
    "value": 0
  },
  {
    "op": "replace",
    "path": "/data/1/rank",
    "value": 1
  },
  {
    "op": "replace",
    "path": "/data/2/rank",
    "value": 2
  }
]
The problems I see with this approach:
a) Using array indexes in the path of patch operations. Each resource already has a unique ID; I would rather use that.
b) I am not sure what the array index should refer to in a paginated collection. I guess it should refer to the global index once all pages are received and merged back to back.
c) The index of a resource in the collection may be changed by other clients. What the current client thinks is at index 1 may not be at that index anymore. I guess one could add test operations to the patch request first, so the full patch request would look like:
[
  {
    "op": "test",
    "path": "/data/0/id",
    "value": 12
  },
  {
    "op": "test",
    "path": "/data/1/id",
    "value": 35
  },
  {
    "op": "test",
    "path": "/data/2/id",
    "value": 67
  },
  {
    "op": "replace",
    "path": "/data/0/rank",
    "value": 0
  },
  {
    "op": "replace",
    "path": "/data/1/rank",
    "value": 1
  },
  {
    "op": "replace",
    "path": "/data/2/rank",
    "value": 2
  }
]
2) Make the collection a "dictionary"/JSON object and use a patch request for a dictionary.
The advantage of this approach over 1) is that we could use the unique IDs in the path of patch operations.
The "data" in the returned resources would not be a list anymore:
{
  "pagination": {
    "prevMarker": 123,
    "nextMarker": 345
  },
  "data": {
    "12": {
      "id": 12,
      "rank": 2,
      "otherData": {}
    },
    "35": {
      "id": 35,
      "rank": 0,
      "otherData": {}
    },
    "67": {
      "id": 67,
      "rank": 1,
      "otherData": {}
    }
  }
}
Then I could use the unique ID in the patch operations. For example:
{
  "op": "replace",
  "path": "/data/12/rank",
  "value": 0
}
The problems I see with this approach:
a) The my-resources collection can be large, and I am having difficulty with the meaning of a paginated JSON object, or a paginated dictionary. I am not sure whether an iteration order can be defined on such a large object.
3) Have a separate endpoint for modifying the ranks with PUT
We could add a new endpoint like PUT /my-resource-ranks and expect the complete list of ordered IDs to be passed in the PUT request. For example:
[
  { "id": 12 },
  { "id": 35 },
  { "id": 67 }
]
We would make MyResource.rank a read-only field so it cannot be modified through other endpoints.
The problems I see with this approach:
a) The need to send the complete ordered list. In the PUT request for /my-resource-ranks we would include only the unique IDs of the resources, not any other data. That is less severe than sending the full resources, but the complete ordered list can still be large.
4) Avoid the MyResource.rank field and let the "rank" be the order in the /my-collections response.
The returned resources would not have a "rank" field, and they would already be sorted by rank in the response:
{
  "pagination": {
    "prevMarker": 123,
    "nextMarker": 345
  },
  "data": [
    {
      "id": 35,
      "otherData": {}
    },
    {
      "id": 67,
      "otherData": {}
    },
    {
      "id": 12,
      "otherData": {}
    }
  ]
}
The user could change the ordering with the move operation in JSON Patch:
[
  {
    "op": "test",
    "path": "/data/2/id",
    "value": 12
  },
  {
    "op": "move",
    "from": "/data/2",
    "path": "/data/0"
  }
]
The problems I see with this approach:
a) I would prefer the server to be free to return /my-collections in an "arbitrary" order from the client's point of view. As long as the order is consistent, the optimal order for a "simpler" server implementation may differ from the rank defined by the application.
b) The same concern as 1)b): does the index in the patch operation refer to the global index once all pages are received and merged back to back, or to the index in the current page?
Update:
Does anyone know further examples from an existing public API? I am looking for further inspiration. So far I have:
Spotify's Reorder a Playlist's Tracks
Google Tasks: change order, move
I would:
- Use PATCH.
- Define a specialized content-type specifically for updating the order.
The application/json-patch+json type is pretty great for doing straight-up modifications, but I think your use-case is unique enough to warrant a useful, minimal specialized content-type.
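For illustration only, such a request might look like the sketch below; the media type name, the moves field, and the before semantics are all invented here, not taken from any standard:

PATCH /my-resources HTTP/1.1
Content-Type: application/vnd.example.reorder+json

{
  "moves": [
    { "id": 12, "before": 35 }
  ]
}

The server can then renumber whatever rank values are affected internally, and the client never needs to send absolute indexes or the complete ordered list.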

How can I count all possible subdocument elements for a given top element in Mongo?

Not sure I am using the right terminology here, but assume the following oversimplified JSON structure is available in Mongo:
{
  "_id": 1234,
  "labels": {
    "label1": { "id": "l1", "value": "abc" },
    "label3": { "id": "l2", "value": "def" },
    "label5": { "id": "l3", "value": "ghi" },
    "label9": { "id": "l4", "value": "xyz" }
  }
}
{
  "_id": 5678,
  "labels": {
    "label1": { "id": "l1", "value": "hjk" },
    "label5": { "id": "l5", "value": "def" },
    "label10": { "id": "l10", "value": "ghi" },
    "label24": { "id": "l24", "value": "xyz" }
  }
}
I know my base element name (labels in the example), but I do not know the various sub-elements it can have (so in this case the labelx names).
How can I group/count the existing elements (as if I were using a wildcard) so I would get some distinct overview like
"label1":2
"label3":1
"label5":2
"label9":1
"label10":1
"label24":1
as a result? So far I have only found examples where you actually need to know the element names. But I don't know them and want some way to get all possible sub-element names for a given top element for easy review.
In reality the label names can be pretty wild; I used labelx for readability in the example.
You can try the below aggregation in MongoDB 3.4.
Use $objectToArray to transform the object into an array of key-value pairs, followed by $unwind and a $group on the key to count occurrences.
db.col.aggregate([
  {"$project": {"labels": {"$objectToArray": "$labels"}}},
  {"$unwind": "$labels"},
  {"$group": {"_id": "$labels.k", "count": {"$sum": 1}}}
])
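Against the two sample documents, this should yield counts like the following (the output order of $group is not guaranteed):

{ "_id" : "label1", "count" : 2 }
{ "_id" : "label5", "count" : 2 }
{ "_id" : "label3", "count" : 1 }
{ "_id" : "label9", "count" : 1 }
{ "_id" : "label10", "count" : 1 }
{ "_id" : "label24", "count" : 1 }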

MongoDB data model for fast reads using array data

I have a dataset which returns an array named "data_arr" that contains anywhere from 5 to 200 subitems, each of which has a labelspace and a key-value pair, as follows:
{
  "other_fields": "other_values",
  ...
  "data_arr": [
    { "labelspace": "A", "color": "red" },
    { "labelspace": "A", "size": 500 },
    { "labelspace": "B", "shape": "round" }
  ]
}
The question is: within MongoDB, how should this data be stored, optimized for fast reads? Specifically, there would be queries:
- Comparing key-values (e.g. the average size of objects which are both red and round).
- Returning all documents which meet a criterion (e.g. red objects larger than 300).
The labelspace is important because some key names are reused.
I've contemplated indexing with the existing structure by indexing labelspace.
I've considered grouping all labelspace key/values into a single sub-document as follows:
{
  "other_fields": "other_values",
  ...
  "data_a": {
    "color": "red",
    "size": 500
  },
  "data_b": {
    "shape": "round"
  }
}
Or modeling it as follows with a multikey index:
{
  "other_fields": "other_values",
  ...
  "data_arr": [
    { "labelspace": "A", "key": "color", "value": "red" },
    { "labelspace": "A", "key": "size", "value": 500 },
    { "labelspace": "B", "key": "shape", "value": "round" }
  ]
}
This is a new data set that needs to be collected. So it's difficult for me to build up enough of a sample only to discover I've ventured down the wrong path.
I think the last one is best suited for indexing, so possibly the best approach?
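The last model does index well. A sketch of the index and one of the queries, assuming the collection is named col and that labelspace "A" carries both color and size as in the sample:

// Compound multikey index over the attribute triples
db.col.createIndex({
  "data_arr.labelspace": 1,
  "data_arr.key": 1,
  "data_arr.value": 1
})

// "Red objects larger than 300": each condition needs its own $elemMatch,
// wrapped in $and since both conditions target the same field name, because
// the color and the size live in different array elements.
db.col.find({
  "$and": [
    { "data_arr": { "$elemMatch": { "labelspace": "A", "key": "color", "value": "red" } } },
    { "data_arr": { "$elemMatch": { "labelspace": "A", "key": "size", "value": { "$gt": 300 } } } }
  ]
})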

Does the OData protocol provide a way to transform an array of objects to an array of raw values?

Is there a way to specify in an OData query that a raw array should be returned instead of certain name/value pairs? For example, if I have an OData query that results in the following:
{
  "@odata.context": "http://blah.org/MyService/$metadata#People",
  "value": [
    {
      "Name": "Joe Smith",
      "Age": 55,
      "Employers": [
        { "Name": "Acme", "StartDate": "1/1/1990" },
        { "Name": "Enron", "StartDate": "1/1/1995" },
        { "Name": "Amazon", "StartDate": "1/1/1999" }
      ]
    },
    {
      "Name": "Jane Doe",
      "Age": 30,
      "Employers": [
        { "Name": "Joe's Crab Shack", "StartDate": "1/1/2007" },
        { "Name": "TGI Fridays", "StartDate": "1/1/2010" }
      ]
    }
  ]
}
Is there anything I can add to the query to instead get back:
{
  "@odata.context": "http://blah.org/MyService/$metadata#People",
  "value": [
    {
      "Name": "Joe Smith",
      "Age": 55,
      "Employers": [
        [ "Acme", "1/1/1990" ],
        [ "Enron", "1/1/1995" ],
        [ "Amazon", "1/1/1999" ]
      ]
    },
    {
      "Name": "Jane Doe",
      "Age": 30,
      "Employers": [
        [ "Joe's Crab Shack", "1/1/2007" ],
        [ "TGI Fridays", "1/1/2010" ]
      ]
    }
  ]
}
While I could obviously do the transformation client side, in my use case the field names are very large compared to the data, and I would rather not transmit all those names over the wire nor spend the CPU cycles on the client doing the transformation. Before I come up with my own custom parameters to indicate that the format should be as I desire, I wanted to check if there wasn't already a standardized way to do so.
OData provides several options to control the amount of data and metadata included in the response.
In OData v4, you can add odata.metadata=minimal to the Accept header parameters. This is the default behaviour, but even with this it will still include the field names in the response, and for a good reason.
I can see why you want to send only the values without the field names, but keep in mind that this would change the semantic meaning of the response structure and make it less intuitive to deal with as a JSON record on the client side.
So to answer your question: no, there is no standardized way to do this.
Other options to minimize the response size:
You can use the $value option to get the raw value of a single property. Check this example:
services.odata.org/OData/OData.svc/Categories(1)/Products(1)/Supplier/Address/City/$value
You can also use the $select option to cherry-pick only the fields you need, by selecting a subset of properties to include in the response.
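For example, against the hypothetical service from the question (URL shape assumed from its metadata), a $select query could trim each person down to the two scalar fields:

GET http://blah.org/MyService/People?$select=Name,Age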

Filtering nested results in an OData query

I have an OData query returning a bunch of items. The results come back looking like this:
{
  "d": {
    "__metadata": {
      "id": "http://dev.sp.swampland.local/_api/SP.UserProfiles.PeopleManager/GetPropertiesFor(accountName=@v)",
      "uri": "http://dev.sp.swampland.local/_api/SP.UserProfiles.PeopleManager/GetPropertiesFor(accountName=@v)",
      "type": "SP.UserProfiles.PersonProperties"
    },
    "UserProfileProperties": {
      "results": [
        {
          "__metadata": { "type": "SP.KeyValue" },
          "Key": "UserProfile_GUID",
          "Value": "66a0c6c2-cbec-4abb-9e25-cc9e924ad390",
          "ValueType": "Edm.String"
        },
        {
          "__metadata": { "type": "SP.KeyValue" },
          "Key": "ADGuid",
          "Value": "System.Byte[]",
          "ValueType": "Edm.String"
        },
        {
          "__metadata": { "type": "SP.KeyValue" },
          "Key": "SID",
          "Value": "S-1-5-21-2355771569-1952171574-2825027748-500",
          "ValueType": "Edm.String"
        }
      ]
    }
  }
}
In reality, there are a lot of items (100+) coming back in the UserProfileProperties collection, but I'm only looking for the few whose Key matches certain values, and I can't figure out exactly what my filter needs to be. I've tried $filter=UserProfileProperties/Key eq 'SID' but that still gives me everything. I'm also trying to figure out how to pull back multiple items.
Ideas?
I believe you forgot that each of the results has a Key, not the UserProfileProperties object itself, so UserProfileProperties/Key doesn't actually exist. Instead, because results is an array, you must either check a certain position (e.g. results(1)) or use the OData functions any or all.
Try $filter=UserProfileProperties/results/any(r: r/Key eq 'SID') if you want all the profiles where just one of the keys is SID, or use
$filter=UserProfileProperties/results/all(r: r/Key eq 'SID') if you want the profiles where every result has a key equaling SID.
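To pull back items for several keys at once, the lambda body can combine conditions with or; a sketch against the same results shape (the second key name is taken from the sample response):

$filter=UserProfileProperties/results/any(r: r/Key eq 'SID' or r/Key eq 'ADGuid')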