Selecting and filtering Cloudflare pagerules with jq - select

I was trying to work out the best way to filter some pagerules data from Cloudflare, and while I've got a solution, I'm looking at how ugly it is and thinking "there has to be a simpler way to do this."
I'm specifically asking about a better way to achieve the following goal using jq. I understand there are programming libraries I could use to accomplish the same thing, but the point of this question is to get a better understanding of how jq is intended to work.
Say I've got a long list of Cloudflare pagerules records; here are a few entries as a minimal example:
{
  "example.org": [
    {
      "id": "341",
      "targets": [
        {
          "target": "url",
          "constraint": {
            "operator": "matches",
            "value": "http://ng.example.org/*"
          }
        }
      ],
      "actions": [
        {
          "id": "always_use_https"
        }
      ],
      "priority": 12,
      "status": "active",
      "created_on": "2017-11-29T18:07:36.000000Z",
      "modified_on": "2020-09-02T16:09:03.000000Z"
    },
    {
      "id": "406",
      "targets": [
        {
          "target": "url",
          "constraint": {
            "operator": "matches",
            "value": "http://nz.example.org/*"
          }
        }
      ],
      "actions": [
        {
          "id": "always_use_https"
        }
      ],
      "priority": 9,
      "status": "active",
      "created_on": "2017-11-29T18:07:55.000000Z",
      "modified_on": "2020-09-02T16:09:03.000000Z"
    },
    {
      "id": "427",
      "targets": [
        {
          "target": "url",
          "constraint": {
            "operator": "matches",
            "value": "nz.example.org/*"
          }
        }
      ],
      "actions": [
        {
          "id": "ssl",
          "value": "flexible"
        }
      ],
      "priority": 8,
      "status": "active",
      "created_on": "2017-11-29T18:08:00.000000Z",
      "modified_on": "2020-09-02T16:09:03.000000Z"
    }
  ]
}
What I want to do is extract the URLs nested in the constraint.value fields of the rules whose actions include always_use_https, and return them as a JSON array. What I came up with is this:
jq '[
  [
    [
      [
        .[] | .[] | select(.actions[].id | contains("always_use_https"))
      ] | .[].targets[] | select(.target | contains("url"))
    ] | .[] | .constraint | select(.operator | contains("matches"))
  ] | .[].value
]'
Against our example this produces:
[
  "http://ng.example.org/*",
  "http://nz.example.org/*"
]
Is there a more succinct way to achieve this in jq?

This produces the expected output in accordance with the criteria as I understand them:
jq '.["example.org"]
    | map( select(any(.actions[]; .id == "always_use_https"))
           | .targets[]
           | select(.target == "url")
           | .constraint.value )
   ' cloudflare.json
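If the zone name isn't known up front, the same approach can be applied across every top-level key. A variant you could try (an untested sketch, assuming all zones in the file should be scanned, and preferring equality tests over contains, which would also match substrings such as a hypothetical "url_path" target):
jq '[
  .[][]
  | select(any(.actions[]; .id == "always_use_https"))
  | .targets[]
  | select(.target == "url")
  | .constraint.value
]' cloudflare.json
The outer [ ... ] collects the stream of matching values into a single JSON array, exactly as in the four-bracket version above.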

Related

PowerShell: Iterate through multidimensional array of hashtables to find a match and combine values from both arrays

I need to combine values from 2 JSONs: if there is a match in alert IDs, I need to create a structure that takes data from both JSONs.
The result for a match should look like:
$array = @()
$hashtable = @{}
$hashtable.AlertID       # does not matter which JSON it is from
$hashtable.Tags          # from JSON 1
$hashtable.IncidentName  # from JSON 2
$hashtable.IncidentID    # from JSON 2
$array += $hashtable
I would prefer if this were done with a C-style PowerShell loop, i.e.
for ($x = 0; $x -lt $array.count; $x++)
JSON 1:
[
  {
    "Status": "Active",
    "IncidentId": "3",
    "tags": "SINC0008009",
    "AlertId": [
      "da637563185629568182_-638872186",
      "da637563185631732095_1120592736",
      "da637563185706412029_-614525914",
      "da637563185760439486_-276692370",
      "da637563185856325888_-1949235651",
      "da637563186785996176_2128073884",
      "da637563186789897000_1239551047",
      "da637563186806513555_1512241399",
      "da637563193194338043_-244132089"
    ],
    "severity": "Medium"
  },
  {
    "Status": "Active",
    "IncidentId": "4",
    "tags": "SINC0008008",
    "AlertId": [
      "da637643650725801726_1735022501",
      "da637643650741237104_1473290917",
      "da637643650748739479_-40211355",
      "da637643652767933265_-1887823168",
      "da637643670830160376_-443360743"
    ],
    "severity": "Medium"
  },
  {
    "Status": "Active",
    "IncidentId": "2",
    "tags": null,
    "AlertId": [
      "caD76232A5-F386-3C5D-94CD-7C82A7F778DC"
    ],
    "severity": "Medium"
  },
  {
    "Status": "Active",
    "IncidentId": "1",
    "tags": null,
    "AlertId": [
      "ca6534FF45-D62A-3FB7-BD6B-FF5029C553DB"
    ],
    "severity": "Medium"
  }
]
JSON 2:
{
  "value": [
    {
      "incidentId": 3,
      "incidentName": "Multi-stage incident involving Initial access & Discovery on one endpoint",
      "status": "Active",
      "severity": "Medium",
      "tags": ["SINC0000001"],
      "comments": [],
      "alerts": [
        {
          "alertId": "da637563185629568182_-638872186",
          "incidentId": 3,
          "description": "A suspicious PowerShell activity was observed on the machine. ",
          "status": "New",
          "severity": "Medium",
          "devices": [
            {
              "deviceDnsName": "xxxxx"
            }
          ],
          "entities": [
            {
              "entityType": "User",
              "accountName": "xxxxxx",
              "userPrincipalName": "xxx@xx.xx"
            },
            {
              "entityType": "Process"
            },
            {
              "entityType": "Process",
              "verdict": "Suspicious"
            },
            {
              "entityType": "File"
            }
          ]
        },
        {
          "alertId": "da637563185631732095_1120592736",
          "incidentId": 3,
          "devices": [
            {
              "osPlatform": "Windows10",
              "version": "1909"
            }
          ],
          "entities": [
            {
              "entityType": "User",
              "remediationStatus": "None"
            }
          ]
        }
      ]
    },
    {
      "incidentId": 4,
      "incidentName": "Multi-stage incident involving Initial access & Discovery on one endpoint",
      "status": "Active",
      "severity": "Medium",
      "tags": ["SINC0000002"],
      "comments": [],
      "alerts": [
        {
          "alertId": "da637563185629568182_-638872186",
          "incidentId": 3,
          "description": "A suspicious PowerShell activity was observed on the machine. ",
          "status": "New",
          "severity": "Medium",
          "devices": [
            {
              "deviceDnsName": "xxxxx"
            }
          ],
          "entities": [
            {
              "entityType": "User",
              "accountName": "xxxxxx",
              "userPrincipalName": "xxx@xx.xx"
            },
            {
              "entityType": "Process"
            },
            {
              "entityType": "Process",
              "verdict": "Suspicious"
            },
            {
              "entityType": "File"
            }
          ]
        },
        {
          "alertId": "da637563185631732095_1120592736",
          "incidentId": 3,
          "devices": [
            {
              "osPlatform": "Windows10",
              "version": "1909"
            }
          ],
          "entities": [
            {
              "entityType": "User",
              "remediationStatus": "None"
            }
          ]
        }
      ]
    }
  ]
}
Till now I was looking into using a nested foreach loop to address it, but it does not behave the way I want. I am looking for a for loop so I can use the indexes.
Instead of creating an array of Hashtables, I think it's better to create an array of PsCustomObjects, because outputting the result to console/file/JSON is a lot easier that way.
$json1 = Get-Content -Path 'X:\json1.json' -Raw | ConvertFrom-Json
$json2 = Get-Content -Path 'X:\json2.json' -Raw | ConvertFrom-Json

$result = foreach ($incident in $json1) {
    foreach ($alertId in $incident.AlertId) {
        $json2.value | Where-Object { $_.alerts.alertId -eq $alertId } | ForEach-Object {
            # output an object with the wanted properties
            [PsCustomObject]@{
                AlertID      = $alertId          # from json1
                Tags         = $incident.Tags    # from json1
                IncidentName = $_.incidentName   # from json2
                IncidentID   = $_.incidentId     # from json2
            }
        }
    }
}
# output on screen
$result | Format-Table -AutoSize # or use Out-GridView
# output to new JSON
$result | ConvertTo-Json
# output to CSV file
$result | Export-Csv -Path 'X:\incidents.csv' -NoTypeInformation
Using your examples, the output to console window is:
AlertID Tags IncidentName IncidentID
------- ---- ------------ ----------
da637563185629568182_-638872186 SINC0008009 Multi-stage incident involving Initial access & Discovery on one endpoint 3
da637563185629568182_-638872186 SINC0008009 Multi-stage incident involving Initial access & Discovery on one endpoint 4
da637563185631732095_1120592736 SINC0008009 Multi-stage incident involving Initial access & Discovery on one endpoint 3
da637563185631732095_1120592736 SINC0008009 Multi-stage incident involving Initial access & Discovery on one endpoint 4
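Since the question explicitly asks for a C-style loop, here is an index-based variant of the same matching logic (an untested sketch using the same files and property names as above; it produces the same objects as the foreach version):
$result = for ($i = 0; $i -lt $json1.Count; $i++) {
    for ($j = 0; $j -lt $json2.value.Count; $j++) {
        # compare each alert id of incident $i against the alerts of entry $j
        foreach ($alertId in $json1[$i].AlertId) {
            if ($json2.value[$j].alerts.alertId -contains $alertId) {
                [PsCustomObject]@{
                    AlertID      = $alertId                       # from json1
                    Tags         = $json1[$i].tags                # from json1
                    IncidentName = $json2.value[$j].incidentName  # from json2
                    IncidentID   = $json2.value[$j].incidentId    # from json2
                }
            }
        }
    }
}
The inner foreach remains because each incident carries an array of alert IDs; the two C-style loops provide the indexes the question asked for.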

Obtaining raw user and session data

So, I'm really new to working with the Google Analytics API. I have managed to make this request work:
{
  "dateRange": {
    "startDate": "2021-01-08",
    "endDate": "2021-05-05"
  },
  "activityTypes": [
    "GOAL"
  ],
  "user": {
    "type": "CLIENT_ID",
    "userId": "2147448080.1620199617"
  },
  "viewId": "1556XXX89"
}
such that I get back a JSON response like:
{
  "sessions": [
    {
      "sessionId": "1620199614",
      "deviceCategory": "mobile",
      "platform": "Android",
      "dataSource": "web",
      "activities": [
        {
          "activityTime": "2021-05-05T07:53:08.366983Z",
          "source": "(direct)",
          "medium": "(none)",
          "channelGrouping": "Direct",
          "campaign": "(not set)",
          "keyword": "(not set)",
          "hostname": "somewebsite.com",
          "landingPagePath": "/client/loginorcreate/login",
          "activityType": "GOAL",
          "customDimension": [
            {
              "index": 1
            },
            {
              "index": 2
            },
            {
              "index": 3,
              "value": "59147"
            }
          ],
          "goals": {
            "goals": [
              {
                "goalIndex": 1,
                "goalCompletions": "1",
                "goalCompletionLocation": "/order/registerorder/postregister.html",
                "goalPreviousStep1": "page-z",
                "goalPreviousStep2": "page-y",
                "goalPreviousStep3": "page-x",
                "goalName": "order"
              }
            ]
          }
        }
      ],
      "sessionDate": "2021-05-05"
    }
  ],
  "totalRows": 1,
  "sampleRate": 1
}
Now, ideally, I would get a response in a different format, and more importantly one where I don't have to specify each individual client ID. Is there a request format I can build which would return something like:
clientID1 | activityTime | sessionId | activities
clientID2 | activityTime | sessionId | activities
clientID3 | activityTime | sessionId | activities
Thanks!
The Google Analytics API returns data in the JSON format you are currently seeing. How you format that data is up to you; it must, however, be done locally by you.
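As an illustration of that local reformatting, here is a minimal jq sketch that flattens the response above into pipe-separated rows. Note that the client ID is not present in the response body, only in the request, so it has to be supplied from outside (the $cid value and the response.json filename here are hypothetical):
jq -r --arg cid "2147448080.1620199617" '
  .sessions[] as $s
  | $s.activities[]
  | [$cid, .activityTime, $s.sessionId, .activityType]
  | join(" | ")
' response.json
Since the User Activity API accepts a single user per request, covering many client IDs means issuing one request per ID and concatenating the resulting rows.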

Sum(distinct metric) in apache druid

How do we write sum(distinct col) in Druid? If I try to write it, Druid says plans can't be built. I tried to convert it to a subquery approach, but my inner query returns a lot of item-level data, hence timing out.
A distinct count or sum is not something Druid supports by default.
There are, however, several methods that give you a similar result.
Option 1. Theta Sketch extension (recommended)
If you enable the Theta Sketch extension (see https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html), you can use it to get the same result.
Example:
{
  "queryType": "groupBy",
  "dataSource": "hits",
  "intervals": [
    "2020-08-14T11:00:00.000Z/2020-08-14T12:00:00.000Z"
  ],
  "dimensions": [],
  "granularity": "all",
  "aggregations": [
    {
      "type": "thetaSketch",
      "name": "domain",
      "fieldName": "domain"
    }
  ]
}
Result:
+--------+
| domain |
+--------+
| 22     |
+--------+
Option 2: cardinality
The cardinality() aggregation computes the cardinality of a set of Apache Druid dimensions, using HyperLogLog to estimate the cardinality.
Example:
{
  "queryType": "groupBy",
  "dataSource": "hits",
  "intervals": [
    "2020-08-14T11:00:00.000Z/2020-08-14T12:00:00.000Z"
  ],
  "dimensions": [],
  "granularity": "all",
  "aggregations": [
    {
      "type": "cardinality",
      "name": "domain",
      "fields": [
        {
          "type": "default",
          "dimension": "domain",
          "outputType": "string",
          "outputName": "domain"
        }
      ],
      "byRow": false,
      "round": false
    }
  ]
}
Response:
+-----------------+
| domain          |
+-----------------+
| 22.119017166376 |
+-----------------+
Option 3. use hyperUnique
This option requires that you keep track of the counts at indexing time. If you have set that up, you can use this in your query:
{
  "queryType": "groupBy",
  "dataSource": "hits",
  "intervals": [
    "2020-08-14T11:00:00.000Z/2020-08-14T12:00:00.000Z"
  ],
  "dimensions": [],
  "granularity": "all",
  "aggregations": [
    {
      "type": "hyperUnique",
      "name": "domain",
      "fieldName": "domain",
      "isInputHyperUnique": false,
      "round": false
    }
  ],
  "context": {
    "groupByStrategy": "v2"
  }
}
As I have no hyperUnique metric in my data set, I have no exact example response.
This page explains this method very well: https://blog.mshimul.com/getting-unique-counts-from-druid-using-hyperloglog/
Conclusion
In my opinion the Theta Sketch extension is the best and easiest way to get the result. Please read the documentation carefully.
If you are a PHP user, you could take a look at these; maybe they help:
https://github.com/level23/druid-client#hyperunique
https://github.com/level23/druid-client#cardinality
https://github.com/level23/druid-client#distinctcount
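For completeness: if you query Druid through its SQL layer instead of native JSON queries, the Theta Sketch approach from option 1 is exposed as an aggregation function. A sketch, assuming the datasketches extension is loaded and reusing the dataSource/column names from the examples above:
SELECT APPROX_COUNT_DISTINCT_DS_THETA("domain") AS distinct_domains
FROM "hits"
WHERE "__time" >= TIMESTAMP '2020-08-14 11:00:00'
  AND "__time" <  TIMESTAMP '2020-08-14 12:00:00'
As with the native queries, this returns an approximate distinct count rather than an exact one.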

VPCDHCPOptionsAssociation encountered unsupported property DHCPOptionsId

When I'm trying to create the stack, it throws the above error. I've checked and I'm using the correct property value.
I've tried adding a "DependsOn" for DHCPOptions. I've tried using Fn::GetAtt for the DHCPOptions. None have proved successful.
"DHCPOptions": {
"Type": "AWS::EC2::DHCPOptions",
"Properties": {
"DomainName": { "Ref": "DNSName" },
"DomainNameServers": [ "AmazonProvidedDNS" ],
"Tags": [{
"Key": "Name",
"Value": {
"Fn::Sub": "${VPCStackName}-DHCPOPTS"
}
}]
}
},
"VPCDHCPOptionsAssociation": {
"Type": "AWS::EC2::VPCDHCPOptionsAssociation",
"DependsOn": "DHCPOptions",
"Properties": {
"VpcId": { "Ref": "TestVPC" },
"DHCPOptionsId": { "Ref": "DHCPOptions" }
}
},
Expecting to pass the DHCPOptionsId from the DHCPOptions.
I've found the issue. It was a simple casing error: the property should be DhcpOptionsId, not DHCPOptionsId.
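With the casing corrected, the association resource looks like this. The "DependsOn" can be dropped as well, since { "Ref": "DHCPOptions" } already creates an implicit dependency on the DHCPOptions resource:
"VPCDHCPOptionsAssociation": {
  "Type": "AWS::EC2::VPCDHCPOptionsAssociation",
  "Properties": {
    "VpcId": { "Ref": "TestVPC" },
    "DhcpOptionsId": { "Ref": "DHCPOptions" }
  }
},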

Configuring wch3 hash function in mcrouter

I came across this page discussing the different hash functions mcrouter can use, but I could not find an example of how a hash function can be configured if you do not want to use the default ch3. In my case, I would like to use "wch3" with a balanced (50/50) weight between two nodes in a pool. How can I manually change the default to configure wch3?
Thanks in advance.
Here is an example that can help you:
{
  "pools": {
    "cold": {
      "servers": [
        "memc_1:11211",
        "memc_2:11211"
      ]
    },
    "warm": {
      "servers": [
        "memc_11:11211",
        "memc_12:11211"
      ]
    }
  },
  "route": {
    "type": "OperationSelectorRoute",
    "operation_policies": {
      "get": {
        "type": "WarmUpRoute",
        "cold": "PoolRoute|cold",
        "warm": "PoolRoute|warm",
        "exptime": 20
      }
    },
    "default_policy": {
      "type": "AllSyncRoute",
      "children": [
        {
          "type": "PoolRoute",
          "pool": "cold",
          "hash": {
            "hash_func": "WeightedCh3",
            "weights": [1, 1]
          }
        },
        {
          "type": "PoolRoute",
          "pool": "warm",
          "hash": {
            "hash_func": "WeightedCh3",
            "weights": [1, 1]
          }
        }
      ]
    }
  }
}
You can adjust the weight in the range [0.0, 1.0].
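For the balanced (50/50) case asked about, any two equal weights produce an even split between the two servers. A minimal PoolRoute sketch (assuming a two-server pool named "cold" as in the example above):
{
  "type": "PoolRoute",
  "pool": "cold",
  "hash": {
    "hash_func": "WeightedCh3",
    "weights": [0.5, 0.5]
  }
}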