How do I filter a 'reduce inputs' over a large stream of objects?

I use the following jq filter to build a map of unique keys, each holding an aggregate count and a total duration. Currently it runs on every input via 'reduce inputs':
reduce inputs as $r
({};
("Pipeline:" + $r.m."topic.type") as $topic
| ("Channel:" + $r.channel) as $channel
| ("Campaign:" + $r.campaign) as $campaign
| ("Cellcode:" + $r.cellcode) as $cellcode
| ("Tracking:" + $r.tracking) as $tracking
| ("Template:" + $r.m."template.id") as $template
| ("Event:" + $r.name) as $event
| ("Reason:" + $r.reason) as $reason
| ($r.duration|tonumber) as $duration
| (($topic + ":" + $channel + ":" + $campaign + ":" + $cellcode + ":" + $tracking + ":" + $template + ":" + $event + ":" + $reason) as $key
| .[$key][0] += 1 | .[$key][1] += $duration)
)
I cannot figure out where to put a select() filter so that I do my reduce across only those entries that pass a 'select($r.type == "AUDIT_CHANNEL")' check, in order to skip the 2 "type":"AUDIT_SYSTEM" events in this test data:
{"type":"AUDIT_CHANNEL","name":"DROPPED","reason":"INVALID_MAIL_META_DATA","start":"1472083067058","duration":"91","end":"1472083067149","dc":"dev","pool":"raptor-app","host.name":"L-SEA-10002721","host.ip":"10.236.67.80","rlogid":"tfsqiu.dvw9%3FJ*P%40G*25671246-156befd00b2-0x293","channel":"EMAIL","m":{"audited":"1472083067058","created":"1472083066974","enabled":"true","entity.common.version":"1","template.id":"2840df6d-d9e8-4f27-e8b5-918c122d4561","template.version":"17","topic.curname":"eddude-default-topic","topic.curtype":"DEFAULT","topic.dc":"LVS","topic.name":"eddude-default-topic","topic.part":"5","topic.type":"DEFAULT"},"id":"0AEC4350-1C6E2FC9B80-0156BEF9ED92-0000000000000003","campaign":"999","contract":"a5872a5c-8912-dd63-583f-61fa8db3efde","user":1276847275,"cellcode":"","age":"175"}
{"type":"AUDIT_SYSTEM","name":"ROTATED","start":"1472083081033","duration":"0","end":"1472083081033","dc":"dev","pool":"raptor-app","host.name":"L-SEA-10002721","host.ip":"10.236.67.80","rlogid":"tfsqiu.dvw9%3FJ*P%40G*25671246-156befd3749-0xce"}
{"type":"AUDIT_SYSTEM","name":"ROTATED","start":"1472083141034","duration":"0","end":"1472083141034","dc":"dev","pool":"raptor-app","host.name":"L-SEA-10002721","host.ip":"10.236.67.80","rlogid":"tfsqiu.dvw9%3FJ*P%40G*25671246-156befe21aa-0xce"}
{"type":"AUDIT_CHANNEL","name":"RECEIVED","start":"1472083158860","duration":"109","end":"1472083158969","dc":"dev","pool":"raptor-app","host.name":"L-SEA-10002721","host.ip":"10.236.67.80","rlogid":"tfsqiu.dvw9%3FJ*P%40G*25671246-156befe674c-0x10f","channel":"EMAIL","m":{"audited":"1472083158860","created":"1472083158860","enabled":"true","entity.common.version":"1","template.id":"2840df6d-d9e8-4f27-e8b5-918c122d4561","template.version":"17","topic.curname":"eddude-default-topic","topic.curtype":"DEFAULT","topic.dc":"LVS","topic.name":"eddude-default-topic","topic.part":"5","topic.type":"DEFAULT"},"id":"0AEC4350-1C6E2FC9B80-0156BEF9ED92-0000000000000004","campaign":"999","contract":"a5872a5c-8912-dd63-583f-61fa8db3efde","user":1276847275,"cellcode":"","age":"109"}
I tried putting it in front of the reduce, inside the reduce, etc., but I don't get the desired output, which is:
{
"Pipeline:DEFAULT:Channel:EMAIL:Campaign:999:Cellcode::Tracking::Template:2840df6d-d9e8-4f27-e8b5-918c122d4561:Event:DROPPED:Reason:INVALID_MAIL_META_DATA": [
1,
91
],
"Pipeline:DEFAULT:Channel:EMAIL:Campaign:999:Cellcode::Tracking::Template:2840df6d-d9e8-4f27-e8b5-918c122d4561:Event:RECEIVED:Reason:": [
1,
109
]
}
Do I have to perform filtering totally outside of the reduce run, or am I just not aware of how to do this with a single filter-and-reduce?
Btw, assume this input is a giant stream of millions of records, with a few hundred unique "keys" that get calculated for accumulating into.

inputs emits every remaining input, one at a time. Since you want to filter those inputs by type, apply select to the stream that reduce consumes:
reduce (inputs | select(.type == "AUDIT_CHANNEL")) as $r ...
I would write your filter like so:
reduce (inputs | select(.type == "AUDIT_CHANNEL")) as $r ({};
([
"Pipeline", $r.m."topic.type",
"Channel", $r.channel,
"Campaign", $r.campaign,
"Cellcode", $r.cellcode,
"Tracking", $r.tracking,
"Template", $r.m."template.id",
"Event", $r.name,
"Reason", $r.reason
] | join(":")) as $key
| .[$key] |= [ .[0]+1, .[1]+($r.duration|tonumber) ]
)


(Powershell) Does powershell have some weird parentheses/bracket balance rules?

My PowerShell acts very strangely when I try to make a function. To demonstrate, here is a basic function that, I dunno, calculates the quadratic formula:
1 function Get-Quadratic {
2
3 [CmdletBinding()]
4
5 param (
6 [Parameter(Position = 0, Mandatory = $true)]
7 [int32]$a
8 [Parameter(Position = 1, Mandatory = $true)]
9 [int32]$b
10 [Parameter(Position = 2, Mandatory = $true)]
11 [int32]$c
12 )
13
14 Write-Output $a + " " + $b + " " + $c
15 }
And if you were to try loading this in with . .\quadratic.ps1, you get:
At C:\Users\(me)\Desktop\Cur\playground\quadratic.ps1:7 char:11
+ [int32]$a
+ ~
Missing ')' in function parameter list.
At C:\Users\(me)\Desktop\Cur\playground\quadratic.ps1:1 char:24
+ function Get-Quadratic {
+ ~
Missing closing '}' in statement block or type definition.
At C:\Users\(me)\Desktop\Cur\playground\quadratic.ps1:12 char:1
+ )
+ ~
Unexpected token ')' in expression or statement.
At C:\Users\(me)\Desktop\Cur\playground\quadratic.ps1:15 char:1
+ }
+ ~
Unexpected token '}' in expression or statement.
+ CategoryInfo : ParserError: (:) [], ParseException
+ FullyQualifiedErrorId : MissingEndParenthesisInFunctionParameterList
I have followed tutorials down to the pixel when trying to make functions, but my powershell doesn't seem to like even the most basic commands. If it helps, my version is
Major Minor Build Revision
----- ----- ----- --------
5 1 19041 1320
Abraham Zinala provided the crucial pointer with respect to your code's primary problem:
The individual parameter declarations inside a param(...) block must be ,-separated.
While the conceptual about_Functions help topic does state this requirement, it may not be obvious, given that in other contexts (@(...), $(...) and @{ ... }) separating elements by newlines alone is sufficient. In fact, there is an existing feature request to make the use of , optional - see GitHub issue #8957.
The secondary problem is that Write-Output $a + " " + $b + " " + $c won't work as you expect:
Expressions - such as string concatenation with the + operator in your case - require enclosure in (...) in order for their result to be passed as a single command argument - see this answer.
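Putting both fixes together, a corrected version of the function might look like the sketch below. It assumes the goal is only to echo the three values separated by spaces. An expandable string is used instead of + because, with an [int32] as the left-hand operand, + attempts numeric addition rather than string concatenation.

```powershell
function Get-Quadratic {

    [CmdletBinding()]

    param (
        # Each parameter declaration is followed by a comma, except the last one.
        [Parameter(Position = 0, Mandatory = $true)]
        [int32]$a,
        [Parameter(Position = 1, Mandatory = $true)]
        [int32]$b,
        [Parameter(Position = 2, Mandatory = $true)]
        [int32]$c
    )

    # The expandable string is passed to Write-Output as a single argument.
    Write-Output "$a $b $c"
}

Get-Quadratic 1 2 3   # 1 2 3
```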

formatting phone numbers using powershell

I'm new to scripting. I am trying to create a PowerShell script that takes all the users in my Active Directory and formats all of their phone numbers the same way: +90 (XXX) XXX XX XX
For example, +901111111111 should become +90 (111) 111 11 11
if your numbers are all the same length & pattern, then the -f string format operator with a format pattern would do the job. [grin] like this ...
$InString = '+901111111111'
$OutPattern = '+## (###) ### ## ##'
$OutString = "{0:$OutPattern}" -f [int64]($InString.Trim('+'))
$OutString
output = +90 (111) 111 11 11
If all numbers have the same length, this quick solution would solve the problem. (The variable is named $number rather than $input, because $input is an automatic variable in PowerShell.)
$number = "+901234567890"
$formatted = $number.Substring(0,3) +
" (" +
$number.Substring(3,3) +
") " +
$number.Substring(6,3) +
" " +
$number.Substring(9,2) +
" " +
$number.Substring(11,2)
# $formatted should be "+90 (123) 456 78 90"
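If some values might not match the expected shape, a regex-based -replace is another option. This is not taken from the answers above; it is a sketch that assumes every valid number is "+90" followed by exactly ten digits. Strings that do not match the pattern are returned unchanged, which makes malformed entries easy to spot afterwards.

```powershell
$number = '+901234567890'

# Capture the ten digits after the country code in groups of 3, 3, 2 and 2.
$pattern = '^\+90(\d{3})(\d{3})(\d{2})(\d{2})$'

# Single quotes keep $1..$4 as regex group references instead of PowerShell variables.
$formatted = $number -replace $pattern, '+90 ($1) $2 $3 $4'

$formatted   # +90 (123) 456 78 90
```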

Split an array based on value

This is my first question here, so sorry if I make any mistakes posting this.
I'm trying to split an array based on its values. Basically I want to create two arrays whose values are as close to the average as possible. I managed to do this with this code:
function Sum($v) {
[Linq.Enumerable]::Sum([int64[]]$v)
}
$arr = 0..9 | % {get-random -min 1 -max 10}
"ARRAY:" + $arr
"SUM: " + (sum $arr)
"AVG: " + (sum $arr)/2
# start of the code that matters
$wavg = (sum $arr)/2
foreach ($i in (0..($arr.Count-1))) {
$wavg -= $arr[$i]
if ($wavg -le 0) {
$i-=(-$wavg -gt $arr[$i]/2);break
}
}
"SPLIT INDEX: " + $i
"ARR1: " + $arr[0..$i] + " (" + $(sum $arr[0..$i]) + ")"
"ARR2: " + $arr[($i+1)..$arr.Count] + " (" + $(sum $arr[($i+1)..$arr.Count]) + ")"
The reason my foreach is structured this way is because in my actual code the values are in an index hash and are accessed as $index[$arr[$i]].
This means that the resulting two arrays could be of unequal size (it would be easy if I could just split the array in half). Sample output of my code to demonstrate this:
ARRAY: 5 3 6 3 2 3 6 3 1 3
SUM: 35
AVG: 17.5
SPLIT INDEX: 3
ARR1: 5 3 6 3 (17)
ARR2: 2 3 6 3 1 3 (18)
The code works as is, but I feel it could be done in a more elegant and speedier way. Because I need to execute this code a few thousand times in my script I want it to be as fast as possible.

PowerShell HashTable - self referencing during initialization

I have a theoretical problem: how to reference a hash table during its initialization, for example, to compute a member based on other members that have already been stated.
Remove-Variable myHashTable -ErrorAction Ignore
$myHashTable =
@{
One = 1
Two= 2
Three = ??? # following expressions do not work
# $This.One + $This.Two or
# $_.One + $_.Two
# $myHashTable.One + $myHashTable.Two
# ????
}
$myHashTable.Three -eq 3 # make this $true
Any ideas how to do it? Is it actually possible?
Edit:
This was my solution:
$myHashTable =
@{
One = 1
Two= 2
}
$myHashTable.Three = $myHashTable.One + $myHashTable.Two
This won't be possible using the object initializer syntax I'm afraid. While it is possible to use variables, you'll have to compute the values before creating the object.
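A minimal sketch of that approach, assuming the dependent value is a simple sum; the variable names $one and $two are only illustrative:

```powershell
# Compute the shared values first, in ordinary variables.
$one = 1
$two = 2

# The initializer can then use those variables, including for derived members.
$myHashTable = @{
    One   = $one
    Two   = $two
    Three = $one + $two
}

$myHashTable.Three -eq 3   # True
```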
I cannot recommend this, but you can iterate the initializer twice or more:
(0..1) | %{
$a = @{
One = 1
Two = $a.One + 1
}
}
(0..2) | %{
$b = @{
One = 1
Two = $b.One + 1
Three = $b.Two + 1
}
}
Make sure all calculations are idempotent, i.e. do not depend on a number of iterations.
You can also resort to this when the hashtable is very long and can be defined in just two or three passes. It works fine:
$AAA = @{
DAT = "C:\MyFolderOfDats"
EXE = "C:\MyFolderOfExes"
}
$AAA += @{
Data = $AAA.DAT + "\#Links"
Scripts = $AAA.EXE + "\#Scripts"
ScriptsX = $AAA.EXE + "\#ScriptsX"
}
Note that the second part just adds ( += ) more items to the first part, but by then the items from the first part already exist and can be referenced.

How to create a script that would calculate the difference between samples in a list?

I am trying to create a script that would calculate the difference between samples in a list.
If we take this example:
- result1 = 33
- result2 = 45
- result3 = 66
- result4 = 47
- result"n" = 50
The calculation should start at the second result in the list and continue down to the last result, taking the absolute difference from the previous result each time, and then sum up those differences:
result2 - result1 = 12,
result3 - result2 = 21,
result4 - result3 = 19,
result"n" - result4= 3
sum = 12 + 21 + 19 + 3 = 55
I am new to scripting, and so far I have only come up with this solution:
$numbers
$1=[math]::abs($numbers[0]-$numbers[1])
$2=[math]::abs($numbers[1]-$numbers[2])
$3=[math]::abs($numbers[2]-$numbers[3])
$4=[math]::abs($numbers[3]-$numbers[4])
write-host "the results = $1, $2, $3, $4"
$sum = $1 + $2 + $3 + $4
The problem is that the list is dynamic and changes in length, one time there are 10 results and one time 20 for example.
I found a similar question here, but I don't know how to apply the solution to my case, as it is too complicated for me.
What you need is a For loop. It is structured as such:
For(<initial declaration, usually a start point like $i = 0>; <Condition to stop when false>;<Action to perform on each iteration to progress loop>){
Code to perform on each loop
}
For you we would do something like:
For($i=1;$i -lt $numbers.count;$i++)
That starts at 1, and since arrays start at 0 this gets you going with the second record. The condition uses -lt rather than -le so that the loop stops at the last valid index, $numbers.count - 1; otherwise one extra, bogus difference against a nonexistent element would be added. Then in the scriptblock we do something like:
{
[array]$Results += [math]::abs($numbers[$i] - $numbers[($i-1)])
}
That will get the differences for you, then to display them you can do something like:
"the results = " + ($Results -join ", ")
$sum = $Results|Measure -sum|select -expand Sum
So you put that all together and get
For($i=1;$i -lt $numbers.count;$i++){
[array]$Results += [math]::abs($numbers[$i] - $numbers[($i-1)])
}
"the results = " + ($Results -join ", ")
$sum = $Results|Measure -sum|select -expand Sum
Use a for loop, use the length of your $numbers array to know when to stop.
$numbers = @(33,45,66,47,50)
$sum = 0
for($cur=1;$cur -lt $numbers.Length; $cur += 1){
$sum += [math]::abs($numbers[$cur]-$numbers[$cur-1]);
}
$sum
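Since the list length varies between runs, the loop could also be wrapped in a small function that returns both the individual differences and their sum. This is only a sketch; the function name Get-DifferenceSummary and its property names are hypothetical, not part of the answers above.

```powershell
function Get-DifferenceSummary {
    param ([double[]]$Numbers)

    $differences = @()
    $sum = 0

    # Start at the second element and compare each value with its predecessor.
    for ($i = 1; $i -lt $Numbers.Length; $i++) {
        $difference = [math]::Abs($Numbers[$i] - $Numbers[$i - 1])
        $differences += $difference
        $sum += $difference
    }

    # Return both the per-step differences and their total.
    [pscustomobject]@{
        Differences = $differences
        Sum         = $sum
    }
}

$summary = Get-DifferenceSummary 33, 45, 66, 47, 50
"the results = " + ($summary.Differences -join ", ")   # the results = 12, 21, 19, 3
$summary.Sum                                           # 55
```

The same call works unchanged for a list of 10 or 20 results, because the loop bound comes from the length of the array that is passed in.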