Getting the average of each column from a CSV - PowerShell

I am trying to get the average of all the columns in a CSV, grouped by timestamp. The object type is System.Array. Whenever I try to convert the values to integers, it throws an error.
timestamp streams TRP A  B   C   D
6/4/2019  6775    305 56 229 132 764
6/4/2019  6910    316 28 356 118 134
6/4/2019  6749    316 54 218 206 144
6/5/2019  5186    267 84 280 452 258
6/5/2019  5187    240 33 436 455 245
6/5/2019  5224    291 21 245 192 654
6/6/2019  5254    343 42 636 403 789
6/6/2019  5180    252 23 169 328 888
6/6/2019  5181    290 32 788 129 745
6/6/2019  5244    328 44 540 403 989
I got help from Lee_Dailey with the code below, which tries to produce the average of each column grouped by timestamp, but I get this error:
Cannot convert value " " to type "System.Int32".
Error: "Index was outside the bounds of the array."
+ ... l = [Math]::Round(($GIS_Item.Group.$TPL_Item.ForEach({[int]$_}) | Mea ...
+ ~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
$InStuff = Import-Csv 'M:\MyDoc\script\logfiles\Output_18Mar\streams_E1WAF2_OUTPUT.csv'
$TargetPropertyList = $InStuff[0].PSObject.Properties.Name.Where({$_ -ne 'TimeStamp'})
$GroupedInStuff = $InStuff | Group-Object -Property TimeStamp
$Results = foreach ($GIS_Item in $GroupedInStuff) {
    $HighestValues = [ordered]@{
        TimeStamp = $GIS_Item.Name
    }
    foreach ($TPL_Item in $TargetPropertyList) {
        $TempHiVal = [Math]::Round(($GIS_Item.Group.$TPL_Item.ForEach({[int]$_}) | Measure-Object -Average).Average)
        $HighestValues.Add($TPL_Item, $TempHiVal)
    }
    [PSCustomObject]$HighestValues
}
$Results = $Results | Sort-Object -Property {[DateTime]$_.TimeStamp}
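The `Cannot convert value " "` part of the error suggests that some fields in the CSV contain only whitespace (for example from a trailing delimiter), which [int] cannot parse. A minimal guard, as a sketch reusing the variable names above, filters those out before casting:

foreach ($TPL_Item in $TargetPropertyList) {
    # keep only fields that contain a digit before the [int] cast
    $CleanValues = $GIS_Item.Group.$TPL_Item.Where({$_ -match '\d'}).ForEach({[int]$_})
    $HighestValues.Add($TPL_Item, [Math]::Round(($CleanValues | Measure-Object -Average).Average))
}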

here's a way to deal with the somewhat damaged CSV file you presented in the new data set. [grin] it reads the file in as plain text, removes the spaces, and trims the final |.
i was wondering if having only one date in a grouping was a problem, so i added a final line to the data set that has a different date.
# fake reading in a defective CSV file as plain text
#    in real life, use Get-Content
$InStuff = @'
timestamp|abc | A | B | C | D | E | F | G |
6/4/2019 |6775 | 3059 | 4 | 2292 | 1328 | 764 | 0 | 0 |
6/4/2019 |6910 | 3167 | 28 | 3568 | 1180 | 1348 | 0 | 0 |
6/4/2019 |6749 | 3161 | 0 | 2180 | 2060 | 1440 | 0 | 28 |
6/5/2019 |6738 | 3118 | 4 | 2736 | 1396 | 984 | 0 | 0 |
6/5/2019 |6718 | 3130 | 12 | 3076 | 1008 | 452 | 0 | 4 |
6/5/2019 |6894 | 3046 | 4 | 2284 | 1556 | 624 | 0 | 0 |
1/1/2021 |1111 | 2222 | 3 | 4444 | 5555 | 666 | 7 | 8 |
'@ -split [System.Environment]::NewLine

$CleanedInStuff = $InStuff.ForEach({$_.Replace(' ', '').Trim('|')}) |
    ConvertFrom-Csv -Delimiter '|'

$TargetPropertyList = $CleanedInStuff[0].PSObject.Properties.Name.
    Where({
        $_ -ne 'TimeStamp'
    })

$GroupedCIS = $CleanedInStuff |
    Group-Object -Property TimeStamp

$Results = foreach ($GCIS_Item in $GroupedCIS) {
    $TempObject = [ordered]@{
        TimeStamp = $GCIS_Item.Name
    }
    foreach ($TPL_Item in $TargetPropertyList) {
        $TempAveValue = [Math]::Round(($GCIS_Item.Group.$TPL_Item.
            ForEach({[int]$_}) |
            Measure-Object -Average).Average, 2)
        $TempObject.Add($TPL_Item, $TempAveValue)
    }
    [PSCustomObject]$TempObject
}

$Results = $Results |
    Sort-Object -Property {
        [DateTime]$_.TimeStamp
    }

$Results
output ...
TimeStamp : 6/4/2019
abc       : 6811.33
A         : 3129
B         : 10.67
C         : 2680
D         : 1522.67
E         : 1184
F         : 0
G         : 9.33

TimeStamp : 6/5/2019
abc       : 6783.33
A         : 3098
B         : 6.67
C         : 2698.67
D         : 1320
E         : 686.67
F         : 0
G         : 1.33

TimeStamp : 1/1/2021
abc       : 1111
A         : 2222
B         : 3
C         : 4444
D         : 5555
E         : 666
F         : 7
G         : 8

There is probably a nicer way to do this that Lee will surely advise, but this is how I would complete the task. This groups the objects by the Timestamp property, and then you can average each column within the group. I recommend playing around with $csv | Group-Object timestamp to see what you can do with it.
$csv = Import-Csv C:\temp\test.csv
$Averages = New-Object System.Collections.ArrayList
foreach ($object in ($csv | Group-Object timestamp)) {
    # [void] discards the index that ArrayList.Add() would otherwise emit to the output stream
    [void]$Averages.Add([pscustomobject]@{
        timestamp = $object.Name
        abc = ($object.group | Select-Object -ExpandProperty abc | Measure-Object -Average).Average
        a   = ($object.group | Select-Object -ExpandProperty a | Measure-Object -Average).Average
        b   = ($object.group | Select-Object -ExpandProperty b | Measure-Object -Average).Average
        c   = ($object.group | Select-Object -ExpandProperty c | Measure-Object -Average).Average
        d   = ($object.group | Select-Object -ExpandProperty d | Measure-Object -Average).Average
        e   = ($object.group | Select-Object -ExpandProperty e | Measure-Object -Average).Average
        f   = ($object.group | Select-Object -ExpandProperty f | Measure-Object -Average).Average
        g   = ($object.group | Select-Object -ExpandProperty g | Measure-Object -Average).Average
    })
}
Output:
PS H:\> $Averages
timestamp : 6/4/2019
abc       : 6811.33333333333
a         : 3129
b         : 10.6666666666667
c         : 2680
d         : 1522.66666666667
e         : 1184
f         : 0
g         : 9.33333333333333

timestamp : 6/5/2019
abc       : 6783.33333333333
a         : 3098
b         : 6.66666666666667
c         : 2698.66666666667
d         : 1320
e         : 686.666666666667
f         : 0
g         : 1.33333333333333
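If you want the rounded figures shown in the earlier answer, wrap each average in [Math]::Round. A sketch for a single column, reusing $csv from above:

$Averages = foreach ($object in ($csv | Group-Object timestamp)) {
    [pscustomobject]@{
        timestamp = $object.Name
        # round the average to two decimal places
        abc       = [Math]::Round(($object.Group.abc | Measure-Object -Average).Average, 2)
    }
}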

The ConvertFrom-SourceTable cmdlet I created is actually meant to read fixed-width column tables, but there isn't any reason why it shouldn't also be able to read delimited (or distorted) tables. So this question has encouraged me to create an update which no longer produces errors when this happens.
Converting the table in the question
$Table = '
timestamp streams TRP A  B   C   D
6/4/2019  6775    305 56 229 132 764
6/4/2019  6910    316 28 356 118 134
6/4/2019  6749    316 54 218 206 144
6/5/2019  5186    267 84 280 452 258
6/5/2019  5187    240 33 436 455 245
6/5/2019  5224    291 21 245 192 654
6/6/2019  5254    343 42 636 403 789
6/6/2019  5180    252 23 169 328 888
6/6/2019  5181    290 32 788 129 745
6/6/2019  5244    328 44 540 403 989
'
# Raw Table
ConvertFrom-SourceTable $Table | Format-Table
timestamp streams TRP A  B   C   D
--------- ------- --- -  -   -   -
6/4/2019  6775    305 56 229 132 764
6/4/2019  6910    316 28 356 118 134
6/4/2019  6749    316 54 218 206 144
6/5/2019  5186    267 84 280 452 258
6/5/2019  5187    240 33 436 455 245
6/5/2019  5224    291 21 245 192 654
6/6/2019  5254    343 42 636 403 789
6/6/2019  5180    252 23 169 328 888
6/6/2019  5181    290 32 788 129 745
6/6/2019  5244    328 44 540 403 989
# Streamed rows from pipeline:
$Table -split [System.Environment]::NewLine | ConvertFrom-SourceTable | Format-Table
timestamp streams TRP A  B   C   D
--------- ------- --- -  -   -   -
6/4/2019  6775    305 56 229 132 764
6/4/2019  6910    316 28 356 118 134
6/4/2019  6749    316 54 218 206 144
6/5/2019  5186    267 84 280 452 258
6/5/2019  5187    240 33 436 455 245
6/5/2019  5224    291 21 245 192 654
6/6/2019  5254    343 42 636 403 789
6/6/2019  5180    252 23 169 328 888
6/6/2019  5181    290 32 788 129 745
6/6/2019  5244    328 44 540 403 989
Fixed-width column table with vertical rulers
$Table = '
| date     | abc  | A    | B  | C    | D    | E    | F | G  |
| 6/4/2019 | 6775 | 3059 | 4  | 2292 | 1328 | 764  | 0 | 0  |
| 6/4/2019 | 6910 | 3167 | 28 | 3568 | 1180 | 1348 | 0 | 0  |
| 6/4/2019 | 6749 | 3161 | 0  | 2180 | 2060 | 1440 | 0 | 28 |
| 6/5/2019 | 6738 | 3118 | 4  | 2736 | 1396 | 984  | 0 | 0  |
| 6/5/2019 | 6718 | 3130 | 12 | 3076 | 1008 | 452  | 0 | 4  |
| 6/5/2019 | 6894 | 3046 | 4  | 2284 | 1556 | 624  | 0 | 0  |
| 1/1/2021 | 1111 | 2222 | 3  | 4444 | 5555 | 666  | 7 | 8  |
'
# Raw Table
ConvertFrom-SourceTable $Table | Format-Table
date     abc  A    B  C    D    E    F G
----     ---  -    -  -    -    -    - -
6/4/2019 6775 3059 4  2292 1328 764  0 0
6/4/2019 6910 3167 28 3568 1180 1348 0 0
6/4/2019 6749 3161 0  2180 2060 1440 0 28
6/5/2019 6738 3118 4  2736 1396 984  0 0
6/5/2019 6718 3130 12 3076 1008 452  0 4
6/5/2019 6894 3046 4  2284 1556 624  0 0
1/1/2021 1111 2222 3  4444 5555 666  7 8
# Streamed rows from pipeline:
$Table -split [System.Environment]::NewLine | ConvertFrom-SourceTable | Format-Table
date     abc  A    B  C    D    E    F G
----     ---  -    -  -    -    -    - -
6/4/2019 6775 3059 4  2292 1328 764  0 0
6/4/2019 6910 3167 28 3568 1180 1348 0 0
6/4/2019 6749 3161 0  2180 2060 1440 0 28
6/5/2019 6738 3118 4  2736 1396 984  0 0
6/5/2019 6718 3130 12 3076 1008 452  0 4
6/5/2019 6894 3046 4  2284 1556 624  0 0
1/1/2021 1111 2222 3  4444 5555 666  7 8
Note the type casting (derived from the column alignment) in the results, meaning that the conversion is symmetrical: if you capture $Result = ConvertFrom-SourceTable $Table, the table that $Result | Format-Table prints can be read straight back by ConvertFrom-SourceTable into the same objects.
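A quick round-trip sketch (assuming the module is loaded):

$Result    = ConvertFrom-SourceTable $Table
$RoundTrip = ConvertFrom-SourceTable ($Result | Format-Table | Out-String)
# $RoundTrip should now hold the same objects as $Result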
Distorted table
$Table = '
timestamp|abc | A | B | C | D | E | F | G |
6/4/2019 |6775 | 3059 | 4 | 2292 | 1328 | 764 | 0 | 0 |
6/4/2019 |6910 | 3167 | 28 | 3568 | 1180 | 1348 | 0 | 0 |
6/4/2019 |6749 | 3161 | 0 | 2180 | 2060 | 1440 | 0 | 28 |
6/5/2019 |6738 | 3118 | 4 | 2736 | 1396 | 984 | 0 | 0 |
6/5/2019 |6718 | 3130 | 12 | 3076 | 1008 | 452 | 0 | 4 |
6/5/2019 |6894 | 3046 | 4 | 2284 | 1556 | 624 | 0 | 0 |
1/1/2021 |1111 | 2222 | 3 | 4444 | 5555 | 666 | 7 | 8 |
'
# Raw Table
ConvertFrom-SourceTable $Table | Format-Table
timestamp abc  A    B  C    D    E    F G
--------- ---  -    -  -    -    -    - -
6/4/2019  6775 3059 4  2292 1328 764  0 0
6/4/2019  6910 3167 28 3568 1180 1348 0 0
6/4/2019  6749 3161 0  2180 2060 1440 0 28
6/5/2019  6738 3118 4  2736 1396 984  0 0
6/5/2019  6718 3130 12 3076 1008 452  0 4
6/5/2019  6894 3046 4  2284 1556 624  0 0
1/1/2021  1111 2222 3  4444 5555 666  7 8
# Streamed rows from pipeline:
$Table -split [System.Environment]::NewLine | ConvertFrom-SourceTable | Format-Table
timestamp abc  A    B  C    D    E    F G
--------- ---  -    -  -    -    -    - -
6/4/2019  6775 3059 4  2292 1328 764  0 0
6/4/2019  6910 3167 28 3568 1180 1348 0 0
6/4/2019  6749 3161 0  2180 2060 1440 0 28
6/5/2019  6738 3118 4  2736 1396 984  0 0
6/5/2019  6718 3130 12 3076 1008 452  0 4
6/5/2019  6894 3046 4  2284 1556 624  0 0
1/1/2021  1111 2222 3  4444 5555 666  7 8
Note that distorted rows will always result in a literal (string) conversion (no type casting).
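Because of that, cast the values before doing math on them. A sketch that feeds the converted objects into the grouping approach from the earlier answers:

$Data = ConvertFrom-SourceTable $Table
$Data | Group-Object timestamp | ForEach-Object {
    [pscustomobject]@{
        TimeStamp = $_.Name
        # cast the string values to [int] before averaging
        abc       = ($_.Group.abc.ForEach({[int]$_}) | Measure-Object -Average).Average
    }
}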

Related

Get previous day's value in column in Postgres

I have a SQL command that computes the moving average for each day using window functions.
BEGIN;
DROP TABLE IF EXISTS vol_stats;
SELECT pk AS fk,
       avg(CAST(volume AS float)) OVER (PARTITION BY account_id ORDER BY "endts") AS average
INTO vol_stats
FROM volume_temp
ORDER BY account_id, "endts";
COMMIT;
I would like to get one more value: the previous day's value.
The data structure looks like this.
account_id | value | timestamp
-----------+-------+----------
a12        | 122   | jan 1
a13        | 133   | jan 1
a14        | 443   | jan 1
a12        | 251   | jan 2
a13        | 122   | jan 2
a14        | 331   | jan 2
a12        | 412   | jan 3
a13        | 323   | jan 3
a14        | 432   | jan 3
and we are computing this
account_id | value | timestamp | Average
-----------+-------+-----------+--------
a12        | 122   | jan 1     | 122
a13        | 133   | jan 1     | 133
a14        | 443   | jan 1     | 443
a12        | 251   | jan 2     | 186.5
a13        | 122   | jan 2     | 127.5
a14        | 331   | jan 2     | 387
a12        | 412   | jan 3     | 261.6
a13        | 323   | jan 3     | 192.6
a14        | 432   | jan 3     | 402
What would be helpful is to grab the previous day's value as well, like this:
account_id | value | timestamp | Average | previous
-----------+-------+-----------+---------+---------
a12        | 122   | jan 1     | 122     | null
a13        | 133   | jan 1     | 133     | null
a14        | 443   | jan 1     | 443     | null
a12        | 251   | jan 2     | 186.5   | 122
a13        | 122   | jan 2     | 127.5   | 133
a14        | 331   | jan 2     | 387     | 443
a12        | 412   | jan 3     | 261.6   | 251
a13        | 323   | jan 3     | 192.6   | 122
a14        | 432   | jan 3     | 402     | 331
Just add another column to the SELECT list:
lag(volume) OVER (PARTITION BY account_id ORDER BY endts)
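In the context of the query above, the complete statement would look something like this (a sketch reusing the same table and column names):

SELECT pk AS fk,
       avg(CAST(volume AS float)) OVER (PARTITION BY account_id ORDER BY "endts") AS average,
       lag(volume) OVER (PARTITION BY account_id ORDER BY "endts") AS previous
INTO vol_stats
FROM volume_temp
ORDER BY account_id, "endts";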

Find when two equal numbers are in the same column of two tables

I have two tables made this way:
tb1:
 id | f0 | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9
----+----+----+----+----+----+----+----+----+----+----
  1 | 90 | 81 | 82 | 83 | 54 | 85 | 86 | 77 | 88 | 79
  2 | 80 |  1 | 62 | 63 | 74 | 55 |  6 | 87 | 68 | 49
...
(9 rows)
tb2:
 id | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9
----+----+----+----+----+----+----+----+----+----+----
  1 | 70 | 11 | 62 |  3 | 44 |  5 | 56 | 77 | 38 |  9
  2 | 50 | 81 |  2 | 23 | 14 | 85 | 26 | 87 | 58 | 19
  3 | 90 | 51 | 82 | 33 | 64 | 25 | 16 | 27 | 48 | 49
...
(9 rows)
the result of the expected query is this:
tfr:
 id | fr11 | fr12 | fr21 | fr13
----+------+------+------+------
  0 | -    | -    | -    | 90
  1 | -    | 81   | -    | -
  2 | -    | -    | 62   | 82
  3 | -    | -    | -    | -
  4 | -    | -    | -    | -
  5 | -    | 85   | -    | -
  6 | -    | -    | -    | -
  7 | 77   | -    | -    | -
  8 | -    | -    | -    | -
  9 | -    | -    | -    | -
Starting with tb1.id = 1 and tb2.id = 1: only in column 7 of both tables are there two equal values, so the number 77 is shown in column fr11 at row 7, while the other rows of that column get a dash.
Advancing to tb1.id = 1 and tb2.id = 2: columns 1 and 5 of both tables contain equal numbers (81 and 85 respectively), which are shown in column fr12 at rows 1 and 5.
Again, with tb1.id = 2 and tb2.id = 1 there is a match in column 2 with the number 62 (column f2, row 2 and column r2, row 1).
The same procedure applies to tb1.id = 1 and tb2.id = 3: columns f0/r0 and f2/r2 hold the equal numbers 90 and 82 respectively, which are shown in column fr13.
How can this query be written?
Thanks in advance for any considerations!

SQL - Queries with GROUP BY?

I want to use a query to get this result:
Postid(20) ---> 4 type(0) and 2 type(1)
Postid(21) ---> 3 type(0) and 3 type(1).
From this table:
id | userid | postid | type
 1 |    465 |     20 |    0
 2 |    465 |     21 |    1
 3 |    466 |     20 |    1
 4 |    466 |     21 |    0
 5 |    467 |     20 |    0
 6 |    467 |     21 |    0
 7 |    468 |     20 |    1
 8 |    468 |     21 |    1
 9 |    469 |     20 |    0
10 |    469 |     21 |    1
11 |    470 |     20 |    0
12 |    470 |     21 |    0
I think I have to use GROUP BY; I tried it but got no results. How can I achieve that result?
You need to use an aggregation function alongside the columns you want to group by in the SELECT part.
Note: any column that is selected alongside an aggregation function MUST appear in the GROUP BY clause.
The following code should answer your question:
SELECT COUNT(id), postid, type FROM table_name GROUP BY postid, type
When using multiple GROUP BY columns, entries that share all of those column values are grouped together; see here: https://stackoverflow.com/a/2421441/9743294
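With the sample table above, that query returns one row per (postid, type) pair, which matches the counts you asked for:

count | postid | type
------+--------+------
    4 |     20 |    0
    2 |     20 |    1
    3 |     21 |    0
    3 |     21 |    1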

Postgres SQL: list gender types with values on the same row

I have the following table:
SELECT name, location_id, store_date, type, fact_count
FROM table_test
ORDER BY name, store_date;
name   | location_id | store_date | type | fact_count
Paris  |         466 | 2015-12-01 |    0 |        255
Paris  |         466 | 2015-12-01 |    1 |        256
Berlin |         329 | 2015-12-01 |    1 |        248
Berlin |         329 | 2015-12-01 |    0 |        244
Prague |         201 | 2015-12-01 |    1 |        107
Prague |         201 | 2015-12-01 |    0 |        102
How can I list type + value on the same row (there are always exactly 2 types)?
name   | location_id | store_date | type_0 | fact_count_for_type_0 | type_1 | fact_count_for_type_1
Paris  |         466 | 2015-12-01 |      0 |                   255 |      1 |                   256
Berlin |         329 | 2015-12-01 |      0 |                   244 |      1 |                   248
Prague |         201 | 2015-12-01 |      0 |                   102 |      1 |                   107
SELECT
    name,
    location_id,
    store_date,
    sum(fact_count * (type = 0)::int) AS fact_count_type_0,
    sum(fact_count * (type = 1)::int) AS fact_count_type_1
FROM table_test
GROUP BY 1, 2, 3
ORDER BY name, store_date;
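An equivalent formulation uses the FILTER clause (available since PostgreSQL 9.4) for the conditional aggregation; a sketch with the same columns:

SELECT
    name,
    location_id,
    store_date,
    sum(fact_count) FILTER (WHERE type = 0) AS fact_count_type_0,
    sum(fact_count) FILTER (WHERE type = 1) AS fact_count_type_1
FROM table_test
GROUP BY 1, 2, 3
ORDER BY name, store_date;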

Division with more than one result from PostgreSQL query

I am using PostgreSQL and I have a table called accidents (state, total accidents) and another table called population. I want to get the names of the top 3 states by total accidents, and then divide each of those states' population by its total accidents. How can I write this query in PostgreSQL?
Explanation:
Population Table
 rank | state                       | population
------+-----------------------------+------------
    1 | Uttar Pradesh               |  199581477
    2 | Maharashtra                 |  112372972
    3 | Bihar                       |  103804630
    4 | West Bengal                 |   91347736
    5 | Madhya Pradesh              |   72597565
    6 | Tamil Nadu                  |   72138958
    7 | Rajasthan                   |   68621012
    8 | Karnataka                   |   61130704
    9 | Gujarat                     |   60383628
   10 | Andhra Pradesh              |   49665533
   11 | Odisha                      |   41947358
   12 | Telangana                   |   35193978
   13 | Kerala                      |   33387677
   14 | Jharkhand                   |   32966238
   15 | Assam                       |   31169272
   16 | Punjab                      |   27704236
   17 | Haryana                     |   25753081
   18 | Chhattisgarh                |   25540196
   19 | Jammu and Kashmir           |   12548926
   20 | Uttarakhand                 |   10116752
   21 | Himachal Pradesh            |    6856509
   22 | Tripura                     |    3671032
   23 | Meghalaya                   |    2964007
   24 | Manipur                     |    2721756
   25 | Nagaland                    |    1980602
   26 | Goa                         |    1457723
   27 | Arunachal Pradesh           |    1382611
   28 | Mizoram                     |    1091014
   29 | Sikkim                      |     607688
   30 | Delhi                       |   16753235
   31 | Puducherry                  |    1244464
   32 | Chandigarh                  |    1054686
   33 | Andaman and Nicobar Islands |     379944
   34 | Dadra and Nagar Haveli      |     342853
   35 | Daman and Diu               |     242911
   36 | Lakshadweep                 |      64429
accident table:
state                       | eqto8 | eqto10 | mrthn10 | ntknwn | total
----------------------------+-------+--------+---------+--------+-------
Andhra Pradesh              |  6425 |   8657 |    8144 |  19298 |  42524
Arunachal Pradesh           |    88 |     76 |      87 |      0 |    251
Assam                       |     0 |      0 |       0 |   6535 |   6535
Bihar                       |  2660 |   3938 |    3722 |      0 |  10320
Chhattisgarh                |  2888 |   7052 |    3571 |      0 |  13511
Goa                         |   616 |   1512 |    2184 |      0 |   4312
Gujarat                     |  4864 |   7864 |    7132 |   8089 |  27949
Haryana                     |  3365 |   2588 |    4112 |      0 |  10065
Himachal Pradesh            |   276 |    626 |     977 |   1020 |   2899
Jammu and Kashmir           |  1557 |    618 |     434 |   4100 |   6709
Jharkhand                   |  1128 |    701 |    1037 |   2845 |   5711
Karnataka                   | 11167 |  14715 |   18566 |      0 |  44448
Kerala                      |  5580 |  13271 |   17323 |      0 |  36174
Madhya Pradesh              | 15630 |  16226 |   19354 |      0 |  51210
Maharashtra                 |  4117 |   5350 |   10538 |  46311 |  66316
Manipur                     |   147 |    453 |     171 |      0 |    771
Meghalaya                   |   210 |    154 |     119 |      0 |    483
Mizoram                     |    27 |     58 |      25 |      0 |    110
Nagaland                    |    11 |     13 |      18 |      0 |     42
Odisha                      |  1881 |   3120 |    4284 |      0 |   9285
Punjab                      |  1378 |   2231 |    1825 |    907 |   6341
Rajasthan                   |  5534 |   5895 |    5475 |   6065 |  22969
Sikkim                      |     6 |    144 |       8 |      0 |    158
Tamil Nadu                  |  8424 |  18826 |   29871 |  10636 |  67757
Tripura                     |   290 |    376 |     222 |      0 |    888
Uttarakhand                 |   318 |    305 |     456 |    393 |   1472
Uttar Pradesh               |  8520 |  10457 |   10995 |      0 |  29972
West Bengal                 |  1494 |   1311 |     974 |   8511 |  12290
Andaman and Nicobar Islands |    18 |    104 |     114 |      0 |    236
Chandigarh                  |   112 |     39 |     210 |     58 |    419
Dadra and Nagar Haveli      |    40 |     20 |      17 |      8 |     85
Daman and Diu               |    11 |      6 |       8 |     25 |     50
Delhi                       |     0 |      0 |       0 |   6937 |   6937
Lakshadweep                 |     0 |      0 |       0 |      3 |      3
Puducherry                  |   154 |    668 |     359 |      0 |   1181
All India                   | 88936 | 127374 |  152332 | 121741 | 490383
So the result should be
21.58
81.30
107.44
Explanation:
The states with the highest accident totals are Tamil Nadu, Maharashtra, and Madhya Pradesh.
Tamil Nadu: population/accidents = 21213/983 = 21.58 (assumed values)
Maharashtra: population/accidents = 10000/123 = 81.30
Madhya Pradesh: population/accidents = 34812/324 = 107.44
My query is:
SELECT POPULATION/
(SELECT TOTAL
FROM accidents
WHERE STATE NOT LIKE 'All %'
ORDER BY TOTAL DESC
LIMIT 3)
aVG FROM population
WHERE STATE IN
(SELECT STATE
FROM accidents
WHERE STATE NOT LIKE 'All %'
ORDER BY TOTAL DESC
LIMIT 3);
This throws: ERROR: more than one row returned by a subquery used as an expression.
How can I modify the query to get the required result, or is there another way to do it in PostgreSQL?
This ought to do it. Note the cast to numeric: without it, population / total is integer division and the decimals are truncated.
SELECT a.state, round(population.population::numeric / a.total, 2) AS ratio
FROM (SELECT total, state
      FROM accidents
      WHERE state <> 'All India'
      ORDER BY total DESC
      LIMIT 3) AS a
INNER JOIN population ON a.state = population.state;
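With the tables above (the 'All India' row excluded), the top three totals are Tamil Nadu (67757), Maharashtra (66316) and Madhya Pradesh (51210), so the query should return roughly 1064.67, 1694.51 and 1417.64.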