PostgreSQL: Summing info from Two Aggregated Tables - postgresql

There is something wrong with my method or my logic here.
I am trying to sum all the data from both tables. If the two correspond, add them up, if either doesn't correspond, still show the individual query total, ending up with estimates per year in sequence.
I have tried LEFT JOINS, FULL JOINS, (UNIONS). Nothing comes close to just summing where possible and supplying the data otherwise.
The key point here is pb and th_year information are years when the results are needed.
The error must be obvious in my code.
The separate aggregate queries produce the correct results.
Its the combining of the two queries where I am going wrong.
Would appreciate advice on this.
I thought it would be simple.
I think it probably is simple. Just stupidity on my side.
CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2
AS SELECT
a.owner,
a.region,
a.district,
a.plantation,
b.th_year,
a.pb,
a.wc,
sum(a.tcf_calcarea + b.tth_calcarea) AS area,
sum(a.tcf_total + b.tth_total) AS total,
sum(a.tcf_ws + b.tth_ws) AS ws,
sum(a.tcf_util + b.tth_util) AS util,
sum(a.tcf_s + b.tth_s) AS s,
sum(a.tcf_a + b.tth_a) AS a,
sum(a.tcf_b + b.tth_b) AS b,
sum(a.tcf_c + b.tth_c) AS c,
sum(a.tcf_d + b.tth_d) AS d
FROM
(SELECT
cfdata.owner,
cfdata.region,
cfdata.district,
cfdata.plantation,
cfdata.pb,
cfdata.wc,
sum(cfdata.calcarea)AS tcf_calcarea,
sum(cfdata._ba) AS tcf_ba,
sum(cfdata._total) AS tcf_total,
sum( cfdata._ws) AS tcf_ws,
sum( cfdata._util) AS tcf_util,
sum( cfdata._s) AS tcf_s,
sum( cfdata._a) AS tcf_a,
sum( cfdata._b) AS tcf_b,
sum( cfdata._c) AS tcf_c,
sum( cfdata._d) AS tcf_d
FROM cfdata
GROUP BY cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc
ORDER BY cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc) a
JOIN
(SELECT
thdata.owner,
thdata.region,
thdata.district,
thdata.plantation,
thdata.th_year,
thdata.wc,
sum(thdata.calcarea)AS tth_calcarea,
sum(thdata.th_ba) AS tth_ba,
sum(thdata.th_total) AS tth_total,
sum(thdata.th_ws) AS tth_ws,
sum(thdata.th_util) AS tth_util,
sum(thdata.th_s) AS tth_s,
sum(thdata.th_a) AS tth_a,
sum(thdata.th_b) AS tth_b,
sum(thdata.th_c) AS tth_c,
sum(thdata.th_d) AS tth_d
FROM thdata
GROUP BY thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc
ORDER BY thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc) b
ON a.owner = b.owner AND a.region = b.region AND a.district = b.district and a.plantation = b.plantation AND a.pb = b.th_year AND a.wc = b.wc
GROUP BY a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc
ORDER BY a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc
thdata sample:
owner region district plantation compartment calcarea wc plantdate th_year th_age th_dbh th_ht th_vtree th_sph th_ba th_total th_ws th_util th_s th_a th_b th_c th_d thdata_id
KeyProjects Northern Marshlands River Glen A27 14.02 PFN 01/08/2009 2017 8 12.3 7.3 0.0289 179 28 70 14 56 42 14 0 0 0 1
KeyProjects Northern Marshlands River Glen A28 2.1 ESN 01/12/2010 2012 2 4.5 4.2 0 479 2 0 0 0 0 0 0 0 0 2
KeyProjects Northern Marshlands River Glen A28 2.1 ESN 01/12/2010 2014 4 10.2 9.6 0.0188 250 4 11 0 8 4 6 0 0 0 3
KeyProjects Northern Marshlands River Glen A29 2.71 ESN 01/08/2009 2011 2 4.5 4.2 0 479 3 0 0 0 0 0 0 0 0 4
KeyProjects Northern Marshlands River Glen A29 2.71 ESN 01/08/2009 2013 4 10.2 9.6 0.0188 250 5 14 0 11 5 8 0 0 0 5
thdata sample:
owner region district plantation compartment wc pb calcarea cfage dbh ht vtree sph _ba _total _ws _util _s _a _b _c _d tmai umai smai cfdata_id
KeyProjects Northern Marshlands River Glen A01 EF1 2021 5.27 10 14.5 20.4 0.1109 1004 90 585 21 564 84 401 79 0 0 11.1 10.7 1.5 1
KeyProjects Northern Marshlands River Glen A02 EF1 2021 36.1 10 14.5 20.4 0.1109 1004 614 4007 144 3863 578 2744 542 0 0 11.1 10.7 1.5 2
KeyProjects Northern Marshlands River Glen A03 EF1 2021 5.5 10 14.5 20.4 0.1109 1004 94 611 22 589 88 418 83 0 0 11.1 10.7 1.5 3
KeyProjects Northern Marshlands River Glen A04 EF1 2021 11.91 10 14.5 20.4 0.1109 1004 202 1322 48 1274 191 905 179 0 0 11.1 10.7 1.5 4
KeyProjects Northern Marshlands River Glen A05 EF1 2022 39.17 11 14.9 21.8 0.1286 1000 705 5053 157 4857 666 3486 744 0 0 11.7 11.3 1.7 5
expected result:
owner region district plantation th_year pb wc area total ws util s a b c d
KeyProjects Northern Marshlands River Glen 2008 2008 EF1 620.49 44176 1788 42389 7562 31953 2852 0 0
KeyProjects Northern Marshlands River Glen 2009 2009 EF1 635.65 44319 1778 42476 7634 31993 2852 0 0
KeyProjects Northern Marshlands River Glen 2010 2010 EF1 1202.31 87980 3453 84487 14906 63883 5704 0 0
KeyProjects Northern Marshlands River Glen 2011 2011 EF1 1948.37 132378 5275 127104 22662 95895 8556 0 0
KeyProjects Northern Marshlands River Glen 2012 2012 EF1 1378.61 87928 3429 84477 14878 63922 5704 0 0

Ok, you have a few issues with your query:
In the main query, do not use sum(a.tcf_calcarea + b.tth_calcarea) AS area. You can simply add but you should make sure to substitute any NULL values with 0 first: write coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area instead, for all sum()s. This also means you are not aggregating anymore at this level, so you should drop the final GROUP BY clause.
Now make a FULL OUTER JOIN between the two sub-queries. This means you get all rows from both sub-queries joined and where a corresponding row does not exist for either side, there are NULLs for column values.
It makes no sense to ORDER BY in a sub-query, the planner will process the row set in the way it sees best. You should order at the outer level only.
By definition (join condition) b.th_year = a.pb so you can drop one of the two columns.
Some syntactical pointers:
Your sub-queries use only one table so there is no need to work with table aliases, saves you a lot a typing.
More savings: Use positional parameters in your GROUP BY clause, so you can write GROUP BY 1, 2, 3, 4, 5, 6. Same with ORDER BY.
On the JOIN clause you can write USING (owner, region, district, plantation, wc) and then add WHERE a.pb = b.th_year. Other than that being shorter, you do not need sub-query aliases in the main query anymore for any of the USING columns. However, the fact that one join condition does not have corresponding column names does make things slightly more confused; up to you.
All in all, this is what you get:
CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2 AS
SELECT owner, region, district, plantation, b.th_year, wc,
coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area,
...
FROM (
SELECT owner, region, district, plantation, pb, wc,
sum(calcarea) AS tcf_calcarea,
...
FROM cfdata
GROUP BY 1, 2, 3, 4, 5, 6) a
FULL JOIN (
SELECT owner, region, district, plantation, th_year, wc,
sum(calcarea) AS tth_calcarea,
...
FROM thdata
GROUP BY 1, 2, 3, 4, 5, 6) b
USING (owner, region, district, plantation, wc)
WHERE a.pb = b.th_year
ORDER BY 1, 2, 3, 4, 5, 6;

Related

Assign new matrices for a certain condition, in Matlab

I have a matrix,DataFile=8x8. One of those columns(column 6 or "coarse event") can only be 0 or a 1. It will be 0 for a non-stable condition and 1 for a stable condition.Now for the example:
DataFile = [ 11 5 66 1.2 14.1 0 -1 0.1;...
12 6 67 1.4 15.1 0 -1 0.1;...
13 7 68 1.6 16.1 1 -1 0.2;...
14 8 69 1.7 16.5 1 -2 0.1;...
15 9 68 1.6 16.2 0 -1 0.3;...
16 8 66 1.3 15.7 1 -2 0.0;...
17 5 65 1.5 16.1 1 0 0.0;...
18 6 66 1.2 16.6 0 1 1.0];
With slight changes from the code in the comments:
DataFile =[zeros(1,size(DataFile,2)); DataFile; zeros(1,size(DataFile,2))];
startInd = [find(diff(DataFile(:,6))==1)];
endInd = [find(diff(DataFile(:,6)) <0)];
B={};
for n=1:1:numel(endInd)
B(n)={DataFile(startInd(n):endInd(n),:)};
end
FirstBlock=B{1};
SecondBlock=B{2};
The result is 2 matrices(FirstBlock=3x8,SecondBlock=3x8), which wrongfully includes 0's in the 6th column. It should be giving two matrices(dataIs1(1)=2x8 and dataIs1(2)=2x8), with only 1's in the 6th column.
In reality I would like have the a n-amount of matrices, for which the "coarse event" is 1. Thank you for the help!
The magic word is logical indexing:
If you have a Matrix A:
A=[1 2 3 4 5;...
0 6 7 8 9;...
1 7 8 9 10]
you can extact Row 1 and 2 by:
B=A(A(:,1)==1)
Hope thats waht your looking for, have fun.
To seperate the groups we need to know where they start and end:
endInd = [find(diff(A(:,1))<0) size(A,1)]
startInd = [1 find(diff(A(:,1))==1)]
Then assigne the Data to arrays:
B={};
for n=1:1:numel(endInd)
B(n)={A(startInd(n):endInd(n),:)};
end
Edit:
Your new Data:
DataFile = [ 11 5 66 1.2 14.1 0 -1 0.1;...
12 6 67 1.4 15.1 0 -1 0.1;...
13 7 68 1.6 16.1 1 -1 0.2;...
14 8 69 1.7 16.5 1 -2 0.1;...
15 9 68 1.6 16.2 0 -1 0.3;...
16 8 66 1.3 15.7 1 -2 0.0;...
17 5 65 1.5 16.1 1 0 0.0;...
18 6 66 1.2 16.6 0 1 1.0];
I add some padding to avoid mistakes:
DataFile =[zeros(1,size(DataFile,2)); DataFile; zeros(1,size(DataFile,2))]
Now, as before, we look for the starts and ends of the blocks:
endInd = [find(diff(A(:,1)) <0) -1]
startInd = [find(diff(A(:,1))==1)]
Then assigne the Data to a cell in a arrays:
B={};
for n=1:1:numel(endInd)
B(n)={A(startInd(n):endInd(n),:)};
end
If you want to retrive, say, the second block:
secondBlock=B{2};

Filter text file, with matlab or other program

I have this huge file with wind data from 1953 to 2010, wind speed and wind direction is recorded every hour as shown below. I was wondering if it was possible to filter this file so it only contains wind speeds say above 12 m/s for example. So the dataset would be decreased dramatically. Is this possible to do with Matlab or any other program? What is the simplest way to do it?
Year, month, day, hour, wind speed, wind direction, wind direction
1953 1 1 0 10.0 90 90
1953 1 1 1 10.0 90 90
1953 1 1 2 10.0 90 90
1953 1 1 3 8.0 90 90
1953 1 1 4 8.0 90 90
1953 1 1 5 13.0 90 90
1953 1 1 6 13.0 70 70
1953 1 1 7 14.0 90 90
1953 1 1 8 16.0 90 90
1953 1 1 9 13.0 90 90
1953 1 1 10 13.0 90 90
1953 1 1 11 16.0 90 90
Remove comma (,) from header and save file then use code below
#Read file space deliminator, Offset row=1, col=0
filename = 'input.txt';
M = dlmread(filename,' ',1,0)
#Find index of Speed that is M(:,5) > 12.0
Idx = find(M(:, 5)> 12.0)
#Extact all columns of index (or rows)
M = M(Idx, :)

Aggregating second-by-second sampling interval to 30 sec interval, POSIXct

New to [R]studio and respectfully requesting help.
Goal: I'd like to take data collected at 1 second intervals, collapse it to 30 sec intervals, and, subsequently, have the "mean" of each variable associated with it.
Here is what my data looks like:
line datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
What I'd like to see is this:
line datetime AA BB CC
1 2016-06-27 14:13:16 18 1 50.5
2 2016-06-27 14:13:46 19 1 52.8
(here, variables AA, BB, and CC were averaged).
There have been questions similar to this, but none that were similar enough to give me a foundation to work on with my little coding and programming knowledge. I've been pacing back and forth between probable base r solutions and probable package solutions to no avail; mainly because the language/syntax implementation is still a bit foreign to me.
I think you want to try this: (base solution)
etw
datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
aggregate(x = etw, by = list(cut(etw$datetime,breaks = "10 sec")), FUN=mean )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:16 2016-06-27 14:13:20 7.8 0.3 48.22
2 2016-06-27 14:13:26 2016-06-27 14:13:26 4.0 1.0 53.40
you can change the 10 sec part to 30 sec. however - take care: breaks = "10 sec" will cut the range into 10 sec slices starting with the minimum time. which in your case result in a single slice.
you can also manually define the range using
breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"),to = as.POSIXct("2016-06-27 14:14:00"),by="10 sec"))
aggregate(x = etw,FUN=mean, by = list(cut(etw$datetime,breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"),to = as.POSIXct("2016-06-27 14:14:00"),by="10 sec"))) )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:10 2016-06-27 14:13:17 9.000000 0.0000000 38.75000
2 2016-06-27 14:13:20 2016-06-27 14:13:23 6.571429 0.5714286 54.37143
this is not exactly what you wanted to get but imho - your sample data does not correspond to the desired output :)

joining three tables each with an incomplete set of each others' keys

I have three tables (actually temp tables, each the result of other queries), with very similar data sets; I need to "condense", for lack of a better term, and my limited SQL knowledge is stopping me.
For example, we have Budgets by Code, Estimates by Code, and Actuals by Code. Not all possible values for Code exist in any of the three, nor even in another accessible table.
Budgets
1 $13
2 $22
4 $44
7 $71
Estimates
1 $14
4 $49
5 $55
Actuals
2 $21
3 $33
5 $57
7 $70
What I want:
Code Bgt Est Act
1 13 14 0
2 22 0 21
3 0 0 33
4 44 49 0
5 0 55 57
7 71 0 70
(I don't have to have 0 when there's no value, that's just for illustrative purposes.)
I just have no idea how to approach this - any help appreciated!
Try using Full Outer Join, In your case query will look like -
Select ISNULL(Bgt.Code,ISNULL(EST.Code,Act.Code)) AS Code,
ISNULL(Bgt.Budget,0) AS Bgt,
ISNULL(Est.Estimate,0) AS Est,
ISNULL(Act.Actual,0) AS Bgt
FROM Budget Bgt
FULL OUTER JOIN Estimates Est ON Est.Code=Bgt.Code
FULL OUTER JOIN Actuals Act ON Act.Code=bgt.Code

how to get the row in some column in one table

I have some problem with query in postgresql
I have 7 column in one table
Year Month Date Rain Tmax Tmin ID Stat Location
1996 1 1 3 25.4 20 98212 air
1996 1 2 1 25.4 19.6 96112 land
1996 1 3 -9999 24.6 19.2 97110 sea
1996 1 4 1 22 19 98212 air
1996 1 5 -9999 24.4 19 96112 land
1996 1 6 -9999 24.2 18.6 98212 air
1996 1 7 1 24.2 19.4 96112 land
1996 1 8 -9999 24.8 20 97110 sea
1996 1 9 -9999 25 19.6 97110 sea
I want to query the row in table and get output to the text file with name (ID-Stat Location)
the expected output :
98212-air.txt
Year Month Date Rain Tmax Tmin
1996 1 1 3 25.4 20
1996 1 4 1 22 19
1996 1 6 -9999 24.2 18.6
what should I do?
I'm using postgresql.
thank you..
This is the query to get the output like you told but writing in the text file you need to work.
SELECT year,month,date, rain, tmax,timin
FROM yourTable WHERE Location='air' and id_stat='98212';