I have the following table:
t:(([]y:2001 2002) cross ([]m:5 6 7) cross ([]sector:`running`hiking`swimming`cycling)),'([]sales: 14 12 5 9 4 894 1 4 87 12 24 6 4 8 64 354 3 4 86 43 1053 2 43 4);
y m sector sales
------------------------
2001 5 running 14
2001 5 hiking 12
2001 5 swimming 5
2001 5 cycling 9
2001 6 running 4
2001 6 hiking 894
2001 6 swimming 1
2001 6 cycling 4
...
2002 5 running 4
2002 5 hiking 8
2002 5 swimming 64
2002 5 cycling 354
2002 6 running 3
...
I want to pivot the sales values by sector, while keeping the first two y and m columns, such that the resulting table would look like this:
y m cycling hiking running swimming
--------------------------------------
2001 5 9 12 14 5
2001 6 4 894 4 1
2001 7 6 12 87 24
2002 5 354 8 4 64
2002 6 43 4 3 86
2002 7 4 2 1053 43
As per
https://code.kx.com/v2/kb/pivoting-tables/
q) P:asc exec distinct sector from t;
q) exec P#(sector!sales) by y:y,m:m from t
The result is a keyed table (keyed on y and m). If you need a normal table, unkey it with () xkey or, equivalently, 0!.
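For readers who don't know q, here is a rough Python sketch of what that pivot does, not the q implementation itself: collect the distinct sectors, group the rows by (y, m), build a sector-to-sales dictionary per group, then read the sectors off in a fixed order. The names rows, sectors, groups and pivot are made up for illustration, and only the first eight rows of the table are used.

```python
from collections import defaultdict

# Rows as (y, m, sector, sales); the first eight rows of the question's table.
rows = [
    (2001, 5, "running", 14), (2001, 5, "hiking", 12),
    (2001, 5, "swimming", 5), (2001, 5, "cycling", 9),
    (2001, 6, "running", 4), (2001, 6, "hiking", 894),
    (2001, 6, "swimming", 1), (2001, 6, "cycling", 4),
]

# Counterpart of P: the sorted distinct sectors, which become the new columns.
sectors = sorted({sector for _, _, sector, _ in rows})

# Counterpart of "sector!sales ... by y,m": one sector -> sales dict per (y, m).
groups = defaultdict(dict)
for y, m, sector, sales in rows:
    groups[(y, m)][sector] = sales

# Counterpart of "P#": read the sectors in a fixed order; None marks a gap.
pivot = {key: [d.get(s) for s in sectors] for key, d in groups.items()}

for (y, m), values in sorted(pivot.items()):
    print(y, m, *values)
```

Taking the sectors in a fixed order is what keeps every output row aligned to the same columns, even if some (y, m) group were missing a sector.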
In KDB I have this table:
q)tab
items sales prices adjust factor
--------------------------------
nut 6 10 1b 1.2
bolt 8 20 1b 1.5
cam 0 15 1b 2
cog 3 20 0b 0n
nut 6 10 0b 0n
bolt 8 20 0b 0n
I would like to compute a 4th column based on a condition, for instance:
if[adjust; prices * factor;]
The aim is to get the following result:
items sales prices newPrices
----------------------------
nut 6 10 12
bolt 8 20 30
cam 0 15 30
cog 3 20 20
nut 6 10 10
bolt 8 20 20
Can someone please help me out?
I think you're looking for something like:
q) update newPrices:?[adjust;prices*factor;prices] from tab
items sales prices adjust factor newPrices
------------------------------------------
nut 6 10 1 1.2 12
bolt 8 20 1 1.5 30
cam 0 15 1 2 30
cog 3 20 0 20
nut 6 10 0 10
bolt 8 20 0 20
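If it helps to see the logic outside q, the vector conditional ?[adjust;prices*factor;prices] is an element-wise choice between two lists. A minimal Python sketch, assuming the table from the question (NaN stands in for q's null float 0n; tab and new_prices are illustrative names):

```python
# (items, sales, prices, adjust, factor); NaN stands in for the null float 0n.
nan = float("nan")
tab = [
    ("nut", 6, 10, True, 1.2), ("bolt", 8, 20, True, 1.5),
    ("cam", 0, 15, True, 2.0), ("cog", 3, 20, False, nan),
    ("nut", 6, 10, False, nan), ("bolt", 8, 20, False, nan),
]

# Element-wise choice: prices*factor where adjust is true, otherwise prices.
# The null factor is never used, because the condition picks prices on those rows.
new_prices = [p * f if adj else p for _, _, p, adj, f in tab]

for (item, sales, price, _, _), new_price in zip(tab, new_prices):
    print(item, sales, price, new_price)
```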
You can use a dictionary and fill the prices
q)d:`nut`bolt`cam!1.2 1.5 2
q)update newPrices:prices^prices*d items from tab
items sales prices newPrices
----------------------------
nut 6 10 12
bolt 8 20 30
cam 0 15 30
cog 3 20 20
New to [R]studio and respectfully requesting help.
Goal: I'd like to take data collected at 1 second intervals, collapse it to 30 sec intervals, and, subsequently, have the "mean" of each variable associated with it.
Here is what my data looks like:
line datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
What I'd like to see is this:
line datetime AA BB CC
1 2016-06-27 14:13:16 18 1 50.5
2 2016-06-27 14:13:46 19 1 52.8
(here, variables AA, BB, and CC were averaged).
There have been questions similar to this, but none that were similar enough to give me a foundation to work on with my little coding and programming knowledge. I've been pacing back and forth between probable base r solutions and probable package solutions to no avail; mainly because the language/syntax implementation is still a bit foreign to me.
I think you want to try this: (base solution)
etw
datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
aggregate(x = etw, by = list(cut(etw$datetime,breaks = "10 sec")), FUN=mean )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:16 2016-06-27 14:13:20 7.8 0.3 48.22
2 2016-06-27 14:13:26 2016-06-27 14:13:26 4.0 1.0 53.40
You can change the 10 sec part to 30 sec. However, take care: breaks = "10 sec" cuts the range into 10-second slices starting at the minimum time, so with breaks = "30 sec" your 11-second sample ends up in a single slice.
you can also manually define the range using
breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"),to = as.POSIXct("2016-06-27 14:14:00"),by="10 sec")
aggregate(x = etw,FUN=mean, by = list(cut(etw$datetime,breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"),to = as.POSIXct("2016-06-27 14:14:00"),by="10 sec"))) )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:10 2016-06-27 14:13:17 9.000000 0.0000000 38.75000
2 2016-06-27 14:13:20 2016-06-27 14:13:23 6.571429 0.5714286 54.37143
this is not exactly what you wanted to get but imho - your sample data does not correspond to the desired output :)
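To make the bucketing explicit, here is a small Python sketch of the same idea (not the R code above): assign each timestamp to a fixed-width slice counted from an origin, then average every column per slice. The helper bucket_means and its origin parameter are hypothetical names; by default the origin is the earliest timestamp, which mirrors breaks = "10 sec".

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (datetime, AA, BB, CC): the eleven one-second samples from the question.
raw = [
    ("2016-06-27 14:13:16", 6, 0, 0.0), ("2016-06-27 14:13:17", 10, 0, 48.6),
    ("2016-06-27 14:13:18", 7, 0, 52.0), ("2016-06-27 14:13:19", 13, 0, 54.4),
    ("2016-06-27 14:13:20", 16, 0, 60.8), ("2016-06-27 14:13:21", 6, 0, 65.5),
    ("2016-06-27 14:13:22", 6, 0, 47.5), ("2016-06-27 14:13:23", 6, 1, 46.8),
    ("2016-06-27 14:13:24", 4, 1, 55.5), ("2016-06-27 14:13:25", 4, 1, 51.1),
    ("2016-06-27 14:13:26", 4, 1, 53.4),
]

def bucket_means(samples, width_s, origin=None):
    """Average each numeric column within fixed-width time slices.

    Slices are counted from origin, which defaults to the earliest timestamp.
    """
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), vals)
              for ts, *vals in samples]
    start = origin or min(t for t, _ in parsed)
    buckets = defaultdict(list)
    for t, vals in parsed:
        idx = int((t - start).total_seconds() // width_s)
        buckets[start + timedelta(seconds=idx * width_s)].append(vals)
    # zip(*rows) turns the rows of one slice into columns, one mean per column.
    return {k: [sum(col) / len(col) for col in zip(*rows)]
            for k, rows in sorted(buckets.items())}

ten = bucket_means(raw, 10)      # two slices, starting at 14:13:16 and 14:13:26
thirty = bucket_means(raw, 30)   # one slice: the sample spans only 11 seconds
aligned = bucket_means(raw, 10, origin=datetime(2016, 6, 27, 14, 13, 0))
```

Passing an explicit origin corresponds to supplying your own breaks sequence: the slices then start on round times instead of at the first observation.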
There is something wrong with my method or my logic here.
I am trying to sum all the data from both tables. If the two correspond, add them up, if either doesn't correspond, still show the individual query total, ending up with estimates per year in sequence.
I have tried LEFT JOINS, FULL JOINS, (UNIONS). Nothing comes close to just summing where possible and supplying the data otherwise.
The key point here is that pb and th_year both hold the year for which the results are needed, so they are what ties the two tables together.
The error must be obvious in my code.
The separate aggregate queries produce the correct results.
It's the combining of the two queries where I am going wrong.
Would appreciate advice on this.
I thought it would be simple.
I think it probably is simple. Just stupidity on my side.
CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2
AS SELECT
a.owner,
a.region,
a.district,
a.plantation,
b.th_year,
a.pb,
a.wc,
sum(a.tcf_calcarea + b.tth_calcarea) AS area,
sum(a.tcf_total + b.tth_total) AS total,
sum(a.tcf_ws + b.tth_ws) AS ws,
sum(a.tcf_util + b.tth_util) AS util,
sum(a.tcf_s + b.tth_s) AS s,
sum(a.tcf_a + b.tth_a) AS a,
sum(a.tcf_b + b.tth_b) AS b,
sum(a.tcf_c + b.tth_c) AS c,
sum(a.tcf_d + b.tth_d) AS d
FROM
(SELECT
cfdata.owner,
cfdata.region,
cfdata.district,
cfdata.plantation,
cfdata.pb,
cfdata.wc,
sum(cfdata.calcarea)AS tcf_calcarea,
sum(cfdata._ba) AS tcf_ba,
sum(cfdata._total) AS tcf_total,
sum( cfdata._ws) AS tcf_ws,
sum( cfdata._util) AS tcf_util,
sum( cfdata._s) AS tcf_s,
sum( cfdata._a) AS tcf_a,
sum( cfdata._b) AS tcf_b,
sum( cfdata._c) AS tcf_c,
sum( cfdata._d) AS tcf_d
FROM cfdata
GROUP BY cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc
ORDER BY cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc) a
JOIN
(SELECT
thdata.owner,
thdata.region,
thdata.district,
thdata.plantation,
thdata.th_year,
thdata.wc,
sum(thdata.calcarea)AS tth_calcarea,
sum(thdata.th_ba) AS tth_ba,
sum(thdata.th_total) AS tth_total,
sum(thdata.th_ws) AS tth_ws,
sum(thdata.th_util) AS tth_util,
sum(thdata.th_s) AS tth_s,
sum(thdata.th_a) AS tth_a,
sum(thdata.th_b) AS tth_b,
sum(thdata.th_c) AS tth_c,
sum(thdata.th_d) AS tth_d
FROM thdata
GROUP BY thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc
ORDER BY thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc) b
ON a.owner = b.owner AND a.region = b.region AND a.district = b.district and a.plantation = b.plantation AND a.pb = b.th_year AND a.wc = b.wc
GROUP BY a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc
ORDER BY a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc
thdata sample:
owner region district plantation compartment calcarea wc plantdate th_year th_age th_dbh th_ht th_vtree th_sph th_ba th_total th_ws th_util th_s th_a th_b th_c th_d thdata_id
KeyProjects Northern Marshlands River Glen A27 14.02 PFN 01/08/2009 2017 8 12.3 7.3 0.0289 179 28 70 14 56 42 14 0 0 0 1
KeyProjects Northern Marshlands River Glen A28 2.1 ESN 01/12/2010 2012 2 4.5 4.2 0 479 2 0 0 0 0 0 0 0 0 2
KeyProjects Northern Marshlands River Glen A28 2.1 ESN 01/12/2010 2014 4 10.2 9.6 0.0188 250 4 11 0 8 4 6 0 0 0 3
KeyProjects Northern Marshlands River Glen A29 2.71 ESN 01/08/2009 2011 2 4.5 4.2 0 479 3 0 0 0 0 0 0 0 0 4
KeyProjects Northern Marshlands River Glen A29 2.71 ESN 01/08/2009 2013 4 10.2 9.6 0.0188 250 5 14 0 11 5 8 0 0 0 5
cfdata sample:
owner region district plantation compartment wc pb calcarea cfage dbh ht vtree sph _ba _total _ws _util _s _a _b _c _d tmai umai smai cfdata_id
KeyProjects Northern Marshlands River Glen A01 EF1 2021 5.27 10 14.5 20.4 0.1109 1004 90 585 21 564 84 401 79 0 0 11.1 10.7 1.5 1
KeyProjects Northern Marshlands River Glen A02 EF1 2021 36.1 10 14.5 20.4 0.1109 1004 614 4007 144 3863 578 2744 542 0 0 11.1 10.7 1.5 2
KeyProjects Northern Marshlands River Glen A03 EF1 2021 5.5 10 14.5 20.4 0.1109 1004 94 611 22 589 88 418 83 0 0 11.1 10.7 1.5 3
KeyProjects Northern Marshlands River Glen A04 EF1 2021 11.91 10 14.5 20.4 0.1109 1004 202 1322 48 1274 191 905 179 0 0 11.1 10.7 1.5 4
KeyProjects Northern Marshlands River Glen A05 EF1 2022 39.17 11 14.9 21.8 0.1286 1000 705 5053 157 4857 666 3486 744 0 0 11.7 11.3 1.7 5
expected result:
owner region district plantation th_year pb wc area total ws util s a b c d
KeyProjects Northern Marshlands River Glen 2008 2008 EF1 620.49 44176 1788 42389 7562 31953 2852 0 0
KeyProjects Northern Marshlands River Glen 2009 2009 EF1 635.65 44319 1778 42476 7634 31993 2852 0 0
KeyProjects Northern Marshlands River Glen 2010 2010 EF1 1202.31 87980 3453 84487 14906 63883 5704 0 0
KeyProjects Northern Marshlands River Glen 2011 2011 EF1 1948.37 132378 5275 127104 22662 95895 8556 0 0
KeyProjects Northern Marshlands River Glen 2012 2012 EF1 1378.61 87928 3429 84477 14878 63922 5704 0 0
Ok, you have a few issues with your query:
In the main query, do not use sum(a.tcf_calcarea + b.tth_calcarea) AS area. You can simply add but you should make sure to substitute any NULL values with 0 first: write coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area instead, for all sum()s. This also means you are not aggregating anymore at this level, so you should drop the final GROUP BY clause.
Now make a FULL OUTER JOIN between the two sub-queries. This means you get all rows from both sub-queries joined and where a corresponding row does not exist for either side, there are NULLs for column values.
It makes no sense to ORDER BY in a sub-query, the planner will process the row set in the way it sees best. You should order at the outer level only.
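By the join condition, b.th_year = a.pb on every matched row, so you only need one year column. With a FULL JOIN one of the two is NULL on unmatched rows, so take coalesce(a.pb, b.th_year) rather than either column alone.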
Some syntactical pointers:
Your sub-queries use only one table so there is no need to work with table aliases, which saves you a lot of typing.
More savings: Use column positions in your GROUP BY clause, so you can write GROUP BY 1, 2, 3, 4, 5, 6. Same with ORDER BY.
On the JOIN clause you can write USING (...) when the join columns have the same name on both sides. Alias pb as th_year in the first sub-query and all six columns can go in USING (owner, region, district, plantation, th_year, wc). Other than that being shorter, USING emits a single merged column per join column (for a FULL JOIN, the non-NULL value from either side), so you do not need sub-query aliases in the main query for any of the USING columns. Do not move the year comparison into a WHERE clause: WHERE a.pb = b.th_year discards every row that has NULLs on one side, which turns the FULL JOIN back into an inner join.
All in all, this is what you get:
CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2 AS
SELECT owner, region, district, plantation, th_year, wc,
coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area,
...
FROM (
SELECT owner, region, district, plantation, pb AS th_year, wc,
sum(calcarea) AS tcf_calcarea,
...
FROM cfdata
GROUP BY 1, 2, 3, 4, 5, 6) a
FULL JOIN (
SELECT owner, region, district, plantation, th_year, wc,
sum(calcarea) AS tth_calcarea,
...
FROM thdata
GROUP BY 1, 2, 3, 4, 5, 6) b
USING (owner, region, district, plantation, th_year, wc)
ORDER BY 1, 2, 3, 4, 5, 6;
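If the outer-join-plus-coalesce idea is hard to picture, here is a tiny Python sketch of the semantics only, not of how PostgreSQL executes it. Each dictionary plays the role of one aggregated sub-query, keyed by (wc, year); the key shape and all numbers are made up for illustration.

```python
# Per-key totals, as the two aggregating sub-queries might return them.
# Keys are (wc, year); the values are invented for this example.
cf_totals = {("EF1", 2021): 585, ("EF1", 2022): 5053}
th_totals = {("EF1", 2022): 70, ("ESN", 2012): 11}

# FULL JOIN: every key that appears on either side.
all_keys = sorted(set(cf_totals) | set(th_totals))

# coalesce(x, 0) + coalesce(y, 0): a missing side contributes 0.
total = {k: cf_totals.get(k, 0) + th_totals.get(k, 0) for k in all_keys}

# For comparison, an inner join keeps only keys present on both sides.
inner_keys = sorted(set(cf_totals) & set(th_totals))

for key in all_keys:
    print(key, total[key])
```

Keys found on only one side still show up with their own total, which is the "sum where possible, supply the data otherwise" behaviour asked for; an inner join would have returned only the one key in inner_keys.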