Postgres 10 - convert categorical column to presence/absence matrix

I'd like to create a new table like so:
original table:
site_id, site_period
1, period_a
2, period_b
2, period_c
3, period_d
4, period_a
4, period_b
desired table:
site_id, period_a, period_b, period_c, period_d
1, 1, 0, 0, 0
2, 0, 1, 1, 0
3, 0, 0, 0, 1
4, 1, 1, 0, 0
This is probably a duplicate question, as this is a relatively simple problem, but I didn't know what vocabulary to use to describe it and find a solution. I'm familiar with coding logic, but not terribly comfortable with SQL queries. Thanks!

You can use CREATE TABLE ... AS SELECT and conditional aggregation.
CREATE TABLE desiredtable AS
SELECT site_id,
       count(CASE site_period WHEN 'period_a' THEN 1 END) period_a,
       count(CASE site_period WHEN 'period_b' THEN 1 END) period_b,
       count(CASE site_period WHEN 'period_c' THEN 1 END) period_c,
       count(CASE site_period WHEN 'period_d' THEN 1 END) period_d
FROM originaltable
GROUP BY site_id;
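As a side note, on Postgres 9.4 and later (so also on Postgres 10) the same conditional aggregation can be written with the FILTER clause; a small sketch using the same table and column names:
SELECT site_id,
       count(*) FILTER (WHERE site_period = 'period_a') AS period_a,
       count(*) FILTER (WHERE site_period = 'period_b') AS period_b,
       count(*) FILTER (WHERE site_period = 'period_c') AS period_c,
       count(*) FILTER (WHERE site_period = 'period_d') AS period_d
FROM originaltable
GROUP BY site_id;
Either way, if a (site_id, site_period) pair can occur more than once, the counts can exceed 1; wrapping each aggregate in least(..., 1) keeps the result a strict 0/1 presence/absence matrix.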

Related

How to approach summing up individual sets of columns per id in postgreSQL?

Task: I need to sum up relevant values from a JSON for a specific id. How can I accomplish this in PostgreSQL?
I receive post insights from Facebook's Graph API; each row contains a cell with a JSON object listing countries by their two-letter abbreviation and the corresponding watch time in ms from that country.
post_id, date, watchtime_per_country
107_[pageID], 2022-09-01, ** see json below **
The second part is a table that contains the relevant countries for each [page_id]:
page_id, target_country
P01, Germany (DE)
P01, Italy (IT)
P02, Mozambique (MZ)
P02, Colombia (CO)
Now I would like to get the sum of:
Germany (DE): 162 and Japan (JP): 24 --> 186 for P01
Mozambique (MZ): 3 and Colombia (CO): 6 --> 9 for P02
So far I have unnested the JSON and unpacked all of the roughly 250 possible country values into their own columns, but I am not sure whether this is a good approach. After that, I am not sure how to build those sums in a flexible, efficient way, or whether it is possible in PostgreSQL at all.
Does anyone have an idea?
**** json ****
{"Brazil (BR)": 9210, "Germany (DE)": 162, "Portugal (PT)": 68, "Japan (JP)": 24, "United States (US)": 17, "Italy (IT)": 13, "France (FR)": 9, "United Kingdom (GB)": 8, "Netherlands (NL)": 6, "Belgium (BE)": 6, "Colombia (CO)": 6, "Austria (AT)": 5, "Sweden (SE)": 4, "Canada (CA)": 4, "Argentina (AR)": 3, "Mozambique (MZ)": 3, "Angola (AO)": 3, "Switzerland (CH)": 2, "Saudi Arabia (SA)": 2, "New Zealand (NZ)": 2, "Norway (NO)": 2, "Indonesia (ID)": 2, "Denmark (DK)": 2, "United Arab Emirates (AE)": 2, "Russia (RU)": 2, "Spain (ES)": 1, "China (CN)": 1, "Israel (IL)": 1, "Chile (CL)": 0, "Bulgaria (BG)": 0, "Australia (AU)": 0, "Cape Verde (CV)": 0, "Ireland (IE)": 0, "Egypt (EG)": 0, "Luxembourg (LU)": 0, "Bolivia (BO)": 0, "Paraguay (PY)": 0, "Uruguay (UY)": 0, "Czech Republic (CZ)": 0, "Hungary (HU)": 0, "Finland (FI)": 0, "Algeria (DZ)": 0, "Peru (PE)": 0, "Mexico (MX)": 0, "Guinea-Bissau (GW)": 0}
You have a couple of ways you can go. If you will do little else with the post insights, then you can get the page sums by processing the JSON directly.
Your later comment indicates there may be more, though. Unpacking the JSON into a single table is the way to go; it is data normalization.
One very slight correction: the 2-character code is not an MS coding for the country. It is the ISO 3166 alpha-2 code (defined in ISO 3166-1), which MS does use.
Either way, the first step is to extract the keys from the JSON, then use those keys to extract the values. Then JOIN the relevant_countries table on the alpha-2 code.
with work_insights (jkey,country_watchtime) as
( select json_object_keys(country_watchtime), country_watchtime
from insights_data_stage
)
, watch_insights(cntry, alpha2, watchtime) as
( select trim(replace(substring(jkey, '^.*\('),'(',''))
, upper(trim(replace(replace(substring(jkey, '\(.*\)'),'(',''),')','')) )
, (country_watchtime->> jkey)::numeric
from work_insights
)
, relevant_codes (page_id, alpha2) as
( select page_id, substring(substring(target_country, '\(..\)'),2,2) alpha2
from relevant_countries
)
select rc.page_id, sum(watchtime) watchtime
from relevant_codes rc
join watch_insights wi
on (wi.alpha2 = rc.alpha2)
where rc.page_id in ('P01','P02')
group by rc.page_id
order by rc.page_id;
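As an aside, if the staging column is (or can be cast to) jsonb, jsonb_each_text can return keys and values in one pass; a rough sketch of the same aggregation under that assumption:
select rc.page_id, sum(kv.value::numeric) watchtime
from insights_data_stage i
cross join lateral jsonb_each_text(i.country_watchtime::jsonb) kv
join relevant_countries rc
  on upper(substring(kv.key, '\((..)\)')) = upper(substring(rc.target_country, '\((..)\)'))
where rc.page_id in ('P01','P02')
group by rc.page_id
order by rc.page_id;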
For the normalization process you need a country table (which you already said you have) and another table for the normalized insights data. Populating it begins with the same parsing as above, but develops columns for each value. Once created, you JOIN this table with relevant_countries. (See the demo containing both.) Note: I normalized the relevant_countries table.
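For concreteness, the normalized insights table the query below assumes could look something like this (a sketch; the exact column names and types are assumptions):
create table post_insights (
    post_id   text,
    alpha2    char(2),  -- ISO 3166-1 alpha-2 code parsed from the JSON key
    watchtime numeric   -- watch time in ms for that country
);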
select rc.page_id, sum(pi.watchtime) watchtime
from post_insights pi
join relevant_countries_rev rc on (rc.alpha2 = pi.alpha2)
group by rc.page_id
order by rc.page_id;
Update: The results for P01 do not match your expected results. Your expected output sums Germany and Japan, but your relevant_countries table indicates Germany and Italy.

Is there a PostGIS function to conditionally merge linestring geometries to the neighboring ones?

I have a lines (multilinestring) table in my PostGIS database (Postgres 11), which I have converted to linestrings, and I have also checked the validity (ST_IsValid()) of the new linestring geometries.
create table my_line_tbl as
select
gid gid_multi,
adm_code, t_count,
st_length((st_dump(st_linemerge(geom))).geom)::int len,
(st_dump(st_linemerge(geom))).geom geom
from
my_multiline_tbl
order by gid;
alter table my_line_tbl add column id serial primary key not null;
The first 10 rows look like this:
id, gid_multi, adm_code, t_count, len, geom
1, 1, 30, 5242, 407, LINESTRING(...)
2, 1, 30, 3421, 561, LINESTRING(...)
3, 2, 50, 5248, 3, LINESTRING(...)
4, 2, 50, 1458, 3, LINESTRING(...)
5, 2, 60, 2541, 28, LINESTRING(...)
6, 2, 30, 3325, 4, LINESTRING(...)
7, 2, 20, 1142, 5, LINESTRING(...)
8, 2, 30, 1425, 7, LINESTRING(...)
9, 3, 30, 2254, 4, LINESTRING(...)
10, 3, 50, 2254, 50, LINESTRING(...)
I am trying to develop the following logic:
1. Find all <= 10 m segments and merge them into a neighboring geometry (previous or next) that is > 10 m.
2. If there are several <= 10 m segments next to each other, merge them together to make segments > 10 m (minimum length: > 10 m).
3. In case of intersections, merge any <= 10 m segment into the longest neighboring geometry.
I thought of using SQL window functions to check the length (st_length()) of succeeding geometries (lead(id) over ()) and then merging them, but the problem with this approach is that successive IDs are not necessarily next to each other (they do not intersect, st_intersects()).
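For illustration, the neighbor check I have in mind would look something like this (it only inspects adjacent lengths within each original multilinestring; it does not merge anything yet):
select id,
       len,
       lag(len)  over (partition by gid_multi order by id) prev_len,
       lead(len) over (partition by gid_multi order by id) next_len
from my_line_tbl
order by id;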
My code attempt (dynamic SQL) is here, where I try to separate <= 10 and > 10 meter geometries.
with lt10mseg as (
select
id, gid_multi,
len, geom lt10m_geom
from
my_line_tbl
where len <= 10
order by id
), gt10mseg as (
select
id, gid_multi,
len, geom gt10m_geom
from
my_line_tbl
where len > 10
order by id
)
select
lt.id, gt.id, st_intersects(lt.lt10m_geom, gt.gt10m_geom)
from
lt10mseg lt, gt10mseg gt
order by lt.id
Any help/suggestions (dynamic SQL/PLPGSQL) to continue developing the above logic? The ultimate goal is to get rid of the <= 10 m segments by merging them into their neighbors.

Unexpected end of input in PostgreSQL stored procedure multidimensional array parameter

I have built a stored procedure in PostgreSQL which accepts multidimensional array parameters, like below:
SELECT horecami.insert_obj_common(
'{"(5, 2, LLLLL rest, 46181, a#a.com, ooo, kkk, 12:09, 20:40, 23, true, 49.667, 48.232, fu, 2011-12-15 15:28:19+04, 2011-12-15 15:28:19+04, 3, 1)"}'::obj_special[],
'{"(1, 3, q1, q2, q3, q4, qson latest, true, 2011-12-15 15:28:19+04, 2, 2, 3, 2011-12-15 15:28:19+04, ' || '{"(1, 1, 1, 1, 1)"}'::horecami.obj_soft_hardware[] || ')"}'::obj_soft[]
);
Inside this procedure there are FOREACH loops that work without problems.
But when I added the last extra parameter as an array (horecami.obj_soft_hardware[]), it gives me a malformed array error.
This is the error:
ERROR: malformed array literal: "{"(1, 3, q1, q2, q3, q4, qson latest, true, 2011-12-15 15:28:19+04, 2, 2, 3, 2011-12-15 15:28:19+04, "
LINE 3: '{"(1, 3, q1, q2, q3, q4, qson latest, true, 2011-12-15 1...
^
DETAIL: Unexpected end of input.
SQL state: 22P02
Character: 202
It should return a number.
I guess this is a syntax error.
Thanks in advance.
You don't have a multi-dimensional array; you've got an array containing a composite type, which in turn contains an array containing a composite type.
When writing this as a string literal, certain characters have to be escaped (e.g. strings with spaces need quotes, and those quotes need escaping). Then at nested levels they all need to be double quoted and escaped.
To determine what the string literal should look like, just create it using actual arrays and rows (or composite types), then cast to text to get the literal string value with all fields correctly quoted and escaped:
SELECT ARRAY[ROW(1, 3, 'q1', 'q2', 'q3', 'q4', 'qson latest', ARRAY[ROW(1, 1, 1, 1, 1)])]::TEXT
Returns:
{"(1,3,q1,q2,q3,q4,\"qson latest\",\"{\"\"(1,1,1,1,1)\"\"}\")"}

One2many field issue Odoo 10.0

I have a very weird issue with a One2many field.
First let me explain the scenario.
I have a One2many field in sale.order.line; the code below will explain the structure better:
class testModule(models.Model):
    _name = 'test.module'
    name = fields.Char()

class testModule2(models.Model):
    _name = 'test.module2'
    location_id = fields.Many2one('test.module')
    field1 = fields.Char()
    field2 = fields.Many2one('sale.order.line')

class testModule3(models.Model):
    _inherit = 'sale.order.line'
    test_location = fields.One2many('test.module2', 'field2')
CASE 1:
Now what is happening is that when I create a new sales order, I select the partner_id, then add a sale.order.line, and inside this line I add the One2many field test_location; then I save.
CASE 2:
Create a new sales order, select partner_id, then add a sale.order.line, and inside the sale.order.line add the test_location line [close the sales order line window]. Now, after the entry but before hitting save, I change a field, say partner_id, and then click save.
CASE 3:
This case is the same as case 2, but with the addition that I change the partner_id field again [2 changes in total: first as in case 2 and then now], then I click on save.
RESULTS
CASE 1 works fine.
CASE 2 has an issue:
odoo.sql_db: bad query: INSERT INTO "test_module2" ("id", "field2", "field1", "location_id", "create_uid", "write_uid", "create_date", "write_date") VALUES(nextval('test_module2_id_seq'), 27, 'asd', ARRAY[1, '1'], 1, 1, (now() at time zone 'UTC'), (now() at time zone 'UTC')) RETURNING id
ProgrammingError: column "location_id" is of type integer but expression is of type integer[]
LINE 1: ...VALUES(nextval('test_module2_id_seq'), 27, 'asd', ARRAY[1, '...
Now for this case I put a debugger on the create/write method of sale.order.line to see what values are getting passed:
values = {u'product_uom': 1, u'sequence': 0, u'price_unit': 885, u'product_uom_qty': 1, u'qty_invoiced': 0, u'procurement_ids': [[5]], u'qty_delivered': 0, u'qty_to_invoice': 0, u'qty_delivered_updateable': False, u'customer_lead': 0, u'analytic_tag_ids': [[5]], u'state': u'draft', u'tax_id': [[5]], u'test_location': [[5], [0, 0, {u'field1': u'asd', u'location_id': [1, u'1']}]], 'order_id': 20, u'price_subtotal': 885, u'discount': 0, u'layout_category_id': False, u'product_id': 29, u'price_total': 885, u'invoice_status': u'no', u'name': u'[CARD] Graphics Card', u'invoice_lines': [[5]]}
In the above values, location_id is getting passed as u'location_id': [1, u'1'], which is not correct... so for this I correct the issue in code, update the values, and pass that.
CASE 3
If the user changes the field, say, 2 or more times, then the values are:
values = {u'invoice_lines': [[5]], u'procurement_ids': [[5]], u'tax_id': [[5]], u'test_location': [[5], [1, 7, {u'field1': u'asd', u'location_id': False}]], u'analytic_tag_ids': [[5]]}
here
u'location_id': False
MULTIPLE CASE
If the user does case 1 and then on the same record does case 2 or case 3, then sometimes the line will be saved with field2 = Null or False in the database; other values like location_id and field1 will have data, but not field2.
NOTE: THIS HAPPENS WITH ANY FIELD AT THE HEADER LEVEL OF THE SALE ORDER, NOT ONLY THE PARTNER_ID FIELD
I tried debugging it myself but couldn't find the reason why this is happening.

What does the exclude_nodata_value argument to ST_DumpValues do?

Could anyone explain what the exclude_nodata_value argument to ST_DumpValues does?
For example, given the following:
WITH
-- Create a 4x4 raster, with each value set to 8 and NODATA set to -99.
tbl_1 AS (
SELECT
ST_AddBand(
ST_MakeEmptyRaster(4, 4, 0, 0, 1, -1, 0, 0, 4326),
1, '32BF', 8, -99
) AS rast
),
-- Set the values in rows 1 and 2 to -99.
tbl_2 AS (
SELECT
ST_SetValues(
rast, 1, 1, 1, 4, 2, -99, FALSE
) AS rast FROM tbl_1)
Why does the following select statement return NULLs in the first two rows:
SELECT ST_DumpValues(rast, 1, TRUE) AS cell_values FROM tbl_2;
Like this:
{{NULL,NULL,NULL,NULL},{NULL,NULL,NULL,NULL},{8,8,8,8},{8,8,8,8}}
But the following select statement returns -99s?
SELECT ST_DumpValues(rast, 1, FALSE) AS cell_values FROM tbl_2;
Like this:
{{-99,-99,-99,-99},{-99,-99,-99,-99},{8,8,8,8},{8,8,8,8}}
Clearly, with both statements the first two rows really contain -99s. However, in the first case (exclude_nodata_value=TRUE) these values have been masked (but not replaced) by NULLs.
Thanks for any help. The subtle differences between NULL and NODATA within PostGIS have been driving me crazy for several days.
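For what it's worth, the same masking shows up on a single cell with ST_Value when appended to the WITH clause above (cell 1,1 lies in the block that was set to -99):
SELECT ST_Value(rast, 1, 1, 1, TRUE)  AS masked,   -- NULL: the NODATA cell is hidden
       ST_Value(rast, 1, 1, 1, FALSE) AS unmasked  -- -99: the stored value is still there
FROM tbl_2;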