How to save the results of a PostgreSQL query to a csv/excel file using psycopg2?

I use driving_distance in PostgreSQL to find the distances between all nodes, and here is my Python script in PyScripter:
import sys
#set up psycopg2 environment
import psycopg2
#driving_distance module
query = """
select *
from driving_distance ($$
select
gid as id,
start_id::int4 as source,
end_id::int4 as target,
shape_leng::double precision as cost
from network
$$, %s, %s, %s, %s
)
;"""
#make connection between python and postgresql
conn = psycopg2.connect("dbname = 'routing_template' user = 'postgres' host = 'localhost' password = '****'")
cur = conn.cursor()
#count rows in the table
cur.execute("select count(*) from network")
result = cur.fetchone()
k = result[0] + 1
#run loops
rs = []
i = 1
while i <= k:
    cur.execute(query, (i, 1000000, False, False))
    rs.append(cur.fetchall())
    i = i + 1
#print result
for record in rs:
    print record
conn.close()
The result is fine, and part of it in the Python interpreter looks like this:
[(1, 2, 35789.4069722436), (2, 2, 31060.0761437413), (3, 19, 30915.1312550546), (4, 3, 33438.0715007666), (5, 4, 29149.0894812718), (6, 7, 25504.020006665), (7, 7, 29594.741802956), (8, 5, 20736.2427352646), (9, 10, 19545.809601197), (10, 8, 22609.5146670393), (11, 9, 14134.5400189648), (12, 11, 12266.7845493204), (13, 18, 17426.7449057031), (14, 21, 11754.7277029158), (15, 18, 13128.3548040769), (16, 20, 21924.2253916803), (17, 11, 15209.9969992088), (18, 20, 26316.7797545076), (19, 13, 604.414419026164), (20, 16, 740.652673783403), (21, 15, 0.0), (22, 15, 2378.768084459)]
[(1, 2, 38168.1750567026), (2, 2, 33438.8442282003), (3, 19, 33293.8993395136), (4, 3, 35816.8395852256), (5, 4, 31527.8575657308), (6, 7, 27882.788091124), (7, 7, 31973.509887415), (8, 5, 23115.0108197236), (9, 10, 21924.577685656), (10, 8, 24988.2827514983), (11, 9, 16513.3081034238), (12, 11, 14645.5526337793), (13, 18, 19805.5129901621), (14, 21, 14133.4957873748), (15, 18, 15507.1228885359), (16, 20, 24302.9934761393), (17, 11, 17588.7650836678), (18, 20, 28695.5478389666), (19, 13, 2983.18250348516), (20, 16, 3119.4207582424), (21, 15, 2378.768084459), (22, 15, 0.0)]
I want to export these results to a new csv or excel file, and I have looked at these related posts and websites:
PostgreSQL: export resulting data from SQL query to Excel/CSV
save (postgres) sql output to csv file
Psycopg 2.5.3.dev0 documentation
But I still can't get the export working under PyScripter. How can I do this?
I am working with PostgreSQL 8.4 and Python 2.7.6 under Windows 8.1 x64.
Update #1:
I tried the following code provided by Talvalin (thanks!):
import sys
#set up psycopg2 environment
import psycopg2
#driving_distance module
query = """
select *
from driving_distance ($$
select
gid as id,
start_id::int4 as source,
end_id::int4 as target,
shape_leng::double precision as cost
from network
$$, %s, %s, %s, %s
)
"""
#make connection between python and postgresql
conn = psycopg2.connect("dbname = 'TC_routing' user = 'postgres' host = 'localhost' password = '****'")
cur = conn.cursor()
outputquery = 'copy ({0}) to stdout with csv header'.format(query)
with open('resultsfile', 'w') as f:
    cur.copy_expert(outputquery, f)
conn.close()
But I got the error below:
>>>
Traceback (most recent call last):
File "C:/Users/Heinz/Desktop/python_test/driving_distance_loop_test.py", line 27, in <module>
cur.copy_expert(outputquery, f)
ProgrammingError: ERROR: syntax error at or near "%"
LINE 10: $$, %s, %s, %s, %s
^
Maybe I need to add something more to the code above.

Based on Psycopg2's cursor.copy_expert() and Postgres COPY documentation and your original code sample, please try this out. I tested a similar query export on my laptop, so I'm reasonably confident this should work, but let me know if there are any issues.
import sys
#set up psycopg2 environment
import psycopg2
#driving_distance module
#note the lack of trailing semi-colon in the query string, as per the Postgres documentation
query = """
select *
from driving_distance ($$
select
gid as id,
start_id::int4 as source,
end_id::int4 as target,
shape_leng::double precision as cost
from network
$$, %s, %s, %s, %s
)
"""
#make connection between python and postgresql
conn = psycopg2.connect("dbname = 'routing_template' user = 'postgres' host = 'localhost' password = 'xxxx'")
cur = conn.cursor()
outputquery = "COPY ({0}) TO STDOUT WITH CSV HEADER".format(query)
with open('resultsfile', 'w') as f:
    cur.copy_expert(outputquery, f)
conn.close()
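If this still raises the same error as in Update #1, it is because copy_expert() sends the SQL as-is and does not substitute the %s placeholders. One way around that is to let cursor.mogrify() fill in the parameters before building the COPY statement. Below is a sketch only: it reuses the parameter values from the original loop for a single source node (1, 1000000, False, False), and the output file name is just an example.
import psycopg2

query = """
select *
from driving_distance ($$
select
gid as id,
start_id::int4 as source,
end_id::int4 as target,
shape_leng::double precision as cost
from network
$$, %s, %s, %s, %s
)
"""

conn = psycopg2.connect("dbname = 'routing_template' user = 'postgres' host = 'localhost' password = '****'")
cur = conn.cursor()

#fill in the placeholders first; copy_expert() will not do it for us
filled_query = cur.mogrify(query, (1, 1000000, False, False))
outputquery = "COPY ({0}) TO STDOUT WITH CSV HEADER".format(filled_query)

#write the csv for this one source node (file name is only an example)
with open('results_node_1.csv', 'w') as f:
    cur.copy_expert(outputquery, f)
conn.close()
To export every node, the mogrify/copy_expert part can be wrapped in the same while loop as in the original script, writing one file per node.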

Related

Writing a query in SQLAlchemy to count occurrences and store IDs

I'm working with a postgres db using SQLAlchemy.
I have a table like this
class Author(Base):
    __tablename__ = "Author"
    id = Column(BIGINT, primary_key=True)
    name = Column(Unicode)
and I want to identify all homonymous authors and save their ids in a list.
For example, if the database contains 2 authors named "John" and 3 named "Jack", with IDs 11, 22, 33, 44 and 55 respectively, I want my query to return
[("John", [11,22]), ("Jack", [33,44,55])]
For now I've been able to write
[x for x in db_session.query(
    func.count(Author.name),
    Author.name
).group_by(Author.name) if x[0] > 1]
but this just gives me back the occurrence counts
[(2,"John"),(3,"Jack")]
Thank you very much for the help!
The way to do this in SQL would be to use PostgreSQL's array_agg function to group the ids into an array:
SELECT
name,
array_agg(id) AS ids
FROM
my_table
GROUP BY
name
HAVING
count(name) > 1;
The array_agg function collects the ids for each name, and the HAVING clause excludes those with only a single row. The output of the query would look like this:
name │ ids
═══════╪════════════════════
Alice │ {2,4,9,10,16}
Bob │ {1,6,11,12,13}
Carol │ {3,5,7,8,14,15,17}
Translated into SQLAlchemy, the query would look like this:
import sqlalchemy as sa
...
q = (
    db_session.query(Author.name, sa.func.array_agg(Author.id).label('ids'))
    .group_by(Author.name)
    .having(sa.func.count(Author.name) > 1)
)
Calling q.all() will return a list of (name, [ids]) tuples like this:
[
('Alice', [2, 4, 9, 10, 16]),
('Bob', [1, 6, 11, 12, 13]),
('Carol', [3, 5, 7, 8, 14, 15, 17]),
]
In SQLAlchemy 1.4/2.0-style syntax, the equivalent would be:
with Session() as s:
    q = (
        sa.select(Author.name, sa.func.array_agg(Author.id).label('ids'))
        .group_by(Author.name)
        .having(sa.func.count(Author.name) > 1)
    )
    res = s.execute(q)
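Still inside the with block, the rows behave like the (name, ids) tuples described above; a small usage sketch:

    rows = res.all()          # list of Row objects
    for name, ids in rows:    # each row unpacks like a tuple
        print(name, ids)      # e.g. Alice [2, 4, 9, 10, 16]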

Is there a PostGIS function to conditionally merge linestring geometries to the neighboring ones?

I have a lines (multilinestring) table in my PostGIS database (Postgres 11), which I have converted to linestrings; I have also checked the validity (ST_IsValid()) of the new linestring geometries.
create table my_line_tbl as
select
gid gid_multi,
adm_code, t_count,
st_length((st_dump(st_linemerge(geom))).geom)::int len,
(st_dump(st_linemerge(geom))).geom geom
from
my_multiline_tbl
order by gid;
alter table my_line_tbl add column id serial primary key not null;
The first 10 rows look like this:
id, gid_multi, adm_code, t_count, len, geom
1, 1, 30, 5242, 407, LINESTRING(...)
2, 1, 30, 3421, 561, LINESTRING(...)
3, 2, 50, 5248, 3, LINESTRING(...)
4, 2, 50, 1458, 3, LINESTRING(...)
5, 2, 60, 2541, 28, LINESTRING(...)
6, 2, 30, 3325, 4, LINESTRING(...)
7, 2, 20, 1142, 5, LINESTRING(...)
8, 2, 30, 1425, 7, LINESTRING(...)
9, 3, 30, 2254, 4, LINESTRING(...)
10, 3, 50, 2254, 50, LINESTRING(...)
I am trying to develop the following logic:
1. Find all <= 10 m segments and merge them to a neighboring (previous or next) geometry that is > 10 m.
2. If there are many <= 10 m segments next to each other, merge them together to make > 10 m segments (min length: > 10 m).
3. In case of intersections, merge any <= 10 m segment to the longest neighboring geometry.
I thought of using SQL window functions to check the length (st_length()) of succeeding geometries (lead(id) over ()) and then merging them, but the problem with this approach is that successive IDs are not necessarily next to each other (they do not intersect, st_intersects()).
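The window-function idea described above might look roughly like this (a sketch only; as noted, neighbouring ids do not guarantee that the geometries actually touch):
-- compare each segment's length with its previous/next row by id
select
    id,
    len,
    lag(len)  over (order by id) as prev_len,
    lead(len) over (order by id) as next_len
from my_line_tbl;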
My code attempt (dynamic SQL) is here, where I try to separate <= 10 and > 10 meter geometries.
with lt10mseg as (
select
id, gid_multi,
len, geom lt10m_geom
from
my_line_tbl
where len <= 10
order by id
), gt10mseg as (
select
id, gid_multi,
len, geom gt10m_geom
from
my_line_tbl
where len > 10
order by id
)
select
st_intersects(lt10m_geom,gt10m_geom)
from
lt10mseg, gt10mseg
order by id
Any help/suggestions (dynamic SQL/PLPGSQL) on how to continue developing the above logic? The ultimate goal is to get rid of the <= 10 m segments by merging them into their neighbors.

converting hex string in ipv6 format in postgresql

I have a hex string like \xfc80000000000000ea508bfff217b628 in bytea format and I want to convert it into fc80:0000:0000:0000:ea50:8bff:f217:b628 in a select query. I tried:
select '0:0:0:0:0:0:0:0'::inet + encode(ip::bytea,'hex') from a;
but I get the following error:
ERROR: operator does not exist: inet + text
LINE 1: select '0:0:0:0:0:0:0:0'::inet + encode(stationipv6::bytea,'...
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
substring() works with bytea values, and you can use it to extract the individual 2-byte groups and convert the whole thing to an inet:
select concat_ws(':',
encode(substring(stationipv6, 1, 2), 'hex'),
encode(substring(stationipv6, 3, 2), 'hex'),
encode(substring(stationipv6, 5, 2), 'hex'),
encode(substring(stationipv6, 7, 2), 'hex'),
encode(substring(stationipv6, 9, 2), 'hex'),
encode(substring(stationipv6, 11, 2), 'hex'),
encode(substring(stationipv6, 13, 2), 'hex'),
encode(substring(stationipv6, 15, 2), 'hex')
)::inet
from your_table
This works directly on bytea columns.
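As a quick sanity check you can run the same expression against the bytea literal from the question instead of a column; the inet result is displayed in Postgres's abbreviated form:
select concat_ws(':',
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 1, 2), 'hex'),
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 3, 2), 'hex'),
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 5, 2), 'hex'),
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 7, 2), 'hex'),
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 9, 2), 'hex'),
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 11, 2), 'hex'),
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 13, 2), 'hex'),
    encode(substring('\xfc80000000000000ea508bfff217b628'::bytea, 15, 2), 'hex')
)::inet;
-- should give fc80::ea50:8bff:f217:b628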

Aggregation on fixed size JSONB array in PostgreSQL

I'm struggling with aggregations on a JSONB field in a PostgreSQL database. This is probably easier explained with an example, so I create and populate a table called analysis with 2 columns (id and analysis) as follows:
create table analysis (
id serial primary key,
analysis jsonb
);
insert into analysis
(id, analysis) values
(1, '{"category" : "news", "results" : [1, 2, 3, 4, 5 , 6, 7, 8, 9, 10, 11, 12, 13, 14, null, null]}'),
(2, '{"category" : "news", "results" : [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, null, 26]}'),
(3, '{"category" : "news", "results" : [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]}'),
(4, '{"category" : "sport", "results" : [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]}'),
(5, '{"category" : "sport", "results" : [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]}'),
(6, '{"category" : "weather", "results" : [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]}');
As you can see, the analysis JSONB field always contains 2 attributes, category and results. The results attribute will always contain a fixed-length array of size 16. I've used various functions such as jsonb_array_elements, but what I'm trying to do is the following:
Group by analysis->'category'
Average of each array element
What I want is a statement that returns 3 rows grouped by category (i.e. news, sport and weather), each with a fixed-length array of 16 averages. To further complicate things, if there are nulls in the array then we should ignore them (i.e. we are not simply summing and dividing by the number of rows). The result should look something like the following:
category | analysis_average
-----------+--------------------------------------------------------------------------------------------------------------
"news" | [14.33, 15.33, 16.33, 17.33, 18.33, 19.33, 20.33, 21.33, 22.33, 23.33, 24.33, 25.33, 26.33, 27.33, 45, 36]
"sport" | [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
"weather" | [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]
NOTE: Notice the 45 and 36 in the last 2 array items of the 1st row, which illustrates ignoring the nulls.
I had considered creating a view which exploded the array into 16 columns i.e.
create view analysis_view as
select a.*,
(a.analysis->'results'->>0)::int as result0,
(a.analysis->'results'->>1)::int as result1
/* ... etc for all 16 array entries .. */
from analysis a;
This seems extremely inelegant to me and removes the advantages of using an array in the first place but could probably hack something together using that approach.
Any pointers or tips will be most appreciated!
Also performance is really important here so the higher the performance the better!
This will work for any array length
select category, array_agg(average order by subscript) as average
from (
    select
        a.analysis->>'category' category,
        subscript,
        avg(v)::numeric(5,2) as average
    from
        analysis a,
        lateral unnest(
            array(select jsonb_array_elements_text(analysis->'results')::int)
        ) with ordinality s(v, subscript)
    group by 1, 2
) s
group by category
;
category | average
----------+----------------------------------------------------------------------------------------------------------
news | {14.33,15.33,16.33,17.33,18.33,19.33,20.33,21.33,22.33,23.33,24.33,25.33,26.33,27.33,45.00,36.00}
sport | {61.00,62.00,63.00,64.00,65.00,66.00,67.00,68.00,69.00,70.00,71.00,72.00,73.00,74.00,75.00,76.00}
weather | {91.00,92.00,93.00,94.00,95.00,96.00,97.00,98.00,99.00,100.00,101.00,102.00,103.00,104.00,105.00,106.00}
table functions - with ordinality
lateral
Because the array is always of the same length, you can use generate_series instead of typing the index of every array element yourself. You CROSS JOIN with that generated series so the index is applied to every category and you can get every element at position s from the array. Then it is just aggregating the data using GROUP BY.
The query then becomes:
SELECT category, array_agg(val ORDER BY s) analysis_average
FROM (
    SELECT analysis->'category' category, s, AVG((analysis->'results'->>s)::numeric) val
    FROM analysis
    CROSS JOIN generate_series(0, 15) s
    GROUP BY category, s
) q
GROUP BY category
In this case, 15 is the last index of the array (16 - 1).
It can be done in a more traditional way, like this:
select
(t.analysis->'category')::varchar,
array_math_avg(array(select jsonb_array_elements_text(t.analysis->'results')::int))::numeric(9,2)[]
from
analysis t
group by 1 order by 1;
but we need to do some preparation:
create type t_array_math_agg as (
    c int[],
    a numeric[]
);

create or replace function array_math_sum_f(in t_array_math_agg, in numeric[]) returns t_array_math_agg as $$
declare
    r t_array_math_agg;
    i int;
begin
    if $2 is null then
        return $1;
    end if;
    r := $1;
    for i in array_lower($2,1)..array_upper($2,1) loop
        if coalesce(r.a[i], $2[i]) is null then
            r.a[i] := null;
        else
            r.a[i] := coalesce(r.a[i], 0) + coalesce($2[i], 0);
            r.c[i] := coalesce(r.c[i], 0) + 1;
        end if;
    end loop;
    return r;
end; $$ immutable language plpgsql;

create or replace function array_math_avg_final(in t_array_math_agg) returns numeric[] as $$
declare
    r numeric[];
    i int;
begin
    if array_lower($1.a, 1) is null then
        return null;
    end if;
    for i in array_lower($1.a,1)..array_upper($1.a,1) loop
        r[i] := $1.a[i] / $1.c[i];
    end loop;
    return r;
end; $$ immutable language plpgsql;

create aggregate array_math_avg(numeric[]) (
    sfunc=array_math_sum_f,
    finalfunc=array_math_avg_final,
    stype=t_array_math_agg,
    initcond='({},{})'
);

PostgreSQL, calculating ingredients from normatives

Some time ago I got a very reliable query here for calculating ingredients from normatives, but over time I have found I need to advance it a bit.
Here are example tables:
CREATE TABLE myusedfood
(mybill int, mydate text, food_code int, food_name text, qtyu integer, meas text);
INSERT INTO myusedfood (mybill, mydate, food_code, food_name, qtyu, meas)
VALUES (1, '03.01.2014', 10, 'spaghetti', 3, 'pcs'),
(2, '04.01.2014', 156, 'mayonnaise', 2, 'pcs'),
(3, '06.01.2014', 173, 'ketchup', 1, 'pcs'),
(4, '07.01.2014', 172, 'bolognese sauce', 2, 'pcs'),
(5, '08.01.2014', 173, 'ketchup', 1, 'pcs'),
(6, '15.01.2014', 175, 'worchester sauce', 2, 'pcs'),
(7, '16.01.2014', 177, 'parmesan', 1, 'pcs'),
(8, '17.01.2014', 10, 'spaghetti', 2, 'pcs'),
(9, '18.01.2014', 156, 'mayonnaise', 1, 'pcs'),
(10, '19.01.2014', 10, 'spaghetti', 2, 'pcs'),
(11, '19.01.2014', 1256, 'spaghetti rinf', 100, 'gramm'),
(12, '20.01.2014', 156, 'mayonnaise', 2, 'pcs'),
(13, '21.01.2014', 173, 'ketchup', 1, 'pcs'),
(14, '19.01.2014', 10, 'spaghetti', 2, 'pcs');
DROP TABLE IF EXISTS myingredients;
CREATE TABLE myingredients
(food_code int, ingr_code int, ingr_name text, qtyi decimal(10, 3), meas text);
INSERT INTO myingredients (food_code, ingr_code, ingr_name, qtyi, meas)
VALUES (10, 1256, 'spaghetti rinf', 75, 'gramm'),
(156, 1144, 'salt', 0.3, 'gramm'),
(10, 1144, 'salt', 0.5, 'gramm'),
(156, 1140, 'fresh egg', 50, 'gramm'),
(172, 1138, 'tomato', 80, 'gramm'),
(156, 1139, 'mustard', 5, 'gramm'),
(172, 1136, 'clove', 1, 'gramm'),
(156, 1258, 'oil', 120, 'gramm'),
(172, 1135, 'laurel', 0.4, 'gramm'),
(10, 1258, 'oil', 0.4, 'gramm'),
(172, 1130, 'corned beef', 40, 'gramm');
The first table contains the bill number and date, and the normative's/ingredient's code, name and sold quantity.
The second represents the normative's code and the ingredient's code, name and quantity in the related normative.
With this query I get all ingredients from all normatives listed, plus normatives which don't have any ingredients. That works OK.
SELECT SUM(f.qtyu) AS used,
COALESCE(i.ingr_code, f.food_code) AS code,
COALESCE(i.ingr_name, f.food_name) AS f_name,
SUM(COALESCE(i.qtyi, 1) * f.qtyu) AS qty,
COALESCE(i.meas, f.meas) AS meas
FROM myusedfood f LEFT JOIN myingredients i
ON f.food_code = i.food_code
GROUP BY COALESCE(i.ingr_code, f.food_code),
COALESCE(i.ingr_name, f.food_name),
COALESCE(i.meas, f.meas)
ORDER BY code;
Over time my real tables have become huge and the queries slower. I often need to get results for just one food code, but I don't know how to do that properly with regard to speed.
For example, with the tables shown, if we query for ingredient code 1256 the result should be
used ing.code ing.name qty meas
------------------------------------------------------
109, 1256, "spaghetti rinf", 775.000, "gramm"
Here article 1256 is "used" 9 times via normative (10) and 100 g on its own, for a total qty of 775 grams.
Here I have 2 questions:
1. How to optimize the upper query to list only one code instead of all, say code 1256?
2. How to get a list of all records related to code 1256?
Result should look like this:
bill date code name qty
------------------------------------------------------
1, '03.01.2014', 10, 'spaghetti', 225
8, '17.01.2014', 10, 'spaghetti', 150
10, '19.01.2014', 10, 'spaghetti', 150
11, '19.01.2014', 1256, 'spaghetti rinf', 100
14, '19.01.2014', 10, 'spaghetti', 150
------------------------------------------------------
775
Note that ingredient 1256 may be sold within normative (10) or alone as is.
Please help me get those two needed queries.
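For question 1, one possible direction (a sketch only, not necessarily the fastest plan) is to keep the original query and filter on the coalesced code, so both the ingredient rows coming from normative (10) and the stand-alone sales of 1256 are kept:
-- restrict the original aggregate query to a single ingredient code
SELECT SUM(f.qtyu) AS used,
       COALESCE(i.ingr_code, f.food_code) AS code,
       COALESCE(i.ingr_name, f.food_name) AS f_name,
       SUM(COALESCE(i.qtyi, 1) * f.qtyu) AS qty,
       COALESCE(i.meas, f.meas) AS meas
FROM myusedfood f
LEFT JOIN myingredients i ON f.food_code = i.food_code
WHERE COALESCE(i.ingr_code, f.food_code) = 1256
GROUP BY COALESCE(i.ingr_code, f.food_code),
         COALESCE(i.ingr_name, f.food_name),
         COALESCE(i.meas, f.meas);
The same filter, without the aggregation, gives the per-bill breakdown asked for in question 2:
-- per-bill rows that contribute to code 1256
SELECT f.mybill, f.mydate, f.food_code, f.food_name,
       COALESCE(i.qtyi, 1) * f.qtyu AS qty
FROM myusedfood f
LEFT JOIN myingredients i ON f.food_code = i.food_code
WHERE COALESCE(i.ingr_code, f.food_code) = 1256
ORDER BY f.mybill;
On the sample data the first query returns the 109 / 775 row shown above and the second returns the five bills from the expected result; on the real tables, indexes on myusedfood.food_code and myingredients.food_code/ingr_code are probably what matters most for speed.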