I query a whole postgres table using
c.execute("select * from train_temp")
trans=np.array(c.fetchall())
and amid the expected data I got one row with the column names.
trans[-1,]
Out[63]:
array(['ACTION', 'RESOURCE', 'MGR_ID', 'ROLE_ROLLUP_1', 'ROLE_ROLLUP_2',
'ROLE_DEPTNAME', 'ROLE_TITLE', 'ROLE_FAMILY_DESC', 'ROLE_FAMILY',
'ROLE_CODE', None, None, None, None, None, None, None, None, None], dtype=object)
More puzzling is the fact the the number of rows returned match the number of row in the table
trans.shape
Out[67]: (32770, 19)
select count(1) from train_temp ;
count
-------
32770
(1 row)
Here's the schema of the table
Table "public.train_temp"
Column | Type | Modifiers | Storage | Description
---------------------+------------------+-----------+----------+-------------
action | text | | extended |
resource | text | | extended |
mgr_id | text | | extended |
role_rollup_1 | text | | extended |
role_rollup_2 | text | | extended |
role_deptname | text | | extended |
role_title | text | | extended |
role_family_desc | text | | extended |
role_family | text | | extended |
role_code | text | | extended |
av_role_code | double precision | | plain |
av_role_family | double precision | | plain |
av_role_family_desc | double precision | | plain |
av_role_title | double precision | | plain |
av_role_deptname | double precision | | plain |
av_role_rollup_2 | double precision | | plain |
av_role_rollup_1 | double precision | | plain |
av_mgr_id | double precision | | plain |
av_resource | double precision | | plain |
Has OIDs: no
What's going on here? Note it does not happen with all tables. Actually for this last one the process works fine
Table "public.play"
Column | Type | Modifiers | Storage | Description
-----------+------------------+-----------+----------+-------------
row.names | text | | extended |
action | double precision | | plain |
color | text | | extended |
type | text | | extended |
Has OIDs: no
This last table is completely passed as string, while the problematic one respects the data types.
play[1,]
Out[73]:
array(['2', '0.0', 'blue', 'car'],
dtype='|S5')
trans[1,]
Out[74]:
array(['1', '0', '36', '117961', '118413', '119968', '118321', '117906',
'290919', '118322', 0.920412992041299, 0.942349726775956,
0.933439675174014, 0.920412992041299, 0.976, 0.964478764478764,
0.949222217031812, 0.909090909090909, 0.923076923076923], dtype=object)
Thanks for insight.
Actually I just wrote the headers myself when importing the *csv into postgres.
I should have used the header option in psql such
\copy test from 'test.csv' with (delimiter ',' , format csv, header TRUE);
Related
I have a PostgrqSQL table w/ a bunch of columns that are just different sizes of integer.
Table "public.place2022_final"
Column | Type | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
toff | integer | | |
palette_index | bigint | | |
censorship | boolean | | |
row0 | integer | | |
col0 | integer | | |
row1 | integer | | |
col1 | integer | | |
uint_id | bigint | | |
seqno | bigint | | |
I can export it to a CSV, but for my purposes I really want the data to be small. Is there a way I can create a minimal dump to a binary file, w/ a format something like
<8 bytes for # of rows in table><4 bytes for row 1 toff><8 bytes for row 1 palette_index>...<do that for all fields, then repeat for all rows>.
I also know for a fact that all these bigints can be squashed down to 32-bit ints... so doing that "conversion" here would be nice too.
I have a postgres table that has a schema like this
Table "am.old_product"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
p_config_sku | text | | | | extended | |
p_simple_sku | text | | | | extended | |
p_merchant_id | text | | | | extended | |
p_country | character varying(2) | | | | extended | |
p_discount_rate | numeric(10,2) | | | | main | |
p_black_price | numeric(10,2) | | | | main | |
p_red_price | numeric(10,2) | | | | main | |
p_received_at | timestamp with time zone | | | | plain | |
p_event_id | uuid | | | | plain | |
p_is_deleted | boolean | | | | plain | |
Indexes:
"product_p_simple_sku_p_country_p_merchant_id_idx" UNIQUE, btree (p_simple_sku, p_country, p_merchant_id)
"config_sku_country_idx" btree (p_config_sku, p_country)
We decided that it would be a better idea remove the TEXT field merchant_id and move it to another table, and reference it in the product table using a foreign key. So the new schema looks just like this.
Table "am.product"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
p_config_sku | text | | not null | | extended | |
p_simple_sku | text | | not null | | extended | |
p_country | character varying(2) | | not null | | extended | |
p_discount_rate | numeric(10,2) | | | | main | |
p_black_price | numeric(10,2) | | | | main | |
p_red_price | numeric(10,2) | | | | main | |
p_received_at | timestamp with time zone | | not null | | plain | |
p_event_id | uuid | | not null | | plain | |
p_is_deleted | boolean | | | false | plain | |
p_merchant_id_new | integer | | not null | | plain | |
Indexes:
"new_product_p_simple_sku_p_country_p_merchant_id_new_idx" UNIQUE, btree (p_simple_sku, p_country, p_merchant_id_new)
"p_config_sku_country_idx" btree (p_config_sku, p_country)
Foreign-key constraints:
"fk_merchant_id" FOREIGN KEY (p_merchant_id_new) REFERENCES am.merchant(m_id)
Now this should make the product table size drop right? we are using a 4 bytes integer instead of a TEXT. Well not really, the two tables, have the same exact number of rows. The product table (one with integer field) size is 34.3 GB. While the old table's size (with TEXT) has size of 19.7GB
Does anyone have an explanation for that?
At a wild guess you have done this with various ALTER TABLE commands forcing at least one rewrite of the entire table.
The unused space will be gradually re-used, or for a more prompt change try a CLUSTER or VACUUM FULL on the table.
Look at the VACUUM command.
A database file is an organized collection of tuples. A row can be made up of one or more tuples. When you added a new column, you added tuples to the table file. But when you dropped a column, the space occupied by the tuples remains, because to delete it from the file is a costly operation. They are dead tuples.
VACUUM FULL am.product;
This will unfortunately will create exclusive locks on the table, and you won't be able to query it in the process.
I have a very weird issue on our postgresql DB. I have a table called "statement" which has some strange records in it.
Using the command line console psql, I query select * from customer.statement where type in ('QUOTE'); and get 12 rows back. 7 rows look normal, 5 are missing all data except a single column which is a nullable column but seems to hold real values entered by the user. psql tells me that 7 rows were returned even though there are 12. Most of the other columns are not nullable. The weird records look like this:
select * from customer.statement where type = 'QUOTE';
id | issuer_id | recipient_id | recipient_name | recipient_reference | source_statement_id | catalogue_id | reference | issue_date | due_date | description | total | currency | type | tax_level | rounding_mode | status | recall_requested | time_created | time_updated | time_paid
------------------+------------------+------------------+----------------+---------------------+---------------------+--------------+-----------+------------+------------+------------------------------------------------------------------+-----------+----------+-------+-----------+---------------+-----------+------------------+----------------------------+----------------------------+-----------
... 7 valid records removed ...
| | | | | | | | | | Build bulkheads and sheet with plasterboard. +| | | | | | | | | |
| | | | | | | | | | Patch all patches. +| | | | | | | | | |
| | | | | | | | | | Set and sand all joints ready for painting. +| | | | | | | | | |
| | | | | | | | | | Use wall angle on bulkhead in main bedroom. +| | | | | | | | | |
| | | | | | | | | | Build nib and sheet and set in entrance | | | | | | | | | |
(7 rows)
If I run the same query using pgAdmin, I don't see those weird records.
Anyone know what these are?
The plus sign before the separator (+|) indicates a newline character in the displayed string value in psql. So no additional rows, just the same row continued with line breaks. The final line of output in your quote confirms as much: (7 rows).
In pgAdmin you don't see the extra lines as long as you don't increase the height of the field (or copy / paste the content somewhere), but there are multiple lines as well.
Try in psql and in pgAdmin:
test=# SELECT E'This\nis\na\ntest.' AS multi_line, 'foo' AS single_line;
multi_line | single_line
--------------+-------------
This +| foo
is +|
a +|
test. |
(1 row)
The manual about psql:
linestyle
Sets the border line drawing style to one of ascii, old-ascii, or unicode. [...] The default setting is ascii. [...]
ascii style uses plain ASCII characters. Newlines in data are shown using a + symbol in the right-hand margin. [...]
I am trying to run this query from spark code using databricks:
select * from svv_table_info
but I am getting this error msg:
Exception in thread "main" java.sql.SQLException: Amazon Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
any opinion why I am getting this?
That view returns table_id which is in the Postgres system type OID.
psql=# \d+ svv_table_info
Column | Type | Modifiers | Storage | Description
---------------+---------------+-----------+----------+-------------
database | text | | extended |
schema | text | | extended |
table_id | oid | | plain |
table | text | | extended |
encoded | text | | extended |
diststyle | text | | extended |
sortkey1 | text | | extended |
max_varchar | integer | | plain |
sortkey1_enc | character(32) | | extended |
sortkey_num | integer | | plain |
size | bigint | | plain |
pct_used | numeric(10,4) | | main |
empty | bigint | | plain |
unsorted | numeric(5,2) | | main |
stats_off | numeric(5,2) | | main |
tbl_rows | numeric(38,0) | | main |
skew_sortkey1 | numeric(19,2) | | main |
skew_rows | numeric(19,2) | | main |
You can cast it to INTEGER and Spark should be able to handle it.
SELECT database,schema,table_id::INT
,"table",encoded,diststyle,sortkey1
,max_varchar,sortkey1_enc,sortkey_num
,size,pct_used,empty,unsorted,stats_off
,tbl_rows,skew_sortkey1,skew_rows
FROM svv_table_info;
My insert query is,
insert into app_library_reports
(app_id,adp_id,reportname,description,searchstr,command,templatename,usereporttemplate,reporttype,sentbothfiles,useprevioustime,usescheduler,cronstr,option,displaysettings,isanalyticsreport,report_columns,chart_config)
values
(25,18,"Report_Barracuda_SpamDomain_summary","Report On Domains Sending Spam Emails","tl_tag:Barracuda_spam AND action:2","BarracudaSpam/Report_Barracuda_SpamDomain_summary.py",,,,,,,,,,,,);
Schema for the table 'app_library_reports' is:
Table "public.app_library_reports"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+---------+------------------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('app_library_reports_id_seq'::regclass) | plain | |
app_id | integer | | plain | |
adp_id | integer | | plain | |
reportname | text | | extended | |
description | text | | extended | |
searchstr | text | | extended | |
command | text | | extended | |
templatename | text | | extended | |
usereporttemplate | boolean | | plain | |
reporttype | text | | extended | |
sentbothfiles | text | | extended | |
useprevioustime | text | | extended | |
usescheduler | text | | extended | |
cronstr | text | | extended | |
option | text | | extended | |
displaysettings | text | | extended | |
isanalyticsreport | boolean | | plain | |
report_columns | json | | extended | |
chart_config | json | | extended | |
Indexes:
"app_library_reports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"app_library_reports_adp_id_fkey" FOREIGN KEY (adp_id) REFERENCES app_library_adapter(id)
"app_library_reports_app_id_fkey" FOREIGN KEY (app_id) REFERENCES app_library_definition(id)
When I execute insert query it gives error:ERROR: syntax error at or near ","
Please help me to find out this error.Thank you.
I'm fairly certain your immediate error is coming from the empty string of commas (i.e. ,,,,,,,) appearing at the end of the INSERT. If you don't want to specify values for a particular column, you can pass NULL for the value. But in your case, since you only specify values for the first 6 columns, another way is to just specify those 6 columns names when you insert:
INSERT INTO app_library_reports
(app_id, adp_id, reportname, description, searchstr, command)
VALUES
(25, 18, 'Report_Barracuda_SpamDomain_summary',
'Report On Domains Sending Spam Emails', 'tl_tag:Barracuda_spam AND action:2',
'BarracudaSpam/Report_Barracuda_SpamDomain_summary.py')
This insert would only work if the columns not specified accept NULL. If some of the other columns are not nullable, then you would have to pass in values for them.