I have 66596470 rows which I got from a calculation in a query, not from a table, but in pgAdmin 4 I can only see 1000 of the 66596470. How can I see all rows or download them to CSV? Thank you...
This is not a PostgreSQL limit you are hitting, but a GUI client that limits the number of result rows it displays. You can probably configure pgAdmin to show more than 1000 rows, but if you actually tried to look at all 66 million rows, that would take a lot of time, both for the client software and for you as a reader, and the client would probably run out of memory.
If you want to retrieve (rather than see) the complete result set, use psql's \copy command to write the result set to a file. For example:
\copy (SELECT ...) TO 'filename' (FORMAT 'csv')
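Filled in with a made-up query and output path (both are placeholders for your own), that looks like:
\copy (SELECT customer_id, SUM(amount) AS total FROM payments GROUP BY customer_id) TO '/tmp/result.csv' (FORMAT 'csv', HEADER)
Run it from psql while connected to the database; the file is written on the machine psql runs on, not on the server.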
I am trying to export a large table using
SELECT name1, name2, name3, name4
FROM table1
GROUP BY 1, 2, 3, 4
However, after waiting for 1 hour, it returns OUT OF Memory and the DB connection needs to reset. I tried using COPY to write the table to a csv file, but it returns that I need to be superuser with STDIN/STDOUT. I am new to PostgreSQL.
How can I export this table without running out of memory?
Thanks in advance.
I think your best bet would be to chunk it up; I don't see how it's going to process 1.2 billion lines without freaking out.
Have a script that does 10000 at a time or something and saves the starting index for the next cycle.
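For example, a rough sketch of that idea using psql's \copy, assuming the table has an indexed id column you can range over (the column, ranges, and file names here are hypothetical):
-- hypothetical id ranges; record the last id exported so the next run knows where to resume
\copy (SELECT name1, name2, name3, name4 FROM table1 WHERE id > 0 AND id <= 10000000 ORDER BY id) TO 'chunk_01.csv' (FORMAT 'csv')
\copy (SELECT name1, name2, name3, name4 FROM table1 WHERE id > 10000000 AND id <= 20000000 ORDER BY id) TO 'chunk_02.csv' (FORMAT 'csv')
Each chunk stays a manageable size, and if you still need the GROUP BY you can de-duplicate the combined output afterwards.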
Please read error messages carefully and convey them to us verbatim. COPY says you need to be a superuser unless you use STDIN/STDOUT, not with them.
You should use a client-specific method to do this. If your client is psql that would be \copy. If your client is something else, you should tell us what that is.
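With the query from the question it would look something like this (the output file name is a placeholder), run from psql while connected to the database:
\copy (SELECT name1, name2, name3, name4 FROM table1 GROUP BY 1, 2, 3, 4) TO 'export.csv' (FORMAT 'csv')
\copy streams the result to a file on the client instead of building the whole result set up in memory, and it does not require superuser rights.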
I have a csv file with 108 columns which I am trying to import into my PostgreSQL table. It is obvious that I don't want to specify every column in my CREATE TABLE statement. But when I enter
\COPY 'table_name' FROM 'directory' DELIMITER ',' CSV HEADER; this error message shows up: "ERROR: extra data after last expected column". When there are only a few columns I know how to fix this problem but, like I said, I don't want to specify all 108 columns. By the way, my table doesn't contain all of the columns. Any help on how I could do that? Thx!
When dealing with problems like this, I often cheat. Plenty of tools exist online for converting CSV to SQL, https://www.convertcsv.com/csv-to-sql.htm being one of them.
Copy/paste your CSV, copy/paste the generated SQL. Not the most elegant solution, but it will work as a one-off.
Now, if you're looking to repeat this process regularly (automated, I hope), then Python may be an interesting language to explore for quickly writing a script to do this for you; you could then schedule it as a cron job, or use whatever method you prefer for invoking it automatically with the correct input (the CSV file).
Please feel free to let me know if I've misunderstood your original question, or if I can provide any more help give me a shout and I'll do my best!
I get an odd error when trying to query too many dates from a date-partitioned historical database:
q)eod: h"select from eod where date within 2018.01.01 2018.04.22"
'/tablepath/2018.04.04/eod/somecolumn: invalid host
q)eod: h"select from eod where date within 2018.01.17 2018.04.20"
'/tablepath/2018.04.20/eod/othercolumn: invalid host
q)eod: h"select from eod where date within 2018.01.18 2018.04.20"
q)
Note that both dates mentioned in the error messages are within the date range that we manage to extract in the end, and that it fails on a different column each time. This seems to indicate it's something to do with the size of the table being pulled, but when we check the size of the largest table we managed to get:
q)(-22!eod) % 1024 * 1024
646.9043
q)count eod
2872546
we find that it's not particularly large, either by memory size or by number of rows.
Googling for "invalid host" errors doesn't seem to turn up anything relevant, and I'm not seeing anything in the kdb docs about size limits that would be relevant. Anyone got any ideas?
Edit:
When loading the table in a session and making the queries directly, we get what appears to be the same error, but with a different message. For instance:
q)jj: select from eod where date within 2018.01.01 2018.04.22
Too many compressed files open
k){0!(?).#[x;0;p1[;y;z]]}
'./2018.04.04/eod/settlecab: No such file or directory
.
?
(+`exch`date`class..
q.Q))
Note that the file ./2018.04.04/eod/settlecab does in fact exist and contains data.
I have no problem loading the data for just the date mentioned in the error, and the column mentioned has meaningful values:
q)jj: select from eod where date=2018.04.04
q)select count i by settlecab from jj
settlecab| x
---------| -----
0        | 41573
1        | 2269
The key point seems to be the Too many compressed files open message, but what can I do about this?
Edit for Summary/Solutions:
The table in question had many columns, all stored in a compressed format. When issuing a query against too many dates at once, kdb would try to mmap all of those columns at once, running into a limit on how many compressed files could be open at once.
Once I understood the problem, several solutions were available:
I could pull only certain columns from the database, reducing the number of files that kdb needed to keep open (see the sketch after this list),
I could force kdb to pull all the data into memory by adding a dummy where clause to the query, such as (null column) | not null column (hacky, but it works),
I could upgrade the kdb version and lift the OS limits (not practical in my case).
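For example (a sketch against the same handle h as above; exch and settlecab are just columns taken from the errors and schema output earlier, substitute the ones you actually need):
q)eod: h"select date, exch, settlecab from eod where date within 2018.01.01 2018.04.22"
q)eod: h"select from eod where date within 2018.01.01 2018.04.22, (null settlecab) | not null settlecab"
The first form keeps fewer column files open; the second is the hacky dummy-clause variant that forces the data to be pulled into memory instead of being left mapped.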
I still have no idea why this resulted in an invalid host error when querying the database remotely.
First off, can we just clarify the database structure you're working with? It seems from the filepaths returned in your errors that you've got a date-partitioned database. Did you mean non-segmented database when you said non-partitioned in your original query?
In terms of a fix for your issue, have you tried loading your database into a session, and making those queries directly? If so do you get the same issues?
If that seems to be working alright, the problem might lie with how you're defining your database handle. How is h defined in your original example?
It might also be worth trying to select individual dates from your database, to try and isolate the problem, and to determine if it lies with your on-disk data. Try specifically querying the dates that are mentioned in your errors.
You could also try performing your original queries with a subset of columns, again to try and pinpoint where your issue is coming from.
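A minimal sketch of that kind of isolation, assuming h was opened with hopen (the host and port here are placeholders):
q)h: hopen `:myserver:5010        / however your handle is actually defined
q)h"select from eod where date=2018.04.04"    / one of the dates from the errors
q)h"select from eod where date=2018.04.20"    / the other date mentioned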
Let us know if you get any further with this.
Joseph
I have a simple query which makes a GROUP BY using two fields:
#facturas =
SELECT a.CodFactura,
Convert.ToInt32(a.Fecha.ToString("yyyyMMdd")) AS DateKey,
SUM(a.Consumo) AS Consumo
FROM #table_facturas AS a
GROUP BY a.CodFactura, a.DateKey;
#table_facturas has 4100 rows, but the query takes several minutes to finish. Looking at the graph explorer, I see it uses 2500 vertices because I have 2500 unique CodFactura+DateKey rows. I don't know if this is normal ADLA behaviour. Is there any way to reduce the number of vertices and execute this query faster?
First: I am not sure your query will actually compile. You would need the Convert expression in your GROUP BY, or do it in a previous SELECT statement.
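For example, the previous-SELECT variant could look roughly like this (a sketch only; #withdatekey is a made-up name, the rest follows the naming in the question):
// compute DateKey once, then group on plain columns
#withdatekey =
    SELECT a.CodFactura,
           Convert.ToInt32(a.Fecha.ToString("yyyyMMdd")) AS DateKey,
           a.Consumo
    FROM #table_facturas AS a;

#facturas =
    SELECT CodFactura,
           DateKey,
           SUM(Consumo) AS Consumo
    FROM #withdatekey
    GROUP BY CodFactura, DateKey;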
Secondly: In order to answer your question, we would need to know how the full query is defined. Where does #table_facturas come from? How was it produced?
Without this information, I can only give some wild speculative guesses:
If #table_facturas is coming from an actual U-SQL table, your table is over-partitioned/fragmented. This could be because:
you inserted a lot of data originally with a distribution on the grouping columns, and you either have a predicate that reduces the number of rows per partition and/or you do not have up-to-date statistics (run CREATE STATISTICS on the columns).
you did a lot of INSERT statements, each inserting a small number of rows into the table, thus creating a large number of individual files. This will "scale-out" the processing as well. Use ALTER TABLE REBUILD to recompact (see the sketch after this list).
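For reference, a rough sketch of those two maintenance statements (the table, column, and statistics names are placeholders for your actual objects):
// refresh statistics on the grouping columns (placeholder names)
CREATE STATISTICS IF NOT EXISTS stats_CodFactura_Fecha
    ON dbo.Facturas (CodFactura, Fecha) WITH FULLSCAN;

// recompact a table fragmented by many small INSERTs
ALTER TABLE dbo.Facturas REBUILD;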
If it is coming from a fileset, you may have too many small files in the input. See if you can merge them into fewer, larger files.
If the above does not help, you can also try to hint a small number of rows in the query that creates #table_facturas by adding OPTION(ROWCOUNT=4000).
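For example, if #table_facturas is itself produced by a SELECT, the hint would be appended to that statement, roughly like this (the source table here is purely illustrative):
#table_facturas =
    SELECT CodFactura, Fecha, Consumo
    FROM dbo.Facturas
    OPTION(ROWCOUNT=4000);   // tells the optimizer to expect only ~4000 rows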
By tsv I mean a file delimited by tabs. I have a pretty large (6GB) data file that I have to import into a PostgreSQL database, and out of 56 columns, the first 8 are meaningful, then out of the other 48 there are several columns (like 7 or so) with 1's sparsely distributed with the rest being 0's. Is there a way to specify which columns in the file you want to copy into the table? If not, then I am fine with importing the whole file and just extracting the desired columns to use as data for my project, but I am concerned about allocating excessively large memory to a table in which less than 1/4 of the data is meaningful. Will this pose an issue, or will I be fine accommodating the meaningful columns into my table? I have considered using that table as a temp table and then importing the meaningful columns to another table, but I have been instructed to try to avoid doing an intermediary cleaning step, so I should be fine directly using the large table if it won't cause any problems in PostgreSQL.
With PostgreSQL 9.3 or newer, COPY accepts a program as input. This option is precisely meant for this kind of pre-processing. For instance, to keep only tab-separated fields 1 to 4 and 7 from a TSV file, you could run:
COPY destination_table FROM PROGRAM 'cut -f1-4,7 /path/to/file' (format csv, delimiter E'\t');
This also works with \copy in psql, in which case the program is executed client-side.
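For example, the client-side equivalent of the command above would look like this (same placeholder path):
\copy destination_table FROM PROGRAM 'cut -f1-4,7 /path/to/file' (format csv, delimiter E'\t')
Here cut runs on the machine where psql runs, so the file only needs to be readable on the client, not on the database server.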