Crate.io Copy From 0 rows affected - crate

I have a Crate.io database, running CrateDB version 3.2.7, under Windows Server 2012. (I know it's not the best, it's only for testing purposes right now and will not be the final setup.)
I have create the table dbo.snapshots
I exported the data from SQL Server to a CSV file via the BCP command.
bcp DatabaseName.dbo.Snapshots out F:\Path\dbo_OldSnapshots.dat -S ServerName -U UserName -P Password -a65535 -c -C 65001 -t ,
Then I tried to import the data into CrateDb with the "COPY FROM" command.
COPY dbo.snapshots FROM 'file:///F:/Path/dbo_OldSnapshots.dat';
The file is about 11go big. I know it found the file, as I could see the I/O on the drive in Task Manager.
It ran for about 13 minutes, and then said "0 rows affected". I have no idea why it didn't work, I didn't get any errors.
Any idea what I can do to make it work?
********************************* EDITED ADDED ADDITIONAL INFO ****************************
Ok, so I've found out that you can specify the "RETURN SUMMARY" clause at the end of the COPY command. I tested it with a smaller file.
With that, I got an error that says that the primary key cannot be NULL. I know that it's not NULL in the data that I extract, so I'm gonna have to find out why it says that my primary key is NULL.
So I changed the BCP delimiter to be a comma, since the CSV file for CrateDB must be comma separated, and I manually edited the file to add the column headers since CrateDB asks for a header.I also edited the file in Notepadd++ to save it in UTF-8 encoding, to make sure it was the right encoding.But even with all that, I still get the error saying that the primary key value must not be NULL.

Ok so I managed to make it work. Here is what you need to check if you try to Export data from SQL or another dbms to CrateDB :
- File encoding in UTF-8
- Comma separated file
- The first line needs to be a header with all the columns be carefull, the names are case sensitive so if the column is "MyColumn" in SQL Server, but "mycolumn" in CrateDB, it must be in lower case in the header or else CrateDb won't be able to find it correctly
- If you have DateTime types, it must be between "" in the file (ex: 1,0,"2019-05-10T16:40:00",0,0)
- If you have DateTime types, please note that you need to have T between the date and time part. So "2019-05-10T16:40:00" instead of "2019-05-10 16:40:00"
With all of that in check, I was able to import a sample data in my CrateDB database.

Related

How does the pgadmin encodes the file path in backups?

I'm trying to restore dump files from locations that contain character from other languages besides English.
So here is what I did:
From inside the pgadmin I used the backup tool like:
And inside the FileName input provided an actual real folder named "א":
C:\א\toc.dump
The actual file argument (-f file) has been auto decoded into:
pg_dump.exe --file "C:\\0F04~1\\TOC~1.DUM"
My question is what is the decoding system pgadmin uses in order to decode the file path argument?
How did it came up with 0F04~1 from א?
I'm asking it because pg_restore is not supporting file path that contains not English chars (from cmd):
pg_dump.exe --file "C:\\0F04~1\\TOC1.DUMP" .... WORKS OK!
pg_dump.exe --file "C:\\א\\TOC1.DUMP" ... Not Working!
pg_restore: [custom archiver] could not open input file "..."
As in this question, so if I'll find the encoding system for pgadmin I'll use it from code.
My goal is to encode the path that contain not-English chars from a batch code so it will work.
This is not something weird pgadmin does, but rather it is something weird Windows itself does when needing to represent such file names in a DOS-like setting. Like when the name is more than 8 chars, or extension more than 3.
In my hands the weird presentation is only there in the logs and status messages. If I use the GUI file chooser, the file names look normal, and replay successfully.
If you really want to know what Windows is doing, I think that is a better question for superuser with a Windows tag. I don't know why you can't restore these files. Are you using the pgAdmin GUI file chooser or trying to type the names in directly to something?

How to fix syntax errors in postgresql .sql dump file when restoring with psql?

I have a postgresql .sql dump file created by pg_dump on another windows 10 box. I am trying to restore it on my windows 10 laptop with
"psql -U user -d database -1 -f filename.sql". I created the database, but when I run the command to do the restore I get an error from psql after I give it my password:
psql:filename.sql:1:1: ERROR: syntax error at or near "ÿ_"
LINE 1: ÿ_;
The file looks like straight ascii (I only see two dashes on line one. I don't see a 'y' with an umlaut anywhere). I did a file on the .sql file with cygwin bash, and it says:
Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line >terminators
I really don't want to recreate the database by hand. I am looking for any suggestions.
I tried psql with and without the '-1' option; no luck. I tried putting a ';' at the top of the sql file, which I found suggested somewhere; again no luck.
I did a psql -l on my postgresql installation and the encoding on all my databases (including the one to which I am trying to do the restore) shows UTF8.
There really is no code. It is just that I can't seem to restore this dump file because it errors out.
I think that captures my problem. The windows box that I got the dump from is not available to me now; so I'm just hoping there is a way to get around this problem. Recreating the database by hand table by table is something I would prefer to avoid.
Thanks--
Al
In my case , this exact thing happened because I was taking the dump using windows Powershell , due to which other characters got included in the dump file.
Simply using command prompt to take the solved my problem.
I can only give you leads how to debug the problem, because the cause is not immediately obvious.
First, there should be a line close to the beginning of the dump file that sets client_encoding. The dump file should be in that encoding.
I can see two possibilities:
The file got mangled during transfer. To test for that, calculate a checksum for both files and compare.
Always use binary mode to transfer PostgreSQL dumps.
some editor or something else sneaked a BOM (byte order mark) into the file at the very beginning.
That's my prime suspect, since the problem is at line 1.
Use a hex editor or od (in Cygwin) to verify that. If this is the problem, simply replace the BOM with spaces.

Cannot restore data from pg_dump due to blank strings being treated as nonexistant data

I have a database currently on a PostgreSQL 9.3.9 server that I am backing up with pgdump in the simplest possible fashion, eg pg_dump orb > mar_9_2018.db.
One of those tables (linktags) has the following definition:
CREATE TABLE linktags (
linktagid integer NOT NULL,
linkid integer,
tagval character varying(1000)
);
When attempting to restore the database on PostgreSQL 11.2 via
cat mar_9_2018.db | docker exec -i pg-docker psql -U postgres
(docker container restore) the table returns empty because of the following error -
ERROR: missing data for column "tagval"
CONTEXT: COPY linktags, line 737: "1185 9325"
setval
I checked the db file and found that there are missing tabs where I would expect some sort of information, and clearly the restore process does as well.
I also verified that the value in the database is a blank string.
So -
Is there an idiomatic method to backup and restore a postgres database I am missing?
Is my version old enough that this version of pg_dump should have some special considerations?
Am I just restoring this wrong?
Edit:
I did some further research and found that I was incorrect in the original checking of NULLs, it was instead blank strings that are causing the issue.
If I make an example table with null strings and then blank strings, I can see the NULLs get a newline but the blank does not
pg_dump has an option to use INSERT instead of COPY
pg_dump -d db_name --inserts
as the manual warns, it might make restoration slow (and much larger dump file). Even in case of some inconsistencies tables will be filled with valid rows.
Another problem is with empty tables, pg_dump generates empty copy statement like:
COPY config (key, value) FROM stdin;
\.
in this case you'll be getting errors on reimport like:
ERROR: invalid input syntax for type smallint: " "
CONTEXT: COPY config, line 1, column group: " "
which doesn't happen with --insert option (no insert statement is generated).

ERROR: could not stat file "XX.csv": Unknown error

I run this command:
COPY XXX FROM 'D:/XXX.csv' WITH (FORMAT CSV, HEADER TRUE, NULL 'NULL')
In Windows 7, it successfully imports CSV files of less than 1GB.
If the file is more then 1GB big, I get an “unknown error”.
[Code: 0, SQL State: XX000] ERROR: could not stat file "'D:/XXX.csv' Unknown error
How can I fix this issue?
You can work around this by piping the file through a program. For example I just used this to copy from a 24GB file on Windows 10 and PostgreSQL 11.
copy t(c,d) from program 'cmd /c "type x:\path\to\file.txt"' with (format text);
This copies the text file file.txt into the table t, columns c and d.
The trick here is to run cmd in a single command mode, with /c and telling it to type out the file in question.
https://github.com/MIT-LCP/mimic-code/issues/493
alistairewj commented Nov 3, 2018 • ►
edited
Okay, the could not stat file "CHARTEVENTS.csv": Unknown error is actually a bug in PostgreSQL 11. Under the hood it makes a call to fstat() to make sure the file is not a directory, and unfortunately fstat() is a 32-bit program which can't handle large files like chartevents. I tested the build on Windows with PostgreSQL 10.5 and I didn't get this error so I think it's fairly new.
The best workaround is to keep the files compressed (i.e. keep them as .csv.gz files) and use 7zip to load in the data directly from compressed files. In testing this seemed to still work. There is a pretty detailed tutorial on how to do this here: https://mimic.physionet.org/tutorials/install-mimic-locally-windows/
The brief version of above is that you keep the .csv.gz files, you add the 7zip binary to your windows environment path, and then you call the postgres_load_data_7zip.sql file to load in the data. You can use the postgres_checks.sql file after everything to make sure you loaded in all the data correctly.
edit: For your later error, where you are using this 7zip approach, I'm not sure why it's not loading. Try redownloading just the ADMISSIONS.csv.gz file and seeing if it still throws you that same error. Maybe there is a new version of 7zip which requires me to update the script or something!
For anyone else who googled this Postgres error message after attempting to work with a >1gb file in Postgres 11, I can confirm that #亚军吴's answer above is spot-on. It is indeed a size issue.
I tried a different approach, though, than #亚军吴's and #Loren's: I simply uninstalled Postgres 11 and installed the stable version of Postgres 10.7. (I'm on Windows 10, by the way, in case that matters.)
I re-ran the original code that had prompted the error and voila, a few minutes later I'd filled in a new table with data from a medium-ish-size csv file (~3gb). I initially tried to use CSVSplitter, per #Loren, which was working fine until I got close to running out of storage space on my machine. (Thanks, Battlefield 5.)
In my case, there isn't anything in PGSQL 11 that I was relying on that wasn't in version 10.7, so I think this could be a good solution for anyone else who runs into this problem. Thanks everyone above for contributing, especially to the OP for posting this in the first place. I cured a huge, huge headache!
This has been fixed in commit bed90759f in PostgreSQL v14.
The file limit for the error is actually 4 GB.
The fix was too invasive to be backported, so you can only upgrade to avoid the problem. Once the fix has had some field testing, you could lobby the pgsql-hackers mailing list to get it backported.
With pgAdmin and AWS, I used CSVSplitter to split into files less than 1GB. Lame, but worked. pgAdmin import appends to the existing table. (Changed escape character from ' to " in order to avoid error due to unquoted text in the source file. Typically I apply quotes in LibreOffice, but these files were too big to open.)
It seems this is not a database problem, but a problem of psql / pgadmin. The workaround is using an admin software from the previous psql versions:
Use the existing PostgreSQL 11 database
Install psql or pgadmin from the PostgreSQL 10 installation and use it to upload the file (with the command shown in the question)
Hope this helps anyone coming across the same problem.
Add two lines to your CSV file: One at the begining and one at the end:
COPY XXX FROM STDIN WITH (FORMAT CSV, HEADER TRUE, NULL 'NULL');
<here are the lines your file already contains>
\.
Don't forget another newline after the \. line. Then call
psql -h hostname -d dbname -U username -f 'D:/XXX.csv'
This is what worked for me:
\COPY member_data.lab_result FROM PROGRAM 'gzip -dcf lab_result.dat.gz' WITH (FORMAT 'csv', DELIMITER '|', QUOTE '`')

Postgres COPY FROM csv file- No such file or directory

I'm trying to import a (rather large) .txt file into a table geonames in PostgreSQL 9.1. I'm in the /~ directory of my server, with a file named US.txt placed in that directory. I set the search_path variable to geochat, the name of the database I'm working in. I then enter this query:
COPY geonames
FROM 'US.txt',
DELIMITER E'\t',
NULL 'NULL');
I then receive this error:
ERROR: could not open file "US.txt" for reading: No such file or directory.
Do I have to type in \i US.txt or something similar first, or should it just get it from the present working directory?
Maybe a bit late, but hopefully useful:
Use \copy instead
https://wiki.postgresql.org/wiki/COPY
jvdw
A couple of misconceptions:
1.
I'm in the /~ directory of my server
There is no directory /~. It's either / (root directory) or ~ (home directory of current user). It's also irrelevant to the problem.
2.
I set the search_path variable to geochat, the name of the database I'm working in
The search_path has nothing to do with the name of the database. It's for schemas inside the current database. You probably need to reset this.
3.
You are required to use the absolute path for your file. As documented in the manual here:
filename
The absolute path name of the input or output file.
4.
DELIMITER: just noise.
The default is a tab character in text format
5.
NULL: It's rather uncommon to use the actual string 'NULL' for a NULL value. Are you sure?
The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
My guess (after resetting search_path - or you schema-qualify the table name):
COPY geonames FROM '/path/to/file/US.txt';
The paths are relative to the PostgreSQL server, not the psql client.
Assuming you are running PostgreSQL 9.4, you can put US.txt in the directory /var/lib/postgresql/9.4/main/.
Another option is to pipe it in from stdin:
cat US.txt | psql -c "copy geonames from STDIN WITH (FORMAT csv);"
if you're running your COPY command from a script, you can have a step in the script that creates the COPY command with the correct absolute path.
MYPWD=$(pwd)
echo "COPY geonames FROM '$MYPWD/US.txt', DELIMITER E'\t';"
MYPWD=
you can then run this portion into a file and execute it
./step_to_create_COPY_with_abs_path.sh >COPY_abs_path.sql
psql -f COPY_abs_path.sql -d your_db_name