How can I use mwdumper with Postgresql command line - postgresql

I was importing a MediaWiki database using mwdumper with MySql. Now I need to do the same thing, but using Postgresql.
Basicly I get a archive in this link:
http://dumps.wikimedia.org/enwiki/20140903/
And I use mwdumper program to get informations and put in my database.
This is the database script:
https://git.wikimedia.org/blob/mediawiki%2Fcore.git/HEAD/maintenance%2Fpostgres%2Ftables.sql
I created the database through this sql, and now I need to use mwdumper to put data in my database.
I saw many links about this, but only to do in MySql.
Anyone know how to do this import using Postgres, using command line?
Mwdumper: www.mediawiki.org/wiki/Manual:MWDumper

I forgot this question, but I found the solution, the command line to use mwdumper with postgres is:
java -jar mwdumper-1.16.jar --format=pgsql:1.5 ARCHIVE.xml.gz | psql -U wikiUSER -d wikiDATABASE
The command isn't wrong, the errors that happen is because the mwdumper-1.16 convert xml to sql with wrong sintaxe.
This is a insert sql after convert mwdumper (XML->PostgreSql):
INSERT INTO revision (rev_id,rev_page,rev_text_id,rev_comment,rev_user,rev_user_text,rev_timestamp,rev_minor_edit,rev_deleted) VALUES (378187747,676,378187747,'there is no such thing as \"Jr.\" in Russian names. sincerely yours, X\\'ZZ\\'',0,'198.240.130.75','2010-08-10 14:55:48',0,0);
Analizing the same insert in my data base Mysql, the expected text in Postgres is:
INSERT INTO (...) ,' there is no such thing as "Jr." in Russian names. sincerely yours, X\''ZZ\'' ', (...).
For example:
To represent double quotes, mwdumper give a sintaxe \" , but to represent " in Postgres doesn't have \ , it's just ". The same idea to others sintaxe errors.
When you solve all sintaxe errors you can dump perfectly.

Related

Creating Batch Files with PostgreSQL \copy Command in Jetbrains Datagrip

I'm familiarizing myself with the standalone version of Datagrip and having a bit of trouble understanding the different approaches to composing SQL via console, external files, scratch files, etc.
I'm managing, referencing the documentation, and am happy to figure things out as such.
However, I'm trying to ingest CSV data into tables via batch files using the Postgres \copy command. Datagrip will execute this command without error but no data is being populated.
This is my syntax, composed and ran in the console view:
\copy tablename from 'C:\Users\username\data_file.txt' WITH DELIMITER E'\t' csv;
Note that the data is tab-separated and stored in a .txt file.
I'm able to use the import functions of Datagrip (via context menu) just fine but I'd like to understand how to issue commands to do similarly.
\copy is a command of the command-line PostgreSQL client psql.
I doubt that Datagrip invokes psql, so it won't be able to use \copy or any other “backslash command”.
You probably have to use Datagrip's import facilities. Or you start using psql.
Ok, but what about the SQL COPY command https://www.postgresql.org/docs/12/sql-copy.html ?
How can I run something like that with datagrip ?
BEGIN;
CREATE TEMPORARY TABLE temp_json(values text) ON COMMIT DROP;
COPY temp_json FROM 'MY_FILE.JSON';
SELECT values->>'aJsonField' as f
FROM (select values::json AS values FROM temp_json) AS a;
COMMIT;
I try to replace 'MY_FILE.JSON' with full path, parameter (?), I put it in sql directory etc.
The data grip answer is :
[2021-05-05 10:30:45] [58P01] ERROR: could not open file '...' for reading : No such file or directory
EDIT :
I know why. RTFM! -_-
COPY with a file name instructs the PostgreSQL server to directly read from or write to a file. The file must be accessible by the PostgreSQL user (the user ID the server runs as) and the name must be specified from the viewpoint of the server.
Sorry.....

how to pass variable to copy command in Postgresql

I tried to make a variable in SQL statement in Postgresql, but it did not work.
There are many csv files stored under the path. I want to set path in Postgresql that can tell copy command where can find csv files.
SQL statement sample:
\set outpath '/home/clients/ats-dev/'
\COPY licenses (_id, name,number_seats ) FROM :outpath + 'licenses.csv' CSV HEADER DELIMITER ',';
\COPY uploaded_files (_id, added_date ) FROM :outpath + 'files.csv' CSV HEADER DELIMITER ',';
It did not work. I got error: no such files. The two files licneses.csv and files.csv are stored under /home/cilents/ats-dev on Ubuntu. I found some sultion that use "\set file 'license.csv'". It did not work for me becacuse I have many csv files. also I tried to use "from : outpath || 'licenses.csv'". it did not work ether. Appreciate for any helps.
Using 9.3.
It looks like psql does not support :variable substitution withinpsql backslash commands.
test=> \set somevar fred
test=> \copy z from :somevar
:somevar: No such file or directory
so you will need to do this via an external tool like the unix shell. e.g.
for f in *.sql; do
psql -c "\\copy $(basename $f) FROM '$f'"
done
You can try COPY command
\set outpath '\'/home/clients/ats-dev/'
COPY licenses (_id, name,number_seats ) FROM :outpath/licenses.csv' WITH CSV HEADER DELIMITER ',';
COPY uploaded_files (_id, added_date ) FROM :outpath/files.csv' WITH CSV HEADER DELIMITER ',';
Note: Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. Similarly, the command specified with PROGRAM is executed directly by the server, not by the client application, must be executable by the PostgreSQL user. COPY naming a file or command is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.
Documentation: Postgresql 9.3 COPY
It may have been true when this was originally asked, that psql backslash commands didn't support variable interpolation, but in my PostgreSQL 14 instance that's no longer the case. However, the psql manpage is clear that \copy specifically does not support variable interpolation.

How to source multiple functions in psql?

I have a directory with at least 6 function files.
I'm using psql and I need to be able to source (initialize ?) all function files at once.
I'm sure making a single function and call all others like SELECT fn1, fn2 isn't going to work.
Doing \i functions_folder/fn1.sql 6 times isn't ideal either.
So is there a way I can (maybe) \i functions/*.sql? Doing that currently gives me
functions/*.sql: No such file or directory
I'm using psql on postgresql 9.6.2
If you are using *nix OS:
postgres=# \! cat ./functions/*.sql > all_functions.sql
postgres=# \i all_functions.sql
For Windows try to find the analogue of the cat (as I remember it is copy command with some flags)
PS I have the feeling that it could be done by using backquotes:
Within an argument, text that is enclosed in backquotes (`) is taken as a command line that is passed to the shell. The output of the command (with any trailing newline removed) replaces the backquoted text.
Documentation
But still have no idea how. Probably somebody more experienced will provide a hint...
Create a wrapper that contains the \i commands, and \i the wrapper.

Automating PostgreSQL output to csv

mohpc04pp1: /h/u544835 % psql arco
Welcome to psql 8.1.17, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
WARNING: You are connected to a server with major version 8.3,
but your psql client is major version 8.2. Some backslash commands,
such as \d, might not work properly.
dbname=> \o /h/u544835/data25000.csv
dbname=> select url from urltable where scoreid=1 limit 25000;
dbname=> \q
This is took from a link online of basically what I have been doing, but what I need to do is make a script that I can use to produce csv files daily
So my aim of the script is to while in the script connect to the db, run the \o etc commands then close it
but I'm having trouble scripting it to say go into the psql arco database then run those queries.
command line to connect to db = psql arco then once the scrits recognised I'm in that databse perform those commands to automate a query to a csv file.
if anyone can get me started or point me towards reading material for me to get past that bit, it will be duely appreciated.
i'm running all this off a standard windows xp, ssh'ing to a SLES set-up web server that holds my postgresql database running psql version 8.1.17
First of all you should fix your setup. As it turns out, we are dealing with PostgreSQL 8.1 here. This version has reached end of live in 2010. You need to seriously think about upgrading - or at least remind the guys running the server. Current version is 9.1.
The command you are looking for:
psql arco -c "\copy (select url from urltable where scoreid=1 limit 25000) to '/h/u544835/data25000.csv'"
Assuming your db is named "arco". Adjusted for changed question (including changed port).
I now see version 8.1 popping up in your question, but it's all contradictory. You need Postgres 8.2 or later to use a query (instead of a table) with the \copy meta-command.
Details about psql in the fine manual.
Alternative approach that should work with obsolete PostgreSQL 8.1:
psql arco -o /h/u544835/data25000.csv -t -A -c 'SELECT url FROM urltable WHERE scoreid = 1 LIMIT 25000'
Find some more info about these command line options under this related question on dba.SE.
With function (syntax compatible with 8.1)
Another way would be to create a server side function (if you can!) that executes COPY from a temp table (old syntax - works with pg 8.1):
CREATE OR REPLACE FUNCTION f_copy_file()
RETURNS void AS
$BODY$
BEGIN
CREATE TEMP TABLE u_tmp AS (
SELECT url FROM urltable WHERE scoreid = 1 LIMIT 25000
);
COPY u_tmp TO '/h/u544835/data25000.csv';
DROP TABLE u_tmp;
END;
$BODY$
LANGUAGE plpgsql;
And then from the shell:
psql arco -c 'SELECT f_copy_file()'
Change the separator
\f sets the field separator. I quote the manual again:
-F separator
--field-separator=separator
Use separator as the field separator for unaligned output.
This is equivalent to \pset fieldsep or \f.
Or you can change the column separator in Excel, here are the instructions from MS.
Thanks to Erwin's help and a link I read up on he posted for me I managed to combine the two to get
#!/bin/sh
dbname='arco'
username='' # If you actually supply a username, you need to add the -U switch!
psql $dbname $username << EOF
\f ,
\o /h/u544835/showme.csv
SELECT * FROM storage;
EOF
which will write my queries to a csv file etc for me.
From what there is above, it is not separating the sql query so if I load it straight into excel, they all stay in the same column too which means I'm having a problem with the delimiter
I've tried tabbed delimiters, also tried , ; etc but none are letting me separate it
I need for it
is there an option I can click to see which delimiter is being used with my psql? or a different way of dumping the data from a query into a file that can be read by excel, so a different column for each row etc

PostgreSQL - batch + script + variable

I am not a programmer, I am struggling a bit with this.
I have a batch file connecting to my PostgreSQL server, and then open a sql script. Everything works as expected. My question is how to pass a variable (if possible) from one to the other.
Here is my batch file:
set PGPASSWORD=xxxx
cls
#echo off
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -f C:\TotalProteinImport.sql
And here's the script:
copy totalprotein from 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
update anagrafica
set pt=(select totalprotein.resultvalue from totalprotein where totalprotein.accessionnbr=anagrafica.id)
where data_analisi = '12/23/2011';
delete from totalprotein;
This is working great, now the question is how could I pass a variable that would carry the date for data_analisi?
Like in the batch file, "Please enter date", and then the value is passed to the sql script.
You could create a function out of your your SQL script like this:
CREATE OR REPLACE FUNCTION f_myfunc(date)
RETURNS void AS
$BODY$
CREATE TEMP TABLE t_tmp ON COMMIT DROP AS
SELECT * FROM totalprotein LIMIT 0; -- copy table-structure from table
COPY t_tmp FROM 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
UPDATE anagrafica a
SET pt = t.resultvalue
FROM t_tmp t
WHERE a.data_analisi = $1
AND t.accessionnbr = a.id;
-- Temp table is dropped automatically at end of session
-- In this case (ON COMMIT DROP) after the transaction
$BODY$
LANGUAGE sql;
You can use language SQL for this kind of simple SQL batch.
As you can see I have made a couple of modifications to your script that should make it faster, cleaner and safer.
Major points
For reading data into an empty table temporarily, use a temporary table. Saves a lot of disc writes and is much faster.
To simplify the process I use your existing table totalprotein as template for the creation of the (empty) temp table.
If you want to delete all rows of a table use TRUNCATE instead of DELETE FROM. Much faster. In this particular case, you need neither. The temporary table is dropped automatically. See comments in function.
The way you updated anagrafica.pt you would set the column to NULL, if anything goes wrong in the process (date not found, wrong date, id not found ...). The way I rewrote the UPDATE, it only happens if matching data are found. I assume that is what you actually want.
Then ask for user input in your shell script and call the function with the date as parameter. That's how it could work in a Linux shell (as user postgres, with password-less access (using IDENT method in pg_haba.conf):
#! /bin/sh
# Ask for date. 'YYYY-MM-DD' = ISO date-format, valid with any postgres locale.
echo -n "Enter date in the form YYYY-MM-DD and press [ENTER]: "
read date
# check validity of $date ...
psql db -p5432 -c "SELECT f_myfunc('$date')"
-c makes psql execute a singe SQL command and then exits. I wrote a lot more on psql and its command line options yesterday in a somewhat related answer.
The creation of the according Windows batch file remains as exercise for you.
Call under Windows
The error message tells you:
Function tpimport(unknown) does not exist
Note the lower case letters: tpimport. I suspect you used mixe case letters to create the function. So now you have to enclose the function name in double quotes every time you use it.
Try this one (edited quotes!):
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres
-c "SELECT ""TPImport""('%dateimport%')"
Note how I use singe and double quotes here. I guess this could work under windows. See here.
You made it hard for yourself when you chose to use mixed case identifiers in PostgreSQL - a folly which I never tire of warning against. Now you have to double quote the function name "TPImport" every time you use it. While perfectly legit, I would never do that. I use lower case letters for identifiers. Always. This way I never mix up lower / upper case and I never have to use double quotes.
The ultimate fix would be to recreate the function with a lower case name (just leave away the double quotes and it will be folded to lower case automatically). Then the function name will just work without any quoting.
Read the basics about identifiers here.
Also, consider upgrading to a more recent version of PostgreSQL 8.3 is a bit rusty by now.
psql supports textual replacement variables. Within psql they can be set using \set and used using :varname.
\set xyz 'abcdef'
select :'xyz';
?column?
----------
abcdef
These variables can be set using command line arguments also:
psql -v xyz=value
The only problem is that these textual replacements always need some fiddling with quoting as shown by the first \set and select.
After creating the function in Postgres, you must create a .bat file in the bin directory of your Postgres version, for example C:\Program Files\PostgreSQL\9.3\bin. Here you write:
#echo off
cd C:\Program Files\PostgreSQL\9.3\bin
psql -p 5432 -h localhost -d myDataBase -U postgres -c "select * from myFunction()"