Importing CSV to Postgres with new columns

I have an issue importing CSV data into a Postgres database ('landmarks') that has geo data / PostGIS enabled via the following command:
CREATE EXTENSION postgis;
So... the story goes:
I am following this tutorial.
I am trying to import a CSV with these columns:
name conf capital venture latitude longitude
The first line, as an example of the data, is:
example, 1, 1, 1, 51.51923, -0.12205
I have set the table up following the tutorial, except with conf, capital and venture in place of the columns in his data (address, date_built, architect, landmark), i.e.:
CREATE TABLE landmarks
(
    gid serial NOT NULL,
    name character varying(50),
    conf character varying(10),
    capital character varying(10),
    venture character varying(10),
    the_geom geometry,
    CONSTRAINT landmarks_pkey PRIMARY KEY (gid),
    CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
    CONSTRAINT enforce_geotype_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
    CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 4326)
);
then
CREATE INDEX landmarks_the_geom_gist
    ON landmarks
    USING gist
    (the_geom);
The data is otherwise essentially the same as in his example.
I've set the table up properly and enabled the PostGIS extension to handle the geometry data fine.
However, the problem comes when I try to import my CSV:
landmarks=# \copy landmarks(name,conf,capital,venture,latitude,longitude) FROM '../../../../../var/tmp/map2.csv' DELIMITERS ',' CSV HEADER;
ERROR: column "latitude" of relation "landmarks" does not exist
Now, I noticed that when he creates the table, he doesn't add latitude or longitude columns... so I wondered if that was the issue, and tried to create another table that also has those columns, as integers. However, that just gives me this error:
ptmap3=# \copy landmarks(name,conf,capital,venture,latitude,longitude) FROM '../../../../../var/tmp/map2.csv' DELIMITERS ',' CSV HEADER;
ERROR: invalid input syntax for integer: "51.51923"
CONTEXT: COPY landmarks, line 2, column latitude: "51.51923"
So... it seems that if I add the latitude column then the import starts, but fails on the data? After checking the CSV for errors using
od -c map2.csv
...there is nothing wrong with my CSV (no hidden characters or errors)... so what's the deal?
If anyone can help me import my CSV into this DB I'd be very grateful!

You need two steps to import your (latitude, longitude) data into the geometry column of your database.
1. Import the data into two float columns latitude and longitude:
From your original table landmarks:
CREATE TABLE landmarks
(
    gid serial NOT NULL,
    name character varying(50),
    conf character varying(10),
    capital character varying(10),
    venture character varying(10),
    the_geom geometry,
    CONSTRAINT landmarks_pkey PRIMARY KEY (gid),
    CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
    CONSTRAINT enforce_geotype_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
    CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 4326)
);
CREATE INDEX landmarks_the_geom_gist
    ON landmarks
    USING gist
    (the_geom);
Add two temporary columns, latitude and longitude:
ALTER TABLE landmarks ADD COLUMN latitude double precision;
ALTER TABLE landmarks ADD COLUMN longitude double precision;
Then import your data:
\copy landmarks(name,conf,capital,venture,latitude,longitude) FROM '../../../../../var/tmp/map2.csv' DELIMITERS ',' CSV HEADER;
2. Populate the geometry column by creating POINT geometries from latitude and longitude:
UPDATE landmarks SET the_geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
Finally, delete the temporary columns:
ALTER TABLE landmarks DROP COLUMN latitude;
ALTER TABLE landmarks DROP COLUMN longitude;
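You can sanity-check the result with something like the following (a hypothetical query; the expected output is derived from the sample row in the question):
SELECT name, ST_AsText(the_geom) FROM landmarks LIMIT 5;
-- for the sample row: name = 'example', st_astext = 'POINT(-0.12205 51.51923)'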

There is an alternative method.
Google pgfutter and download the appropriate version of the executable file (there is a link for Windows, but other releases are available as well).
Make sure the pgfutter file and your yourdata.csv file are in the same directory.
Change to that directory in a terminal and run the following command.
This will also create a new table, so just enter a fresh table name.
pgfutter.exe --host "localhost" --port "5432" --db "YourDatabaseName" --schema "public" --table "TableName" --user "postgres" --pw "YourPassword" csv YourData.csv
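Note that pgfutter, as far as I know, imports every column as text into the new table, so for geo data you would still need to add and populate a geometry column afterwards, along these lines (the table and column names here are assumptions, taken to match the example above):
ALTER TABLE tablename ADD COLUMN the_geom geometry(Point, 4326);
UPDATE tablename SET the_geom = ST_SetSRID(ST_MakePoint(longitude::float8, latitude::float8), 4326);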

Related

How to find average of the column

How to find the avg when the column datatype is character varying in PostgreSQL? Here growth and literacy are character varying.
I have created a table called census2:
create table census2
(
District VARCHAR(200)
, STATE VARCHAR(250)
, Growth NUMERIC
, Sex_Ratio NUMERIC
, Literacy NUMERIC
);
After successful creation, while loading the CSV file, it displays the error:
ERROR: invalid input syntax for type numeric: "Growth"
CONTEXT: COPY census2, line 503, column growth: "Growth"

Cast the value to a numeric type before averaging:
Select avg(cast(GROWTH as float)) from census2;
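Building on that, a small sketch that also skips non-numeric rows (such as the stray 'Growth' header value from the error above), assuming the column is stored as character varying:
SELECT avg(growth::float)
FROM census2
WHERE growth ~ '^-?\d+(\.\d+)?$';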

postgres SQL: getting rid of NA while migrating data from csv file

I am migrating data from a CSV file into a newly created table named fortune500. The code is shown below:
CREATE TABLE "fortune500"(
"id" SERIAL,
"rank" INTEGER,
"title" VARCHAR PRIMARY KEY,
"name" VARCHAR,
"ticker" CHAR(5),
"url" VARCHAR,
"hq" VARCHAR,
"sector" VARCHAR,
"industry" VARCHAR,
"employees" INTEGER,
"revenues" INTEGER,
"revenues_change" REAL,
"profits" NUMERIC,
"profits_change" REAL,
"assets" NUMERIC,
"equity" NUMERIC
);
Then I wanted to migrate data from the CSV file using the code below:
COPY "fortune500"("rank", "title", "name", "ticker", "url", "hq", "sector", "industry", "employees",
"revenues", "revenues_change", "profits", "profits_change", "assets", "equity")
FROM 'C:\Users\Yasser A.RahmAN\Desktop\SQL for Business Analytics\fortune.csv'
DELIMITER ','
CSV HEADER;
But I got the error message below, due to NA values in one of the columns.
ERROR: invalid input syntax for type real: "NA"
CONTEXT: COPY fortune500, line 12, column profits_change: "NA"
SQL state: 22P02
So how can I get rid of 'NA' values while migrating the data?
Consider using a staging table without restrictive data types, then do your transformations and insert into the final table after the data has been loaded into staging. This is known as the ELT (Extract - Load - Transform) approach. You could also use an external tool to implement an ETL process and do the transformation in that tool, before the data reaches your database.
In your case, an ELT approach would be to first create a table with all text columns, load that table, and then insert into your final table, casting the text values into the appropriate types, either filtering out the values that cannot be cast or inserting NULLs, or maybe 0, where the cast can't be made, depending on your requirements. For example, you'd filter out rows where profits_change = 'NA' (or better, rows WHERE NOT (profits_change ~ '^-?\d+(\.\d+)?$'), to catch anything non-numeric), or you'd insert NULL or 0:
CASE
    WHEN profits_change ~ '^-?\d+(\.\d+)?$'
    THEN profits_change::real
    ELSE NULL -- or 0, depending what you need
END
You'd perform this kind of validation for all fields.
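A minimal sketch of that staging approach (the staging table name and the reduced column list are illustrative assumptions):
CREATE TABLE fortune500_staging (
    rank text,
    title text,
    profits_change text
    -- ... the remaining columns, all text
);
-- load the raw file into staging:
-- \copy fortune500_staging FROM 'fortune.csv' DELIMITER ',' CSV HEADER;
INSERT INTO fortune500 ("rank", "title", "profits_change")
SELECT rank::integer,
       title,
       CASE WHEN profits_change ~ '^-?\d+(\.\d+)?$'
            THEN profits_change::real
       END -- 'NA' and other non-numeric values become NULL
FROM fortune500_staging;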
Alternatively, if it's a one-off thing, just edit your CSV before importing.

Insert data into a geometry column on Redshift

I created this table in a Redshift database and am trying to insert data. Do you know how to insert the point coordinates into the geometry column?
CREATE TABLE airports_data (
    airport_code character(3),
    airport_name character varying,
    city character varying,
    coordinates geometry,
    timezone timestamp with time zone
);
INSERT INTO airports_data(airport_code,airport_name,city,coordinates,timezone)
VALUES ('YKS','Yakutsk Airport','129.77099609375, 62.093299865722656', 'AsiaYakutsk');
I had an error when trying to make this insert:
Query ELAPSED TIME: 13 m 05 s ERROR: Compass I/O exception: Invalid hexadecimal character(s) found
In Redshift, make your longitude and latitude values into a geometry object. Use:
ST_Point(longitude, latitude) -- for an XY point
ST_GeomFromText('LINESTRING(4 5,6 7)') -- for other geometries
You're also missing city in your INSERT values, and 'AsiaYakutsk' is not a valid datetime value - see https://docs.aws.amazon.com/redshift/latest/dg/r_Datetime_types.html#r_Datetime_types-timestamptz
Ignoring your timezone column and adding city into values, use this:
INSERT INTO airports_data(airport_code,airport_name,city,coordinates)
VALUES ('YKS','Yakutsk Airport','Yakutsk',ST_Point(129.77099609375, 62.093299865722656));
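If you also want to populate the timezone column, it needs a valid timestamp with time zone literal rather than a zone name (the timestamp below is only an illustration):
INSERT INTO airports_data(airport_code,airport_name,city,coordinates,timezone)
VALUES ('YKS','Yakutsk Airport','Yakutsk',ST_Point(129.77099609375, 62.093299865722656),'2020-01-01 00:00:00+09');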

Insert a CSV file with polygons into PostgreSQL

I've got a CSV file with 3 columns named boundaries, category and city.
The data in every cell under the "boundaries" column is comprised of something like this:
{"coordinates"=>[[[-79.86938774585724, 43.206149439482836], [-79.87618446350098, 43.19090988330086], [-79.88626956939697, 43.19328385965552], [-79.88325476646423, 43.200029828720744], [-79.8932647705078, 43.20258723593195], [-79.88930583000183, 43.211150250203886], [-79.86938774585724, 43.206149439482836]]], "type"=>"Polygon"}
How can I create a table with a proper data type for the "boundaries" column?
The data you have specified is JSON-like (the "=>" is Ruby hash syntax; replacing "=>" with ":" makes it valid JSON), so one option is to store the boundaries data as a jsonb table column.
e.g.: CREATE TABLE cities ( city varchar, category varchar, boundaries jsonb )
The alternative is to parse the JSON and store the coordinates in a PostgreSQL ARRAY column, something like:
CREATE TABLE cities (
    city varchar,
    category varchar,
    boundary_coords point ARRAY,
    boundary_type varchar
)
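For completeness, a rough sketch of the jsonb route, assuming the "=>" tokens are first rewritten to ":" so each value is valid JSON, and assuming PostGIS is installed if you want a real geometry (the file name and SRID are assumptions):
CREATE TABLE cities (
    boundaries jsonb,
    category varchar,
    city varchar
);
-- \copy cities FROM 'cities.csv' CSV HEADER;
ALTER TABLE cities ADD COLUMN boundary geometry(Polygon, 4326);
UPDATE cities SET boundary = ST_SetSRID(ST_GeomFromGeoJSON(boundaries::text), 4326);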

Save a file (.pdf) in the database with Python 2.7

Craig Ringer: I cannot work with the large object functions.
My database looks like this; this is my table:
-- Table: files
DROP TABLE files;

CREATE TABLE files
(
    id serial NOT NULL,
    orig_filename text NOT NULL,
    file_data bytea NOT NULL,
    CONSTRAINT files_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE files
I want to save a .pdf in my database. I saw your last answer about doing this with Python 2.7 (read the file and convert it to a buffer object, or use the large object functions).
My code looks like this:
path="D:/me/A/Res.pdf"
listaderuta = path.split("/")
longitud=len(listaderuta)
f = open(path,'rb')
f.read().__str__()
cursor = con.cursor()
cursor.execute("INSERT INTO files(id, orig_filename, file_data) VALUES (DEFAULT,%s,%s) RETURNING id", (listaderuta[longitud-1], f.read()))
Then, when downloading, I do:
fula = open("D:/INSTALL/pepe.pdf",'wb')
cursor.execute("SELECT file_data, orig_filename FROM files WHERE id = %s", (int(17),))
(file_data, orig_filename) = cursor.fetchone()
fula.write(file_data)
fula.close()
But the downloaded file cannot be opened; it is damaged. I repeat, I cannot work with the large object functions.
I tried this and this is what happened; can anyone help?
I am thinking that the psycopg2 Binary function does not use the large object functions, thus I used:
path="salman.pdf"
f = open(path,'rb')
dat = f.read()
binary = psycopg2.Binary(dat)
cursor.execute("INSERT INTO files(id, file_data) VALUES ('1',%s)", (binary,))
conn.commit()
Just a correction to the INSERT statement: that INSERT will fail with "null value in column "orig_filename" violates not-null constraint", as orig_filename is defined as NOT NULL. Use instead:
("INSERT INTO files(id, orig_filename, file_data) VALUES ('1', 'filename.pdf', %s)", (binary,))