How to find the average of a column - PostgreSQL

How do I find the average when the column's datatype is character varying in PostgreSQL? Here growth and literacy are character varying.
Select avg(cast(GROWTH as float)) from census2;
I have created a table called census2:
create table census2
(
District VARCHAR(200)
, STATE VARCHAR(250)
, Growth NUMERIC
, Sex_Ratio NUMERIC
, Literacy NUMERIC
);
After creating the table successfully, loading the CSV file produces this error:
ERROR: invalid input syntax for type numeric: "Growth"
CONTEXT: COPY census2, line 503, column growth: "Growth"
Thanks in advance.
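One way around rows like that (just a sketch, assuming the stray "Growth" values are repeated header text inside the CSV; the staging table name census2_staging and the file path are placeholders): load the raw file into an all-text staging table, then cast and average only the values that actually look numeric.
-- Hypothetical staging table: every column is text, so COPY never rejects a row
CREATE TABLE census2_staging
(
District VARCHAR(200)
, STATE VARCHAR(250)
, Growth TEXT
, Sex_Ratio TEXT
, Literacy TEXT
);
COPY census2_staging FROM '/path/to/census.csv' DELIMITER ',' CSV HEADER;
-- Cast and average only the rows whose growth value is numeric
SELECT avg(cast(Growth as numeric)) AS avg_growth
FROM census2_staging
WHERE Growth ~ '^-?\d+(\.\d+)?$';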

Related

Use CREATE VIEW to convert a varchar field and its contents to a number field

I have a table with a varchar field. See the structure:
IMAGE_KEY VARCHAR2(32 BYTE)
DOC_TYPE VARCHAR2(8 BYTE)
DOC_KEY VARCHAR2(256 BYTE)
LAST_UPDT DATE
UPDT_USER VARCHAR2(6 BYTE)
BLK_HANDLE VARCHAR2(16 BYTE)
DOC_KEY_ID NUMBER(15,0)
FE_ID NUMBER(15,0)
I want to create a view that converts the DOC_KEY field, which is varchar, to a NUMBER field, using SQL scripts like these:
SELECT IMAGE_KEY,
DOC_TYPE,
cast(DOC_KEY as NUMBER(15)) as DOC_KEY ,
LAST_UPDT,
UPDT_USER,
BLK_HANDLE,
DOC_KEY_ID,
FE_ID
FROM rdo
and
SELECT IMAGE_KEY,
DOC_TYPE,
TO_NUMBER (TRIM (DOC_KEY)) as DOC_KEY ,
LAST_UPDT,
UPDT_USER,
BLK_HANDLE,
DOC_KEY_ID,
FE_ID
FROM rdo
I get the following error when I extract data from the view created by either of the scripts above:
An error was encountered performing the requested operation:
ORA-01722: invalid number
01722. 00000 - "invalid number"
*Cause: The specified number was invalid.
*Action: Specify a valid number.
Vendor code 1722
How do I successfully convert the varchar field to a NUMBER field using a CREATE VIEW statement?
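ORA-01722 means at least one DOC_KEY value is not a valid number. One sketch of a defensive view, assuming non-numeric values should simply become NULL (the view name rdo_v is made up; only REGEXP_LIKE and TO_NUMBER are used):
CREATE OR REPLACE VIEW rdo_v AS
SELECT IMAGE_KEY,
       DOC_TYPE,
       -- convert only values that are purely digits; everything else becomes NULL
       CASE
         WHEN REGEXP_LIKE(TRIM(DOC_KEY), '^[0-9]+$')
         THEN TO_NUMBER(TRIM(DOC_KEY))
         ELSE NULL
       END AS DOC_KEY,
       LAST_UPDT,
       UPDT_USER,
       BLK_HANDLE,
       DOC_KEY_ID,
       FE_ID
FROM rdo;
On Oracle 12.2 or later, TO_NUMBER(TRIM(DOC_KEY) DEFAULT NULL ON CONVERSION ERROR) achieves the same thing without the CASE.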

PostgreSQL: getting rid of NA while migrating data from a CSV file

I am migrating data from a CSV file into a newly created table named fortune500. The code is shown below:
CREATE TABLE "fortune500"(
"id" SERIAL,
"rank" INTEGER,
"title" VARCHAR PRIMARY KEY,
"name" VARCHAR,
"ticker" CHAR(5),
"url" VARCHAR,
"hq" VARCHAR,
"sector" VARCHAR,
"industry" VARCHAR,
"employees" INTEGER,
"revenues" INTEGER,
"revenues_change" REAL,
"profits" NUMERIC,
"profits_change" REAL,
"assets" NUMERIC,
"equity" NUMERIC
);
Then I wanted to migrate data from the CSV file using the code below:
COPY "fortune500"("rank", "title", "name", "ticker", "url", "hq", "sector", "industry", "employees",
"revenues", "revenues_change", "profits", "profits_change", "assets", "equity")
FROM 'C:\Users\Yasser A.RahmAN\Desktop\SQL for Business Analytics\fortune.csv'
DELIMITER ','
CSV HEADER;
But I got the error message below due to NA values in one of the columns:
ERROR: invalid input syntax for type real: "NA"
CONTEXT: COPY fortune500, line 12, column profits_change: "NA"
SQL state: 22P02
So how can I get rid of 'NA' values while migrating the data?
Consider using a staging table without restrictive data types, and do your transformations and insert into the final table after the data has been loaded into staging. This is known as the ELT (Extract - Load - Transform) approach. You could also use an external tool to implement an ETL process and do the transformation in that tool, before the data reaches your database.
In your case, an ELT approach would be to first create a table with all text columns, load that table, and then insert into your final table, casting the text values to the appropriate types, either filtering out the values that cannot be cast or inserting NULL (or maybe 0) where the cast can't be made, depending on your requirements. For example, you would filter out rows where profits_change = 'NA' (or better, WHERE NOT (profits_change ~ '^-?\d+(\.\d+)?$') to check for a numeric value), or you would insert NULL or 0:
CASE
    WHEN profits_change ~ '^-?\d+(\.\d+)?$'
    THEN profits_change::real
    ELSE NULL -- or 0, depending on what you need
END
You'd perform this kind of validation for all fields.
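A minimal sketch of that staging approach for this table (the staging table name fortune500_staging is made up; the column list and file path are taken from the question), using NULLIF to turn the literal 'NA' into NULL before the cast:
-- Hypothetical staging table: all text, so COPY accepts 'NA' without complaint
CREATE TABLE fortune500_staging (
    "rank" TEXT, "title" TEXT, "name" TEXT, "ticker" TEXT, "url" TEXT,
    "hq" TEXT, "sector" TEXT, "industry" TEXT, "employees" TEXT,
    "revenues" TEXT, "revenues_change" TEXT, "profits" TEXT,
    "profits_change" TEXT, "assets" TEXT, "equity" TEXT
);
COPY fortune500_staging
FROM 'C:\Users\Yasser A.RahmAN\Desktop\SQL for Business Analytics\fortune.csv'
DELIMITER ',' CSV HEADER;
-- Cast while inserting; NULLIF replaces the literal 'NA' with NULL so the cast succeeds
INSERT INTO fortune500 ("rank", "title", "name", "ticker", "url", "hq", "sector", "industry",
                        "employees", "revenues", "revenues_change", "profits",
                        "profits_change", "assets", "equity")
SELECT NULLIF("rank", 'NA')::integer, "title", "name", "ticker", "url", "hq", "sector", "industry",
       NULLIF("employees", 'NA')::integer, NULLIF("revenues", 'NA')::integer,
       NULLIF("revenues_change", 'NA')::real, NULLIF("profits", 'NA')::numeric,
       NULLIF("profits_change", 'NA')::real, NULLIF("assets", 'NA')::numeric,
       NULLIF("equity", 'NA')::numeric
FROM fortune500_staging;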
Alternatively, if it's a one-off thing, just edit your CSV before importing.

Insert data into a geometry column on Redshift

I created this table in a Redshift database and tried to insert data. Do you know how to insert the point coordinates into the geometry column?
CREATE TABLE airports_data (
airport_code character(3),
airport_name character varying,
city character varying,
coordinates geometry,
timezone timestamp with time zone
);
INSERT INTO airports_data(airport_code,airport_name,city,coordinates,timezone)
VALUES ('YKS','Yakutsk Airport','129.77099609375, 62.093299865722656', 'AsiaYakutsk');
I got an error when trying to run this insert:
Query ELAPSED TIME: 13 m 05 s ERROR: Compass I/O exception: Invalid
hexadecimal character(s) found
In Redshift, make your longitude and latitude values into a geometry object. Use:
ST_Point(longitude, latitude) -- for an XY point
ST_GeomFromText('LINESTRING(4 5,6 7)') -- for other geometries
You're also missing city in your INSERT values, and 'AsiaYakutsk' is not a valid datetime value; see https://docs.aws.amazon.com/redshift/latest/dg/r_Datetime_types.html#r_Datetime_types-timestamptz
Ignoring your timezone column and adding city into values, use this:
INSERT INTO airports_data(airport_code,airport_name,city,coordinates)
VALUES ('YKS','Yakutsk Airport','Yakutsk',ST_Point(129.77099609375, 62.093299865722656));
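If you want to confirm what was stored, a quick check (ST_AsText is one of Redshift's spatial functions):
-- Read the stored point back as well-known text
SELECT airport_code, ST_AsText(coordinates)
FROM airports_data
WHERE airport_code = 'YKS';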

insert multiple rows into table with column that has default value

I have a table in PostgreSQL, and one of the columns has a default value.
The DDL of the table is:
CREATE TABLE public.my_table_name
(int_column_1 character varying(6) NOT NULL,
text_column_1 character varying(20) NOT NULL,
text_column_2 character varying(15) NOT NULL,
default_column numeric(10,7) NOT NULL DEFAULT 0.1,
time_stamp_column date NOT NULL);
I am trying to insert multiple rows in a single query. Some of those rows have a value for default_column and some don't, and for those I want Postgres to use the default value.
Here's what I tried:
INSERT INTO "my_table_name"(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES
(91,'text_row_11','text_row_21',8,current_timestamp),
(91,'text_row_12','text_row_22',,current_timestamp),
(91,'text_row_13','text_row_23',19,current_timestamp),
(91,'text_row_14','text_row_24',,current_timestamp),
(91,'text_row_15','text_row_25',27,current_timestamp);
This gives me an error. So, when I try to insert:
INSERT INTO "my_table_name"(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES (91,'text_row_12','text_row_22',,current_timestamp), -- I want NULL to be appended here, so I left it empty
-- The error from this query is: ERROR: syntax error at or near ","
and
INSERT INTO "my_table_name"(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES (91,'text_row_14','text_row_24',NULL,current_timestamp),
-- The error from this query is: ERROR: new row for relation "glycemicindxdir" violates check constraint "food_item_check"
So how do I fix this and insert a value when I have one, or have Postgres insert the default when I don't?
Use the default keyword:
INSERT INTO my_table_name
(int_column_1, text_column_1, text_column_2, default_column, time_stamp_column)
VALUES
(91, 'text_row_11', 'text_row_21', 8 , current_timestamp),
(91, 'text_row_12', 'text_row_22', default, current_timestamp),
(91, 'text_row_13', 'text_row_23', 19 , current_timestamp),
(91, 'text_row_14', 'text_row_24', default, current_timestamp),
(91, 'text_row_15', 'text_row_25', 27 , current_timestamp);
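As a quick sanity check (just a sketch), the rows inserted with the default keyword should come back with the declared default of 0.1 from the DDL above:
-- Rows inserted with the default keyword get the column default value 0.1
SELECT text_column_1, default_column
FROM my_table_name
WHERE default_column = 0.1;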

An empty row with null-like values in not-null fields

I'm using PostgreSQL 9.0 beta 4.
After inserting a lot of data into a partitioned table, I found a weird thing. When I query the table, I can see an empty row with null-like values in 'not-null' fields.
The weird query result is shown below.
The 689th row is empty. The first 3 fields (stid, d, ticker) make up the primary key, so they should not be null. The query I used is this:
select * from st_daily2 where stid=267408 order by d
I can even do a GROUP BY on this data:
select stid, date_trunc('month', d) ym, count(*) from st_daily2
where stid=267408 group by stid, date_trunc('month', d)
The GROUP BY result still has the empty row; the 1st row is empty.
But if I query where stid or d is null, it returns nothing.
Is this a bug in PostgreSQL 9.0 beta 4, or some data corruption?
EDIT:
I added my table definition.
CREATE TABLE st_daily
(
stid integer NOT NULL,
d date NOT NULL,
ticker character varying(15) NOT NULL,
mp integer NOT NULL,
settlep double precision NOT NULL,
prft integer NOT NULL,
atr20 double precision NOT NULL,
upd timestamp with time zone,
ntrds double precision
)
WITH (
OIDS=FALSE
);
CREATE TABLE st_daily2
(
CONSTRAINT st_daily2_pk PRIMARY KEY (stid, d, ticker),
CONSTRAINT st_daily2_strgs_fk FOREIGN KEY (stid)
REFERENCES strgs (stid) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT st_daily2_ck CHECK (stid >= 200000 AND stid < 300000)
)
INHERITS (st_daily)
WITH (
OIDS=FALSE
);
The data in this table is simulation results. Multiple multithreaded simulation engines written in C# insert data into the database using Npgsql.
psql also shows the empty row.
You should file a bug report at http://www.postgresql.org/support/submitbug
Some questions:
Could you show us the table definitions and constraints for the partitions?
How did you load your data?
Do you get the same result when using another tool, like psql?
The answer to your problem may very well lie in your first sentence:
I'm using PostgreSQL 9.0 beta 4.
Why would you do that? Upgrade to a stable release. Preferably the latest point-release of the current version.
This is 9.1.4 as of today.
I got to the same point: "What in the heck is that blank value?"
No, it's not a NULL; it's a -infinity.
To filter for such a row use:
WHERE CASE
        WHEN mytestcolumn = '-infinity'::timestamp
          OR mytestcolumn = 'infinity'::timestamp
        THEN NULL
        ELSE mytestcolumn
      END IS NULL
instead of:
WHERE mytestcolumn IS NULL
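Applied to the question's table, a sketch (assuming the blank in the date column d is indeed an infinity value; the table and column names are from the question):
-- Find the rows whose date column holds one of the special infinity values
SELECT *
FROM st_daily2
WHERE d = 'infinity'::date
   OR d = '-infinity'::date;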