I have a column that I want to get the average of; the column is varchar(200). I keep getting this error. How do I convert the column to numeric and get an average of it?
Values in the column look like
16,000.00
15,000.00
16,000.00 etc
When I execute
select CAST((COALESCE( bonus,'0')) AS numeric)
from tableone
... I get
ERROR: invalid input syntax for type numeric:
The standard way to represent (as text) a numeric in SQL is something like:
16000.00
15000.00
16000.00
So, your commas in the text are hurting you.
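You can check this directly with the literals from the question:
SELECT '16,000.00'::numeric;  -- ERROR: invalid input syntax for type numeric: "16,000.00"
SELECT '16000.00'::numeric;   -- 16000.00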
The most sensible way to solve this problem would be to store the data just as a numeric instead of using a string (text, varchar, character) type, as already suggested by a_horse_with_no_name.
However, assuming this is done for a good reason, such as having inherited a design you cannot change, one possibility is to get rid of all the characters which are not a minus sign, digit, or period before casting to numeric:
Let's assume this is your input data
CREATE TABLE tableone
(
bonus text
) ;
INSERT INTO tableone(bonus)
VALUES
('16,000.00'),
('15,000.00'),
('16,000.00'),
('something strange 25'),
('why do you actually use a "text" column if you could just define it as numeric(15,0)?'),
(NULL) ;
You can remove all the extraneous chars with a regexp_replace and the proper regular expression ([^-0-9.]), applied globally:
SELECT
CAST(
COALESCE(
NULLIF(
regexp_replace(bonus, '[^-0-9.]+', '', 'g'),
''),
'0')
AS numeric)
FROM
tableone ;
| coalesce |
| -------: |
| 16000.00 |
| 15000.00 |
| 16000.00 |
| 25 |
| 150 |
| 0 |
See what happens to the 15,0 from the last sentence: it comes out as 150 (this may NOT be what you want).
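Since the original goal was an average, the same expression can simply be wrapped in avg() (a sketch, reusing the tableone/bonus names from above):
SELECT
  avg(
    CAST(
      COALESCE(
        NULLIF(
          regexp_replace(bonus, '[^-0-9.]+', '', 'g'),
          ''),
        '0')
      AS numeric)
  ) AS avg_bonus
FROM
  tableone ;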
I'm going to go out on a limb and say that it might be because you have empty strings rather than NULLs in your column; this would result in the error you are seeing. Try wrapping the column name in a NULLIF:
SELECT CAST(coalesce(NULLIF(bonus, ''), '0') AS integer) as new_field
But I would really question a schema that stores numeric values in a varchar column...
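To see the difference in isolation (a minimal check, independent of any table):
SELECT ''::integer;              -- ERROR: invalid input syntax for type integer: ""
SELECT NULLIF('', '')::integer;  -- returns NULL instead of erroring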
Related
I am working with the world health COVID data for a project and have had no issues until this specific query, which keeps throwing the invalid input syntax for double precision: "" error.
I should note that the tables were brought in from a CSV file and I am using PostgreSQL.
Query throwing error:
select covid_deaths.continent, covid_deaths.location, covid_deaths.date, covid_deaths.population, covid_vacc.new_vaccinations,
SUM(covid_vacc.new_vaccinations::int) over (partition by covid_deaths.location
order by covid_deaths.location, covid_deaths.date) as RollingPeopleVaccinated
from covid_deaths
join covid_vacc
on covid_deaths.location = covid_vacc.location and covid_deaths.date::date = covid_vacc.date::date
The line throwing the error is line 3, particularly the SUM(covid_vacc.new_vaccinations::int) portion. The new_vaccinations column in the covid_vacc table is of VARCHAR datatype, and I know casting is not a great solution, but I am very much trying to avoid having to reimport all of the data from the Excel sheet. Even if I were to do that, I'm not sure how I would get all the datatypes correct and the issues with null values cleared up.
I have tried not casting the new_vaccinations column as well as casting it to a few different datatypes. I have also tried running queries to alter the datatype of the new_vaccinations column, but I don't believe that is actually working. I'm fairly new to SQL, so any help is appreciated.
Use nullif to convert empty strings to NULL. Use trim first to reduce strings that are longer than '' but contain only whitespace down to '', so the nullif can catch them.
Table "public.varchar_test"
Column | Type | Collation | Nullable | Default
------------+-----------------------+-----------+----------+---------
fld_1 | character varying(50) | | |
text_array | text[] | | |
insert into varchar_test(fld_1) values ('1'), (''), ('3');
select sum(fld_1::integer) from varchar_test ;
ERROR: invalid input syntax for type integer: ""
select sum(nullif(fld_1::integer, '')) from varchar_test ;
ERROR: invalid input syntax for type integer: ""
select sum(nullif(trim(fld_1), '')::integer) from varchar_test ;
sum
-----
4
So in your case:
SUM(nullif(trim(covid_vacc.new_vaccinations), '')::integer)
FYI, the above will not deal with issues like:
select '1,000'::float;
ERROR: invalid input syntax for type double precision: "1,000"
select 'a'::float;
ERROR: invalid input syntax for type double precision: "a"
That means there still may be a need for data cleanup on the column.
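If the column does turn out to contain values like that, one option is to borrow the regexp_replace idea from the first answer above and strip the offending characters before the cast (a sketch, untested against your data):
SUM(NULLIF(regexp_replace(covid_vacc.new_vaccinations, '[^-0-9.]+', '', 'g'), '')::numeric)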
In a dataset I have, there is a column that contains numbers like 83.420, 43.317, 149.317, ... and this column is stored as a string. The dot in the numbers doesn't represent a decimal point, i.e., the number 83.420 is basically 83420, etc.
One way to remove this dot from numbers in this column is to use TRANSLATE function as follows:
SELECT translate('83.420', '.', '')
which returns 83420. But how I can apply this function on all the rows in the dataset?
I tried this; however, it failed:
SELECT translate(SELECT num_column FROM my_table, '.', '')
I get the error SQL Error [42601]: ERROR: syntax error at end of input.
Any idea how I can apply the translate function to the entire column? Or is there a better approach than translate?
You can even cast the result to integer like this:
SELECT translate(num_column, '.', '')::integer from the_table;
-- average:
SELECT avg(translate(num_column, '.', '')::integer) from the_table;
or use replace
SELECT replace(num_column, '.', '')::integer from the_table;
-- average:
SELECT avg(replace(num_column, '.', '')::integer) from the_table;
Please note that storing numbers as formatted text is a (very) bad idea. Use a native numeric type instead.
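A one-time schema fix could look like this (a sketch, reusing the the_table/num_column names from above; test on a copy first):
ALTER TABLE the_table
    ALTER COLUMN num_column TYPE integer
    USING replace(num_column, '.', '')::integer;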
Two options.
Set up table:
create table string_conv(id integer, num_column varchar);
insert into string_conv values (1, 83.420), (2, 43.317), (3, 149.317 );
select * from string_conv ;
id | num_column
----+------------
1 | 83.420
2 | 43.317
3 | 149.317
First option, leave it as a string field:
update string_conv set num_column = translate(num_column, '.', '');
select * from string_conv ;
id | num_column
----+------------
1 | 83420
2 | 43317
3 | 149317
The above changes the value format in place. It means, though, that if new data comes in with the old format, 'XX.XXX', then those values would have to be converted.
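One way to guard against that is a CHECK constraint that rejects values still containing a dot (a sketch; the constraint name is made up):
alter table string_conv add constraint num_column_no_dot check (num_column !~ '\.');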
Second option, convert to an integer column:
truncate string_conv ;
insert into string_conv values (1, 83.420), (2, 43.317), (3, 149.317 );
alter table string_conv alter COLUMN num_column type integer using translate(num_column, '.', '')::int;
select * from string_conv ;
id | num_column
----+------------
1 | 83420
2 | 43317
3 | 149317
\d string_conv
Table "public.string_conv"
Column | Type | Collation | Nullable | Default
------------+---------+-----------+----------+---------
id | integer | | |
num_column | integer | | |
This option changes the format of the values and the type of the column they are stored in. The issue here is that, from then on, new values would need to be compatible with the new type. This would mean changing the input data from 'XX.XXX' to 'XXXXX'.
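For example, an insert in the old format now fails outright, which at least makes the incompatibility visible:
insert into string_conv values (4, '83.420');
ERROR: invalid input syntax for type integer: "83.420"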
Can anyone tell me which command is used to concatenate three columns' data into one column in a PostgreSQL database?
e.g. If the columns are
begin | Month | Year
12 | 1 | 1988
13 | 3 |
14 | | 2000
| 5 | 2012
output:
Result
12-1-1988
13-3-null
14-null-2000
null-5-2012
Actually, I have concatenated two columns, but the result displays only the rows in which no column is null; I want to also display the rows in which only some of the columns are null.
If you simply used the standard || operator, you'd get a complete null string when any element is null (concat() would instead silently skip the nulls).
You could use the function concat_ws(), which ignores null values. But you are expecting them to be shown.
So you need to cast the real null value into a non-null text 'null'. This can be done with the COALESCE() function, which takes several arguments and returns the first non-null one. But here a problem occurs: the 'null' string is of a different type (text) than the columns (int). So you have to equalize the types, e.g. by casting the int values into text first. Finally, your query could look like this:
SELECT
concat_ws('-',
COALESCE(begin::text, 'null'),
COALESCE(month::text, 'null'),
COALESCE(year::text, 'null')
)
FROM mytable
I have a column in a Postgresql table that is unique and character varying(10) type. The table contains old alpha-numeric values that I need to keep. Every time a new row is created from this point forward, I want it to be numeric only. I would like to get the max numeric-only value from this table for this column then create a new row with that max value incremented by 1.
Is there a way to query this table for the max numeric value only for this column?
For example, if this column currently has the values:
1111
A1111A
1234
1234A
3331
B3332
C-3333
33-D33
3**333*
Is there a query that will return 3333, AKA cut out all the non-numeric characters from the values and then perform a MAX() on them?
Not precisely what you're asking, but something that I think will work better for you.
To go over all the rows, strip each value down to its digits, cast it to integer, and return the max:
SELECT MAX(regexp_replace(my_column, '[^0-9]', '', 'g')::int) FROM public.foobar;
This gets you your max value... say 2999.
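One caveat: if a value contains no digits at all (like the CASH row shown in the result further down), the stripped string is empty and the cast fails. A nullif guard, as in the earlier answers, is safer (a sketch):
SELECT MAX(NULLIF(regexp_replace(my_column, '[^0-9]', '', 'g'), '')::int) FROM public.foobar;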
Now, going forward, consider making the default for your column a serial-like value, and convert it to text... that way you set the "MAX" once, and then let postgres do all the work for future values.
-- create simple integer sequence
CREATE SEQUENCE public.foobar_my_column_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 1;
-- use new sequence as default value for column __and__ convert to text
ALTER TABLE foobar
ALTER COLUMN my_column
SET DEFAULT nextval('public.foobar_my_column_seq'::regclass)::text;
-- initialize "next value" of sequence to whatever is larger than
-- what you already have in your data ... say 3000:
ALTER SEQUENCE public.foobar_my_column_seq RESTART WITH 3000;
Because you're simply setting default, you don't change your current alpha-numeric values.
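From then on, an insert that omits my_column picks up the next sequence value as text (a sketch; it assumes the table's other columns are nullable or have defaults):
INSERT INTO foobar DEFAULT VALUES RETURNING my_column;  -- e.g. '3000'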
I figured it out. The following query works.
select text_value, regexp_replace(text_value, '[^0-9]+', '') as new_value from the_table;
Result:
text_value | new_value
-----------------------+-------------
4*215474 | 4215474
740024 | 740024
4*100535 | 4100535
42356 | 42356
CASH |
4*215474 | 4215474
740025 | 740025
740026 | 740026
4*5089655798 | 45089655798
4*15680 | 415680
4*224034 | 4224034
4*265718708 | 4265718708
A typical and relevant application of tsvector is to query and summarize information about the set of words that occur and their frequency... And JSONB is the natural choice (!) to represent the tsvector datatype for these "querying applications"... So,
Is there a simple workaround to cast tsvector into JSONB?
Example: counting the global frequency of words over cached tsvectors would be something like this query
SELECT r.key as word, SUM(r.value) as occurrences
FROM (
SELECT jsonb_each(kx_tsvectot::jsonb) as r FROM terms
) t
GROUP BY 1;
You can use the ts_stat() function, which will give you exactly what you need:
word text — the value of a lexeme
ndoc integer — number of documents (tsvectors) the word occurred in
nentry integer — total number of occurrences of the word
An example might be the following:
CREATE TABLE t (
tsv TSVECTOR
);
INSERT INTO t VALUES
('word'::TSVECTOR),
('second word'::TSVECTOR),
('third word'::TSVECTOR);
SELECT * FROM
ts_stat('SELECT tsv FROM t');
Result:
word | ndoc | nentry
--------+------+--------
word | 3 | 3
third | 1 | 1
second | 1 | 1
(3 rows)
If you still want to convert it to jsonb, you can cast word from text to jsonb.
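For instance, to get a single jsonb object of word -> occurrences, you can aggregate the ts_stat() output (a sketch built on the table above):
SELECT jsonb_object_agg(word, nentry) AS occurrences
FROM ts_stat('SELECT tsv FROM t');
-- {"word": 3, "third": 1, "second": 1}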