Get integer part of number - T-SQL

So I have a table with numbers in decimals, say
 id   | value
------+-------
 2323 |  2.43
 4954 | 63.98
And I would like to get
 id   | value
------+-------
 2323 |     2
 4954 |    63
Is there a simple function in T-SQL to do that?

SELECT FLOOR(value)
http://msdn.microsoft.com/en-us/library/ms178531.aspx
FLOOR returns the largest integer less than or equal to the specified numeric expression.

Assuming you are OK with truncation of the decimal part, you can do:
SELECT Id, CAST(value AS INT) INTO IntegerTable FROM NumericTable
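Note that the two approaches differ for negative inputs: FLOOR rounds toward negative infinity, while converting to INT truncates toward zero. A quick comparison (a sketch, assuming SQL Server):
SELECT FLOOR(-3.7),       -- -4: largest integer <= -3.7
       CAST(-3.7 AS INT); -- -3: the conversion truncates toward zero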

FLOOR and CAST do not necessarily return the integer part for negative numbers (FLOOR(-3.7) is -4, for example); a solution is to define a stored function for the integer part (MySQL/MariaDB syntax):
DELIMITER //
DROP FUNCTION IF EXISTS INTEGER_PART//
CREATE FUNCTION INTEGER_PART(n DOUBLE)
RETURNS INTEGER
DETERMINISTIC
BEGIN
IF (n >= 0) THEN RETURN FLOOR(n);
ELSE RETURN CEILING(n);
END IF;
END
//
MariaDB [sidonieDE]> SELECT INTEGER_PART(3.7);
+-------------------+
| INTEGER_PART(3.7) |
+-------------------+
|                 3 |
+-------------------+
1 row in set (0.00 sec)
MariaDB [sidonieDE]> SELECT INTEGER_PART(-3.7);
+--------------------+
| INTEGER_PART(-3.7) |
+--------------------+
|                 -3 |
+--------------------+
1 row in set (0.00 sec)
Afterwards you can use the function in a query like this:
SELECT INTEGER_PART(value) FROM table;
If you do not want to define a stored function in the database, you can put an IF in the query instead:
select if(value < 0,CEILING(value),FLOOR(value)) from table ;
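Back in T-SQL, where IF() is not available as an expression, the same integer-part logic can be written inline with CASE (a sketch, reusing the NumericTable from above):
SELECT Id,
       CASE WHEN value >= 0 THEN FLOOR(value) ELSE CEILING(value) END AS int_part
FROM NumericTable;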

postgresql: datatype numeric with limited digits

I am looking for a numeric datatype with a limited number of digits
(before and after the decimal point combined).
The function below trims only digits after the decimal point. (PG version >= 13)
create function num_flex(v numeric, d int) returns numeric as
$$
select case
  when v = 0 then 0
  when v < 1 and v > -1 then trim_scale(round(v, d - 1))
  else trim_scale(round(v, d - 1 - least(log(abs(v))::int, d - 1)))
end;
$$
language sql;
For testing:
select num_flex( 0, 6)
union all
select num_flex( 1.22000, 6)
union all
select num_flex( (-0.000000123456789*10^x)::numeric,6)
from generate_series(1,15,3) t(x)
union all
select num_flex( (0.0000123456789*10^x)::numeric,6)
from generate_series(1,15,3) t(x) ;
It runs, but does someone have a better idea, or can you find a bug (a situation that is not handled)?
The next step is to integrate this in PG, so that I can write
select 12.123456789::num_flex6 ;
select 12.123456789::num_flex7 ;
for a num_flex datatype with 6 or 7 digits, with types from num_flex2 to num_flex9. Is this possible?
There are a few problems with your function:
Accepting negative digit counts (parameter d): num_flex(1234,-2) returns 1200, while you specified that the function should only trim digits after the decimal point, so 1234 would be expected.
Incorrect results between -1 and 1: num_flex(0.123,3) returns 0.12 instead of 0.123. I guess this might also be a desired effect, if you do want to count the 0 to the left of the decimal point. Normally, that 0 is ignored when a number's precision and scale are considered.
The counting of digits to the left of the decimal point is incorrect due to how ::int rounding works: log(abs(11))::int is 1, but log(abs(51))::int is 2. ceil(log(abs(v)))::int returns 2 in both cases, while keeping the int type so it still works as the 2nd parameter of round().
create or replace function num_flex(
input_number numeric,
digit_count int,
is_counting_unit_zero boolean default false)
returns numeric as
$$
select trim_scale(
case
when input_number=0
then 0
when digit_count<=0 --avoids negative rounding
then round(input_number,0)
when (input_number between -1 and 1) and is_counting_unit_zero
then round(input_number,digit_count-1)
when (input_number between -1 and 1)
then round(input_number,digit_count)
else
round( input_number,
greatest( --avoids negative rounding
digit_count - (ceil(log(abs(input_number))))::int,
0)
)
end
);
$$
language sql;
Here's a test
select *,"result"="should_be"::numeric as "is_correct" from
(values
('num_flex(0.1234 ,4)',num_flex(0.1234 ,4), '0.1234'),
('num_flex(1.234 ,4)',num_flex(1.234 ,4), '1.234'),
('num_flex(1.2340000 ,4)',num_flex(1.2340000 ,4), '1.234'),
('num_flex(0001.234 ,4)',num_flex(0001.234 ,4), '1.234'),
('num_flex(123456 ,5)',num_flex(123456 ,5), '123456'),
('num_flex(0 ,5)',num_flex(0 ,5), '0'),
('num_flex(00000.00000 ,5)',num_flex(00000.00000 ,5), '0'),
('num_flex(00000.00001 ,5)',num_flex(00000.00001 ,5), '0.00001'),
('num_flex(12345678901 ,5)',num_flex(12345678901 ,5), '12345678901'),
('num_flex(123456789.1 ,5)',num_flex(123456789.1 ,5), '123456789'),
('num_flex(1.234 ,-4)',num_flex(1.234 ,4), '1.234')
) as t ("operation","result","should_be");
-- operation | result | should_be | is_correct
----------------------------+-------------+-------------+------------
-- num_flex(0.1234 ,4) | 0.1234 | 0.1234 | t
-- num_flex(1.234 ,4) | 1.234 | 1.234 | t
-- num_flex(1.2340000 ,4) | 1.234 | 1.234 | t
-- num_flex(0001.234 ,4) | 1.234 | 1.234 | t
-- num_flex(123456 ,5) | 123456 | 123456 | t
-- num_flex(0 ,5) | 0 | 0 | t
-- num_flex(00000.00000 ,5) | 0 | 0 | t
-- num_flex(00000.00001 ,5) | 0.00001 | 0.00001 | t
-- num_flex(12345678901 ,5) | 12345678901 | 12345678901 | t
-- num_flex(123456789.1 ,5) | 123456789 | 123456789 | t
-- num_flex(1.234 ,-4) | 1.234 | 1.234 | t
--(11 rows)
You can declare the precision (total number of digits) of your numeric data type in the column definition. Only digits after decimal point will be rounded. If there are too many digits before the decimal point, you'll get an error.
The downside is that numeric(n) is actually numeric(n,0), which is dictated by the SQL standard. So if by limiting the column's number of digits to 5 you want to have 12345.0 as well as 0.12345, there's no way you can configure numeric to hold both. numeric(5) will round 0.12345 to 0, numeric(5,5) will dedicate all digits to the right of decimal point and reject 12345.
create table test (numeric_column numeric(5));
insert into test values (12345.123);
table test;
-- numeric_column
------------------
-- 12345
--(1 row)
insert into test values (123456.123);
--ERROR: numeric field overflow
--DETAIL: A field with precision 5, scale 0 must round to an absolute value less than 10^5.
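For the other extreme, a quick sketch with numeric(5,5), which dedicates all five digits to the fractional part:
create table test55 (numeric_column numeric(5,5));
insert into test55 values (0.12345);
-- OK: all five digits fit after the decimal point
insert into test55 values (12345);
--ERROR: numeric field overflow
--DETAIL: A field with precision 5, scale 5 must round to an absolute value less than 1.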

Multiply a column with each value of an array

How do I multiply a column value with each element of an array, without using a loop?
I have tried using a FOREACH loop, which iterates over the array and multiplies each element with the column value.
CREATE OR REPLACE FUNCTION
public.test_p_offer_type_simulation1(offers numeric[])
RETURNS TABLE(sku character varying, cannibalisationrevenue double precision, cannibalisationmargin double precision)
LANGUAGE plpgsql
AS $function$
declare
  a numeric[] := offers;
  i numeric;
begin
  foreach i in array a loop
    return QUERY
    select
      base.sku,
      i * base.similar_sku,
      .................
Suppose I have a column name 'baseline', and have an array [1,2,3], I want to multiply a baseline column value where its id =1 with each element of the array.
Example:
id | baseline
----+----------
1 | 3
Suppose I have an array with values [2,3,4]; I want to multiply baseline = 3 with (3*2), (3*3), (3*4), and return 3 rows after multiplication with the values 6, 9, 12.
The output should be:
 id | result | number
----+--------+--------
  1 |      6 |      2
  1 |      9 |      3
  1 |     12 |      4
OK, according to your description, just use the unnest function; example SQL below:
with tmp_table as (
select 1 as id, 3 as baseline, '{2,3,4}'::int[] as arr
)
select id,baseline*unnest(arr) as result,unnest(arr) as number from tmp_table;
id | result | number
----+--------+--------
1 | 6 | 2
1 | 9 | 3
1 | 12 | 4
(3 rows)
You can just replace the CTE tmp_table above with your real table name.
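If you want this wrapped in a set-returning function like the one in the question, the FOREACH loop can be replaced entirely by a join against unnest. A minimal sketch, assuming a hypothetical table base(id int, baseline numeric):
CREATE OR REPLACE FUNCTION multiply_baseline(p_id int, offers numeric[])
RETURNS TABLE(id int, result numeric, number numeric)
LANGUAGE sql
AS $$
  SELECT b.id,
         b.baseline * o.number, -- one output row per array element
         o.number
  FROM base b
  CROSS JOIN unnest(offers) AS o(number)
  WHERE b.id = p_id;
$$;
-- usage: SELECT * FROM multiply_baseline(1, ARRAY[2,3,4]::numeric[]);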

how to optimize a query which needs timestamp normalization

I have the following data source, which has several physical values (one per column) coming from several devices at different times:
+-----------+------------+---------+-------+
| id_device | timestamp | Vln1 | kWl1 |
+-----------+------------+---------+-------+
| 123 | 1495696500 | | |
| 122 | 1495696800 | | |
| 122 | 1495697100 | 230 | 5.748 |
| 122 | 1495697100 | 230 | 5.185 |
| 124 | 1495700100 | 226.119 | 0.294 |
| 122 | 1495713900 | 230 | |
| 122 | 1495716000 | | |
| 122 | 1495716300 | 230 | |
| 122 | 1495716300 | | |
| 122 | 1495716300 | | |
| 122 | 1495716600 | 230 | 4.606 |
| 122 | 1495716600 | | |
| 124 | 1495739100 | | |
| 123 | 1495739400 | | |
+-----------+------------+---------+-------+
timestamp is (unfortunately) bigint, and each device sends data at different times and with different frequency: some of the devices push every 5 minutes, others every 10 minutes, others every 15 minutes. The physical values can be NULL.
A front-end application needs to plot charts - let us say line charts - over a specific time span, with time ticks every N minutes. The tick size is chosen by the user.
The charts can be made of multiple physical values of multiple devices, and each line is an independent request made to the backend.
Let us think about a case where:
the chosen time tick is 10 mins
two lines to plot are chosen, having two different physical values (columns) on two different devices:
A device pushes every 5 mins
The other every 10 mins
What the front-end app expects are normalized results:
<timestamp>, <value>
Where
timestamp represents rounded time (00:00, 00:10, 00:20, and so forth)
in case there is more than one value in each "timebox" (e.g. there will be 2 values between 00:00 and 00:10 for a device pushing every 5 minutes), a single value is returned, which is an aggregate (AVG)
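For illustration, the bucketing itself can be done with plain integer arithmetic on the epoch value. A sketch for a 10-minute tick, assuming a hypothetical table measures(id_device int, "timestamp" bigint, kwl1 double precision):
select ("timestamp" / 600) * 600 as bucket_epoch, -- 600 s = 10 min; integer division floors
       avg(kwl1) as kwl1
from measures
where id_device = 122
group by 1
order by 1;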
In order to accomplish this I created some plpgsql functions that help me, but I'm not sure that what I'm doing is the best in terms of performance.
Basically what I do is:
Get the data for the particular device and physical measure, within the selected timespan
Normalize the returned data: each timestamp is rounded down to the selected time tick (i.e. 10:12:23 -> 10:10:00). That way, each tuple represents a value within a "time bucket"
Create a range of time buckets, according to the time tick the user selected
JOIN the timestamp-normalized data with the range. Aggregate in case of multiple values within the same range
Here are my functions:
create or replace function app_iso50k1.blkGetTimeSelParams(
t_end bigint,
t_granularity integer,
t_span bigint,
OUT delta_time_bucket interval,
OUT b_timebox timestamp,
OUT e_timebox timestamp)
as
$$
DECLARE
delta_time interval;
BEGIN
/* normalization: no minutes */
t_end = extract('epoch' from date_trunc('minute', (to_timestamp(t_end) at time zone 'UTC')::timestamp));
delta_time = app_iso50k1.blkGetDeltaTimeBucket(t_end, t_granularity);
e_timebox = date_trunc('minute', (to_timestamp(t_end - extract('epoch' from delta_time)) at time zone 'UTC'))::timestamp;
b_timebox = (to_timestamp(extract('epoch' from e_timebox) - t_span) at time zone 'UTC')::timestamp;
delta_time_bucket = delta_time;
END
$$ immutable language 'plpgsql' security invoker;
create or replace function app_iso50k1.getPhyMetData(
tablename character varying,
t_span bigint,
t_end bigint,
t_granularity integer,
idinstrum integer,
id_device integer,
varname character varying,
op character varying,
page_size int,
page int)
RETURNS TABLE(times bigint , val double precision) as
$$
DECLARE
series REFCURSOR;
serie RECORD;
first_notnull bool = false;
prev_val double precision;
time_params record;
q_offset int;
BEGIN
time_params = app_iso50k1.blkGetTimeSelParams(t_end, t_granularity, t_span);
if(page = 1) then
q_offset = 0;
else
q_offset = page_size * (page -1);
end if;
if not public.blkIftableexists('resgetphymetdata')
THEN
create temporary table resgetphymetdata (times bigint, val double precision);
ELSE
truncate table resgetphymetdata;
END IF;
execute format($ff$
insert into resgetphymetdata (
/* generate every possible range between these dates */
with ranges as (
select generate_series($1, $2, $5 * interval '1 minute') as range_start
),
/* normalize your data to which <t_granularity>-minute interval it belongs to */
rounded_hst as (
select
date_trunc ('minutes', (to_timestamp("timestamp") at time zone 'UTC')::timestamp)::timestamp -
mod (extract ('minutes' from ((to_timestamp("timestamp") at time zone 'UTC')::timestamp))::int, $5) * interval '1 minute' as round_time,
*
from public.%I
where
idinstrum = $3 and
id_device = $4 and
"timestamp" <= $8
)
select
extract('epoch' from r.range_start)::bigint AS times,
%s (hd.%I) AS val
from
ranges r
left join rounded_hst hd on r.range_start = hd.round_time
group by
r.range_start
order by
r.range_start
LIMIT $6 OFFSET $7
);
$ff$, tablename, op, varname) using time_params.b_timebox, time_params.e_timebox, idinstrum, id_device, t_granularity, page_size, q_offset, t_end;
/* data cleansing: val holes between not-null values are filled with the previous value */
open series no scroll for select * from resgetphymetdata;
loop
fetch series into serie;
exit when not found;
if NOT first_notnull then
if serie.val NOTNULL then
first_notnull = true;
prev_val = serie.val;
end if;
else
if serie.val is NULL then
update resgetphymetdata
set val = prev_val
where current of series;
else
prev_val = serie.val;
end if;
end if;
end loop;
close series;
return query select * from resgetphymetdata;
END;
$$ volatile language 'plpgsql' security invoker;
Do you see good alternatives to what I coded? Is there room for improvement?
Thanks!
You can fully translate your iterative logic into a pure SQL query (see the sketch below).
You can parametrize the query with a function.
For better performance, use the sql language for your function.
You can build partial sums over the timeseries interval using a window function, as explained here:
Window function trailing dates in PostgreSQL
Other suggestions:
Manage NULL values with coalesce
Avoid timestamp conversion with a dedicated computed column
You can use small computed views and join them in your final query, or use a LATERAL JOIN
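As an illustration of the first point, the data-cleansing loop (filling NULL holes with the previous non-null value) can be expressed as a single window query. A sketch, assuming the intermediate result set resgetphymetdata(times bigint, val double precision) from the question:
with grouped as (
  select times, val,
         count(val) over (order by times) as grp -- count() skips NULLs, so each
                                                 -- non-null value opens a new group
  from resgetphymetdata
)
select times,
       max(val) over (partition by grp) as val -- the group's single non-null value
from grouped
order by times;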

Is there an equivalent of DataFrames 'describe' to a postgresql table?

I want to summarize multiple tables in my database getting each columns statistics (min, max, avg, num of null values, etc.).
Is there a postgresql command/tool for doing that?
PostgreSQL maintains statistics on all tables. They are made visible via the pg_stats view.
It contains at least some of the information you are after, such as the proportion of null values, as well as other potentially useful info like histograms of most commonly occurring values, etc.
These statistics are maintained by the database itself, to aid in query planning.
Example Usage: Obtain fraction of nulls and number of distinct values in table 'foo':
ispdb_t1=> select tablename || '.' || attname as tablecolumn, null_frac, n_distinct from pg_stats where tablename='foo';
    tablecolumn    |  null_frac  | n_distinct
-------------------+-------------+------------
 foo.name          |           0 |         -1
 foo.a             | 0.000785309 |          4
 foo.b             | 0.000241633 |          4
 foo.id            |           0 |         -1
 foo.d             |           0 |        553
(6 rows)
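Since pg_stats only holds estimates gathered by ANALYZE, a plain aggregate query is the way to get exact per-column figures. A sketch, assuming hypothetical numeric columns a and b in table foo:
select min(a), max(a), avg(a),
       count(*) - count(a) as a_nulls, -- count(a) skips NULLs
       min(b), max(b), avg(b),
       count(*) - count(b) as b_nulls
from foo;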

How to convert string to integer in amazon redshift

I have a column that holds an id, currently as a string. If the id is indeed a number, I need to convert it to a real integer; if not, it should be converted to a null value. I would like to run an update query on the table and create a new integer id field.
I was unable to find exactly how to determine if the string is a number.
Does anyone know?
Thanks
Nir
Since Redshift does not support modifying a column's type, it's better to create another table with your desired schema, converting the varchar column value to integer as you insert into the new table.
Here is an example:
dev=> CREATE TABLE table_varchar_id (id varchar(24), val varchar(24));
CREATE TABLE
dev=> INSERT INTO table_varchar_id values ('1111', 'aaaa'),('2222', 'bbbb'),('dummy1', 'cccc'),('dummy2', 'dddd');
INSERT 0 4
dev=> CREATE TABLE table_int_id (id int, val varchar(24));
CREATE TABLE
dev=>
dev=> INSERT INTO table_int_id (
dev(> SELECT
dev(> CASE REGEXP_COUNT(id, '^[0-9]+$')
dev(> WHEN 0 then NULL
dev(> ELSE id::integer
dev(> END as "id",
dev(> val
dev(> FROM
dev(> table_varchar_id
dev(> );
INSERT 0 4
dev=> SELECT * FROM table_varchar_id ORDER BY id;
id | val
--------+------
1111 | aaaa
2222 | bbbb
dummy1 | cccc
dummy2 | dddd
(4 rows)
dev=> SELECT * FROM table_int_id ORDER BY id;
id | val
------+------
1111 | aaaa
2222 | bbbb
| dddd
| cccc
(4 rows)
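If rebuilding the table is not an option, the question's original idea (an update query populating a new integer id field) can be sketched with an added column, reusing table_varchar_id from above:
ALTER TABLE table_varchar_id ADD COLUMN id_int INT;
UPDATE table_varchar_id
SET id_int = CASE REGEXP_COUNT(id, '^[0-9]+$')
               WHEN 0 THEN NULL
               ELSE id::integer
             END;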