Generating a random 10-digit ID number in PostgreSQL

I apologize if this has been answered elsewhere. I found similar posts but nothing that exactly matches what I was looking for. I am wondering if it is possible to randomly assign a number as data is entered into a table. For example, for this table
CREATE TABLE test (
    id_number INTEGER NOT NULL,
    second_number INTEGER NOT NULL
);
I would like to be able to do the following:
INSERT INTO test (second_number)
VALUES (1234567)
where id_number is then populated with a random 10-digit number. I want this to behave like SERIAL in the way it populates, but it needs to be 10 digits and random. Thanks very much

You can try this expression:
CAST(1000000000 + floor(random() * 9000000000) AS bigint)
This will give you a random number between 1000000000 and 9999999999.
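To have it populate automatically like SERIAL, you can attach that expression as a column DEFAULT. A minimal sketch (note that 10-digit values exceed the INTEGER range, so the column must be BIGINT, and random values can collide, so add a UNIQUE constraint if you need uniqueness):
CREATE TABLE test (
    -- random 10-digit id assigned at insert time; BIGINT is required
    -- because values up to 9999999999 overflow INTEGER
    id_number BIGINT NOT NULL
        DEFAULT CAST(1000000000 + floor(random() * 9000000000) AS bigint),
    second_number INTEGER NOT NULL
);
INSERT INTO test (second_number)
VALUES (1234567);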

Related

POSTGRES SUM big someString::decimal

I am trying to SUM and cast at the same time. I have a column of big numbers with a lot of decimals, for example: "0.0000000000000000000000000000000000000000000043232137067129047"
When I try sum(amount::decimal) I get the following error message: org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [22003]: ERROR: value overflows numeric format Where: parallel worker
What I don't get is that the docs say: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point.
And my longest cast string is 63 digits, so I don't get it.
What am I missing, and how can I make my sum work?
EDIT:
amount type is varchar(255)
EDIT2:
I found out it breaks only when I try to CREATE a table from this query; the query works fine on its own. How can it be due to CREATE TABLE?
Complete request:
create table cross_dapp_ft as (
    select sender, receiver, sum(amount::decimal), contract
    from ft_transfer_event ftce
    where receiver in (
        select account_id
        from batch.cc cc
        where classification not in ('ft')
    )
    group by sender, receiver, contract
);
As Samuel Liew suggested in the comments, some rows were corrupted. The conclusion: to be safe, don't store numbers as strings.
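If you run into the same symptom, one way to locate the corrupted rows is to filter for values that don't look like a plain decimal. A minimal sketch (the regex is an assumption about what a valid amount should look like in this table):
select amount
from ft_transfer_event
where amount !~ '^[0-9]+(\.[0-9]+)?$'
limit 10;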

Would it be possible to select random rows with a little preference for a specific column?

I would like to get a random selection of records from my table, but I wonder if it would be possible to give a better chance to items that are newly created. I also have pagination, which is why I'm using setseed.
Currently I'm only retrieving items randomly and it works quite well, but I need to give a certain "preference" to newly created items.
Here is what I'm doing for now:
SELECT SETSEED(0.16111981), RANDOM();
I don't know what to do, and I can't figure out what a good solution would be without it being an absolute performance disaster.
First, I want to explain how we can select random records from a table. In PostgreSQL, we can use the random() function in the ORDER BY clause. Example:
select * from test_table
order by random()
limit 1;
I am using limit 1 to select only one record. But with this method, query performance will be very bad for large tables (over 100 million rows).
The second way: you can select records manually using random(), provided the table has an id field. This approach performs very well.
Let's first write our own randomizing function so that it's easy to use in our queries.
CREATE OR REPLACE FUNCTION random_between(low integer, high integer)
RETURNS integer
LANGUAGE plpgsql
STRICT
AS $function$
BEGIN
    RETURN floor(random() * (high - low + 1) + low);
END;
$function$;
This function returns a random integer within the range given by its two arguments. Then we can write a query using our random function. Example:
select * from test_table
where id = (select random_between(min(id), max(id)) from test_table);
I tested this query on a table with 150 million rows and it performs very well: duration 12 ms. In this query, if you need many rows rather than just one, you can write where id > instead of where id =, as in the sketch below.
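A minimal sketch of the multi-row variant (it assumes the id sequence has few gaps; using >= means the query still finds the next existing id even if the random value lands in a gap, where = would return nothing):
select *
from test_table
where id >= (select random_between(min(id), max(id)) from test_table)
order by id
limit 10;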
Now, for your little preference: I don't know your detailed business logic or the conditions you want to apply when randomizing, so I can only write some sample queries to illustrate the mechanism. PostgreSQL has no built-in function for randomizing data with preferences; we must write this logic manually. I created a sample table for testing our queries.
CREATE TABLE test_table (
id serial4 NOT NULL,
is_created bool NULL,
action_date date NULL,
CONSTRAINT test_table_pkey PRIMARY KEY (id)
);
CREATE INDEX test_table_id_idx ON test_table USING btree (id);
For example, I want to give more preference to rows whose action_date is closest to today. Sample query:
select
    id,
    is_created,
    action_date,
    (extract(day from (now() - action_date))) as dif_days
from
    test.test_table
where
    id > (select random_between(min(id), max(id)) from test.test_table)
    and (extract(day from (now() - action_date))) = random_between(0, 6)
limit 1;
In this query, (extract(day from (now() - action_date))) as dif_days returns the difference in days between action_date and today. In the WHERE clause, I first select rows whose id is greater than the resulting random value. Then, with (extract(day from (now() - action_date))) = random_between(0, 6), I keep only rows whose action_date is at most 6 days ago (it might be 4 days ago or 2 days ago, at most 6).
You can write many such preference queries (for example, giving more weight via boolean fields: closed or opened, etc.).
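If a full-table sort is acceptable for your table size (unlike the id trick above, this cannot use an index), another way to bias toward newer rows is weighted random ordering. A minimal sketch, where the weight formula is my own assumption to tune: each row is ordered by random() raised to a power that grows with the row's age in days, so older rows tend to sink toward the bottom:
select id, action_date
from test_table
order by random() ^ ((1 + extract(day from (now() - action_date)))::double precision) desc
limit 10;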

PostgreSQL indexed columns choice

I have these two tables :
CREATE TABLE ref_dates(
date_id SERIAL PRIMARY KEY,
month int NOT NULL,
year int NOT NULL,
month_name CHAR(255)
);
CREATE TABLE util_kpi(
kpi_id SERIAL PRIMARY KEY,
kpi_description int NOT NULL,
kpi_value float,
date_id int NOT NULL,
dInsertion timestamp default CURRENT_TIMESTAMP,
CONSTRAINT fk_ref_kpi FOREIGN KEY (date_id) REFERENCES ref_dates(date_id)
);
Usually, the type of query I'd run is:
Selecting kpi_description and kpi_value for a specified month and year:
SELECT kpi_description, kpi_value FROM util_kpi u JOIN ref_dates r ON u.date_id = r.date_id WHERE month=X AND year=XXXX
Selecting kpi_description and kpi_value for a specified kpi_description, month and year:
SELECT kpi_description, kpi_value FROM util_kpi u JOIN ref_dates r ON u.date_id = r.date_id WHERE month=X AND year=XXXX AND kpi_description='XXXXXXXXXXX'
I thought about creating these indexes:
CREATE INDEX idx_ref_date_year_month ON ref_dates(year, month);
CREATE INDEX idx_util_kpi_date ON util_kpi(date_id);
First of all, I want to know if it's a good idea to create these indexes.
Second, I was wondering if it's a good idea to add kpi_description to the indexes on the util_kpi table.
Can you give me your opinion?
Regards
It's not possible to give an exact answer without looking at the data, so I can only offer an opinion.
A. ref_dates
This table looks very similar to a date dimension in ROLAP schemas.
So the first thing I would do is change date_id from SERIAL to:
a DATE datatype
or even a "smart integer": an integer in the form YYYYMMDD, e.g. 20210430. It may look strange, but such identifiers are not uncommon in date dimensions.
The main point of this form is that date_id in fact tables becomes informative even without joining to the date dimension.
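For example, such a key can be derived directly from a date value (a minimal sketch using to_char):
SELECT to_char(date '2021-04-30', 'YYYYMMDD')::int AS date_id;
-- returns 20210430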
B. util_kpi
I suppose that:
ref_dates is a date dimension, so it holds roughly 365 * number-of-years rows. It could be populated once for 20-30 years into the future and still would not be really big.
util_kpi is a fact table, which should be big, as in "really big": millions of records and more.
For util_kpi I expected an id referencing a time dimension but did not find one, so no hourly stats are planned yet.
I see util_kpi.dInsertion, which I suppose is intended to serve as the time dimension. I would consider extracting it into a time_id holding hours, minutes, and seconds (if milliseconds are not needed).
C. Indexing
ref_dates: it does not matter much how you index ref_dates because it's a relatively small table. Maybe a unique index on date_id with an INCLUDE clause covering all other fields would be best. Don't create individual indexes on fields with low selectivity like year or month: they will not help much, but they will not hurt a lot either.
util_kpi: you need an index on date_id (as for any foreign key to dimension tables, including ones that appear in the future).
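A sketch of what those indexes could look like (the index names are mine; the composite version only pays off if filtering on kpi_description is frequent):
CREATE INDEX idx_util_kpi_date ON util_kpi (date_id);
-- optional composite index for the query that also filters on kpi_description:
CREATE INDEX idx_util_kpi_date_desc ON util_kpi (date_id, kpi_description);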
Those are my thoughts, based on the assumptions above.

How can I sum/subtract time values from the same row

I want to sum and subtract two or more timestamp columns.
I'm using PostgreSQL and I have a structure as you can see:
I can't round the minutes or seconds, so I'm trying to extract the EPOCH and do the operations afterwards, but I always get an error: the first EXTRACT recognizes the column, but when I put the second EXTRACT in the same SQL command, I get an error message saying that the second column does not exist.
I'll give you an example:
SELECT
EXAMPLE.PERSON_ID,
COALESCE(EXTRACT(EPOCH from EXAMPLE.LEFT_AT),0) +
COALESCE(EXTRACT(EPOCH from EXAMPLE.ARRIVED_AT),0) AS CREDIT
FROM
EXAMPLE
WHERE
EXAMPLE.PERSON_ID = 1;
In this example I would get an error like:
Column ARRIVED_AT does not exist
Why is this happening?
Can I sum/subtract time values from the same row?
Is ARRIVED_AT a calculated value instead of a column? What did you run to get the query results image you posted showing those columns?
The following script does what you expect, so there's something about the structure of the table you're querying that isn't what you expect.
-- Scratch schema so the example is self-contained and easy to clean up.
CREATE SCHEMA so46801016;
SET search_path=so46801016;

CREATE TABLE trips (
    person_id serial primary key,
    arrived_at time,
    left_at time
);

-- Two sample rows.
INSERT INTO trips (arrived_at, left_at) VALUES
  ('14:30'::time, '19:30'::time)
, ('11:27'::time, '20:00'::time)
;

-- Summing the epoch values of both time columns works row by row.
SELECT
    t.person_id,
    COALESCE(EXTRACT(EPOCH from t.left_at),0) +
    COALESCE(EXTRACT(EPOCH from t.arrived_at),0) AS credit
FROM
    trips t;

DROP SCHEMA so46801016 CASCADE;
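Subtraction works the same way; for example, to compute the seconds between arrival and departure (a sketch against the same trips table; run it before the DROP SCHEMA line above):
SELECT
    t.person_id,
    COALESCE(EXTRACT(EPOCH from t.left_at),0) -
    COALESCE(EXTRACT(EPOCH from t.arrived_at),0) AS seconds_present
FROM
    trips t;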

Strange result when using a WHERE filter in CQL (Cassandra)

I have a counter column family, created with the CREATE TABLE command below (the KEY uses bigint so I can filter on it when querying).
CREATE TABLE BannerCount (
KEY bigint PRIMARY KEY
) WITH
comment='' AND
comparator=text AND
read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
default_validation=counter AND
min_compaction_threshold=4 AND
max_compaction_threshold=32 AND
replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='SnappyCompressor';
But when I insert data into this column family and select with a WHERE clause to filter the data, the results I retrieve are very strange, like this:
Using the query:
select count(1) From BannerCount where KEY > -1
count
-------
71
Using the query:
select count(1) From BannerCount where KEY > 0;
count
-------
3
Using the query:
select count(1) From BannerCount ;
count
-------
122
What is happening with my query? Can anyone tell me why I get these results?
To understand the reason for this, you should understand Cassandra's data model. You're probably using RandomPartitioner here, so each of these KEY values in your table is hashed to a token value, and the rows get stored in a distributed way around your ring.
So finding all rows whose key has a higher value than X isn't the sort of query Cassandra is optimized for. You should probably key your rows on some other value, and then either use wide rows for your bigint values (since columns are sorted) or put them in a second column and create an index on it.
To explain in a little more detail why your results seem strange: CQL 2 implicitly turns "KEY >= X" into "token(KEY) >= token(X)", so that a querier can iterate through all the rows in a somewhat-efficient way. So really, you're finding all the rows whose hash is greater than the hash of X. See CASSANDRA-3771 for how that confusion is being resolved in CQL 3. That said, the proper fix for you is to structure your data according to the queries you expect to be running on it.
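For reference, CQL 3 lets you write the token comparison explicitly, which makes these semantics visible (a sketch, assuming an equivalent table defined in CQL 3):
select count(1) from BannerCount where token(KEY) > token(0);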