T-SQL data type for 39-digit number?

I'm using the free IPv6 data table from Ip2Location. The table schema is defined as:
CREATE TABLE [ip2location].[dbo].[ip2location_db11_ipv6](
[ip_from] char(39) NOT NULL,
[ip_to] char(39) NOT NULL,
[country_code] nvarchar(2) NOT NULL,
[country_name] nvarchar(64) NOT NULL,
[region_name] nvarchar(128) NOT NULL,
[city_name] nvarchar(128) NOT NULL,
[latitude] float NOT NULL,
[longitude] float NOT NULL,
[zip_code] nvarchar(30) NOT NULL,
[time_zone] nvarchar(8) NOT NULL
) ON [PRIMARY]
GO
CREATE INDEX [ip_from] ON [ip2location].[dbo].[ip2location_db11_ipv6]([ip_from]) ON [PRIMARY]
GO
CREATE INDEX [ip_to] ON [ip2location].[dbo].[ip2location_db11_ipv6]([ip_to]) ON [PRIMARY]
GO
Queries against the table use an IPv6 address converted to an IP number. The provided search query is:
SELECT *
FROM ip2location_db11_ipv6
WHERE #IpNumber BETWEEN ip_from AND ip_to
My average query time, based on a few random look-ups, was ~1.2 seconds. While there are 3,458,959 records in the table, this still seems a bit slow to me (is it? I'm by no means a SQL guru). My first thought was to make the ip_from and ip_to columns a numeric data type, but the maximum value is 58569107375850417935858934690443427840 (39 digits), which exceeds the maximum precision of the DECIMAL type (38 digits). Is there anything that can be done to improve the query time?

You may try using BINARY(17): a 39-digit decimal value fits into 17 bytes. See the SQL Server binary type documentation.
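A minimal sketch of what that change could look like, assuming the 39-digit decimal strings are converted to fixed-width, big-endian 17-byte values in the application layer (T-SQL has no 128-bit integer type); the *_bin column names and the @IpNumberBin parameter are illustrative, not part of the original schema:
ALTER TABLE [ip2location].[dbo].[ip2location_db11_ipv6]
    ADD [ip_from_bin] BINARY(17) NULL,
        [ip_to_bin] BINARY(17) NULL;
GO
-- Populate ip_from_bin/ip_to_bin from the loader, then index them:
CREATE INDEX [ip_from_bin] ON [ip2location].[dbo].[ip2location_db11_ipv6]([ip_from_bin]);
CREATE INDEX [ip_to_bin] ON [ip2location].[dbo].[ip2location_db11_ipv6]([ip_to_bin]);
GO
-- Fixed-width binary values compare byte by byte, which matches numeric order,
-- so the look-up keeps the same shape:
SELECT *
FROM [ip2location].[dbo].[ip2location_db11_ipv6]
WHERE @IpNumberBin BETWEEN [ip_from_bin] AND [ip_to_bin];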

Related

Is it possible to look at the output of the previous row of a PostgreSQL query?

This is the question: is it possible to look at the output (what has been selected) of the previous row of a running SQL query in Postgres?
I know that lag exists to look at the inputs, the "from" of the query. I also know that a CTE, subquery or lateral join can solve most issues of this kind. But I think the problem I'm facing genuinely requires a peek at the output of the previous row. Why? Because the output of the current row depends on a constant from a lookup table, and the value used to look up that constant is an aggregate of all the previous rows. If that lookup returns the wrong constant, all subsequent rows will be increasingly off from the expected value.
The whole rest of this text is a simplified example based on the problem I'm facing. It should be possible to paste it into PostgreSQL 12 and above and play around. I'm terribly sorry that it is as complicated as it is, but I think it is the simplest I can make it while still retaining the core issue: a lookup in a lookup table based on an aggregate over all previous rows, plus the fact that the "inventory" being tracked is modeled as a series of transactions of two discrete types.
The database itself exists to keep track of multiple fish farms, or cages full of fish. Fish can be moved/transferred between these farms, and the farms are fed roughly daily. Why not just carry the aggregate as a field in the table? Because it should be possible to switch out the lookup table after the season is over, to adjust it to better match reality.
-- A listing of all groups of fish ever grown.
create table farms (
id bigserial primary key,
start timestamp not null,
stop timestamp
);
insert into farms
(id, start)
values (
1, '2021-02-01T13:37'
);
-- A transfer of fish from one farm to another.
-- If the source is null the fish is transferred from another fishery outside our system.
-- If the destination is null the fish is being slaughtered, removed from the system.
create table transfers (
source bigint references farms(id),
destination bigint references farms(id),
timestamp timestamp not null default current_timestamp,
total_weight_g bigint not null constraint positive_nonzero_total_weight_g check (total_weight_g > 0),
average_weight_g bigint not null constraint positive_nonzero_average_weight_g check (average_weight_g > 0),
number_fish bigint generated always as (total_weight_g / average_weight_g) stored
);
insert into transfers
(source, destination, timestamp, total_weight_g, average_weight_g)
values
(null, 1, '2021-02-01T16:38', 5, 5),
(null, 1, '2021-02-15T16:38', 500, 500);
-- Transactions of fish feed into a farm.
create table feedings (
id bigserial primary key,
growth_table bigint not null,
farm bigint not null references farms(id),
amount_g bigint not null constraint positive_nonzero_amunt_g check (amount_g > 0),
timestamp timestamp not null
);
insert into feedings
(farm, growth_table, amount_g, timestamp)
values
(1, 1, 1, '2021-02-02T13:37'),
(1, 1, 1, '2021-02-03T13:37'),
(1, 1, 1, '2021-02-04T13:37'),
(1, 1, 1, '2021-02-05T13:37'),
(1, 1, 1, '2021-02-06T13:37'),
(1, 1, 1, '2021-02-07T13:37');
create view combined_feed_and_transfer_history as
with transfer_history as (
select timestamp, destination as farm, total_weight_g, average_weight_g, number_fish
from transfers as deposits
where deposits.destination = 1 -- TODO: This view only works for one farm, fix that.
union all
select timestamp, source as farm, -total_weight_g, -average_weight_g, -number_fish
from transfers as withdrawals
where withdrawals.source = 1
)
select timestamp, farm, total_weight_g, number_fish, average_weight_g, null as growth_table
from transfer_history
union all
select timestamp, farm, amount_g, 0 as number_fish, 0 as average_weight_g, growth_table
from feedings
order by timestamp;
-- Conversion tables from feed to gained weight.
create table growth_coefficients (
growth_table bigserial not null,
average_weight_g bigint not null constraint positive_nonzero_weight check (average_weight_g > 0),
feed_conversion_rate double precision not null constraint positive_foderkonverteringsfaktor check (feed_conversion_rate >= 0),
primary key(growth_table, average_weight_g)
);
insert into growth_coefficients
(average_weight_g, feed_conversion_rate, growth_table)
values
(5.00,0.10,1),
(10.00,10.00,1),
(20.00,1.30,1),
(50.00,1.31,1),
(100.00,1.32,1),
(300.00,1.36,1),
(600.00,1.42,1),
(1000.00,1.50,1),
(1500.00,1.60,1),
(2000.00,1.70,1),
(2500.00,1.80,1),
(3000.00,1.90,1),
(4000.00,2.10,1),
(5000.00,2.30,1);
-- My current solution is a bad one. It does a CTE that sums over all events but does not account
-- for the feed conversion rate. That means that the average weight used to look up the feed
-- conversion rate will diverge more and more from reality as the season progresses.
-- This is why it is important to look at the output, the average weight, of the previous row.
-- We start by summing up all the transfer and feed events to get a rough average_weight_g.
with estimate as (
select
timestamp,
farm,
total_weight_g as transaction_size_g,
growth_table,
sum(total_weight_g) over (order by timestamp) as sum_weight_g,
sum(number_fish) over (order by timestamp) as sum_number_fish,
sum(total_weight_g) over (order by timestamp) / sum(number_fish) over (order by timestamp) as average_weight_g
from
combined_feed_and_transfer_history
)
select
timestamp,
sum_number_fish,
transaction_size_g as trans_g,
sum_weight_g,
closest_lookup_table_weight.average_weight_g as lookup_g,
converted_weight_g as conv_g,
sum(converted_weight_g) over (order by timestamp) as sum_conv_g,
sum(converted_weight_g) over (order by timestamp) / sum_number_fish as sum_average_g
from
estimate
join lateral ( -- We then use this estimated_average_weight to look up the closest constant in the growth coefficient table.
(select gc.average_weight_g - estimate.average_weight_g as diff, gc.average_weight_g from growth_coefficients gc where gc.average_weight_g >= estimate.average_weight_g order by gc.average_weight_g asc limit 1)
union all
(select estimate.average_weight_g - gc.average_weight_g as diff, gc.average_weight_g from growth_coefficients gc where gc.average_weight_g <= estimate.average_weight_g order by gc.average_weight_g desc limit 1)
order by diff
limit 1
) as closest_lookup_table_weight
on true
join lateral ( -- If the historical event is a feeding we need to look up the feed conversion rate.
select case when growth_table is null then 1
else (select feed_conversion_rate
from growth_coefficients gc
where gc.growth_table = growth_table
and gc.average_weight_g = closest_lookup_table_weight.average_weight_g)
end
) as growth_coefficient
on true
join lateral (
select feed_conversion_rate * transaction_size_g as converted_weight_g
) as converted_weight_g
on true;
At the very bottom is my current "solution". With the example data above, sum_conv_g should end up being 5.6, but because the aggregate used for the lookup does not account for the conversion rate, sum_conv_g ends up at 45.2 instead.
One idea I had was that there might be something like query-local variables one could use to store sum_average_g between rows. There's always the escape hatch of querying the transactions out into my general-purpose programming language, Clojure, and solving it there, but it would be neat if it could be solved entirely within the database.
You have to formulate a recursive subquery. I posted a simplified version of this question over at the DBA SE and got the answer there. The answer to that question can be found here and can be expanded to this more complicated question, though I would wager that no one will ever have the interest to do that.
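For reference, a minimal sketch of the recursive-query pattern being referred to, built on the view defined above (this is not the actual answer from the linked DBA SE post, just the skeleton): events are numbered, and each recursive step can read the previous row's computed output, which is where the growth_coefficients lookup would be driven from.
with recursive ordered_events as (
    select row_number() over (order by timestamp) as rn,
           timestamp, total_weight_g, number_fish, growth_table
    from combined_feed_and_transfer_history
),
running as (
    -- anchor: the first event, nothing accumulated yet
    select rn, timestamp,
           total_weight_g::numeric as sum_conv_g,
           number_fish::bigint as sum_number_fish
    from ordered_events
    where rn = 1
    union all
    -- recursive step: r carries the previous row's *output*, so the feed
    -- conversion rate could be looked up from r.sum_conv_g / r.sum_number_fish
    -- here instead of from the raw, unconverted running sums
    select e.rn, e.timestamp,
           r.sum_conv_g + e.total_weight_g,  -- multiply the feed amount by the looked-up rate here
           r.sum_number_fish + e.number_fish
    from running r
    join ordered_events e on e.rn = r.rn + 1
)
select * from running order by rn;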

Speed up heavy UPDATE..FROM..WHERE PostgreSQL query

I have 2 big tables
CREATE TABLE "public"."linkages" (
"supplierid" integer NOT NULL,
"articlenumber" character varying(32) NOT NULL,
"article_id" integer,
"vehicle_id" integer
);
CREATE INDEX "__linkages_datasupplierarticlenumber" ON "public"."__linkages" USING btree ("datasupplierarticlenumber");
CREATE INDEX "__linkages_supplierid" ON "public"."__linkages" USING btree ("supplierid");
having 215 000 000 records, and
CREATE TABLE "public"."article" (
"id" integer DEFAULT nextval('tecdoc_article_id_seq') NOT NULL,
"part_number" character varying(32),
"supplier_id" integer,
CONSTRAINT "tecdoc_article_part_number_supplier_id" UNIQUE ("part_number", "supplier_id")
) WITH (oids = false);
having 5 500 000 records.
I need to update linkages.article_id according to article.part_number and article.supplier_id, like this:
UPDATE linkages
SET article_id = article.id
FROM
article
WHERE
linkages.supplierid = article.supplier_id AND
linkages.articlenumber = article.part_number;
But it is too heavy. I tried it, but it ran for a day with no result, so I terminated it.
I need to do this update only once, to normalize my table structure for using foreign keys in the Django ORM. How can I resolve this issue?
Thanks a lot!
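No answer is reproduced here, but a commonly suggested approach for a one-off update of this size is to make sure the join is backed by an index and to run the update in slices so each transaction stays small. A rough sketch; the index name and the supplierid ranges are purely illustrative:
-- Index matching the UPDATE's join predicate (if nothing equivalent exists yet):
CREATE INDEX CONCURRENTLY linkages_supplier_article_idx
    ON linkages (supplierid, articlenumber);

-- Then update in slices, e.g. one supplier range at a time, repeating with
-- different ranges until the whole table is covered:
UPDATE linkages
SET article_id = article.id
FROM article
WHERE linkages.supplierid = article.supplier_id
  AND linkages.articlenumber = article.part_number
  AND linkages.supplierid BETWEEN 1 AND 1000;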

Select largest absolute value column pairs with headers per row

I am using: Microsoft SQL Server 2014 - 12.0.4213.0
Here is my sample table (numbers fuzzed):
CREATE TABLE most_recent_counts(
State VARCHAR(2) NOT NULL PRIMARY KEY
,BuildDate DATE NOT NULL
,Count_1725_Change INTEGER NOT NULL
,Count_1725_Percent_Change NUMERIC(20,2) NOT NULL
,Count_2635_Change INTEGER NOT NULL
,Count_2635_Percent_Change NUMERIC(20,2) NOT NULL
,Count_3645_Change INTEGER NOT NULL
,Count_3645_Percent_Change NUMERIC(20,2) NOT NULL
);
INSERT INTO most_recent_counts(State,BuildDate,Count_1725_Change,Count_1725_Percent_Change,Count_2635_Change,Count_2635_Percent_Change,Count_3645_Change,Count_3645_Percent_Change) VALUES ('AK','2018-06-05',1025,5.00,1700,2.50,2050,3.00);
INSERT INTO most_recent_counts(State,BuildDate,Count_1725_Change,Count_1725_Percent_Change,Count_2635_Change,Count_2635_Percent_Change,Count_3645_Change,Count_3645_Percent_Change) VALUES ('AL','2018-06-02',15000,4.00,10400,2.00,6800,1.25);
INSERT INTO most_recent_counts(State,BuildDate,Count_1725_Change,Count_1725_Percent_Change,Count_2635_Change,Count_2635_Percent_Change,Count_3645_Change,Count_3645_Percent_Change) VALUES ('AR','2018-06-07',2300,1.00,2700,1.00,1800,0.50);
INSERT INTO most_recent_counts(State,BuildDate,Count_1725_Change,Count_1725_Percent_Change,Count_2635_Change,Count_2635_Percent_Change,Count_3645_Change,Count_3645_Percent_Change) VALUES ('AZ','2018-04-26',107000,5.50,45400,3.00,180000,16.00);
INSERT INTO most_recent_counts(State,BuildDate,Count_1725_Change,Count_1725_Percent_Change,Count_2635_Change,Count_2635_Percent_Change,Count_3645_Change,Count_3645_Percent_Change) VALUES ('CA','2018-06-07',140000,6.00,550000,14.00,600000,18.00);
It should look something like this:
IMG: https://i.imgur.com/KGkfm66.png
In the real table, I have some 600ish such counts.
From this table I would like to produce a result where, for each state, I have the top ten pairs of columns by magnitude (i.e. the absolute change and its percent change). For example, if Alabama's row has a minus 10 million count in sales to people in the 46-55 range, that should definitely be part of the result set, even if the rest of the columns are positive accruals in the thousands.
What's the best way to do this?
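One way this is often tackled in SQL Server (a sketch, not necessarily the "best" way the question asks for) is to unpivot the change/percent pairs with CROSS APPLY (VALUES ...) and keep the ten largest absolute changes per state with ROW_NUMBER(). Only the three sample pairs are listed here; the real ~600 pairs would extend the VALUES list.
SELECT State, BuildDate, AgeBand, ChangeValue, PercentChange
FROM (
    SELECT m.State, m.BuildDate, v.AgeBand, v.ChangeValue, v.PercentChange,
           ROW_NUMBER() OVER (PARTITION BY m.State
                              ORDER BY ABS(v.ChangeValue) DESC) AS rn
    FROM most_recent_counts AS m
    CROSS APPLY (VALUES
        ('17-25', m.Count_1725_Change, m.Count_1725_Percent_Change),
        ('26-35', m.Count_2635_Change, m.Count_2635_Percent_Change),
        ('36-45', m.Count_3645_Change, m.Count_3645_Percent_Change)
    ) AS v (AgeBand, ChangeValue, PercentChange)
) AS ranked
WHERE rn <= 10
ORDER BY State, rn;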

Insert Update/Merge/Dimension Lookup/ Update using Pentaho

There is a table 'TICKETS' in PostgreSQL. I populate this table with an ETL job in Pentaho.
There is also a GUI on which a user makes changes and the result is reflected in this table.
The fields in the table are :
"OID" Char(36) <------ **PRIMARY KEY**
, "CUSTOMER" VARCHAR(255)
, "TICKETID" VARCHAR(255)
, "PRIO_ORIG" CHAR(36)
, "PRIO_COR" CHAR(36)
, "CATEGORY" VARCHAR(255)
, "OPENDATE_ORIG" TIMESTAMP
, "OPENDATE_COR" TIMESTAMP
, "TTA_ORIG" TIMESTAMP
, "TTA_COR" TIMESTAMP
, "TTA_DUR" DOUBLE PRECISION
, "MTTA_TARGET" DOUBLE PRECISION
, "TTA_REL_ORIG" BOOLEAN
, "TTA_REL_COR" BOOLEAN
, "TTA_DISCOUNT_COR" DOUBLE PRECISION
, "TTA_CHARGE_COR" DOUBLE PRECISION
, "TTR_ORIG" TIMESTAMP
, "TTR_COR" TIMESTAMP
, "TTR_DUR" DOUBLE PRECISION
, "MTTR_TARGET" DOUBLE PRECISION
, "TTR_REL_ORIG" BOOLEAN
, "TTR_REL_COR" BOOLEAN
, "TTR_DISCOUNT_COR" DOUBLE PRECISION
, "TTR_CHARGE_COR" DOUBLE PRECISION
, "COMMENT" VARCHAR(500)
, "USER" CHAR(36)
, "MODIFY_DATE" TIMESTAMP
, "CORRECTED" BOOLEAN
, "OPTIMISTICLOCKFIELD" INTEGER
, "GCRECORD" INTEGER
, "ORIGINATOR" Char(36)
I want to update the table when the combination of TICKETID + ORIGINATOR + CUSTOMER is the same; otherwise an insert should be performed.
How should I do it using Pentaho? Is the Dimension Lookup/Update step fine for this, or will the Insert/Update step do the job?
Any help would be much appreciated. Thanks in advance.
Eugene Lisitsky's suggestion is good practice: you may hard-wire it in the database constraints and let PostgreSQL do the job.
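A minimal sketch of that database-side route, assuming a unique constraint can be added over the three business-key columns (the constraint name and sample values are illustrative, and ON CONFLICT needs PostgreSQL 9.5 or later):
-- Enforce the business key so the database can detect duplicates itself:
ALTER TABLE "TICKETS"
    ADD CONSTRAINT tickets_business_key UNIQUE ("TICKETID", "ORIGINATOR", "CUSTOMER");

-- The load can then be a plain upsert (only a few columns shown; values are placeholders):
INSERT INTO "TICKETS" ("OID", "TICKETID", "ORIGINATOR", "CUSTOMER", "MODIFY_DATE")
VALUES ('00000000-0000-0000-0000-000000000001', 'T-1001', 'ORIG-1', 'ACME', now())
ON CONFLICT ("TICKETID", "ORIGINATOR", "CUSTOMER")
DO UPDATE SET "MODIFY_DATE" = EXCLUDED."MODIFY_DATE";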
For a PDI solution: your table does not look like a Slowly Changing Dimension, so the Insert/Update step covers your need.
If you want to use the Dimension Lookup/Update step, you need to alter the table into the Pentaho SCD format: add a version column and valid_from_date/valid_upto_date columns (with PDI the alter is a one-button operation).
After that, when a new row comes in, the TICKETID + ORIGINATOR + CUSTOMER combination is searched for in the table, and if found it receives validity_upto = now(). At the same time, a row with version+1 is created in the table, valid from now() to the end of time.
The (main) pro is that you can retrieve the state of the database as it was at any date in the past with a simple where that_date between validity_from and validity_upto.
The (main) con is that you have to alter the table, which may have some impact on the GUIs (plural).

an empty row with null-like values in not-null field

I'm using postgresql 9.0 beta 4.
After inserting a lot of data into a partitioned table, I found a weird thing. When I query the table, I can see an empty row with null-like values in 'not-null' fields.
The weird query result looks like this:
The 689th row is empty. The first 3 fields (stid, d, ticker) make up the primary key, so they should not be null. The query I used is this:
select * from st_daily2 where stid=267408 order by d
I can even do a GROUP BY on this data:
select stid, date_trunc('month', d) ym, count(*) from st_daily2
where stid=267408 group by stid, date_trunc('month', d)
The GROUP BY result still has the empty row.
The 1st row is empty.
But if I query where stid or d is null, it returns nothing.
Is this a bug in PostgreSQL 9.0 beta 4, or some data corruption?
EDIT :
I added my table definition.
CREATE TABLE st_daily
(
stid integer NOT NULL,
d date NOT NULL,
ticker character varying(15) NOT NULL,
mp integer NOT NULL,
settlep double precision NOT NULL,
prft integer NOT NULL,
atr20 double precision NOT NULL,
upd timestamp with time zone,
ntrds double precision
)
WITH (
OIDS=FALSE
);
CREATE TABLE st_daily2
(
CONSTRAINT st_daily2_pk PRIMARY KEY (stid, d, ticker),
CONSTRAINT st_daily2_strgs_fk FOREIGN KEY (stid)
REFERENCES strgs (stid) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT st_daily2_ck CHECK (stid >= 200000 AND stid < 300000)
)
INHERITS (st_daily)
WITH (
OIDS=FALSE
);
The data in this table consists of simulation results. Multiple multithreaded simulation engines written in C# insert data into the database using Npgsql.
psql also shows the empty row.
You'd better file a bug report at http://www.postgresql.org/support/submitbug
Some questions:
Could you show us the table definitions and constraints for the partitions?
How did you load your data?
Do you get the same result when using another tool, like psql?
The answer to your problem may very well lie in your first sentence:
I'm using postgresql 9.0 beta 4.
Why would you do that? Upgrade to a stable release. Preferably the latest point-release of the current version.
This is 9.1.4 as of today.
I got to the same point: "what in the heck is that blank value?"
No, it's not a NULL, it's a -infinity.
To filter for such a row use:
WHERE
    CASE WHEN mytestcolumn = '-infinity'::timestamp
           OR mytestcolumn = 'infinity'::timestamp
         THEN NULL ELSE mytestcolumn
    END IS NULL
instead of:
WHERE mytestcolumn IS NULL
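As a side note (not part of the original answer), the infinite values can also be filtered for directly; this assumes mytestcolumn is a plain timestamp column, as the comparison above implies, and mytable is a placeholder name:
-- Rows holding +/- infinity:
SELECT *
FROM mytable
WHERE mytestcolumn = 'infinity'::timestamp
   OR mytestcolumn = '-infinity'::timestamp;

-- isfinite() returns false for both infinities, so this is equivalent:
SELECT *
FROM mytable
WHERE NOT isfinite(mytestcolumn);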