Proper way to archive large JSON objects in a PostgreSQL table that will be API accessible? - postgresql

I've been working on this problem for a while now. As a hobby, I'm building a statistics website for a game that I play.
Basically, I have a script that hits the game's API every 5 minutes (I'll probably increase this to 15 minutes) and pulls the current state of all the matches at once. I was originally storing this object as a JSON column in my table, so each row had a 118 kB object in the JSON column.
The problem was querying the table to get the entire archive for a one-week period (which is the duration of a match). It was pulling 2,016 records of 118 kB each for a week-long match-up when all I wanted was a specific key out of the JSON. Requests to this API endpoint are taking about 10 seconds to complete!
I've only found ways in PostgreSQL to query a row based on a JSON key, but not a way to do something like SELECT match.kills FROM matches WHERE....
I've realized that's not going to work, so instead I want to take keys from the JSON objects and store them in corresponding table columns.
The JSON object skeleton looks like this:
{
  id: string,
  start_time: timestamp,
  end_time: timestamp,
  scores: {
    green: number,
    blue: number,
    red: number
  },
  worlds: number[],
  all_worlds: number[][],
  deaths: {
    green: number,
    blue: number,
    red: number
  },
  kills: {
    green: number,
    blue: number,
    red: number
  },
  maps: [
    {
      id: number,
      type: string,
      scores: same as above,
      bonuses: {
        type: string,
        owner: string
      },
      deaths: same as above,
      kills: same as above,
      objectives: [
        {
          id: string,
          type: string,
          owner: string,
          last_flipped: timestamp,
          claimed_by: guild id (put this into another api endpoint),
          claimed_at: timestamp
        },
        ... (repeat 17 times)
      ]
    },
    ... (repeat 3 times)
  ]
}
So I want to store this in my database with the keys as columns, but I'm not quite sure how to accomplish it for keys with values of the type object.
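For the flat keys I can picture something like the sketch below (the table and column names are just my own flattening of the scores/kills/deaths objects); it's the nested maps and objectives arrays that I can't picture as columns:
CREATE TABLE match_snapshots (
    match_id     text,
    archive_time timestamptz,  -- one row per 5/15-minute snapshot
    start_time   timestamptz,
    end_time     timestamptz,
    scores_green integer,
    scores_blue  integer,
    scores_red   integer,
    kills_green  integer,
    kills_blue   integer,
    kills_red    integer,
    deaths_green integer,
    deaths_blue  integer,
    deaths_red   integer
);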
The end goal is to store this in a way that I'll have an API accessible by a URL such as:
mywebsite.com/api/v1/matcharchive?data=kills,deaths,score&matchid=1-1&archive_time=2016-07-09T02:00:00Z
and it will query the database for only those 3 keys in the object and return them.
What is the proper way to store a JSON object with this many keys into a PSQL table?

You just need to use the -> operator on the json field. This example is slightly edited so the auto-increment keys are a little off.
host=# create table tmp1 ( id serial primary key, data json);
CREATE TABLE
host=# \d tmp1
Table "public.tmp1"
Column | Type | Modifiers
--------+---------+---------------------------------------------------
id | integer | not null default nextval('tmp1_id_seq'::regclass)
data | json |
Indexes:
"tmp1_pkey" PRIMARY KEY, btree (id)
host=# insert into tmp1 (data) values ('{"a":1, "b":2}'), ('{"a":3, "b":4}'), ('{"a":5, "c":6}');
INSERT 0 3
host=# select * from tmp1;
 id |      data
----+----------------
  2 | {"a":1, "b":2}
  3 | {"a":3, "b":4}
  4 | {"a":5, "c":6}
(3 rows)
host=# select id, data->'b' from tmp1;
 id | ?column?
----+----------
  2 | 2
  3 | 4
  4 |
(3 rows)
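Applied to the structure in the question, the same operator chains into nested objects, so the endpoint can return only the keys it was asked for instead of the whole 118 kB object. A rough sketch, assuming the table is called matches and the JSON lives in a json column named data (those names are mine, not from the question):
SELECT data->'kills'           AS kills,
       data->'deaths'          AS deaths,
       data->'scores'          AS scores,
       data->'kills'->>'green' AS green_kills  -- ->> drills in and returns text
FROM matches
WHERE data->>'id' = '1-1';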

Related

How to sum the total length of an array of uuids column

Currently, I have a table consisting of id and otherIds columns.
I want to calculate the total number of otherIds present in the database.
id: 1, otherIds: {1,2,3,4,5}
id: 2, otherIds: {3,4,5}
id: 3, otherIds: {9,2,1}
id: 4, otherIds: {}
Desired result: 11 (5 + 3 + 3 + 0)
SELECT
    sum(jsonb_array_elements("table"."otherIds")) as "sumLength"
FROM
    "Table"
LIMIT 1
[42883] ERROR: function jsonb_array_elements(uuid[]) does not exist
I don't see how JSONB is relevant here. If otherIds is an array of UUID values then wouldn't you just need
SELECT
    SUM(ARRAY_LENGTH("Table"."otherIds", 1)) as "sumLength"
FROM
    "Table"
LIMIT 1
(ARRAY_LENGTH takes the array dimension as its second argument.)
You can get the number of elements in an array with the cardinality() function. Just sum the results over all rows.
I'd like to remark that a table design that includes an array of UUIDs is not pretty and will probably give you performance and data integrity problems some day.
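A minimal sketch of the cardinality() approach, assuming otherIds really is a uuid[] column (table and column names taken from the question):
-- cardinality() returns 0 for an empty array, so the empty row contributes 0;
-- COALESCE covers the case of an empty table.
SELECT COALESCE(SUM(cardinality("otherIds")), 0) AS "sumLength"
FROM "Table";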

Left Join returned null for columns that have values

I have 2 tables:
table1:
id | item_id | item_name
1 | 1 | apple
table2:
id | item_id | item_price
Table 1 has some data and table 2 will not have any data yet, but I want to show both in an HTML table. I am joining the 2 tables, hopefully to get a JSON object like:
{id: 1, item_id: 1, item_name: apple, item_price: null}.
But I got this json object instead which is not desired:
{id: null, item_id: null, item_name: apple, item_price: null}
Using knexjs, this is the code that I use to join the tables:
database.select('*')
    .from('table1')
    .leftJoin('table2', 'table1.item_id', 'table2.item_id')
    .then(function (data) {
        console.log(data);
    });
Am I joining incorrectly? I am using node express server and a postgresql database for this. I want the id and item_id not to return null since they have values. Or is there a way to get all values from all columns besides joining tables?
I guess the issue is with column name overwriting. Do something like:
database('table1')
    .leftJoin('table2', 'table2.item_id', 'table1.item_id')
    .columns([
        'table1.id',
        'table1.item_id',
        'table1.item_name',
        'table2.item_price'
    ])
    .then((results) => {
    })
I do not know knex.js but I would try database.select('table1.*')....
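For what it's worth, the plain SQL that such a knex call roughly corresponds to looks something like the sketch below; explicitly aliasing the duplicated id/item_id columns is what keeps table2's NULLs from overwriting table1's values when the rows are turned into objects:
SELECT t1.id         AS id,
       t1.item_id    AS item_id,
       t1.item_name  AS item_name,
       t2.item_price AS item_price
FROM table1 t1
LEFT JOIN table2 t2 ON t2.item_id = t1.item_id;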

Is there a way to generate columns in a view based on table row data?

I have this table which contains the settings of an app & I just want to show it in the view. The data of each setting is stored as a row.
Code (varchar64) | Value (varchar1000)
-----------------+--------------------
ALLOW_MAC_ADDR   | 1
ALLOW_SAVE       | 1
USER_ALIAS       | James
Now this is where it gets kind of complicated: I have to convert these rows into a jsonb in the view, where each key comes from the Code column and each value from the Value column.
Here is an example of the preferred jsonb:
[dt:{ALLOW_MAC_ADDR: 1, ALLOW_SAVE: 1, USER_ALIAS: 'James'}]
I'm thinking of doing something like this in my view:
SELECT .. FROM generate_jsonb()
So how do I achieve such jsonb?
EDIT: I'm using v9.6 if that helps.
https://www.postgresql.org/docs/current/static/functions-json.html
aggregate function json_object_agg which aggregates pairs of values
into a JSON object
eg:
t=# create table tt(code text, value text);
CREATE TABLE
t=# insert into tt values('ALLOW_MAC_ADDR',1),('USER_ALIAS','James');
INSERT 0 2
t=# select json_object_agg(code,value) from tt;
json_object_agg
----------------------------------------------------
{ "ALLOW_MAC_ADDR" : "1", "USER_ALIAS" : "James" }
(1 row)
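Wrapping that aggregate in a view is then straightforward. A sketch using the tt table from the example above (jsonb_object_agg is the jsonb counterpart of json_object_agg and is available on 9.6; the view name is my own):
CREATE VIEW settings_json AS
SELECT jsonb_object_agg(code, value) AS dt
FROM tt;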

unique date field postgresql default value

I have a date column which I want to be unique once populated, but want the date field to be ignored if it is not populated.
In MySQL the way this is accomplished is to set the date column to "not null" and give it a default value of '0000-00-00' - this allows all other fields in the unique index to be "checked" even if the date column is not populated yet.
This does not work in PostgreSQL because '0000-00-00' is not a valid date, so you cannot store it in a date field (this makes sense to me).
At first glance, leaving the field nullable seemed like an option, but this creates a problem:
=> create table uniq_test(NUMBER bigint not null, date DATE, UNIQUE(number, date));
CREATE TABLE
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> select * from uniq_test;
 number | date
--------+------
      1 |
      1 |
      1 |
      1 |
(4 rows)
NULL apparently "isn't equal to itself" and so it does not count towards constraints.
If I add an additional unique constraint only on the number field, it checks only number and not date and so I cannot have two numbers with different dates.
I could select a default date that is a 'valid date' (but outside working scope) to get around this, and could (in fact) get away with that for the current project, but there are actually cases I might be encountering in the next few years where it will not in fact be evident that the date is a non-real date just because it is "a long time ago" or "in the future."
The advantage the '0000-00-00' mechanic had for me was precisely that this date isn't real and therefore indicated a non-populated entry (where 'non-populated' was a valid uniqueness attribute). When I look around for solutions to this on the internet, most of what I find is "just use NULL" and "storing zeros is stupid."
TL;DR
Is there a PostgreSQL best practice for needing to include "not populated" as a possible value in a unique constraint including a date field?
It's not clear what you want. This is my guess:
create table uniq_test (number bigint not null, date date);
create unique index i1 on uniq_test (number, date)
where date is not null;
create unique index i2 on uniq_test (number)
where date is null;
There will be a unique constraint for non-null dates and another one for null dates, effectively turning the (number, date) tuples into distinct values.
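As a quick sketch of how the two partial indexes behave (same uniq_test table as defined just above):
insert into uniq_test (number) values (1);                     -- ok
insert into uniq_test (number) values (1);                     -- fails: duplicate key violates i2
insert into uniq_test (number, date) values (1, '2016-07-09'); -- ok
insert into uniq_test (number, date) values (1, '2016-07-09'); -- fails: duplicate key violates i1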
Check partial index
It's not a best practice, but you can do it this way:
t=# create table so35(i int, d date);
CREATE TABLE
t=# create unique index i35 on so35(i, coalesce(d,'-infinity'));
CREATE INDEX
t=# insert into so35 (i) select 1;
INSERT 0 1
t=# insert into so35 (i) select 2;
INSERT 0 1
t=# insert into so35 (i) select 2;
ERROR: duplicate key value violates unique constraint "i35"
DETAIL: Key (i, (COALESCE(d, '-infinity'::date)))=(2, -infinity) already exists.
STATEMENT: insert into so35 (i) select 2;

Why SELECT with WHERE clause returns 0 rows on Cassandra's table? (should return 2 rows)

I created a minimal example of a users TABLE on a Cassandra 2.0.9 database. I can use SELECT to select all its rows, but I do not understand why adding my WHERE clause (on an indexed column) returns 0 rows.
(I also do not get why the 'CONTAINS' statement causes an error here, as presented below, but let's assume this is not my primary concern.)
DROP TABLE IF EXISTS users;
CREATE TABLE users
(
KEY varchar PRIMARY KEY,
password varchar,
gender varchar,
session_token varchar,
state varchar,
birth_year bigint
);
INSERT INTO users (KEY, gender, password) VALUES ('jessie', 'f', 'avlrenfls');
INSERT INTO users (KEY, gender, password) VALUES ('kate', 'f', '897q7rggg');
INSERT INTO users (KEY, gender, password) VALUES ('mike', 'm', 'mike123');
CREATE INDEX ON users (gender);
DESCRIBE TABLE users;
Output:
CREATE TABLE users (
  key text,
  birth_year bigint,
  gender text,
  password text,
  session_token text,
  state text,
  PRIMARY KEY ((key))
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX users_gender_idx ON users (gender);
This SELECT works OK
SELECT * FROM users;
 key    | birth_year | gender | password  | session_token | state
--------+------------+--------+-----------+---------------+-------
 kate   | null       | f      | 897q7rggg | null          | null
 jessie | null       | f      | avlrenfls | null          | null
 mike   | null       | m      | mike123   | null          | null
And this does not:
SELECT * FROM users WHERE gender = 'f';
(0 rows)
This also fails:
SELECT * FROM users WHERE gender CONTAINS 'f';
Bad Request: line 1:33 no viable alternative at input 'CONTAINS'
It sounds like your index may have become corrupt. Try rebuilding it. Run this from a command prompt:
nodetool rebuild_index yourKeyspaceName users users_gender_idx
However, the larger issue here is that secondary indexes are known to perform poorly. Some have even identified their use as an anti-pattern. DataStax has a document designed to guide you in the appropriate use of secondary indexes, and this is definitely not one of them:
creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a multitude of indexed columns having foo = true and foo = false is not useful.
While gender may not be a boolean column, it has the same cardinality. A secondary index on this column is a terrible idea.
If querying by gender is something you really need to do, then you may need to find a different way to model or partition your data. For instance, PRIMARY KEY (state, gender, key) will allow you to query gender by state.
SELECT * FROM users WHERE state='WI' and gender='f';
That would return all female users from the state of Wisconsin. Of course, that would mean you would also have to query each state individually. But the bottom line is that Cassandra does not handle queries for low-cardinality keys/indexes well, so you have to be creative in how you solve these types of problems.
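A sketch of what that alternative model might look like in CQL (the table name users_by_state is my own; the point is just that state becomes the partition key and gender a clustering column):
CREATE TABLE users_by_state (
    state text,
    gender text,
    key text,
    password text,
    session_token text,
    birth_year bigint,
    PRIMARY KEY ((state), gender, key)
);
SELECT * FROM users_by_state WHERE state = 'WI' AND gender = 'f';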