Compute shared hstore key names in PostgreSQL

If I have a table with an HSTORE column:
CREATE TABLE thing (properties hstore);
How could I query that table to find the hstore key names that exist in every row?
For example, if the table above had the following data:
properties
-------------------------------------------------
"width"=>"b", "height"=>"a"
"width"=>"b", "height"=>"a", "surface"=>"black"
"width"=>"c"
How would I write a query that returned 'width', as that is the only key that occurs in each row?
skeys() will give me all the property keys, but I'm not sure how to aggregate them so I only have the ones that occur in each row.

The manual gets us most of the way there, but not all the way. Near the bottom of http://www.postgresql.org/docs/8.3/static/hstore.html, under the heading "Statistics", it describes a way to count keys in an hstore.
If we adapt that to your sample table above, we can compare each key's count to the number of rows in the table. Because a key can appear at most once per hstore value, a key whose count equals the row count occurs in every row.
SELECT key
FROM (SELECT (each(properties)).key FROM thing) AS stat
GROUP BY key
HAVING count(*) = (SELECT count(*) FROM thing)
ORDER BY key;
If you want to find the opposite (all those keys that are not in every row of your table), just change the = to < and you're in business!
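To make the counting idea concrete without a database, here is a minimal Python sketch of the same logic. It is only a model: the `rows` list is a hypothetical in-memory stand-in for the sample `properties` values, and `shared_keys` / `partial_keys` are illustrative names, not hstore functions.

```python
from collections import Counter

# Hypothetical stand-in for the three sample rows of thing.properties.
rows = [
    {"width": "b", "height": "a"},
    {"width": "b", "height": "a", "surface": "black"},
    {"width": "c"},
]


def key_counts(rows):
    """Count in how many rows each key appears (a key occurs at most once per row)."""
    return Counter(key for row in rows for key in row)


def shared_keys(rows):
    """Keys whose count equals the number of rows, i.e. the HAVING count(*) = ... case."""
    return sorted(k for k, n in key_counts(rows).items() if n == len(rows))


def partial_keys(rows):
    """Keys present in some rows but not all, i.e. the same query with < instead of =."""
    return sorted(k for k, n in key_counts(rows).items() if n < len(rows))


print(shared_keys(rows))
print(partial_keys(rows))
```

With the sample data this prints `['width']` and then `['height', 'surface']`, matching what the SQL above should return for the two variants.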

Related

Amazon Redshift COMPOUND SORTKEY - does insertion order matter?

Let's say I've created an empty table in Redshift like this:
CREATE TABLE my_table (
val_1 INT ,
val_2 INT ,
val_3 FLOAT
)
COMPOUND SORTKEY(val_1, val_2)
;
When I first populate the table (let's say with the results of some query), should the records be inserted in the SORTKEY order, using the ORDER BY in the code below:
INSERT INTO my_table
SELECT val_1, val_2, val_3 FROM other_table
ORDER BY val_1, val_2
Or is there no need to do that; i.e. SORTKEY ordering of inserted records is handled physically by Redshift itself? Thx.
Assuming the same behaviour for INSERT INTO as for loading via the COPY command, there is no need to order the records first. According to the AWS docs, all of the following conditions must be fulfilled for new records to be added to the sorted region of the table (note that your example has a COMPOUND SORTKEY of two columns):
The table uses a compound sort key with only one sort column.
The sort column is NOT NULL.
The table is 100 percent sorted or empty.
All the new rows are higher in sort order than the existing rows, including rows marked for deletion. In this instance, Amazon Redshift uses the first eight bytes of the sort key to determine sort order.

Reading an append-only list from PostgreSQL

I would like to implement an append-only list in PostgreSQL. Basically, this is trivial: Create a table, and only ever INSERT into that table.
However, I would like to be able to read that list again, in the order it was created. How can I do this? Is a simple SELECT * FROM MyTable enough? If not, what do I sort by?
Rows in a relational database have no inherent sort order. The only way to get a guaranteed sort order is to use an order by.
You can either create an identity column that is incremented on every insert or a timestamp column that records the precise time a row was inserted (or do both).
e.g.
create table append_only
(
id bigint generated always as identity,
... other columns ...
created_at timestamp default clock_timestamp()
);
Then use that column for an order by. By having both, you can use the id column as a tie breaker when sorting by the timestamp, in case two rows were inserted at exactly the same microsecond.
You could create a column with data type SERIAL (similar to AUTOINCREMENT/SEQUENCE):
CREATE TABLE myTable(id SERIAL, ...)
SELECT * FROM myTable ORDER BY id;
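As a rough illustration of the tie-breaker idea, the sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, so the DDL differs (`INTEGER PRIMARY KEY AUTOINCREMENT` stands in for an identity column, and timestamps are supplied by hand). The `payload` column and the `read_in_order` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE append_only ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # stand-in for an identity column
    " payload TEXT NOT NULL,"                 # hypothetical data column
    " created_at TEXT NOT NULL)"
)

# The first two rows deliberately share a timestamp to exercise the tie breaker.
events = [
    ("first", "2024-01-01 00:00:00.000001"),
    ("second", "2024-01-01 00:00:00.000001"),
    ("third", "2024-01-01 00:00:00.000002"),
]
conn.executemany(
    "INSERT INTO append_only (payload, created_at) VALUES (?, ?)", events
)


def read_in_order(conn):
    """Read the list back in insertion order: timestamp first, id as tie breaker."""
    cursor = conn.execute(
        "SELECT payload FROM append_only ORDER BY created_at, id"
    )
    return [payload for (payload,) in cursor]


print(read_in_order(conn))
```

Without the explicit ORDER BY, the database is free to return the rows in any order, even if it often happens to look like insertion order.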

Indexing PostgreSQL JSONB Array Elements

Like the title says, how can I index a JSONB array?
The contents look like...
["some_value", "another_value"]
I can easily access the elements like...
SELECT * FROM table WHERE data->>0 = 'some_value';
I created an index like so...
CREATE INDEX table_data_idx ON table USING gin ((data) jsonb_path_ops);
When I run EXPLAIN, I still see it sequentially scanning...
What am I missing on indexing an array of text elements?
If you want to support that exact query with an index, the index would have to look like this:
CREATE INDEX ON "table" ((data->>0));
If you want to use the index you have, you cannot limit the search to just a specific array element (in your case, the first). You can speed up a search for some_value anywhere in the array:
SELECT * FROM "table"
WHERE data @> '["some_value"]'::jsonb;
I ended up taking a different approach. I am still having problems getting the search to work using a JSONB type, so I ended up switching my column to a varchar ARRAY:
CREATE TABLE table (
data varchar ARRAY NOT NULL
);
CREATE INDEX table_data_idx ON table USING GIN (data);
SELECT * FROM table WHERE data @> '{some_value}';
This works and is using the index.
I think my problem with my JSONB approach is because the element is actually nested much further and being treated as text.
i.e. data->'some_key'->>'array_key'->>0
And every time I try to search I get all sorts of invalid token errors and other such things.
You may want to create a materialized view that has the primary key (or other unique index of your table) and expands the array field into a text column with the jsonb_array_elements_text function:
CREATE MATERIALIZED VIEW table_mv
AS
SELECT DISTINCT table.id, jsonb_array_elements_text(data->0) AS array_elem FROM table;
You can then create a unique index on this materialized view (primary keys are not supported on materialized views):
CREATE UNIQUE INDEX table_array_idx ON table_mv(id, array_elem);
Then query with a join to the original table on its primary key:
SELECT * FROM table INNER JOIN table_mv ON table.id = table_mv.id WHERE table_mv.array_elem = 'some_value';
This query should use the unique index and then look up the primary key of the original table, both very fast.

Index method for a column used only for ordering

I have a table product_images with a foreign key product_id and an integer field order to manually set the order of a product's images. Knowing that the table will be used only like this:
SELECT * FROM product_images
WHERE product_id = ?
ORDER BY "order"
-- what is the optimal index method for product_id and order?
Is that enough?:
CREATE INDEX product_images_unique_order
ON "product_images"("product_id", "order");
Yes, that should do it.
PostgreSQL might still decide not to use that index, depending on how many rows the table has, how many images a given product_id has, how scattered the rows with the same product_id are across the table, how wide the rows of product_images are, and many other things.
But by having that index you provide PostgreSQL with the opportunity to use it.
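One way to see what "giving the planner the opportunity" means is to ask a planner for its plan. The sketch below does this with Python's built-in sqlite3 module, which is an assumption made purely so the example is self-contained: SQLite's planner is not PostgreSQL's, and on PostgreSQL you would run EXPLAIN on your real table instead. The `url` column and the `query_plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_images ("
    " id INTEGER PRIMARY KEY,"
    " product_id INTEGER NOT NULL,"
    ' "order" INTEGER NOT NULL,'
    " url TEXT)"  # hypothetical payload column
)
conn.execute(
    "CREATE INDEX product_images_unique_order"
    ' ON product_images (product_id, "order")'
)


def query_plan(conn, product_id):
    """Return SQLite's plan for the lookup-and-sort query as one string."""
    cursor = conn.execute(
        "EXPLAIN QUERY PLAN "
        'SELECT * FROM product_images WHERE product_id = ? ORDER BY "order"',
        (product_id,),
    )
    # The human-readable detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in cursor)


plan = query_plan(conn, 1)
print(plan)
```

In this SQLite stand-in, the plan names the composite index and contains no separate sort step, because rows with the same product_id are already stored in "order" order inside the index. The exact wording of the plan text varies between SQLite versions.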

Is there a way to quickly duplicate record in T-SQL?

I need to duplicate selected rows with all fields exactly the same except the ID column (an int identity), which SQL Server assigns automatically.
What is the best way to duplicate/clone a record or records (up to 50)?
Is there any T-SQL functionality for this in MS SQL 2008, or do I need to do an INSERT ... SELECT in stored procedures?
The only way to accomplish what you want is by using Insert statements which enumerate every column except the identity column.
You can of course select multiple rows to be duplicated by using a Select statement in your Insert statement. However, I would assume that this will violate your business key (the other unique constraint on the table besides the surrogate key, which you have, right?) and so require some other column to be altered as well.
Insert MyTable( ...
Select ...
From MyTable
Where ....
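To show the shape of that statement with concrete columns, here is a small runnable sketch using Python's built-in sqlite3 module instead of SQL Server (an assumption for the sake of a self-contained example; `AUTOINCREMENT` plays the role of the identity column). The table layout, the `name` and `price` columns, and the `duplicate_rows` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # stand-in for the identity column
    " name TEXT,"
    " price REAL)"
)
conn.executemany(
    "INSERT INTO my_table (name, price) VALUES (?, ?)",
    [("widget", 1.5), ("gadget", 2.5), ("gizmo", 3.5)],
)


def duplicate_rows(conn, ids):
    """Copy the rows with the given ids, listing every column except id."""
    placeholders = ", ".join("?" for _ in ids)
    conn.execute(
        "INSERT INTO my_table (name, price) "
        "SELECT name, price FROM my_table "
        f"WHERE id IN ({placeholders}) ORDER BY id",
        list(ids),
    )


duplicate_rows(conn, [1, 3])
for row in conn.execute("SELECT id, name, price FROM my_table ORDER BY id"):
    print(row)
```

The copies receive fresh ids from the database while every other column is carried over unchanged, which is the behaviour the Insert ... Select above relies on.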
If it is a pure copy (minus the ID field) then the following will work (replace 'NameOfExistingTable' with the table you want to duplicate the rows from and optionally use the Where clause to limit the data that you wish to duplicate):
SELECT *
INTO #TempImportRowsTable
FROM (
SELECT *
FROM [NameOfExistingTable]
-- WHERE ID = 1
) AS createTable
-- If needed make other alterations to the temp table here
ALTER TABLE #TempImportRowsTable DROP COLUMN Id
INSERT INTO [NameOfExistingTable]
SELECT * FROM #TempImportRowsTable
DROP TABLE #TempImportRowsTable
If you're able to check the duplication condition as rows are inserted, you could put an INSERT trigger on the table. This would allow you to check the columns as they are inserted instead of having to select over the entire table.