I'm trying to make dynamic partitions in Postgres.
What I am trying to achieve:
I will have tableA, which will hold a lot of data and will have a column some_id that will be heavily used in WHERE clauses.
So my plan is to partition the table by list on the some_id value.
How can I achieve dynamic partitioning, so that when a new id is inserted a partition is created automatically?
There is no way to do that in PostgreSQL currently.
The best possible way to do it currently is through code.
Since a trigger cannot change the table definition, the best approach is to execute the partition-creating statement from the backend/application code after the insert.
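A minimal sketch of what this could look like with declarative list partitioning (PostgreSQL 10+); the table and column names are just placeholders:

CREATE TABLE tableA (
    id       bigint,
    some_id  integer NOT NULL,
    payload  text
) PARTITION BY LIST (some_id);

-- Run from application code when a new some_id (here 42) appears,
-- just before inserting its rows:
CREATE TABLE IF NOT EXISTS tableA_42
    PARTITION OF tableA FOR VALUES IN (42);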
I would like to backfill a table to all dates in an HDB, but the table has around 100 columns. What's the fastest way to backfill using the existing table?
I tried to get the schema from the current table and use that schema to backfill, but it doesn't work.
This is what I tried:
oldTable:0#newTable;
addtable[dbdir;`table;oldTable]
but this doesn't work. Is there a good way to do it?
Does the table exist within the latest date partition of the HDB?
If so, .Q.chk will add tables to any partitions in which they are missing.
https://code.kx.com/q/ref/dotq/#qchk-fill-hdb
And with regards to addtable, what specific error are you getting when trying the above?
I'm trying to run the same query over multiple tables in my Postgres database, that all have the same schema.
This question: Select from multiple tables without a join?
shows that this is possible; however, the answers there hard-code the set of tables.
I have another query that returns the five specific tables I would like my main query to run on. How can I go about using the result of this with the UNION approach?
In short, I want my query to see the five specific tables (determined by the outcome of another query) as one large table when it runs the query.
I understand that in many cases similar to my scenario you'd simply want to merge the tables. I cannot do this.
One way of doing this that may satisfy your constraints is table inheritance. In short, you create a parent table with the same schema, and for each child table you want to query you run ALTER TABLE that_table INHERIT parent_table. Any query against the parent table will then also query all of the child tables. If you need to query different tables in different circumstances, I think the best way would be to add a column named type or some such, and query only certain values of that column.
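A rough sketch of that setup (placeholder names; note that each child must already contain all of the parent's columns for the INHERIT to succeed):

CREATE TABLE parent_table (
    id    bigint,
    data  text,
    type  text    -- optional discriminator if you need to target specific children
);

-- Attach each of the five tables returned by your other query:
ALTER TABLE child_table_1 INHERIT parent_table;
ALTER TABLE child_table_2 INHERIT parent_table;
-- ...and so on.

-- A query against the parent now scans all attached children as one large table:
SELECT type, COUNT(*) FROM parent_table GROUP BY type;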
Docs for Redshift say:
ALTER TABLE locks the table for reads and writes until the operation completes.
My question is:
Say I have a table with 500 million rows and I want to add a column. This sounds like a heavy operation that could lock the table for a long time - yes? Or is it actually a quick operation, since Redshift is a columnar DB? Or does it depend on whether the column is nullable / has a default value?
I find that adding (and dropping) columns is a very fast operation even on tables with many billions of rows, regardless of whether there is a default value or it's just NULL.
As you suggest, I believe this is a consequence of it being a columnar database, so the rest of the table is undisturbed. It simply creates empty (or nearly empty) column blocks for the new column on each node.
I added an integer column with a default to a table of around 65M rows in Redshift recently and it took about a second to process. This was on a dw2.large (SSD type) single node cluster.
Just remember you can only add a column at the end (right) of the table; you have to use temporary tables etc. if you want to insert a column somewhere in the middle.
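For reference, the statements being discussed are of the usual form (table and column names are made up):

ALTER TABLE big_table ADD COLUMN new_flag integer DEFAULT 0;
ALTER TABLE big_table DROP COLUMN new_flag;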
Personally, I have seen that rebuilding the table works best.
I do it in the following way (see the sketch after these steps):
Create a new table N_OLD_TABLE.
Define the data types / compression encodings in the new table.
Insert the data: INSERT INTO N_OLD_TABLE (old_columns) SELECT old_columns FROM OLD_TABLE.
Rename OLD_TABLE to OLD_TABLE_BKP.
Rename N_OLD_TABLE to OLD_TABLE.
This is a much faster process. It doesn't block any table, and you always have a backup of the old table in case anything goes wrong.
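A sketch of those steps (table names, columns, keys and encodings are all placeholders):

CREATE TABLE n_old_table (
    id       bigint        ENCODE az64,
    name     varchar(100)  ENCODE lzo,
    new_col  integer       DEFAULT 0    -- the column being added
)
DISTKEY (id)
SORTKEY (id);

INSERT INTO n_old_table (id, name)
SELECT id, name FROM old_table;

ALTER TABLE old_table   RENAME TO old_table_bkp;
ALTER TABLE n_old_table RENAME TO old_table;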
My current query looks something like this:
SELECT SUBSTR(name,1,1), COUNT(*) FROM files GROUP BY SUBSTR(name,1,1)
But it's taking a pretty long time just to do counts on a table that's already indexed by the name column. I saw from this question that some engines might not use indexes correctly for the SUBSTR function, and in fact SQLite will not use an index for SUBSTR(string,1,1).
Is there any other approach that would utilize the index and net me some faster queries?
One strategy that is consistent with your access pattern is to add a new indexed column first_letter to your table. Use a trigger to set the value on insert and update. Then your query is a simple GROUP BY on first_letter.
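A rough sketch of that approach (SQLite syntax; column, index and trigger names are made up, and files is assumed to be an ordinary rowid table):

ALTER TABLE files ADD COLUMN first_letter TEXT;
UPDATE files SET first_letter = SUBSTR(name, 1, 1);
CREATE INDEX idx_files_first_letter ON files(first_letter);

CREATE TRIGGER files_first_letter_ins AFTER INSERT ON files
BEGIN
    UPDATE files SET first_letter = SUBSTR(NEW.name, 1, 1) WHERE rowid = NEW.rowid;
END;

CREATE TRIGGER files_first_letter_upd AFTER UPDATE OF name ON files
BEGIN
    UPDATE files SET first_letter = SUBSTR(NEW.name, 1, 1) WHERE rowid = NEW.rowid;
END;

-- The original query then becomes:
SELECT first_letter, COUNT(*) FROM files GROUP BY first_letter;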
Another strategy is to create a shadow table that contains an aggregation of the mother table. This isn't easy, because it is your job as the developer to keep the shadow table consistent with the mother table: every delete, update or insert in table files needs to be accompanied by a change in the shadow table.
Databases like Oracle have support for materialized views to achieve this automatically, but SQLite doesn't.
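A sketch of what such a shadow table might look like for this particular count (SQLite syntax; names are made up, and the UPDATE-of-name case is omitted for brevity):

CREATE TABLE letter_counts (
    letter TEXT PRIMARY KEY,
    cnt    INTEGER NOT NULL
);

-- Seed it once from the existing data:
INSERT INTO letter_counts (letter, cnt)
SELECT SUBSTR(name, 1, 1), COUNT(*) FROM files GROUP BY SUBSTR(name, 1, 1);

-- Keep it consistent as rows come and go:
CREATE TRIGGER files_count_ins AFTER INSERT ON files
BEGIN
    INSERT OR IGNORE INTO letter_counts (letter, cnt) VALUES (SUBSTR(NEW.name, 1, 1), 0);
    UPDATE letter_counts SET cnt = cnt + 1 WHERE letter = SUBSTR(NEW.name, 1, 1);
END;

CREATE TRIGGER files_count_del AFTER DELETE ON files
BEGIN
    UPDATE letter_counts SET cnt = cnt - 1 WHERE letter = SUBSTR(OLD.name, 1, 1);
END;

-- The expensive GROUP BY is replaced by a read of the tiny shadow table:
SELECT letter, cnt FROM letter_counts;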
If you're creating a temporary table within a stored procedure and want to add an index or two on it, to improve the performance of any additional statements made against it, what is the best approach? Sybase says this:
"the table must contain data when the index is created. If you create the temporary table and create the index on an empty table, Adaptive Server does not create column statistics such as histograms and densities. If you insert data rows after creating the index, the optimizer has incomplete statistics."
but recently a colleague mentioned that if I create the temp table and indices in a different stored procedure to the one which actually uses the temporary table, then Adaptive Server optimiser will be able to make use of them.
On the whole, I'm not a big fan of wrapper procedures that add little value, so I've not actually got around to testing this, but I thought I'd put the question out there, to see if anyone had any other approaches or advice?
A few thoughts:
If your temporary table is so big that you have to index it, then is there a better way to solve the problem?
You can force it to use the index (if you are sure that the index is the correct way to access the table) by giving an optimiser hint, of the form:
SELECT *
FROM #table (index idIndex)
WHERE id = @id
If you are interested in performance tips in general, I've answered a couple of other questions about that at some length here:
Favourite performance tuning tricks
How do you optimize tables for specific queries?
What's the problem with adding the indexes after you put data into the temp table?
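In other words, something along these lines (Sybase T-SQL; table and column names are made up), so that the index and its statistics are built from the real data:

CREATE TABLE #work (
    id  int          NOT NULL,
    val varchar(40)  NULL
)

INSERT INTO #work (id, val)
SELECT id, val
FROM   source_table
WHERE  some_filter = 1

CREATE INDEX idIndex ON #work (id)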
One thing you need to be mindful of is the visibility of the index to other instances of the procedure that might be running at the same time.
I like to add a guid to these kinds of temp tables (and to the indexes), to make sure there is never a conflict. The other benefit of this approach is that you could simply make the temp table a real table.
Also, make sure that you will need to query the data in these temp tables more than once during the running of the stored procedure, otherwise the cost of index creation will outweigh the benefit to the select.
In Sybase, if you create a temp table and then use it in the same proc, the plan for the select is built using an estimate of 100 rows in the table. (The plan is built when the procedure starts, before the tables are populated.) This can result in the temp table being table-scanned, since it is only "100 rows". Calling another proc causes Sybase to build the plan for the select with the actual number of rows, which allows the optimizer to pick a better index to use. I have seen significant improvements using this approach, but test on your database, as sometimes there is no difference.
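A hypothetical sketch of that two-procedure pattern (Sybase ASE T-SQL; all names are made up, and #work has to exist in the session when inner_proc is created):

CREATE TABLE #work (id int NOT NULL, val varchar(40) NULL)
go

CREATE PROCEDURE inner_proc
AS
    -- Compiled on first execution, after the caller has populated #work,
    -- so the optimiser sees real row counts rather than the 100-row estimate.
    SELECT id, val FROM #work WHERE id > 0
go

CREATE PROCEDURE outer_proc
AS
    CREATE TABLE #work (id int NOT NULL, val varchar(40) NULL)

    INSERT INTO #work (id, val)
    SELECT id, val FROM source_table

    CREATE INDEX idIndex ON #work (id)

    EXEC inner_proc
go

DROP TABLE #work
go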