How to skip or modify an index in pgloader? - postgresql

I have a MySQL database with a FULLTEXT index that I wish to port to Postgres. When I create a Postgres database using pgloader, the index in Postgres becomes this:
"idx_33441_ibtsearchidx" gin (to_tsvector('simple'::regconfig, keywords))
Now, the simple configuration is not what I want; for this application I need english. I could manually run an ALTER INDEX statement in psql after the migration, but I would like to fully automate the pgloader process (which worked beautifully in every other case!).
But how do I configure pgloader to do this? It seems there are three possibilities:
1. Put an ALTER INDEX statement into the AFTER LOAD section of the pgloader script. The problem is that I won't know the index name. I also think this approach would be inefficient, since one index would be built and then a second one built after it.
2. Tell pgloader NOT to create the full-text index in Postgres automatically. I don't know how to do this. Can it be done? I know how to exclude tables, but not indexes. In this case I could create the index myself in the AFTER LOAD section without any problem, because I could choose its name.
3. Specify in the pgloader script exactly the full-text index configuration I want. I was unable to find an option for this in the pgloader reference. Is it possible?

What happens to existing data with psql dbname < pg_dump_file [duplicate]

This question already has an answer here:
will pg_restore overwrite the existing tables?
I have a database on AWS RDS. I take a pg_dump of a local copy of the database and then run psql dbname < pg_dump_file (with the appropriate arguments for the remote host) to populate the RDS database.
I'd like to know what is expected to happen if that RDS database already contains data. More specifically, for:
Data present in the local dump but absent from RDS
Data present on RDS but absent from the local data
Data present in both but modified
My current understanding:
New data will be added and will be present in both after the upload
Data that exists only on RDS should be unaffected?
For data present in both, the version from the pg_dump will end up in both (assuming the same primary key but different values in the other fields)
Is that about correct? I've been reading this, but it's a little thin on how the restore is actually performed, so I'm having a harder time figuring that out. Thanks.
EDIT: Following @wildplasser's comment, I looked at the pg_dump file, and it appears that the following happens:
CREATE TABLE [....]
ALTER TABLE [setting table owner]
ALTER SEQUENCE [....]
This is repeated for each table in the database. Then, again one table at a time:
COPY [tablename] (list of cols) FROM stdin;
[data to be copied]
Finally, more ALTER statements to set constraints, foreign keys, etc.
So I guess the ultimate answer is "it depends". One could, I suppose, remove the CREATE TABLE [...], ALTER TABLE and ALTER SEQUENCE statements if those objects already exist as they should. I am not yet sure what happens if one runs CREATE TABLE on an existing table (an error, perhaps?).
Then I guess the COPY statements would overwrite whatever already exists, or perhaps throw an error. I'll have to test that. I'll write up an answer once I figure it out.
So the answer is a bit dull. It turns out that even if one removes the initial statements before the COPY, it won't work if the table has a primary key (and therefore a uniqueness constraint):
ERROR: duplicate key value violates unique constraint
So one gets shut down pretty quickly there. I guess one would have to rewrite the dump as a list of UPDATE statements instead, but then one might as well write a script to do that. I'm unsure whether pg_dump is all that useful in that case.
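The duplicate-key failure described above can be reproduced in miniature. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: SQLite is not Postgres, but its primary-key check and transaction rollback behave analogously here). The target table already holds row 1 (the "RDS" copy); loading a "dump" containing a new row 2 plus a conflicting row 1 fails, and because the whole load runs in one transaction, nothing from the dump is kept. A failing COPY in PostgreSQL likewise aborts as a whole. The table, columns and the load_dump helper are hypothetical.

```python
import sqlite3

def load_dump(conn, rows):
    """Insert all rows in one transaction; roll everything back on a key conflict."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO items (id, val) VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError as exc:
        # SQLite's counterpart of "duplicate key value violates unique constraint"
        print("load failed:", exc)
        return False

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
    # Row that already exists on the target database
    conn.execute("INSERT INTO items VALUES (1, 'remote')")

# The "dump": one new row and one row whose primary key already exists
ok = load_dump(conn, [(2, "local-new"), (1, "local-modified")])
print(ok)
print(conn.execute("SELECT id, val FROM items ORDER BY id").fetchall())
```

After the failed load the table still contains only the original (1, 'remote') row: the new row 2 was rolled back together with the conflicting one.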

Is `CLUSTER` applied by `pg_dump`?

If CLUSTER is set on a table, then is it applied by pg_dump?
Specifically, the following:
Is it used to order the rows in the dump? If not, is there a way to do this?
Is it set on the table when using pg_restore? If not, is there a way to do this?
The dump will contain the statement
ALTER TABLE mytable CLUSTER ON anindex;
Restoring the dump will execute that statement. As the documentation explains,
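This form selects the default index for future CLUSTER operations. It does not actually re-cluster the table.
So the restored table remembers which index to cluster on, but its rows are not physically reordered until you run CLUSTER mytable after the restore (without a USING clause, CLUSTER reuses that recorded index).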

How to copy a Derby table

I am using Eclipse, Java and a Derby database. I want to experiment with changing values, which will rewrite one of the tables in the database. Before making the change I would like to copy that particular table (not in code) so that I can restore the original data if necessary. So far, googling and searching this site hasn't produced an answer. In Eclipse there is an option to export the database, but it calls it a connection, so I am not sure what would happen.
If you're not sure how to connect to the database and issue SQL statements, you will need to learn about JDBC. This is a good place to start.
If you're asking about the SQL, it's pretty straightforward: you can create a table based on a SELECT statement.
e.g.
create table table2 as select * from table1 with no data;
Derby is a little strange in this area: you must specify WITH NO DATA, so the created table will be empty. You can then issue an INSERT to populate the new table if you wish.
insert into table2 select * from table1;
The new table will not have indexes. You will need to create them if you want them. It might retain the primary key. You should check that if you're testing against it. If it doesn't retain the primary key, you should create the primary key before inserting data into the table.
In Eclipse there is an option to export the db but it calls it a connection so I am not sure what would happen.
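If what Eclipse does isn't clear to you, you can just as well zip your entire database directory (the folder Derby created for the database; by default it lives under the directory set by derby.system.home, or the current working directory if that property is unset) into an archive. The database must not be running while you make the backup.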
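The zip-the-directory approach can be scripted with nothing but Python's standard library. This is a minimal sketch, assuming the database has already been shut down; the helper names and paths are hypothetical, and shutil.make_archive / shutil.unpack_archive are the standard-library calls doing the work.

```python
import shutil
from pathlib import Path

def backup_database_dir(db_dir, archive_base):
    """Zip the whole database directory. The database must not be running."""
    db_path = Path(db_dir).resolve()
    # root_dir/base_dir keep the database folder's own name inside the archive
    return shutil.make_archive(
        str(Path(archive_base).resolve()), "zip",
        root_dir=str(db_path.parent), base_dir=db_path.name,
    )

def restore_database_dir(archive_path, parent_dir):
    """Unpack a backup into parent_dir, again with the database shut down.

    Extraction overwrites files that are in the archive but leaves any other
    files alone, so delete the old database folder first for a clean restore.
    """
    shutil.unpack_archive(str(archive_path), str(parent_dir))
```

For example, backup_database_dir("/path/to/myDB", "/path/to/myDB-backup") writes /path/to/myDB-backup.zip containing the myDB folder, and restore_database_dir("/path/to/myDB-backup.zip", "/path/to") puts it back.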

Postgres Indexing?

I am a newbie in Postgres. I have a column named host (a varchar string) in a table with around 20 million rows. How do I use indexing to speed up searches for a particular host? Also, since this column will be updated daily, do I need to write a trigger to re-index it at regular intervals? If so, how do I do that? (For the record, I am using Ruby on Rails 3.)
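Assuming you're doing exact matches, you should just be able to create the index and leave it:
CREATE INDEX host_index ON table_name (host);
PostgreSQL keeps the index up to date automatically as rows are inserted and updated, so no trigger or periodic re-indexing is needed, and the query planner should use the index on its own.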
You may wish to specify other options such as the collation to use.
See the PostgreSQL docs for CREATE INDEX for more information.
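I'd suggest using a BRIN index, available since PostgreSQL 9.5, rather than the conventional B-tree index. Note that BRIN only pays off when the column's values correlate with the physical order of the rows in the table.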
For text search, it is recommended that you use GIN or GiST index types.
https://www.postgresql.org/docs/9.5/static/textsearch-indexes.html
Another possibility: if you only perform exact matching on the host column, i.e. no inequality comparisons (>, <) and no partial matching (LIKE, wildcards), you may consider converting host to an integer hash, which can speed up the search considerably.
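The hash-integer idea can be sketched in a few lines. This is a minimal sketch under my own assumptions: the choice of SHA-256 truncated to 64 bits is illustrative (the answer does not name a hash function), and the result is kept in the signed 64-bit range so it fits a PostgreSQL bigint column. Because two different hosts can in principle share a hash, a lookup should still compare the original string, e.g. WHERE host_hash = $1 AND host = $2, where host_hash is a hypothetical bigint column with its own B-tree index.

```python
import hashlib

def host_hash(host: str) -> int:
    """Map a hostname to a signed 64-bit integer (fits a PostgreSQL bigint)."""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as a signed big-endian integer
    return int.from_bytes(digest[:8], "big", signed=True)
```

The function is deterministic, so the application computes host_hash(host) once when writing a row and again when searching, then filters on the integer column.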

What is the command for Index optimization and update statistics for Oracle 10g and 11g?

I am loading a large number of rows into a table from a CSV data file. After every 10,000 records I want to update the indexes on the table for optimization (update statistics). Can anybody tell me what command I can use? Also, what is the Oracle equivalent of SQL Server's UPDATE STATISTICS? Does updating statistics mean index optimization or gathering statistics? I am using Oracle 10g and 11g. Thanks in advance.
Index optimization is a tricky question. You can COALESCE an index to merge adjacent leaf blocks that have free space, and you can REBUILD an index to discard and recreate it completely. In my opinion, what you may wish to do for the duration of your data load is mark the indexes UNUSABLE, then REBUILD them when you're done.
ALTER INDEX my_table_idx01 UNUSABLE;
-- run loader process (relies on SKIP_UNUSABLE_INDEXES = TRUE, the default since 10g;
-- an unusable UNIQUE index still blocks inserts, so this suits non-unique indexes)
ALTER INDEX my_table_idx01 REBUILD;
You only want to gather statistics once when you're done, and that's done with a call to DBMS_STATS, like so:
EXEC DBMS_STATS.GATHER_TABLE_STATS ('my_schema', 'my_table');
I would recommend a different approach: drop the index(es), load the data and then recreate the index. When you recreate it, Oracle builds the index from the data you just loaded. Two things are accomplished here: the records load faster, and the index is built from scratch as a compact tree. (Note: be careful here; if the table is really big, you may need enough temporary tablespace for the index build to sort in.)
drop index my_index;
-- uber awesome loading process
create index my_index on my_table(my_col1, my_col2);
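The question asks about doing something "for every 10000 records" of a CSV load, while both answers suggest leaving indexes and statistics alone during the load and dealing with them once at the end. This is a minimal Python sketch of that batching loop using only the standard csv and itertools modules; load_batch is a hypothetical callback standing in for whatever actually inserts the rows (for example a database driver's executemany), and the default batch size of 10000 is taken from the question.

```python
import csv
from itertools import islice
from typing import Callable, Iterable, Iterator, List

def batches(rows: Iterable[List[str]], size: int = 10000) -> Iterator[List[List[str]]]:
    """Yield lists of up to `size` rows until the input is exhausted."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def load_csv(lines: Iterable[str],
             load_batch: Callable[[List[List[str]]], None],
             size: int = 10000) -> int:
    """Feed CSV lines to load_batch in chunks; return the number of rows loaded."""
    total = 0
    for chunk in batches(csv.reader(lines), size):
        load_batch(chunk)  # hypothetical: insert this chunk into the table
        total += len(chunk)
    # Rebuild indexes and gather statistics once, after this loop, not per batch.
    return total
```

With an open file object f, load_csv(f, my_insert_function) reads the whole file in 10,000-row chunks; the index rebuild and the single DBMS_STATS call then follow the load.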