Postgres ORDER BY giving inconsistent results across machines (local and RDS) - postgresql

I'm running Postgres 12 on both my local machine and on an AWS RDS instance.
I have a query like
SELECT name FROM my_table WHERE (...) ORDER BY name;
"name" is a varchar(255), with a UNIQUE constraint.
When I run this on RDS, the rows are ordered like
bun.df_baa6_g900_a13500_pd20
bundle_high_basic
but when I execute it locally, the rows are flipped. It's
bundle_high_basic
bun.df_baa6_g900_a13500_pd20
It's a bit of a head-scratcher for me. I haven't found any documentation on configuring ORDER BY behavior outside of the query itself, so these two should be returning the same order...
Does anyone have a clue why this might be happening? In terms of a resolution, I don't care what the order is, as long as both machines are consistent.
I have tried amending the query with
ORDER BY lower(name)
but the same inconsistency happens with that.

That is normal. The databases are probably using different collations. Compare the values of the lc_collate parameter on both databases.
But even with the same collation there can be differences if the machines are using different C libraries or different versions of the same C library. Of course, you won't be able to figure out the C library version on a hosted database...
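As a quick sketch of how to check (table and column names taken from the question):

```sql
-- Compare these on both servers; a mismatch explains the different sort order
SHOW lc_collate;
SELECT datname, datcollate FROM pg_database;

-- If you just need both machines to agree, force a byte-wise sort that does
-- not depend on the C library at all:
SELECT name FROM my_table WHERE (...) ORDER BY name COLLATE "C";
```

With COLLATE "C" the sort is by byte value, so `bun.df_...` sorts before `bundle_...` on every machine ('.' is 0x2E, 'd' is 0x64). A linguistic collation like en_US typically ignores the punctuation, which is why the two servers can disagree.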

Related

Postgres Cluster exceeds temp_file_limit

Recently, we have been trying to migrate our database from SQL Server to PostgreSQL. But we didn't know that, by default, tables in Postgres are not clustered. Now that our data has grown so much, we want to CLUSTER our table like so
CLUSTER table USING idx_table;
But it seems my data set is too large, so it produces
SQL Error [53400]: ERROR: temporary file size exceeds temp_file_limit
(8663254kB)
Since this isn't caused by a query that I could tune to perform better, is there any solution for this?
If, for example, I need to increase my temp_file_limit, is it possible to increase it only temporarily? I'm only running this CLUSTER once.
There are some important differences between SQL Server and PostgreSQL.
Sybase SQL Server was designed from INGRES at the beginning of the eighties, when INGRES made massive use of the concept of CLUSTERED indexes, which means that the table is organized as an index. The SQL engine was designed especially to optimize the use of CLUSTERED indexes. That is the way SQL Server still works today...
By the time Postgres was designed, the use of CLUSTERED indexes had fallen out of favor.
When Postgres switched to the SQL language, and was then renamed to PostgreSQL, nothing changed regarding CLUSTERED indexes.
So CLUSTERed tables in PostgreSQL are rarely optimal in execution plans. You have to prove individually, for each table and for some queries involving those tables, whether there is a benefit or not...
Another thing is that CLUSTERing a table in PostgreSQL is not the equivalent of MS SQL Server's CLUSTERED indexes...
More information about this can be found in my paper:
PostgreSQL vs. SQL Server (MSSQL) – part 3 – Very Extremely Detailed Comparison
and especially in § "6 – The lack of Clustered Index (AKA IOT)"
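As for raising temp_file_limit only temporarily: it is a superuser-settable (SUSET) parameter, so with sufficient privileges it can be changed for the current session only, without touching the server configuration. A minimal sketch (the 20GB value is an assumption; pick something above the 8663254kB the error reported):

```sql
-- Session-local override; requires superuser (or equivalent) privileges
SET temp_file_limit = '20GB';
CLUSTER table USING idx_table;
RESET temp_file_limit;
```

On a managed service such as RDS, session-level SET of superuser parameters may not be permitted, in which case the parameter group is the place to change it.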

Find unused tables in Amazon RDS (Postgres)

In an effort to do some basic housekeeping on our Amazon RDS (Postgresql) instance, my team hopes to drop unused or rarely used tables from our database. In Redshift, I used the stl_query table to determine which tables were accessed frequently enough to remain.
The problem is, I can't seem to figure out an equivalent strategy for Postgres. I tried checking the log files in the console, but these don't appear to have the correct info.
Aside from searching our code base for references to used tables, is there a good strategy to find unused / infrequently used tables in Postgres? If sufficient logs exist, I am willing to write some sort of parsing script to get the necessary data - I just need to find a good source.
It turns out the statistics I need live in the statistics collector views, specifically pg_stat_user_tables.
This is the query I used to find infrequently accessed tables:
SELECT
relname,
schemaname
FROM
pg_stat_user_tables
WHERE
(idx_tup_fetch + seq_tup_read) < 5; --access threshold
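One caveat worth noting: these counters accumulate only since the last statistics reset, so it is worth checking how much history they actually cover before dropping anything. A quick check:

```sql
-- If stats_reset is recent, a low tuple count may not mean the table is unused
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```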

In DBeaver, how can I run an SQL union query from two different connections..?

We recently migrated a large DB2 database to a new server. It got trimmed a lot in the migration, for instance 10 years of data chopped down to 3, to name a few. But now I find that I need certain data from the old server until after tax season.
How can I run a UNION query in DBeaver that pulls data from two different connections..? What's the proper syntax of the table identifiers in the FROM and JOIN keywords..?
I use DBeaver for my regular SQL work, and I cannot determine how to span a UNION query across two different connections. However, I also use Microsoft Access, and I easily did it there with two Pass-Through queries that are fed to a native Microsoft Access union query.
But how to do it in DBeaver..? I can't understand how to use two connections at the same time.
And I need something like this...
SELECT *
FROM ASP7.F_CERTOB.LDHIST
UNION
SELECT *
FROM OLD.VIPDTAB.LDHIST
...but I get the following error, to which I say "No kidding! That's what I want!", lol... =-)
SQL Error [56023]: [SQL0512] Statement references objects in multiple databases.
How can this be done..?
This is not a feature of DBeaver. DBeaver can only access the data that the DB gives it, and this is restricted to a single connection at a time (save for import/export operations). This feature is being considered for development, so keep an eye out for this answer to be outdated sometime in 2019.
You can export data from your OLD database and import it into ASP7 using DBeaver (although vendor tools are typically more efficient for this). Then you can do your union as suggested.
Many RDBMS offer a way to logically access foreign databases as if they were local, in which case DBeaver would then be able to access the data from the OLD database (as far as DBeaver is concerned in this situation, all the data is coming from a single connection). In Postgres, for example, one can use a foreign data wrapper to access foreign data.
I'm not familiar with DB2, but a quick Google search suggests that you can set up foreign connections within DB2 using nicknames or three-part-names.
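For the Postgres case mentioned above, the foreign data wrapper setup looks roughly like this (host, database, schema, and credential values here are placeholders, not taken from the question):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER old_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'old-host.example.com', dbname 'olddb');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER old_server
  OPTIONS (user 'remote_user', password 'secret');

-- Expose the remote tables locally; the UNION then runs over the
-- single local connection, which is all DBeaver needs
CREATE SCHEMA old_remote;
IMPORT FOREIGN SCHEMA public
  FROM SERVER old_server
  INTO old_remote;
```

After this, a query like `SELECT * FROM local_table UNION SELECT * FROM old_remote.some_table` works from one connection.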
If you check this github issue:
https://github.com/dbeaver/dbeaver/issues/3605
The way to solve this is to create a task and execute it in different connections:
https://github.com/dbeaver/dbeaver/issues/3605#issuecomment-590405154

PostgreSQL database causing loss of datetime-values

I have a PostgreSQL database containing a table with several 'timestamp with timezone' fields.
I have a tool (DBSync) that I want to use to transfer the contents of this table to another server/database.
When I transfer the data to a MSSQL server all datetime values are replaced with '1753-01-01'. When I transfer the data to a PostgreSQL database all datetime values are replaced with '0001-01-01'.
The smallest possible date for those systems.
Now I recreated the source table (including contents) in a different database on the same PostgreSQL server. The only difference: the source table is in a different database. Same server, same routing; only the ports are different.
The user is different, but I have the same rights in each database.
How can it be that the database is responsible for an apparently different interpretation of the data? Do PostgreSQL databases have database-specific settings that can cause such behaviour? Which database settings can/should I check?
To be clear, I am not looking for another way to transfer data; I have several available. What I am trying to understand is: how can it be that an application reading datetime info from table A in database Y on server X gives me the wrong date, while reading the same table from database Z on server X gives me the data as it should be?
It turns out that the cause is probably the difference in server version. One is Postgres 9 (works okay), the other is Postgres 10 (does not work okay).
They are different instances on the same machine. Somehow I missed that (blush).
By transferring I meant that I am reading records from a source database (PostgreSQL) and inserting them into a target database (MSSQL 2017).
This is done through the application; I am not sure which drivers it is using.
I will work with the people who made the application.
For those wondering: it is this application: https://dbconvert.com/mssql/postgresql/
When a solution is found I will update this answer with the found solution.

TSQL: Possible to reference different database, depending on the environment?

So we have the situation where we have two different databases and stored procedures need to reference the other database to get information. We will typically write the query to look like this
select * from Mercury.dbo.MyTable a join Purchasing.dbo.OtherTable b on a.a = b.a
Which works fine for us in Production and our Development environment, but recently we split development into Dev/QA/ST and we have different versions of the databases to match the environments.
Example
Purchasing, PurchasingQA, PurchasingST
Mercury, MercuryQA, MercuryST
So now we are running into issues when we promote code, because the stored procs in QA will reference a database for Dev.
So my question is: how can I change the database that is being accessed based on an environment variable? I have started using dynamic SQL for this, but that leads to a lot of harder-to-maintain code.
Perhaps is there a way to create a "DB Alias" that is database wide?
You can consider using synonyms for that purpose. In your case, just create a synonym referencing the other database's table, for example:
CREATE SYNONYM MyTable FOR Mercury.dbo.MyTable;
Each database will have its own synonyms, but the rest of the scripts and stored procedures will stay the same.
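A minimal sketch of that per-environment setup, using the database names from the question (the synonym names themselves are assumptions):

```sql
-- Run this in the QA environment only; Dev and ST get the same synonym
-- names pointing at their own copies of the databases.
CREATE SYNONYM dbo.MercuryMyTable FOR MercuryQA.dbo.MyTable;
CREATE SYNONYM dbo.PurchasingOtherTable FOR PurchasingQA.dbo.OtherTable;

-- The stored procedure body is then identical in every environment:
SELECT *
FROM dbo.MercuryMyTable a
JOIN dbo.PurchasingOtherTable b ON a.a = b.a;
```

Promoting code between environments then never requires editing the three-part names, only creating the right synonyms once per environment.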