Postgres JSONB not compatible with SQLite testing environment

So I am using Postgres for my production database. I use Alembic for migrations and SQLAlchemy to define models, run queries, and so on. This setup works very well for the most part, but I have run into a problem recently.
One of my tables requires a config column, which will be a JSON blob of varying contents. As such, I have opted for the Postgres JSONB type.
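For context, here is a minimal sketch of the kind of model involved, assuming SQLAlchemy's declarative style (the table and column names mirror the ones in the exception below):

from sqlalchemy import Column, Integer
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Application(Base):
    __tablename__ = "applications"
    id = Column(Integer, primary_key=True)
    # Postgres-specific type: SQLite's type compiler can't render it.
    config = Column(JSONB)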
The issue is that my local tests run with an SQLite in-memory database. SQLite apparently cannot interpret this column type, raising the following exception when running the tests (and building the database):
sqlalchemy.exc.CompileError: (in table 'applications', column 'config'): Compiler <sqlalchemy.dialects.sqlite.base.SQLiteTypeCompiler object at 0x109e52ef0> can't render element of type JSONB
Has anyone got a suggestion about how to overcome this issue? I have thought about altering the migrations based on whether the program is running tests or in production, but I really don't know where to start in that regard.
Has anyone got any advice?
Thanks in advance,
Eric

For anyone who comes across a similar issue: as mentioned in the comments on the question, SQLite differs from Postgres in many ways. As such, my testing environment does not mirror my production environment well, and differences like these force me to consider hacks, which is bad news.
The solution I have chosen is to containerize a Postgres DB that is spun up when tests are run, as described here. It's a slight hit in terms of speed, but it does allow me to reliably test DB behaviour locally and in CI pipelines, so I think it is worth it.
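For reference, one way to wire that up is a session-scoped pytest fixture. This is only a sketch, assuming the testcontainers package; the image tag and fixture name are illustrative:

import pytest
from sqlalchemy import create_engine
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def pg_engine():
    # Spin up a throwaway Postgres container for the whole test session;
    # it is torn down automatically when the with-block exits.
    with PostgresContainer("postgres:15") as pg:
        engine = create_engine(pg.get_connection_url())
        # Build the schema here, e.g. Base.metadata.create_all(engine),
        # or run the Alembic migrations against this engine.
        yield engine
        engine.dispose()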
Thanks to all the people who took the time to comment.

Related

Sequence issue while importing DB

I am trying to import a DB from DEV to Staging. I am not trying the pg_dump method, as I don't know it; also, when I tried to run the pg_dump utility it asked for a password, and when I typed my password and pressed Enter, nothing happened. So I tried the manual method of creating the DB and schema.
My issue is that I already have data in DEV, and when I import tables via the import method (right-click on table >> Import), the sequences go wrong.
That is, when we insert into a table in the new DB, the sequence starts from 1, which will cause mapping issues in the application. I tried to change the current value of each sequence, but that is difficult because it takes too much time to do for all tables. Is there any way to solve this problem?
Thanks
Rose
Reason: the sequence 'issue' in the new database is probably because the GUI tool used to copy the data over simply copies the schema to the new database, which resets the sequence state. This causes each sequence in the new database to start counting from 1 (instead of from its current value in the old database).
Solutions:
Using pg_dump isn't that difficult. If you have access to a command line and feel comfortable using it, many Stack Overflow answers (like this one) should get such a database-copying job done within a few minutes (depending on the DB size), with all the sequences taken care of automatically. I would really recommend reading, understanding, and using this method, since it's proven and the best advice in such a scenario.
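To illustrate, the round trip looks roughly like this (wrapped in Python's subprocess here; hosts, users, and database names are placeholders, and a .pgpass file or PGPASSWORD is assumed so the tools don't prompt):

import subprocess

DUMP = "/tmp/dev.dump"

# Dump DEV in custom format (-Fc); this captures sequence state too.
subprocess.run(
    ["pg_dump", "-h", "dev-host", "-U", "postgres", "-Fc", "-f", DUMP, "dev_db"],
    check=True,
)

# Restore into Staging; sequences resume from their DEV values.
subprocess.run(
    ["pg_restore", "-h", "staging-host", "-U", "postgres", "-d", "staging_db", DUMP],
    check=True,
)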
Paid application: Migration Toolkit. Paid + easy + GUI. Not something that I would recommend for everyone, but it may be an easy way out if you're only used to the GUI and are not comfortable using the (free) pg_dump method given above.
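If re-importing the whole database isn't an option, the manual sequence fix Rose describes can at least be scripted instead of being done table by table. A rough sketch, assuming psycopg2, serial-style column defaults, and the public schema (connection details are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=staging_db user=postgres host=staging-host")
with conn, conn.cursor() as cur:
    # Find every column whose default is fed by a sequence.
    cur.execute("""
        SELECT table_name, column_name
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND column_default LIKE 'nextval(%'
    """)
    for table, column in cur.fetchall():
        # Identifiers can't be bound as parameters, hence the f-string;
        # acceptable here because the names come from the catalog itself.
        # setval(..., false) makes the next nextval() return exactly max+1.
        cur.execute(
            f"SELECT setval(pg_get_serial_sequence(%s, %s), "
            f"coalesce((SELECT max({column}) + 1 FROM {table}), 1), false)",
            (table, column),
        )
conn.close()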

What are ways to include sizable Postgres table imports in Flyway migrations?

We have a series of modifications to a Postgres database, which can generally be written all in SQL. So it seems Flyway would be a great fit to automate these.
However, they also include imports from files to tables, such as
COPY mytable FROM '${PWD}/mydata.sql';
And secondarily, we'd like not to rely on Postgres' use of file paths like this, which apparently must reside on the server. It should be possible to run any migration from a remote client -- as in Amazon's RDS documentation (last section).
Are there good approaches to handling this kind of scenario already in Flyway? Or alternate approaches to avoid this issue altogether?
Currently, it looks like it'd work to implement the whole migration in Java and use the Postgres driver's CopyManager to import the data. However, that means most of our migrations have to be done in Java, which seems much clumsier. (As far as I can tell, hybrid Java+SQL migrations are not expected?)
I'm new to Flyway, so I thought I'd ask what other alternatives might exist, since I'd expect it's pretty common to import a table during a migration.
Starting with Flyway 3.1, you can use COPY FROM STDIN statements within your migration files to accomplish this. The SQL execution engine will automatically use PostgreSQL's CopyManager to transfer the data.
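Under the hood this relies on client-side COPY: the rows stream over the existing connection, so the data file only needs to exist on the machine running the migration. The same mechanism, sketched with psycopg2 to stay consistent with the other examples here (table and file names are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
with conn, conn.cursor() as cur, open("mydata.csv") as f:
    # COPY ... FROM STDIN reads from the client, not from a server path.
    cur.copy_expert("COPY mytable FROM STDIN WITH (FORMAT csv)", f)
conn.close()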

In Slick, is there a way to declare tables without using a specific JDBC driver?

Throughout my persistence code (table definitions and so on) I have the following import:
import scala.slick.driver.PostgresDriver.simple._
This is nice because it works, but it is a problem because all my code is bound exclusively to Postgres. If I want production to use Postgres and my tests to use HSQLDB, for example, I can't. I'd like to declare the DataSource/driver when I run my persistence manager (which will do the create), instead of at the table declaration. What am I missing?
This is certainly possible using the cake pattern. My team uses H2 in development and MySQL in production.
See MultiDBExample and MultiDBCakeExample in https://github.com/slick/slick-examples
As far as I was able to find, this is a definite restriction in Slick. So much so that I dropped my test environment's database and switched it to the same type as my production one. In retrospect, this is what I should have been doing in the first place, but I do understand that this is sometimes easier said than done.

Accessing non-public schema in PostgreSQL with Pentaho

Let me start by saying, what I know about Pentaho wouldn't fill up a single paragraph. I'm more knowledgeable about PostgreSQL. I'm working with some contractors who are building a set of monthly reports in Pentaho (v. 4.5) for my company. Some of the data needs to go through an ETL process and get rolled up for reporting purposes. From a DBA(ish) point of view, I would like to move these tables into a separate PostgreSQL schema.
I know that Pentaho is often used with MySQL (which doesn't have schemas), and I'm concerned this might cause problems. I've done some "googlin'" and don't turn up a lot of hits on the topic, but I did find a closed bug from a few years ago, thus implying that the functionality should be supported.
Before I do this, I would like to see if anyone knows of a reason this will fail or be a bad idea (or if you've done it and it works great, please let me know that, too).
Final notes: I'm using PostgreSQL 9.1.5, and I don't have access to a Pentaho instance to even test this myself. I'm hoping the good folks in the Stack Overflow community will share their expertise and save me from having to install one and spend hours playing/testing just to find out whether this is a bad idea.
EDIT:
I sort of knew this question was a bit vague, but I was hoping that someone would read it and share any experience they have. So let me spell it out more clearly and ask more explicit questions.
I have not done anything yet. I don't know Pentaho, and I don't want to learn Pentaho (not that there is anything wrong with Pentaho; it's just not where my interests are right now). My company hired contractors (I did not hire them). They have experience with Pentaho, but with MySQL. They don't really know anything about PostgreSQL. There are some important differences between PostgreSQL and MySQL, including the fact that PostgreSQL supports schemas (whereas MySQL uses separate databases, which are similar in concept but behave differently in some ways). Some ORMs (and tools) don't really like this; for example, the Django framework still doesn't fully support schemas in PostgreSQL (I know this because I use Python and Django often, and my life is much better when I keep things in the "public" schema). Because of my experience with Django and PostgreSQL schemas, I'm a bit leery of moving this data to a new schema.
I do understand that where ever the tables are, they will need permissions to be able to access the data.
My explicit questions:
Do you use Pentaho to access tables in schemas other than "public" (the default) in a PostgreSQL database?
If so, does it just work (no problems)?
If you had problems, would you be willing to share with me (and the Stack Overflow community) any online resources that helped you? Or would you be willing to detail what you remember here?
Do you know of anything that just won't work correctly? For example, an open bug in Pentaho related to this topic.
Again, it's not your standard kind of question. I'm hoping that someone out there has experience and is willing to share it here and save me from having to spend time setting up a new Pentaho instance and trying to learn Pentaho well enough to test it, etc.
Thanks.
Two paths you can take:
1) What the previous post said ("Pentaho steps (table inputs, outputs, etc.) usually allow you to specify a database schema.")
2) In the database connection's Advanced tab, set "The preferred schema name".
If you're working with different schemas, you can create one database connection per schema. With this approach you can leave the schema field in input/output steps empty.
We use MS SQL Server, and I can tell you that Pentaho does struggle with the idea of a schema. Many of their apps allow you to select a schema, but Pentaho, like you said, is built to use something like MySQL.
Make your Pentaho database user work like it would in MySQL.
We made the database user default to dbo, then we structured our tables like dbo.dimDimension, dbo.factFactTable, etc. Basically, only use dbo for Pentaho purposes (or whatever schema you want to default to).
I use PDI and PgSQL extensively every day with a bunch of different schemas. It works fine. The only trouble you might run into is Pg's troublesome practice of folding unquoted identifiers to lower case instead of upper case. I soon realized everything was easier when I set the Advanced connection property "Quote all in database".
Yes, you have to quote everything when you type SQL if PDI doesn't do it for you, but it works quite well. I haven't experimented with forcing all identifiers to lower case, but I expect that would work as well.
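For anyone unfamiliar with the folding behaviour mentioned above, a quick illustration (psycopg2, placeholder connection details):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
with conn, conn.cursor() as cur:
    cur.execute('CREATE TABLE "MyTable" (id int)')
    cur.execute('SELECT * FROM "MyTable"')  # works: quoted, case preserved
    # cur.execute('SELECT * FROM MyTable')  # fails: folded to mytable
conn.close()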
And yes, use the "Preferred schema name" option as well, but be aware that some steps use that option and others don't. You can't, for example, expect it to add schema names to SQL you type into a Table Input step.
The only other issues you might run into are the limits of Pg's JDBC driver. It's not as good as SQL Server's or DB2's, but the only thing I've ever had trouble with was sending error rows from a Table Output step to another step when the Table Output step was in batch mode.
Have fun learning PDI. It makes a great complement to your DBA skills.
Brian
Pentaho steps (table inputs, outputs, etc.) usually allow you to specify a database schema.
I did a quick test using PDI and our 8.4 Postgres instance and was able to explore, read from and write to tables in different schemas.
So, I think this is a reasonable direction. Hope this helps.

Data Warehousing Postgres

We're considering using SSIS to maintain a PostgreSQL data warehouse. I've used it before between SQL Servers with no problems, but am having a lot of difficulty getting it to play nicely with Postgres. I'm using the evaluation version of the OLEDB PGNP data provider (http://www.postgresql.org/about/news.1004).
I wanted to start with something simple like UPSERT on the fact table (10k-15k rows are updated/inserted daily), but this is proving very difficult (not to mention I’ll want to use surrogate keys in the future).
I've attempted (Link) and (http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx), which are effectively the same (except I don't really understand the UNION ALL at the end when I'm trying to upsert), but I run into the same problem with parameters when doing the update using an OLE DB Command. I tried to overcome that using (http://technet.microsoft.com/en-us/library/ms141773.aspx), but it just doesn't seem to work; I get a validation error:
The external columns for complent.... are out of sync with the datasource columns... external column “Param_2” needs to be removed from the external columns.
(This error is repeated for the first two parameters as well; I never came across this using the SQL connection, as it supports named parameters.)
Has anyone come across this?
AND:
The fact that this simple task is apparently so difficult to do in SSIS suggests I'm using the wrong tool for the job. Is there a better (and still flexible) way of doing this? Or would another ETL package be better for use between two Postgres databases? Other options include any listed at (http://en.wikipedia.org/wiki/Extract,_transform,_load#Open-source_ETL_frameworks). I could just go and write a load of SQL to do this for me, but I wanted a neat and easily maintainable solution.
I have used the Slowly Changing Dimension wizard for this with good success. It may give you what you are looking for; the wizard is described here:
http://msdn.microsoft.com/en-us/library/ms141715.aspx
Regarding the external columns being out of sync: SSIS is case sensitive. I encountered this issue multiple times, and it made me want to pull my hair out.
This simple task is going to take some work either way. SSIS is by no means an enterprise-class ETL product yet, but it does give you some quick and easy functionality and is sufficient for most ETL work. I guess it is also about your level of comfort with it.
SCD is way too slow for what I want. I need to use set-based SQL.
It turned out that a lot of my problems were down to bugs in the provider.
I opened a forum topic (http://www.pgoledb.com/forum/viewtopic.php?f=4&t=49) and had a useful discussion with the moderator/support/developer person.
Also, Postgres doesn't let you do cross-database queries, so I solved the problem this way:
1) Data source from the production DB to a temp archive-DB table
2) Run a set-based query between the temp table and the archive table
3) Truncate the temp table
Note that the temp table is not actually a temp table, but a copy of the archive table's schema used to temporarily store data.
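For the curious, that set-based step might look roughly like the following. The table and column names are invented, and since Postgres at the time had no ON CONFLICT, the classic update-then-insert pair is shown (run in one transaction so the truncate only happens on success):

import psycopg2

conn = psycopg2.connect("dbname=archive_db user=postgres")
with conn, conn.cursor() as cur:
    # Update rows that already exist in the archive table...
    cur.execute("""
        UPDATE fact_archive a
        SET    measure = s.measure
        FROM   fact_staging s
        WHERE  a.fact_id = s.fact_id
    """)
    # ...then insert the ones that don't.
    cur.execute("""
        INSERT INTO fact_archive (fact_id, measure)
        SELECT s.fact_id, s.measure
        FROM   fact_staging s
        WHERE  NOT EXISTS (
            SELECT 1 FROM fact_archive a WHERE a.fact_id = s.fact_id
        )
    """)
    cur.execute("TRUNCATE fact_staging")
conn.close()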
Took a while, but I got there in the end.
In response to "SSIS is by no means an enterprise-class ETL product yet": what enterprise ETL solution would you suggest?