Tableau queries with JOINs and NULL checks are failing in ClickHouse - tableau-api

I am running Tableau connected to ClickHouse via the ODBC driver. At first, almost every report request failed. After I configured this TDC file, https://github.com/yandex/clickhouse-odbc/blob/clickhouse-tbc/clickhouse.tdc, it actually started to work. However, some queries with JOINs that check for NULL in the ON clause now fail because they use IS NULL instead of isNull(id):
JOIN users ON ((users.user_id = t0.user_id) OR ((users.user_id IS NULL) AND (t0.user_id IS NULL)))
This is the correct way that works:
JOIN users ON ((users.user_id = t0.user_id) OR ((isNull(users.user_id) = 1) AND (isNull(t0.user_id) = 1)))
How can I make the Tableau driver send the right request?

Here are a few suggestions:
This post on the Tableau Community appears to describe symptoms similar to yours. The suggested resolution is to wrap every field as IfNull([Dimension], ""), which apparently reduces the need for ClickHouse to perform the NULL checks.
The TDC file from GitHub looks pretty complete, but it might not take joins into consideration. The GitHub commit states that the TDC is "untested." I would message the creator of that TDC and ask whether they've done any work around joins and have any suggestions.
Here is a list of possible ODBC Customizations that can be added to or removed from your TDC file. Finding the right combination may take some experimentation, but they're well worth researching as a possible solution.
Create an extract before performing complex analysis. If you're able to connect initially, then it should be possible to bring all the data from ClickHouse into an extract.
Custom SQL would probably alleviate any join syntax issue because the query and any joins are written entirely by you. After making the initial connection to ClickHouse, instead of choosing a table, select "Custom SQL" and write a query that returns the joined tables of your choosing.
Finally, the Tableau Ideas Forum is a place to ask for and/or vote on upcoming connectors. I can see there is already an idea in place for ClickHouse. Feel free to vote it up.

If you can make sure not to have any NULL values in the data, you can also use this proxy that I wrote for this exact problem.
https://github.com/kfzteile24/clickhouse-proxy
It kinda worked, for most cases, but it's not bullet-proof.
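To illustrate the kind of rewrite such a proxy has to perform, here is a minimal sketch in Python. It is not the code of the proxy linked above; the function name and the regular expression are assumptions made for illustration. It only handles plain (optionally qualified) column names and ignores string literals and quoted identifiers, which is one reason an approach like this is not bullet-proof.

```python
import re

# Matches a plain or table-qualified column name followed by IS NULL,
# e.g. "users.user_id IS NULL". "IS NOT NULL" is deliberately not matched.
_IS_NULL = re.compile(r"([A-Za-z_][\w.]*)\s+IS\s+NULL\b", re.IGNORECASE)


def rewrite_is_null(sql: str) -> str:
    """Replace `col IS NULL` with `isNull(col) = 1` (hypothetical helper)."""
    return _IS_NULL.sub(r"isNull(\1) = 1", sql)


query = ("JOIN users ON ((users.user_id = t0.user_id) OR "
         "((users.user_id IS NULL) AND (t0.user_id IS NULL)))")
rewritten = rewrite_is_null(query)
print(rewritten)
```

Running it on the failing JOIN from the question should produce the isNull(...) = 1 form that ClickHouse accepted.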

Related

DB2 System Tables Logs?

I'm using DB2 LUW. I'm working on looking at how often data is being changed or added into our database, and I was curious if there was a system table that I might be able to find this information?
It depends a bit on what you mean. You will for example find the #commit and #rollback in sysibmadm.snapdb, #rows_written per table can be found in sysibmadm.snaptab. If you take snapshots from those on a regular basis you get an idea on how often data is updated. Was that what you had in mind, or is it something else you are looking for?
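As a rough sketch of the "regular snapshots" idea: if you store the rows-written counter per table at two points in time, the difference tells you how much was written in between. The table names and numbers below are made up, and the sketch assumes the counters only grow between the two snapshots.

```python
def rows_written_delta(earlier: dict, later: dict) -> dict:
    """Rows written per table between two snapshots (hypothetical helper).

    Tables that appear only in the later snapshot are treated as starting at 0.
    """
    return {table: later[table] - earlier.get(table, 0) for table in later}


# Made-up counter values, as if captured from the snapshot view an hour apart.
snap_0900 = {"ORDERS": 1200, "CUSTOMERS": 300}
snap_1000 = {"ORDERS": 1750, "CUSTOMERS": 300, "INVOICES": 40}

delta = rows_written_delta(snap_0900, snap_1000)
print(delta)
```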

SQL Developer returns query results on one computer but not on another

I can run a query on views in SQL Developer 3.1.07 and it returns the results I expect. My co-worker, who is in Mexico using the same user, can connect to the same database, sees the same views, runs the same query and gets no results, even from a simple "select * from VIEWNAME" query. The column headers display, but no data. If he selects a view from the connections window and selects the DATA tab no data displays. This user does not have access to any tables on this specific database.
I'm not certain he is running the same version of SQL Developer, but it's not far off. I have checked as many settings in SQL Developer as I think could be the issue, but see no significant difference between his settings and mine.
Connecting to another database allows him to access data in both tables and views.
Any thoughts on what we're missing?
I know I'm a few years late, but check whether the underlying view filters on something that is based on localisation! I just had the issue and it turned out to be a statement like this that was causing it:
SELECT *
FROM sometable
WHERE language = userenv('LANG')
Copy the JDBC folder from your Oracle home over to your co-worker's machine. We had the same issue and replacing the JDBC folder worked.
Faced the same which got resolved when I checked the 'skip NLS settings' box. My query was returning zero results earlier but when I ran the same query again, I could see the table rows.
Since your co-worker is in a different country, most probably the NLS settings (related to the language) are the culprit here.
I was facing the same issue. It turned out that the update I had made from my SQL Developer was not committed to the main database. That's why I was getting results in my SQL Developer for that query, while from AWS it was returning empty results. When I chatted with the DBA, he could only find stale data. After I committed the data from my SQL Developer, the database was actually updated.

Accessing non-public schema in PostgreSQL with Pentaho

Let me start by saying, what I know about Pentaho wouldn't fill up a single paragraph. I'm more knowledgeable about PostgreSQL. I'm working with some contractors that are building a set of monthly reports in Pentaho (v. 4.5) for my company. Some of the data needs to go through a ETL process and get rolled up for reporting purposes. From a dba(ish) point of view, I would like to move these tables into a separate PostgreSQL schema.
I know that Pentaho is often times used with MySQL (which doesn't have schemas) and I'm concerned this might cause problems. I've done some "googlin'" and I don't turn up a lot of hits on the topic, but I did find a closed bug from a few years ago - thus implying that the functionality should be supported.
Before I do this, I would like to see if anyone knows of a reason this will fail or be a bad idea (or if you've done it and it works great, please let me know that, too).
Final notes: I'm using PostgreSQL 9.1.5, and I don't have access to a Pentaho instance to even test this myself. I'm hoping the good folks in the Stackoverflow community will share their expertise and save me from having to install one and spend hours playing and testing to find out whether this is a bad idea.
EDIT:
I sort of knew this question was a bit vague, but I was hoping that someone would read it and share any experience they have. So, let me spell it out more clearly and ask more explicit questions.
I have not done anything. I don't know Pentaho. I don't want to learn Pentaho (not that there is anything wrong with Pentaho... It's just not where my interests are right now). My company hired contractors (I did not hire them). They have experience with Pentaho, but with MySQL. They don't really know anything about PostgreSQL. There are some important differences between PostgreSQL and MySQL, including the fact that PostgreSQL supports schemas (whereas MySQL uses separate databases... similar in concept, but they behave differently in some ways). Some ORMs (and tools) don't really like this... for example, the Django framework still doesn't fully support schemas in PostgreSQL (I know this because I use Python and Django often and my life is much better when I keep things in the "public" schema). Because of my experience with Django and PostgreSQL schemas, I'm a bit leery of moving this data to a new schema.
I do understand that wherever the tables are, the users will need permissions to be able to access the data.
My explicit questions:
Do you use Pentaho with a PostgreSQL database to access tables in schemas other than "public" (the default)?
If so, does it just work (no problems)?
If you had problems, would you please be willing to share with me (and the Stackoverflow community) any online resources that helped you? Or would you be willing to detail what you remember here?
Do you know of anything that just won't work correctly? For example, an open bug in Pentaho related to this topic.
Again, it's not your standard kind of question. I'm hoping that someone out there has experience and is willing to share it here and save me from having to spend time setting up a new Pentaho instance and trying to learn Pentaho well enough to test it, etc.
Thanks.
Two paths you can take:
1) What previous post said ("Pentaho steps (table inputs, outputs, etc.) usually allow you to specify a database schema.")
2) In database connection, advanced tab, "The preferred schema name".
If you're working with different schemas, you can create one database connection per schema. With this approach you can leave schema field in input/output steps empty.
We use MS SQL server and I can tell you that Pentaho does struggle with the idea of a schema. Many of their apps allow you to select a schema but Pentaho, like you said, is built to use something like mySQL.
Make your Pentaho database user work the way it would in MySQL.
We made the database user default to dbo, then we structured our tables like dbo.dimDimension, dbo.factFactTable, etc. Basically, only use dbo for Pentaho purposes. (Or whatever schema you want to default to.)
I use PDI and PgSQL extensively every day with a bunch of different schemas. It works fine. The only trouble you might run into is Pg's troublesome practice of forcing unquoted identifiers to lower instead of upper case. I soon realized everything was easier when I set the Advanced connection property to "Quote all in database".
Yes, you have to quote everything when you type SQL if PDI doesn't do it for you, but it works quite well. Haven't experimented with forcing all identifiers to lower case, but I expect that would work as well.
And yes, use the "Preferred schema name" as well, but be aware that some steps use that option and others don't. You can't, for example, expect it to add schema names to SQL you type into a Table Input step.
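Since hand-typed SQL has to carry the schema name and the quoting itself, here is a small sketch of what "quote everything and qualify it" looks like. The helper names and the schema and table names are hypothetical; the quoting rule (wrap in double quotes, double any embedded double quote) is the standard SQL one that PostgreSQL follows.

```python
def quote_ident(name: str) -> str:
    """Quote an identifier so its case is preserved (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


def qualified(schema: str, table: str) -> str:
    """Build a schema-qualified, quoted table reference."""
    return f"{quote_ident(schema)}.{quote_ident(table)}"


# 'reporting' and 'MonthlySales' are made-up names for illustration.
sql = f"SELECT * FROM {qualified('reporting', 'MonthlySales')}"
print(sql)
```

Without the quotes, PostgreSQL would fold MonthlySales to monthlysales, which is the lower-casing behaviour mentioned above.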
The only other issues you might run into are the limits of Pg's JDBC driver. It's not as good as SQL Server's or DB2's, but the only thing I've ever had trouble with was sending error rows from a Table Output step to another step when the Table Output step was in batch mode.
Have fun learning PDI. It makes a great complement to your DBA skills.
Brian
Pentaho steps (table inputs, outputs, etc.) usually allow you to specify a database schema.
I did a quick test using PDI and our 8.4 Postgres instance and was able to explore, read from and write to tables in different schemas.
So, I think this is a reasonable direction. Hope this helps.

Data Warehousing Postgres

We're considering using SSIS to maintain a PostgreSQL data warehouse. I've used it before between SQL Servers with no problems, but am having a lot of difficulty getting it to play nicely with Postgres. I’m using the evaluation version of the OLEDB PGNP data provider (http://www.postgresql.org/about/news.1004).
I wanted to start with something simple like UPSERT on the fact table (10k-15k rows are updated/inserted daily), but this is proving very difficult (not to mention I’ll want to use surrogate keys in the future).
I’ve attempted (Link) and (http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx), which are effectively the same (except I don’t really understand the UNION ALL at the end when I’m trying to upsert). But I run into the same problem with parameters when doing the update using an OLE DB command, which I tried to overcome using (http://technet.microsoft.com/en-us/library/ms141773.aspx), but that just doesn’t seem to work. I get a validation error:
The external columns for complent.... are out of sync with the datasource columns... external column “Param_2” needs to be removed from the external columns.
(this error is repeated for the first two parameters as well – never came across this using the sql connection as it supports named parameters)
Has anyone come across this?
AND:
The fact that this simple task is apparently so difficult to do in SSIS suggests I’m using the wrong tool for the job. Is there a better (and still flexible) way of doing this? Or would another ETL package be better for use between two Postgres databases? Other options include any listed on (http://en.wikipedia.org/wiki/Extract,_transform,_load#Open-source_ETL_frameworks). I could just go and write a load of SQL to do this for me, but I wanted a neat and easily maintainable solution.
I have used the Slowly Changing Dimension wizard for this with good success. It may give you what you are looking for:
http://msdn.microsoft.com/en-us/library/ms141715.aspx
The External Columns Out Of Sync: SSIS is Case Sensitive - I encountered this issue multiple times and it makes me want to pull my hair out.
This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well.
SCD is way too slow for what I want. I need to use set based sql.
It turned out that a lot of my problems were with bugs in the provider.
I opened a forum topic (http://www.pgoledb.com/forum/viewtopic.php?f=4&t=49) and had a useful discussion with the moderator/support/developer person.
Also, Postgres doesn't let you do cross-database queries, so I solved the problem this way:
Data Source from Production DB to a temp Archive DB table
Run set based query between temp table and archive table
Truncate temp table
Note that the temp table is not actually a temp table, but a copy of the archive table schema used to temporarily store data.
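The three steps can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module so that it runs anywhere; it is not SSIS or Postgres, and the table and column names (fact_archive, fact_staging, id, amount) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_archive (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE fact_staging (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO fact_archive VALUES (1, 10.0), (2, 20.0);
    -- Step 1: rows copied from the production database into the staging table
    INSERT INTO fact_staging VALUES (2, 25.0), (3, 30.0);
""")

with conn:  # one transaction for the whole upsert
    # Step 2a: set-based update of rows that already exist in the archive
    conn.execute("""
        UPDATE fact_archive
        SET amount = (SELECT s.amount FROM fact_staging s
                      WHERE s.id = fact_archive.id)
        WHERE id IN (SELECT id FROM fact_staging)
    """)
    # Step 2b: set-based insert of rows that are new
    conn.execute("""
        INSERT INTO fact_archive (id, amount)
        SELECT s.id, s.amount FROM fact_staging s
        WHERE s.id NOT IN (SELECT id FROM fact_archive)
    """)
    # Step 3: empty the staging table for the next load
    conn.execute("DELETE FROM fact_staging")

rows = conn.execute(
    "SELECT id, amount FROM fact_archive ORDER BY id").fetchall()
staged = conn.execute("SELECT COUNT(*) FROM fact_staging").fetchone()[0]
print(rows, staged)
```

Because both tables live in the same database, the update and insert are ordinary set-based statements and no cross-database query is needed.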
Took a while, but I got there in the end.
"This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well."
What enterprise ETL solution would you suggest?

Jira using enterprise architecture by OfBiz

The 'open for business project' is an enterprise framework.
It so happens Jira uses this, and I was pretty shocked at how much work is involved in pulling the data for a particular entity (say an issue/bug in Jira's case).
Imagine getting a list of all the issues: it first has to get all the columns (or properties) to display as table columns, then pull in the values for each. For an enterprise product this sounds sub-optimal (but I understand how it adds flexibility).
You can read how it's used in Jira in practice: http://confluence.atlassian.com/display/JIRA/Database+Schema
main site: http://ofbiz.apache.org/docs/entity.html
I'm just confused as to how to list all issues. Meaning, what would the SQL queries look like?
It's one thing to pull a single issue, but to get a list you have to do a lot of work to get the values. I don't think it can be done with a single query using joins, can it?
(Disclaimer: I work for Atlassian, but I'm not on the JIRA team)
OFBiz EE is just an abstraction layer for moving between database tables and fancy maps called GenericValues. It has no influence over the database schema itself. Your real issue here seems to be that JIRA's database schema is complicated.
The reason it's complicated is because it has to support a data model where an issue is an arbitrary collection of arbitrary fields, at some point in an arbitrary workflow. The fields themselves can be defined by third-party plugins. It's very hard to produce a friendly-looking RDBMS schema to fit this kind of dynamic data model, and JIRA tries as best it can.
You can get information directly out of the database if you want (the database schema is documented in the link above), or you can go up a layer or twelve of abstraction and talk through one of JIRA's many APIs.
A good place to ask questions about getting data out of JIRA is the forums on http://forums.atlassian.com/
The entity engine used in Jira is a database abstraction layer (with a very rich and easy to use API) that connects your application with one or more datasources. But the databases are still relational, so you can use SQL if you want to. As for the issue info you want to pull, I'd say it wouldn't be very easy with joins alone. I'd recommend you use the procedural language of the RDBMS (e.g. PL/SQL, PL/pgSQL).
SELECT * FROM jiraissue;
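For the dynamic fields discussed above, a list query usually takes two steps: find out which fields exist, then pivot their values into columns. The sketch below shows that pattern with Python's sqlite3 module on a deliberately simplified, hypothetical schema (issue and field_value); it is not JIRA's real schema, which is documented at the link in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables: fixed columns plus one row per field value
    CREATE TABLE issue (id INTEGER PRIMARY KEY, summary TEXT);
    CREATE TABLE field_value (issue_id INTEGER, field TEXT, value TEXT);
    INSERT INTO issue VALUES (1, 'Login fails'), (2, 'Slow report');
    INSERT INTO field_value VALUES
        (1, 'team', 'web'), (1, 'severity', 'high'), (2, 'team', 'data');
""")

# Query 1: discover which dynamic fields exist.
fields = [row[0] for row in conn.execute(
    "SELECT DISTINCT field FROM field_value ORDER BY field")]

# Query 2: one conditional aggregate per field turns rows into columns.
# Assumes at least one field exists, otherwise the column list would be empty.
columns = ", ".join(
    "MAX(CASE WHEN v.field = ? THEN v.value END)" for _ in fields)
sql = (f"SELECT i.id, i.summary, {columns} FROM issue i "
       "LEFT JOIN field_value v ON v.issue_id = i.id "
       "GROUP BY i.id, i.summary ORDER BY i.id")
rows = conn.execute(sql, fields).fetchall()
print(fields, rows)
```

So a single SQL statement can return the list, but its text has to be generated from the field definitions first, which matches the point that joins alone are not enough.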