Tableau Extract API with multiple tables in a database - PostgreSQL

I am currently experimenting with the Tableau Extract API to generate TDEs from the tables I have in a PostgreSQL database. I was able to write code to generate a TDE from a single table, but I would like to do this for multiple joined tables. To be more specific, if I have two tables that are inner joined on some field, how would I generate the TDE for this?
I can see that if I am working with a small number of tables, I could use a SQL query with JOIN clauses to create one gigantic table, and generate the TDE from that table.
SELECT *
INTO new_table_1
FROM table_1
INNER JOIN table_2 ON table_1.id_1 = table_2.id_2;

SELECT *
INTO new_table_2
FROM new_table_1
INNER JOIN table_3 ON new_table_1.id_1 = table_3.id_3;
and then generate the TDE from new_table_2.
However, I have some tables that have over 40 different fields, so this could get messy.
Is this even a possibility with the current version of the API?

You can read from as many tables or other sources as you want. Or use a complex query with lots of joins, or create a view and read from that. Creating a view is usually helpful when you have a complex query joining many tables.
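For example, a view in PostgreSQL can hide the joins so that whatever builds the extract only reads a single relation. A minimal sketch, using the table names from the question (the view name and the non-id columns are made up; aliasing the duplicated field names keeps every output column unique):
-- Hypothetical view wrapping the joins; alias any field names that repeat across tables.
CREATE VIEW extract_source AS
SELECT t1.id_1,
       t1.customer_name AS t1_customer_name,  -- placeholder duplicated field
       t2.customer_name AS t2_customer_name,
       t3.order_total   AS t3_order_total
FROM table_1 t1
INNER JOIN table_2 t2 ON t1.id_1 = t2.id_2
INNER JOIN table_3 t3 ON t1.id_1 = t3.id_3;
The extract code can then just run SELECT * FROM extract_source and treat the result as one table.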
The data extract API is totally agnostic about how or where you get the data to feed it -- the whole point is to allow you to grab data from unusual sources that don't have pre-built drivers for Tableau.
Since Tableau has a Postgres driver and can read from it directly, you don't need to write a program with the data extract API at all. You can define your extract with Tableau Desktop. If you need to schedule automated refreshes of the extract, you can use Tableau Server or its tabcmd command.

Many thanks for your replies. I am aware that I could use Tableau Desktop to define my extract; in fact, I have done this many times before. I am trying to create the extracts using the API because I need to create some calculated fields, which are nearly impossible to create using Tableau Desktop.
At this point, I am hesitant to use JOINs in the SQL query because the resulting table would be too complicated to comprehend (some of these tables also have the same field names).
When you say that I could read from multiple tables or sources, do you mean with the Tableau Extract API? So far, I cannot find anything in this API that accommodates multiple sources. For example, I know that when I use multiple tables in Tableau Desktop, there are icons on the left-hand side that tell me the extract is composed of multiple tables. This just doesn't seem to be possible with the API, which leaves me stranded. Anyway, thank you again for your replies.

Going back to the topic, this is something I tried a few days ago in my Python code:
try:
    tdefile = tde.Extract("extract.tde")
except:
    os.remove("extract.tde")
    tdefile = tde.Extract("extract.tde")

tableDef = tde.TableDefinition()
# Read each column in the table and set the column data types using tableDef.addColumn
# Some code goes here...

for eachTable in tableNames:
    tableAdd = tdefile.addTable(eachTable, tableDef)
    # Use a SQL query to retrieve bunch_of_rows from eachTable
    for some_row in bunch_of_rows:
        # Read each row in the table, and set the values in each column position of the row
        # Some code goes here...
        tableAdd.insert(some_row)
    bunch_of_rows.close()

tdefile.close()
When I execute this code, I get an error saying that eachTable has to be called "Extract".
Of course, this code has its flaws, as there is nowhere in it that specifies how the tables are joined.
So I am a little thrown off here, because it doesn't seem like I can use multiple tables unless I use JOINs to generate one table that contains everything.
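For what it's worth, that error is expected: a .tde file only ever holds one table, and the extract API requires it to be named "Extract". So the join does have to happen on the SQL side - either through a view like the one sketched above or a single ad-hoc query - and the rows of that one result set feed a single addTable("Extract", tableDef) call. A sketch of the ad-hoc form, with hypothetical aliases to keep identically named fields apart:
-- One query producing the single result set for the "Extract" table;
-- the aliases (placeholder names) keep colliding field names distinct.
SELECT t1.id_1,
       t1.status AS t1_status,
       t2.status AS t2_status,
       t3.amount AS t3_amount
FROM table_1 t1
INNER JOIN table_2 t2 ON t1.id_1 = t2.id_2
INNER JOIN table_3 t3 ON t1.id_1 = t3.id_3;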

Related

Statistics of all/many tables in FileMaker

I'm writing a kind of summary page for my FileMaker solution.
For this, I have defined a "statistics" table, which uses calculation fields with ExecuteSQL to gather info from most tables, such as the number of records, recently changed records, etc.
This strangely takes a long time - around 10 seconds with a total of about 20k records across about 10 tables. The same SQL on any other database system shouldn't take more than a fraction of a second.
What could the reason be, what can I do about it, and where can I start debugging to figure out what's taking all this time?
The actual code is like this:
SQLAusführen ( "SELECT COUNT(*) FROM " & _Stats::Table ; "" ; "" )
SQLAusführen ( "SELECT SUM(\"some_field_name\") FROM " & _Stats::Table ; "" ; "" )
Where "_Stats" is my statistics table, and it has a string field "Table" where I store the name of the other tables.
So each row in this _Stats table should have the stats for the table named in the "Table" field.
Update: I'm not using FileMaker server, this is a standalone client application.
We can definitely talk about why it may be slow. Usually this has mostly to do with the size and complexity of your schema - but only "usually", as you have found.
Can you use the DDR (Database Design Report) instead? Much will depend on what you are actually doing with this data. Tools like FMPerception will also give you many of the stats you are looking for. Again, it depends on what you are doing with it.
Also, can you post your actual calculation? Is the statistics table using unstored calculations? Is the statistics table related to any of the other tables? These are a couple of things that will affect how ExecuteSQL performs.
One thing to keep in mind: whether it's ExecuteSQL, a Perform Find, or a relationship, it's all the same basic query under the hood. So if it would be slow doing it one way, it's likely going to be slow with any other directly related approach.
Taking these one at a time:
All records count.
Placing an unstored calc in the target table allows you to get the count of the records through the relationship, without triggering a transfer of all records to the client. You can get the value from the first record in the relationship. This is a super light way to get that info vs. using Count(), which requires FileMaker to touch every record on the other side.
Sum of Records Matching a Value.
Using a field on the _Stats table with a relationship to the target table will reduce how much work FileMaker has to do to give you an answer.
Then, having a Summary field in the target table to sum the records may prove more efficient than using an aggregate function. The summary field will also only sum the records that match the relationship. (Just don't show that field on any of your layouts if you don't need it.)
ExecuteSQL is fastest when it can rely on a simple index lookup. Once you get outside of that, it's primarily about testing to find the sweet spot. Typically, I will use ExecuteSQL for retrieving either a JSON object from a user table or for verifying a single field value. Once you get into sorting and aggregate functions, you step outside the optimizations of the function.
Also note, if you have an open record ( that means you as the current user ), FileMaker Server doesn't know what data you have on the client side, and so it sends ALL of the records. That's why I asked if you were using unstored calcs with ExecuteSQL. It can seem slow when you can't control when the calculations fire. Often I will put the updating of that data into a scheduled script.

How to union two tables in Qlik Sense

I am trying to join/concatenate 2 tables because they have the "year-month" column in common, so it would be more comfortable to work with them as one table. As you can see below:
I currently have just 1 table loaded in Qlik Sense, the 1st one, and I'm trying to load the 2nd one into that same table. There would be no point in having to repeat the "year-month" column twice.
I'm using only expressions, and the tables are related by year-month.
Any ideas?
With Only Expressions
It might be tricky, because what you're trying to do is typically handled before the data ever reaches the front end. If you're only trying to work with the data and make selections easily, add the two separate tables as a master item (whatever chart you would like) and then add that chart to your sheet. That will let you work with a single chart data set.
Ultimately, it doesn't solve your problem, but it's the best you're going to have if you want to use only expressions.
Combining the Tables
This, I think, is your intent: use the data load script. The data load script is a simple SQL-like script that executes when the "Load data" button is pressed. Here you can make columns, do math, join tables, split tables - do whatever you want.
Information about the script loader:
https://help.qlik.com/en-US/sense/February2020/Subsystems/Hub/Content/Sense_Hub/Scripting/introduction-data-modeling.htm
It seems like what you're trying to do is an outer join. The Qlik data load script reserved word for outer joins is simply join - it's an outer join by default.
At its core, you would simply have something like this:
LOAD * from table1.csv;
join LOAD a, d from table2.csv;

Case-when or if-then to control table creation in Redshift

I have a handful of data sources that I'd like to apply the same analyses to and eventually load into one larger database table (uniformtable). Different sources contain different columns, and some sources involve crosswalk files that I need to join. I'd like to have one query that converts each source's data into uniformtable's format, based on a unique key for each source. Something along the lines of this:
case when source.sourceid = 1 then
    create uniformtable as
    select column1a as uniforma, column1b as uniformb, sourceid from source
else when source.sourceid = 2 then
    create uniformtable as
    select column2a as uniforma, column2b as uniformb, sourceid from source
end;
I've tried using if-then and case-when to accomplish this, but I get syntax errors pointing to the very start of my query. Does Redshift allow you to use if logic for this kind of control?
No, this logic is not permitted.
In Redshift, CASE is an expression that can only appear inside a SQL statement (in a SELECT list, for example); it is not a control-flow construct that can decide which statement gets run.
You would need to perform this logic external to Amazon Redshift, and then just send the final SQL to create the table.
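For example, a small script or ETL job outside Redshift would check the source's id and then submit only the matching statement - something like this for sourceid = 1, using the column names from the question:
-- Final SQL the external code would send when the source's id is 1;
-- for sourceid = 2 it would substitute column2a/column2b instead.
CREATE TABLE uniformtable AS
SELECT column1a AS uniforma,
       column1b AS uniformb,
       sourceid
FROM source
WHERE sourceid = 1;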

Remove header (column names) from query result

I am using a Java-based program, and I am writing a simple select query inside that program to retrieve data from a PostgreSQL database. The data comes back with a header, which breaks the rest of my code.
How do I get rid of all column headings in an SQL query? I just want to print out the raw data without any headings.
I am using the Building Controls Virtual Test Bed (BCVTB) to connect my database to EnergyPlus. BCVTB has a database actor in which you can write a query, receive data, and send it to your other simulation program. I decided to use PostgreSQL. However, when I write Select * From mydb, it brings back the data with the column names (header). I just want raw data without the header. What should I do?
PostgreSQL does not send table headings the way a CSV file does. The protocol (as used via JDBC) sends the rows. The driver does request a description of the rows that includes the column names, but that description is not part of the result set rows, unlike the "header first" convention of CSV.
Whatever is happening must be a consequence of the BCVTB tools you are using, and I suggest pursuing it on that side of things.

Updating the text of a large number of stored procedures

The question pretty much sums it up. I've got to replace text in a large number of stored procedures. It's not so many that doing it manually is impossible, but enough that I'm asking the question. I also prefer automation, as it reduces the chance of user error when we make the change in production.
I can identify them like this:
select OBJECT_DEFINITION(object_id), *
from sys.procedures
where OBJECT_DEFINITION(object_id) like '%''MyExampleLiteral''%'
order by name
Is there any way to mass update them all to change 'MyExampleLiteral' to 'MyOtherExampleLiteral'?
I'd even settle for a way to open all the stored procs. Just finding these stored procs in a larger list will take some time.
I thought about generating alter statements using the above select statements, but then I lose line breaks.
Thanks in advance,
This is a Microsoft SQL Server.
There are different tools to use depending on the database in question. For example, Microsoft SQL Server Data Tools integrates with Visual Studio, and allows you to do these types of operations fairly easily. The database is stored in your solution as scripts, which you can then search and replace any keyword you wish. I'm assuming there would be similar tools available for other platforms.
You could do this with dynamic sql. Query the system tables to get all the SPs containing your "MyExampleLiteral":
SELECT [object_id] FROM sys.objects o
WHERE type_desc = 'SQL_STORED_PROCEDURE'
AND is_ms_shipped = 0
AND OBJECT_DEFINITION(o.[object_id]) LIKE '%<search string>%'
Then, write a while loop to go through those object_ids. In the while loop, get the OBJECT_DEFINITION() into a string and replace the "MyExampleLiteral", then replace CREATE PROCEDURE with ALTER PROCEDURE and execute the string using sp_executesql.
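A rough sketch of that loop, using a cursor over the matching object_ids. It assumes every definition contains the exact text CREATE PROCEDURE (not CREATE PROC or unusual spacing) and that the literal only appears where you want it changed - adjust and test on a copy of the database first:
DECLARE @object_id INT, @definition NVARCHAR(MAX);

DECLARE proc_cursor CURSOR FOR
    SELECT o.[object_id]
    FROM sys.objects o
    WHERE o.type_desc = 'SQL_STORED_PROCEDURE'
      AND o.is_ms_shipped = 0
      AND OBJECT_DEFINITION(o.[object_id]) LIKE '%''MyExampleLiteral''%';

OPEN proc_cursor;
FETCH NEXT FROM proc_cursor INTO @object_id;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Pull the current definition, swap the literal, and turn CREATE into ALTER.
    SET @definition = OBJECT_DEFINITION(@object_id);
    SET @definition = REPLACE(@definition, '''MyExampleLiteral''', '''MyOtherExampleLiteral''');
    SET @definition = REPLACE(@definition, 'CREATE PROCEDURE', 'ALTER PROCEDURE');

    EXEC sp_executesql @definition;

    FETCH NEXT FROM proc_cursor INTO @object_id;
END;

CLOSE proc_cursor;
DEALLOCATE proc_cursor;
Because OBJECT_DEFINITION() returns the original text of each procedure, the line breaks you were worried about losing are preserved.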
If you're doing something this crazy, make sure you back up the database first.