Google Data Studio: Connect to multiple schemas in a multi-tenant Postgres DB

I have a multi-tenant database in postgres. So, I have one schema per customer and each schema has a fixed set of tables.
When I connect to the DB using Google Data Studio (GDS), I only see the table names without their associated schemas.
How do I connect to tables belonging to one or more schemas?
Also, what do I do if my tables have more than 700k rows? GDS has a limit on the number of rows that can be queried, right?

You'll have to use the "Custom Query" option instead of the basic table selection if you need anything more complex.
Regarding the row limit: I wasn't aware of one, but if it exists, I'd suggest using a Custom Query to pre-group your rows into whatever granularity makes sense (days, months, etc.) to bring the row count down.
Data Studio will likely choke on anywhere near that many rows and make for a horrible user experience. Let Postgres do as much of the heavy lifting as you can.
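For example, a Custom Query can both reference a schema-qualified table and pre-aggregate before Data Studio ever sees the rows. A minimal sketch, assuming a tenant schema tenant_acme with a purchases table (all names here are made up):

    -- Pre-aggregate per day so Data Studio receives a few thousand rows
    -- at most instead of 700k+ raw purchase events.
    SELECT
        date_trunc('day', purchased_at)::date AS purchase_day,
        customer_type,
        SUM(amount) AS total_amount,
        COUNT(*)    AS purchase_count
    FROM tenant_acme.purchases          -- schema-qualified table name
    GROUP BY 1, 2;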

Answering only: "what do I do if my tables have more than 700k rows? GDS has a limit on the number of rows that can be queried, right?"
Not exactly. The limit is on the number of rows returned, not the number of rows queried. That matters because Data Studio will almost always push queries down to connectors.
Here's an example: let's say you have a purchase table in a PostgreSQL DB with 1M+ rows, where each record is a purchase event. You add this table as a data source in your report and add a bar chart that shows the average purchase by customer type. Let's say you have 12 customer types. Data Studio will push the GROUP BY clause down to the PostgreSQL DB, so your result will have only 12 rows of data instead of 1M+. For most chart types, Data Studio will aggregate or page the results, issuing a query statement that limits the number of rows returned.
You will only run into the limit if you end up creating a scenario where Data Studio cannot issue an aggregation or paging over the query results or if the aggregated results cross the row limit.
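To make the push-down concrete, the statement Data Studio issues for that bar chart would look roughly like this (table and column names are assumed):

    -- Returns ~12 rows (one per customer type) instead of the 1M+ rows
    -- stored in the purchase table.
    SELECT customer_type,
           AVG(amount) AS avg_purchase
    FROM purchase
    GROUP BY customer_type;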

Related

Feedback about my database design (multi-tenancy)

The idea of the SaaS tool is to have dynamic tables with dynamic custom fields and values of different types. We considered following the "force.com/salesforce.com" example, but it seems too complicated to maintain going forward and it pushes reporting to a huge level of abstraction, so we came up with a simpler idea, and we want to be sure it is a reasonably good approach.
This is the architecture we have today (in a few steps).
Each tenant has its own separate database on the cluster (Postgres 12).
The TABLE table is used to keep all of those tables as references; this entity has a ManyToOne relation to the META table and a OneToMany relation with the DATA table.
The META table is used for metadata configuration and has a OneToMany relation with FIELDS (which holds the field names, the field type, e.g. TEXT/INTEGER/BOOLEAN/DATETIME etc., and the attribute column it maps to, stored as a string reference).
The DATA table has a ManyToOne relation to TABLES and 50 character varying columns named attribute1...attribute50, all NULL-able.
Example flow today:
When a user wants to open the TABLE DATA for, e.g., "CARS", we load the META table with all the FIELDS (to get the fields for this query). Say the user wants to query against the Brand, Class, Year and Price columns.
In application logic we check the references for Brand, Class, Year and Price in the META > FIELDS table, so we know that Brand = attribute2, Class = attribute5, Year = attribute6 and Price = attribute7.
We parse the request into a query, e.g. SELECT [attr 2,5,6,7] FROM DATA, and show the results to the user. If the user decides to filter on this data, e.g. Year > 2017 AND Class = 'A', we use SQL's CAST() functionality, for example SELECT CAST(attribute6 AS int), attribute5 FROM DATA WHERE CAST(attribute6 AS int) > 2017 AND attribute5 = 'A';, so we can actually support most SQL principles.
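For concreteness, a rough sketch of what one of our DATA entities and such a translated query end up looking like (ids, types and column positions below are just examples, not our exact schema):

    -- Simplified shape of a DATA_X entity (here DATA_50).
    CREATE TABLE data_50 (
        id          bigint       NOT NULL,
        tableid     bigint       NOT NULL,   -- reference to the TABLE entity
        createdat   timestamptz  NOT NULL DEFAULT now(),
        attribute1  varchar(255),
        attribute2  varchar(255),
        attribute3  varchar(255),
        attribute4  varchar(255),
        attribute5  varchar(255),
        attribute6  varchar(255),
        attribute7  varchar(255),
        -- ... attribute8 through attribute50 ...
        PRIMARY KEY (id, tableid)
    );

    -- The "CARS" query after META > FIELDS has mapped the logical columns:
    SELECT attribute2              AS brand,
           attribute5              AS class,
           CAST(attribute6 AS int) AS year,
           attribute7              AS price
    FROM data_50
    WHERE tableid = 42                       -- the "CARS" table
      AND CAST(attribute6 AS int) > 2017
      AND attribute5 = 'A';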
However, moving forward we are a bit worried about:
Managing such an environment for more tenants as we add more tables (e.g. 50 per customer, with roughly 1-5 million rows per TABLE; 5 million is the maximum we allow, anything bigger goes to BigQuery), which gives us 50-250 million rows in a single DATA_X table. This might hurt query performance, especially since we expose simple WHERE statements (less, equal, null, etc.) through an abstraction language, e.g. GET CARS [BRAND,CLASS,PRICE...] FILTER [EQ(CLASS,A),MT(YEAR,2017)], designed to be similar to JQL (Jira Query Language).
Transaction locking, as we allow batch uploads of CSV into DATA_X; when someone loads, e.g., 1 GB of data, it effectively locks the DATA table against other systems that need to access it.
Keeping many NULL-able columns, which can affect storage a bit (for now we are not too worried: at TABLE creation the customer decides how many columns they want, and based on that we assign the TABLE to one of the hardcoded entities DATA_5, DATA_10, DATA_15, DATA_20, DATA_30 or DATA_50, where the number corresponds to the limit on attribute columns; we also support a migration option if they decide to switch from 5 to 10 attributes, etc.).
We are at a very early stage, so we can and should make these changes before we scale. We knew this was most likely not the best approach, but we kept it to get the project running for small customers, and for now it is working just fine.
We also considered JSONB columns, but that is not an option, as we want to keep reading the data simple.
What do you think about this solution? (FYI: DATA has a composite PRIMARY KEY (ID, TABLEID) and a built-in CreatedAt column which is used in most of the queries, so there will be at most 3 indexes.)
If it seems bad, what would you recommend as an alternative based on the details I shared (basically a schema-less RDBMS)?
IMHO, I anticipate issues when you want to join tables and also when using CAST, etc.
We followed the approach below, which may be of help to you.
We have a table called Cars and also a couple of tables like CarsMeta and CarsExtension. The underlying Cars table has all the common fields for all tenants. The CarsMeta table describes what types of columns you can have for extending the Cars entity. In the CarsExtension table, you have columns like StringCol1...5, IntCol1...5, LongCol1...10.
This way, you can easily filter the data as well:
If you have a filter on the base table, perform the search; if results are found, match the ids against the CarsExtension table to get the extended rows for this entity.
If the filter is on the extended fields, search the extension table and match against the base entity ids.
The extension table is organized like below:
id - UniqueId
entityid - uniqueid (points to the primary key of the entity)
StringCol1 - string,
...
IntCol1 - int,
...
This makes it easy to join the entity with its extension fields and get the data together.
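A rough sketch of that layout and the join, with assumed names and types (not our exact schema):

    CREATE TABLE Cars (
        id    bigint PRIMARY KEY,
        brand varchar(100),              -- common fields shared by all tenants
        class varchar(10)
    );

    CREATE TABLE CarsExtension (
        id         bigint PRIMARY KEY,
        entityid   bigint NOT NULL REFERENCES Cars (id),
        StringCol1 varchar(255),
        IntCol1    int,
        LongCol1   bigint
    );

    -- Filter on an extended field, then pull the base entity via the id match:
    SELECT c.id, c.brand, c.class, e.IntCol1 AS year
    FROM Cars c
    JOIN CarsExtension e ON e.entityid = c.id
    WHERE e.IntCol1 > 2017;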
If table metadata and data have to be inferred from separate tables, it will be difficult to maintain over a long period of time and with a huge volume of data.
HTH

IBM DB2 Timetravel logging based on some criteria

I have been looking for a way to enable time travel on a certain table in DB2, but without capturing all the updates done, only the updates done by a specific user.
I wanted to know if this is at all possible with DB2 time travel and how it can be achieved.
It's not possible with DB2 temporal tables.
Instead, alter the temporal table to add a user column maintained by the system.
DB2 for iSeries column shown:
EMP_CHANGE_USER VARCHAR(18) GENERATED ALWAYS AS (USER)
The new column will automatically flow into the history table of the temporal table. You can report on the history table and filter on emp_change_user.
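A minimal sketch based on the column above (the table names are assumptions, and the exact syntax varies slightly between DB2 platforms):

    -- Add the system-maintained user column to the temporal table:
    ALTER TABLE employees
        ADD COLUMN emp_change_user VARCHAR(18) GENERATED ALWAYS AS (USER);

    -- Later, report on the history table and filter by the captured user:
    SELECT *
    FROM employees_history
    WHERE emp_change_user = 'SOMEUSER';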
Note: in real life, don't single out users. You can give management a report that lists all users, and management can filter it down to individuals. Programmers should not single out users for reporting and logging.

How to filter a dashboard based on quick filter values selected in Tableau?

I have Dashboard-1 whose data source is SQL Server Table-A, which has columns
Col1, Col2, Col3
Now I'm creating a new Dashboard-2 whose data source is Table-B, which has columns Col1, Col4, Col5.
But Col1, which is common to both tables, doesn't have common data.
E.g. Col1 from Table-A has records up to 100 and Table-B has records from 101 onward. Also, the data is not static; it keeps increasing in Table-B. Table-A is no longer being populated, but we still need its data.
Problem 1: How to merge two columns into a single column for a filter in Tableau?
Problem 2: In the dashboard I need to show a single filter as a union of Col1 from both tables; if the user selects a value < 100 then Dashboard-1 should open, otherwise Dashboard-2.
Can someone suggest a correct approach?
1) Instead of merging after you have brought the data in, try merging the data using SQL UNION (see the sketch after this list).
2) If that's not possible, do the same after importing both datasets into Tableau; for an example, see this official link.
3) Try different joins to see which one works for merging your table columns.
4) If all of the above fails, try setting up an Action Filter as explained in this link. Essentially you have to use Tiled Containers instead of Floating Containers and set up an action filter using a custom Parameter. This custom Parameter will display Dashboard-1 when the user selects < 100 in the filter (for example) and Dashboard-2 when the user selects > 100 (again, for example).
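A minimal sketch of option 1, with assumed table and column names; the NULL placeholders keep the column lists compatible so Col1 becomes a single filterable column in the combined source:

    SELECT Col1, Col2, Col3, NULL AS Col4, NULL AS Col5, 'A' AS SourceTable
    FROM TableA
    UNION ALL
    SELECT Col1, NULL AS Col2, NULL AS Col3, Col4, Col5, 'B' AS SourceTable
    FROM TableB;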

iPhone Dev - Trying to access every row of a sqlite3 table sequentially

This is my first time using SQL at all, so this might sound basic. I'm making an iPhone app that creates and uses a sqlite3 database (I'm linking the libsqlite3.dylib library and importing "sqlite3.h"). I've been able to correctly create the database and a table in it, but now I need to know the best way to get stuff back from it.
How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order that it is in the table. What I want to do (if this helps) is get all the info from the various fields in a single row, put all that into one object, and then store the object in an array, and then do the same for the next row, and the next, etc. At the end, I should have an array with the same number of elements as I have rows in my sql table. Thank you.
My SQL is rusty, but I think you can use SELECT * FROM myTable and then iterate through the results. You can also use a LIMIT/OFFSET(1) structure if you do not want to retrieve all elements at once from your table (for example, due to memory concerns).
(1) Note that this can perform unexpectedly badly, depending on your use case. Look here for more info...
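A small sketch of both approaches (table name assumed); note that without an ORDER BY the paging order is not guaranteed, so include one (see the next answer about ordering):

    -- Everything at once:
    SELECT * FROM myTable;

    -- Or page through the table to limit memory use:
    SELECT * FROM myTable ORDER BY rowid LIMIT 50 OFFSET 0;   -- rows 1-50
    SELECT * FROM myTable ORDER BY rowid LIMIT 50 OFFSET 50;  -- rows 51-100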
How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order that it is in the table.
That is not how SQL works. Rows are not kept in the table in a specific order as far as SQL is concerned. The order of rows returned by a query is determined by the ORDER BY clause in the query, e.g. ORDER BY DateCreated, or ORDER BY Price.
But SQLite has a rowid virtual column that can be used for this purpose; it reflects the sequence in which the rows were inserted. Note that rowid values can change after a VACUUM, but if you declare an explicit INTEGER PRIMARY KEY column it aliases the rowid and stays constant.
ORDER BY rowid
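A minimal sketch (table and column names assumed): declaring an explicit INTEGER PRIMARY KEY aliases SQLite's rowid, so the insertion order survives a VACUUM:

    CREATE TABLE items (
        id    INTEGER PRIMARY KEY,   -- stable alias for rowid
        name  TEXT,
        price REAL
    );

    -- Retrieve every row in the order it was inserted:
    SELECT id, name, price FROM items ORDER BY id;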

Postgres full text search across multiple related tables

This may be a very simplistic question, so apologies in advance, but I am very new to database usage.
I'd like to have Postgres run its full text search across multiple joined tables. Imagine something like a model User, with related models UserProfile and UserInfo. The search would only be for Users, but would include information from UserProfile and UserInfo.
I'm planning on using a GIN index for the search. I'm unclear, however, on whether I'm going to need a separate tsvector column in the User table to hold the aggregated tsvectors from across the tables, and to set up triggers to keep it up to date, or whether it's possible to create an index without a tsvector column that keeps itself up to date whenever any of the relevant fields in any of the relevant tables change. Any tips on the syntax of the commands to create all this would be much appreciated as well.
Your best answer is probably to have a separate tsvector column in each table (with an index on it, of course). If you aggregate the data up into a shared tsvector, that will create a lot of updates on the shared column whenever the individual ones update.
You will need one index per table. Then when you query, you need multiple WHERE clauses, one for each field. PostgreSQL will automatically figure out which combination of indexes to use to give you the quickest results, likely using bitmap scanning. It makes your queries a little more complex to write (since you need multiple column-matching clauses), but it keeps the flexibility to query only some of the fields when you want to.
You cannot create one index that tracks multiple tables. To do that you need the separate tsvector column and triggers on each table to update it.
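A hedged sketch of that setup for one of the related tables, using the built-in tsvector_update_trigger (table and column names here are assumptions):

    ALTER TABLE user_profile ADD COLUMN search tsvector;
    UPDATE user_profile SET search = to_tsvector('english', coalesce(bio, ''));
    CREATE INDEX user_profile_search_idx ON user_profile USING gin (search);
    CREATE TRIGGER user_profile_search_update
        BEFORE INSERT OR UPDATE ON user_profile
        FOR EACH ROW EXECUTE PROCEDURE
            tsvector_update_trigger(search, 'pg_catalog.english', bio);

    -- Repeat the same pattern for user_info, then query with one match
    -- clause per table; the planner can combine the GIN indexes:
    SELECT u.id, u.name
    FROM users u
    JOIN user_profile p ON p.user_id = u.id
    JOIN user_info    i ON i.user_id = u.id
    WHERE p.search @@ to_tsquery('english', 'climbing')
       OR i.search @@ to_tsquery('english', 'climbing');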