Combining two data sources with the exact same schema in Tableau - tableau-api

I have two data sources containing lists of orders with the exact same field structure (one is the archive, the other is the active database). I'm accessing them through an OData connection in Tableau.
What I want is to combine these two data sources so the Tableau chart displays all order numbers and information (as opposed to just the active ones, which is what I get today with a single data source).
The two tables don't overlap (since whatever is archived is by definition not active), so I cannot join or blend on the primary key Order No. (or on any key, for that matter).
How can I combine these data sources? Does the fact that the connection is OData make any difference?

For relational databases, the solution is to define custom SQL with two back-to-back SELECT statements (one for each table) separated by the SQL UNION ALL keywords.
I don't know whether OData sources support UNION ALL.
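For a relational source, that custom SQL is just a sketch along these lines (orders_active, orders_archive, and the column list are placeholders; substitute your real table and column names):

    -- All active orders...
    SELECT order_no, customer, order_date, status
    FROM orders_active
    UNION ALL
    -- ...followed by all archived orders. UNION ALL keeps every row, and since
    -- the two tables never overlap there are no duplicates to worry about.
    SELECT order_no, customer, order_date, status
    FROM orders_archive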

Create a TDE file for the Archive data and add this file to the current data extract using the "Append data from file" option under Data --> Extract.

Related

ELT pipeline for Mongo

I am trying to get my data into Amazon Redshift using Fivetran, but have some questions about the ELT/ETL process in general. My source database is Mongo, but I want to perform deep analysis on the data using a 3rd-party BI tool like Looker, which integrates with SQL. I am new to the ELT/ETL process and was wondering whether it would look like this:
Extract data from Mongo (handled by Fivetran)
Load into Amazon Redshift (handled by Fivetran)
Perform Transformation - This is where my biggest knowledge gap is. I obviously have to convert objects and arrays into compatible SQL types. I can perform a transformation on all objects to extract them into columns, and transform all arrays into tables. Is this the right idea? Should I design a MySQL schema and write all the transformations according to that schema design?
As you state, Fivetran will load your data into Redshift, putting individual fields in columns where it can and putting everything else into varchar columns as JSON. So at that point you basically have a data lake - all your data in an analytical platform, still basically in source format, and available for you to do whatever you want with it.
Initially, if you don't know much about your data and just want to investigate it, you can probably leave it as it is. Redshift has SQL functions that allow you to query the elements of a JSON structure, so there is no need to build additional tables and more ETL just to investigate your data - especially as those tables may get thrown away once you understand your data and decide what you want to do with it.
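For example (the table and column names below are made up, but JSON_EXTRACT_PATH_TEXT and JSON_EXTRACT_ARRAY_ELEMENT_TEXT are standard Redshift functions), you can poke at the JSON directly:

    -- Inspect attributes inside a JSON varchar column without building any new tables
    SELECT
        id,
        JSON_EXTRACT_PATH_TEXT(raw_json, 'customer', 'email') AS customer_email,
        -- first element of a nested array, still returned as text
        JSON_EXTRACT_ARRAY_ELEMENT_TEXT(JSON_EXTRACT_PATH_TEXT(raw_json, 'tags'), 0) AS first_tag
    FROM mongo_orders_raw
    LIMIT 100;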
If you have proper reporting requirements, then that is the point where you can start to design a schema that will support those requirements (I'm not sure why you suggested a MySQL schema, as MySQL is a database product?). Traditionally an analytical schema would be designed as a Kimball dimensional model (facts and dimensions), but the type of schema you decide to design will depend on:
The database platform you are using (in your case, Redshift) and the type of structures it works best with e.g. star schema or "flat" tables
The BI tool you are using and how it expects to have data presented to it
For example (and I'm not saying this is a real-world example), if Redshift works OK with star schemas but better with flat tables, and Looker has to have a star schema, then it probably makes more sense to build star schemas in Redshift, as this is a single modelling exercise - rather than model flat tables in Redshift and then have to model star schemas in Looker.
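Purely as an illustration of what that modelling step might produce (all table and column names here are invented), building a small star schema on top of the raw JSON that Fivetran landed could look something like this in Redshift:

    -- Dimension: one row per customer, pulled out of the raw JSON documents
    CREATE TABLE dim_customer AS
    SELECT DISTINCT
        JSON_EXTRACT_PATH_TEXT(raw_json, 'customer', 'id')   AS customer_id,
        JSON_EXTRACT_PATH_TEXT(raw_json, 'customer', 'name') AS customer_name
    FROM mongo_orders_raw;

    -- Fact: one row per order, keyed to the customer dimension
    CREATE TABLE fact_orders AS
    SELECT
        JSON_EXTRACT_PATH_TEXT(raw_json, '_id')                          AS order_id,
        JSON_EXTRACT_PATH_TEXT(raw_json, 'customer', 'id')               AS customer_id,
        CAST(JSON_EXTRACT_PATH_TEXT(raw_json, 'total') AS DECIMAL(12,2)) AS order_total
    FROM mongo_orders_raw;

The flat-table alternative would simply be a single CREATE TABLE AS combining both sets of columns.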
Hope this helps?
It depends on how you need the final stage of your data analysis presented, and what the purpose of your data analysis is. As stated by NickW, assuming you need to integrate your data into a BI tool, the schema should be adapted to the tool's data format requirements.
A MongoDB ETL/ELT process might look like this:
Select Connection: select the connection you have set up.
Collection Name: choose the collection using the [database].[collection] format.
If you're pulling data from your authentication database, only the [collection] name can be specified. Example: sample.products.
Extract Method:
All: pull all the data in the table.
Incremental: pull data by an incremental value.
Incremental Attributes: set the name of the incremental attribute to run by, e.g. UpdateTime.
Incremental Type: Timestamp | Epoch. Choose the type of the incremental attribute.
Choose Range:
In Timestamp, choose your date increment range to run by.
In Epoch, choose the value increment range to run by.
If no End Date/Value is entered, the default is the last date/value in the table.
The increment will be managed automatically.
Include End Value: whether the increment process should include the end value or not.
Interval Chunks: the chunks the data will be pulled in. Split the data by minutes, hours, days, months, or years.
Filter: Filter the data to pull. The filter format will be a MongoDB Extended JSON.
Limit: Limit the rows to pull.
Auto Mapping: you can choose the set of columns you want to bring in, add a new column, or leave it as it is.
Converting an Entire Key's Data to a STRING
In cases where the data is not what the target expects, such as key names starting with numbers, or flexible and inconsistent object data, you can convert attributes to a STRING format by setting their data types to STRING in the mapping section.
The conversion applies to any value under that key.
Arrays and objects will be converted to JSON strings.
Use cases:
Here are a few filtering examples:
{"account":{"$oid":"1234567890abcde"}, "datasource": "google", "is_deleted": {"$ne": true}}
date(MODIFY_DATE_START_COLUMN) >=date("2020-08-01")

How to copy Tableau Data Extract logic?

Someone in my org created a Data Extract. There is an issue in one of the worksheets that uses it, and we suspect it's due to a mistake in how the Union was built.
But since it's a Data Extract, I can't see the UI for the data merge. Is there any way to take a current Data Extract and view the logic that creates it?
Download the extract from the server (I'm assuming you're using server), then open that extract using desktop. You should be able to see the details of it.
Before going too deep into extract details, note that extracts are not intended to be permanent systems of record for data - just an efficient way to work with query results for optimized reporting. So in general, you should always be able to throw away the extract and look at the original source - or recreate the extract on command. But life isn't always perfect so ...
If you use Tableau Desktop to look at your worksheet, and look at the data source icon at the top of the data pane in the left sidebar, do you see an icon for your data source that looks like two databases with one on top of (shadowing) the other? If so, you can right-click on the data source icon and view its properties to see the source database table or file path. You can then even try disabling the extract to view the original source data.
If instead you see a single database icon, you have a "naked" extract where you've discarded the reference to the original source (unless it is stored in the catalog mentioned below).
If your organization purchased the Data Management Add-on for Tableau Server (strongly recommended), then if your data source is published to Tableau Server you can trace its history and origin by exploring the Tableau Catalog. That is especially valuable if the extract was built by a Tableau Prep Flow.
If instead, someone built the extract another way, say by writing a custom app using the Tableau Data Extract API, then the answer is to find that program.
One last point: in recent versions of Tableau, extracts are stored in an efficient relational-style database file format called Hyper. A Hyper extract can either be a single table (say, serializing the results of a query joining multiple tables) or contain multiple tables (say, caching individual tables and deferring the join until later).
That may not be relevant to your question, but could turn out to matter as you reverse engineer how the extract was created.

Data integration, multiple databases, unique incremental SOR_id using Talend

I'm trying to integrate multiple databases using Talend and in turn have an SOR_id for each table for auditing purposes. Is it possible to map multiple source tables simultaneously to a destination table that has an SOR_id which is meant to be auto-incremented? Would I get incremental values for each source table's rows?
I have approached this another way, as shown in the image, so that my SOR_id can be accounted for.
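For reference only (this is not the approach from the image), a generic sketch of the destination-side option is to let the target table assign the SOR_id itself via an identity column; the table and column names are invented and the identity syntax varies by database:

    -- The destination table assigns SOR_id, so rows mapped in from any number
    -- of source tables all receive a unique, incrementing audit value.
    CREATE TABLE target_orders (
        sor_id        BIGINT GENERATED ALWAYS AS IDENTITY,  -- auto-incremented audit key
        source_system VARCHAR(32),                          -- which source DB the row came from
        order_no      VARCHAR(32),
        order_date    DATE
    );

    -- Each Talend output (one per source) inserts only the business columns;
    -- sor_id is filled in automatically on every insert, regardless of source.
    INSERT INTO target_orders (source_system, order_no, order_date)
    VALUES ('archive_db', 'A-1001', DATE '2020-01-15');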

(JasperReports) Combine data from different datasources as columns of the same report row

I am evaluating JasperReports (CE) as a reporting solution for one of my clients.
So far I like it very much and it looks like a pretty solid platform. One thing I cannot find info about is the possibility of combining the results of sub-queries made to different datasources in one report (not as drill-down sub-reports, but as different columns of the same row).
As an example: there is some product info in one database (Firebird), but the sales info, actual stock, and purchase prices are stored in a different system, which uses a different database (Microsoft SQL Server). In both databases, products are identified by the same unique product code. So I need to query the first database to obtain the "master recordset" that fills some report columns, and then query each product's additional info, which is stored in the second database, combining the resulting data from both datasources in the same row, as different columns of the same report.
Is it possible with JasperReports? If not, I'd appreciate your suggestions on other reporting solutions being able to fulfill my request.
Since your row data comes from different DBs, you need to query the required tables in both DBs, build a BeanDatasource from the result sets, and pass it on to JasperReports.

DB2/400 Query - record format level identifiers for all tables in a library

We have multiple copies of the same library for testing, QA, development etc. consisting of hundreds of tables. Over time these libraries got out of sync and we run into a lot of level check problems. I would like to list all tables with a different Record Level Format Identifier from the corresponding tables in a model library. Is this possible using SQL? If not what other choices do we have?
A quick peek into SYSTABLES didn't show anything, but the QDBRTVFD API has that information in the file definition header. If APIs are not your thing, you can use DSPFD FILE(somelib/*ALL) TYPE(*RCDFMT) OUTPUT(*OUTFILE) FILEATR(*PF *LF) OUTFILE(QTEMP/RCDFMTS) to create a file you CAN use SQL on.
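As a sketch only: the outfile created by DSPFD TYPE(*RCDFMT) is based on the QAFDRFMT model file, and the column names below (RFFILE for the file, RFNAME for the record format, RFID for the format level identifier) are assumptions you should verify against the outfile layout on your system. Comparing a test library against the model library could then look like:

    -- First dump both libraries' record format info:
    --   DSPFD FILE(MODLIB/*ALL)  TYPE(*RCDFMT) OUTPUT(*OUTFILE) FILEATR(*PF *LF) OUTFILE(QTEMP/MODELFMT)
    --   DSPFD FILE(TESTLIB/*ALL) TYPE(*RCDFMT) OUTPUT(*OUTFILE) FILEATR(*PF *LF) OUTFILE(QTEMP/TESTFMT)
    -- Then list every file/format whose level identifier differs from the model
    SELECT t.RFFILE, t.RFNAME,
           m.RFID AS model_level_id,
           t.RFID AS test_level_id
    FROM QTEMP.TESTFMT  t
    JOIN QTEMP.MODELFMT m
      ON  m.RFFILE = t.RFFILE
      AND m.RFNAME = t.RFNAME
    WHERE m.RFID <> t.RFID;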