Tableau: Difference between 2 sets

I have two different data sources (Users and Shipments). I wish to display all the users who didn't receive any shipment in a specific month, which will be selected by the user.

Assuming you are working with structured data, you can use Tableau's data munging features. See this Tableau Knowledge Base article. Also, Tableau supports joins across different data stores, a.k.a. Data Blending. See this Tableau video for more on data blending.
Also based on your question, I believe you must be relatively new to Tableau. I was in your position about a year ago. Watch Tableau's free On Demand Training videos on its website.
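Outside of Tableau, the underlying logic here is a left anti-join between Users and Shipments restricted to the chosen month. Here is a minimal pandas sketch of that idea; the column names user_id and ship_date are just assumptions for illustration:

    import pandas as pd

    # Hypothetical Users and Shipments sources; adjust column names to your data.
    users = pd.DataFrame({"user_id": [1, 2, 3, 4], "name": ["Ann", "Bob", "Cid", "Dee"]})
    shipments = pd.DataFrame({
        "user_id": [1, 3, 3],
        "ship_date": pd.to_datetime(["2023-05-02", "2023-05-10", "2023-06-01"]),
    })

    selected_month = pd.Period("2023-05", freq="M")  # the month the user picks

    # Shipments in the selected month, then users with no match (anti-join).
    in_month = shipments[shipments["ship_date"].dt.to_period("M") == selected_month]
    no_shipment = users[~users["user_id"].isin(in_month["user_id"])]
    print(no_shipment)  # users 2 and 4 received nothing in May 2023

In Tableau itself the same result is usually reached by blending the two sources on the user field, filtering on the selected month, and keeping the rows where the blended shipment measure is null.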

Related

Tableau - Data shown on map based on filter

I am new to Tableau and I've tried Googling but can't find an answer on my issue (perhaps my keywords are wrong).
I have a set of data like this:
I am able to create three individual worksheets to show on the map how many small, medium, and big houses there are. It is linked to a second data set which pulls out the highest selling price for a house of that category and location.
I want to learn if it's possible to combine all three individual worksheets into one worksheet so that I can filter based on the size of the house (small, medium, big). If not, can I do the filtering on the dashboard instead?
Thanks in advance!
You need to reshape the data to make it flatter. See the Tableau documentation on pivoting data.
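If you would rather reshape outside of Tableau, here is a minimal pandas sketch of the same pivot, assuming a hypothetical wide layout with one count column per house size:

    import pandas as pd

    # Hypothetical wide-format source: one count column per house size.
    wide = pd.DataFrame({
        "location": ["North", "South"],
        "small": [12, 7],
        "medium": [5, 9],
        "big": [2, 4],
    })

    # Melt into a flat/tall shape: one row per location and house size.
    flat = wide.melt(id_vars="location", var_name="house_size", value_name="count")
    print(flat)

With the data in this shape, a single worksheet can use house_size as an ordinary filter instead of needing three separate worksheets.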

Tableau extract vs live

I just need a bit more clarity around Tableau extract vs. live. I have 40 people who will use Tableau and a bunch of custom SQL scripts. If we go down the extract path, will the custom SQL queries run only once, with all instances of Tableau using a single result set, or will each instance of Tableau run the custom SQL separately and only cache those results locally?
There are some aspects of your configuration that aren't completely clear from your question. Tableau extracts are a useful tool: they are essentially a temporary, but persistent, cache of query results. They act much like a materialized view in many respects.
You will usually want to employ your extract in a central location, often on Tableau Server, so that it is shared by many users. That's typical. With some work, you can make each individual Tableau Desktop user have a copy of the extract (say by distributing packaged workbooks). That makes sense in some environments, say with remote disconnected users, but is not the norm. That use case is similar to sending out data marts to analysts each month with information drawn from a central warehouse.
So the answer to your question is that Tableau provides features that you can employ as you choose to best serve your particular use case: either replicated or shared extracts. The trick is then just to learn how extracts work and employ them as desired.
The easiest way to have a shared extract is to publish it to Tableau Server, either embedded in a workbook or separately as a data source (which is then referenced by workbooks). The easiest way to replicate extracts is to export your workbook as a packaged workbook, after first making an extract.
A Tableau data source is the metadata that references an original source, e.g. a CSV file, a database, etc. A Tableau data source can optionally include an extract that shadows the original source. You can refresh or append to the extract to see new data. If it is published to Tableau Server, you can have the refreshes happen on a schedule.
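If the extract lives on Tableau Server, a refresh can also be triggered programmatically. Here is a rough sketch using the tableauserverclient Python library; the server URL, credentials, and data source name are placeholders, not anything from the question:

    import tableauserverclient as TSC

    # Placeholder connection details.
    auth = TSC.TableauAuth("analyst", "secret", site_id="mysite")
    server = TSC.Server("https://tableau.example.com", use_server_version=True)

    with server.auth.sign_in(auth):
        datasources, _ = server.datasources.get()
        ds = next(d for d in datasources if d.name == "Shipments Extract")
        job = server.datasources.refresh(ds)  # queues an extract refresh job on the server
        print("refresh job queued:", job.id)

In practice most people simply attach a refresh schedule to the published data source in the Tableau Server UI; the API route is mainly useful when the refresh has to be chained after an ETL job.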
Storing the extract centrally on Tableau Server is beneficial, especially for data that changes relatively infrequently. You can capture the query results, offload work from the database, reduce network traffic, and speed up your visualizations.
You can further improve performance by filtering (and even aggregating) extracts to hold only the data needed to display your viz. This is very useful for large data sources like web server logs, where the aggregation is done once at extract creation time. Extracts can also simply capture the results of long-running SQL queries instead of repeating them at visualization time.
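For instance, here is a minimal pandas sketch of pre-aggregating web server logs to a daily grain before building the extract; the file name and columns (timestamp, url, bytes) are hypothetical:

    import pandas as pd

    # Hypothetical raw log export: one row per request.
    logs = pd.read_csv("access_log.csv", parse_dates=["timestamp"])

    # Aggregate once, up front, to the grain the viz needs (daily hits per URL),
    # and point the Tableau extract at this much smaller table instead of the raw rows.
    daily = (
        logs.assign(day=logs["timestamp"].dt.date)
            .groupby(["day", "url"], as_index=False)
            .agg(hits=("url", "size"), bytes_sent=("bytes", "sum"))
    )
    daily.to_csv("access_log_daily.csv", index=False)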
If you do make aggregated extracts, just be careful that any further aggregation you do in the visualization makes sense. SUMs of SUMs and MINs of MINs are well defined; averages of averages, etc., are not always meaningful.
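A tiny numeric illustration of the averages-of-averages trap (plain Python, nothing Tableau-specific):

    # Two groups of very different size: re-averaging their averages ignores the weights.
    group_a = [10, 10, 10, 10]   # 4 rows, average 10
    group_b = [100]              # 1 row,  average 100

    avg_of_avgs = (sum(group_a) / len(group_a) + sum(group_b) / len(group_b)) / 2
    true_avg = (sum(group_a) + sum(group_b)) / (len(group_a) + len(group_b))

    print(avg_of_avgs)  # 55.0 -> misleading
    print(true_avg)     # 28.0 -> correct overall average

Sums and row counts, on the other hand, can be stored in the aggregated extract and recombined safely, which is why SUMs of SUMs stay well defined.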
If you use an extract, it will behave like a materialized SQL table, so changes in the underlying source will not influence the result until the extract is refreshed.
An extract is used when the data needs to be processed very fast. In this case, a copy of the source data is stored in Tableau's in-memory engine, so query execution is very fast compared to a live connection. The only problem with this method is that the data won't automatically update when the source data is updated.
A live connection is used when handling real-time data. Here each query goes directly to the source data, so performance won't be as good as with an extract.
If you need to work on a static database, use an extract; otherwise, use a live connection.
I get the feeling from your question that you are worried about performance issues, which is why you are wondering whether your users should use a Tableau extract or a live connection.
In my opinion, for both cases (live vs. extract) it all depends on your infrastructure and the size of the table. It makes no sense to make an extract of a huge table that would take hours to download (for example, 1 billion rows and 400 columns).
If all your users connect directly to a database (not to Tableau Server), you may run into different issues. If the tables they are connecting to are relatively small and your database handles multiple concurrent users well, that may be OK. But if your database has to run many resource-intensive queries in parallel, on big tables, on a database that is not optimized for many simultaneous users and is located in a different time zone with high latency, that will be a nightmare to solve. In the worst-case scenario you may have to change your data structure and upgrade your infrastructure to allow 40 users to access the data simultaneously.

Improve calculation efficiency in Tableau

I have over 300k records (rows) in my dataset and I have a tab that stratifies these records into different categories and does conditional calculations. The issue with this is that it takes approximately an hour to run this tab. Is there any way that I can improve calculation efficiency in Tableau?
Thank you for your help,
The issue is probably accessing your source data. I have run into this problem when working directly against live data in an SQL database.
An easy solution is to use extracts. Quoting Tableau's Improving Database Query Performance article:
Extracts allow you to read the full set of data pointed to by your data connection and store it into an optimized file structure specifically designed for the type of analytic queries that Tableau creates. These extract files can include performance-oriented features such as pre-aggregated data for hierarchies and pre-calculated calculated fields (reducing the amount of work required to render and display the visualization).
Edited
If you are using an extract and you still have performance issues, I suggest massaging your source data to be more Tableau-friendly, for example by generating pre-calculated fields in your ETL process.
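As a sketch of that last suggestion, here is one way to pre-compute a conditional category during ETL with pandas, so Tableau no longer evaluates an IF/ELSEIF calculated field over 300k rows at render time; the file and column names (records.csv, amount) are hypothetical:

    import pandas as pd

    # Hypothetical source table; 'amount' drives the conditional stratification.
    df = pd.read_csv("records.csv")

    # Pre-compute the category once in ETL instead of in a Tableau calculated field.
    df["amount_band"] = pd.cut(
        df["amount"],
        bins=[0, 100, 1000, float("inf")],
        labels=["low", "medium", "high"],
    )
    df.to_csv("records_enriched.csv", index=False)  # build the extract from this file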

Filtering Values from Two Data Sets

I am very new to Tableau and am having trouble finding a way to filter between two data sets. These sets are Tableau Data Extracts so I am unable to create custom SQL to achieve this.
In DataSet1 I have levels of precipitation by date.
In DataSet2 I have sales revenue per date and store location.
I am trying to visualize the sum of sales revenue per store location on only days with precipitation. I thought I would be able to simply create a filtered list of all dates in DataSet1 that saw precipitation then subsequently filter all dates in DataSet2 to = my filtered list.
Any thoughts on how I would go about this? I feel like it should be relatively simple, but being so unfamiliar with the software I am having trouble locating a solution.
Thanks!
Please look into Data Blending for Tableau. It is an interesting feature in Tableau for linking multiple data sources in one sheet. It can get laggy at times, so use it carefully.
Some Tutorial Links:
http://www.tableau.com/learn/tutorials/on-demand/data-blending-0
http://kb.tableau.com/articles/knowledgebase/relate-summarized-data-60
Tableau also has an option to edit relationships between data sources, which is essentially Data Blending.
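For reference, the logic the question describes is a semi-join on date: keep only sales rows whose date appears among the precipitation dates. A minimal pandas sketch with hypothetical file and column names:

    import pandas as pd

    precip = pd.read_csv("precipitation.csv", parse_dates=["date"])  # DataSet1 (assumed columns)
    sales = pd.read_csv("sales.csv", parse_dates=["date"])           # DataSet2 (assumed columns)

    # Dates that actually saw precipitation.
    wet_dates = precip.loc[precip["precipitation"] > 0, "date"].unique()

    # Keep only sales on those dates, then sum revenue per store location.
    wet_sales = sales[sales["date"].isin(wet_dates)]
    revenue = wet_sales.groupby("store_location", as_index=False)["revenue"].sum()
    print(revenue)

The blending approach in Tableau achieves much the same thing: link the sources on date, filter to dates with precipitation above zero, and sum revenue by store location.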

Where can NoSQL be successfully implemented?

I took the time to watch Hadi Hariri's entire presentation, CouchDB for .NET Developers, from the OreDev conference last year.
And I keep asking myself: where should I use this way of storing data?
What small, medium, and large examples are there of using a NoSQL model?
In what application context would I save data as JSON that does not follow a fixed schema? In what application context would retrieving such data be better and faster (over the application's lifetime) compared to getting it from a SQL server? Licensing cost? Is that the only advantage?
Let me share our case: we use a document-type NoSQL system to store our documents and search them in full text. This requires full-text indexing. We also do a facet search on the entire data set; that is, we produce only a "hit" count for a specific search, broken down into the categories we need. Imagine an electronic shop selling photo cameras: a facet search here could be over price ranges, so you would be able to say which types of cameras fall into which price range.
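If the document store is Apache Solr (which comes up below), a price-range facet like the camera example can be expressed roughly as follows; the core name, field names, and ranges are placeholders:

    import requests

    # Placeholder Solr core and field names; adjust to your schema.
    params = {
        "q": "type:camera",
        "rows": 0,                 # only facet counts are needed, not the documents
        "facet": "true",
        "facet.range": "price",
        "facet.range.start": 0,
        "facet.range.end": 2000,
        "facet.range.gap": 500,    # buckets: 0-500, 500-1000, ...
        "wt": "json",
    }
    resp = requests.get("http://localhost:8983/solr/products/select", params=params)
    print(resp.json()["facet_counts"]["facet_ranges"]["price"]["counts"])

Each bucket count tells you how many cameras fall into that price range, which is exactly the "hit count broken down by category" behaviour described above.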
If you think about using a NoSQL system for document search, then a small dataset would be on the order of gigabytes (let's say up to 10 GB), a medium one up to 100 GB, and a large dataset up to 1 TB. This is based on what I have seen people use Apache Solr for (from their mailing list) and on the data volume we have at our company.
There are other types of NoSQL systems and associated use / business cases, where you can utilize them in conjunction with SQL systems or on their own. You can have a look at this short PowerPoint presentation I made for an introductory talk on NoSQL systems: http://www.slideshare.net/dmitrykan/nosql-apache-solr-and-apache-hadoop