Power BI Allows Only one Filtering Path Between Tables in a Data Model - filtering

I am trying to add relationships for multiple tables but Power Bi is only allowing ONE table to be with "Both" for the "Cross Filter Direction".
If I try to select other tables to be with "Both" cross filter, then an error comes up saying "Power BI Desktop allows only one filtering path between tables in a Data Model". Is there a way to go around that?
I have three data sets ( tables ), that are joined based on two non-unique columns. First dataset (table) has site, item numbers, orders, due dates. Second dataset (table) has site, item numbers, and available quantities. Third dataset has site, item numbers, and released orders to manufacture. The three non-unique columns are sites and item numbers.
Out of these three datasets (tables), I have created two more datasets with one unique columns (sites, and item numbers) to inforce unique relationships between the three datasets.
The problem that I am facing is when I have filtering in the models it doesn't flow back and forth between the datasets. I assume fixing the filtering direction will result in fixing that.
Thanks!

I would not use the Both cross filter direction for that scenario. As you are observing it comes with a lot of restrictions, so I only use it for "Many-to-Many" scenarios - not what you described.
I think you are on the right path with your sites and item number tables. Just be careful to use the copy of the fields in those tables (e.g. item number from item numbers) in your visuals, not the same fields in the 3 "data" tables.

Related

Ignoring space characters when linking tables

I’m experiancing a problem when trying to link to tables in the database expert. The two fields that link the tables have exactly the same information except one table always has an additional space. For example;
Table 1 = Multivitamin/Tablets
Table 2 = Multivitamin//Tablets
‘/‘ are representing spaces
Formulas won’t help (e.g. extractstring etc) as it’s the tables themselves I need to link together
This is preventing me from retrieving the information I need. Any advice on how I can get around this?
There are some ways to come across this:
Consider using a command as datasource instead of tables. When writing the query of the command you can define the join condition yourself.
If you have access to the data source, you could add a calculated field to the tables to contain the normalized field values and then use these for linking in CR.
Alternatively, one could create views in the database, either adding normalized "linking fields" or providing the joined tables results.
If it's only a few rows in CR, you could consider using SQL fields or subreports to retrieve data from Table 2.

Feedback about my database design (multi tenancy)

The idea of the SaaS tool is to have dynamic tables with dynamic custom fields and values of different types, we were thinking to use "force.com/salesforce.com" example but is seems to be too complicated to maintain moving forward, also making some reports to create with a huge abstraction level, so we came up with simple idea but we have to be sure that this is kinda good approach.
This is the architecture we have today (in few steps).
Each tenant has it own separate database on the cluster (Postgres 12).
TABLE table, used to keep all of those tables as reference, this entity has ManyToOne relation to META table and OneToMany relation with DATA table.
META table is used for metadata configuration, has OneToMany relation with FIELDS (which has name of the fields as well as the type of field e.g. TEXT/INTEGER/BOOLEAN/DATETIME etc. and attribute value - as string, only as reference).
DATA table has ManyToOne relation to TABLES and 50 character varying columns with names like: attribute1...50 which are NULL-able.
Example flow today:
When user wants to open a TABLE DATA e.g. "CARS", we load the META table with all the FIELDS (to get fields for this query). User specified that he want to query against: Brand, Class, Year, Price columns.
We are checking by the logic, the reference for Brand, Class, Year and Price in META>FIELDS table, so we know that Brand = attribute2, Class = attribute 5, Year = attribute6 and Price = attribute7.
We parse his request into a query e.g.: SELECT [attr...2,5,6,7] FROM DATA and then show the results to user, if user decide to do some filters on it, based on this data e.g. Year > 2017 AND Class = 'A' we use CAST() functionality of SQL for example SELECT CAST(attribute6 AS int) AND attribute5 FROM DATA WHERE CAST(attribute6 AS int) > 2017 AND attribute5 = 'A';, so then we can actually support most principles of SQL.
However moving forward we are scared a bit:
Manage such a environment for more tenants while we are going to have more tables (e.g. 50 per customer, with roughly 1-5 mil per TABLE (5mil is maximum which we allow, for bigger data we have BigQuery) which is giving us 50-250 mil rows in single table DATA_X) which might affect performance of the queries, especially when we gave possibilities to manage simple WHERE statements (less,equal,null etc.) using some abstraction language e.g. GET CARS [BRAND,CLASS,PRICE...] FILTER [EQ(CLASS,A),MT(YEAR,2017)] developed to be similar to JQL (Jira Query Language).
Transactions lock, as we allow to batch upload CSV into the DATA_X so once they want to load e.g. 1GB of the data, it kinda locks the table for other systems to access the DATA table.
Keeping multiple NULL columns which can affect space a bit (for now we are not that scared as while TABLE creation, customer can decide how many columns he wants, so based on that we are assigning this TABLE to one of hardcoded entities DATA_5, DATA_10, DATA_15, DATA_20, DATA_30, DATA_50, where numbers corresponds to limitations of the attribute columns, and those entities are different, we also support migration option if they decide to switch from 5 to 10 attributes etc.
We are on super early stage, so we can/should make those before we scale, as we knew that this is most likely not the best approach, but we kept it to run the project for small customers which for now is working just fine.
We were thinking also about JSONB objects but that is not the option, as we want to keep it simple for getting the data.
What do you think about this solution (fyi DATA has PRIMARY key out of 2 tables - (ID,TABLEID) and built in column CreatedAt which is used form most of the queries, so there will be maximum 3 indexes)?
If it seem bad, what would you recommend as the alternative to this solution based on the details which I shared (basically schema-less RDBMS)?
IMHO, I anticipate issues when you wanted to join tables and also using cast etc.
We had followed the approach below that will be of help to you
We have a table called as Cars and also have a couple of tables like CarsMeta, CarsExtension columns. The underlying Cars table will have all the common fields for a ll tenant's. Also, we will have the CarsMeta table point out what are the types of columns that you can have for extending the Cars entity. In the CarsExtension table, you will have columns like StringCol1...5, IntCol1....5, LongCol1...10
In this way, you can easily filter for data also like,
If you have a filter on the base table, perform the search, if results are found, match the ids to the CarsExtension table to get the list of exentended rows for this entity
In case the filter is on the extended fields, do a search on the extension table and match with that of the base entity ids.
As we will have the extension table organized like below
id - UniqueId
entityid - uniqueid (points to the primary key of the entity)
StringCol1 - string,
...
IntCol1 - int,
...
In this case, it will be easy to do a join for entity and then get the data along with the extension fields.
In case you are having the table metadata and data being inferred from separate tables, it will be a difficult task to maintain this over long period of time and also huge volume of data.
HTH

Multiple Options to Case When

Is there a way to create several groups in Case When statement?
For example,
CASE [Sales Manager]
WHEN "Manager 1" THEN "Germany"
WHEN "Manager 1" THEN "Russia"
WHEN "Manager 2" THEN "Russia"
END
Such statement will assign Manager 1 only to Germany, while I need to have it for both countries. Any other possible ways to do that ?
One solution is to define a table in your database (or Excel) that maps managers to countries. You just need two columns, one for manager and for country, and a row in the table for each association between a manager and a country.
That way you can easily represent a manager that works with many countries, or a country that has many managers (a many-to-many relationship).
You can then combine that table with your other data using joins or data blending. Realize that when you join data that has a to-many type of relationship that you can in general cause duplicate values to arise in the query results (e.g. the sales quota for a manager can be repeated multiple times, once for each country the manager visits). Unless your filters and work flow eliminate that case, you need to make sure your calculations account for duplication and avoid double counting.
Bottom line -- sometimes it is alot easier to specify information as data than as code.

How to best structure csv data for tableau that has "multiple categories"?

I have a set of 100 “student records”, I want to have checkboxes for each "favorite_food_type" and "favorite_food", whichever is checked would filter a "bar graph" that counts number of reports that contain that specific "favorite_food"type" and "favorite_food" schema could be:
name
favorite_food_type (e.g. vegetable)
favorite_food (e.g. banana)
I would like to in the dashboard be able to select via checkboxes, “Give me all the COUNT OF DISTINCT students with favorite_food of banana, apple, pear“ and filter graphs for all records. My issue is for a single student record, maybe one student likes both banana and apple. How do I best capture that? Should I have:
CASE A: Duplicate Records (this captures the two different “favorite_food”, but now I have to figure out how many students there are (which is one student)
NAME, FAVORITE_FOOD_TYPE,FRUIT
Charlie, Fruit, Apple
Charlie, Fruit, Pear
CASE B: Single Records (this captures the two different “favorite_food”, but is there a way to pick out from delimiters?)
NAME, FAVORITE_FOOD_TYPE,FRUITS
Charlie, Fruit, Apple#Pear
CASE C: Column for Each Fruit (this captures one record per student, but need a loooot of columns for each fruit, many would be false)
NAME, FAVORITE_FOOD_TYPE, APPLE, BANANA, PINEAPPLE, PEAR
Charlie, Fruit, TRUE, FALSE, TRUE, FALSE
I want to do this as easy as possible.
Avoid Case B if at all possible. Repeating information is almost always best handled by repeating rows -- not by cramming multiple values into a single table cell, nor by creating multiple columns such as Favorite_1 and Favorite_2
If you are provided data with multiple values in a field, Tableau does have functions and data connection features that can be used to split a single field into its constituent parts to form multiple fields. That works well with fixed number of different kinds of information -- say splitting a City, State field into separate fields for City and State.
Avoid Case C if at all possible. That cross tab structure makes it hard to analyze the data and make useful visualizations. Each value is treated as a separated field.
If you are provided data in crosstab format, Tableau allows you to pivot the data in the data connection pane to reshape into a form with fewer columns and many rows.
Case A is usually the best approach. You can simplify it further by factoring out repeating information into separated tables -- a process known as normalization. Then you can use a join to recombine the tables and see the repeating information when desired.
A normalized approach to your example would have two tables (or tabs in excel). The first table would have exactly one row per student with 2 columns: name and favorite_food_type. The second table would have a row per student/favorite food combination, with 2 columns: name and favorite_food. Now each student can have as many favorite foods as you like or none at all. Since both columns have a name field, that would be the key used to join (combine) the tables when needed.
Given that table design, you could have 2 data sources in Tableau. The first one just pointed to the student table and could be used to create visualizations that only involved students and favorite_food_types. The second data source would use a (left) join to read from both tables and could be used to look at favorite foods. When working with the second data source, you would have to be careful about reporting information about student names and favorite food types to account for the duplicate information. So use the first data source when possible. Finally, you could put both kinds of visualizations on a dashboard and use filter and highlight actions to make interaction seamless despite the two sources -- getting the best of both worlds.

Efficiently updating cosine similarity scores

My iPhone application is using a SQLite database with the following schema:
items(id, name, ...) -> this table contains 50 records
tags(id, name) -> this table contains 50 records
item_tags(id, item_id, tag_id, user_id)
similarities(id, item1_id, item2_id, score)
The items, tags, item_tags and similarities tables are populated with pre-defined records, hence also the similarities between different items have already been calculated offline (using cosine similarity algorithm based on the items' tags).
Users are able to add additional tags to items and to remove their custom tags later on. Whenever this happens the similarity scores between the items should be updated locally, i.e. without contacting the server application.
My question now is the following:
What is the most efficient way to do so? So far, on startup of the iPhone application, I compute a term-document matrix for all the items and tags (which reflects the tag frequencies for each item) and keep this matrix in memory as long as the application is running. Whenever a tag is added or removed, I use this matrix to update the similarities in the database. However, this is rather inefficient. Do you have any suggestions?
Thanks!
This presentation might help you:
http://www.slideshare.net/jnvms/incremental-itembased-collaborative-filtering-4095306