I have a basic model with a fact table and 2 dimensions (one of them is a Date dimension).
Now, a new date column has been added to the fact table. Therefore I have created a second ‘Dim Date’ and connected it to that column.
I have the following questions:
Will I run into any problems in my .pbix or cube if I use 2 date dimensions?
Should I also mark this new ‘Dim Date’ as a date table (‘Mark as date table’)? Can I have 2 tables marked as date tables?
This new 'Dim Date' will be used only as a filter in the .pbix; I don't plan on using any time intelligence on it.
It depends:
The Analysis Services Tabular engine that Power BI runs on supports multiple relationships between the same two tables, although only one of them can be active at a time. I would generally recommend using this with the USERELATIONSHIP() function, so that each measure picks the relationship (and therefore the date column) it should use.
However, I have found there are situations where using USERELATIONSHIP() in many measures can introduce unnecessary complexity in your model. You can end up with far too many measures and it can get confusing when you use two measures that are using two different relationships in the same visual.
In short: there is nothing inherently wrong with duplicating a dimension, but for data storage optimization and model cleanliness I would first make sure that USERELATIONSHIP() with multiple relationships between fact and dimension will NOT work before duplicating the dimension.
I am using Red Hat Decision Manager 7.1 (Drools) to create a rule for assigning a case to a department. The rule itself is quite simple; however, it requires quite a lot of parameters (~12), such as the agent type, working area, case type, customer seniority and more. The resulting "action" is the department to which the case is assigned.
I tried to place the parameters in a decision table, but the table quickly bloated to over 15,000 rows and will probably get even larger than that. I did, however, notice that in many cases the difference between two rows is only one or two parameters (e.g. the same row where the only difference is agent type "Local" vs. "Regional"), resulting in a different assignment.
I am thinking of replacing the table with something else, like a tree structure, so I can group similar rows under the same node and then navigate over the tree to make the decision. To do this I plan to prioritize the parameters and give parameters with higher priority a higher place in the tree.
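To make the tree idea concrete, here is a minimal Python sketch (not Drools-specific). It assumes the parameters are checked in a fixed priority order and that each node may carry a wildcard branch covering every value that is not listed explicitly. The parameter names, values and departments are made up for illustration.

```python
# Hypothetical parameter order, highest priority first.
PRIORITY = ["case_type", "agent_type", "working_area"]

WILDCARD = "*"  # branch taken when no explicit value matches

# Nested dicts: inner nodes map a parameter value to a subtree,
# leaves are department names (all values here are illustrative).
TREE = {
    "billing": {
        "Local": "Local Billing",
        WILDCARD: "Central Billing",
    },
    "technical": {
        WILDCARD: {
            "north": "Tech Support North",
            WILDCARD: "Tech Support",
        },
    },
    WILDCARD: "General Desk",
}


def assign(case, tree=TREE, priority=PRIORITY):
    """Walk the tree in priority order and return the department."""
    node = tree
    for param in priority:
        if not isinstance(node, dict):
            break  # reached a leaf early: remaining parameters do not matter
        value = case.get(param)
        if value in node:
            node = node[value]
        elif WILDCARD in node:
            node = node[WILDCARD]
        else:
            raise LookupError(f"no branch for {param}={value!r}")
    if isinstance(node, dict):
        raise LookupError("ran out of parameters before reaching a leaf")
    return node
```

With a structure like this, two rows that differ only in agent type become two branches under one shared node instead of two full rows.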
Does anyone have experience with such a problem? I looked at decision trees, but they focus more on ML and probabilities, so I'm not sure this is what I need.
Is there any other method to deal with bloated tables that become unmanageable? I cannot go to our customer and ask them to maintain a 15,000-row Excel sheet. They'll shoot me there and then.
Thanks
Alon.
Working on a little side project, I now have the opportunity to design my very own API. Even if it is not a business endeavor, it's the occasion for me to learn more about REST, resources, collections and URIs.
My service records data points organized in time-series and will soon provide an API to easily query ranges of data points from specific series. Data points are immutable and as such should be very good candidates for caching. Time-series can be updated only during a limited time window, after which they are archived and read-only (making them also cacheable).
I have been looking into the APIs of some companies that provide the same kind of services, and I found the following two patterns:
1. Define the series in the path and the range in the query:
/series/:id?from=2017-01-26&to=2017-01-27
This is pretty much what most services out there are using. I understand it as the series being the resources/collections that are then sliced to a specific range. This seems to be very easy to use from a consumer point of view, but from a data point of view, the dates in the query are part of some kind of organization or hierarchy and should in this case be part of the path.
2. Define the series and coordinates in the path:
/series/:x/:y/:z
I didn't find examples of this for time-series, but it is the kind of structure used for tile-based map services. To me, this means that each combination of x, y and z is a different collection, and these collections may or may not contain some of the same resources. It also maps directly to a hierarchy: /series/:x contains all the series with a specific value of x and any value of y and z.
I really like the idea of method 2, and I started with something like:
/series/:id (all data points from a specific series)
/series/:id/:year (all the data points from a specific series and year)
/series/:id/:year/:month
/series/:id/:year/:month/:day
...
Which works pretty well for querying predefined ranges such as "all the data points from 2016" or "all the data points from January 2016". Issues arise when trying to query arbitrary ranges like "all the data points from January 2016 to March 2016".
My first attempt was to simply add the start and end to the path:
/series/:id/:year (all the data points from a specific year)
/series/:id/:fromyear/:toyear (all the data points between fromyear and toyear)
But:
It becomes very long, very quickly. Example: /series/:id/:fromyear/:frommonth/:fromday/:toyear/:tomonth/:today, and potentially very cumbersome depending on the chosen structure: /series/:id/:fromyear/:toyear/:frommonth/:tomonth/:fromday/:today
It doesn't make any sense from a hierarchy or structure point of view. In /series/1/2014/2016, 2016 is not a subset of 2014 and this collection is actually going to return data points from 2014, 2015 and 2016.
It is tricky to handle on the server side. Is /series/1/2016/01/02 supposed to return all the data points for January 2nd or for the whole January to February range?
After noticing the way GitHub references specific lines or ranges of lines in the URL fragment, I played with the idea of defining ranges as being different collections, such as:
/series/:id/:year/:month (same as before)
/series/:id/:year/:frommonth-:tomonth (to get a specific range)
/series/:id/:year/-:tomonth (to get everything from the beginning of the year to tomonth)
/series/:id/:year/:frommonth- (to get everything from frommonth to the end of the year)
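As a rough illustration of how a server could interpret such a month segment, here is a small Python sketch. The function names are hypothetical, and it assumes months are written as plain decimal numbers from 1 to 12 and that an open end defaults to the start or end of the year.

```python
def _month(text):
    """Parse one month number (1-12) or raise ValueError."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a month number: {text!r}")
    month = int(text)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return month


def parse_month_segment(segment):
    """Return an inclusive (from_month, to_month) pair for one path segment.

    Accepted forms: "3", "1-3", "-3" (from January) and "3-" (to December).
    """
    start_text, dash, end_text = segment.partition("-")
    if not dash:
        # A single month: the range starts and ends on it.
        month = _month(start_text)
        return month, month
    if not start_text and not end_text:
        raise ValueError("a bare '-' does not name a range")
    start = _month(start_text) if start_text else 1
    end = _month(end_text) if end_text else 12
    if start > end:
        raise ValueError(f"range runs backwards: {segment!r}")
    return start, end
```

One thing this makes visible: under such a scheme `-3` and `1-3` resolve to the same range, so two different URLs would identify the same data unless one form is redirected to the other.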
Now, the questions:
Does my solution break any REST or Semantic URL rules/notions/ideas?
Does it improve caching in any way compared to using ranges in the query?
Does it hurt usability for consumers?
Is it unnatural, or does it go against unwritten rules of some front-end frameworks?
I'm working in Tableau to help my school district visualize discipline data. I want to be able to disaggregate and filter by quite a few different measures (at least 13).
In the past, if I wanted to be able to disaggregate by a number of measures, I would make a parameter with a list of possible outputs, display each output as the name of a measure, then create a calculated field that returned the value from a given measure based on that parameter. This works fine for disaggregating.
However, filtering based on these values presents a challenge. The problem is that I'm not filtering based on any given measure, I'm filtering on a calculated field that returns the value in that measure. If my parameter is set to "Day" for instance, and I filter to Tuesday, but then switch to "Race", everything vanishes, because now my calculated field is returning race. What I want to create is a dropdown menu that lets you select from a number of different measures to filter by.
Below is a link to a packaged workbook that can help illustrate the problem that I'm dealing with.
I feel like something like this should be possible in Tableau, but there's some little trick that I'm missing. When I contacted their support team, their solutions were both only viable due to the limited number of measures I was using in the dummy data. The support team felt that this was possible as well, but they didn't know how.
https://public.tableau.com/profile/publish/DynamicFiltersUsingParameters/Sheet1#!/publish-confirm
You could create a Filter Action on the Tableau dashboard which carries over the 'Day' filter to give a smaller subset of data to work with for the next filter.
I have SQL Server 2008 R2 Standard Edition. The cube has one measure called "AUM". Basically, this measure is only additive across one dimension, Portfolio.
Across Time I need to pick LastChild, across Security I need to pick Max, and across Portfolio I need to pick Sum.
How should I create the measure? What should the aggregation property be to achieve all 3 types of calculation?
Currently we have written SCOPE statements for Security and Time to override the default Sum behavior. This works great, but as the number of members in the Security and Time dimensions increases, the SSRS reporting queries slow down a lot.
I am currently testing new persisted measures with different aggregation properties, combined with some additional CREATE MEMBER statements, to see if I can avoid the SCOPE statements.
Any kind of help would be great. Thanks.
It's a nice problem. Some thoughts:
There is a problem with the order in which you evaluate your tuple if the aggregation is not associative. I'd be careful with SCOPE (when you're evaluating a tuple that does not fall inside it). Check this presentation from Chris Webb (a well-known SSAS guru).
The order of your aggregations is important: LastChild(Max(Sum(tuple))) is not Max(LastChild(Sum(tuple))). I'd go for a calculated member if performance is a problem:
First, calculate the LastChild with the data in your time dimension. Here you can use any aggregation method with a NonEmpty. Once you have that, use another measure to get the max properly.
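A tiny Python illustration of why the nesting order matters, using Sum and Max only. The figures and the portfolio and security names are invented, and the data is for a single date.

```python
# Invented AUM figures at a single date: aum[portfolio][security].
aum = {
    "P1": {"SEC_A": 10, "SEC_B": 3},
    "P2": {"SEC_A": 2, "SEC_B": 9},
}
securities = ["SEC_A", "SEC_B"]

# Max over securities of (sum over portfolios).
max_of_sum = max(sum(aum[p][s] for p in aum) for s in securities)

# Sum over portfolios of (max over securities).
sum_of_max = sum(max(aum[p][s] for s in securities) for p in aum)

print(max_of_sum, sum_of_max)  # 12 19
```

The same cells give 12 one way and 19 the other, so the measure definition has to pin down which aggregation is applied first.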
P.S.: I think in SSAS you can define special aggregation methods (somewhere there is a use case for that).
Hope it helps.
What are some possible designs to deal with frequently changing data forms?
I have a basic CRUD web application where the main data entry form changes yearly. So each record should be tied to a specific version of the form. This requirement is kind of new, so the existing application was not built with this in mind.
I'm looking for different ways of handling this, hoping to avoid future technical debt. Here are some options I've come up with:
Create a new object, UI and set of tables for each version. This is obviously the most naive approach.
Keep adding all the fields to the same object and DB tables, but show/hide them based on the form version. This will become a mess after a few changes.
Build form definitions, then dynamically build the UI and store the data in some dictionary-like format (e.g. JSON/XML, or maybe a document-oriented database). I think this is going to be too complex for the scope of this app, especially for the UI.
What other possibilities are there? Does anyone have experience doing this? I'm looking for some design patterns to help deal with the complexity.
First, I will speak to your solutions above and then I will give my answer.
Creating a new table for each version is going to require new programming every year since you will not be able to dynamically join to the new table and include the new columns easily. That seems pretty obvious and really makes this a bad choice.
The issues you mentioned with adding the columns to the same form are correct. Also, whatever database you are using has a max on how many columns it can handle and how many bytes it can have in a row. That could become another concern.
The third option I think is the closest to what you want. I would not store the new column data in a JSON/XML unless it is for duplication to increase speed. I think this is your best option.
The only option you didn't mention was storing all of the data in 1 database field and using XML to parse. This option would make it tough to query and write reports against.
If I had to do this:
The first table would have the columns ID (seeded), Name, InputType, CreateDate, ExpirationDate, and CssClass. I would call it tbInputs.
The second table would have 5 columns: ID, Input_ID (with FK to tbInputs.ID), Entry_ID (with FK to the main/original table), Value, and CreateDate. The FK to the main/original table would allow you to find what items were attached to what form entry. I would call this table tbInputValues.
If you don't plan on having that base table, then I would use a simple table that tracks the creation date, creator ID, and the form_id.
Once you have those you will just need to create a dynamic form that pulls back all of the inputs that are currently active and display them. I would put all of the dynamic controls inside of some kind of container like a <div> since it will allow you to loop through them without knowing the name of every element. Then insert into tbInputValues the ID of the input and its value.
Create a form to add or remove an input. This would mean you would not have much if any maintenance work to do each year.
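The two tables and the "pull back the active inputs, then insert the values" flow could look roughly like this. It is a minimal Python sketch using an in-memory SQLite database purely for illustration (the original question does not name a database). It assumes dates are stored as ISO strings and that a NULL ExpirationDate means the input is still active. The helper names active_inputs and save_entry and the sample inputs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbInputs (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT NOT NULL,
    InputType TEXT NOT NULL,
    CreateDate TEXT NOT NULL,
    ExpirationDate TEXT,          -- assumption: NULL means still active
    CssClass TEXT
);
CREATE TABLE tbInputValues (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Input_ID INTEGER NOT NULL REFERENCES tbInputs(ID),
    Entry_ID INTEGER NOT NULL,    -- FK to the main/original table (not modelled here)
    Value TEXT,
    CreateDate TEXT NOT NULL
);
""")


def active_inputs(conn, on_date):
    """Inputs that should be rendered on the form for the given date."""
    return conn.execute(
        "SELECT ID, Name, InputType, CssClass FROM tbInputs "
        "WHERE CreateDate <= ? AND (ExpirationDate IS NULL OR ExpirationDate > ?) "
        "ORDER BY ID",
        (on_date, on_date),
    ).fetchall()


def save_entry(conn, entry_id, values, on_date):
    """Store one submitted form; values maps input ID -> entered value."""
    conn.executemany(
        "INSERT INTO tbInputValues (Input_ID, Entry_ID, Value, CreateDate) "
        "VALUES (?, ?, ?, ?)",
        [(input_id, entry_id, value, on_date) for input_id, value in values.items()],
    )


# Illustrative inputs: FaxNumber is retired and Email is added in 2016.
conn.executemany(
    "INSERT INTO tbInputs (Name, InputType, CreateDate, ExpirationDate, CssClass) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("FirstName", "text", "2015-01-01", None, "short"),
        ("FaxNumber", "text", "2015-01-01", "2016-01-01", "short"),
        ("Email", "email", "2016-01-01", None, "wide"),
    ],
)
```

Changing next year's form then amounts to inserting or expiring rows in tbInputs rather than altering tables.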
This solution may not seem like the most elegant, but if executed correctly I do think it is your most flexible option and the one that incurs the least technical debt.
I think the third approach (XML) is the most flexible. A simple XML structure is generated very fast and can be easily versioned and validated against an XSD.
You'd have a table holding the XML in one column and the year/version this XML applies to in another.
Generating UI code based on the schema is basically a bad idea. If you do not require extensive validation, you can opt for a simple editable table.
If you need a custom form every year, I'd look at it as kind of a job guarantee :-) It's important to make the versioning mechanism and extension transparent and explicit though.
For this particular app, we decided to deal with the problem as if there were one form that continuously grows. Due to the nature of the form, this seemed more natural than a more explicit separation. We will have a mapping of year->field for the parts of the application that do need to know which data is for which year.
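The year->field mapping can be as simple as the following Python sketch. The field names and years are placeholders, not the real form.

```python
# Hypothetical field names; each year lists the fields its form contained.
FIELDS_BY_YEAR = {
    2015: {"name", "address", "fax"},
    2016: {"name", "address", "email"},
}


def record_for_year(record, year):
    """Keep only the fields that belonged to that year's form."""
    allowed = FIELDS_BY_YEAR[year]
    return {key: value for key, value in record.items() if key in allowed}
```

The record itself keeps every field ever added, and only the code that cares about a specific year consults the mapping.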
For the UI, we will be creating a new page for each year's form. Dynamic form creation is far too complex in this situation.