I am new to MSTR.
We are working on migrating from Essbase to MicroStrategy 10.2.
After the migration, we expect business users to be able to create reports on top of a MicroStrategy cube and play around with the data, similar to the way they have been doing with Essbase and Excel.
I need help designing a data model for the following scenario:
FactTb:
Subcategory Revenue
1 100
2 200
3 300
DimensionTb:
Category Subcategory
A 1
A 2
B 1
B 2
B 3
C 2
C 3
User wants to see revenue by category or subcategory.
FactTb has 3 rows. Assuming each row is 10 bytes, FactTb is 30 bytes.
If it is joined with DimensionTb, there will be 7 rows and the size will grow to (approximately) 70 bytes.
Is there any way to restrict the size of the cube?
The mapping of Category to Subcategory is static, so there is no need to maintain a table for it.
Can I create/define DimensionTb outside the cube (store it in the report and create derived elements using Subcategory)?
We want to restrict the size of the cube so it can be kept in memory, and to ensure that reports will always hit the cube rather than the database.
A cube is just the result of a SQL query, copied into memory for faster access. Just as you cannot have the result of a single query split in two, the same goes for a cube.
In-memory cubes are compressed by MicroStrategy using multiple algorithms (choosing the best compression for each column's data type and value distribution), but cubes also contain internal indexes (to speed up data access) that are created automatically depending on the queries run against the cube.
A VLDB setting can help reduce the size of the cube.
If you check technote TN32540 (Intelligent Cube Population methods in MicroStrategy 9.x), you will see the different options. In my experience the last setting (Direct loading of dimensional data and filtered fact data) is quite helpful in speeding up cube loading and reducing cube size, but you can also try the others (such as Normalize Intelligent Cube data in the Database).
With this approach the values from the dimension tables will still be stored in memory, but separately from the fact data, saving space.
Finally, to be sure that your users always use the cube, allow/teach them to create reports and dashboards by clicking directly on the cube (or selecting it).
This is the safe way; MicroStrategy also offers a dynamic way to map reports to cubes (when certain conditions are satisfied), but users can surprise even the most thorough designer.
Hi folks! Apologies if this is a duplicate question; I've done some research on the topic but don't know if I'm heading in the right direction.
I have converted gridded population-density data to a MongoDB collection. Each document has a geometry object defining the population-density cell as a five-node polygon (the fifth node matching the first) and a float value holding the population of that geographic region. Even though the database is huge, I can quickly retrieve the "records" of the population regions when they intersect a geo-polygon describing some type of weather event or other geofence, because they are indexed with a 2dsphere index.
The issue comes when I try to add all of the boxes up. It takes an exceedingly long time, especially if the polygon covers a significant geographic area. The population data I have are 1 km^2 cells. Adding up the data can take several seconds or, in the worst case, minutes!
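To make the slow step concrete, here is roughly what the sum looks like (a minimal sketch; the collection and field names are illustrative, not my exact schema):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
cells = client["geo"]["population_cells"]  # 2dsphere index on "geometry"

# Weather-event polygon (GeoJSON, first and last node identical).
event_polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-97.0, 32.0], [-96.0, 32.0], [-96.0, 33.0],
        [-97.0, 33.0], [-97.0, 32.0],
    ]],
}

# Sum the population of every 1 km^2 cell intersecting the event polygon.
# $geoIntersects uses the 2dsphere index to find candidate cells; the
# $group stage then adds their population values server-side.
result = list(cells.aggregate([
    {"$match": {"geometry": {"$geoIntersects": {"$geometry": event_polygon}}}},
    {"$group": {"_id": None, "total_population": {"$sum": "$population"}}},
]))
total = result[0]["total_population"] if result else 0.0
print(total)
```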
I had the thought of creating a kind of quadtree structure in the database: a lower-resolution node set stored as a separate collection, and so on. Then, when calculating population, I could start with the lowest-resolution set and work my way down the node "tree" with several database calls until there are no more matches. While I'd increase my database calls significantly, I'd reduce the sheer number of elements that I need to add up at the end, which is what takes the most computational time.
I could try to create these data bottom-up by neighbour finding, adding up the four population values that make up each next lower-resolution node; a sketch of this is below. This, of course, will blow up the database size and will increase the number of queries to the database for a single population request.
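As a sketch of that bottom-up idea (purely illustrative: the integer grid indices ix/iy and the 2:1 level scheme are assumptions, and computing the parent-cell polygon is omitted), each coarser cell just stores the sum of the four finer cells beneath it:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["geo"]

def build_coarser_level(fine: str, coarse: str) -> None:
    """Aggregate 2x2 blocks of fine cells into one coarse cell.

    Assumes each fine-level document carries integer grid indices
    ("ix", "iy") alongside its polygon, so four neighbours collapse
    into one parent by halving the indices.
    """
    db[fine].aggregate([
        {"$group": {
            # Four children (2ix, 2ix+1) x (2iy, 2iy+1) share one parent.
            "_id": {
                "ix": {"$floor": {"$divide": ["$ix", 2]}},
                "iy": {"$floor": {"$divide": ["$iy", 2]}},
            },
            "population": {"$sum": "$population"},
        }},
        {"$out": coarse},  # materialise the coarser level as its own collection
    ])

# Level 0 is the raw 1 km^2 grid; each call halves the resolution.
build_coarser_level("pop_level0", "pop_level1")
build_coarser_level("pop_level1", "pop_level2")
```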
I haven't seen much of this done with databases. I'd like to keep it in a database (it could also be PostgreSQL) since that gives me the ability to quickly geo-query by point or area. And since I'm returning the result from an API call, time efficiency is of the essence!
Any advice or places to research would be greatly appreciated!!!
I'm fairly sure what I'm attempting is not the ideal way to do things, due to my lack of knowledge of Power BI, but here goes:
I have two tables: one has the actual power against wind speed and the other is a reference.
I created calculated columns that add the corresponding speed bin to each row (so 1-2, 2-3, 3-4, etc.).
I have filters and slicers applied on the page / visual that will keep changing.
What I want is to create a pivot or a grouped table that changes dynamically based on my filters.
The reason I want this is that the table I've currently got has totals that are averaged (because each individual row is an average), but I want the sum of the averages by category. If I can have this as a calculated table instead of a visual (picture below), I could likely aggregate it again to get what I want.
So in the above table I want the totals to be the sum of the individual rows. I also want to be able to use these totals in other calculations (simple stuff like total divided by a fixed number, etc.).
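To pin down the aggregation I'm after, here it is as a pandas sketch outside Power BI (the column names are made up): average within each speed bin, then sum those bin averages.

```python
import pandas as pd

# Illustrative actual-power readings; "speed" in m/s, "power" in kW.
df = pd.DataFrame({
    "speed": [1.2, 1.8, 2.4, 2.9, 3.1, 3.7],
    "power": [10.0, 12.0, 25.0, 28.0, 40.0, 43.0],
})

# Binned speed column, mirroring the calculated column (1-2, 2-3, 3-4, ...).
df["speed_bin"] = pd.cut(df["speed"], bins=[1, 2, 3, 4],
                         labels=["1-2", "2-3", "3-4"])

# Average power per bin (what each row of the visual shows) ...
per_bin_avg = df.groupby("speed_bin", observed=True)["power"].mean()

# ... and the total I actually want: the SUM of those per-bin averages.
total = per_bin_avg.sum()
print(per_bin_avg, total, sep="\n")
```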
Hi, I have some insurance data and I am trying to put multiple variables on my map but got stuck. I am using Tableau Public on my desktop.
To understand what I am plotting: I am interested in whether the data is Direct or Agent, and whether it is HO3 product or BC product. My data is set up so that one column is Direct HO3, another Direct BC, another Agent HO3 and another Agent BC. It is broken down by zip code with the corresponding county.
I tried to use a dual axis, but with the 4 combinations the first dual-axis layer is Agent (HO3 & BC) and the second is Direct (HO3 & BC). I need help either plotting all 4 measures using the Color shelf or hiding the second graph as shown. The two charts are the same, but I can't get all 4 measures plotted at the same time with one dual-axis chart. All of my fields are measures, no dimensions, except the zip code and county columns. I do think it has to do with how my raw data is shaped, but I am not sure how to modify it so Tableau will plot the way I need.
I created hide/un-hide fields to hide data that has a 0 HO3 policy count or a 0 BC policy count. The second picture is from my raw data.
Looking at your dataset, one approach that will work is to pivot all 4 columns that you are planning to view on the sheet.
Create a chart with the pivoted values, and then drop the pivoted field name onto the Color shelf, which will differentiate your measures for the user.
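To illustrate what that pivot does to the data, here is the same reshape as a pandas sketch (column names assumed from your screenshots), the equivalent of Tableau's built-in pivot in the data source pane:

```python
import pandas as pd

# Illustrative wide layout: one measure column per channel/product combo.
wide = pd.DataFrame({
    "zip": ["33101", "33102"],
    "county": ["Miami-Dade", "Miami-Dade"],
    "Direct HO3": [12, 7],
    "Direct BC": [3, 5],
    "Agent HO3": [9, 11],
    "Agent BC": [4, 2],
})

# Unpivot: the four measures become one "Policy Count" column plus a
# "Segment" label, which is the field you drop on Color in Tableau.
long = wide.melt(
    id_vars=["zip", "county"],
    value_vars=["Direct HO3", "Direct BC", "Agent HO3", "Agent BC"],
    var_name="Segment",
    value_name="Policy Count",
)
print(long)
```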
I have a table that will have about 3 × 10^12 rows (3 trillion), but with only 3 attributes.
This table holds the IDs of 2 individuals and the similarity between them (a number between 0 and 1 that I multiplied by 100 and stored as a smallint to save space).
For a given individual under study, I need to summarize these columns and return how many individuals have up to 10% similarity, 20%, 30%, and so on. These cutoffs are fixed (every 10%) up to identical individuals (100%).
However, as you can imagine, the query will be very slow, so I thought about:
Creating a new table to save the summarized values
Creating a VIEW to save these values.
As there are only about 1.7 million individuals, the search would not be so time-consuming (if indexed, it returns quite fast). So, what can I do?
I would like to point out that my data will be almost fixed (after the DB is fully populated, almost no additions are expected).
A view won't help, but a materialized view sounds like it would fit the bill, if you can afford a sequential scan of the large table whenever the materialized view gets updated.
It should probably contain a row per user with a count for each percentile range.
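A minimal sketch of that materialized view, assuming a table similarity(id1, id2, sim) with sim stored as 0-100 (adjust the names to your schema), driven from Python:

```python
import psycopg2

# Assumed schema: similarity(id1 bigint, id2 bigint, sim smallint), sim in 0..100.
# If each pair is stored only once, UNION ALL the reversed (id2, id1) pairs first.
DDL = """
CREATE MATERIALIZED VIEW similarity_summary AS
SELECT id1 AS individual,
       (sim + 9) / 10 AS bucket,   -- 1 = up to 10%, 2 = up to 20%, ... 10 = up to 100%
       count(*) AS n
FROM similarity
GROUP BY individual, bucket;

CREATE INDEX ON similarity_summary (individual);
"""

with psycopg2.connect("dbname=mydb") as conn, conn.cursor() as cur:
    cur.execute(DDL)  # one sequential scan of the big table
    # Afterwards, per-individual lookups are index-fast; rerun
    # REFRESH MATERIALIZED VIEW similarity_summary after bulk loads.
    cur.execute(
        "SELECT bucket, n FROM similarity_summary"
        " WHERE individual = %s ORDER BY bucket",
        (42,),
    )
    print(cur.fetchall())
```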
Alternatively, you could store the aggregated data in an independent table that is updated by a trigger on the large table whenever something changes there.
I need to store many independent graphs:
Each graph with 100 to 2000 nodes
Each node with 1 to 7 edges plus 2 to 3 extra fields.
Non-directional (the graphs are undirected).
Right now I'm storing them as MongoDB documents, with one collection for nodes and one collection for edges.
I have some questions:
Does MongoDB have any "best practices" for storing graphs?
Would it be better to store them in another database like Neo4j? (It seems to be really powerful when you have really large graphs)
I would also like to be able to version each graph.
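To make the current layout concrete, here is roughly what a node and an edge document look like, with the graph_id/version tags being one idea for the versioning (all names are illustrative):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["graphs"]

# One collection for nodes, one for edges; every document is tagged with
# the graph it belongs to and the version of that graph.
db.nodes.insert_one({
    "graph_id": "g1",
    "version": 3,
    "node_id": "n42",
    "label": "pump",        # the 2-3 extra fields per node
    "capacity": 7.5,
})
db.edges.insert_one({
    "graph_id": "g1",
    "version": 3,
    "endpoints": ["n42", "n17"],  # undirected: an unordered pair of node ids
})

# Compound indexes so a whole (graph, version) loads with two range scans.
db.nodes.create_index([("graph_id", 1), ("version", 1)])
db.edges.create_index([("graph_id", 1), ("version", 1)])
```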