DAX: Create a Power BI distribution chart and table grouping the number of processes by the number of users - group-by

Could someone please help me with the following issue?
I have a dataset that collects data from the following process:
Manager_start starts a process with a unique uuid (job_id is not unique, because a job is not closed until its status changes to 'finish_pass'). After Manager_start finishes (closes) his part of the job, it is automatically assigned to Manager_finish. If the status is 'finish_decline', the Manager_finish field stays blank.
I need to prepare visualizations (a table and a bar chart) that show the number of processes by the number of manager_finish users. It should look like this: the table should be grouped by the number of processes, and in the chart the X axis should show the number of processes and the Y axis the number of manager_finish users who finished that many processes.
The visualizations should also be filterable by the following fields: uuid, job_id, status, date_finish_working, manager_name_start, manager_name_finish.
I have the data in a table with the following columns:
uuid - unique identifier of the process
job_id - id of the job that needs to be done
status - status of the process (always begins with 'start' and can finish with 'finish_pass' or 'finish_decline')
date_start_working - date when the job was taken into work
date_finish_working - date when the process finished
manager_id_start - id of the manager who started the process
I found a similar case with a solution (https://community.powerbi.com/t5/DAX-Commands-and-Tips/distribution-of-calculated-measure/m-p/1047781), but it does not work for my case, because I need the ability to filter the data by different fields and my data structure is different.
A data sample is available at the link below (uploaded to file storage):
https://fex.net/s/drxzbeo
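For what it's worth, here is a rough SQL sketch of the two-level grouping described above (count processes per finishing manager, then count how many managers fall into each process count). The table name processes and the manager_name_finish column are assumptions based on the filter list; this only illustrates the shape of the result, not the DAX itself:

SELECT process_count,
       COUNT(*) AS manager_count                    -- how many managers finished exactly that many processes
FROM (
    SELECT manager_name_finish,
           COUNT(DISTINCT uuid) AS process_count    -- processes finished per manager
    FROM processes
    WHERE status = 'finish_pass'
      AND manager_name_finish IS NOT NULL
    GROUP BY manager_name_finish
) AS per_manager
GROUP BY process_count
ORDER BY process_count;

In Power BI the inner grouping would typically be a measure over the process table and the outer grouping a grouped visual or calculated table, but the logic is the same two-step aggregation.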


Insert job information into a statistics table after a job finishes (DataStage)

Currently, I have multiple jobs that load data from source to target (Oracle Connector -> Transformer Stage -> Oracle Connector). I want to record those jobs' information in a statistics table to track the progress every day.
My thought is that after a job has finished, it will automatically insert one row per job into my statistics table. For example, after Job_1 (which loads the Target_1 table) and Job_2 (which loads the Target_2 table) finish, each job will insert one row into my statistics table, and it will look like the table below:
TABLE_NAME  DATE_1  DATE_2   TIME_STAMP           TOTAL_RECORD
----------  ------  -------  -------------------  ------------
Target_1    041120  2020309  2020-11-04 11:09:00  500
Target_2    041120  2020309  2020-11-04 11:10:00  1000
Is it possible to do this with a routine or something else?
Such load stats are really useful and there are several ways to achieve this.
The main question is how to get the total record information and how those numbers are defined.
There is a number of rows read from the source and a number of rows written into the target - these are only the same if you do not have any filtering in your Transformer (and no rejects).
You can get this from
the link information via DSGetLinkInfo
by running a count on your target table (see the sketch below)
or - my recommendation - by using the DSODB
Check out the section "Monitoring job and job runs by using the Operations Console" in the Knowledge Center,
or, if you want to create your own table, "Extracting Monitor data from the operations database".
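As a rough sketch of the "count on your target table" option, assuming a hypothetical job_statistics table shaped like the sample above and a target table target_1:

INSERT INTO job_statistics (table_name, date_1, date_2, time_stamp, total_record)
SELECT 'TARGET_1',                       -- name of the target table the job loaded
       TO_CHAR(SYSDATE, 'DDMMYY'),       -- e.g. 041120
       TO_CHAR(SYSDATE, 'YYYYDDD'),      -- e.g. 2020309 (year + day of year)
       SYSDATE,
       COUNT(*)                          -- rows currently in the target table
FROM target_1;

Such a statement could run as an after-job step; note that a plain COUNT(*) only reflects whatever is in the target at that moment rather than what the job itself wrote, which is one more reason the DSODB route above is the recommended one.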

Calculate a running total that works with relative date filters

I have a Union table of my various bank accounts to create a personal finance analysis dashboard.
I am trying to make a running total to show my total capital available at any given date. Using a Running Total table calculation works, as does using a RUNNING_SUM() calculated field. They both work up until I filter the dates. So I am trying to find a way to make the running calculation work without being thrown off by date filters (I would like to use relative date filters for visualisation in the dashboard).
My union table has the following relevant data columns:
Order ID: Descending number from 1 for each entry per account.
Date: Date of entry.
Item: Entry name.
Account: Name of bank account.
Amount: positive for credit, negative for debit.
Balance: balance after entry value for each given account.
So the table can look like this:
So on 07/05/2019 the Running total should be 229.64.
The running sum formula mentioned above is currently RUNNING_SUM(SUM([Amount])), so if any dates are excluded via filter the running total doesn't add up to the right amount.
A way I can see around the problem could be to sum, over all accounts, the last balance reading at a given date. The balance is already a running total per account, but it would only work if the final entry per time period is summed across all accounts. Would it be possible to make a calculated field that gets the last balance reading for each account at any given date and then sums them?
Or is there a simpler smarter way I am not aware of?
This comes down to an Order of Operations problem. Once you filter the dates the viz doesn't have access to the data anymore.
Your best approach would be to add the running sum to the data source before you bring it into Tableau. Then the running sum isn't a calculated field dependent on the data in the Viz.
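If the union lives in a database (or can be expressed as custom SQL), one way to precompute that running total is a window function. This is only a sketch, with bank_transactions and the column names standing in for the fields listed in the question, and it assumes the database supports window functions:

SELECT order_id,
       entry_date,
       account,
       item,
       amount,
       SUM(amount) OVER (ORDER BY entry_date, order_id
                         ROWS UNBOUNDED PRECEDING) AS running_total   -- capital available to date
FROM bank_transactions;

Because running_total is now a fixed value on each row, relative date filters in Tableau only hide rows; they no longer change the totals carried by the rows that remain.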

How to Handle Rows that Change over Time in Druid

I'm wondering how we could handle data that changes over time in Druid. I realize that Druid is built for streaming data where we wouldn't expect a particular row to have data elements change. However, I'm working on a project where we want to stream transactional data from a logistics management system, but there's a calculation that happens in that system that can change for a particular transaction based on other transactions. What I mean:
-9th of the month - I post transaction A with a date of today (9th) that results in the stock on hand coming to 0 units
-10th of the month - I post transaction B with a date of the 1st of the month, crediting my stock amount by 10 units. At this time (on the 10th of the month) the stock on hand for transaction A recalculates to 10 units. The same would be true for ALL transactions after the 1st of the month
As I understand it, we would re-extract transaction A, resulting in transaction A2.
The stock on hand dimension is incredibly important to our metrics. Specifically, identifying when stockouts occur (when stock on hand = 0). In the above example, if I have two rows for transaction A, I would be mistakenly identifying a stockout with transaction A1, whereas transaction A2 is the source of truth.
Is there any ability to archive a row and replace it with an updated row, or do we need to add logic to our queries that finds the rows with the freshest timestamp per transaction id?
Thanks
I have two thoughts that I hope help you. The key documentation for this is "Updating Existing Data": http://druid.io/docs/latest/ingestion/update-existing-data.html which gives you three options: Lookup Tables, Reindexing, and Delta Ingestion. The last one, Delta Ingestion, is only for adding new rows to old segments, so that's not very useful for you. Let's go over the other two.
Reindexing: You can crunch all the numbers that change in your ETL process, identify the segments that would need to be reloaded, and simply have Druid re-index those segments. That will replace the stock-on-hand value for A in your example whenever you want, whenever you do the re-indexing.
Lookups: If you have stock values for multiple products, you can store the product id in the segment and have that be immutable, but lookup the stock-on-hand value in a lookup. So, you would store:
A, 2018-01-01, product-id: 123
And in your lookup, you'd have:
product-id: 123, stock-on-hand: 0
And later, you'd update the lookup and change that to 10. This would update any rows that reference product-id: 123.
I can't be sure but you may be mixing up dimensions and metrics while you're doing this, and you may need to read over that terminology in OLAP descriptions like this: https://en.wikipedia.org/wiki/Online_analytical_processing
Good luck!
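If you do end up handling it at query time instead, the usual pattern is to keep only the freshest row per transaction. Written as generic SQL (not Druid-specific syntax, with transaction_id and updated_at as assumed column names):

SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY transaction_id
                              ORDER BY updated_at DESC) AS rn   -- 1 = freshest version of the transaction
    FROM transactions t
) latest
WHERE rn = 1;

Whether your Druid version can express this directly depends on its SQL support, which is another reason the reindexing and lookup options above tend to be the cleaner fit.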

Oracle Gather Statistics after Partition Exchange

Database: Oracle 12c
I have a process that selects data from a Fact table, summarizes it and pushes it to a Summary table.
The Summary table is range partitioned (Trade Date) and list subpartitioned (File Id).
The process picks up data from the Fact table (where file_id=<> for all trade dates), summarizes it into a temp table and uses partition exchange to move the data from the temp table into one of the subpartitions of the Summary table (since the process works at the File Id level).
The Summary table is completely refreshed every day (100% of the data is exchanged).
Before the data is exchanged at the subpartition level, statistics are gathered and exchanged along with the data.
After the process is completed, we run dbms_stats.gather_table_stats at the partition level (in a for loop, once per partition) with granularity set to "APPROX_GLOBAL AND PARTITION".
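For reference, each iteration of that loop looks roughly like this (schema, table and partition names are placeholders):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'APP_OWNER',                      -- placeholder schema
    tabname     => 'SUMMARY_TABLE',                  -- placeholder table name
    partname    => 'P_TRADE_20201104',               -- the partition for the current loop iteration
    granularity => 'APPROX_GLOBAL AND PARTITION');
END;
/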
Even though we collect stats at the global level, user_tab_statistics shows STALE_STATS = 'YES' for the summary table; partition and subpartition stats, however, are available.
When we run a query against the summary table (for a date range of 3 years), the query spins for a long time - spiking the CPU to 90% - but never returns any data.
I checked the explain plan for the query; the cardinality shows as 1.
I read about incremental stats, but it seems incremental only helps when a few partitions change - it may not be the best option in my case, where the data across all partitions changes completely.
I'm looking for a strategy to gather statistics on the summary table - I don't want to run a full gather stats.
Thanks.

Volume of an Incident Queue at a Point in Time

I have an incident queue consisting of a record number (string), an open time (datetime), and a close time (datetime). The records go back a year or so. What I am trying to get is a line graph displaying the queue volume as it was at 8 PM each day. So if a ticket was opened before 8 PM on that day, or anytime on a previous day, but was not closed as of 8 PM, it should be counted in the population.
I tried the below, but this won't work because it doesn't really take into account multiple days.
If DATEPART('hour',[CloseTimeActual])>18 AND DATEPART('minute',[CloseTimeActual])>=0 AND DATEPART('hour',[OpenTimeActual])<=18 THEN 1
ELSE 0
END
Has anyone dealt with this problem before? I am using Tableau 8.2 and cannot use 9 yet due to our company license, so please only propose 8.2 solutions. Thanks in advance.
For tracking history of state changes, the easiest approach is to reshape your data so each row represents a change in an incident state. So there would be a row representing the creation of each incident, and a row representing each other state change, say assignment, resolution, cancellation etc. You probably want columns to represent an incident number, date of the state change and type of state change.
Then you can write a calculated field that returns +1, -1 or 0 to express how the state change affects the number of currently open incidents. Then you use a running total to see the total number open at a given time.
You may need to show missing date values or add padding if state changes are rare. For other analytical questions, structuring your data with one record per incident may be more convenient. To avoid duplication, you might want to use database views or custom SQL with UNION ALL clauses to allow both views of the same underlying database tables.
It's always a good idea to be able to fill in the blank for "Each record in my dataset represents exactly one _________"
Tableau 9 has some reshaping capability in the data connection pane, or you can preprocess the data or create a view in the database to reshape it. Alternatively, you can specify a Union in Tableau with some calculated fields (or similarly custom SQL with a UNION ALL clause). Here is a brief illustration:
-- one row per incident for its creation (+1 to the queue)
select open_date as Date,
       'OPEN' as Action,
       1 as Queue_Change,
       <other columns if desired>
from incidents
UNION ALL
-- one row per incident for its closure (-1 from the queue)
select close_date as Date,
       'CLOSE' as Action,
       -1 as Queue_Change,
       <other columns if desired>
from incidents
where close_date is not null
Now you can use a running sum of SUM(Queue_Change) to see the number of open incidents over time. If you have other columns like priority, department, type etc., you can filter and group as usual in Tableau. This data source can be in addition to your previous one. You don't have to have a single view of the data for every worksheet in your workbook. Sometimes you want a few different connections to the same data at different levels of detail or from different perspectives.
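If you would rather push the running total into the database as well, the same event rows can be cumulated with a window function; this is only a sketch reusing the column names from the custom SQL above and assuming the database supports window functions:

WITH events AS (
    SELECT open_date  AS event_date,  1 AS queue_change FROM incidents
    UNION ALL
    SELECT close_date AS event_date, -1 AS queue_change FROM incidents
     WHERE close_date IS NOT NULL
)
SELECT event_date,
       SUM(queue_change) OVER (ORDER BY event_date
                               ROWS UNBOUNDED PRECEDING) AS open_incidents   -- tickets open up to this point
FROM events
ORDER BY event_date;

Taking the last value at or before 8 PM each day then gives the snapshot the question asks for.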