Insert job's information into a statistic table after the job finishes (DataStage)

Currently, I have multiple jobs that load data from source to target (Oracle Connector -> Transformer Stage -> Oracle Connector). I want to get those jobs' information into a statistic table to track the progress every day.
My thought is that after a job has finished, it will automatically insert one row per job into my statistic table. For example, after Job_1 (loading the Target_1 table) and Job_2 (loading the Target_2 table) finish, each job will insert one row into my statistic table, which will look like below:
TABLE_NAME  DATE_1  DATE_2   TIME_STAMP           TOTAL_RECORD
----------  ------  -------  -------------------  ------------
Target_1    041120  2020309  2020-11-04 11:09:00           500
Target_2    041120  2020309  2020-11-04 11:10:00          1000
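For reference, a minimal sketch of the statistic table's DDL, assuming Oracle and reading DATE_1 as DDMMYY and DATE_2 as YYYYDDD (year plus day of year) from the sample rows - all names and types here are assumptions, not from the original post:

CREATE TABLE job_stats (
    table_name   VARCHAR2(30),   -- target table loaded by the job
    date_1       VARCHAR2(6),    -- run date as DDMMYY
    date_2       VARCHAR2(7),    -- run date as YYYYDDD (day of year)
    time_stamp   DATE,           -- completion timestamp
    total_record NUMBER          -- rows written to the target
);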
Is it possible to do this with a routine or something else?

Such load stats are really useful, and there are several ways to achieve this.
The main question is how to get the total record information and how those numbers are defined.
There is a number of rows read from the source and a number written to the target - only if you do not have any filtering in your Transformer (and no rejects) should these be the same.
You can get this from
the link information via DSGetLinkInfo
a count run on your target table
or - my recommendation - the DSODB
Check out the section "Monitoring job and job runs by using the Operations Console" in the Knowledge Center,
or, if you want to create your own table, "Extracting Monitor data from the operations database".
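As a minimal sketch of the count-based option, assuming the job_stats table sketched above and using the Oracle Connector's "After SQL statement" property (table names and format masks are assumptions):

INSERT INTO job_stats (table_name, date_1, date_2, time_stamp, total_record)
SELECT 'TARGET_1',
       TO_CHAR(SYSDATE, 'DDMMYY'),    -- DATE_1 as day-month-year
       TO_CHAR(SYSDATE, 'YYYYDDD'),   -- DATE_2 as year + day of year
       SYSDATE,
       COUNT(*)                       -- total rows now in the target
FROM target_1;

Note this counts the whole target table, not just the rows written by the current run; for per-run counts, DSGetLinkInfo or the DSODB are the better fit.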


DAX - Create a Power BI distribution chart and table grouping the amount of processes by the amount of users

Could someone please help with the following issue?
I have a dataset which collects data from the following process:
Manager_start starts a process with a unique uuid (job_id is not unique, because if the status doesn't change to 'finish_pass' the job is not closed). After Manager_start finishes (closes) his job, it is automatically assigned to Manager_finish. If the status is 'finish_decline', the Manager_finish field stays blank.
I need to prepare a visualization (table and bar chart) which shows the number of processes against the number of manager_finish. For the table it should be grouped by number of processes; for the chart, the X axis should be the number of processes and the Y axis the number of manager_finish who finished that many processes.
It should also be possible to filter the visualizations by these fields: uuid, job_id, status, date_finish_working, manager_name_start, manager_name_finish.
I have the dataset in a table with these columns:
uuid - identifier of the process
job_id - id of the job which needs to be done
status - status of the process (always begins with 'start' and can finish with 'finish_pass' or 'finish_decline')
date_start_working - date when the job was taken into work
date_finish_working - date when the process finished
manager_id_start - id of the manager who started the process
I found a similar case with a solution (https://community.powerbi.com/t5/DAX-Commands-and-Tips/distribution-of-calculated-measure/m-p/1047781), but it does not work for my case, because I need the ability to filter the data by different fields and I have a different data structure.
A data sample is available at the link below (uploaded to file storage):
https://fex.net/s/drxzbeo
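To pin down the aggregation being asked for, here is a hypothetical SQL sketch of the distribution (the actual report would need a DAX measure; the table name process_log is assumed, and the columns follow the list above):

SELECT processes_finished,
       COUNT(*) AS managers                 -- how many managers finished exactly that many processes
FROM (
    SELECT manager_name_finish,
           COUNT(*) AS processes_finished   -- processes each manager finished
    FROM process_log
    WHERE status = 'finish_pass'
    GROUP BY manager_name_finish
) per_manager
GROUP BY processes_finished
ORDER BY processes_finished;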

Talend - Insert records into separate table based on file name

I am very new to Talend Open Studio and I am using Talend 7.3.
We have customers across multiple zones and a separate table per zone.
We have multiple zone-specific files named after their zone. I want to write a generic job which will load each file into the correct table (each zone has a separate table).
Can someone help me with this?
Thank you.
You have to use multiple tFileInput components and multiple targets in the same job. Your design should be something like below:

            ---- tFileInput1 ---- Table1
tFileList   ---- tFileInput2 ---- Table2
            ---- tFileInput3 ---- Table3

(If you need any transformations, you can apply them between the inputs and the targets.)
Provide the proper file pattern names in each tFileInput component and each will process into its required target alone.

Database design to send notifications to all users

I'm searching for a solution to create a notification, send it to all users, and record its reaches and views. We have around tens of thousands of users.
If, each time a new notification is created, I need to write records for all users, the database may be overloaded by a surge of write operations.
Do you have a better design for this use case? Thank you in advance.
I use PostgreSQL, with two tables somewhat like below.
CREATE TABLE notification (
    id BIGSERIAL PRIMARY KEY,
    notification_message VARCHAR(255)
);
CREATE TABLE notification_user (
    user_id BIGINT,
    notification_id BIGINT,
    status VARCHAR
);
Without a lot more detail there is not much anyone can advise you on. But do not dwell over a measly 90K rows. First off, I have no idea of your design, but assuming you have normalized, you should have 3 tables here: users, notifications, and user_notifications. Put something together and TEST it; that is the only way to determine whether you actually have an issue or just the presumption of one.
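A hypothetical sketch of that normalized layout (column names beyond those in the question are assumed):

CREATE TABLE users (
    id   BIGSERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE notifications (
    id                   BIGSERIAL PRIMARY KEY,
    notification_message VARCHAR(255)
);

CREATE TABLE user_notifications (
    user_id         BIGINT REFERENCES users(id),
    notification_id BIGINT REFERENCES notifications(id),
    status          VARCHAR,
    PRIMARY KEY (user_id, notification_id)   -- one status row per user per notification
);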
I have put together a small demo. I like round numbers, so I used 100K users and a simple query that inserts a notification along with a user_notification row for each user. I then ran that insert for 1, 2, 3, 4, 5, 10, and 25 notifications, which results in 100K to 2.5M rows, and captured the times. All on my "play machine"; this is not a formal performance test, just more of a back-of-the-envelope test.
Environment
Acer laptop with
Intel I5 1.6GHz 4Core 8GB Ram
Windows 10 Home 64bit
Postgres 12.0
IDE: DBeaver 7.0.0
Overall, a very much underpowered server.
Results:
users: 100,000

notice   # rows      time (in sec)
------   ---------   -------------
     1     100,001           1.750
     2     200,002           3.781
     3     300,003           5.500
     4     400,004           7.663
     5     500,005           9.367
    10   1,000,010          21.186
    25   2,500,025            60.6

# rows includes the notifications + user_notifications inserts
See the fiddle for the full sample, but note it has only 100 users, not 100K. I don't know what performance your server can provide, but it should be more than my toy.
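For reference, a minimal sketch of the kind of fan-out insert the demo ran, as a single statement per notification (table names follow the three-table sketch above, so they are assumptions):

WITH new_note AS (
    INSERT INTO notifications (notification_message)
    VALUES ('System maintenance tonight')
    RETURNING id
)
INSERT INTO user_notifications (user_id, notification_id, status)
SELECT u.id, n.id, 'unread'   -- one row per user for the new notification
FROM users u
CROSS JOIN new_note n;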

Selecting multiple values/aggregators from InfluxDB, excluding time

I have an InfluxDB table consisting of
> SELECT * FROM results
name: results
time                 artnum  duration
----                 ------  --------
1539084104865933709  1234    34
1539084151822395648  1234    81
1539084449707598963  2345    56
1539084449707598123  2345    52
and other tags. Both artnum and duration are fields (though that is changeable). I'm now trying to create a query (to use in Grafana) that gives me the following result, with a calculated mean() and the number of measurements for each artnum:
artnum  mean_duration  no. measurements
------  -------------  ----------------
1234    58             2
2345    54             2
First of all: is it possible to exclude the time column? Secondly, what is the InfluxDB way to create such a table? I started with
SELECT mean("duration"), "artnum" FROM "results"
resulting in ERR: mixing aggregate and non-aggregate queries is not supported. Then I found https://docs.influxdata.com/influxdb/v1.6/guides/downsampling_and_retention/, which looked like what I wanted to do. I then created an infinite retention policy (duration 0s) and a continuous query:
> CREATE CONTINUOUS QUERY "cq" ON "test" BEGIN
SELECT mean("duration"),"artnum"
INTO infinite.mean_duration
FROM infinite.test
GROUP BY time(1m)
END
I followed the instructions, but after I fed some data to the db and waited for 1m, SELECT * FROM "infinite"."mean_duration" did not return anything.
Is this approach the right one, or should I look somewhere else? The end goal is to see the updated table in Grafana, refreshing once a minute.
InfluxDB is a time series database, so you really need the time dimension - also in the response. You will have a hard time with Grafana if your query returns non-time-series data, so don't try to remove time from the query. The better option is to hide time in the Grafana table panel: use column styles and set Type: Hidden.
InfluxDB doesn't have tables, but measurements. I guess you only need a query with proper grouping, no advanced continuous queries, etc. Try and improve this query*:
SELECT
MEAN("duration"),
COUNT("duration")
FROM results
GROUP BY "artnum" fill(null)
*You may have a problem with grouping in your case, because artnum is an InfluxDB field - the better option is to save artnum as an InfluxDB tag.

Oracle Gather Statistics after Partition Exchange

Database: Oracle 12c
I have a process that selects data from a Fact table, summarizes it, and pushes it to a Summary table.
The Summary table is composite partitioned: Range on Trade Date with List subpartitions on File Id.
The process picks up data from the Fact table (where file_id=<> for all Trade Dates), summarizes it in a temp table, and uses Partition Exchange to move the data from the temp table into one of the subpartitions of the Summary table (as the process works at the File Id level).
The Summary table is completely refreshed every day (100% of the data is exchanged).
Before the data is exchanged at the subpartition level, statistics are gathered and exchanged along with the data.
After the process completes, we run dbms_stats.gather_table_stats at the partition level (in a for loop, once per partition) with granularity set to "APPROX_GLOBAL AND PARTITION".
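A minimal sketch of that loop, assuming the summary table is named SUMMARY_TABLE (all names here are placeholders):

BEGIN
  FOR p IN (SELECT partition_name
            FROM   user_tab_partitions
            WHERE  table_name = 'SUMMARY_TABLE')
  LOOP
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname     => USER,
      tabname     => 'SUMMARY_TABLE',
      partname    => p.partition_name,
      granularity => 'APPROX_GLOBAL AND PARTITION');   -- per-partition gather, approximate global
  END LOOP;
END;
/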
Even though we collect stats at the global level, user_tab_statistics shows STALE_STATS = 'YES' for the summary table; partition and subpartition stats, however, are available.
When we run a query against the summary table (for a date range of 3 years), the query spins for a long time, spiking the CPU to 90%, but never returns any data.
I checked the explain plan on the query; the cardinality shows as 1.
I read about incremental stats, but it seems incremental helps when only a few partitions change - it may not be the best option in my case, where the data across all partitions changes completely.
I'm looking for a strategy to gather statistics on the summary table - I don't want to run a full gather stats.
Thanks.