JCL for previous month-year in dataset name

I need to run a job on the first work day of the month with accounting data for the month-end close (previous month) to be used in some year-end jobs. I would like to run the same job each month, with no operator intervention, and have the closing month and year in the dataset name so it is easily identifiable which closing period the dataset was created for. I currently run 8 separate jobs to accomplish this task. Please provide specific JCL samples.

It's not clear why you are running 8 separate jobs to accomplish this task; what does each job do?
Are you using any scheduler to run the job at a specific time?
You can use EZACFSM1 to resolve system symbolic parameters and add date/time information to dataset names.
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/IEA1E241/2.2.2

The best solution for this is to use the features of your job scheduler. Having said that...
There is a more recent reference for EZACFSM1; the OS/390 2.10 manual linked above hasn't been current for over a decade.
However, you cannot just use this utility to create a dataset with date/time information in its name. EZACFSM1 simply reads from SYSIN and writes to SYSOUT, interpreting the system symbols it reads.
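For illustration (a hypothetical input line, not part of the original job): if SYSIN contained the first line below, a run in March 2024 would write the second line to SYSOUT, because &YR4 and &MON are the dynamic system symbols for the four-digit year and two-digit month:

DSN=PROD.DATA.Y&YR4&MON
DSN=PROD.DATA.Y202403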
You could use EZACFSM1 to write an ALTER statement for IDCAMS, renaming a statically named dataset (one without the year and month in it) to one that has them. This does require two additional steps, and it comes with a caveat.
//* Step 1: allocate and catalog the statically named dataset.
//* (DSN=STUFF is a placeholder for a properly qualified name.)
//CATLG   EXEC PGM=IEFBR14
//DD01     DD DISP=(NEW,CATLG),
//            DSN=STUFF,
//            AVGREC=U,
//            LRECL=80,
//            RECFM=FB,
//            SPACE=(80,(1000,100))
//*
//* Step 2: EZACFSM1 copies SYSIN to SYSOUT, resolving the dynamic
//* system symbols &YR4 and &MON, so the finished ALTER statement
//* is passed to the next step.  Note the leading blank on the
//* ALTER: IDCAMS commands must begin in column 2 or later.
//MKALTER EXEC PGM=EZACFSM1
//SYSOUT   DD DISP=(NEW,PASS),
//            AVGREC=U,
//            LRECL=80,
//            RECFM=FB,
//            SPACE=(80,(1000,100))
//SYSIN    DD *
 ALTER STUFF NEWNAME(STUFF.Y&YR4&MON)
//*
//* Step 3: IDCAMS executes the generated ALTER, renaming the
//* dataset so its name carries the closing year and month.
//RENAME  EXEC PGM=IDCAMS
//SYSIN    DD DISP=(OLD,PASS),DSN=*.MKALTER.SYSOUT
//SYSPRINT DD SYSOUT=*
//*
The caveat has to do with job scheduling. Say your job is submitted late on the last day of the month: if it sits in the input queue long enough, it will execute on the first day of the next month, and the ALTER will pick up the wrong year and month.

Related

Schedule Builds not adhering to CRON in Foundry

The schedule has been set to update this table between the 14th and 25th of every month, Mon-Fri. However, the build recently got triggered on the 12th of August, which shouldn't happen according to the specified CRON.
The culprit seems to be a limitation of the cron expression itself, outside of Foundry - specifically this passage from the crontab(5) man page:
The day of a command's execution can be specified by two fields — day of month, and day of week. If both fields are restricted (i.e., aren't *), the command will be run when either field matches the current time. For example, "30 4 1,15 * 5" would cause a command to be run at 4:30 am on the 1st and 15th of each month, plus every Friday.
So the cron schedule 30 8 14-25 * 1-5 will run both on the 14th through 25th of every month and on every Monday through Friday, not only when the two conditions coincide. (See for example crontab.guru (https://crontab.guru/#30_8_14-25_*_1-5).)
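To make the either/or rule concrete, here is a small Java sketch of how a Vixie-cron-style scheduler decides whether a date fires under 30 8 14-25 * 1-5 (the class and method names are illustrative, not from any cron library):

import java.time.DayOfWeek;
import java.time.LocalDate;

public class CronDayMatch {
    // Day-of-month field: 14-25; day-of-week field: 1-5 (Mon-Fri).
    // Both fields are restricted, so cron fires when EITHER matches.
    static boolean fires(LocalDate d) {
        boolean domMatch = d.getDayOfMonth() >= 14 && d.getDayOfMonth() <= 25;
        DayOfWeek dow = d.getDayOfWeek();
        boolean dowMatch = dow != DayOfWeek.SATURDAY && dow != DayOfWeek.SUNDAY;
        return domMatch || dowMatch;   // OR, not AND
    }

    public static void main(String[] args) {
        // 12 Aug 2022 was a Friday: outside 14-25, but fires anyway.
        System.out.println(fires(LocalDate.of(2022, 8, 12)));  // true
        // 13 Aug 2022 was a Saturday outside 14-25: does not fire.
        System.out.println(fires(LocalDate.of(2022, 8, 13)));  // false
    }
}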
The generated description for it is not accurate; unfortunately, we don't have much control over that, as we use a library to turn the cron expressions into human-readable descriptions.
Related:
https://unix.stackexchange.com/questions/602328/are-the-day-of-month-and-day-of-week-crontab-fields-mutually-exclusive
https://unix.stackexchange.com/questions/602216/when-will-an-interval-cron-execute-the-first-time-ex-3-days/602222#602222
https://blog.healthchecks.io/2022/09/schedule-cron-job-the-funky-way/

How to process files only for the past hour using Talend?

I have continuous sensor data coming in every 5 minutes in the form of files. I want to pick up only the files from the past hour and do the required processing.
For example: if the Talend job runs at 12:01 pm, it picks up only the files from 11:00 am to 12:00 pm.
Can anyone please suggest the approach I should take to make this happen within Talend? Is there any built-in component that can pick up files from the previous hour?
Here is the flow.
Use tFileProperties, whose built-in schema includes a column named mtime_string. This column gives you the last modified time of the file, and in tJava or tJavaRow you can check whether that time falls within the past hour using the TalendDate functions.
Iterate over all the files, and in the tJavaRow write this code:
// mtime_string holds the file's last modified time in Date.toString()
// form, e.g. "Mon Mar 09 11:15:00 GMT 2015", so parse it with the
// matching pattern.
Date lastModifiedDate = TalendDate.parseDate("EEE MMM dd HH:mm:ss zzz yyyy", input_row.mtime_string);
Date current_date = TalendDate.getCurrentDate();
// Keep the file only if it was modified within the past hour.
if (TalendDate.diffDate(current_date, lastModifiedDate, "HH") <= 1) {
    output_row.abs_path = input_row.abs_path;
}
This will give you all the files modified within the past hour.
Hope this helps.
here is the complete job design :
tFileList--->(iterate)---->tFileProperties---->(row1 main)---->tJavaRow---->if---->tFileInputDelimited---->main----->tMap---->main----->tFileOutput
For the context you are setting in the tJavaRow, check its nullability in the RUN IF condition:
context.getProperty("file") != null && !context.getProperty("file").isEmpty()
After this, use the context as you are doing.
There is no built-in component that will give you files based on time.
However, you can accomplish this by using tFileList-->tFileProperties. Configure tFileList to sort by last modified date; tFileProperties will then give you the modified date, and from there you can filter on it: if the file is older than an hour, stop, otherwise process it.
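Outside of the Talend components, the same filter is a few lines of plain Java; this is a minimal sketch assuming a hypothetical input directory /data/sensor:

import java.io.File;

public class LastHourFilter {
    public static void main(String[] args) {
        File dir = new File("/data/sensor");   // hypothetical input directory
        long oneHourAgo = System.currentTimeMillis() - 60L * 60 * 1000;
        File[] files = dir.listFiles();
        if (files == null) return;             // directory missing or unreadable
        for (File f : files) {
            // Keep only regular files modified within the past hour.
            if (f.isFile() && f.lastModified() >= oneHourAgo) {
                System.out.println(f.getAbsolutePath());
            }
        }
    }
}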

Loading date or datetime into date dimension

Let's say I have a date dimension and from my business requirements I know that the most granular I would need to go is to examine the specific day of the month that an event occurred.
The data I am given provides me with the exact time that an event occurred (YYYY-MM-DD HH:MM:SS). I have two options:
1. Before loading the data into the date dimension, slice the HH:MM:SS from the date.
2. Create the time attributes in my date dimension and insert the full date time.
The way I see it, I should go with option 1. This would remove redundant data and save some space. However, if I go with option 2, should the business requirements ever change, or should my manager suddenly want more granularity, I wouldn't need to modify my original design. Which option is more commonly used? Are there other options that I did not consider?
Update - follow up question
I receive new data every month. If I used a pre-built date dimension with all the dates, would I then need to run my script every month to populate the table with the new dates of that month, or would I have a continuous process whereby every day I insert one row into the table, which would be that date?
I would agree with you and avoid option 2. A standard date dimension table is at the individual date level. If you did need to analyse by time of day, you could create an additional time of day dimension at the level of a second in a single day, and link to that from your fact table.
Your date dimension should be created by script automatically, rather than from the dates that events occurred. This allows you to analyse across a range of events from other facts, and on dates where no events occur, using a standard, prebuilt dimension.
I would also include the full date/time stamp as a column in the fact table, along with the DateKey to the dimension table. This gives you some visibility into and analysis of the timestamp, you do not lose the data, and you can still analyse by the date dimension.
Update - follow up question
Your pre-built date dimension (the standard way of doing it) would usually contain some dates in the future. There's no reason not to, for example, include another 5 years of dates in the table. But if you'd like it to gradually grow over time, you could have a script that is run once a day, once a month, or once a year to add new dates. It's totally up to you! There are many example scripts for building date dimensions - just google "date dimension script". They exist for the language of your choice, e.g. SQL, C#, Power Query, etc.
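As an illustration of what such a script does, here is a minimal Java sketch that generates the rows of a date dimension; the column set and the yyyyMMdd-style DateKey are common conventions, not mandated by the answer, and in practice each row would be INSERTed into the dimension table rather than printed:

import java.time.LocalDate;

public class DateDimensionBuilder {
    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2020, 1, 1);
        LocalDate end = start.plusYears(5);    // e.g. pre-build 5 years ahead
        for (LocalDate d = start; d.isBefore(end); d = d.plusDays(1)) {
            // Surrogate key in yyyyMMdd form, e.g. 20200101.
            int dateKey = d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
            System.out.printf("%d,%s,%d,%d,%d,%s%n",
                    dateKey, d, d.getDayOfMonth(), d.getMonthValue(),
                    d.getYear(), d.getDayOfWeek());
        }
    }
}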

How to check the max date of each input file, and process only if it matches the previous week's start date?

I have data files for 25 countries, each containing week-wise data in CSV format, which we receive every Monday in an FTP location. I just need to consolidate all the files into one file, which I am able to do.
In each data file there is a "Week" column, and now I need to check whether the latest week's data is in the file or not; if it is not there, send a mail saying the file does not have the latest data.
For example, if next Monday is the 16th of March, the max week in the file should be the 9th of March.
How can I apply that logic?
Using tAggregateRow and tJavaRow I am able to get the max week of each file, but how do I design the job after that?
The basic steps you want to follow are:
1. Keep the expected max date in a global variable at the start of the job. In this example it should be 9th March.
2. Read each file one by one and get the max week date; if it matches the global variable, do not send the email, otherwise send it.
So an example job flow might look like:
tFileList---iterate--->tFileInputDelimited--->tAggregaterow--->tJavaRow---RUN IF condition(based on if SendEmailflag is Y)--->tSendMail
The tAggregateRow should compute the max week date.
In the tJavaRow, compare input_row.maxdate with the global max date (9th March); note that Date values in Java must be compared with equals() or compareTo(), not ==. Based on the result, set a SendEmailFlag to Y or N, defaulting to N.
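A minimal sketch of what that tJavaRow body might look like, assuming the expected date was stored in globalMap under a hypothetical key EXPECTED_MAX_DATE and the flag is read back by the RUN IF condition (all names illustrative):

// Expected week date stored earlier in the job (hypothetical key).
java.util.Date expected = (java.util.Date) globalMap.get("EXPECTED_MAX_DATE");
// Compare with equals(), not ==, because these are Date objects.
if (input_row.maxdate != null && input_row.maxdate.equals(expected)) {
    globalMap.put("SendEmailFlag", "N");   // latest week present: no mail
} else {
    globalMap.put("SendEmailFlag", "Y");   // missing latest week: trigger tSendMail
}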

How can I schedule bi-weekly jobs?

My app requires users to schedule recurring events that can recur daily, weekly, monthly, or bi-weekly.
By bi-weekly, I mean every fortnight (14 days) starting from an arbitrary date value provided at the time of creation.
My jobs table has two columns to support this: job_frequency_id and job_frequency_value. I'm able to schedule all types except for bi-weekly.
The first column is an FK to the job_frequencies table; it contains daily, weekly, monthly, and bi-weekly values. The job_frequency_value column contains the value corresponding to the frequency.
For example: if a job has job_frequency_id == 3 and job_frequency_value == 10, it will run on the 10th day of every month.
How do I add bi-weekly support without tampering with my db structure? I will use the job_frequency_value column to store the start date of the 14-day period, but I'm unsure of the calculation going forward.
Say your starting date is stored in a variable named createdDate. Each occurrence is two weeks after the last, so the next one is:
nextFortnight = DateAdd("ww", 2, createdDate);
Can you wrap your scheduled task in a <cfif> and set it to run every week?
Something like
<cfif DateDiff("ww", CreateDate(2011, 1, 1), Now()) MOD 2 EQ 1>
That way the scheduled task fires every week, but on odd weeks it runs your code completely, and on even weeks it fires but skips all your code.
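The same idea expressed as a plain Java sketch for clarity (class and method names are illustrative): a bi-weekly job fires only when a whole number of 14-day periods has elapsed since its stored start date.

import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class FortnightCheck {
    // True when 'today' falls an exact multiple of 14 days after startDate.
    static boolean isRunDay(LocalDate startDate, LocalDate today) {
        long days = ChronoUnit.DAYS.between(startDate, today);
        return days >= 0 && days % 14 == 0;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2011, 1, 1);
        System.out.println(isRunDay(start, LocalDate.of(2011, 1, 15))); // true: 14 days later
        System.out.println(isRunDay(start, LocalDate.of(2011, 1, 8)));  // false: only 7 days
    }
}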