To delete a selected number of records in a big file using JCL sort cards - jcl

I want to delete 501 records, based on a 5-character activity code, from a test file with 38,792 records.
As there are 501 records, I can't write an OMIT condition listing each one.
I need to use a SORT JOINKEYS card, but my problem is that this 5-character activity code starts in column 46 for some records and column 47 for others.
So what can I do?

The question is unclear with many details missing, but here's something which may help another searcher:
//SYSIN DD *
* INA (F1) IS ALREADY IN SEQUENCE, SO NO SORT AND NO SEQUENCE CHECK
  JOINKEYS F1=INA,FIELDS=(1,5,A),SORTED,NOSEQCK
  JOINKEYS F2=INB,FIELDS=(1,5,A)
* KEEP THE MATCHES PLUS THE UNMATCHED RECORDS FROM F1
  JOIN UNPAIRED,F1
* THE ? IS THE MATCH MARKER: B=MATCHED, 1=UNMATCHED FROM F1
  REFORMAT FIELDS=(F1:1,80,?)
  OPTION COPY
* SEQUENCE-NUMBER ONLY THE MATCHED (B) RECORDS
  INREC IFTHEN=(WHEN=(81,1,CH,EQ,C'B'),
          OVERLAY=(82:SEQNUM,9,ZD))
* DROP THE FIRST 501 MATCHED RECORDS, THEN TRIM BACK TO 80 BYTES
  OUTFIL OMIT=(82,9,CH,LE,C'000000501',
          AND,
          81,1,CH,EQ,C'B'),
         BUILD=(1,80)
//JNF2CNTL DD *
* EXTRACT THE 5-BYTE KEY FROM F2; ITS POSITION DEPENDS ON THE RECORD-TYPE
  INREC IFTHEN=(WHEN=(1,1,CH,EQ,C'0'),
          BUILD=(3,5)),
        IFTHEN=(WHEN=NONE,
          BUILD=(2,5))
//INA DD *
11111 IN
22222 KEEP UNMATCHED
33333 OUT
66666 IN
66667 KEEP UNMATCHED
66668 KEEP UNMATCHED
77777 OUT
88888 SHAKE IT ALL ABOUT
//INB DD *
0X11111
0X66666
0X88888
133333
799999
877777
This is using two input files, INA and INB.
INA is already in sequence (so specify SORTED,NOSEQCK on its JOINKEYS statement) and has fixed-length 80-byte records.
INB is not already in sequence, because it is a mixture of different files; its records are also fixed-length 80 bytes.
In JNF2CNTL, only the key from the second file is extracted, as no other data is required from that file. The key is sourced from different places depending on the record-type. The file will be sorted automatically (with OPTION EQUALS set) before the JOIN itself.
The JOIN is for matches, and unmatched records from F1 (INA).
The ? in the REFORMAT statement is the "match marker"; it is automatically set to B (both) for a match, and to 1 for an unmatched record from F1 (with UNPAIRED,F1 on the JOIN statement, unmatched records can only come from F1, so a value of 2 is not possible here).
Of those that match, you want to ignore the first 501. So, set up a sequence number which is only incremented for the matching records.
Then on OUTFIL, OMIT= the matched records which have a sequence number less than or equal to 501.
The output on SORTOUT will be all the records from the INA file, trimmed back to 80 bytes by the OUTFIL BUILD, except the first 501 which matched.
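If it helps to see the mechanics outside of DFSORT, here is a rough Python sketch of the same logic (the key positions mirror the example above; treat it as an illustration of the flow, not a substitute for the sort cards):

# Illustrative Python sketch of the DFSORT job above: join on a key,
# then drop the first 501 records that have a match.

def extract_key_f2(line):
    # Mirror of JNF2CNTL: the key starts one column later when the
    # record begins with '0' (positions are 1-based in the sort cards).
    return line[2:7] if line.startswith('0') else line[1:6]

with open('INB') as f2:
    match_keys = {extract_key_f2(line.rstrip('\n')) for line in f2}

matched_seen = 0
with open('INA') as f1, open('SORTOUT', 'w') as out:
    for line in f1:
        if line[0:5] in match_keys:     # key in columns 1-5 of F1
            matched_seen += 1           # sequence number for matches only
            if matched_seen <= 501:
                continue                # OMIT the first 501 matches
        out.write(line)                 # keep everything else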

SAS - how can I read in date data?

I am trying to read in some data in date format and the solution is eluding me. Here are four of my tries using the simplest self-contained examples I could devise. (And the site is making me boost my text-to-code ratio in order for this to post, so please ignore this sentence).
*EDIT - my example was too simplistic. I have spaces in my variables, so I do need to specify positions (the original answer said to ignore positions entirely). The solution below works, but the date variable is not a date.
data clinical;
input
name $ 1-13
visit_date $ 14-23
group $ 25
;
datalines;
John Turner 03/12/1998 D
Mary Jones 04/15/2008 P
Joe Sims 11/30/2009 J
;
run;
No need to specify the column positions here. With list input, the datalines values just need to be space-delimited. A simple way to specify an informat is to use a : modifier after each input variable.
data clinical;
input ID$ visit_date:mmddyy10. group$;
format visit_date mmddyy10.; * Make the date look human-readable;
datalines;
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
;
run;
Output:
ID visit_date group
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
A friend of mine suggested this, but it seems odd to have to switch syntax markedly depending on whether the variable is a date or not.
data clinical; 
input
name $ 1-12
#13 visit_date MMDDYY10.
group $ 25 ;
datalines;
John Turner 03/12/1998 D
Mary Jones  04/15/2008 P
Joe Sims    11/30/2009 J
;
run;
SAS provides a lot of different ways to input data, just depending on what you want to do.
Column input, which is what you start with, is appropriate when this is true:
To read with column input, data values must have these attributes:
appear in the same columns in all the input data records
consist of standard numeric form or character form
Your data does not meet this in the visit_date column. So, you need to use something else.
Formatted input is appropriate to use when you want these features:
With formatted input, an informat follows a variable name and defines how SAS reads the values of this variable. An informat gives the data type and the field width of an input value. Informats also read data that is stored in nonstandard form, such as packed decimal, or numbers that contain special characters such as commas.
Your visit_date column matches this requirement, as you have a specific informat (mmddyy10.) you would like to use to read in the data into date format.
List input would also work in some cases, especially in modified list format, though in your example it wouldn't, due to the spaces in the name. Here's when you might want to use it:
List input requires that you specify the variable names in the INPUT statement in the same order that the fields appear in the input data records. SAS scans the data line to locate the next value but ignores additional intervening blanks. List input does not require that the data is located in specific columns. However, you must separate each value from the next by at least one blank unless the delimiter between values is changed. By default, the delimiter for data values is one blank space or the end of the input record. List input does not skip over any data values to read subsequent values, but it can ignore all values after a given point in the data record. However, pointer controls enable you to change the order that the data values are read.
(For completeness, there is also Named input, though that's more rare to see, and not helpful here.)
You can mix column and formatted input, but you don't want to mix in list input: it doesn't have quite the same concept of pointer control, so it is easy to end up with something you don't want. In general, use the input type that's appropriate to your data: column input if your data is all text or standard numerics, formatted input if you have particular formats for your data.
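Purely as an illustration for readers coming from general-purpose languages, here is a rough Python analogue (this is not SAS): the string slices play the role of column input, and the explicit date parse plays the role of a formatted-input informat such as mmddyy10.:

# Illustrative Python analogue of column input plus formatted input.
from datetime import datetime

record = "John Turner  03/12/1998 D"   # name in 1-13, date in 14-23, group in 25

name = record[0:13].strip()            # "column input": fixed positions
visit_date = datetime.strptime(record[13:23], "%m/%d/%Y")  # "formatted input"
group = record[24]

print(name, visit_date.date(), group)  # John Turner 1998-03-12 D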

Adding the totals of two separate tables in a Word document

This question relates to adding the totals of two tables and using that combined total in the body of a Word document.
In my case I have a Word document (docx) with two tables. These tables are populated through a merge process by third-party software over which I have no control. For ease of reference I will refer to them as table1 and table2. Both tables will contain an unknown number of rows, but the last row will always contain a total in the last column, which totals the rows above using the formula =SUM(ABOVE).
In the body of the document, I now need to reference the total of each table, and because I do not know how many rows there are, I am at a loss. For example, if I knew how many rows there were, I could use the answer given here.
I have tried using a merge field with the column names as follows, however I get a !Syntax Error ...
=SUM(table1[Amount]+table2[InterestAmount])
Any and all help greatly appreciated.
If you bookmark the two tables (e.g. TblA, TblB), you can use a formula field to tally their totals:
{=SUM(TblA C:C)/2+SUM(TblB C:C)/2}
The reason for the /2 is that, unless you know the last row number beforehand, you need to reference the entire column (including your existing totals row), the sum of which will therefore be twice the total. For example, if a column holds 10, 20 and 30 plus a =SUM(ABOVE) totals row of 60, summing the whole column gives 120, twice the true total.
To see how to do a wide range of calculations in Word, check out my Microsoft Word Field Maths Tutorial, at:
http://www.msofficeforums.com/word/38720-microsoft-word-field-maths-tutorial.html
or:
http://www.gmayor.com/downloads.htm#Third_party
Fields can be bookmarked in Word, then referenced elsewhere in the document. When bookmarking in a table, be careful not to select the entire cell, only the field! If the entire cell is bookmarked, the cell structures are carried across to the REF field and the content can't be processed numerically.
For three bookmarked fields with the names Fld1, Fld2 and Fld3 that should be multiplied, the combined field code would look like this:
{ = { REF Fld1 } * { REF Fld2 } * { REF Fld3 } \# "0.00" }
Note that you could also use the PRODUCT function (like SUM, but it multiplies; each factor is separated using the system's list separator character).
Notes for readers not familiar with working with Word field codes: the paired wavy braces must be inserted using Ctrl+F9 and cannot simply be typed from the keyboard. Alt+F9 will toggle between field code and field result display. Press F9 to force a field to update.

Talend filter from two input files

I have two data files (delimited files):
- The first one contains 3 columns, ID, num_phone and trafic_etl: the SIM card may be 3G, 4G or whatever.
- The second one contains 1 column, num_phone_4g: the SIM card has to be 4G.
The thing is, I want to fill an Oracle table with the numbers that have a 4G SIM card (second file) and a total trafic_etl of 0, knowing that the first file may have more than one row for the same num_phone.
I did manage to do this with an SQL statement by storing the files in tables.
But what I have to do is use Talend for it, and I am new to this tool.
Thanks in advance.
Here's a solution using this sample data.
*File 1*
num_phone;trafic_etl;annee;mois;jour
123456;111111;2018;Juillet;20
123457;222222;2018;Juillet;20
123458;0;2018;Juillet;20
123456;333333;2018;Juillet;20
123457;444444;2018;Juillet;20
123458;0;2018;Juillet;20
*File 2*
num_phone_4g
123456
123457
123458
123459
The expected output is 123458 (because it has a total of 0 trafic) and 123459 (because it's not present in file 1; I don't know if this is possible in your use case).
I aggregate the data of file1 by phone number to get the total trafic for each phone number (assuming the date columns are not important). Then I use this aggregated data as a lookup to file2. In tMap_1, there is a join between the 2 flows on the phone number, and I only output the rows from file2 where the total trafic is null or zero.
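For reference, the same computation can be sketched in Python (the column names and semicolon delimiter follow the sample files above; the file names are made up, and the real job of course uses tAggregateRow and tMap):

# Illustrative sketch of the Talend flow: aggregate file1 by phone number,
# then use the totals as a lookup for file2 and keep zero/missing trafic.
import csv
from collections import defaultdict

totals = defaultdict(int)
with open('file1.csv') as f1:                 # num_phone;trafic_etl;annee;mois;jour
    for row in csv.DictReader(f1, delimiter=';'):
        totals[row['num_phone']] += int(row['trafic_etl'])

with open('file2.csv') as f2:                 # num_phone_4g
    for row in csv.DictReader(f2, delimiter=';'):
        num = row['num_phone_4g']
        if totals.get(num, 0) == 0:           # missing or zero total trafic
            print(num)                        # row destined for the Oracle table

With the sample data this prints 123458 and 123459, matching the expected output.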
Let me know if my assumptions are correct. If they are not, I will update my answer.

Merge Columns from various files in Talend

I am trying to merge columns from files in a folder using Talend. (The files are local.)
Example: there are 4 files in a folder (there could also be 'n' files).
Each file has one column containing 100 values.
So after the merge, the output file would have 4 (or 'n') columns with 100 records in it.
Is it possible to merge this way using Talend components?
I tried with 2 files in tMap, but the output records get multiplied (the records in the first file * the records in the second file).
Any help would be appreciated.
Thanks.
You have to determine how to join the data from the different files.
If row number N of each file has to be matched with row number N of the other files, then you must set a sequence on each of your files, and join on the sequences to get your result. Careful: you are totally dependent on the order of the data in each file.
Then you can have this job:
tFileInputdelimited_1 --> tMap_1 --->{tMap_5
tFileInputdelimited_2 --> tMap_2 --->{tMap_5
tFileInputdelimited_3 --> tMap_3 --->{tMap_5
tFileInputdelimited_4 --> tMap_4 --->{tMap_5
In tMap_1 to tMap_4, copy the input to the output, and add a "sequence" column (datatype Integer) to the output, populated with Numeric.sequence("IDENTIFIER1",1,1). Then you have 2 columns in output: your data and a unique sequence.
Be careful to use a different identifier for each source.
Then in tMap_5, just join on the different sequences, and map each data column to the output, as in the sketch below.
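If it helps, the row-number matching can be sketched in Python like this (file names are made up; in Talend the job stays exactly as described above):

# Illustrative sketch of the sequence join: read one column from each file
# and match rows by position, like the sequence columns in the tMaps.
import csv
import itertools

filenames = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']  # made-up names
columns = []
for name in filenames:
    with open(name) as f:
        columns.append([line.rstrip('\n') for line in f])

with open('merged.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    # zip_longest pairs row N of every file, padding with '' if one is shorter
    for row in itertools.zip_longest(*columns, fillvalue=''):
        writer.writerow(row)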

Process two space delimited text files into one by common column [duplicate]

I have two text files that look like:
col1 primary col3 col4
blah 1 blah 4
1 2 5 6
...
and
colA primary colC colD
1 1 7 27
foo 2 11 13
I want to merge them into a single wider table, such as:
primary col1 col3 col4 colA colC colD
1 blah blah 4 1 7 27
2 1 5 6 foo 11 13
I'm pretty new to Perl, so I'm not sure what the best way is to do this.
Note that column order does not matter, and there are a couple million rows. Also my files are unfortunately not sorted.
My current plan, unless there's an alternative:
For a given line in one of the files, scan the other file for the matching row and append them both as necessary into the new file. This sounds slow and cumbersome though.
Thanks!
Solution 1.
Read the smaller of the two files line by line, using a standard CPAN delimited-file parser like Text::CSV_XS to parse out the columns.
Save each record (as an arrayref of columns) in a hash, with your merge column as the hash key.
When done, read the larger of the two files line by line, using the same parser to split out the columns.
For each record, find the join key field, look up the matching record in the hash holding the data from file #1, merge the 2 records as needed, and print.
NOTE: This is pretty memory intensive, as the entire smaller file will live in memory, but it won't require you to read one of the files millions of times.
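The question asks for Perl, but the shape of Solution 1 is easy to show in a short Python sketch, assuming space-delimited files with a header row and a unique "primary" column (file names are placeholders):

# Sketch of Solution 1: load the smaller file into a hash keyed on the
# join column, then stream the larger file and merge matching rows.
with open('file2.txt') as small:
    small_header = small.readline().split()
    key_idx = small_header.index('primary')
    small_rows = {}
    for line in small:
        cols = line.split()
        small_rows[cols[key_idx]] = cols

with open('file1.txt') as big, open('merged.txt', 'w') as out:
    big_header = big.readline().split()
    big_key = big_header.index('primary')
    # merged header: big file's columns plus the small file's non-key columns
    extra_names = [h for i, h in enumerate(small_header) if i != key_idx]
    out.write(' '.join(big_header + extra_names) + '\n')
    for line in big:
        cols = line.split()
        match = small_rows.get(cols[big_key])
        if match:                             # keep only rows present in both
            extra_vals = [c for i, c in enumerate(match) if i != key_idx]
            out.write(' '.join(cols + extra_vals) + '\n')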
Solution 2.
Sort file1 (using Unix sort or some simple Perl code) into "file1.sorted".
Sort file2 (using Unix sort or some simple Perl code) into "file2.sorted".
Open both files for reading. Loop until both are fully read:
Read 1 line from each file into its buffer if that buffer is empty (the buffer being simply a variable containing the next record).
Compare the join keys of the 2 buffered lines.
If key1 < key2, write the record from file1 to the output (without merging), empty buffer1, and repeat from the read step.
If key1 > key2, write the record from file2 to the output (without merging), empty buffer2, and repeat.
If key1 == key2, merge the 2 records, write the merged record to the output, and empty both buffers (assuming the join key column is unique; if it is not, this step is more complicated).
NOTE: this does NOT require you to keep an entire file in memory, aside from sorting the files (which CAN be done in a memory-constrained way if you need to).
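And a matching Python sketch of Solution 2, assuming both inputs are already sorted on a unique join key sitting in the first column (a simplification; adjust the column index to suit your data):

# Sketch of Solution 2: merge two files already sorted on their join key.
def records(path):
    # Yield (key, columns) pairs; assumes column 0 holds the sorted key.
    with open(path) as f:
        for line in f:
            cols = line.split()
            yield cols[0], cols

def merge_sorted(path1, path2, out_path):
    it1, it2 = records(path1), records(path2)
    buf1, buf2 = next(it1, None), next(it2, None)    # one-record buffers
    with open(out_path, 'w') as out:
        while buf1 or buf2:
            if buf2 is None or (buf1 and buf1[0] < buf2[0]):
                out.write(' '.join(buf1[1]) + '\n')  # unmatched file1 record
                buf1 = next(it1, None)
            elif buf1 is None or buf2[0] < buf1[0]:
                out.write(' '.join(buf2[1]) + '\n')  # unmatched file2 record
                buf2 = next(it2, None)
            else:                                    # keys match: merge
                out.write(' '.join(buf1[1] + buf2[1][1:]) + '\n')
                buf1, buf2 = next(it1, None), next(it2, None)

merge_sorted('file1.sorted', 'file2.sorted', 'merged.txt')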