Using the CFF Stage to read an EBCDIC file as a single record - DataStage

I have an EBCDIC file that is created on z/OS and SFTPed to a midrange/Linux system. The EBCDIC file has 20 fields. I'm trying to use the CFF Stage to read each record into a single field. Is this possible? Thanks in advance for any help.
The COBOL copybook has:
01 Record
   02 FLD_1
   02 FLD_2
   02 FLD_3
   .
   .
I want the CFF Stage to read the entire record as one field:
01 Record

Try replacing the copybook with one that has only an 01-level item, or replace the table definition with a single field of type Char (or VarChar) of the appropriate maximum length.
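For example, a minimal single-item copybook. This is only a sketch: the item name and the PIC X(500) length are assumptions, so set the length to your actual record length.

01 WHOLE-RECORD PIC X(500).

The CFF Stage will then present each record as one Char field of that length instead of parsing the 20 individual fields.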

Check what data does not match on a customer list using T-SQL

I have a customer list table holding all data for a number of our branches. We have recently had a load of customers move from branch 02 to branch 04, but they have all now got different customer numbers. A small example below:
table.customers:

branch | cust_code | post_code | email      | tel  | mob
-------+-----------+-----------+------------+------+------
02     | 1234      | de5 1ac   | fgfg#b.com | 0178 | 0188
04     | 1432      | de5 1ac   | fgfg#b.com | 0178 | 0188
02     | 8528      | st4 3ad   | thng#b.com | 0164 | 1654
04     | 6132      | st43 ad   | thng#b.com | 0164 | 1654
02     | 8523      | de4 1ac   | fgfg#b.com | 0178 | 0188
04     | 7463      | de4 1ac   | fggf#b.com | 0178 | 0188
So I now need to check that all data has been moved from branch 02 to branch 04 correctly, with only cust_code allowed to differ across the columns stated. I do have a list of the branch 02 customer codes and the new corresponding branch 04 customer codes, so I can tell the query a customer code at branch 02 and the customer code to check at branch 04, to see if the rest of the columns match.
As you can see above, for the first customer (1234, now 1432) everything matches, so this pair passes the check.
But some of the moved customers have something typed incorrectly (the post_code for 6132 and the email for 7463 above).
I want to write a T-SQL query to help me identify which customers have non-matching data.
Sorry, but you should not base your query on "hope".
First, make sure you build a table with the customer codes before and after the branch move. Let's assume a table named [pairs] with two columns, [branch02code] and [branch04code]; make sure it is loaded with the correct data from Excel. Then use it to join one copy of customers for each branch, and a CASE expression to flag differences:
select
    p.*,
    stats_query.records_match
from pairs p
inner join customers b2
    on p.branch02code = b2.cust_code and b2.branch = '02'
inner join customers b4
    on p.branch04code = b4.cust_code and b4.branch = '04'
cross apply
    (select case
                when b2.post_code = b4.post_code
                 and b2.email = b4.email
                 and b2.tel = b4.tel
                 and b2.mob = b4.mob
                then 1
                else 0
            end as records_match
    ) as stats_query
order by stats_query.records_match desc
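For reference, a hypothetical setup for the [pairs] table, populated here with the code pairs visible in the sample data (in practice you would load it from your Excel list):

create table pairs (
    branch02code varchar(20) not null,  -- old code at branch 02
    branch04code varchar(20) not null   -- new code at branch 04
);

insert into pairs (branch02code, branch04code)
values ('1234', '1432'),
       ('8528', '6132'),
       ('8523', '7463');

To list only the customers that need fixing, add where stats_query.records_match = 0 before the order by (columns produced by a CROSS APPLY are available in the WHERE clause).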

SAS - how can I read in date data?

I am trying to read in some data in date format and the solution is eluding me. Here are four of my tries, using the simplest self-contained examples I could devise. (The site is making me boost my text-to-code ratio in order for this to post, so please ignore this sentence.)
EDIT: my example was too simplistic. I have spaces in my variables, so I do need to specify positions (the original answer said to ignore positions entirely). The solution below works, but the date variable is not a date.
data clinical;
input
name $ 1-13
visit_date $ 14-23
group $ 25
;
datalines;
John Turner  03/12/1998 D
Mary Jones   04/15/2008 P
Joe Sims     11/30/2009 J
;
run;
No need to specify the column positions here: with list input, the datalines are read as space-delimited values. A simple way to specify an informat is to put a colon (:) modifier after each variable in the INPUT statement.
data clinical;
input ID$ visit_date:mmddyy10. group$;
format visit_date mmddyy10.; * Make the date look human-readable;
datalines;
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
;
run;
Output:
ID visit_date group
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
A friend of mine suggested this, but it seems odd to have to switch syntax markedly depending on whether the variable is a date or not.
data clinical; 
input
name $ 1-12
@13 visit_date MMDDYY10.
group $ 25 ;
datalines;
John Turner 03/12/1998  D
Mary Jones  04/15/2008  P
Joe Sims    11/30/2009  J
;
run;
SAS provides a lot of different ways to input data, just depending on what you want to do.
Column input, which is what you start with, is appropriate when this is true:
To read with column input, data values must have these attributes:
- appear in the same columns in all the input data records
- consist of standard numeric form or character form
Your data does not meet this in the visit_date column. So, you need to use something else.
Formatted input is appropriate to use when you want these features:
With formatted input, an informat follows a variable name and defines how SAS reads the values of this variable. An informat gives the data type and the field width of an input value. Informats also read data that is stored in nonstandard form, such as packed decimal, or numbers that contain special characters such as commas.
Your visit_date column matches this requirement, as you have a specific informat (mmddyy10.) you would like to use to read in the data into date format.
List input would also work in some cases, especially in modified list format, though plain list input would fail on your example because of the spaces in the name (see the sketch after the excerpt below). Here's when you might want to use it:
List input requires that you specify the variable names in the INPUT statement in the same order that the fields appear in the input data records. SAS scans the data line to locate the next value but ignores additional intervening blanks. List input does not require that the data is located in specific columns. However, you must separate each value from the next by at least one blank unless the delimiter between values is changed. By default, the delimiter for data values is one blank space or the end of the input record. List input does not skip over any data values to read subsequent values, but it can ignore all values after a given point in the data record. However, pointer controls enable you to change the order that the data values are read.
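For instance, a sketch of modified list input: the & modifier allows single embedded blanks inside name, as long as at least two blanks follow the value (the spacing in the datalines below is an assumption):

data clinical;
  * The & modifier lets NAME contain single embedded blanks - the value ends at two or more consecutive blanks;
  input name & $13. visit_date : mmddyy10. group : $1.;
  format visit_date mmddyy10.; * Make the date look human-readable;
  datalines;
John Turner  03/12/1998 D
Mary Jones   04/15/2008 P
Joe Sims     11/30/2009 J
;
run;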
(For completeness, there is also Named input, though that is rarer to see, and not helpful here.)
You can mix column and formatted input, but you don't want to mix in list input: it doesn't have the same concept of pointer control, so it is easy to end up with something you don't want. In general, use the input style that is appropriate to your data: column input if your data is all text and standard numerics in fixed columns, and formatted input if particular fields need informats (such as your date).

Alternate to LISTAGG function in Amazon Redshift

I am using the LISTAGG function to convert row values into one single column, but I am facing the error below. From searching, this happens when the aggregated result exceeds 65535 bytes [https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html]. My result set is more than 100000 rows, so I get the exception below. What is the alternative to this function? How can I achieve my use case in Redshift? Please help.
-----------------------------------------------
error: Result size exceeds LISTAGG limit
code: 8001
context: LISTAGG limit: 65535
process: query18_863_20899937 [pid=66066]
My Use Case
From:
ID dates
00 date00
01 date00
00 date01
00 date02
00 date03
01 date01
To (using the LISTAGG function):
ID dates
00 date00,date01,date02,date03
01 date00,date01
Thanks.
It would be helpful to show the query (or the key parts of it) that you are running, but I think I can suggest a direction. LISTAGG() has a WITHIN GROUP option and is documented at the link you mention in your post.
LISTAGG() WITHIN GROUP aggregates the text column separately for each group defined in the GROUP BY clause. So to get your desired output you should be able to do something like:
select ID, listagg(dates, ',') within group (order by dates) as dates
from <table>
group by ID;
The same LISTAGG size limit still exists; it is just that it is now applied to each GROUP BY group. If this still produces a text string that is too long, you will need to break things into more groups. When this happens, I calculate a "subpart" column that increments for every 1,000 rows in the group. This way I can ensure that no group will produce more than 1,000 rows for LISTAGG() to consume.
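A sketch of that chunking approach. The table and column names (my_table, ID, dates) are placeholders, and the chunk size of 1,000 assumes your date strings are short enough that 1,000 of them stay under the 65535-byte limit:

-- number the rows within each ID, then aggregate in chunks of 1,000 rows
select ID,
       (rn - 1) / 1000 as subpart,
       listagg(dates, ',') within group (order by dates) as dates
from (
    select ID,
           dates,
           row_number() over (partition by ID order by dates) as rn
    from my_table
) t
group by 1, 2;

Each (ID, subpart) group then feeds at most 1,000 values into LISTAGG(); the pieces can be stitched back together downstream if you truly need one string per ID.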

How to import BLOB formatted as CSV to postgres

I have a csv file that is an output from a BLOB store. The csv contains 6 related tables. Not all records use all 6 tables, but all records do use table 1. I would like to import table 1 into Postgres. The data is described as follows:
Files are ASCII text files comprising variable length fields delimited by an asterisk. The files have an extension of “csv”. Records are separated by a Carriage Return/Line Feed. No data items should contain an asterisk.
Further information is given in the technical arrangement.
Technical Arrangement
The technical arrangement of our summary data is as follows:
Fields are in the exact order in which they are listed in this file specification.
Records are broken down into Types. Each Type represents a different part of the record.
Each record will begin with Type ‘01’ data
For each record type ‘01’, there is one or more record type ‘02’s containing Survey Line Item data. There may be zero or more of record types ’03’ and '06'.
There may be zero or one of record types '04' and '05'.
If a record type '06' exists, there will be one record type '07'
The end of a record is only indicated by the next row of Type ‘01’ data or the end of the file.
You should use this information to read the file into formal data structures.
I'm new to databases and want to know how to tackle this. I understand that Postgres has Python and Java connectors, which in turn have ways to read BLOB data. Is that the best approach?
EDIT
Sample data: one entry comprising 2 record types, then one containing most of the record types:
01*15707127000*8227599000*0335*The Occupier*3****MARKET STREET**BRACKNELL*BERKS*RG12 1JG*290405*Shop And Premises*60.71*14872*14872*14750*2017*Bracknell Forest*00249200003001*20994339144*01-APR-2017**249*NIA*330.00
02*1*Ground*Retail Zone A*29.42*330.00*9709
02*2*Ground*Retail Zone B*31.29*165.00*5163
01*15707136000*492865165*0335**7-8****CHARLES SQUARE**BRACKNELL*BERKS*RG12 1DF*290405*Shop And Premises*325.10*34451*32921*32750*2017*Bracknell Forest*00215600007806*21012750144*01-APR-2017**249*NIA*260.00
02*1*Ground*Retail Zone A*68.00*260.00*17680
02*2*Ground*Remaining Retail Zone*83.50*32.50*2714
02*3*Ground*Office*7.30*26.00*190
02*4*First*Locker Room (Female)*3.20*13.00*42
02*5*First*Locker Room (Male)*5.80*13.00*75
02*6*First*Mess/Staff Room*11.50*13.00*150
02*7*Ground*Internal Storage*7.80*26.00*203
02*8*Ground*Retail Zone B*68.10*130.00*8853
02*9*Ground*Retail Zone C*69.90*65.00*4544
03*Air Conditioning System*289.5*7.00*+2027
06*Divided or split unit*-5.00%
06*Double unit*-5.00%
07*36478*-3557
Copy the text file to an auxiliary table with a single text column:
drop table if exists text_buffer;
create table text_buffer(text_row text);
copy text_buffer from '/data/my_file.csv';
Transform the text column to a text array, skipping the rows you do not need. You'll be able to select any element as a new column with a given name and type, e.g.:
select
cols[2]::bigint as bigint1,
cols[3]::bigint as bigint2,
cols[4]::text as text1,
cols[5]::text as text2
-- specify name and type of any column you need
from text_buffer,
lateral string_to_array(text_row, '*') cols -- transform text column to text array
where left(text_row, 2) = '01'; -- get only rows for table1
bigint1 | bigint2 | text1 | text2
-------------+------------+-------+--------------
15707127000 | 8227599000 | 0335 | The Occupier
15707136000 | 492865165 | 0335 |
(2 rows)
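Once the projection looks right, the same query can feed a CREATE TABLE ... AS to materialize table 1. A sketch reusing the placeholder column names above (in practice you would list all of the type '01' fields with meaningful names and types):

-- materialize the type '01' rows as a real table
create table table1 as
select
    cols[2]::bigint as bigint1,
    cols[3]::bigint as bigint2,
    cols[4]::text as text1,
    cols[5]::text as text2
from text_buffer,
lateral string_to_array(text_row, '*') cols
where left(text_row, 2) = '01';

The other record types ('02' to '07') can be extracted the same way by changing the left(text_row, 2) filter.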

get part of a Tree Structure in Crystal Reports

I have the tree structure of the departments of a company.
The Parent-Child link between departments in the table is:
Departments.ID -> Departments.parentID.
However I don't need the whole tree.
The thing is that the structure of the departments has changed during the years and I would like to keep only a part of the tree.
For example:
-Root
--Parent 1
---Child 01
---Child 02
--Parent 2
---Child 01
---Child 02 (This is the parent that I want to have in my "shorter" tree)
----Child 001 (This is the part of the tree that I want. The depth is more than 1)
---Child 04
--Parent 03
Can I say something like "get me everything under Child 001"?
Does this even make sense?
Thanks for any advice.
I think I rushed to ask the question.
While I was adding tables and data for my report, the tree got "shorter" because of database fields that are NULL. So when I added all the tables that I wanted to use, the "older" departments disappeared because of the NULLs (presumably the joins between the tables filter out rows where the linked fields are NULL).
If you are using the File System Data connection, add this to your record-selection formula:
{x.Directory} = "Root\Parent 2\Child 02"
OR {x.Parent Directory} LIKE "Root\Parent 2\Child 02*"
You could also choose to set the database 'table' to the desired parent node.