I am trying to make sense of Adabas Natural DDMs. Mostly they make sense, but explanations of certain specifics are hard to come by.
The files start off with something like:
00101DB: 000 FILE: 015 - Z-NATDIC-PR DEFAULT SEQUENCE:
0020
0030TYL DB NAME F LENG S D REMARKS
0040--- -- -------------------------------- - ---- - - ------------------------
which is all good and well. But what does it mean if lines similar to those appear multiple times within the same DDM?
For example, the excerpt above comes from a DDM that also contains:
03001DB: 255 FILE: 253 - Z-NATDIC-PR DEFAULT SEQUENCE:
0310
0320TYL DB NAME F LENG S D REMARKS
0330--- -- -------------------------------- - ---- - - ------------------------
...
05901DB: 255 FILE: 253 - Z-NATDIC-PR DEFAULT SEQUENCE:
0600
0610TYL DB NAME F LENG S D REMARKS
0620--- -- -------------------------------- - ---- - - ------------------------
...
08901DB: 255 FILE: 253 - Z-NATDIC-PR DEFAULT SEQUENCE:
0900
0910TYL DB NAME F LENG S D REMARKS
0920--- -- -------------------------------- - ---- - - ------------------------
My understanding is:
A DDM exists to define a user-friendly way of referring to the fields of a single Adabas file (roughly analogous to an SQL table)
A default sequence defines the order of a set of fields (analogous to SQL columns)
I need clarification:
What is the purpose of a default sequence?
What does it mean if there are multiple default sequences within a single DDM?
Sheena, it is sorted in the Adabas short-name sequence. I believe it is there so you can order your fields at a later stage in the logical view; for instance, if you want to add a postal code after an address field later on. Adabas always puts a new field at the end of the file, but if you use a short name between address line 4 and the next field, you can add the postal code there. In my 21 years of working with Natural, you are the first to ask this question :-)
The default sequence is specified with the two-character field short name. The system validates the short name based on the selected file number. If the database is accessible, the short name is checked against the corresponding field in the database file. If such a field does not exist in the database, a selection list of valid short names is displayed. If the database cannot be accessed, no selection list is generated.
As Carl mentioned, in the DDM-Editor a list of valid short names may be shown as a completion aid.
However, that doesn't explain what the value is used for.
The above is documented under "Using the DDM Editor" in the current Natural documentation.
If you take a look in the Natural Programming Guide, under...
"Accessing Data in an Adabas Database"
...how it is used is explained.
To access Adabas data in logical order with Natural you might code the following:
READ view LOGICAL BY descriptor
(that corresponds to SELECT ... ORDER BY in SQL)
It is however also possible to omit descriptor and code the following:
READ view LOGICAL
In that case, the data will be read in the order specified by the Default Sequence.
(this is also discussed in the Natural documentation of the READ statement)
In my 35 years or so of working with Adabas & Natural at Software AG and at customers, I've never seen this field used. It's usually left blank.
I am trying to read in some data in date format and the solution is eluding me. Here are four of my tries, using the simplest self-contained examples I could devise.
*EDIT - my example was too simplistic. I have spaces in my variables, so I do need to specify positions (the original answer said to ignore positions entirely). The solution below works, but the date variable is not a date.
data clinical;
input
name $ 1-13
visit_date $ 14-23
group $ 25
;
datalines;
John Turner 03/12/1998 D
Mary Jones 04/15/2008 P
Joe Sims 11/30/2009 J
;
run;
No need to specify the lengths: list input already assumes space-delimited values. A simple way to specify an informat is to use a : after each input variable.
data clinical;
input ID$ visit_date:mmddyy10. group$;
format visit_date mmddyy10.; * Make the date look human-readable;
datalines;
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
;
run;
Output:
ID visit_date group
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
A friend of mine suggested this, but it seems odd to have to switch syntax markedly depending on whether the variable is a date or not.
data clinical;
input
name $ 1-12
#13 visit_date MMDDYY10.
group $ 25 ;
datalines;
John Turner 03/12/1998 D
Mary Jones 04/15/2008 P
Joe Sims 11/30/2009 J
;
run;
SAS provides a lot of different ways to input data, just depending on what you want to do.
Column input, which is what you start with, is appropriate when this is true:
To read with column input, data values must have these attributes:
appear in the same columns in all the input data records
consist of standard numeric form or character form
Your data does not meet this in the visit_date column. So, you need to use something else.
Formatted input is appropriate to use when you want these features:
With formatted input, an informat follows a variable name and defines how SAS reads the values of this variable. An informat gives the data type and the field width of an input value. Informats also read data that is stored in nonstandard form, such as packed decimal, or numbers that contain special characters such as commas.
Your visit_date column matches this requirement, as you have a specific informat (mmddyy10.) you would like to use to read in the data into date format.
List input would also work in some cases, especially in modified list format, though in your example it wouldn't, due to the spaces in the name. Here's when you might want to use it:
List input requires that you specify the variable names in the INPUT statement in the same order that the fields appear in the input data records. SAS scans the data line to locate the next value but ignores additional intervening blanks. List input does not require that the data is located in specific columns. However, you must separate each value from the next by at least one blank unless the delimiter between values is changed. By default, the delimiter for data values is one blank space or the end of the input record. List input does not skip over any data values to read subsequent values, but it can ignore all values after a given point in the data record. However, pointer controls enable you to change the order that the data values are read.
(For completeness, there is also Named input, though that's more rare to see, and not helpful here.)
You can mix column and formatted inputs, but you don't want to mix in list input: it doesn't have quite the same concept of pointer control, so it's easy to end up with something you don't want. In general, use the input type that's appropriate to your data: column input if your data is all text or regular numerics, formatted input if you have particular formats for your data.
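The column-input versus formatted-input distinction has a rough analogy outside SAS: slicing a fixed-width field out as raw text versus parsing it with a type-aware reader. A minimal Python sketch of the same idea (not SAS; the sample record is built to match the column positions described above):

```python
from datetime import datetime, date

# One fixed-width line like the datalines above (two spaces after the name,
# so the date occupies columns 14-23 and the group flag column 25)
record = "John Turner  03/12/1998 D"

# "Column input": slice by position; everything stays a character string
name      = record[0:13].strip()   # columns 1-13
visit_raw = record[13:23]          # columns 14-23, still text
group     = record[24]             # column 25

# "Formatted input": apply a type-aware parser (the role mmddyy10. plays in SAS)
visit_date = datetime.strptime(visit_raw, "%m/%d/%Y").date()

print(name, visit_date, group)
```

The point is the same as in the SAS quote: column input only moves characters around, while formatted input knows the field's type and turns the text into a real date value.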
Considering the trade table 't' and quotes table 'q' in memory:
q)t:([] sym:`GOOG`AMZN`GOOG`AMZN; time:10:01 10:02 10:02 10:03; px:10 20 11 19)
q)q:([] sym:`GOOG`AMZN`AMZN`GOOG`AMZN; time:10:01 10:01 10:02 10:02 10:03; vol:100 200 210 110 220)
To get the performance benefit, I applied the grouped attribute to the 'sym' column of the quotes table (as q1) and sorted the 'time' column within sym.
With this I can clearly see the performance benefit:
q)\t:1000000 aj[`sym`time;t;q]
9573
q)\t:1000000 aj[`sym`time;t;q1]
8761
q)\t:100000 aj[`sym`time;t;q]
968
q)\t:100000 aj[`sym`time;t;q1]
893
And in large tables the performance is far better.
Now I'm trying to understand how it works internally when we apply the grouped attribute to the sym column and sort time within sym.
My understanding is that internally the aj should happen in the way below; can someone please confirm the correct internal working?
* Since the grouped attribute is applied on sym, a hash table is created for table q1; then, since we are sorting on time, the internal q1 table might look like:
GOOG|(10:01;10:02)|(100;110)
AMZN|(10:01;10:02;10:03)|(200;210;220)
So in the case of q1, if the interpreter has to join (AMZN;10:02) from table t, it will find it directly in q1's hash table in less time; but to join the same value (AMZN;10:02) from table t against table q, the interpreter has to search linearly through table q, hence taking more time.
I believe you're on the right track, though we can't know for sure as we don't have access to the kdb source code to see precisely what it does.
If you look at the definition of aj you'll see that it's based on bin:
q)aj
k){.Q.ft[{d:x_z;$[&/j:-1<i:(x#z)bin x#y;y,'d i;+.[+.Q.ff[y]d;(!+d;j);:;.+d i j:&j]]}[x,();;0!z]]y}
specifically,
(`sym`time#q)bin `sym`time#t
and the bin documentation provides some more details on how bin behaves: https://code.kx.com/q/ref/bin/
I believe in the two-column case it will first match on the sym column and then use bin on the second column. Like you said, the grouped attribute on sym speeds up the matching of syms part and the sorting on time ensures the bin returns the correct results. Note that for on-disk queries it's optimal to put `p# on sym rather than `g# as the parted attribute is optimal for matching/retrieving by sym from disk.
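To make the bin step concrete: for each lookup value, bin does a binary search over a sorted list and returns the index of the last element less than or equal to it (or -1 if there is none). A rough Python sketch of that behaviour using the stdlib bisect module (purely illustrative, not the kdb implementation):

```python
from bisect import bisect_right

def bin_(sorted_xs, y):
    """Index of the last element of sorted_xs that is <= y, or -1 (like q's bin)."""
    return bisect_right(sorted_xs, y) - 1

# Quote times for one sym, already sorted within sym (as in q1);
# times encoded as plain integers for simplicity
goog_times = [1001, 1002]

print(bin_(goog_times, 1002))  # 1  -> matches the 10:02 quote
print(bin_(goog_times, 1000))  # -1 -> no quote at or before 10:00
```

This is why the sort on time within sym matters: binary search only returns the correct "as-of" quote when each sym's times are sorted.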
I am currently studying SQL normal forms.
Let's say I have the following table, where the primary key is userid:
userid FirstName LastName Phone
1 John Smith 555-555
1 Tim Jack 432-213
2 Sarah Mit 454-541
3 Tom jones 987-125
The book I'm reading states the following conditions must be true in order for a table to be in 1st normal form.
Rows contain data about an entity.
Columns contain data about attributes of the entities.
All entries in a column are of the same kind.
Each column has a unique name.
Cells of the table hold a single value.
The order of the columns is unimportant.
The order of the rows is unimportant.
No two rows may be identical.
A primary key must be assigned.
I'm not sure if my table violates the 8th rule, "No two rows may be identical",
because the first two records in my table
1 John Smith 555-555
1 Tim Jack 432-213
share the same userid. Does that mean that they are considered duplicate rows?
Or does "duplicate records" mean that every piece of data in the row has to be the same for the record to be considered a duplicate row, as in the example below?
1 John Smith 555-555
1 John Smith 555-555
EDIT1: Sorry for the confusion
The question I was trying to ask is simple
Is this table below in 1st normal form?
userid FirstName LastName Phone
1 John Smith 555-555
1 Tim Jack 432-213
2 Sarah Mit 454-541
3 Tom jones 987-125
Based on the 9 rules given in the textbook I think it is, but I wasn't sure whether rule 8, "No two rows may be identical", was being violated by the two records that use the same primary key.
The class textbook and prof aren't really clear on this subject, which is why I am asking this question.
Or does "duplicate records" mean that every piece of data in the row has to be the same for the record to be considered a duplicate row, as in the example below?
They mean the latter of your choices. Entire rows are what must be "identical". It's OK for two rows to share the same values for one or more columns, as long as one or more columns differ.
That's because a relation holds a set of values that are tuples/rows/records, and a set is a collection of values that are all different.
But SQL & some relational algebras have different notions of "identical" in the case of NULLs compared to the relational model without NULLs. You should read what your textbook says about it if you want to know exactly what they mean by it. Two rows that have NULL in the same column are considered different. (Point 9 might be summarizing something involving NULLs. Depends on the explanation in the book.)
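The "set of tuples" point can be illustrated outside SQL: a set only collapses elements that are equal in their entirety. A quick Python sketch using the rows from the question:

```python
# Rows as tuples: (userid, FirstName, LastName, Phone)
rows = [
    (1, "John", "Smith", "555-555"),
    (1, "Tim", "Jack", "432-213"),    # same userid, but not an identical row
    (1, "John", "Smith", "555-555"),  # identical in every column
]

distinct = set(rows)  # a set keeps only fully distinct tuples
print(len(distinct))  # 2 -> only the fully identical pair collapsed
```

The two rows sharing userid 1 both survive; only the row repeated in every column is treated as a duplicate.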
PS
There's no single notion of what a relation is. There is no single notion of "identical". There is no single notion of 1NF.
Points 3-8 are better described as (poor) ways of restricting how to interpret a picture of a table to get a relation. Your textbook seems (strangely) to make "1NF" a property of such an interpretation of a picture of a table. Normally we simply define a relation to be a certain thing, so if you have one then it has the defined properties; "in 1NF" then either means "is a relation" and isn't further used, or it means certain further restrictions hold. A relation is a set of tuples/rows/records. In the kind of relation your points 3-8 describe, the tuples are sets of attribute/column/field name-value pairs, and the values paired with a name have to be of the type paired with that name in some schema/heading: a set of name-type pairs defined either as part of the relation or external to it.
Your textbook doesn't seem to present things clearly. Its definition of "1NF" is also idiosyncratic, in that although points 3-8 are mathematical, 1 & 2 are informal/heuristic (and 9 could be either or both).
I've started to do some programming in ILE RPG and I'm curious about one thing: what exactly is a record format? I know that it has to be defined in physical/logical/display files, but what exactly does it do? In an old RPG book from '97 I found that "Each record format defines what is written to or read from the workstation in a single I/O operation".
In another book I found a definition saying that a record format describes the fields within a record (so, for example, length and type, like char or decimal?).
And last, what exactly does it mean that "every record within a physical file must have an identical record layout"?
I'm a bit confused right now. Still not sure what a record format is :F
Still not sure what a record format is :F
The F specification: this specification is also known as the File specification. Here we declare all the files which we will be using in the program. The files might be any of the physical file, logical file, display file or the printer file. Message files are not declared in the F specification.
What exactly does it mean that "every record within a physical file must have an identical record layout"?
Each and every record within one physical file has the same layout.
Let's make a record layout of 40 characters.
----|---10----|---20----|---30----|---40
20150130  DEBIT     00002100
20150130  CREDIT    00012315
The bar with the numbers is not part of the record layout. It's there so we can count columns.
The first field in the record layout is the date in yyyymmdd format. This takes up 8 characters, from position 1 to position 8.
The second field is 2 blank spaces, from position 9 to position 10.
The third field is the debit / credit indicator. It takes up 10 characters, from position 11 to position 20.
The fourth field is the debit / credit amount. It takes up 8 positions, from position 21 to position 28. The format is assumed to be 9(6)V99. In other words, there's an implied decimal point between positions 26 and 27.
The fifth field is more blank spaces, from position 29 to position 40.
Every record in this file has these 5 fields, all defined the same way.
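As a sketch of how a program might carve fields out of that 40-character layout (Python purely for illustration; the positions and the 9(6)V99 implied decimal are taken from the description above, and the sample record is hypothetical):

```python
# One 40-character record laid out as described above (hypothetical sample data)
record = "20150130  DEBIT     00002100            "

date_s  = record[0:8]            # positions 1-8:   date in yyyymmdd format
ind     = record[10:20].strip()  # positions 11-20: debit/credit indicator
amt_raw = record[20:28]          # positions 21-28: amount, 9(6)V99 format
amount  = int(amt_raw) / 100     # implied decimal between positions 26 and 27

print(date_s, ind, amount)  # 20150130 DEBIT 21.0
```

Because every record in the file shares this layout, the same slicing works for each line; that is exactly what "identical record layout" buys you.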
A "record format" is a named structure that is used for device file I/O. It contains descriptions of each column in the 'record' (or 'row'). The specific combination of data types and sizes and the number and order of columns is hashed into a value that is known as the "record format identifier".
A significant purpose is the inclusion by compilers of the "record format identifier" in compiled program objects for use when the related file is opened. The system will compare the format ID from the program object to the current format ID of the file. If the two don't match, the system will notify the program that the file definition has changed since the program was compiled. The program can then know that it is probably going to read data that doesn't match the definitions that it knows. Nearly all such programs are allowed to fail by sending a message that indicates that the format level has changed, i.e., a "level check" failed.
The handling of format IDs is rooted in the original 'native file I/O' that pre-dates facilities such as SQL. It is a part of the integration between DB2 and the various program compilers available on the system.
The 'native' database file system was developed using principles that eventually resulted in SQL. A SQL table should have rows that all hold the same series of column definitions. That's pretty much the same as saying "every record within a physical file must have an identical record layout".
Physical database files can be thought of as being SQL tables. Logical database files can be thought of as being SQL views. As such, all records in a physical file will have the same definitions, but there is some potential variation in logical files.
A record format is something you learn in the old school. You read a file (table) and update/write through a record format.
DSPFD FILE(myTable)
Then you can see everything about the file. The record format name is in there.
New or young developers believe that every record in a physical file must be identical, but in ancient times, when dinosaurs walked the earth, a single file could contain several types of records, or "record formats"; so, as the name indicates, a record format is the format of a record within a file.
A new reporting requirement has arisen and I'm not too sure of the best way to tackle it. The source system has a field in it - let's call the field 'Fruit codes'. The Fruit codes field contains a list of comma-separated fruits. These are stored as semi-meaningful values Eg.
ID - Fruit codes
100 - APL, BAN, STRW
101 - ORNG
102 - BAN, STRW
There is a table that maps these semi-meaningful values to the full string equivalent. Eg.
Fruit code - Fruit name
APL - Apple
BAN - Banana
STRW - Strawberry
ORNG - Orange
We want to be able to display the full-string equivalent, separated by commas. The expected output should look like this:
ID - Fruit names
100 - Apple, Banana, Strawberry
101 - Orange
102 - Banana, Strawberry
We are using DataStage 9.1 with DB2 9.7. I was hoping that I might be able to use the Ereplace function in DataStage, however I'm not sure that this will work. The list of possible fruits changes every now and then, so I want this to be dynamic. I wonder whether I might need to loop through each of the comma-separated lists of fruits and then somehow do an Ereplace using the mapping table. Perhaps I will need to separate the comma-separated lists into individual rows or columns.
Maybe it's possible to do this using the Pivot stage, or the opposite of the LISTAGG DB2 function (if this exists). I'm not so proficient in DataStage, so I have lots of ideas, but no answers!
Thank you so much for your help.
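Whatever stage ends up doing the work, the transformation itself is just split, look up, re-join. A tiny Python sketch of that logic, using the sample data above (illustrative only; in DataStage/DB2 this would map onto a pivot/lookup/aggregate design, or onto splitting to rows and re-aggregating with LISTAGG):

```python
# Lookup table: fruit code -> fruit name (from the mapping table above)
fruit_names = {"APL": "Apple", "BAN": "Banana", "STRW": "Strawberry", "ORNG": "Orange"}

# Source rows: ID -> comma-separated fruit codes
rows = {100: "APL, BAN, STRW", 101: "ORNG", 102: "BAN, STRW"}

def expand(codes):
    # split the list, map each code through the lookup, re-join with commas
    return ", ".join(fruit_names[c.strip()] for c in codes.split(","))

result = {rid: expand(codes) for rid, codes in rows.items()}
print(result[100])  # Apple, Banana, Strawberry
```

Because the mapping is a lookup table rather than hard-coded replacements, a new fruit code only needs a new row in the mapping, which keeps the solution dynamic.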