I have a delimited flat file that has 3 columns:
NEW # DETAIL OLD #
------ ------ ------
111111 AAAA 123456
222222 BBBB
333333 CCCC 987654
I need my output to be
# DETAIL
------ ------
111111 AAAA
222222 BBBB
333333 CCCC
123456 AAAA
987654 CCCC
I need to ignore nulls in the OLD # column.
I'm not sure of the best way to accomplish this. Union All and/or Merge seem to work only if you have multiple sources.
The general concept is that you will want to Unpivot. Jason Strate has a really good article on it in his 31 Days of SSIS series.
The basic idea is that you want to keep the DETAIL column and let the other two flow into it. Unpivot is the native operation to normalize the data.
Source
I used a query as it was faster to gin up, and I added a row with an explicit NULL value.
SELECT
*
FROM
(
VALUES
('111111','AAAA','123456')
, ('222222','BBBB','')
, ('333333','CCCC','987654')
, ('444444','DDDD',NULL)
) D([NEW #], [DETAIL],[OLD #]);
Unpivot
The operation will unpivot the data. This eliminates NULL values but retains empty strings. This may or may not be the outcome you desire.
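For reference, here is a minimal sketch of the same normalization in plain T-SQL, using the source query above. Like the SSIS transformation, the UNPIVOT operator drops the NULL row but keeps the empty string:

SELECT
    u.Number
,   u.DETAIL
FROM
(
    VALUES
    ('111111','AAAA','123456')
    , ('222222','BBBB','')
    , ('333333','CCCC','987654')
    , ('444444','DDDD',NULL)
) D([NEW #], [DETAIL],[OLD #])
UNPIVOT
(
    -- Combine NEW # and OLD # into a single column called Number;
    -- Source records which original column each value came from
    Number FOR Source IN ([NEW #], [OLD #])
) AS u;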
Results
At this point, you can see we have the empty string row. You can address this in two ways; I'll let you pick your approach (a sketch of the upstream option follows the list).
Upstream - scrub empty strings to NULL for elimination
Downstream - use a Conditional Split to remove the rows with empty Number values
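A minimal sketch of the upstream option, reusing the same source query: wrapping OLD # in NULLIF turns empty strings into NULLs, which the Unpivot then eliminates.

SELECT
    D.[NEW #]
,   D.[DETAIL]
    -- empty strings become NULL so the Unpivot drops them
,   NULLIF(D.[OLD #], '') AS [OLD #]
FROM
(
    VALUES
    ('111111','AAAA','123456')
    , ('222222','BBBB','')
    , ('333333','CCCC','987654')
    , ('444444','DDDD',NULL)
) D([NEW #], [DETAIL],[OLD #]);

For the downstream route, a Conditional Split with an expression along the lines of LEN(TRIM(Number)) > 0 would serve the same purpose.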
Biml
Biml, the Business Intelligence Markup Language, describes the platform for business intelligence. Here, we're going to use it to describe the ETL. BIDS Helper is a free add-on for Visual Studio/BIDS/SSDT that addresses a host of shortcomings in those tools. Specifically, we're going to use its ability to transform a Biml file describing ETL into an SSIS package. This has the added benefit of giving you a mechanism for generating exactly the solution I'm describing, versus clicking through many tedious dialog boxes.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection ConnectionString="Provider=SQLNCLI11;Data Source=localhost\dev2014;Integrated Security=SSPI;Initial Catalog=tempdb" Name="CM_OLE" />
</Connections>
<Packages>
<Package
ConstraintMode="Linear"
Name="so_25670727">
<Tasks>
<Dataflow Name="DFT Combine all">
<Transformations>
<!--
Generate some source data. Added a row with an explicit NULL
as no real testing is done unless we have to deal with NULLs
-->
<OleDbSource ConnectionName="CM_OLE" Name="OLE_SRC Query">
<DirectInput>
SELECT
*
FROM
(
VALUES
('111111','AAAA','123456')
, ('222222','BBBB','')
, ('333333','CCCC','987654')
, ('444444','DDDD',NULL)
) D([NEW #], [DETAIL],[OLD #]);
</DirectInput>
</OleDbSource>
<!--
Unpivot the data. Combine NEW # and OLD # into a single column called Number.
A "Pivot Key Value" column will also be generated that identifies where the value came
from.
-->
<Unpivot Name="UP Detail">
<Columns>
<Column SourceColumn="DETAIL" TargetColumn="DETAIL"/>
<Column SourceColumn="NEW #" TargetColumn="Number" PivotKeyValue="NEW #"/>
<Column SourceColumn="OLD #" TargetColumn="Number" PivotKeyValue="OLD #"/>
</Columns>
</Unpivot>
<!--
Put something here so we can attach a data viewer
Notice, the NULL does not show in the output but the empty string does.
Depending on your tolerance, you will want to either
* Upstream - scrub empty strings to NULL for elimination
* Upstream - convert NULL to empty string for preservation
* Downstream - use a Conditional Split to remove the rows with empty Number columns
-->
<DerivedColumns Name="DER DataViewer">
</DerivedColumns>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
Related
I have a task to create a tetrapeptide screening library from amino acids using KNIME. I have never used KNIME before, sadly. I need to create a workflow with all 20 amino acids, cross it with another 20, then cross the result with another 20, and repeat to get the final result of tetrapeptides. Can someone suggest how to input the amino acids in KNIME? Thank you very much!
Use a Table Creator node to enter the amino acid single-letter codes, one per row. Now use a Cross Joiner node to cross-join the table to itself; you should now have a table with rows like:
A|A
A|C
etc.
Now put this table into both inputs of a second Cross Joiner node, which should now give you quite a long table, starting something like:
A|A|A|A
A|A|A|C
A|C|A|A
A|C|A|C
etc.
Now use a Column Aggregator node: select all columns as aggregation columns, set the aggregation method to Concatenate, and change the delimiter to an empty string.
This will give you a table with a single column, 'Peptide':
AAAA
AAAC
ACAA
ACAC
etc.
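If it helps to see the same generate-and-concatenate idea outside KNIME, here is a hypothetical SQL sketch; the AminoAcids table and its aa column are assumptions, holding one single-letter code per row:

-- Hypothetical: AminoAcids(aa) holds the 20 single-letter codes, one per row
SELECT a.aa + b.aa + c.aa + d.aa AS Peptide
FROM AminoAcids a
CROSS JOIN AminoAcids b
CROSS JOIN AminoAcids c
CROSS JOIN AminoAcids d;

Cross-joining the 20-row table with itself four times yields the full 160,000-row tetrapeptide space, which is exactly what the two Cross Joiner nodes produce.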
If you want the output as a chemical structure, then as of v1.36.0 the Vernalis community contribution contains a node, Speedy Sequence to SMILES, which will convert the sequence to a SMILES string (make sure you select the option indicating that your input column is a Protein!).
I have a table of attributes that I am trying to pivot, and while the pivot makes sense, several key attribute values I am successfully pivoting on are prefixed with numbers (for sorting purposes). These are important attributes (there are several like this) that we want to pivot and report on.
I found a similar question here: How to select a column containing dot in column name in kdb. When I sanitize the table with .Q.id t, it prefixes the columns with an "a" (e.g., 1CODE becomes a1CODE).
When I ran type on the returned value it returned 99h, so the pivot returns a dictionary.
I'm trying to leverage enlist(`1CODE)#t, but to no avail as of yet.
Any thoughts or suggestions?
q) t
monthDate | 1CODE 2CODE 3CODE 4CODE
----------| ------------------------------------
2022.01.01| 18.0054 0.1537228 4.116678 9.332936
2022.02.01| 17.87151 0.1527959 3.866393 9.685012
2022.03.01| 17.739 0.1518747 3.646734 10.00515
...
You can't use colName#table on a keyed table (99h is a keyed table in this case, though yes, a keyed table is also a dictionary), so you would have to unkey the table first using 0!:
t:1!flip`monthDate`1CODE`2CODE!(2022.01.01 2022.02.01 2022.03.01;3?100.;3?10.);
q)((),`1CODE)#0!t
1CODE
--------
61.37452
52.94808
69.16099
q)((),`1CODE`2CODE)#0!t
1CODE 2CODE
------------------
61.37452 0.8388858
52.94808 1.959907
69.16099 3.75638
Tables in kdb are just lists of dictionaries. Type 99h can be both a keyed table and a dictionary. You can still use qsql if you've sanitised your table:
q)select a1CODE from .Q.id t
a1CODE
--------
18.0054
17.87151
17.739
Another option is to use xcol to rename your columns:
q)t:(`monthDate,`$1 rotate'string 1_cols t)xcol t
q)select CODE1 from t
CODE1
--------
47.35547
75.21426
99.14374
I'm not sure what you mean by "pivoting off of" at the beginning, but an issue that sticks out to me is that the enlist function should use square brackets rather than the round ones in your post: with round brackets, q's right-to-left evaluation applies # to t before enlist ever runs. So the code you want is:
enlist[`1CODE]#t
I am adept in both SQL and CR, but this is something I've never had to do.
In CR, I load a table that will always contain 1 record. There is a range of columns (like Column1, Column2 ... Column60). (bad design, I know, but I can't do anything to change that).
Thanks to this old design I have to manually add each column in my report like this:
-----------
| TABLE |
-----------
| Column1 |
| Column2 |
| Column3 |
| ... |
-----------
Now I would like to be able to create a subreport and create a datasource for it in such a way that [Column1...Column60] becomes a collection [Row1...Row60]. I want to be able to use the detail section of the subreport to dynamically generate the table. That would save me a lot of time.
Is there any way to do that? Maybe a different approach to what I had in mind?
Edit
@Siva: I'll describe it the best way I can. The table consists of 500+ columns and will only ever hold one record (never more). Because normalization was never taken into account when creating these tables (Objective C / DBF ages), columns like Brand01, Brand02, Brand03...Brand60 should have been placed in a separate table named "Brands".
The document itself is pretty straightforward considering there's only one record. But some columns have to be pivoted (stacked vertically) and placed in a table layout on the document, which is a lot of work if you have to do it manually. That's why I wanted to feed a range of columns into my subreport, so I can use the detail section of my subreport to generate the table layout automatically.
Ok, got it... I will try to answer to the extent possible...
You need two columns in the report: the first showing the 60 column names as 60 rows, and the second showing the data of those 60 columns. There are two ways I can think of to do this.
If the columns are static and the report needs to be developed only once, then, though it's a tedious manual job, create 120 formulas: 60 for the row names (where you write the column names) and 60 for the data of the respective columns, and place them in the report. Since you have only one record, you will get correct data. Like below:
Formula 1:
column1 name // write manually
Formula 2:
databasefield for column1 // this has the data for column1
The above pair makes one row in the report. Like this you will get 120 formulas and 60 rows, and you don't need a subreport here; the main report will do the job.
Since you are expecting dynamic behavior (though the columns are static), you can instead create a view in the database, or a datatable (please note I have no idea about datatables; use them at your convenience).
Create it in such a way that it has 2 columns, and in the report use a Cross Tab; that will give you the dynamic behavior (a sketch of such a view follows below).
In the Cross Tab, column 1 will be the rows part and column 2 will be the data.
Here also I don't see any requirement for a subreport; you can use the main report directly. If you want a subreport you can use one as well; no harm, since you have only one record.
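A minimal T-SQL sketch of such a view, assuming a hypothetical wide table dbo.WideTable with columns Brand01...Brand60; CROSS APPLY (VALUES ...) turns each column of the single record into one (name, value) row:

CREATE VIEW dbo.BrandRows
AS
SELECT
    v.ColName
,   v.ColValue
FROM dbo.WideTable w
CROSS APPLY
(
    -- one row per original column
    VALUES
    ('Brand01', w.Brand01)
    , ('Brand02', w.Brand02)
    , ('Brand03', w.Brand03)
    -- ... continue through ('Brand60', w.Brand60)
) v(ColName, ColValue);

The view then feeds the report (or the Cross Tab) with 60 rows instead of 60 columns.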
I am trying to make sense of Adabas Natural DDMs. Mostly it makes sense, but explanations of certain specifics are hard to come by.
The files start off with something like:
00101DB: 000 FILE: 015 - Z-NATDIC-PR DEFAULT SEQUENCE:
0020
0030TYL DB NAME F LENG S D REMARKS
0040--- -- -------------------------------- - ---- - - ------------------------
which is all well and good. But what does it mean if lines similar to those appear multiple times within the same DDM?
For example, the excerpt above comes from a DDM that also contains:
03001DB: 255 FILE: 253 - Z-NATDIC-PR DEFAULT SEQUENCE:
0310
0320TYL DB NAME F LENG S D REMARKS
0330--- -- -------------------------------- - ---- - - ------------------------
...
05901DB: 255 FILE: 253 - Z-NATDIC-PR DEFAULT SEQUENCE:
0600
0610TYL DB NAME F LENG S D REMARKS
0620--- -- -------------------------------- - ---- - - ------------------------
...
08901DB: 255 FILE: 253 - Z-NATDIC-PR DEFAULT SEQUENCE:
0900
0910TYL DB NAME F LENG S D REMARKS
0920--- -- -------------------------------- - ---- - - ------------------------
My understanding is:
a DDM exists to define a user-friendly way of referring to fields for a single Adabas file (kinda like an SQL table)
A default sequence defines the order of a bunch of fields (analogous to SQL columns)
I need clarification:
What is the purpose of a default sequence?
What does it mean if there are multiple default sequences within a single DDM?
Sheena, it is sorted in the Adabas short name sequence. I believe it is there to order your fields at a later stage on the logical view: for instance, if you want to add a postal code at the end of an address field later on. Adabas always puts a new field at the end of the file, but if you use a short name between address line 4 and the next field, you can add the postal code there. In my 21 years of working with Natural, you are the first to ask this question :-)
The default sequence is specified with the two-character field short name. The system validates the short name based on the selected file number. If the database is accessible, the short name is checked against the corresponding field in the database file. If such a field does not exist in the database, a selection list of valid short names is displayed. If the database cannot be accessed, no selection list is generated.
As Carl mentioned, in the DDM-Editor a list of valid short names may be shown as a completion aid.
However that doesn't explain what the value is used for.
The above is documented under "Using the DDM Editor" in the current Natural documentation.
If you take a look in the Natural Programming Guide, under...
"Accessing Data in an Adabas Database"
...how it's used is explained.
To access Adabas data in logical order with Natural you might code the following:
READ view LOGICAL BY descriptor
(that corresponds to Select/Order by in SQL)
It is however also possible to omit descriptor and code the following:
READ view LOGICAL
In that case the data will be read in the order specified by Default Sequence.
(this is also discussed in the Natural documentation of the READ statement)
In my 35 years or so working with Adabas & Natural at Software AG and customers, I've never seen this field used. It's usually left blank.
Yes, I know, this question has been asked MANY times, but after reading all the posts I found that there wasn't an answer that fits my need. So, here's my question: I would like to take a column of values and pivot them into rows of 6 columns.
I want to take this:
G
081278
12
00123535
John Doe
123456
And turn it into this:
Letter Date   Code Ammount  Name     Account
G      081278 12   00123535 John Doe 123456
I have 110,000 values in this one column in one table called TempTable. I need all the values displayed because each row is an entity unto itself. For instance, there is one unique entry for all of the Letter, Date, Code, Ammount, Name, and Account columns. I understand that an aggregate function is required, but is there a workaround that will allow me to get this desired result?
Just use a MAX aggregate
If one row = one column (per group of 6 rows) then MAX of a single value = that row value.
However, the data you've posted is insufficient. I don't see anything to:
associate the 6 rows per group
distinguish whether a row is "Letter" or "Name"
There is no implicit row order or number to rely upon to generate the groups
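For illustration only, a minimal sketch assuming TempTable had hypothetical id (sequential order) and val columns: number the rows, group them in sixes, and collapse each group with MAX(CASE ...):

SELECT
    MAX(CASE WHEN rn = 1 THEN val END) AS Letter
,   MAX(CASE WHEN rn = 2 THEN val END) AS [Date]
,   MAX(CASE WHEN rn = 3 THEN val END) AS Code
,   MAX(CASE WHEN rn = 4 THEN val END) AS Ammount
,   MAX(CASE WHEN rn = 5 THEN val END) AS Name
,   MAX(CASE WHEN rn = 6 THEN val END) AS Account
FROM
(
    SELECT
        val
        -- position within each group of 6
    ,   (ROW_NUMBER() OVER (ORDER BY id) - 1) % 6 + 1 AS rn
        -- group number: every 6 consecutive rows share one value
    ,   (ROW_NUMBER() OVER (ORDER BY id) - 1) / 6 AS grp
    FROM TempTable
) s
GROUP BY grp;

Without a real ordering column, though, ROW_NUMBER has nothing deterministic to ORDER BY, which is exactly the problem described above.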
Unfortunately, the maximum number of columns in a SQL Server 2008 SELECT statement is 4,096, as per the MSDN Maximum Capacity documentation.
Instead of using a pivot, you might consider dynamic SQL to get what you want to do.
Declare @SQLColumns nvarchar(max), @SQL nvarchar(max)
-- Build a comma-separated list of the quoted values from the single column
select @SQLColumns=(select ''''+ColName+''',' from TableName for XML Path(''))
-- Trim the trailing comma
set @SQLColumns=left(@SQLColumns,len(@SQLColumns)-1)
set @SQL='Select '+@SQLColumns
exec sp_ExecuteSQL @SQL,N''