KDB - How to select a column with prefixed numbers?

I have a table of attributes that I am pivoting on. The pivot itself works, but several key attribute values I pivot on are prefixed with numbers (for sorting purposes). These are important attributes (there are several like this) that we want to pivot on and report on.
I found a similar question here: "How to select a column containing dot in column name in kdb". When I sanitize the dictionary with .Q.id t, the columns get prefixed with an a (1CODE becomes a1CODE).
When I ran type on the returned value it returned 99h, so the pivot returns a dictionary.
I'm trying to use enlist(`1CODE)#t, but to no avail so far.
Any thoughts or suggestions?
q) t
monthDate | 1CODE 2CODE 3CODE 4CODE
----------| ------------------------------------
2022.01.01| 18.0054 0.1537228 4.116678 9.332936
2022.02.01| 17.87151 0.1527959 3.866393 9.685012
2022.03.01| 17.739 0.1518747 3.646734 10.00515
...

You can't use colName#table on a keyed table (99h is a keyed table in this case, though a keyed table is also a dictionary), so you have to unkey the table first using 0!:
q)t:1!flip`monthDate`1CODE`2CODE!(2022.01.01 2022.02.01 2022.03.01;3?100.;3?10.);
q)((),`1CODE)#0!t
1CODE
--------
61.37452
52.94808
69.16099
q)((),`1CODE`2CODE)#0!t
1CODE 2CODE
------------------
61.37452 0.8388858
52.94808 1.959907
69.16099 3.75638

Tables in kdb are just lists of dictionaries, and type 99h can be either a keyed table or a dictionary. You can still use qSQL once you've sanitised your table:
q)select a1CODE from .Q.id t
a1CODE
--------
18.0054
17.87151
17.739
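The column comes back as a1CODE because of how .Q.id renames columns. Below is a rough Python approximation of that rule: drop characters that are not letters, digits or underscores, and prefix an "a" when the result would not start with a letter. This rule is an assumption inferred from the outputs on this page (1CODE becomes a1CODE, a.b becomes ab). The real .Q.id also handles cases such as reserved words, which this hypothetical q_id_like helper ignores.

```python
import re


def q_id_like(name: str) -> str:
    """Hypothetical approximation of how .Q.id sanitises one column name."""
    # Keep only characters that are valid inside a q name (assumed rule).
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", name)
    # A q name must start with a letter, so prefix "a" otherwise.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "a" + cleaned
    return cleaned


columns = ["monthDate", "1CODE", "2CODE", "a.b"]
print([q_id_like(c) for c in columns])
# ['monthDate', 'a1CODE', 'a2CODE', 'ab']
```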
Another option is to use xcol to rename your columns:
q)t1:(`monthDate,`$1 rotate'string 1_cols t)xcol t
q)select CODE1 from t1
CODE1
--------
47.35547
75.21426
99.14374
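The xcol rename above relies on 1 rotate moving the first character of each name to the end, so 1CODE becomes CODE1. The same idea is sketched in Python below. For illustration only, the table is modelled as a dict of column name to list of values, and monthDate is left untouched, like the key column in the q version. Both helper names are hypothetical.

```python
def rotate_left(s: str, n: int = 1) -> str:
    """Mimic q's `n rotate` on a string: move the first n characters to the end."""
    if not s:
        return s
    n %= len(s)
    return s[n:] + s[:n]


def rename_numeric_prefixed(table: dict) -> dict:
    """Rotate every column name except monthDate, mirroring the xcol example."""
    return {
        (name if name == "monthDate" else rotate_left(name)): values
        for name, values in table.items()
    }


t = {
    "monthDate": ["2022.01.01", "2022.02.01"],
    "1CODE": [18.0054, 17.87151],
    "2CODE": [0.1537228, 0.1527959],
}
t1 = rename_numeric_prefixed(t)
print(list(t1))  # ['monthDate', 'CODE1', 'CODE2']
```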

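I'm not sure what you mean by "pivoting off of" at the beginning, but one issue that sticks out to me is that the enlist function should use square brackets rather than the round ones in your post. Because q evaluates right to left, enlist(`1CODE)#t is parsed as enlist applied to (`1CODE)#t, whereas enlist[`1CODE]#t enlists the symbol first. So the code you want is: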
enlist[`1CODE]#t

Related

How to select a column containing dot in column name in kdb

I have a table which contains a column named "a.b":
q)t:([]a.b:3?10.0; c:3?10; d:3?`3)
How can we select column a.b and c from table t?
How can we rename column a.b to b?
Is it possible to achieve above two cases without functional select?
Failed attempts:
q)select a.b, c from t
'type
q)?[`t;();0b;enlist (`b`c!`a.b`c)]
'type
q)select b:a.b from t
'type
As others have mentioned, .Q.id t will sanitise table column names if they aren't suitable for qSQL statements or performance in general.
`a.b`c#t
will only work for multiple column selects and
`a.b#t
will return a type error. However, you can get around this by enlisting the single item into the take operator, like so:
q)enlist[`a.b]#t
a.b
---------
4.931835
5.785203
0.8388858
q)(enlist`a.b)#t
a.b
---------
4.931835
5.785203
0.8388858
If you only need the values from a single column, another option is indexing. In this case it would be t[`a.b], which returns all values from the a.b column.
You could also mix these selection styles like so, but you ultimately lose the column name a.b (it comes back as x):
q)select c,t[`a.b] from t
c x
----------
8 4.707883
5 6.346716
4 9.672398
In a qSQL query, the . itself is used for foreign-key navigation, so it throws a type error because it cannot find any table related to the foreign key it believes you have passed it.
As much as I hate answering an online forum question by refuting the premise, I really must here: do not use periods in column names, they will cause trouble. .Q.id exists to sanitise column names for a reason.
The primary reason these errors are encountered is that dot notation in qSQL is reserved for resolving linked columns. We can see how this actually works by parsing the query itself:
q)parse "select a.b from tab"
?
`tab
()
0b
(,`b)!,`a.b // Here the referencing of a linked column b via a is occurring
// Compared to a normal select
q)parse "select b from tab"
?
`tab
()
0b
(,`b)!,`b
Other issues could crop up depending on future processing, such as q attempting to treat the column names as namespaces or operating on each part of the name with the dot operator.
Using dots in your column names will hamstring any further development and force all other kdb users to use roundabout methods; development will be slow and run into many bugs.
If periods must be included in the column names, I would advise creating an API for external users that translates their queries into the sanitised forms.
You can easily sanitise the whole table with .Q.id:
q)tab:enlist `a.b`c`d!(1 2 3)
q)tab:.Q.id tab
q)sel:{[tab;cl] ?[tab;();0b;((),.Q.id each cl)!((),.Q.id each cl)]}
q)sel[tab;`a.b]
ab
--
1
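Here is a minimal Python sketch of the translation layer suggested above: the stored table keeps only sanitised column names, and user-facing names are mapped onto them at lookup time. The table is again modelled as a dict of columns. The sanitising rule (strip dots, prefix "a" before a leading digit) is a simplified assumption, not the full .Q.id behaviour, and select_columns is a hypothetical helper.

```python
def sanitise(name: str) -> str:
    """Simplified stand-in for .Q.id: drop dots, prefix 'a' before a leading digit."""
    cleaned = name.replace(".", "")
    return "a" + cleaned if cleaned[:1].isdigit() else cleaned


def select_columns(table: dict, requested: list) -> dict:
    """Look up user-facing column names in a table whose columns were sanitised."""
    result = {}
    for name in requested:
        key = sanitise(name)
        if key not in table:
            raise KeyError(f"unknown column {name!r} (looked up as {key!r})")
        result[key] = table[key]
    return result


tab = {"ab": [1], "c": [2], "d": [3]}  # the sanitised one-row table from above
print(select_columns(tab, ["a.b"]))  # {'ab': [1]}
```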
How about the following, using take (#):
q) `a.b`c#t
a.b c
-----------
4.931835 1
5.785203 9
0.8388858 5
To rename:
q) `b xcol t
b c d
---------------
4.931835 1 mil
5.785203 9 igf
0.8388858 5 kao
You can use .Q.id to rename any unselectable columns:
q).Q.id t
ab c d
---------------
4.931835 1 mil
5.785203 9 igf
0.8388858 5 kao
It is best to avoid dots in column names (and in symbols generally); use an underscore if you must.

Cloud Dataprep - Multiply rows in one column based on values in other column

I am working in Cloud Dataprep and I have a case like this:
Basically, I need to create new rows in column 2 based on how many rows there are with matching data in column 1.
Is it possible and how?
I understand that the scenario you want is: obtain all values from column1 that match a value present in column2. There are many things to consider in this scenario that you did not describe, such as: can values in column2 be repeated? If a value in column2 is missing from column1, what should happen? And what about the other way around?
However, as a general approach to this issue, I would do the following flow:
With a flow like this one, you take the input table, which has two columns like this:
In the recipes FIRST_COLUMN and SECOND_COLUMN you split the two columns into different branches and apply the steps needed to clean each one. For column1, I understand nothing needs to be done. For column2, I understand that you will have to remove duplicates (again, this is a guess; it depends on your specific implementation, which you have not fully described) and delete empty values. You can do that by applying the following transforms:
Finally, you can join both columns together. Depending on your needs (only values present in both columns should appear, only values present in columnX should appear, etc.) you should apply a different JOIN strategy. Use a join key like column1 = column2 (as in the image), and if you select only the second column in the left-side menu, you will get a single-column result.
Note that in this case I used an Inner-join, but using other JOIN types will provide completely different results. Use the one that fits your requirements better.
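The Dataprep flow described above can be sketched in plain Python to make the shape of the result concrete: clean column2 by deleting empty values and duplicates, then inner-join it to column1 on column1 = column2. The sample values are invented for illustration. The cleaning steps follow the answer's own guesses about the requirements, and other join types would give different results.

```python
def clean(values):
    """Drop empty values and duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for v in values:
        if v in (None, "") or v in seen:
            continue
        seen.add(v)
        result.append(v)
    return result


def inner_join(column1, column2):
    """Inner join on column1 = column2: one output row per matching column1 row."""
    keys = set(clean(column2))
    return [(v, v) for v in column1 if v in keys]


column1 = ["A", "A", "A", "B", "C"]
column2 = ["A", "B", "", "A"]
for row in inner_join(column1, column2):
    print(row)
```

Each value in the cleaned column2 is repeated once per matching row in column1, and values with no match (C here) are dropped by the inner join.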

Crystal Reports - Create subreport with column range [col1...col60] as datasource?

I am adept in both SQL and CR, but this is something I've never had to do.
In CR, I load a table that will always contain 1 record. There is a range of columns (like Column1, Column2 ... Column60). (bad design, I know, but I can't do anything to change that).
Thanks to this old design I have to manually add each column in my report like this:
-----------
| TABLE |
-----------
| Column1 |
| Column2 |
| Column3 |
| ... |
-----------
Now I would like to be able to create a subreport and create a datasource for it in such a way that [Column1...Column60] becomes a collection [Row1...Row60]. I want to be able to use the details section of the subreport to dynamically generate the table. That would save me a lot of time.
Is there any way to do that? Maybe a different approach to what I had in mind?
Edit
@Siva: I'll describe it as best I can. The table consists of 500+ columns and will only ever hold 1 record (never more). Because normalization was never taken into account when these tables were created (Objective C / DBF ages), columns like Brand01, Brand02, Brand03...Brand60 live in this table when they should have been placed in a separate table named "Brands".
The document itself is pretty straightforward, considering there's only one record. But some columns have to be pivoted (stacked vertically) and placed in a table layout on the document, which is a lot of work if you have to do it manually. That's why I wanted to feed a range of columns into my subreport, so I can use its details section to generate the table layout automatically.
OK, got it... I will try to answer as best I can.
You need 2 columns in the report: the first shows the 60 column names as 60 rows, and the second shows the data for those 60 columns. There are two ways I can think of to do this.
First, if the columns are static and the report only needs to be developed once, then (though it's tedious) manually create 120 formulas: 60 for the row names, where you write the column names, and 60 for the data of the respective columns. Place them in the report; since you have only one record, you will get the correct data. Like below:
formula 1:
column1 name // write manually
formula 2:
databasefield for column1 // this has data for column1
Together these make one row of the report. Repeat this to get 120 formulas making 60 rows. You don't need a subreport here; the main report will do the job.
Second, since you are expecting dynamic behavior (though the columns are static), you can create a view on the database side, or a datatable (please note I have no experience with datatables, so use one at your own discretion).
Create it so that it has 2 columns, and in the report use a cross tab, which will give you the dynamic behaviour.
In the cross tab, column 1 goes in the rows part and column 2 is the data.
Here too I don't see any need for a subreport; you can use the main report directly. If you want a subreport you can use one as well, no harm done, since you have only 1 record.
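The view suggested above is essentially an unpivot. The single record's numbered columns become rows of (column name, value), which a cross tab or details section can then iterate over. Below is a small Python sketch of that reshaping. It assumes the record arrives as a dict keyed by column name and uses the Brand01... naming from the edit. The helper name is hypothetical. In the database the same reshaping would live in the view (for example UNPIVOT where the database supports it, or a UNION ALL of one SELECT per column), which this sketch does not show.

```python
import re


def unpivot_record(record: dict, prefix: str) -> list:
    """Turn prefix01, prefix02, ... fields of one record into (name, value) rows."""
    pattern = re.compile(rf"{re.escape(prefix)}(\d+)$")
    rows = []
    for name, value in record.items():
        match = pattern.match(name)
        if match:
            rows.append((int(match.group(1)), name, value))
    # Sort by the numeric suffix so Brand02 comes before Brand10.
    rows.sort(key=lambda r: r[0])
    return [(name, value) for _, name, value in rows]


record = {"Id": 1, "Brand02": "Beta", "Brand01": "Alpha", "Brand10": "Kappa"}
for name, value in unpivot_record(record, "Brand"):
    print(name, value)
```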

SUM the NUMC field in SELECT

I need to group a table by the sum of a NUMC-column, which unfortunately seems not to be possible with ABAP / OpenSQL.
My code looks like this:
SELECT z~anln1
FROM zzanla AS z
INTO TABLE gt_
GROUP BY z~anln1 z~anln2
HAVING SUM( z~percent ) <> 100 " percent unfortunately is a NUMC -> summing up not possible
What would be the best / easiest practices here as I cannot alter the table itself?
Unfortunately, the NUMC type is defined as numeric text, so in the end it lands in the database as VARCHAR, which is why functions like SUM or AVG cannot be used.
It all depends on how big your table is. If it is rather small, you could read the group fields and the values to sum into an internal table, sum them using the COLLECT statement, and finally remove the rows whose sum equals 100%.
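The logic of the internal-table approach is shown below in Python rather than ABAP: sum the NUMC values per group the way COLLECT would, then keep only the groups whose total is not 100. NUMC values are modelled as zero-padded digit strings, which is an assumption about how they arrive. The anln1/anln2/percent field names come from the question, and the sample rows are invented.

```python
from collections import defaultdict


def groups_not_summing_to_100(rows):
    """Sum NUMC-style percent strings per (anln1, anln2) and keep totals != 100."""
    totals = defaultdict(int)
    for row in rows:
        # NUMC is numeric text, so convert it to an integer before adding.
        totals[(row["anln1"], row["anln2"])] += int(row["percent"])
    return {key: total for key, total in totals.items() if total != 100}


rows = [
    {"anln1": "000001", "anln2": "0000", "percent": "060"},
    {"anln1": "000001", "anln2": "0000", "percent": "040"},
    {"anln1": "000002", "anln2": "0000", "percent": "070"},
]
print(groups_not_summing_to_100(rows))  # {('000002', '0000'): 70}
```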
One solution is to define the field in the table using a more appropriate type.
NUMC is often used for key fields - like document numbers, which there would never be a reason to add together.
I didn't find a smooth solution.
What I did was copy everything into an internal table and loop over it, converting the NUMC values to DEC values. Grouping and summing worked at that point.
At the end, I converted the DEC values back to NUMC values.
It's been a while. I came back to this post because someone upvoted my original answer. I thought about editing my old answer but decided to post a new one instead. When this question was asked in 2017 there were some restrictions, but now it can be done with the CAST function in the new Open SQL syntax:
SELECT z~anln1
FROM zzanla AS z
INTO TABLE @gt_
GROUP BY z~anln1, z~anln2
HAVING SUM( CAST( z~percent AS INT4 ) ) <> 100

SQL Server 2008: Pivot column with no aggregate function workaround

Yes, I know this question has been asked MANY times, but after reading all the posts I found that none of the answers fit my need. So here's my question: I would like to take a column of values and pivot them into rows of 6 columns.
I want to take this:
G
081278
12
00123535
John Doe
123456
And turn it into this:
Letter Date Code Ammount Name Account
G 081278 12 00123535 John Doe 123456
I have 110000 values in this one column in one table called TempTable. I need all the values displayed because each row is an entity to itself. For instance, There is one unique entry for all of the Letter, Date, Code, Ammount, Name, and Account columns. I understand that the aggregate function is required but is there a workaround that will allow me to get this desired result?
Just use a MAX aggregate
If one row = one column (per group of 6 rows) then MAX of a single value = that row value.
However, the data you've posted is insufficient. I don't see anything to:
associate the 6 rows per group
distinguish whether a row is "Letter" or "Name"
There is no implicit row order or number to rely upon to generate the groups
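The reshaping only works if something identifies which six rows belong together and which field each row holds. A Python sketch of it follows, under the key assumption that the values are available in a guaranteed order (for example through an identity or ordering column that the question does not show). Under that assumption, row n belongs to group n // 6 and holds field n % 6. In SQL the equivalent would typically be ROW_NUMBER() over that ordering column, then MAX(CASE ...) per field, grouped by the group number. The helper name is hypothetical.

```python
FIELDS = ["Letter", "Date", "Code", "Ammount", "Name", "Account"]


def pivot_in_groups(values, fields=FIELDS):
    """Reshape an ordered single-column list into one record per len(fields) rows."""
    if len(values) % len(fields) != 0:
        raise ValueError("value count is not a multiple of the field count")
    records = []
    for start in range(0, len(values), len(fields)):
        chunk = values[start:start + len(fields)]
        records.append(dict(zip(fields, chunk)))
    return records


column = ["G", "081278", "12", "00123535", "John Doe", "123456"]
print(pivot_in_groups(column))
```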
Unfortunately, the maximum number of columns in a SQL Server 2008 SELECT statement is 4,096, per the MSDN Maximum Capacity Specifications.
Instead of using a pivot, you might consider dynamic SQL to achieve what you want.
Declare @SQLColumns nvarchar(max), @SQL nvarchar(max)
-- Build a comma-separated list of quoted values: 'v1','v2',...
select @SQLColumns = (select '''' + ColName + ''',' from TableName for XML Path(''))
set @SQLColumns = left(@SQLColumns, len(@SQLColumns) - 1)
set @SQL = 'Select ' + @SQLColumns
exec sp_executesql @SQL
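The dynamic SQL above turns each value into a quoted literal, joins the literals with commas and runs the result as a single SELECT. The Python sketch below reproduces only the string-building step so the generated statement is visible. Unlike the FOR XML PATH concatenation, it doubles any embedded single quote, which is how T-SQL escapes a quote inside a string literal. The helper name is hypothetical, and the 4,096-column limit mentioned above still applies to the generated statement.

```python
def build_select(values):
    """Build 'Select <lit1>,<lit2>,...' with each value as a T-SQL string literal."""
    # Double embedded single quotes so each value stays one valid literal.
    literals = ["'" + str(v).replace("'", "''") + "'" for v in values]
    return "Select " + ",".join(literals)


print(build_select(["G", "081278", "O'Brien"]))
# Select 'G','081278','O''Brien'
```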