How do I replace the first 10 entries in a column with NaN in KDB?

I am doing calculations on columns using summation. I want to manually change the first n entries in my calc column from float to NaN. Can someone please advise me how to do that?
For example, if my column in table t is currently mycol:(1 2 3 4 5 6 7 8 9), I am looking for a function that replaces the first n=4 entries with NaN, so that my column in table t becomes mycol:(0N 0N 0N 0N 5 6 7 8 9)
Thank you so much!
Emily

We can use amend to replace the first n items with a null value. It is also better to use the null literal that matches each column's type. Something like this would work:
f: {nullDict: "ijfs"!(0Ni;0Nj;0Nf;`); @[x; til y; :; nullDict .Q.ty x]}
This amends the first y items of the list x. .Q.ty returns the type of its argument as a character code (lower case for a simple vector), which is then used to look up the matching null in the dictionary.
You can then use this for a single column, like so:
update mycol: f[mycol;4] from tbl
You can also do this in one go for multiple columns, with a different number of items replaced in each, using the functional form:
![tbl;();0b;`mycol`mycol2!((f[;4];`mycol);(f[;3];`mycol2))]
Note that you will need to extend nullDict with any other types you need.
Update: Thanks to Jonathon McMurray for suggesting a better way to build nullDict for all primitive types, using the code below:
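For readers less familiar with q, the same idea (look up a null that matches the column's type, then amend the first n positions) can be sketched in plain Python. This is only an analogy under stated assumptions: a table is modelled as a dict of column lists, float('nan') stands in for a float null and None for other types, and `replace_first_n` and `NULLS` are hypothetical names, not part of any kdb+ API.

```python
import math

# Hypothetical type -> null mapping, playing the role of nullDict in the q answer.
# Python has no typed nulls, so NaN stands in for a float null and None for the rest.
NULLS = {float: math.nan, int: None, str: None}


def replace_first_n(column, n):
    """Return a copy of column with its first n items replaced by a type-appropriate null."""
    if not column:
        return []
    # Like .Q.ty in the q answer: choose the null from the column's type
    # (types missing from NULLS also fall back to None).
    null = NULLS.get(type(column[0]))
    k = max(0, min(n, len(column)))
    return [null] * k + list(column[k:])


# A table modelled as a dict of equal-length column lists (assumption for this sketch).
tbl = {"mycol": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
       "mycol2": [10, 20, 30, 40, 50, 60, 70, 80, 90]}

# Single column, like: update mycol: f[mycol;4] from tbl
tbl["mycol"] = replace_first_n(tbl["mycol"], 4)

# Several columns with a different n each, like the functional-form update.
for name, n in {"mycol2": 3}.items():
    tbl[name] = replace_first_n(tbl[name], n)

print(tbl["mycol"])
print(tbl["mycol2"])
```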
{x!first each x$\:()}.Q.t except " "

Related

KDB/Q: What is a vector operation?

I am learning KDB+ and q programming and read the following statement:
"select performs vector operations on column lists". What does vector operation mean here? Could somebody please explain with an example? Also, how is it faster than standard SQL?
A vector operation is an operation that takes one or more vectors and produces another vector. For example + in q is a vector operation:
q)a:1 2 3
q)b:10 20 30
q)a + b
11 22 33
If a and b are columns in a table, you can perform vector operations on them in a select statement. Continuing with the previous example, let's put a and b vectors in a table as columns:
q)([]a;b)
a b
----
1 10
2 20
3 30
Now,
q)select c:a + b from ([]a;b)
c
--
11
22
33
The select statement performed the same a+b vector addition, but took input and returned output as table columns.
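As a rough analogy outside q (an assumption of this sketch, not a description of how kdb+ is implemented), a column-oriented table can be modelled in Python as a dict mapping column names to lists. A vector operation then works on whole columns at once, and the select returns a new column. `vadd` is a hypothetical helper.

```python
def vadd(a, b):
    """Element-wise addition of two equal-length lists, like a + b in q."""
    if len(a) != len(b):
        # q signals a 'length error for vectors of different lengths
        raise ValueError("length")
    return [x + y for x, y in zip(a, b)]


a = [1, 2, 3]
b = [10, 20, 30]
t = {"a": a, "b": b}  # like ([]a;b): a table as named columns

# Like: select c:a + b from ([]a;b), a one-column table holding the vector sum
result = {"c": vadd(t["a"], t["b"])}
print(result)  # {'c': [11, 22, 33]}
```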
How is it faster than standard SQL?
"Standard" SQL implementations typically store data row by row. In a table with many columns, the first and second elements of a column can be separated in memory by the data from other columns. Modern computers operate most efficiently when data is stored contiguously. In kdb+, this is achieved by storing tables column by column.
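To make the layout difference concrete, here is a small Python sketch. It is an illustration only: the trade data is made up, Python lists hold references so they show the logical layout rather than real memory contiguity, and the standard `array` module is used for the column store as a closer stand-in for a packed vector. Summing one field from the row store visits every record, while the column store hands over one homogeneous array.

```python
from array import array

# Row store: each record keeps all of its fields together (hypothetical trades).
rows = [
    {"sym": "ibm", "price": 101.5, "size": 100},
    {"sym": "goog", "price": 720.0, "size": 200},
    {"sym": "aapl", "price": 115.2, "size": 300},
]

# Column store: each numeric column is one packed array
# ('d' = C double, 'q' = signed 64-bit integer).
cols = {
    "sym": ["ibm", "goog", "aapl"],
    "price": array("d", [101.5, 720.0, 115.2]),
    "size": array("q", [100, 200, 300]),
}

# Row store: touch every record just to read one field.
total_rows = sum(r["size"] for r in rows)

# Column store: read one packed array of sizes from front to back.
total_cols = sum(cols["size"])

print(total_rows, total_cols)  # 600 600
```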
A vector is a list of atoms of the same type. Some examples:
2 3 4 5 / long
"A fine, clear day" / char
`ibm`goog`aapl`ibm`msft / symbol
2017.01 2017.02 2017.03m / month
Kdb+ stores and handles vectors very efficiently. Q operators (not just + - * % but also, e.g., mcount, ratios and prds) are optimised for vectors.
These operators can be even more efficient when vectors have attributes, such as u (unique: no repeated items) and s (sorted: items in ascending order).
When table columns are vectors, those same efficiencies are available. These efficiencies are not available to standard SQL, which views tables as unordered sets of rows.
Being column-oriented, kdb+ can splay large tables, storing each column as a separate file, which reduces file I/O when selecting from large tables.
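An analogy for why the u and s attributes can help, sketched in Python (kdb+'s internals differ; this only shows the algorithmic idea). If a vector is known to be sorted, a lookup can use binary search instead of a linear scan; if its items are known to be unique, a hash index can map each value to its single position. The helper names and data are hypothetical.

```python
from bisect import bisect_left


def find_linear(vec, value):
    """No attribute: scan every item, O(n)."""
    return [i for i, v in enumerate(vec) if v == value]


def find_sorted(vec, value):
    """Sorted vector (like s): binary search to the first match, then collect the run of equal items."""
    i = bisect_left(vec, value)
    j = i
    while j < len(vec) and vec[j] == value:
        j += 1
    return list(range(i, j))


def build_unique_index(vec):
    """Unique vector (like u): a hash map from each value to its only position."""
    return {v: i for i, v in enumerate(vec)}


times = [1, 3, 3, 7, 9, 12]             # ascending
syms = ["aapl", "goog", "ibm", "msft"]  # no repeats

print(find_linear(times, 3), find_sorted(times, 3))  # [1, 2] [1, 2]
print(build_unique_index(syms)["ibm"])              # 2
```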
The sentence means that when you refer to a column of a table by its label, the label resolves to the whole column list rather than to each element of it, and any operation on it is understood as a list operation.
q)show t: flip `a`b!(til 3;10*til 3)
a b
----
0 0
1 10
2 20
q)select x: count a, y: type b from t
x y
---
3 7
q)type t[`b]
7h
q)type first t[`b]
-7h
count a in the above q-sql is equivalent to count t[`a], i.e. count 0 1 2, which is 3. The same goes for type b: the positive return value 7 means b is a list rather than an atom (an atom's type is negative, as type first t[`b] returning -7h shows): http://code.kx.com/q/ref/datatypes/#primitive-datatypes
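The same idea in a Python sketch, assuming a table modelled as a dict of column lists and a hypothetical `select` helper: each column name in an expression is bound to the entire column list, so an aggregate such as len sees all rows at once rather than one element at a time. Unlike q, the result here is a plain dict rather than a one-row table.

```python
def select(table, **exprs):
    """Evaluate each expression with column names bound to whole column lists.

    Each expression is a function that receives the table's columns as keyword arguments.
    """
    return {name: f(**table) for name, f in exprs.items()}


t = {"a": list(range(3)), "b": [10 * i for i in range(3)]}

# Like: select x: count a, y: type b from t
out = select(t, x=lambda a, b: len(a), y=lambda a, b: type(b).__name__)
print(out)  # {'x': 3, 'y': 'list'}
```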

Comparing, matching and combining columns of data

I need some help matching data and combining it. I currently have four columns of data in an Excel sheet, similar to the following:
Column: 1 2 3 4
U 3 A 0
W 6 B 0
R 1 C 0
T 9 D 0
... ... ... ...
Column 2 holds a data value that corresponds to the letter in column 1. I need to compare column 3 with column 1 and, whenever they match, copy the corresponding value from column 2 into column 4.
You might ask why I don't do this manually: I have a spreadsheet with around 100,000 rows, so this really isn't an option!
I also have access to MATLAB and have the data imported there; if this would be easier to do in that environment, please let me know.
As mentioned by @bla, a formula similar to
=IF(A1=C1,B1,0)
should serve in Excel.
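Note that =IF(A1=C1,B1,0), filled down column D, compares columns 1 and 3 within the same row only. The Python sketch below shows what that computes on made-up rows, next to a hypothetical alternative for the case where "matches" means "appears anywhere in column 1". That alternative is roughly what an exact-match VLOOKUP does, except that a missing value gives 0 here instead of an error.

```python
# Made-up rows (column 1, column 2, column 3) in which some letters recur.
rows = [("A", 3, "A"), ("B", 6, "C"), ("C", 1, "B")]

# =IF(A1=C1,B1,0) filled down: same-row comparison only.
same_row = [b if a == c else 0 for a, b, c in rows]

# Hypothetical alternative: build a lookup from column 1 to column 2,
# keeping the first occurrence of each letter, then look up each column-3 value.
lookup = {}
for a, b, _ in rows:
    lookup.setdefault(a, b)
any_row = [lookup.get(c, 0) for _, _, c in rows]

print(same_row)  # [3, 0, 0]
print(any_row)   # [3, 1, 6]
```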

Get range of elements in KDB using variables

Why can't I use a variable inside array ranges in KDB?
test:1 2 3 4 5
This example won't work:
pos:3;
test[1 pos]
but this way it will work
test[1 3]
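When you write test[1 3], 1 3 is a list literal, so the index is a list. Juxtaposed values only form a list when every item is a literal; 1 pos is not parsed as a list (juxtaposition with a variable means application), which is why it fails. So the index has to be a list value:
q)list1:1 3
q)test[list1]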
So you have to use:
q)n:3
q)list1:(1;n)
q)test[list1]
q)test[(1;n)] / alternate way
For a detailed explanation of why the semicolon alone doesn't work and why the parentheses () are required, see my answer to this post:
kdb/q: how to reshape a list into nRows, where nRows is a variable
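The same principle in a Python sketch (an analogy only; q's syntax differs): the index must be an actual list value, so the variable is placed into a list built at run time, and that list is then used to pick elements. `index_by` is a hypothetical helper imitating q's zero-based list indexing.

```python
def index_by(vec, idx):
    """Imitate q's test[idx]: an int picks one item, a list of ints picks several."""
    if isinstance(idx, list):
        return [vec[i] for i in idx]
    return vec[idx]


test = [1, 2, 3, 4, 5]
n = 3
list1 = [1, n]  # like list1:(1;n) in q

print(index_by(test, list1))  # [2, 4], the same items as test[1 3] in q
```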
To understand what you're asking, consider:
1 2 3 7
That is a simple list of integers. Now consider:
a 2 3
Where a is a vector. The above indexes into a. Easy. Now say you want to have that 2 3 list as a variable
b:2 3
a b //works!
You are specifically asking about how to get a range from a list, this is covered in How to get range of elements in a list in KDB?
In that answer, use variables to create your index list and use the result to index into a.
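For a contiguous range, the index list can itself be computed from variables; in q this can be written with til, e.g. a[start+til 1+stop-start]. The Python sketch below builds the same kind of index list from two hypothetical variables, `start` and `stop` (both inclusive), and then uses it to index.

```python
a = [10, 20, 30, 40, 50, 60]
start, stop = 2, 4  # hypothetical inclusive range bounds held in variables

# Build the index list from the variables, like start+til 1+stop-start in q.
idx = [start + i for i in range(stop - start + 1)]

print(idx)                  # [2, 3, 4]
print([a[i] for i in idx])  # [30, 40, 50]
```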

matlab sort one column and keep respective values on second column

How do I do a simple sort in MATLAB? I always have to use the Excel link to import my data, sort it, then export it back to MATLAB. This is annoying!
I have a 10x10 matrix and I want to sort the first column in descending order while keeping its respective values in the second column. MATLAB seems to just sort each column individually.
Example:
matrix a
5 4
8 9
0 6
7 3
matrix b (output)
0 6
5 4
7 3
8 9
The sortrows answer by @chaohuang is probably what you're looking for. However, sortrows(a) sorts by the first column and then uses the remaining columns to break ties. If you only want to sort based on the first column, you can do this:
% sort only the first column, return indices of the sort
[~,sorted_inds] = sort( a(:,1) );
% reorder the rows based on the sorted indices
b = a(sorted_inds,:);
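The same two steps (sort one column while recording the permutation, then apply that permutation to whole rows) can be sketched in Python using the example matrix from the question. `sorted_inds` mirrors the MATLAB variable name but is zero-based here, unlike MATLAB indices. The question asks for descending order, which MATLAB's sort supports through its 'descend' option; the sketch uses reverse=True for the same effect.

```python
a = [[5, 4],
     [8, 9],
     [0, 6],
     [7, 3]]

# Indices that would sort the first column (like [~, sorted_inds] = sort(a(:,1))).
sorted_inds = sorted(range(len(a)), key=lambda i: a[i][0])

# Reorder whole rows by those indices (like b = a(sorted_inds,:)).
b = [a[i] for i in sorted_inds]
print(b)  # [[0, 6], [5, 4], [7, 3], [8, 9]]

# Descending order on the first column.
b_desc = [a[i] for i in sorted(range(len(a)), key=lambda i: a[i][0], reverse=True)]
print(b_desc)  # [[8, 9], [7, 3], [5, 4], [0, 6]]
```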
Simply use b = sortrows(a);
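sortrows(a) orders rows by the first column and uses later columns only to break ties, which amounts to lexicographic row order. A Python sketch on a small made-up matrix with ties in column 1 shows where that differs from a first-column-only sort. Python's sort is stable, so tied rows keep their original order in the second case.

```python
a = [[5, 4],
     [0, 9],
     [5, 1],
     [0, 6]]

# Lexicographic row order, comparable to MATLAB's sortrows(a).
by_all = sorted(a)

# First column only; tied rows keep their original relative order.
by_first = sorted(a, key=lambda row: row[0])

print(by_all)    # [[0, 6], [0, 9], [5, 1], [5, 4]]
print(by_first)  # [[0, 9], [0, 6], [5, 4], [5, 1]]
```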

postgresql compute min value of columns conditioning on a value of other columns

Can I do this with standard SQL, or do I need to create a function for the following problem?
I have 14 columns, which represent 2 properties of 7 consecutive objects (the order from 1 to 7 is important), so
table.object1prop1, ...,table.object1prop7,table.objects2prop2, ..., table.objects2prop7.
I need to compute the minimum value of property 2 over those of the 7 objects whose property 1 is smaller than a specific threshold.
The values of property 1 across the 7 objects form an ascending arithmetic sequence, so property 1 of object 1 will always be smaller than property 1 of object 2.
Thanks in advance for any clue!
This would be easier if the data were normalized. (Hint: any time you find a column name with a number in it, you are looking at a big red flag that the schema is not in third normal form.) With the table as you describe it, it will take a fair amount of code, but the greatest() and least() functions might be your best friends.
http://www.postgresql.org/docs/current/interactive/functions-conditional.html#FUNCTIONS-GREATEST-LEAST
If I had to write code for this, I would probably feed the values into a CTE and work from there.
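One way the least() hint could be applied is to wrap each property 2 column in a CASE that yields NULL unless the matching property 1 is below the threshold, then take least() over those seven expressions. This works because PostgreSQL's least() ignores NULL arguments and returns NULL only if all of them are NULL. The Python sketch below mimics that logic on one made-up row; the values, the threshold, and the helper names are hypothetical, and None plays the role of SQL NULL.

```python
def least(*values):
    """Mimic PostgreSQL's least(): ignore NULLs (None); NULL only if every argument is NULL."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def case_when(cond, value):
    """Mimic CASE WHEN cond THEN value END, which yields NULL when cond is false."""
    return value if cond else None


# One hypothetical row: property 1 ascends arithmetically across the 7 objects.
prop1 = [10, 20, 30, 40, 50, 60, 70]
prop2 = [7.5, 3.2, 9.1, 1.4, 8.0, 0.5, 2.2]
threshold = 45

result = least(*(case_when(p1 < threshold, p2) for p1, p2 in zip(prop1, prop2)))
print(result)  # 1.4
```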