The value in "sym" file disappears when using splayed tables - kdb

I am using the following line:
`:c:/dir/ set .Q.en[`:c:/dir; tablename]
Everything is ok if I don't exit KDB, but if I do and then try to load the table using
get `dir
all the symbol columns are integer. I would really appreciate your help into understanding why this happens.

It looks like you forgot to repeat the table name on the l.h.s. of set.
Try
q)`:c:/dir/tablename/ set .Q.en[`:c:/dir; tablename]
This will correctly save table columns in c:/dir/tablename subdirectory and place the sym file alongside. Now you should be able to load both your table and the sym file by using the \l command or specifying c:/dir on the command line when you restart q
q c:/dir
or
q
q)\l c:/dir
(no backticks or leading :'s in either of those commands)
If you want to use get on this table, you will have to load sym separately:
q)load`:c:/dir/sym
q)get`:c:/dir/tablename/
(note the leading : in the path specs)
Finally, you may want to take a look at the rsave command which will save your table without you having to write tablename twice.

.Q.en takes 2 oarams - file handle and table data
Your first param isnt a hsym - should be backtick then colon then path to your db root
Also set takes 2 params - first in this case should be the path to where you want to save like dir/splayedTableName/

Related

how to use each to pass sets of variables to a function in q

I have a function that deletes and works using a directory and a table and column as variables:
Delete1[dir,t,c]
Another that retruns a set of directories that works:
Paths[dir]
Now I am trying to combine these two using something like "each" to all the directories that Paths[dir] to Delete1 function and I am trying something like this:
Delete1 each (Paths[dir];t;c)
The syntax does not quite work.
You want to use projection. Supplying only the second and third arguments to the Delete1 function creates a new function with just one argument. You can use each between the projection and Paths
Delete1[;t;c] each Paths[dir]
You could use dot apply to this end, you can read more about dot applies here https://code.kx.com/q/ref/unclassified/#apply. It would look like the following:
Delete1 .' (Paths[dir];t;c)
Note if you're using this delete function to delete a column from a table in every partition you only need to delete it from .d file in the last partition. (like in a previous question of yours soft deleting a column from a table in q )

Greenplum : Getting filenames processed via an external table

we are processing multiple files using external table. Is there any way I can get the file name being processed in external tables and stored it in database table?
Only workaround I can find is appending the file name to every record in the flat file which isn't ideal when huge dataset and multiple files.
Can anyone help on this
Thanks
No, the file name is simply never passed from the gpfdist daemon back to Greenplum. So you have to append the file name to each line - you can use gpfdist transformation for doing so
I was struggling with this as well, here's my solution. Please note I'm not an expert in linux, so there may be a one liner solution.
So I wanted to add a filename column in front of my records.
That can be done in sed, I've created a transform.sh file, with the following content:
#/bin/sh
filename=$1
#echo $filename >> transform.txt
sed -e "s|^|$filename\v|" $filename
Please note that I was using vertical tab as a delimiter, \v. Also in the filename you could have / hence using | . In order to have the value of $filename we have to use double quites for sed.
Test it, it looks good.
./transform.sh countersamples-2016-03-02--11-51-10.csv
countersamples-2016-03-02--11-51-10.csv
timestamp
machine
category
instance
name
value
countersamples-2016-03-02--11-51-10.csv
2016-03-02 11:51:10.064
DESKTOP-4PLQKVL
Memory
% Committed Bytes In Use
74.8485488891602
This part is done, lets continue with gpfdist. We need a yaml file that can be passed to gpfdist, I named this transform.yaml
Content:
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
add_filename:
TYPE: input
CONTENT: data
COMMAND: /bin/bash transform.sh %filename%
Please note that we have the %filename% value here. It seems that gpfdist prefilters the files that needs to be handled, and passes them 1 by 1 to our transform.
Lets fire up gpfdist:
gpfdist -c transform.yaml -v
Now go into greenplum and create an external table such as:
CREATE READABLE EXTERNAL TABLE "ext_transform"
(
"filename" text,
"timestamp" timestamp without time zone ,
"machine" text ,
"category" text ,
"instance" text ,
"name" text ,
"value" double precision
)
LOCATION ('gpfdist://localhost:8080/*/countersamples*.csv#transform=add_filename')
FORMAT 'TEXT'
( HEADER DELIMITER '\013' NULL AS '\\N' ESCAPE AS '\\' )
And when we select data from it:
select * from "ext_transform";
We see:
I've created 2 folders to see how it reacts if the files are not in the same folder as the transform. This way I can distinguish between the 2 files, even if their data is identical.

Can a CSV in a text variable be COPY-ied in PostgreSQL

For example, say I've got the output of:
SELECT
$text$col1, col2, col3
0,my-value,text
7,value2,string
0,also a value,fort
$text$;
Would it be possible to populate a table directly from it with the COPY command?
Sort of. You would have to strip the first two and last lines of your example in order to use the data with COPY. You could do this by using the PROGRAM keyword:
COPY table_name FROM PROGRAM 'sed -e ''1,2d;$d'' inputfile';
Which is direct in that you are doing everything from the COPY command and indirect in that you are setting up an outside program to filter your input.

Matlab - Error using save Cannot create '_' because '_____' does not exist

I have some data in a cell array,
data2={[50,1;49,1;26,1];...
[36,2;12,2;37,2;24,2;47.3,2];}
and names in another cell array,
names2={'xxx/01-ab-07c-0fD3/0';'xxx/01-ab-07s-0fD3/6';}
I want to extract a subset of the data,
data2_subset=data2{1,:}(:,1);
then a temporary file name,
tempname2=char(names2(2));
an save the subset to a text file with
save (tempname2, 'data2_subset', '-ASCII');
But I get this error message: _
Error using save
Cannot create '6' because 'xxx/01-ab-07s-0fD3' does not exist.
To try to understand what is happening, I created a mock dataset with simpler names:
names={'12-05';'14-03'};
data={[50,1;29,1;25,1];[35,2;22,2;16,2;38,2];[40,3;32,3;10,3;44,3;43,3];};
data_subset=data{1,:}(:,1);
tempname=char(names(2));
save (tempname, 'data_subset', '-ASCII');
in which case the save command works properly.
Unfortunately I still do not understand what the problem is in the first case. Any suggestions as to what is happening, and of possible solutions?
MATLAB is interpreting the the forward slashes (/) as directory separators and 6 as the intended file name (your second example doesn't have this slash problem).
Since the relative directory tree xxx/01-ab-07s-0fD3/ doesn't exist, MATLAB can't create the file.
To solve the problem, you can either create the directories beforehand using mkdir():
>> pieces = strsplit(tempname2,'/');
>> mkdir(pieces{1:2});
>> save(tempname2, 'data2_subset', '-ASCII');
or replace the / with some other benign symbol like _:
>> tempname3= strrep(tempname2,'/','_');
>> save (tempname3, 'data2_subset', '-ASCII');
(which works for me).

postgresql how to have COPY interpret formatted numeric fields automatically?

I have an input CSV file containing something like:
SD-32MM-1001,"100.00",4/11/2012
SD-32MM-1001,"1,000.00",4/12/2012
I was trying to COPY import that into a postgresql table(varchar,float8,date) and ran into an error:
# copy foo from '/tmp/foo.csv' with header csv;
ERROR: invalid input syntax for type double precision: "1,000.00"
Time: 1.251 ms
Aside from preprocessing the input file, is there some setting in PG that will have it read a file like the one above and convert to numeric form in COPY? Something other than COPY?
If preprocessing is required, can it be set as part of the COPY command? (Not the psql \copy)?
Thanks a lot.
The option to pre processing is to first copy to a temporary table as text. From there insert into the definitive table using the to_number function:
select to_number('1,000.00', 'FM000,009.99')::double precision;
It's an odd CSV file that surrounds numeric values with double quotes, but leaves values like SD-32MM-1001 unquoted. In fact, I'm not sure I've ever seen a CSV file like that.
If I were in your shoes, I'd try copy against a file formatted like this.
"SD-32MM-1001",100.00,4/11/2012
"SD-32MM-1001",1000.00,4/12/2012
Note that numbers have no commas. I was able to import that file successfully with
copy test from '/fullpath/test.dat' with csv
I think your best bet is to get better formatted output from your source.