Is there a difference in performance between set/save when saving columns to tables? - kdb

I have a small utility that checks for new columns for an intraday hdb and adds new columns.
At the moment I am using :
.[set;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]
where path is :
`:path_to_hdb/2022.03.31/table01/newDummyThree
and
?[data;();();cls] // just an exec statement
Would it make any difference to use save instead:
.[save;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]

Yes. If you are adding entire columns to a table then you might want to store it splayed, i.e. as a directory of column files rather than as a single table file. This means using set rather than save.
https://code.kx.com/q/kb/splayed-tables/
But test actual example updates.

As mentioned in the documentation for save:
Use set instead to save
a variable to a file of a different name
local data
So set has the advantage of not requiring a global and you can name the file a different name to the name of your in-memory global variable.
There is no difference in how they serialise/write the data. In fact, save uses set under the covers anyway:
q)save
k){$[1=#p:`\:*|`\:x:-1!x;set[x;. *p]; x 0:.h.tx[p 1]#.*p]}'
By the way - you can't use save in the way that you've suggested in your post. save takes a symbol as input and this symbol is the symbol name of your global variable containing the data you want to write.

Related

How do I prevent users to use thousands separator in FileMaker Pro?

In FileMaker Pro, when using number field, the user can choose to use a thousand separator or not. For example, if I have a database with a field for the price of an item, the user can either enter 1,000 or 1000.
I am using my database to generate an XML file that needs to be uploaded. The thing is, that my XML scheme dictates that only a value of 1000 is allowed and not 1,000. Therefore, I want to either automatically remove the comma, or (my preference in this case) alert the user when trying to enter a value with a thousand separator.
What I tried is the following.
For the field, I am setting Validation options. For example:
Require Strict data type: Numeric Only
Validated by calculation: Position ( Self ; ","; 1 ; 1 ) = 0
Validated by calculation: Self = Substitue ( Self, ",", "")
Auto-enter calculation: Filter( Self ; "0123456789." )
Unfortunately, none of these work. As the field is defined as a number (and I want to keep it like this, as I am also performing calculations based on this number), the Position function and the Substitute function apparently ignore the thousand separator!
EDIT:
Note that I am generating my XML by concatenating a string, for example:
"<Products><Product><Name>" & Name & "</Name><Price>" & Price & "</Price></Product></Product>"
The reason is that what I am exporting is dependent on the values in my database. Therefore, I am not using the [File][Export records...] function.
Auto-enter calculation will work, but you need to uncheck the box "Do not replace existing value of field" (which is checked by default).
I'd suggest using the calculation GetAsNumber(self) as the auto-enter calc. If it should only contain integers, wrap that in a call to Int()
I am using my database to generate an XML file that needs to be uploaded. The thing is, that my XML scheme dictates that only a value of 1000 is allowed and not 1,000.
If this is only a problem when you export, why not handle it when exporting?
If you are exporting as XML using XSLT, you can add an instruction to
your stylesheet to remove the comma from all number fields;
Alternatively, you can export from a layout where the field is
formatted to display without the comma and select the Apply current's layout data formatting to exported data option when
exporting.
Added:
Perhaps I should have clarified. I am not using the export function to generate the XML as there is some logic involved in how the XML should be formatted (dependent on the data that I want to export). What I do instead is that I make a string where I combine XML-tags and actual values from the database.
IMHO, you're making a mistake by not taking advantage of the built-in XML/XSLT export option. Any imaginable logic can be implemented this way, without burdening your solution with the fragile task of creating a valid XML.
In any case, if you're using the field in a calculation, you can replace all references to it with:
GetAsNumber (YourField )
to get an unformatted, numeric-only, value.
Your question puzzles me. As far as I know, FileMaker does not store the thousands separator, but rather offers it only as a display option.
That's also why those functions can't find it.
Are you sure you are exporting the raw data and not a "formatted as layout" variant?

Matlab loop: ignores variable definition when loading something from directories

I am new to MATLAB and try to run a loop within a loop. I define a variable ID beforehand, for example ID={'100'}. In my loop, I then want to go to the ID's directory and then load the matfile in there. However, whenever I load the matfile, suddenly the ID definition gets overridden by all possible IDs (all folders in the directory where also the ID 100 is). Here is my code - I also tried fullfile, but no luck so far:
ID={'100'}
for subno=1:length(ID) % first loop
try
for sessno=1:length(Session) % second loop, for each ID there are two sessions
subj_name = ([ID{subno} '_' Session{sessno} '_vectors_SC.mat']);
cd(['C:\' ID{subno} '\' Session{sessno}]);
load(subj_vec_name) % the problem occurs here, when loading, not before
end
end
end
When I then check the length of the IDs, it is now not 1 (one ID, namely 100), but there are all possible IDs within the directory where 100 also lies, and the loop iterates again for all other possible IDs (although it should stop after ID 100).
You should always specify an output to load to prevent over-writing variables in your workspaces and polluting your workspace with all of the contents of the file. There is an extensive discussion of some of the strange potential side-effects of not doing this here.
Chances are, you have a variable named ID inside of the .mat file and it over-writes the ID variable in your workspace. This can be avoided using the output of load.
The output to load will contain a struct which can be used to access your data.
data = load(subj_vec_name);
%// Access variables from file
id_from_file = data.ID;
%// Can still access ID from your workspace!
ID
Side Note
It is generally not ideal to change directories to access data. This is because if a user runs your script, they may start in a directory they want to be in but when your program returns, it dumps them somewhere unexpected.
Instead, you can use fullfile to construct a path to the file and not have to change folders. This also allows your code to work on both *nix and Windows systems.
subj_name = ([ID{subno} '_' Session{sessno} '_vectors_SC.mat']);
%// Construct the file path
filepath = fullfile('C:', ID{subno}, Session{sessno}, subj_name);
%// Load the data without changing directories
data = load(filepath);
With the command load(subj_vec_name) you are loading the complete mat file located there. If this mat file contains the variable "ID" it will overwrite your initial ID.
This should not cause your outer for-loop to execute more than once. The vector 1:length(ID) is created by the initial for-loop and should not get overwritten by subsequent changes to length(ID).
If you insert disp(ID) before and after the load command and post the output it might be easier to help.

Talend How To Pass Last Modified File Into TFileInputDelimited?

I have searched all over, and read this post.
But it doesn't seem complete and doesn't work.
The situation: I need to get the last modified file from a directory on the local machine. I then need to pass that file into the fileinputdelimited component.
I currently have:
tfilelist --> iterate --> titeratetoflow --> tsamplerow
-->tflowtoiterate -> tinpufiledelimited ---> tlogrow (just to make sure its pulling the right file)
But it doesn't work. I have configured it. so that titeratetoflow has a column called
"FileName" with "((String)globalMap.get("CURRENT_FILE"))" as the value,
"FileDirectory" with ((String)globalMap.get("CURRENT_FILEDIRECTORY")) as value, and
"FileAndDirectory" with ((String)globalMap.get("CURRENT_FILEPATH")) as value.
The tsamplerow is limited to "1".
The tiflowtoiterate is set so that
"FileNameOnly" is value of "FileName"
"FileDirectoryOnly" is "FileDirectory" and
"FilePathComplete" is "FileAndDirectory"
In the File location field of the tinputfiledelimited, I have "((String)globalMap.get("FilePathComplete"))"
When it runs I get an error saying cannot find file or path. If I cut out the fileinput component and have it send straight to the tlogrow, it shows a single line of blank entry.
Any ideas?
I'm not sure if you've just slightly misconfigured the job here but it seems to work fine for me.
Here's a few screenshots showing my job design:
The only thing I can think of just by looking at your post is that you might have slightly messed up the key value pair combinations in the tFlowToIterate. I tend to find that the default settings there work fine pretty much all of the time and it makes it a little more obvious what it's doing as well.
EDIT: Actually, it looks like you might be using the wrong values in your tIterateToFlow. The tFileList will throw the values for the file paths etc in to the global map but it will preface it with the unique component name. If you hit ctrl+space in the value window it should prompt you with a list of available values (these are also specified in the "Outline" tab of the studio). It typically makes an implicit conversion to String but for this you will need to explicitly convert it so use .toString() instead of (String).
Another way to get last modified file is as below
tFileList(sorted DESC by file modified date) ------> tFixedFlowInput (schema - filename, filenumber) ----->tHashOutput
here in tFixedFlowInput
filename = file(String)globalMap.get("tFileList_1_CURRENT_FILEPATH")+"/"+(String)globalMap.get("tFileList_1_CURRENT_FILE")
filenumber = (Integer)globalMap.get("tFileList_1_NB_FILE")
What above will accomplish is get list of all files in the directory with their number/rank - where the file last modified will have file number =1 and next to that will have 2...and so on.
Now on SubJobOK of above tFileList you can have tHashInput which will read from above tHashOutput and filter only row where filenumber==1 - which means the last modified file.
tHashInput (link to tHashoutput) ---->tFilterRow(filenumber==1)------>tLogRow
One reason why you are getting null is probably you have used globalMap.get("CURRENT_FILEPATH) instead of globalMap.get("tFileList_1_CURRENT_FILEPATH")
The Simple Solution for above problem could be as below:
tFileList(sorted ASC by file modified date)--> tIterateToFlow --> tJava( just to end the subjob).
Then on
subjob ok --> tfileinput ( use (String)globalMap.get("tFileList_1_CURRENT_FILE") or (String)globalMap.get("tFileList_1_CURRENT_FILEPATH") as a file name/file path)
Explanation:
Since tFileList iterates all the files in ASC order, it will always have Latest file name stored in globalMap for the last iteration. The list is only iterated till tIterateToFlow hence after this component (String)globalMap.get("tFileList_1_CURRENT_FILE") will always give the last file name from the iterated list, which is the latest file in out case.
Main Flow :
Component View:

Matlab - Name variable same as File name?

I'm building a facial recognition program and loading a whole bunch of images which will be used for training.
Currently, I'm reading my images in using double loops, iterating through subfolders in a folder.
Is there any way, as it iterates, that the images file name can be used before the image is read and stored?
eg. I have an image person001.jpg. How can you retrieve that name (person001) then read the image in like: person001 = imread('next iteration of loop which happens to be person001');
Thanks in advance.
I strongly recommend not to use unstructured variables. First it's very difficult to do operations like "iterate over all images", second you can get strange problems covering a function name with a variable name. Instead i would use a struct with dynamic field names or a map. A solution with a Map propably allows all possible file names.
Dynamic field names:
dirlisting=dir('.jpg');
for imageIX=1:numel(dirlisting)
%cut of extension:
[~,name,~]=fileparts(dirlisting(imageIX).name);
allImages.(name)=imread(dirlisting(imageIX).name);
end
You can access the images in a struct with allImages.person001 or allImages.(x)
Map:
allImages=containers.Map
dirlisting=dir('.jpg');
for imageIX=1:numel(dirlisting)
%cut of extension:
[~,name,~]=fileparts(dirlisting(imageIX).name);
allImages(name)=imread(dirlisting(imageIX).name);
end
You can access the images in a Map using allImages('person001'). Using a Map there is no need to cut of the file extension.

How to load two sheets at once with xlsread

I currently have this code to load a set of prompts to assign the appropriate data:
full=xlsread(input('File Name for Full data?\n'),input('Sheet Name for full?\n'));
empty=xlsread(input('File Name for Empty data?\n'),input('Sheet Name for empty?\n'));
xx1=full(:,1);
yy1=full(:,2);
ff1=full(:,3);
xx2=empty(:,1);
yy2=empty(:,2);
ff2=empty(:,3);
However, since the full and empty sheets are both in one spreadsheet, I would like to make it so that there is only one prompt for the file and then a prompt for each sheet, so something like:
everything=xlsread(input('File Name for Full data?\n'),input('Sheet Name for full?\n'),input('Sheet Name for empty?\n');
xx1=everything(:,1);
yy1=everything(:,2);
ff1=everything(:,3);
xx2=everything(:,4);
yy2=everything(:,5);
ff2=everything(:,6);
What can I do to make this work out?
Just make the input calls before you use xlsread
filename = input('File Name for Full data?\n')
full = input('Sheet Name for full?\n')
empty = input('Sheet Name for empty?\n')
full=xlsread(filename, full);
empty=xlsread(filename, empty);
xx1=full(:,1);
yy1=full(:,2);
ff1=full(:,3);
xx2=empty(:,1);
yy2=empty(:,2);
ff2=empty(:,3);
Though xlsread does not support this directly, you can create a wrapper that will call xlsread in the right way.
Basically just ask the input arguments that you need, and based on them make a call to xlsread.
It is indeed a weakness that you cannot read multiple sheets simultaneously, but xlsread is just a very basic command. Personally I think it is a greater weakness that you can only read out contiguous ranges.