I built a data loader prototype that saves CSV into splayed tables. The workflow is as follows:
Create the schema the first time, e.g. the volatilitysurface table:
volatilitysurface::([date:`datetime$(); ccypair:`symbol$()] atm_convention:`symbol$(); premium_included:`boolean$(); smile_type:`symbol$(); vs_type:`symbol$(); delta_ratio:`float$(); delta_setting:`float$(); wing_extrapolation:`float$(); spread_type:`symbol$());
For every file in the rawdata folder, import it:
myfiles:@[system;"dir /b /o:gn ",string `$getenv[`KDBRAWDATA],"*.volatilitysurface.csv 2> nul";()];
if[myfiles~();.lg.o[`load;"no volatilitysurface files found!"];:0N];
.lg.o[`load;"loading data files ..."];
/ load each file
{
mypath:"" sv (string `$getenv[`KDBRAWDATA];x);
.lg.o[`load;"loading file name '",mypath,"' ..."];
myfile:hsym`$mypath;
tmp1:select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from ("ZSSSSSFFFS";enlist ",")0:myfile;
`volatilitysurface upsert tmp1;
} @/: myfiles;
delete tmp1 from `.;
.Q.gc[];
.lg.o[`done;"loading volatilitysurface data done"];
.lg.o[`save;"saving volatilitysurface schema to ",string afolder];
volatilitysurface::0!volatilitysurface;
.Q.dpft[afolder;`;`ccypair;`volatilitysurface];
.lg.o[`cleanup;"removing volatilitysurface from memory"];
delete volatilitysurface from `.;
.Q.gc[];
.lg.o[`done;"saving volatilitysurface schema done"];
This works perfectly. I use .Q.gc[] frequently to avoid hitting wsfull. When new CSV files are available, I open the existing schema, upsert into it and save it again, effectively overwriting the existing HDB on the file system.
Open schema:
.lg.o[`open;"tables already exists, opening the schema ..."];
#[system;"l ",(string afolder) _ 0;{.lg.e[`open;"failed to load hdb directory: ", x]; 'x}];
/ Re-create table index
volatilitysurface::`date`ccypair xkey select from volatilitysurface;
Re-running step #2 to append new CSV files into the existing volatilitysurface table, it upserts the first CSV perfectly, but the second CSV fails with:
error: `cast
I debugged to the point of the error and, double-checking, I see that the metadata of tmp1 and volatilitysurface are exactly the same. Any ideas why this is happening? I get the same issue with any other table. I have tried clearing the keys from the table after every upsert, but it doesn't help, i.e.
volatilitysurface::0!volatilitysurface;
volatilitysurface::`date`ccypair xkey volatilitysurface;
And the metadata comparison at the point of the cast error:
meta tmp1
c | t f a
------------------| -----
date | z
ccypair | s
atm_convention | s
premium_included | b
smile_type | s
vs_type | s
delta_ratio | f
delta_setting | f
wing_extrapolation| f
spread_type | s
meta volatilitysurface
c | t f a
------------------| -----
date | z
ccypair | s p
atm_convention | s
premium_included | b
smile_type | s
vs_type | s
delta_ratio | f
delta_setting | f
wing_extrapolation| f
spread_type | s
UPDATE Using the input from the answer below, I tried TorQ's .loader.loadallfiles function like this (it doesn't fail, but nothing happens either: the table is not created in memory and the data is not written to the database):
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`dataprocessfunc!(`x`ccypair`atm_convention`premium_included`smile_type`vs_type`delta_ratio`delta_setting`wing_extrapolation`spread_type;"ZSSSSSFFFS";enlist ",";`volatilitysurface;`:hdb; {[p;t] select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from t}); `:rawdata]
UPDATE2 This is the output I get from TorQ:
2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|**** LOADING :rawdata/20171102_113420.disccurve.csv ****
2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|reading in data chunk
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Read 10000 rows
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|processing data
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Enumerating
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4525 rows to :hdb/2017.09.12/volatilitysurface/
2017.11.20D08:46:12.581819000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4744 rows to :hdb/2017.09.13/volatilitysurface/
2017.11.20D08:46:12.659823000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 731 rows to :hdb/2017.09.14/volatilitysurface/
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|init|retrieving sort settings from :C:/Dev/torq//config/sort.csv
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sort|sorting the volatilitysurface table
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sortfunction|sorting :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time
2017.11.20D08:46:12.753428000|wsp18497wn|dataloader|dataloader1|ERR|sortfunction|failed to sort :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time. The error was: hdb/2017.09.
I get the message sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters. Where is this sorttab behaviour documented? Does it use the table's primary key by default?
UPDATE3 OK, I fixed UPDATE2 by providing a non-default sort.csv under my config folder:
tabname,att,column,sort
default,p,sym,1
default,,time,1
volatilitysurface,,date,1
volatilitysurface,,ccypair,1
But now I see that if I call the function multiple times on the same files, it simply appends duplicated data instead of upserting it.
UPDATE4 Still not there yet ... assuming I can check to make sure that no duplicate file is used, when I load and then start the database I get back a structure that resembles some sort of dictionary and not a table.
2017.10.31| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.01| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.02| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.03| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
sym | `AUDNOK`AUDCNH`AUDJPY`AUDHKD`AUDCHF`AUDSGD`AUDCAD`AUDDKK`CADSGD`C..
Note that date is actually a datetime (Z) and not just a date. My full and latest version of the function invocation is:
target:hsym `$("" sv ("./";getenv[`KDBHDB];"/volatilitysurface"));
rawdatadir:hsym `$getenv[`KDBRAWDATA];
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(`x`ccypair`atm_convention`premium_included`smile_type`vs_type`delta_ratio`delta_setting`wing_extrapolation`spread_type;"ZSSSSSFFFS";enlist ",";`volatilitysurface;target;`date;{[p;t] select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from t}); rawdatadir];
I'm going to add a second answer here to try and tackle the question about using TorQ's data loader.
I'd like to clarify what output you are getting after running this function. There should be some logging messages output; can you post these? For example, when I run the function:
jmcmurray@homer ~/deploy/TorQ (master) $ q torq.q -procname loader -proctype loader -debug
<torq startup messages removed>
q).loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(c;"TSSFJFFJJBS";enlist",";`quotes;`:testdb;`date;{[p;t] select date:.z.d,time:TIME,sym:INSTRUMENT,BID,ASK from t});`:csvtest]
2017.11.17D15:03:20.312336000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140421.csv ****
2017.11.17D15:03:20.319110000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
2017.11.17D15:03:20.339414000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
2017.11.17D15:03:20.339463000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
2017.11.17D15:03:20.339519000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
2017.11.17D15:03:20.340061000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.341669000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140422.csv ****
2017.11.17D15:03:20.349606000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
2017.11.17D15:03:20.370793000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
2017.11.17D15:03:20.370858000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
2017.11.17D15:03:20.370911000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
2017.11.17D15:03:20.371441000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.460118000|homer.aquaq.co.uk|loader|loader|INF|init|retrieving sort settings from :/home/jmcmurray/deploy/TorQ/config/sort.csv
2017.11.17D15:03:20.466690000|homer.aquaq.co.uk|loader|loader|INF|sort|sorting the quotes table
2017.11.17D15:03:20.466763000|homer.aquaq.co.uk|loader|loader|INF|sorttab|No sort parameters have been specified for : quotes. Using default parameters
2017.11.17D15:03:20.466820000|homer.aquaq.co.uk|loader|loader|INF|sortfunction|sorting :testdb/2017.11.17/quotes/ by these columns : sym, time
2017.11.17D15:03:20.527216000|homer.aquaq.co.uk|loader|loader|INF|applyattr|applying p attr to the sym column in :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.535095000|homer.aquaq.co.uk|loader|loader|INF|sort|finished sorting the quotes table
After all this, I can run \l testdb and there is a table called "quotes" containing my loaded data
If you can post logging messages like these, it could be helpful to see what's going on.
UPDATE
"But now I see that if I call the function multiple times on the same files, it simply appends duplicated data instead of upserting it."
If I'm understanding the problem correctly, it sounds like you likely shouldn't call the function multiple times on the same files. Another process within TorQ could be useful here: the "file alerter". This process will monitor a directory for new & updated files, and can call a function on any that appear (so you can have it call the loader function with every new file automatically). It has a number of options, such as moving files after processing (so you can "archive" loaded CSVs).
Note that the file alerter requires that the function take exactly two parameters - the directory & the file name. This effectively means you will need a "wrapper" function around the loader function (which takes a dictionary & a directory). I don't think TorQ includes a function similar to .loader.loadallfiles for a single file, so it might be necessary to copy the target file to a temporary directory, run loadallfiles on that directory and then delete the file from there before loading the next.
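For illustration, such a wrapper might look roughly like this (this is a sketch, not TorQ code: the .fa.loadvolsurface name, the KDBTMP staging folder, the OS copy command and the params variable - the same dictionary passed to .loader.loadallfiles in the updates above - are all assumptions, and the exact types the file alerter passes should be checked):
/ hypothetical wrapper for the file alerter, which calls it as f[dir;file]
/ assumes dir and file arrive as strings and params holds the loader dictionary
.fa.loadvolsurface:{[dir;file]
  tmpdir:getenv[`KDBTMP];                        / assumed staging folder
  system "copy ",dir,"\\",file," ",tmpdir;       / Windows copy; use "cp" on Linux
  .loader.loadallfiles[params;hsym`$tmpdir];     / load just the staged file
  hdel hsym`$tmpdir,"/",file };                  / remove the staged copy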
`cast error refers to a value not being enumerated
I can't see any enumeration going on here; splayed tables on disk need to have their symbol columns enumerated. For example, this can be done with the following line before calling .Q.dpft:
volatilitysurface:.Q.en[afolder;volatilitysurface];
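Applied to the save step in the question, that would look something like this (a sketch; afolder is the same HDB root handle used in the question's save code):
volatilitysurface::0!volatilitysurface;
volatilitysurface::.Q.en[afolder;volatilitysurface];   / enumerate symbol columns against afolder's sym file
.Q.dpft[afolder;`;`ccypair;`volatilitysurface];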
You may like to consider using an example CSV loader for loading your data. One such example is included in TorQ, the KDB framework developed by AquaQ Analytics (as a disclaimer, I work for AquaQ)
The framework is available (free of charge) here: https://github.com/AquaQAnalytics/TorQ
The specific component you will likely be interested in is dataloader.q and is documented here: http://aquaqanalytics.github.io/TorQ/utilities/#dataloaderq
This script will handle everything necessary: loading all files, enumerating, sorting on disk, applying attributes etc., as well as using .Q.fsn to prevent running out of memory.
Related
I have a small utility that checks an intraday HDB for new columns and adds them.
At the moment I am using:
.[set;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]
where the path is:
`:path_to_hdb/2022.03.31/table01/newDummyThree
and
?[data;();();cls] // just an exec statement
Would it make any difference to use save instead:
.[save;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]
Yes. If you are adding entire columns to a table then you might want to store it splayed, i.e. as a directory of column files rather than as a single table file. This means using set rather than save.
https://code.kx.com/q/kb/splayed-tables/
But test with actual example updates.
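For illustration, adding a new column to one splayed partition generally means writing the column file with set and registering the name in the table's .d file (a sketch only: the path and values are placeholders, and the new vector must match the partition's row count):
pth:`:path_to_hdb/2022.03.31/table01/newDummyThree;   / new column file, as in the question
pth set 100#0n;                                       / write the column vector with set
dfile:`:path_to_hdb/2022.03.31/table01/.d;            / .d lists the splayed table's columns
dfile set distinct get[dfile],`newDummyThree;         / register the new column name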
As mentioned in the documentation for save:
Use set instead to save:
- a variable to a file of a different name
- local data
So set has the advantage of not requiring a global and you can name the file a different name to the name of your in-memory global variable.
There is no difference in how they serialise/write the data. In fact, save uses set under the covers anyway:
q)save
k){$[1=#p:`\:*|`\:x:-1!x;set[x;. *p]; x 0:.h.tx[p 1]@.*p]}'
By the way - you can't use save in the way that you've suggested in your post. save takes a symbol as input and this symbol is the symbol name of your global variable containing the data you want to write.
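A quick illustration of the difference (a sketch; the /tmp paths and the toy table are placeholders):
q)tab:([]a:1 2 3;b:10 20 30f)
q)save `:/tmp/tab                      / save writes the global whose name matches the file name
`:/tmp/tab
q)`:/tmp/t2 set tab                    / set takes any target path plus the value itself
`:/tmp/t2
q)`:/tmp/t2/ set tab                   / trailing slash: write it splayed (a directory of column files)
`:/tmp/t2/
q)`:/tmp/newcol set exec a from tab    / a single column vector, as in the question
`:/tmp/newcol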
Lately, I am facing problems importing from CSV files. I am using
MariaDB : 10.3.32-MariaDB-0ubuntu0.20.04.1
On Ubuntu : Ubuntu 20.04.3 LTS
I am using this command
LOAD DATA LOCAL INFILE '/path_to_file/data.csv' INTO TABLE tab
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
After searching and trying, I found that I can only load a file from the tmp folder, i.e.
SELECT load_file('/tmp/data.csv');
But it didn't work on other paths.
And secondly, I found that even if the CSV file is present in the tmp folder, if it contains a lot of fields then MariaDB would again fail to load it. The main problem is that the LOAD DATA command does not give any type of error or even a warning, except if the file does not exist. Other than that, nothing is shown and nothing is imported.
I only succeeded in importing a very simple CSV from the tmp folder.
What I suspect is that:
1. MariaDB has been updated, and in this new version there are some flags or configuration options that prohibit MariaDB from importing CSV files from anywhere other than the tmp folder; or
2. MariaDB fails to load the CSV because of some unknown problem, maybe some special character (which I made sure is not in there).
There must be some option that makes MariaDB produce a verbose error and warning log, which I don't know about, other than the /var/log/mysql/error.log file, which does not contain any information about a failure to load the CSV.
Any help would be appreciated.
Below is the first record of the CSV. The actual CSV contains 49 fields and 1862 records (the sample below contains only one record).
"S.No","Training Code","Intervention Type (NRM/Emp. Skill
Training)","Training Title/Activity","Start Month","Ending
Month","No. of Days Trainings","Start Date Training","End Date
Training","Name of Person","Father
Name","CNIC","Gender","Age","Education","Skill Level","CO Ref
#","COName","Village Name","Tehsil Name","District","Type of Farm
production","Total Land (if applicable)","Total Trees (if
applicable)","Sheeps/goats","Buffalo/Cows","Profession","Person's
Income Emp. Skill (Pre-Intervention)","Income from NRM (Pre-
Intervention)","HH Other Sources of Income","Total HH Income","Type
of Support provided","Tool Kit/Inputs Received or Not","Date of
Tool Kit receiving","Other intervention , like exposure market
trial, followup support, Advance Training etc","Production (Pre-
Intervention)","Production (Post-Intervention)","Change in
Production","Unit (kg, Maund,Liter, etc)","Income gain from
production (Post-Intervention)","Change in Income (NRM)","Income
gain by Employment -Emp.Skill (Post Intervention)","Change in
Income (Emp. Skill)","Outcome Trend","Employment/Self-
Employment/Other","Outcome Result","Remarks","Beneficiaries Contact
No.","Activity Location"
1,"AUP-0001","NRM","Dates Processing &
Packaging","Sep/2018","Sep/2018",2,"25/Sep/2018","26/Sep/2018",
"Some name","Barkat Gul",1234567891234,"Male",34,"Primary","Semi-
Skilled","AUP-NWD-073","MCO Haider Khel Welfare Committee","Haider
Khel","Mir Ali","North
Waziristan","Dates",,20,,,"Farming",,5000,"Farming",5000,"Training,
Packaging Boxes","Yes","10/10/2018",,180,320,140,"Kg",8000,3000,,,
"Positive","Self Employed","Value addition to the end product
(Packaging increase the Price per KG to 25%)",,,"Field NW"
BTW, I am non-technical :-O
While using MariaDB version 10.5.13-3.12.1 I am able to import CSV files into the tables I have set up, except with dates:
https://dba.stackexchange.com/questions/283966/tradedate-import-tinytext-how-to-show-date-format-yyyymmdd-of-20210111-or-2021?noredirect=1#comment555600_283966
There I am still struggling to import text-format dates AND to convert the text dates into the (YYYY-MM-DD) date format.
I would like to take an existing table that I deserialized from a binary file or obtained from a remote process and generate code that will create an empty copy of the table so that I can have a human-readable representation of the schema that I can use to easily re-create the table.
For example, assume I have a trade table in memory and I want to generate code that will return an empty table of the same schema.
q)show meta trade
c | t f a
-----| -----
time | n
sym | s g
price| f
size | i
stop | b
cond | c
ex | c
I'm aware I can obtain an empty copy of trade by running 0#trade. However, I'd like to have a general function (let's say it's called getSchema) that will behave something like this:
q) getSchema trade
"trade:([]time:`timespan$(); sym:`g#`symbol$(); price:`float$(); size:`int$(); stop:`boolean$(); cond:`char$(); ex:`char$())"
I think it would be straightforward to implement this by processing the result of meta trade, but I was wondering if there was a more straightforward or publicly available implementation of this function. Thanks.
I haven't seen such a function available publicly. An example is below, but it does not cover all cases (this is left as an exercise).
getSchema: {
typeMapping: "nsfibc"!("timespan";"symbol";"float";"int";"boolean";"char");
c: exec c from meta x;
t: exec t from meta x;
statement: string[x],":([]";
statement,: "; " sv string[c],'": `",/:(typeMapping#t),\:"$()";
statement,: ")";
statement
};
//Expects table's name as symbol
getSchema`trade
What is not covered:
attributes: the attribute code should go in the middle of the "; " sv string[c],'": _code_for_attribute_ `",/:(typeMapping@t),\:"$()" statement
types: typeMapping must be enriched to cover the rest of q's types (see the sketch after this list)
keys: if the table is keyed, then the keyed columns are listed inside square brackets, i.e. t:([keyed_columns] other_columns)
foreign keys: to be fair they are seldom used, so I would ignore them
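As a rough illustration of the second point, the mapping could be extended along these lines (a partial sketch covering only the common atomic types):
typeMapping:"bxhijefcspmdznuvt"!("boolean";"byte";"short";"int";"long";"real";"float";"char";
  "symbol";"timestamp";"month";"date";"datetime";"timespan";"minute";"second";"time");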
The following should work nicely; however, it will only render atomic types, which conforms to how tables are normally defined. This method just utilizes the string representation of the empty schema that you can achieve via -3!
This should handle keyed tables and attributes, but not foreign keys.
getSchema:{[x]
/ Render the types with -3!
typs:ssr[-3!value flip 0!0#get x;"\"\"";"`char$()"];
/ Append the column names to the types
colNameType:(string[cols 0!get x],\:":"),'";" vs 1_-1_typs;
/ Ensure the correct keys are shown, add in ;
colnametyp:@[colNameType;count[keys x] - 1;,;"]"],\:";";
/ Combine
raze string[x],":([",(-1_raze colnametyp),")"
}
q)trade:2!([]time:`timespan$(); sym:`g#`symbol$(); price:`float$(); size:`int$(); stop:`boolean$(); cond:`char$(); ex:`char$());
q)getSchema `trade
"trade:([time:`timespan$();sym:`g#`symbol$()];price:`float$();size:`int$();stop:`boolean$();cond:`char$();ex:`char$())"
Some context before the question.
Imagine a file FileA having around 50 fields of different types. Instead of all programs using the file directly, I tried having a service program, so the file could only be accessed by that service program. The programs calling the service would then receive a data structure based on the file structure, as an ExtName. I use SQL to retrieve the information, so basically the procedure goes like this:
Data structure shared by the service program:
D FileADS E DS ExtName(FileA) Qualified
Procedure called by the programs:
P getFileADS B Export
D PI N
D PI_IDKey 9B 0 Const
D PO_DS LikeDS(FileADS)
D LocalDS E DS ExtName(FileA) Qualified
D NullInd S 5i 0 Dim(50) <-- Since 50 fields in FileA
//Code
Clear LocalDS;
Clear PO_DS;
exec sql
SELECT *
INTO :LocalDS :nullind
FROM FileA
WHERE FileA.ID = :PI_IDKey;
If SqlCod <> 0;
Return *Off;
EndIf;
PO_DS = LocalDS;
Return *On;
P getFileADS E
So, that procedure will return a datastructure filled with a record from FileA if it finds it.
Now my question: is there any way I can assign %nullind(field) = *On without specifying EACH of the 50 fields of my file?
Something like a loop
i = 1;
DoW (i <= 50);
if nullind(i) = -1;
%nullind(datastructure.field) = *On;
endif;
i++;
EndDo;
Because, let's face it, it'd be a pain to go through each field of each file every time.
I know a simple chain(n) could do the trick
chain(n) PI_IDKey FileA FileADS;
but I really was looking to do it with SQL.
Thank you for your advice!
OS Version : 7.1
First, you'll be better off in the long run by eliminating SELECT * and supplying a SELECT list of the 50 field names.
Next, consider these two web pages -- Meaningful Names for Null Indicators and Embedded SQL and null indicators. The first shows an example of assigning names to each null indicator to match the associated field names. It's just a matter of declaring a based DS with names, based on the address of your null indicator array. The second points out how a null indicator array can be larger than needed, so future database changes won't affect results. (Bear in mind that the page shows a null array of 1000 elements, and the memory is actually relatively tiny even at that size. You can declare it smaller if you think it's necessary for some reason.)
You're creating a proc that you'll only write once. It's not worth saving the effort of listing the 50 fields. Maybe if you had many programs using this proc and you had to create the list each time it'd be a slight help to use SELECT *, but even then it's not a great idea.
A matching template DS for the 50 data fields can be defined in the /COPY member that will hold the proc prototype. The template DS will be available in any program that brings the proc prototype in. Any program that needs to call the proc can simply specify LIKEDS referencing the template to define its version in memory. The template DS should probably include the QUALIFIED keyword, and programs would then use their own DS names as the qualifying prefix. The null indicator array can be handled similarly.
However, it's not completely clear what your actual question is. You show an example loop and ask if it'll work, but you don't say if you had a problem with it. It's an array, so a loop can be used much like you show. But it depends on what you're actually trying to accomplish with it.
For old school RPG, just include the nulls in the data structure populated with the select statement.
select col1, ifnull(col1), col2, ifnull(col2), etc. into :dsfilewithnull from FileA f where f.id = :id;
For old school RPG that can't handle nulls, remove them with the select statement.
select coalesce(col1,0), coalesce(col2,' '), coalesce(col3, :lowdate) into :dsfile from FileA f where f.id = :id;
The second method would be easier to use in a legacy environment.
Pass the key by value to the procedure so you can use it like a built-in function.
One answer to your question would be to make the array part of a data structure, and assign *all'0' to the data structure.
dcl-ds nullIndDs;
nullInd Ind Dim(50);
end-ds;
nullIndDs = *all'0';
The answer by jmarkmurphy is an example of assigning all zeros to an array of indicators. For the example that you show in your question, you can do it this way:
D NullInd S 5i 0 dim(50)
/free
NullInd(*) = 1 ;
Nullind(*) = 0 ;
*inlr = *on ;
return ;
/end-free
That's a complete program that you can compile and test. Run it in debug and stop at the first statement. Display NullInd to see the initial value of its elements. Step through the first statement and display it again to see how the elements changed. Step through the next statement to see how things changed again.
As for "how to do it in SQL", that part doesn't make sense. SQL sets the values automatically when you FETCH a row. Other than that, the array is used by the host language (RPG in this case) to communicate values back to SQL. When a SQL statement runs, it again automatically uses whatever values were set. So, it either is used automatically by SQL for input or output, or is set by your host language statements. There is nothing useful that you can do 'in SQL' with that array.
I have searched all over, and read this post.
But it doesn't seem complete and doesn't work.
The situation: I need to get the last modified file from a directory on the local machine. I then need to pass that file into the fileinputdelimited component.
I currently have:
tfilelist --> iterate --> titeratetoflow --> tsamplerow
--> tflowtoiterate --> tinputfiledelimited --> tlogrow (just to make sure it's pulling the right file)
But it doesn't work. I have configured it so that titeratetoflow has a column called
"FileName" with "((String)globalMap.get("CURRENT_FILE"))" as the value,
"FileDirectory" with ((String)globalMap.get("CURRENT_FILEDIRECTORY")) as value, and
"FileAndDirectory" with ((String)globalMap.get("CURRENT_FILEPATH")) as value.
The tsamplerow is limited to "1".
The tflowtoiterate is set so that
"FileNameOnly" is value of "FileName"
"FileDirectoryOnly" is "FileDirectory" and
"FilePathComplete" is "FileAndDirectory"
In the File location field of the tinputfiledelimited, I have "((String)globalMap.get("FilePathComplete"))"
When it runs I get an error saying cannot find file or path. If I cut out the fileinput component and have it send straight to the tlogrow, it shows a single line of blank entry.
Any ideas?
I'm not sure if you've just slightly misconfigured the job here but it seems to work fine for me.
Here's a few screenshots showing my job design:
The only thing I can think of just by looking at your post is that you might have slightly messed up the key value pair combinations in the tFlowToIterate. I tend to find that the default settings there work fine pretty much all of the time and it makes it a little more obvious what it's doing as well.
EDIT: Actually, it looks like you might be using the wrong values in your tIterateToFlow. The tFileList will throw the values for the file paths etc in to the global map but it will preface it with the unique component name. If you hit ctrl+space in the value window it should prompt you with a list of available values (these are also specified in the "Outline" tab of the studio). It typically makes an implicit conversion to String but for this you will need to explicitly convert it so use .toString() instead of (String).
Another way to get the last modified file is as below:
tFileList(sorted DESC by file modified date) ------> tFixedFlowInput (schema - filename, filenumber) ----->tHashOutput
here in tFixedFlowInput
filename = (String)globalMap.get("tFileList_1_CURRENT_FILEPATH")+"/"+(String)globalMap.get("tFileList_1_CURRENT_FILE")
filenumber = (Integer)globalMap.get("tFileList_1_NB_FILE")
What the above accomplishes is to get a list of all files in the directory with their number/rank, where the most recently modified file will have file number = 1, the next will have 2, and so on.
Now, on SubjobOK of the above tFileList, you can have a tHashInput which reads from the above tHashOutput and filters only the row where filenumber==1, which is the last modified file.
tHashInput (link to tHashoutput) ---->tFilterRow(filenumber==1)------>tLogRow
One reason why you are getting null is probably that you have used globalMap.get("CURRENT_FILEPATH") instead of globalMap.get("tFileList_1_CURRENT_FILEPATH").
A simple solution to the above problem could be as below:
tFileList(sorted ASC by file modified date)--> tIterateToFlow --> tJava( just to end the subjob).
Then on
subjob ok --> tfileinput ( use (String)globalMap.get("tFileList_1_CURRENT_FILE") or (String)globalMap.get("tFileList_1_CURRENT_FILEPATH") as a file name/file path)
Explanation:
Since tFileList iterates over all the files in ASC order, it will always have the latest file name stored in the globalMap after the last iteration. The list is only iterated up to tIterateToFlow, hence after this component (String)globalMap.get("tFileList_1_CURRENT_FILE") will always give the last file name from the iterated list, which is the latest file in our case.
Main Flow :
Component View: