I'm just starting with the Neuroph NN GUI and trying to create a dataset by importing a .csv file. What is the file format supposed to be?
I have 3 inputs and 1 output, so I assumed the format of the import file would be:
1,2,3,4
6,7,8,9
But I get error 9, 4, or 10, depending on which combination of newlines, commas, etc. I try.
Any help out there?
Many thanks,
John
That's because you aren't counting the output column. The last columns are for the output.
So, for example, if you have 10 inputs and 1 output, your file will need to have 11 columns.
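As a quick sanity check before importing, a short Python sketch can report rows whose column count differs from inputs + outputs. This is not part of Neuroph; the helper name `bad_rows` is hypothetical, and it only checks the column count, not whether each value is numeric.

```python
import csv
from typing import Iterable, List


def bad_rows(lines: Iterable[str], n_inputs: int, n_outputs: int) -> List[int]:
    """Return 1-based numbers of rows that do not have n_inputs + n_outputs columns."""
    expected = n_inputs + n_outputs
    return [
        number
        for number, row in enumerate(csv.reader(lines), start=1)
        if row and len(row) != expected  # blank lines are skipped but still counted
    ]


# 3 inputs + 1 output: every row needs 4 columns, so row 3 is flagged
sample = ["1,2,3,4", "6,7,8,9", "1,2,3"]
print(bad_rows(sample, n_inputs=3, n_outputs=1))  # [3]
```

For a real file, pass `open(path, newline="")` instead of the sample list.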
I came here because Neuroph can't import CSVs with a title (header) line. Example of a data file that works for me:
1.0,1.0,1.0
1.0,2.0,2.0
1.0,3.0,3.0
1.0,4.0,4.0
1.0,5.0,5.0
1.0,6.0,6.0
1.0,7.0,7.0
1.0,8.0,8.0
1.0,9.0,9.0
1.0,10.0,10.0
2.0,1.0,2.0
2.0,2.0,4.0
2.0,3.0,6.0
2.0,4.0,8.0
2.0,5.0,10.0
2.0,6.0,12.0
2.0,7.0,14.0
2.0,8.0,16.0
2.0,9.0,18.0
2.0,10.0,20.0
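If your data file does have a title line, one workaround is to write a copy without it before importing. A minimal Python sketch, assuming the header is exactly the first line; `strip_header` and the column names in the demo are hypothetical:

```python
import csv
import io
from typing import TextIO


def strip_header(src: TextIO, dst: TextIO) -> int:
    """Copy CSV rows from src to dst without the first (title) line; return rows written."""
    reader = csv.reader(src)
    next(reader, None)                              # discard the header row, if any
    writer = csv.writer(dst, lineterminator="\n")
    written = 0
    for row in reader:
        if row:                                     # drop blank lines as well
            writer.writerow(row)
            written += 1
    return written


# Demo with in-memory text; for files, open both with newline=""
src = io.StringIO("x,y,product\n1.0,1.0,1.0\n2.0,3.0,6.0\n")
dst = io.StringIO()
strip_header(src, dst)
print(dst.getvalue(), end="")
```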
I am trying to save a CSV generated from a table.
If I 'Export all as CSV' from QPAD, the file is 22MB.
If I do `:path.csv 0: csv 0: table, the file is 496MB.
The file contains the same data.
I do have some columns that are lists of dates and lists of symbols, which cause some issues when converting to CSV.
To get around that I use this: {`$$[1=count x;string first x;`$" "sv string x]}
For example, one of the columns is called allDates and looks like this:
someOtherCol allDates              stackedSymCol
------------------------------------------------
val1         ,2001.01.01           ,`sym1
val2         2001.01.01 2001.01.02 `sym2`sym3
Where is this massive difference in size coming from, and how can I reduce the size?
If I remove these 3 columns, which are lists of lists, the file size goes down significantly.
Doing an ungroup is not an option.
I think the important question here is why QPAD is able to handle columns that are lists of lists of type 'D', 'S', etc., and how I can achieve that without casting those columns to space-delimited strings. This is what is causing my saved CSV to be so massive.
I.e. I can do an 'Export all to CSV' from QPAD on this and it is 21MB,
but if I want to save it programmatically, I need to convert the allDates and DESK_NAME columns, and it goes up to 500MB.
UPDATE: Thanks everyone. I did not know that QPAD is truncating data like that on exports. That is worrying.
These CSVs will not be identical. qPad truncates nested lists (including strings). The CSV exported directly from kdb+ will be complete.
E.g.
([]a:3#enlist til 1000;b:3#enlist til 1000)
The qPad CSV export of this ends like this: 30j, 31j ....
Based on the update to your question, it seems you are exporting the data shown in the screenshot, which would not be the same as the data you are transforming and saving to CSV directly from q.
Based on the screenshot, the CSV files are likely not identical for at least 3 reasons:
QPad is truncating the nested dates at a certain length
QPad adds enlist to nested lists of length 1
QPad adds/keeps backticks before symbols
Example data comparison
Here is a minimal example that should highlight this:
q)example:{n:1 2 20;([]someOtherCol:3?10;allDates:n?\:.z.d;stackedSymCol:n?\:`3)}[]
q)example
someOtherCol allDates                 stackedSymCol
---------------------------------------------------
1            ,2006.01.13              ,`hfg
1            2008.04.06 2008.01.11    `nha`plc
4            2009.06.12 2016.01.24 2021.02.02 2018.09.02 2011.06.19 2022.09.26 2008.10.29 2010.03.11 2022.07.30 2012.09.06 2021.11.27 2017.11.24 2007.09.10 2012.11.27 2020.03.10 2003.07.02 2007.11.29 2010.07.18 2001.10.23 2000.11.07 `ifd`jgp`eln`kkb`ahm`cal`eni`idj`mod`omb`dkc`ogf`eaj`mbf`kdd`hip`gkg`eef`edi`jak
I have used 'Export All to CSV' to save to C:/q/qpad.csv.
I couldn't get your "razing" function to work as-is, so I modified it slightly, used it to convert the nested lists to strings, and saved the file to CSV.
q)f:{`$$[1=count x;string first x;" "sv string x]}
q)`:C:/q/q.csv 0: csv 0: update f'[allDates], f'[stackedSymCol] from example
Reading both files and comparing their contents shows that they do not match:
q)a:read0`:C:/q/q.csv
q)b:read0`:C:/q/qpad.csv
q)a~b
0b
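`a~b` only says whether the files match. To see where they first diverge, a small Python sketch can walk both line lists together. The helper `first_difference` is hypothetical; the file lines would come from something like `open("C:/q/q.csv").read().splitlines()`.

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


def first_difference(
    a: Sequence[str], b: Sequence[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (1-based line number, line from a, line from b) at the first mismatch, or None."""
    for number, (left, right) in enumerate(zip_longest(a, b), start=1):
        if left != right:
            return number, left, right  # a line missing from one file shows up as None
    return None


header = "someOtherCol,allDates,stackedSymCol"
print(first_difference([header, "1,2006.01.13,hfg"],
                       [header, "1,enlist 2006.01.13,enlist `hfg"]))
```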
Side note
Since kdb+ V4.0 2020.03.17, it is possible to save nested vectors to CSV by using .h.cd to prepare the text. The variable .h.d sets the delimiter used between sublist items.
q).h.d:" ";
q).h.cd example
"someOtherCol,allDates,stackedSymCol"
"8,2013.09.10,pii"
"6,2007.08.09 2012.12.30,hbg blg"
"8,2011.04.04 2020.08.21 2006.02.12 2005.01.15 2016.05.31 2015.01.03 2021.12.09 2022.03.26 2013.10.15 2001.10.29 2011.02.17 2010.03.28 2005.11.14 2003.08.16 2002.04.20 2004.08.07 2014.09.19 2000.05.24 2018.06.19 2017.08.14,cim pgm gha chp dio gfc beh mbo cfe kec jbn bjh eni obf agb dce gnk jif pci ppc"
q)`:somefile.csv 0: .h.cd example
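The idea behind .h.cd with a sub-delimiter can be illustrated outside q: join each nested cell with the sub-delimiter, then write ordinary CSV. This Python sketch is only an illustration; `to_cell` and `nested_to_csv` are hypothetical helpers, and they do not reproduce kdb+'s own formatting of dates or symbols (values are passed in as strings).

```python
import csv
import io
from typing import Any, Iterable, Sequence


def to_cell(value: Any, sub_delim: str = " ") -> str:
    """Join list-like cells with sub_delim; leave scalars as plain strings."""
    if isinstance(value, (list, tuple)):
        return sub_delim.join(str(item) for item in value)
    return str(value)


def nested_to_csv(header: Sequence[str], rows: Iterable[Sequence[Any]],
                  sub_delim: str = " ") -> str:
    """Render rows with nested cells as CSV text, one sub-delimited string per cell."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([to_cell(value, sub_delim) for value in row])
    return buf.getvalue()


text = nested_to_csv(
    ["someOtherCol", "allDates", "stackedSymCol"],
    [[8, ["2013.09.10"], ["pii"]],
     [6, ["2007.08.09", "2012.12.30"], ["hbg", "blg"]]],
)
print(text, end="")
```

Note that a length-1 list becomes a bare value (no `enlist`), and if the sub-delimiter were a comma, the csv writer would quote that cell rather than break the row.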
CSV saved from q
The contents of the CSV saved from q and its character count:
q)read0`:C:/q/q.csv
"someOtherCol,allDates,stackedSymCol"
"8,2013.09.10,pii"
"6,2007.08.09 2012.12.30,hbg blg"
"8,2011.04.04 2020.08.21 2006.02.12 2005.01.15 2016.05.31 2015.01.03 2021.12.09 2022.03.26 2013.10.15 2001.10.29 2011.02.17 2010.03.28 2005.11.14 2003.08.16 2002.04.20 2004.08.07 2014.09.19 2000.05.24 2018.06.19 2017.08.14,cim pgm gha chp dio gfc beh mbo cfe kec jbn bjh eni obf agb dce gnk jif pci ppc"
q)count raze read0`:C:/q/q.csv
383
CSV saved from QPad
Similarly, the contents of the CSV saved from QPad and its character count:
q)read0`:C:/q/qpad.csv
"someOtherCol,allDates,stackedSymCol"
"1,enlist 2006.01.13,enlist `hfg"
"1,2008.04.06 2008.01.11,`nha`plc"
"4,2009.06.12 2016.01.24 2021.02.02 2018.09.02 2011.06.19 2022.09.26 2008.10.29 2010.03.11 2022.07.30 2012.09.06 2021.11.27 2017.11.24 2007.09.10 2012.11.27 ...,`ifd`jgp`eln`kkb`ahm`cal`eni`idj`mod`omb`dkc`ogf`eaj`mbf`kdd`hip`gkg`eef`edi`jak"
q)count raze read0`:C:/q/qpad.csv
338
Conclusion
These examples demonstrate the points outlined above: the dates are truncated at a certain length, enlist is added to nested lists of length 1, and backticks are kept before symbols.
The truncated dates could explain why the file you exported from QPad is so much smaller. Based on your comments above, the files are not identical, so this is a likely cause.
TL;DR: the two files are created differently, and that's why they differ.
I have a .csv file with the first column containing dates, a snippet of which looks like the following:
date,values
03/11/2020,1
03/12/2020,2
3/14/20,3
3/15/20,4
3/16/20,5
04/01/2020,6
I would like to import this data into Matlab (I think the best way would probably be the readtable() function). My goal is to bring the dates into Matlab as a datetime array. As you can see above, the problem is that the dates in the original .csv file are not consistently formatted: some are in the format mm/dd/yyyy and some are mm/dd/yy.
Simply calling data = readtable('myfile.csv') on the .csv file results in the following, which is not correct:
'03/11/2020' 1
'03/12/2020' 2
'03/14/0020' 3
'03/15/0020' 4
'03/16/0020' 5
'04/01/2020' 6
Does anyone know a way to automatically account for this type of data in the import?
Thank you!
My version: Matlab R2017a
EDIT ---------------------------------------
Following Max's suggestion, I have tried specifying some of the input options for the read command:
T = readtable('example.csv',...
'Format','%{dd/MM/yyyy}D %d',...
'Delimiter', ',',...
'HeaderLines', 0,...
'ReadVariableNames', true)
which results in:
date values
__________ ______
03/11/2020 1
03/12/2020 2
NaT 3
NaT 4
NaT 5
04/01/2020 6
and you can see that this is not working either.
If you are sure none of the dates go back more than 100 years, you can easily apply the pivot method that was in use in the last century (before the Y2K bug warned the world of the danger of the method).
Years used to be coded with 2 digits only, on the understanding that 87 actually meant 1987. A user (or a computer) would add the missing century automatically.
In your case, you can read the full table and parse the dates; it is then easy to detect which dates are inconsistent. Identify them, correct them, and you are good to go.
With your example:
tfile = 'myfile.csv' ;                   % path to the csv file
a = readtable(tfile) ;                   % read the file
dates = datetime(a.date) ;               % extract first column and convert to [datetime]
idx2change = dates.Year < 2000 ;         % find which dates were in short (2-digit year) format
dates.Year(idx2change) = dates.Year(idx2change) + 2000 ; % correct truncated years
a.date = dates                           % reinject corrected [datetime] array into the table
yields:
a =
date values
___________ ______
11-Mar-2020 1
12-Mar-2020 2
14-Mar-2020 3
15-Mar-2020 4
16-Mar-2020 5
01-Apr-2020 6
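The same pivot idea can be sketched outside MATLAB. This Python version is an illustration only; `parse_mdy` is a hypothetical helper, and it makes the same assumption as the answer: every 2-digit year belongs to the century given by `pivot` (2000 by default).

```python
from datetime import date
from typing import List


def parse_mdy(text: str, pivot: int = 2000) -> date:
    """Parse m/d/yy or m/d/yyyy; a 2-digit year yy is read as pivot + yy."""
    month, day, year = (int(part) for part in text.strip().split("/"))
    if year < 100:          # short-format year, e.g. "20" in 3/14/20
        year += pivot
    return date(year, month, day)


raw = ["03/11/2020", "03/12/2020", "3/14/20", "3/15/20", "3/16/20", "04/01/2020"]
dates: List[date] = [parse_mdy(s) for s in raw]
print([d.isoformat() for d in dates])
```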
Instead of specifying the format explicitly (as I also suggested before), one should use import options; in the case of a csv file, use delimitedTextImportOptions:
opts = delimitedTextImportOptions('NumVariables',2,...   % how many variables per row?
    'VariableNamesLine',1,...                            % is there a header? If yes, in which line are the variable names?
    'DataLines',2,...                                    % in which line does the actual data start?
    'VariableTypes',{'datetime','double'});              % as which data types should the variables be read?
readtable('myfile.csv',opts)
This works because the neat little feature recognizes the datetime format automatically, as it knows the variable must be a datetime object =)
Is there a way to make a PowerShell script that compares 2 CSV files and writes a new .csv file with the words that aren't in one of the CSV files?
I have 1 CSV file with 24 million words in column 1.
And I have a second CSV file with 24 million words. I want to compare those 2 lists and see which words are missing; I know 1 million are missing.
So is there a way to make a PowerShell script that does this comparison? :)
Best Regards
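One common way to do this comparison is a hash-set lookup: load the words of one file into a set, then check each word of the other file against it. The sketch below uses Python rather than PowerShell, and `missing_words` / `write_missing` are hypothetical helpers. It assumes the word is in column 1 and the files are UTF-8; a set of 24 million words can need several GB of RAM.

```python
import csv
from typing import Iterable, List


def missing_words(lines_a: Iterable[str], lines_b: Iterable[str]) -> List[str]:
    """Words in column 1 of A that never appear in column 1 of B (A's order kept)."""
    in_b = {row[0] for row in csv.reader(lines_b) if row}   # set: O(1) membership tests
    return [row[0] for row in csv.reader(lines_a) if row and row[0] not in in_b]


def write_missing(path_a: str, path_b: str, out_path: str) -> int:
    """File wrapper: B's words are held in memory, A is read row by row."""
    with open(path_a, newline="", encoding="utf-8") as fa, \
         open(path_b, newline="", encoding="utf-8") as fb, \
         open(out_path, "w", newline="", encoding="utf-8") as fo:
        words = missing_words(fa, fb)
        csv.writer(fo).writerows([w] for w in words)   # one word per output row
    return len(words)
```

To find words missing in either direction, call `write_missing` twice with the two input paths swapped.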
I want to import a large data set (multiple columns) using the following code. I want to get everything in a single column instead of only one row (multiple columns). So I did a transpose operation, but it still doesn't work properly.
clc
clear all
close all
dataX_Real = fopen('dataX_Real_in.txt');dataX_Real=dataX_Real';
I would really appreciate your support and suggestions. Thank you.
When using fopen, all you are doing is opening the file. You aren't reading in the data. What fopen returns is a file identifier that gives you access to the contents of the file; it doesn't actually read the contents itself. You would need to use functions like fread or fscanf to read in the text data.
However, I would recommend you use dlmread instead, as this doesn't require a fopen call to open your file. This will open up the file, read the contents and store it into a variable in one function call:
dataX_Real = dlmread('dataX_Real_in.txt');
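For comparison, the same open-versus-read distinction exists in Python: `open()` only returns a file object, and iterating over it is what actually reads the data. This hypothetical sketch (`read_column`, `load_column`) assumes values separated by commas and/or whitespace; unlike dlmread it does not detect the delimiter. The result is one flat list, i.e. already a single column.

```python
import re
from typing import Iterable, List


def read_column(lines: Iterable[str]) -> List[float]:
    """Parse every number in delimited text into one flat list, row by row."""
    values: List[float] = []
    for line in lines:
        values.extend(float(tok) for tok in re.split(r"[,\s]+", line.strip()) if tok)
    return values


def load_column(path: str) -> List[float]:
    # open() alone gives only a file object (like fopen's identifier);
    # iterating over it inside read_column is what reads the data
    with open(path, encoding="utf-8") as fh:
        return read_column(fh)
```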
By doing the above and using your text file, I get 44825 elements. Here are the first 10 entries of your data:
>> format long;
>> dataX_Real(1:10)
ans =
Columns 1 through 4
-0.307224970000000 0.135961950000000 -1.072544100000000 0.114566020000000
Columns 5 through 8
0.499754310000000 -0.340369000000000 0.470609910000000 1.107567700000000
Columns 9 through 10
-0.295783020000000 -0.089266816000000
That seems to match what we see in your text file! However, you said you wanted it as a single column. By default this reads the values in as a row, so here you can simply transpose:
dataX_Real = dataX_Real.';
Displaying the first 10 elements, we get:
>> dataX_Real = dataX_Real.';
>> dataX_Real(1:10)
ans =
-0.307224970000000
0.135961950000000
-1.072544100000000
0.114566020000000
0.499754310000000
-0.340369000000000
0.470609910000000
1.107567700000000
-0.295783020000000
-0.089266816000000
This is my problem.
I need to copy 2 columns each from 7 different files to the same output file.
All input and output files are CSV files.
And I need to add each new pair of columns beside the columns that have already been copied, so that at the end the output file has 14 columns.
I believe I cannot use
open(FILEHANDLE,">>file.csv").
Also, all 7 CSV files have nearly 20,000 rows each, so I'm reading and writing the files line by line.
It would be a great help if you could give me an idea as to what I should do.
Thanks a lot in advance.
Provided that your lines are 1:1 (meaning you're combining data from line 1 of File_1, File_2, etc.):
open all 7 files for input
open output file
read a line of data from every input file
write the combined line of data to the output file, and repeat until the input files run out
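The steps above can be sketched in Python (the thread itself is about Perl, where Text::CSV would play the role of the csv module). `merge_column_pairs` is hypothetical; it assumes the wanted pair is the first two columns of every file (adjust `columns`), that every row has at least that many fields, and it stops at the end of the shortest file. With 7 inputs and 2 columns each, every output row has 14 columns.

```python
import csv
import io
from typing import List, Sequence, TextIO, Tuple


def merge_column_pairs(inputs: Sequence[TextIO], output: TextIO,
                       columns: Tuple[int, ...] = (0, 1)) -> int:
    """Row i of the output = the chosen columns of row i of every input, side by side."""
    readers = [csv.reader(f) for f in inputs]
    writer = csv.writer(output, lineterminator="\n")
    rows_written = 0
    for rows in zip(*readers):              # one line from each file per step
        combined: List[str] = []
        for row in rows:
            combined.extend(row[c] for c in columns)
        writer.writerow(combined)
        rows_written += 1
    return rows_written


# Demo with two in-memory "files"; for real files, open each with newline=""
demo_inputs = [io.StringIO("1,2,x\n3,4,y\n"), io.StringIO("5,6\n7,8\n")]
demo_output = io.StringIO()
merge_column_pairs(demo_inputs, demo_output)
print(demo_output.getvalue(), end="")   # prints two rows: 1,2,5,6 and 3,4,7,8
```

Because only one line per file is held at a time, this stays line-by-line even for 20,000-row inputs.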
Text::CSV is probably the best way to access CSV files.
You could create a Text::CSV object for each file (including the output), use the getline or getline_hr (returns a hashref) methods to fetch data, combine it into arrayrefs, then write each row out with the print method.