Unable to load CSV file in Wrangler (Data Fusion) - google-cloud-data-fusion

I am trying to import a file which has more than 100 columns and a size of 4 MB.
After applying the operation "parse-as-csv :body ',' true", no data is shown,
but if the file size is < 2 MB it works.
Please suggest a solution.

This is a known issue with Wrangler, where the headers are not filtered out after the first file. Say that in the first file you have columns:
| 1 | 2 | 3 |
and in the second one they appear in a different order, so the columns end up misaligned:
| 2 | 1 | 3 |
As a remediation, you can parse with the delimiter in the source and filter out the header rows.
Another possible approach is to preprocess the data with Python and align the headers; a sketch of that idea is shown below.
Additionally, Data Fusion can import files of up to 250 MB.
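A minimal sketch of that Python preprocessing idea, purely as an illustration (the file pattern, the output name and the use of the first file's header as the reference column order are all assumptions):
import csv
import glob

files = sorted(glob.glob('input_*.csv'))  # placeholder pattern for the input files

with open(files[0], newline='') as f:
    header = next(csv.reader(f))  # take the first file's header as the reference column order

with open('aligned.csv', 'w', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=header)
    writer.writeheader()
    for path in files:
        with open(path, newline='') as f:
            for row in csv.DictReader(f):  # DictReader keys each row by that file's own header
                writer.writerow({col: row.get(col, '') for col in header})
The aligned file can then be fed to the pipeline with a single, consistent header.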

Related

Extract specific values from cells in a CSV

I have to combine a lot of files, mostly CSV. I already have code to combine them, but first I need to trim the desired CSV files so I can get the data that I want. Each CSV starts with 2 columns spanning 8 rows that contain the data I want, and just below those there is a row that starts an 8-column section. I am only having an issue grabbing data from those first 8 rows of the 2 columns.
Example of the first 3 rows of the CSV:
Target Name: MIAW
Target OS: Windows
Last Updated On: June 27 2019 07:35:11
This is the data that I want; the first 3 rows are like this, with 2 columns. My idea is to store the 3 values of the 2nd column each in its own variable and then use them with the rest of my code.
As I only have a problem extracting the data, and since the way the CSVs are formatted there is no header at all, it is hard to come up with an easy way to read the 2nd column's data. Below is an example. This will of course be used to process several files, so it will be a foreach, but I want to come up first with the simple code for 1 file so I can adapt it to a foreach myself.
$a = Import-Csv 'MIAW-Results-20190627T203644Z.csv'
Write-Host $a[1].col2
This would work if and only if I had a header called col2. I could name it with the first value in the 2nd column, but the issue is that that value will change for each CSV file. So the code I tried would not work, for example, if I were to import several files using:
$InFiles = Get-ChildItem -Path $PSScriptRoot\*.csv -File |
    Where-Object Name -like '*results*'
Each CSV will have a different value for the first value in the 2nd column.
Is there an easier way to just grab the 3 rows of the second column that I need? I need to grab each one and store each in a different variable.
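One possible way to grab those values, sketched under the assumption that the first rows really are two comma-separated columns; the header names col1/col2 are made up so that the second column is always addressable, whatever its first value happens to be:
# Sketch only: assumes the first 8 lines are plain two-column CSV rows.
$file = 'MIAW-Results-20190627T203644Z.csv'

# Read just the first rows and give them a synthetic header.
$firstRows = Get-Content -Path $file -TotalCount 8 | ConvertFrom-Csv -Header 'col1','col2'

$targetName  = $firstRows[0].col2   # e.g. MIAW
$targetOS    = $firstRows[1].col2   # e.g. Windows
$lastUpdated = $firstRows[2].col2   # e.g. June 27 2019 07:35:11
The same lines drop straight into a foreach over the $InFiles collection from the snippet above.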

Trailing rows in datastore with multiple csv files

Matlab 2015b
I have several large (100-300 MB) CSV files that I want to merge into one, filtering out some of the columns. They are shaped like this:
timestamp | variable1 | ... | variable200
01.01.16 00:00:00 | 1.59 | ... | 0.5
01.01.16 00:00:01 | ...
.
.
For this task I am using a datastore that includes all the CSV files:
ds = datastore('file*.csv');
When I read all of the entries and try to write them back to a CSV file using writetable, I get an error saying that the input has to be a cell array.
When looking at the cell array read from the datastore in debug mode, I noticed that there are several rows containing only a timestamp, which are not in the original files. These rows appear between the last row of one file and the first rows of the following one. Their timestamps are the logical continuation of the last timestamp (as you would get by extending the series in Excel).
Is this a bug or intended behaviour?
Can I avoid reading these rows in the first place, or do I have to filter them out afterwards?
Thanks in advance.
As it seems nobody else had this problem, I will share how I dealt with it in the end:
toDelete = strcmp(data.(2), '');
data(toDelete, :) = [];
I took the second column of the table and checked for an empty string. Afterwards I removed all faulty rows by assigning an empty array via logical indexing (as shown in the MATLAB documentation).
Sadly I found no way to prevent loading the faulty data, but in the end the amount of data was not too big to do this processing step in memory.
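For context, a minimal end-to-end sketch of that workflow (the file pattern, column names and output file are assumptions, and it presumes the affected column was read as text, as in the question):
% Sketch only: 'file*.csv' and the variable names are placeholders.
ds = datastore('file*.csv');                              % datastore over all matching CSV files
ds.SelectedVariableNames = {'timestamp', 'variable1'};    % keep only the columns of interest

data = readall(ds);                                       % read everything into one table

% Drop the spurious rows that contain only a timestamp
% (the second selected column is an empty string there, assuming it was read as text).
toDelete = strcmp(data.(2), '');
data(toDelete, :) = [];

writetable(data, 'merged.csv');                           % write the merged, filtered table back out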

Time-series Stock Data in Matlab

I'm a MatLab beginner, and have no idea what I'm doing.
I have stock data in CSV format which is something like this:
+--------+--------+------+------+-----+-------+
| Ticker | Date | Open | High | Low | Close |
+--------+--------+------+------+-----+-------+
| APPL | 25-Oct | 10 | 12 | 9 | 12 |
| XYZ | 25-Oct | 10 | 12 | 9 | 12 |
| APPL | 26-Oct | 12 | 15 | 10 | 15 |
+--------+--------+------+------+-----+-------+
There are many stock tickers each day. The file is many rows long, listing daily stock prices for each ticker on a particular stock exchange.
I'm aiming to do some fun time-series analysis on the 'close' price for each ticker.
To start with, making simple charts of a single ticker over time, or of multiple tickers over time, would be awesome.
Questions:
1. Best way to import data.
I have a big, long CSV but am lost as to which import method is best: column vectors, a numeric matrix, a cell array, or a table?
2. I need to create a time-series object for each ticker, right?
How would one go about that? I've been looking at this guide, but I'm unsure how to make an object for each ticker, over the span of time defined in the file.
http://www.mathworks.com/help/matlab/ref/timeseries-class.html
Any advice, pointers and resources that are good for beginners are appreciated massively!
Thanks!
There are a ton of ways to import data into MATLAB. Before you import data, I would make sure numeric columns hold ONLY numeric data or MATLAB can complain. Some options in my personal order of preference:
d = readtable('mycsvfile.csv'); % puts data in nice table datatype. I find it makes code more readable.
d = csvread('myfile.csv',1,0); % the 1 skips the first row which is probably header names for the csv file. Puts all the data in a matrix and you have to keep track of what column is what.
xlsread is good for reading excel files
Copy and paste the data into a variable in your workspace. Do save blahblah.mat so you can easily load the data later.
I personally wouldn't bother with financial time series objects. It's just going to complicate your life if you're new to MATLAB. If you loaded the data using readtable (i.e. option 1) you can then execute something like:
aapl_indicator = strcmp(d.Ticker, 'AAPL');
to get a vector indicating whether a row in your table is AAPL or not. Then:
close_price_aapl = d.Close(aapl_indicator);
will give you a vector of Apple's closing prices.
When you get down to doing math, you want to be using the matrices.
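To get the simple per-ticker chart the question asks about, a short sketch building on the readtable approach above (the file name is an assumption, and the ticker string should match whatever is actually in the data):
% Sketch only: file name and ticker value are placeholders.
d = readtable('mycsvfile.csv');

idx = strcmp(d.Ticker, 'AAPL');      % rows belonging to one ticker
close_aapl = d.Close(idx);           % that ticker's closing prices

plot(close_aapl, '-o');              % plot against row index; convert d.Date with datetime() to plot against real dates
xlabel('Trading day');
ylabel('Close price');
title('AAPL closing price over time');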

When I import a txt file to VFP with the Wizard, the decimals don't come out right

I have this txt file:
"1","My Product 1","Vegetables","15.20"
"2","My Product 2","soda","9.52"
but when I import it with the wizard in Visual FoxPro 6, my result in the table is:
1 | My Product 1 | Vegetables | 15
2 | My Product 2 | soda | 9
I've used SET DECIMALS TO 2 but it doesn't work. If I export again, the table in txt shows this:
"1","My Product 1","Vegetables","15"
"2","My Product 2","soda","9"
without decimals. So, how can I import decimals correctly into VFP, either with the wizard or with a command?
I don't know the format of your table, but here is something that will work for you. I am creating a temporary cursor as opposed to a permanent table, but a permanent table could do the same thing. You need to pre-define your columns in the same order and with the expected data types. In this case, I set the price as numeric with a maximum length of 10 and 2 decimal places.
CREATE CURSOR C_Import;
( someID c(5),;
someProduct c(30),;
someOtherFld c(20),;
somePrice n(10,2))
Now, if you append the text file as CSV (comma separated values), VFP will recognize the decimal positions during the numeric import.
APPEND FROM YourTextFile.txt TYPE csv
If the default decimal point is ',', you have to issue SET POINT TO '.' before the APPEND command; without that you'll get only an integer value for the price.
Remember to change it back to the original value after the append.
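Putting it together, a short sketch of the import sequence (the restored ',' is an assumption about the original locale setting):
SET POINT TO "."                       && switch the decimal point before importing
APPEND FROM YourTextFile.txt TYPE csv  && somePrice now keeps its decimals
SET POINT TO ","                       && restore the original point character (assumed to be ',')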

KDB/KX appending table to a file without reading the entire file

I'm new to KDB (sorry if this question is dumb). I'm creating the following table:
q)dsPricing:([id:`int$(); date:`date$()] open:`float$();close:`float$();high:`float$();low:`float$();volume:`int$())
q)`dsPricing insert(123;2003.03.23;1.0;3.0;4.0;2.0;1000)
q)`dsPricing insert(123;2003.03.24;1.0;3.0;4.0;2.0;2000)
q)save `:dsPricing
Let's say after saving I exit. After starting q again, I'd like to add another pricing item without loading the entire file, because the file could be large:
q)`dsPricing insert(123;2003.03.25;1.0;3.0;4.0;2.0;1500)
I've been looking at .Q.dpft but I can't really figure it out. Also this table/file doesn't need to be partitioned.
Thanks
You can upsert to the file handle of a table to append on disk; your example would look like this:
`:dsPricing upsert(123;2003.03.25;1.0;3.0;4.0;2.0;1500)
You can load the table into your q session using get, load or \l
q)get `:dsPricing
id date | open close high low volume
--------------| --------------------------
123 2003.03.23| 1 3 4 2 1000
123 2003.03.24| 1 3 4 2 2000
123 2003.03.25| 1 3 4 2 1500
.Q.dpft will save a table splayed (one file for each column in the table and a .d file containing the column names) with a parted attribute (p#) on one of the symbol columns. Any symbol columns will also be enumerated by .Q.en.
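For completeness, a hedged sketch of what a .Q.dpft call looks like for a partitioned save; the directory, partition value and table here are placeholders, and the table must be unkeyed with a symbol column to part on (unlike the keyed dsPricing above):
q)trade:([] sym:`a`b`a; price:1.0 2.0 3.0; size:100 200 300)
q).Q.dpft[`:db; 2003.03.23; `sym; `trade]   / directory handle, partition value, parted column, table name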