How to export data from txt (lines with info sep by ",") to csv, so csv file will have 6 columns even if there are only 4 or 5 values in some lines?

How to export data from txt (lines with info sep by ",") to csv, so csv file will have 6 columns even if there are only 4 or 5 values in some lines? - export-to-csv

I am a total beginner, I cannot find an answer, so that's my last resort.
Please help.
I have a txt file with lines of data separated by commas.
Some lines have 6 values, but some of them have 5 or 4.
I am trying to export this data to csv so it has 6 columns and if there are only 4 values in line in txt file, then absent data will be presented in columns as zeros or NaNs.
Txt:
ZEQ-851,Toyota Corolla,63,Air Conditioning,Hybrid,Automatic Transmission
BMC-69,Nissan Micra,42,Manual Transmission
And i need to form it in a way:
Plate, Model, Number, Propertie1, Propertie2, Propertie3.
ZEQ-851,Toyota Corolla,63,Air Conditioning,Hybrid,Automatic Transmission
BMC-69, Nissan Micra, 42, 0, 0,Manual Transmission.
Thank you!
That's what i was trying to do:
import csv
with open('Vehicles.txt', 'r') as in_file:
stripped = (line.strip() for line in in_file)
lines = (line.split(",") for line in stripped if line)
with open('Vehicles.csv', 'w', newline='') as out_file:
writer = csv.writer(out_file)
writer.writerow(('Reg. nr','Model','PRD','Prop1','Prop2','Prop3'))
writer.writerows(lines)
And then:
all_cars=[]
print("The following cars are available:\n")
with open('Vehicles.csv', 'r') as garage:
reader = csv.reader(garage)
for column in reader:
regnum = column[0]
model = column[1]
ppd = column[2]
prop1 = column[3]
prop2 = column[4]
prop3 = column[5]
veh = Vehicle(regnum=regnum,model=model,ppd=ppd,prop1=prop1,prop2=prop2,prop3=prop3)
all_cars.append(veh)
for each in all_cars:
next(reader)
print("* Reg. nr: ",each.regnum,", Model: ",each.model,", Price per day: ",each.ppd,"\nProperties:",each.prop1,each.prop2,each.prop3,".",sep="")

Related

how to read from file and display the data in desired rows in Matlab

I am trying to read from a file and display the data in rows 6, 11, 111 and 127 in Matlab. I could not figure out how to do it. I have been searching Matlab forums and this platform for an answer. I used fscanf, textscan and other functions but they did not work as intended. I also used a for loop but again the output was not what I wanted. I can now only read one row and display it. Simply I want to display all of them(data in rows given above) at the same time. How can I do that?
matlab code
n = [0 :1: 127];
%% Problem 1
figure
x1 = cos(0.17*pi*n)
%it creates file and writes content of x1 to the file
fileID = fopen('file.txt','w');
fprintf(fileID,'%d \n',x1);
fclose(fileID);
%line number can be changed in order to obtain wanted values.
fileID = fopen('file.txt');
line = 6;
C = textscan(fileID,'%s',1,'delimiter','\n', 'headerlines',line-1);
celldisp(C)
fclose(fileID);
and this is the file
1
8.607420e-01
4.817537e-01
-3.141076e-02
-5.358268e-01
-8.910065e-01
-9.980267e-01
-8.270806e-01
-4.257793e-01
9.410831e-02
5.877853e-01
9.177546e-01
9.921147e-01
7.901550e-01
3.681246e-01
-1.564345e-01
-6.374240e-01
-9.408808e-01
-9.822873e-01
-7.501111e-01
-3.090170e-01
2.181432e-01
6.845471e-01
9.602937e-01
9.685832e-01
7.071068e-01
2.486899e-01
-2.789911e-01
-7.289686e-01
-9.759168e-01
-9.510565e-01
-6.613119e-01
-1.873813e-01
3.387379e-01
7.705132e-01
9.876883e-01
9.297765e-01
6.129071e-01
1.253332e-01
-3.971479e-01
-8.090170e-01
-9.955620e-01
-9.048271e-01
-5.620834e-01
-6.279052e-02
4.539905e-01
8.443279e-01
9.995066e-01
8.763067e-01
5.090414e-01
-4.288121e-15
-5.090414e-01
-8.763067e-01
-9.995066e-01
-8.443279e-01
-4.539905e-01
6.279052e-02
5.620834e-01
9.048271e-01
9.955620e-01
8.090170e-01
3.971479e-01
-1.253332e-01
-6.129071e-01
-9.297765e-01
-9.876883e-01
-7.705132e-01
-3.387379e-01
1.873813e-01
6.613119e-01
9.510565e-01
9.759168e-01
7.289686e-01
2.789911e-01
-2.486899e-01
-7.071068e-01
-9.685832e-01
-9.602937e-01
-6.845471e-01
-2.181432e-01
3.090170e-01
7.501111e-01
9.822873e-01
9.408808e-01
6.374240e-01
1.564345e-01
-3.681246e-01
-7.901550e-01
-9.921147e-01
-9.177546e-01
-5.877853e-01
-9.410831e-02
4.257793e-01
8.270806e-01
9.980267e-01
8.910065e-01
5.358268e-01
3.141076e-02
-4.817537e-01
-8.607420e-01
-1
-8.607420e-01
-4.817537e-01
3.141076e-02
5.358268e-01
8.910065e-01
9.980267e-01
8.270806e-01
4.257793e-01
-9.410831e-02
-5.877853e-01
-9.177546e-01
-9.921147e-01
-7.901550e-01
-3.681246e-01
1.564345e-01
6.374240e-01
9.408808e-01
9.822873e-01
7.501111e-01
3.090170e-01
-2.181432e-01
-6.845471e-01
-9.602937e-01
-9.685832e-01
-7.071068e-01
-2.486899e-01
2.789911e-01

Assuming the file is not exceedingly large, the simplest way would probably be read the entire file & index the output to your desired lines.
line = [6 11 111 127];
fileID = fopen('file.txt');
C = textscan(fileID,'%s','delimiter','\n');
fclose(fileID);
disp(C{1}(line))

Keras: What is the correct data format for recurrent networks?

I am trying to build a recurrent network which classifies sequences (multidimensional data streams). I must be missing something, since while running my code:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Activation
import numpy as np
ils = 10 # input layer size
ilt = 11 # input layer time steps
hls = 12 # hidden layer size
nhl = 2 # number of hidden layers
ols = 1 # output layer size
p = 0.2 # dropout probability
f_a = 'relu' # activation function
opt = 'rmsprop' # optimizing function
#
# Building the model
#
model = Sequential()
# The input layer
model.add(LSTM(hls, input_shape=(ilt, ils), return_sequences=True))
model.add(Activation(f_a))
model.add(Dropout(p))
# Hidden layers
for i in range(nhl - 1):
model.add(LSTM(hls, return_sequences=True))
model.add(Activation(f_a))
model.add(Dropout(p))
# Output layer
model.add(LSTM(ols, return_sequences=False))
model.add(Activation('softmax'))
model.compile(optimizer=opt, loss='binary_crossentropy')
#
# Making test data and fitting the model
#
m_train, n_class = 1000, 2
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
model.fit(data, labels, nb_epoch=10, batch_size=32)
I get output (truncated):
Using Theano backend.
line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 430, in make_node
new_inputs.append(format(outer_seq, as_var=inner_seq))
File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 422, in format
rval = tmp.filter_variable(rval)
File "/home/koala/.local/lib/python2.7/site-packages/theano/tensor/type.py", line 233, in filter_variable
self=self))
TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable Subtensor{:int64:}.0) into Type TensorType(float32, (False, False, True)). You can try to manually convert Subtensor{:int64:}.0 into a TensorType(float32, (False, False, True)).
Is this a problem with the data format at all.

For me the problem was fixed when I went and tried it on my real dataset. The difference being that in the real dataset I have more than 1 label. So an example of dataset on which this code works is:
(...)
ols = 2 # Output layer size
(...)
m_train, n_class = 1000, ols
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
# Make labels onehot
onehot_labels = np.zeros(shape=(labels.shape[0], ols))
onehot_labels[np.arange(labels.shape[0]), labels.astype(np.int)] = 1

Matlab - read unstructured file

I'm quite new with Matlab and I've been searching, unsucessfully, for the following issue: I have an unstructure txt file, with several rows I don't need, but there are a number of rows inside that file that have an structured format. I've been researching how to "load" the file to edit it, but cannot find anything.
Since i don't know if I was clear, let me show you the content in the file:
8782 PROJCS["UTM-39",GEOGC.......
1 676135.67755473056 2673731.9365976951 -15 0
2 663999.99999999302 2717629.9999999981 -14.00231124135486 3
3 709999.99999999162 2707679.2185399458 -10 2
4 679972.20003752434 2674637.5679516452 0.070000000000000007 1
5 676124.87132483651 2674327.3183533219 -18.94794942571912 0
6 682614.20527054626 2671000.0000000549 -1.6383425512446661 0
...........
8780 682247.4593014461 2676571.1515358146 0.1541080392180566 0
8781 695426.98657108378 2698111.6168302582 -8.5039945992245904 0
8782 674723.80100125563 2675133.5486935056 -19.920312922947179 0
16997 3 21
1 2147 658 590
2 1855 2529 5623
.........
I'd appreciate if someone can just tell me if there is the possibility to open the file to later load only the rows starting with 1 to the one starting with 8782. First row and all the others are not important.
I know than manually copy and paste to a new file would be a solution, but I'd like to know about the possibility to read the file and edit it for other ideas I have.
Thanks!

% Now lines{i} is the string of the i'th line.
lines = strsplit(fileread('filename'), '\n')
% Now elements{i}{j} is the j'th field of the i'th line.
elements = arrayfun(#(x){strsplit(x{1}, ' ')}, lines)
% Remove the first row:
elements(1) = []
% Take the first several rows:
n_rows = 8782
elements = elements(1:n_rows)
Or if the number of rows you need to take is not fixed, you can replace the last two statements above by:
firsts = arrayfun(#(x)str2num(x{1}{1}), elements)
n_rows = find((firsts(2:end) - firsts(1:end-1)) ~= 1, 1, 'first')
elements = elements(1:n_rows)

Changing format of many files in Excel

I have a folder filled with thousands of csv files. When I open one file, the data looks like:
20110503 01:46.0 1527.8 1 E
20110503 01:46.0 1537.8 1 E
20110504 37:40.0 1536.6 1 E
20110504 37:40.0 1533.6 1 E
20110504 36:17.0 1531.1 1 E
The second column(time) has minutes and seconds before the decimal point. If I select the second column, right click and click format cells, select time, and change to 13:30:55 mode, the same data looks like:
20110503 19:01:46 1527.8 1 E
20110503 19:01:46 1537.8 1 E
20110504 0:37:40 1536.6 1 E
20110504 0:37:40 1533.6 1 E
20110504 8:36:17 1531.1 1 E
Now I can see hours, minutes and seconds. I have written a matlab function that reads these files, but needs to be able to read the hours. The function can only be used after I change the format to display the hours. Now I have to apply the function to all the files in the folder.
I'm wondering, is there a way to change the default time display so hours are included? If not, is there a way of writing a script to change the format of these files? Thanks!
Note: the part of my matlab function that reads the file looks like:
fid = fopen('E:\Tick Data\Data Output\NGU13.csv','rt');
c = fscanf(fid, '%d,%d:%d:%d,%f,%d,%*c');
datamat = reshape(c,6,length(c)/6)'; % reshape into matrix
yyyymmdd = datamat(:,1);
hr = datamat(:,2);
mn = datamat(:,3);
sec = datamat(:,4);
pp = datamat(:,5); % price
vv = datamat(:,6); % volume
In Excel:
In Notepad, you can see hours, minutes, seconds, and milliseconds:
20111206,09:50:56.411,4.320,1,E
20111206,10:02:10.167,4.300,1,E
20111206,11:24:09.052,4.313,1,E
20111206,11:46:09.359,4.307,1,E
20111206,11:50:22.785,4.320,1,E

For a record of the type
20010402, 09:30:24.456, 4.235, 1, E
you should use this fmt:
fmt = '%f%f:%f:%f.%f%f%*s';
data = textscan(fid, fmt, 'Delimiter',',','CollectOutput',true);

How to add meta_data to Pandas dataframe?

I use Pandas dataframe heavily. And need to attach some data to the dataframe, for example to record the birth time of the dataframe, the additional description of the dataframe etc.
I just can't find reserved fields of dataframe class to keep the data.
So I change the core\frame.py file to add a line _reserved_slot = {} to solve my issue. I post the question here is just want to know is it OK to do so ? Or is there better way to attach meta-data to dataframe/column/row etc?
#----------------------------------------------------------------------
# DataFrame class
class DataFrame(NDFrame):
_auto_consolidate = True
_verbose_info = True
_het_axis = 1
_col_klass = Series
_AXIS_NUMBERS = {
'index': 0,
'columns': 1
}
_reserved_slot = {} # Add by bigbug to keep extra data for dataframe
_AXIS_NAMES = dict((v, k) for k, v in _AXIS_NUMBERS.iteritems())
EDIT : (Add demo msg for witingkuo's way)
>>> df = pd.DataFrame(np.random.randn(10,5), columns=list('ABCDEFGHIJKLMN')[0:5])
>>> df
A B C D E
0 0.5890 -0.7683 -1.9752 0.7745 0.8019
1 1.1835 0.0873 0.3492 0.7749 1.1318
2 0.7476 0.4116 0.3427 -0.1355 1.8557
3 1.2738 0.7225 -0.8639 -0.7190 -0.2598
4 -0.3644 -0.4676 0.0837 0.1685 0.8199
5 0.4621 -0.2965 0.7061 -1.3920 0.6838
6 -0.4135 -0.4991 0.7277 -0.6099 1.8606
7 -1.0804 -0.3456 0.8979 0.3319 -1.1907
8 -0.3892 1.2319 -0.4735 0.8516 1.2431
9 -1.0527 0.9307 0.2740 -0.6909 0.4924
>>> df._test = 'hello'
>>> df2 = df.shift(1)
>>> print df2._test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\lib\site-packages\pandas\core\frame.py", line 2051, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute '_test'
>>>

This is not supported right now. See https://github.com/pydata/pandas/issues/2485. The reason is the propogation of these attributes is non-trivial. You can certainly assign data, but almost all pandas operations return a new object, where the assigned data will be lost.

Your _reserved_slot will become a class variable. That might not work if you want to assign different value to different DataFrame. Probably you can assign what you want to the instance directly.
In [6]: import pandas as pd
In [7]: df = pd.DataFrame()
In [8]: df._test = 'hello'
In [9]: df._test
Out[9]: 'hello'

I think a decent workaround is putting your datafame into a dictionary with your metadata as other keys. So if you have a dataframe with cashflows, like:
df = pd.DataFrame({'Amount': [-20, 15, 25, 30, 100]},index=pd.date_range(start='1/1/2018', periods=5))
You can create your dictionary with additional metadata and put the dataframe there
out = {'metadata': {'Name': 'Whatever', 'Account': 'Something else'}, 'df': df}
and then use it as out[df]

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How to export data from txt (lines with info sep by ",") to csv, so csv file will have 6 columns even if there are only 4 or 5 values in some lines? - export-to-csv

Related

how to read from file and display the data in desired rows in Matlab

Keras: What is the correct data format for recurrent networks?

Matlab - read unstructured file

Changing format of many files in Excel

How to add meta_data to Pandas dataframe?

Categories

Resources