How to export data from txt (lines with info sep by ",") to csv, so csv file will have 6 columns even if there are only 4 or 5 values in some lines? - export-to-csv

I am a total beginner, I cannot find an answer, so that's my last resort.
Please help.
I have a txt file with lines of data separated by commas.
Some lines have 6 values, but some of them have 5 or 4.
I am trying to export this data to csv so it has 6 columns and if there are only 4 values in line in txt file, then absent data will be presented in columns as zeros or NaNs.
Txt:
ZEQ-851,Toyota Corolla,63,Air Conditioning,Hybrid,Automatic Transmission
BMC-69,Nissan Micra,42,Manual Transmission
And i need to form it in a way:
Plate, Model, Number, Propertie1, Propertie2, Propertie3.
ZEQ-851,Toyota Corolla,63,Air Conditioning,Hybrid,Automatic Transmission
BMC-69, Nissan Micra, 42, 0, 0,Manual Transmission.
Thank you!
That's what i was trying to do:
import csv
with open('Vehicles.txt', 'r') as in_file:
stripped = (line.strip() for line in in_file)
lines = (line.split(",") for line in stripped if line)
with open('Vehicles.csv', 'w', newline='') as out_file:
writer = csv.writer(out_file)
writer.writerow(('Reg. nr','Model','PRD','Prop1','Prop2','Prop3'))
writer.writerows(lines)
And then:
all_cars=[]
print("The following cars are available:\n")
with open('Vehicles.csv', 'r') as garage:
reader = csv.reader(garage)
for column in reader:
regnum = column[0]
model = column[1]
ppd = column[2]
prop1 = column[3]
prop2 = column[4]
prop3 = column[5]
veh = Vehicle(regnum=regnum,model=model,ppd=ppd,prop1=prop1,prop2=prop2,prop3=prop3)
all_cars.append(veh)
for each in all_cars:
next(reader)
print("* Reg. nr: ",each.regnum,", Model: ",each.model,", Price per day: ",each.ppd,"\nProperties:",each.prop1,each.prop2,each.prop3,".",sep="")

Related

how to read from file and display the data in desired rows in Matlab

I am trying to read from a file and display the data in rows 6, 11, 111 and 127 in Matlab. I could not figure out how to do it. I have been searching Matlab forums and this platform for an answer. I used fscanf, textscan and other functions but they did not work as intended. I also used a for loop but again the output was not what I wanted. I can now only read one row and display it. Simply I want to display all of them(data in rows given above) at the same time. How can I do that?
matlab code
n = [0 :1: 127];
%% Problem 1
figure
x1 = cos(0.17*pi*n)
%it creates file and writes content of x1 to the file
fileID = fopen('file.txt','w');
fprintf(fileID,'%d \n',x1);
fclose(fileID);
%line number can be changed in order to obtain wanted values.
fileID = fopen('file.txt');
line = 6;
C = textscan(fileID,'%s',1,'delimiter','\n', 'headerlines',line-1);
celldisp(C)
fclose(fileID);
and this is the file
1
8.607420e-01
4.817537e-01
-3.141076e-02
-5.358268e-01
-8.910065e-01
-9.980267e-01
-8.270806e-01
-4.257793e-01
9.410831e-02
5.877853e-01
9.177546e-01
9.921147e-01
7.901550e-01
3.681246e-01
-1.564345e-01
-6.374240e-01
-9.408808e-01
-9.822873e-01
-7.501111e-01
-3.090170e-01
2.181432e-01
6.845471e-01
9.602937e-01
9.685832e-01
7.071068e-01
2.486899e-01
-2.789911e-01
-7.289686e-01
-9.759168e-01
-9.510565e-01
-6.613119e-01
-1.873813e-01
3.387379e-01
7.705132e-01
9.876883e-01
9.297765e-01
6.129071e-01
1.253332e-01
-3.971479e-01
-8.090170e-01
-9.955620e-01
-9.048271e-01
-5.620834e-01
-6.279052e-02
4.539905e-01
8.443279e-01
9.995066e-01
8.763067e-01
5.090414e-01
-4.288121e-15
-5.090414e-01
-8.763067e-01
-9.995066e-01
-8.443279e-01
-4.539905e-01
6.279052e-02
5.620834e-01
9.048271e-01
9.955620e-01
8.090170e-01
3.971479e-01
-1.253332e-01
-6.129071e-01
-9.297765e-01
-9.876883e-01
-7.705132e-01
-3.387379e-01
1.873813e-01
6.613119e-01
9.510565e-01
9.759168e-01
7.289686e-01
2.789911e-01
-2.486899e-01
-7.071068e-01
-9.685832e-01
-9.602937e-01
-6.845471e-01
-2.181432e-01
3.090170e-01
7.501111e-01
9.822873e-01
9.408808e-01
6.374240e-01
1.564345e-01
-3.681246e-01
-7.901550e-01
-9.921147e-01
-9.177546e-01
-5.877853e-01
-9.410831e-02
4.257793e-01
8.270806e-01
9.980267e-01
8.910065e-01
5.358268e-01
3.141076e-02
-4.817537e-01
-8.607420e-01
-1
-8.607420e-01
-4.817537e-01
3.141076e-02
5.358268e-01
8.910065e-01
9.980267e-01
8.270806e-01
4.257793e-01
-9.410831e-02
-5.877853e-01
-9.177546e-01
-9.921147e-01
-7.901550e-01
-3.681246e-01
1.564345e-01
6.374240e-01
9.408808e-01
9.822873e-01
7.501111e-01
3.090170e-01
-2.181432e-01
-6.845471e-01
-9.602937e-01
-9.685832e-01
-7.071068e-01
-2.486899e-01
2.789911e-01
Assuming the file is not exceedingly large, the simplest way would probably be read the entire file & index the output to your desired lines.
line = [6 11 111 127];
fileID = fopen('file.txt');
C = textscan(fileID,'%s','delimiter','\n');
fclose(fileID);
disp(C{1}(line))

Keras: What is the correct data format for recurrent networks?

I am trying to build a recurrent network which classifies sequences (multidimensional data streams). I must be missing something, since while running my code:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Activation
import numpy as np
ils = 10 # input layer size
ilt = 11 # input layer time steps
hls = 12 # hidden layer size
nhl = 2 # number of hidden layers
ols = 1 # output layer size
p = 0.2 # dropout probability
f_a = 'relu' # activation function
opt = 'rmsprop' # optimizing function
#
# Building the model
#
model = Sequential()
# The input layer
model.add(LSTM(hls, input_shape=(ilt, ils), return_sequences=True))
model.add(Activation(f_a))
model.add(Dropout(p))
# Hidden layers
for i in range(nhl - 1):
model.add(LSTM(hls, return_sequences=True))
model.add(Activation(f_a))
model.add(Dropout(p))
# Output layer
model.add(LSTM(ols, return_sequences=False))
model.add(Activation('softmax'))
model.compile(optimizer=opt, loss='binary_crossentropy')
#
# Making test data and fitting the model
#
m_train, n_class = 1000, 2
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
model.fit(data, labels, nb_epoch=10, batch_size=32)
I get output (truncated):
Using Theano backend.
line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 430, in make_node
new_inputs.append(format(outer_seq, as_var=inner_seq))
File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 422, in format
rval = tmp.filter_variable(rval)
File "/home/koala/.local/lib/python2.7/site-packages/theano/tensor/type.py", line 233, in filter_variable
self=self))
TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable Subtensor{:int64:}.0) into Type TensorType(float32, (False, False, True)). You can try to manually convert Subtensor{:int64:}.0 into a TensorType(float32, (False, False, True)).
Is this a problem with the data format at all.
For me the problem was fixed when I went and tried it on my real dataset. The difference being that in the real dataset I have more than 1 label. So an example of dataset on which this code works is:
(...)
ols = 2 # Output layer size
(...)
m_train, n_class = 1000, ols
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
# Make labels onehot
onehot_labels = np.zeros(shape=(labels.shape[0], ols))
onehot_labels[np.arange(labels.shape[0]), labels.astype(np.int)] = 1

Matlab - read unstructured file

I'm quite new with Matlab and I've been searching, unsucessfully, for the following issue: I have an unstructure txt file, with several rows I don't need, but there are a number of rows inside that file that have an structured format. I've been researching how to "load" the file to edit it, but cannot find anything.
Since i don't know if I was clear, let me show you the content in the file:
8782 PROJCS["UTM-39",GEOGC.......
1 676135.67755473056 2673731.9365976951 -15 0
2 663999.99999999302 2717629.9999999981 -14.00231124135486 3
3 709999.99999999162 2707679.2185399458 -10 2
4 679972.20003752434 2674637.5679516452 0.070000000000000007 1
5 676124.87132483651 2674327.3183533219 -18.94794942571912 0
6 682614.20527054626 2671000.0000000549 -1.6383425512446661 0
...........
8780 682247.4593014461 2676571.1515358146 0.1541080392180566 0
8781 695426.98657108378 2698111.6168302582 -8.5039945992245904 0
8782 674723.80100125563 2675133.5486935056 -19.920312922947179 0
16997 3 21
1 2147 658 590
2 1855 2529 5623
.........
I'd appreciate if someone can just tell me if there is the possibility to open the file to later load only the rows starting with 1 to the one starting with 8782. First row and all the others are not important.
I know than manually copy and paste to a new file would be a solution, but I'd like to know about the possibility to read the file and edit it for other ideas I have.
Thanks!
% Now lines{i} is the string of the i'th line.
lines = strsplit(fileread('filename'), '\n')
% Now elements{i}{j} is the j'th field of the i'th line.
elements = arrayfun(#(x){strsplit(x{1}, ' ')}, lines)
% Remove the first row:
elements(1) = []
% Take the first several rows:
n_rows = 8782
elements = elements(1:n_rows)
Or if the number of rows you need to take is not fixed, you can replace the last two statements above by:
firsts = arrayfun(#(x)str2num(x{1}{1}), elements)
n_rows = find((firsts(2:end) - firsts(1:end-1)) ~= 1, 1, 'first')
elements = elements(1:n_rows)

Changing format of many files in Excel

I have a folder filled with thousands of csv files. When I open one file, the data looks like:
20110503 01:46.0 1527.8 1 E
20110503 01:46.0 1537.8 1 E
20110504 37:40.0 1536.6 1 E
20110504 37:40.0 1533.6 1 E
20110504 36:17.0 1531.1 1 E
The second column(time) has minutes and seconds before the decimal point. If I select the second column, right click and click format cells, select time, and change to 13:30:55 mode, the same data looks like:
20110503 19:01:46 1527.8 1 E
20110503 19:01:46 1537.8 1 E
20110504 0:37:40 1536.6 1 E
20110504 0:37:40 1533.6 1 E
20110504 8:36:17 1531.1 1 E
Now I can see hours, minutes and seconds. I have written a matlab function that reads these files, but needs to be able to read the hours. The function can only be used after I change the format to display the hours. Now I have to apply the function to all the files in the folder.
I'm wondering, is there a way to change the default time display so hours are included? If not, is there a way of writing a script to change the format of these files? Thanks!
Note: the part of my matlab function that reads the file looks like:
fid = fopen('E:\Tick Data\Data Output\NGU13.csv','rt');
c = fscanf(fid, '%d,%d:%d:%d,%f,%d,%*c');
datamat = reshape(c,6,length(c)/6)'; % reshape into matrix
yyyymmdd = datamat(:,1);
hr = datamat(:,2);
mn = datamat(:,3);
sec = datamat(:,4);
pp = datamat(:,5); % price
vv = datamat(:,6); % volume
In Excel:
In Notepad, you can see hours, minutes, seconds, and milliseconds:
20111206,09:50:56.411,4.320,1,E
20111206,10:02:10.167,4.300,1,E
20111206,11:24:09.052,4.313,1,E
20111206,11:46:09.359,4.307,1,E
20111206,11:50:22.785,4.320,1,E
For a record of the type
20010402, 09:30:24.456, 4.235, 1, E
you should use this fmt:
fmt = '%f%f:%f:%f.%f%f%*s';
data = textscan(fid, fmt, 'Delimiter',',','CollectOutput',true);

How to add meta_data to Pandas dataframe?

I use Pandas dataframe heavily. And need to attach some data to the dataframe, for example to record the birth time of the dataframe, the additional description of the dataframe etc.
I just can't find reserved fields of dataframe class to keep the data.
So I change the core\frame.py file to add a line _reserved_slot = {} to solve my issue. I post the question here is just want to know is it OK to do so ? Or is there better way to attach meta-data to dataframe/column/row etc?
#----------------------------------------------------------------------
# DataFrame class
class DataFrame(NDFrame):
_auto_consolidate = True
_verbose_info = True
_het_axis = 1
_col_klass = Series
_AXIS_NUMBERS = {
'index': 0,
'columns': 1
}
_reserved_slot = {} # Add by bigbug to keep extra data for dataframe
_AXIS_NAMES = dict((v, k) for k, v in _AXIS_NUMBERS.iteritems())
EDIT : (Add demo msg for witingkuo's way)
>>> df = pd.DataFrame(np.random.randn(10,5), columns=list('ABCDEFGHIJKLMN')[0:5])
>>> df
A B C D E
0 0.5890 -0.7683 -1.9752 0.7745 0.8019
1 1.1835 0.0873 0.3492 0.7749 1.1318
2 0.7476 0.4116 0.3427 -0.1355 1.8557
3 1.2738 0.7225 -0.8639 -0.7190 -0.2598
4 -0.3644 -0.4676 0.0837 0.1685 0.8199
5 0.4621 -0.2965 0.7061 -1.3920 0.6838
6 -0.4135 -0.4991 0.7277 -0.6099 1.8606
7 -1.0804 -0.3456 0.8979 0.3319 -1.1907
8 -0.3892 1.2319 -0.4735 0.8516 1.2431
9 -1.0527 0.9307 0.2740 -0.6909 0.4924
>>> df._test = 'hello'
>>> df2 = df.shift(1)
>>> print df2._test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\lib\site-packages\pandas\core\frame.py", line 2051, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute '_test'
>>>
This is not supported right now. See https://github.com/pydata/pandas/issues/2485. The reason is the propogation of these attributes is non-trivial. You can certainly assign data, but almost all pandas operations return a new object, where the assigned data will be lost.
Your _reserved_slot will become a class variable. That might not work if you want to assign different value to different DataFrame. Probably you can assign what you want to the instance directly.
In [6]: import pandas as pd
In [7]: df = pd.DataFrame()
In [8]: df._test = 'hello'
In [9]: df._test
Out[9]: 'hello'
I think a decent workaround is putting your datafame into a dictionary with your metadata as other keys. So if you have a dataframe with cashflows, like:
df = pd.DataFrame({'Amount': [-20, 15, 25, 30, 100]},index=pd.date_range(start='1/1/2018', periods=5))
You can create your dictionary with additional metadata and put the dataframe there
out = {'metadata': {'Name': 'Whatever', 'Account': 'Something else'}, 'df': df}
and then use it as out[df]