I have the following data in a csv file:
00:1A:1E:35:81:01, -36, -37, -36
00:1A:1E:35:9D:61, -69, -69, -69
00:1A:1E:35:7E:C1, -95, -95, -71
00:1A:1E:35:9D:65, -66, -67, -67
00:1A:1E:35:9D:60, -67, -68, -68
00:1A:1E:35:9D:63, -66, -68, -68
I am unable to read the first column, which contains strings, with MATLAB.
You can use
[num, txt, raw] = xlsread('file.csv');
instead of csvread. Here num contains the numeric cells as doubles (NaN where a cell could not be parsed as a number), txt contains the text cells ('' where a cell was numeric), and raw contains every cell unprocessed, whether numeric or text.
I need to blank out amounts that consist only of the digit 9. If every digit is 9, the amount should be converted to ''; if 9 is only part of the number, it should be left unchanged. For example, 9, 99, 99.99, 9.999 and 999.9 should be converted to '', but 90, 119, 291, 889, 100.99 and 999.11 should not. CONVERT() alone is not working, so I tried COUNT(AMT, 9) = LEN(AMT). I think this won't work either, because LEN() also counts the decimal point: COUNT(9.99, 9) would be 3 but LEN(9.99) would be 4.
My current code in DataStage 11.7 is IF CONVERT('9', '', AMT) = '' THEN 0 ELSE AMT
Please help me with a solution.
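How about If Len(Convert("9.","",AMT)) = 0 Then "" Else AMT
Including the decimal point in the list of characters to remove means that 9.99 also reduces to an empty string, which removing only the 9s would not achieve.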
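The rule itself can be checked outside DataStage. The following is a minimal Python sketch, not DataStage code; the function name blank_if_all_nines is hypothetical, and it assumes the amount arrives as a string with at most a decimal point as a non-digit character.

```python
def blank_if_all_nines(amt: str) -> str:
    """Return '' when every digit of amt is 9, otherwise return amt unchanged."""
    # Drop the decimal point so that only the digits are compared.
    digits = amt.replace(".", "")
    # A non-empty string whose only distinct character is '9' is "all nines".
    if digits and set(digits) == {"9"}:
        return ""
    return amt


for value in ["9", "99.99", "9.999", "90", "100.99", "999.11"]:
    print(repr(value), "->", repr(blank_if_all_nines(value)))
```

The same idea (strip the 9s and the decimal point, then test for an empty result) is what the Convert/Len expression does.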
I have the following text file.
BREST:
Rennes 244
RENNES:
Breast 244
Caen 176
Nantes 107
Paris 348
CAEN:
Calais 120
Paris 241
Rennes 176
CALAIS:
Caen 120
Nancy 534
Paris 297
I am trying to convert this to a dictionary with the capitalized words as the keys. It should look like this:
roads = {'BREST': ['Rennes'],
'RENNES': ['Brest', 'Caen', 'Nantes', 'Paris'],
'CAEN': ['Calais', 'Paris', 'Rennes'],
'CALAIS': ['Caen', 'Nancy', 'Paris']
}
Assuming that you are reading from a file called input.txt, this produces the desired result.
from collections import defaultdict
d = defaultdict(list)
with open('input.txt', 'r') as f:
    for line in f.read().splitlines():
        if not line: continue
        if line.endswith(':'):
            name = line.strip(':')
        else:
            d[name].append(line.split()[0])
If you want to keep the numbers, you can create a dictionary for each entry in the file and store each contact with its associated number.
from collections import defaultdict
d = defaultdict(dict)
with open('input.txt', 'r') as f:
    for line in f.read().splitlines():
        if not line: continue
        if line.endswith(':'):
            name = line.strip(':')
        else:
            contact, number = line.split(' ')
            d[name][contact] = int(number)
Which produces the following dictionary.
{'BREST': {'Rennes': 244},
'CAEN': {'Calais': 120, 'Paris': 241, 'Rennes': 176},
'CALAIS': {'Caen': 120, 'Nancy': 534, 'Paris': 297},
'RENNES': {'Breast': 244, 'Caen': 176, 'Nantes': 107, 'Paris': 348}}
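For experimenting without a file on disk, the same parsing logic can be wrapped in a function that accepts any iterable of lines. This is only a sketch: the name parse_roads is hypothetical, and it assumes every entry line ends with an integer and that a "NAME:" header precedes the first entry.

```python
from collections import defaultdict


def parse_roads(lines):
    """Build {CITY: {neighbour: distance}} from 'CITY:' headers and 'Name 123' entries."""
    roads = defaultdict(dict)
    city = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            city = line[:-1]  # header line: remember the current key
        else:
            # Split on the last run of whitespace so extra spaces are tolerated.
            neighbour, distance = line.rsplit(None, 1)
            roads[city][neighbour] = int(distance)
    return dict(roads)


sample = ["BREST:", "Rennes 244", "CAEN:", "Calais 120", "Paris 241", "Rennes 176"]
roads = parse_roads(sample)
print(roads["CAEN"]["Paris"])
```

Passing an open file object instead of the sample list works the same way, since iterating over a file yields its lines.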
How can I use textscan to read this text file in MATLAB/Octave?
Time:
11:00
Day:
2019-11-05
Company:
Hyperdrones
Drones:
Jupiter, alvalade, 20, 2000, 500.0, 20.0, 2019-11-05, 10:15
Terra, ameixoeira, 15, 1500, 400.0, 20.0, 2019-11-05, 10:20
V125, ameixoeira, 20, 2000, 350.0, 20.0, 2019-11-05, 10:20
Saturno, lumiar, 10, 1000, 600.0, 20.0, 2019-11-05, 10:30
Neptuno, lumiar, 15, 1500, 600.0, 15.0, 2019-11-05, 10:30
Mercurio, alvalade, 25, 2500, 200.0, 20.0, 2019-11-05, 10:40
Marte, campogrande, 10, 1500, 100.0, 10.0, 2019-11-05, 10:50
Regarding the Octave version specifically, I would recommend something like this:
pkg load io % required for `csv2cell` function
tmp = fileread('data'); % read file as string
tmp = strsplit( tmp, '\n' ); % split on line endings to create rows
tmp = tmp(1:6); % keep only rows 1 to 6
Headers = struct( tmp{:} ); % create struct from comma-separated-list
Headers.Drones = csv2cell('data', 7) % use csv2cell to read csv starting from row 7
Result:
Headers =
scalar structure containing the fields:
Time:: 1x5 sq_string
Day:: 1x10 sq_string
Company:: 1x11 sq_string
Drones: 8x8 cell
MATLAB does not come with an equivalent csv2cell function, but there are similar ones on the MATLAB File Exchange; here's one that seems to have similar syntax to the Octave version.
In general I'd use csv2cell for non-numeric csv data of this nature; it's much easier to work with than textscan. I'd only use textscan as a last resort for non-csv files with lines that are unusual but otherwise consistent in some way...
PS. If your csv file ends with an empty line, csv2cell might result in an extra 'empty' cell row. Remove this manually if you don't want it.
Assuming you want to read the matrix portion:
C = textscan(fileID, '%s %s %d %d %f %f %{yyyy-dd-MM}D %{HH:mm}D','HeaderLines',7,'Delimiter',',')
The 7 assumes there aren't really blank lines in your file. If you have blank lines, adjust that to 14 or whatever is appropriate.
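Both answers rely on the same structure: label/value pairs on alternating lines, then a "Drones:" marker, then comma-separated rows. As a language-neutral illustration of that split, here is a small Python sketch (not MATLAB or Octave code). The function name parse_drone_file is hypothetical, and it assumes the header section consists of complete label/value pairs followed by a "Drones:" line.

```python
import csv


def parse_drone_file(lines):
    """Split the file into a header dict and a list of CSV rows (all values kept as strings)."""
    # Drop blank lines so the label/value pairs are adjacent.
    lines = [ln.strip() for ln in lines if ln.strip()]
    headers = {}
    i = 0
    # Header section: a 'Label:' line followed by its value line.
    while i < len(lines) and lines[i] != "Drones:":
        headers[lines[i].rstrip(":")] = lines[i + 1]
        i += 2
    # Everything after the 'Drones:' marker is comma-separated data.
    rows = list(csv.reader(lines[i + 1:], skipinitialspace=True))
    return headers, rows


sample = [
    "Time:", "11:00", "Day:", "2019-11-05", "Company:", "Hyperdrones", "Drones:",
    "Jupiter, alvalade, 20, 2000, 500.0, 20.0, 2019-11-05, 10:15",
    "Terra, ameixoeira, 15, 1500, 400.0, 20.0, 2019-11-05, 10:20",
]
headers, rows = parse_drone_file(sample)
print(headers)
print(rows[0])
```

Because blank lines are dropped first, the sketch behaves the same whether or not the file really contains them, which is the case the HeaderLines count has to be adjusted for.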
I am trying to import data from a text file named xMat.txt, which has the following format:
200 space-separated elements per line, and 767 lines.
This is how xMat.txt looks.
386.0 386.0 388.0 394.0 402.0 413.0 ... .0 800.0 799.0 796
801.0 799.0 799.0 802.0 802.0 80 ... 399.0 397.0 394.0 391
.
.
.
This is my file - for reference.
When I try to read the file using
file = fopen('xMat.txt','r')
c = textscan(file,'%f');
I get the output as:
> c = { [1,1] =
> 386
> 386
> 388
> 394
> 402
> 413
> 427
> 442
> 458
> 473
> 487
> 499
> 509
> 517
> 524 ... in column format
What I need is a matrix of size (767X200). How can I do this?
I wouldn't use textscan in this case because your text file is purely numeric. It contains 767 rows of 200 numbers each, with the numbers delimited by spaces. That is exactly the kind of file dlmread (MATLAB doc, Octave doc) is meant for, and it can read it in one go:
c = dlmread('xMat.txt');
c will be a 767 x 200 array containing the data stored in the text file xMat.txt. You can drop textscan here: all you really need is to read the numeric data into Octave, and dlmread does that directly.
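The single long column in the question arises because the values are read in file order, one after another, with the line breaks ignored. Cutting that flat sequence into consecutive chunks of 200 recovers the rows. The Python sketch below illustrates this idea on a tiny example; it is not Octave code, and the name to_rows is hypothetical.

```python
def to_rows(flat, ncols):
    """Cut a flat list of values, stored in file order, into rows of ncols values each."""
    if len(flat) % ncols != 0:
        raise ValueError("number of values is not a multiple of the row length")
    return [flat[i:i + ncols] for i in range(0, len(flat), ncols)]


# Tiny stand-in for the real data: 2 lines of 3 numbers instead of 767 lines of 200.
text = "386.0 386.0 388.0\n801.0 799.0 799.0\n"
flat = [float(tok) for tok in text.split()]  # what a plain '%f' read yields
matrix = to_rows(flat, 3)
print(len(matrix), len(matrix[0]))
```

With the real file, the row length would be 200 and the result would have 767 rows.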
I have unicode data as read from this file:
Mdt,Doccompra,OrgC,Cen,NumP,Criadopor,Dtcriacao,Fornecedor,P,Fun
400,8751215432,2581,,1,MIGRAÇÃO,01.10.2004,75852214,,TD
400,5464282154,9874,,1,MIGRAÇÃO,01.10.2004,78995411,,FO
I have two problems:
When I try to query this unicode data I get a UnicodeDecodeError:
Traceback (most recent call last):
File "<ipython-input-1-4423dceb2b1d>", line 1, in <module>
runfile('C:/Users/u5en/Documents/SAP/Programação/Problema HDF.py', wdir='C:/Users/u5en/Documents/SAP/Programação')
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 48, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/u5en/Documents/SAP/Programação/Problema HDF.py", line 15, in <module>
store.select("EKKA", "columns=['Mdt', 'Fornecedor']")
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 665, in select
return it.get_result()
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 1359, in get_result
results = self.func(self.start, self.stop, where)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 658, in func
columns=columns, **kwargs)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3968, in read
if not self.read_axes(where=where, **kwargs):
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3201, in read_axes
a.convert(values, nan_rep=self.nan_rep, encoding=self.encoding)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 2058, in convert
self.data, nan_rep=nan_rep, encoding=encoding)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 4359, in _unconvert_string_array
data = f(data)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 1700, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 1769, in _vectorize_call
outputs = ufunc(*inputs)
File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 4358, in <lambda>
f = np.vectorize(lambda x: x.decode(encoding), otypes=[np.object])
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 7: unexpected end of data
How can I store and query my unicode data in hdf5?
I have many tables with column names that I do not know beforehand and that are not proper PyTables names (NaturalNameWarning). I would like the user to be able to query on these columns, so I wonder how I could query them when their names prevent it. I see this used to have no easy fix, so if that is still the case I will just remove the offending characters from the headings.
import csv
import pandas as pd
dados = pd.read_csv("EKKA - Cópia.csv")
print(dados)
store= pd.HDFStore('teste.h5' , encoding="utf-8")
store.append("EKKA", dados, format="table", data_columns=True)
store.select("EKKA", "columns=['Mdt', 'Fornecedor']")
store.close()
Would I be better off doing this in sqlite?
Environment:
Windows 7 64bit
pandas 0.15.2
NumPy 1.9.2
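Under Python 2.7 on Windows 7 with pandas 0.15.2, everything worked as expected, with no encoding necessary. On Python 3.4, however, the following worked for me. UTF-8 can represent these characters, but it needs two bytes for each of Ç and Ã, and the "unexpected end of data" error at byte 0xc3 suggests that the stored string was cut off in the middle of a character. A single-byte encoding such as 'latin1' usually avoids this kind of issue. Note that I had to read the csv in the first place with this encoding.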
>>> df = pd.read_csv('../../test.csv',encoding='latin1')
>>> df
Mdt Doccompra OrgC Cen NumP Criadopor Dtcriacao Fornecedor P Fun
0 400 8751215432 2581 NaN 1 MIGRAÇ\xc3O 01.10.2004 75852214 NaN TD
1 400 5464282154 9874 NaN 1 MIGRAÇ\xc3O 01.10.2004 78995411 NaN FO
Further, the encoding must be specified not when opening the store, but on the append/put calls:
>>> df.to_hdf('test.h5','df',format='table',mode='w',data_columns=True,encoding='latin1')
>>> pd.read_hdf('test.h5','df')
Mdt Doccompra OrgC Cen NumP Criadopor Dtcriacao Fornecedor P Fun
0 400 8751215432 2581 NaN 1 MIGRAÇ\xc3O 01.10.2004 75852214 NaN TD
1 400 5464282154 9874 NaN 1 MIGRAÇ\xc3O 01.10.2004 78995411 NaN FO
Once it is written encoded, it is not necessary to specify the encoding when reading.
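The error message from the question can be reproduced without pandas or HDF5, which may help explain why the choice of encoding matters. The sketch below assumes (this is a guess about the cause, not something the traceback proves) that the string column was sized by character count, so that the 10-byte UTF-8 form of the 8-character value was cut to 8 bytes.

```python
# "MIGRAÇÃO", written with escapes so the source file encoding does not matter.
text = "MIGRA\u00c7\u00c3O"

utf8_bytes = text.encode("utf-8")      # Ç and Ã take two bytes each
latin1_bytes = text.encode("latin1")   # one byte per character

# Assumption: the storage width equals the number of characters (8).
width = len(text)
truncated = utf8_bytes[:width]

try:
    truncated.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(len(utf8_bytes), len(latin1_bytes))
print(error_message)

# The latin1 bytes fit within the same width, so they round-trip intact.
roundtrip = latin1_bytes[:width].decode("latin1")
print(roundtrip == text)
```

If this is indeed what happens, any encoding that uses exactly one byte per character for the data at hand would avoid the truncation, which is consistent with 'latin1' working above.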