CSV file reading issue when a value contains ' # ' (Python 3.7)

My data file contains the value " sample # " in row 26, column 1. When I try to read the file I get an error. Please help.
I have tried three ways:
1) pd.read_csv('/home/ami/test/st/st_04042018A.txt', delimiter=",")  # error: 'utf-8' codec can't decode byte 0xb3 in position 13: invalid start byte
2) csv = np.genfromtxt('/home/ami/test/st/st_04042018A.txt', delimiter=",")  # same error
3)
c=open(r'/home/ami/test/st/st_04042018A.txt')
c=c.read()  # same error
Please help me read the file.
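A hedged sketch, not an answer taken from the thread: byte 0xb3 cannot start a UTF-8 sequence, which usually means the file was saved in a single-byte encoding such as Latin-1 or Windows-1252. That is an assumption, since the real file is not shown, and the ' # ' in the value is probably unrelated to the error. The sample bytes and the helper name `read_text_with_fallback` below are made up for illustration.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) using the first encoding that decodes the file."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as fh:
                return fh.read(), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("none of the candidate encodings could decode " + path)


# Hypothetical sample data: 0xb3 is the superscript three in Latin-1.
sample_bytes = b"id,value\n26,sample # m\xb3\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample_bytes)
    sample_path = tmp.name

text, used_encoding = read_text_with_fallback(sample_path)
os.remove(sample_path)
print(used_encoding)
print(text)
```

With pandas the same idea would be to pass the `encoding` parameter, for example pd.read_csv(path, delimiter=",", encoding="latin-1"), provided the guess about the encoding is right.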

Related

OCTAVE data import from PCE-VDL data logger device and conversion of decimal comma to decimal point

I have a PCE-VDL measurement device that produces measurements in the CSV format shown below, which I need to import into OCTAVE for further investigation.
In particular, I need to import the last 3 columns with the xyz acceleration data.
The file is in CSV format with a semicolon ";" as the delimiter.
I have tried:
A_1 = importdata ("file.csv", ";", 3);
but received
error: missing_idx(10): out of bound 9
The CSV file looks like this:
#PCE-VDL X - TableView series
#2020.16.11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020.28.10;16:16:32:0000;00:000;;;;0,0195;-0,0547;1,0039;
2020.28.10;16:16:32:0052;00:005;;;;0,0898;-0,0273;0,8789;
2020.28.10;16:16:32:0104;00:010;;;;0,0977;-0,0313;0,9336;
2020.28.10;16:16:32:0157;00:015;;;;0,1016;-0,0273;0,9297;
The numbers in the last 3 columns also use a decimal comma instead of a decimal point, so some conversion is probably needed as well.
Thank you very much for any help.
Regards
EDIT: 18.11.2020
Thanks for the help. I have now tried the following:
A_1_str = fileread ("file.csv");
A_1_str_m = strrep (A_1_str, ".", "-");
A_1_str_m = strrep (A_1_str_m, ",", ".");
save "A_1_str_m.csv" A_1_str_m;
A_1 = importdata ("A_1_str_m.csv", ";", 8);
and still receive error: file_content(140): out of bound 139
There is probably some problem with the time format in the first columns, which I do not want to read. I just need the last three columns.
After my conversion, the file looks like this:
# Created by Octave 5.1.0, Wed Nov 18 21:40:52 2020 CET <zdenek@ASUS-F5V>
# name: A_1_str_m
# type: sq_string
# elements: 1
# length: 7849
#PCE-VDL X - TableView series
#2020-16-11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020-28-10;16:16:32:0000;00:000;;;;0.0195;-0.0547;1.0039;
2020-28-10;16:16:32:0052;00:005;;;;0.0898;-0.0273;0.8789;
2020-28-10;16:16:32:0104;00:010;;;;0.0977;-0.0313;0.9336;
Thanks for support!
You can first read the data with fileread, which stores the data as a string. Then you can manipulate the string like this:
new_string = strrep(string, ",", ".");
strrep replaces all occurrences of a pattern within a string. Afterwards you save this data as a separate file or you overwrite the existing file with the manipulated data. When this is done you proceed as you have tried before.
EDIT: 19.11.2020
To avoid the additional heading lines in the new file, you can save it like this:
fid = fopen("A_1_str_m.csv", "w");
fputs(fid, A_1_str_m);
fclose(fid);
fputs will just write the string to the file.
Then you can read the new file with dlmread.
A1_buf = dlmread("A_1_str_m.csv", ";");
A1_buf = real(A1_buf); # get the real value of the complex number
A1_buf(1:3, :) = []; # remove the header lines
A1 = A1_buf(:, end-3:end-1); # get only the 3 columns you're looking for
This will give you the three columns you're looking for, but the date and time data will be ignored.
EDIT 20.11.2020
Replaced abs with real, so the sign of the value will be kept.
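For comparison only, here is the same idea as a small Python sketch rather than Octave: skip the '#' header lines, split on ';', and convert the decimal comma only in the three columns that are needed, so the dots and colons in the date and time columns are never touched. The rows are copied from the question; the function name `parse_acceleration` is made up for illustration.

```python
import csv
import io

# Sample rows copied from the question; the header lines start with '#'.
raw = """#PCE-VDL X - TableView series
#2020.16.11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020.28.10;16:16:32:0000;00:000;;;;0,0195;-0,0547;1,0039;
2020.28.10;16:16:32:0052;00:005;;;;0,0898;-0,0273;0,8789;
"""


def parse_acceleration(text):
    """Return a list of (aX, aY, aZ) float tuples from semicolon-separated text."""
    rows = []
    data_lines = (
        ln for ln in io.StringIO(text) if ln.strip() and not ln.startswith("#")
    )
    for fields in csv.reader(data_lines, delimiter=";"):
        # Each line ends with ';', so the last field is empty; the three
        # acceleration columns are the ones just before it.
        ax, ay, az = (float(v.replace(",", ".")) for v in fields[-4:-1])
        rows.append((ax, ay, az))
    return rows


accel = parse_acceleration(raw)
print(accel)
```

The slice fields[-4:-1] plays the same role as end-3:end-1 in the Octave code above.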
Use csv2cell from the io package.

Replace $ char with zero for data field using SQL*Loader

A text file contains data like below.
041522$$$$$$$$$NAPTTALIE REVERE #1621500025 OLD ST FUNNRHILL MA1530 273 000000$$$$$$$03#$$$##############$$$$$$$$$$$$$$$$$$Z$$$$$$$$$$$$$$$$$$$$$$###$$$$$$$$$$$$$$$$$$$$$#####$$$$$$$$$$$$$$$#$$$$$0$$$$$$$$$$$000000$$$$$$$$$$$$#$$#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$##$$$$$$$$$$$$000000$$$$$$$$$$$$$A###Y$$$$$$$$$$$$$1##$$$$$$$$$$$$$$$$$$$##02$$$$$$$$$$$$$#$$$$$$$$$$$$$$$$$$$$$$##Y#######$$$$#################################
Control file:
LOAD DATA
CHARACTERSET "UTF8"
INFILE 'C:\bendex\MA_File38\fileout.txt'
BADFILE 'C:\bendex\MA_File38\baddata.bad'
DISCARDFILE 'C:\bendex\MA_File38\discdata.dsc'
APPEND
INTO TABLE "TMP_DATA_1220"
TRAILING NULLCOLS
(
SOURCE CONSTANT "TEST",
FILE_DTE "TRUNC(SYSDATE)",
AU_REGION POSITION (1:2),
AU_OFFICE POSITION (3:5),
AU_PGM_CATEGORY POSITION (6) ,
GRANTEE_SSN POSITION (7:15),
GRANTEE_NAME POSITION (16:38),
CAT_ELIG_IND POSITION (39),
PHONE POSITION (40:47),
ADDRESS POSITION (48:70),
CITY POSITION (71:83),
STATE POSITION (84:85),
ZIP POSITION (86:90),
CAN_NUM POSITION (91:95),
NET_INC POSITION (96:101) "TO_NUMBER(:NET_INC)",
START_DTE POSITION (102:107) "CASE WHEN :START_DTE ='$$$$$$' THEN TO_CHAR(REPLACE(:START_DTE, '$', '0')) ELSE DATE 'rrmmdd'",
LAST_UPDT_UID_NAM CONSTANT "LOADF38",
LAST_UPDT_TS "SYSTIMESTAMP"
)
Error:
Record 1: Rejected - Error on table "TMP_DATA_1220", column START_DTE.
ORA-01841: (full) year must be between -4713 and +9999, and not be 0
I have to read the data from the text file and load it into a table. I tried to replace '$' with '0' in positions 102 to 107 and convert the result to a date, but I am getting the error above. I also tried REPLACE and DECODE, and neither worked.
Any help is much appreciated. Thank you.
NOTE: The text file contains full-length records, but I am reading only the first few fields with SQL*Loader.
I believe you would want to make your start date NULL if it was invalid, no?
"CASE WHEN :START_DTE ='$$$$$$' THEN NULL ELSE to_date(:START_DTE, 'rrmmdd') END"

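The logic of the CASE expression in the answer above (an all-'$' placeholder becomes NULL, anything else is parsed as a two-digit-year date) can be sketched in Python to make the intent explicit. This is only an illustration of the rule, not something SQL*Loader runs; the function name `parse_start_dte` and the value '180404' are hypothetical. Note that Python's %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068, which is not identical to Oracle's RR rule.

```python
from datetime import date, datetime


def parse_start_dte(raw, sentinel="$$$$$$"):
    """Return None for the all-'$' placeholder, else a date parsed as yymmdd."""
    if raw == sentinel:
        return None  # corresponds to THEN NULL in the CASE expression
    # Python's %y pivot year differs from Oracle's RR format element.
    return datetime.strptime(raw, "%y%m%d").date()


print(parse_start_dte("$$$$$$"))
print(parse_start_dte("180404"))  # hypothetical non-placeholder value
```
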
Openpyxl Unicode decode error cannot remove \ufeff from cell value

I am parsing multiple worksheets of unicode data and creating a dictionary for specific cells in each sheet, but I am having trouble decoding the unicode data. A small snippet of the code is below:
for key in shtDict:
    sht = wb[key]
    for row in sht.iter_rows('A:A', row_offset=1):
        for cell in row:
            if isinstance(cell.value, unicode):
                if "INC" in cell.value:
                    shtDict[key] = cell.value
The output of this section is:
{'60071508': u'\ufeffReason: INC8595939', '60074426': u'\ufeffReason. Ref INC8610481', '60071539': u'\ufeffReason: INC8603621'}
I tried to properly decode the data, based on the question "u'\ufeff' in Python string", by changing the last line to:
shtDict[key] = cell.value.decode('utf-8-sig')
But I get the following error:
Traceback (most recent call last):
  File "", line 55, in <module>
    shtDict[key] = cell.value.decode('utf-8-sig')
  File "C:\Python27\lib\encodings\utf_8_sig.py", line 22, in decode
    (output, consumed) = codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
Not sure what the issue is. I have also tried decoding with 'utf-16', but I get the same error. Can anyone help with this?
Just make it simpler: you can ignore the BOM (byte order mark), so just strip the BOM character.
shtDict[key] = cell.value.replace(u'\ufeff', '', 1)
Note: cell.value is already unicode type (you just checked it), so you cannot decode it again.
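As a rough illustration of the point above, here is a Python 3 sketch (the question itself uses Python 2) that strips a leading BOM from strings that are already decoded. The dictionary values are copied from the output in the question; the helper name `strip_bom` is made up.

```python
BOM = "\ufeff"


def strip_bom(value):
    """Remove a single leading byte order mark from an already-decoded string."""
    return value[len(BOM):] if value.startswith(BOM) else value


# Values copied from the output shown in the question.
sht_dict = {
    "60071508": "\ufeffReason: INC8595939",
    "60074426": "\ufeffReason. Ref INC8610481",
    "60071539": "\ufeffReason: INC8603621",
}

cleaned = {key: strip_bom(val) for key, val in sht_dict.items()}
print(cleaned)
```

No decode call is involved, because the values are text already; the BOM is just an ordinary character at position 0.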

How to read a string containing a comma and an at sign with textread?

My prototype data line looks like this:
(1) 11 July England 0-0 Uruguay @ Wembley Stadium, London
Currently I'm using this:
[no,dd,mm,t1,p1,p2,t2,loc]=textread('1966.txt','(%d) %d %s %s %d-%d %s @ %[%s \n]');
But it gives me the following error:
Error using dataread
Trouble reading string from file (row 1, field 12) ==> Wembley Stadium, London\n
Error in textread (line 174)
[varargout{1:nlhs}]=dataread('file',varargin{:}); %#ok<REMFF1>
So it seems to have trouble reading a string that contains a comma, or it's the at sign that causes trouble. I read the documentation thoroughly, but nowhere does it mention what to do when you have special characters such as @, or when you want to read a string that contains a delimiter that should not be treated as one.
You want
[no,dd,mm,t1,p1,p2,t2,loc] = ...
textread('1966.txt','(%d) %d %s %s %d-%d %s @ %[^\n]');
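The last conversion in that format string, %[^\n], reads every character up to the end of the line, so commas and spaces in the venue no longer matter. The same parsing rule can be sketched in Python with a regular expression; this is only an illustration, the names `LINE_RE` and `parse_match` are made up, and the pattern accepts either '@' or '#' as the separator before the venue.

```python
import re

# One group per field; the final (.+)$ plays the role of %[^\n].
LINE_RE = re.compile(
    r"\((\d+)\)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)-(\d+)\s+(\S+)\s+[@#]\s+(.+)$"
)


def parse_match(line):
    """Split one fixture line into (no, dd, mm, t1, p1, p2, t2, loc)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("line does not match the expected layout: " + line)
    no, dd, mm, t1, p1, p2, t2, loc = m.groups()
    return int(no), int(dd), mm, t1, int(p1), int(p2), t2, loc


record = parse_match("(1) 11 July England 0-0 Uruguay @ Wembley Stadium, London")
print(record)
```

Like %s in textread, the \S+ groups only handle single-word team names; multi-word names would need a different pattern.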

Matlab Read Text File List Exclude first 34 characters

I am trying to read values from a text file. I want the value after ': '.
Here is a sample of the text file. All lines are formatted the same.
There are 34 places before the start of the data.
File Name                       : IMG_1184.JPG
File Size                       : 2.1 MB
File Modification Date/Time     : 2012:07:14 11:53:18-05:00
File Permissions                : rw-rw-rw-
File Type                       : JPEG
MIME Type                       : image/jpeg
Exif Byte Order                 : Big-endian (Motorola, MM)
I tried to use this code:
fileID = fopen('Exif.txt');
Exif1 = textscan(fileID, '%s %s','delimiter', ':');
This worked on most of the data but some data also used ':' so that didn't work.
I tried to use this code:
fileID = fopen('Exif.txt');
Exif1 = textscan(fileID, '%s %s','delimiter', ': ');
This returned a mess. Not sure why. Everything was fragmented.
Can anyone explain how to just get the 35th value to the end of every string and put it into an array?
MATLAB has the function strtrim(string), which strips the leading and trailing whitespace for you. Try reading the data in one line at a time and applying strtrim before passing each line to textscan.
Read the whole line into a variable then get the 35th and subsequent characters like this:
whole_line(35:end)
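The fixed-offset idea can be sketched in Python as well, alongside an alternative that splits on the first ': ' only, so colons inside the value are left alone. The 32-character padding of the tag name is an assumption derived from the "34 places" mentioned in the question, and the function names are made up for illustration.

```python
PREFIX_WIDTH = 34  # assumption from the question: data starts at character 35


def value_by_offset(line, width=PREFIX_WIDTH):
    """Return everything from character 35 onward (0-based index 34)."""
    return line[width:].rstrip("\n")


def value_by_first_separator(line, sep=": "):
    """Return everything after the first ': ', keeping later colons intact."""
    _, _, value = line.partition(sep)
    return value.rstrip("\n")


# Sample lines rebuilt with the assumed padding of the tag name to 32 characters.
sample = [
    "File Name".ljust(32) + ": IMG_1184.JPG",
    "File Modification Date/Time".ljust(32) + ": 2012:07:14 11:53:18-05:00",
]

by_offset = [value_by_offset(ln) for ln in sample]
by_separator = [value_by_first_separator(ln) for ln in sample]
print(by_offset)
print(by_separator)
```

The offset version depends on every line having exactly the same prefix width, while the separator version only assumes that the tag name itself never contains ': '.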