Sending corrupted CSV line over the socket in Python 3

I'm building a sensor unit where data is gathered on one Raspberry Pi and sent to other Pis over the network.
The first Pi creates a line with multiple readings from different sensors. It is supposed to act as a server and send that line to clients. The client Pis need to receive the line and do further processing or visualisation.
To test my solution I want to read data from a txt file that was produced during an experiment. The problem is that the data is sometimes corrupted, its format differs between sensors, and the rows can differ between set-ups.
I have built a function which is supposed to convert the input line to bytes. (I tried different methods, but this clunky function is the only one that got anywhere close to a result.) However, it does not convert the data correctly for sending over the network:
import struct

message = ['First sensor', 'second data', 'third', 1, '19.04.2016', 0.1]

def packerForNet(message):
    pattern = ''
    newMessage = []
    for cell in message:
        if isinstance(cell, int):
            pattern += 'I'
            newMessage.append(cell)
        elif isinstance(cell, float):
            pattern += 'd'
            newMessage.append(cell)
        elif isinstance(cell, str):
            pattern += str(len(cell))
            pattern += 's'
            newMessage.append(cell.encode('UTF-8'))
        else:
            cell = str(cell)
            pattern += str(len(cell))
            pattern += 's'
            newMessage.append(cell.encode('UTF-8'))
    return (newMessage, pattern)

newMessage, pattern = packerForNet(message)
patternStruct = struct.Struct(pattern)
packedM = patternStruct.pack(*newMessage)
The output from the function does not unpack correctly:
packedM = b'First sensorsecond datathird\x01\x00\x00\x0019.04.2016\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xb9?'
print('unpacked = %s' % patternStruct.unpack(packedM))
TypeError: not all arguments converted during string formatting
In addition, the client would need to know the pattern in order to unpack the data on its side, so in general this approach doesn't make sense.
In the final version the server needs to work like this: after a client connects, the server sends it a line from the sensors every millisecond. The sensor parsing is implemented in C and at the moment it creates a txt file for off-line processing; I can't change the way the sensor lines are produced.
I don't know how to pack a list of mixed types and continuously send such lists to the client.

Actually it does unpack correctly. The problem is not with the unpacking, but with the print.
Try:
unpackedM = patternStruct.unpack(packedM)
print(unpackedM)
unpackedM is a tuple of multiple values. The %-formatting of the string with the tuple failed because % treats each element of the tuple as a separate format argument, hence the "not all arguments converted" error.
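If you want to keep the %-style formatting, a minimal fix is to wrap the result in a one-element tuple so that % receives a single argument (just a sketch of the print; nothing else changes):
unpackedM = patternStruct.unpack(packedM)
# Wrap the tuple so %s receives exactly one argument:
print('unpacked = %s' % (unpackedM,))
# Or avoid %-formatting entirely:
print('unpacked =', unpackedM)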
EDIT:
To serialise entire objects you can use python msgpack. That is what we use for Python-to-Python and PHP-to-Python communication.
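For example, here is a minimal sketch of how a sensor line could be framed with msgpack and sent over a socket. It assumes the third-party msgpack package is installed (pip install msgpack); send_message, recv_message and _recv_exactly are illustrative helper names, not library functions:
import struct

import msgpack  # third-party package: pip install msgpack

def send_message(sock, message):
    # Serialise the whole list (strings, ints, floats) in one call and
    # prefix it with a 4-byte length so the receiver knows how much to read.
    payload = msgpack.packb(message)
    sock.sendall(struct.pack('!I', len(payload)) + payload)

def _recv_exactly(sock, n):
    # Keep calling recv() until exactly n bytes have arrived.
    data = b''
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError('socket closed before the message was complete')
        data += chunk
    return data

def recv_message(sock):
    # Read the 4-byte length prefix, then exactly that many payload bytes.
    (length,) = struct.unpack('!I', _recv_exactly(sock, 4))
    return msgpack.unpackb(_recv_exactly(sock, length))  # msgpack >= 1.0 returns str for strings
With this scheme the client does not need to know any struct pattern in advance: msgpack stores the type of each field, so a mixed list like the message above comes back with its strings, ints and floats intact, and the server can simply call send_message once per sensor line.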

Related

Import Fortran unformatted binary

I have an unformatted binary file generated using the Compaq Visual Fortran compiler (big endian).
Here's what the little bit of documentation states about it:
The binary file is written in a general format consisting of data arrays, headed by a descriptor record:
An 8-character keyword which identifies the data in the block.
A 4-byte signed integer defining the number of elements in the block.
A 4-character keyword defining the type of data. (INTE, REAL, LOGI, DOUB, or CHAR)
The header items are read in as a single record. The data follows the descriptor on a new record. Numerical arrays are divided into blocks of up to 1000 items. The physical record size is the same as the block size.
My attempt to read such data:
module modbin
  type rectype
    character(len=8):: key
    integer:: data_count
    character(len=4):: data_type
    logical:: is_int
    integer, allocatable:: idata(:)
    real(kind=8), allocatable:: rdata(:)
  end type
contains
  subroutine rec_read(in_file, out_rec)
    integer, intent(in):: in_file
    type (rectype), intent(inout):: out_rec
    !
    ! You need to play around with this figure. It may not be
    ! entirely accurate - 1000 seems to work, 1024 does not
    integer, parameter:: bsize = 1000
    integer:: bb, ii, iimax
    ! read the header
    out_rec%data_count = 0
    out_rec%data_type = ' '
    read(in_file, end=20) out_rec%key, out_rec%data_count, &
        out_rec%data_type
    ! what type is it?
    select case (out_rec%data_type)
    case ('INTE')
      out_rec%is_int = .true.
      allocate(out_rec%idata(out_rec%data_count))
    case ('DOUB')
      out_rec%is_int = .false.
      allocate(out_rec%rdata(out_rec%data_count))
    end select
    ! read the data in blocks of bsize
    bb = 1
    do while (bb .lt. out_rec%data_count)
      iimax = bb + bsize - 1
      if (iimax .gt. out_rec%data_count) iimax = out_rec%data_count
      if (out_rec%is_int) then
        read(in_file) (out_rec%idata(ii), ii = bb, iimax)
      else
        read(in_file) (out_rec%rdata(ii), ii = bb, iimax)
      end if
      bb = iimax + 1
    end do
20  continue
  end subroutine rec_read

  subroutine rec_print(in_recnum, in_rec)
    integer, intent(in):: in_recnum
    type (rectype), intent(in):: in_rec
    print *, in_recnum, in_rec%key, in_rec%data_count, in_rec%data_type
    ! print out data
    open(unit=12, file='reader.data', status='old')
    write(12,*)key
    !write(*,'(i5')GEOMINDX
    !write(*,'(i5')ID_BEG
    !write(*,'(i5')ID_END
    !write(*,'(i5')ID_CELL
    !write(*,'(i5')TIME_BEG
    !write(*,'(i5')SWAT
    !format('i5')
    !end do
    close(12)
  end subroutine rec_print
end module modbin

program main
  use modbin
  integer, parameter:: infile = 11
  ! fixed size for now - should really be allocatable
  integer, parameter:: rrmax = 500
  type (rectype):: rec(rrmax)
  integer:: rr, rlast
  open(unit=infile, file='TEST1603.SLN0001', form='UNFORMATTED', &
      status='OLD', convert='BIG_ENDIAN')
  rlast = 0
  do rr = 1, rrmax
    call rec_read(infile, rec(rr))
    if (rec(rr)%data_type .eq. ' ') exit
    rlast = rr
    call rec_print(rr, rec(rr))
  end do
  close(infile)
end program main
This code compiles and runs smoothly and produces no errors, but what is written to the output file shows me no useful numerical values.
The file in question is available here
And the right WRITE statement should produce a file like this one here
Is my WRITE statement for outputting this file type wrong? If so, what is the best way to do it?
Thank you.
The comments above are trying to direct you to one of (at least) two problems in your code. In the subroutine rec_print you have write(12,*)key where you meant to write write(12,*)in_rec%key (at least I think that's what you wanted.)
The other problem I spotted is that rec_print opens reader.data with status='old' and then closes it after writing key. (The use of 'old' here suggests that the file already exists.) Each time rec_print is called, the file is opened, the first record is overwritten, and the file is closed. One solution would be to use status='unknown', position='append', though it would be more efficient to open the file once in the main program and just let the subroutine write to it.
If I make these changes, I get in the data file:
INTEHEAD
GEOMETRY
GEOMINDX
ID_BEG
ID_END
ID_CELL
TIME_BEG
SWAT
A side-comment about CONVERT= and derived types: Your program isn't affected by this, but there are compiler differences with how reading a derived type record with CONVERT= is handled. I think gfortran converts each component according to its type, but I know that Intel Fortran doesn't convert reads (nor writes) of an entire derived type. You are reading individual components, which works in both compilers, so that's fine, but I thought it was worth mentioning.
If you're wondering why Intel Fortran does it this way, it's due to the VAX FORTRAN (where CONVERT= came from) heritage with STRUCTURE/RECORD and the possible use of UNION/MAP (not available in standard Fortran). With unions, there's no way to know how a particular component should be converted, so it just transfers the bytes. I had suggested to the Intel team that this could be relaxed if no UNIONs were present, but that I'm sure is very low priority.

Sending Data Array over Serial

I have a serial port called COM1 to which I want to synchronously write some data, but when using data with more than one row I get an error:
test1 = randi([0,255],1,32);
test2 = randi([0,255],2,32);
fwrite(COM1, test1, 'int8', 'sync'); %successful
fwrite(COM1, test2, 'int8', 'sync'); %unsuccessful
The error I get with test2 simply says Unsuccessful write: An error occurred during writing. Is it possible to send data as an array in this way? If so, I suspect the problem lies in how the rows get terminated. How do I ensure the terminators of the arrays and serial port match? I am aware of set(COM1, 'Terminator', X) but don't know what X should be to match a MATLAB array.
Using MATLAB version 9.3.0.713579 (R2017b).
That is simply a data format requirement for fwrite/serial: the data should be a vector, not a matrix. You should try
fwrite(COM1, test2(:), 'int8', 'sync');
But make sure that column-major order is what you need. Otherwise, transpose first:
test2 = test2.'; % will send in row order
fwrite(COM1, test2(:), 'int8', 'sync');

General purpose Tuple in Hadoop

I'm new to Hadoop, so please don't judge my seemingly simple question too strictly.
The short version: what tuple data type can I use in Hadoop to store 2 longs as a single value in a sequence file?
Moreover, I want to be able to read and process this file with Apache Pig like A = LOAD '/my/file' AS (a:long, (b:long, c:long)) and with Scala & Spark like val a = sc.sequenceFile[LongWritable, DesiredTuple]("/my/file", 1).
The full story:
I'm writing a Hadoop job in Java, and I need to output a sequence file which contains 3 long values per record. I use the first value as the key and group the two other values together as the value in my Reducer.
I tried several variants:
Using org.apache.hadoop.mapreduce.lib.join.TupleWritable
public class MyReducer extends Reducer<...> {
    public void reduce(Context context) {
        long a, b, c;
        // ...
        context.write(a, new TupleWritable(
                new LongWritable[]{new LongWritable(b), new LongWritable(c)}));
    }
}
But the javadoc of the TupleWritable class says "This is not a general-purpose tuple type." It seemed OK for a first attempt, but I can't get my tuples back. Look at a simple script in Apache Pig:
A = LOAD '/my/file' USING org.apache.pig.piggybank.storage.SequenceFileLoader()
AS (a:long, (b:long, t:long));
DUMP A;
I got something like this:
(2220,)
(5640,)
(6240,)
...
So what is the Apache Pig way of reading Hadoop's TupleWritable from a sequence file?
Furthermore, I tried changing the sequence format to text format: job.setOutputFormatClass(TextOutputFormat.class);
This time I just looked into one of the output files:
> hdfs dfs -cat /my/file/part-r-00000 | head
2220 [,]
5640 [,]
6240 [,]
...
So the next question is: why is there nothing in my TupleWritable value?
After that, I tried org.apache.mahout.cf.taste.hadoop.EntityEntityWritable.
For a sequence file I got the same result as before:
grunt> A = LOAD '/my/file' USING org.apache.pig.piggybank.storage.SequenceFileLoader() AS (a:long, (b:long, c:long));
(2220,)
(5640,)
(6240,)
...
For a text file I got the desired result:
2220 2 15
5640 1 9
6240 0 1
...
And the next question is: how do I read such tuples (EntityEntityWritable), and maybe other custom objects, back from a Hadoop-written sequence file?

Export data to same CSV file for multiple runs of .m file

I am running a MATLAB program and storing the results in two matrices. For each run of the program, those matrices are written to the same .csv file.
How can I continue to store data to the same file for future runs of the program? Is there a function that checks for data already being present to avoid overwriting cells?
t = 0.0001*[0:70];
v = B_2*R_R.*exp(-alpha.*t).*sin(omega_d.*t);
tv = [t; v].';
csvwrite('thedata.csv',tv,3,0)
I couldn't resist rewriting your code a bit.
This should be equivalent to what you have, and print both vectors to thedata.csv.
t = 0.0001*[0:70];
v = B_2*R_R.*exp(-alpha.*t).*sin(omega_d.*t);
tv = [t; v].';
csvwrite('thedata.csv',tv,3,0)
Due to the way CSV files are stored, you can only append data at the end of the file, which happens to be the bottom row, not the last column. What you should do is concatenate all the data before writing to the CSV file. That way you avoid multiple calls to csvwrite or dlmwrite (they are time consuming).
If that's impossible, then I suggest reading the data from the CSV file using csvread, appending the new data to the data you retrieve, and writing it all back again:
csvwrite('thedata.csv',tv)          % first run: write the initial data
mydata = csvread('thedata.csv');    % later run: read what is already in the file
mydata2 = [mydata, tv2];            % append the new results (here tv2) as extra columns
csvwrite('thedata.csv',mydata2)     % write everything back

date in pig latin

I am trying to do the following. I have multiple dates and I want to create a Pig script which takes an unknown number of input dates and then runs for those input arguments. My question is:
How can I send an unknown number of input variables to a pig script and then handle them within the pig script?
Thanks
Sara
I have some trouble understanding what you actually want to do. This would be my solution for your problem, sending an unknown number of dates (stored as chararray):
A = load 'input_dates' AS (date:chararray);
B = my_macro(A);
It's quite basic, so I guess I didn't understand your problem correctly. Could you maybe elaborate a little more on your problem?
UPDATE: How about something like this if you use Pig 0.11 (there is a bug up to 0.10 with module imports):
#!/usr/bin/python
import os
from org.apache.pig.scripting import *

P = Pig.compile("""
data = LOAD '$docs_in' AS (a:int);
-- do something
""")

# build one parameter set per file in the directory
lof = os.listdir("/home/.../dates/")
params = []
for elem in lof:
    params.append({'docs_in': str(elem)})

bound = P.bind(params)
stats = bound.run()
If each run is counting on the result of the previous one, use runSingle() instead.
If I understand the question correctly, you want to load a number of files or directories. You can specify them as a comma-separated list in the input parameter.
Below is an example:
load.pig (content):
A = LOAD '$input' using PigStorage();
dump A;
Command to run (locally):
pig -x local -param input=20120301,20120302,20120304 load.pig