ORSSerialPort data that should be a single line is coming in over multiple lines - swift

I have an Arduino that spits out a single line of GPS data over the serial line every half second. I know this works because I can watch the serial monitor in the Arduino IDE, and every half second a new single line of data appears.
I'm now in the process of writing a Mac program in Swift that puts each coordinate on a map as it comes in through the serial port, and I'm using the ORSSerialPort library to connect to the Arduino and receive its data. This works, and I had a basic version running earlier; however, I noticed that there were gaps in the GPS data (the points were appearing in small groups on the map, with a noticeable space in between, when they should form a continuous line).
Before I had the map I had a text field that had each GPS data line appended to it as it came in, which produced exactly the same output as the Arduino IDE serial monitor, so I thought everything was working fine.
To try to fix the problem with the map I removed the map code and simply print()ed each line to the Xcode console as it came in through the serial port. To my surprise there were random line breaks in the data, and I don't understand why. I suspect this is what's causing the problems I'm having (I split the string at every comma to extract the individual values), so I'd like to know why the data comes out as a single line in the Arduino IDE and the text field, but not in the Xcode console and presumably wherever else I am working with the string.
EDIT: I prefixed the print to the Xcode console and the output to the text field with five plus signs and suffixed them with five dashes, then set the serial port to close after sending a single report (what should be a single line of data). The output in both places ended up being three lines, each prefixed and suffixed with the plus signs and dashes. See the photo below, which shows what should be a single line:
Why are my single lines of data coming through over multiple lines and behaving like individual variables (as in, getting the last character of the line returns the last character of the first of the three lines, not a semicolon)?

The issue isn't likely that there are extra newlines being inserted. Rather, ORSSerialPort (like the underlying POSIX API it uses) simply reports data to its delegate as it comes in. It has no way of knowing that for your particular use case you only want complete lines.
You need to buffer the incoming data and only process it when you've received a complete "line"/packet. ORSSerialPort includes an API, ORSSerialPacketDescriptor, that makes this easier. There is further documentation for that API here: https://github.com/armadsen/ORSSerialPort/wiki/Packet-Parsing-API
Do note that this API doesn't (yet) support the use of an end delimiter only. You need to validate the entire packet from beginning to end, as the parsing routine is "lazy": that is, it tries to find the smallest possible match starting from the end of the packet.
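For illustration, here is a minimal sketch of the manual buffering approach in Swift. The newline terminator and the exact Swift spelling of the delegate method are assumptions to check against your Arduino sketch and your version of ORSSerialPort:

import Foundation
import ORSSerial

class GPSReceiver: NSObject, ORSSerialPortDelegate {
    // Collects partial chunks until at least one complete, newline-terminated line has arrived.
    private var receiveBuffer = Data()

    func serialPort(_ serialPort: ORSSerialPort, didReceive data: Data) {
        receiveBuffer.append(data)
        // Process every complete line currently in the buffer; keep any trailing partial line.
        while let newline = receiveBuffer.range(of: Data("\n".utf8)) {
            let lineData = receiveBuffer.subdata(in: receiveBuffer.startIndex..<newline.lowerBound)
            receiveBuffer.removeSubrange(receiveBuffer.startIndex..<newline.upperBound)
            if let line = String(data: lineData, encoding: .ascii) {
                handle(line: line.trimmingCharacters(in: .whitespacesAndNewlines))
            }
        }
    }

    private func handle(line: String) {
        // Only here, with a complete line in hand, is it safe to split on commas.
        let fields = line.components(separatedBy: ",")
        print("Received \(fields.count) fields: \(fields)")
    }

    func serialPortWasRemovedFromSystem(_ serialPort: ORSSerialPort) {
        // Required delegate method; handle the disconnection as appropriate.
    }
}

The packet descriptor API accomplishes the same thing declaratively; either way, only act on the data once a complete line has been assembled.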


Dataprep import dataset does not detect headers in first row automatically

I am importing a dataset from Google Cloud Storage (parameterized) into Dataprep. So far this has worked perfectly fine, and one of the features I liked is that it auto-detects that the first row in my (application/octet-stream) .csv file contains my headers.
However, today I tried to import a new dataset and it did not detect the headers, but auto-assigned column1, column2, and so on instead.
What has changed, and why is this the case? I have checked the auto-detect box and I am using UTF-8.
While the auto-detect option is usually pretty good, there are times when it fails, for numerous reasons. I've specifically noticed this when the field names contain certain characters (e.g. commas, invisible characters like zero-width non-joiners, or null bytes), or when multiple different styles of newline delimiter are used within the same file.
Another case where I've seen this is when there were more columns of data than there were headers.
As you already hit on, you can use the following snippet to do mostly the same thing:
rename type: header method: filter sanitize: true
. . . or make separate recipe steps to convert the first row to header and then bulk-rename to your own liking.
More often than not, however, I've found that when auto-detect fails on a previously working file, it tends to be a sign of some sort of issue with the source file. I would look for mismatched data and misplaced commas in the output, and compare the header and some data rows against the original source in a plaintext editor.
When all else fails, you can try a CSV validator . . . but in my experience they tend to be incredibly opinionated when it comes to the formatting options of the file, so depending on the system generating the CSV, they could either miss errors or give false positives. I have had two experiences where auto-detect failed for no apparent reason on perfectly clean files, so it is also possible that the detection process was simply skipped for some reason.
It should also be noted that if you have a structured file that was correctly detected but want to revert it, you can go to the dataset details, select the "..." (More) button, and choose "Remove structure..." (I'm hoping that one day they'll let you do the opposite when you want to add structure to a raw dataset or work around bugs like this!)
Best of luck!
Can be resolved as a transformation within a Flow:
rename type: header method: filter sanitize: true

Add the hash of the code to the executable file

I have an STM32 application which uses two blocks of memory. In the 0th block I have boot code (which runs just after power-on), and in the 7th block I have application code (which may or may not run, depending on the authorization decision made by the boot code).
Those two pieces of code are developed, and hence generated, by two separate projects. They are flashed to their respective blocks (boot code to the 0th block, application code to the 7th block) of the STM32's NOR flash using the openocd tool, by giving an offset value to openocd's write_image command.
What I would basically like to do in the boot code is calculate the hash of the application code and compare it with a reference digest. If they are equal, I will hand control over to the application code. For that, after I generate the executable of the application code (which can be in ELF, hex or bin format), I want to:
Create another file (in any of the formats listed above) which is 128 KB in size
Copy the content of the executable file into the newly created file, starting at its beginning (offset 0)
Write the hash of the executable to the last 32 bytes of the newly created file
Fill the gap with 0xFF
Finally, flash this file (if it still counts as an executable) to the 7th block of the memory
Do you think that it is doable and feasible? If so:
Which format should I use to generate the executable?
Is there anything I need to pay particular attention to in order to achieve this?
Lastly, do you think this makes sense, or is there a more standard way to accomplish it?
Thanks a lot in advance.
You just need to add an additional step to your build sequence: after linking, extract the raw binary from the ELF (for example with objcopy -O binary).
Then write a program in your favourite programming language which calculates the hash, pads the image, and appends the result to that bin file.
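To make that concrete, here is a rough sketch of such a post-build tool. It is written in Swift purely as an example (any language works); the 128 KB block size and the 32-byte digest come from your description, while the choice to hash the padded region (rather than only the raw binary) is an assumption that the boot code would have to mirror:

import Foundation
import CryptoKit   // SHA-256 (macOS 10.15+); use whatever crypto library you prefer

let blockSize = 128 * 1024                 // size of the 7th flash block
let digestSize = 32                        // SHA-256 digest length

let inputPath = CommandLine.arguments[1]   // e.g. app.bin, extracted from the ELF
let outputPath = CommandLine.arguments[2]  // padded image to flash with write_image

var image = try! Data(contentsOf: URL(fileURLWithPath: inputPath))
precondition(image.count <= blockSize - digestSize, "application too large for the block")

// Fill the gap between the end of the code and the digest area with 0xFF
// (0xFF is the erased state of NOR flash, so the padding bytes stay "unprogrammed").
image.append(Data(repeating: 0xFF, count: blockSize - digestSize - image.count))

// Hash everything except the last 32 bytes; the boot code must hash the same region.
let digest = SHA256.hash(data: image)
image.append(Data(digest))

try! image.write(to: URL(fileURLWithPath: outputPath))
print("Wrote \(image.count)-byte image with SHA-256 appended to \(outputPath)")

A flat bin file is the easiest format to manipulate this way, since it is already a byte-for-byte image of what ends up in flash.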

Store row numbers which are causing "error"

I have to retrieve certain information from URLs. For this I have to enter text into fields of the URL, using a GET operation. I have to modify the text to replace spaces with "%20". Sometimes the text (which is taken from the database) is badly formed. I would like to know the row numbers of the offending rows so I can manually fix the text in the database and run the job again. I have tried to use the logs and errors section, but with little luck. Does anybody have an idea of how to do this?
First shot: output bad URLs to the console
So far, I came up with the following job design for your problem:
The trick is to catch the exceptions of the tHttpRequest component and print the necessary details on the console. For this example, I included the line number, the exception message and the URL that produced the exception.
Output (I couldn't reproduce your "Illegal character" error, so I took a different one):
Second shot: Output to a file
If you really need to output the line numbers to a file, things get a little more complicated.
Instead of printing the info straight onto the console, we collect all the line numbers into a context variable of type (Java) List inside the tJavaFlex. After the usual URL processing (which I have left out of the job design to keep the example small), we iterate over that Java List and save it into a tHashOutput, so that we can finally write it to a file.
We cannot write to the file directly in the tLoop section, since the Iterate flow would cause the tFileOutputDelimited to be opened several times. If "Append" were disabled, only the last bad URL's line number would end up in the output file. If "Append" were enabled, you would get the full list of line numbers after the very first job run, but every further run would keep appending, making the list longer and longer. Workarounds would be to use a runtime-dependent file name (e.g. a timestamp) or to delete the file at the beginning of the job run. I chose a third option, which simply overwrites the file every time the job runs. Feel free to choose whichever of these options suits your use case best.
Details
The tHashOutput/tHashInput components are not visible by default, but must be enabled first in order to show up: https://www.talendforge.org/forum/viewtopic.php?pid=107249#p107249
The remaining configuration (the context variable with its INIT value, the tJavaFlex "catch errors" end code, the tLoop, and the tFixedFlowInput "badURL") follows the job design above; the tHashOutput needs to have "Append" enabled.

Defining what is a line in Tesseract

I'm working on document recognition for scanned bank statements. The statements that I have are organized by lines, such as the one attached. Because Tesseract does such a good job at detecting areas of text, it breaks the lines in the middle (I'm assuming this is because of the large white space between the first block in the line (blurred for privacy reasons) and the next one ('EUR', or 'COURS')).
In the hOCR file, the bboxes of all the elements in a line are within 2 px or so of each other, so I could potentially rebuild the lines myself. However, this feels like a hack. Is there a way to tell Tesseract that lines should be as wide as the document itself? Or would there be another way to go about it? I've tried playing with the psm option, but with no luck.
-psm 6 ("Assume a single uniform block of text") should work. If not, you may want to use the older 2.0x version, which does not perform page layout analysis.
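On a 3.0x build, that would look something like the following (the file names are placeholders; newer releases spell the option --psm):

tesseract statement.png statement -psm 6 hocr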

How can I make log4perl output easier to read?

When using log4perl, the debug log layout that I'm using is:
log4perl.appender.D10.layout=PatternLayout
log4perl.appender.D10.layout.ConversionPattern=%d [pid=%P] %p %F{1} (%L) %M %m%n
log4perl.appender.D10.Filter = DebugAndUp
This produces very verbose debug logs, for example:
2008/11/26 11:57:28 [pid=25485] DEBUG SomeModule.pm (331) functions::SomeModule::Test Test XXX was successful
2008/11/26 11:57:29 [pid=25485] ERROR SomeOtherUnrelatedModule.pm (99999) functions::SomeModule::AnotherTest AnotherTest YYY has failed
This works great, and provides excellent debugging data.
However, each line of the debug log contains a different function name, a PID of varying length, and so on. This makes every line lay out differently, and makes reading debug logs much harder than it needs to be.
Is there a way in log4perl to format the line so that the debugging metadata (everything up to the actual log message) is padded at the end with spaces/tabs, so that the actual message starts at the same column of text?
You can pad the individual fields that make up your entries. For example, [pid=%5P] will always give you at least 5 characters for the PID.
The "Quantify Placeholders" section in the docs for Log::Log4perl::Layout gives more details.
There are a couple of ways to go with this, although you have to figure out which one works better for your situation:
Use a different appender if you are working live. Have that appender use a pattern that shows only the information you want. If you're working in a single process, for instance, your alternate appender might leave off the PID and the timestamp. You might only need the file name and line number.
Use %n to put newlines in the right place. That makes it multi-line output that is slightly harder to parse later, but you can choose another sequence for the input record separator (say, a literal "[EOL]") to make it easy to read entry by entry (see the example pattern after this list).
Log to a database instead of a file. For your reports, select just the columns you want to inspect.
Log everything, but write a filter to go through the log file ad-hoc to display just the parts that you want to see, such as only the debugging messages, the entries between certain times, only the entries involving a file, and so on.
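As an illustration of that second suggestion, a pattern along these lines (the exact layout is just an example) would put the metadata and the message on separate lines and end each record with a literal [EOL] marker:

log4perl.appender.D10.layout.ConversionPattern=%d [pid=%P] %p %F{1} (%L) %M%n    %m [EOL]%n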