SWIG wrong encoded string crashes Python - unicode

All of my SWIG wrappers that deal with strings crash if I pass a wrongly encoded string inside a std::string, that is, a string containing characters such as èé that are valid in the current locale but are not valid UTF-8.
On my side I have worked around this by parsing the input as wide strings and converting them to UTF-8, but I would like such errors to raise an exception rather than crash. Isn't PyUnicode_Check supposed to fail for those strings?
SWIG actually crashes in SWIG_AsCharPtrAndSize() when calling PyString_AsStringAndSize(). This is the SWIG-generated code:
SWIGINTERN int
SWIG_AsCharPtrAndSize(PyObject *obj, char** cptr, size_t* psize, int *alloc)
{
#if PY_VERSION_HEX>=0x03000000
#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
if (PyBytes_Check(obj))
#else
if (PyUnicode_Check(obj))
#endif
#else
if (PyString_Check(obj))
#endif
{
char *cstr; Py_ssize_t len;
#if PY_VERSION_HEX>=0x03000000
#if !defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
if (!alloc && cptr) {
/* We can't allow converting without allocation, since the internal
representation of string in Python 3 is UCS-2/UCS-4 but we require
a UTF-8 representation.
TODO(bhy) More detailed explanation */
return SWIG_RuntimeError;
}
obj = PyUnicode_AsUTF8String(obj);
if(alloc) *alloc = SWIG_NEWOBJ;
#endif
PyBytes_AsStringAndSize(obj, &cstr, &len);
#else
PyString_AsStringAndSize(obj, &cstr, &len);
#endif
if (cptr) {
The crash happens in the last PyString_AsStringAndSize call shown above.
The strings are passed as std::string, but the same thing happens with const char*.
Thanks in advance!

Cannot reproduce. If the example below doesn't solve your issue and you need further help, edit your question and add a Minimal, Complete, and Verifiable Example:
test.i
%module test
%include <std_string.i>
%inline %{
#include <string>
std::string func(std::string s)
{
return '[' + s + ']';
}
%}
Demo:
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> test.func('ábc')
'[ábc]'

The problem was with the 3.3.0 version we were still using; updating to 3.3.7 solved it. The Python release notes list several bug fixes related to PyUnicode_Check.
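For callers who want to reject badly encoded input before it ever reaches the wrapper (the "valid for the current locale, but not UTF-8 valid" case from the question), a validity check along these lines may help. This is only a minimal sketch in plain C; it is not part of SWIG or the Python C API, and the name is_valid_utf8 is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if buf[0..len) is well-formed UTF-8.
   Rejects truncated sequences, stray continuation bytes, overlong forms,
   UTF-16 surrogates and code points above U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;        /* continuation bytes expected after c */
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }   /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;   /* continuation byte or invalid lead byte */

        if (len - i - 1 < extra) return false;   /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

With this check, the single byte 0xE9 (é in ISO-8859-1) is rejected, while the two bytes 0xC3 0xA9 (é in UTF-8) are accepted.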


How to make Tcl_WriteChars support Unicode?

Is there any initial setup needed to make Tcl_WriteChars output UTF-8 characters correctly? e.g.
#include <tcl.h>
int main()
{
Tcl_Interp *tcl = Tcl_CreateInterp();
Tcl_Channel channel = Tcl_GetStdChannel(TCL_STDOUT);
Tcl_WriteChars(channel, "hello\n", -1);
Tcl_WriteChars(channel, "你好\n", -1);
Tcl_WriteRaw(channel, "你好\n", -1);
Tcl_Close(tcl, channel);
Tcl_DeleteInterp(tcl);
return 0;
}
The source code is saved as UTF-8, and the following output is from Linux with a UTF-8 locale:
hello
??
你好
You need to configure the encoding to be UTF-8 (and the host you're running on appears to be using something else for its default). Do this before you write to the channel.
Tcl_SetChannelOption(interp, channel, "-encoding", "utf-8");
Properly, you should check for the return code of that (as below) but all channels have that option and the utf-8 encoding is baked directly into Tcl, so it won't fail.
if (Tcl_SetChannelOption(interp, channel, "-encoding", "utf-8") != TCL_OK) {
return TCL_ERROR;
}
[EDIT]: Having re-read the code a little more carefully (and found out that the system's default encoding is really UTF-8 in the first place), the actual problem is that you're not calling Tcl_FindExecutable(). That routine is a bit mis-named, as what it actually does (apart from making info nameofexecutable work inside scripts) is let Tcl initialise its internal library. In particular, it initialises the encoding management subsystem, and that's the point where it works out what the system encoding really is (otherwise it falls back to iso8859-1, which is the least problematic ordinary encoding to recover from).
Your code should read:
#include <tcl.h>
int main(int argc, char *argv[]) /// <<<< CHANGED HERE
{
Tcl_FindExecutable(argv[0]); /// <<<< CHANGED HERE
Tcl_Interp *tcl = Tcl_CreateInterp();
Tcl_Channel channel = Tcl_GetStdChannel(TCL_STDOUT);
Tcl_WriteChars(channel, "hello\n", -1);
Tcl_WriteChars(channel, "你好\n", -1);
Tcl_WriteRaw(channel, "你好\n", -1);
Tcl_Close(tcl, channel);
Tcl_DeleteInterp(tcl);
return 0;
}
I'm assuming you're using a compiler that is happy with putting declarations after statements. That's a widely-implemented C99 feature (and is also in C++) so I expect it will be fine.

discrepancy between build result and Problems view in Eclipse CDT

I'm using Eclipse 4.2 with CDT and the MinGW toolchain on a Windows machine (although I have a feeling the problem has nothing to do with this specific configuration). The g++ compiler is version 4.7.
I'm playing with C++11 features, with the following code:
#include <iostream>
#include <iomanip>
#include <memory>
#include <vector>
#include <list>
#include <algorithm>
using namespace std;
int main( int argc, char* argv[] )
{
vector<int> v { 1, 2, 3, 4, 5, 6, 7 };
int x {5};
auto mark = remove_if( v.begin(), v.end(), [x](int n) { return n<x; } );
v.erase( mark, v.end() );
for( int x : v ) { cout << x << ", "; }
cout << endl;
}
Everything is very straightforward, idiomatic C++11. The code compiles with no problems on the command line (g++ -std=c++11 hello.cpp).
To make this code compile in Eclipse, I set the compiler to support C++11:
Properties -> C/C++ Build -> Settings -> Miscellaneous -> Other Flags:
I'm adding -std=c++11
Properties -> C/C++ Build -> Discovery Options -> Compiler invocation arguments:
Adding -std=c++11
That's the only change I made to either the global preferences or the project properties.
First question: Why do I have to change the flags in two places? When is each set of compiler flags used?
If I hit Ctrl-B, the project builds successfully, as expected, and running it from within Eclipse shows the expected result (it prints '5, 6, 7,').
However, the editor view shows red error marks on both the 'remove_if' line and the 'v.erase' line. Similarly, the Problems view lists these two problems. Looking at the details of each problem, I get:
For the remove_if line: 'Invalid arguments. Candidates are: #0 remove_if(#0, #0, #1)'
For the erase line: 'Invalid arguments. Candidates are: ? erase(?), ? erase(?,?)'
Second question: It appears there are two different builds: one for continuous status, and one for the actual build. Is that right? If so, do they have different rules (compilation flags, include paths, etc.)?
Third question: In the problem details I also see: 'Name resolution problem found by the indexer'. I guess this is why the error messages are so cryptic. Are those messages coming from the MinGW g++ compiler or from Eclipse? What is this name resolution? How do I fix it?
I appreciate your help.
EDIT (in reply to @Eugene): Thank you, Eugene. I've opened a bug on Eclipse. I think that C++11 is only partially to blame. I've removed the C++11 features from my code and removed the -std=c++11 flag from both compiler settings. And yet, CodAn still complains about the remove_if line:
int pred( int n ) { return n < 5; }
int main( int argc, char* argv[] )
{
vector<int> v;
for( int i=0; i<=7; ++i ) {
v.push_back( i );
}
vector<int>::iterator mark = remove_if( v.begin(), v.end(), pred );
v.erase( mark, v.end() );
for( vector<int>::iterator i = v.begin(); i != v.end(); ++i ) {
cout << *i << ", ";
}
cout << endl;
}
The code compiles just fine (with Ctrl-B), but CodAn doesn't like the remove_if line, saying: Invalid Arguments, Candidates are '#0 remove_if(#0,#0,#1)'.
This is a very cryptic message. It appears that it fails to substitute the arguments into the format string (#0 for 'iterator' and #1 for 'predicate'). I'm going to update the bug.
Interestingly, using 'list' instead of 'vector' clears up the error.
However, as for my question, I'm curious about how CodAn works. Does it use g++ (with a customized set of flags) or another external tool (lint?), or does it do the analysis internally in Java? If there is a tool, how can I get its command-line arguments and its output?
Build/Settings: these flags are included in your makefile and used for the actual build.
Build/Discovery: these flags are passed to the compiler when the IDE discovers its "scanner settings". The IDE runs the compiler in a special mode to discover the values of the predefined macros, include paths, etc.
I believe the problems you are seeing are detected by Codan. Codan is the static analysis tool built into the CDT editor; you can find its settings under "C/C++ General" / "Code Analysis". You should report the problem at bugs.eclipse.org if you feel the errors shown are bogus. Note that CDT does not yet support all C++11 features.

NSLog style debug messages from C code

I have some C code in a static library that I'm compiling into an iPhone app. I'd like to print some debug messages to the console; is there something like NSLog I should use? I'm guessing NSLog only works for the Objective-C parts of the program.
EDIT: fprintf(stdout, fmt...) and fprintf(stderr, fmt...) don't work either.. any idea why they don't? Should they work?
You can always do the classic:
fprintf(stderr, "hi, this is a log line: %s", aStringVariable);
You can make a wrapper for NSLog if you mix Objective-C code like this:
log.h
void debug(const char *message, ...) __attribute__((format(printf, 1, 2)));
log.m
#import <Foundation/Foundation.h>
#import "log.h"
void debug(const char *message,...)
{
va_list args;
va_start(args, message);
NSLog(@"%@",[[NSString alloc] initWithFormat:[NSString stringWithUTF8String:message] arguments:args]);
va_end(args);
}
and then, in your C file:
#include "log.h"
...
debug("hello world! variable: %d", num);
Another solution (the extern "C" declaration assumes the file is compiled as C++ or Objective-C++):
#include <CoreFoundation/CoreFoundation.h>
extern "C" void NSLog(CFStringRef format, ...);
#define MyLog(fmt, ...) \
{ \
NSLog(CFSTR(fmt), ##__VA_ARGS__); \
}
MyLog("val = %d, str = %s", 123, "abc");
While printf output will show up if you're debugging from Xcode, it won't show up in the Organizer Console. You can use what NSLog uses: CFLog or syslog.
#include <asl.h>
...
asl_log(NULL, NULL, ASL_LEVEL_ERR, "Hi There!");
Note that lower priority levels such as ASL_LEVEL_INFO may not show up in console.
You should be able to see printf or fprintf output. Try running some of the library code from the terminal; something should appear there. If not, a much more complicated (and even silly/stupid/wrong) method that I guarantee will work is:
char command[512];
snprintf(command, sizeof command, "echo 'This is a log line %s'", someString);
system(command);
If that doesn't work then you probably have something weird in your code.
Note: This will probably only work in the simulator.
You probably need to use the Apple System Log facility to get the output into the device console.
Check the functions declared in asl.h (/usr/include/asl.h):
asl_open
asl_new
asl_set
asl_log
asl_free
While printf output will show up if you're debugging from Xcode, it won't show up in the Organizer Console. You can call the following to print to the device's console:
syslog(LOG_WARNING, "log string");
You will also need to #include <sys/syslog.h> so that syslog and LOG_WARNING are explicitly declared.
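If you want printf-style formatting on top of syslog, a small variadic wrapper is one way to do it. The following is a minimal sketch, assuming a POSIX <syslog.h>; the name debug_log and the 512-byte buffer are made up for illustration and are not from any of the answers above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical helper: formats like printf and sends the result to syslog.
   Returns what vsnprintf reports: the length of the full formatted message,
   or a negative value on a formatting error. */
int debug_log(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

int debug_log(const char *fmt, ...)
{
    char buf[512];   /* longer messages are truncated to fit */
    va_list args;

    va_start(args, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);

    if (n >= 0)
        syslog(LOG_WARNING, "%s", buf);   /* text goes in as data, never as the format */
    return n;
}
```

From C code this would be called as debug_log("val = %d, str = %s", 123, "abc"). Passing the formatted text through "%s" avoids treating log content as a format string.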

Conditional Compilation in assembler (.s) code for iPhone - how?

I have a few lines of ARM assembler code in a .s file, just a few routines I need to call. It works fine when building for the device, but when I switch to the iPhone Simulator I get "no such instruction" errors. I tried to compile parts of the .s file conditionally with what I know:
#if !TARGET_IPHONE_SIMULATOR
But the assembler doesn't recognize these preprocessor directives (of course), and none of the conditional compilation techniques for assembler that I could remember or find worked, so I'm scratching my head over how to avoid compiling that assembler code when building for the Simulator. I also don't see a project option in Xcode that would let me compile the file or not depending on the target platform.
SOLVED:
All I was missing was the proper #import in the assembler file. I did not think of adding it because Xcode syntax-highlighted every preprocessor directive in green (as a comment), which made me assume these directives were not recognized, when in fact they work just fine.
This works:
#import "TargetConditionals.h"
#if !TARGET_IPHONE_SIMULATOR
... asm code here ...
#endif
You can do it with a preprocessor macro. These macros are defined in TargetConditionals.h; TARGET_IPHONE_SIMULATOR should be there. (You do need to #include the header, however.)
Here is code I use to detect ARM vs Thumb vs Simulator:
#include "TargetConditionals.h"
#if defined(__arm__)
# if defined(__thumb__)
# define COMPILE_ARM_THUMB_ASM 1
# else
# define COMPILE_ARM_ASM 1
# endif
#endif
#if TARGET_IPHONE_SIMULATOR
// Simulator defines
#else
// ARM or Thumb mode defines
#endif
// And here is how you might use it
uint32_t
test_compare_shifted_operand(uint32_t w1) {
uint32_t local;
#if defined(COMPILE_ARM_ASM)
const uint32_t shifted = (1 << 8);
__asm__ __volatile__ (
"mov %[w2], #1\n\t"
"cmp %[w2], %[w1], lsr #8\n\t"
"moveq %[w2], #10\n\t"
"movne %[w2], #11\n\t"
: \
[w1] "+l" (w1),
[w2] "+l" (local)
: \
[shifted] "l" (shifted)
);
#else // COMPILE_ARM_ASM
if ((w1 >> 8) == 1) {
local = 10;
} else {
local = 11;
}
#endif // COMPILE_ARM_ASM
return local;
}
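If the same source also has to build off Apple platforms, where TargetConditionals.h does not exist, the include itself can be guarded. The sketch below is only an illustration of that idea; the function name build_target_name and the returned strings are hypothetical.

```c
/* TargetConditionals.h only exists in Apple SDKs, so guard the include. */
#if defined(__APPLE__)
#include <TargetConditionals.h>   /* defines TARGET_IPHONE_SIMULATOR as 0 or 1 */
#endif

/* Hypothetical helper: reports which branch the preprocessor selected. */
const char *build_target_name(void)
{
#if defined(TARGET_IPHONE_SIMULATOR) && TARGET_IPHONE_SIMULATOR
    return "simulator";
#elif defined(__arm__) && defined(__thumb__)
    return "arm-thumb";
#elif defined(__arm__)
    return "arm";
#else
    return "other";
#endif
}
```

Checking defined(TARGET_IPHONE_SIMULATOR) first keeps the test quiet under -Wundef when the header was not included.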

ncurses and stdin blocking

I have stdin in a select() set and I want to take a string from stdin whenever the user types it and hits Enter.
But select is triggering stdin as ready to read before Enter is hit, and, in rare cases, before anything is typed at all. This hangs my program on getstr() until I hit Enter.
I tried setting nocbreak() and it's perfect really except that nothing gets echoed to the screen so I can't see what I'm typing. And setting echo() doesn't change that.
I also tried using timeout(0), but the results were even crazier and it didn't work.
What you need to do is check whether a character is available with the getch() function. If you use it in no-delay mode, the call will not block. Then you need to consume characters until you encounter a '\n', appending each char to the resulting string as you go.
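The accumulation step described above can be sketched as a small line buffer that is fed one character at a time. This is a minimal sketch: the names line_buffer, line_buffer_feed and LINE_MAX_LEN are made up for illustration, and the ncurses side (nodelay() and getch()) is shown only in the comment so the block stays independent of the library.

```c
#include <stddef.h>

#define LINE_MAX_LEN 256   /* hypothetical cap; extra characters are dropped */

struct line_buffer {
    char   data[LINE_MAX_LEN];
    size_t len;
};

/* Feed one character. Returns 1 when '\n' completes a line, in which case
   lb->data holds the NUL-terminated line (without the newline) and the
   buffer is reset for the next line. Returns 0 otherwise. */
int line_buffer_feed(struct line_buffer *lb, int ch)
{
    if (ch == '\n') {
        lb->data[lb->len] = '\0';
        lb->len = 0;
        return 1;
    }
    if (lb->len + 1 < LINE_MAX_LEN)   /* always leave room for the NUL */
        lb->data[lb->len++] = (char)ch;
    return 0;
}

/* Intended use inside the select() loop, after nodelay(stdscr, TRUE):
 *
 *     int ch;
 *     while ((ch = getch()) != ERR)          // ERR means no input pending
 *         if (line_buffer_feed(&lb, ch))
 *             handle_line(lb.data);          // handle_line is hypothetical
 */
```

Because getch() returns ERR immediately when nothing is pending, the inner loop drains whatever has been typed so far and then hands control back to select().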
Alternatively, and this is the method I use, you can use the GNU readline library. It supports non-blocking behavior, but that part of its documentation is not very good.
Included here is a small example that you can use. It has a select loop, and uses the GNU readline library:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strlen */
#include <stdbool.h>
#include <sys/select.h> /* select, struct timeval */
#include <readline/readline.h>
#include <readline/history.h>
int quit = false;
void rl_cb(char* line)
{
if (NULL==line) {
quit = true;
return;
}
if(strlen(line) > 0) add_history(line);
printf("You typed:\n%s\n", line);
free(line);
}
int main()
{
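const char *prompt = "# ";
rl_callback_handler_install(prompt, (rl_vcpfunc_t*) &rl_cb);
while (!quit) {
fd_set fds;
struct timeval to = { 0, 10000 }; /* reset each pass: select() may modify it */
FD_ZERO(&fds);
FD_SET(0, &fds); /* watch stdin */
/* Only call readline when stdin has input, so the loop never blocks. */
if (select(1, &fds, NULL, NULL, &to) > 0)
rl_callback_read_char();
}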
rl_callback_handler_remove();
return 0;
}
Compile with:
gcc -Wall rl.c -lreadline