How to make Tcl_WriteChars support Unicode?

Is there any initial setup needed to make Tcl_WriteChars output UTF-8 characters correctly? e.g.
#include <tcl.h>
int main()
{
    Tcl_Interp *tcl = Tcl_CreateInterp();
    Tcl_Channel channel = Tcl_GetStdChannel(TCL_STDOUT);
    Tcl_WriteChars(channel, "hello\n", -1);
    Tcl_WriteChars(channel, "你好\n", -1);
    Tcl_WriteRaw(channel, "你好\n", -1);
    Tcl_Close(tcl, channel);
    Tcl_DeleteInterp(tcl);
    return 0;
}
The source code is saved in UTF-8 encoding, and the following output is from Linux under a UTF-8 locale:
hello
??
你好

You need to configure the encoding to be UTF-8 (and the host you're running on appears to be using something else for its default). Do this before you write to the channel.
Tcl_SetChannelOption(interp, channel, "-encoding", "utf-8");
Properly, you should check the return code of that (as below), but all channels have that option and the utf-8 encoding is baked directly into Tcl, so it won't fail.
if (Tcl_SetChannelOption(interp, channel, "-encoding", "utf-8") != TCL_OK) {
    return TCL_ERROR;
}
[EDIT]: Having re-read the code a little more carefully (and found out that the system's default encoding is really UTF-8 in the first place), the actual problem is that you're not calling Tcl_FindExecutable(). That routine is a bit mis-named, as what it actually does (apart from making info nameofexecutable work inside scripts) is let Tcl initialise its internal library. In particular, it initialises the encoding management subsystem, and that's the point where it works out what the system encoding really is (otherwise it falls back to iso8859-1, which is the least problematic ordinary encoding to recover from).
Your code should read:
#include <tcl.h>
int main(int argc, char *argv[]) /// <<<< CHANGED HERE
{
    Tcl_FindExecutable(argv[0]); /// <<<< CHANGED HERE
    Tcl_Interp *tcl = Tcl_CreateInterp();
    Tcl_Channel channel = Tcl_GetStdChannel(TCL_STDOUT);
    Tcl_WriteChars(channel, "hello\n", -1);
    Tcl_WriteChars(channel, "你好\n", -1);
    Tcl_WriteRaw(channel, "你好\n", -1);
    Tcl_Close(tcl, channel);
    Tcl_DeleteInterp(tcl);
    return 0;
}
I'm assuming you're using a compiler that is happy with putting declarations after statements. That's a widely-implemented C99 feature (and is also in C++) so I expect it will be fine.
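If you do need to stay within strict C89 (where declarations must precede statements in a block), the same program can be written with the declarations hoisted to the top. A minimal sketch, behaviourally identical to the code above:

#include <tcl.h>
int main(int argc, char *argv[])
{
    Tcl_Interp *tcl;      /* declarations first, for strict C89 compilers */
    Tcl_Channel channel;

    Tcl_FindExecutable(argv[0]);
    tcl = Tcl_CreateInterp();
    channel = Tcl_GetStdChannel(TCL_STDOUT);
    Tcl_WriteChars(channel, "hello\n", -1);
    Tcl_WriteChars(channel, "你好\n", -1);
    Tcl_WriteRaw(channel, "你好\n", -1);
    Tcl_Close(tcl, channel);
    Tcl_DeleteInterp(tcl);
    return 0;
}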

Related

SWIG wrong encoded string crashes Python

I have a problem where all my SWIG wrappers that deal with strings crash if I pass a wrongly encoded string inside a std::string, i.e. strings that contain è, é and so on: characters valid for the current locale, but not valid UTF-8.
On my side, I have solved it by parsing the input as wide strings and converting them to UTF-8, but I would like to catch this kind of error with an exception rather than a crash. Isn't PyUnicode_Check supposed to fail with those strings?
SWIG actually crashes in SWIG_AsCharPtrAndSize() when calling PyString_AsStringAndSize(); this is the SWIG-generated code:
SWIGINTERN int
SWIG_AsCharPtrAndSize(PyObject *obj, char** cptr, size_t* psize, int *alloc)
{
#if PY_VERSION_HEX>=0x03000000
#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
  if (PyBytes_Check(obj))
#else
  if (PyUnicode_Check(obj))
#endif
#else
  if (PyString_Check(obj))
#endif
  {
    char *cstr; Py_ssize_t len;
#if PY_VERSION_HEX>=0x03000000
#if !defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
    if (!alloc && cptr) {
      /* We can't allow converting without allocation, since the internal
         representation of string in Python 3 is UCS-2/UCS-4 but we require
         a UTF-8 representation.
         TODO(bhy) More detailed explanation */
      return SWIG_RuntimeError;
    }
    obj = PyUnicode_AsUTF8String(obj);
    if(alloc) *alloc = SWIG_NEWOBJ;
#endif
    PyBytes_AsStringAndSize(obj, &cstr, &len);
#else
    PyString_AsStringAndSize(obj, &cstr, &len);
#endif
    if (cptr) {
The crash happens in the last PyString_AsStringAndSize call visible above (the snippet is truncated just after it).
Note that the strings are passed as std::string, but it happens with const char* as well, with no difference at all.
Thanks in advance!
Cannot reproduce. Edit your question and add a Minimal, Complete, Verifiable Example if the example below doesn't solve your issue and you need further help:
test.i
%module test
%include <std_string.i>
%inline %{
#include <string>
std::string func(std::string s)
{
    return '[' + s + ']';
}
%}
Demo:
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> test.func('ábc')
'[ábc]'
The problem was with the 3.3.0 version we were still using; updating to 3.3.7 solved it. The Python release notes list several bug fixes regarding PyUnicode_Check.
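For what it's worth, note that the generated code shown above never checks the result of PyUnicode_AsUTF8String(), which returns NULL (with a UnicodeEncodeError set) when the conversion fails, e.g. on lone surrogates left over from locale decoding; the following call then dereferences NULL. A defensive conversion helper along these lines would surface a Python exception instead of a crash. This is a sketch against the CPython 3 C API, not SWIG's actual fix, and the helper name is illustrative:

#include <Python.h>

/* Sketch: convert a unicode object to UTF-8 defensively. On success
   returns 0 and sets *cstr/*len; *owner must be Py_DECREF'd by the
   caller once *cstr is no longer needed. On failure returns -1 with
   a Python exception set instead of crashing. */
static int as_utf8(PyObject *obj, char **cstr, Py_ssize_t *len, PyObject **owner)
{
    PyObject *bytes = PyUnicode_AsUTF8String(obj);
    if (bytes == NULL)
        return -1;                /* UnicodeEncodeError already set */
    if (PyBytes_AsStringAndSize(bytes, cstr, len) < 0) {
        Py_DECREF(bytes);
        return -1;
    }
    *owner = bytes;               /* owns the buffer behind *cstr */
    return 0;
}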

How to encode Chinese text in QR barcodes generated with iTextSharp?

I'm trying to draw QR barcodes in a PDF file using iTextSharp. If I'm using English text the barcodes are fine, they are decoded properly, but if I'm using Chinese text, the barcode is decoded as question marks. For example this character '测' (\u6D4B) is decoded as '?'. I tried all supported character sets, but none of them helped.
What combination of parameters should I use for the QR barcode in iTextSharp in order to encode correctly Chinese text?
iText and iTextSharp apparently don't natively support this but you can write some code to handle this on your own. The trick is to get the QR code parser to work with just an arbitrary byte array instead of a string. What's really nice is that the iTextSharp code is almost ready for this but doesn't expose the functionality. Unfortunately many of the required classes are sealed so you can't just subclass them, you'll have to recreate them. You can either download the entire source and add these changes or just create separate classes with the same names. (Please check over the license to make sure you are allowed to do this.) My changes below don't have any error correction so make sure you do that, too.
The first class that you'll need to recreate is iTextSharp.text.pdf.qrcode.BlockPair and the only change you'll need to make is to make the constructor public instead of internal. (You only need to do this if you are creating your own code and not modifying the existing code.)
The second class is iTextSharp.text.pdf.qrcode.Encoder. This is where we'll make the most changes. Add an overload to Append8BitBytes that looks like this:
static void Append8BitBytes(byte[] bytes, BitVector bits) {
    for (int i = 0; i < bytes.Length; ++i) {
        bits.AppendBits(bytes[i], 8);
    }
}
The string version of this method converts text to a byte array and then uses the above, so we're just cutting out the middle man. Next, add a new overload of the Encode method that takes in a byte array instead of a string. We'll then just cut out the string-detection part and force the system to byte mode; otherwise the code below is pretty much the same.
public static void Encode(byte[] bytes, ErrorCorrectionLevel ecLevel, IDictionary<EncodeHintType, Object> hints, QRCode qrCode) {
    String encoding = DEFAULT_BYTE_MODE_ENCODING;
    // Step 1: Choose the mode (encoding).
    Mode mode = Mode.BYTE;
    // Step 2: Append "bytes" into "dataBits" in appropriate encoding.
    BitVector dataBits = new BitVector();
    Append8BitBytes(bytes, dataBits);
    // Step 3: Initialize QR code that can contain "dataBits".
    int numInputBytes = dataBits.SizeInBytes();
    InitQRCode(numInputBytes, ecLevel, mode, qrCode);
    // Step 4: Build another bit vector that contains header and data.
    BitVector headerAndDataBits = new BitVector();
    // Step 4.5: Append ECI message if applicable
    if (mode == Mode.BYTE && !DEFAULT_BYTE_MODE_ENCODING.Equals(encoding)) {
        CharacterSetECI eci = CharacterSetECI.GetCharacterSetECIByName(encoding);
        if (eci != null) {
            AppendECI(eci, headerAndDataBits);
        }
    }
    AppendModeInfo(mode, headerAndDataBits);
    int numLetters = dataBits.SizeInBytes();
    AppendLengthInfo(numLetters, qrCode.GetVersion(), mode, headerAndDataBits);
    headerAndDataBits.AppendBitVector(dataBits);
    // Step 5: Terminate the bits properly.
    TerminateBits(qrCode.GetNumDataBytes(), headerAndDataBits);
    // Step 6: Interleave data bits with error correction code.
    BitVector finalBits = new BitVector();
    InterleaveWithECBytes(headerAndDataBits, qrCode.GetNumTotalBytes(), qrCode.GetNumDataBytes(),
        qrCode.GetNumRSBlocks(), finalBits);
    // Step 7: Choose the mask pattern and set to "qrCode".
    ByteMatrix matrix = new ByteMatrix(qrCode.GetMatrixWidth(), qrCode.GetMatrixWidth());
    qrCode.SetMaskPattern(ChooseMaskPattern(finalBits, qrCode.GetECLevel(), qrCode.GetVersion(),
        matrix));
    // Step 8. Build the matrix and set it to "qrCode".
    MatrixUtil.BuildMatrix(finalBits, qrCode.GetECLevel(), qrCode.GetVersion(),
        qrCode.GetMaskPattern(), matrix);
    qrCode.SetMatrix(matrix);
    // Step 9. Make sure we have a valid QR Code.
    if (!qrCode.IsValid()) {
        throw new WriterException("Invalid QR code: " + qrCode.ToString());
    }
}
The third class is iTextSharp.text.pdf.qrcode.QRCodeWriter, and once again we just need to add an overloaded Encode method that accepts a byte array and calls the new Encode overload created above:
public ByteMatrix Encode(byte[] bytes, int width, int height, IDictionary<EncodeHintType, Object> hints) {
    ErrorCorrectionLevel errorCorrectionLevel = ErrorCorrectionLevel.L;
    if (hints != null && hints.ContainsKey(EncodeHintType.ERROR_CORRECTION))
        errorCorrectionLevel = (ErrorCorrectionLevel)hints[EncodeHintType.ERROR_CORRECTION];
    QRCode code = new QRCode();
    Encoder.Encode(bytes, errorCorrectionLevel, hints, code);
    return RenderResult(code, width, height);
}
The last class is iTextSharp.text.pdf.BarcodeQRCode, to which we once again add our new constructor overload:
public BarcodeQRCode(byte[] bytes, int width, int height, IDictionary<EncodeHintType, Object> hints) {
    newCode.QRCodeWriter qc = new newCode.QRCodeWriter();
    bm = qc.Encode(bytes, width, height, hints);
}
The last trick is to make sure when calling this that you include the byte order mark (BOM) so that decoders know to decode this properly, in this case UTF-8.
// Create an encoder that supports outputting a BOM
System.Text.Encoding enc = new System.Text.UTF8Encoding(true, true);
// Get the BOM
byte[] bom = enc.GetPreamble();
// Get the raw bytes for the string
byte[] bytes = enc.GetBytes("测");
// Combine the byte arrays
byte[] final = new byte[bom.Length + bytes.Length];
System.Buffer.BlockCopy(bom, 0, final, 0, bom.Length);
System.Buffer.BlockCopy(bytes, 0, final, bom.Length, bytes.Length);
// Create a barcode using our new constructor
var q = new BarcodeQRCode(final, 100, 100, null);
// Add it to the document
doc.Add(q.GetImage());
Looks like you may be out of luck. I tried too and got the same results as you did. Then looked at the Java API:
"*CHARACTER_SET the values are strings and can be Cp437, Shift_JIS and
ISO-8859-1 to ISO-8859-16. The default value is ISO-8859-1.*"
Lastly, I looked at the iTextSharp BarcodeQRCode class source code to confirm that only those character sets are supported. I'm by no means an authority on Unicode or encoding, but according to ISO/IEC 8859, the character sets above won't work for Chinese.
Essentially the same trick that Chris has done in his answer could be implemented by specifying UTF-8 charset in barcode hints.
var hints = new Dictionary<EncodeHintType, Object>() {{EncodeHintType.CHARACTER_SET, "UTF-8"}};
var q = new BarcodeQRCode("\u6D4B", 100, 100, hints);
If you want to be safer, you can start your string with the BOM character '\uFEFF', like Chris suggested, so it would be "\uFEFF\u6D4B".
UTF-8 is unfortunately not supported by the QR code specification, and there are a lot of discussions on this subject, but the fact is that most QR code readers will correctly read the code created by this method.

c gtk+: loading a text file into a GtkSourceView's TextBuffer

I'm writing a program in C using GTK+ and gtksourceview-2.0.
I'm using a GtkFileChooser for the user to choose a file, and when it is activated, I want the contents to be loaded into the GtkSourceView's TextBuffer.
This is the function that gets executed when a user double-clicks a file in the GtkFileChooser:
void on_file_activated(GtkWidget *widget, gpointer data) {
    GFile *file;
    FILE *fp;
    gchar *path_name;
    long file_size;
    gchararray file_buffer;
    file = gtk_file_chooser_get_file(GTK_FILE_CHOOSER(widget));
    path_name=g_file_get_path(file);
    g_debug("%s is chosen\n", path_name);
    fp=fopen(path_name, "r");
    g_assert( fp != NULL);
    fseek(fp, 0L, SEEK_END);
    file_size = ftell(fp);
    rewind(fp);
    g_debug("file size: %ld\n",file_size*sizeof(gchar));
    file_buffer=calloc(file_size, sizeof(gchar));
    g_assert(file_buffer != NULL);
    fread(&file_buffer,file_size,1,fp);
    g_debug("after fread");
    //file_buffer[file_size*sizeof(gchar)]=0;
    //g_debug("after adding zero: %s",file_buffer);
    gtk_text_buffer_set_text (textbuffer, file_buffer,2);
    g_debug("after set text");
    g_object_unref(file);
}
This is the output of my application:
** (tour_de_gtk:18107): DEBUG: /home/ufk/Projects/gtk-projects/tour-de-gtk/Debug/src/examples/example_gtk_label/main.c is chosen
** (tour_de_gtk:18107): DEBUG: file size: 16
** (tour_de_gtk:18107): DEBUG: after fread
Then I get a segmentation fault on the gtk_text_buffer_set_text call.
As you can see, I have two commands commented out: trying to g_debug the buffer, which obviously causes a segmentation fault because I didn't add a zero to the end of the string; but even when I try to add a zero to the end of the string, I get a segmentation fault. I probably did something wrong.
Here I'm trying to write only the first two characters of the buffer, but with no luck.
Any ideas?
Update: the finished function:
void on_file_activated(GtkWidget *widget, gpointer data) {
    GFile *file;
    gchar *path_name;
    gchar *file_buffer;
    GError *error = NULL;  /* must be initialised to NULL for g_file_get_contents() */
    gboolean read_file_status;

    file = gtk_file_chooser_get_file(GTK_FILE_CHOOSER(widget));
    path_name = g_file_get_path(file);
    g_debug("%s is chosen\n", path_name);
    read_file_status = g_file_get_contents(path_name, &file_buffer, NULL, &error);
    if (read_file_status == FALSE) {
        g_error("error opening file: %s\n", error && error->message ? error->message : "No Detail");
        return;
    }
    gtk_text_buffer_set_text(textbuffer, file_buffer, -1);
    g_free(file_buffer);  /* g_file_get_contents() transfers ownership of the buffer */
    g_debug("after set text");
    g_object_unref(file);
}
There are a lot of possible improvements here; you may already know many and just be messing around, but I'll list several just in case.
gchararray file_buffer;
Just use char*
g_assert( fp != NULL);
Should use assert for programming errors, not runtime errors, so here g_printerr() or a dialog would be better
fseek(fp, 0L, SEEK_END);
file_size = ftell(fp);
rewind(fp);
fstat(fileno(fp), &statbuf) is probably a better way to do this, but the whole approach is kind of bad; rather than get the size, it's better to just read into a dynamically-growing buffer. Or if you're willing to preallocate the whole buffer, just use g_file_get_contents(). Another approach is g_file_query_info() (which is more portable and uses the vfs)
file_buffer=calloc(file_size, sizeof(gchar));
g_new0(char, file_size) is nicer, or g_malloc0(file_size). Also you need file_size+1 to make room for the nul byte.
fread(&file_buffer,file_size,1,fp);
Here you want file_buffer (a char*) rather than &file_buffer (a char**). This is probably the actual cause of the immediate breakage.
You also need to check the return value of fread().
Also missing here is g_utf8_validate() on the data read in.
Have a look at the implementation of g_file_get_contents() to see one approach here. You could also use g_file_load_contents to use a GFile instead of a path (portable, uses vfs) or better yet in a real-world app, g_file_load_contents_async().
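Putting those fixes together, here is a minimal sketch of the corrected manual-read path inside on_file_activated (error handling kept short; in a real app prefer g_file_get_contents() or g_file_load_contents_async() as noted above):

FILE *fp = fopen(path_name, "r");
if (fp == NULL) {
    g_printerr("could not open %s\n", path_name);
    return;
}
fseek(fp, 0L, SEEK_END);
long file_size = ftell(fp);
rewind(fp);
gchar *file_buffer = g_malloc0(file_size + 1);       /* +1 for the nul byte */
size_t nread = fread(file_buffer, 1, file_size, fp); /* file_buffer, not &file_buffer */
fclose(fp);
file_buffer[nread] = '\0';
if (g_utf8_validate(file_buffer, nread, NULL))       /* GtkTextBuffer requires UTF-8 */
    gtk_text_buffer_set_text(textbuffer, file_buffer, -1);
g_free(file_buffer);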
To debug segfaults, the two best tools are:
run in gdb, wait for crash, then type "bt"; be sure to use -g with your compiler when you compile
run in valgrind, see where it says you look at bad memory

What's the CFString Equiv of NSString's UTF8String?

I'm stuck on stoopid today as I can't convert a simple piece of ObjC code to its Cpp equivalent. I have this:
const UInt8 *myBuffer = [(NSString*)aRequest UTF8String];
And I'm trying to replace it with this:
const UInt8 *myBuffer = (const UInt8 *)CFStringGetCStringPtr(aRequest, kCFStringEncodingUTF8);
This is all in a tight unit test that writes an example HTTP request over a socket with CFNetwork APIs. I have working ObjC code that I'm trying to port to C++. I'm gradually replacing NS API calls with their toll-free bridged equivalents. Everything has been one-for-one so far until this last line, which is the last piece that needs to be completed.
This is one of those things where Cocoa does all the messy stuff behind the scenes, and you never really appreciate just how complicated things can be until you have to roll up your sleeves and do it yourself.
The simple answer for why it's not 'simple' is because NSString (and CFString) deal with all the complicated details of handling multiple character sets, Unicode, etc., while presenting a simple, uniform API for manipulating strings. It's object orientation at its best: the details of how (NS|CF)String deals with strings that have different string encodings (UTF8, MacRoman, UTF16, ISO 2022 Japanese, etc.) are a private implementation detail. It all 'just works'.
It helps to understand how [@"..." UTF8String] works. This is a private implementation detail, so this isn't gospel, but it's based on observed behavior. When you send a string a UTF8String message, the string does something approximating the following (not actually tested, so consider it pseudo-code; there are actually simpler ways to do the exact same thing, so this is overly verbose):
- (const char *)UTF8String
{
    NSUInteger utf8Length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    NSMutableData *utf8Data = [NSMutableData dataWithLength:utf8Length + 1UL];
    char *utf8Bytes = [utf8Data mutableBytes];
    [self getBytes:utf8Bytes
         maxLength:utf8Length
        usedLength:NULL
          encoding:NSUTF8StringEncoding
           options:0UL
             range:NSMakeRange(0UL, [self length])
    remainingRange:NULL];
    return(utf8Bytes);
}
You don't have to worry about the memory management issues of dealing with the buffer that -UTF8String returns because the NSMutableData is autoreleased.
A string object is free to keep the contents of the string in whatever form it wants, so there's no guarantee that its internal representation is the one that would be most convenient for your needs (in this case, UTF8). If you're using just plain C, you're going to have to deal with managing some memory to hold any string conversions that might be required. What was once a simple -UTF8String method call is now much, much more complicated.
Most of NSString is actually implemented in/with CoreFoundation / CFString, so there's obviously a path from a CFStringRef -> -UTF8String. It's just not as neat and simple as NSString's -UTF8String. Most of the complication is with memory management. Here's how I've tackled it in the past:
void someFunction(void) {
    CFStringRef cfString; // Assumes 'cfString' points to a (NS|CF)String.
    const char *useUTF8StringPtr = NULL;
    UInt8 *freeUTF8StringPtr = NULL;
    CFIndex stringLength = CFStringGetLength(cfString), usedBytes = 0L;

    if ((useUTF8StringPtr = CFStringGetCStringPtr(cfString, kCFStringEncodingUTF8)) == NULL) {
        if ((freeUTF8StringPtr = malloc(stringLength + 1L)) != NULL) {
            CFStringGetBytes(cfString, CFRangeMake(0L, stringLength), kCFStringEncodingUTF8, '?', false, freeUTF8StringPtr, stringLength, &usedBytes);
            freeUTF8StringPtr[usedBytes] = 0;
            useUTF8StringPtr = (const char *)freeUTF8StringPtr;
        }
    }

    long utf8Length = (long)((freeUTF8StringPtr != NULL) ? usedBytes : stringLength);
    if (useUTF8StringPtr != NULL) {
        // useUTF8StringPtr points to a NULL terminated UTF8 encoded string.
        // utf8Length contains the length of the UTF8 string.
        // ... do something with useUTF8StringPtr ...
    }
    if (freeUTF8StringPtr != NULL) { free(freeUTF8StringPtr); freeUTF8StringPtr = NULL; }
}
NOTE: I haven't tested this code, but it is modified from working code. So, aside from obvious errors, I believe it should work.
The above tries to get the pointer to the buffer that CFString uses to store the contents of the string. If CFString happens to have the string contents encoded in UTF8 (or a suitably compatible encoding, such as ASCII), then it's likely CFStringGetCStringPtr() will return non-NULL. This is obviously the best, and fastest, case. If it can't get that pointer for some reason, say if CFString has its contents encoded in UTF16, then it allocates a buffer with malloc() that is large enough to contain the entire string when it is transcoded to UTF8. Then, at the end of the function, it checks to see if memory was allocated and free()'s it if necessary.
And now for a few tips and tricks... CFString 'tends to' (and this is a private implementation detail, so it can and does change between releases) keep 'simple' strings encoded as MacRoman, which is an 8-bit wide encoding. MacRoman, like UTF8, is a superset of ASCII, such that all characters < 128 are equivalent to their ASCII counterparts (or, in other words, any character < 128 is ASCII). In MacRoman, characters >= 128 are 'special' characters. They all have Unicode equivalents, and tend to be things like extra currency symbols and 'extended western' characters. See Wikipedia - MacRoman for more info. But just because a CFString says it's MacRoman (CFString encoding value of kCFStringEncodingMacRoman, NSString encoding value of NSMacOSRomanStringEncoding) doesn't mean that it has characters >= 128 in it. If a kCFStringEncodingMacRoman encoded string returned by CFStringGetCStringPtr() is composed entirely of characters < 128, then it is exactly equivalent to its ASCII (kCFStringEncodingASCII) encoded representation, which is also exactly equivalent to the string's UTF8 (kCFStringEncodingUTF8) encoded representation.
Depending on your requirements, you may be able to 'get by' using kCFStringEncodingMacRoman instead of kCFStringEncodingUTF8 when calling CFStringGetCStringPtr(). Things 'may' (probably) be faster if you require strict UTF8 encoding for your strings: use kCFStringEncodingMacRoman, then check that the string returned by CFStringGetCStringPtr(string, kCFStringEncodingMacRoman) contains only characters that are < 128. If there are characters >= 128 in the string, then go the slow route by malloc()ing a buffer to hold the converted results. Example:
CFIndex stringLength = CFStringGetLength(cfString), usedBytes = 0L;
// Fast path: ask for the MacRoman pointer, per the discussion above.
useUTF8StringPtr = CFStringGetCStringPtr(cfString, kCFStringEncodingMacRoman);
// Only characters < 128 are ASCII, and therefore also valid UTF8.
for (CFIndex idx = 0L; (useUTF8StringPtr != NULL) && (useUTF8StringPtr[idx] != 0); idx++) {
    if (((unsigned char)useUTF8StringPtr[idx]) >= 128) { useUTF8StringPtr = NULL; }
}
if ((useUTF8StringPtr == NULL) && ((freeUTF8StringPtr = malloc(stringLength + 1L)) != NULL)) {
    CFStringGetBytes(cfString, CFRangeMake(0L, stringLength), kCFStringEncodingUTF8, '?', false, freeUTF8StringPtr, stringLength, &usedBytes);
    freeUTF8StringPtr[usedBytes] = 0;
    useUTF8StringPtr = (const char *)freeUTF8StringPtr;
}
Like I said, you don't really appreciate just how much work Cocoa does for you automatically until you have to do it all yourself. :)
In the sample code above, the following appears:
CFIndex stringLength = CFStringGetLength(cfString)
stringLength is then being used to malloc() a temporary buffer of that many bytes, plus 1.
But the header file for CFStringGetLength() expressly says it returns the number of 16-bit Unicode characters, not bytes. So if some of those Unicode characters are outside the ASCII range, the malloc() buffer won't be long enough to hold the UTF-8 conversion of the string.
Perhaps I'm missing something, but to be absolutely safe, the number of bytes needed to hold N arbitrary Unicode characters is at most 4*N, when they're all converted to UTF-8.
From the documentation:
Whether or not this function returns a valid pointer or NULL depends on many factors, all of which depend on how the string was created and its properties. In addition, the function result might change between different releases and on different platforms. So do not count on receiving a non-NULL result from this function under any circumstances.
You should use CFStringGetCString if CFStringGetCStringPtr returns NULL.
Here's some working code. I started with @johne's answer, replaced CFStringGetBytes with CFStringGetCString for simplicity, and made the correction suggested by @Doug.
const char *useUTF8StringPtr = NULL;
char *freeUTF8StringPtr = NULL;
if ((useUTF8StringPtr = CFStringGetCStringPtr(cfString, kCFStringEncodingUTF8)) == NULL)
{
    CFIndex stringLength = CFStringGetLength(cfString);
    CFIndex maxBytes = 4 * stringLength + 1;
    freeUTF8StringPtr = malloc(maxBytes);
    CFStringGetCString(cfString, freeUTF8StringPtr, maxBytes, kCFStringEncodingUTF8);
    useUTF8StringPtr = freeUTF8StringPtr;
}
// ... do something with useUTF8StringPtr...
if (freeUTF8StringPtr != NULL)
    free(freeUTF8StringPtr);
If it's destined for a socket, perhaps CFStringGetBytes() would be your best choice?
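In case it helps, here's a minimal sketch of that approach: it converts the string to UTF-8 in fixed-size chunks and writes each chunk to a socket, so no per-string malloc() is needed. The function name and the 'sockfd' parameter are illustrative, not part of any API, and real code should check write()'s return value:

#include <unistd.h>
#include <CoreFoundation/CoreFoundation.h>

// Sketch: stream a CFString to a socket as UTF-8 using a fixed buffer.
// Assumes 'cfString' and a connected socket 'sockfd' are provided.
static void writeStringToSocket(CFStringRef cfString, int sockfd) {
    UInt8 chunk[1024];
    CFIndex length = CFStringGetLength(cfString);
    CFIndex offset = 0;
    while (offset < length) {
        CFIndex usedBytes = 0;
        // Converts as many characters as fit in 'chunk'; returns the
        // number of characters (not bytes) actually converted.
        CFIndex converted = CFStringGetBytes(cfString,
            CFRangeMake(offset, length - offset),
            kCFStringEncodingUTF8, 0, false,
            chunk, sizeof(chunk), &usedBytes);
        if (converted == 0) break;   // conversion error
        write(sockfd, chunk, (size_t)usedBytes);
        offset += converted;
    }
}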
Also note that the documentation for CFStringGetCStringPtr() says:
This function either returns the requested pointer immediately, with no memory allocations and no copying, in constant time, or returns NULL. If the latter is the result, call an alternative function such as the CFStringGetCString function to extract the characters.
Here's a way to printf a CFStringRef; note that it prints the bytes with an explicit length ("%.*s"), so it never relies on getting a '\0'-terminated string from the CFStringRef:
// from: http://lists.apple.com/archives/carbon-development/2001/Aug/msg01367.html
// by Ali Ozer
// gcc -Wall -O3 -x objective-c -fobjc-exceptions -framework Foundation test.c
#import <stdio.h>
#import <Foundation/Foundation.h>
/*
This function will print the provided arguments (printf style varargs) out to the console.
Note that the CFString formatting function accepts "%@" as a way to display CF types.
For types other than CFString and CFNumber, the result of %@ is mostly for debugging
and can differ between releases and different platforms. Cocoa apps (or any app which
links with the Foundation framework) can use NSLog() to get this functionality.
*/
void show(CFStringRef formatString, ...) {
    CFStringRef resultString;
    CFDataRef data;
    va_list argList;
    va_start(argList, formatString);
    resultString = CFStringCreateWithFormatAndArguments(NULL, NULL, formatString, argList);
    va_end(argList);
    data = CFStringCreateExternalRepresentation(NULL, resultString,
        CFStringGetSystemEncoding(), '?');
    if (data != NULL) {
        printf("%.*s\n", (int)CFDataGetLength(data), CFDataGetBytePtr(data));
        CFRelease(data);
    }
    CFRelease(resultString);
}
int main(void)
{
    // To use:
    int age = 25;
    CFStringRef name = CFSTR("myname");
    show(CFSTR("Name is %@, age is %d"), name, age);
    return 0;
}

ncurses and stdin blocking

I have stdin in a select() set and I want to take a string from stdin whenever the user types it and hits Enter.
But select is triggering stdin as ready to read before Enter is hit, and, in rare cases, before anything is typed at all. This hangs my program on getstr() until I hit Enter.
I tried setting nocbreak() and it's perfect really except that nothing gets echoed to the screen so I can't see what I'm typing. And setting echo() doesn't change that.
I also tried using timeout(0), but the results of that were even crazier and didn't work.
What you need to do is to check whether a character is available with the getch() function. If you use it in no-delay mode the call will not block. Then you need to eat up the characters until you encounter a '\n', appending each char to the resulting string as you go; a sketch of that loop follows.
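Here is a minimal sketch of that approach; it assumes an ncurses program and keeps the buffer handling deliberately simple:

/* Sketch: poll for complete lines with non-blocking getch().
   Compile with: gcc poll_line.c -lncurses */
#include <curses.h>
#include <string.h>

int main(void)
{
    char line[256];
    size_t len = 0;
    int ch;

    initscr();
    cbreak();                 /* deliver keys immediately, no line buffering */
    nodelay(stdscr, TRUE);    /* make getch() return ERR instead of blocking */

    for (;;) {
        while ((ch = getch()) != ERR) {   /* drain whatever is available */
            if (ch == '\n') {
                line[len] = '\0';
                /* ... a complete line is now ready in 'line' ... */
                mvprintw(0, 0, "got: %s", line);
                len = 0;
            } else if (len < sizeof(line) - 1) {
                line[len++] = (char)ch;   /* keep room for the nul byte */
            }
        }
        /* ... do other work here, e.g. wait on select() ... */
        napms(10);            /* sleep 10 ms so we don't spin */
    }
    endwin();
    return 0;
}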
Alternatively, and this is the method I use, you can use the GNU readline library. It has support for non-blocking behavior, but the documentation about that part is not so excellent.
Included here is a small example that you can use. It has a select loop, and uses the GNU readline library:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <unistd.h>
#include <sys/select.h>
#include <readline/readline.h>
#include <readline/history.h>

bool quit = false;

void rl_cb(char *line)
{
    if (NULL == line) {              /* EOF (e.g. Ctrl-D) */
        quit = true;
        return;
    }
    if (strlen(line) > 0) add_history(line);
    printf("You typed:\n%s\n", line);
    free(line);
}

int main()
{
    fd_set fds;
    struct timeval to;
    const char *prompt = "# ";

    rl_callback_handler_install(prompt, rl_cb);
    while (!quit) {
        FD_ZERO(&fds);
        FD_SET(STDIN_FILENO, &fds);
        /* select() may modify the timeout, so reset it each iteration. */
        to.tv_sec = 0;
        to.tv_usec = 10000;
        /* Only hand control to readline when stdin is actually readable;
           otherwise rl_callback_read_char() would block. */
        if (select(STDIN_FILENO + 1, &fds, NULL, NULL, &to) > 0 &&
            FD_ISSET(STDIN_FILENO, &fds)) {
            rl_callback_read_char();
        }
    }
    rl_callback_handler_remove();
    return 0;
}
Compile with:
gcc -Wall rl.c -lreadline