Char string encoding differences between native C++ and C++/CLI? - unicode

I have a strange problem for which I believe there is a solution but I cannot find it. Your help would be appreciated.
On the one hand, I have a native C++ class named Native which has a static wchar_t array containing accented characters. This array is const and defined at build time.
/// Header file
class Native
{
public:
static const wchar_t* Array() { return mArray; } // a static member function cannot be const-qualified
private:
static const wchar_t *mArray;
};
//--------------------------------------------------------------
/// .cpp file
const wchar_t* Native::mArray = {L"This is a description éàçï"};
On the other hand, I have a C++/CLI class that uses the array like this:
/// C++/CLI use
System::String^ S1 = gcnew System::String( Native::Array() );
System::String^ S2 = gcnew System::String( L"This is a description éàçï" );
The problem is that while S2 gives This is a description éàçï as expected, S1 gives This is a description éà çï. I do not understand why passing a pointer to a static array will not give the same result as giving the same array directly???
I guess this is an encoding problem but I would have expected the same results for both S1 and S2. Do you know how to solve the problem? The way I must use it in my program is like S1 i.e. by accessing the build time static array with a static method that returns a const wchar_t*.
Thanks for your help!
EDIT 1
What is the best way to define literals at build time in C++ using Intel C++ 13.0 to make them directly usable in C++/CLI System::String constructor? This could be the ultimate question for my problem.

I don't have enough reputation to add a comment to ask this question, so I apologize for posting this as an answer if that seems inappropriate.
Could the problem be that your compiler defines wchar_t to be 8 bits? I'm basing that possibility on this answer:
Should I use wchar_t when using UTF-8?
To answer your question (in the comments) about building a UTF-16 array at build time, I believe you can force it to be UTF-16 by using u"..." for your literal instead of L"..." (see http://en.cppreference.com/w/cpp/language/string_literal)
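As a rough sketch of the u"..." suggestion, assuming the compiler accepts C++11 string literals: the accented characters can also be written as universal character names (\u00e9 and so on), so the result no longer depends on how the compiler decodes the source file. The names kDescription and description_length are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// UTF-16 literal; the \uXXXX escapes keep the source file pure ASCII,
// so the source-file encoding cannot change the stored code units.
// (kDescription is a hypothetical name used only in this sketch.)
static const char16_t kDescription[] =
    u"This is a description \u00e9\u00e0\u00e7\u00ef";

// Number of UTF-16 code units in the literal, excluding the terminator.
std::size_t description_length()
{
    return std::char_traits<char16_t>::length(kDescription);
}
```

The same escapes can be used inside an L"..." literal if the rest of the code has to stay with wchar_t.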
Edit 1:
For what it's worth, I tried your code (after fixing a couple compile errors) using Microsoft Visual Studio 10 and didn't have the same problem (both strings printed as expected).
I don't know if it will help you, but another possible way to statically initialize this wchar_t array is to use std::wstring to wrap your literal and then set your array to the c-string pointer returned by wstring::c_str(), shown as follows:
std::wstring ws(L"This is a description éàçï");
const wchar_t* Native::mArray = ws.c_str();
This edit was inspired by Dynamic wchar_t array (C++ beginner)


Doxygen : error variable seen as function

I have in a function the following variable :
97
98 UINT8 Reponse;
99 static UINT8 Initialisation = 0;
100 static DWORD StartTime = 0; //
Initialisation is also the name of one function :
void Initialisation(void)
When I click on the hyperlink on Initialisation at line 99, the block of the function void Initialisation(void) is opened.
Does any of you have an idea of what is happening?
Thank you for your help
Jean-Marie
See doxygen's Known Problems:
Not all names in code fragments that are included in the documentation are replaced by links (for instance when using SOURCE_BROWSER = YES) and links to overloaded members may point to the wrong member. This also holds for the "Referenced by" list that is generated for each function.
For a part this is because the code parser isn't smart enough at the moment. I'll try to improve this in the future. But even with these improvements not everything can be properly linked to the corresponding documentation, because of possible ambiguities or lack of information about the context in which the code fragment is found.
and
Doxygen does not work properly if there are multiple classes, structs or unions with the same name in your code. It should not crash however, rather it should ignore all of the classes with the same name except one.

What does `m_` variable prefix mean?

I often see m_ prefix used for variables (m_World,m_Sprites,...) in tutorials, examples and other code mainly related to game development.
Why do people add prefix m_ to variables?
This is a typical programming practice for naming member variables. When you use them later, you don't need to look up where they're defined to know their scope. It is also handy if you already know the scope and you're using something like IntelliSense: you can start typing m_ and a list of all your member variables is shown. It is part of Hungarian notation; see the part about scope in the examples here.
In Clean Code: A Handbook of Agile Software Craftsmanship there is an explicit recommendation against the usage of this prefix:
You also don't need to prefix member variables with m_ anymore. Your classes and functions should be small enough that you don't need them.
There is also an example (C# code) of this:
Bad practice:
public class Part
{
private String m_dsc; // The textual description
void SetName(string name)
{
m_dsc = name;
}
}
Good practice:
public class Part
{
private String description;
void SetDescription(string description)
{
this.description = description;
}
}
We can rely on language constructs to refer to member variables in cases of explicit ambiguity (i.e., description member and description parameter): this.
It is common practice in C++. This is because in C++ you can't have the same name for a member function and a member variable, and getter functions are often named without a "get" prefix.
class Person
{
public:
std::string name() const;
private:
std::string name; // This would lead to a compilation error.
std::string m_name; // OK.
};
main.cpp:9:19: error: duplicate member 'name'
std::string name;
^
main.cpp:6:19: note: previous declaration is here
std::string name() const;
^
1 error generated.
http://coliru.stacked-crooked.com/a/f38e7dbb047687ad
"m_" stands for "member". The prefix "_" is also common.
You shouldn't use it in programming languages that solve this problem by using different conventions/grammar.
The m_ prefix is often used for member variables - I think its main advantage is that it helps create a clear distinction between a public property and the private member variable backing it:
int m_something
public int Something => this.m_something;
It can help to have a consistent naming convention for backing variables, and the m_ prefix is one way of doing that - one that works in case-insensitive languages.
How useful this is depends on the languages and the tools that you're using. Modern IDEs with strong refactor tools and intellisense have less need for conventions like this, and it's certainly not the only way of doing this, but it's worth being aware of the practice in any case.
As stated in the other answers, m_ prefix is used to indicate that a variable is a class member. This is different from Hungarian notation because it doesn't indicate the type of the variable but its context.
I use m_ in C++ but not in some other languages where 'this' or 'self' is compulsory. I don't like to see 'this->' used with C++ because it clutters the code.
Another answer says m_dsc is "bad practice" and 'description;' is "good practice" but this is a red herring because the problem there is the abbreviation.
Another answer says typing this pops up IntelliSense but any good IDE will have a hotkey to pop up IntelliSense for the current class members.
Lockheed Martin uses a 3-prefix naming scheme which was wonderful to work with, especially when reading others' code.
Scope         Reference Type (*case-by-case)   Type
member    m   pointer    p                     integer  n
argument  a   reference  r                     short    n
local     l                                    float    f
                                               double   f
                                               boolean  b
So...
int A::methodCall(float af_Argument1, int* apn_Arg2)
{
lpn_Temp = apn_Arg2;
mpf_Oops = lpn_Temp; // Here I can see I made a mistake, I should not assign an int* to a float*
}
Take it for what it's worth.
As stated in many other responses, m_ is a prefix that denotes member variables. It is/was commonly used in the C++ world and propagated to other languages too, including Java.
In a modern IDE it is completely redundant as the syntax highlighting makes it evident which variables are local and which ones are members. However, by the time syntax highlighting appeared in the late 90s, the convention had been around for many years and was firmly set (at least in the C++ world).
I do not know which tutorials you are referring to, but I will guess that they are using the convention due to one of two factors:
They are C++ tutorials, written by people used to the m_ convention, and/or...
They write code in plain (monospaced) text, without syntax highlighting, so the m_ convention is useful to make the examples clearer.
Others have mentioned that it means a class member. Qt is a popular C++ framework that uses this notation, so a lot of C++ GUI tutorials use m_. You can see that almost all their examples use m_ for class members. Personally, I use m_ as it is shorter than this-> and feels compact.
To complete the current answers, and as the question is not language specific: some C projects use the prefix m_ for global variables that are specific to a file, and g_ for global variables whose scope is larger than the file they are defined in.
In this case global variables defined with prefix m_ should be defined as static.
See EDK2 (a UEFI Open-Source implementation) coding convention for an example of project using this convention.
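A minimal sketch of that file-scope convention, written so that it compiles as either C or C++. The file name and the identifiers g_verbose, m_call_count and next_id are invented for the illustration and are not taken from EDK2.

```cpp
/* counter.c (hypothetical file) */

/* g_ prefix: external linkage, other files can reach it
   with `extern int g_verbose;`. */
int g_verbose = 0;

/* m_ prefix: private to this file, hence declared static
   (internal linkage). */
static int m_call_count = 0;

/* Returns 1 on the first call, 2 on the second, and so on. */
int next_id(void)
{
    ++m_call_count;
    return m_call_count;
}
```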
One argument that I haven't seen yet is that a prefix such as m_ can be used to prevent name clashes with #define'd macros.
Regex search for #define [a-z][A-Za-z0-9_]*[^(] in /usr/include/term.h from curses/ncurses.
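To illustrate the kind of clash meant here, a small sketch: the macro below is defined locally as a stand-in for a lowercase macro that some included header might define, and the class Screen is purely hypothetical.

```cpp
// Stand-in for a lowercase macro coming from a third-party header
// (defined here only so the sketch is self-contained).
#define lines 24

class Screen
{
public:
    explicit Screen(int line_count) : m_lines(line_count) {}
    int lineCount() const { return m_lines; }
private:
    // A member named plain `lines` would be rewritten by the preprocessor
    // to `24` and fail to compile; the prefixed name is left alone.
    int m_lines;
};
```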

converting a string to Unicode in C

I have a string in a variable, and that string comes from the core part of the project. Now I want to convert it to a Unicode string. How can I do that? Adding L or _T() or TEXT() is not an option.
To make things clearer, please see below:
void foo(char* string) {
//Here the contents of the variable STRING should be converted to Unicode
//The soln should be possible to use in C code.
}
TIA
Naveen
L is used to create wchar_t literals.
From your comment about SafeArrayPutElement and the way you use the term 'Unicode' it's clear you're using Windows. Assuming that the char* string is in the legacy encoding Windows is using and not UTF-8 or something else (a safe assumption on Windows), you can get a wchar_t string in the following ways:
// typical Win32 conversion in C
// first call: ask for the required size in wide characters, including the terminator
int output_size = MultiByteToWideChar(CP_ACP,0,string,-1,NULL,0);
wchar_t *wstring = malloc(output_size * sizeof(wchar_t));
// second call: perform the actual conversion into the buffer
int size = MultiByteToWideChar(CP_ACP,0,string,-1,wstring,output_size);
assert(output_size==size);
// make use of wstring here
free(wstring);
If you're using C++ you might want to make that exception safe by using std::wstring instead (this uses a tiny bit of C++11 and so may require VS2010 or above):
std::wstring ws(output_size,L'\0');
int size = MultiByteToWideChar(CP_ACP,0,string,-1,&ws[0],static_cast<int>(ws.size())); // &ws[0]: data() is const before C++17
// MultiByteToWideChar tacks on a null character to mark the end of the string, but this isn't needed when using std::wstring.
ws.resize(ws.size() -1);
// make use of ws here. You can pass a wchar_t pointer to a function by using ws.c_str()
//std::wstring handles freeing the memory so no need to clean up
Here's another method that uses more of the C++ standard library (and takes advantage of VS2010 not being completely standards compliant):
#include <locale> // for wstring_convert and codecvt
std::wstring ws = std::wstring_convert<std::codecvt<wchar_t,char,std::mbstate_t>,wchar_t>().from_bytes(string);
// use ws.c_str() as before
You also imply in the comments that you tried converting to wchar_t and got the same error. If that's the case when you try these methods for converting to wchar_t then the error lies elsewhere. Probably in the actual content of your string. Perhaps it's not properly null terminated?
You can't just say "converted to Unicode". You need to specify an encoding: Unicode is not an encoding but (roughly) a character set and a set of encodings to express those characters as sequences of bytes.
Also, you must specify the input encoding: how is a character such as "å", for example, encoded in string?
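For comparison with the Win32 calls, here is a portable sketch that makes the input-encoding assumption explicit: it uses the standard std::mbsrtowcs function and assumes the bytes are encoded according to the current C locale. The helper name widen is invented for this example, and this is not the approach the Win32 answer uses.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a null-terminated multibyte string,
// interpreted in the current C locale's encoding, to a wide string.
std::wstring widen(const char* narrow)
{
    // First pass: with a null destination, mbsrtowcs only counts the
    // wide characters needed (terminator excluded).
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow;
    std::size_t needed = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");

    // Second pass: convert into a buffer of exactly that size.
    std::wstring wide(needed, L'\0');
    if (needed > 0) {
        state = std::mbstate_t();
        src = narrow;
        std::mbsrtowcs(&wide[0], &src, needed, &state);
    }
    return wide;
}
```

Whether the result is UTF-16 or UTF-32 depends on the platform's wchar_t, which is exactly why the target encoding has to be stated as well.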

How does the auto-free()ing work when I use functions like mktemp()?

Greetings,
I'm using mktemp() (iPhone SDK) and this function returns a char * to the new file name where all "X" are replaced by random letters.
What confuses me is the fact that the returned string is automatically free()d. How (and when) does that happen? I doubt it has something to do with the Cocoa event loop. Is it automatically freed by the kernel?
Thanks in advance!
mktemp just modifies the buffer you pass in and returns the same pointer you passed in; there's no extra buffer to be freed.
At least, that's how the OS X man page describes it (I couldn't find documentation for the iPhone), and the POSIX man page agrees. The example in the POSIX man page looks wrong, though, as it passes in a pointer to a string literal, possibly an old remnant; the OS X man page specifically mentions that as being an error. The function is also marked as legacy, so use mkstemp instead.
So, this is what will happen:
char template[] = "/tmp/fooXXXXXX";
char *ptr;
if((ptr = mktemp(template)) != NULL) {
assert(ptr == template); // will be true:
// mktemp just returns the same pointer you pass in
}
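Since mkstemp is the recommended replacement, here is a short sketch of it for a POSIX system. Like mktemp, it rewrites the caller's buffer in place, so nothing needs to be freed; unlike mktemp, it also creates and opens the file. The function name make_temp_file and the /tmp/foo prefix are just placeholders.

```cpp
#include <stdlib.h>   // mkstemp (POSIX)
#include <unistd.h>   // close
#include <string>

// Hypothetical helper: creates a unique temporary file and returns its
// path, or an empty string on failure. The caller removes the file.
std::string make_temp_file()
{
    // Must be a writable array, not a string literal: mkstemp replaces
    // the trailing XXXXXX in this very buffer.
    char path[] = "/tmp/fooXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return std::string();
    close(fd);                 // only the name is needed in this sketch
    return std::string(path);  // copy made from the caller-owned buffer
}
```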
If it's like the cygwin function of the same name, then it's returning a pointer to an internal static character buffer that will be overwritten by the next call to mktemp(). On cygwin, the mktemp man page specifically mentions _mktemp_r() and similar functions that are guaranteed reentrant and use a caller-provided buffer.

How do I access List template of C++ program from Perl using SWIG?

I want to access a template List of C++ program from a Perl script and use those values.
Example code:
struct Struct1
{
int j;
};
typedef list < Struct1 * > struct1_list;
struct Struct2
{
int i;
struct1_list List1;
};
I used one swig generated api and did the following:
$myList = Struct2_struct1List_get
print "Reference type: " . ref($myList) ;
now this prints as:
Reference type: _p_std__listTutils__Struct1_p_t
how to get the values from the structure using this?
Update from duplicate question:
In the interface file I put
%template(ListStruct1) std::list< Struct1 * >;
After I generated the ".pm" file, I checked the APIs available for this list.
I found
ListStuct1_size
ListStuct1_empty
ListStuct1_clear
ListStuct1_push.
I was able to use those functions, but I don't know how to access individual elements of the list using this API. Or am I missing something in the interface file?
UPDATED:
Is it possible to use a typemap to return the list as an array here?
First of all, general info
This tutorial shows how to do the wrapper for templates.
The same tutorial shows how to use the module from Perl, but the perl example doesn't touch templates.
This SO article shows how to do that with a Vector
Here's a general SWIG STL documentation that seems to mention std_list.i interface.
Second, regarding lists
You cannot "access" a C++ list like a Perl array, by subscript. If you want that, you must use a vector as the underlying type.
As an alternative, create a class extending list, give it a new method which returns an element by index, and expose that method in the interface.
If you wish to access the list by finding an element, as in C++, you need to write a list interface that exposes the find() method; judging from the source code, the default one does not.
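One possible shape for the "element by index" idea, sketched as a free helper function rather than a derived class. The name list_at is hypothetical, and how it gets exposed in the SWIG interface file is left open here.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

struct Struct1
{
    int j;
};

// Hypothetical helper: returns the element at `index`, or a null pointer
// if the index is out of range. This walks the list, so it costs O(index).
Struct1* list_at(const std::list<Struct1*>& items, std::size_t index)
{
    if (index >= items.size())
        return nullptr;
    std::list<Struct1*>::const_iterator it = items.begin();
    std::advance(it, index);
    return *it;
}
```

Combined with the generated size function, a wrapper like this would let a script loop from 0 to size - 1, at quadratic cost for long lists.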
In your interface, try:
%include "std_list.i"
%template(ListStruct1) std::list< Struct1 * >;
The std library is kinda funny, there's no actual binary object called list that swig can just wrap, it's all templates - so swig needs some extra help to figure out what's going on.
That should add insert, remove, and a bunch of other list specific functions to the wrapper.
If the above doesn't work, try adding:
%define SWIG_EXPORT_ITERATOR_METHODS
UPDATE: Of course, I neglected to mention (or even realize) that this works great for python, java, and a few others, but is totally broken in perl...