Swift: how to fully strip internal/inline symbols?

Swift: how to fully strip internal/inline symbols? - swift

I need to write some license checking code in Swift. I know Swift is not optimal for that kind of code in the first place, as it is harder to obfuscate. But if the code that needs to know whether the app is registered is written in Swift, this is still better than putting the license checking code in a separate framework that can be swapped out.
To make attacking that code harder, I'm trying to obfuscate the code by at least removing the symbols related to it.
For this, I have some inlined methods with internal visibility as follows:
#inline(__always) static func checkLicense() { /* license checking code */ }
Given that the method should always be inlined, there should be no need to include the method's name in the binary's symbol table. (I know that inline annotations often only are hints to the compiler, but I have reason to believe that they do work in this case.)
In line with that, nm MyApp.app/Contents/MacOS/MyApp does not contain references to checkLicense.
However, the output of strings MyApp.app/Contents/MacOS/MyApp still contains references to checkLicense, and I'm afraid that an attacker could use that information to more easily attack the license checking code.
Here are my questions:
Will these strings help an attacker, or are they useless without the corresponding symbol info (which would be exposed by nm)?
Would the strip settings listed below (in particular, stripping all symbols) cause a problem when shipping my code - e.g. when trying to symbolicate stack traces? I do keep the dSYMs of the shipped binaries.
Would setting "Perform Single-Object Prelink" to Yes help in obfuscating the code? The only effect I can see is that the dSYMs size shrinks from ~8 MB to ~6 MB.
I am currently using the following build options:
Deployment Postprocessing = Yes
Strip Linked Product = Yes
Use Separate Strip = Yes
Strip Style = All Symbols
Other Linker Flags = "-Xlinker -x"
Perform Single-Object Prelink = No (see above)

I have investigated this again, and found the following strip settings to work well for Release builds:
Deployment Postprocessing = Yes
Strip Linked Product = Yes
Perform Single-Object Prelink = No
Use Separate Strip: Optional, doesn't make a difference
Strip Style:
All Symbols for the main app (equivalent to -Xlinker -s according to this guide)
Non-Global Symbols for libraries (equivalent to -Xlinker -x)
Other Linker Flags: None; already provided by "Strip Style"

Related

Objective-C categories in static library

Can you guide me how to properly link static library to iPhone project. I use static library project added to app project as direct dependency (target -> general -> direct dependencies) and all works OK, but categories. A category defined in static library is not working in app.
So my question is how to add static library with some categories into other project?
And in general, what is best practice to use in app project code from other projects?

Solution: As of Xcode 4.2, you only need to go to the application that is linking against the library (not the library itself) and click the project in the Project Navigator, click your app's target, then build settings, then search for "Other Linker Flags", click the + button, and add '-ObjC'. '-all_load' and '-force_load' are no longer needed.
Details:
I found some answers on various forums, blogs and apple docs. Now I try make short summary of my searches and experiments.
Problem was caused by (citation from apple Technical Q&A QA1490 https://developer.apple.com/library/content/qa/qa1490/_index.html):
Objective-C does not define linker
symbols for each function (or method,
in Objective-C) - instead, linker
symbols are only generated for each
class. If you extend a pre-existing
class with categories, the linker does
not know to associate the object code
of the core class implementation and
the category implementation. This
prevents objects created in the
resulting application from responding
to a selector that is defined in the
category.
And their solution:
To resolve this issue, the static
library should pass the -ObjC option
to the linker. This flag causes the
linker to load every object file in
the library that defines an
Objective-C class or category. While
this option will typically result in a
larger executable (due to additional
object code loaded into the
application), it will allow the
successful creation of effective
Objective-C static libraries that
contain categories on existing
classes.
and there is also recommendation in iPhone Development FAQ:
How do I link all the Objective-C
classes in a static library? Set the
Other Linker Flags build setting to
-ObjC.
and flags descriptions:
-all_load Loads all members of static archive libraries.
-ObjC Loads all members of static archive libraries that implement an
Objective-C class or category.
-force_load (path_to_archive) Loads all members of the specified static
archive library. Note: -all_load
forces all members of all archives to
be loaded. This option allows you to
target a specific archive.
*we can use force_load to reduce app binary size and to avoid conflicts which all_load can cause in some cases.
Yes, it works with *.a files added to the project.
Yet I had troubles with lib project added as direct dependency. But later I found that it was my fault - direct dependency project possibly was not added properly. When I remove it and add again with steps:
Drag&drop lib project file in app project (or add it with Project->Add to project…).
Click on arrow at lib project icon - mylib.a file name shown, drag this mylib.a file and drop it into Target -> Link Binary With Library group.
Open target info in fist page (General) and add my lib to dependencies list
after that all works OK. "-ObjC" flag was enough in my case.
I also was interested with idea from http://iphonedevelopmentexperiences.blogspot.com/2010/03/categories-in-static-library.html blog. Author say he can use category from lib without setting -all_load or -ObjC flag. He just add to category h/m files empty dummy class interface/implementation to force linker use this file. And yes, this trick do the job.
But author also said he even not instantiated dummy object. Mm… As I've found we should explicitly call some "real" code from category file. So at least class function should be called.
And we even need not dummy class. Single c function do the same.
So if we write lib files as:
// mylib.h
void useMyLib();
#interface NSObject (Logger)
-(void)logSelf;
#end
// mylib.m
void useMyLib(){
NSLog(#"do nothing, just for make mylib linked");
}
#implementation NSObject (Logger)
-(void)logSelf{
NSLog(#"self is:%#", [self description]);
}
#end
and if we call useMyLib(); anywhere in App project
then in any class we can use logSelf category method;
[self logSelf];
And more blogs on theme:
http://t-machine.org/index.php/2009/10/13/how-to-make-an-iphone-static-library-part-1/
http://blog.costan.us/2009/12/fat-iphone-static-libraries-device-and.html

The answer from Vladimir is actually pretty good, however, I'd like to give some more background knowledge here. Maybe one day somebody finds my reply and may find it helpful.
The compiler transforms source files (.c, .cc, .cpp, .m) into object files (.o). There is one object file per source file. Object files contain symbols, code and data. Object files are not directly usable by the operating system.
Now when building a dynamic library (.dylib), a framework, a loadable bundle (.bundle) or an executable binary, these object files are linked together by the linker to produce something the operating system considers "usable", e.g. something it can directly load to a specific memory address.
However when building a static library, all these object files are simply added to a big archive file, hence the extension of static libraries (.a for archive). So an .a file is nothing than an archive of object (.o) files. Think of a TAR archive or a ZIP archive without compression. It's just easier to copy a single .a file around than a whole bunch of .o files (similar to Java, where you pack .class files into a .jar archive for easy distribution).
When linking a binary to a static library (= archive), the linker will get a table of all symbols in the archive and check which of these symbols are referenced by the binaries. Only the object files containing referenced symbols are actually loaded by the linker and are considered by the linking process. E.g. if your archive has 50 object files, but only 20 contain symbols used by the binary, only those 20 are loaded by the linker, the other 30 are entirely ignored in the linking process.
This works quite well for C and C++ code, as these languages try to do as much as possible at compile time (though C++ also has some runtime-only features). Obj-C, however, is a different kind of language. Obj-C heavily depends on runtime features and many Obj-C features are actually runtime-only features. Obj-C classes actually have symbols comparable to C functions or global C variables (at least in current Obj-C runtime). A linker can see if a class is referenced or not, so it can determine a class being in use or not. If you use a class from an object file in a static library, this object file will be loaded by the linker because the linker sees a symbol being in use. Categories are a runtime-only feature, categories aren't symbols like classes or functions and that also means a linker cannot determine if a category is in use or not.
If the linker loads an object file containing Obj-C code, all Obj-C parts of it are always part of the linking stage. So if an object file containing categories is loaded because any symbol from it is considered "in use" (be it a class, be it a function, be it a global variable), the categories are loaded as well and will be available at runtime. Yet if the object file itself is not loaded, the categories in it will not be available at runtime. An object file containing only categories is never loaded because it contains no symbols the linker would ever consider "in use". And this is the whole problem here.
Several solutions have been proposed and now that you know how all this plays together, let's have another look on the proposed solution:
One solution is to add -all_load to the linker call. What will that linker flag actually do? Actually it tells the linker the following "Load all object files of all archives regardless if you see any symbol in use or not'. Of course, that will work; but it may also produce rather big binaries.
Another solution is to add -force_load to the linker call including the path to the archive. This flag works exactly like -all_load, but only for the specified archive. Of course this will work as well.
The most popular solution is to add -ObjC to the linker call. What will that linker flag actually do? This flag tells the linker "Load all object files from all archives if you see that they contain any Obj-C code". And "any Obj-C code" includes categories. This will work as well and it will not force loading of object files containing no Obj-C code (these are still only loaded on demand).
Another solution is the rather new Xcode build setting Perform Single-Object Prelink. What will this setting do? If enabled, all the object files (remember, there is one per source file) are merged together into a single object file (that is not real linking, hence the name PreLink) and this single object file (sometimes also called a "master object file") is then added to the archive. If now any symbol of the master object file is considered in use, the whole master object file is considered in use and thus all Objective-C parts of it are always loaded. And since classes are normal symbols, it's enough to use a single class from such a static library to also get all the categories.
The final solution is the trick Vladimir added at the very end of his answer. Place a "fake symbol" into any source file declaring only categories. If you want to use any of the categories at runtime, make sure you somehow reference the fake symbol at compile time, as this causes the object file to be loaded by the linker and thus also all Obj-C code in it. E.g. it could be a function with an empty function body (which will do nothing when being called) or it could be a global variable accessed (e.g. a global int once read or once written, this is sufficient). Unlike all other solutions above, this solution shifts control about which categories are available at runtime to the compiled code (if it wants them to be linked and available, it accesses the symbol, otherwise it doesn't access the symbol and the linker will ignore it).
That's all folks.
Oh, wait, there's one more thing:
The linker has an option named -dead_strip. What does this option do? If the linker decided to load an object file, all symbols of the object file become part of the linked binary, whether they are used or not. E.g. an object file contains 100 functions, but only one of them is used by the binary, all 100 functions are still added to the binary because object files are either added as a whole or they are not added at all. Adding an object file partially is usually not supported by linkers.
However, if you tell the linker to "dead strip", the linker will first add all the object files to the binary, resolve all the references and finally scan the binary for symbols not in use (or only in use by other symbols not in use). All the symbols found to be not in use are then removed as part of the optimization stage. In the example above, the 99 unused functions are removed again. This is very useful if you use options like -load_all, -force_load or Perform Single-Object Prelink because these options can easily blow up binary sizes dramatically in some cases and the dead stripping will remove unused code and data again.
Dead stripping works very well for C code (e.g. unused functions, variables and constants are removed as expected) and it also works quite good for C++ (e.g. unused classes are removed). It is not perfect, in some cases some symbols are not removed even though it would be okay to remove them, but in most cases it works quite well for these languages.
What about Obj-C? Forget about it! There is no dead stripping for Obj-C. As Obj-C is a runtime-feature language, the compiler cannot say at compile time whether a symbol is really in use or not. E.g. an Obj-C class is not in use if there is no code directly referencing it, correct? Wrong! You can dynamically build a string containing a class name, request a class pointer for that name and dynamically allocate the class. E.g. instead of
MyCoolClass * mcc = [[MyCoolClass alloc] init];
I could also write
NSString * cname = #"CoolClass";
NSString * cnameFull = [NSString stringWithFormat:#"My%#", cname];
Class mmcClass = NSClassFromString(cnameFull);
id mmc = [[mmcClass alloc] init];
In both cases mmc is a reference to an object of the class "MyCoolClass", but there is no direct reference to this class in the second code sample (not even the class name as a static string). Everything happens only at runtime. And that's even though classes are actually real symbols. It's even worse for categories, as they are not even real symbols.
So if you have a static library with hundreds of objects, yet most of your binaries only need a few of them, you may prefer not to use the solutions (1) to (4) above. Otherwise you end up with very big binaries containing all these classes, even though most of them are never used. For classes you usually don't need any special solution at all since classes have real symbols and as long as you reference them directly (not as in the second code sample), the linker will identify their usage pretty well on its own. For categories, though, consider solution (5), as it makes it possible to only include the categories you really need.
E.g. if you want a category for NSData, e.g. adding a compression/decompression method to it, you'd create a header file:
// NSData+Compress.h
#interface NSData (Compression)
- (NSData *)compressedData;
- (NSData *)decompressedData;
#end
void import_NSData_Compression ( );
and an implementation file
// NSData+Compress
#implementation NSData (Compression)
- (NSData *)compressedData
{
// ... magic ...
}
- (NSData *)decompressedData
{
// ... magic ...
}
#end
void import_NSData_Compression ( ) { }
Now just make sure that anywhere in your code import_NSData_Compression() is called. It doesn't matter where it is called or how often it is called. Actually it doesn't really have to be called at all, it's enough if the linker thinks so. E.g. you could put the following code anywhere in your project:
__attribute__((used)) static void importCategories ()
{
import_NSData_Compression();
// add more import calls here
}
You don't have to ever call importCategories() in your code, the attribute will make the compiler and linker believe that it is called, even in case it is not.
And a final tip:
If you add -whyload to the final link call, the linker will print in the build log which object file from which library it did load because of which symbol in use. It will only print the first symbol considered in use, but that is not necessarily the only symbol in use of that object file.

This issue has been fixed in LLVM. The fix ships as part of LLVM 2.9 The first Xcode version to contain the fix is Xcode 4.2 shipping with LLVM 3.0. The usage of -all_load or -force_load is no longer needed when working with XCode 4.2 -ObjC is still needed.

Here's what you need to do to resolve this problem completely when compiling your static library:
Either go to Xcode Build Settings and set Perform Single-Object Prelink to YES or
GENERATE_MASTER_OBJECT_FILE = YES in your build configuration file.
By default,the linker generates an .o file for each .m file. So categories gets different .o files. When the linker looks at a static library .o files, it doesn't create an index of all symbols per class (Runtime will, doesn't matter what).
This directive will ask the linker to pack all objects together into one big .o file and by this it forces the linker that process the static library to get index all class categories.
Hope that clarifies it.

One factor that is rarely mentioned whenever the static library linking discussion comes up is the fact that you must also include the categories themselves in the build phases->copy files and compile sources of the static library itself.
Apple also doesn't emphasize this fact in their recently published Using Static Libraries in iOS either.
I spent a whole day trying all sorts of variations of -objC and -all_load etc.. but nothing came out of it.. this question brought that issue to my attention. (don't get me wrong.. you still have to do the -objC stuff.. but it's more than just that).
also another action that has always helped me is that I always build the included static library first on its own.. then i build the enclosing application..

You probably need to have the category in you're static library's "public" header: #import "MyStaticLib.h"

MS VS-2005 Compiler optimization not removing unused/unexecuted code

I have a workspace built using MS-Visual Studio 2005 with all C code.In that i see many functions which are not called but they are still compiled(they are not under any compile time macro to disable them from compiling).
I set following optimization settings for the MS-VS2005 project to remove that unused code:-
Optimization level - /Ox
Enable whole program optimization - /GL
I tried both Favor speed /Ot and Favor Size /Os
Inspite of all these options, when i see the linker generated map file, I see the symbols(unsed functions) names present in the map file.
Am I missing something? I want to completely remove the unused code.
How do I do this?

The compiler compiles C files one-at-a-time. Therefore, while compiling a C-file that does contains an unused function, the compiler cannot be sure that it will not be called from another file and hence it will compile that function too. However, if that function were declared as static (file-scope), then the compiler would know it is not used and hence remove it.
Even with whole program optimization, I think it would still not be done since the compilation could be for a library.
Linkers do something similar to what you are looking for. If your code links against a library containing multiple objects, then any objects that do not contain functions used by your code (directly or indirectly) would not be included in the final executable.
One option would be to separate your code into individual libraries and object files.
PS - This is just my guess. The behavior of the compiler (with whole program optimization) or linker essentially depends on the design choices of that particular compiler or linker

On our projects we have a flag set under the project properties\Linker\Refrences. We set it to Eliminate Unreferenced Data (/OPT:REF), according to the description this is supposed to remove function calls or data that are never used. I am just going by the description, I have never tested this or worked with it. But I just happened to see it within the last hour and figured it might be something you could try.

iPhone -mthumb-interlinking

os i figured out how to use the -mthumb and -mno-thumb compiler flag and more or less understand what it's doing.
But what is the -mthumb-interlinking flag doing? when is it needed, and is it set for the whole project if i set 'compile for thumb' in my project settings?
thanks for the info!

Open a terminal and type man gcc
Do you mean -mthumb-interwork ?
-mthumb-interwork
Generate code which supports calling between the ARM and Thumb
instruction sets. Without this option the two instruction sets
cannot be reliably used inside one program. The default is
-mno-thumb-interwork, since slightly larger code is generated when
-mthumb-interwork is specified.
If this is related to a build configuration, you should be able to set it separately for each configuration "such as Release or Debug".
Why do you want to change these settings? I know using thumb instructions save some memory but will it save enough to matter in this case?

my application uses both, thumb and vfp code but i never specifically
set -thumb-interwork flag.. how is that possible?
According to man page, without that flag the two instructions sets
cannot be reliably used inside one program.
It says "reliably"; so without that option, it seems they still can be mixed within a single program but it might be "unreliably". I think normally mixing both instructions sets works, the compiler is smart enough to figure out when it has to switch from one set to another one. However, there might be border cases the compiler just doesn't understand correctly and it might fail to see that it should switch instruction sets here, causing the application to fail (most likely it will crash). This option generates special code, so that no matter what your code does, the switching always happens correctly and reliably; the downside is that this extra code is needed for every global visible function and thus increases the binary side (I have no idea if it also might slow down function calls a little bit, I personally would expect that).
Please also note the following two settings:
-mcallee-super-interworking
Gives all externally visible functions in the file being
compiled an ARM instruction set header
which switches to Thumb mode before executing the rest of
the function. This allows these
functions to be called from non-interworking code.
-mcaller-super-interworking
Allows calls via function pointers (including virtual
functions) to execute correctly regardless
of whether the target code has been compiled for
interworking or not. There is a small overhead
in the cost of executing a function pointer if this option
is enabled.
Though I think you only need those, when building libraries to be used with other projects; but I don't know for sure. The GCC thumb handling is definitely "underdocumented".

Parsing Unix/iPhone/Mac OS X version of PE headers

This is a little convoluted, but lets try:
I'm integrating LUA scripting into my game engine, and I've done this in the past on win32 in an elegant way. On win32 all I did was to mark all of the functions I wanted to expose to LUA as export functions. Then, to integrate them into LUA, I'd parse the PE header of the executable, unmangle the names, parse the parameters and such, then register them with my LUA runtime. This allowed me to avoid manually registering every function individually just to expose them to LUA.
Now, flash forward to today where I'm working on the iPhone. I've looked through some Unix stuff and I've gotten very close to taking a similar approach, however I'm not sure it will actually work.
I'm not entirely familiar with Unix, but here is what I have so far on iPhone:
Step 1: Query for the executable path through objective-C and get the path of my app
Step 2: Use dlopen to get a handle to my app using: `dlopen(path, RTLD_NOW)`
Step 3: Use `dlsym( libraryHandle, objectName )` to attempt to get the address of a known symbol.
The above steps won't actually get me to where I want to be, but even that doesn't work. Does anyone have any experience doing this type of thing on Unix? Are there any headers or functions I can google to put me on the right track?
Thanks;)

iPhone does not support dynamic linking after the initital application launch. While what you want to do does not actually require linking in any new application TEXT, it would not shock me to find out that some of the dl* functions do not behave as expected.
You may be able to write some platform specific code, but I recommend using a technique developed by the various BSDs called linker sets. Bascially you annotate the functions you want to do something with (just like you currently mark them for export). Through some preprocessor magic they store the annotations, sometimes in an extra segment in the binary image, then have code that grabs that data and enumerates its. So you simply add all the functions you want into the linker set, then walk through the linker set and register all the functions in it with lua.
I know people have gotten this stuff up and running on Windows and Linux, I have used it on Mac OS X and various *BSDs. I am linking the FreeBSD linker_set implementation, but I have not personally seen the Windows implementation.

You need to pass --export-dynamic to the linker (via -Wl,--export-dynamic).
Note: This is for Linux, but could be a starting point for your search.
References:
http://sourceware.org/binutils/docs/ld/Options.html

If static linking is an option, integrate that into the linker script. Before linking, do "nm" on all object files, extract the global symbols, and generate a C file containing a (preferably sorted/hashed) mapping of all symbol names to symbol values:
struct symbol{ char* name; void * value } symbols = [
{"foo", foo},
{"bar", bar},
...
{0,0}};
If you want to be selective in what you expose, it might be easiest to implement a naming schema, e.g. prefixing all functions/methods with Lua_.
Alternatively, you can create a trivial macro,
#define ForLua(X) X
and then grep the sources for ForLua, to select the symbols that you want to incorporate.

You could just generate a mapfile and use that instead, no?

Don't expose symbols from a used library in own static library

I am writing a reusable static library for the iPhone, following the directions provided here.
I want to use minizip in my library internally, but don't want to expose it to the user.
It should be possible for the user to include minizip themselves, possibly a different version, and not cause clashes with my "inner" minizip version.
Is this possible?
Edit:
I've tried adding -fvisibility=hidden to additional compiler flags for minizip files and changing functions to be __private_extern__ and __attribute__((visibility("hidden"))), but it still seems to produce defined external symbols:
00000918 T _unzOpen
0000058e T _unzOpen2
00001d06 T _unzOpenCurrentFile
00001d6b T _unzOpenCurrentFile2
...
Edit #2:
Apparently the symbols marked with these annotations are only made private by the linker, which never happens when Xcode builds the sources, since it adds the -c parameter ("Compile or assemble the source files, but do not link.")

You could rename all exported symbol from minizip with objcopy.
something like
objcopy -redefine-sym=minizip.syms yourstaticlibray.a
and minizip.syms
_unzOpen _yourownprefix_unzOpen
_unzOpen2 _yourownprefix_unzOpen2
... ...
No clash if an executable is linked with an other minizip.a and yourstaticlibray.a, and because you renamed all the symbol in yourstaticlibray.a your call inside yourstaticlibray.a to minizip will use the prefixed symbol, and not the unzOpen one.

Since static library is nothing more than a set of .o files (which are not linked yet, as you have mentioned), the only way to completely hide presence of minizip from the outside world is to somehow compile minizip and your library together as a single compilation unit and make minizip functions/variables static.
You could have a look at how does SQLite do the "amalgamation" process which turns library source code into single .c file for further compilation: The SQLite Amalgamation.
As a bonus you'll get better optimization (really recent GCC and Binutils are able to make link-time optimizations, but this functionality is not released yet).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Swift: how to fully strip internal/inline symbols? - swift

Related

Objective-C categories in static library

MS VS-2005 Compiler optimization not removing unused/unexecuted code

iPhone -mthumb-interlinking

Parsing Unix/iPhone/Mac OS X version of PE headers

Don't expose symbols from a used library in own static library

Categories

Resources