Throttling CPU usage in a Swift thread

I want to traverse the file tree for a potentially large directory in a macOS app. It takes about three minutes for my example case if I just run it directly, but the CPU spikes to around 80% for those three minutes.
I can afford to do it more slowly on a background thread, but am not sure of what the best approach would be.
I thought of just inserting a 1-millisecond sleep inside the loop, but I am not confident that won't have some negative impact on scheduling, disk I/O, etc. An alternative would be to do 1 second of work, then wait 2-3 seconds, but I am guessing there is something more elegant?
The core functionality I want is traversing a directory recursively and checking file attributes:
let enumerator = FileManager.default.enumerator(atPath: filePath)
while let element = enumerator?.nextObject() as? String {
    // do something here
}

It's generally more energy efficient to spike the CPU for a short time than to run it at a low level for a longer time. As long as your process has a lower priority than other processes, running the CPU at even 100% for a short time isn't a problem (particularly if it doesn't turn the fans on). Modern CPUs prefer to be run very hard for short periods and then be completely idle; "somewhat busy" for a longer time is much worse because the CPU can't power off any subsystems.
Even so, users get very upset when they see high CPU usage. I used to work on system management software, and we spoke with Apple about throttling our CPU usage. They told us the above. We said, "yes, but when users see us running at 100%, they complain to IT and try to uninstall our app." Apple's answer was to use sleep, like you're describing. If it makes your process take longer, then it will likely have a negative overall impact on total energy use, but I wouldn't expect it to cause any other trouble.
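If you go the sleep route, a minimal sketch might look like the following, assuming a utility-QoS background queue; the function name, the 1,000-entry batch size, and the 10 ms pause are all illustrative choices you would tune for your workload, not an Apple-recommended pattern:

import Foundation

// Sketch only: run the scan off the main thread at a low QoS and sleep
// periodically so the process never monopolizes a core for minutes at a time.
func scanDirectory(atPath filePath: String) {
    DispatchQueue.global(qos: .utility).async {
        let enumerator = FileManager.default.enumerator(atPath: filePath)
        var processed = 0
        while let element = enumerator?.nextObject() as? String {
            // ... inspect `element` / its attributes here ...
            _ = element
            processed += 1
            if processed % 1000 == 0 {
                Thread.sleep(forTimeInterval: 0.01)   // yield ~10 ms per batch
            }
        }
    }
}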
That said, if you are scanning the same directory tree more than once, you should look at File System Events and File Metadata Search, which may perform these operations much more efficiently.
See also: Schedule Background Activity in the Energy Efficiency Guide for Mac Apps. I highly recommend this entire doc. There are many tools that have been added to macOS in recent years that may be useful for your problem. I also recommend Writing Energy Efficient Apps from WWDC 2017.
If you do need to scan everything directly with an enumerator, you can likely greatly improve things by using the URL-based API rather than the String-based API. It allows you to pre-fetch certain values (including attributeModificationDateKey, which may be of use here). Also, be aware of the fileAttributes property of DirectoryEnumerator, which caches the last-read file's attributes (so you don't need to query them again).
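As a rough sketch of that URL-based approach (the specific resource keys and the placeholder path are just examples, not a prescription): pre-fetching the keys lets the enumerator gather them in the same pass instead of requiring a separate stat per file.

import Foundation

let keys: [URLResourceKey] = [.isDirectoryKey, .attributeModificationDateKey, .fileSizeKey]
let rootURL = URL(fileURLWithPath: "/path/to/scan")   // placeholder path

if let enumerator = FileManager.default.enumerator(at: rootURL,
                                                   includingPropertiesForKeys: keys,
                                                   options: [.skipsHiddenFiles]) {
    for case let fileURL as URL in enumerator {
        // These reads come from the values pre-fetched above.
        let values = try? fileURL.resourceValues(forKeys: Set(keys))
        if let modified = values?.attributeModificationDate {
            // ... compare dates, check sizes, etc. ...
            _ = modified
        }
    }
}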
Three minutes is a long time; it's possible you're doing more work than needed. Run your operation using the find command-line tool and use that as a benchmark for how much time it should take.

Related

Page Replacement and LRU

If a page fault occurs, do we have to replace the least recently used page of the process that requested the frame, or do we have to replace the page that is least recently used across all of main memory?
Thank you.
Theory
Assume that there are N pages of data, which includes:
all data belonging to all processes
all file data on disk (that could be pre-fetched into a virtual file system cache)
all DNS lookup information (that could be pre-fetched into some kind of DNS cache)
all static HTML pages, images, etc (that could be pre-fetched into some kind of web page cache)
anything else you could possibly pre-fetch before software could want it
all data that can be pre-generated by software (e.g. things like prime number sieves, cached pixel data generated from fonts, mipmaps, ...)
The goal is to fill RAM with the "most likely to be needed next" data from all possible sources. Note that this can include (e.g.) sending recently used data belonging to a process from RAM to swap space so that you can use that RAM to pre-fetch data from the internet that has not been requested (if you know the data is more likely to be needed sooner than the data from the process).
There are 3 major problems:
some of the data is controlled by normal processes and not the OS; and there's no standard way of allowing normal processes to participate in the operating system's "keep RAM filled with the most likely to be needed next data" scheme.
often you can't accurately predict the future. Note that you can look at things like when a process will wake up after calling "sleep()" to accurately predict a tiny part of the future; and you can track statistics to inaccurately predict other things (e.g. if you know that the user checked a certain web site at lunch time on 9 of the previous 10 days then you can predict that there's a 90% chance they will check that web site at lunch time today). Of course (for some cases) "most recently used" is a reasonable predictor of "most likely to be needed again soon"; which leads to "keep the most recently used in RAM", which is where "evict the least recently used" (LRU) comes from.
there is cost associated with transferring data, where the cost depends on where the data is now and how busy the hardware needed to fetch the data currently is (e.g. fetching data from a fast Internet connection might be cheap when the network card is doing nothing anyway but expensive when the network card is busy doing a lot of other stuff)
Practice
You can try to solve all the problems (e.g. keep track of lots of things and have fancy prediction algorithms; and take the cost of transferring and/or generating data into account when deciding what to do; and provide some kind of "current memory pressure notification" that normal processes can use to participate in the operating system's "keep RAM filled with the most likely to be needed data" scheme); but it's all complicated and difficult (e.g. you'd want to ensure that the overhead of figuring out what should/shouldn't be in RAM doesn't cost more performance than you gain), so operating systems often do something much simpler and less effective.
Specifically, a very simple OS might only do "evict the least recently used" (with no pre-fetching, no consideration for the cost of transfers, and without normal processes participating at all); and this might be considered "good enough" despite being horribly bad.
If a page fault occurs, do we have to replace the least recently used page of the process that requested the frame, or do we have to replace the page that is least recently used across all of main memory?
Ideally, you'd try to evict the "least likely to be needed soon" data from all of memory (possibly including data belonging to the kernel itself); but compromises are unavoidable and there's nothing to say a "good enough despite being horribly bad" OS can't just evict the least recently used page from the current process.
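To make the policy itself concrete, here is a toy user-space sketch of "evict the least recently used"; the type and names (LRUCache, Page, capacity) are made up for illustration, and a real kernel tracks page ages very differently (e.g. with reference bits and approximations):

import Foundation

struct LRUCache<Key: Hashable, Page> {
    private let capacity: Int
    private var storage: [Key: Page] = [:]
    private var usageOrder: [Key] = []          // front = least recently used

    init(capacity: Int) { self.capacity = capacity }

    // Returns nil on a miss (the analogue of a "page fault").
    mutating func access(_ key: Key) -> Page? {
        guard let page = storage[key] else { return nil }
        touch(key)
        return page
    }

    mutating func insert(_ key: Key, _ page: Page) {
        if storage[key] == nil, storage.count >= capacity, let victim = usageOrder.first {
            storage[victim] = nil               // evict the least recently used entry
            usageOrder.removeFirst()
        }
        storage[key] = page
        touch(key)
    }

    // Move the key to the "most recently used" end.
    private mutating func touch(_ key: Key) {
        usageOrder.removeAll { $0 == key }
        usageOrder.append(key)
    }
}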

Parallel processing input/output, queries, and indexes AS400

IBM V6.1
When using System i Navigator and clicking System Values, the parallel processing setting is displayed.
By default, "Do not allow parallel processing" is selected.
What will the impact on program processing be if we allow multiple processes? We have a lot of RPG IV programs and SQL queries being executed, and I think it will increase performance.
Basically I want to turn this on in the production environment, but I'm not sure whether I will break anything by doing so, for example input or output of different programs running in parallel, or data getting out of sequence.
I did do some research:
https://publib.boulder.ibm.com/iseries/v5r2/ic2924/index.htm?info/rzakz/rzakzqqrydegree.htm
I understand each option, but I do not know the risk of changing it from the default to multiple processes.
First off, in order to get the most out of *MAX and *OPTIMIZE, you'd need a system with more than one core (enabled for IBM i / DB2) along with the DB2 Symmetric Multiprocessing (SMP) (57xx-SS1 option 26) licensed program installed, thus allowing the system to use SMP for queries and index builds.
For *IO, the system can use multiple tasks via simultaneous multithreading (SMT) even on a single-core POWER5 or higher box. SMT is enabled via the Processor multitasking (QPRCMLTTSK) system value.
You're unlikely to "break" anything by changing the value, as long as your applications don't make bad assumptions about result set ordering. For example, CPYxxxIMPF makes use of SQL behind the scenes; with anything but *NONE you might end up with the rows in your DB2 table in a different order from the rows in the import file.
You will most certainly increase the CPU usage. This is not a bad thing unless you're currently pushing 90%+ CPU usage regularly. If you're only using 50% of your CPU, it's probably a good thing to make use of SMT/SMP to provide better response time, even if it increases the CPU utilization to 60%.
Having said that, here's a story of it being a problem... http://archive.midrange.com/midrange-l/200304/msg01338.html
Note that in the above case, the OP was pre-building work tables at sign-on in order to minimize the wait when it was time to use them. That was a great idea 20 years ago on single-threaded systems. Today, the alternative would be to take advantage of SMP/SMT and build only what's needed, when needed.
As you note in a comment, this kind of change is difficult to test in non-production environments, since workloads in DEV & TEST are different. So it's important to collect good performance data before and after the change. You might also consider moving in stages: *NONE --> *IO --> *OPTIMIZE, and then *MAX if you wish. I'd spend at least a month at each level if you have periodic month-end jobs.

Scala: performance boost on incremental garbage collection

I have written an application in Scala. Basically, the first step is to create an array of objects and then to initialise these objects from a CSV file. When running the application on the JVM it is really slow, and after some experimenting I found out that using the -J-Xincgc flag, which enables incremental garbage collection, speeds up the application by a factor of 4 (it's 4 times faster with the switch!). I wonder:
Why?
Did I use some inefficient coding, and if so, where should I start to find out what's going on?
Thanks!
I'll assume you're running this on HotSpot.
The HotSpot JVM has a whole zoo of garbage collectors, most of which also have sub-modes or various command-line switches that significantly alter their behavior.
Which GC is used by default varies based on JVM version, operating system, and 32/64-bit VM.
So you basically changed whatever the default was to a specific algorithm that happened to perform "faster" for your workload.
But "faster" is a fuzzy measure. Wall time is not the same as CPU cycles spent if you consider multi-threading. And some collectors may simply choose to grow the heap more aggressively, thus deferring the cost of collection to a later point in time, which you might not have measured if your program didn't run long enough.
To make an accurate assessment, much more information would be needed:
what GC was used by default
your VM version
how many cores your CPU has
what kind of workload you have (multi- or single-threaded, long- or short-running, expected memory footprint, object allocation rate)
Oracle's GC tuning guide may prove useful for you.
In your case, -Xincgc translates to CMS in incremental mode, which is intended for single-core environments and has been deprecated as of Java 8. It probably just happened to be better than the default, but it's not necessarily an optimal choice.
If you get into a situation where you are running close to your heap-size limit, you can waste a lot of GC time, which can lead to a lot of false findings about performance. If that's your situation, first increase your heap-size limit before doing anything else. Consider use of jvisualvm to eyeball the situation - it's trivially easy to get started with.

NSOperationQueueDefaultMaxConcurrentOperationCount in the wild

The Apple docs say that if you set the maxConcurrentOperationCount property of an NSOperationQueue to NSOperationQueueDefaultMaxConcurrentOperationCount (the default) then it will adjust the value at run time based on "system conditions".
If you specify the value NSOperationQueueDefaultMaxConcurrentOperationCount (which is recommended), the maximum number of operations can change dynamically based on system conditions.
Can anyone report what they are seeing that value being set to in the wild on different devices? Are we talking 1 or 2 for old phones and 3-4 for new models, or 10, or something else entirely? The documentation gives no insight into the possible set of values or the most common result. I think it would be useful for developers to know what to expect in production, rather than just "we'll take care of it" without any explanation of what is being optimized for (UI responsiveness, operation execution speed, etc.).
As has been said in many variations and resources, you never know in advance how many threads you should create for your application's workflow.
How many threads your application needs is a matter of CPU time and load (because it's not alone in the woods). So the answer is that the system is optimizing for the following (a rough way to measure what you actually get on a given device is sketched after the list):
optimization for number of cores
optimization for CPU architecture
optimization for current CPU load
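If you want numbers for your own devices, one hedged way to probe it is to enqueue many short operations and record the peak number running at once; the operation count and sleep duration below are arbitrary, and the result reflects what the system actually granted under current conditions rather than any documented constant:

import Foundation

// Rough empirical probe, not a documented API.
let queue = OperationQueue()
queue.maxConcurrentOperationCount = OperationQueue.defaultMaxConcurrentOperationCount

let lock = NSLock()
var running = 0
var peak = 0

for _ in 0..<100 {
    queue.addOperation {
        lock.lock(); running += 1; peak = max(peak, running); lock.unlock()
        Thread.sleep(forTimeInterval: 0.1)   // simulate a small unit of work
        lock.lock(); running -= 1; lock.unlock()
    }
}
queue.waitUntilAllOperationsAreFinished()
print("Observed peak concurrency: \(peak)")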

Reasons for & against a Database

I had a discussion with a coworker about the architecture of a program I'm writing, and I'd like some more opinions.
The Situation:
The program should update in near real time (+/- 1 minute).
It involves the movement of objects on a coordinate system.
There are some events that occur at regular intervals (i.e. creation of the objects).
Movements can change at any time through user input.
My solution was:
Build a server that runs continuously and stores the data internally.
The server dumps the program state at regular intervals to protect against power failures and/or crashes.
He argued that the program requires a database and that I should use cron jobs to update the data. I can store movement information by storing start point, end point, and speed, and update the position in the cron job (and calculate collisions with other objects there) by calculating direction and speed.
His reasons:
Requires more CPU & memory because it runs constantly.
Power failures/crashes might destroy data.
Databases are faster.
My reasons against this are mostly:
Not very precise as events can only occur at full minutes (wouldn't be that bad though).
Requires (possibly costly) transformation of data on every run from relational data to objects.
An RDBMS is a general solution to a specialized problem, so a specialized solution should be more efficient.
Power failures (or other crashes) can leave the data in an undefined state, with only partially updated data, unless (possibly costly) precautions (like transactions) are taken.
What are your opinions about that?
Which arguments can you add for any side?
Databases are not faster. How silly... How can a database be faster than writing a custom data structure and storing it in memory? Databases are generalized tools that persist data to disk for you so you don't have to write all the code to do that yourself. Because they have to address the needs of numerous disparate (and sometimes inconsistent) business functions (persistence/durability, transactional integrity, caching, relational integrity, atomicity, etc.) and do it in a way that protects the application developer from having to worry about it so much, by definition a database is going to be slower. That doesn't necessarily mean his conclusion is wrong, however.
Each of his other objections can be addressed by writing the code to handle that issue yourself... but you see where that is going. At some point, the development effort of writing custom code to address the issues that are important for your application outweighs the performance hit of just using a database, which already does all that stuff out of the box. How many of these issues are important? And do you know how to write the code necessary to address them?
From what you've described here, I'd say your solution does seem to be the better option. You say it runs once a minute, but how long does it take to run? If only a few seconds, then the transformation to relational data would likely be inconsequential, as would any other overhead; most of this would likely take 30 seconds. This is assuming, again, that the program is quite small.
However, if it is larger, and assuming that it will get larger, doing a straight dump is a better method. You might not want to do a full dump every run, but that's up to you; just remember that it could wind up taking a lot of space (the same goes if you're using a database).
If you're going to dump the state, you would need to have some sort of redundancy system in place, along with quasi-transactions. You would want to store several copies in case something happens to the newest version; say, the power goes out while you're storing, and you have no backups beyond this half-written one. As for transactions, you would need something that tells you the file has been fully written, so if something does go wrong, you can always tell what the most recent successful save was.
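A minimal sketch of that idea, assuming a made-up WorldState type and file layout (the real structure and serialization format are up to you): write each snapshot atomically so a crash mid-write never corrupts the current copy, and rotate a few numbered copies for the redundancy part.

import Foundation

// Sketch only: `WorldState` and the field layout are placeholders.
struct WorldState: Codable {
    var objects: [String: [Double]]   // id -> [x, y, speed], purely illustrative
}

// .atomic writes to a temporary file first and then renames it over the
// destination, so readers never see a half-written snapshot.
func dump(_ state: WorldState, to url: URL) throws {
    let data = try JSONEncoder().encode(state)
    try data.write(to: url, options: .atomic)
}

// Keeping a few rotated copies (state-1.json, state-2.json, ...) covers the
// "several copies" point; load the newest one that decodes successfully.
func loadLatest(from urls: [URL]) -> WorldState? {
    for url in urls {
        if let data = try? Data(contentsOf: url),
           let state = try? JSONDecoder().decode(WorldState.self, from: data) {
            return state
        }
    }
    return nil
}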
Oh, and for his argument about it running constantly: if you have it set up as a cron job, or even a self-contained sleep statement or similar, it doesn't use any CPU time when it's not running, which is the same as if you were using an RDBMS.
If you're writing straight to disk, then this will be the faster method over a database, and faster retrieval, since, as you pointed out, there is no overhead.
Summary: A database is a good idea if you have a lot of idle processor time or historical records, but if resources are a legitimate concern, then it can become too much overhead and a dump with precautions taken is better.
MySQL can now model spatial data.
http://dev.mysql.com/doc/refman/4.1/en/gis-introduction.html
http://dev.mysql.com/doc/refman/5.1/en/spatial-extensions.html
You could use the database to keep track of world locations, user locations, item locations, etc.