I need a serial queue that, instead of building up a backlog of tasks, only performs the task it is currently running plus the latest one queued since that task started. Any job waiting in the queue should be discarded if a new one comes in before it starts. I've been trying to make this work using actor, async and await as follows, but it's a bit advanced considering I only learned this stuff today. Is this close?
actor Worker {
var task: Task <Void, Never>? = nil
var next: Item? = nil
var latestResult = false
func analyseItem(_ item: Item) async -> Bool {
// make our item next (but could be overwritten)
next = item
// let anything that's running complete
await task?.value
// start a task for the latest request
task = Task {
latestResult = next?.processItem()
task = nil
}
return latestResult
}
}
I made a small project in Playgrounds:
import Combine
import Foundation
import _Concurrency
actor Analyzer {
var timestamp = Date()
var flag = false
func analyzeData(timestamp: Date) async throws {
if flag { return }
flag = true
print("analyzing #\(timestamp)...")
try await Task.sleep(nanoseconds: 3_000_000_000)
print("analyzing #\(timestamp) complete.")
flag = false
}
}
let analyzer = Analyzer()
var cancellable: AnyCancellable
cancellable = Timer.publish(every: 2, on: .main, in: .default)
.autoconnect()
.receive(on: DispatchQueue.main)
.map({ output -> Date in
Task {
try? await analyzer.analyzeData(timestamp: output)
}
return output
})
.sink(receiveValue: { date in
// print(date)
})
Because flag is actor-isolated, an analysis starts only every 4 seconds: the timer fires every 2 seconds and each analysis lasts 3 seconds, so every other tick arrives while flag is still true and returns immediately.
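Note that the flag approach drops any request that arrives while an analysis is running, whereas the question asked to keep the most recent one. A minimal sketch of that variant follows, assuming the work can be expressed as an async closure. `LatestOnlyWorker` and `work` are hypothetical names, not anything from the original code.

```swift
// Sketch only: `LatestOnlyWorker` and `work` are hypothetical names.
actor LatestOnlyWorker<Input: Sendable> {
    private var pending: Input?      // newest request that has not started yet
    private var isRunning = false    // true while the drain loop below is active
    private let work: @Sendable (Input) async -> Void

    init(work: @escaping @Sendable (Input) async -> Void) {
        self.work = work
    }

    func submit(_ input: Input) async {
        // Overwrite whatever was waiting; the older request is discarded.
        pending = input
        // If a drain loop is already active, it will pick this value up.
        guard !isRunning else { return }
        isRunning = true
        while let next = pending {
            pending = nil
            // Suspension point: other submit(_:) calls may run on the actor
            // here and replace `pending` with a newer value.
            await work(next)
        }
        isRunning = false
    }
}
```

The caller that starts the drain loop stays suspended until nothing is pending, while later callers return right away, so this only fits if callers do not need a per-request result.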
I created a function to add my course events to the calendar app using EventKit.
After learning Swift concurrency, I want to update my code to make the process much faster, namely by using a detached task or a TaskGroup to add these events.
Sequential code in a single detached task, without a task group:
func export_test() {
Task.detached {
for i in 0...15 {
print("Task \(i): Start")
let courseEvent = EKEvent(eventStore: eventStore)
courseEvent.title = "TEST"
courseEvent.location = "TEST LOC"
courseEvent.startDate = .now
courseEvent.endDate = .now.addingTimeInterval(3600)
courseEvent.calendar = eventStore.defaultCalendarForNewEvents
courseEvent.addRecurrenceRule(EKRecurrenceRule(recurrenceWith: .daily, interval: 1, end: nil))
do {
try eventStore.save(courseEvent, span: .futureEvents)
} catch { print(error.localizedDescription) }
print("Task \(i): Finished")
}
}
}
Doing the same thing using a TaskGroup:
func export_test() {
Task.detached {
await withTaskGroup(of: Void.self) { group in
for i in 0...15 {
group.addTask {
print("Task \(i): Start")
let courseEvent = EKEvent(eventStore: eventStore)
courseEvent.title = "TEST"
courseEvent.location = "TEST LOC"
courseEvent.startDate = .now
courseEvent.endDate = .now.addingTimeInterval(3600)
courseEvent.calendar = eventStore.defaultCalendarForNewEvents
courseEvent.addRecurrenceRule(EKRecurrenceRule(recurrenceWith: .daily, interval: 1, end: nil))
do {
try eventStore.save(courseEvent, span: .futureEvents)
} catch { print(error.localizedDescription) }
print("Task \(i): Finished")
}
}
}
}
}
The output of the TaskGroup version:
Task 0: Start
Task 1: Start
Task 2: Start
Task 4: Start
Task 3: Start
Task 5: Start
Task 6: Start
Task 7: Start
Task 0: Finished
Task 8: Start
Task 1: Finished
Task 9: Start
Sometimes only a few tasks finish while others do not, or never even start (I created 16 tasks, but only tasks 0 through 9 were printed in this example). Sometimes all of the events are added.
As I see it, I have created 16 child tasks in the TaskGroup.
Each child task adds one event to the calendar. I thought that this way I could take full advantage of multi-core performance (maybe that's not actually the case. 🙃)
If I put the for-loop inside the group.addTask closure, I always get the expected result, but then a single child task runs the whole loop, so the TaskGroup may no longer be needed.
I'm really exhausted🙃🙃.
To see the status messages for all finished tasks, you have to await each result and print it there:
func export_test() {
Task.detached {
await withTaskGroup(of: String.self) { group in
for i in 0...15 {
group.addTask {
print("Task \(i): Start")
let courseEvent = EKEvent(eventStore: eventStore)
courseEvent.title = "TEST"
courseEvent.location = "TEST LOC"
courseEvent.startDate = .now
courseEvent.endDate = .now.addingTimeInterval(3600)
courseEvent.calendar = eventStore.defaultCalendarForNewEvents
courseEvent.addRecurrenceRule(EKRecurrenceRule(recurrenceWith: .daily, interval: 1, end: nil))
do {
try eventStore.save(courseEvent, span: .futureEvents)
} catch { print(error.localizedDescription) }
return "Task \(i): Finished"
}
}
for await taskStatus in group {
print(taskStatus)
}
}
}
}
After a long time researching every aspect of the code, I found that the problem is try eventStore.save(courseEvent, span: .futureEvents).
Instead, I use try eventStore.save(..., span: ..., commit: false) followed by eventStore.commit() to solve the problem.
I think this is caused by data races, because I'm using Swift concurrency. While one event is being saved, another child task calls the save method again, leading to data conflicts (just my guess).
Luckily, we can avoid this by committing the whole batch of events later with eventStore.commit(). The result is what I expected !!🥳
And after that optimization, the function is up to 25% faster (exactly 136 ms faster). (haha. Perfect.)
Final Code (in Swift 5.7):
func export_test() {
Task.detached {
await withTaskGroup(of: Void.self) { group in
for i in 0...15 {
group.addTask {
print("Task \(i): Start")
let courseEvent = EKEvent(eventStore: eventStore)
courseEvent.title = "TEST"
courseEvent.location = "TEST LOC"
courseEvent.startDate = .now
courseEvent.endDate = .now.addingTimeInterval(3600)
courseEvent.calendar = eventStore.defaultCalendarForNewEvents
courseEvent.addRecurrenceRule(EKRecurrenceRule(recurrenceWith: .daily, interval: 1, end: nil))
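// A non-throwing task group cannot propagate errors, so handle them here.
do {
try eventStore.save(courseEvent, span: .futureEvents, commit: false)
} catch { print(error.localizedDescription) }
}
}
}
// commit() also throws, so it needs its own error handling.
do {
try eventStore.commit()
} catch { print(error.localizedDescription) }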
}
}
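Whether the event store tolerates concurrent save calls is not established in this thread (the data race explanation is the author's guess). One way to sidestep the question is to let the child tasks only build values and keep every store call in one place. Here is a rough sketch of that shape. `EventDraft` and `DraftStore` are hypothetical stand-ins, not EventKit types.

```swift
// Hypothetical value type standing in for the data needed to build one event.
struct EventDraft: Sendable {
    let index: Int
    let title: String
}

// Hypothetical stand-in for a store that must not be called concurrently.
final class DraftStore {
    private(set) var staged: [EventDraft] = []
    private(set) var committed: [EventDraft] = []

    // Comparable to saving with commit: false.
    func stage(_ draft: EventDraft) { staged.append(draft) }

    // Comparable to one commit of the whole batch.
    func commit() {
        committed.append(contentsOf: staged)
        staged.removeAll()
    }
}

func exportDrafts(count: Int) async -> [EventDraft] {
    // Child tasks only build values; they never touch the store.
    let drafts = await withTaskGroup(of: EventDraft.self,
                                     returning: [EventDraft].self) { group in
        for i in 0..<count {
            group.addTask { EventDraft(index: i, title: "TEST \(i)") }
        }
        var collected: [EventDraft] = []
        for await draft in group { collected.append(draft) }
        // Child tasks finish in arbitrary order, so restore the input order.
        return collected.sorted { $0.index < $1.index }
    }

    // All store calls happen here, one after another, in a single task.
    let store = DraftStore()
    for draft in drafts { store.stage(draft) }
    store.commit()
    return store.committed
}
```

If building each value is cheap, as it is for these test events, the task group probably buys little and a plain loop followed by one commit may be just as fast.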
I have an actor that processes values and then publishes them with a Combine publisher.
I have trouble understanding actors. I thought that when using an actor in an async context, calls would automatically be serialised. However, the numbers get processed in a different order than expected (see the class test for comparison).
I understand that if I wrapped a single Task around the for loop, the results would come back in order, but my understanding was that I could simply call a function of an actor and the calls would be serialised automatically.
How can I make my actor thread safe so that it publishes the values in the expected order even when it is called from different threads?
import XCTest
import Combine
import CryptoKit
actor AddNumbersActor {
private let _numberPublisher: PassthroughSubject<(Int,String), Never> = .init()
nonisolated lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()
func process(_ number: Int) {
let string = SHA512.hash(data: Data(String(number).utf8))
.description
_numberPublisher.send((number, string))
}
}
class AddNumbersClass {
private let _numberPublisher: PassthroughSubject<(Int,String), Never> = .init()
lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()
func process(_ number: Int) {
let string = SHA512.hash(data: Data(String(number).utf8))
.description
_numberPublisher.send((number, string))
}
}
final class TestActorWithPublisher: XCTestCase {
var subscription: AnyCancellable?
override func tearDownWithError() throws {
subscription = nil
}
func testActor() throws {
let addNumbers = AddNumbersActor()
var numbersResults = [(int: Int, string: String)]()
let expectation = expectation(description: "numberOfExpectedResults")
let numberCount = 1000
subscription = addNumbers.numberPublisher
.sink { results in
print(results)
numbersResults.append(results)
if numberCount == numbersResults.count {
expectation.fulfill()
}
}
for number in 1...numberCount {
Task {
await addNumbers.process(number)
}
}
wait(for: [expectation], timeout: 5)
print(numbersResults.count)
XCTAssertEqual(numbersResults[10].0, 11)
XCTAssertEqual(numbersResults[100].0, 101)
XCTAssertEqual(numbersResults[500].0, 501)
}
func testClass() throws {
let addNumbers = AddNumbersClass()
var numbersResults = [(int: Int, string: String)]()
let expectation = expectation(description: "numberOfExpectedResults")
let numberCount = 1000
subscription = addNumbers.numberPublisher
.sink { results in
print(results)
numbersResults.append(results)
if numberCount == numbersResults.count {
expectation.fulfill()
}
}
for number in 1...numberCount {
addNumbers.process(number)
}
wait(for: [expectation], timeout: 5)
print(numbersResults.count)
XCTAssertEqual(numbersResults[10].0, 11)
XCTAssertEqual(numbersResults[100].0, 101)
XCTAssertEqual(numbersResults[500].0, 501)
}
}
Using an actor does indeed serialize access.
The issue you're running into is that the tests aren't testing whether calls to process() are serialized; they are testing the execution order of the calls. And the execution order of separate Task instances is not guaranteed.
Try changing your AddNumbers objects so that, instead of the output order reflecting the order in which the calls were made, the tests succeed if calls are serialized and fail if concurrent calls are made. You can do this by keeping a count variable, incrementing it, sleeping a bit, then publishing the count. With concurrent calls the test fails, since count can be incremented multiple times before it is published.
If you make that change, the test using an Actor will pass. The test using a class will fail if it calls process() concurrently:
DispatchQueue.global(qos: .default).async {
addNumbers.process()
}
It will also help to understand that Task's scheduling depends on a bunch of stuff. GCD will spin up tons of threads, whereas Swift concurrency will only use 1 worker thread per available core (I think!). So in some execution environments, just wrapping your work in Task { } might be enough to serialize it for you. I've been finding that iOS simulators act as if they have a single core, so task execution ends up being serialized. Also, otherwise unsafe code will work if you ensure the task runs on the main actor, since it guarantees serial execution:
Task { @MainActor in
// ...
}
Here are modified tests showing all this:
class TestActorWithPublisher: XCTestCase {
actor AddNumbersActor {
private let _numberPublisher: PassthroughSubject<Int, Never> = .init()
nonisolated lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()
var count = 0
func process() {
// Increment the count here
count += 1
// Wait a bit...
Thread.sleep(forTimeInterval: TimeInterval.random(in: 0...0.010))
// Send it back. If other calls to process() were made concurrently, count may have been incremented again before being sent:
_numberPublisher.send(count)
}
}
class AddNumbersClass {
private let _numberPublisher: PassthroughSubject<Int, Never> = .init()
lazy var numberPublisher = _numberPublisher.eraseToAnyPublisher()
var count = 0
func process() {
count += 1
Thread.sleep(forTimeInterval: TimeInterval.random(in: 0...0.010))
_numberPublisher.send(count)
}
}
var subscription: AnyCancellable?
override func tearDownWithError() throws {
subscription = nil
}
func testActor() throws {
let addNumbers = AddNumbersActor()
var numbersResults = [Int]()
let expectation = expectation(description: "numberOfExpectedResults")
let numberCount = 1000
subscription = addNumbers.numberPublisher
.sink { results in
numbersResults.append(results)
if numberCount == numbersResults.count {
expectation.fulfill()
}
}
for _ in 1...numberCount {
Task.detached(priority: .high) {
await addNumbers.process()
}
}
wait(for: [expectation], timeout: 10)
XCTAssertEqual(numbersResults, Array(1...numberCount))
}
func testClass() throws {
let addNumbers = AddNumbersClass()
var numbersResults = [Int]()
let expectation = expectation(description: "numberOfExpectedResults")
let numberCount = 1000
subscription = addNumbers.numberPublisher
.sink { results in
numbersResults.append(results)
if numberCount == numbersResults.count {
expectation.fulfill()
}
}
for _ in 1...numberCount {
DispatchQueue.global(qos: .default).async {
addNumbers.process()
}
}
wait(for: [expectation], timeout: 5)
XCTAssertEqual(numbersResults, Array(1...numberCount))
}
}
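If delivery order matters as well (which is what the original tests asserted), one option is to stop creating a Task per value and instead feed the values through a single consumer. The following is a minimal sketch using AsyncStream. `Recorder` and `runInOrder` are hypothetical names chosen for illustration.

```swift
// Hypothetical actor that records values in the order it receives them.
actor Recorder {
    private(set) var values: [Int] = []
    func process(_ number: Int) { values.append(number) }
}

func runInOrder(count: Int) async -> [Int] {
    let recorder = Recorder()

    // The build closure runs synchronously inside the initializer,
    // so `continuation` is set before it is used below.
    var continuation: AsyncStream<Int>.Continuation!
    let stream = AsyncStream<Int> { continuation = $0 }

    // A single consumer drains the stream, so values reach the actor
    // in the order they were yielded.
    let consumer = Task {
        for await number in stream {
            await recorder.process(number)
        }
    }

    for number in 1...count {
        continuation.yield(number)
    }
    continuation.finish()

    await consumer.value
    return await recorder.values
}
```

The actor still protects its state here. The ordering comes from having one producer and one consumer, not from the actor itself.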
I saw this thread: Swift 5.5 Concurrency: how to serialize async Tasks to replace an OperationQueue with maxConcurrentOperationCount = 1?
but I am not clear on whether two scheduled Tasks will execute serially. What I mean is, if I had a piece of code like this:
func fetchImages() {
Task.init {
let fetch = await loadImages()
}
Task.init {
let fetch1 = await loadImages1()
}
}
Will the first task always finish before the second task starts? What I am trying to achieve is to execute task 1 as soon as possible (to save time), but task 2 relies on the result of task 1, so it needs to wait for task 1 to finish before proceeding. Task 2 is also only triggered conditionally by an event, so the two cannot be in the same Task.
You asked:
Will the first task always finish before the second task starts?
No, it will not. Whenever you see await, that is a “suspension point” at which Swift concurrency is free to switch to another task. In short, these can run concurrently. Let me illustrate that with Xcode Instruments:
import os.log
private let log = OSLog(subsystem: "Test", category: .pointsOfInterest)
class Foo {
func fetchImages() {
Task {
let fetch = await loadImages()
print("done loadImages")
}
Task {
let fetch1 = await loadImages1()
print("done loadImages1")
}
}
func loadImages() async {
// start “points of interest” interval
let id = OSSignpostID(log: log)
os_signpost(.begin, log: log, name: #function, signpostID: id, "start")
// perform 3-second asynchronous task
try? await Task.sleep(nanoseconds: 3 * NSEC_PER_SEC)
// end “points of interest” interval
os_signpost(.end, log: log, name: #function, signpostID: id, "end")
}
func loadImages1() async {
// start “points of interest” interval
let id = OSSignpostID(log: log)
os_signpost(.begin, log: log, name: #function, signpostID: id, "start")
// perform 1-second asynchronous task
try? await Task.sleep(nanoseconds: 1 * NSEC_PER_SEC)
// end “points of interest” interval
os_signpost(.end, log: log, name: #function, signpostID: id, "end")
}
}
If you profile this with the “time profiler” in Instruments (in Xcode, either press command-i or choose “Product” » “Profile” from the menu), you can see that these run concurrently.
The trick is to have the second task await the first one. E.g.,
func fetchImages() {
let firstTask = Task {
let fetch = await loadImages()
print("done loadImages")
}
Task {
_ = await firstTask.result
let fetch1 = await loadImages1()
print("done loadImages1")
}
}
Or you can store the first task in some property:
var firstTask: Task<Void, Never>? // you may need to adjust the `Success` and `Failure` types to match your real example
func fetchImages() {
firstTask = Task {
let fetch = await loadImages()
print("done loadImages")
}
Task {
_ = await firstTask?.result
let fetch1 = await loadImages1()
print("done loadImages1")
}
}
When you do that, you can see the sequential execution in Instruments.
FWIW, this concept of using await on the prior task was the motivating idea behind the other answer that you referenced, which is a more generalized rendition of the above. Hopefully this illustrates the mechanism outlined in that other answer.
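For reference, the generalized form of "each task awaits the prior one" could look roughly like the sketch below. It is not the code from the referenced answer. `SerialTaskChain`, `add` and `waitForAll` are hypothetical names, and the sketch assumes non-throwing operations.

```swift
// Sketch only: every new task first awaits the previously added one.
actor SerialTaskChain {
    private var previous: Task<Void, Never>?

    // Synchronous inside the actor, so two calls cannot interleave
    // between reading `previous` and replacing it.
    func add(_ operation: @escaping @Sendable () async -> Void) {
        let prior = previous
        previous = Task {
            // Wait for the task that was added before this one, if any.
            _ = await prior?.value
            await operation()
        }
    }

    // Suspends until the most recently added task has finished.
    func waitForAll() async {
        _ = await previous?.value
    }
}
```

A caller would write `await chain.add { ... }` for task 1 and later, when the event fires, `await chain.add { ... }` for task 2. Task 2 then starts only after task 1 has finished, even though the two are created at different times.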
I am discovering Combine. I wrote methods that make HTTP requests in a "combine" way, for example:
func testRawDataTaskPublisher(for url: URL) -> AnyPublisher<Data, Error> {
var request = URLRequest(url: url,
cachePolicy: .useProtocolCachePolicy,
timeoutInterval: 15)
request.httpMethod = "GET"
return urlSession.dataTaskPublisher(for: request)
.tryMap {
return $0.data
}
.eraseToAnyPublisher()
}
I would like to call the method multiple times and perform a task once all the calls have finished, for example:
let myURLs: [URL] = ...
for url in myURLs {
let cancellable = testRawDataTaskPublisher(for: url)
.sink(receiveCompletion: { _ in }) { data in
// save the data...
}
}
The code above won't work because I have to store the cancellable in a variable that belongs to the class.
The first question is: is it a good idea to store many (for example 1000) cancellables in something like Set<AnyCancellable>? Won't it cause memory leaks?
var cancellables = Set<AnyCancellable>()
...
let cancellable = ...
cancellables.insert(cancellable) // ???
And the second question is: how do I start a task once all the publishers have finished? I was thinking about something like this:
class Test {
var cancellables = Set<AnyCancellable>()
func run() {
// show a loader
let cancellable = runDownloads()
.receive(on: RunLoop.main)
.sink(receiveCompletion: { _ in }) { _ in
// hide the loader
}
cancellables.insert(cancellable)
}
func runDownloads() -> AnyPublisher<Bool, Error> {
let myURLs: [URL] = ...
return Future<Bool, Error> { promise in
let numberOfURLs = myURLs.count
var numberOfFinishedTasks = 0
for url in myURLs {
let cancellable = testRawDataTaskPublisher(for: url)
.sink(receiveCompletion: { _ in }) { data in
// save the data...
numberOfFinishedTasks += 1
if numberOfFinishedTasks >= numberOfURLs {
promise(.success(true))
}
}
cancellables.insert(cancellable)
}
}.eraseToAnyPublisher()
}
func testRawDataTaskPublisher(for url: URL) -> AnyPublisher<Data, Error> {
...
}
}
Normally I would use DispatchGroup, start multiple HTTP tasks and consume the notification when the tasks are finished, but I am wondering how to write that in a modern way using Combine.
You can run some operations in parallel by creating a collection of publishers, applying the flatMap operator and then collect to wait for all of the publishers to complete before continuing. Here's an example that you can run in a playground:
import Combine
import Foundation
func delayedPublisher<Value>(_ value: Value, delay after: Double) -> AnyPublisher<Value, Never> {
let p = PassthroughSubject<Value, Never>()
DispatchQueue.main.asyncAfter(deadline: .now() + after) {
p.send(value)
p.send(completion: .finished)
}
return p.eraseToAnyPublisher()
}
let myPublishers = [1,2,3]
.map{ delayedPublisher($0, delay: 1 / Double($0)).print("\($0)").eraseToAnyPublisher() }
let cancel = myPublishers
.publisher
.flatMap { $0 }
.collect()
.sink { result in
print("result:", result)
}
Here is the output:
1: receive subscription: (PassthroughSubject)
1: request unlimited
2: receive subscription: (PassthroughSubject)
2: request unlimited
3: receive subscription: (PassthroughSubject)
3: request unlimited
3: receive value: (3)
3: receive finished
2: receive value: (2)
2: receive finished
1: receive value: (1)
1: receive finished
result: [3, 2, 1]
Notice that the publishers are all immediately started (in their original order).
The 1 / $0 delay causes the first publisher to take the longest to complete. Notice the order of the values at the end. Since the first took the longest to complete, it is the last item.
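Applied to the download question, the same flatMap and collect pattern means there is only one subscription, and therefore only one cancellable to store, no matter how many URLs there are. Below is a small sketch under the assumption that limiting concurrency is also desirable. `fakeDownload` is a hypothetical stand-in for the real request publisher and emits synchronously via Just.

```swift
import Combine

// Hypothetical stand-in for a network request; `Just` emits synchronously.
func fakeDownload(_ id: Int) -> AnyPublisher<String, Never> {
    Just("data-\(id)").eraseToAnyPublisher()
}

var cancellables = Set<AnyCancellable>()
var finished: [String] = []

[1, 2, 3, 4, 5].publisher
    // At most two inner publishers are subscribed at any one time.
    .flatMap(maxPublishers: .max(2)) { fakeDownload($0) }
    // Emits a single array once every inner publisher has completed.
    .collect()
    .sink { finished = $0 }
    .store(in: &cancellables)

print(finished.count)
```

With real network publishers the sink fires asynchronously, so "hide the loader" style work belongs inside the sink closure rather than after the chain.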
I have an async function that synchronizes the network and database since the last call, then returns the results. Several callers on different threads call this function.
Instead of executing and serving the request per call, I'd like to queue up the tasks while the async function runs, then flush the queue once it finishes so that the next set of tasks can be queued up.
Here's what I came up with so far:
extension DataWorker {
// Handle simultaneous pull requests in a queue
private static let pullQueue = DispatchQueue(label: "DataWorker.remotePull")
private static var pullTasks = [((SomeType) -> Void)]()
private static var isPulling = false
func remotePull(completion: ((SomeType) -> Void)?) {
DataWorker.pullQueue.async {
if let completion = completion {
DataWorker.pullTasks.append(completion)
}
guard !DataWorker.isPulling else { return }
DataWorker.isPulling = true
self.store.remotePull { result in
print("Remote pull executed")
DataWorker.pullQueue.async {
let tasks = DataWorker.pullTasks
DataWorker.pullTasks.removeAll()
DataWorker.isPulling = false
DispatchQueue.main.async {
tasks.forEach { $0(result) }
}
}
}
}
}
}
Below is how I'm testing it. I expect exactly 100 iterations to be printed, but only a couple of remotePull executions:
DispatchQueue.concurrentPerform(iterations: 100) { iteration in
self.dataWorker.remotePull { _ in
print("Iteration: \(iteration)")
}
}
Is this approach even correct, or is there a more elegant or efficient way of achieving this shared-task approach?
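For comparison, here is how the same "share one in-flight pull between all callers" idea might be sketched with Swift concurrency instead of a dispatch queue and a completion list. `SharedPull` and `operation` are hypothetical names, and the sketch assumes a non-throwing operation.

```swift
// Sketch only: callers that arrive while a pull is running share its result.
actor SharedPull<Value: Sendable> {
    private var inFlight: Task<Value, Never>?
    private let operation: @Sendable () async -> Value

    init(operation: @escaping @Sendable () async -> Value) {
        self.operation = operation
    }

    func pull() async -> Value {
        // Join the pull that is already running, if there is one.
        if let existing = inFlight {
            return await existing.value
        }
        // Copy the closure so the task does not need to capture `self`.
        let operation = self.operation
        // Otherwise start a new pull and remember it for later callers.
        let task = Task { await operation() }
        inFlight = task
        let value = await task.value
        // Clear the slot so that the next caller triggers a fresh pull.
        inFlight = nil
        return value
    }
}
```

Each caller simply writes `let result = await shared.pull()`. The actor replaces the serial queue and the isPulling flag, and the shared Task replaces the array of completion handlers.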