How do I prevent bb8 connections from breaking after many repeated queries - postgresql

I have an application that should use a shared connection pool for all requests. I observe that at seemingly-random times, requests fail with the error type "Closed". I have isolated this behavior into the following example:
use lazy_static::lazy_static;
use bb8_postgres::bb8::Pool;
use bb8_postgres::PostgresConnectionManager;
use bb8_postgres::tokio_postgres::{NoTls, Client};
lazy_static! {
static ref CONNECTION_POOL: Pool<PostgresConnectionManager<NoTls>> = {
let manager = PostgresConnectionManager::new_from_stringlike("dbname=demodb host=localhost user=postgres", NoTls).unwrap();
Pool::builder().build_unchecked(manager)
};
}
fn main() {
println!("Hello, world!");
}
#[cfg(test)]
mod test {
use super::*;
#[tokio::test]
async fn much_insert_traffic() {
much_traffic("INSERT INTO foo(a,b) VALUES (1, 2) RETURNING id").await
}
#[tokio::test]
async fn much_select_traffic() {
much_traffic("SELECT MAX(id) FROM foo").await
}
#[tokio::test]
async fn much_update_traffic() {
much_traffic("UPDATE foo SET a = 81 WHERE id = 1919 RETURNING b").await;
}
async fn much_traffic(stmt: &str) {
let c = CONNECTION_POOL.get().await.expect("Get a connection");
let client = &*c;
for i in 0..10000i32 {
let res = client.query_opt(stmt, &[]).await.expect(&format!("Perform repeat {} of {} ok", i, stmt));
}
}
}
When executing the tests, more than half of the time one of the tests will fail in a later iteration with output similar to the following:
Perform repeat 8782 of UPDATE foo SET a = 81 WHERE id = 1919 RETURNING b ok: Error { kind: Closed, cause: None }
thread 'test::much_update_traffic' panicked at 'Perform repeat 8782 of UPDATE foo SET a = 81 WHERE
id = 1919 RETURNING b ok: Error { kind: Closed, cause: None }', src\main.rs:44:23

It turns out the problem is entirely caused by the #[tokio::test] attribute starting up a distinct runtime for every test. The lazy static is initialized under one of these runtimes, and as soon as that runtime shuts down, the pool is destroyed. The other tests (running on different runtimes) can use the value as long as the spawning test is still running, but are met with an invalid state once it has shut down.
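One way out, sketched below under the assumption that the tests are allowed to share a single runtime: keep a lazy-static tokio Runtime next to the pool and drive each test with block_on instead of #[tokio::test], so the pool's connections are never tied to a runtime that dies with an individual test.
use lazy_static::lazy_static;
use tokio::runtime::Runtime;
use bb8_postgres::bb8::Pool;
use bb8_postgres::PostgresConnectionManager;
use bb8_postgres::tokio_postgres::NoTls;

lazy_static! {
    // One runtime for the whole test binary; every test enters it explicitly.
    static ref RUNTIME: Runtime = Runtime::new().expect("build shared runtime");
    static ref CONNECTION_POOL: Pool<PostgresConnectionManager<NoTls>> = {
        let manager = PostgresConnectionManager::new_from_stringlike(
            "dbname=demodb host=localhost user=postgres",
            NoTls,
        )
        .unwrap();
        Pool::builder().build_unchecked(manager)
    };
}

#[test]
fn much_insert_traffic() {
    // A plain #[test] plus block_on replaces #[tokio::test]; the pool and its
    // connections now live on RUNTIME, which outlives every individual test.
    // much_traffic is the helper from the example above.
    RUNTIME.block_on(much_traffic("INSERT INTO foo(a,b) VALUES (1, 2) RETURNING id"));
}
Alternatively, each test can build its own pool inside its own runtime instead of sharing a lazy static across runtimes.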

Related

How to suspend subsequent tasks until first finishes then share its response with tasks that waited?

I have an actor which throttles requests in a way where the first one will suspend subsequent requests until finished, then share its response with them so they don't have to make the same request.
Here's what I'm trying to do:
let cache = Cache()
let operation = OperationStatus()
func execute() async {
if await operation.isExecuting {
await operation.waitUntilFinished()
} else {
await operation.set(isExecuting: true)
}
if let data = await cache.data {
return data
}
let request = myRequest()
let response = await myService.send(request)
await cache.set(data: response)
await operation.set(isExecuting: false)
}
actor Cache {
var data: myResponse?
func set(data: myResponse?) {
self.data = data
}
}
actor OperationStatus {
@Published var isExecuting = false
private var cancellable = Set<AnyCancellable>()
func set(isExecuting: Bool) {
self.isExecuting = isExecuting
}
func waitUntilFinished() async {
guard isExecuting else { return }
return await withCheckedContinuation { continuation in
$isExecuting
.first { !$0 } // Wait until execution toggled off
.sink { _ in continuation.resume() }
.store(in: &cancellable)
}
}
}
// Do something
DispatchQueue.concurrentPerform(iterations: 1_000_000) { _ in execute() }
This ensures one request at a time, with subsequent calls waiting until the first finishes. It seems to work, but I'm wondering whether there's a pure Swift Concurrency way to do this instead of mixing in Combine, and how I can test it. Here's a test I started, but I'm confused about how to test this:
final class OperationStatusTests: XCTestCase {
private let iterations = 10_000 // 1_000_000
private let outerIterations = 10
actor Storage {
var counter: Int = 0
func increment() {
counter += 1
}
}
func testConcurrency() {
// Given
let storage = Storage()
let operation = OperationStatus()
let promise = expectation(description: "testConcurrency")
promise.expectedFulfillmentCount = outerIterations * iterations
@Sendable func execute() async {
guard await !operation.isExecuting else {
await operation.waitUntilFinished()
promise.fulfill()
return
}
await operation.set(isExecuting: true)
try? await Task.sleep(seconds: 8)
await storage.increment()
await operation.set(isExecuting: false)
promise.fulfill()
}
waitForExpectations(timeout: 10)
// When
DispatchQueue.concurrentPerform(iterations: outerIterations) { _ in
(0..<iterations).forEach { _ in
Task { await execute() }
}
}
// Then
// XCTAssertEqual... how to test?
}
}
Before I tackle a more general example, let us first dispense with some natural examples of sequential execution of asynchronous tasks, passing the result of one as a parameter of the next. Consider:
func entireProcess() async throws {
let value = try await first()
let value2 = try await subsequent(with: value)
let value3 = try await subsequent(with: value2)
let value4 = try await subsequent(with: value3)
// do something with `value4`
}
Or
func entireProcess() async throws {
var value = try await first()
for _ in 0 ..< 4 {
value = try await subsequent(with: value)
}
// do something with `value`
}
This is the easiest way to declare a series of async functions, each of which takes the prior result as the input for the next iteration. So, let us expand the above to include some signposts for Instruments’ “Points of Interest” tool:
import os.log
private let log = OSLog(subsystem: "Test", category: .pointsOfInterest)
func entireProcess() async throws {
let id = OSSignpostID(log: log)
os_signpost(.begin, log: log, name: #function, signpostID: id, "start")
var value = try await first()
for i in 0 ..< 4 {
os_signpost(.event, log: log, name: #function, "Scheduling: %d with input of %d", i, value)
value = try await subsequent(with: value)
}
os_signpost(.end, log: log, name: #function, signpostID: id, "%d", value)
}
func first() async throws -> Int {
let id = OSSignpostID(log: log)
os_signpost(.begin, log: log, name: #function, signpostID: id, "start")
try await Task.sleep(seconds: 1)
let value = 42
os_signpost(.end, log: log, name: #function, signpostID: id, "%d", value)
return value
}
func subsequent(with value: Int) async throws -> Int {
let id = OSSignpostID(log: log)
os_signpost(.begin, log: log, name: #function, signpostID: id, "%d", value)
try await Task.sleep(seconds: 1)
let newValue = value + 1
defer { os_signpost(.end, log: log, name: #function, signpostID: id, "%d", newValue) }
return newValue
}
So, there you see a series of requests that pass their result to the subsequent request. All of that os_signpost signpost stuff is so we can visually see that they are running sequentially in Instrument’s “Points of Interest” tool:
You can see ⓢ event signposts as each task is scheduled, and the intervals illustrate the sequential execution of these asynchronous tasks.
This is the easiest way to have dependencies between tasks, passing values from one task to another.
Now, that begs the question of how to generalize the above, where we await the prior task before starting the next one.
One pattern is to write an actor that awaits the result of the prior one. Consider:
actor SerialTasks<Success> {
private var previousTask: Task<Success, Error>?
func add(block: @Sendable @escaping () async throws -> Success) {
previousTask = Task { [previousTask] in
let _ = await previousTask?.result
return try await block()
}
}
}
Unlike the previous example, this does not require that you have a single function from which you initiate the subsequent tasks. E.g., I have used the above when some separate user interaction requires me to add a new task to the end of the list of previously submitted tasks.
There are two subtle, yet critical, aspects of the above actor:
The add method, itself, must not be an asynchronous function. We need to avoid actor reentrancy. If this were an async function (like in your example), we would lose the sequential execution of the tasks.
The Task has a [previousTask] capture list to capture a copy of the prior task. This way, each task will await the prior one, avoiding any races.
The above can be used to make a series of tasks run sequentially. But it is not passing values between tasks, itself. I confess that I have used this pattern where I simply need sequential execution of largely independent tasks (e.g., sending separate commands being sent to some Process). But it can probably be adapted for your scenario, in which you want to “share its response with [subsequent requests]”.
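For illustration, here is a minimal usage sketch of that actor; runCommand and the command strings are hypothetical placeholders, not something from the question:
let serializer = SerialTasks<Void>()

func runCommand(_ command: String) async throws {
    // hypothetical async work, e.g. sending one command to a Process
}

func enqueueAll() async {
    // `add` is actor-isolated, so callers await the hop to the actor, but the
    // method body is synchronous: it only records the new Task, and that Task
    // awaits its predecessor before running the supplied block.
    await serializer.add { try await runCommand("first") }
    await serializer.add { try await runCommand("second") }
    await serializer.add { try await runCommand("third") }
}
Because the blocks are chained through previousTask, they run strictly in the order they were added, even though enqueueAll itself never blocks.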
I would suggest that you post a separate question with an MCVE showing a practical example of precisely what you want to pass from one asynchronous function to another. I have, for example, done permutations of the above, passing an integer from one task to another. But in practice that is not of great utility, as it gets more complicated once you start dealing with the reality of parsing heterogeneous results. In practice, the simple example with which I started this answer is the most practical pattern.
On the broader question of working with (or around) actor reentrancy, I would advise keeping an eye on SE-0306 - Future Directions, which explicitly contemplates some potentially elegant forthcoming alternatives. I would not be surprised to see refinements, either in the language itself or in the Swift Async Algorithms library.
tl;dr
I did not want to encumber the above with discussion regarding your code snippets, but there are quite a few issues. So, if you forgive me, here are some observations:
The attempt to use OperationStatus to enforce sequential execution of async calls will not work because actors feature reentrancy. If you have an async function, every time you hit an await, that is a suspension point at which point another call to that async function is allowed to proceed. The integrity of your OperationStatus logic will be violated. You will not experience serial behavior.
If you are interested in suspension points, I might recommend watching the WWDC 2021 video Swift concurrency: Behind the scenes.
The testConcurrency test calls waitForExpectations before it actually starts any tasks that would fulfill the expectations. That will always time out.
The testConcurrency test uses GCD's concurrentPerform, whose closure here just launches an asynchronous task and immediately returns. That defeats the entire purpose of concurrentPerform (which is a throttling mechanism for running a series of synchronous tasks in parallel without exceeding the number of cores on your CPU). Besides, Swift concurrency features its own analog to concurrentPerform, namely the constrained "cooperative thread pool" (also discussed in that video, IIRC), rendering concurrentPerform obsolete in the world of Swift concurrency.
Bottom line, it doesn't make sense to include concurrentPerform in a Swift concurrency codebase. It also does not make sense to use concurrentPerform to launch asynchronous tasks (whether Swift concurrency or GCD). It is for launching a series of synchronous tasks in parallel.
In execute in your test, you have two paths of execution, one of which awaits some state change and fulfills the expectation without ever incrementing the storage. That means you will lose some attempts to increment the value, and your total will not match the desired resulting value. Now, if your intent was to drop requests while another was pending, that's fine, but I don't think that was your intent.
In answer to your question about how to test for success at the end, you might do something like:
actor Storage {
private var counter: Int = 0
func increment() {
counter += 1
}
var value: Int { counter }
}
func testConcurrency() async {
let storage = Storage()
let operation = OperationStatus()
let promise = expectation(description: "testConcurrency")
let finalCount = outerIterations * iterations
promise.expectedFulfillmentCount = finalCount
@Sendable func execute() async {
guard await !operation.isExecuting else {
await operation.waitUntilFinished()
promise.fulfill()
return
}
await operation.set(isExecuting: true)
try? await Task.sleep(seconds: 1)
await storage.increment()
await operation.set(isExecuting: false)
promise.fulfill()
}
// waitForExpectations(timeout: 10) // this is not where you want to wait; moved below, after the tasks started
// DispatchQueue.concurrentPerform(iterations: outerIterations) { _ in // no point in this
for _ in 0 ..< outerIterations {
for _ in 0 ..< iterations {
Task { await execute() }
}
}
await waitForExpectations(timeout: 10)
// test the success to see if the store value was correct
let value = await storage.value // to test that you got the right count, fetch the value; note `await`, thus we need to make this an `async` test
// Then
XCTAssertEqual(finalCount, value, "Count")
}
Now, this test will fail for a variety of reasons, but hopefully this illustrates how you would verify the success or failure of the test. Note, though, that this only tests that the final count was correct, not that the tasks were executed sequentially. The fact that Storage is an actor will hide the fact that they were not really invoked sequentially. I.e., if you really needed the results of one request to prepare the next, that is not tested here.
If, as you go through this, you want to really confirm the behavior of your OperationStatus pattern, I would suggest using os_signpost intervals (or simple logging statements where your tasks start and finish). You will see that the separate invocations of the asynchronous execute method are not running sequentially.

Using a callback when handling TCP connections with Tokio

I am trying to have a struct that starts an event loop, listens for TCP connections and calls a callback for each connection.
(The callback will be handed some preprocessed data from the socket. In my example below I just hand it the IP address of the connection, but in my real code I will parse the contents that I receive with serde into a struct and pass that into the callback. I hope that doesn't invalidate the following "not working example".)
My Cargo.toml:
[package]
name = "lifetime-problem"
version = "0.1.0"
edition = "2018"
[dependencies]
tokio-tcp = "0.1.3"
tokio = "0.1.14"
[[bin]]
name = "lifetime-problem"
path = "main.rs"
and main.rs:
use tokio::prelude::*;
struct Test {
printer: Option<Box<Fn(std::net::SocketAddr) + Sync>>,
}
impl Test {
pub fn start(&mut self) -> Result<(), Box<std::error::Error>> {
let addr = "127.0.0.1:4242".parse::<std::net::SocketAddr>()?;
let listener = tokio::net::TcpListener::bind(&addr)?;
let server = listener
.incoming()
.map_err(|e| eprintln!("failed to accept socket; error = {:?}", e))
.for_each(move |socket: tokio::net::TcpStream| {
let address = socket.peer_addr().expect("");
match self.printer {
Some(callback) => { callback(address); }
None => { println!("{}", address); }
}
Ok(())
});
tokio::run(server);
Ok(())
}
}
fn main() {
let mut x = Test{ printer: None };
x.start();
}
I have tried several things starting from this code (which is adopted directly from the Tokio example).
If I use the code like posted above I get:
error[E0277]: (dyn std::ops::Fn(std::net::SocketAddr) + std::marker::Sync + 'static) cannot be sent between threads safely
for line 24 (tokio::run(server)).
If I add the Send trait on the Fn in the printer field XOR if I remove the move in the closure in the for_each call I get another error instead:
error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
which points me to the closure that apparently cannot outlive the start method where it is defined but tokio::run seems to have conflicting requirements for it.
Do you know if I am addressing the callback pattern in totally the wrong way or if there is just some minor error in my code?
First things first:
The compiler will translate Box<Fn(std::net::SocketAddr) + Sync> into Box<Fn(std::net::SocketAddr) + Sync + 'static> unless a lifetime is explicitly specified.
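That is, the printer field from the question is implicitly declared as:
struct Test {
    // The trait object's lifetime defaults to 'static when none is written.
    printer: Option<Box<Fn(std::net::SocketAddr) + Sync + 'static>>,
}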
Let's have a look at the errors:
error[E0277]: (dyn std::ops::Fn(std::net::SocketAddr) + std::marker::Sync + 'static) cannot be sent between threads safely
This is self-explanatory. You are trying to move a &mut T to another thread, but you cannot, because T here is not Send. To send &mut T to another thread, T itself must be Send.
Here is the minimal code that will give the same error:
use std::fmt::Debug;
fn func<T> (i:&'static mut T) where T: Debug {
std::thread::spawn(move || {
println!("{:?}", i);
});
}
If I also require T above to be Send, the error goes away.
But in your case, when you add the Send trait, you get a lifetime error instead. Why?
&mut self has some lifetime greater than the function start(), set by the caller, but there is no guarantee that it's 'static. You move this reference into a closure that is handed to another thread and can potentially outlive the scope it is closing over, leading to a dangling reference.
Here's a minimal version, that would give the same error.
use std::fmt::Debug;
fn func<'a, T:'a> (i:&'a mut T) where T: Debug + Sync + Send {
std::thread::spawn(move || {
println!("{:?}", i);
});
}
Sync is not really required here, as it is &mut T. Changing &mut T to &T (retaining Sync) will also result in the same error. The issue here is with references, not mutability. So you see, there is some lifetime 'a that is moved into a closure (handed to a thread), which means the closure now contains a reference disjoint from the main context. So what is 'a, and how long will it live from the perspective of the closure that is invoked from another thread? Not inferable! As a result, the compiler complains that it cannot infer an appropriate lifetime due to conflicting requirements.
If we tweak the code a bit to;
impl Test {
pub fn start(&'static mut self) -> Result<(), Box<std::error::Error>> {
let addr = "127.0.0.1:4242".parse::<std::net::SocketAddr>()?;
let listener = tokio::net::TcpListener::bind(&addr)?;
let server = listener
.incoming()
.map_err(|e| eprintln!("failed to accept socket; error = {:?}", e))
.for_each(move |socket: tokio::net::TcpStream| {
let address = socket.peer_addr().expect("");
match &self.printer {
Some(callback) => { callback(address); }
None => { println!("{}", address); }
}
Ok(())
});
tokio::run(server);
Ok(())
}
}
it will compile fine. There is now a guarantee that self has a 'static lifetime. Please note that in the match statement we need to match on &self.printer, as you cannot move out of a borrowed context.
However, this requires Test to be declared as a static, and a mutable one at that, which is generally not the best approach if you have other options.
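For completeness, one way to obtain such a &'static mut Test without declaring a static item is to deliberately leak a heap allocation. This is my suggestion rather than something from the original code, and it is only acceptable for a value that should live for the entire program:
fn main() {
    // Box::leak never frees the allocation, which is exactly what gives the
    // resulting reference its 'static lifetime.
    let x: &'static mut Test = Box::leak(Box::new(Test { printer: None }));
    x.start().unwrap();
}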
Another way: if it is OK for you to pass Test by value to start() and then move it into for_each(), the code would look like this:
use tokio::prelude::*;
struct Test {
printer: Option<Box<Fn(std::net::SocketAddr) + Send>>,
}
impl Test {
pub fn start(mut self) -> Result<(), Box<std::error::Error>> {
let addr = "127.0.0.1:4242".parse::<std::net::SocketAddr>()?;
let listener = tokio::net::TcpListener::bind(&addr)?;
let server = listener
.incoming()
.map_err(|e| eprintln!("failed to accept socket; error = {:?}", e))
.for_each(move |socket: tokio::net::TcpStream| {
let address = socket.peer_addr().expect("");
match &self.printer {
Some(callback) => {
callback(address);
}
None => {
println!("{}", address);
}
}
Ok(())
});
tokio::run(server);
Ok(())
}
}
fn main() {
let mut x = Test { printer: None };
x.start();
}

How do I close a Unix socket in Rust?

I have a test that opens and listens to a Unix domain socket. The socket is opened and reads data without issues, but it doesn't shut down gracefully.
This is the error I get when I try to run the test a second time:
thread 'test_1' panicked at 'called Result::unwrap() on an Err
value: Error { repr: Os { code: 48, message: "Address already in use"
} }', ../src/libcore/result.rs:799 note: Run with RUST_BACKTRACE=1
for a backtrace.
The code is available at the Rust playground and there's a Github Gist for it.
use std::io::prelude::*;
use std::thread;
use std::net::Shutdown;
use std::os::unix::net::{UnixStream, UnixListener};
Test Case:
#[test]
fn test_1() {
driver();
assert_eq!("1", "2");
}
Main entry point function
fn driver() {
let listener = UnixListener::bind("/tmp/my_socket.sock").unwrap();
thread::spawn(|| socket_server(listener));
// send a message
busy_work(3);
// try to disconnect the socket
let drop_stream = UnixStream::connect("/tmp/my_socket.sock").unwrap();
let _ = drop_stream.shutdown(Shutdown::Both);
}
Function to send data in intervals
#[allow(unused_variables)]
fn busy_work(threads: i32) {
// Make a vector to hold the children which are spawned.
let mut children = vec![];
for i in 0..threads {
// Spin up another thread
children.push(thread::spawn(|| socket_client()));
}
for child in children {
// Wait for the thread to finish. Returns a result.
let _ = child.join();
}
}
fn socket_client() {
let mut stream = UnixStream::connect("/tmp/my_socket.sock").unwrap();
stream.write_all(b"hello world").unwrap();
}
Function to handle data
fn handle_client(mut stream: UnixStream) {
let mut response = String::new();
stream.read_to_string(&mut response).unwrap();
println!("got response: {:?}", response);
}
Server socket that listens to incoming messages
#[allow(unused_variables)]
fn socket_server(listener: UnixListener) {
// accept connections and process them, spawning a new thread for each one
for stream in listener.incoming() {
match stream {
Ok(mut stream) => {
/* connection succeeded */
let mut response = String::new();
stream.read_to_string(&mut response).unwrap();
if response.is_empty() {
break;
} else {
thread::spawn(|| handle_client(stream));
}
}
Err(err) => {
/* connection failed */
break;
}
}
}
println!("Breaking out of socket_server()");
drop(listener);
}
Please learn to create a minimal reproducible example and then take the time to do so. In this case, there's no need for threads or functions or testing frameworks; running this entire program twice reproduces the error:
use std::os::unix::net::UnixListener;
fn main() {
UnixListener::bind("/tmp/my_socket.sock").unwrap();
}
If you look at the filesystem before and after the test, you will see that the file /tmp/my_socket.sock is not present before the first run and it is present before the second run. Deleting the file allows the program to run to completion again (at which point it recreates the file).
This issue is not unique to Rust:
Note that, once created, this socket file will continue to exist, even after the server exits. If the server subsequently restarts, the file prevents re-binding:
[...]
So, servers should unlink the socket pathname prior to binding it.
You could choose to add some wrapper around the socket that would automatically delete it when it is dropped or create a temporary directory that is cleaned when it is dropped, but I'm not sure how well that would work. You could also create a wrapper function that deletes the file before it opens the socket.
Unlinking the socket when it's dropped
use std::path::{Path, PathBuf};
struct DeleteOnDrop {
path: PathBuf,
listener: UnixListener,
}
impl DeleteOnDrop {
fn bind(path: impl AsRef<Path>) -> std::io::Result<Self> {
let path = path.as_ref().to_owned();
UnixListener::bind(&path).map(|listener| DeleteOnDrop { path, listener })
}
}
impl Drop for DeleteOnDrop {
fn drop(&mut self) {
// There's no way to return a useful error here
let _ = std::fs::remove_file(&self.path);
}
}
You may also want to consider implementing Deref / DerefMut to make this into a smart pointer for sockets:
impl std::ops::Deref for DeleteOnDrop {
type Target = UnixListener;
fn deref(&self) -> &Self::Target {
&self.listener
}
}
impl std::ops::DerefMut for DeleteOnDrop {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.listener
}
}
Unlinking the socket before it's opened
This is much simpler:
use std::path::Path;
fn bind(path: impl AsRef<Path>) -> std::io::Result<UnixListener> {
let path = path.as_ref();
// Ignore the error if the file is not there yet (e.g. the very first run);
// if removal genuinely failed, the bind below will report the problem.
let _ = std::fs::remove_file(path);
UnixListener::bind(path)
}
Note that you can combine the two solutions, such that the socket is deleted before creation and when it's dropped.
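A minimal sketch of that combination, reusing the DeleteOnDrop type from above (the constructor name is mine, not from the answer):
impl DeleteOnDrop {
    fn bind_removing_stale(path: impl AsRef<Path>) -> std::io::Result<Self> {
        let path = path.as_ref().to_owned();
        // Remove a stale socket file left over from a previous run, if any;
        // if there was nothing to remove, the bind below proceeds normally.
        let _ = std::fs::remove_file(&path);
        UnixListener::bind(&path).map(|listener| DeleteOnDrop { path, listener })
    }
}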
I think that deleting during creation is a less-optimal solution: if you ever start a second server, you'll prevent the first server from receiving any more connections. It's probably better to error and tell the user instead.

How do I wrap up a series of related NSTask operations into discrete functions

I'm writing a Swift command line tool that uses NSTask to interact with git. In the simplest scenario I want to run three commands: init, add ., and commit -m Initial Commit. I intend to use a separate NSTask for each command, and want to house each command in its own function - returning true if the task succeeded or false if it didn't. This set-up would allow my main function to look like this:
func main() {
if runInit() {
if runStage() {
if runCommit() {
NSLog("success!")
}
}
}
}
To accomplish this, each of the three functions must do the following before returning: (i) launch the task, (ii) wait for it to complete, (iii) obtain whatever is in stdout, and (iv) set the return value (true or false). Here's what I've got for the commit stage:
func runCommit() -> Bool {
var retval = false
var commitTask = NSTask()
commitTask.standardOutput = NSPipe()
commitTask.launchPath = gitPath
commitTask.arguments = ["commit", "-m", "Initial Commit"]
commitTask.currentDirectoryPath = demoProjectURL.path!
commitTask.standardOutput.fileHandleForReading.readToEndOfFileInBackgroundAndNotify()
nc.addObserverForName(NSFileHandleReadToEndOfFileCompletionNotification,
object: commitTask.standardOutput.fileHandleForReading,
queue: nil) { (note) -> Void in
// get the output, log it, then...
if commitTask.terminationStatus == EXIT_SUCCESS {
retval = true
}
}
commitTask.launch()
commitTask.waitUntilExit()
return retval
}
My question is essentially about how waitUntilExit works, particularly in conjunction with the notification I sign up for to enable me to get the output. Apple's docs say:
This method first checks to see if the receiver is still running using isRunning. Then it polls the current run loop using NSDefaultRunLoopMode until the task completes.
I'm a bit out of my depth when it comes to run loop mechanics, and was wondering what this means in this context - can I safely assume that my notification block will always be executed before the enclosing function returns?
waitUntilExit returns when the SIGCHLD signal has been received to indicate that the child process has terminated. The notification block is executed when EOF is read from the pipe to the child process. It is not specified which of these events occurs first.
Therefore you have to wait for both. There are several possible solutions; here is one using a "signalling semaphore". You could also use a "dispatch group" (a sketch of that variant follows the code below).
Another error in your code is that the observer is never removed.
func runCommit() -> Bool {
let commitTask = NSTask()
commitTask.standardOutput = NSPipe()
commitTask.launchPath = gitPath
commitTask.arguments = ["commit", "-m", "Initial Commit"]
commitTask.currentDirectoryPath = demoProjectURL.path!
commitTask.standardOutput!.fileHandleForReading.readToEndOfFileInBackgroundAndNotify()
let sema = dispatch_semaphore_create(0)
var obs : NSObjectProtocol!
obs = nc.addObserverForName(NSFileHandleReadToEndOfFileCompletionNotification,
object: commitTask.standardOutput!.fileHandleForReading, queue: nil) {
(note) -> Void in
// Get data and log it.
if let data = note.userInfo?[NSFileHandleNotificationDataItem] as? NSData,
let string = String(data: data, encoding: NSUTF8StringEncoding) {
print(string)
}
// Signal semaphore.
dispatch_semaphore_signal(sema)
nc.removeObserver(obs)
}
commitTask.launch()
// Wait for process to terminate.
commitTask.waitUntilExit()
// Wait for semaphore to be signalled.
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER)
let retval = commitTask.terminationStatus == EXIT_SUCCESS
return retval
}
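For completeness, here is a sketch of the "dispatch group" variant mentioned above, using the same era of GCD API as the code in this answer; only the synchronization lines differ:
let group = dispatch_group_create()
dispatch_group_enter(group)
var obs: NSObjectProtocol!
obs = nc.addObserverForName(NSFileHandleReadToEndOfFileCompletionNotification,
    object: commitTask.standardOutput!.fileHandleForReading, queue: nil) { _ in
    // Get the data and log it, exactly as in the semaphore version, then:
    nc.removeObserver(obs)
    dispatch_group_leave(group)   // mark the "output has been read" work as done
}
commitTask.launch()
commitTask.waitUntilExit()                          // wait for the process to exit
dispatch_group_wait(group, DISPATCH_TIME_FOREVER)   // wait for the output to be read
let retval = commitTask.terminationStatus == EXIT_SUCCESS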

golang: goroutine with select doesn't stop unless I add a fmt.Print()

I tried the Go Tour exercise #71
If it is run like go run 71_hang.go ok, it works fine.
However, if you use go run 71_hang.go nogood, it will run forever.
The only difference is the extra fmt.Print("") in the default case of the select statement.
I'm not sure, but I suspect some sort of infinite loop or race condition? And here is my solution.
Note: It's not deadlock as Go didn't throw: all goroutines are asleep - deadlock!
package main
import (
"fmt"
"os"
)
type Fetcher interface {
// Fetch returns the body of URL and
// a slice of URLs found on that page.
Fetch(url string) (body string, urls []string, err error)
}
func crawl(todo Todo, fetcher Fetcher,
todoList chan Todo, done chan bool) {
body, urls, err := fetcher.Fetch(todo.url)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("found: %s %q\n", todo.url, body)
for _, u := range urls {
todoList <- Todo{u, todo.depth - 1}
}
}
done <- true
return
}
type Todo struct {
url string
depth int
}
// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
visited := make(map[string]bool)
doneCrawling := make(chan bool, 100)
toDoList := make(chan Todo, 100)
toDoList <- Todo{url, depth}
crawling := 0
for {
select {
case todo := <-toDoList:
if todo.depth > 0 && !visited[todo.url] {
crawling++
visited[todo.url] = true
go crawl(todo, fetcher, toDoList, doneCrawling)
}
case <-doneCrawling:
crawling--
default:
if os.Args[1]=="ok" { // *
fmt.Print("")
}
if crawling == 0 {
goto END
}
}
}
END:
return
}
func main() {
Crawl("http://golang.org/", 4, fetcher)
}
// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult
type fakeResult struct {
body string
urls []string
}
func (f *fakeFetcher) Fetch(url string) (string, []string, error) {
if res, ok := (*f)[url]; ok {
return res.body, res.urls, nil
}
return "", nil, fmt.Errorf("not found: %s", url)
}
// fetcher is a populated fakeFetcher.
var fetcher = &fakeFetcher{
"http://golang.org/": &fakeResult{
"The Go Programming Language",
[]string{
"http://golang.org/pkg/",
"http://golang.org/cmd/",
},
},
"http://golang.org/pkg/": &fakeResult{
"Packages",
[]string{
"http://golang.org/",
"http://golang.org/cmd/",
"http://golang.org/pkg/fmt/",
"http://golang.org/pkg/os/",
},
},
"http://golang.org/pkg/fmt/": &fakeResult{
"Package fmt",
[]string{
"http://golang.org/",
"http://golang.org/pkg/",
},
},
"http://golang.org/pkg/os/": &fakeResult{
"Package os",
[]string{
"http://golang.org/",
"http://golang.org/pkg/",
},
},
}
Putting a default case in your select changes the way select works. Without a default case, select blocks waiting for a message on any of the channels. With a default case, select runs the default branch every time there is nothing to read from the channels. In your code I think this creates an infinite loop. Adding the fmt.Print statement allows the scheduler to schedule other goroutines.
If you change your code like this then it works properly, using select in a blocking way, which allows the other goroutines to run properly.
for {
select {
case todo := <-toDoList:
if todo.depth > 0 && !visited[todo.url] {
crawling++
visited[todo.url] = true
go crawl(todo, fetcher, toDoList, doneCrawling)
}
case <-doneCrawling:
crawling--
}
if crawling == 0 {
break
}
}
You can make your original code work if you use GOMAXPROCS=2, which is another hint that the scheduler is busy in an infinite loop.
Note that goroutines are co-operatively scheduled. What I don't fully understand about your problem is that select should be a point where the goroutine yields - I hope someone else can explain why it doesn't in your example.
You have 100% CPU load because almost all of the time the default case is executed, effectively resulting in a busy loop. In this situation the Go scheduler does not hand control to another goroutine, by design. So the crawl goroutines never get the opportunity to run and send on doneCrawling, crawling never drops back to 0, and you have your infinite loop.
In my opinion you should remove the default case and instead create another channel if you want to play with the select statement.
Otherwise the runtime package helps you go the dirty way:
runtime.GOMAXPROCS(2) will work (or export GOMAXPROCS=2); this way you will have more than one OS thread of execution
call runtime.Gosched() inside Crawl from time to time (see the sketch below). Even though the CPU load is 100%, this will explicitly pass control to another goroutine.
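A minimal, self-contained sketch of the Gosched idea (my illustration, not from the original answer); note that recent Go releases preempt busy loops anyway, so this mainly demonstrates the advice above:
package main

import (
    "fmt"
    "runtime"
)

func main() {
    runtime.GOMAXPROCS(1) // a single OS thread, like the situation in the question
    done := make(chan bool, 1)
    go func() { done <- true }()
    for {
        select {
        case <-done:
            fmt.Println("the other goroutine got to run")
            return
        default:
            // Explicitly yield so the goroutine above gets scheduled even
            // though this loop never blocks.
            runtime.Gosched()
        }
    }
}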
Edit: Yes, and the reason why fmt.Print("") makes a difference: because it explicitly passes control to some syscall stuff... ;)