Graceful Shutdown of Background Tasks

Chapter 13.5.

Sending our welcome email in the background is working well, but there’s still an issue we need to address.

When we initiate a graceful shutdown of our application, it won’t wait for any background goroutines that we’ve launched to complete. So if we happen to shut down our server at an unlucky moment, it’s possible that a new client will be created on our system but they will never be sent their welcome email.

Fortunately, we can prevent this by using Go’s sync.WaitGroup functionality to coordinate the graceful shutdown and our background goroutines.

An introduction to sync.WaitGroup

When you want to wait for a collection of goroutines to finish their work, the principal tool to help with this is the sync.WaitGroup type.

The way that it works is conceptually a bit like a ‘counter’. Each time you launch a background goroutine you can increment the counter by 1, and when each goroutine finishes, you then decrement the counter by 1. You can then monitor the counter, and when it equals zero you know that all your background goroutines have finished.

Let’s take a quick look at a standalone example of how sync.WaitGroup works in practice.

In the code below, we’ll launch five goroutines that print out "hello from a goroutine", and use sync.WaitGroup to wait for them all to complete before the program exits.

package main

import (
    "fmt"
    "sync"
)

func main() {
    // Declare a new WaitGroup.
    var wg sync.WaitGroup

    // Execute a loop 5 times.
    for i := 1; i <= 5; i++ {
        // Increment the WaitGroup counter by 1, BEFORE we launch the background goroutine.
        wg.Add(1)

        // Launch the background goroutine.
        go func() {
            // Defer a call to wg.Done() to indicate that the background goroutine has 
            // completed when this function returns. Behind the scenes this decrements 
            // the WaitGroup counter by 1 and is the same as writing wg.Add(-1).
            defer wg.Done()

            fmt.Println("hello from a goroutine")
        }()
    }

    // Wait() blocks until the WaitGroup counter is zero --- essentially blocking until all
    // goroutines have completed.
    wg.Wait()

    fmt.Println("all goroutines finished")
}

If you run the above code, you’ll see that the output looks like this:

hello from a goroutine
hello from a goroutine
hello from a goroutine
hello from a goroutine
hello from a goroutine
all goroutines finished

One thing that’s important to emphasize here is that we increment the counter with wg.Add(1) immediately before we launch the background goroutine. If we called wg.Add(1) inside the background goroutine instead, there would be a race condition: wg.Wait() could be called before the counter has been incremented, in which case it would return straight away without waiting for that goroutine.
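To make the ordering concrete, here is a minimal standalone sketch. The countCompleted() function and its completed counter are hypothetical names used only for illustration; they are not part of our application. Because wg.Add(1) is called before each go statement, wg.Wait() cannot return until every goroutine has called wg.Done().

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countCompleted is a hypothetical helper. It launches n goroutines and returns
// how many of them had finished by the time wg.Wait() returned.
func countCompleted(n int) int {
	var wg sync.WaitGroup
	var completed int64

	for i := 0; i < n; i++ {
		// Correct: increment the counter here, in the launching goroutine, so
		// the increment is guaranteed to happen before wg.Wait() is reached.
		wg.Add(1)

		go func() {
			defer wg.Done()

			// Incorrect alternative: if wg.Add(1) were called here instead,
			// wg.Wait() below might run while the counter is still zero and
			// return before this goroutine has done its work.
			atomic.AddInt64(&completed, 1)
		}()
	}

	wg.Wait()

	return int(atomic.LoadInt64(&completed))
}

func main() {
	fmt.Println(countCompleted(5)) // prints 5
}
```

With the increment in the right place, the returned count always equals the number of goroutines launched.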

Fixing our application

Let’s update our application to incorporate a sync.WaitGroup that coordinates our graceful shutdown and background goroutines.

We’ll begin in our cmd/api/main.go file, and edit the application struct to contain a new sync.WaitGroup. Like so:

File: cmd/api/main.go

package main

import (
    "context"
    "database/sql"
    "flag"
    "log/slog"
    "os"
    "sync" // New import
    "time"

    "greenlight.alexedwards.net/internal/data"
    "greenlight.alexedwards.net/internal/mailer"

    _ "github.com/lib/pq"
)

...

// Include a sync.WaitGroup in the application struct. The zero-value for a
// sync.WaitGroup type is a valid, useable, sync.WaitGroup with a 'counter' value of 0,
// so we don't need to do anything else to initialize it before we can use it.
type application struct {
    config config
    logger *slog.Logger
    models data.Models
    mailer mailer.Mailer
    wg     sync.WaitGroup
}

...

Next let’s head to the cmd/api/helpers.go file and update the app.background() helper so that the sync.WaitGroup counter is incremented each time before we launch a background goroutine, and then decremented when it completes.

Like this:

File: cmd/api/helpers.go

package main

...

func (app *application) background(fn func()) {
    // Increment the WaitGroup counter.
    app.wg.Add(1)

    // Launch the background goroutine.
    go func() {
        // Use defer to decrement the WaitGroup counter before the goroutine returns.
        defer app.wg.Done()

        defer func() {
            if err := recover(); err != nil {
                app.logger.Error(fmt.Sprintf("%v", err))
            }
        }()

        fn()
    }()
}
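The order of the two deferred calls matters. Deferred functions run in last-in-first-out order, so the panic recovery runs first and app.wg.Done() runs last. This means the counter is decremented even when fn() panics. The standalone sketch below illustrates this under some assumptions: the runner type is a hypothetical stand-in for our application struct, and it records recovered panics in a slice instead of logging them.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for the application struct.
type runner struct {
	wg        sync.WaitGroup
	mu        sync.Mutex
	recovered []string
}

// background mirrors the shape of the app.background() helper, but records any
// recovered panic value instead of writing it to a logger.
func (r *runner) background(fn func()) {
	r.wg.Add(1)

	go func() {
		// Registered first, so it runs last: the counter is decremented only
		// after the recovery function below has finished.
		defer r.wg.Done()

		defer func() {
			if err := recover(); err != nil {
				r.mu.Lock()
				r.recovered = append(r.recovered, fmt.Sprintf("%v", err))
				r.mu.Unlock()
			}
		}()

		fn()
	}()
}

// runDemo launches one panicking task and one normal task, waits for both, and
// returns the number of panics that were recovered.
func runDemo() int {
	var r runner

	r.background(func() { panic("boom") })
	r.background(func() {})

	// This returns even though one task panicked, because Done() still ran.
	r.wg.Wait()

	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.recovered)
}

func main() {
	fmt.Println("recovered panics:", runDemo()) // prints "recovered panics: 1"
}
```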

Then the final thing we need to do is update our graceful shutdown functionality so that it uses our new sync.WaitGroup to wait for any background goroutines before terminating the application. We can do that by adapting our app.serve() method like so:

File: cmd/api/server.go

package main

...

func (app *application) serve() error {
    srv := &http.Server{
        Addr:         fmt.Sprintf(":%d", app.config.port),
        Handler:      app.routes(),
        IdleTimeout:  time.Minute,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
        ErrorLog:     slog.NewLogLogger(app.logger.Handler(), slog.LevelError),
    }

    shutdownError := make(chan error)

    go func() {
        quit := make(chan os.Signal, 1)
        signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
        s := <-quit

        app.logger.Info("shutting down server", "signal", s.String())

        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()

        // Call Shutdown() on the server like before, but now we only send on the
        // shutdownError channel if it returns an error.
        err := srv.Shutdown(ctx)
        if err != nil {
            shutdownError <- err
        }

        // Log a message to say that we're waiting for any background goroutines to
        // complete their tasks.
        app.logger.Info("completing background tasks", "addr", srv.Addr)

        // Call Wait() to block until our WaitGroup counter is zero --- essentially
        // blocking until the background goroutines have finished. Then we return nil on
        // the shutdownError channel, to indicate that the shutdown completed without
        // any issues.
        app.wg.Wait()
        shutdownError <- nil
    }()

    app.logger.Info("starting server", "addr", srv.Addr, "env", app.config.env)

    err := srv.ListenAndServe()
    if !errors.Is(err, http.ErrServerClosed) {
        return err
    }

    err = <-shutdownError
    if err != nil {
        return err
    }

    app.logger.Info("stopped server", "addr", srv.Addr)

    return nil
}
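The essential ordering in the code above is: stop accepting new requests with Shutdown(), then block on Wait() until the background tasks are done. As a rough standalone sketch of just that sequence, the hypothetical shutdownAndWait() helper below takes the server, the WaitGroup and the timeout as parameters. Note that it is a variation rather than a copy of our serve() code: it returns as soon as Shutdown() fails instead of going on to wait, and the server in the demo is never actually started, so Shutdown() has no connections to close.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// shutdownAndWait is a hypothetical helper. It shuts the server down, giving
// in-flight requests up to the timeout to complete, and then blocks until the
// WaitGroup counter reaches zero.
func shutdownAndWait(srv *http.Server, wg *sync.WaitGroup, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	if err := srv.Shutdown(ctx); err != nil {
		return err
	}

	// Wait() takes no timeout, so this blocks for as long as the background
	// tasks need.
	wg.Wait()

	return nil
}

// demo starts a slow background task and reports whether it had finished by
// the time shutdownAndWait returned.
func demo() (bool, error) {
	srv := &http.Server{} // Never started; used only to illustrate the call order.

	var wg sync.WaitGroup
	finished := false

	wg.Add(1)
	go func() {
		defer wg.Done()

		// Simulate a slow task, such as sending an email.
		time.Sleep(50 * time.Millisecond)
		finished = true
	}()

	err := shutdownAndWait(srv, &wg, time.Second)

	return finished, err
}

func main() {
	finished, err := demo()
	fmt.Println(finished, err)
}
```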

To try this out, go ahead and restart the API and then send a request to the POST /v1/users endpoint immediately followed by a SIGTERM signal. For example:

$ BODY='{"name": "Edith Smith", "email": "edith@example.com", "password": "pa55word"}'
$ curl -d "$BODY" localhost:4000/v1/users & pkill -SIGTERM api &

When you do this, your server logs should look similar to the output below:

$ go run ./cmd/api
time=2023-09-10T10:59:13.722+02:00 level=INFO msg="database connection pool established"
time=2023-09-10T10:59:13.722+02:00 level=INFO msg="starting server" addr=:4000 env=development
time=2023-09-10T10:59:14.722+02:00 level=INFO msg="shutting down server" signal=terminated
time=2023-09-10T10:59:14.722+02:00 level=INFO msg="completing background tasks" addr=:4000
time=2023-09-10T10:59:18.722+02:00 level=INFO msg="stopped server" addr=:4000

Notice how the "completing background tasks" message is written, then there is a pause of a few seconds while the background email sending completes, followed finally by the "stopped server" message?

This nicely illustrates how the graceful shutdown process waited for the welcome email to be sent (which took about four seconds in my case, as the timestamps show) before finally terminating the application.