The progressr package provides a minimal API for reporting progress updates in R. The design is to separate the representation of progress updates from how they are presented. What type of progress to signal is controlled by the developer. How these progress updates are rendered is controlled by the end user. For instance, some users may prefer visual feedback such as a horizontal progress bar in the terminal, whereas others may prefer auditory feedback.

Design motto:

The developer is responsible for providing progress updates but it's only the end user who decides if, when, and how progress should be presented. No exceptions will be allowed.

## Two Minimal APIs - One For Developers and One For End-Users

Developer's API

1. Set up a progressor with a certain number of steps:
p <- progressor(nsteps)
p <- progressor(along = x)


2. Signal progress:
p()               # one-step progress
p(amount = 0)     # "still alive"


End-user's API

1a. Subscribe to progress updates from everywhere:

handlers(global = TRUE)

y <- slow_sum(1:5)
y <- slow_sum(6:10)


1b. Subscribe to a specific expression:

with_progress({
y <- slow_sum(1:5)
y <- slow_sum(6:10)
})


2. Configure how progress is presented:
handlers("progress")
handlers("txtprogressbar", "beepr")
handlers(handler_pbcol(enable_after = 3.0))
handlers(handler_progress(complete = "#"))


## A simple example

Assume that we have a function slow_sum() for adding up the values in a vector. It is so slow that we would like to provide progress updates to whoever might be interested in them. With the progressr package, this can be done as:

slow_sum <- function(x) {
  p <- progressr::progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    p(message = sprintf("Added %g", x[kk]))
  }
  sum
}

Note how there are no arguments in the code that specify how progress is presented. The only task for the developer is to decide where in the code it makes sense to signal that progress has been made. As we will see next, it is up to the end user of this code to decide whether they want to receive progress updates or not, and, if so, in what format.

### Without reporting on progress

When calling this function as in:

> y <- slow_sum(1:10)
> y
[1] 55
>

it will behave like any other function and there will be no progress updates displayed.

### Reporting on progress

If we are only interested in progress for a particular call, we can do:

> library(progressr)
> with_progress(y <- slow_sum(1:10))
|====================                               |  40%

However, if we want to report on progress from every call, wrapping the calls in with_progress() might become too cumbersome. If so, we can enable the global progress handler:

> library(progressr)
> handlers(global = TRUE)

so that progress updates are reported on wherever signaled, e.g.

> y <- slow_sum(1:10)
|====================                               |  40%
> y <- slow_sum(10:1)
|========================================           |  80%

This requires R 4.0.0 or newer. To disable this again, do:

> handlers(global = FALSE)

In the below examples, we will assume handlers(global = TRUE) is already set.

## Customizing how progress is reported

The default is to present progress via utils::txtProgressBar(), which is available on all R installations. To change the default to, say, progress_bar() of the progress package, set:

handlers("progress")

This progress handler will present itself as:

> y <- slow_sum(1:10)
/ [================>--------------------------]  40% Added 4

To set the default progress handler, or handlers, in all your R sessions, call progressr::handlers(...) in your ~/.Rprofile file.
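As a minimal sketch of such a startup entry, the snippet below enables global progress reporting only in interactive sessions. It guards each step so that R still starts cleanly when a package is missing. The choice of the "progress" handler is only an example, and the guards are an assumption about what a cautious setup might look like rather than a required pattern.

```r
# Sketch of a ~/.Rprofile entry; adjust the handler choice to taste.
if (interactive() && requireNamespace("progressr", quietly = TRUE)) {
  # Global handlers require R 4.0.0 or newer
  if (getRversion() >= "4.0.0") progressr::handlers(global = TRUE)

  # Use the 'progress' bar only when that package is installed;
  # otherwise the default txtProgressBar-based handler stays in place
  if (requireNamespace("progress", quietly = TRUE)) {
    progressr::handlers("progress")
  }
}
```

Because everything sits inside the interactive() check, batch jobs started with Rscript are left untouched by this entry.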

Progress updates do not have to be presented visually. They can equally well be communicated via audio. For example, using:

handlers("beepr")

will present itself as sounds played at the beginning, while progressing, and at the end (using different beepr sounds). There will be no output written to the terminal:

> y <- slow_sum(1:10)
> y
[1] 55
>

### Concurrent auditory and visual progress updates

It is possible to have multiple progress handlers presenting progress updates at the same time. For example, to get both visual and auditory updates, use:

handlers("txtprogressbar", "beepr")

### Silence all progress

To silence all progress updates, use:

handlers("void")

### Further configuration of progress handlers

Above we have seen examples where handlers() takes one or more strings as input, e.g. handlers(c("progress", "beepr")). This is short for a more flexible specification where we can pass a list of handler functions, e.g.

handlers(list(
handler_progress(),
handler_beepr()
))

With this construct, we can make adjustments to the default behavior of these progress handlers. For example, we can configure the format, width, and complete arguments of progress::progress_bar$new(), and tell beepr to use a different finish sound and generate sounds at most every two seconds by setting:

handlers(list(
  handler_progress(
    format   = ":spin :current/:total (:message) [:bar] :percent in :elapsed ETA: :eta",
    width    = 60,
    complete = "+"
  ),
  handler_beepr(
    finish   = "wilhelm",
    interval = 2.0
  )
))

## Sticky messages

As seen above, some progress handlers present the progress message as part of their output, e.g. the "progress" handler will display the message as part of the progress bar. It is also possible to "push" the message up together with other terminal output. This can be done by adding class attribute "sticky" to the progression signaled. This works for several progress handlers that output to the terminal. For example, with:

slow_sum <- function(x) {
  p <- progressr::progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    p(sprintf("Step %d", kk), class = if (kk %% 5 == 0) "sticky", amount = 0)
    p(message = sprintf("Added %g", x[kk]))
  }
  sum
}

we get

> handlers("txtprogressbar")
> y <- slow_sum(1:30)
Step 5
Step 10
|====================                               |  43%

and

> handlers("progress")
> y <- slow_sum(1:30)
Step 5
Step 10
/ [===============>--------------------------]  43% Added 13

## Use regular output as usual alongside progress updates

In contrast to other progress-bar frameworks, output from message(), cat(), print() and so on, will not interfere with progress reported via progressr.
For example, say we have:

slow_sqrt <- function(xs) {
  p <- progressor(along = xs)
  lapply(xs, function(x) {
    message("Calculating the square root of ", x)
    Sys.sleep(2)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

we will get:

> library(progressr)
> handlers(global = TRUE)
> handlers("progress")
> y <- slow_sqrt(1:8)
Calculating the square root of 1
Calculating the square root of 2
- [===========>-----------------------------------]  25% x=2

This works because progressr will briefly buffer any output internally and only release it when the next progress update is received, just before the progress is re-rendered in the terminal. This is why you see a two-second delay when running the above example. Note that, if we use progress handlers that do not output to the terminal, such as handlers("beepr"), then output does not have to be buffered and will appear immediately.

Comment: When signaling a warning using warning(msg, immediate. = TRUE) the message is immediately output to the standard-error stream. However, this is not possible to emulate when warnings are intercepted using calling handlers, which are used by with_progress(). This is a limitation of R that cannot be worked around. Because of this, the above call will behave the same as warning(msg) - that is, all warnings will be buffered by R internally and released only when all computations are done.

## Support for progressr elsewhere

Note that progression updates by progressr are designed to work out of the box for any iterator framework in R. Below is a set of examples for the most common ones.
### Base R Apply Functions

library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- lapply(xs, function(x) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# |====================                               |  40%

### The foreach package

library(foreach)
library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- foreach(x = xs) %do% {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}

my_fcn(1:5)
# |====================                               |  40%

### The purrr package

library(purrr)
library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- map(xs, function(x) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# |====================                               |  40%

### The plyr package

library(plyr)
library(progressr)
handlers(global = TRUE)

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- llply(xs, function(x, ...) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# |====================                               |  40%

Note: This solution does not involve the .progress = TRUE argument that plyr implements. Because progressr is more flexible, and because .progress is automatically disabled when running in parallel (see below), I recommend using the above progressr approach instead. Having said this, as proof-of-concept, the progressr package implements support for .progress = "progressr" if you still prefer the plyr way of doing it.

## Parallel processing and progress updates

The future framework, which provides a unified API for parallel and distributed processing in R, has built-in support for the kind of progression updates produced by the progressr package. This means that you can use it with, for instance, future.apply, furrr, and foreach with doFuture, as well as plyr or BiocParallel with doFuture.
### future_lapply() - parallel lapply()

Here is an example that uses future_lapply() of the future.apply package to parallelize on the local machine while at the same time signaling progression updates:

library(future.apply)
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- future_lapply(xs, function(x, ...) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

### foreach() with doFuture

Here is an example that uses foreach() of the foreach package to parallelize on the local machine (via doFuture) while at the same time signaling progression updates:

library(doFuture)
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- foreach(x = xs) %dopar% {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

### future_map() - parallel purrr::map()

Here is an example that uses future_map() of the furrr package to parallelize on the local machine while at the same time signaling progression updates:

library(furrr)
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- future_map(xs, function(x) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

Note: This solution does not involve the .progress = TRUE argument that furrr implements. Because progressr is more generic, and because .progress = TRUE only works for certain future backends and produces errors on others, I recommend that you stop using .progress = TRUE and use the progressr package instead.
### BiocParallel::bplapply() - parallel lapply()

Here is an example that uses bplapply() of the BiocParallel package to parallelize on the local machine while at the same time signaling progression updates:

library(BiocParallel)
library(doFuture)
register(DoparParam())  ## BiocParallel parallelizes via %dopar%
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- bplapply(xs, function(x) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

### plyr::llply(..., .parallel = TRUE) with doFuture

Here is an example that uses llply() of the plyr package to parallelize on the local machine while at the same time signaling progression updates:

library(plyr)
library(doFuture)
registerDoFuture()      ## %dopar% parallelizes via future
plan(multisession)

library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- llply(xs, function(x, ...) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }, .parallel = TRUE)
}

my_fcn(1:5)
# / [================>-----------------------------]  40% x=2

Note: Although progressr implements support for using .progress = "progressr" with plyr, unfortunately, this will not work when using .parallel = TRUE. This is because plyr resets .progress to the default "none" internally regardless of how we set .progress. See https://github.com/HenrikBengtsson/progressr/issues/70 for details and a hack that works around this limitation.

### Near-live versus buffered progress updates with futures

As of November 2020, there are four types of future backends that are known(*) to provide near-live progress updates:

1. sequential,
2. multicore,
3. multisession, and
4. cluster (local and remote)

Here "near-live" means that the progress handlers will report on progress almost immediately when the progress is signaled on the worker. For all other future backends, the progress updates are only relayed back to the main machine and reported together with the results of the futures. For instance, if future_lapply(X, FUN) chunks up the processing of, say, 100 elements in X into eight futures, we will see progress from each of the 100 elements as they are done when using a future backend supporting "near-live" updates, whereas we will only see those updates flushed eight times when using any other type of future backend.

(*) Other future backends may gain support for "near-live" progress updating later. Adding support for those is independent of the progressr package. Feature requests for adding that support should go to those future-backend packages.

## Note of caution - sending progress updates too frequently

Signaling progress updates comes with some overhead. In situations where we use progress updates, this overhead is typically much smaller than the task we are processing in each step. However, if the task we iterate over is quick, then the extra time induced by the progress updates might end up dominating the overall processing time. If that is the case, a simple solution is to only signal progress updates every n-th step. Here is a version of slow_sum() that signals progress every 10th iteration:

slow_sum <- function(x) {
  p <- progressr::progressor(length(x) / 10)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    if (kk %% 10 == 0) p(message = sprintf("Added %g", x[kk]))
  }
  sum
}

The overhead of progress signaling may depend on context. For example, in parallel processing with near-live progress updates via 'multisession' futures, each progress update is communicated via a socket connection back to the main R session. These connections might become clogged up if progress updates are too frequent.
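The every-10th-step version above fits most naturally when length(x) is a multiple of ten. As one possible variation (a sketch, not something the package prescribes), the progressor can keep one step per element while progress is signaled in batches through the amount argument, which is the same argument used for the "still alive" signal p(amount = 0). The function name slow_sum_batched and its every parameter are hypothetical names introduced here for illustration.

```r
library(progressr)

# Hypothetical helper: sums x while signaling progress in batches
slow_sum_batched <- function(x, every = 10L) {
  p <- progressor(along = x)   # one step per element
  sum <- 0
  pending <- 0L                # steps completed but not yet reported
  for (kk in seq_along(x)) {
    sum <- sum + x[kk]
    pending <- pending + 1L
    # Report after 'every' steps, and once more for any leftover at the end
    if (pending == every || kk == length(x)) {
      p(message = sprintf("Added %g", x[kk]), amount = pending)
      pending <- 0L
    }
  }
  sum
}

y <- slow_sum_batched(1:25)
```

With 25 elements this signals three updates (10, 10, and 5 steps), so the reported total still reaches 100% without one condition per element.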
## Progress updates in non-interactive mode ("batch mode")

When running R scripts from the command line (e.g. with Rscript), R runs in a non-interactive mode (interactive() returns FALSE). The default behavior of progressr is to not report on progress in non-interactive mode. To report on progress in that mode as well, set R option progressr.enable or environment variable R_PROGRESSR_ENABLE to TRUE. For example,

$ Rscript -e "library(progressr)" -e "with_progress(y <- slow_sum(1:10))"

will not report on progress, whereas

$ export R_PROGRESSR_ENABLE=TRUE
$ Rscript -e "library(progressr)" -e "with_progress(y <- slow_sum(1:10))"

will.
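The same effect can be achieved from within a script through the R option mentioned above. The following is a minimal sketch, assuming a script that defines its own slow_sum() and is run with Rscript. The script name in the comment is hypothetical.

```r
# batch_job.R (hypothetical file name); run with: Rscript batch_job.R
options(progressr.enable = TRUE)  # report progress even when non-interactive

library(progressr)

slow_sum <- function(x) {
  p <- progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    p(message = sprintf("Added %g", x[kk]))
  }
  sum
}

with_progress(y <- slow_sum(1:10))
print(y)
```

Setting the option inside the script keeps the choice with whoever owns the script, whereas the environment variable lets the person launching the job decide.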

## Roadmap

Because this project is under active development, the progressr API is currently kept to a bare minimum. This will allow for the framework and the API to evolve while minimizing the risk of breaking code that depends on it. The roadmap for developing the API is roughly:

• [x] Provide minimal API for producing progress updates, i.e. progressor(), with_progress(), handlers()

• [x] Add support for global progress handlers removing the need for the user having to specify with_progress(), i.e. handlers(global = TRUE) and handlers(global = FALSE)

• [ ] Make it possible to create a progressor also in the global environment (see 'Known issues' below)

• [ ] Add API to allow users and package developers to design additional progression handlers

For a more up-to-date view on what features might be added, see https://github.com/HenrikBengtsson/progressr/issues.

## Appendix

### Known issues

It is not possible to create a progressor in the global environment, e.g. at the top level of a script. It has to be created inside a function, within with_progress({ ... }), local({ ... }), or a similar construct. For example, the following:

library(progressr)
handlers(global = TRUE)

xs <- 1:5
p <- progressor(along = xs)
y <- lapply(xs, function(x) {
Sys.sleep(0.1)
p(sprintf("x=%g", x))
sqrt(x)
})

results in an error if tried:

Error in progressor(along = xs) :
A progressor must not be created in the global environment unless wrapped in a
with_progress() or without_progress() call. Alternatively, create it inside a
function or in a local() environment to make sure there is a finite life span
of the progressor

The solution is to wrap it in a local({ ... }) call, or more explicitly, in a with_progress({ ... }) call:

library(progressr)
handlers(global = TRUE)

xs <- 1:5
with_progress({
p <- progressor(along = xs)
y <- lapply(xs, function(x) {
Sys.sleep(0.1)
p(sprintf("x=%g", x))
sqrt(x)
})
})
#  |====================                               |  40%

The main reason for this is to limit the life span of each progressor. If a progressor were created in the global environment, there would be a significant risk that it never finishes and therefore blocks all subsequent progressors.
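For completeness, here is a sketch of the local({ ... }) alternative mentioned above. It assumes that handlers(global = TRUE) has already been set, as in the earlier examples. Without it the code still runs, just without any progress being displayed.

```r
library(progressr)
# Assumes handlers(global = TRUE) was called earlier in the session

xs <- 1:5
y <- local({
  # The progressor lives only as long as this local() environment
  p <- progressor(along = xs)
  lapply(xs, function(x) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
})
```

The value of the local() block is its last expression, so y ends up holding the list of square roots while p is discarded when the block exits.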

### Under the hood

When using the progressr package, progression updates are communicated via R's condition framework, which provides methods for creating, signaling, capturing, muffling, and relaying conditions. Progression updates are of classes progression and immediateCondition(*). The figure below gives an example of how progression conditions are created, signaled, and rendered.

(*) Conditions of class immediateCondition are relayed as soon as possible by the future framework, which means that progression updates produced in parallel workers are reported to the end user as soon as the main R session has received them.

Figure: Sequence diagram illustrating how signaled progression conditions are captured by with_progress(), or the global progression handler, and relayed to the two progression handlers 'progress' (a progress bar in the terminal) and 'beepr' (auditory) that the end user has chosen.
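The mechanism can be illustrated with base R alone. The sketch below is not progressr's actual implementation. It uses a made-up condition class my_progress and a made-up restart muffle_my_progress to show the general pattern of creating, signaling, capturing, and muffling a condition.

```r
# Constructor for a hypothetical condition class (a stand-in for "progression")
my_progress <- function(step, total) {
  structure(
    class = c("my_progress", "condition"),
    list(
      message = sprintf("step %d of %d", step, total),
      call = NULL,
      step = step,
      total = total
    )
  )
}

# Developer side: signal progress without knowing who, if anyone, listens
work <- function(n) {
  for (i in seq_len(n)) {
    withRestarts(
      signalCondition(my_progress(i, n)),
      muffle_my_progress = function() NULL
    )
  }
  n
}

# End-user side: capture the conditions with a calling handler
seen <- integer(0)
result <- withCallingHandlers(
  work(3),
  my_progress = function(cond) {
    seen <<- c(seen, cond$step)          # "render" the update
    invokeRestart("muffle_my_progress")  # stop it from propagating further
  }
)

# With no handler installed, the signals are silently ignored
quiet <- work(2)
```

Because the handler is a calling handler, work() resumes right after each signal, which is what allows a computation to continue while its progress is being reported.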

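### Debugging

To debug progress updates, use the "debug" handler, which logs each progression event as it is received: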

> handlers("debug")
> with_progress(y <- slow_sum(1:3))
[23:19:52.738] (0.000s => +0.002s) initiate: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:52.739] (0.001s => +0.000s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:52.942] (0.203s => +0.002s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.145] (0.407s => +0.001s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.348] (0.610s => +0.002s) update: 1/3 (+1) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.555] (0.817s => +0.004s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.758] (1.020s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.961] (1.223s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.165] (1.426s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.368] (1.630s => +0.001s) update: 2/3 (+1) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:55.795] (3.057s => +0.000s) shutdown: 3/3 (+0) 'P: Adding 3' {clear=TRUE, enabled=TRUE, status=ok}