The progressr package provides a minimal API for reporting progress updates in R. The design is to separate the representation of progress updates from how they are presented. What type of progress to signal is controlled by the developer. How these progress updates are rendered is controlled by the end user. For instance, some users may prefer visual feedback such as a horizontal progress bar in the terminal, whereas others may prefer auditory feedback.
Design motto:
The developer is responsible for providing progress updates but it’s only the end user who decides if, when, and how progress should be presented. No exceptions will be allowed.
Assume that we have a function slow_sum() for adding up the values in a vector. It is so slow that we would like to provide progress updates to whoever might be interested in them. With the progressr package, this can be done as:
slow_sum <- function(x) {
  p <- progressr::progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}
Note how no arguments in the code specify how progress is presented. The only task for the developer is to decide where in the code it makes sense to signal that progress has been made. As we will see next, it is up to the end user of this code to decide whether they want to receive progress updates and, if so, in what format.
When calling this function as in y <- slow_sum(1:10), it will behave as any other function and no progress updates will be displayed.
If we are only interested in progress for a particular call, we can do:
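As a minimal sketch of such a one-off call, assuming the progressr package is installed, the call can be wrapped in with_progress(). The shorter Sys.sleep() delay is only there to keep the example quick. In a non-interactive session, nothing is displayed unless progress reporting is enabled explicitly.

```r
library(progressr)

# Same slow_sum() as above, with a shorter delay to keep the example quick
slow_sum <- function(x) {
  p <- progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.01)
    sum <- sum + x[kk]
    p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}

# Progress is reported only for calls evaluated inside with_progress()
with_progress(y <- slow_sum(1:10))
print(y)
```

Calls to slow_sum() made outside with_progress() remain silent, so the wrapper can be added or removed without touching the function itself.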
However, if we want to report on progress from every call, wrapping the calls in with_progress() might become too cumbersome. If so, we can enable the global progress handler with handlers(global = TRUE), so that progress updates are reported wherever they are signaled, e.g.
> y <- slow_sum(1:10)
|==================== | 40%
> y <- slow_sum(10:1)
|======================================== | 80%
This requires R 4.0.0 or newer. To disable it again, call handlers(global = FALSE).
In the below examples, we will assume handlers(global = TRUE) is already set.
The default is to present progress via utils::txtProgressBar(), which is available on all R installations. It presents itself as an ASCII-based horizontal progress bar in the R terminal.
We can tweak this “txtprogressbar” handler to use red hearts for the bar, e.g.
handlers(handler_txtprogressbar(char = cli::col_red(cli::symbol$heart)))
Another example is:
handlers(handler_pbcol(
  adjust = 1.0,
  complete = function(s) cli::bg_red(cli::col_black(s)),
  incomplete = function(s) cli::bg_cyan(cli::col_black(s))
))
which renders the completed part of the bar on a red background and the remaining part on a cyan background.
To change the default to, say, cli_progress_bar() by the cli package, set:
handlers("cli")
To instead use progress_bar() by the progress package, set:
handlers("progress")
To set the default progress handler, or handlers, in all your R sessions, call progressr::handlers(...) in your ~/.Rprofile startup file.
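A minimal sketch of such a startup fragment is given below. The interactive() and requireNamespace() guards are an assumption added here so that R still starts cleanly in batch mode or when progressr is not installed, and the choice of the “txtprogressbar” handler is only illustrative.

```r
## ~/.Rprofile (sketch)
## Only configure progress handlers in interactive sessions, and only if
## the progressr package is installed
if (interactive() && requireNamespace("progressr", quietly = TRUE)) {
  progressr::handlers("txtprogressbar")
}
```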
Progress updates do not have to be presented visually. They can equally well be communicated via audio. For example, using:
handlers("beepr")
will present itself as sounds played at the beginning, while progressing, and at the end (using different beepr sounds). No output is written to the terminal.
It is possible to have multiple progress handlers presenting progress updates at the same time. For example, to get both visual and auditory updates, use:
handlers("txtprogressbar", "beepr")
Above we have seen examples where handlers() takes one or more strings as input, e.g. handlers(c("progress", "beepr")). This is short for a more flexible specification where we can pass a list of handler functions, e.g.
handlers(list(
  handler_progress(),
  handler_beepr()
))
With this construct, we can make adjustments to the default behavior of these progress handlers. For example, we can configure the format, width, and complete arguments of progress::progress_bar$new(), and tell beepr to use a different finish sound and generate sounds at most every two seconds by setting:
handlers(list(
  handler_progress(
    format = ":spin :current/:total (:message) [:bar] :percent in :elapsed ETA: :eta",
    width = 60,
    complete = "+"
  ),
  handler_beepr(
    finish = "wilhelm",
    interval = 2.0
  )
))
As seen above, some progress handlers present the progress message as part of their output, e.g. the “progress” handler will display the message as part of the progress bar. It is also possible to “push” the message up together with other terminal output. This can be done by adding class attribute "sticky" to the progression signaled. This works for several progress handlers that output to the terminal. For example, with:
slow_sum <- function(x) {
  p <- progressr::progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    p(sprintf("Step %d", kk), class = if (kk %% 5 == 0) "sticky", amount = 0)
    p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}
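the message of every fifth step is pushed up above the progress bar and remains in the terminal output, while the bar itself keeps updating below it.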
In contrast to other progress-bar frameworks, output from message(), cat(), print(), and so on will not interfere with progress reported via progressr. For example, say we have:
slow_sqrt <- function(xs) {
  p <- progressor(along = xs)
  lapply(xs, function(x) {
    message("Calculating the square root of ", x)
    Sys.sleep(2)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}
we will get:
> library(progressr)
> handlers(global = TRUE)
> handlers("progress")
> y <- slow_sqrt(1:8)
Calculating the square root of 1
Calculating the square root of 2
- [===========>-----------------------------------] 25% x=2
This works because progressr briefly buffers any output internally and only releases it when the next progress update is received, just before the progress is re-rendered in the terminal. This is why you see a two-second delay when running the above example. Note that, if we use progress handlers that do not output to the terminal, such as handlers("beepr"), then output does not have to be buffered and will appear immediately.
Comment: When signaling a warning using warning(msg, immediate. = TRUE), the message is immediately output to the standard-error stream. However, this is not possible to emulate when warnings are intercepted using calling handlers, which are used by with_progress(). This is a limitation of R that cannot be worked around. Because of this, the above call will behave the same as warning(msg); that is, all warnings will be buffered by R internally and released only when all computations are done.
Note that progression updates by progressr are designed to work out of the box with any iterator framework in R. Below is a set of examples for the most common ones.
library(plyr)
library(progressr)
handlers(global = TRUE)
my_fcn <- function(xs) {
  p <- progressor(along = xs)
  llply(xs, function(x, ...) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}
my_fcn(1:5)
# |==================== | 40%
Note how this solution does not make use of plyr’s .progress argument, because the above solution is more powerful and more flexible, e.g. we have more control over progress updates and their messages. However, if you prefer the traditional plyr approach, you can use .progress = "progressr", e.g. y <- llply(..., .progress = "progressr").
When compiling (“knitting”) a knitr-based vignette, for instance via knitr::knit(), knitr shows the progress of code chunks processed thus far using a progress bar. In knitr (>= 1.42) [to be released as of 2022-12-12], we can use progressr for this progress reporting. To do this, set R option knitr.progress.fun as:
options(knitr.progress.fun = function(total, labels) {
  p <- progressr::progressor(total, on_exit = FALSE)
  list(
    update = function(i) p(sprintf("chunk: %s", labels[i])),
    done = function() p(type = "finish")
  )
})
This configures knitr to signal progress via the progressr framework. To report on these, use:
progressr::handlers(global = TRUE)
The cli package is used for progress reporting by many packages, notably the tidyverse packages. For instance, in purrr, you can pass .progress = TRUE to map() to report on progress via the cli package as map() is iterating over the elements. Now, instead of using the default, built-in cli progress bar, we can customize cli to report on progress via progressr instead. To do this, set R option cli.progress_handlers as:
options(cli.progress_handlers = "progressr")
With this option set, cli will now report on progress according to your progressr::handlers() settings. For example, with progressr::handlers(c("beepr", "rstudio")), cli will report on progress using beepr and the RStudio Console progress panel.
To make cli report via progressr in all your R sessions, set the above R option in your ~/.Rprofile startup file.
Note: A cli progress bar can have a “name”, which can be specified in purrr functions via argument .progress, e.g. .progress = "processing". This name is then displayed in front of the progress bar. However, because the progressr framework does not have a concept of progress “name”, it is silently ignored when using options(cli.progress_handlers = "progressr").
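Putting the pieces above together, a minimal sketch could look as follows. It assumes that the cli, purrr, and progressr packages are installed; slow_sqrt_one() is a hypothetical helper introduced here only for illustration. Progress is displayed only if progress reporting is enabled, e.g. via handlers(global = TRUE) as assumed throughout, and cli may skip the display altogether for loops that finish quickly.

```r
library(purrr)

# Route cli progress bars through progressr
options(cli.progress_handlers = "progressr")

# Hypothetical slow function, used only for illustration
slow_sqrt_one <- function(x) {
  Sys.sleep(0.01)
  sqrt(x)
}

# .progress = TRUE asks purrr to report progress via cli, which in turn
# hands it over to progressr because of the option set above
ys <- map(1:5, slow_sqrt_one, .progress = TRUE)
```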
The future framework, which provides a unified API for parallel and distributed processing in R, has built-in support for the kind of progression updates produced by the progressr package. This means that you can use it with, for instance, future.apply, furrr, and foreach with doFuture, and plyr or BiocParallel with doFuture. In contrast, non-future parallelization methods such as parallel’s mclapply() and parLapply(), and foreach adapters like doParallel, do not support progress reports via progressr.
Here is an example that uses future_lapply() of the future.apply package to parallelize on the local machine while at the same time signaling progression updates:
library(future.apply)
plan(multisession)
library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")
my_fcn <- function(xs) {
  p <- progressor(along = xs)
  future_lapply(xs, function(x, ...) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}
my_fcn(1:5)
# / [================>-----------------------------] 40% x=2
Here is an example that uses foreach() of the foreach package together with %dofuture% of the doFuture package to parallelize while reporting on progress. This example parallelizes on the local machine; it also works for remote machines:
library(doFuture) ## %dofuture%
plan(multisession)
library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")
my_fcn <- function(xs) {
  p <- progressor(along = xs)
  foreach(x = xs) %dofuture% {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}
my_fcn(1:5)
# / [================>-----------------------------] 40% x=2
For existing code using the traditional %dopar% operator of the foreach package, we can register the doFuture adaptor and use progressr in the same way as above to report progress updates:
library(doFuture)
registerDoFuture() ## %dopar% parallelizes via future
plan(multisession)
library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")
my_fcn <- function(xs) {
  p <- progressor(along = xs)
  foreach(x = xs) %dopar% {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}
my_fcn(1:5)
# / [================>-----------------------------] 40% x=2
Here is an example that uses future_map() of the furrr package to parallelize on the local machine while at the same time signaling progression updates:
library(furrr)
plan(multisession)
library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")
my_fcn <- function(xs) {
  p <- progressor(along = xs)
  future_map(xs, function(x) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}
my_fcn(1:5)
# / [================>-----------------------------] 40% x=2
Note: This solution does not involve the .progress = TRUE argument that furrr implements. Because progressr is more generic and because .progress = TRUE only supports certain future backends and produces errors on non-supported backends, I recommend to stop using .progress = TRUE and use the progressr package instead.
Here is an example that uses bplapply() of the BiocParallel package to parallelize on the local machine while at the same time signaling progression updates:
library(BiocParallel)
library(doFuture)
register(DoparParam()) ## BiocParallel parallelizes via %dopar%
registerDoFuture() ## %dopar% parallelizes via future
plan(multisession)
library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")
my_fcn <- function(xs) {
  p <- progressor(along = xs)
  bplapply(xs, function(x) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}
my_fcn(1:5)
# / [================>-----------------------------] 40% x=2
Here is an example that uses llply() of the plyr package to parallelize on the local machine while at the same time signaling progression updates:
library(plyr)
library(doFuture)
registerDoFuture() ## %dopar% parallelizes via future
plan(multisession)
library(progressr)
handlers(global = TRUE)
handlers("progress", "beepr")
my_fcn <- function(xs) {
  p <- progressor(along = xs)
  llply(xs, function(x, ...) {
    Sys.sleep(6.0-x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }, .parallel = TRUE)
}
my_fcn(1:5)
# / [================>-----------------------------] 40% x=2
Note: As an alternative to the above, recommended approach, one can use .progress = "progressr" together with .parallel = TRUE. This requires plyr (>= 1.8.7).
As of November 2020, there are four types of future backends that are known(*) to provide near-live progress updates: sequential, multicore, multisession, and cluster (local and remote).
Here “near-live” means that the progress handlers will report on progress almost immediately when the progress is signaled on the worker. For all other future backends, the progress updates are only relayed back to the main machine and reported together with the results of the futures. For instance, if future_lapply(X, FUN) chunks up the processing of, say, 100 elements in X into eight futures, we will see progress from each of the 100 elements as they are done when using a future backend supporting “near-live” updates, whereas we will only see those updates flushed eight times when using any other type of future backend.
(*) Other future backends may gain support for “near-live” progress updating later. Adding support for those is independent of the progressr package. Feature requests for adding that support should go to those future-backend packages.
Signaling progress updates comes with some overhead. In situations where we use progress updates, this overhead is typically much smaller than the task we are processing in each step. However, if the task we iterate over is quick, then the extra time induced by the progress updates might end up dominating the overall processing time. If that is the case, a simple solution is to only signal progress updates every n-th step. Here is a version of slow_sum() that signals progress every 10th iteration:
slow_sum <- function(x) {
  ## One progress step per completed batch of 10 iterations
  p <- progressr::progressor(steps = length(x) %/% 10)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.1)
    sum <- sum + x[kk]
    if (kk %% 10 == 0) p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}
The overhead of progress signaling may depend on context. For example, in parallel processing with near-live progress updates via ‘multisession’ futures, each progress update is communicated via a socket connection back to the main R session. These connections might become clogged up if progress updates are too frequent.
When running R from the command line, R runs in a non-interactive mode (interactive() returns FALSE). The default behavior of progressr is to not report on progress in non-interactive mode. To report on progress also then, set R option progressr.enable or environment variable R_PROGRESSR_ENABLE to TRUE. For example, running the Rscript call below without R_PROGRESSR_ENABLE set will not report on progress, whereas
$ export R_PROGRESSR_ENABLE=TRUE
$ Rscript -e "library(progressr)" -e "with_progress(y <- slow_sum(1:10))"
will.
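The R option can be set from within a script as well. The following is a minimal sketch, assuming progressr is installed; the script name and the shortened delay are illustrative only.

```r
## my_script.R (hypothetical file name), run as: Rscript my_script.R
library(progressr)

## Report on progress also in non-interactive mode
options(progressr.enable = TRUE)

slow_sum <- function(x) {
  p <- progressor(along = x)
  sum <- 0
  for (kk in seq_along(x)) {
    Sys.sleep(0.01)
    sum <- sum + x[kk]
    p(message = sprintf("Adding %g", x[kk]))
  }
  sum
}

with_progress(y <- slow_sum(1:10))
cat("Sum:", y, "\n")
```

Setting the option inside the script keeps the behavior tied to that script, whereas the environment variable affects every R process started from the same shell.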
Because this project is under active development, the progressr API is currently kept at a very minimum. This will allow for the framework and the API to evolve while minimizing the risk for breaking code that depends on it. The roadmap for developing the API is roughly:
For a more up-to-date view on what features might be added, see https://github.com/futureverse/progressr/issues.
It is not possible to create a progressor in the global environment, e.g. at the top level of a script. It has to be created inside a function, within with_progress({ ... }), local({ ... }), or a similar construct. For example, the following:
library(progressr)
handlers(global = TRUE)
xs <- 1:5
p <- progressor(along = xs)
y <- lapply(xs, function(x) {
  Sys.sleep(0.1)
  p(sprintf("x=%g", x))
  sqrt(x)
})
results in an error if tried:
Error in progressor(along = xs) :
A progressor must not be created in the global environment unless wrapped in a
with_progress() or without_progress() call. Alternatively, create it inside a
function or in a local() environment to make sure there is a finite life span
of the progressor
The solution is to wrap it in a local({ ... }) call, or more explicitly, in a with_progress({ ... }) call:
library(progressr)
handlers(global = TRUE)
xs <- 1:5
with_progress({
  p <- progressor(along = xs)
  y <- lapply(xs, function(x) {
    Sys.sleep(0.1)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
})
# |==================== | 40%
The main reason for this is to limit the life span of each progressor. If we created it in the global environment, there is a significant risk it would never finish and block all of the following progressors.
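For the local({ ... }) alternative mentioned above, a minimal sketch, assuming progressr is installed, could look as follows. Whether anything is displayed depends on the handler settings, e.g. on handlers(global = TRUE) being enabled as assumed throughout.

```r
library(progressr)

xs <- 1:5

# The progressor lives in the temporary environment created by local(),
# so its life span ends when local() returns
y <- local({
  p <- progressor(along = xs)
  lapply(xs, function(x) {
    Sys.sleep(0.01)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
})
```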
It is not possible to call handlers(global = TRUE) in all circumstances. For example, it cannot be called within tryCatch() and withCallingHandlers():
> tryCatch(handlers(global = TRUE), error = identity)
Error in globalCallingHandlers(NULL) :
should not be called with handlers on the stack
This is not a bug, neither in progressr nor in R itself. It is due to a conservative design of how global calling handlers should work in R. If it were allowed, there is a risk we might end up getting weird and unpredictable behaviors when messages, warnings, errors, and other types of conditions are signaled.
Because tryCatch() and withCallingHandlers() are used in many places throughout base R, this means that we also cannot call handlers(global = TRUE) as part of a package’s startup process, e.g. .onLoad() or .onAttach().
Another example of this error is if handlers(global = TRUE) is used inside package vignettes and dynamic documents such as R Markdown. In such cases, the global progress handler has to be enabled prior to processing the document, e.g. by calling progressr::handlers(global = TRUE) before rmarkdown::render().
When using the progressr package, progression updates are communicated via R’s condition framework, which provides methods for creating, signaling, capturing, muffling, and relaying conditions. Progression updates are of classes progression and immediateCondition (*). The below figure gives an example of how progression conditions are created, signaled, and rendered.
(*) Conditions of class immediateCondition are relayed as soon as possible by the future framework, which means that progression updates produced in parallel workers are reported to the end user as soon as the main R session has received them.
Figure: Sequence diagram illustrating how signaled progression conditions are captured by with_progress(), or the global progression handler, and relayed to the two progression handlers ‘progress’ (a progress bar in the terminal) and ‘beepr’ (auditory) that the end user has chosen.
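The condition mechanism described above can be illustrated with base R alone. The sketch below is not how progressr is implemented; the condition class "my_progress" and the helpers make_progress() and work() are hypothetical names introduced only to show how a condition is created, signaled, and captured by a calling handler.

```r
# Create a custom condition object (hypothetical class "my_progress")
make_progress <- function(step, total) {
  structure(
    class = c("my_progress", "condition"),
    list(message = sprintf("step %d/%d", step, total), call = NULL)
  )
}

# The "developer" side: signal a condition at each step
work <- function(n) {
  for (i in seq_len(n)) signalCondition(make_progress(i, n))
  n
}

# The "end user" side: capture the conditions with a calling handler
seen <- character(0)
result <- withCallingHandlers(
  work(3),
  my_progress = function(cond) {
    seen <<- c(seen, conditionMessage(cond))
  }
)
print(seen)
```

When work() is called without any handler established, the signaled conditions are simply ignored, which mirrors how a function using a progressor behaves like any other function when no progress handler is active.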
To debug progress updates, use:
> handlers("debug")
> with_progress(y <- slow_sum(1:3))
[23:19:52.738] (0.000s => +0.002s) initiate: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:52.739] (0.001s => +0.000s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:52.942] (0.203s => +0.002s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.145] (0.407s => +0.001s) update: 0/3 (+0) '' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.348] (0.610s => +0.002s) update: 1/3 (+1) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
M: Adding value 1
[23:19:53.555] (0.817s => +0.004s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.758] (1.020s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:53.961] (1.223s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.165] (1.426s => +0.001s) update: 1/3 (+0) 'P: Adding 1' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.368] (1.630s => +0.001s) update: 2/3 (+1) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
M: Adding value 2
[23:19:54.574] (1.835s => +0.003s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.777] (2.039s => +0.001s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:54.980] (2.242s => +0.001s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:55.183] (2.445s => +0.001s) update: 2/3 (+0) 'P: Adding 2' {clear=TRUE, enabled=TRUE, status=}
[23:19:55.387] (2.649s => +0.001s) update: 3/3 (+1) 'P: Adding 3' {clear=TRUE, enabled=TRUE, status=}
[23:19:55.388] (2.650s => +0.003s) update: 3/3 (+0) 'P: Adding 3' {clear=TRUE, enabled=TRUE, status=}
M: Adding value 3
[23:19:55.795] (3.057s => +0.000s) shutdown: 3/3 (+0) 'P: Adding 3' {clear=TRUE, enabled=TRUE, status=ok}