Title: | Batch Computing with R |
---|---|
Description: | Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page. |
Authors: | Bernd Bischl <[email protected]>, Michel Lang <[email protected]>, Henrik Bengtsson <[email protected]> |
Maintainer: | Bernd Bischl <[email protected]> |
License: | BSD_2_clause + file LICENSE |
Version: | 1.9 |
Built: | 2024-12-12 04:41:30 UTC |
Source: | https://github.com/tudo-r/batchjobs |
Mutator function for packages
in makeRegistry
.
addRegistryPackages(reg, packages)
addRegistryPackages(reg, packages)
reg |
[ |
packages |
[ |
[Registry
]. Changed registry.
Other exports:
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchExport()
,
batchUnexport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Mutator function for src.dirs
in makeRegistry
.
addRegistrySourceDirs(reg, src.dirs, src.now = TRUE)
addRegistrySourceDirs(reg, src.dirs, src.now = TRUE)
reg |
[ |
src.dirs |
[ |
src.now |
[ |
[Registry
]. Changed registry.
Other exports:
addRegistryPackages()
,
addRegistrySourceFiles()
,
batchExport()
,
batchUnexport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Mutator function for src.files
in makeRegistry
.
addRegistrySourceFiles(reg, src.files, src.now = TRUE)
addRegistrySourceFiles(reg, src.files, src.now = TRUE)
reg |
[ |
src.files |
[ |
src.now |
[ |
[Registry
]. Changed registry.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
batchExport()
,
batchUnexport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Maps an n-ary-function over a list of all combinations which are given by some vectors.
Internally expand.grid
is used to compute the combinations, then
batchMap
is called.
batchExpandGrid(reg, fun, ..., more.args = list())
batchExpandGrid(reg, fun, ..., more.args = list())
reg |
[ |
fun |
[ |
... |
[any] |
more.args |
[ |
[data.frame
]. Expanded grid of combinations produced by expand.grid
.
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x, y, z) x * y + z # lets store the param grid grid = batchExpandGrid(reg, f, x = 1:2, y = 1:3, more.args = list(z = 10)) submitJobs(reg) waitForJobs(reg) y = reduceResultsVector(reg) # later, we can always access the param grid like this grid = getJobParamDf(reg) cbind(grid, y = y)
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x, y, z) x * y + z # lets store the param grid grid = batchExpandGrid(reg, f, x = 1:2, y = 1:3, more.args = list(z = 10)) submitJobs(reg) waitForJobs(reg) y = reduceResultsVector(reg) # later, we can always access the param grid like this grid = getJobParamDf(reg) cbind(grid, y = y)
Saves objects as RData
files in the “exports” subdirectory of your file.dir
to be later loaded on the slaves.
batchExport(reg, ..., li = list(), overwrite = FALSE)
batchExport(reg, ..., li = list(), overwrite = FALSE)
reg |
[ |
... |
[any] |
li |
[ |
overwrite |
[ |
[character
]. Invisibly returns a character vector of exported objects.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchUnexport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.
The package currently support the following further R options, which you can set
either in your R profile file or a script via options
:
This boolean flag can be set to FALSE
to reduce the
console output of the package operations. Usually you want to see this output in interactive
work, but when you use the package in e.g. knitr documents,
it clutters the resulting document too much.
If this boolean flag is enabled, the package checks your
registry file dir (and related user-defined directories) quite strictly to be POSIX compliant.
Usually this is a good idea, you do not want to have strange chars in your file paths,
as this might results in problems when these paths get passed to the scheduler or other
command-line tools that the package interoperates with.
But on some OS this check might be too strict and cause problems.
Setting the flag to FALSE
allows to disable the check entirely.
The default is FALSE
on Windows systems and TRUE
else.
You can then submit these jobs to the batch system.
batchMap(reg, fun, ..., more.args = list(), use.names = FALSE)
batchMap(reg, fun, ..., more.args = list(), use.names = FALSE)
reg |
[ |
fun |
[ |
... |
[any] |
more.args |
[ |
use.names |
[ |
Vector of type integer
with job ids.
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) print(reg)
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) print(reg)
Combination of makeRegistry
, batchMap
and submitJobs
for quick computations on the cluster.
Should only be used by skilled users who know what they are doing.
Creates the file.dir, maps function, potentially chunks jobs and submits them.
batchMapQuick( fun, ..., more.args = list(), file.dir = NULL, packages = character(0L), chunk.size, n.chunks, chunks.as.arrayjobs = FALSE, inds, resources = list() )
batchMapQuick( fun, ..., more.args = list(), file.dir = NULL, packages = character(0L), chunk.size, n.chunks, chunks.as.arrayjobs = FALSE, inds, resources = list() )
fun |
[ |
... |
[any] |
more.args |
[ |
file.dir |
[ |
packages |
[ |
chunk.size |
[ |
n.chunks |
[ |
chunks.as.arrayjobs |
[ |
inds |
[ |
resources |
[ |
[Registry
]
Maps a function over the results of a registry by using batchMap.
batchMapResults( reg, reg2, fun, ..., ids, part = NA_character_, more.args = list() )
batchMapResults( reg, reg2, fun, ..., ids, part = NA_character_, more.args = list() )
reg |
[ |
reg2 |
[ |
fun |
[ |
... |
[any] |
ids |
[ |
part |
[ |
more.args |
[ |
Vector of type integer
with job ids.
reg1 = makeRegistry(id = "BatchJobsExample1", file.dir = tempfile(), seed = 123) # square some numbers f = function(x) x^2 batchMap(reg1, f, 1:10) # submit jobs and wait for the jobs to finish submitJobs(reg1) waitForJobs(reg1) # look at results reduceResults(reg1, fun = function(aggr,job,res) c(aggr, res)) reg2 = makeRegistry(id = "BatchJobsExample2", file.dir = tempfile(), seed = 123) # define function to tranform results, we simply do the inverse of the squaring g = function(job, res) sqrt(res) batchMapResults(reg1, reg2, fun = g) # submit jobs and wait for the jobs to finish submitJobs(reg2) waitForJobs(reg2) # check results reduceResults(reg2, fun = function(aggr,job,res) c(aggr, res))
reg1 = makeRegistry(id = "BatchJobsExample1", file.dir = tempfile(), seed = 123) # square some numbers f = function(x) x^2 batchMap(reg1, f, 1:10) # submit jobs and wait for the jobs to finish submitJobs(reg1) waitForJobs(reg1) # look at results reduceResults(reg1, fun = function(aggr,job,res) c(aggr, res)) reg2 = makeRegistry(id = "BatchJobsExample2", file.dir = tempfile(), seed = 123) # define function to tranform results, we simply do the inverse of the squaring g = function(job, res) sqrt(res) batchMapResults(reg1, reg2, fun = g) # submit jobs and wait for the jobs to finish submitJobs(reg2) waitForJobs(reg2) # check results reduceResults(reg2, fun = function(aggr,job,res) c(aggr, res))
Manually query the BatchJobs database
batchQuery(reg, query, flags = "ro")
batchQuery(reg, query, flags = "ro")
reg |
[ |
query |
[ |
flags |
[ |
[data.frame
] Result of the query.
reg = makeRegistry("test", file.dir = tempfile()) batchMap(reg, identity, i = 1:10) batchQuery(reg, "SELECT * FROM test_job_status")
reg = makeRegistry("test", file.dir = tempfile()) batchMap(reg, identity, i = 1:10) batchQuery(reg, "SELECT * FROM test_job_status")
Each jobs reduces a certain number of elements on one slave. You can then submit these jobs to the batch system.
batchReduce(reg, fun, xs, init, block.size, more.args = list())
batchReduce(reg, fun, xs, init, block.size, more.args = list())
reg |
[ |
fun |
[ |
xs |
[ |
init |
[any] |
block.size |
[ |
more.args |
[ |
Vector of type integer
with job ids.
# define function to reduce on slave, we want to sum a vector f = function(aggr, x) aggr + x reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) # sum 20 numbers on each slave process, i.e. 5 jobs batchReduce(reg, fun = f, 1:100, init = 0, block.size = 5) submitJobs(reg) waitForJobs(reg) # now reduce one final time on master reduceResults(reg, fun = function(aggr,job,res) f(aggr, res))
# define function to reduce on slave, we want to sum a vector f = function(aggr, x) aggr + x reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) # sum 20 numbers on each slave process, i.e. 5 jobs batchReduce(reg, fun = f, 1:100, init = 0, block.size = 5) submitJobs(reg) waitForJobs(reg) # now reduce one final time on master reduceResults(reg, fun = function(aggr,job,res) f(aggr, res))
Each jobs reduces a certain number of results on one slave.
You can then submit these jobs to the batch system.
Later, you can do a final reduction with reduceResults
on the master.
batchReduceResults( reg, reg2, fun, ids, part = NA_character_, init, block.size, more.args = list() )
batchReduceResults( reg, reg2, fun, ids, part = NA_character_, init, block.size, more.args = list() )
reg |
[ |
reg2 |
[ |
fun |
[ |
ids |
[ |
part |
[ |
init |
[any] |
block.size |
[ |
more.args |
[ |
Vector of type integer
with job ids.
# generating example results: reg1 = makeRegistry(id = "BatchJobsExample1", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg1, f, 1:20) submitJobs(reg1) waitForJobs(reg1) # define function to reduce on slave, we want to sum the squares myreduce = function(aggr, job, res) aggr + res # sum 5 results on each slave process, i.e. 4 jobs reg2 = makeRegistry(id = "BatchJobsExample2", file.dir = tempfile(), seed = 123) batchReduceResults(reg1, reg2, fun = myreduce, init = 0, block.size = 5) submitJobs(reg2) waitForJobs(reg2) # now reduce one final time on master reduceResults(reg2, fun = myreduce)
# generating example results: reg1 = makeRegistry(id = "BatchJobsExample1", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg1, f, 1:20) submitJobs(reg1) waitForJobs(reg1) # define function to reduce on slave, we want to sum the squares myreduce = function(aggr, job, res) aggr + res # sum 5 results on each slave process, i.e. 4 jobs reg2 = makeRegistry(id = "BatchJobsExample2", file.dir = tempfile(), seed = 123) batchReduceResults(reg1, reg2, fun = myreduce, init = 0, block.size = 5) submitJobs(reg2) waitForJobs(reg2) # now reduce one final time on master reduceResults(reg2, fun = myreduce)
Removes RData
files from the “exports” subdirectory of your file.dir
and thereby prevents loading on the slave.
batchUnexport(reg, what)
batchUnexport(reg, what)
reg |
[ |
what |
[ |
[character
]. Invisibly returns a character vector of unexported objects.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchExport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Calls can be made in parallel or consecutively, the function waits until all calls have finished and returns call results. In consecutive mode the output on the workers can also be shown on the master during computation.
Please read and understand the comments for argument dir
.
Note that this function should only be used for short administrative
tasks or information gathering on the workers, the true work horse for
real computation is submitJobs
.
In makeSSHWorker
various options for load
management are possible. Note that these will be
ignored for the current call to execute it immediatly.
callFunctionOnSSHWorkers( nodenames, fun, ..., consecutive = FALSE, show.output = consecutive, simplify = TRUE, use.names = TRUE, dir = getwd() )
callFunctionOnSSHWorkers( nodenames, fun, ..., consecutive = FALSE, show.output = consecutive, simplify = TRUE, use.names = TRUE, dir = getwd() )
nodenames |
[ |
fun |
[ |
... |
[any] |
consecutive |
[ |
show.output |
[ |
simplify |
[ |
use.names |
[ |
dir |
[ |
Results of function calls, either a list or simplified.
This function is only intended for use in your own cluster functions implementation.
Calls brew silently on your template, any error will lead to an exception.
If debug mode is turned on in the configuration, the file is stored at the same place as the
corresponding R script in the “jobs”-subdir of your files directory,
otherwise in the temp dir via tempfile
.
cfBrewTemplate(conf, template, rscript, extension)
cfBrewTemplate(conf, template, rscript, extension)
conf |
[ |
template |
[ |
rscript |
[ |
extension |
[ |
[character(1)
]. File path of result.
This function is only intended for use in your own cluster functions implementation.
Simply constructs a SubmitJobResult
object with status code 101,
NA as batch job id and an informative error message containing the output of the OS command in output
.
cfHandleUnknownSubmitError(cmd, exit.code, output)
cfHandleUnknownSubmitError(cmd, exit.code, output)
cmd |
[ |
exit.code |
[ |
output |
[ |
This function is only intended for use in your own cluster functions implementation.
Calls the OS command to kill a job via system
like this: “cmd batch.job.id”.
If the command returns an exit code > 0, the command is repeated
after a 1 second sleep max.tries-1
times.
If the command failed in all tries, an exception is generated.
cfKillBatchJob(cmd, batch.job.id, max.tries = 3L)
cfKillBatchJob(cmd, batch.job.id, max.tries = 3L)
cmd |
[ |
batch.job.id |
[ |
max.tries |
[ |
Nothing.
This function is only intended for use in your own cluster functions implementation.
Simply reads your template and returns it as a character vector. If you do this in the constructor of your cluster functions once, you can avoid this repeated file access later on.
cfReadBrewTemplate(template.file)
cfReadBrewTemplate(template.file)
template.file |
[ |
[character
].
Simply checks if probided vector of job ids is valid and throws an error if something is odd.
checkIds(reg, ids, check.present = TRUE, len = NULL)
checkIds(reg, ids, check.present = TRUE, len = NULL)
reg |
[ |
ids |
[ |
check.present |
[ |
len |
[ |
Invisibly returns the vector of ids, converted to integer.
In order to understand how the package should be configured please read https://github.com/tudo-r/BatchJobs/wiki/Configuration.
Other conf:
getConfig()
,
loadConfig()
,
setConfig()
Useful in case of severe errors. Tries different operations of increasing difficulty and provides debug output on the console
debugMulticore(r.options)
debugMulticore(r.options)
r.options |
[ |
Nothing.
Other debug:
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
showLog()
,
testJob()
Useful in case of configuration problems. Tries different operations of increasing difficulty and provides debug output on the console.
Note that this function does not access nor use information specified for your cluster functions in your configuration.
debugSSH( nodename, ssh.cmd = "ssh", ssh.args = character(0L), rhome = "", r.options = c("--no-save", "--no-restore", "--no-init-file", "--no-site-file"), dir = getwd() )
debugSSH( nodename, ssh.cmd = "ssh", ssh.args = character(0L), rhome = "", r.options = c("--no-save", "--no-restore", "--no-init-file", "--no-site-file"), dir = getwd() )
nodename |
[ |
ssh.cmd |
[ |
ssh.args |
[ |
rhome |
[ |
r.options |
[ |
dir |
[ |
Nothing.
Other debug:
debugMulticore()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
showLog()
,
testJob()
Find all results where a specific condition is true.
filterResults(reg, ids, fun, ...)
filterResults(reg, ids, fun, ...)
reg |
[ |
ids |
[ |
fun |
[ |
... |
[any] |
[integer
]. Ids of jobs where fun(job, result)
returns TRUE
.
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) submitJobs(reg) waitForJobs(reg) # which square numbers are even: filterResults(reg, fun = function(job, res) res %% 2 == 0)
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) submitJobs(reg) waitForJobs(reg) # which square numbers are even: filterResults(reg, fun = function(job, res) res %% 2 == 0)
findDone
: Find jobs which succesfully terminated.
findDone(reg, ids, limit = NULL) findNotDone(reg, ids, limit = NULL) findMissingResults(reg, ids, limit = NULL) findErrors(reg, ids, limit = NULL) findNotErrors(reg, ids, limit = NULL) findTerminated(reg, ids, limit = NULL) findNotTerminated(reg, ids, limit = NULL) findSubmitted(reg, ids, limit = NULL) findNotSubmitted(reg, ids, limit = NULL) findOnSystem(reg, ids, limit = NULL) findNotOnSystem(reg, ids, limit = NULL) findRunning(reg, ids, limit = NULL) findNotRunning(reg, ids, limit = NULL) findStarted(reg, ids, limit = NULL) findNotStarted(reg, ids, limit = NULL) findExpired(reg, ids, limit = NULL) findDisappeared(reg, ids, limit = NULL)
findDone(reg, ids, limit = NULL) findNotDone(reg, ids, limit = NULL) findMissingResults(reg, ids, limit = NULL) findErrors(reg, ids, limit = NULL) findNotErrors(reg, ids, limit = NULL) findTerminated(reg, ids, limit = NULL) findNotTerminated(reg, ids, limit = NULL) findSubmitted(reg, ids, limit = NULL) findNotSubmitted(reg, ids, limit = NULL) findOnSystem(reg, ids, limit = NULL) findNotOnSystem(reg, ids, limit = NULL) findRunning(reg, ids, limit = NULL) findNotRunning(reg, ids, limit = NULL) findStarted(reg, ids, limit = NULL) findNotStarted(reg, ids, limit = NULL) findExpired(reg, ids, limit = NULL) findDisappeared(reg, ids, limit = NULL)
reg |
[ |
ids |
[ |
limit |
[ |
[integer
]. Ids of jobs.
Finds ids of jobs that match a query.
findJobs(reg, ids, pars, jobnames)
findJobs(reg, ids, pars, jobnames)
reg |
[ |
ids |
[ |
pars |
[R expression] |
jobnames |
[ |
[integer
]. Ids for jobs which match the query.
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x, y) x * y batchExpandGrid(reg, f, x = 1:2, y = 1:3) findJobs(reg, pars = (y > 2))
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x, y) x * y batchExpandGrid(reg, f, x = 1:2, y = 1:3) findJobs(reg, pars = (y > 2))
Returns a list of BatchJobs configuration settings
getConfig()
getConfig()
list
of current configuration variables with classs “Config”.
Other conf:
configuration
,
loadConfig()
,
setConfig()
Get error messages of jobs.
getErrorMessages(reg, ids)
getErrorMessages(reg, ids)
reg |
[ |
ids |
[ |
[character
]. Error messages for jobs as character vectorNA
if job has terminated successfully.
Other debug:
debugMulticore()
,
debugSSH()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
showLog()
,
testJob()
Get job from registry by id.
getJob(reg, id, check.id = TRUE)
getJob(reg, id, check.id = TRUE)
reg |
[ |
id |
[ |
check.id |
[ |
[Job
].
Get ids of jobs in registry.
getJobIds(reg)
getJobIds(reg)
reg |
[ |
[character
].
Returns time stamps (submitted, started, done, error),
time running, approximate memory usage (in Mb),
error messages (shortened, see showLog
for detailed error messages),
time in queue, hostname of the host the job was executed,
assigned batch ID, the R PID and the seed of the job.
To estimate memory usage the sum of the last column of gc
is used.
Column “time.running” displays the time until either the job was done, or an error occured;
it will by NA
in case of time outs or hard R crashes.
getJobInfo( reg, ids, pars = FALSE, prefix.pars = FALSE, select, unit = "seconds" )
getJobInfo( reg, ids, pars = FALSE, prefix.pars = FALSE, select, unit = "seconds" )
reg |
[ |
ids |
[ |
pars |
[ |
prefix.pars |
[ |
select |
[ |
unit |
[ |
[data.frame
].
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
showLog()
,
testJob()
Get the physical location of job files on the hard disk.
getJobLocation(reg, ids)
getJobLocation(reg, ids)
reg |
[ |
ids |
[ |
[character
] Vector of directories.
Get number of jobs in registry.
getJobNr(reg)
getJobNr(reg)
reg |
[ |
[integer(1)
].
Returns parameters for all jobs as the rows of a data.frame.
getJobParamDf(reg, ids)
getJobParamDf(reg, ids)
reg |
[ |
ids |
[ |
[data.frame
]. Rows are named with job ids.
# see batchExpandGrid
# see batchExpandGrid
Throws an error if call it for unsubmitted jobs.
getJobResources(reg, ids, as.list = TRUE)
getJobResources(reg, ids, as.list = TRUE)
reg |
[ |
ids |
[ |
as.list |
[ |
[list
| data.frame
]. List (or data.frame) of resource lists as passed to submitJobs
.
Get jobs from registry by id.
getJobs(reg, ids, check.ids = TRUE)
getJobs(reg, ids, check.ids = TRUE)
reg |
[ |
ids |
[ |
check.ids |
[ |
[list of Job
].
Get log file paths for jobs.
getLogFiles(reg, ids)
getLogFiles(reg, ids)
reg |
[ |
ids |
[ |
[character
]. Vector of file paths to log files.
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
showLog()
,
testJob()
Return the list passed to submitJobs
, e.g.
nodes, walltime, etc.
getResources()
getResources()
Can only be called in job function during job execution on slave.
[list
].
Workers are queried in parallel via callFunctionOnSSHWorkers
.
The function will display a warning if the first lib path on the worker
is not writable as this indicates potential problems in the configuration
and installPackagesOnSSHWorkers
will not work.
getSSHWorkersInfo(nodenames)
getSSHWorkersInfo(nodenames)
nodenames |
[ |
[list
]. Displayed information as a list named by nodenames.
Searches for occurence of pattern
in log files.
grepLogs( reg, ids, pattern = "warn", ignore.case = TRUE, verbose = FALSE, range = 2L )
grepLogs( reg, ids, pattern = "warn", ignore.case = TRUE, verbose = FALSE, range = 2L )
reg |
[ |
ids |
[ |
pattern |
[ |
ignore.case |
[ |
verbose |
[ |
range |
[ |
[integer
]. Ids of jobs where pattern was found in the log file.
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
showLog()
,
testJob()
Installation is done via callFunctionOnSSHWorkers
and install.packages
.
Note that as usual the function tries to install
the packages into the first path of .libPaths()
of each each worker.
installPackagesOnSSHWorkers( nodenames, pkgs, repos = getOption("repos"), consecutive = TRUE, show.output = consecutive, ... )
installPackagesOnSSHWorkers( nodenames, pkgs, repos = getOption("repos"), consecutive = TRUE, show.output = consecutive, ... )
nodenames |
[ |
pkgs |
[ |
repos |
[ |
consecutive |
[ |
show.output |
[ |
... |
[any] |
Nothing.
Kill jobs which have already been submitted to the batch system. If a job is killed its internal state is reset as if it had not been submitted at all.
The function informs if (a) the job you want to kill has not been submitted, (b) the job has already terminated, (c) for some reason no batch job id is available. In all 3 cases above, nothing is changed for the state of this job and no call to the internal kill cluster function is generated.
In case of an error when killing, the function tries - after a short sleep - to kill the remaining batch jobs again. If this fails again for some jobs, the function gives up. Only jobs that could be killed are reset in the DB.
killJobs(reg, ids, progressbar = TRUE)
killJobs(reg, ids, progressbar = TRUE)
reg |
[ |
ids |
[ |
progressbar |
[ |
[integer
]. Ids of killed jobs.
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
resetJobs()
,
setJobFunction()
,
showLog()
,
testJob()
## Not run: reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) Sys.sleep(x) batchMap(reg, f, 1:10 + 5) submitJobs(reg) waitForJobs(reg) # kill all jobs currently _running_ killJobs(reg, findRunning(reg)) # kill all jobs queued or running killJobs(reg, findNotTerminated(reg)) ## End(Not run)
## Not run: reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) Sys.sleep(x) batchMap(reg, f, 1:10 + 5) submitJobs(reg) waitForJobs(reg) # kill all jobs currently _running_ killJobs(reg, findRunning(reg)) # kill all jobs queued or running killJobs(reg, findNotTerminated(reg)) ## End(Not run)
Load a specific configuration file.
loadConfig(conffile = ".BatchJobs.R")
loadConfig(conffile = ".BatchJobs.R")
conffile |
[ |
Invisibly returns a list of configuration settings.
Other conf:
configuration
,
getConfig()
,
setConfig()
Loads exported RData
object files in the “exports” subdirectory of your file.dir
and assigns the objects to the global environment.
loadExports(reg, what = NULL)
loadExports(reg, what = NULL)
reg |
[ |
what |
[ |
[character
]. Invisibly returns a character vector of loaded objects.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchExport()
,
batchUnexport()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Loads a previously created registry from the file system.
The file.dir
is automatically updated upon load if adjust.paths
is set to
TRUE
, so be careful if you use the registry on multiple machines simultaneously,
e.g. via sshfs or a samba share.
There is a heuristic included which tries to detect if the location of the registry has changed and returns a read-only registry if necessary.
loadRegistry(file.dir, work.dir, adjust.paths = FALSE)
loadRegistry(file.dir, work.dir, adjust.paths = FALSE)
file.dir |
[ |
work.dir |
[ |
adjust.paths |
[ |
[Registry
].
Loads a specific result file.
loadResult( reg, id, part = NA_character_, missing.ok = FALSE, impute.val = NULL )
loadResult( reg, id, part = NA_character_, missing.ok = FALSE, impute.val = NULL )
reg |
[ |
id |
[ |
part |
[ |
missing.ok |
[ |
impute.val |
[any] |
[any]. Result of job.
Loads result files for id vector.
loadResults( reg, ids, part = NA_character_, simplify = FALSE, use.names = "ids", missing.ok = FALSE, impute.val = NULL )
loadResults( reg, ids, part = NA_character_, simplify = FALSE, use.names = "ids", missing.ok = FALSE, impute.val = NULL )
reg |
[ |
ids |
[ |
part |
[ |
simplify |
[ |
use.names |
[ |
missing.ok |
[ |
impute.val |
[any] |
[list
]. Results of jobs as list, possibly named by ids.
Use this funtion when you implement a backend for a batch system. You must define the functions specified in the arguments.
makeClusterFunctions( name, submitJob, killJob, listJobs, getArrayEnvirName, class = NULL, ... )
makeClusterFunctions( name, submitJob, killJob, listJobs, getArrayEnvirName, class = NULL, ... )
name |
[ |
submitJob |
[ |
killJob |
[ |
listJobs |
[ |
getArrayEnvirName |
[ |
class |
[ |
... |
[ |
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
All jobs executed under these cluster functions are executed
sequentially, in the same interactive R process that you currently are.
That is, submitJob
does not return until the
job has finished. The main use of this ClusterFunctions
implementation is to test and debug programs on a local computer.
Listing jobs returns an empty vector (as no jobs can be running when you call this)
and killJob
returns at once (for the same reason).
makeClusterFunctionsInteractive(write.logs = TRUE)
makeClusterFunctionsInteractive(write.logs = TRUE)
write.logs |
[ |
Other clusterFunctions:
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
All jobs executed under these cluster functions are executed
sequentially, but in an independent, new R session.
That is, submitJob
does not return until the
job has finished. The main use of this ClusterFunctions
implementation is to test and debug programs on a local computer.
Listing jobs returns an empty vector (as no jobs can be running when you call this)
and killJob
returns at once (for the same reason).
makeClusterFunctionsLocal()
makeClusterFunctionsLocal()
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
Job files are created based on the brew template
template.file
. This file is processed with brew and then
submitted to the queue using the bsub
command. Jobs are
killed using the bkill
command and the list of running jobs
is retrieved using bjobs -u $USER -w
. The user must have the
appropriate privileges to submit, delete and list jobs on the
cluster (this is usually the case).
The template file can access all arguments passed to the
submitJob
function, see here ClusterFunctions
.
It is the template file's job to choose a queue for the job
and handle the desired resource allocations.
Examples can be found on
https://github.com/tudo-r/BatchJobs/tree/master/examples/cfLSF.
makeClusterFunctionsLSF( template.file, list.jobs.cmd = c("bjobs", "-u $USER", "-w") )
makeClusterFunctionsLSF( template.file, list.jobs.cmd = c("bjobs", "-u $USER", "-w") )
template.file |
[ |
list.jobs.cmd |
[ |
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
Jobs are spawned by starting multiple R sessions on the commandline
(similar like on true batch systems).
Packages parallel
or multicore
are not used in any way.
makeClusterFunctionsMulticore( ncpus = max(getOption("mc.cores", parallel::detectCores()) - 1L, 1L), max.jobs, max.load, nice, r.options = c("--no-save", "--no-restore", "--no-init-file", "--no-site-file"), script )
makeClusterFunctionsMulticore( ncpus = max(getOption("mc.cores", parallel::detectCores()) - 1L, 1L), max.jobs, max.load, nice, r.options = c("--no-save", "--no-restore", "--no-init-file", "--no-site-file"), script )
ncpus |
[ |
max.jobs |
[ |
max.load |
[ |
nice |
[ |
r.options |
[ |
script |
[ |
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
Job files are created based on the brew template
template.file
. This file is processed with brew and then
submitted to the queue using the bsub
command. Jobs are
killed using the bkill
command and the list of running jobs
is retrieved using bjobs -u $USER -w
. The user must have the
appropriate privileges to submit, delete and list jobs on the
cluster (this is usually the case).
The template file can access all arguments passed to the
submitJob
function, see here ClusterFunctions
.
It is the template file's job to choose a queue for the job
and handle the desired resource allocations.
Examples can be found on
https://github.com/tudo-r/BatchJobs/tree/master/examples/cfOpenLava.
makeClusterFunctionsOpenLava( template.file, list.jobs.cmd = c("bjobs", "-u $USER", "-w") )
makeClusterFunctionsOpenLava( template.file, list.jobs.cmd = c("bjobs", "-u $USER", "-w") )
template.file |
[ |
list.jobs.cmd |
[ |
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
Job files are created based on the brew template
template.file
. This file is processed with brew and then
submitted to the queue using the qsub
command. Jobs are
killed using the qdel
command and the list of running jobs
is retrieved using qselect
. The user must have the
appropriate privileges to submit, delete and list jobs on the
cluster (this is usually the case).
The template file can access all arguments passed to the
submitJob
function, see here ClusterFunctions
.
It is the template file's job to choose a queue for the job
and handle the desired resource allocations.
Examples can be found on
https://github.com/tudo-r/BatchJobs/tree/master/examples/cfSGE.
makeClusterFunctionsSGE(template.file, list.jobs.cmd = c("qstat", "-u $USER"))
makeClusterFunctionsSGE(template.file, list.jobs.cmd = c("qstat", "-u $USER"))
template.file |
[ |
list.jobs.cmd |
[ |
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
Job files are created based on the brew template
template.file
. This file is processed with brew and then
submitted to the queue using the sbatch
command. Jobs are
killed using the scancel
command and the list of running jobs
is retrieved using squeue
. The user must have the
appropriate privileges to submit, delete and list jobs on the
cluster (this is usually the case).
The template file can access all arguments passed to the
submitJob
function, see here ClusterFunctions
.
It is the template file's job to choose a queue for the job
and handle the desired resource allocations.
Examples can be found on
https://github.com/tudo-r/BatchJobs/tree/master/examples/cfSLURM.
makeClusterFunctionsSLURM( template.file, list.jobs.cmd = c("squeue", "-h", "-o %i", "-u $USER") )
makeClusterFunctionsSLURM( template.file, list.jobs.cmd = c("squeue", "-h", "-o %i", "-u $USER") )
template.file |
[ |
list.jobs.cmd |
[ |
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSSH()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
Worker nodes must share the same file system and be accessible by ssh without manually entering passwords (e.g. by ssh-agent or passwordless pubkey). Note that you can also use this function to parallelize on multiple cores on your local machine. But you still have to run an ssh server and provide passwordless access to localhost.
makeClusterFunctionsSSH(..., workers)
makeClusterFunctionsSSH(..., workers)
... |
[ |
workers |
[list of |
[ClusterFunctions
].
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsTorque()
,
makeClusterFunctions()
## Not run: # Assume you have three nodes larry, curley and moe. All have 6 # cpu cores. On curley and moe R is installed under # "/opt/R/R-current" and on larry R is installed under # "/usr/local/R/". larry should not be used extensively because # somebody else wants to compute there as well. # Then a call to 'makeClusterFunctionsSSH' # might look like this: cluster.functions = makeClusterFunctionsSSH( makeSSHWorker(nodename = "larry", rhome = "/usr/local/R", max.jobs = 2), makeSSHWorker(nodename = "curley", rhome = "/opt/R/R-current"), makeSSHWorker(nodename = "moe", rhome = "/opt/R/R-current")) ## End(Not run)
## Not run: # Assume you have three nodes larry, curley and moe. All have 6 # cpu cores. On curley and moe R is installed under # "/opt/R/R-current" and on larry R is installed under # "/usr/local/R/". larry should not be used extensively because # somebody else wants to compute there as well. # Then a call to 'makeClusterFunctionsSSH' # might look like this: cluster.functions = makeClusterFunctionsSSH( makeSSHWorker(nodename = "larry", rhome = "/usr/local/R", max.jobs = 2), makeSSHWorker(nodename = "curley", rhome = "/opt/R/R-current"), makeSSHWorker(nodename = "moe", rhome = "/opt/R/R-current")) ## End(Not run)
Job files are created based on the brew template
template.file
. This file is processed with brew and then
submitted to the queue using the qsub
command. Jobs are
killed using the qdel
command and the list of running jobs
is retrieved using qselect
. The user must have the
appropriate privileges to submit, delete and list jobs on the
cluster (this is usually the case).
The template file can access all arguments passed to the
submitJob
function, see here ClusterFunctions
.
It is the template file's job to choose a queue for the job
and handle the desired resource allocations.
Examples can be found on
https://github.com/tudo-r/BatchJobs/tree/master/examples/cfTorque.
makeClusterFunctionsTorque( template.file, list.jobs.cmd = c("qselect", "-u $USER", "-s EHQRTW") )
makeClusterFunctionsTorque( template.file, list.jobs.cmd = c("qselect", "-u $USER", "-s EHQRTW") )
template.file |
[ |
list.jobs.cmd |
[ |
Other clusterFunctions:
makeClusterFunctionsInteractive()
,
makeClusterFunctionsLSF()
,
makeClusterFunctionsLocal()
,
makeClusterFunctionsMulticore()
,
makeClusterFunctionsOpenLava()
,
makeClusterFunctionsSGE()
,
makeClusterFunctionsSLURM()
,
makeClusterFunctionsSSH()
,
makeClusterFunctions()
Usually you will not do this manually. Every object is a list that contains the passed arguments of the constructor.
makeJob(id = NA_integer_, fun, fun.id = digest(fun), pars, name, seed)
makeJob(id = NA_integer_, fun, fun.id = digest(fun), pars, name, seed)
id |
[ |
fun |
[ |
fun.id |
[ |
pars |
[ |
name |
[ |
seed |
[ |
Note that if you don't want links in your paths (file.dir
, work.dir
) to get resolved and have
complete control over the way the path is used internally, pass an absolute path which begins with “/”.
makeRegistry( id, file.dir, sharding = TRUE, work.dir, multiple.result.files = FALSE, seed, packages = character(0L), src.dirs = character(0L), src.files = character(0L), skip = TRUE )
makeRegistry( id, file.dir, sharding = TRUE, work.dir, multiple.result.files = FALSE, seed, packages = character(0L), src.dirs = character(0L), src.files = character(0L), skip = TRUE )
id |
[ |
file.dir |
[ |
sharding |
[ |
work.dir |
[ |
multiple.result.files |
[ |
seed |
[ |
packages |
[ |
src.dirs |
[ |
src.files |
[ |
skip |
[ |
Every object is a list that contains the passed arguments of the constructor.
[Registry
]
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) print(reg)
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) print(reg)
Create SSH worker for SSH cluster functions.
makeSSHWorker( nodename, ssh.cmd = "ssh", ssh.args = character(0L), rhome = "", ncpus, max.jobs, max.load, nice, r.options = c("--no-save", "--no-restore", "--no-init-file", "--no-site-file"), script )
makeSSHWorker( nodename, ssh.cmd = "ssh", ssh.args = character(0L), rhome = "", ncpus, max.jobs, max.load, nice, r.options = c("--no-save", "--no-restore", "--no-init-file", "--no-site-file"), script )
nodename |
[ |
ssh.cmd |
[ |
ssh.args |
[ |
rhome |
[ |
ncpus |
[ |
max.jobs |
[ |
max.load |
[ |
nice |
[ |
r.options |
[ |
script |
[ |
[SSHWorker
].
Use this function in your implementation of makeClusterFunctions
to create a return value for the submitJob
function.
makeSubmitJobResult(status, batch.job.id, msg, ...)
makeSubmitJobResult(status, batch.job.id, msg, ...)
status |
[ |
batch.job.id |
[ |
msg |
[ |
... |
[ |
[SubmitJobResult
]. A list, containing
status
, batch.job.id
and msg
.
The following functions provide ways to reduce result files into either specific R objects (like vectors, lists, matrices or data.frames) or to arbitrarily aggregate them, which is a more general operation.
reduceResults( reg, ids, part = NA_character_, fun, init, impute.val, progressbar = TRUE, ... ) reduceResultsList( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val, progressbar = TRUE ) reduceResultsVector( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val ) reduceResultsMatrix( reg, ids, part = NA_character_, fun, ..., rows = TRUE, use.names = "ids", impute.val ) reduceResultsDataFrame( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val, strings.as.factors = FALSE ) reduceResultsDataTable( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val )
reduceResults( reg, ids, part = NA_character_, fun, init, impute.val, progressbar = TRUE, ... ) reduceResultsList( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val, progressbar = TRUE ) reduceResultsVector( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val ) reduceResultsMatrix( reg, ids, part = NA_character_, fun, ..., rows = TRUE, use.names = "ids", impute.val ) reduceResultsDataFrame( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val, strings.as.factors = FALSE ) reduceResultsDataTable( reg, ids, part = NA_character_, fun, ..., use.names = "ids", impute.val )
reg |
[ |
ids |
[ |
part |
[ |
fun |
[ |
init |
[ |
impute.val |
[any] |
progressbar |
[ |
... |
[any] |
use.names |
[ |
rows |
[ |
strings.as.factors |
[ |
Aggregated results, return type depends on function. If ids
is empty: reduceResults
returns init
(if available) or NULL
, reduceResultsVector
returns c()
,
reduceResultsList
returns list()
, reduceResultsMatrix
returns matrix(0,0,0)
,
reduceResultsDataFrame
returns data.frame()
.
# generate results: reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:5) submitJobs(reg) waitForJobs(reg) # reduce results to a vector reduceResultsVector(reg) # reduce results to sum reduceResults(reg, fun = function(aggr, job, res) aggr+res) reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) list(a = x, b = as.character(2*x), c = x^2) batchMap(reg, f, 1:5) submitJobs(reg) waitForJobs(reg) # reduce results to a vector reduceResultsVector(reg, fun = function(job, res) res$a) reduceResultsVector(reg, fun = function(job, res) res$b) # reduce results to a list reduceResultsList(reg) # reduce results to a matrix reduceResultsMatrix(reg, fun = function(job, res) res[c(1,3)]) reduceResultsMatrix(reg, fun = function(job, res) c(foo = res$a, bar = res$c), rows = TRUE) reduceResultsMatrix(reg, fun = function(job, res) c(foo = res$a, bar = res$c), rows = FALSE) # reduce results to a data.frame print(str(reduceResultsDataFrame(reg))) # reduce results to a sum reduceResults(reg, fun = function(aggr, job, res) aggr+res$a, init = 0)
# generate results: reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:5) submitJobs(reg) waitForJobs(reg) # reduce results to a vector reduceResultsVector(reg) # reduce results to sum reduceResults(reg, fun = function(aggr, job, res) aggr+res) reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) list(a = x, b = as.character(2*x), c = x^2) batchMap(reg, f, 1:5) submitJobs(reg) waitForJobs(reg) # reduce results to a vector reduceResultsVector(reg, fun = function(job, res) res$a) reduceResultsVector(reg, fun = function(job, res) res$b) # reduce results to a list reduceResultsList(reg) # reduce results to a matrix reduceResultsMatrix(reg, fun = function(job, res) res[c(1,3)]) reduceResultsMatrix(reg, fun = function(job, res) c(foo = res$a, bar = res$c), rows = TRUE) reduceResultsMatrix(reg, fun = function(job, res) c(foo = res$a, bar = res$c), rows = FALSE) # reduce results to a data.frame print(str(reduceResultsDataFrame(reg))) # reduce results to a sum reduceResults(reg, fun = function(aggr, job, res) aggr+res$a, init = 0)
If there are no live/running jobs, the registry will be closed and all of its files will be removed from the file system. If there are live/running jobs, an informative error is generated. The default is to prompt the user for confirmation.
removeRegistry(reg, ask = c("yes", "no"))
removeRegistry(reg, ask = c("yes", "no"))
reg |
[ |
ask |
[ |
[logical[1]
]
Mutator function for packages
in makeRegistry
.
removeRegistryPackages(reg, packages)
removeRegistryPackages(reg, packages)
reg |
[ |
packages |
[ |
[Registry
]. Changed registry.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchExport()
,
batchUnexport()
,
loadExports()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Mutator function for src.dirs
in makeRegistry
.
removeRegistrySourceDirs(reg, src.dirs)
removeRegistrySourceDirs(reg, src.dirs)
reg |
[ |
src.dirs |
[ |
[Registry
]. Changed registry.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchExport()
,
batchUnexport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceFiles()
,
setRegistryPackages()
Mutator function for src.files
in makeRegistry
.
removeRegistrySourceFiles(reg, src.files)
removeRegistrySourceFiles(reg, src.files)
reg |
[ |
src.files |
[ |
[Registry
]. Changed registry.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchExport()
,
batchUnexport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
setRegistryPackages()
Reset state of jobs in the database. Useful under two circumstances: Either to re-submit them because of changes in e.g. external data or to resolve rare issues when jobs are killed in an unfortunate state and therefore blocking your registry.
The function internally lists all jobs on the batch system and
if those include some of the jobs you want to reset, it informs you to kill them first by raising
an exception.
If you really know what you are doing, you may set force
to TRUE
to omit this sanity check.
Note that this is a dangerous operation to perform which may harm
the database integrity. In this case you HAVE to make externally sure that none of the jobs
you want to reset are still running.
resetJobs(reg, ids, force = FALSE)
resetJobs(reg, ids, force = FALSE)
reg |
[ |
ids |
[ |
force |
[ |
Vector of reseted job ids.
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
setJobFunction()
,
showLog()
,
testJob()
Replaces backward slashes with forward slashes and optionally normalizes the path.
sanitizePath(path, make.absolute = TRUE, normalize.absolute = FALSE)
sanitizePath(path, make.absolute = TRUE, normalize.absolute = FALSE)
path |
[ |
make.absolute |
[ |
normalize.absolute |
[ |
character
with sanitized paths.
Set and overwrite configuration settings
setConfig(conf = list(), ...)
setConfig(conf = list(), ...)
conf |
[ |
... |
[ |
Invisibly returns a list of configuration settings.
Other conf:
configuration
,
getConfig()
,
loadConfig()
Use this function only as last measure when there is a bug in a part of your job function and you have already computed a large number of (unaffected) results. This function allows you to fix the error and to associate the jobs with the corrected function.
Note that by default the computational state of the affected jobs is also reset.
setJobFunction(reg, ids, fun, more.args = list(), reset = TRUE, force = FALSE)
setJobFunction(reg, ids, fun, more.args = list(), reset = TRUE, force = FALSE)
reg |
[ |
ids |
[ |
fun |
[ |
more.args |
[ |
reset |
[ |
force |
[ |
Nothing.
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
showLog()
,
testJob()
Set job names.
setJobNames(reg, ids, jobnames)
setJobNames(reg, ids, jobnames)
reg |
[ |
ids |
[ |
jobnames |
[ |
Named vector of job ids.
Mutator function for packages
in makeRegistry
.
setRegistryPackages(reg, packages)
setRegistryPackages(reg, packages)
reg |
[ |
packages |
[ |
[Registry
]. Changed registry.
Other exports:
addRegistryPackages()
,
addRegistrySourceDirs()
,
addRegistrySourceFiles()
,
batchExport()
,
batchUnexport()
,
loadExports()
,
removeRegistryPackages()
,
removeRegistrySourceDirs()
,
removeRegistrySourceFiles()
Currently only supported for multicore and SSH mode.
Displays: Name of node, current load, number of running R processes, number of R processes
with more than 50
The latter counts either jobs belonging to reg
or all BatchJobs jobs if reg was not passed.
showClusterStatus(reg)
showClusterStatus(reg)
reg |
[ |
[data.frame
].
Display the contents of a log file, useful in case of errors.
Note this rare special case: When you use chunking, submit some jobs, some jobs fail,
then you resubmit these jobs again in different chunks, the log files will contain the log
of the old, failed job as well. showLog
tries to jump to the correct part
of the new log file with a supported pager.
showLog(reg, id, pager = getOption("pager"))
showLog(reg, id, pager = getOption("pager"))
reg |
[ |
id |
[ |
pager |
[ |
[character(1)
]. Invisibly returns path to log file.
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
testJob()
E.g.: How many there are, how many are done, any errors, etc.
showStatus
displays on the console, getStatus
returns an informative result
without console output.
showStatus(reg, ids, run.and.exp = TRUE, errors = 10L) getStatus(reg, ids, run.and.exp = TRUE)
showStatus(reg, ids, run.and.exp = TRUE, errors = 10L) getStatus(reg, ids, run.and.exp = TRUE)
reg |
[ |
ids |
[ |
run.and.exp |
[ |
errors |
[ |
[list
]. List of absolute job numbers. showStatus
returns them
invisibly.
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) submitJobs(reg) waitForJobs(reg) # should show 10 submitted jobs, which are all done. showStatus(reg)
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) submitJobs(reg) waitForJobs(reg) # should show 10 submitted jobs, which are all done. showStatus(reg)
Sources all files found in src.dirs
and specified via src.files
.
sourceRegistryFiles(reg, envir = .GlobalEnv)
sourceRegistryFiles(reg, envir = .GlobalEnv)
reg |
[ |
envir |
[ |
Nothing.
If the internal submit cluster function completes successfully, the retries
counter is set back to 0 and the next job or chunk is submitted.
If the internal submit cluster function returns a fatal error, the submit process
is completely stopped and an exception is thrown.
If the internal submit cluster function returns a temporary error, the submit process
waits for a certain time, which is determined by calling the user-defined
wait
-function with the current retries
counter, the counter is
increased by 1 and the same job is submitted again. If max.retries
is
reached the function simply terminates.
Potential temporary submit warnings and errors are logged inside your file
directory in the file “submit.log”.
To keep track you can use tail -f [file.dir]/submit.log
in another
terminal.
submitJobs( reg, ids, resources = list(), wait, max.retries = 10L, chunks.as.arrayjobs = FALSE, job.delay = FALSE, progressbar = TRUE )
submitJobs( reg, ids, resources = list(), wait, max.retries = 10L, chunks.as.arrayjobs = FALSE, job.delay = FALSE, progressbar = TRUE )
reg |
[ |
ids |
[ |
resources |
[ |
wait |
[ |
max.retries |
[ |
chunks.as.arrayjobs |
[ |
job.delay |
[ |
progressbar |
[ |
[integer
]. Vector of submitted job ids.
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) submitJobs(reg) waitForJobs(reg) # Submit the 10 jobs again, now randomized into 2 chunks: chunked = chunk(getJobIds(reg), n.chunks = 2, shuffle = TRUE) submitJobs(reg, chunked)
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) x^2 batchMap(reg, f, 1:10) submitJobs(reg) waitForJobs(reg) # Submit the 10 jobs again, now randomized into 2 chunks: chunked = chunk(getJobIds(reg), n.chunks = 2, shuffle = TRUE) submitJobs(reg, chunked)
Removes R scripts, log files, resource informations and temporarily stored configuration files from the registry's file directory. Assuming all your jobs completed successfully, none of these are needed for further work. This operation potentially releases quite a lot of disk space, depending on the number of your jobs. BUT A HUGE WORD OF WARNING: IF you later notice something strange and need to determine the reason for it, you are at a huge disadvantage. Only do this at your own risk and when you are sure that you have successfully completed a project and only want to archive your produced experiments and results.
sweepRegistry(reg, sweep = c("scripts", "conf"))
sweepRegistry(reg, sweep = c("scripts", "conf"))
reg |
[ |
sweep |
[ |
[logical
]. Invisibly returns TRUE
on success and FALSE
if some files could not be removed.
If the option “staged.queries” is enabled, all communication from the nodes
to the master is done via files in the subdirectory “pending” of the file.dir
.
This function checks for such files and merges the information into the database.
Usually you do not have to call this function yourself.
syncRegistry(reg)
syncRegistry(reg)
reg |
[ |
Invisibly returns TRUE
on success.
Useful for debugging. Note that neither the registry, database or file directory are changed.
testJob(reg, id, resources = list(), external = TRUE)
testJob(reg, id, resources = list(), external = TRUE)
reg |
[ |
id |
[ |
resources |
[ |
external |
[ |
[any]. Result of job. If the job did not complete because of an error, NULL is returned.
Other debug:
debugMulticore()
,
debugSSH()
,
getErrorMessages()
,
getJobInfo()
,
getLogFiles()
,
grepLogs()
,
killJobs()
,
resetJobs()
,
setJobFunction()
,
showLog()
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) if (x==1) stop("oops") else x batchMap(reg, f, 1:2) testJob(reg, 2)
reg = makeRegistry(id = "BatchJobsExample", file.dir = tempfile(), seed = 123) f = function(x) if (x==1) stop("oops") else x batchMap(reg, f, 1:2) testJob(reg, 2)
Waits for termination of jobs while displaying a progress bar containing summarizing informations of the jobs. The following abbreviations are used in the progress bar: “S” for number of jobs on system, “D” for number of jobs successfully terminated, “E” for number ofjobs terminated with an R exception and “R” for number of jobs currently running on the system.
waitForJobs( reg, ids, sleep = 10, timeout = 604800, stop.on.error = FALSE, progressbar = TRUE )
waitForJobs( reg, ids, sleep = 10, timeout = 604800, stop.on.error = FALSE, progressbar = TRUE )
reg |
[ |
ids |
[ |
sleep |
[ |
timeout |
[ |
stop.on.error |
[ |
progressbar |
[ |
[logical(1)
]. Returns TRUE
if all jobs terminated successfully
and FALSE
if either an error occurred or the timeout is reached.