[[ucc_batch]]
Batch processing
----------------

The 'batch' command lets you run many jobs without starting UCC 
separately for each one. You can control how many jobs go to which site. 
This makes job processing efficient, at the cost of some load on the 
client machine. If you need to take the client offline while jobs are 
running, consider using the workflow system instead, which also supports 
efficient high-throughput processing.

Assume you have a set of jobs in UCC's job description format 
(xref:ucc_jobdescription[]) stored in a directory 'jobs', and that 
the output should go to a directory 'out'. You can run them all 
with a single UCC invocation as follows:

----------------
ucc batch -i jobs -o out
----------------

As job files, UCC will accept files ending in ".u", ".jsdl" or ".xml".
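
Before starting a large batch, it can be useful to check how many files in the input directory carry one of these endings. The following is a minimal sketch, not part of UCC: the function name `count_job_files` is hypothetical, and it assumes the endings are matched case-sensitively.

```shell
# Hypothetical helper (not part of UCC): count the files in a directory
# whose names end in .u, .jsdl or .xml.
count_job_files() {
    dir="${1:-jobs}"
    count=0
    for f in "$dir"/*.u "$dir"/*.jsdl "$dir"/*.xml; do
        # An unmatched glob pattern stays literal, so check for a real file.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

echo "Candidate job files: $(count_job_files jobs)"
```

A result of zero before a run usually means the input directory or the file endings are not what you expected.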

=== Options

You can run in "follow" mode, where UCC watches the input directory and
processes new files as they arrive.

----------------
ucc batch -f -i jobs -o out
----------------
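
When adding job files to a directory that is being watched in "follow" mode, it may be safer to make each file appear in one step, so that it is not picked up while still being written. The sketch below is an assumption about good practice rather than a documented UCC requirement: the helper name `stage_job` and the `.part` suffix are hypothetical, and it relies on UCC ignoring files that do not end in one of the accepted job file endings.

```shell
# Hypothetical helper (not part of UCC): place a job file into a watched
# directory. The copy is written under a temporary ".part" name first and
# then renamed, so the final name only appears once the file is complete.
stage_job() {
    src="$1"
    dest_dir="$2"
    name=$(basename "$src")
    cp "$src" "$dest_dir/$name.part" &&
        mv "$dest_dir/$name.part" "$dest_dir/$name"
}

# Usage (paths are illustrative):
#   stage_job /tmp/new-job.u jobs
```

The rename happens within the same directory, which keeps it a single file system operation.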

UCC can also process JSDL files. To batch-process them, use the "-j" option:

----------------
ucc batch -j -i jobs -o out
----------------


=== Performance tuning options

Getting the most performance out of UCC and your Grid installation can be a challenging task. 
Sending too many jobs to a site can decrease throughput, and sometimes the client machine 
is the limiting factor.

You should experiment a bit to get the best performance for your specific setup. UCC has many
options available for tuning. Here is an overview.


.Tuning options for the UCC batch mode
[options="header"]
|==============================
|Option (short and long form) | Description
| -K,--keep | Do not delete finished jobs on the server. By default, finished jobs are destroyed.
| -m,--max <MaxRunningJobs> | Limit on jobs submitted by UCC at one time (default: 100)
| -t,--threads <NumThreads> | Number of threads to be used for processing (default: 4)
| -u,--update <UpdateInterval> | Minimum time in milliseconds between status requests on a single job (default: 1000)
| -R,--noResourceCheck | Do not check if the necessary application is available on the target system (will increase performance a bit)
| -X,--noFetchOutcome | Do not fetch standard output and error
| -S,--submitOnly | Only submit the jobs, do not wait for them to finish
| -M,--maxNewJobs | Limit the number of job submissions (default: 100)
| -s,--sitename | Specify which site to use
| -W,--siteWeights | Specify a file containing site weights
| -j,--jsdl | Assume jobs are in JSDL format instead of the default JSON .u files
|==============================
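
As an illustration of how several tuning options can be combined, the following sketch builds a single command line from a few variables. The values are purely illustrative and not recommendations; the variable names are hypothetical. The snippet only prints the command, so that you can inspect it before running it yourself.

```shell
# Illustrative values only; tune them for your own setup.
MAX_RUNNING=50    # -m: limit on jobs submitted by UCC at one time
THREADS=8         # -t: number of threads used for processing
UPDATE_MS=5000    # -u: minimum milliseconds between status requests per job

# -R skips the check for the application on the target system.
UCC_CMD="ucc batch -i jobs -o out -m $MAX_RUNNING -t $THREADS -u $UPDATE_MS -R"

# Dry run: print the command instead of executing it.
echo "$UCC_CMD"
```

Changing one value at a time and comparing the overall run time makes it easier to see which setting actually affects throughput.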

=== Resource selection in batch mode

By default, the UCC batch mode selects a random site for running each job. You can modify
the selection in two ways:

  - use the "-s" option or a "Site: <sitename>," entry in the job file to specify the site directly
  
  - use the "-W" option to specify a file containing site weights.
 
Say you have two sites, one a big cluster and the other a small cluster. To send more
jobs to the big cluster, you can use a site weights file such as the following:

-------------
#example site weights file for use with "ucc batch -W ..."

BIG-CLUSTER = 100
SMALL-CLUSTER = 10

#send no jobs to this site
BAD-CLUSTER = 0

# set default weight (for any sites not specified here)
UCC_DEFAULT_SITE_WEIGHT = 10
-------------

This tells UCC to send 10 times as many jobs to the "BIG-CLUSTER" site as to 
"SMALL-CLUSTER", and to send no jobs to "BAD-CLUSTER". All other sites get weight "10", 
i.e. the same as "SMALL-CLUSTER".  
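
To get a feeling for what a set of weights means in practice, the sketch below computes the expected share of jobs per site. It rests on two assumptions that are not confirmed here: that UCC picks a site with probability proportional to its weight, and that only the sites listed in the file are available. The file name `site-weights.properties` and the function `site_shares` are hypothetical.

```shell
# Write the example weights file (the file name is hypothetical).
cat > site-weights.properties <<'EOF'
#example site weights file for use with "ucc batch -W ..."
BIG-CLUSTER = 100
SMALL-CLUSTER = 10

#send no jobs to this site
BAD-CLUSTER = 0

UCC_DEFAULT_SITE_WEIGHT = 10
EOF

# Print the expected share of jobs for each listed site, ASSUMING the
# selection probability is proportional to the weight.
site_shares() {
    awk -F'=' '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            gsub(/[ \t]/, "", $1)
            gsub(/[ \t]/, "", $2)
            if ($1 == "UCC_DEFAULT_SITE_WEIGHT") next
            name[++n] = $1
            weight[n] = $2
            total += $2
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %.1f%%\n", name[i], (total ? 100 * weight[i] / total : 0)
        }
    ' "$1"
}

site_shares site-weights.properties

# The file would then be passed to UCC like this (not executed here):
#   ucc batch -W site-weights.properties -i jobs -o out
```

Under these assumptions, the example weights give "BIG-CLUSTER" about 90.9% of the jobs, "SMALL-CLUSTER" about 9.1%, and "BAD-CLUSTER" none.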

  
