Using SGI Comprehensive System Accounting facility (CSA)

The SGI Comprehensive System Accounting facility (CSA) provides data for tracking per-process resource usage, monitoring disk usage, and charging usage back to specific login accounts. If CSA is enabled on your system, LSF writes records for LSF jobs to CSA. SGI CSA writes an accounting record for each process in the pacct file, which is usually located in the /var/adm/acct/day directory. SGI system administrators then use the csabuild command to organize and present the records on a job-by-job basis. For each job running on the SGI system, LSF writes an accounting record to CSA when the job starts and when the job finishes. LSF daemon accounting in CSA starts and stops with the LSF daemon.

Setting up SGI CSA

To enable CSA accounting for LSF jobs:

  1. Enable the following parameters in /etc/csa.conf:

    • CSA_STA

    • WKMG_START

  2. Run the csaswitch command to turn on the configuration changes in /etc/csa.conf.

Information written to the pacct file

LSF writes the following information to the pacct file when a job starts and when it exits:

  • Job record type (job start or job exit)

  • Current system clock time

  • Service provider (LSF)

  • Submission time of the job (at job start only)

  • User ID of the job owner

  • LSF job name if it exists

  • Submission host name

  • LSF queue name

  • LSF external job ID

  • LSF job array index

  • LSF job exit code (at job exit only)

  • NCPUS: The number of CPUs the LSF job has been using

Viewing LSF job information recorded in CSA

Use the SGI csaedit command to see the ASCII content of the pacct file. For example:

# csaedit -P /var/csa/day/pacct -A

For each LSF job, you should see two records similar to the following, one written at job start and one at job exit (each record is wrapped across several lines here):

37   Raw-Workld-Mgmt  user1    0x19ac91ee000064f2 0x0000000000000000   0  
REQID=1771  ARRAYID=0  PROV=LSF  START=Jun  4 15:52:01  ENTER=Jun  4 15:51:49  
TYPE=INIT  SUBTYPE=START  MACH=hostA  REQ=myjob  QUE=normal
...
39   Raw-Workld-Mgmt  user1    0x19ac91ee000064f2 0x0000000000000000   0  
REQID=1771  ARRAYID=0  PROV=LSF  START=Jun  4 16:09:14  TYPE=TERM  SUBTYPE=EXIT  
MACH=hostA  REQ=myjob  QUE=normal

The REQID is the LSF job ID (1771).
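To pick out the fields of each record programmatically, the ASCII output can be split into KEY=value pairs. The following is a minimal sketch, not part of LSF or CSA: it assumes the output looks like the sample above (a record begins with a numeric index followed by Raw-Workld-Mgmt, and may wrap onto following lines). The function name parse_workload_records and the SAMPLE text are illustrative only.

```python
import re

# Assumption: a record starts with a numeric index and the record type
# shown in the sample output ("Raw-Workld-Mgmt").
RECORD_START = re.compile(r"^\d+\s+Raw-Workld-Mgmt\b")

# KEY=value pairs. A value runs until the next KEY= token or the end of the
# record, so values with embedded spaces ("Jun  4 15:52:01") stay intact.
FIELD = re.compile(r"([A-Z]+)=(.*?)(?=\s+[A-Z]+=|$)")

# Sample text copied from the csaedit output shown above.
SAMPLE = """\
37   Raw-Workld-Mgmt  user1    0x19ac91ee000064f2 0x0000000000000000   0
REQID=1771  ARRAYID=0  PROV=LSF  START=Jun  4 15:52:01  ENTER=Jun  4 15:51:49
TYPE=INIT  SUBTYPE=START  MACH=hostA  REQ=myjob  QUE=normal
...
39   Raw-Workld-Mgmt  user1    0x19ac91ee000064f2 0x0000000000000000   0
REQID=1771  ARRAYID=0  PROV=LSF  START=Jun  4 16:09:14  TYPE=TERM  SUBTYPE=EXIT
MACH=hostA  REQ=myjob  QUE=normal
"""


def parse_workload_records(text):
    """Return one dict of KEY=value fields per workload management record."""
    records = []      # each entry is the list of lines belonging to one record
    current = None    # lines of the record being collected, if any
    for line in text.splitlines():
        line = line.strip()
        if RECORD_START.match(line):
            current = [line]
            records.append(current)
        elif current is not None and "=" in line:
            # Continuation of a record that wrapped onto the next line.
            current.append(line)
    return [dict(FIELD.findall(" ".join(parts))) for parts in records]


if __name__ == "__main__":
    for rec in parse_workload_records(SAMPLE):
        print(rec["REQID"], rec["TYPE"], rec["SUBTYPE"], rec["QUE"])
```

On a real system, the text passed to the function would be the captured output of the csaedit command shown earlier rather than the SAMPLE string. Grouping the resulting dictionaries by REQID pairs each job's start record with its exit record.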