Fermi CC IN2P3 resources management policy
The resources allocated by CC IN2P3 to the Fermi LAT collaboration are managed by a privileged user who acts as the point of contact between the CC technical teams and the Fermi users. This user is called a "Czar", and his/her role is precisely defined by CC, as described here.
The mailing list GLASTCCLYON-L@in2p3.fr, hosted at CC, was created to ease communication among the users of the glast unix group, and between the czar and the group. This list is also managed by the czar.
The way the Fermi CC resources are organized and managed has been decided by the group, as described here **broken link, to be fixed**.
CC IN2P3 provides different types of AFS spaces. Each user has a $HOME_DIR. In addition, each experiment is granted a $THRONG_DIR and a $GROUP_DIR. Only the $THRONG_DIR and the $HOME_DIR are backed up by the CC teams.
General documentation from CC (in French and English) on AFS $GROUP_DIR management can be found here.
For the Fermi collaboration, the most important AFS space is the $GROUP_DIR, which has a size of 170 GB and is divided into two identical partitions.
The group has decided to organize this space as follows:
- Each user has a private space of 200 MB. In some particular cases, a larger space has been allocated.
- Larger spaces have been reserved for the so-called activities: Pipeline and Catalog (20000 MB each).
- The "ground" directory (40000 MB) reproduces the structure of the "ground" directory at SLAC. ACL permissions have been set so that every member of the group can write to some parts of it (the releases subdirectory, for example).
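As a rough check of the allocation above, the following Python sketch sums the reserved areas and estimates how many standard 200 MB user spaces fit in what is left. It assumes 1 GB = 1000 MB, ignores the users who were granted a larger space, and ignores the split into two partitions; the function and variable names are illustrative only.

```python
# Rough budget of the $GROUP_DIR allocation described above.
# Assumption: 1 GB = 1000 MB (the page does not say which convention CC uses).
GROUP_DIR_MB = 170 * 1000

# Reserved areas, in MB, as listed above.
reserved_mb = {
    "Pipeline": 20000,
    "Catalog": 20000,
    "ground": 40000,
}
USER_SPACE_MB = 200  # standard private space; larger exceptions are ignored


def afs_budget(total_mb, reserved, user_space_mb):
    """Return (MB left after reservations, number of standard user spaces that fit)."""
    remaining = total_mb - sum(reserved.values())
    return remaining, remaining // user_space_mb


remaining_mb, max_users = afs_budget(GROUP_DIR_MB, reserved_mb, USER_SPACE_MB)
print(f"{remaining_mb} MB left, room for about {max_users} standard user spaces")
```

Under these assumptions the reserved areas take 80000 MB, leaving 90000 MB for private spaces.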
In order to avoid I/O saturation of the AFS servers, the files needed by Pipeline jobs have been spread over the two partitions. The scripts needed to configure the jobs and some ancillary data are located in partition 2 (/afs/in2p3.fr/group/glast/Pipeline/PipelineConfig), whereas external libraries and releases are in partition 1 ($GROUP_DIR/ground).
The SPS storage space is managed by a GPFS system. The Fermi collaboration has been allocated 4 TB.
In order to manage this space correctly, it has been decided to separate the space used by individual users from the space needed for global activities:
- A directory /sps/glast/users has been created, where each user has a 60 GB space.
- 250 GB have been allocated to each of "data" and "catalog", and 500 GB to "Pipeline2".
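To see how close a directory is to its allocation, a user could sum the file sizes below it and compare the total with the 60 GB figure given above. The sketch below is one way to do that in Python. It assumes the allocation is counted in decimal units (60 * 10^9 bytes), which may differ from how GPFS actually accounts for space, and the function names are hypothetical.

```python
import os

# Assumption: 60 GB counted in decimal units; GPFS accounting may differ.
USER_QUOTA_BYTES = 60 * 10**9


def directory_usage(path):
    """Sum the sizes of regular files under path (symbolic links are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def usage_fraction(path, quota_bytes=USER_QUOTA_BYTES):
    """Return the space used under path as a fraction of quota_bytes."""
    return directory_usage(path) / quota_bytes
```

A call such as `usage_fraction("/sps/glast/users/<login>")`, with `<login>` replaced by a real user name, returns a value near 1.0 when the directory is almost full.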
Status of AFS and GPFS resources
The details of the status of the various resources available for the GLAST Project group at IN2P3 can be found here
Examples of AFS and GPFS commands
HPSS tape storage
There is no storage limit. The only restriction on the use of this storage system is to avoid storing files that are too small: data are retrieved by a mechanical robot, and too many files can degrade its performance. The definition of "too small" evolves with time, so it is best to check the documentation provided by CC, here.
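One common way to respect this restriction is to bundle small files into a single archive before sending them to tape. The sketch below illustrates the idea with Python's standard `tarfile` module. The size threshold is a hypothetical placeholder (the real value is the one given in the CC documentation), and the function name is illustrative.

```python
import os
import tarfile

# Hypothetical threshold: the real value is defined in the CC documentation.
SMALL_FILE_BYTES = 100 * 1024**2


def bundle_small_files(src_dir, archive_path, threshold=SMALL_FILE_BYTES):
    """Pack the regular files directly under src_dir that are smaller than
    threshold into one uncompressed tar archive; return the packed names.

    archive_path should be outside src_dir so the archive does not pack itself.
    """
    packed = []
    with tarfile.open(archive_path, "w") as tar:
        for name in sorted(os.listdir(src_dir)):
            full = os.path.join(src_dir, name)
            if os.path.isfile(full) and os.path.getsize(full) < threshold:
                tar.add(full, arcname=name)
                packed.append(name)
    return packed
```

The resulting archive can then be stored as one large file, while files above the threshold are stored individually.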
Since 2007, the Fermi LAT collaboration has been using the Anastasie batch farm, which consists of pooled workers shared among different experiments (see details here). Two batch management systems currently coexist, GridEngine/SGE and BQS, but BQS will be abandoned at the end of 2011.
4 million HS06 have been allocated to Fermi for 2011 (1 HS06 = 250 SI2). This amount is mainly dedicated to the Pipeline activity, for which an MoU between SLAC and CC guarantees the long-term availability of 600 cores (1200 since January 2011).
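For readers who still think in the older unit, the conversion factor quoted above can be applied directly. The short Python sketch below only restates that arithmetic; the function name is illustrative.

```python
SI2_PER_HS06 = 250  # conversion factor quoted above


def hs06_to_si2(hs06):
    """Convert an amount expressed in HS06 to SI2 using the factor above."""
    return hs06 * SI2_PER_HS06


allocation_2011_hs06 = 4_000_000
print(hs06_to_si2(allocation_2011_hs06))  # 1000000000
```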
The CPU request for 2012 can be found here.
CC teams use "unitary resources" to anticipate production needs, and to avoid saturation of some systems.
For the BQS batch system and glast production, two unitary resources were created to avoid saturating the SPS system: u_sps_glast and u_sps_glast_prod. They were needed to separate user production from Pipeline production. Each one defines the maximum number of jobs that can run simultaneously and the number of jobs that can be queued at each clock step. How to look up the current parameters of a unitary resource is described here.
**to be updated — SGE and not BQS anymore**
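The effect of the two limits attached to a unitary resource can be illustrated with a toy model. The Python sketch below is not the BQS scheduler: it is a simplified interpretation in which at most `admitted_per_step` jobs are started at each clock step, at most `max_running` jobs run at once, and every job lasts the same number of steps. All names and numbers are hypothetical.

```python
from collections import deque


def simulate_unitary_resource(n_jobs, max_running, admitted_per_step, job_duration):
    """Toy model: return the number of clock steps needed to finish n_jobs.

    max_running       -- cap on jobs executing simultaneously
    admitted_per_step -- cap on jobs started at each clock step
    job_duration      -- steps each job runs (identical for all jobs)
    """
    if min(max_running, admitted_per_step, job_duration) < 1:
        raise ValueError("limits and duration must be at least 1")
    waiting = n_jobs
    running = deque()  # remaining steps of each running job
    step = 0
    while waiting or running:
        step += 1
        # Start new jobs, honouring both limits.
        slots = min(admitted_per_step, max_running - len(running), waiting)
        running.extend([job_duration] * slots)
        waiting -= slots
        # Advance time by one step and drop the jobs that finished.
        running = deque(r - 1 for r in running if r > 1)
    return step
```

For example, `simulate_unitary_resource(10, max_running=4, admitted_per_step=2, job_duration=3)` shows how both caps stretch the time needed to drain a batch of ten jobs, which is the throttling that protects the SPS system.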
Statistics of the different systems
- (BQS, obsolete) Batch system statistics page
- (BQS, obsolete) BQS monitoring for u_sps_glast resource
- (BQS, obsolete) BQS monitoring for u_sps_glast_prod resource
- (BQS, obsolete) BQS monitoring for glastpro
The Fermi Pipeline at CC IN2P3
An overall diagram of the architecture of the Fermi Pipeline system at CC-IN2P3 and the "confluence" documentation can be found here:
A generic login (glastpro) has been created for Pipeline production purposes. Some privileged glast users are allowed to use it. Every production job for Pipeline is launched under this login. Instructions to launch Fermi jobs at CC IN2P3 through Pipeline are posted here.
As shown in the functional diagram, a service machine (aliased to ccglast.in2p3.fr) is provided by CC for Fermi Pipeline production. Its main role is to host a daemon that receives job submission requests for the Anastasie farm from the SLAC client machines. This daemon is the server side of a Java RMI client/server application, which is responsible for constructing both the SPS filesystem linked to each simulation stream and the production scripts.
Several tools have been built to monitor the Pipeline system. Cron jobs are launched on the service machine to run these tools periodically and to clean the related log files. Detailed "confluence" documentation of the control scripts, cron jobs and alarms can be found here.
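The log cleaning performed by such cron jobs can be pictured with a small script. The Python sketch below is not the actual control script, which is described in the documentation referenced above. It is a minimal illustration that assumes a hypothetical log directory, a ".log" suffix and a retention period in days.

```python
import os
import time


def clean_old_logs(log_dir, max_age_days, suffix=".log", now=None):
    """Delete files in log_dir that end with suffix and were last modified
    more than max_age_days ago; return the names that were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for name in sorted(os.listdir(log_dir)):
        full = os.path.join(log_dir, name)
        if (name.endswith(suffix) and os.path.isfile(full)
                and os.path.getmtime(full) < cutoff):
            os.remove(full)
            removed.append(name)
    return removed
```

A cron entry could then call a script wrapping `clean_old_logs` once a day, with the directory and the retention period chosen by the Pipeline administrators.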