3.
Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits
/ Balcas, J (Caltech) ; Bockelman, B (Nebraska U.) ; Hufnagel, D (Fermilab) ; Hurtado Anampa, K (Notre Dame U.) ; Aftab Khan, F (NCP, Islamabad) ; Larson, K (Fermilab) ; Letts, J (UC, San Diego) ; Marra da Silva, J (Sao Paulo, IFT) ; Mascheroni, M (Fermilab) ; Mason, D (Fermilab) et al.
The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing activities. The total resources at Tier-1 and Tier-2 grid sites pledged to CMS exceed 100,000 CPU cores, while another 50,000 to 100,000 CPU cores are available opportunistically, pushing the needs of the Global Pool to higher scales each year. [...]
2017 - 7 p.
- Published in : J. Phys.: Conf. Ser. 898 (2017) 052031
Fulltext: PDF;
In : 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, San Francisco, USA, 10 - 14 Oct 2016, pp.052031
4.
Exploiting CRIC to streamline the configuration management of GlideinWMS factories for CMS support
/ Dost, Jeffrey (UC, San Diego) ; Mascheroni, Marco (UC, San Diego) ; Andreeva, Julia (CERN) ; Anisenkov, Alexey (Novosibirsk State U. ; Novosibirsk, IYF) ; Box, Dennis (Fermilab) ; Di Girolamo, Alessandro (CERN) ; Haleem, Saqib (NCP, Islamabad) ; Kizinevič, Edita (Vilnius U.) ; Majewski, Krista (Fermilab) ; Letts, James (UC, San Diego) et al.
GlideinWMS is a workload management and provisioning system that allows sharing computing resources distributed over independent sites. Based on the requests made by GlideinWMS frontends, a dynamically sized pool of resources is created by GlideinWMS pilot factories via pilot job submission to resource sites’ CEs. [...]
2020 - 8 p.
- Published in : EPJ Web Conf. 245 (2020) 03023
Fulltext from publisher: PDF;
In : 24th International Conference on Computing in High Energy and Nuclear Physics, Adelaide, Australia, 4 - 8 Nov 2019, pp.03023
5.
Reaching new peaks for the future of the CMS HTCondor Global Pool
/ Perez-Calero Yzquierdo, Antonio Maria (Madrid, CIEMAT) ; Mascheroni, Marco (UC, San Diego) ; Acosta Flechas, Maria (Fermilab) ; Dost, Jeffrey Michael (UC, San Diego) ; Haleem, Saqib (Quaid-i-Azam U.) ; Hurtado Anampa, Kenyi Paolo (Notre Dame U.) ; Khan, Farrukh Aftab (Fermilab) ; Kizinevic, Edita (CERN) ; Peregonow, Nicholas (Fermilab)
/CMS Collaboration
The CMS experiment at CERN employs a distributed computing infrastructure to satisfy its data processing and simulation needs. The CMS Submission Infrastructure team manages a dynamic HTCondor pool, aggregating mainly Grid clusters worldwide, but also HPC, Cloud and opportunistic resources. [...]
CMS-CR-2021-023.-
Geneva : CERN, 2021 - 9 p.
- Published in : EPJ Web Conf. 251 (2021) 02055
Fulltext: PDF;
In : 25th International Conference on Computing in High-Energy and Nuclear Physics (CHEP), Online, 17 - 21 May 2021, pp.02055
6.
Effective HTCondor-based monitoring system for CMS
/ Balcas, J (Caltech) ; Bockelman, B P (Nebraska U.) ; Da Silva, J M (Rio Claro State U.) ; Hernandez, J (Madrid, CIEMAT) ; Khan, F A (NCP, Islamabad) ; Letts, J (UC, San Diego) ; Mascheroni, M (Fermilab) ; Mason, D A (Fermilab) ; Perez-Calero Yzquierdo, A (Madrid, CIEMAT ; PIC, Bellaterra) ; Vlimant, J R (Caltech)
/CMS
The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning systems, respectively. Given the scale of the global queue in CMS, the operators found it increasingly difficult to monitor the pool to find problems and fix them. [...]
2017 - 8 p.
- Published in : J. Phys.: Conf. Ser. 898 (2017) 092039
Fulltext: PDF;
In : 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, San Francisco, USA, 10 - 14 Oct 2016, pp.092039
7.
Connecting restricted, high-availability, or low-latency resources to a seamless Global Pool for CMS
/ Balcas, J (Caltech) ; Bockelman, B (Nebraska U.) ; Hufnagel, D (Fermilab) ; Hurtado Anampa, K (Notre Dame U.) ; Jayatilaka, B (Fermilab) ; Khan, F (NCP, Islamabad) ; Larson, K (Fermilab) ; Letts, J (UC, San Diego) ; Mascheroni, M (Fermilab) ; Mohapatra, A (Wisconsin U., Madison) et al.
/CMS
The connection of diverse and sometimes non-Grid enabled resource types to the CMS Global Pool, which is based on HTCondor and glideinWMS, has been a major goal of CMS. These resources range in type from a high-availability, low latency facility at CERN for urgent calibration studies, called the CAF, to a local user facility at the Fermilab LPC, allocation-based computing resources at NERSC and SDSC, opportunistic resources provided through the Open Science Grid, commercial clouds, and others, as well as access to opportunistic cycles on the CMS High Level Trigger farm. [...]
2017 - 8 p.
- Published in : J. Phys.: Conf. Ser. 898 (2017) 052037
Fulltext: PDF;
In : 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, San Francisco, USA, 10 - 14 Oct 2016, pp.052037
8.
Improving Scheduling Efficiency of a Global Multi-core HTCondor Pool in CMS
/ Letts, James (UC, San Diego)
/CMS Collaboration
Scheduling multi-core workflows in a global HTCondor pool is a multi-dimensional problem whose solution depends on the requirements of the job payloads, the characteristics of available resources, and the boundary conditions such as fair share and prioritization imposed on the job matching to resources. Within the context of a dedicated task force, CMS has increased significantly the scheduling efficiency of workflows in reusable multi-core pilots by various improvements to the limitations of the glideinWMS pilots, accuracy of resource requests, efficiency and speed of the HTCondor infrastructure, and job matching algorithms.
CMS-CR-2018-303.-
Geneva : CERN, 2018 - 9 p.
Fulltext: PDF;
In : 23rd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2018, Sofia, Bulgaria, 9 - 13 Jul 2018
9.
Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
/ Balcas, J (Vilnius U.) ; Belforte, S (INFN, Trieste ; Trieste U.) ; Bockelman, B (Nebraska U.) ; Gutsche, O (Fermilab) ; Khan, F (Quaid-i-Azam U.) ; Larson, K (Fermilab) ; Letts, J (UC, San Diego) ; Mascheroni, M (INFN, Milan Bicocca ; Milan Bicocca U.) ; Mason, D (Fermilab) ; McCrea, A (UC, San Diego) et al.
The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning system. So far we have been running several independent resource pools, but we are working on unifying them all to reduce the operational load and more effectively share resources between various activities in CMS. [...]
FERMILAB-CONF-15-604-CD.-
2015 - 7 p.
- Published in : J. Phys.: Conf. Ser. 664 (2015) 062030
IOP Open Access article: PDF; External link: FERMILABCONF
In : 21st International Conference on Computing in High Energy and Nuclear Physics, Okinawa, Japan, 13 - 17 Apr 2015, pp.062030
10.
Archival, anonymization and presentation of HTCondor logs with GlideinMonitor
/ Mambelli, Marco (Fermilab) ; Yancey, Mirica (Valparaiso U.) ; Hein, Thomas (Illinois U., Chicago)
GlideinWMS is a pilot framework to provide uniform and reliable HTCondor clusters using heterogeneous resources. The Glideins are pilot jobs that are sent to the selected nodes, test them, set them up as desired by the user jobs, and ultimately start an HTCondor schedd to join an elastic pool. [...]
FERMILAB-CONF-21-060-SCD.-
2021 - 8 p.
- Published in : EPJ Web Conf. 251 (2021) 02012
Fulltext: document - PDF; fermilab-conf-21-060-scd - PDF;
In : 25th International Conference on Computing in High-Energy and Nuclear Physics (CHEP), Online, 17 - 21 May 2021, pp.02012