The CERN Open Data portal is the access point to a growing range of data produced through the research performed at CERN. It disseminates the preserved output from various research activities and includes accompanying software and documentation needed to understand and analyse the data.
The portal adheres to established global standards in data preservation and Open Science: the products are shared under open licenses; they are issued with a Digital Object Identifier (DOI) to make them citable objects.
Data produced by the LHC experiments are usually categorised in four different levels (DPHEP Study Group (2009)):
The CERN Open Data portal focuses on the release of event data from levels 2 and 3. The LHC collaborations may also provide small samples of level 4 data.
All four LHC experiments have approved data preservation and access policies which state that they will make their data available after a certain embargo period. For detailed information regarding embargo periods, accessibility and preservation of LHC data of various levels, please refer to the experiments' data policies.
All datasets and other material available in this portal are minted with persistent DOIs that allow permanent linking to the records. The CERN Open Data portal endorses the FORCE11 Joint Declaration of Data Citation Principles. We therefore ask you to cite the data provided on the portal when you reuse them. To make this easier, we provide a citation recommendation for every dataset as well as other suitable output formats (such as BibTeX). Citing datasets in the reference list of your paper allows other platforms such as INSPIRE to track citations to these datasets and measure their impact.
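As a rough illustration of what a citable record looks like in practice, the sketch below turns a DOI into a resolvable link and formats a minimal BibTeX entry. The helper names, the choice of `@misc`, the field layout and the DOI `10.1234/EXAMPLE.DATA` are all assumptions made for this example; the citation recommendation shown on each dataset's record page remains the authoritative form.

```python
def doi_url(doi):
    """Return the resolver link for a DOI using the public doi.org proxy."""
    return f"https://doi.org/{doi.strip()}"


def bibtex_entry(key, authors, title, year, publisher, doi):
    """Format a minimal BibTeX @misc entry for a dataset.

    The field selection is illustrative only; it is not the portal's
    own export format.
    """
    fields = {
        "author": " and ".join(authors),
        "title": "{" + title + "}",  # extra braces preserve capitalisation
        "year": str(year),
        "publisher": publisher,
        "doi": doi,
        "url": doi_url(doi),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


# Placeholder metadata: none of these values refer to a real record.
entry = bibtex_entry(
    key="example_dataset",
    authors=["Example Collaboration"],
    title="Example dataset title",
    year=2020,
    publisher="CERN Open Data Portal",
    doi="10.1234/EXAMPLE.DATA",
)
print(entry)
```

Keeping the DOI as its own field, rather than only embedding it in a URL, is what lets citation-tracking services match the reference to the dataset.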
This portal is built around the following technologies:
Invenio is a digital repository framework that allows you to build and run your own digital library, institutional repository, multimedia archive, or research data repository on the web. Invenio technology covers all aspects of digital repository management, from document ingestion through classification, indexing, and curation up to document dissemination. The flexible data model uses a custom JSON Schema to describe data assets. Invenio is a strong advocate of open standards.
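To give a feel for how a JSON Schema can describe a data asset, here is a minimal sketch. The schema, its field names and the tiny `validate` helper are hypothetical: they are neither the portal's actual schema nor part of Invenio's API, and the helper covers only the `required` and `type` keywords rather than the full JSON Schema specification.

```python
# Hypothetical schema for a dataset record (field names are illustrative).
RECORD_SCHEMA = {
    "type": "object",
    "required": ["title", "doi", "experiment"],
    "properties": {
        "title": {"type": "string"},
        "doi": {"type": "string"},
        "experiment": {"type": "string"},
        "size_bytes": {"type": "integer"},
    },
}

# Mapping from the JSON Schema type names used above to Python types.
_JSON_TYPES = {"string": str, "integer": int}


def validate(record, schema):
    """Return a list of error messages; an empty list means the record passed.

    Only top-level 'required' and 'type' checks are implemented here.
    """
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name not in record:
            continue
        expected = _JSON_TYPES[rule["type"]]
        value = record[name]
        # bool is a subclass of int in Python, but JSON treats them as distinct.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            errors.append(f"field {name} must be of type {rule['type']}")
    return errors


good = {"title": "Example dataset", "doi": "10.1234/EXAMPLE.DATA", "experiment": "CMS"}
bad = {"title": "Example dataset", "size_bytes": "large"}
print(validate(good, RECORD_SCHEMA))
print(validate(bad, RECORD_SCHEMA))
```

A production repository would normally rely on a full JSON Schema validator instead of hand-written checks; the point here is only that the schema is plain data that can be inspected and enforced.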
CernVM is a baseline Virtual Software Appliance for the participants of LHC experiments. The Appliance represents a complete, portable and easy-to-configure user environment for developing and running LHC data analyses locally and on institutional and commercial clouds (OpenStack, Amazon EC2, Google Compute Engine), independently of Operating System platforms (Linux, Windows, macOS). The goal is to remove the need to install the experiment software and to minimise the number of platforms (compiler-OS combinations) on which experiment software needs to be supported and tested.
EOS is a disk-based service providing a low-latency storage infrastructure for physics users. EOS provides a highly scalable hierarchical namespace implementation. Data access is provided by the XRootD protocol. The main target area for the service is physics data analysis use cases, which are often characterised by many concurrent users, a significant fraction of random data access and a large file open rate.
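Files served over XRootD are addressed with `root://` URLs, in which a double slash after the host marks an absolute path on the server. The sketch below only composes such a URL; it does not open a connection. The host name `eospublic.cern.ch` and the file path are assumptions used for illustration and do not come from the text above, and `xrootd_url` is a hypothetical helper, not part of any XRootD client library.

```python
def xrootd_url(host, path, port=None):
    """Compose a root:// URL for an absolute path on an XRootD server.

    The port is omitted unless given, in which case the client falls back
    to its default.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /eos/...")
    authority = host if port is None else f"{host}:{port}"
    # The separator slash plus the leading slash of the path gives the
    # double slash that denotes an absolute path.
    return f"root://{authority}/{path}"


# Assumed host and placeholder path, for illustration only.
url = xrootd_url("eospublic.cern.ch", "/eos/opendata/example/file.root")
print(url)
```

A URL of this form can then be passed to an XRootD-aware client or analysis framework, which handles the many small random reads typical of physics analyses.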
The CERN Open Data portal is developed by the CERN Information Technology group in close collaboration with LHC experiments and many researchers in the High-Energy Physics community. For any request or submission, please email us at opendata-support@cern.ch.