Merged
2 changes: 1 addition & 1 deletion _episodes/02-storage-spaces.md
@@ -212,7 +212,7 @@ This should go quickly as you are not actually writing the file.

Note: if the destination for an `ifdh cp` command is a directory instead of a filename with full path, you have to add the `-D` option to the command line.

Prior to attempting the first exercise, please take a look at the full list of IFDH commands, to be able to complete the exercise. In particular, mkdir, cp, rmdir,
Prior to attempting the first exercise, please take a look at the full list of IFDH commands, to be able to complete the exercise. In particular, cp, rmdir,

**Resource:** [ifdh commands](https://cdcvs.fnal.gov/redmine/projects/ifdhc/wiki/Ifdh_commands)

12 changes: 8 additions & 4 deletions _episodes/03-data-management.md
@@ -77,7 +77,11 @@ If you want to process data using the full power of DUNE computing, you should t

## How to find and access official data

### What is metacat?
{% include OfficialDatasets_include.md %}

You can also query the [metacat][metacat] and [rucio][rucio] catalogs yourself. Metacat contains information about file content and official datasets, while rucio stores the physical locations of those files. Files should have entries in both catalogs. Generally you ask metacat first to find the files you want and then ask rucio for their locations.

## What is metacat?

Metacat is a file and dataset catalog - it allows you to search for files and datasets that have particular attributes and understand their provenance, including details on all of their processing steps.
It also allows for querying jointly the file catalog and the DUNE conditions database.
@@ -243,7 +247,7 @@ Total size: 17553648200600 (17.554 TB)

<!-- To look at all the files in that run you need to use XRootD - **DO NOT TRY TO COPY 4 TB to your local area!!!*** -->

## Official datasets <a name="Official_Datasets"></a>
<!-- ## Official datasets <a name="Official_Datasets"></a>

The production group makes official datasets, which are sets of files that share important characteristics such as experiment, data_tier, data_stream, processing version and processing configuration.

@@ -335,8 +339,8 @@ fardet-vd:fardet-vd__full-reconstructed__v09_81_00d02__reco2_dunevd10kt_anu_1x8x
You can also do keyword/value queries like the ones above using the Other tab on the web-based Data Catalog.

![Full query search](../fig/otherquery.png){: .image-with-shadow }


-->
### What describes a dataset?

Let's look at the metadata describing that anti-neutrino dataset; the `-j` option means JSON output.
6 changes: 6 additions & 0 deletions _extras/OfficialDatasets.md
@@ -0,0 +1,6 @@
---
title: Official Datasets
permalink: OfficialDatasets
---

{% include OfficialDatasets_include.md %}
94 changes: 94 additions & 0 deletions _includes/OfficialDatasets_include.md
@@ -0,0 +1,94 @@

## Official datasets <a name="Official_Datasets"></a>

The production group makes official datasets, which are sets of files that share important characteristics such as experiment, data_tier, data_stream, processing version and processing configuration. Often all you need is an official dataset.

See [DUNE Physics Datasets](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=29787&filename=DUNEdataset_v1.pdf) for a detailed description.

### Fast web catalog queries

You can do fast string queries based on keywords embedded in the dataset name.

Go to [dunecatalog](https://dune-tech.rice.edu/dunecatalog/) and log in with your services password.

Choose your apparatus (Far Detector for example), use the category key to further refine your search and then type in keywords. Here I chose the `Far Detectors` tab and the `FD-VD` category from the pulldown menu.

![Fast keyword search](../fig/keywordquery.png){: .image-with-shadow }

If you click on a dataset you can see a sample of the files inside it.


You can find a more detailed tutorial for the dunecatalog site at:
[DUNE Catalog Tutorial](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=33738&filename=DUNE%20Catalog%20Presentation.pdf&version=2)



### Command line tools and advanced queries

You can also explore and find the right dataset on the command line by using metacat dataset keys.

First you need to know your namespace and then explore within it.

~~~
metacat namespace list # find likely namespaces
~~~
{: .language-bash}

There are official-looking ones like `hd-protodune-det-reco` and ones for users doing production testing like `schellma`. The default for general use is `usertests`.

Creation of namespaces by non-privileged users is currently disabled. A tool is in progress that will automatically make one namespace for each user.

### metacat web interface

Metacat also has a web interface that is useful for exploring file parentage: [metacat gui](https://metacat.fnal.gov:9443/dune_meta_prod/app/gui)

### Example of finding reconstructed Monte Carlo

Let's look for some reconstructed Monte Carlo from the VD far detector.

~~~
metacat query "datasets matching fardet-vd:*official having core.data_tier=full-reconstructed"
~~~
{: .language-bash}

Lots of output ... looks like there are 2 types of official ones - let's get "v2"

~~~
metacat query "datasets matching fardet-vd:*v2_official having core.data_tier=full-reconstructed"
~~~
{: .language-bash}

There are then several different generators. Let's explore the reconstructed simulation of the vertical drift far detector made with one of them, by matching on the generator fcl file name.

~~~
metacat query "datasets matching fardet-vd:*v2_official having core.data_tier=full-reconstructed and dune_mc.gen_fcl_filename=prodgenie_nu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg.fcl"
~~~
{: .language-bash}

Ok, found the official neutrino beam dataset:

~~~
fardet-vd:fardet-vd__full-reconstructed__v09_81_00d02__reco2_dunevd10kt_nu_1x8x6_3view_30deg_geov3__prodgenie_nu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg__out1__v2_official
~~~
{: .output}
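
The dataset name packs several of those shared characteristics into fields separated by double underscores. Here is a minimal sketch of pulling them apart in bash, assuming the `namespace:name` form and the `__` separator seen in the name above. What each field means is inferred from this one example, not taken from a documented naming convention.

```bash
#!/usr/bin/env bash
# Dataset identifier copied from the query output above.
did="fardet-vd:fardet-vd__full-reconstructed__v09_81_00d02__reco2_dunevd10kt_nu_1x8x6_3view_30deg_geov3__prodgenie_nu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg__out1__v2_official"

namespace="${did%%:*}"   # text before the first colon
name="${did#*:}"         # text after the first colon

# Split the name on "__", one field per line.
# Assumption: "__" never occurs inside a field.
fields="$(printf '%s\n' "$name" | awk -F'__' '{ for (i = 1; i <= NF; i++) print $i }')"

echo "namespace: $namespace"
printf '%s\n' "$fields" | nl -ba   # numbered list of the fields
```

In this example the second field matches the `core.data_tier` value used in the query and the fifth matches the generator fcl name without its `.fcl` extension, which is why keyword searches on the dataset name work.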


Now run the same query with the anti-neutrino generator fcl:

~~~
metacat query "datasets matching fardet-vd:*v2_official having core.data_tier=full-reconstructed and dune_mc.gen_fcl_filename=prodgenie_anu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg.fcl"
~~~
{: .language-bash}

And the anti-neutrino dataset:

~~~
fardet-vd:fardet-vd__full-reconstructed__v09_81_00d02__reco2_dunevd10kt_anu_1x8x6_3view_30deg_geov3__prodgenie_anu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg__out1__v2_official
~~~
{: .output}
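
The neutrino and anti-neutrino queries above differ only in the generator fcl name. Here is a minimal sketch that builds both query strings from one template. `build_query` is a hypothetical helper name introduced here, not part of metacat, and the script only prints the command when `metacat` is not on the PATH.

```bash
#!/usr/bin/env bash
# Build the metacat query used above for a given beam mode ("nu" or "anu").
# build_query is a hypothetical helper, not a metacat feature.
build_query() {
  local mode="$1"
  local fcl="prodgenie_${mode}_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg.fcl"
  printf '%s' "datasets matching fardet-vd:*v2_official having core.data_tier=full-reconstructed and dune_mc.gen_fcl_filename=${fcl}"
}

for mode in nu anu; do
  query="$(build_query "$mode")"
  if command -v metacat >/dev/null 2>&1; then
    # Real lookup; needs metacat to be set up and authenticated.
    metacat query "$query"
  else
    # Dry run: show the command that would be issued.
    echo "metacat query \"$query\""
  fi
done
```

Keeping the query in one place makes it harder for the two beam modes to drift apart if you later change the data tier or the dataset pattern.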



### You can use the web data catalog to do advanced searches

You can also do keyword/value queries like the ones above using the Other tab on the web-based Data Catalog.

![Full query search](../fig/otherquery.png){: .image-with-shadow }