From 6e2501f7bd6ee73a6d1a48f44a970959faa8a760 Mon Sep 17 00:00:00 2001
From: Heidi Schellman <33669005+hschellman@users.noreply.github.com>
Date: Thu, 9 Oct 2025 13:34:38 -0700
Subject: [PATCH] move official datasets into its own file for reuse
---
_episodes/02-storage-spaces.md | 2 +-
_episodes/03-data-management.md | 12 ++--
_extras/OfficialDatasets.md | 6 ++
_includes/OfficialDatasets_include.md | 94 +++++++++++++++++++++++++++
4 files changed, 109 insertions(+), 5 deletions(-)
create mode 100644 _extras/OfficialDatasets.md
create mode 100644 _includes/OfficialDatasets_include.md
diff --git a/_episodes/02-storage-spaces.md b/_episodes/02-storage-spaces.md
index bd3bdde..8380604 100644
--- a/_episodes/02-storage-spaces.md
+++ b/_episodes/02-storage-spaces.md
@@ -212,7 +212,7 @@ This should go quickly as you are not actually writing the file.
Note, if the destination for an ifdh cp command is a directory instead of filename with full path, you have to add the "-D" option to the command line.
-Prior to attempting the first exercise, please take a look at the full list of IFDH commands, to be able to complete the exercise. In particular, mkdir, cp, rmdir,
+Prior to attempting the first exercise, please take a look at the full list of IFDH commands, to be able to complete the exercise. In particular, cp, rmdir,
**Resource:** [ifdh commands](https://cdcvs.fnal.gov/redmine/projects/ifdhc/wiki/Ifdh_commands)
diff --git a/_episodes/03-data-management.md b/_episodes/03-data-management.md
index e86160d..ac289ee 100644
--- a/_episodes/03-data-management.md
+++ b/_episodes/03-data-management.md
@@ -77,7 +77,11 @@ If you want to process data using the full power of DUNE computing, you should t
## How to find and access official data
-### What is metacat?
+{% include OfficialDatasets_include.md %}
+
+You can also query the [metacat][metacat] and [rucio][rucio] catalogs yourself. Metacat contains information about file content and official datasets; rucio stores the physical location of those files. Files should have entries in both catalogs. Generally, you ask metacat first to find the files you want and then ask rucio for their location.
+
+## What is metacat?
Metacat is a file and dataset catalog - it allows you to search for files and datasets that have particular attributes and understand their provenance, including details on all of their processing steps.
It also allows for querying jointly the file catalog and the DUNE conditions database.
@@ -243,7 +247,7 @@ Total size: 17553648200600 (17.554 TB)
-## Official datasets
+
+
### What describes a dataset?
Let's look at the metadata describing that anti-neutrino dataset: the -j means json output
diff --git a/_extras/OfficialDatasets.md b/_extras/OfficialDatasets.md
new file mode 100644
index 0000000..558bc54
--- /dev/null
+++ b/_extras/OfficialDatasets.md
@@ -0,0 +1,6 @@
+---
+title: Official Datasets
+permalink: OfficialDatasets
+---
+
+{% include OfficialDatasets_include.md %}
\ No newline at end of file
diff --git a/_includes/OfficialDatasets_include.md b/_includes/OfficialDatasets_include.md
new file mode 100644
index 0000000..da7f836
--- /dev/null
+++ b/_includes/OfficialDatasets_include.md
@@ -0,0 +1,94 @@
+
+## Official datasets
+
+The production group makes official datasets, which are sets of files that share important characteristics such as experiment, data_tier, data_stream, processing version and processing configuration. Often all you need is an official dataset.
+
+See [DUNE Physics Datasets](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=29787&filename=DUNEdataset_v1.pdf) for a detailed description.
+
+### Fast web catalog queries
+
+You can do fast string queries based on keywords embedded in the dataset name.
+
+Go to [dunecatalog](https://dune-tech.rice.edu/dunecatalog/) and log in with your services password.
+
+Choose your apparatus (Far Detector for example), use the category key to further refine your search and then type in keywords. Here I chose the `Far Detectors` tab and the `FD-VD` category from the pulldown menu.
+
+{: .image-with-shadow }
+
+If you click on a dataset you can see a sample of the files inside it.
+
+
+You can find a more detailed tutorial for the dunecatalog site at:
+[DUNE Catalog Tutorial](https://docs.dunescience.org/cgi-bin/sso/RetrieveFile?docid=33738&filename=DUNE%20Catalog%20Presentation.pdf&version=2)
+
+
+
+### Command line tools and advanced queries
+
+You can also explore and find the right dataset on the command line by using metacat dataset keys.
+
+First you need to know your namespace and then explore within it.
+
+~~~
+metacat namespace list # find likely namespaces
+~~~
+{: .language-bash}
+
+There are official-looking ones like `hd-protodune-det-reco` and ones for users doing production testing like `schellma`. The default for general use is `usertests`.
+
+Creation of namespaces by non-privileged users is currently disabled. A tool that will automatically create one namespace for each user is in development.
+
+### Metacat web interface
+
+Metacat also has a web interface, the [metacat gui](https://metacat.fnal.gov:9443/dune_meta_prod/app/gui), that is useful for exploring file parentage.
+
+### Example of finding reconstructed Monte Carlo
+
+Let's look for some reconstructed Monte Carlo from the VD far detector.
+
+~~~
+metacat query "datasets matching fardet-vd:*official having core.data_tier=full-reconstructed"
+~~~
+{: .language-bash}
+
+This produces a lot of output. It looks like there are 2 types of official datasets, so let's select the "v2" ones.
+
+~~~
+metacat query "datasets matching fardet-vd:*v2_official having core.data_tier=full-reconstructed"
+~~~
+{: .language-bash}
+
+There are still several different generators. Let's narrow the search to reconstructed simulation of the vertical drift far detector by also selecting on the generator fcl file.
+
+~~~
+metacat query "datasets matching fardet-vd:*v2_official having core.data_tier=full-reconstructed and dune_mc.gen_fcl_filename=prodgenie_nu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg.fcl"
+~~~
+{: .language-bash}
+
+Ok, found the official neutrino beam dataset:
+
+~~~
+fardet-vd:fardet-vd__full-reconstructed__v09_81_00d02__reco2_dunevd10kt_nu_1x8x6_3view_30deg_geov3__prodgenie_nu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg__out1__v2_official
+~~~
+{: .output}
+
+
+~~~
+metacat query "datasets matching fardet-vd:*v2_official having core.data_tier=full-reconstructed and dune_mc.gen_fcl_filename=prodgenie_anu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg.fcl"
+~~~
+
+And the anti-neutrino dataset:
+
+~~~
+fardet-vd:fardet-vd__full-reconstructed__v09_81_00d02__reco2_dunevd10kt_anu_1x8x6_3view_30deg_geov3__prodgenie_anu_numu2nue_nue2nutau_dunevd10kt_1x8x6_3view_30deg__out1__v2_official
+~~~
+{: .output}
+
+
+
+### Advanced searches with the web data catalog
+
+You can also do keyword/value queries like the ones above using the Other tab on the web-based Data Catalog.
+
+{: .image-with-shadow }
+