Monitoring in Allen

Overview

Monitoring in Allen is performed by dedicated monitoring threads (by default there is a single thread). After a slice of data is processed, the HostBuffers corresponding to that slice are sent to the monitoring thread at the same time as they are sent to the I/O thread for output. The flow of HostBuffers is shown below:

    graph LR
    A((HostBuffer<br>Manager))-->B[GPU thread]
    B-->C[I/O thread]
    B-->|if free|D[Monitoring thread]
    C-->A
    D-->A

To avoid excessive load on the CPU, monitoring threads do not queue HostBuffers, i.e. if the monitoring thread is already busy then new HostBuffers are immediately marked as monitored. Functionality exists within MonitorManager to reactively reduce the amount of monitoring performed (n.b. this corresponds to an increase in the monitoring_level) in response to a large number of skipped slices. This is not currently used, but it would allow monitoring to favour running some types of monitors for all slices over running all types of monitors for some slices. Additionally, less important monitors could be run on a random sub-sample of slices. The MetaMonitor provides monitoring histograms that track the numbers of successfully monitored and skipped slices as well as the monitoring level.
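
The no-queue behaviour described above can be sketched as follows. This is a minimal illustration and not the MonitorManager code: MonitoringSlot, try_begin and end are hypothetical names, and the two counters only mimic the monitored/skipped bookkeeping that MetaMonitor is described as providing.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a single monitoring thread that never queues work.
struct MonitoringSlot {
  std::atomic<bool> busy {false};
  std::atomic<std::size_t> n_monitored {0}; // slices that were actually monitored
  std::atomic<std::size_t> n_skipped {0};   // slices marked as monitored without being processed

  // Try to claim the slot for a new slice.
  // Returns true if the caller should run the monitors on this slice,
  // false if the thread was busy and the slice is skipped instead of queued.
  bool try_begin()
  {
    bool expected = false;
    if (busy.compare_exchange_strong(expected, true)) return true;
    ++n_skipped;
    return false;
  }

  // Release the slot once the monitors have finished with the slice.
  void end()
  {
    assert(busy.load() && "end() called without a matching try_begin()");
    ++n_monitored;
    busy.store(false);
  }
};
```

A caller would invoke try_begin() for each processed slice and, when it returns false, hand the HostBuffers straight back to the manager. A high ratio of n_skipped to n_monitored is the kind of signal that could drive an increase of the monitoring level.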

Monitor classes

Currently, the rate of each HLT line is monitored (RateMonitor), as are the momentum, pT and chi^2(IP) of each track produced by the Kalman filter (TrackMonitor). Further monitoring histograms can be added either to one of these classes or to a new monitoring class, as appropriate.

Additional monitors that produce histograms based on information in the HostBuffers should be added to integration/monitoring and inherit from the BufferMonitor class. The RateMonitor class provides an example of this. Furthermore, each histogram that is added must be given a unique key in MonitorBase::MonHistType.

Once a new monitoring class has been written, this may be added to the monitoring thread(s) by including an instance of the class in the vectors created in MonitorManager::init, e.g.

m_monitors.back().push_back(new RateMonitor(buffers_manager, time_step, offset));

To monitor a feature, either that feature or others from which it can be calculated must be present in the HostBuffers. For example, the features recorded by TrackMonitor depend on the buffers host_kf_tracks (for the track objects) and host_atomics_scifi (for the number of tracks in each event and the offset to the start of each event). It is important that any buffers used by the monitoring are copied from the device to the host memory and that they do not depend on runtime_options.do_check being set. Additionally, to avoid a loss of performance, these buffers must be written to pinned memory, i.e. the memory must be allocated by cudaMallocHost and not by malloc in HostBuffers::reserve.
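
To illustrate why both the track objects and the per-event counts and offsets are needed, here is a minimal sketch of walking over the tracks of one event in a flat buffer. The layout is an assumption made for illustration: EventTrackIndex and sum_for_event are hypothetical, and the real host_atomics_scifi and host_kf_tracks buffers may be organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat layout: per-event track counts plus the index of each event's first track.
struct EventTrackIndex {
  std::vector<unsigned> n_tracks; // number of tracks in each event
  std::vector<unsigned> offsets;  // index of the first track of each event in the flat buffer
};

// Sum a per-track quantity (for example pT) over the tracks of a single event.
inline float sum_for_event(const EventTrackIndex& index, const std::vector<float>& per_track, std::size_t event)
{
  assert(event < index.n_tracks.size() && event < index.offsets.size());
  const unsigned first = index.offsets[event];
  assert(first + index.n_tracks[event] <= per_track.size());
  float total = 0.f;
  for (unsigned i = 0; i < index.n_tracks[event]; ++i) {
    total += per_track[first + i];
  }
  return total;
}
```

Without the counts and offsets the flat track buffer cannot be split back into events, which is why a monitor needs every buffer it reads to be copied to the host.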

Saving histograms

All histograms may be saved by calling MonitorManager::saveHistograms. This is currently performed once after Allen has finished executing. In principle, this could be performed on a regular basis within the main loop but, for thread safety, the monitoring threads should ideally be paused while the histograms are written.

Histograms are currently written to monitoringHists.root.

Gaudi monitoring

Add to a line

A good example is in the KsToPiPiLine which I will use to demonstrate the necessary changes here.

Edit the header

There are several necessary additions to the header that monitoring will use:

  • Include the necessary header

#include "AllenMonitoring.h"
  • Add the enable_monitoring property with a default value of false. The default must be false because the Gaudi monitoring needs to be off in Allen standalone and affects the production throughput. Put

PROPERTY(enable_monitoring_t, "enable_monitoring", "Enable line monitoring", bool) enable_monitoring;

in the parameters and

Property<enable_monitoring_t> m_enable_monitoring {this, false};

in the property list.

  • Add the DeviceAccumulators struct after the line struct declaration

struct DeviceAccumulators {
  Allen::Monitoring::Histogram<>::DeviceType histogram_ks_mass;
  DeviceAccumulators(const kstopipi_line_t& algo, const Allen::Context& ctx) :
    histogram_ks_mass(algo.m_histogram_ks_mass.data(ctx))
  {}
};
  • The additional function needs to be declared in the SelectionAlgorithm struct, for example:

__device__ static void monitor(
  const Parameters& parameters,
  const DeviceAccumulators& accumulators,
  std::tuple<const Allen::Views::Physics::CompositeParticle> input,
  unsigned index,
  bool sel);
  • In the list of properties, add the histogram with its name, title, and a tuple of the number of bins, minimum, and maximum. This one will appear in the ROOT file as ks_mass, have the title m(ks), and have 100 bins between 400 and 600.

Allen::Monitoring::Histogram<> m_histogram_ks_mass {this, "ks_mass", "m(ks)", {100u, 400.f, 600.f}};
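
As a worked example of what the tuple {100u, 400.f, 600.f} implies, the sketch below maps a value to a fixed-width bin: 100 bins between 400 and 600 give a bin width of 2. This is illustrative only and is not Allen's implementation. Axis and bin_index are hypothetical names, and rejecting values outside the range is an assumption, since the treatment of underflow and overflow is not described here.

```cpp
#include <cassert>

// Hypothetical description of one histogram axis: number of bins, minimum, maximum.
struct Axis {
  unsigned n_bins;
  float min;
  float max;
};

// Return the bin index in [0, n_bins) for a value inside [min, max), or -1 otherwise.
inline int bin_index(const Axis& axis, float value)
{
  assert(axis.n_bins > 0 && axis.max > axis.min);
  if (!(value >= axis.min && value < axis.max)) return -1; // also rejects NaN
  const float width = (axis.max - axis.min) / static_cast<float>(axis.n_bins);
  const int bin = static_cast<int>((value - axis.min) / width);
  // Guard against rounding pushing a value just below max into bin n_bins.
  const int last = static_cast<int>(axis.n_bins) - 1;
  return bin < last ? bin : last;
}
```

With Axis {100u, 400.f, 600.f}, a value of 497.6 lands in bin 48, which covers the range from 496 to 498.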

Fill the histogram

The monitor function is where the histogram is filled. Increment the histogram under whatever conditions you choose (typically that the event is selected by the line, i.e. sel is true). An example of this is

__device__ void kstopipi_line::kstopipi_line_t::monitor(
  const Parameters& parameters,
  const DeviceAccumulators& accumulators,
  std::tuple<const Allen::Views::Physics::CompositeParticle> input,
  unsigned index,
  bool sel)
{
  if (sel) {
    const auto ks = std::get<0>(input);
    accumulators.histogram_ks_mass.increment(ks.m12(Allen::mPi, Allen::mPi));
  }
}
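
The quantity filled above is a two-body mass with both daughters treated as pions. Assuming that m12 returns the invariant mass of the two daughters under the given mass hypotheses (this is not stated here), the calculation can be sketched as below. Momentum, energy and two_body_mass are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical three-momentum of one daughter track.
struct Momentum {
  float px, py, pz;
};

// Energy of a particle with the given momentum under a mass hypothesis: E = sqrt(m^2 + |p|^2).
inline float energy(const Momentum& p, float mass)
{
  return std::sqrt(mass * mass + p.px * p.px + p.py * p.py + p.pz * p.pz);
}

// Invariant mass of two daughters: M^2 = (E1 + E2)^2 - |p1 + p2|^2.
inline float two_body_mass(const Momentum& p1, const Momentum& p2, float m1, float m2)
{
  assert(m1 >= 0.f && m2 >= 0.f);
  const float e = energy(p1, m1) + energy(p2, m2);
  const float px = p1.px + p2.px;
  const float py = p1.py + p2.py;
  const float pz = p1.pz + p2.pz;
  const float m_sq = e * e - (px * px + py * py + pz * pz);
  // Rounding can make m_sq slightly negative for nearly massless combinations.
  return m_sq > 0.f ? std::sqrt(m_sq) : 0.f;
}
```

Two daughters at rest give the sum of their masses, and two back-to-back daughters give the sum of their energies, which are convenient checks when validating a new mass histogram.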

Turn on the monitoring

In the configuration of the line (a file called hlt1_*_lines.py, for KsToPiPi it is hlt1_inclusive_hadron_lines.py) the new enable_monitoring property needs to be set. After this the make_kstopipi_line function now looks like

def make_kstopipi_line(long_tracks,
                      secondary_vertices,
                      pre_scaler_hash_string=None,
                      post_scaler_hash_string=None,
                      name='Hlt1KsToPiPi_{hash}',
                      enable_monitoring=True):
    number_of_events = initialize_number_of_events()

    return make_algorithm(
        kstopipi_line_t,
        name=name,
        enable_monitoring=is_allen_standalone() and enable_monitoring,
        host_number_of_events_t=number_of_events["host_number_of_events"],
        host_number_of_svs_t=secondary_vertices["host_number_of_svs"],
        dev_particle_container_t=secondary_vertices[
            "dev_multi_event_composites"],
        pre_scaler_hash_string=pre_scaler_hash_string or name + "_pre",
        post_scaler_hash_string=post_scaler_hash_string or name + "_post")

Note that it requires the is_allen_standalone flag to be true, which can be imported using

from AllenCore.configuration_options import is_allen_standalone

if it is not already in the configuration file. enable_monitoring is set to True by default here, and so every version of the KsToPiPiLine will have monitoring unless explicitly set to False. To turn on monitoring for just one version of a line, set enable_monitoring to False by default in the hlt1_*_lines.py file, and then set it to True in HLT1.py, as done by the DiMuonDrellYan line for example:

make_di_muon_drell_yan_line(
  long_tracks,
  dileptons,
  muonid,
  name="Hlt1DiMuonDrellYan",
  pre_scaler_hash_string="di_muon_drell_yan_line_pre",
  post_scaler_hash_string="di_muon_drell_yan_line_post",
  minMass=5000.,
  minTrackP=12500,
  maxChi2Corr=2.4,
  enable_monitoring=True,
  enable_tupling=enable_tupling)

Add to an algorithm

For this one I am using VeloConsolidateTracks as an example.

Edit the header

Similar edits to the header are needed:

  • Include the monitoring header

#include "AllenMonitoring.h"
  • Change the algorithm declaration to include the monitoring inputs

__global__ void velo_consolidate_tracks(
  Parameters,
  Allen::Monitoring::Histogram<>::DeviceType,
  Allen::Monitoring::AveragingCounter<>::DeviceType);
  • Add the histogram to the property list, where the fields are the same as before (name, title, and a tuple of the number of bins, minimum, and maximum)

Allen::Monitoring::Histogram<> m_histogram_n_velo_tracks {this, "n_velo_tracks_event", "n_velo_tracks_event", {1001u, -0.5f, 1000.5f}};

Pass it to the algorithm

In the Operator function, there is a global_function call to the algorithm, which should be edited to include the histogram as an input. The histogram can be accessed like so

global_function(velo_consolidate_tracks)(size<dev_event_list_t>(arguments), property<block_dim_t>(), context)(
  arguments, m_histogram_n_velo_tracks.data(context), m_velo_tracks.data(context));

Increment

  • The algorithm definition will need to be updated to reflect the additional inputs

__global__ void velo_consolidate_tracks::velo_consolidate_tracks(
  velo_consolidate_tracks::Parameters parameters,
  Allen::Monitoring::Histogram<>::DeviceType dev_number_of_tracks_histo,
  Allen::Monitoring::AveragingCounter<>::DeviceType dev_tracks_counter)
  • Then the histogram can be filled from inside the algorithm with the generated values

dev_number_of_tracks_histo.increment(event_total_number_of_tracks);

2D Histograms

Almost everything is the same for a 2D histogram, but you will need to add the second axis to the declaration,

Allen::Monitoring::Histogram2D<> m_histogram_test_2d {this, "2d", "2d title", {10u, 0.f, 100.f}, {10u, 0.f, 100.f}};

edit anywhere the type is specified to be

Allen::Monitoring::Histogram2D<>::DeviceType

and increment using both values

histo_test_2d.increment(test_val_1, test_val_2);
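
For intuition about what the two axes mean for storage, a 2D histogram can be thought of as a flat array indexed by a pair of bin numbers. The sketch below assumes row-major ordering and drops out-of-range values. Both are assumptions made for illustration rather than a description of Histogram2D, and AxisSpec, axis_bin and flat_bin are hypothetical names.

```cpp
#include <cassert>

// Hypothetical axis description: number of bins, minimum, maximum.
struct AxisSpec {
  unsigned n_bins;
  float min;
  float max;
};

// Bin index along one axis for a value in [min, max), or -1 when out of range.
inline int axis_bin(const AxisSpec& axis, float value)
{
  assert(axis.n_bins > 0 && axis.max > axis.min);
  if (!(value >= axis.min && value < axis.max)) return -1;
  const float width = (axis.max - axis.min) / static_cast<float>(axis.n_bins);
  const int bin = static_cast<int>((value - axis.min) / width);
  const int last = static_cast<int>(axis.n_bins) - 1;
  return bin < last ? bin : last;
}

// Flat index for a pair of values, with y as the slow index (row-major by assumption).
inline int flat_bin(const AxisSpec& x, const AxisSpec& y, float value_x, float value_y)
{
  const int ix = axis_bin(x, value_x);
  const int iy = axis_bin(y, value_y);
  if (ix < 0 || iy < 0) return -1;
  return iy * static_cast<int>(x.n_bins) + ix;
}
```

With two axes of {10u, 0.f, 100.f}, the pair (25, 75) falls in x bin 2 and y bin 7, giving flat index 72 out of 100 cells.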

There is not currently support for 3D histograms.

Counters

Counters follow the same pattern as histograms, with minor changes. The declaration is

Allen::Monitoring::Counter<> m_invalid_chanid {this, "n_invalid_chanid"};

where only the name is chosen. The type is

Allen::Monitoring::Counter<>::DeviceType invalid_chanid

and it can be incremented as

invalid_chanid.increment();

Please note there is also the AveragingCounter type available which is incremented using a value like

dev_n_pvs_counter.add(*tmp_number_vertices);

as an example. Then the number of entries, sum, and mean are saved whereas the standard counter only saves the number of entries.
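
The difference between the two counter types can be sketched on the host as follows. This is a simplified illustration of what each type records, not the Allen::Monitoring implementation: SimpleCounter and SimpleAveragingCounter are hypothetical names, and the real device types must also handle concurrent updates from many threads, which is omitted here.

```cpp
#include <cassert>

// Hypothetical plain counter: records only the number of entries.
struct SimpleCounter {
  unsigned long entries = 0;

  void increment() { ++entries; }
};

// Hypothetical averaging counter: records the number of entries and their sum,
// from which the mean is derived.
struct SimpleAveragingCounter {
  unsigned long entries = 0;
  double sum = 0.0;

  void add(double value)
  {
    ++entries;
    sum += value;
  }

  double mean() const
  {
    assert(entries > 0 && "mean is undefined with no entries");
    return sum / static_cast<double>(entries);
  }
};
```

Adding the values 2, 4 and 6 to the averaging counter gives 3 entries, a sum of 12 and a mean of 4, whereas three calls to increment() on the plain counter record only the 3 entries.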

Testing offline

Please note that Gaudi monitoring is only available when running with a Gaudi build from the Allen event loop. To test the histogram, set the flags --register-monitoring-counters 1 --monitoring-filename test in your command. After running, a ROOT file containing the histograms will have been created, named with the given name plus _gaudi.