Particle Physics in Birmingham

Particle Physics Seminar

Wednesday 26th January 2022 at 13:30

(coffee served at 13:15)

Non-parametric Data-driven Background Modelling using Conditional Probabilities

Kostas Nikolopoulos (Birmingham University)


Background modelling is one of the main challenges in particle physics data analysis. Commonly employed strategies include using simulated events of the background process and fitting parametric background models to the observed data. However, reliable simulations are not always available or may be extremely costly to produce. As a result, the uncertainties arising from simulation-based background modelling, or from limited simulation statistics, are in many cases the limiting factor on the analysis sensitivity. At the same time, parametric models are limited by the a priori unknown functional form and parameter values of the modelled background. These issues become ever more pressing when studying exclusive signatures involving hadronic backgrounds, and when large datasets become available, as is the case at the LHC.

Two novel non-parametric, data-driven background modelling techniques are presented, which address these issues for a broad class of searches and measurements by providing an almost fully generic background modelling strategy. The first method uses data from a relaxed version of the event selection to estimate a graph of conditional probability density functions of the variables used in the analysis, accounting for all significant correlations. A background model is then generated by sampling events from this graph, before the full event selection is applied. In the second method, a generative adversarial network is trained to estimate the joint probability density function of the variables used in the analysis, conditioned on the variable used to blind the signal region. The network is trained in the sidebands, and the conditional probability density function is interpolated into the signal region to estimate the background.

Results demonstrating the performance of both methods are presented, and their impact on two benchmark analyses is discussed.
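To make the first method more concrete, here is a minimal sketch of the general idea, not the speaker's implementation. It assumes a hypothetical two-variable chain x1 -> x2 in place of the full graph, uses simple binned histograms as the conditional density estimator (an assumption; the method presented may use a different estimator), and runs on toy Gaussian data. The helper names (`fit_chain`, `sample_chain`) and the cut values are illustrative only. The densities are estimated on the relaxed selection, events are generated by ancestral sampling (parent first, then child given parent), and only then is the tighter selection applied.

```python
import random
from bisect import bisect_right


def make_bins(lo, hi, n):
    """Return n+1 equally spaced bin edges between lo and hi."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


def find_bin(edges, value):
    """Index of the bin containing value (out-of-range values are clamped)."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def fit_chain(events, edges1, edges2):
    """Histogram P(x1) and P(x2 | x1 bin) from (x1, x2) pairs.

    Hypothetical two-node chain x1 -> x2 standing in for the graph of
    conditional densities; a real analysis would have more nodes.
    """
    n1, n2 = len(edges1) - 1, len(edges2) - 1
    marginal = [0] * n1
    conditional = [[0] * n2 for _ in range(n1)]
    for x1, x2 in events:
        b1 = find_bin(edges1, x1)
        marginal[b1] += 1
        conditional[b1][find_bin(edges2, x2)] += 1
    return marginal, conditional


def sample_chain(marginal, conditional, edges1, edges2, n, rng):
    """Ancestral sampling: draw x1 first, then x2 given the x1 bin."""
    out = []
    for _ in range(n):
        b1 = rng.choices(range(len(marginal)), weights=marginal)[0]
        x1 = rng.uniform(edges1[b1], edges1[b1 + 1])
        row = conditional[b1]  # non-empty, since marginal[b1] > 0
        b2 = rng.choices(range(len(row)), weights=row)[0]
        x2 = rng.uniform(edges2[b2], edges2[b2 + 1])
        out.append((x1, x2))
    return out


rng = random.Random(1)
# Toy "relaxed selection" data in which x2 is correlated with x1.
relaxed = []
for _ in range(20000):
    x1 = rng.uniform(0.0, 10.0)
    relaxed.append((x1, x1 + rng.gauss(0.0, 1.0)))

edges1 = make_bins(0.0, 10.0, 20)
edges2 = make_bins(-5.0, 15.0, 40)
marginal, conditional = fit_chain(relaxed, edges1, edges2)
generated = sample_chain(marginal, conditional, edges1, edges2, 20000, rng)

# Only now apply the (hypothetical) full event selection.
selected = [(a, b) for a, b in generated if a > 5.0 and b > 6.0]
```

Because the correlation between x1 and x2 is encoded in the conditional histograms, the generated events reproduce it, and the fraction passing the tight cuts roughly matches that of the toy data. Finer binning or a smoother density estimator would trade statistical precision against resolution.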