Bridging the gap between
simulations and LST-1 data
using domain adaptation

GammaLearn status update

Author: Michaël Dell'aiera (LAPP, LISTIC), michael.dellaiera@lapp.in2p3.fr
Under the supervision of: Thomas Vuillaume (LAPP), Alexandre Benoit (LISTIC)
06-03-2024, CTA France

Introduction

Contextualisation


**[GammaLearn](https://purl.org/gammalearn):** Collaboration between LAPP and LISTIC

* Fosters innovative methods in AI for CTA → Explore and evaluate the added value of deep learning

* Current main application is LST-1 → Offline analysis → R&D for online analysis: DIRECTA project funded by the ANR

* Code and documentation on [https://gitlab.in2p3.fr/gammalearn/gammalearn](https://gitlab.in2p3.fr/gammalearn/gammalearn)

GammaLearn workflow


The detection workflow in accordance with γ-PhysNet.

The γ-PhysNet neural network architecture


γ-PhysNet architecture: DOI: 10.5220/0010297405340544

Results on real data: Crab Nebula


Mrk501 event map by gammalearn

| Reconstruction | Significance | Background counts |
|---|---|---|
| γ-PhysNet | 12.5 σ | 302 |
| γ-PhysNet + Poisson noise | 14.3 σ | 317 |

* NSB training MC ≠ NSB real data
* Matching NSB with the addition of Poisson noise to DL1 images
* Increased significance with NSB matching → Sensitivity to NSB level
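The NSB matching step above can be illustrated with a minimal sketch, assuming a DL1 image is a flat list of pixel charges in photoelectrons and that the same Poisson rate is applied to every pixel. The function names, the example charges and the seed are hypothetical; the rate 0.46 is the value quoted in the MC+P(0.46) setup of these slides. How GammaLearn actually injects the noise (per-pixel rates, mean subtraction, etc.) is not stated here.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def add_poisson_noise(image, lam, rng):
    """Return a copy of the image with independent Poisson(lam) noise added to each pixel."""
    return [charge + sample_poisson(lam, rng) for charge in image]


rng = random.Random(0)              # fixed seed, for reproducibility only
image = [0.0, 1.5, 12.0, 3.2]       # hypothetical pixel charges (p.e.)
noisy = add_poisson_noise(image, lam=0.46, rng=rng)
print(noisy)
```

Applying the same function to the whole simulated training set would give the MC* dataset referred to in the setups, under the assumptions listed above.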

Domain adaptation

The challenging transition from MC to real data


**Simulations are approximations of reality**

Variation of NSB
Stars in the FoV, malfunctioning pixels (pedestal image)

→ **Non-trivial direct application to real data**

Domain adaptation


**[Domain adaptation](https://arxiv.org/abs/2009.00155): Set of algorithms and techniques aiming at reducing domain discrepancies**

* Take into account unknown differences between the source (labelled, simulations) and target (unlabelled, real data) domains
* Selection, implementation and validation of [DANN](https://arxiv.org/abs/1505.07818) (focus of this talk), [DeepJDOT](https://arxiv.org/abs/1803.10081), [DeepCORAL](https://arxiv.org/abs/1607.01719)
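The core mechanism of DANN is a gradient reversal layer placed between the feature extractor and a domain classifier: it leaves the features untouched in the forward pass and flips the sign of the gradient in the backward pass, which pushes the features towards domain confusion. The sketch below shows that idea in plain Python together with the adaptation weight schedule proposed in the DANN paper. The class and function names are hypothetical, and whether γ-PhysNet-DANN uses this exact schedule is an assumption.

```python
import math


def dann_lambda(progress, gamma=10.0):
    """Schedule from the DANN paper: 0 at the start of training, close to 1 at the end.

    progress is the training progress in [0, 1].
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=dann_lambda(progress=0.5))
features = [0.2, -1.0, 3.0]                    # hypothetical feature vector
forwarded = grl.forward(features)              # unchanged features
reversed_grad = grl.backward([1.0, 1.0, 1.0])  # gradient from the domain classifier, sign flipped
print(forwarded, reversed_grad)
```

In a deep learning framework the same behaviour would be written as a custom autograd operation; the plain Python version only illustrates the sign flip.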

Domain confusion

Validation pipeline of our approach


**Validation of the methods**
* Controlled perturbations on the simulated (labelled) datasets
  * Variation of the NSB
  * Label shift (gamma/proton ratio)
* Source = **MC**, Target = **MC+Poisson(λ)**
→ Validation with figures of merit: [Published](https://arxiv.org/abs/2308.12732)
**Tests on real telescope acquisitions**
* Source = **MC**, Target = **Real data**
→ Detection of known gamma-ray sources
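The label shift perturbation used in the validation step can be sketched as a resampling of labelled simulated events to a chosen gamma fraction, after which the labels are dropped to mimic an unlabelled target set. This is a minimal illustration: the event representation, the function name and the 20% gamma fraction are hypothetical and are not taken from the GammaLearn code base.

```python
import random


def resample_to_gamma_fraction(events, gamma_fraction, n_events, rng):
    """Draw n_events without replacement so that gamma_fraction of them are gammas."""
    gammas = [e for e in events if e["label"] == "gamma"]
    protons = [e for e in events if e["label"] == "proton"]
    n_gamma = round(gamma_fraction * n_events)
    n_proton = n_events - n_gamma
    return rng.sample(gammas, n_gamma) + rng.sample(protons, n_proton)


# Hypothetical balanced simulation set: 100 gammas and 100 protons.
events = [{"id": i, "label": "gamma"} for i in range(100)]
events += [{"id": 100 + i, "label": "proton"} for i in range(100)]

rng = random.Random(0)
target = resample_to_gamma_fraction(events, gamma_fraction=0.2, n_events=50, rng=rng)

# The target domain is unlabelled during training, so the labels are removed.
unlabelled_target = [{"id": e["id"]} for e in target]
print(len(unlabelled_target))
```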

Results on MC simulations

γ-PhysNet applied to simulations: Setup


Best scenario: Simulations = Real data

| Split | Domain | Labels | Data | Ratio |
|---|---|---|---|---|
| Train | Source | Labelled | MC | 50%/50% |
| Test | | Unlabelled | MC | 50%/50% |

Worst scenario: Simulations ≠ Real data, no NSB matching

| Split | Domain | Labels | Data | Ratio |
|---|---|---|---|---|
| Train | Source | Labelled | MC | 50%/50% |
| Test | | Unlabelled | MC+P(0.46) (MC*) | 50%/50% |

γ-PhysNet applied to simulations: Results


Domain adaptation applied to simulations: Setup


Domain adaptation (DANN): Noise with label shift (real case)

| Split | Domain | Labels | Data | Ratio |
|---|---|---|---|---|
| Train | Source | Labelled | MC | 50%/50% |
| Train | Target | Unlabelled | MC+P(0.46) (MC*) | 100% |
| Test | | Unlabelled | MC+P(0.46) (MC*) | 50%/50% |

γ-PhysNet-DANN applied to simulations: Results


Results on real data

Domain adaptation applied to the Crab: Setup


Domain adaptation w/o NSB matching

| Split | Domain | Labels | Data | Ratio |
|---|---|---|---|---|
| Train | Source | Labelled | MC | 50%/50% |
| Train | Target | Unlabelled | Real data | 1γ for > 1000p |
| Test | | Unlabelled | Real data | 1γ for > 1000p |

Domain adaptation w/ NSB matching

| Split | Domain | Labels | Data | Ratio |
|---|---|---|---|---|
| Train | Source | Labelled | MC+P(λ) (MC*) | 50%/50% |
| Train | Target | Unlabelled | Real data | 1γ for > 1000p |
| Test | | Unlabelled | Real data | 1γ for > 1000p |

Dataset details



| Crab runs | 6892 | 6893 | 6894 | 6895 |
|---|---|---|---|---|
| Zenith angle | 16.1° | 20.3° | 27.9° | 32.4° |
| Light pollution | 1.94 pe | 1.81 pe | 1.64 pe | 1.60 pe |

* Heterogeneity within a run

Methods procedures



| Methods | lstchain | γ-PhysNet | γ-PhysNet-DANN |
|---|---|---|---|
| Input | Cleaned images | All the pixels | All the pixels |
| Training data | MC* | MC, MC* | MC & Crab, MC* & Crab |

* 2 runs ≈ 20 million events; 1 million Crab events (5%) are used as target training data for the domain adaptation

Application to Crab data: Results


Conclusion

Conclusion & Perspectives


  • Novel technique to solve the MC vs real data discrepancy
    • Tested on MC, in different settings (NSB and label shift)
    • Crab data, both moonlight and no moonlight conditions
  • γ-PhysNet strongly affected by moonlight / NSB level in MC vs observations
  • The benefits of domain adaptation are not well established yet
    • Advantage demonstrated on MC data with different levels of NSB
    • Results on par with γ-PhysNet on real data
    • Maybe there is nothing more to gain after correcting for NSB level? - Currently testing this hypothesis
    • Domain adaptation vs Data adaptation: The key may be both
  • Submitted to ADASS 2023
  • Stability of the model: Uncertainty of the model & uncertainty over data sampling

Acknowledgments


- This project is supported by the facilities offered by the Univ. Savoie Mont Blanc - CNRS/IN2P3 MUST computing center
- This project was granted access to the HPC resources of IDRIS under the allocation 2020-AD011011577 made by GENCI
- This project is supported by the computing and data processing resources from the CNRS/IN2P3 Computing Center (Lyon - France)
- We gratefully acknowledge the support of the NVIDIA Corporation with the donation of one NVIDIA P6000 GPU for this research.
- We gratefully acknowledge financial support from the agencies and organizations listed [here](https://www.cta-observatory.org/consortium_acknowledgment).
- This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 653477
- This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824064

γ-PhysNet + DANN


γ-PhysNet + DANN conditional

