# ATLAS Level-1 Calorimeter Trigger: Subsystem Tests of a Prototype Cluster Processor Module.

J. Garvey, S. Hillier, G. Mahout<sup>\*</sup>, T.H. Moye, R.J. Staley, P.J. Watkins, A. Watson

School of Physics and Astronomy, University of Birmingham, Birmingham B15 2TT, UK

R. Achenbach, P. Hanke, E-E. Kluge, K. Meier, P. Meshkov, O. Nix, K. Penno, K. Schmitt

Kirchhoff-Institut für Physik, University of Heidelberg, D-69120 Heidelberg, Germany

C. Ay, B. Bauss, A. Dahlhoff, K. Jakobs, K. Mahboubi, U. Schäfer, T. Trefzger

Institut für Physik, Universität Mainz, D-55099 Mainz, Germany

E. Eisenhandler, M. Landon, D. Mills, E. Moyse, J. Thomas

Physics Department, Queen Mary, University of London, London E1 4NS, UK

P. Apostologlou, B.M. Barnett, I.P. Brawn, A.O. Davis, J. Edwards, C. N. P. Gee, A.R. Gillman, V.J.O. Perera, W. Qian

Rutherford Appleton Laboratory, Chilton, Oxon OX11 0QX, UK

C. Bohm, S. Hellman, A. Hidvégi, S. Silverstein

Fysikum, University of Stockholm, SE-106 Stockholm, Sweden

#### Abstract

The Level-1 Calorimeter Trigger consists of a Preprocessor, a Cluster Processor (CP), and a Jet/Energy-sum Processor (JEP). The CP and JEP receive digitised trigger-tower data from the Preprocessor and produce trigger multiplicity and region-of-interest (RoI) information. The CP Modules (CPMs) are designed to find isolated electron/photon and hadron/tau clusters in overlapping windows of trigger towers. The trigger will also provide intermediate results to the data acquisition (DAQ) system for monitoring and diagnostic purposes, using Readout Driver (ROD) modules. Four full-specification prototype CPMs have been built, and real-time data tests on individual boards are presented. These modules were then integrated with other modules to build an ATLAS Level-1 Calorimeter Trigger subsystem test bench. Real-time data were exchanged between modules, and time-slice readout data were tagged and transferred to the ROD at trigger rates of up to 130 kHz. The tests were successful, and the present CPM design is close to the final production design.

## I. INTRODUCTION

At the full LHC design luminosity of $10^{34}$ cm<sup>-2</sup>s<sup>-1</sup>, there will be approximately 23 proton-proton interactions per bunch crossing. The ATLAS Level-1 trigger [1] must reduce the 1 GHz interaction rate to a trigger rate of 75 kHz for input to the Level-2 trigger. The reduction is performed by processing reduced-granularity data from the calorimeters and muon spectrometer. Potentially interesting events are selected by identifying electron/photon candidates, jets, single-hadron/tau candidates, missing transverse energy, and total transverse energy. Muon candidates are identified in a separate trigger. The Level-1 decision will be based upon a combination of these signals, and is made by the Central Trigger Processor (CTP). The total Level-1 latency is 2 $\mu$s, during which data from the entire ATLAS detector will be stored in analogue or digital pipeline memories.

This paper describes the Cluster Processor Module of the Level-1 Calorimeter Trigger system, designed to identify transverse energy clusters associated with electrons/photons and isolated hadrons/taus. After a more detailed description of the Level-1 Calorimeter Trigger system, the hardware implementation of the CPM will be briefly reviewed, as a detailed description of the board can be found in [2]. Standalone tests on real-time data will be presented, and the integration of the CPM with other modules of the Level-1 trigger will be shown.

Figure 1: The ATLAS Level-1 Calorimeter Trigger System

<sup>\*</sup> Corresponding author: gm@hep.ph.bham.ac.uk

## II. THE ATLAS LEVEL-1 CALORIMETER TRIGGER SYSTEM

As shown in figure 1, the ATLAS Level-1 Calorimeter Trigger system consists of three subsystems, namely the Preprocessor, electron/photon and tau/hadron Cluster Processor (CP), and Jet/Energy-sum Processor (JEP). The CP and JEP will receive digitised calorimeter trigger-tower (TT) data from the Preprocessor, and send trigger multiplicity information to the Central Trigger Processor via Common Merger Modules (CMM). Using Readout Driver (ROD) modules, the CP and JEP will also provide region-of-interest (RoI) information for the Level-2 trigger, and intermediate and final results to the data acquisition (DAQ) system for monitoring and diagnostic purposes.

The calorimeter trigger covers the region $|\eta| < 5$ and $\phi = 0$ to $2\pi$. On the detector, cells are combined to form trigger towers, with a reduced granularity of $\Delta\eta \times \Delta\phi = 0.1 \times 0.1$ over the region $|\eta| < 2.5$ and a variable granularity elsewhere. Analogue pulses enter the Preprocessor, where they are digitised to 10-bit precision at a frequency of 40 MHz. After Bunch-Crossing IDentification (BCID), the precise value of transverse energy for each trigger tower is produced in a look-up table. The transverse energy is an 8-bit word, giving a transverse energy scale linear up to 255 GeV. The total number of trigger towers is 7200, of which 6400 are processed by the Cluster Processor, corresponding to the region $|\eta| < 2.5$.

Each Cluster Processor Module sends the number of $e/\gamma$ and $\tau$/hadron clusters it has found, up to a maximum of 7 per set of thresholds, to two merger modules located in the same crate. The results from all four CP crates are merged and sent to the CTP to build the Level-1 decision. A similar architecture is used for the Jet Processor, where jets are identified and their multiplicities are also sent via CMMs to the CTP, together with the total and missing transverse energy. On receipt of a Level-1 request, both subsystems, the Jet and Cluster Processors, send RoI information, trigger-tower data and multiplicity information to the ROD.
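The tower count and energy range quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes uniform towers over $|\eta| < 2.5$, 64 bins in $\phi$, and one electromagnetic plus one hadronic tower at each position, none of which is spelled out in the text. All names are hypothetical.

```python
# Cross-check of the tower count and energy range quoted in the text.
# Assumptions (not stated in the text): uniform 0.1 x 0.1 towers over
# |eta| < 2.5, phi divided into 64 bins of 2*pi/64 (about 0.098), and one
# electromagnetic plus one hadronic tower at each (eta, phi) position.

ETA_BINS = 50    # -2.5 < eta < 2.5 in steps of 0.1
PHI_BINS = 64    # full circle in steps of about 0.1
LAYERS = 2       # electromagnetic and hadronic

cp_towers = ETA_BINS * PHI_BINS * LAYERS

ET_BITS = 8
max_et_count = 2**ET_BITS - 1    # largest value of an 8-bit word

print(cp_towers)       # 6400
print(max_et_count)    # 255
```

Under these assumptions the product reproduces the 6400 towers handled by the Cluster Processor, and the 8-bit word reproduces the 255 GeV ceiling if one count corresponds to 1 GeV.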

## III. CLUSTER PROCESSOR MODULE REQUIREMENTS

The CPM receives digitised data from the Preprocessor Module and sends real time cluster information to the CMM. It also provides information to the readout chain. Its requirements are as follows:

- Identify possible isolated electrons, photons and semi-hadronic $\tau$ decays.
- Calculate multiplicities of $e/\gamma$ candidates and $\tau$ candidates for different threshold conditions on $E_T$.
- Transmit these multiplicities as input to the Level-1 trigger decision.
- Transmit trigger-tower data, multiplicities and RoI coordinates to the Readout Driver modules.

## IV. THE CLUSTER PROCESSOR ALGORITHM AND ITS HARDWARE IMPLEMENTATION

The calorimeter trigger is designed to find electromagnetic clusters and isolated hadrons. A large fraction of the isolated hadron triggers will probably come from taus.

### A. The Cluster Finding Algorithm

RoIs are identified by looking inside overlapping windows of $\Delta\eta \times \Delta\phi = 0.4 \times 0.4$ in both the electromagnetic and hadronic calorimeters. These windows overlap in steps of 0.1 in $\eta$ and $\phi$ to cover the trigger phase space of the calorimeters. An RoI must be a local maximum in transverse energy, and must pass threshold requirements on regions within it as well as isolation criteria. Its $(\eta, \phi)$ coordinates, together with the list of which of the 16 sets of threshold conditions it passed, are used in the Level-2 trigger. For full details of the algorithms, see [2].
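The sliding-window idea can be illustrated with a deliberately simplified sketch. It is not the algorithm of [2]: it assumes that the electromagnetic and hadronic layers are already summed, that the candidate is the central 2 × 2 core of each 4 × 4 window, and that ties between equal neighbouring cores are broken by position. Isolation and the per-region thresholds are omitted, and all function names are hypothetical.

```python
# Simplified sliding-window sketch (not the full algorithm of [2]).
# Assumptions: grid[i][j] holds integer E_T counts with i indexing eta and
# j indexing phi; phi wraps around, eta does not; the candidate is the
# central 2x2 core of each 4x4 window; no isolation test is applied.

def core_sum(grid, i, j):
    """Sum of the 2x2 core whose lower corner is tower (i, j); phi wraps."""
    n_phi = len(grid[0])
    return sum(grid[i + di][(j + dj) % n_phi] for di in (0, 1) for dj in (0, 1))

def find_rois(grid, threshold):
    """Return (i, j) of cores above threshold that are local maxima."""
    n_eta, n_phi = len(grid), len(grid[0])
    rois = []
    # The 4x4 window spans eta rows i-1..i+2, so i runs over 1..n_eta-3.
    for i in range(1, n_eta - 2):
        for j in range(n_phi):
            e = core_sum(grid, i, j)
            if e <= threshold:
                continue
            is_max = True
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni = i + di
                    if (di, dj) == (0, 0) or not 0 <= ni <= n_eta - 2:
                        continue
                    other = core_sum(grid, ni, (j + dj) % n_phi)
                    # Asymmetric tie-break: of two equal neighbouring
                    # cores, only the earlier one in (eta, phi) survives.
                    if other > e or (other == e and (di, dj) < (0, 0)):
                        is_max = False
            if is_max:
                rois.append((i, j))
    return rois

# Toy input: a small cluster centred on the core at (2, 3).
grid = [[0] * 8 for _ in range(6)]
grid[2][3], grid[2][4], grid[3][3], grid[3][4] = 10, 5, 4, 3
print(find_rois(grid, threshold=5))
```

Stepping the window by one tower in each direction is what makes neighbouring windows overlap, and the local-maximum test is what stops the same cluster from being reported by several of them.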

### B. The Cluster Processor Chip

Despite the high density of signals that must be processed to perform the cluster-finding algorithm, FPGAs have been chosen to implement it; this required the data to be serialised and multiplexed to reduce pin counts. The chosen design is a chip capable of handling a total of eight $4 \times 4 \times 2$ windows. Eight of these so-called Cluster Processor chips populate one CPM. Because the algorithm involves overlapping data, trigger-tower data are shared:

- Between CP chips onboard
- Between CPMs, across a custom-built backplane

The limited number of inputs per chip required the data to be serialised at 160 MHz at the input of the CP chip. The same 160 MHz data are also sent to and received from the neighbouring modules through the backplane.

## V. PROTOTYPE CLUSTER PROCESSOR MODULE LAYOUT

To meet all the requirements listed in section III, the CPM has been designed with the following layout:

- 80 LVDS links at 400 Mbit/s, to collect data from the Preprocessor Modules
- 80 LVDS deserialisers, to convert the data to 10-bit parallel words at 40 MHz
- 20 Serialiser (SRL) chips, to distribute data at 160 MHz:
  - On-board, to perform the cluster-finding algorithm
  - Via the backplane, to adjacent modules
- Backplane connections, to receive data at 160 MHz from neighbouring modules
- 8 CP chips, to perform the cluster-finding algorithm
- 2 Hit Merger chips, to calculate multiplicities and transmit them to the Level-1 trigger decision via the CMM
- 2 readout controller (ROC) chips, to pipeline trigger-tower data, multiplicities and RoI coordinates and, on a Level-1 request, to send readout information to the ROD module to help build the Level-2 decision and for readout to the DAQ
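The data rates in this list are mutually consistent, as the following bookkeeping sketch shows. It uses only figures quoted above; the one added assumption is that each 160 MHz line carries one bit per clock cycle. Variable names are illustrative.

```python
# Data-rate bookkeeping for the CPM, using figures quoted in the text.
# Assumption: each 160 MHz line carries one bit per clock cycle.

LINK_RATE_BIT_S = 400_000_000    # per LVDS input link
BC_CLOCK_HZ = 40_000_000         # bunch-crossing clock
SERIAL_CLOCK_HZ = 160_000_000    # on-board and backplane serialisation
N_LINKS = 80

bits_per_bc_per_link = LINK_RATE_BIT_S // BC_CLOCK_HZ     # width of parallel word
slots_per_bc_per_line = SERIAL_CLOCK_HZ // BC_CLOCK_HZ    # time slots per crossing
module_input_gbit_s = N_LINKS * LINK_RATE_BIT_S / 1e9

n_fpgas = 20 + 8 + 2 + 2    # SRL + CP + Hit Merger + ROC

print(bits_per_bc_per_link)     # 10
print(slots_per_bc_per_line)    # 4
print(module_input_gbit_s)      # 32.0
print(n_fpgas)                  # 32
```

The 10 bits per bunch crossing per link match the 10-bit parallel words produced by the deserialisers, and the four time slots per crossing show how serialising at 160 MHz reduces the number of pins needed by a factor of four.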

What follows is a brief description of the main components of the board:

### A. Serialiser Chips (SRL)

The serialiser chip performs two tasks:

- Multiplexes and re-serialises data at 160 MHz to send to CP chips on-board and to the adjacent CPMs on either side.
- Stores data in a pipeline, waiting for a Level-1 Readout request to output them toward the readout controller

The serialiser chip has been implemented in a Xilinx VirtexE FPGA, XCV100E. There are 20 per CPM, half dedicated to the electromagnetic data, the other half to the hadronic.

### B. Cluster Processor Chip (CP)

The Cluster Processor chip executes the algorithm, which requires a bigger device than the SRL chip. The algorithm has been implemented in an XCV1000E with a total of 660 pins and 1.3 million gates. The outputs of the chip are:

- RoI coordinates
- The sets of thresholds, out of 16, that have been passed

The multiplicity for each threshold set, summed over all 8 CP chips, is counted by two extra FPGAs called Hit Mergers.
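The counting performed by the Hit Mergers can be sketched as follows. The saturation at 7 per threshold set comes from section II; the encoding of each RoI as a 16-bit word with one bit per threshold set is an assumption made for illustration, and `merge_hits` is a hypothetical name.

```python
# Sketch of per-threshold multiplicity counting with saturation.
# Assumption: each RoI is reported as a 16-bit word, bit t being set when
# threshold set t has been passed. The maximum count of 7 is from the text.

N_THRESHOLDS = 16
MAX_COUNT = 7    # saturating count per threshold set

def merge_hits(hit_words):
    """Count, for each threshold set, how many RoIs passed it (capped at 7)."""
    counts = [0] * N_THRESHOLDS
    for word in hit_words:
        for t in range(N_THRESHOLDS):
            if (word >> t) & 1:
                counts[t] = min(counts[t] + 1, MAX_COUNT)
    return counts

# Nine RoIs passing sets 0 and 1, plus one RoI passing set 2 only.
counts = merge_hits([0b011] * 9 + [0b100])
print(counts[:3])    # [7, 7, 1]
```

A saturating count keeps each multiplicity within three bits, which limits the number of backplane lines needed between the CPM and the merger modules.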

### C. Readout Controller Chip (ROC)

The readout controller has been implemented in a Xilinx XCV100E FPGA. Two devices are used: one drives the RoI information to the Level-2 system, and the other drives trigger-tower and hit-multiplicity data to the DAQ system.

### D. TTC information

Electrical Timing, Trigger and Control (TTC) information is broadcast through the backplane to each board. Each CPM receives its 40 MHz clock and TTC commands through an on-board TTCrx chip.

## VI. CLUSTER PROCESSOR MODULE PROTOTYPE

Figure 2 shows a picture of a full specification prototype CPM.



Figure 2: The Cluster Processor Module

This is a 9U board with a total of 32 FPGAs: 20 SRL chips, 8 CP chips, 2 Hit Mergers and 2 ROCs. FPGA configurations are stored in FlashRAMs. Eighty LVDS deserialisers are positioned close to the backplane input, immediately followed by the SRL chips. The largest devices are the 8 CP chips and, near the top of the front panel, the two ROCs prepare data to be output via 2 G-link connectors.

## VII. TESTING PROCEDURE

A common approach has been used both to test different areas of the board and to validate data sent downstream along the Level-1 trigger chain. A very detailed simulation package generates the expected data at different points inside the trigger system, down to the level of each individual chip. As shown in figure 3, a measurement starts by selecting the type of data to process. Test vectors are then generated and loaded into the available playback memories of the system. The simulation uses the same test vectors, and its computed output is compared with the hardware data read back from the spy memories. Any errors are recorded, and the test can be repeated to see the effect of parameters such as clock delays.



Figure 3: Schematic of the test procedure
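The loop of generating vectors, predicting the output, running the hardware and comparing can be written down as a short sketch. It does not reproduce the actual simulation package or its interfaces: `toy_simulation` and `toy_hardware` are stand-ins invented here so that the example runs without a board, and every function name is hypothetical.

```python
import random

def make_test_vectors(n_words, seed=0):
    """Pseudo-random 8-bit words standing in for trigger-tower test data."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(n_words)]

def compare(expected, observed):
    """Return (index, expected, observed) for every mismatching word."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed data differ in length")
    return [(i, e, o) for i, (e, o) in enumerate(zip(expected, observed)) if e != o]

def run_test(vectors, simulate, run_hardware):
    """Feed the same vectors to simulation and hardware, then compare."""
    expected = simulate(vectors)        # prediction from the simulation
    observed = run_hardware(vectors)    # playback -> hardware -> spy memory
    return compare(expected, observed)

# Stand-ins so that the sketch runs without hardware.
def toy_simulation(vectors):
    return [min(v, 200) for v in vectors]

def toy_hardware(vectors, bad_index=None):
    out = toy_simulation(vectors)
    if bad_index is not None:
        out[bad_index] ^= 0x01    # inject a single-bit error
    return out

vectors = make_test_vectors(64)
clean = run_test(vectors, toy_simulation, toy_hardware)
faulty = run_test(vectors, toy_simulation, lambda v: toy_hardware(v, bad_index=10))
print(len(clean), len(faulty))    # 0 1
```

Keeping the comparison separate from the data source means the same check can be repeated while a parameter such as a clock delay is scanned.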

## VIII. SERIAL LVDS INPUT

The CPM receives its data from the Preprocessor Module (PPM) in LVDS format. Data Source-Sink (DSS) modules were designed earlier by the group to allow a variety of tests on different modules (see [3]). The use of daughter cards allows several different types of data transmitters or receivers to be used. In the present test, LVDS Tx daughter boards were used to emulate the PPM. Cables are connected at the rear of the backplane, a feature of the CPM design that allows a module to be changed without uncabling. Once the data had been recovered correctly inside the spy memory of the serialiser chip, a timing investigation of the input data was performed. It consisted of delaying the 40 MHz clock that strobes the incoming data in steps of 104 ps, the minimum step provided by the TTCrx chip. The result is shown in figure 4: the data are valid over a window close to the full 25 ns clock period, with an overall transition time of 5 ns.
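A delay scan of this kind reduces to finding the widest run of error-free steps in a cyclic error profile. The sketch below shows one way to do that. The step size and clock period are those quoted above, but the error profile is synthetic (a 48-step error region chosen to span roughly 5 ns), not measured data, and the function name is hypothetical.

```python
STEP_PS = 104         # TTCrx delay step quoted in the text
PERIOD_PS = 25_000    # period of the 40 MHz clock

n_steps = PERIOD_PS // STEP_PS    # delay settings in one clock period

def widest_error_free_window(errors):
    """Return (start_step, n_steps) of the longest cyclic run of zeros."""
    n = len(errors)
    if all(e == 0 for e in errors):
        return 0, n
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for k in range(2 * n):    # go round twice to handle wrap-around
        if errors[k % n] == 0:
            if run_len == 0:
                run_start = k % n
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

# Synthetic profile: errors in 48 consecutive steps, none elsewhere.
errors = [0] * n_steps
errors[100:148] = [5] * 48

start, length = widest_error_free_window(errors)
print(n_steps, start, length, length * STEP_PS / 1000.0)
```

Because the delay is cyclic over one clock period, the run of good steps is allowed to wrap from the last setting back to the first, which is why the loop goes round the profile twice.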

In addition, a bit-error-rate test was performed on each input, and an upper limit of $10^{-13}$ per channel was measured.
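To give a feeling for what such a limit implies, the sketch below estimates how many error-free bits are needed to set an upper limit on the bit-error rate. It assumes zero observed errors and Poisson statistics; the 95% confidence level and the use of the 400 Mbit/s link rate are assumptions made here, since the text gives neither the confidence level nor the duration of the test.

```python
import math

def bits_for_ber_limit(ber_limit, confidence=0.95):
    """Error-free bits needed to claim BER < ber_limit at this confidence.

    With zero errors observed, P(0 | mu) = exp(-mu) = 1 - confidence,
    so the expected error count is mu = -ln(1 - confidence).
    """
    return -math.log(1.0 - confidence) / ber_limit

def hours_at_rate(n_bits, bit_rate=400e6):
    """Time in hours to transmit n_bits on one link at bit_rate."""
    return n_bits / bit_rate / 3600.0

n_bits = bits_for_ber_limit(1e-13)
hours = hours_at_rate(n_bits)
print(f"{n_bits:.2e} bits, {hours:.1f} h per link")
```

Under these assumptions a limit of order $10^{-13}$ corresponds to about $3 \times 10^{13}$ error-free bits, or roughly 21 hours of continuous running per link.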



Figure 4: Error profile of the data received inside the serialiser as a function of the delayed 40 MHz clock.

## IX. REAL TIME TESTING

One of the challenges for the CPM is to receive the 160 MHz data correctly, both on-board and from the backplane. Playback memories inside the serialiser chips are used to drive 160 MHz test data to the CP chips. Once the data had been correctly received inside the CP chip, a timing investigation was also performed. It consisted of delaying the clock driving the CP chip. Error checks were performed on each individual input pin of the CP chip, and the results are shown in figure 5.



Figure 5: Error profile of the data received inside the CP chips as a function of their delayed clock.

A pattern with the expected period of 6.25 ns is observed, since the input data are serialised at 160 MHz.

On-board input data exhibit an error-free timing window of 2.5 ns, which is about 1 ns wider than for the backplane data. The present PCB has not been optimised to keep the delay between tracks as short as possible, and therefore the size of the error peak is due to the spread of the components themselves rather than to noise on the tracks. It can be seen from these two plots that two separate clocks need to be provided, in order to overlap the error-free windows of the on-board data with those of the backplane data.

## X. INTEGRATION WITH A COMMON MERGER MODULE

The Common Merger Module collects the multiplicities from the individual CPMs through the backplane, and transmits the total result to the Central Trigger Processor. More information on the CMM can be found in [4]. The integration test of the CMM with the CPM checked that the hit multiplicity was correctly recovered inside the spy memory of the CMM. A timing investigation of the validity of the data was performed; the result, shown in figure 6, gives a timing window of 12 ns. To perform this scan, the clocks of the serialiser (playback data) and of the CP chip were shifted together, taking care to remain inside an error-free zone as measured earlier. To save latency, the output data are not clocked; the size of the error peak therefore has to be compared with the time needed by the hit merger logic to calculate the multiplicity for all 16 thresholds, which is close to half a 40 MHz period.



Figure 6: Timing investigation of the hit multiplicities received inside the CMM

## XI. INTEGRATION WITH A ROD

### A. Readout Frame Definition

On each Level-1 Accept (L1A) signal received by the CPM, 2 frames of data are output serially via G-link to a dedicated ROD:

- An RoI frame containing the sets of thresholds, out of 16, that have been passed by the RoI, 2 control bits, the $(\eta, \phi)$ coordinates, the Bunch Crossing Number (BCN) and a parity bit.
- A DAQ frame containing the energies of 80 trigger towers, the BCN, the hit multiplicities and a parity bit. Data from up to 5 consecutive bunch crossings (BCs) can be transferred.
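A frame of this kind can be modelled as a packed bit field protected by a parity bit. The sketch below is purely illustrative: the field order, the 8-bit coordinate width, the 12-bit BCN width and the choice of odd parity are all assumptions, since the text lists the contents of the RoI frame but not its layout. All names are hypothetical.

```python
# Hypothetical RoI frame layout (widths, order and parity are assumptions):
#   bits  0-15  threshold-set hit bits
#   bits 16-17  control bits
#   bits 18-25  RoI coordinate
#   bits 26-37  bunch-crossing number (BCN)
#   bit  38     parity (odd parity assumed)

FIELDS = (("hits", 16), ("control", 2), ("coord", 8), ("bcn", 12))
PARITY_POS = sum(width for _, width in FIELDS)

def pack_roi_frame(**values):
    """Pack the named fields into one integer and append the parity bit."""
    frame, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        frame |= value << shift
        shift += width
    ones = bin(frame).count("1")
    parity = (ones + 1) % 2    # makes the total number of ones odd
    return frame | (parity << PARITY_POS)

def parity_ok(frame):
    """True if the frame, including its parity bit, has an odd number of ones."""
    return bin(frame).count("1") % 2 == 1

def unpack_roi_frame(frame):
    """Recover the named fields from a packed frame."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (frame >> shift) & ((1 << width) - 1)
        shift += width
    return out

frame = pack_roi_frame(hits=0x0005, control=0b01, coord=0x2A, bcn=100)
print(parity_ok(frame), parity_ok(frame ^ (1 << 3)))    # True False
print(unpack_roi_frame(frame))
```

A single parity bit detects any odd number of flipped bits in the frame, which is enough to flag isolated transmission errors on the serial link without adding much to the frame length.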

### B. Readout Test Result

The readout path was tested using a DSS as the source of L1A signals. The playback memories of the DSS were loaded with a known L1A pattern, so that the simulation could predict the BCN and data values. Readout data were sent to a ROD emulator, which was a DSS fitted with a G-link Rx daughter card. Tests were performed for the RoI and DAQ paths, with different L1A rates up to 130 kHz, and with up to 5 consecutive BCs in the case of the DAQ path. All data were recovered successfully inside the spy memory of the DSS, including the BCN values. Bursts of L1As generated with the minimum required separation of 5 ticks were also processed correctly.
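The rates quoted here translate into bunch-crossing spacings as shown in the sketch below. It assumes that a "tick" is one period of the 40 MHz clock and that a separation of 5 ticks means that successive L1A tick numbers differ by at least 5; the pattern checker is a hypothetical helper, not part of the test software.

```python
BC_CLOCK_HZ = 40_000_000
L1A_RATE_HZ = 130_000
MIN_GAP_TICKS = 5    # assumed to be counted in 40 MHz clock periods

mean_gap_ticks = BC_CLOCK_HZ / L1A_RATE_HZ        # average spacing at 130 kHz
min_gap_ns = MIN_GAP_TICKS * 1e9 / BC_CLOCK_HZ    # shortest allowed spacing

def valid_l1a_pattern(ticks, min_gap=MIN_GAP_TICKS):
    """True if successive L1A tick numbers are at least min_gap apart."""
    return all(b - a >= min_gap for a, b in zip(ticks, ticks[1:]))

print(round(mean_gap_ticks, 1), min_gap_ns)
print(valid_l1a_pattern([0, 5, 10, 400]), valid_l1a_pattern([0, 3, 10]))
```

At 130 kHz the average spacing is a few hundred bunch crossings, so the burst test at the minimum separation exercises the readout far harder than the mean rate alone.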

## XII. CONCLUSION

A new CPM with an updated design is expected to appear early next year. The present tests of the CPM integrated with other Level-1 calorimeter boards at the subsystem level have been successful, and the module requires only a few design changes. The timing investigation has shown that the new PCB needs to handle the delays between tracks with care. Extra clocks will also help to cope with the different timing of the data arriving from the backplane and the data generated on board.

The present board is ready to be used for the complete Level-1 calorimeter trigger system integration test foreseen by the end of this year. The new CPM will be available for the combined test beam planned for next year.

## REFERENCES

- [1] ATLAS First Level Trigger Technical Design Report, CERN/LHCC/98-14 and ATLAS TDR-12, 30 June 1998. http://atlas.web.cern.ch/Atlas/GROUPS/DAQTRIG/TDR/tdr.html
- [2] "Prototype Cluster Processor Module for the ATLAS Level-1 Calorimeter Trigger", 8<sup>th</sup> Workshop on Electronics for LHC experiments, CERN/LHCC/2002-034, 11 October 2002, p.256
- [3] "Prototype Readout Module for the ATLAS Level-1 Calorimeter Trigger", 8<sup>th</sup> Workshop on Electronics for LHC experiments, CERN/LHCC/2001-034, 22 October 2001, p.263
- [4] "One Size Fits All: Multiple Uses of Common Modules in the ATLAS Level-1 Calorimeter Trigger", 7<sup>th</sup> Workshop on Electronics for LHC experiments, CERN/LHCC/2001-034, 22 October 2001, p.253