Minutes 09/05/00 (Birmingham)

Present: A. Baird, Y. Fleming, S. Kolya, D. Mercer, P. Newman, D. Sankey, A. Schoening, R. Staley

Project Status and Time Schedules

Andre' reported on the outcome of the May PRC meeting. The PRC has formally approved the level 2 feasibility document. Level 1 is accepted for the time being, since there is no reason to stop it. The PRC was not totally convinced by our preliminary feasibility report, but accepted that things are not yet well defined. We are asked to report back to the October PRC, as explained below.

The time scheduling of the project was reviewed in light of recent developments. The main deadlines are:
  • October 2000 (DESY PRC): Algorithm fully simulated in Quartus. Final decision on level 1 trigger viability.
  • January 2001: First prototypes for testing in UK.
  • March 2001 (HERA Machine studies): Prototypes installed at DESY.
  • June 2001 (Luminosity begins): Start of Mass Production.
  • December 2001: Full system operational.
It was agreed that we urgently need a much more detailed time schedule (Gantt chart?), covering all aspects of the project. `Micro-milestones' should be better defined by working back from the completion date and making reasoned estimates of how long everything will take. Dave was asked to try to produce such a chart in consultation with all concerned.

Front End Module Design

It was strongly felt that design of the Front End Module should begin more or less immediately. This implies that the level 1 trigger design (fairly undefined so far) should be kept as separate as possible from the ADC and segment finding (fairly well defined). This probably implies that the generic module cannot be used for the level 1 trigger functionality.
The necessary size of FPGA and the I/O are basically known for the Q/t, segment finding etc., and much of the work can be done without detailed knowledge of the algorithm. It should therefore be possible to start on the first design / schematics already. It is estimated that this will take 6 months. Adam said that it should be possible to get going very soon, but first Brian Claxton has to finish other tasks. The end of May was given as a speculative date for Brian to become available.

Adam explained the present plan for the FEM and much discussion ensued (see section below on System Architecture). The following (at least!) became clear by the end of the meeting.
  • Input in analogue form through front of module.
  • Each module contains 30 ADCs working at 80 MHz (8 bits).
  • A farm of FPGAs sits behind the ADCs. We expect to use 1 FPGA (20k600) per triplet of wires.
  • All digital data I/O internal to the L1 FTT will be moved across the backplane.
  • An extra FPGA may be needed to control the readout etc.
  • Two parallel LVDS outputs will feed the level 1 trigger and the level 2 FTT respectively.
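As a rough cross-check on the numbers in the list above, the raw digitised data rate of one module follows directly from the ADC count, sampling rate and resolution. The short Python sketch below only multiplies those three figures; the function and variable names are illustrative and it assumes every ADC produces one 8-bit sample per 80 MHz clock with no zero suppression.

```python
# Raw digitised data rate of one Front End Module, using the figures
# quoted in the list above. Assumption: every ADC delivers one 8-bit
# sample per 80 MHz clock cycle, with no zero suppression.
N_ADCS = 30             # ADCs per module
SAMPLE_RATE_HZ = 80e6   # 80 MHz sampling
BITS_PER_SAMPLE = 8     # ADC resolution


def raw_rate_bits_per_s(n_adcs, sample_rate_hz, bits_per_sample):
    """Return the raw (unsuppressed) data rate in bits per second."""
    return n_adcs * sample_rate_hz * bits_per_sample


rate_bits = raw_rate_bits_per_s(N_ADCS, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
rate_bytes = rate_bits / 8

print(f"{rate_bits / 1e9:.1f} Gbit/s = {rate_bytes / 1e9:.1f} GByte/s per module")
```

This is the rate at the ADC outputs only; the data leaving the module after Q/t and segment finding should be much smaller.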

Algorithm Simulation

Yves and Richard have been working on simulations of various parts of the algorithm using the Altera Quartus software. There is presently a problem with the Windows NT PCs on which the software runs: they regularly crash with a parity exception when running it. This is being investigated and should not pose a long term problem.
It is still felt that the Quartus / VHDL design is likely to be relatively trivial compared to finalising the algorithm concept. Even so, at least Yves should attend a course in Quartus design etc. Yves (and Richard?) were asked to look into available courses and decide which they think would be most useful.

System Architecture

The overall system architecture was discussed at length, starting from an update of the diagram from Adam showing a possible crate layout and necessary bus connections. This scheme is based on the concept of a distributed level 1 trigger - the system is divided into 6 regions in phi and the segments within a phi region are moved over a dedicated custom backplane into a local trigger module (same card as the FEM?). The local trigger module builds a (16 x 10) Kappa-Phi histogram, which is then passed to a further card for final trigger decisions, where the full (16 x 60) histogram is built. This scheme ensures that the necessary high data flow only takes place in local regions and over short distances (we estimated a maximum of around 1 GByte / second to pass all information from one phi region to the trigger module at each bunch crossing). Connections between the different phi sectors would only be needed to pass neighbouring shift registers and the information from the edgemost phi bin for the trigger. This could be done by linking all modules within a single trigger layer (e.g. all CJC2 modules) in a ring on separate buses.
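The two-stage histogramming described above can be sketched in a few lines of Python: each of the 6 phi regions fills its own (16 x 10) Kappa-Phi histogram, and the final card concatenates them along phi into the full (16 x 60) histogram. This is only an illustration of the bookkeeping. The bin indexing, the ordering of regions in phi and all function names are assumptions, and the exchange of edge bins and shift registers between neighbouring sectors is not modelled.

```python
# Sketch of the distributed Kappa-Phi histogram: 6 local (16 x 10)
# histograms merged into one (16 x 60) histogram.
# Assumption: region r covers global phi bins r*10 .. r*10 + 9.
N_KAPPA = 16       # kappa bins
N_PHI_LOCAL = 10   # phi bins per region
N_REGIONS = 6      # phi regions


def empty_local_histogram():
    """Return a zeroed local histogram indexed as hist[kappa][phi]."""
    return [[0] * N_PHI_LOCAL for _ in range(N_KAPPA)]


def fill(hist, segments):
    """Increment one bin per segment; segments are (kappa_bin, local_phi_bin)."""
    for kappa_bin, phi_bin in segments:
        hist[kappa_bin][phi_bin] += 1
    return hist


def merge_regions(local_hists):
    """Concatenate the local histograms along phi into the full histogram."""
    if len(local_hists) != N_REGIONS:
        raise ValueError("expected one local histogram per phi region")
    full = []
    for kappa_bin in range(N_KAPPA):
        row = []
        for hist in local_hists:
            row.extend(hist[kappa_bin])
        full.append(row)
    return full


# Example: a single segment in region 2, kappa bin 3, local phi bin 4.
local_hists = [empty_local_histogram() for _ in range(N_REGIONS)]
fill(local_hists[2], [(3, 4)])
full_hist = merge_regions(local_hists)
print(len(full_hist), len(full_hist[0]), full_hist[3][24])
```

Only the 160 bin contents per region cross to the final card in this picture, which is the point of keeping the high segment data flow local.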

Though the distributed trigger idea is appealing from the point of view of minimising data movement, there are also disadvantages. In particular, it was felt that the proposed re-use of the generic module for the trigger as well as the FEM would unnecessarily delay the design of the FEM. The required custom backplane in this scenario was also felt to be a bit awkward. It was therefore decided that we should proceed directly with the FEM design and separately try to deal with the large data flows involved in accumulating all 20 MHz track segments in one place for the trigger module. We have previously thought about solutions involving merger cards. It may also be possible to use the algorithm (maybe even the final design) that has already been proposed to solve the same problem at level 2.

Algorithm Design

A `minimal path' was defined, ensuring that the level 1 trigger problem does not interfere with the more fundamental task of providing track segments to level 2. For the first design at least, we therefore assume that:
  • The main segment finding task (20MHz and 80MHz) does not begin until AFTER L1Keep.
  • A level 1 trigger is still planned in which the 20 MHz segment finding takes place `on the fly' (pivot elements etc). This may or may not use the same FPGA space as the post L1Keep segment finding.
The first priority is therefore to build a working algorithm to perform the 20MHz and 80MHz segment finding based on frozen shift registers with the bunch crossing of origin known a priori. This should later be generalised to also deal with 20MHz segment finding prior to L1Keep.

Andre' introduced his ideas for a generalisation of the solution proposed in the PRC feasibility study, such that a similar algorithm takes place on the fly to produce segments for the level 1 trigger. His talk is here and the block diagram of the algorithm is here. In this scheme, a full shift register analysis is performed at each bunch-crossing, so the pivot layer technique is not used.
Each shift register is divided into two (or more if necessary) and each combination of half registers from the three layers forms the input to a CAM (<=31 bits). The geometry probably makes some of the half-register combinations safely ignorable. An unencoded scan forms the input for the level 1 trigger algorithm. Encoded mode is used to pass the information on to the 80MHz refinement step. The level 1 algorithm looks very quick in this scenario. It would probably be the responsibility of the level 1 trigger decision unit to determine the event T0 (e.g. by finding the peak in a histogram of number of valid segments against time). It would therefore not be necessary to pass around information on the duration of a valid segment, since all valid segments are found at each clock cycle.
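To make the half-register bookkeeping concrete, the Python sketch below splits each of three shift registers into two parts and forms every combination of one part per layer, giving 2 x 2 x 2 = 8 candidate CAM input words. The register length of 20 bits is purely hypothetical (the minutes quote only the <=31 bit limit), registers are represented as strings of '0' and '1', and no attempt is made to drop the combinations that the geometry makes ignorable.

```python
from itertools import product

CAM_MAX_BITS = 31      # CAM input width limit quoted in the minutes
REGISTER_LENGTH = 20   # HYPOTHETICAL shift register length, for illustration only
N_PARTS = 2            # each register divided into two


def split_register(bits, n_parts):
    """Split a register (string of '0'/'1') into n_parts contiguous pieces."""
    size = -(-len(bits) // n_parts)  # ceiling division
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def cam_inputs(layers, n_parts=N_PARTS):
    """Return {part indices per layer: concatenated CAM input word}."""
    parts = [split_register(layer, n_parts) for layer in layers]
    inputs = {}
    for idx in product(range(n_parts), repeat=len(layers)):
        word = "".join(parts[layer][i] for layer, i in enumerate(idx))
        if len(word) > CAM_MAX_BITS:
            raise ValueError("combination too wide for one CAM: split further")
        inputs[idx] = word
    return inputs


# Example: three layers, each with a single hit at a different position.
layers = []
for hit_position in (3, 4, 15):
    register = ["0"] * REGISTER_LENGTH
    register[hit_position] = "1"
    layers.append("".join(register))

inputs = cam_inputs(layers)
print(len(inputs), len(inputs[(0, 0, 0)]))
```

With these assumed lengths each word is 30 bits wide; a longer register would trip the width check, corresponding to the "or more if necessary" case of splitting into more than two parts.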
The multiple match mode can handle 2 or more valid signals in the same CAM, but it is probably still desirable to implement some load balancing in the CAMs by carefully choosing which register elements get sent to which CAM. To help decide this, it would be very useful to make a table of which valid masks correspond to which kappa-phi values. Yves was asked to produce this as an extension to his mask generation code.
In any case, this proposal seems to be a fairly natural extension to the post L1Keep plan. It therefore sits nicely as something to investigate once the basic post L1Keep plan is simulated in Quartus.
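The T0 determination suggested above (finding the peak in a histogram of the number of valid segments against time) can be illustrated with a minimal Python sketch. The input is assumed to be one segment count per clock cycle, the function name is illustrative, and taking the earliest bin when several share the maximum is an arbitrary choice made here, not something decided at the meeting.

```python
def find_t0(segment_counts):
    """Return the index of the clock cycle with the most valid segments.

    segment_counts[i] is the number of valid segments found at clock
    cycle i. Ties are resolved in favour of the earliest cycle
    (an assumption of this sketch).
    """
    if not segment_counts:
        raise ValueError("no segment counts supplied")
    peak = max(segment_counts)
    return segment_counts.index(peak)


# Example with made-up counts rising to a peak and falling away again.
example_counts = [0, 1, 3, 7, 12, 8, 2, 0]
t0_bin = find_t0(example_counts)
print(t0_bin)
```

A real decision unit would presumably also require the peak to exceed some threshold before declaring a T0; that is left out here.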

Andre' also pointed out an error in the PRC feasibility study document. When estimating the numbers of valid masks in the shift registers (table 1), we gave the numbers of distinct valid patterns rather than the total numbers of valid masks. These were clearly the wrong numbers for the post L1Keep frozen shift register `minimal' solution (slapped wrist for Paul!). The number of valid masks thus increases by typically a factor of 3 compared to our estimates in the feasibility study. We still expect to fit the algorithm into one 20k600 FPGA per wire triplet, but the gate usage now looks like 80-90%, which starts to look a little more tricky!

Next meetings

No dates have yet been defined for future meetings, though we clearly need to spend a longer period discussing the issues above as soon as possible. Dave is asked to arrange next meeting dates in parallel with defining the detailed project schedule. As things become better defined, it should eventually be possible to have smaller scale meetings on specific aspects of the project.

Compiled by P. Newman, 11/5/00