# Usage

## General information

The framework requires two main components:

- **Geant4 simulation** software to produce the training data. There are no requirements on the user interface of this software or on its output format, but each simulation must output exactly *one* file. The existence of this file marks the Simulation Task as completed, and the code continues to the next Task. Intermediate results such as `root` files are allowed, but all the information that you want to provide to the Reconstruction Task must end up in a single final output file.
- **Reconstruction algorithm** that produces high-level variables from which a single loss term can be computed. In essence, the reconstruction is responsible for evaluating the performance of a given detector setup. The choice of metric is left entirely to the user, as long as it decreases monotonically as performance improves (so that the aido optimizer can minimize it by gradient descent).

## Entry point function

The main function to run is:

```python
aido.optimize(
    parameters: List[aido.SimulationParameter] | aido.SimulationParameterDictionary,
    user_interface: UserInterfaceBase,
)
```

### Parameters

The `parameters` argument determines which detector parameters are optimized. Further information can be found in the corresponding class docstring.

### Interface

The `user_interface` argument must inherit from the [`aido.UserInterfaceBase`](/api/aido.interface) abstract base class and implement the following methods:

- `simulate`: This method calls the Geant4 simulation with a set of parameters. This allows maximal flexibility, as the simulation software can be called via bash scripts, Python bindings or macro files, depending on the specific implementation. The output file must be saved to the output path that the framework provides as a function argument.
- `merge`: This method must merge the outputs of the Geant4 simulations into a single dataset.
  There is no enforced format for this file, but it must be saved to the path specified by the framework.
- `reconstruct`: Start the algorithm or ML model that computes high-level variables from the Geant4 simulation outputs (based on the dataset generated by the `merge` method). The output file from this method must have a specific format: a pandas DataFrame in which each row is a single event, with the following columns:
  - **"Parameters":** Use the `aido.SimulationParameterDictionary.to_df` method to obtain a pandas DataFrame of the parameters (best done already in the `merge` method).
  - **"Context":** Additional information for each event (such as the initial PID).
  - **"Reconstructed":** The *predicted* high-level quantities that the surrogate model will interpolate to enable gradient descent on the detector parameters.
  - **"Targets":** The *true* high-level quantities used to compute the Loss for the optimization. Use the original Monte Carlo information to fill this column.
- `loss`: An equivalent implementation of the reconstruction loss, so that the optimizer can compute the Loss from the surrogate's prediction.

  ```{note}
  In future versions, we aim to avoid this approach and directly train the Loss distribution instead of the Targets.
  ```

Optional methods are `constraints` (for user-defined penalties) and `plot` (a convenient hook for plotting).

```{note}
Since this framework is still under development, the methods `merge` and `reconstruct` are not yet completely decoupled from the formats used in the examples. In future versions, the user will not have to provide the "Parameters" column themselves, and the output of `reconstruct` will not be the final DataFrame saved to parquet but only the columns of interest.
```

## Config

The root directory includes a JSON file named `config.json`. You can change the values in this file to adjust the hyperparameters of the AIDO training.
It is bound to a Python dataclass [AIDOConfig](/api/aido.config) with the following fields:

```{admonition} Default Values
- Optimizer:
  - optimizer.lr: float = 0.02 (>0)
  - optimizer.batch_size: int = 512
  - optimizer.n_epochs: int = 40
- Surrogate:
  - surrogate.n_epoch_pre: int = 24
  - surrogate.n_epochs_main: int = 40
- Simulation:
  - simulation.generate_scaling: float = 1.2 (>0)
  - simulation.sigma: float = 1.5 (>0)
  - simulation.sigma_mode: str = "flat" (or "scale")
- Scheduler:
  - scheduler.training_num_retries: int = 20
  - scheduler.training_delay_between_retries: int | float = 60 (in seconds)
```

`````{tip}
If you want to reset the values, or if the file was removed, run this module to restore it:
```bash
python3 aido/config.py
```
`````

## Internal structure

The pipeline for the optimization algorithm is handled by b2luigi.

- **Wrapper**:
  - Accepts geometry parameters with starting value, type, min, max, cost scaling coefficient, etc.
  - Writes the parameters to file for each iteration of the training loop.
  - Calls the class that generates new parameters in a given region (normally distributed).
  - Calls the detector simulation and reconstruction for each set of parameters.
- **Detector simulation**:
  - Starts several containers of the Geant4 simulation using the executable provided by the user.
  - The starting parameters are written to a `.json` file.
  - The user must use this `.json` file to initialize the relevant geometry parameters in the simulation. The implementation of this step is left to the user, for example a script that converts this parameter dict to a Geant4 macro file, function parameters or similar.
  - The simulation finally outputs a file, which must be saved to the provided path.
- **Reconstruction**:
  - An analysis program provided by the user, which performs a physics analysis and returns a goodness metric (energy resolution, tracking accuracy, etc.).
- **Optimization**:
  - The Surrogate model is trained on the detector parameters of each geometry.
  - It learns the expected physics performance in a given region of parameter space.
  - After finding a local minimum, the Optimizer model proposes a new region to explore in the detector geometry parameter space.
  - These parameters are passed back to the Wrapper handling the detector simulation, which in turn starts new jobs with the new parameters.
  - Through iteration, a set of optimized parameters is found, which is the final output of the program.
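
As an illustration of the Detector simulation step, the sketch below converts a parameter `.json` file into a Geant4 macro file. It rests on assumptions that you will need to adapt: the JSON layout (a mapping from parameter name to a dict with a `current_value` key), the function name `parameters_to_macro` and the `/detector/set/...` commands are all hypothetical. Check the file actually written by the framework and the UI commands defined by your own simulation. Only `/run/initialize` and `/run/beamOn` are standard Geant4 commands.

```python
import json
from pathlib import Path


def parameters_to_macro(parameter_json, macro_path, n_events=100):
    """Write a Geant4 macro from a parameter .json file and return its lines.

    Assumed (hypothetical) JSON layout:
        {"thickness_absorber": {"current_value": 2.5}, ...}
    """
    with open(parameter_json) as file:
        parameters = json.load(file)

    lines = []
    for name, entry in parameters.items():
        # Assumption: the value to simulate is stored under "current_value".
        value = entry["current_value"]
        # Hypothetical command path; use the commands of your own messenger class.
        lines.append(f"/detector/set/{name} {value}")

    # Standard Geant4 commands to initialize the run and shoot events.
    lines.append("/run/initialize")
    lines.append(f"/run/beamOn {n_events}")

    Path(macro_path).write_text("\n".join(lines) + "\n")
    return lines
```

A function like this could be called at the start of a `simulate` implementation, before launching the Geant4 executable with the generated macro file.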