.. _genetic_algorithm_optimization_tutorial:

=====================================
Optimization with a Genetic Algorithm
=====================================

A genetic algorithm (GA) has been implemented for global structure
optimization within ASE. The optimizer consists of its own module,
:mod:`ase.ga`, which includes all classes needed for the optimizer.

The method was first described in the supplemental material of

  | L. B. Vilhelmsen and B. Hammer
  | :doi:`Systematic Study of Au6 to Au12 Gold Clusters on MgO(100) F Centers Using Density-Functional Theory <10.1103/PhysRevLett.108.126101>`
  | Physical Review Letters, Vol. 108 (Mar 2012), 126101

and a full account of the method is given in

  | L. B. Vilhelmsen and B. Hammer
  | :doi:`A genetic algorithm for first principles global optimization of supported nano structures <10.1063/1.4886337>`
  | Journal of Chemical Physics, Vol. 141, 044711 (2014)

Any questions about how to use the GA can be asked at the mailing list.

A Brief Overview of the Implementation
======================================

The GA relies on the :mod:`ase.db` module for tracking which structures have
been found. Before the GA optimization starts, the user therefore needs to
prepare this database and the appropriate folders. This is done through an
initialization script like the one described in the next section. During
this initialization the starting population is generated and added to the
database.

After initialization the main script is run. This script defines objects
responsible for the different parts of the GA and then creates and locally
relaxes new candidates. It is up to the user to define when the main script
should terminate. An example of a main script is given in the next section.
Note that, because of the persistent data storage, the main script can be
executed multiple times to generate new candidates.

The GA implementation generally follows a responsibility-driven approach.
This means that each part of the GA is isolated into individual classes,
making it possible to put together an optimizer satisfying the needs of a
specific optimization problem. This tutorial will use the following parts of
the GA:

* A population responsible for proposing new candidates to pair together.
* A pairing operator which combines two candidates.
* A set of mutations.
* A comparator which determines if two structures are different.
* A starting population generator.

Each of the above components is described in the supplemental material of
the first reference given above and will not be discussed here. The example
will instead focus on the technical aspects of executing the GA.

A Basic Example
===============

The user needs to specify the following three properties of the structure
that is to be optimized:

* A list of atomic numbers for the structure to be optimized.
* A supercell in which to do the optimization. If the structure to optimize
  resides on a surface or in a support, this supercell contains the atoms
  which should not be considered explicitly by the GA.
* A box defining the volume of the supercell in which to randomly distribute
  the starting population.

As an example we will find the structure of a :mol:`Ag_2Au_2` cluster on a
Au(111) surface using the EMT calculator. The script doing all the
initialization should be run in the folder in which the GA optimization is to
take place. The script looks as follows:

.. literalinclude:: basic_example_create_database.py

Having initialized the GA optimization, we now need to actually run the GA.
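Before doing so, it can be useful to check that the starting population
actually ended up in the database. The short sketch below uses :mod:`ase.db`
directly; the database file name ``gadb.db`` is an assumption and should be
replaced by whatever name the initialization script used::

    from ase.db import connect

    # Quick sanity check of the freshly created GA database.
    # The file name is an assumption; use the name chosen in the
    # initialization script.
    db = connect('gadb.db')
    print('Number of rows in the database:', db.count())
    for row in db.select():
        print(row.id, row.formula)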
The main script running the GA consists of an initialization part, followed
by a loop that proposes new structures and locally optimizes them. The main
script can look as follows:

.. literalinclude:: basic_example_main_run.py

The above script proposes and locally relaxes 20 new candidates. To speed up
the execution of this example the local relaxations are limited to 100 steps.
This restriction should not be set in a real application.

*Note*: it is important to set the ``raw_score``, as it is what is being
optimized (maximized). It is stored as an entry in the
``atoms.info['key_value_pairs']`` dictionary.

The GA progress can be monitored by running the tool
``ase/ga/tools/get_all_candidates`` in the same folder as the GA. This will
create a trajectory file, ``all_candidates.traj``, which includes all locally
relaxed candidates the GA has tried. This tool can be run while the main
script is running, because the :mod:`ase.db` database is updated as the GA
progresses.

Running the GA in Parallel
==========================

One of the great advantages of a GA is that many structures can be relaxed
in parallel. This GA implementation includes two classes which facilitate
running the GA in parallel. One class can be used to run several
single-threaded optimizations simultaneously on the same compute node, and
the other class integrates the GA with the PBS queuing system used at many
high-performance computing clusters.

Relaxations in Parallel on the Same Computer
--------------------------------------------

In order to relax several structures simultaneously on the same computer, a
separate script relaxing one structure needs to be created. Continuing the
example from above, we therefore create a script which takes as input the
filename of the structure to relax and which, as output, saves a trajectory
file with the locally optimized structure. It is important that the relaxed
structure is named as in this script, since the parallel integration assumes
this file naming scheme. For the example described above this script could
look like:

.. literalinclude:: ga_basic_calc.py

The main script needs to initialize the parallel controller, and it needs to
be changed in the two places where structures are relaxed. The changed main
script now looks like:

.. literalinclude:: ga_basic_parallel_main.py

Notice how the main script is no longer cluttered by the local optimization
logic and is therefore also easier to read. ``n_simul`` controls the number
of simultaneous relaxations and can of course also be set to 1, effectively
giving the same result as in the non-parallel situation. The ``relax`` method
of the ``ParallelLocalRun`` class only returns control to the main script
when there is an execution thread available. In the above example the relax
method immediately returns control to the main script the first 4 times it
is called, but the fifth time control is only returned when one of the first
four relaxations has been completed.

Running the GA together with a queuing system
=============================================

The GA has been implemented with first-principles structure optimization in
mind. When using, for instance, DFT calculations for the local relaxations,
relaxing one structure can take many hours. For this reason the GA has been
made to work together with queuing systems where each candidate is relaxed
in a separate job.

With this in mind, the main script of the GA can also be considered a
controller script which, every time it is invoked, gathers the current
population, checks with the queuing system for the number of jobs submitted,
and submits new jobs. For a typical application the main script can thus be
invoked by a crontab once every hour.

To run the GA together with a queuing system, the user needs to specify a
function which takes as input a job name and the path to the trajectory file
that needs to be submitted (the ``jtg`` function in the sample script below).
From this the function generates a PBS job file which is submitted to the
queuing system.
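Such a function could, for instance, simply build the text of the job file
from the job name and the trajectory path. The sketch below is an
illustration only: the resource requests, the queue name and the name of the
relaxation script (``calc.py``) are placeholders, and the complete, working
``jtg`` function is part of the sample script below::

    def jtg(job_name, traj_file):
        # Hypothetical job-template function. Everything below
        # (resources, queue name, 'calc.py') must be adapted to the
        # local cluster and to the actual relaxation script.
        return ('#!/bin/sh\n'
                '#PBS -l nodes=1:ppn=8\n'
                '#PBS -l walltime=48:00:00\n'
                '#PBS -N {}\n'
                '#PBS -q small\n'
                'cd $PBS_O_WORKDIR\n'
                'python calc.py {}\n'.format(job_name, traj_file))

    # Print the job file that would be generated for one candidate.
    print(jtg('ga_candidate_7', 'cand7.traj'))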
The calculator script specified in the job file needs to obey the same
naming scheme as the sample calculator script in the previous section; the
sample relaxation script given there can be used as a starting point. The
handling of the parallel logic is in this case done in the main script. The
parameter ``n_simul`` given to the ``PBSQueueRun`` object determines how
many relaxations should be in the queuing system simultaneously. The main
script now looks like the following:

.. literalinclude:: ga_basic_pbs_main.py

Parameterising the GA search for structure screening
====================================================

Relaxing every candidate suggested by the GA is very inefficient. Many of
these structures are poor suggestions and are immediately discarded when
they are compared to the current population. For this reason it can be very
effective to screen each candidate before relaxation, to estimate whether it
has a chance of entering the population or not. If this is not the case, the
candidate can be rejected without the need for a costly DFT calculation. By
doing this you could, for example, use a more drastic mutation, resulting in
both potentially very good and very bad candidates, without having to waste
a lot of CPU power evaluating the poor suggestions.

Parameterising the whole database of structures and relating the parameters
of the individual structures to their DFT energies is one way to handle
this. As the database of structures grows during the GA search, the fitted
parameters and the guessed energies become more refined, and as a result the
screening becomes more precise.

Below is a sample script showing how this method can be implemented and
used. The script is a direct extension of the above tutorial. A number of
predefined parameterising methods are available, but the implementation is
by no means restricted to the use of one of those. In the example a linear
relationship is expected between every parameter and the DFT energy. The
main script for the GA run could hence look like:

.. literalinclude:: ga_basic_parameters.py
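To illustrate the screening idea itself, independently of the predefined
parameterising methods used in the sample script above, the following
self-contained sketch fits a linear model from structural parameters of
already relaxed candidates to their DFT energies and uses the fit to decide
whether a new candidate is worth relaxing. All parameter values, energies
and the rejection threshold are made-up numbers::

    import numpy as np


    def fit_linear_model(param_matrix, energies):
        # Least-squares fit of a linear model (with offset) from the
        # structural parameters of relaxed candidates to their energies.
        design = np.column_stack([param_matrix, np.ones(len(energies))])
        coefficients, *_ = np.linalg.lstsq(design, energies, rcond=None)
        return coefficients


    def predicted_energy(coefficients, params):
        return float(np.dot(np.append(params, 1.0), coefficients))


    # Parameters and energies of candidates that have already been relaxed.
    params_relaxed = np.array([[5.2, 3.0], [4.8, 4.0], [5.5, 2.0], [5.0, 3.0]])
    energies_relaxed = np.array([-10.1, -11.3, -9.4, -10.6])
    coefficients = fit_linear_model(params_relaxed, energies_relaxed)

    # Screen a new, unrelaxed candidate: skip the expensive relaxation if
    # its predicted energy is well above the worst energy seen so far.
    candidate_params = np.array([5.1, 2.0])
    guess = predicted_energy(coefficients, candidate_params)
    if guess > energies_relaxed.max() + 1.0:
        print('Candidate rejected before relaxation')
    else:
        print('Candidate passed the screening; relax it')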