.. _fcc_alloys_tutorial: ============================================== Genetic algorithm Search for stable FCC alloys ============================================== In this tutorial we will emulate an older paper [Johannesson]_ and determine the most stable FCC alloy using the genetic algorithm. Since the purpose is only the tutorial we will limit the phase space to the elements supported by the :mod:`EMT potential `. The search is also equivalent to the recent search for mixed metal ammines with superior properties for ammonia storage described here: | P. B. Jensen, S. Lysgaard, U. J. Quaade and T. Vegge | :doi:`Designing Mixed Metal Halide Ammines for Ammonia Storage Using Density Functional Theory and Genetic Algorithms <10.1039/C4CP03133D>` | Physical Chemistry Chemical Physics, Vol **16**, No. 36, pp. 19732-19740, (2014) .. contents:: Basic outline of the search =========================== 1. Choose the phase space of your problem. Is the number of possible individuals large enough to prevent a full screening and is the fitness function too discontinuous for a traditional optimization by derivation? If so continue. 2. Choose model structures and calculate references in those structures. Put the results somewhere accesible for a script initiated by the genetic algorithm. 3. Choose suitable parameters like population size (general rule of thumb for the population size: `log_2(N)` < pop size < `2log_2(N)`, where `N` is the size of the phase space), convergence criteria etc. 4. Create the initial population. 5. Choose procreation operators, i.e. how should offspring be produced. New operators can easily be created by modifying the existing operators. 6. Run the algorithm. Here we would like to predict the most stable fcc alloys. In this tutorial we only have the :mod:`ase.calculators.emt` available thus we are limited to the supported metal elements: Al, Ni, Cu, Pd, Ag, Pt and Au. We limit ourselves to at most 4 different metals in one structure, thereby having only `7^4 = 2401` candidates in the phase space, symmetry would make this number even lower but the number is fitting for this tutorial. For a real application of the algorithm it is necessary to use a more sophisticated calculator, in that case each individual calculation is performed on a cluster by submitting to a queuing system. How this is achieved in the algorithm is covered in :ref:`genetic_algorithm_optimization_tutorial`. .. defined for an alloy :mol:`ABC_2`: A + B + 2C -> :mol:`ABC_2` as: `\Delta H_f = E_{ABC2} - E_A - E_B - 2E_C` .. _references: Setting up reference database ============================= Now we need to set up a database in which reference calculations can be stored. This can either be in a central database server where keywords distinguish between different references or dedicated separate databases for each different type of reference calculations. In the following script, :download:`ga_fcc_references.py`, we put the references in the database file *refs.db*. Our model structure is fcc which is loaded with :func:`ase.lattice.cubic.FaceCenteredCubic`. We perform a volume relaxation to find the optimal lattice constant and lowest energy, which we save in the database as key-value pairs for quick retrieval. .. literalinclude:: ga_fcc_references.py Initial population ================== We choose a population size of 10 individuals and create the initial population by randomly selecting four elements for each starting individual. .. literalinclude:: ga_fcc_alloys_start.py Note how we add the population size and metals as extra key-value pairs when we create the database *fcc_alloys.db*. We can then retrieve these parameters later when running the main script to avoid having to input the same parameters twice. We can study our initial population by doing (on the command-line):: $ ase db fcc_alloys.db -c +atoms_string the term ``atoms_string`` determines the order in which the elements are put into the model structure. So it is possible to fully describe an individual by just providing the ``atoms_string``. .. _`main script`: Run the algorithm ================= The following script runs the algorithm, also find it here: :download:`ga_fcc_alloys_main.py`. *Note* that the `relaxation script`_ is imported from an external file :download:`ga_fcc_alloys_relax.py`. .. literalinclude:: ga_fcc_alloys_main.py In this script we run a generational GA as opposed to the pool GA outlined in :ref:`genetic_algorithm_optimization_tutorial`. This is achieved by having two for-loops; the innermost loop runs the number of times specified by the population size it corresponds to one generation. The outermost loop runs as many generations as specified in ``num_gens``. The function ``pop.update()`` is called after the innermost loop has finished thereby only adding individuals to the population after a whole generation is calculated. After each generation is finished the population is printed to the screen so we can follow the evolution. The calculated individuals are continuously added to ``fcc_alloys.db``, we can evaluate them directly by doing from the command line (in another shell instance if the GA is still running):: $ ase db fcc_alloys.db -c +atoms_string,raw_score,generation,hof -s raw_score *Note:* When reading the database using ``ase db``, it might be necessary to increase the number of shown entries, e.g. ``ase db fcc-alloys.db --limit N``, where ``N`` is the number of entries to show (as default only the first 20 entries are shown, ``--limit 0`` will show all. For further info use ``ase db --help``, or consult the :ref:`ase-db` manual). To prevent clutter we import the relax function from the following script: .. _`relaxation script`: .. literalinclude:: ga_fcc_alloys_relax.py The relaxation script is naturally similar to the script we used to calculate the references_. *Note* that the global optimum is :mol:`PtNi_3` with a -0.12 eV heat of formation, whereas the second worst alloy is :mol:`AlNi_3` heat of formation 0.26 eV. This result is in complete contrast to the conclusion obtained in [Johannesson]_, where :mol:`AlNi_3` is the most stable alloy within the phase space chosen here. Obviously there is a limit to the predictive power of EMT! Extending the algorithm ======================= There are different ways one can extend the algorithm and make it more complex and sophisticated, all employed in [Jensen]_: Extra mutation operators ------------------------ Instead of only using random operations we can include some that mutates elements to other elements nearby in the periodic table:: from ase.ga.element_mutations import RandomElementMutation from ase.ga.element_mutations import MoveDownMutation from ase.ga.element_mutations import MoveUpMutation from ase.ga.element_mutations import MoveLeftMutation from ase.ga.element_mutations import MoveRightMutation from ase.ga.element_crossovers import OnePointElementCrossover ... oclist = ([4,1,1,1,1,8], [RandomElementMutation([metals]), MoveDownMutation([metals]), MoveUpMutation([metals]), MoveLeftMutation([metals]), MoveRightMutation([metals]), OnePointElementCrossover([metals])]) operation_selector = OperationSelector(*oclist) These operators takes advantage of the fact that chemically like elements (close in the periodic table) exhibit similar properties and the substitution of one to a chemically similar elements could refine the properties of an alloy in the population. A natural extension of these operators would be to use a different ordering of the elements than the periodic table; e.g. Pettifor chemical scale, electronegativity, etc. Note how we have set the probabilities for selecting operators differently. The probability for ``RandomElementMutation`` is equal to the sum of the *move* mutations. Similarly the probability of ``OnePointElementCrossover`` is equal to the sum of all the mutation operators. This is to prevent the search from being purely local. Prevent identical calculations from being performed --------------------------------------------------- In the current `main script`_ there is no check to determine whether an identical calculation has been performed, this is easy to check in this regime where model structures are used and we can just use the ``atoms_string``. We insert the following in the inner loop:: for i in range(population_size): dup = True while dup: a1, a2 = pop.get_two_candidates(with_history=False) op = operation_selector.get_operator() a3, desc = op.get_new_individual([a1, a2]) dup = db.is_duplicate(atoms_string=''.join(a3.get_chemical_symbols())) Since the fcc model structure is completely symmetric we could compare sorted versions of the ``atoms_string``, thereby ruling out individuals containing the same elements in different order. Reuse of calculations between algorithm runs -------------------------------------------- Since genetic algorithms are inherently random in nature one can never be sure to obtain the global minimum with only one algorithm run, it is customary to perform more runs and check that the results agree. In this case it is vital to be able to reuse identical calculations between runs. We do the following from the command line to create a new database file containing only the relaxed structures:: $ ase db fcc_alloys.db relaxed=1 -i all_relaxed.db We subsequently add this to the `relaxation script`_:: def relax(input_atoms, ref_db): atoms_string = input_atoms.get_chemical_symbols() relaxed_db = connect('all_relaxed.db') save_relax = True try: dct = relaxed_db.get(atoms_string=''.join(atoms_string)) except KeyError: # Open connection to the database with reference data db = connect(ref_db) # Omitting lines up to the point where hof has been calculated ... else: hof = dct.hof latticeconstant = dct.latticeconstant save_relax = False # Place the calculated parameters in the info dictionary of the # input_atoms object ... # Put this at the very end if save_relax: relaxed_db.write(input_atoms,relaxed=1, key_value_pairs=input_atoms.info['key_value_pairs']) Before the actual calculation is performed ``all_relaxed.db`` is checked to see if it has been calculated before; if so we just collect the heat of formation, but if not we do the calculation and save it directly to ``all_relaxed.db``. *Note:* this addition assumes that `Prevent identical calculations from being performed`_. .. [Johannesson] G. Jóhannesson, T. Bligaard, A. Ruban, H. Skriver, K. Jacobsen and J. Nørskov. Combined Electronic Structure and Evolutionary Search Approach to Materials Design, Phys. Rev. Lett., Vol **88**, No. 25, pp. 1-5 (2002) .. [Jensen] P. B. Jensen, S. Lysgaard, U. J. Quaade and T. Vegge. Designing Mixed Metal Halide Ammines for Ammonia Storage Using Density Functional Theory and Genetic Algorithms Phys. Chem. Chem. Phys., Vol **16**, No. 36, pp. 19732-19740, (2014)