Material Discovery
All datasets

| Actions | Name | Columns |
|---|---|---|
In this form, you can configure and run machine learning experiments to discover interesting new materials.
For this purpose, the given formulations are sorted according to their utility, which is computed from the
predicted target properties and from a priori information.
To start the discovery, first configure the optimization target:
- Select a data set to work on from the table above using the blue checkmark buttons.
- Select the relevant features that define your material formulations (Input) in the left selection box.
- Select the target properties you want to predict in the center selection box.
- Select the a priori information that is relevant to the utility.
You can use Ctrl+Click or Shift+Click to select multiple properties in each selection box.
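For orientation, here is a minimal, purely illustrative sketch of what the three column groups correspond to in a tabular dataset; the file name and column names are hypothetical and not part of the app:

```python
import pandas as pd

# Hypothetical dataset; in the app, this selection happens via the three boxes above.
df = pd.read_csv("formulations.csv")

input_columns   = ["cement", "slag", "water"]   # Input: features defining each formulation
target_columns  = ["strength", "cost"]          # Target: properties to be predicted
apriori_columns = ["co2_footprint"]             # A priori: known values used directly in the utility

X = df[input_columns]          # features seen by the model
y = df[target_columns]         # labels (may be partially empty for candidate formulations)
apriori = df[apriori_columns]  # enters the utility without prediction
```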
Next, configure the specific goals of the materials discovery:
- For each target property, choose whether to maximize or minimize it. When maximizing, large values are preferred (e.g., for strengths). When minimizing, small and negative values are preferred (e.g., for costs).
- The weighting determines the factor of importance for each target in the utility assessment.
- You may specify required properties using the threshold:
- In the case of a priori information, formulations that do not meet the criteria, i.e. whose values are outside the threshold, are discarded and thus not considered.
- For target properties, the predicted values are clamped to the threshold for the utility calculation; the predicted values shown in the results table below remain unchanged (see the utility sketch at the end of this section).
- There are currently four machine learning models available: Gaussian Process Regression and Random Forest Regression, plus a variant of each that runs Principal Component Analysis (PCA) on the inputs first (see the model sketch at the end of this section). A statistics-based model can be selected; it is particularly suitable for relatively continuous data and simple data configurations, for instance at the beginning of an experimental campaign, when only a few laboratory data points are available. The AI model is more powerful but also requires more training data. Use it for more complex formulations when plenty of training data is already available (more than approximately twenty samples).
- The Gaussian Process Regressor requires the targets to have at least one label. The Random Forest Regressor requires the targets to have at least two labels.
- The curiosity value determines the factor by which as-yet uncertain material predictions are preferred. Preferring uncertain predictions can help to systematically gain knowledge. At the beginning of an experimental campaign, you may increase the curiosity value to explore formulations. Use a smaller or negative curiosity if you think you have identified a desirable area of the formulation space and want to refine ("exploit") your results by finding a nearby optimum. If you are not sure, leave the value unchanged. You can read more in the manual.
- When curiosity is greater than 0, predictions with greater uncertainties are favored (explore).
- If the curiosity is negative, the uncertainty is subtracted instead, penalizing materials with high uncertainty. This favors predictions with low uncertainty (exploit).
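For illustration only (this is not the app's internal code), the four model options listed above could be assembled with scikit-learn along these lines; the PCA settings and the random example data are assumptions made to keep the sketch runnable:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.pipeline import make_pipeline

models = {
    "Gaussian Process Regression": GaussianProcessRegressor(),
    "Random Forest Regression": RandomForestRegressor(),
    "PCA + Gaussian Process Regression": make_pipeline(PCA(n_components=2), GaussianProcessRegressor()),
    "PCA + Random Forest Regression": make_pipeline(PCA(n_components=2), RandomForestRegressor()),
}

rng = np.random.default_rng(0)
X_labeled    = rng.random((25, 3))   # formulations with a measured target
y_labeled    = rng.random(25)        # measured target values (one target, for brevity)
X_candidates = rng.random((100, 3))  # unlabeled candidate formulations to rank

gpr = models["Gaussian Process Regression"]
gpr.fit(X_labeled, y_labeled)
# A Gaussian process returns an uncertainty estimate alongside the mean prediction,
# which is what the curiosity term works with.
mean, std = gpr.predict(X_candidates, return_std=True)
```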
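Similarly, the following sketch shows how a utility of the kind described above could be computed from such predictions. It is a simplified illustration based on this description, not the app's exact formula; in particular, the clamping direction, the additive curiosity term, and the parameter shapes are assumptions:

```python
import numpy as np

def utility(pred_mean, pred_std, maximize, weights, thresholds, curiosity):
    """Score candidate formulations; pred_mean and pred_std have shape (n_candidates, n_targets).

    maximize   -- per-target flag, True if larger values are better
    weights    -- per-target importance factor
    thresholds -- per-target required value (np.nan = no threshold)
    curiosity  -- scalar explore/exploit factor
    """
    sign = np.where(maximize, 1.0, -1.0)
    values = pred_mean * sign  # flip minimized targets so "larger is better" everywhere

    # Clamp to the threshold for the utility only; displayed predictions stay unchanged.
    capped = np.where(np.isnan(thresholds), values, np.minimum(values, thresholds * sign))

    exploitation = (weights * capped).sum(axis=1)
    exploration = (weights * pred_std).sum(axis=1)  # overall predictive uncertainty
    # curiosity > 0 rewards uncertain candidates (explore),
    # curiosity < 0 penalizes them (exploit), curiosity = 0 ignores uncertainty.
    return exploitation + curiosity * exploration

def meets_apriori(apriori_values, maximize, thresholds):
    """A priori thresholds are hard requirements: rows outside them are discarded."""
    sign = np.where(maximize, 1.0, -1.0)
    return ((apriori_values * sign) >= (thresholds * sign)).all(axis=1)
```

In this reading, candidates failing meets_apriori would be dropped first, and the remaining formulations sorted by their utility in descending order.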