######################################
08 Visualizations of Model Performance
######################################

*Published: June, 2024, ATOM DDM Team*

Please check out the companion tutorial video:

|youtube-image|

.. |youtube-image| image:: ../_static/img/youtube_icon.png
   :alt: Tutorial 08 Visualization Model Performance
   :target: https://www.youtube.com/watch?v=D29yObV8AYI

------------

In this tutorial, we will use some of the tools provided by `AMPL `_ to visualize the model training process and the performance of the final model. Some of these tools apply only to certain classes of models; as we go along we will indicate where each function can be used. The tutorial will present the following functions, all from the ``perf_plots`` module:

- `plot_perf_vs_epoch `_
- `plot_pred_vs_actual `_
- `plot_confusion_matrices `_
- `plot_model_metrics `_
- `plot_ROC_curve `_
- `plot_prec_recall_curve `_

We will use the same training dataset and **scaffold split** as in **Tutorial 3, "Train a Simple Regression Model"**, to create some **neural network models** and visualize their iterative training process. Later, we'll generate a binary classification dataset based on the same data, so we can train **classification models** and show the visualizations that are specific to classifiers.

To start, let's import some standard packages and modules:

.. code:: ipython3

    import pandas as pd
    import numpy as np
    import json

    # Set for less chatty log messages
    import logging
    logger = logging.getLogger('ATOM')
    logger.setLevel(logging.INFO)

    from atomsci.ddm.pipeline import model_pipeline as mp
    from atomsci.ddm.pipeline import parameter_parser as parse
    from atomsci.ddm.pipeline import perf_plots as pp

Visualizing the Training Process for a Neural Network Regression Model
**********************************************************************

When you train a **neural network model**, `AMPL `_ makes a series of iterations through the entire training subset of your curated dataset; each iteration is called an **epoch**. At the end of each epoch, `AMPL `_ saves the model parameters (i.e., the weights of each network connection) in a checkpoint file. It then computes and stores a set of metrics describing the model's performance at that stage of training: the :math:`R^2` value for regression models and the **ROC AUC** for classification models. By default, these metrics are used to select the epoch yielding the best validation set performance, but you may choose a different metric by setting the ``model_choice_score_type`` parameter.

The metrics are evaluated separately for the training, validation and test subsets. Generally, the training set metrics continue to improve with more epochs, while the validation and test set metrics reach a peak and then decline, as the model becomes overfitted to the training subset. The function ``plot_perf_vs_epoch`` allows you to visualize this process.

The code below will train a simple fully-connected neural network on the `SLC6A3 `_ dataset. By default, `AMPL `_ uses an **early stopping** algorithm to terminate training if the chosen validation set metric peaks and does not improve further after a certain number of epochs, set by the ``early_stopping_patience`` parameter.
Here we tell `AMPL `_ to optimize the **root mean squared error (RMSE)** rather than :math:`R^2`, to train for up to 100 epochs, and to stop training if the **RMSE** does not improve for 20 epochs after reaching a minimum.

.. code:: ipython3

    dataset_file = 'dataset/SLC6A3_Ki_curated.csv'
    output_dir = 'dataset/SLC6A3_models'
    response_col = "avg_pKi"
    id_col = "compound_id"
    smiles_col = "base_rdkit_smiles"
    split_uuid = "c35aeaab-910c-4dcf-8f9f-04b55179aa1a"

    params = {
        # dataset info
        "dataset_key": dataset_file,
        "id_col": id_col,
        "smiles_col": smiles_col,
        "response_cols": response_col,
        # splitting
        "previously_split": "True",
        "split_uuid": split_uuid,
        "splitter": 'scaffold',
        "split_valid_frac": "0.15",
        "split_test_frac": "0.15",
        # featurization
        "featurizer": "computed_descriptors",
        "descriptor_type": "rdkit_raw",
        # model training parameters
        "model_type": "NN",
        "prediction_type": "regression",
        "layer_sizes": "128,32",
        "dropouts": "0.2, 0.2",
        "max_epochs": "100",
        "early_stopping_patience": "20",
        "model_choice_score_type": "rmse",
        "verbose": "True",
        "result_dir": output_dir,
    }

    ampl_param = parse.wrapper(params)
    regr_pipe = mp.ModelPipeline(ampl_param)
    regr_pipe.train_model()

.. parsed-literal::

    ['dataset/SLC6A3_models/SLC6A3_Ki_curated/NN_computed_descriptors_scaffold_regression/362b134a-924b-4549-a341-cffb5ba36757/model/checkpoint1.pt', 'dataset/SLC6A3_models/SLC6A3_Ki_curated/NN_computed_descriptors_scaffold_regression/362b134a-924b-4549-a341-cffb5ba36757/model/checkpoint2.pt', 'dataset/SLC6A3_models/SLC6A3_Ki_curated/NN_computed_descriptors_scaffold_regression/362b134a-924b-4549-a341-cffb5ba36757/model/checkpoint3.pt', 'dataset/SLC6A3_models/SLC6A3_Ki_curated/NN_computed_descriptors_scaffold_regression/362b134a-924b-4549-a341-cffb5ba36757/model/checkpoint4.pt', 'dataset/SLC6A3_models/SLC6A3_Ki_curated/NN_computed_descriptors_scaffold_regression/362b134a-924b-4549-a341-cffb5ba36757/model/checkpoint5.pt']
    dataset/SLC6A3_models/SLC6A3_Ki_curated/NN_computed_descriptors_scaffold_regression/362b134a-924b-4549-a341-cffb5ba36757/model/checkpoint1.pt

We now use the ``plot_perf_vs_epoch`` function to show how the performance metrics change during training:

.. code:: ipython3

    pp.plot_perf_vs_epoch(regr_pipe)

.. image:: ../_static/img/08_visualization_files/08_visualization_7_0.png

The vertical dashed lines indicate the epoch at which the validation set **RMSE** was minimized; the parameters retrieved from the checkpoint file for this epoch are the ones saved in the model file. When the model is trained to optimize the default score type (:math:`R^2` or **ROC AUC**), only the left-hand plot is drawn. Note that the epoch with the maximum :math:`R^2` may or may not be the same as the one that minimizes **RMSE**.

.. note::

    *The* ``pipe`` *argument to* ``plot_perf_vs_epoch`` *is a* ``ModelPipeline`` *object for a model you have trained in your current Python session; it doesn't work with a previously saved model that you've loaded using a function like* ``create_prediction_pipeline_from_file``.
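The interaction between the selection metric and the early stopping patience can be illustrated with a few lines of plain Python. The sketch below is not AMPL's implementation; the function name ``select_best_epoch`` and the example score lists are hypothetical, and epochs are counted from zero purely for convenience. It only shows the general idea: track the best validation score seen so far, and stop once ``patience`` epochs have passed without improvement.

```python
def select_best_epoch(valid_scores, patience, minimize=True):
    """Illustrative early-stopping rule (hypothetical helper, not part of AMPL).

    valid_scores: non-empty list of validation metric values, one per epoch.
    patience: number of epochs to wait for an improvement before stopping.
    minimize: True for error-like metrics (e.g. RMSE), False for
              score-like metrics (e.g. R^2 or ROC AUC).
    Returns (best_epoch, last_epoch_run), both as 0-based indices.
    """
    best_epoch = 0
    best_score = valid_scores[0]
    for epoch, score in enumerate(valid_scores):
        improved = score < best_score if minimize else score > best_score
        if improved:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            return best_epoch, epoch
    # Ran through all available epochs without triggering early stopping.
    return best_epoch, len(valid_scores) - 1


# Made-up validation RMSE values, for illustration only.
rmse_by_epoch = [1.2, 1.0, 0.9, 0.95, 0.97, 0.96, 0.99]
best, last = select_best_epoch(rmse_by_epoch, patience=3, minimize=True)
print(f"best epoch: {best}, training stopped after epoch: {last}")
```

With the made-up values above, the minimum occurs at index 2 and training stops three epochs later. The same rule with ``minimize=False`` would apply to metrics such as :math:`R^2`, which is why the epoch selected can differ depending on the chosen score type.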
Comparing Predicted with Actual Values by Split Subset
******************************************************

There are times when a single number like :math:`R^2` or **RMSE** is not enough to give you a feeling for how well your model is performing (or more importantly, where it is failing). For this reason, `AMPL `_ provides a function to produce a scatterplot of predicted vs actual values for each split subset, as shown below.

.. code:: ipython3

    pp.plot_pred_vs_actual(regr_pipe)

.. image:: ../_static/img/08_visualization_files/08_visualization_11_0.png

The plots highlight a couple of interesting features of the training dataset. First, the vertical lines of points with actual value 5 represent censored data, where the :math:`K_i` values were reported as "> 10 µM" because the maximum concentration tested did not allow higher :math:`K_i` values to be measured precisely. Second, you'll note that higher :math:`K_i` values tend to be underpredicted and lower :math:`K_i`'s are overpredicted, even for the training subset. This suggests that model performance could be improved by further hyperparameter optimization.

As with ``plot_perf_vs_epoch``, the ``plot_pred_vs_actual`` function only works with "live" ``ModelPipeline`` objects trained in the current Python session. However, there is an alternative version of this function specifically for saved models. We'll try out this function on the best **random forest** model from the hyperparameter searches performed in **Tutorial 5, "Hyperparameter Optimization"**:

.. code:: ipython3

    pp.plot_pred_vs_actual_from_file('dataset/SLC6A3_models/SLC6A3_Ki_curated_model_9b6c9332-15f3-4f96-9579-bf407d0b69a8.tar.gz')

.. image:: ../_static/img/08_visualization_files/08_visualization_13_1.png

The points predicted by the optimized RF model are indeed closer to the identity line, as one would expect from the higher :math:`R^2` scores.
Although the lower :math:`K_i` values are still overpredicted in the validation and test sets, the spread of predicted values above the identity line is much reduced. Visualizations of Classification Model Performance ************************************************** Classification models are trained to assign compounds to one of a set of discrete, often binary classes: active/inactive, agonist/antagonists of particular receptors, etc. They are evaluated using different performance metrics than regression models; in most cases these call for completely different visualization tools. In this section of the tutorial, we will construct a binary classification dataset, train a model against it, and use it to demonstrate some of the visualizations provided by `AMPL `_ specifically for classification models. To create a binary classification dataset, we will simply add a column called 'active' to the `SLC6A3 `_ :math:`K_i` dataset containing "1" for compounds with :math:`pK_i \ge 8` and "0" for all others: .. code:: ipython3 dset_df = pd.read_csv('dataset/SLC6A3_Ki_curated.csv') dset_df['active'] = [int(Ki >= 8) for Ki in dset_df.avg_pKi.values] classif_dset_file = 'dataset/SLC6A3_classif_pKi_ge_8.csv' dset_df.to_csv(classif_dset_file, index=False) dset_df.active.value_counts() .. parsed-literal:: active 0 1597 1 222 Name: count, dtype: int64 Note that we have purposely created an imbalanced dataset, with many more inactive than active compounds. This provides us an opportunity to apply some of the tools `AMPL `_ supplies to deal with this common situation. Next we will split the dataset by scaffold: .. 
code:: ipython3 output_dir='dataset/SLC6A3_models' params = { # dataset info "dataset_key" : classif_dset_file, "response_cols" : "active", "id_col": "compound_id", "smiles_col" : "base_rdkit_smiles", "result_dir": output_dir, # splitting "split_only": "True", "previously_split": "False", "splitter": 'scaffold', "split_valid_frac": "0.15", "split_test_frac": "0.15", # featurization & training params "featurizer": "ecfp", } pparams = parse.wrapper(params) split_pipe = mp.ModelPipeline(pparams) split_uuid = split_pipe.split_dataset() It is often a good idea, especially with imbalanced datasets, to check that the class proportions are similar between the split subsets. The function ``plot_split_subset_response_distrs``, which we encountered in **Tutorial 2, "Splitting Datasets for Validation and Testing"**, provides a way to do this. Note that when the ``prediction_type`` parameter is set to ``classification``, the function produces a bar graph rather than a density plot: .. code:: ipython3 import atomsci.ddm.utils.split_response_dist_plots as srdp split_params = { "dataset_key" : classif_dset_file, "smiles_col" : "base_rdkit_smiles", "prediction_type": "classification", "response_cols" : "active", "split_uuid": split_uuid, "splitter": 'scaffold', } srdp.plot_split_subset_response_distrs(split_params) .. image:: ../_static/img/08_visualization_files/08_visualization_20_0.png The proportion of actives is fairly even across the split subsets. We will check later to see if the higher percentage of actives in the training set causes the model to predict too many false positives. Now we will train a **neural network** to predict compound classes using `ECFP `_ fingerprints as features: .. 
code:: ipython3 params = { # dataset info "dataset_key" : classif_dset_file, "response_cols" : "active", "id_col": "compound_id", "smiles_col" : "base_rdkit_smiles", "result_dir": output_dir, # splitting "split_uuid": split_uuid, "previously_split": "True", "splitter": 'scaffold', "split_valid_frac": "0.15", "split_test_frac": "0.15", # featurization & training params "featurizer": "ecfp", "prediction_type": "classification", "model_type": "NN", "layer_sizes": "128,64", "dropouts": "0.3,0.3", "learning_rate": "0.0002", "max_epochs": "100", "early_stopping_patience": "20", "verbose": "True", } pparams = parse.wrapper(params) classif_pipe = mp.ModelPipeline(pparams) classif_pipe.train_model() .. parsed-literal:: ['dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/5aae26e9-1bbd-4f6c-8662-c7baae078bee/model/checkpoint1.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/5aae26e9-1bbd-4f6c-8662-c7baae078bee/model/checkpoint2.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/5aae26e9-1bbd-4f6c-8662-c7baae078bee/model/checkpoint3.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/5aae26e9-1bbd-4f6c-8662-c7baae078bee/model/checkpoint4.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/5aae26e9-1bbd-4f6c-8662-c7baae078bee/model/checkpoint5.pt'] dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/5aae26e9-1bbd-4f6c-8662-c7baae078bee/model/checkpoint1.pt As we did before for a regression model, we use the function ``plot_perf_vs_epoch`` to display the changes in the default performance metric over successive epochs of training. In this case only one plot is drawn because we are using the default metric (**ROC AUC**) evaluated on the validation set to decide when to stop training. .. code:: ipython3 pp.plot_perf_vs_epoch(classif_pipe) .. 
image:: ../_static/img/08_visualization_files/08_visualization_24_0.png Note that the validation set **ROC AUC** peaked after only 13 epochs, at around 0.88. Although this seems at first glance like a good result, we need to remind ourselves that our dataset is highly unbalanced, with 1597 inactives and 222 actives. Therefore, a 'dumb' classifier that predicts every compound to be inactive will be correct, on average, 1597/(1597+222) = 88% of the time. Moreover, **ROC AUC** only measures how well the predicted probabilities rank actives above inactives; it does not by itself tell us how many actives the model identifies at its classification threshold. We need to look at some other metrics to see if our model is doing any better than a dumb classifier. First, we will plot a `confusion matrix `_ for each split subset. A confusion matrix is simply a table that shows, for each true class, the numbers of compounds that are predicted to belong to that class and to each other class. `AMPL `_ provides the function ``plot_confusion_matrices`` to draw the confusion matrix for each subset: .. code:: ipython3 pp.plot_confusion_matrices(classif_pipe) .. image:: ../_static/img/08_visualization_files/08_visualization_26_0.png The confusion matrices show that the model is behaving not much differently from a dumb classifier. In the validation set, it predicts the inactive class 97% of the time, even though inactives are only 88% of the compounds. `AMPL `_ calculates many other metrics for classification models, which may provide additional insight into how a model is performing. We can display a barplot of metric values for each subset using the function ``plot_model_metrics``. For an unbalanced dataset, the `precision and recall `_ metrics are far more sensitive indicators of performance than accuracy or **ROC AUC**. Here the accuracy is about 0.9 for all three subsets, about what would be expected from a dumb classifier, while the validation set precision and recall are 100% and 21% respectively. We can also see this from the confusion matrix: all of the predicted actives are indeed active, but only 6/28 of the true actives are predicted to be active. ..
code:: ipython3 pp.plot_model_metrics(classif_pipe, plot_size=8) .. image:: ../_static/img/08_visualization_files/08_visualization_28_0.png Given the rather mediocre recall performance of our model, we would like to try training a new model that has better recall without sacrificing too much precision. One way to do this is to change the ``model_choice_score_type`` parameter to optimize the number of training epochs for a metric that balances precision and recall. `Balanced accuracy `_ and the `Matthews correlation coefficient (MCC) `_ are two such metrics often used for this purpose. We'll try out using the ``MCC``, with all other parameters left the same. .. code:: ipython3 params = { # dataset info "dataset_key" : classif_dset_file, "response_cols" : "active", "id_col": "compound_id", "smiles_col" : "base_rdkit_smiles", "result_dir": output_dir, # splitting "split_uuid": split_uuid, "previously_split": "True", "splitter": 'scaffold', "split_valid_frac": "0.15", "split_test_frac": "0.15", # featurization & training params "featurizer": "ecfp", "prediction_type": "classification", "model_type": "NN", "layer_sizes": "128,64", "dropouts": "0.3,0.3", "learning_rate": "0.0002", "max_epochs": "100", "early_stopping_patience": "20", "verbose": "True", "model_choice_score_type": "mcc", } pparams = parse.wrapper(params) mcc_pipe = mp.ModelPipeline(pparams) mcc_pipe.train_model() pp.plot_perf_vs_epoch(mcc_pipe) .. 
parsed-literal:: ['dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ee6a8fbb-c3f3-4a17-84c1-ffa0ad75a703/model/checkpoint1.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ee6a8fbb-c3f3-4a17-84c1-ffa0ad75a703/model/checkpoint2.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ee6a8fbb-c3f3-4a17-84c1-ffa0ad75a703/model/checkpoint3.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ee6a8fbb-c3f3-4a17-84c1-ffa0ad75a703/model/checkpoint4.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ee6a8fbb-c3f3-4a17-84c1-ffa0ad75a703/model/checkpoint5.pt'] dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ee6a8fbb-c3f3-4a17-84c1-ffa0ad75a703/model/checkpoint1.pt .. image:: ../_static/img/08_visualization_files/08_visualization_30_1.png Note that the maximum validation set MCC is achieved at epoch 11, while the **ROC AUC** is maximized much later at epoch 15. In general, the metric selected for ``model_choice_score_type`` has a much greater impact for classification models than for regression models. Now let's look at the performance metrics for the MCC-optimized model: .. code:: ipython3 pp.plot_model_metrics(mcc_pipe, plot_size=8) .. image:: ../_static/img/08_visualization_files/08_visualization_32_0.png We see that the recall is improved slightly, from 0.21 to about 0.30; while the precision has dropped from 1.0 to 0.6. This may be acceptable or not, depending on your situation. Do you want to minimize the cost of synthesizing and testing compounds that may turn out to be false positives? Or do you want to minimize the chance that your model will overlook a potential blockbuster drug? The numerous selection metrics supported by `AMPL `_ give you flexibility to tailor model training according to your priorities. 
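All of the metrics compared in this section can be derived from the four cells of a binary confusion matrix. The sketch below uses a hypothetical helper, ``binary_metrics``, written for illustration only (it is not an AMPL function), with the standard textbook definitions. The first model's validation counts are taken from the text above (6 of 28 actives found, no false positives); the count of 245 validation inactives is an assumption based on the validation confusion matrix printed later in this tutorial, which uses the same split.

```python
import math


def binary_metrics(tn, fp, fn, tp):
    """Standard metrics from binary confusion matrix counts.

    Hypothetical helper for illustration; undefined ratios (zero
    denominators) are reported as 0.0 by convention here.
    """
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "bal_accuracy": (recall + specificity) / 2,
        "mcc": mcc,
    }


# Validation set: 28 actives and (assumed) 245 inactives.
# "Dumb" classifier: predicts every compound to be inactive.
dumb = binary_metrics(tn=245, fp=0, fn=28, tp=0)
# First ROC AUC-selected model: 6 of 28 actives found, no false positives.
first_model = binary_metrics(tn=245, fp=0, fn=22, tp=6)

for name, metrics in [("dumb", dumb), ("first model", first_model)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The accuracies of the two classifiers are close, while recall, balanced accuracy and MCC separate them clearly. This is the reason metrics such as MCC are better suited than accuracy as a selection criterion for imbalanced datasets.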
As an aside, `AMPL `_ provides another option for dealing with unbalanced classification datasets: the ``weight_transform_type`` parameter. Setting this parameter to ``balancing`` changes the way the cost function minimized during training is calculated, so that compounds belonging to the minority class are given higher weight. This modification eliminates the incentive for classifiers to always predict the majority class. This parameter can be combined with the ``model_choice_score_type`` parameter to yield different effects on the precision and recall metrics: .. code:: ipython3 params = { # dataset info "dataset_key" : classif_dset_file, "response_cols" : "active", "id_col": "compound_id", "smiles_col" : "base_rdkit_smiles", "result_dir": output_dir, # splitting "split_uuid": split_uuid, "previously_split": "True", "splitter": 'scaffold', "split_valid_frac": "0.15", "split_test_frac": "0.15", # featurization & training params "featurizer": "ecfp", "prediction_type": "classification", "model_type": "NN", "layer_sizes": "128,64", "dropouts": "0.3,0.3", "learning_rate": "0.0002", "max_epochs": "100", "early_stopping_patience": "20", "verbose": "True", "model_choice_score_type": "mcc", "weight_transform_type": "balancing", } pparams = parse.wrapper(params) mcc_wts_pipe = mp.ModelPipeline(pparams) mcc_wts_pipe.train_model() pp.plot_model_metrics(mcc_wts_pipe, plot_size=8) ..
parsed-literal:: ['dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ffe7fda2-5c4e-4e7d-9fef-8bb3e4729f92/model/checkpoint1.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ffe7fda2-5c4e-4e7d-9fef-8bb3e4729f92/model/checkpoint2.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ffe7fda2-5c4e-4e7d-9fef-8bb3e4729f92/model/checkpoint3.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ffe7fda2-5c4e-4e7d-9fef-8bb3e4729f92/model/checkpoint4.pt', 'dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ffe7fda2-5c4e-4e7d-9fef-8bb3e4729f92/model/checkpoint5.pt'] dataset/SLC6A3_models/SLC6A3_classif_pKi_ge_8/NN_ecfp_scaffold_classification/ffe7fda2-5c4e-4e7d-9fef-8bb3e4729f92/model/checkpoint1.pt .. image:: ../_static/img/08_visualization_files/08_visualization_34_1.png The new model trained using both parameters has even better recall, at the cost of a small reduction in precision. Incidentally, the detailed metrics underlying the plots above can be obtained as a nested dictionary using the function ``get_metrics_from_model_pipeline``: .. code:: ipython3 metrics_dict = pp.get_metrics_from_model_pipeline(mcc_wts_pipe) print(json.dumps(metrics_dict, indent=4)) .. 
parsed-literal:: { "active": { "train": { "roc_auc": 0.9839738357222929, "prc_auc": 0.8866116456224803, "accuracy": 0.9269442262372348, "precision": 0.6442687747035574, "recall": 0.9819277108433735, "bal_accuracy": 0.9503134489176217, "npv": 0.9970588235294118, "cross_entropy": 0.17187009506671735, "kappa": 0.7365568068786424, "MCC": 0.759997950847689, "confusion_matrix": [ [ [ 1017, 90 ], [ 3, 163 ] ] ] }, "valid": { "roc_auc": 0.8443148688046648, "prc_auc": 0.48576226827635516, "accuracy": 0.8827838827838828, "precision": 0.4411764705882353, "recall": 0.5357142857142857, "bal_accuracy": 0.7290816326530611, "npv": 0.9456066945606695, "cross_entropy": 0.32558061545729045, "kappa": 0.4184529356943151, "MCC": 0.4209629887651163, "confusion_matrix": [ [ [ 226, 19 ], [ 13, 15 ] ] ] }, "test": { "roc_auc": 0.8563411078717201, "prc_auc": 0.5286311317357362, "accuracy": 0.8717948717948718, "precision": 0.41025641025641024, "recall": 0.5714285714285714, "bal_accuracy": 0.7387755102040816, "npv": 0.9487179487179487, "cross_entropy": 0.2981516587921453, "kappa": 0.4067796610169492, "MCC": 0.41403933560541256, "confusion_matrix": [ [ [ 222, 23 ], [ 12, 16 ] ] ] } } } Plotting ROC and Precision-Recall Curves **************************************** A `receiver operating characteristic `_ curve is a commonly used plot for assessing the performance of a binary classifier. It is generated from lists of true classes and predicted probabilities for the positive class by varying a threshold on the class probability, classifying as positive the compounds with probability greater than that threshold, and computing the fractions of true and false positives (the **true positive rate (TPR)** and **false positive rate (FPR)**). The ROC curve plots the resulting TPRs against the corresponding FPRs; the **ROC AUC** is simply the area under the ROC curve. The ROC curve for a completely random classifier will be close to a diagonal line running from (0,0) to (1,1), with AUC = 0.5. 
A perfect classifier has a ROC curve that follows the Y axis and then runs horizontally across the top of the plot. `AMPL `_ provides the function ``plot_ROC_curve``, which takes a ``ModelPipeline`` object as its main argument; it plots separate curves for the training, validation and test sets on the same axes.

.. code:: ipython3

    pp.plot_ROC_curve(mcc_wts_pipe)

.. image:: ../_static/img/08_visualization_files/08_visualization_38_0.png

A `precision-recall curve `_ is generated using a similar thresholding process, except that the metrics computed and plotted for each threshold are the precision and recall. Although the precision generally decreases with increasing recall, it usually doesn't decrease monotonically, especially for imbalanced datasets where the validation and test sets have very small numbers of active compounds. `AMPL `_ provides the function ``plot_prec_recall_curve`` to draw precision vs recall curves for the training, validation and test sets on one plot. The area under the curve, also known as the **average precision (AP)**, is computed as well and shown in the figure legend.

.. code:: ipython3

    pp.plot_prec_recall_curve(mcc_wts_pipe)

.. image:: ../_static/img/08_visualization_files/08_visualization_40_0.png

Conclusion
**********

This concludes our series of tutorials highlighting the core functions of `AMPL `_. We hope that completing these tutorials will provide you with the essential skills to train, evaluate and apply your own models for predicting chemical properties. In future versions of `AMPL `_, we will release specialized tutorials covering some of `AMPL `_'s more advanced capabilities, such as multitask modeling, transfer learning, feature importance analysis and more.

If you have specific feedback about a tutorial, please complete the `AMPL Tutorial Evaluation `_.