Eight quick tips for biologically and medically informed machine learning
Machine learning has become a powerful tool for computational analysis in the biomedical sciences, with its effectiveness significantly enhanced by integrating domain-specific knowledge. This integration has given rise to informed machine learning, in contrast…
## Machine Learning in the Biomedical Sciences: Enhanced by Domain Knowledge

### Understanding Informed Machine Learning

Machine learning has revolutionized the biomedical sciences. Informed machine learning techniques integrate domain-specific knowledge to enhance the accuracy, explainability, and reliability of models. Informed learning approaches fall into three categories: pre-, in-, and post-processing.

### Pre-Processing Strategies

Pre-processing plays a crucial role in leveraging domain knowledge to inform machine learning models:

- **Modifying Input Data**: Cleaning, engineering, selecting, and reducing inputs ensures data quality and alignment with domain knowledge.
- **Serial or Parallel Informed Learning**: Knowledge-based models, possibly encoding only partial knowledge, make predictions in parallel or in series with zero-knowledge models.
- **Modifying the Data Set**: Selecting and enriching data, and designing experiments, ensures that the phenomena under study are accurately represented.
- **Guiding Machine Learning Algorithm Selection**: Domain knowledge guides the choice of the algorithm's functional form, explainability level, and hyperparameters.

### In-Processing Techniques

In-processing techniques embed domain knowledge directly into the model training process:

- **Regularization**: Penalizing deviations from known biological laws helps maintain model fidelity.
- **Surrogate Models**: Surrogate models mimic simulator outputs at lower computational cost while capturing the underlying physical relationships.
- **Reasoning Methods**: Inductive logic programming, neuro-symbolic approaches, and constrained models enhance practical results.

### Post-Processing Methods

Post-processing uses domain knowledge to align the output of zero-knowledge models with what is known about the domain:

- **Enforcing Constraints**: Ensuring that model predictions adhere to specific characteristics or hierarchies present in the domain.
- **Adjusting Predictions**: Adapting predictions to align with established practices ensures that they can be implemented.
- **Explainability Trade-Offs**: Balancing interpretability and accuracy through feature importance analysis and local explainability techniques.
- **Full-Knowledge Model Integration**: Using mathematically well-encoded full-knowledge models as inputs to zero-knowledge models for enhanced reliability.

### Best Practices for Informed Machine Learning in Biomedical Sciences

- **Involve Biomedical Experts**: Collaborate with experts to define the scientific question and interpret the results.
- **Evaluate Different Approaches**: Compare informed, uninformed, and knowledge-based approaches for comprehensive insights.
- **Follow Open Science Practices**: Use open source software, release data openly, and publish results openly to ensure reproducibility and impact.

### Conclusion

Informed machine learning techniques have transformed biomedical studies by leveraging domain knowledge to improve model accuracy, explainability, and reliability. Researchers can use the proposed guidelines to enhance the robustness, explainability, and dependability of their research.
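
### Illustrative Sketch

Two of the ideas summarized above, regularization toward prior knowledge (in-processing) and constraint enforcement on outputs (post-processing), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the source: the function names, the single-slope model, the prior slope, and the parent/child class hierarchy are all hypothetical.

```python
def fit_slope_with_prior(xs, ys, slope_prior, lam):
    """In-processing sketch: fit y = w * x with a knowledge-based penalty.

    Minimizes sum((w * x - y) ** 2) + lam * (w - slope_prior) ** 2.
    slope_prior is a hypothetical slope suggested by domain knowledge;
    lam sets how strongly the fit is pulled toward it (0 = ordinary fit).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = sxx + lam
    if denom == 0:
        raise ValueError("need non-zero inputs or a positive lam")
    # Closed-form minimizer obtained by setting the derivative to zero.
    return (sxy + lam * slope_prior) / denom


def enforce_hierarchy(p_parent, p_child):
    """Post-processing sketch: keep predicted probabilities consistent.

    Assumes a hypothetical two-level hierarchy in which the child class
    implies the parent class, so the child probability may not exceed
    the parent probability. Both values are first clipped to [0, 1].
    """
    parent = min(max(p_parent, 0.0), 1.0)
    child = min(max(p_child, 0.0), parent)
    return parent, child


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(fit_slope_with_prior(xs, ys, slope_prior=1.0, lam=0.0))
    print(fit_slope_with_prior(xs, ys, slope_prior=1.0, lam=14.0))
    print(enforce_hierarchy(0.6, 0.8))
```

Larger values of `lam` move the fitted slope away from the purely data-driven estimate and toward the prior, which is the trade-off an in-processing penalty introduces. The post-processing step leaves the trained model untouched and only repairs outputs that contradict the assumed hierarchy.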