Machine Learning and Deep Learning Tools

The “AI and Simulations Platform” integrates cutting-edge solutions to support biological investigations aimed at understanding cellular mechanisms. The Laboratory of Data Engineering (LADE) of Area Science Park and the MIVIA lab of the University of Salerno develop advanced research tools, based on machine learning and deep learning, to study and analyse the data produced in the experimental facilities for investigating infectious diseases.

DPCfam (Density Peaks Clustering families) Algorithm and Databases

DPCfam is a new unsupervised procedure that uses alignments and Density Peak Clustering to automatically classify homologous protein regions.
Applied to protein sequences, it assists in manual annotation (e.g. domain discovery and boosting of clan membership) and can be used as a stand-alone tool for unsupervised classification of sparsely annotated protein datasets, such as those from metagenomics studies. It has recently been adapted to structures predicted in silico with AlphaFold (AF) for structural domain discovery.
Results on relevant biological cases are publicly available and can be accessed through dedicated web services.

The tool is developed and maintained by LADE at Area Science Park.
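
To illustrate the clustering idea that DPCfam builds on, the following is a minimal sketch of generic Density Peak Clustering on a precomputed distance matrix. It is not the DPCfam implementation: the function name `density_peak_clustering`, the cutoff `d_c`, the fixed number of clusters and the toy one-dimensional data are all assumptions made for illustration, whereas DPCfam derives its distances from alignments of protein regions.

```python
from typing import List, Sequence


def density_peak_clustering(
    dist: Sequence[Sequence[float]], d_c: float, n_clusters: int
) -> List[int]:
    """Generic Density Peak Clustering (hypothetical helper, not DPCfam code).

    dist is a symmetric distance matrix, d_c the density cutoff and
    n_clusters the number of peaks to keep. Returns one label per point.
    """
    n = len(dist)
    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points by decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # index of that nearest higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets the largest distance in its row.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Peaks are the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label
    # Every other point inherits the label of its higher-density neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Toy example: two well-separated groups of points on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in points] for a in points]
labels = density_peak_clustering(dist, d_c=0.15, n_clusters=2)
print(labels)
```

In practice the number of peaks is usually chosen by inspecting the density and distance values rather than fixed in advance; it is fixed here only to keep the sketch short.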

Prediction of Protein Stability Changes upon Mutation with Fine-tuned MSA Transformer

This tool is a state-of-the-art solution for predicting changes in protein thermodynamic stability resulting from single amino acid mutations. It leverages the MSA Transformer architecture, which exploits the evolutionary information encoded in families of aligned homologous sequences, and stands out for its performance and efficiency. The algorithm is designed with a high degree of flexibility, so it can be easily adapted to various downstream tasks.

The tool is developed and maintained by LADE at Area Science Park.

Prediction of Protein Multiple Conformations with MSA Transformer Representations Clustering and AlphaFold

This method generates heterogeneous ensembles of AlphaFold (AF) structure predictions for proteins whose biological function depends on multiple conformational substates. Building on a similarity measure based on protein language model (MSA Transformer) representations, the algorithm clusters a Multiple Sequence Alignment into putative sub-families, leading AF to predict different conformational states. Implemented in a pipeline adapted for the PRP@CERIC computing infrastructure, it can be used for screenings that identify proteins presenting at least one meta-stable state, or combined with other pipelines for further downstream analyses.

The tool is developed and maintained by LADE at Area Science Park.
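
The sub-family splitting step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual pipeline: the per-sequence `embeddings` are hypothetical stand-ins for MSA Transformer representations, cosine similarity and the greedy threshold clustering are swapped-in choices (the clustering algorithm used by the tool is not specified here), and `split_msa` is an invented helper name. Each resulting sub-alignment keeps the query sequence first, so that it could in principle be passed to AF as a separate input.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (sequence name, aligned sequence)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_msa(
    msa: List[Record], embeddings: List[Sequence[float]], threshold: float
) -> List[List[Record]]:
    """Split an MSA into sub-families by embedding similarity (assumed scheme).

    msa[0] is the query. Each remaining sequence joins the most similar
    existing cluster leader if the similarity reaches the threshold,
    otherwise it starts a new cluster.
    """
    leaders: List[int] = []         # first member of each cluster
    clusters: List[List[int]] = []  # member indices per cluster
    for i in range(1, len(msa)):
        best, best_sim = -1, threshold
        for c, lead in enumerate(leaders):
            sim = cosine(embeddings[i], embeddings[lead])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best == -1:
            leaders.append(i)
            clusters.append([i])
        else:
            clusters[best].append(i)
    # Prepend the query so every sub-alignment describes the same target.
    return [[msa[0]] + [msa[i] for i in members] for members in clusters]


# Toy alignment with made-up two-dimensional embeddings.
msa = [("query", "MKV"), ("a", "MKV"), ("b", "MKI"), ("c", "MRV"), ("d", "MRL")]
embeddings = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [0.1, 1.0]]
for sub in split_msa(msa, embeddings, threshold=0.9):
    print([name for name, _ in sub])
```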

Image Analysis for Cell Classification and Other Tools

The MIVIA lab at the University of Salerno is working on the definition of new multi-task networks and Graph Neural Networks for image analysis and cell classification, as well as on the use of deep neural networks and Large Language Models (LLMs) that exploit sequence and/or structure to predict protein function.
If your research requires similar or bespoke tools, our team welcomes collaborations and contract development opportunities. For further details, please direct your inquiries to us.