GPS 2.0, a Tool to Predict Kinase-specific Phosphorylation Sites in Hierarchy

Molecular & Cellular Proteomics. 2008;7(9):1598-1608.

GPS 2.0

Identification of protein phosphorylation sites with their cognate protein kinases (PKs) is a key step in delineating the molecular dynamics and plasticity underlying a variety of cellular processes. In this work, we adopted a well-established rule to classify PKs into a hierarchical structure with four levels: group, family, subfamily and single PK. In addition, we developed a simple approach to estimate the theoretically maximal false positive rates. The online service and local packages of GPS (Group-based Prediction System) 2.0 were implemented in Java with a modified version of the Group-based Phosphorylation Scoring algorithm. As the first stand-alone software for predicting phosphorylation, GPS 2.0 can predict kinase-specific phosphorylation sites for 408 human PKs in hierarchy. A large-scale prediction of more than 13,000 mammalian phosphorylation sites by GPS 2.0 showed good performance and high accuracy. Thus, GPS 2.0 is a useful tool for predicting protein phosphorylation sites and their cognate kinases, and it is freely available online.
GPS 2.0 is freely available at http://gps.biocuckoo.org.
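
A rough sketch of group-based scoring may help readers unfamiliar with the GPS family. As commonly described, a candidate peptide is compared with known modified peptides through an amino acid substitution matrix, and the similarities are averaged over the known sites. In GPS 2.0 we assume the known sites are pooled at one level of the kinase hierarchy (group, family, subfamily or single PK). The sketch uses a toy substitution function (+2 for identical residues, -1 otherwise) as a stand-in for a real matrix such as BLOSUM62. The 7-residue windows, the floor at zero and all function names are illustrative assumptions, not the GPS 2.0 Java implementation.

```python
"""Toy sketch of group-based peptide scoring; not the GPS 2.0 implementation."""


def substitution(a: str, b: str) -> int:
    # Hypothetical stand-in for a substitution matrix such as BLOSUM62:
    # +2 for identical residues, -1 for any mismatch.
    return 2 if a == b else -1


def peptide_similarity(p: str, q: str) -> int:
    """Sum position-wise substitution scores; negative totals are floored at 0."""
    if len(p) != len(q):
        raise ValueError("peptides must have equal length")
    total = sum(substitution(a, b) for a, b in zip(p, q))
    return max(total, 0)


def group_score(candidate: str, known_sites: list[str]) -> float:
    """Average similarity of a candidate window to the known sites of one kinase cluster."""
    if not known_sites:
        return 0.0
    return sum(peptide_similarity(candidate, s) for s in known_sites) / len(known_sites)


# Hypothetical 7-residue windows centred on the phosphorylated residue (position 4).
known = ["RRASLPA", "RKASLGA", "RRGSLAE"]
print(group_score("RRASVPA", known))  # 6.0
```

You could replace `substitution` with a full BLOSUM62 lookup and add a cut-off on `group_score` to get a simple classifier. GPS 2.0's own scoring refinements and threshold selection are not reproduced here.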

CSS-Palm 2.0: an updated software for palmitoylation sites prediction

Protein Engineering, Design and Selection. 2008;21(11):639-644.

CSS-Palm 2.0

Protein palmitoylation is an essential post-translational lipid modification of proteins that reversibly orchestrates a variety of cellular processes. In this work, we updated our previous CSS-Palm to version 2.0. An updated clustering and scoring strategy (CSS) algorithm was employed, with substantial improvement over the previous version. Leave-one-out validation and 4-, 6-, 8- and 10-fold cross-validations were adopted to evaluate the prediction performance of CSS-Palm 2.0. In addition, a new data set not included in training was used to test the robustness of CSS-Palm 2.0. As an application, we performed a small-scale annotation of palmitoylated proteins in budding yeast. The online service and local packages of CSS-Palm 2.0 are freely available at: http://csspalm.biocuckoo.org
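
The validation schemes named above differ only in how the sites are partitioned. k-fold cross-validation splits the data into k subsets, trains on k-1 of them and tests on the remaining one, rotating through all k. Leave-one-out is the special case where k equals the number of sites. The sketch below shows only the index partitioning. It assumes a seeded shuffle followed by round-robin assignment, which is an illustrative choice because the partitioning used for CSS-Palm 2.0 is not described here.

```python
import random


def k_fold_indices(n_items: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Return (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)          # reproducible shuffle
    folds = [order[i::k] for i in range(k)]     # round-robin split into k folds
    splits = []
    for i, test in enumerate(folds):
        # Training set = every index not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


# Leave-one-out validation is simply k == n_items.
for train, test in k_fold_indices(6, 3):
    print(len(train), len(test))  # prints "4 2" for each of the three folds
```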

DOG 1.0: illustrator of protein domain structures

Cell Research. 2009;19:271–273.

DOG 1.0

Software that can illustrate user-designated protein domain structures would greatly help experimental biologists communicate their research results. In this work, we present DOG (Domain Graph, version 1.0), a novel software package that lets experimentalists prepare publication-quality figures of protein domain structures. The scale of each protein domain and the position of each functional motif or site are precisely defined. DOG 1.0 was written in JAVA 1.5 (J2SE 5.0) and packaged with Install4j 4.0.8. We developed packages for three major operating systems (OS): Windows, Unix/Linux and Mac. For Windows and Linux, a Java Runtime Environment (JRE) 6 package from Sun Microsystems is also included. The DOG 1.0 software is freely available from: http://dog.biocuckoo.org.
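
Defining domain scale and motif position precisely comes down to a linear mapping from residue coordinates to drawing coordinates. The sketch below assumes a hypothetical canvas of fixed width with equal left and right margins, where each residue occupies an equal slice of the drawable width. DOG itself is written in Java, and its rendering code is not shown in the source, so the class and parameter names here are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Canvas:
    protein_length: int       # number of residues in the protein
    width_px: float = 800.0   # hypothetical figure width
    margin_px: float = 20.0   # hypothetical left/right margin

    @property
    def px_per_residue(self) -> float:
        return (self.width_px - 2 * self.margin_px) / self.protein_length

    def domain_box(self, start: int, end: int) -> tuple[float, float]:
        """Left edge and width (px) of a domain spanning residues start..end (1-based, inclusive)."""
        if not 1 <= start <= end <= self.protein_length:
            raise ValueError("invalid domain boundaries")
        left = self.margin_px + (start - 1) * self.px_per_residue
        return left, (end - start + 1) * self.px_per_residue

    def site_x(self, position: int) -> float:
        """Horizontal centre (px) of one residue, e.g. a modification site."""
        if not 1 <= position <= self.protein_length:
            raise ValueError("position outside the protein")
        return self.margin_px + (position - 0.5) * self.px_per_residue


canvas = Canvas(protein_length=100, width_px=240.0)
print(canvas.domain_box(11, 60))  # (40.0, 100.0)
print(canvas.site_x(5))           # 29.0
```

Giving each residue its own slice keeps adjacent domains from overlapping by one residue. It also places a single-residue site at the centre of its slice rather than on a boundary.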

Systematic study of protein sumoylation: Development of a site-specific predictor of SUMOsp 2.0

Proteomics. 2009;9(12):3409-3412.

SUMOsp 2.0

Protein sumoylation is an important reversible post-translational modification that orchestrates a variety of cellular processes. In this work, we developed SUMOsp 2.0, an accurate prediction program based on an improved group-based phosphorylation scoring algorithm. Our analysis demonstrated that SUMOsp 2.0 has greater prediction accuracy than SUMOsp 1.0 and other existing tools, with a sensitivity of 88.17% and a specificity of 92.69% under the medium threshold. Several large-scale experiments have identified lists of potential sumoylated substrates in Saccharomyces cerevisiae and Homo sapiens; however, the exact sumoylation sites in most of these proteins remain elusive. We used SUMOsp 2.0 to predict potential sumoylation sites in these proteins, providing a useful resource for researchers and a starting point for further mechanistic studies of sumoylation in cellular plasticity and dynamics. The online service and local packages of SUMOsp 2.0 are freely available at: http://sumosp.biocuckoo.org

MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore

Nucleic Acids Research. 2010;38:D155-D160.

MiCroKit 3.0

During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein supercomplexes in three distinct regions: the centrosome/spindle pole, the kinetochore/centromere and the midbody/cleavage furrow/phragmoplast/bud neck. These supercomplexes faithfully modulate the cell division process. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize to the midbody, centrosome and/or kinetochore. From the scientific literature, we collected experimentally verified microkit proteins with unambiguous supporting evidence for their subcellular localization under fluorescence microscopy. The current version, MiCroKit 3.0, provides detailed information for 1489 microkit proteins from seven model organisms: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Orthologous information is also provided for these microkit proteins, which could be a useful resource for further experimental identification.

PhosSNP for Systematic Analysis of Genetic Polymorphisms That Influence Protein Phosphorylation

Molecular & Cellular Proteomics. 2010;9(4):623-634.

PhosSNP

We are entering the era of personalized genomics, as breakthroughs in sequencing technology have made it possible to sequence or genotype an individual efficiently and accurately. Preliminary results from HapMap and similar projects have revealed tremendous genetic variation among world populations and among individuals. It is also generally believed that genetic variation is the main cause of differing susceptibility to certain diseases and differing responses to therapeutic treatments. In this work, we used our in-house kinase-specific phosphorylation site predictor (GPS 2.0) to computationally detect potential phosSNPs, that is, nsSNPs that may influence protein phosphorylation. We found that ∼70% of the reported non-synonymous SNPs (nsSNPs) are potential phosSNPs. Finally, all phosSNPs were integrated into the PhosSNP 1.0 database, which was implemented in JAVA 1.5 (J2SE 5.0). The PhosSNP 1.0 database is freely available for academic researchers at: http://phossnp.biocuckoo.org
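
Deciding whether an nsSNP is a potential phosSNP amounts to predicting sites on the reference and variant sequences and comparing the results. In the sketch below, the predictor is a deliberately naive, hypothetical stand-in that reports any S or T followed by P (a proline-directed pattern). It exists only to make the example runnable. PhosSNP 1.0 used GPS 2.0 predictions, and the function names here are illustrative.

```python
import re


def toy_predict_sites(seq: str) -> set[int]:
    """Hypothetical stand-in predictor: 1-based positions of S/T followed by P."""
    return {m.start() + 1 for m in re.finditer(r"(?=[ST]P)", seq)}


def apply_nssnp(seq: str, position: int, ref: str, alt: str) -> str:
    """Apply a single amino acid substitution at a 1-based position."""
    if seq[position - 1] != ref:
        raise ValueError(f"reference residue mismatch at position {position}")
    return seq[: position - 1] + alt + seq[position:]


def phossnp_effect(seq: str, position: int, ref: str, alt: str) -> dict[str, set[int]]:
    """Predicted sites lost or gained when the variant is applied."""
    before = toy_predict_sites(seq)
    after = toy_predict_sites(apply_nssnp(seq, position, ref, alt))
    return {"lost": before - after, "gained": after - before}


protein = "MKSPLTAPRE"
print(phossnp_effect(protein, 4, "P", "A"))  # {'lost': {3}, 'gained': set()}
print(phossnp_effect(protein, 7, "A", "P"))  # {'lost': set(), 'gained': {6}}
```

An nsSNP whose "lost" or "gained" set is non-empty would count as a potential phosSNP under this toy criterion. With a real kinase-specific predictor, a change in the predicted kinase for a site could be counted as well.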

GPS-SNO: Computational Prediction of Protein S-Nitrosylation Sites with a Modified GPS Algorithm

PLoS One. 2010;5(6):e11290.

GPS-SNO

As one of the most important and ubiquitous post-translational modifications (PTMs) of proteins, S-nitrosylation plays important roles in a variety of biological processes, including the regulation of cellular dynamics and plasticity. Identification of S-nitrosylated substrates with their exact sites is crucial for understanding the molecular mechanisms of S-nitrosylation. In this work, we developed GPS-SNO 1.0, a novel software package for the prediction of S-nitrosylation sites. In comparison, the prediction performance of the GPS 3.0 algorithm was better than that of other methods, with an accuracy of 75.80%, a sensitivity of 53.57% and a specificity of 80.14%. As an application of GPS-SNO 1.0, we predicted putative S-nitrosylation sites for hundreds of potentially S-nitrosylated substrates whose exact S-nitrosylation sites had not been experimentally determined. The online service and local packages of GPS-SNO were implemented in JAVA and are freely available at: http://sno.biocuckoo.org.
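
The accuracy, sensitivity and specificity reported for GPS-SNO (and for GPS-YNO2 and GPS-CCD on this page) are the standard confusion-matrix ratios:

- Sn = TP/(TP+FN)
- Sp = TN/(TN+FP)
- Ac = (TP+TN)/(TP+TN+FP+FN)

The counts in the sketch below are hypothetical and were chosen only to illustrate the arithmetic. They are not data from any of these papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # true sites predicted as sites
    fn: int  # true sites missed
    tn: int  # non-sites correctly rejected
    fp: int  # non-sites predicted as sites

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# Hypothetical counts, not results from the paper.
c = Confusion(tp=30, fn=20, tn=400, fp=100)
print(f"Sn={c.sensitivity:.2%} Sp={c.specificity:.2%} Ac={c.accuracy:.2%}")
# Sn=60.00% Sp=80.00% Ac=78.18%
```

Negative sites usually far outnumber positive ones, so accuracy tracks specificity closely. This is why a predictor can report high accuracy alongside modest sensitivity.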

A Summary of Computational Resources for Protein Phosphorylation

Current Protein & Peptide Science. 2010;11(6):485-496.

Protein Phosphorylation

Protein phosphorylation is the most ubiquitous post-translational modification (PTM) and plays important roles in most biological processes. Identification of site-specific phosphorylated substrates is fundamental for understanding the molecular mechanisms of phosphorylation. Besides experimental approaches, computational prediction of potential candidates has also attracted great attention for its convenience, speed and low cost. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation, including phosphorylation databases, prediction of non-specific or organism-specific phosphorylation sites, prediction of kinase-specific phosphorylation sites or phospho-binding motifs, and other tools. The latest compendium of computational resources for protein phosphorylation is available at: http://gps.biocuckoo.org/links.php

CPLA 1.0: an integrated database of protein lysine acetylation

Nucleic Acids Research. 2011;39:D1029-D1034.

CPLA 1.0

As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation is known for regulating transcription through the modification of histones. Recent studies have found that lysine acetylation targets a broad range of substrates and plays an especially essential role in cellular metabolic regulation. In this work, we present the compendium of protein lysine acetylation (CPLA) database of lysine-acetylated substrates with their sites. The online service of the CPLA database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org

GPS 2.1: enhanced prediction of kinase-specific phosphorylation sites with an algorithm of motif length selection

Protein Engineering, Design and Selection. 2011;24(3):255-260.

GPS 2.1

As the most important post-translational modification of proteins, phosphorylation plays essential roles in all aspects of biological processes. Besides experimental approaches, computational prediction of phosphorylated proteins with their kinase-specific phosphorylation sites has also emerged as a popular strategy for its low cost, speed and convenience. In this work, we developed GPS 2.1 (Group-based Prediction System), a kinase-specific phosphorylation site predictor, with a novel but simple approach of motif length selection (MLS). With this approach, the robustness of the prediction system was greatly improved. All algorithms from earlier GPS versions were retained and integrated into GPS 2.1. The online service and local packages of GPS 2.1 were implemented in JAVA 1.5 (J2SE 5.0) and are freely available for academic research at: http://gps.biocuckoo.org

GPS-YNO2: computational prediction of tyrosine nitration sites in proteins

Mol. BioSyst. 2011;7:1197-1204.

GPS-YNO2

The last decade has witnessed rapid progress in the identification of protein tyrosine nitration (PTN), an essential and ubiquitous post-translational modification (PTM). PTN plays a variety of important roles in both physiological and pathological processes, such as the immune response, cell death, aging and neurodegeneration. Identification of site-specific nitrated substrates is fundamental for understanding the molecular mechanisms and biological functions of PTN. In contrast with labor-intensive and time-consuming experimental approaches, here we report the development of a novel software package, GPS-YNO2, to predict PTN sites. In leave-one-out validation, the software achieved a promising accuracy of 76.51%, a sensitivity of 50.09% and a specificity of 80.18%. We also made a statistical functional comparison with S-nitrosylation, a reversible modification that likewise depends on nitric oxide (NO). This comparison showed that PTN preferentially targets certain fundamental biological processes and functions. Finally, the online service and local packages of GPS-YNO2 1.0 were implemented in JAVA and are freely available at: http://yno2.biocuckoo.org

GPS-CCD: A Novel Computational Program for the Prediction of Calpain Cleavage Sites

PLoS One. 2011;6(4):e19001.

GPS-CCD

As one of the most essential post-translational modifications (PTMs) of proteins, proteolysis, especially calpain-mediated cleavage, plays an important role in many biological processes, including cell death/apoptosis, cytoskeletal remodeling and the cell cycle. Experimental identification of calpain targets with bona fide cleavage sites is fundamental for dissecting the molecular mechanisms and biological roles of calpain cleavage. In contrast to time-consuming and labor-intensive experimental approaches, computational prediction of calpain cleavage sites can provide useful information for further experimental investigation more cheaply and readily. In this work, we constructed GPS-CCD (Calpain Cleavage Detector), a novel software package for the prediction of calpain cleavage sites, with an accuracy of 89.98%, a sensitivity of 60.87% and a specificity of 90.07%. With this software, we annotated potential calpain cleavage sites for hundreds of calpain substrates whose exact cleavage sites had not previously been determined. The online service and local packages of GPS-CCD 1.0 were implemented in JAVA and are freely available at: http://ccd.biocuckoo.org/.

GPS-PUP: computational prediction of pupylation sites in prokaryotic proteins

Mol. BioSyst. 2011;7:2737-2740.

GPS-PUP

Recent experiments revealed the prokaryotic ubiquitin-like protein (PUP) to be a signal for the selective degradation of proteins in Mycobacterium tuberculosis (Mtb). Pupylation, the covalent conjugation of PUP, is a critical post-translational modification (PTM) conserved in actinomycetes. Here, we designed GPS-PUP, a novel computational tool for the prediction of pupylation sites, which showed promising performance. From small-scale and large-scale studies, we collected 238 potentially pupylated substrates whose exact pupylation sites had not yet been determined. As an example application, we predicted at least one potential pupylation site in ∼85% of these proteins. Furthermore, through functional analysis, we observed that pupylation can target various substrates to regulate a broad array of biological processes, such as the stress response, sulfate and proton transport, and metabolism. GPS-PUP 1.0 is freely available at: http://pup.biocuckoo.org

Computational Analysis of Phosphoproteomics: Progresses and Perspectives

Current Protein & Peptide Science. 2011;12(7):591-601.

Phosphoproteomics

Phosphorylation is one of the most essential post-translational modifications (PTMs) of proteins. It regulates a variety of cellular signaling pathways and at least partially determines biological diversity. Recent progress in phosphoproteomics has identified more than 100,000 phosphorylation sites, and this number is expected to exceed one million in the next decade. Extracting useful information from this flood of phosphoproteomic data has therefore emerged as a great challenge. In this review, we summarize the leading edge of computational analysis of phosphoproteomics, including discovery of phosphorylation motifs from phosphoproteomic data, systematic modeling of phosphorylation networks, analysis of genetic variation that influences phosphorylation, and phosphorylation evolution. Based on existing knowledge, we also offer several perspectives for further studies. We believe that integrating experimental and computational analyses will propel phosphoproteomics research into a new phase.

Systematic Analysis of Protein Phosphorylation Networks From Phosphoproteomic Data

Molecular & Cellular Proteomics. 2012;11(10):1070-1083.

iGPS

In eukaryotes, hundreds of protein kinases (PKs) specifically and precisely modify thousands of substrates at specific amino acid residues. In doing so, they faithfully orchestrate numerous biological processes and reversibly determine cellular dynamics and plasticity. Although over 100,000 phosphorylation sites (p-sites) have been experimentally identified in phosphoproteomic studies, the regulatory PKs for most of these sites remain to be characterized. Here, we present iGPS, a novel software package for the prediction of in vivo site-specific kinase-substrate relations, mainly from phosphoproteomic data. In critical evaluations and comparisons, iGPS performed satisfactorily and better than other existing tools. Based on the prediction results, we modeled protein phosphorylation networks and observed that eukaryotic phospho-regulation is poorly conserved at the site and substrate levels. This work contributes to the understanding of phosphorylation mechanisms at the systems level. It also provides a powerful methodology for the general analysis of in vivo post-translational modifications regulating sub-proteomes.

Systematic analysis of the Plk-mediated phosphoregulation in eukaryotes

Briefings in Bioinformatics. 2013;14(3):344-360.

Plk-mediated phosphoregulation

Substantial evidence has confirmed that Polo-like kinases (Plks) play crucial roles in a variety of cellular processes via phosphorylation-mediated signal transduction. Identification of Plk phospho-binding proteins and phosphorylation substrates is fundamental for elucidating the molecular mechanisms of Plks. Here, we present an integrative approach for the analysis of Plk-specific phospho-binding sites and phosphorylation sites (p-sites) in proteins. From currently available phosphoproteomic data, we predicted tens of thousands of potential Plk phospho-binding sites and phosphorylation sites in eukaryotes. Furthermore, statistical analysis suggested that Plk phospho-binding proteins are more closely implicated in mitosis than Plk phosphorylation substrates are. Additional computational analysis, together with in vitro and in vivo experimental assays, demonstrated that human Mis18B is a novel interacting partner of Plk1. These analyses also identified pT14 and pS48 of Mis18B as phospho-binding sites. Taken together, this systematic analysis provides a global landscape of the complexity and diversity of potential Plk-mediated phosphoregulation, and the prediction results can be helpful for further experimental investigation.

GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs

Nucleic Acids Research. 2014;42:W325-W330.

GPS-SUMO 2.0

Small ubiquitin-like modifiers (SUMOs) regulate a variety of cellular processes through two distinct mechanisms: covalent sumoylation and noncovalent SUMO interaction. The complexity of SUMO regulation has greatly hampered the large-scale identification of SUMO substrates and interaction partners at the proteome-wide level. In this work, we developed a new tool, GPS-SUMO, for the prediction of both sumoylation sites and SUMO-interaction motifs (SIMs) in proteins. To achieve accurate performance, we applied a new-generation group-based prediction system (GPS) algorithm integrated with a Particle Swarm Optimization approach. In critical evaluations and comparisons, GPS-SUMO was substantially superior to other existing tools and methods. With the help of GPS-SUMO, it is now possible to further investigate the relationship between sumoylation and SUMO interaction. A web service of GPS-SUMO was implemented in PHP + JavaScript and is freely available at http://sumosp.biocuckoo.org.
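
Particle Swarm Optimization (PSO) is used here to tune the GPS model. For readers unfamiliar with it, the sketch below is a generic, textbook global-best PSO that minimises a simple test function. Each particle moves under inertia plus random pulls toward its own best position and the swarm's best position. The inertia and acceleration constants, the sphere objective and all names are illustrative assumptions. The sketch says nothing about how GPS-SUMO, or the ALC-PSO variant used in GPS-Lipid, encodes its parameters.

```python
import random
from typing import Callable


def pso_minimise(
    objective: Callable[[list[float]], float],
    dim: int,
    bounds: tuple[float, float],
    n_particles: int = 20,
    n_iter: int = 200,
    w: float = 0.7,    # inertia weight
    c1: float = 1.5,   # pull towards each particle's personal best
    c2: float = 1.5,   # pull towards the swarm's global best
    seed: int = 1,
) -> tuple[list[float], float]:
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x: list[float]) -> float:
    """Simple convex test function with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best, value = pso_minimise(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, value)
```

In a predictor, the objective would be something like the negative of a cross-validated performance score, and each particle would encode the model's tunable weights.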

An integrated overview of spatiotemporal organization and regulation in mitosis in terms of the proteins in the functional supercomplexes

Frontiers in Microbiology. 2014;5:573.

Overview

Eukaryotic cells divide through the critical cellular process of cell division/mitosis, producing two daughter cells with the same genetic information. A large number of dedicated proteins are involved in this process. They are spatiotemporally assembled into three distinct supercomplex structures/organelles: the centrosome/spindle pole body, the kinetochore/centromere and the cleavage furrow/midbody/bud neck. Together, these structures precisely and in an orderly fashion modulate the mitotic events of chromosome alignment, chromosome segregation and cytokinesis. In recent years, many efforts have been made to identify the protein components and architecture of these subcellular organelles. These efforts aim to uncover organelle assembly pathways, determine the molecular mechanisms underlying organelle functions, and thereby provide new therapeutic strategies for a variety of diseases. However, these organelles are highly dynamic structures, making it difficult to identify all of their components. Here, we review current knowledge of the identified protein components governing the organization and function of these organelles, especially in human and yeast cells, and discuss the multi-localized protein components mediating communication between organelles during cell division.

IBS: an illustrator for the presentation and visualization of biological sequences

Bioinformatics. 2015;31(20):3359-3361.

IBS 1.0

Biological sequence diagrams are fundamental for visualizing the functional elements in protein or nucleotide sequences. They summarize and present existing information and can prompt intuitive new discoveries. Here, we present a software package called illustrator of biological sequences (IBS) for representing the organization of protein or nucleotide sequences in a convenient, efficient and precise manner. IBS provides multiple options, and biological sequences can be manipulated, recolored or rescaled in a user-defined mode. The final artwork can also be exported directly as a publication-quality figure.

The standalone package of IBS was implemented in JAVA, while the online service was implemented in HTML5 and JavaScript. Both the standalone package and online service are freely available at http://ibs.biocuckoo.org.

RPFdb: a database for genome wide information of translated mRNA generated from ribosome profiling.

Nucleic Acids Research. 2016;44:D254-D258.

RPFdb

Translational control is crucial in the regulation of gene expression, and deregulation of translation is associated with a wide range of cancers and other human diseases. Ribosome profiling is a technique that provides genome-wide information on mRNA in translation, based on deep sequencing of ribosome-protected mRNA fragments (RPF). RPFdb is a comprehensive resource for hosting, analyzing and visualizing RPF data, available at http://www.rpfdb.org. The current version of the database contains 777 samples from 82 studies in 8 species, processed and reanalyzed by a unified pipeline. Overall, our database provides a simple way to search, analyze, compare, visualize and download RPF data sets.

GPS-Lipid: a robust tool for the prediction of multiple lipid modification sites

Scientific Reports. 2016;6:28249.

GPS-Lipid 1.0

As one of the most common post-translational modifications in eukaryotic cells, lipid modification is an important mechanism for regulating various aspects of protein function. In this work, we developed GPS-Lipid, a tool for the prediction of four classes of lipid modifications. It integrates the Particle Swarm Optimization with an aging leader and challengers (ALC-PSO) algorithm. GPS-Lipid was shown to be clearly superior to other similar tools. To facilitate research on lipid modification, we host a publicly available web server at http://lipid.biocuckoo.org. The server provides not only an implementation of GPS-Lipid but also an integrative database and visualization tool. Using GPS-Lipid, we performed a systematic analysis of the co-regulatory mechanisms among different lipid modifications. The results demonstrated that proximal dual-lipid modifications among palmitoylation, myristoylation and prenylation are a key mechanism for regulating various protein functions. In conclusion, GPS-Lipid is expected to serve as a useful resource for research on lipid modifications, especially their co-regulation.

Research

Post-translational Modifications

Our group studies post-translational modifications (PTMs) using computational approaches. We have been developing a highly effective algorithm named GPS (Group-based Prediction System) for the prediction of PTM sites, and more than ten PTM predictors based on it have been released. We have also built a series of databases for protein phosphorylation, lipid modifications and lysine modifications. We are currently combining computational methods with BiFC (Bimolecular Fluorescence Complementation) to develop a systematic approach for studying SUMO regulation in Homo sapiens.

Gene Editing with CRISPR

Our group also focuses on developing computational tools to assist the design of CRISPR systems. We have developed a highly efficient binary alignment scheme to identify potential on-target and off-target sites across the whole genome. Using machine learning methods such as Random Forest, we predict the cleavage efficacies of the potential target sites and recommend an optimal gRNA design to users based on these predictions. We plan to carry out experimental validation in the near future.
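
One common way to make genome-wide off-target scanning fast is to pack each nucleotide into 2 bits, so that mismatches against a guide can be counted with integer operations. The sketch below illustrates this general bit-parallel idea for a fixed-length protospacer. The encoding, the function names and the mismatch threshold are assumptions, not the group's actual binary alignment scheme. The sketch also ignores PAM checking, bulges and the reverse strand.

```python
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per nucleotide."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | ENCODE[base]
    return value


def mismatches(a: int, b: int, length: int) -> int:
    """Count differing nucleotides between two packed sequences of equal length."""
    diff = a ^ b
    # A base differs if either bit of its 2-bit code differs: fold the high bit
    # of each pair onto the low bit, then keep only the low bits.
    low_mask = int("01" * length, 2)
    per_base = (diff | (diff >> 1)) & low_mask
    return bin(per_base).count("1")


def scan(genome: str, guide: str, max_mm: int = 3) -> list[tuple[int, int]]:
    """Return (0-based start, mismatches) for windows within max_mm of the guide."""
    genome = genome.upper()
    k = len(guide)
    g = pack(guide)
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        if set(window) <= set(ENCODE):   # skip windows containing N or other symbols
            mm = mismatches(pack(window), g, k)
            if mm <= max_mm:
                hits.append((start, mm))
    return hits


print(scan("ACGTACGTTT", "ACGA", max_mm=1))  # [(0, 1), (4, 1)]
```

A production tool would pack the genome once and slide the window by shifting and masking, rather than re-packing each window. The per-window packing above is kept for readability.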

RNA N6-methyladenosine Modification

RNA N6-methyladenosine (m6A) modification plays a critical role in the regulation of many fundamental biological processes. However, the role of m6A in cancer is poorly understood. We have developed a computational tool called “m6A Finder” for predicting m6A modification sites at single-nucleotide resolution, and we have used TCGA data to systematically investigate m6A-associated somatic mutations in cancers. We are also developing algorithms to analyze m6A-Seq data, including peak calling and differential methylation analysis.
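
Many m6A site predictors start by enumerating candidate adenosines in the DRACH consensus (D = A/G/U, R = A/G, H = A/C/U) and then score each candidate. The sketch below performs only that first enumeration step. Whether m6A Finder uses this exact consensus is an assumption, and no scoring model is implied.

```python
import re

# DRACH consensus; the candidate methylated A is the third position of the motif.
# The lookahead allows overlapping matches.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def drach_candidates(rna: str) -> list[tuple[int, str]]:
    """Return (1-based position of the central A, motif) for every DRACH match."""
    rna = rna.upper().replace("T", "U")   # accept DNA-alphabet input as well
    return [(m.start() + 3, m.group(1)) for m in DRACH.finditer(rna)]


print(drach_candidates("GGACUAAACAGGACA"))
# [(3, 'GGACU'), (8, 'AAACA'), (13, 'GGACA')]
```

DRACH is very common in transcripts, so a predictor's discriminative power comes from the model applied to these candidates, not from the motif scan itself.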