GPS 2.0, a Tool to Predict Kinase-specific Phosphorylation Sites in Hierarchy 
      Molecular & Cellular Proteomics. 2008;7(9):1598-1608.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      GPS 2.0
      Identification of protein phosphorylation sites with their cognate protein kinases (PKs) is a key step to
      delineate molecular dynamics and plasticity underlying a variety of cellular processes.
      In this work, we adopted a well established rule to classify PKs into a hierarchical structure with four
      levels, including group, family, subfamily, and single PK.
      In addition, we developed a simple approach to estimate the theoretically maximal false positive rates.
      The on-line service and local packages of the GPS (Group-based Prediction System) 2.0 were implemented in
      Java with the modified version of the Group-based Phosphorylation Scoring algorithm. As the first stand
      alone software for predicting phosphorylation, GPS 2.0 can predict kinase-specific phosphorylation sites for
      408 human PKs in hierarchy. A large scale prediction of more than 13,000 mammalian phosphorylation sites by
      GPS 2.0 was exhibited with great performance and remarkable accuracy.Thus, the GPS 2.0 is a useful tool for
      predicting protein phosphorylation sites and their cognate kinases and is freely available on line. 
      GPS 2.0 now is updated as GPS3.0 and is freely available at 
http://gps.biocuckoo.org.
    
 
   
  
    
      
    
    
      CSS-Palm 2.0: an updated software for palmitoylation sites prediction 
      Protein Engineering, Design and Selection. 2008;21(11):639-644.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    ESI HCP
    Read more
   
  
    
      CSS-Palm 2.0
      Protein palmitoylation is an essential post-translational lipid modification of proteins, and reversibly
      orchestrates a variety of cellular processes.
      In this work, we updated our previous CSS-Palm into version 2.0. An updated clustering and scoring strategy
      (CSS) algorithm was employed with great improvement.
      The leave-one-out validation and 4-, 6-, 8- and 10-fold cross-validations were adopted to evaluate the
      prediction performance of CSS-Palm 2.0.
      Also, an additional new data set not included in training was used to test the robustness of CSS-Palm 2.0.
      As an application, we performed a small-scale annotation of palmitoylated proteins in budding yeast.
      The online service and local packages of CSS-Palm 2.0 were freely available at:
http://csspalm.biocuckoo.org
     
   
  
    
      
    
    
      DOG 1.0: illustrator of protein domain structures 
      Cell Research. 2009;19(2):271-273.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      DOG 1.0
      Development of computer software that can illustrate user-designated protein domain structures will be a
      great help for biological experimentalists to communicate their research results.
      In this work, we present a novel software of DOG (Domain Graph, version 1.0) for experimentalists, to
      prepare publication-quality figures of protein domain structures.
      The scale of a protein domain and the position of a functional motif/site will be precisely defined.
      The DOG 1.0 software was written in JAVA 1.5 (J2SE 5.0) and packed with Install4j 4.0.8.
      Then we developed several packages to support three major Operating Systems (OS), including Windows,
      Unix/Linux and Mac.
      For Windows and Linux systems, a Java Runtime Environment 6 (JRE) package of Sun Microsystems was also
      included.
      The DOG 1.0 software is freely available from: 
http://dog.biocuckoo.org.
    
 
   
  
    
      
    
    
      Systematic study of protein sumoylation: Development of a site-specific predictor of SUMOsp 2.0 
      Proteomics. 2009;9(12):3409-3412.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      SUMOsp 2.0
      Protein sumoylation is an important reversible post-translational modification on proteins, and orchestrates
      a variety of cellular processes.
      In this work, we developed SUMOsp 2.0, an accurate computing program with an improved group-based
      phosphorylation scoring algorithm.
      Our analysis demonstrated that SUMOsp 2.0 has greater prediction accuracy than SUMOsp 1.0 and other existing
      tools, with a sensitivity of 88.17% and a specificity of 92.69% under the medium threshold.
      Previously, several large-scale experiments have identified a list of potential sumoylated substrates in
      Saccharomyces cerevisiae and Homo sapiens;
      however, the exact sumoylation sites in most of these proteins remain elusive. We have predicted potential
      sumoylation sites in these proteins using SUMOsp 2.0,
      which provides a great resource for researchers and an outline for further mechanistic studies of
      sumoylation in cellular plasticity and dynamics.
      The online service and local packages of SUMOsp 2.0 are freely available at: 
http://sumosp.biocuckoo.org
     
   
  
    
      
    
    
      MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore 
      Nucleic Acids Research. 2010;38:D155-D160.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      MiCroKit 3.0
      During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into
      protein super complexes
      in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage
      furrow/phragmoplast/bud
      neck, and modulates cell division process faithfully. Here, we present the MiCroKit database (
http://microkit.biocuckoo.org) of proteins that localize in
      midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally
      verified microkit proteins from the scientific literature that have unambiguous supportive evidence for
      subcellular localization
      under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489
      microkit proteins
      from seven model organisms, including 
Saccharomyces cerevisiae, 
Schizasaccharomyces pombe,
      
Caenorhabditis elegans, 
Drosophila melanogaster, 
Xenopus laevis, 
Mus
      musculus and 
Homo sapiens. Moreover, the orthologous information was provided for these
      microkit proteins, and could be a useful resource for further
      experimental identification.
    
 
   
  
    
      
    
    
      PhosSNP for Systematic Analysis of Genetic Polymorphisms That Influence Protein Phosphorylation
      Molecular & Cellular Proteomics. 2010;9(4):623-634.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      PhosSNP
      We are entering the era of personalized genomics as breakthroughs in sequencing technology have made it
      possible to sequence or genotype an individual person in an efficient and accurate manner.
      Preliminary results from HapMap and other similar projects have revealed the existence of tremendous genetic
      variations among world populations and among individuals.
      It is also generally believed that the genetic variation is the main cause for different susceptibility to
      certain diseases or different response to therapeutic treatments.
      In this work, using an in-house developed kinase-specific phosphorylation site predictor (GPS 2.0), we
      computationally detected that ∼70% of the reported nsSNPs are potential phosSNPs.
      Finally, all phosSNPs were integrated into the PhosSNP 1.0 database, which was implemented in JAVA 1.5 (J2SE
      5.0).
      The PhosSNP 1.0 database is freely available for academic researchers at:
http://phossnp.biocuckoo.org
     
   
  
    
      
    
    
      GPS-SNO: Computational Prediction of Protein S-Nitrosylation Sites with a Modified GPS Algorithm
      Plos One. 2010;5(6): e11290.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      GPS-SNO
      As one of the most important and ubiquitous post-translational modifications (PTMs) of proteins, 
S-nitrosylation
      plays important roles in a variety of biological processes, including the regulation of cellular dynamics
      and plasticity. Identification of 
S-nitrosylated substrates with their exact sites is crucial for
      understanding the molecular mechanisms of 
S-nitrosylation.In this work, we developed a novel
      software of GPS-SNO 1.0 for the prediction of 
S-nitrosylation sites.By comparison, the prediction
      performance of GPS 3.0 algorithm was better than other methods, with an accuracy of 75.80%, a sensitivity of
      53.57% and a specificity of 80.14%. As an application of GPS-SNO 1.0, we predicted putative 
S-nitrosylation
      sites for hundreds of potentially 
S-nitrosylated substrates for which the exact 
S-nitrosylation
      sites had not been experimentally determined.The online service and local packages of GPS-SNO were
      implemented in JAVA and are freely available at: 
http://sno.biocuckoo.org.
    
 
   
  
    
      
    
    
      A Summary of Computational Resources for Protein Phosphorylation
      Current Protein & Peptide Science. 2010;11(6):485-496.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      Protein Phosphorylation
      Protein phosphorylation is the most ubiquitous post-translational modification (PTM), and plays important
      roles in most of biological processes. Identification of site-specific phosphorylated substrates is
      fundamental for understanding the molecular mechanisms of phosphorylation. Besides experimental approaches,
      prediction of potential candidates with computational methods has also attracted great attention for its
      convenience, fast-speed and low-cost. In this review, we present a comprehensive but brief summarization of
      computational resources of protein phosphorylation, including phosphorylation databases, prediction of
      non-specific or organism-specific phosphorylation sites, prediction of kinase-specific phosphorylation sites
      or phospho-binding motifs, and other tools. The latest compendium of computational resources for protein
      phosphorylation is available at: 
http://gps.biocuckoo.org/links.php
     
   
  
    
      
    
    
      CPLA 1.0: an integrated database of protein lysine acetylation
      Nucleic Acids Research. 2011;39:D1029-1034.
      
        [
        Abstract
        ]
        [
        Full
          Text
        ]
      
     
    Read more
   
  
    
      CPLA 1.0
      As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was
      known for its regulation of transcription through the modification of histones. Recent studies discovered
      that lysine acetylation targets broad substrates and especially plays an essential role in cellular
      metabolic regulation.In this work, we presented the compendium of protein lysine acetylation (CPLA) database
      for lysine acetylated substrates with their sites. The online services of CPLA database
      was implemented in PHP + MySQL + JavaScript, while the local packages were
      developed in JAVA 1.5 (J2SE 5.0). The CPLA database
      is updated as CPLM and is freely available for all users at: 
http://cplm.biocuckoo.org
     
   
  
    
      
    
    
      GPS 2.1: enhanced prediction of kinase-specific phosphorylation sites with an algorithm of motif length
        selection
      Protein Engineering, Design and Selection. 2011;24(3):255-260.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      GPS 2.1
      As the most important post-translational modification of proteins, phosphorylation plays essential roles in
      all aspects of biological processes. Besides experimental approaches, computational prediction of
      phosphorylated proteins with their kinase-specific
      phosphorylation sites has also emerged as a popular strategy, for its low-cost, fast-speed and convenience.
      In this work,
      we developed a kinase-specific phosphorylation sites predictor of GPS 2.1 (Group-based Prediction System),
      with a novel but
      simple approach of motif length selection (MLS). By this approach, the robustness of the prediction system
      was greatly improved.
      All algorithms in GPS old versions were also reserved and integrated in GPS 2.1. The online service and
      local packages of
      GPS 2.1 were implemented in JAVA 1.5 (J2SE 5.0) and freely available for academic researches at: 
http://gps.biocuckoo.org
     
   
  
    
      
    
    
      GPS-YNO2: computational prediction of tyrosine nitration sites in proteins
      Mol. BioSyst. 2011;7(4):1197-1204.
      
        [
        Abstract
        ]
        [
        Full
          Text
        ]
      
     
    Read more
   
  
    
      GPS-YNO2
      The last decade has witnessed rapid progress in the identification of proteintyrosine nitration
      (PTN), which is an essential and ubiquitous post-translational modification (PTM) that plays a variety of
      important roles in both physiological and pathological processes, such as the immune response, cell death,
      aging and neurodegeneration.
      Identification of site-specific nitrated substrates is fundamental for understanding the molecular
      mechanisms and biological functions of PTN.
      In contrast with labor-intensive and time-consuming experimental approaches, here we report the development
      of the novel software package GPS-YNO2 to predict PTN sites.
      The software demonstrated a promising accuracy of 76.51%, a sensitivity of 50.09% and a specificity of
      80.18% from the leave-one-out validation.
      Through a statistical functional comparison with the nitric oxide (NO) dependent reversible modification of
      S-nitrosylation, we observed that PTN prefers to attack certain fundamental biological processes and
      functions.
      Finally, the online service and local packages of GPS-YNO2 1.0 were implemented in JAVA and freely available
      at:
http://yno2.biocuckoo.org
     
   
  
    
      
    
    
      GPS-CCD: A Novel Computational Program for the Prediction of Calpain Cleavage Sites
      Plos One. 2011;6(4):e19001.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      GPS-CCD
      As one of the most essential post-translational modifications (PTMs) of proteins, proteolysis, especially
      calpain-mediated cleavage, plays an important role in many biological processes, including cell
      death/apoptosis, cytoskeletal remodeling, and the cell cycle. Experimental identification of calpain targets
      with 
bona fide cleavage sites is fundamental for dissecting the molecular mechanisms and biological
      roles of calpain cleavage. In contrast to time-consuming and labor-intensive experimental approaches,
      computational prediction of calpain cleavage sites might more cheaply and readily provide useful information
      for further experimental investigation. In this work, we constructed a novel software package of GPS-CCD
      (Calpain Cleavage Detector) for the prediction of calpain cleavage sites, with an accuracy of 89.98%,
      sensitivity of 60.87% and specificity of 90.07%. With this software, we annotated potential calpain cleavage
      sites for hundreds of calpain substrates, for which the exact cleavage sites had not been previously
      determined.The online service and local packages of GPS-CCD 1.0 were implemented in JAVA and are freely
      available at: 
http://ccd.biocuckoo.org/.
    
 
   
  
    
      
    
    
      GPS-PUP: computational prediction of pupylation sites in prokaryotic proteins
      Mol. BioSyst. 2011;7(10):2737-2740.
      
        [
        Abstract
        ]
        [
        Full
          Text
        ]
      
     
    Read more
   
  
    
      GPS-PUP
      Recent experiments revealed the prokaryotic ubiquitin-like protein (PUP) to be a signal for the selective
      degradation of proteins in Mycobacterium tuberculosis (Mtb).
      By covalently conjugating the PUP, pupylation functions as a critical post-translational modification (PTM)
      conserved in actinomycetes.
      Here, we designed a novel computational tool of GPS-PUP for the prediction of pupylation sites, which was
      shown to have a promising performance.
      From small-scale and large-scale studies we collected 238 potentially pupylated substrates for which the
      exact pupylation sites were still not determined.
      As an example application, we predicted ∼85% of these proteins with at least one potential pupylation site.
      Furthermore, through functional analysis,
      we observed that pupylation can target various substrates so as to regulate a broad array of biological
      processes, such as the response to stress, sulfate and proton transport, and metabolism.
      The GPS-PUP 1.0 is freely available at: 
http://pup.biocuckoo.org
     
   
  
    
      
    
    
      Computational Analysis of Phosphoproteomics: Progresses and Perspectives
      Current Protein & Peptide Science. 2011;7(12):591-601.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
Phosphoproteomics
      Phosphorylation is one of the most essential post-translational modifications (PTMs) of proteins, regulates
      a variety of cellular signaling pathways, and at least partially determines the biological diversity. Recent
      progresses in phosphoproteomics have identified more than 100,000 phosphorylation sites, while this number
      will easily exceed one million in the next decade. In this regard, how to extract useful information from
      flood of phosphoproteomics data has emerged as a great challenge. In this review, we summarized the leading
      edges on computational analysis of phosphoproteomics, including discovery of phosphorylation motifs from
      phosphoproteomics data, systematic modeling of phosphorylation network, analysis of genetic variation that
      influences phosphorylation, and phosphorylation evolution. Based on existed knowledge, we also raised
      several perspectives for further studies. We believe that integration of experimental and computational
      analyses will propel the phosphoproteomics research into a new phase.
    
   
  
    
      
    
    
      Systematic Analysis of Protein Phosphorylation Networks From Phosphoproteomic Data
      Molecular & Cellular Proteomics. 2012;11(10):1070-1083.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
iGPS
      In eukaryotes, hundreds of protein kinases (PKs) specifically and precisely modify thousands of substrates
      at specific amino
      acid residues to faithfully orchestrate numerous biological processes, and reversibly determine the cellular
      dynamics and
      plasticity. Although over 100,000 phosphorylation sites (p-sites) have been experimentally identified from
      phosphoproteomic
      studies, the regulatory PKs for most of these sites still remain to be characterized. Here, we present a
      novel software package
      of iGPS for the prediction of in vivo site-specific kinase-substrate relations mainly from the
      phosphoproteomic data.By critical evaluations and comparisons,
      the performance of iGPS is satisfying and better than other existed tools. Based on the prediction results,
      we modeled protein
      phosphorylation networks and observed that the eukaryotic phospho-regulation is poorly conserved at the site
      and substrate
      levels.This work contributes to the understanding of phosphorylation mechanisms at the systemic level, and
      provides a powerful methodology for the general analysis of in vivo post-translational
      modifications regulating sub-proteomes.
    
   
  
    
      
    
    
      Systematic analysis of the Plk-mediated phosphoregulation in eukaryotes
      Briefings in Bioinformatics. 2013;14(3):344-360.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
Plk-mediated phosphoregulation
      Substantial evidence has confirmed that Polo-like kinases (Plks) play a crucial role in a variety of
      cellular processes via phosphorylation-mediated signaling transduction.
      Identification of Plk phospho-binding proteins and phosphorylation substrates is fundamental for elucidating
      the molecular mechanisms of Plks.
      Here, we present an integrative approach for the analysis of Plk-specific phospho-binding and
      phosphorylation sites (p-sites) in proteins.
      From the currently available phosphoproteomic data, we predicted tens of thousands of potential Plk
      phospho-binding and phosphorylation sites in eukaryotes, respectively.
      Furthermore, statistical analysis suggested that Plk phospho-binding proteins are more closely implicated in
      mitosis than their phosphorylation substrates.
      Additional computational analysis together with in vitro and in vivo experimental assays demonstrated that
      human Mis18B is a novel interacting partner of Plk1, while pT14 and pS48 of Mis18B were identified as
      phospho-binding sites.
      Taken together, this systematic analysis provides a global landscape of the complexity and diversity of
      potential Plk-mediated phosphoregulation, and the prediction results can be helpful for further experimental
      investigation.
    
   
  
    
      
    
    
    
      GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs
      Nucleic Acids Research. 2014;42: W325-30.
      
        [
        Abstract
        ]
        [
        Full
          Text
        ]
      
     
    ESI HCP
    Read more
   
  
    
      GPS-SUMO 2.0
      Small ubiquitin-like modifiers (SUMOs) regulate a variety of cellular processes through two distinct
      mechanisms, including covalent sumoylation and noncovalent SUMO interaction. The complexity of SUMO
      regulations has greatly hampered the large-scale identification of SUMO substrates or interaction partners
      on a proteome-wide level. In this work, we developed a new tool called GPS-SUMO for the prediction of both
      sumoylation sites and SUMO-interaction motifs (SIMs) in proteins. To obtain an accurate performance, a new
      generation group-based prediction system (GPS) algorithm integrated with Particle Swarm Optimization
      approach was applied. By critical evaluation and comparison, GPS-SUMO was demonstrated to be substantially
      superior against other existing tools and methods. With the help of GPS-SUMO, it is now possible to further
      investigate the relationship between sumoylation and SUMO interaction processes. A web service of GPS-SUMO
      was implemented in PHP + JavaScript and freely available at 
http://sumosp.biocuckoo.org.
    
 
   
  
    
      
    
    
    
      An integrated overview of spatiotemporal organization and regulation in mitosis in terms of the proteins in
        the functional supercomplexes
      Frontiers in Microbiology. 2014;5:573.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
Overview
      Eukaryotic cells may divide via the critical cellular process of cell division/mitosis, resulting in two
      daughter cells with the same genetic information. A large number of dedicated proteins are involved in this
      process and spatiotemporally assembled into three distinct super-complex structures/organelles, including
      the centrosome/spindle pole body, kinetochore/centromere and cleavage furrow/midbody/bud neck, so as to
      precisely modulate the cell division/mitosis events of chromosome alignment, chromosome segregation and
      cytokinesis in an orderly fashion. In recent years, many efforts have been made to identify the protein
      components and architecture of these subcellular organelles, aiming to uncover the organelle assembly
      pathways, determine the molecular mechanisms underlying the organelle functions, and thereby provide new
      therapeutic strategies for a variety of diseases. However, the organelles are highly dynamic structures,
      making it difficult to identify the entire components. Here, we review the current knowledge of the
      identified protein components governing the organization and functioning of organelles, especially in human
      and yeast cells, and discuss the multi-localized protein components mediating the communication between
      organelles during cell division.
    
   
  
    
      
    
    
    
      IBS: an illustrator for the presentation and visualization of biological sequences
      Bioinformatics. 2015;31(20):3359-61.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    ESI Hot Paper
    Read more
   
  
    
      IBS 1.0
      Biological sequence diagrams are fundamental for visualizing various functional elements in protein or
      nucleotide sequences that enable a summarization and presentation of existing information as well as means
      of intuitive new discoveries. Here, we present a software package called illustrator of biological sequences
      (IBS) that can be used for representing the organization of either protein or nucleotide sequences in a
      convenient, efficient and precise manner. Multiple options are provided in IBS, and biological sequences can
      be manipulated, recolored or rescaled in a user-defined mode. Also, the final representational artwork can
      be directly exported into a publication-quality figure.
      The standalone package of IBS was implemented in JAVA, while the online service was implemented in HTML5 and
      JavaScript. Both the standalone package and online service are freely available at 
http://ibs.biocuckoo.org.
    
 
   
  
    
      
    
    
      RPFdb: a database for genome wide information of translated mRNA generated from ribosome profiling.
      Nucleic Acids Research. 2016;44:D254-D258.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      RPFdb
      Translational control is crucial in the regulation of gene expression and deregulation of translation is
      associated with a wide range of cancers and human diseases. Ribosome profiling is a technique that provides
      genome wide information of mRNA in translation based on deep sequencing of ribosome protected mRNA fragments
      (RPF). RPFdb is a comprehensive resource for hosting, analyzing and visualizing RPF data, available at 
http://www.rpfdb.org. The current version
      of database contains 777 samples from 82 studies in 8 species, processed and reanalyzed by a unified
      pipeline. Overall our database provides a simple way to search, analyze, compare, visualize and download RPF
      data sets.
    
 
   
  
    
      
    
    
      GPS-Lipid: a robust tool for the prediction of multiple lipid modification sites
      Scientific Reports. 2016;6:28249.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      GPS-Lipid 1.0
      As one of the most common post-translational modifications in eukaryotic cells, lipid modification is an
      important mechanism for the regulation of variety aspects of protein function. In this work, we developed a
      tool called GPS-Lipid for the prediction of four classes of lipid modifications by integrating the Particle
      Swarm Optimization with an aging leader and challengers (ALC-PSO) algorithm. GPS-Lipid was proven to be
      evidently superior to other similar tools. To facilitate the research of lipid modification, we hosted a
      publicly available web server at 
http://lipid.biocuckoo.org
      with not only the implementation of GPSLipid, but also an integrative database and visualization tool. We
      performed a systematic analysis of the co-regulatory mechanism between different lipid modifications with
      GPS-Lipid. The results demonstrated that the proximal dual-lipid modifications among palmitoylation,
      myristoylation and prenylation are key mechanism for regulating various protein functions. In conclusion,
      GPS-lipid is expected to serve as useful resource for the research on lipid modifications, especially on
      their coregulation.
    
 
   
  
    
      
    
    
      VirusMap: A visualization database for the influenza A virus
      Journal of Genetics and Genomics. 2017;44(4):281-284.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      VirusMap
      In this study, we reported a visualization platform called VirusMap, which is available at the website (
http://virusmap.renlab.org), for
      investigating the epidemiological and geographical distribution of influenza A viruses. We downloaded
      615,866 protein and 482,663 nucleotide sequences of influenza A viruses in FASTA format from IVR(Bao et al.,
      2008) andIRD(Squires et al., 2012). As the policy for the data submission in those databases, the
      information of subtype, host, sampling location, sampling time and serotype should be included for each
      virus strain. Thus, the title line of each FASTA sequence contains all of the necessary information. We
      extracted these information through a semi-automated series of steps. To ensure the data quality, only
      entries with the full information of host, serotype and sampling information were preserved. In total, there
      were 583,052 protein and 448,495nucleotide records retained in a MySQL database. As the data were obtained
      from the two most popular influenza virus resources, VirusMap contains a comprehensive and frequently
      updated dataset on the influenza A virus.
    
 
   
  
    
      
    
    
      Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis
      Nature Biotechnology. 2017;35:409–412.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      Firmiana
      Improvements in next-generation proteomics, including instrumentation, sample preparation, and computational
      analysis, have generated large amounts of data that cover protein profiling, post-translational
      modifications, and protein–protein interactions. The first draft of the human proteome, for example, made
      use of 2,000 (ref. 6) and 16,000 (ref. 5) raw files. Proteomics now calls for a uniform online pipeline that
      can host millions of data sets with the same quality standards, analyze hundreds to thousands of
      experiments, and integrate multi-dimensional omics data for knowledge mining and hypothesis generation to
      disseminate proteomics to the scientific community. Here, we describe Firmiana (V1.0) (
http://www.firmiana.org/), a one-stop proteomic data processing and
      integrated omics analysis cloud platform that allows scientists to deposit mass spectrometry (MS) raw files,
      perform proteome identification and quantification online, carry out bioinformatics analyses, extract
      knowledge, and visualize results using a biologist-friendly web interface without the need for programming
      expertise.
    
 
   
  
    
      
    
    
      A de novo substructure generation algorithm for identifying the privileged chemical fragments of liver X
        receptorβ agonists
      Scientific Reports. 2017;7:11121.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
Overview
      Liver X receptorβ (LXRβ) is a promising therapeutic target for lipid disorders, atherosclerosis, chronic
      inflammation, autoimmunity, cancer and neurodegenerative diseases. Druggable LXRβ agonists have been
      explored over the past decades. However, the pocket of LXRβ ligand-binding domain (LBD) is too large to
      predict LXRβ agonists with novel scaffolds based on either receptor or agonist structures. In this paper, we
      report a de novo algorithm which drives privileged LXRβ agonist fragments by starting with individual
      chemical bonds (de novo) from every molecule in a LXRβ agonist library, growing the bonds into substructures
      based on the agonist structures with isomorphic and homomorphic restrictions, and electing the privileged
      fragments from the substructures with a popularity threshold and background chemical and biological
      knowledge. Using these privileged fragments as queries, we were able to figure out the rules to reconstruct
      LXRβ agonist molecules from the fragments. The privileged fragments were validated by building regularized
      logistic regression (RLR) and supporting vector machine (SVM) models as descriptors to predict a LXRβ
      agonist activities.
    
   
  
    
      
    
    
      m6AVar: a database of functional variants involved in m6A modification.
      Nucleic Acids Research. 2018; 46(D1): D139-145.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      m6AVar
      Here, we report m6AVar (
http://m6avar.renlab.org), a comprehensive
      database of m6A-associated variants that potentially influence m6A modification, which will help to
      interpret variants by m6A function. The m6A-associated variants were derived from three different m6A
      sources including miCLIP/PA-m6A-seq experiments (high confidence), MeRIP-Seq experiments (medium confidence)
      and transcriptome-wide predictions (low confidence). Currently, m6AVar contains 16,132 high, 71,321 medium
      and 326,915 low confidence level m6A-associated variants. We also integrated the RBP-binding regions,
      miRNA-targets and splicing sites associated with variants to help users investigate the effect of
      m6A-associated variants on post-transcriptional regulation. Because it integrates the data from genome-wide
      association studies (GWAS) and ClinVar, m6AVar is also a useful resource for investigating the relationship
      between the m6A-assocaited variants and disease. Overall, m6AVar will serve as a useful resource for
      annotating variants and identifying disease-causing variants.
    
 
   
  
    
      
    
    
      Expression and regulation of long noncoding RNAs during the osteogenic differentiation of periodontal ligament
        stem cells in the inflammatory microenvironment
      Scientific Reports. 2017;7:13991.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
Overview
      Although long noncoding RNAs (lncRNAs) have been emerging as critical regulators in various tissues and
      biological processes, little is known about their expression and regulation during the osteogenic
      differentiation of periodontal ligament stem cells (PDLSCs) in inflammatory microenvironment. In this study,
      we have identified 63 lncRNAs that are not annotated in previous database. These novel lncRNAs were not
      randomly located in the genome but preferentially located near protein-coding genes related to particular
      functions and diseases, such as stem cell maintenance and differentiation, development disorders and
      inflammatory diseases. Moreover, we have identified 650 differentially expressed lncRNAs among different
      subsets of PDLSCs. Pathway enrichment analysis for neighboring protein-coding genes of these differentially
      expressed lncRNAs revealed stem cell differentiation related functions. Many of these differentially
      expressed lncRNAs function as competing endogenous RNAs that regulate protein-coding transcripts through
      competing shared miRNAs.
    
   
  
    
      
    
    
    Read more
   
  
    
      
m6ASNP
      Background: Large-scale genome sequencing projects have identified many genetic variants for diverse
      diseases. A major goal of these projects is to characterize these genetic variants to provide insight into
      their function and roles in diseases. N6-methyladenosine (m6A) is one of the most abundant RNA modifications
      in eukaryotes. Recent studies have revealed that aberrant m6A modifications are involved in many
      diseases.
      Findings: In this study, we present a user-friendly web server called “m6ASNP” that is dedicated to
      the identification of genetic variants targeting m6A modification sites. A random forest model was
      implemented in m6ASNP to predict whether the methylation status of a m6A site is altered by the variants
      surrounding the site. In m6ASNP, genetic variants in a standard VCF format are accepted as the input data,
      and the output includes an interactive table containing the genetic variants annotated by m6A function. In
      addition, statistical diagrams and a genome browser are provided to visualize the characteristics and
      annotate the genetic variants.
      Conclusions: We believe that m6ASNP is a highly convenient tool that can be used to boost further
      functional studies investigating genetic variants. The web server “m6ASNP” is implemented in JAVA and PHP
      and is freely available at http://m6asnp.renlab.org.
    
   
  
  
    
      
    
    
      Pan-Cancer Analysis Reveals the Functional Importance of Protein Lysine Modification in Cancer
        Development
      Front. Genet. 9:254. doi: 10.3389/fgene.2018.00254
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
Overview
      Large-scale tumor genome sequencing projects have revealed a complex landscape of genomic
      mutations in multiple cancer types. A major goal of these projects is to characterize somatic
      mutations and discover cancer drivers, thereby providing important clues to uncover diagnostic
      or therapeutic targets for clinical treatment. However, distinguishing only a few somatic
      mutations from the majority of passenger mutations is still a major challenge facing the
      biological community. Fortunately, combining other functional features with mutations to
      predict cancer driver genes is an effective approach to solve the above problem. Protein
      lysine modifications are an important functional feature that regulates the development of
      cancer. Therefore, in this work, we have systematically analyzed somatic mutations on seven
      protein lysine modifications and identified several important drivers that are responsible for
      tumorigenesis. From published literature, we first collected more than 100,000 lysine
      modification sites for analysis. Another 1 million non-synonymous single nucleotide variants
      (SNVs) were then downloaded from TCGA and mapped to our collected lysine modification sites.
      To identify driver proteins that significantly altered lysine modifications, we further
      developed a hierarchical Bayesian model and applied the Markov Chain Monte Carlo (MCMC) method
      for testing. Strikingly, the coding sequences of 473 proteins were found to carry a higher
      mutation rate in lysine modification sites compared to other background regions.
      Hypergeometric tests also revealed that these gene products were enriched in known cancer
      drivers. Functional analysis suggested that mutations within the lysine modification regions
      possessed higher evolutionary conservation and deleteriousness. Furthermore, pathway
      enrichment showed that mutations on lysine modification sites mainly affected cancer related
      processes, such as cell cycle and RNA transport. Moreover, clinical studies also suggested
      that the driver proteins were significantly associated with patient survival, implying an
      opportunity to use lysine modifications as molecular markers in cancer diagnosis or treatment.
      By searching within protein-protein interaction networks using a random walk with restart
      (RWR) algorithm, we further identified a series of potential treatment agents and therapeutic
      targets for cancer related to lysine modifications. Collectively, this study reveals the
      functional importance of lysine modifications in cancer development and may benefit the
      discovery of novel mechanisms for cancer treatment.
    
   
  
  
  
    
      
    
    
      m6A RNA modification controls autophagy through upregulating ULK1 protein abundance
      Cell Research. 2018;
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      
Overview
      N6-methyladenosine (m6A) is the prominent dynamic mRNA modification, governed
      by methyltransferase complex (“writers”), demethylases (“erasers”) and RNA-binding
      proteins (‘readers’).1 m6A modification directs mRNAs to distinct fates by grouping them for
      differential processing, translation and decay in the processes such as cell differentiation,
      embryonic development and stress responses. Owing to a deeper understanding of this
      modification and the technological advance, functional characterizations of m6A in gene
      regulation have become a hot topic that warrants further dissection.
    
   
  
  
  
    
      
    
    
      DeepNitro: Prediction of Protein Nitration and Nitrosylation Sites by Deep Learning
      Genomics Proteomics Bioinformatics. 2018; 16(4): 294-306.
      
        [
        Abstract
        ]
        [
        Full Text
        ]
      
     
    Read more
   
  
    
      DeepNitro
      Protein nitration and nitrosylation are essential post-translational modifications (PTMs) involved in many
      fundamental cellular processes. Recent studies have revealed that excessive levels of nitration and nitrosylation
      in some critical proteins are linked to numerous chronic diseases. Therefore, the identification of substrates
      that undergo such modifications in a site-specific manner is an important research topic in the community and will
      provide candidates for targeted therapy. In this study, we aimed to develop a computational tool for predicting
      nitration and nitrosylation sites in proteins. We first constructed four types of encoding features, including
      positional amino acid distributions, sequence contextual dependencies, physicochemical properties, and
      position-specific scoring features, to represent the modified residues. Based on these encoding features, we
      established a predictor called DeepNitro using deep learning methods for predicting protein nitration and
      nitrosylation. Using n-fold cross-validation, our evaluation shows great AUC values for DeepNitro, 0.65 for
      tyrosine nitration, 0.80 for tryptophan nitration, and 0.70 for cysteine nitrosylation, respectively,
      demonstrating the robustness and reliability of our tool. Also, when tested in the independent dataset, DeepNitro
      is substantially superior to other similar tools with a 7%−42% improvement in the prediction performance. Taken
      together, the application of deep learning method and novel encoding schemes, especially the position-specific
      scoring feature, greatly improves the accuracy of nitration and nitrosylation site prediction and may facilitate
      the prediction of other PTM sites. DeepNitro is implemented in JAVA and PHP and is freely available for academic
      research at 
http://deepnitro.renlab.org.
    
 
   
  
  
  
    
      
    
    
      lnCAR: a comprehensive resource for lncRNAs from Cancer Arrays.
      Cancer Res February 20 2019 DOI: 10.1158/0008-5472.CAN-18-2169
      
        [
        Abstract
        ]
        [
        Full
          Text
        ]
      
     
    Read more
   
  
    
      
lnCAR
      Long non-coding RNAs (lncRNA) have emerged as promising biomarkers in cancer diagnosis, treatment, and prognosis.
      Recent studies suggest that a large number of coding gene expression microarray probes could be re-annotated as
      lncRNAs. Microarray, once the most cutting-edge high throughput gene expression technology, has been used for
      thousands of cancer studies and has brought invaluable resources for studying the functions of lncRNA in cancer
      development. However, a comprehensive lncRNA resource based on microarray data is still lacking. Here we present
      lnCAR, a comprehensive open resource for providing expression profiles and prognostic landscape of lncRNAs derived
      from re-annotation of public microarray data. Currently, lnCAR contains 52,300 samples for differential expression
      analysis and 12,883 samples for survival analysis from 10 cancer types. lnCAR allows users to interactively
      explore any annotated or novel lncRNAs. We believe lnCAR will serve as a valuable resource for the community
      focused on lncRNA research in cancer.
    
   
  
  
  
    
      
    
    
      DeepPhagy: a deep learning framework for quantitatively measuring autophagy activity in Saccharomyces
        cerevisiae.
      Autophagy. Jun 12 2019 DOI: 10.1080/15548627.2019.1632622
      
        [
        Abstract
        ]
        [
        Full
          Text
        ]
      
     
    Read more
   
  
    
      
DeepPhagy
      Seeing is believing. The direct observation of GFP-Atg8 vacuolar delivery under confocal microscopy is one of the
      most useful end-point measurements for monitoring yeast macroautophagy/autophagy. However, manually labelling
      individual cells from large-scale sets of images is time-consuming and labor-intensive, which has greatly hampered
      its extensive use in functional screens. Herein, we conducted a time-course analysis of nitrogen
      starvation-induced autophagy in wild-type and knockout mutants of 35 AuTophaGy-related (ATG) genes in
      Saccharomyces cerevisiae and obtained 1,944 confocal images containing > 200,000 cells. We manually labelled 8,078
      autophagic and 18,493 non-autophagic cells as a benchmark dataset and developed a new deep learning tool for
      autophagy (DeepPhagy), which exhibited superior accuracy in recognizing autophagic cells compared to other
      existing methods, with an area under the curve (AUC) value of 0.9710 from 10-fold cross-validations. We further
      used DeepPhagy to automatically analyze all the images and quantitatively classified the autophagic phenotypes of
      the 35 atg knockout mutants into 3 classes. The high consistency in our computational and biochemical results
      indicated the reliability of DeepPhagy for measuring autophagic activity. Moreover, we used DeepPhagy to analyze 3
      additional types of autophagic phenotypes, including the targeting of Atg1-GFP to the vacuole, the vacuolar
      delivery of GFP-Atg19, and the disintegration of autophagic bodies indicated by GFP-Atg8, all with satisfying
      accuracies. Taken together, our study not only enables the GFP-Atg8 fluorescence assay to become a quantitative
      measurement for analyzing autophagic phenotypes in S. cerevisiae but also demonstrates that deep learning-based
      methods could potentially be applied to different types of autophagy.
    
   
  
  
  
    
      
    
    
      
          BBCancer: an expression atlas of blood-based biomarkers in the early diagnosis of cancers
      
        Nucleic Acids Research. October 29 2019 DOI: 10.1093/nar/gkz942
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
BBCancer
        The early detection of cancer holds the key to combat and control the increasing global burden of cancer morbidity and mortality. Blood-based screenings using circulating DNAs (ctDNAs), circulating RNA (ctRNAs), circulating tumor cells (CTCs) and extracellular vesicles (EVs) have shown promising prospects in the early detection of cancer. Recent high-throughput gene expression profiling of blood samples from cancer patients has provided a valuable resource for developing new biomarkers for the early detection of cancer. However, a well-organized online repository for these blood-based high-throughput gene expression data is still not available. Here, we present BBCancer (http://bbcancer.renlab.org/), a web-accessible and comprehensive open resource for providing the expression landscape of six types of RNAs, including messenger RNAs (mRNAs), long noncoding RNAs (lncRNAs), microRNAs (miRNAs), circular RNAs (circRNAs), tRNA-derived fragments (tRFRNAs) and Piwi-interacting RNAs (piRNAs) in blood samples, including plasma, CTCs and EVs, from cancer patients with various cancer types. Currently, BBCancer contains expression data of the six RNA types from 5040 normal and tumor blood samples across 15 cancer types. We believe this database will serve as a powerful platform for developing blood biomarkers.
    
   
  
  
   
    
      
    
    
      
        RMVar: an updated database of functional variants involved in RNA modifications
      
        Nucleic Acids Research. 06 October 2020 DOI: 10.1093/nar/gkaa811
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
RMVar
      Distinguishing the few disease-related variants from a massive number of passenger variants is a major challenge. 
      Variants affecting RNA modifications that play critical roles in many aspects of RNA metabolism have recently been linked to many human diseases, 
      such as cancers. Evaluating the effect of genetic variants on RNA modifications will provide a new perspective for understanding the pathogenic mechanism of human diseases. 
      Previously, we developed a database called ‘m6AVar’ to host variants associated with m6A, one of the most prevalent RNA modifications in eukaryotes. 
      To host all RNA modification (RM)-associated variants, here we present an updated version of m6AVar renamed RMVar (http://rmvar.renlab.org). In this update, 
      RMVar contains 1 678 126 RM-associated variants for 9 kinds of RNA modifications, namely m6A, m6Am, m1A, pseudouridine, m5C, m5U, 2′-O-Me, A-to-I and m7G, 
      at three confidence levels. Moreover, RBP binding regions, miRNA targets, splicing events and circRNAs were integrated to assist investigations of the effects of 
      RM-associated variants on posttranscriptional regulation. In addition, disease-related information was integrated from ClinVar and other genome-wide association studies (GWAS) to 
      investigate the relationship between RM-associated variants and diseases. We expect that RMVar may boost further functional studies on genetic variants affecting RNA modifications.
      
   
  
  
  
    
      
    
    
      
        PTMsnp: A Web Server for the Identification of Driver Mutations That Affect Protein Post-translational Modification
      
        Frontiers in Cell and Developmental Biology. 10 November 2020 DOI: 10.3389/fcell.2020.593661
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
PTMsnp
      High-throughput sequencing technologies have identified millions of genetic mutations in multiple human diseases. However, the interpretation of the pathogenesis of these mutations and the discovery of driver genes that dominate disease progression is still a major challenge. Combining functional features such as protein post-translational modification (PTM) with genetic mutations is an effective way to predict such alterations. Here, we present PTMsnp, a web server that implements a Bayesian hierarchical model to identify driver genetic mutations targeting PTM sites. PTMsnp accepts genetic mutations in a standard variant call format or tabular format as input and outputs several interactive charts of PTM-related mutations that potentially affect PTMs. Additional functional annotations are performed to evaluate the impact of PTM-related mutations on protein structure and function, as well as to classify variants relevant to Mendelian disease. A total of 4,11,574 modification sites from 33 different types of PTMs and 1,776,848 somatic mutations from TCGA across 33 different cancer types are integrated into the web server, enabling identification of candidate cancer driver genes based on PTM. Applications of PTMsnp to the cancer cohorts and a GWAS dataset of type 2 diabetes identified a set of potential drivers together with several known disease-related genes, indicating its reliability in distinguishing disease-related mutations and providing potential molecular targets for new therapeutic strategies. PTMsnp is freely available at: http://ptmsnp.renlab.org.
    
   
  
  
  
    
      
    
    
      
        autoRPA: A web server for constructing cancer staging models by recursive partitioning analysis
      
      Computational and Structural Biotechnology Journal. 10 November 2020 DOI: 10.1016/j.csbj.2020.10.038
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
autoRPA
      Cancer staging provides a common language that is used to describe the severity of an individual's cancer, which plays a critical role in optimizing cancer treatment. Recursive partitioning analysis (RPA) is the most widely accepted method for cancer staging. Despite its widespread use, to date, only limited tools have been developed to implement the RPA algorithm for cancer staging. Moreover, most of the available tools can be accessed only from command lines and also lack visualization, making them difficult for clinical investigators without programing skills to use. Therefore, we developed a web server called autoRPA that is dedicated to supporting the construction of prognostic staging models and performance comparisons among different staging models. Based on the RPA algorithm and log-rank test statistics, autoRPA can establish a decision-making tree from survival data and provide clinicians an intuitive method to further prune the decision tree. Moreover, autoRPA can evaluate the contribution of each submitted covariate that is involved in the grouping process and help identify factors that significantly contribute to cancer staging. Four indicators, including hazard consistency, hazard discrimination, percentage of variation explained, and sample size balance, are introduced to validate the performance of the designed staging models. In addition, autoRPA can also be used to compare the performance of different prognostic staging models using a standard bootstrap evaluation method. The web server of autoRPA is freely available at http://rpa.renlab.org.
    
   
  
  
  
    
      
    
    
      
        DeepOMe: A Web Server for the Prediction of 2′-O-Me Sites Based on the Hybrid CNN and BLSTM Architecture
      
      Frontiers in Cell and Developmental Biology. 14 May 2021 DOI: 10.3389/fcell.2021.686894
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
DeepOMe
      2′-O-methylations (2′-O-Me or Nm) are one of the most important layers of regulatory control over gene expression. With increasing attentions focused on the characteristics, mechanisms and influences of 2′-O-Me, a revolutionary technique termed Nm-seq were established, allowing the identification of precise 2′-O-Me sites in RNA sequences with high sensitivity. However, as the costs and complexities involved with this new method, the large-scale detection and in-depth study of 2′-O-Me is still largely limited. Therefore, the development of a novel computational method to identify 2′-O-Me sites with adequate reliability is urgently needed at the current stage. To address the above issue, we proposed a hybrid deep-learning algorithm named DeepOMe that combined Convolutional Neural Networks (CNN) and Bidirectional Long Short-term Memory (BLSTM) to accurately predict 2′-O-Me sites in human transcriptome. Validating under 4-, 6-, 8-, and 10-fold cross-validation, we confirmed that our proposed model achieved a high performance (AUC close to 0.998 and AUPR close to 0.880). When testing in the independent data set, DeepOMe was substantially superior to NmSEER V2.0. To facilitate the usage of DeepOMe, a user-friendly web-server was constructed, which can be freely accessed at http://deepome.renlab.org.
    
   
  
  
  
    
      
    
    
      
        MesKit: a tool kit for dissecting cancer evolution of multi-region tumor biopsies through somatic alterations
      
      GigaScience. 21 May 2021 DOI: 10.1093/gigascience/giab036
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
MesKit
      Multi-region sequencing (MRS) has been widely used to analyze intra-tumor heterogeneity (ITH) and cancer evolution. However, comprehensive analysis of mutational data from MRS is still challenging, necessitating complicated integration of a plethora of computational and statistical approaches.
      Here, we present MesKit, an R/Bioconductor package that can assist in characterizing genetic ITH and tracing the evolutionary history of tumors based on somatic alterations detected by MRS. MesKit provides a wide range of analysis and visualization modules, including ITH evaluation, metastatic route inference, and mutational signature identification. In addition, MesKit implements an auto-layout algorithm to generate phylogenetic trees based on somatic mutations. The application of MesKit for 2 reported MRS datasets of hepatocellular carcinoma and colorectal cancer identified known heterogeneous features and evolutionary patterns, together with potential driver events during cancer evolution.
      In summary, MesKit is useful for interpreting ITH and tracing evolutionary trajectory based on MRS data. MesKit is implemented in R and available at https://bioconductor.org/packages/MesKit under the GPL v3 license.
    
   
  
  
  
    
      
    
    
      
        SPENCER: a comprehensive database for small peptides encoded by noncoding RNAs in cancer patients
      
        Nucleic Acids Research. 27 September 2021 DOI: 10.1093/nar/gkab822
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
SPENCER
      As an increasing number of noncoding RNAs (ncRNAs) have been suggested to encode short bioactive peptides in cancer, the exploration of ncRNA-encoded small peptides (ncPEPs) is emerging as a fascinating field in cancer research. To assist in studies on the regulatory mechanisms of ncPEPs, we describe here a database called SPENCER (http://spencer.renlab.org). Currently, SPENCER has collected a total of 2806 mass spectrometry (MS) data points from 55 studies, covering 1007 tumor samples and 719 normal samples. Using an MS-based proteomics analysis pipeline, SPENCER identified 29 526 ncPEPs across 15 different cancer types. Specifically, 22 060 of these ncPEPs were experimentally validated in other studies. By comparing tumor and normal samples, the identified ncPEPs were divided into four expression groups: tumor-specific, upregulated in cancer, downregulated in cancer, and others. Additionally, since ncPEPs are potential targets for neoantigen-based cancer immunotherapy, SPENCER also predicted the immunogenicity of all the identified ncPEPs by assessing their MHC-I binding affinity, stability, and TCR recognition probability. As a result, 4497 ncPEPs curated in SPENCER were predicted to be immunogenic. Overall, SPENCER will be a useful resource for investigating cancer-associated ncPEPs and may boost further research in cancer.
      
   
  
  
  
    
      
    
    
      
        RPS: a comprehensive database of RNAs involved in liquid–liquid phase separation
      
        Nucleic Acids Research. 28 October 2021 DOI: 10.1093/nar/gkab986
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
RPS
      Liquid–liquid phase separation (LLPS) is critical for assembling membraneless organelles (MLOs) such as nucleoli, P-bodies, and stress granules, which are involved in various physiological processes and pathological conditions. While the critical role of RNA in the formation and the maintenance of MLOs is increasingly appreciated, there is still a lack of specific resources for LLPS-related RNAs. Here, we presented RPS (http://rps.renlab.org), a comprehensive database of LLPS-related RNAs in 20 distinct biomolecular condensates from eukaryotes and viruses. Currently, RPS contains 21,613 LLPS-related RNAs with three different evidence types, including ‘Reviewed’, ‘High-throughput’ and ‘Predicted’. RPS provides extensive annotations of LLPS-associated RNA properties, including sequence features, RNA structures, RNA–protein/RNA–RNA interactions, and RNA modifications. Moreover, RPS also provides comprehensive disease annotations to help users to explore the relationship between LLPS and disease. The user-friendly web interface of RPS allows users to access the data efficiently. In summary, we believe that RPS will serve as a valuable platform to study the role of RNA in LLPS and further improve our understanding of the biological functions of LLPS.
      
   
  
  
  
    
      
    
    
      
        TIRSF: a web server for screening gene signatures to predict Tumor immunotherapy response
      
        Nucleic Acids Research. 12 May 2022 DOI: 10.1093/nar/gkac374
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
TIRSF
      Immune checkpoint blockade (ICB) therapy has been successfully applied to clinically therapeutics in multiple cancers, but its efficacy varies greatly among different patients and cancer types. Therefore, the construction of gene signatures to identify patients who could benefit from ICB therapy is particularly important for precision cancer treatment. However, due to the lack of a user-friendly platform, the construction of such gene signatures is a great challenge for clinical investigators who have limited programming skills. In light of this challenge, we developed a web server called Tumor Immunotherapy Response Signature Finder(TIRSF) for the construction of gene signatures to predict ICB therapy response in cancer patients. TIRSF consists of three functional modules. The first module is the Signature Discovery module which provides signature construction and performance evaluation functionalities. The second is a module for response prediction based on the TIRSF signatures, which enables response prediction and prognostic analysis of immunotherapy samples. The last is a module for response prediction based on existing signatures. This module currently integrates 24 published signatures for ICB therapy response prediction. Together, all of above features can be freely accessed at http://tirsf.renlab.org/.
      
   
  
  
  
    
      
    
    
      
        IBS 2.0: an upgraded illustrator for the visualization of biological sequences
      
        Nucleic Acids Research. 17 May 2022 DOI: 10.1093/nar/gkac373
      
        [
        
            Abstract
        
        ]
        [
        
            Full Text
        
        ]
      
     
    Read more
   
  
    
      
IBS 2.0
      The visualization of biological sequences with various functional elements is fundamental for the publication of scientific achievements in the field of molecular and cellular biology. However, due to the limitations of the currently used applications, there are still considerable challenges in the preparation of biological schematic diagrams. Here, we present a professional tool called IBS 2.0 for illustrating the organization of both protein and nucleotide sequences. With the abundant graphical elements provided in IBS 2.0, biological sequences can be easily represented in a concise and clear way. Moreover, we implemented a database visualization module in IBS 2.0, enabling batch visualization of biological sequences from the UniProt and the NCBI RefSeq databases. Furthermore, to increase the design efficiency, a resource platform that allows uploading, retrieval, and browsing of existing biological sequence diagrams has been integrated into IBS 2.0. In addition, a lightweight JS library was developed in IBS 2.0 to assist the visualization of biological sequences in customized web services. To obtain the latest version of IBS 2.0, please visit https://ibs.renlab.org.
      
   
  
  
  
    
    Post-translational Modifications
    
      Our group is engaged in the study of post-translational modifications(PTMs) using computational approaches.
      We have
 been developing a high-effective algorithm named GPS (Group-based Prediction System) for the
      prediction of PTMs sites.
 Based on the GPS algorithm,over ten types of PTM predictors have been
      released. We also built a series
 databases for protein phosphorylation, lipid and lysine modifications.
      Recently, we are combining
 the computational methods with the technology of BiFC(Bimolecular
      Fluorescence
 Complementation) to develop a systematic approach for studying
 the SUMO regulation in
      Homo sapiens.
    
   
  
    
    Gene Editting with CRISPR
    Our group also focus on developing computational tools for assisting the design of CRISPR system.
      Currently, we have 
developed a high efficient binary alignment scheme to screen out potential on-target
      and off-target sites from
 the whole genome. Using machine learning methods, such as Random Forest, we
      predicted the cleavage
 efficacies of the potential target sites, and recommended an optimal gRNA design
      for the users 
based on our predictions. A subsequent experimental validation will be also 
performed
      in the near further.
    
   
  
    
    
      RNA N6-methyladenosine Modification
    
    RNA N6-methyladenosine (m6A) modification has a critical role in the regulation of many fundamental
      biological processes.
 However, the role of m6A in cancer is poorly understood. We have developed a
      computational tool, which is called
 “m6A Finder”, for predicting m6A modification sites at
      single-nucleotide resolution. We then systematically
 investigate the m6A-associated somatic mutations
      in cancers using TCGA data. We are also
 developing algorithms to analyze m6A-Seq data, such as peak
      calling
 and differential methylation analysis.