Report title: Development and application of key computing technologies for third-generation sequencing
Report time: September 18, 2020 (Friday), 11:00 am to 12:30 pm
Reporting location: Campus Computer Building 313
Reporter: Dr. Xiao Chuanle (Zhongshan Ophthalmology Center, Sun Yat-sen University)
Report Summary:
Third-generation sequencing technology offers long reads (approximately 20 kbp), no PCR amplification bias, and sensitivity to base modifications. These features give it clear advantages in de novo assembly of animal and plant genomes and in epigenetic detection. As a powerful supplement to, or replacement for, second-generation sequencing, it has become a hot research topic in recent years, and results are often published in top international journals such as Cell, Nature, and Science (CNS). However, the high sequencing error rate of current third-generation data (12-15%) is a major challenge for analysis, and the heavy computing resources this error rate demands are a key bottleneck to the wider application of third-generation sequencing. First, for third-generation genome assembly, we proposed a global seed voting scoring model to replace the traditional sequence alignment model and developed MECAT, a rapid assembly system. On human data sets, MECAT assembles 17-56 times faster than comparable software (Canu and FALCON); the paper was published in Nature Methods in 2017. MECAT has so far been used to assemble more than 20 genomes of plants characteristic of China.
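The abstract does not specify MECAT's scoring model beyond replacing alignment with seed voting. As a rough illustration of the general idea only, the Python sketch below lets shared k-mer seeds between two reads vote for a diagonal offset and uses the vote count of the best diagonal band as a cheap overlap score, with no dynamic-programming alignment. The function names, the k-mer size `k`, and the band width `band` are hypothetical choices for this sketch, not MECAT's implementation or parameters.

```python
from collections import Counter, defaultdict


def kmer_index(seq: str, k: int) -> dict:
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_vote_score(query: str, target: str, k: int = 5, band: int = 10) -> tuple:
    """Score a candidate overlap by letting shared k-mer seeds vote for a diagonal.

    Illustrative sketch only (hypothetical, not MECAT's model).
    Each seed hit (query pos i, target pos j) votes for diagonal d = j - i.
    Diagonals are grouped into bins of width `band` so that small indel
    shifts in noisy reads still land in the same bin.
    Returns (start of best diagonal bin, number of votes in that bin).
    """
    index = kmer_index(target, k)
    votes = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            votes[(j - i) // band] += 1
    if not votes:
        return (0, 0)  # no shared seeds: no candidate overlap
    best_bin, count = votes.most_common(1)[0]
    return (best_bin * band, count)
```

For example, a read that is the suffix of another read starting at offset 7 receives its strongest vote at diagonal 7 when `band=1`. Banding diagonals is one simple way to tolerate the insertions and deletions common in long reads, since an indel shifts the diagonal only slightly.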
Second, for Nanopore sequence correction, we addressed the locally uneven distribution of Nanopore errors by proposing an accuracy-priority sequence correction model, which significantly improves the speed and accuracy of correction, and developed NECAT, a rapid Nanopore assembly system that runs about 20 times faster than comparable software (Canu) (Nature Communications, 2020). In addition, for epigenetic modification detection, we addressed the heavy resource consumption of DNA-6mA detection in large genomes from PacBio data by proposing a parallel detection method based on partitioning the genome into regions. This work systematically revealed, for the first time, the distribution of the human DNA-6mA profile, its role in regulating gene expression, the associated methyltransferase and demethylase (N6AMT and ALKBH1), and their relationship with cancer; the paper was published in Molecular Cell in 2018. Finally, to address the low accuracy of Nanopore epigenetic modification detection caused by complex background signals, we established a deep recurrent neural network (RNN) model for identifying Nanopore epigenetic modifications (5mC and 6mA) and developed the corresponding software, DeepMod. It achieves genome-wide detection of 5mC and 6mA at single-base resolution, with average detection accuracy of up to 99% and 90%, respectively; the results were published in 2019.
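The abstract does not say how the genome regions are defined or scheduled. The Python sketch below illustrates the general partition-and-parallelize pattern under stated assumptions: fixed-size, non-overlapping windows per chromosome, a thread pool, and `detect_in_region`, a hypothetical placeholder that merely marks adenines where a real 6mA caller would score kinetic signals. None of these names come from the original work.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_genome(chrom_lengths: dict, window: int) -> list:
    """Split each chromosome into non-overlapping [start, end) windows."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window):
            regions.append((chrom, start, min(start + window, length)))
    return regions


def detect_in_region(genome: dict, region: tuple) -> list:
    """Hypothetical per-region detector: reports every adenine as a candidate site.

    A real 6mA caller would score signals here; this placeholder only
    shows where that per-region work would run.
    """
    chrom, start, end = region
    seq = genome[chrom][start:end]
    # Convert window-local offsets back to chromosome coordinates
    return [(chrom, start + offset) for offset, base in enumerate(seq) if base == "A"]


def detect_parallel(genome: dict, window: int, workers: int = 4) -> list:
    """Partition the genome, process regions concurrently, and merge the results."""
    lengths = {chrom: len(seq) for chrom, seq in genome.items()}
    regions = partition_genome(lengths, window)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_region = pool.map(lambda r: detect_in_region(genome, r), regions)
        # map() yields results in input order, so merged sites stay in genome order
        return [site for sites in per_region for site in sites]
```

Because the windows do not overlap, each site belongs to exactly one region and the merge needs no deduplication; a detector that needs flanking context would add overlap and deduplicate. For CPU-bound scoring in Python, a `ProcessPoolExecutor` with a module-level worker function would be the usual substitute for threads.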
About the speaker:
Xiao Chuanle, Ph.D. in Bioinformatics, is a recipient of the Guangdong Provincial Outstanding Youth Fund, an associate researcher at the Zhongshan Ophthalmology Center, Sun Yat-sen University, and an independent PI at the State Key Laboratory of Ophthalmology. He has long been engaged in developing and applying sequencing data analysis methods, and in recent years he has established a series of key algorithms and supporting software to address computational bottlenecks in basic research and applications of third-generation sequencing genomics and epigenetics. His main research directions are: (1) Computational methods for third-generation sequencing genomics: to address the time cost of third-generation sequence alignment, he proposed a long-sequence seed voting scoring model and developed the rapid assembly system MECAT, significantly improving computing speed (Nature Methods, 2017); to address the widespread and locally uneven distribution of Nanopore sequencing errors, he proposed an accuracy-priority sequence correction model, significantly improving the speed and accuracy of Nanopore sequence correction (Nature Communications, 2020). (2) Development and application of epigenetic methods for third-generation sequencing: to address the high computational cost and background noise of PacBio data, he proposed a parallel computing method for PacBio methylation detection and systematically revealed, for the first time, the human DNA-6mA methylation profile (Molecular Cell, 2018); he also established the first deep learning model for identifying modifications from Nanopore electrical signals, which identifies 5mC and 6mA with accuracies of up to 99% and 90%, respectively (2019). As first or corresponding author, he has published more than twenty high-level SCI papers in journals including Nature Methods, Molecular Cell, and Nature Communications, and he has developed more than ten bioinformatics analysis tools, including NECAT, DeepMod, MECAT, MECAT2, and FANSe2. He is a guest editor of a Frontiers in Genetics special issue and a reviewer for Genome Biology, Bioinformatics, and other professional journals.