Through PRACE’s SHAPE programme, NSilico teamed up with computational experts from CINES (France) and ICHEC (Ireland) to address a key problem: the rapid alignment of short DNA sequences to reference genomes. The team deployed the Smith-Waterman algorithm on an emerging many-core technology, the Intel Xeon Phi co-processor.
The project, entitled “High performance computation for short read alignment”, investigated high performance computational techniques for the analysis of ribosomal RNA, a key component of the ribosomes that cells use to translate messenger RNA, transcribed from an organism’s DNA, into protein. Next-generation sequencing techniques are enabling the capture of vast amounts of data on the ribosomal RNA characteristics of cells under varying conditions. However, reads of such RNA fragments are shorter than those typically encountered in sequencing projects, while most alignment algorithms are optimised for longer reads.
“The SHAPE project has been a very successful collaboration between NSilico and the PRACE partners involved. NSilico has benefited from domain expertise from PRACE in first identifying a bioinformatics codebase with real potential to be deployed on cutting-edge many-core hardware. It has also since gained invaluable insights into the optimisation and parallelisation work involved in porting the code to the Intel Xeon Phi. Next steps are already being discussed on testing and deployment of the code with the release of the next generation ‘Knights Landing’ hardware, as well as potential incorporation into NSilico’s in-house bioinformatics pipelines,” says Paul Walsh of NSilico.
Example output from the Smith-Waterman sequence alignment algorithm.
# Aligned_sequences: 2
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
# Length: 39
# Identity: 33/39 (84.6%)
# Similarity: 33/39 (84.6%)
# Gaps: 3/39 ( 7.7%)
# Score: 123.0
SW_001 1 TACCG-ACTTCTAGCGACACACCCCGGGCCCTTAGACAC 38
         ||||| ||||||||| ||||| .||||||..||||||||
SW_001 2 TACCGAACTTCTAGC-ACACA-TCCGGGCATTTAGACAC 38
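The summary header can be checked by hand from the two aligned rows. The Python sketch below recomputes length, identity, gaps and score. It assumes EDNAFULL’s values for unambiguous bases (+5 for a match, -4 for a mismatch) and an EMBOSS-style gap convention (an assumption) in which a gap of length k costs the gap penalty plus (k - 1) times the extend penalty; the helper name alignment_stats is hypothetical. With 33 matches, 3 mismatches and three single-base gaps, the score works out as 33 × 5 - 3 × 4 - 3 × 10 = 123, in agreement with the header.

```python
# Minimal sketch (hypothetical helper): recompute the summary header of the
# example alignment from its two aligned rows.
# Assumptions: EDNAFULL scores for unambiguous bases (+5 match, -4 mismatch)
# and an EMBOSS-style affine gap cost, where a gap of length k costs
# gap_open + (k - 1) * gap_extend.

def alignment_stats(row1, row2, gap_open=10.0, gap_extend=0.5,
                    match=5.0, mismatch=-4.0):
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    identities = gaps = 0
    score = 0.0
    gap_row = None  # which row (1 or 2) holds the currently open gap, if any
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("column with gaps in both rows")
        if a == "-" or b == "-":
            gaps += 1
            row = 1 if a == "-" else 2
            # Extending an open gap in the same row is cheaper than opening one.
            score -= gap_extend if gap_row == row else gap_open
            gap_row = row
        else:
            gap_row = None
            if a == b:
                identities += 1
                score += match
            else:
                score += mismatch
    return {"length": len(row1), "identity": identities,
            "gaps": gaps, "score": score}


row1 = "TACCG-ACTTCTAGCGACACACCCCGGGCCCTTAGACAC"
row2 = "TACCGAACTTCTAGC-ACACA-TCCGGGCATTTAGACAC"
stats = alignment_stats(row1, row2)
print(stats)  # {'length': 39, 'identity': 33, 'gaps': 3, 'score': 123.0}
print(f"{stats['identity'] / stats['length']:.1%}")  # 84.6%
print(f"{stats['gaps'] / stats['length']:.1%}")      # 7.7%
```

Identity and similarity coincide here because, for unambiguous DNA bases, only identical bases score positively.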
The project team adopted two approaches to optimise and parallelise the SSW (Striped Smith-Waterman) library: the first using modern SIMD intrinsics and the second using OpenMP. The OpenMP parallelisation work produced a code that shows good parallel performance on standard x86 processors and promising results on Xeon Phi many-core hardware. As expected, the resulting SSW library achieves only limited performance gains on the current generation of Xeon Phi, but it has been refactored so that it can readily take advantage of next-generation hardware such as the Xeon Phi “Knights Landing” and its upcoming AVX-512 instructions.
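Both optimisation routes accelerate the same dynamic-programming recurrence. For reference, the Python sketch below is a plain scalar Smith-Waterman local alignment with affine gaps (Gotoh’s formulation) that returns only the best score. It is not the SSW library’s code: SSW vectorises this recurrence with a striped SIMD layout, which the sketch does not attempt. The scoring parameters (+5/-4 for unambiguous bases, gap penalty 10, extend penalty 0.5, with a gap of length k costing 10 + 0.5(k - 1)) are assumptions chosen to match the example alignment header, and smith_waterman and base_score are hypothetical names.

```python
# Minimal scalar sketch of Smith-Waterman local alignment with affine gaps
# (Gotoh). Returns the best local score only; no traceback.
# Assumed scoring: +5 match / -4 mismatch (EDNAFULL values for A, C, G, T)
# and a gap of length k costing gap_open + (k - 1) * gap_extend.

NEG_INF = float("-inf")


def base_score(x, y):
    return 5.0 if x == y else -4.0


def smith_waterman(a, b, gap_open=10.0, gap_extend=0.5, score=base_score):
    m = len(b)
    best = 0.0
    h_prev = [0.0] * (m + 1)      # H, row i-1: best local score ending at (i-1, j)
    f_prev = [NEG_INF] * (m + 1)  # F, row i-1: best score ending with a gap in b
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (m + 1)
        f_cur = [NEG_INF] * (m + 1)
        e = NEG_INF  # E: best score ending with a gap in a (moving along the row)
        for j in range(1, m + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + score(a[i - 1], b[j - 1])
            # The 0 term lets a new local alignment start at any cell.
            h_cur[j] = max(0.0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


seq1 = "TACCG-ACTTCTAGCGACACACCCCGGGCCCTTAGACAC".replace("-", "")
seq2 = "TACCGAACTTCTAGC-ACACA-TCCGGGCATTTAGACAC".replace("-", "")
print(smith_waterman(seq1, seq2))  # 123.0
```

Run on the residues of the two example rows, the sketch returns 123.0, the score in the header. Only the previous and current rows are kept, so memory grows with the length of one sequence. The per-cell work in the inner loop is what SIMD implementations pack into vector lanes, while a loop over many independent reads is one natural target for OpenMP threads; this is a typical design pattern, assumed here, not a description of the project’s code.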
The results of the project were presented during the SHAPE parallel track of PRACEdays14: http://www.prace-ri.eu/pracedays14-presentations/
Title: High performance computation for short read alignment
Leader: Dr Paul Walsh, NSilico Life Science Ltd, Ireland
Collaborators: Dr Simon Wong, Irish Centre for High-End Computing (ICHEC), Ireland | Mr Xiangwu Lu, NSilico Life Science Ltd, Ireland | Dr Tristan Cabel, Mr Gabriel Hautreux, Mr Eric Boyer, CINES, France | Nicolas Mignerey, GENCI, France
Research field: Medicine and Life Sciences
Resource awarded: 100,000 core hours on MareNostrum @ BSC, Spain | 20,000 core hours on MareNostrum Hybrid Nodes @ BSC, Spain
More detailed results of this project, as well as of the other projects among the first 10 SHAPE projects, are available on the PRACE website: http://www.prace-ri.eu/SHAPE-Prototypes
NSilico is an Irish company that develops integrated molecular diagnostics and sequence data management and analytic tools for the life sciences and healthcare industries. The company’s offerings are based upon a unique and unrivalled blend of biological, computing, software development and clinical experience and expertise. Currently, the company has two product offerings: Simplicity™, a cloud-based bioinformatics research pipeline tool; and SimplicityEHR™, for cancer care management. Simplicity™ is NSilico’s lead product and is one of the most comprehensive, easy-to-use, cloud-based software-as-a-service products for the automatic annotation, analysis and visualisation of high-throughput sequencing data. It is scalable and customisable to user needs, and it allows automated and rapid extraction and reporting of high-value information that aids in the discovery of biomarkers and genetic profiles through the creation of rich, publication-standard reports. Its usability and power help to dramatically reduce research time cycles. http://www.nsilico.com/
SHAPE, the SME HPC Adoption Programme in Europe, is a pan-European, PRACE-based programme supporting HPC adoption by SMEs. The Programme aims to raise awareness and equip European SMEs with the expertise necessary to take advantage of the innovation possibilities opened up by High Performance Computing (HPC), thus increasing their competitiveness. http://www.prace-ri.eu/shape
The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels. The PRACE Research Infrastructure provides a persistent world-class high performance computing service for scientists and researchers from academia and industry in Europe. The computer systems and their operations accessible through PRACE are provided by 4 PRACE members (BSC representing Spain, CINECA representing Italy, GCS representing Germany and GENCI representing France). The Implementation Phase of PRACE receives funding from the EU’s Seventh Framework Programme (FP7/2007-2013) under grant agreements RI-283493 and RI-312763. For more information, see www.prace-ri.eu
Do you want more information? Do you want to subscribe to our mailing lists?
Please visit the PRACE website: http://www.prace-ri.eu
Or contact Marjolein Oorsprong, Communications Officer:
Telephone: +32 2 613 09 27 E-mail: M.Oorsprong[at]staff.prace-ri.eu