Compiling Blast 2.2.28+ with Intel Composer XE 2013 for Xeon Phi.

by Juan Carlos Maureira

Strategy

  1. First, compile everything with ICC for the host machine (dual Sandy Bridge E5-2660).
  2. Run a baseline test, single-threaded and multi-threaded, on the host machine.
  3. Compile Boost with icc and the -mmic flag.
  4. Compile Blast with icc and the -mmic flag.
  5. Run the same baseline test on the Xeon Phi card.

Prerequisites

Reference: Binarybuilder

  1. Intel Composer XE 2013 installed and operational
  2. Boost 1.54 compiled with the Intel compilers

Compiling Boost for Host CPU with Intel Compilers

tar zxf boost-1.54.tar.gz
cd boost-1.54
./bootstrap.sh --prefix=$HOME --with-toolset=intel-linux
./b2 -j 16
./b2 install --prefix=$HOME

Compiling blast 2.2.28+ for Host system (x86_64)

tar zxf ncbi-blast-2.2.28+-src.tar.gz
cd ncbi-blast-2.2.28+-src
cd c++/compilers/unix
./ICC.sh --with-boost=$HOME  --with-bin-release --with-dll --without-debug --with-mt
cd ../..

At this point, the makefiles need some changes to introduce optimization flags and fix some compilation problems. The previous step created a build directory in the top-level Blast source directory. For the Intel compiler version used here, this directory is called ICC1400-ReleaseMTDLL64.

In the Makefile in the top-level source directory, set the prefix so that Blast is installed locally in your account.

prefix=$HOME

Add some optimization flags, switch from static to shared linking, and change CONF_AR from the GNU archiver to the Intel one in ICC1400-ReleaseMTDLL64/build/Makefile.mk. Note that -O3 was replaced with -O2 to avoid a spurious "internal backend signal error".

CONF_AR     = /opt/intel/bin/xiar -r
CONF_CFLAGS   =   -we70 -pthread -O2 -fPIC -xHost
CONF_CXXFLAGS =   -we70 -pthread -O2 -fPIC -xHost


CONF_APP_LDFLAGS = -Wl,-E -shared-intel
CONF_DLL_LDFLAGS = -nodefaultlibs -shared-intel $(DLL_UNDEF_FLAGS)

FAST_CFLAGS   =    -we70 -pthread -fPIC   -O2 -msse2
FAST_CXXFLAGS =    -we70 -pthread -fPIC   -O2 -msse2
FAST_LDFLAGS  =    -Wl,--enable-new-dtags -pthread    -O2 -msse2
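
The edits above can also be scripted rather than made by hand. The following is a minimal sketch, not part of the original procedure: set_mk_var is a hypothetical helper name, and it assumes each variable is assigned on a single line of the form VAR = value (continuation lines are not handled).

```shell
#!/bin/sh
# set_mk_var FILE VAR VALUE (hypothetical helper)
# Replace every single-line assignment "VAR = ..." in FILE with "VAR = VALUE".
set_mk_var() {
    mk_file="$1"; mk_var="$2"
    mk_tmp=$(mktemp) || return 1
    # The value travels through the environment so that awk does not
    # interpret backslashes or other special characters in it.
    MK_VALUE="$3" awk -v var="$mk_var" '
        $0 ~ ("^" var "[ \t]*=") { print var " = " ENVIRON["MK_VALUE"]; next }
        { print }
    ' "$mk_file" > "$mk_tmp" && cat "$mk_tmp" > "$mk_file"
    mk_status=$?
    rm -f "$mk_tmp"
    return $mk_status
}

# Example, using the build directory and values named in the text:
# mk=ICC1400-ReleaseMTDLL64/build/Makefile.mk
# set_mk_var "$mk" CONF_AR '/opt/intel/bin/xiar -r'
# set_mk_var "$mk" CONF_CFLAGS '-we70 -pthread -O2 -fPIC -xHost'
```

Keeping a copy of the unmodified Makefile.mk before running such a helper makes it easy to diff the result against what configure generated.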

Finally, build everything from the top-level source directory.

make -j 16

Go make a coffee, since this will take a while. After a successful build, run the checks and, if everything goes well, install Blast into your home directory.

make check
make install

Running Blast on the host CPU

 time blastall -a 16 -p blastp -d ./nr/nr -i ./DROME.fasta -o ./output.blastp -e 1e-10 -b10 -v10
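
To compare single-thread and multi-thread runs as outlined in the strategy, the same command can be timed for several thread counts. The following is a minimal sketch: elapsed_seconds is a hypothetical helper, it has one-second resolution, and it assumes a date command that supports +%s (as GNU and BSD date do).

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...] (hypothetical helper)
# Run CMD, discard its output, and print the wall-clock time in whole seconds.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" > /dev/null 2>&1
    es_end=$(date +%s)
    echo $(( es_end - es_start ))
}

# Example, reusing the options from the command above:
# for n in 1 16; do
#     t=$(elapsed_seconds blastall -a "$n" -p blastp -d ./nr/nr -i ./DROME.fasta \
#         -o "./output.blastp.$n" -e 1e-10 -b10 -v10)
#     echo "$n threads: $t s"
# done
```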

Compiling Boost for Xeon Phi

Reference: Intel Software Site

tar zxf boost-1.54.tar.gz
cd boost-1.54
./bootstrap.sh --prefix=$HOME/mic
./bjam toolset=intel -j 6 --disable-icu --without-iostreams cflags="-mmic" \
  cxxflags="-mmic" linkflags="-mmic"
./b2 install --prefix=$HOME/mic

Compiling blast 2.2.28+ for Xeon Phi

First, configure blast+ to build with the Intel compilers, disabling several features that are not needed for this evaluation. Disabling them also makes the compilation easier.

tar zxf ncbi-blast-2.2.28+-src.tar.gz
cd ncbi-blast-2.2.28+-src
cd c++/compilers/unix
./ICC.sh --with-boost=$HOME/mic   --without-openssl --without-gnutls  \
 --without-mysql --without-opengl --without-icu --without-bdb \
 --with-bin-release --with-dll --without-debug --with-mt
cd ../..

Then modify the Makefile.mk file to include the proper paths and flags for building k1om binaries.

CONF_AR     = /opt/intel/bin/xiar -r
CONF_LINK   = $(CXX) -Kc++ -mmic

CONF_CFLAGS   =   -we70 -pthread -O2 -fPIC -mmic
CONF_CXXFLAGS =   -we70 -pthread -O2 -fPIC -mmic

CONF_APP_LDFLAGS = -Wl,-E -shared-intel
CONF_DLL_LDFLAGS = -nodefaultlibs -shared-intel $(DLL_UNDEF_FLAGS)

FAST_CFLAGS   =    -we70 -pthread -fPIC   -O2 -mmic
FAST_CXXFLAGS =    -we70 -pthread -fPIC   -O2 -mmic
FAST_LDFLAGS  =    -Wl,--enable-new-dtags -pthread    -O2 -mmic

LINK_DLL      = $(CXX) -Kc++ -mmic  -shared -o

Z_INCLUDE   = -I/home/inria/sophia/jcm/mic/include
Z_LIBS      = -L/home/inria/sophia/jcm/mic/lib -lz

BZ2_INCLUDE = -I$(includedir)/util/compress/bzip2 -I/home/inria/sophia/jcm/mic/include
BZ2_LIBS    = -L/home/inria/sophia/jcm/mic/lib

PCRE_INCLUDE   = -I/home/inria/sophia/jcm/mic/include
PCRE_LIBS      = -L/home/inria/sophia/jcm/mic/lib -lpcre

Compiling Blast Dependencies for Xeon Phi

Before compiling k1om binaries with autotools (autogen.sh, configure, make, etc.), the host machine must be prepared to automatically execute k1om binaries on the Xeon Phi. Configure scripts usually compile and run example code to detect available features, and some builds generate code on the fly from templates (which is the case for blast+). Without this preparation, compilation may fail when the build tries to execute a k1om binary on the host machine. This can be done in two ways:

  1. by using micnativeloadex
  2. by using ssh

The overall process is as follows:

  1. Create a script, say /usr/bin/runmic.
  2. Implement the execution script with either option 1 or 2.
  3. Register the execution script with binfmt_misc:
echo ':K1OM:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb5:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/usr/bin/runmic:' > /proc/sys/fs/binfmt_misc/register
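
The magic string in the register line describes the first 19 bytes of an ELF file: the 16-byte identification block, the object type, and the low byte of the machine field, which is 0xb5 (decimal 181, EM_K1OM in elf.h). As a quick check that a freshly built file really is a k1om binary, that field can be read directly. The following is a minimal sketch, assuming a little-endian ELF file; elf_machine and is_k1om are hypothetical helper names, and the ELF magic itself is not verified.

```shell
#!/bin/sh
# elf_machine FILE (hypothetical helper)
# Print the ELF e_machine field of FILE as a decimal number.
# e_machine is a 16-bit field at byte offset 18; little-endian order is assumed.
elf_machine() {
    em_bytes=$(od -An -tu1 -j18 -N2 "$1") || return 1
    set -- $em_bytes
    [ $# -eq 2 ] || return 1
    echo $(( $1 + $2 * 256 ))
}

# is_k1om FILE: succeed if FILE carries machine type 181 (0xb5, EM_K1OM).
is_k1om() {
    [ "$(elf_machine "$1")" = 181 ]
}

# Example: is_k1om ./conftest && echo "k1om binary"
```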

Example runmic scripts follow:

micnativeloadex version

[root@node ~]# cat /usr/bin/runmic
#!/bin/bash
export SINK_LD_LIBRARY_PATH=$SINK_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
export SINK_LD_LIBRARY_PATH=$SINK_LD_LIBRARY_PATH:/opt/intel/mic/lib64
export SINK_LD_LIBRARY_PATH=$SINK_LD_LIBRARY_PATH:/opt/intel/lib/mic
export SINK_LD_LIBRARY_PATH=$SINK_LD_LIBRARY_PATH:/opt/intel/mkl/lib/mic

export PATH=$PATH:`pwd`
cmd=$1
shift
args="$@"
args="${args//\'/\'}"
args="${args//\"/\\\"}"
/usr/local/bin/micnativeloadex $cmd -a "$args"
[root@node ~]# 

ssh version

#!/bin/bash

quote() {
    echo "$1" | sed "s/\"/\\'/g"
}

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/mic/lib64:/opt/intel/lib/mic:/opt/intel/mkl/lib/mic
export PATH=$PATH:`pwd`

cmd=$1
shift
args=$@

ssh mic0 -C "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH; cd `pwd`; $cmd $args"
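
The ssh script passes the arguments to the card as a single interpolated string, so arguments containing spaces or quotes may be split or altered on the way. One possible way to harden it is to single-quote each argument before building the remote command. The following is a minimal sketch: shell_quote is a hypothetical helper, not part of the original scripts, and trailing newlines inside an argument are not preserved.

```shell
#!/bin/sh
# shell_quote ARG... (hypothetical helper)
# Print each argument wrapped in single quotes, with embedded single quotes
# rewritten as '\'' so that a remote shell reads back the original words.
shell_quote() {
    for sq_arg in "$@"; do
        printf "'%s' " "$(printf '%s' "$sq_arg" | sed "s/'/'\\\\''/g")"
    done
}

# Example use inside runmic, before the "shift" (mic0 as in the script above):
# ssh mic0 "cd $(shell_quote "$PWD") && $(shell_quote "$@")"
```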

pcre-8.33

tar zxf pcre-8.33.tar.gz
cd pcre-8.33
CC=icc CXX=icc CFLAGS="-mmic" CXXFLAGS="-mmic" LDFLAGS="-mmic"  ./configure \
   --host=x86_64-k1om-linux --prefix=$HOME/mic
make -j 8
make install

zlib-1.2.8

tar zxf zlib-1.2.8.tar.gz
cd zlib-1.2.8
CC=icc CXX=icc CFLAGS="-mmic" CXXFLAGS="-mmic" LDFLAGS="-mmic"  ./configure --prefix=$HOME/mic
make -j 8
make install

bzip2-1.0.6

Change the following variables in the Makefile

CC=icc
AR=/usr/linux-k1om-4.7/bin/x86_64-k1om-linux-ar
CFLAGS=-mmic -fPIC -Wall -Winline -O2 -g $(BIGFILES)
PREFIX=$HOME/mic

Then run make and make install. Note that make check will fail, since the host cannot run k1om binaries.

libxml2-2.9.1

tar zxf libxml2-2.9.1.tar.gz
cd libxml2-2.9.1
CC=icc CXX=icc CFLAGS="-mmic" CXXFLAGS="-mmic"  LDFLAGS="-mmic" \
   ./configure --prefix=$HOME/mic --host=x86_64-k1om-linux --without-python
make -j 8
make install

libxslt-1.1.28

tar zxf libxslt-1.1.28.tar.gz
cd libxslt-1.1.28
CC=icc CXX=icc CFLAGS="-mmic" CXXFLAGS="-mmic"  LDFLAGS="-mmic" \
   ./configure --prefix=$HOME/mic --host=x86_64-k1om-linux \
               --without-python --with-libxml-prefix=$HOME/mic 
make -j 8
make install
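
The autotools-based dependencies (pcre, libxml2, libxslt) all use the same compiler settings, host triplet and prefix, so the invocation can be wrapped once. The following is a minimal sketch: mic_configure is a hypothetical helper name, and it must be run from inside the unpacked source directory. zlib is configured without --host in the text, so it is left out here.

```shell
#!/bin/sh
# mic_configure [EXTRA_CONFIGURE_ARGS...] (hypothetical helper)
# Run ./configure in the current directory with the k1om settings used above;
# any extra arguments are passed through unchanged.
mic_configure() {
    CC=icc CXX=icc CFLAGS="-mmic" CXXFLAGS="-mmic" LDFLAGS="-mmic" \
        ./configure --host=x86_64-k1om-linux --prefix="$HOME/mic" "$@"
}

# Example, equivalent to the libxslt invocation above:
# cd libxslt-1.1.28
# mic_configure --without-python --with-libxml-prefix="$HOME/mic"
# make -j 8 && make install
```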

Performance Evaluation

All tests use 100 proteins against the whole NR database:

Host CPU

  1. Blastp with 16 cores: 29 min
  2. Blastp with 1 core: > 184 min

Xeon Phi

  1. Blastp with 32 cores, native on Xeon Phi: > 184 min (memory problems)