MEG analysis on Biowulf

From MEG Core

Biowulf brief intro

Biowulf (biowulf.nih.gov) is the head node of the Biowulf cluster at NIH - https://hpc.nih.gov/docs/userguide.html

 #For processing of data - reserve an sinteractive session
 sinteractive --mem=6G --cpus-per-task=4
 
 #Depending on your usage, you may need more memory (--mem) and more cpu cores (--cpus-per-task).  For jupyter notebooks, also add the --tunnel option

Helix (helix.nih.gov) is the data transfer system attached to the biowulf cluster. !!Do not upload/download data directly to biowulf - use helix!!

 #Use scp for transfers, or rsync -av (can be slower, but it skips files that are already up to date, so an interrupted transfer can simply be rerun)
 scp -r ./my_local_results  ${USERNAME}@helix.nih.gov:/data/mydatafolder/....  
 #Getting results from biowulf cluster
 scp -r ${USERNAME}@helix.nih.gov:/data/mydatafolder/...  ${PATH_OFSTUFF}/my_local_stuff

Analysis of data should not be performed on the biowulf head node, but run through an sinteractive node or swarm process.
To start with, there are a limited number of commands loaded on the system. To access more programs use module load. To search, use module spider.

 e.g. module load afni

Configuring your bash shell environment

When editing your .bashrc, keep two terminals open on biowulf. If you misconfigure your .bashrc, you may be unable to log into biowulf; the second, already-open terminal lets you fix whatever errors out.

Edit the .bashrc file in your home directory

 umask 002   #Gives group read/write permissions to every file you create -- very very helpful for working with your team
 
 #Add modules bin path to access the MEG modules
 PATH=/data/MEGmodules/bin:$PATH
 
 ## Set up some aliases, so you don't have to type these out
 alias sinteractive_small='sinteractive --mem=8G --cpus-per-task=4 --gres=lscratch:30'
 alias sinteractive_medium='sinteractive --mem=16G --cpus-per-task=12 --gres=lscratch:100'
 alias sinteractive_large='sinteractive --mem=24G --cpus-per-task=32 --gres=lscratch:150'

Edit .bash_profile in your home directory

 #Set your default group (normally your default group is your userID, which isn't helpful for your team)
 #Type `groups`  to see which groups you are part of
 newgrp YOUR_GROUP   #Replace YOUR_GROUP with one of the groups listed by `groups`


SAM MEG Data Analysis

 module load afni 
 module load ctf
 module load samsrcv3/20180713-c5e1042

MNE python data analysis

To Access Additional MEG modules

 #Add the following line to your ${HOME}/.bashrc
 module use --append /data/MEGmodules/modulefiles

Load MNE modules

 #The module will not load on the biowulf head node because it also loads freesurfer
 #Create an sinteractive or spersist node - adjust the memory and number of cpu cores as needed
 sinteractive --mem=6G --cpus-per-task=4  
 module load mne/0.24.1   # OR module load mne  <<-- defaults to most current version
 
 ipython
 import mne

Load Commandline MNE scripts for Processing Data

!Currently in Beta Testing!

 [stoutjd@cn1023 modules]$ module list 
 No modules loaded
 [stoutjd@cn1023 modules]$ module load mne_scripts
 [+] Loading freesurfer  7.1.1  on cn1023 
 [+] Loading mne 0.24.1  ... 
 [+] Loading mne_scripts 0.1_dev  ... 
 Available:
  spatiotemporal_clustering_stats.py
 [stoutjd@cn1023 modules]$ spatiotemporal_clustering_stats.py -h 
 usage: spatiotemporal_clustering_stats.py [-h] [-topdir TOPDIR] [-search SEARCH]
                                           [-outfname OUTFNAME]
                                           [-subjects_dir SUBJECTS_DIR]
 
 options:
   -h, --help            show this help message and exit
   -topdir TOPDIR        The directory w/ stc files
   -search SEARCH        The search term to find the specific stc files for clustering
   -outfname OUTFNAME    Output filename for the cluster nifti file
   -subjects_dir SUBJECTS_DIR
                         Freesurfer subjects dir. Will download freesurfer average if not
                         already present. Defaults to os.environ['SUBJECTS_DIR'] if not
                         provided

MNE bids creation and MNE bids pipeline processing on biowulf

Start interactive session with scratch to render visualization offscreen

 sinteractive --mem=6G --cpus-per-task=4 --gres=lscratch:50

Create BIDS data from data off the scanner

 module load mne_scripts
 make_meg_bids.py -meg_input_dir MEGFOLDER -mri_brik AFNI_COREGED+orig.BRIK

Process data using MNE Bids Pipeline

 module purge 
 module load mne_bids_pipeline
 mne-bids-pipeline-run.py --config=CONFIG.py  ##Optional --steps=preprocessing,source  --subject=SUBJECTID(without sub-)
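
For orientation, a minimal CONFIG.py might look like the sketch below. The variable names follow the MNE-BIDS-Pipeline configuration documentation, but the available options differ between pipeline versions, so check them against the version loaded by the module. Every value here (study name, folders, subject ID, task, filter and epoch settings) is a placeholder assumption, not a recommended setting.

```python
# CONFIG.py: minimal sketch for mne-bids-pipeline (all values are placeholders)

study_name = "my_meg_study"                     # hypothetical study label
bids_root = "/data/mydatafolder/bids"           # hypothetical BIDS folder
deriv_root = "/data/mydatafolder/bids/derivatives/mne-bids-pipeline"  # hypothetical output folder

subjects = ["SUBJECTID"]    # subject IDs without the "sub-" prefix
task = "mytask"             # hypothetical BIDS task label
ch_types = ["meg"]          # channel types to process

l_freq = 1.0                # high-pass cutoff in Hz (placeholder)
h_freq = 40.0               # low-pass cutoff in Hz (placeholder)

conditions = ["stim"]       # hypothetical event name(s) to epoch on
epochs_tmin = -0.2          # epoch start in seconds, relative to the event
epochs_tmax = 0.5           # epoch end in seconds
baseline = (None, 0)        # baseline from the epoch start to the event
```

Because the config is an ordinary Python file, paths can also be built with os.path.join or read from environment variables if that suits your project layout.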

Best Practices for Group Data Preprocessing

Process your project data

Make your python script commandline callable

 #It is typical to run 1 subject per commandline call and to parallelize over subjects in the swarm file
 -Use argparse to manually build a full commandline call with keyword inputs and a function description 
 -Use fire to automatically create a commandline call based on function inputs, or click to build one with decorators
 -Use sys.argv to create a simple commandline input (sys.argv[0] is the script name, sys.argv[1] is the first argument, sys.argv[2] is the second, ...)
 Helpful Hints: 
 -Use the meg dataset as an input; inside the python module, use the meg dataset name to extract the subject ID:
   filename = os.path.basename(meg_dataset) 
   subjid = filename.split('_')[0]   #OR, if you have a custom ID: subjid = filename[:n_chars]  (n_chars = length of the ID)
 -Build output filenames using f-strings: 
   outfile_base = f'{subjid}_{taskname}.nii'
   outfilename = os.path.join(outdir, outfile_base)
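
Putting these hints together, an argparse-based script could be sketched as follows. The flag names (-dataset, -taskname, -outdir) and the function names are illustrative assumptions, and the actual processing step is left as a placeholder.

```python
import argparse
import os


def get_subjid(meg_dataset):
    """Return the subject ID: the text before the first underscore of the dataset name."""
    # normpath drops a trailing slash, so '/path/SUBJ_task.ds/' also works
    filename = os.path.basename(os.path.normpath(meg_dataset))
    return filename.split('_')[0]


def build_outfilename(meg_dataset, taskname, outdir):
    """Build the output path <outdir>/<subjid>_<taskname>.nii."""
    subjid = get_subjid(meg_dataset)
    outfile_base = f'{subjid}_{taskname}.nii'
    return os.path.join(outdir, outfile_base)


def main(argv=None):
    """Parse the commandline (or the list argv) and process one dataset."""
    parser = argparse.ArgumentParser(
        description='Process one MEG dataset (one subject per call).')
    parser.add_argument('-dataset', required=True,
                        help='Path to the MEG dataset (.ds folder)')
    parser.add_argument('-taskname', required=True,
                        help='Task label used in the output filename')
    parser.add_argument('-outdir', default='.',
                        help='Output directory (default: current directory)')
    args = parser.parse_args(argv)

    outfilename = build_outfilename(args.dataset, args.taskname, args.outdir)
    # Placeholder: the real processing would run here and write to outfilename
    print(outfilename)
    return outfilename
```

To make the file callable from the commandline, end it with the usual `if __name__ == '__main__': main()` guard. Keeping the parsing inside main(argv) also lets you test the script from ipython, e.g. main(['-dataset', 'SUBJ_task.ds', '-taskname', 'task']).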

Build, test, and submit your swarm file

 for i in ${GROUP_FOLDER}/*.ds; do echo my_process.py -in1 input1 -in2 input2 -dataset $i >> swarm_file_preprocess.sh ; done
 
 #Make sure the process runs on at least one subject/dataset
 #This will run the last line of the swarmfile
 $(tail -1 ./swarm_file_preprocess.sh) 
 
 #Verify the results on the single subject / possibly look at how much RAM / CPU was used before submitting the full batch to swarm
 swarm -f ./swarm_file_preprocess.sh -g ${GigsOfRAM} -t ${CPUcores}  # -b ${How many subjects to run in a row on 1 computer} - can be useful if you have a fast process
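
The swarm file can also be written from Python. Because the shell loop above appends with >>, rerunning it adds duplicate lines; the sketch below overwrites the file instead. The script name and flags are the same placeholders used in the loop, and the function names are hypothetical.

```python
import glob
import os


def make_swarm_lines(group_folder, script='my_process.py',
                     extra_args='-in1 input1 -in2 input2'):
    """Return one command line per .ds dataset found in group_folder."""
    datasets = sorted(glob.glob(os.path.join(group_folder, '*.ds')))
    return [f'{script} {extra_args} -dataset {dset}' for dset in datasets]


def write_swarm_file(lines, swarm_fname='swarm_file_preprocess.sh'):
    """Write the commands to the swarm file, overwriting any previous version."""
    with open(swarm_fname, 'w') as f:
        for line in lines:
            f.write(line + '\n')
    return swarm_fname
```

For example, write_swarm_file(make_swarm_lines(os.environ['GROUP_FOLDER'])) produces the same one-line-per-dataset file, which can then be tested on a single line and submitted with swarm as shown above.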

ADVANCED: Making your own python module

Build the python conda environment

It is recommended to create an install script so that this can be sent to a slurm job

 # Load conda - if set up according to the HPC page, this should work
 source /data/${USER}/conda/etc/profile.d/conda.sh; conda activate base
 
 # echo mamba create -p ${PATH_TO_OUTPUT} condaPackage1 condaPackage2 conda-forge::condaForgePackage1  -y  > installFile.sh
 # Make sure to include the -y or the job will hang waiting for user response
 # Also make sure you have an active conda prompt when submitting the swarm, or else it will fail
 echo mamba create -p /data/ML_MEG/python_modules/mne0.24.1 jupyter ipython conda-forge::mne -y  > python_install.sh
 swarm -f ./python_install.sh -g 4 -t 4

Make a module file

To display most of the contents of a module file run

 module display python  #For the python module

Output:

 ----------------------------------------------------------------------------------
  /usr/local/lmod/modulefiles/python/3.8.lua:
 ----------------------------------------------------------------------------------
 family("python")
 prepend_path("PATH","/usr/local/Anaconda/envs/py3.8/bin")
 pushenv("OMP_NUM_THREADS","1")

Copy Template to your module folder

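 #Module files are stored as ${MyModule}/${Version}.lua, where MyModule is the family name of the code
 cp /usr/local/lmod/modulefiles/python/3.8.lua  ${myModuleFilesDir}/${MyModule}/0.1.lua
 #After copying, edit family() and prepend_path() in the new file so that they refer to your own code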
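
As an alternative to copying and hand-editing the template, the module file can be written from a short script. This is a minimal sketch that reuses the three statements shown in the module display output above. The function name and its arguments are hypothetical, and using the module name as the family name is an assumption.

```python
import os


def write_modulefile(modulefiles_dir, module_name, version, env_prefix):
    """Write <modulefiles_dir>/<module_name>/<version>.lua pointing at env_prefix/bin."""
    bin_dir = os.path.join(env_prefix, 'bin')
    contents = (
        f'family("{module_name}")\n'             # assumption: family name = module name
        f'prepend_path("PATH","{bin_dir}")\n'    # puts the environment's programs first on PATH
        'pushenv("OMP_NUM_THREADS","1")\n'       # copied from the python template
    )
    module_dir = os.path.join(modulefiles_dir, module_name)
    os.makedirs(module_dir, exist_ok=True)
    lua_fname = os.path.join(module_dir, f'{version}.lua')
    with open(lua_fname, 'w') as f:
        f.write(contents)
    return lua_fname
```

For the conda environment created earlier, the call would be along the lines of write_modulefile(myModuleFilesDir, 'mne', '0.24.1', '/data/ML_MEG/python_modules/mne0.24.1'), where myModuleFilesDir is your own module files folder.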

Add module files to the search path

 module use --append ${PathToUserModuleFiles}

Final Step Load Your Module

 # !! module load gives little feedback when a module does not load correctly - make sure the path in the lua file is correct !!
 module load ${MyModuleName}
 #Example
 [$USER@$NODE python_modules]$ module load mne
 
 [$USER@$NODE python_modules]$ ipython
 Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11) 
 Type 'copyright', 'credits' or 'license' for more information
 IPython 8.1.1 -- An enhanced Interactive Python. Type '?' for help.
 
 In [1]: import mne
 
 In [2]: mne.__path__
 Out[2]: ['/data/ML_MEG/python_modules/mne0.24.1/lib/python3.9/site-packages/mne']
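
The manual check of mne.__path__ above can be wrapped in a small helper, which is handy at the top of a processing script. This is a sketch only; the function names are hypothetical.

```python
import importlib
import os


def is_under(path, prefix):
    """True if path is located inside the directory prefix."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def package_in_env(package_name, env_prefix):
    """Import package_name and report whether it was loaded from env_prefix."""
    pkg = importlib.import_module(package_name)
    return is_under(pkg.__file__, env_prefix)
```

After module load mne, package_in_env('mne', '/data/ML_MEG/python_modules/mne0.24.1') should return True if the module file points at the intended environment. A False result (or an ImportError) suggests the path in the lua file needs another look.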