MEG analysis on Biowulf

From MEG Core

!!Under Construction!!

Biowulf brief intro

Biowulf (biowulf.nih.gov) is the login (head) node of the NIH Biowulf cluster. User guide: https://hpc.nih.gov/docs/userguide.html

 #To process data, first reserve an interactive compute node
 sinteractive

Helix (helix.nih.gov) is the data-transfer server attached to the Biowulf cluster. !!Do not upload or download data directly through biowulf; use helix!!

 #Transfer with scp -r, or with rsync -av (re-running rsync skips files that were already transferred)
 #Upload local results to the cluster
 scp -r ./my_local_results  ${USERNAME}@helix.nih.gov:/data/mydatafolder/....
 #Download results from the cluster
 scp -r ${USERNAME}@helix.nih.gov:/data/mydatafolder/...  ${PATH_OFSTUFF}/my_local_stuff
 

Do not run analyses on the Biowulf head node; run them on an sinteractive node or as a swarm job.
Only a limited set of programs is available by default. Use module load to make more programs available, and module spider to search for them.

 module load afni      #e.g. load AFNI
 module spider afni    #e.g. search for available AFNI versions

SAM MEG Data Analysis

 module load afni 
 module load ctf
 module load samsrcv3/20180713-c5e1042

MNE-Python data analysis

To Access Additional MEG modules

 #Add the following line to your ${HOME}/.bashrc
 module use --append /data/MEGmodules/modulefiles

Load MNE modules

 #The module will not load on the biowulf head node, because loading it also loads freesurfer
 #Start an sinteractive (or spersist) session first; adjust memory and CPU count to your needs
 sinteractive --mem=6G --cpus-per-task=4
 module load mne/0.24.1   #OR: module load mne  (defaults to the most recent version)
 
 ipython
 import mne
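
Several MNE versions can be installed side by side, so it may help to confirm which version the loaded module actually provides. The sketch below is only an illustration, not part of the MNE module: it assumes Python 3.8 or newer (for importlib.metadata in the standard library), and the helper names installed_version and check_version are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_version(package: str, expected: str) -> bool:
    """True only if the installed version of `package` equals `expected` exactly."""
    return installed_version(package) == expected


if __name__ == "__main__":
    # After "module load mne/0.24.1" this should report 0.24.1
    print("mne version:", installed_version("mne"))
```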

Best Practices for Group Data Preprocessing

Process your project data

Make your python script commandline callable

 #It is typical to run one subject per command-line call and to parallelize over subjects in the swarm file
 -Use argparse to define the command line explicitly, with named (keyword) arguments and a help description
 -Use fire or click to build the command line with less code: fire generates it automatically from a function's arguments; click builds it from decorators
 -Use sys.argv for the simplest input (sys.argv[0] is the script name, sys.argv[1] is the first argument, sys.argv[2] the second, ...)
 Helpful Hints:
 -Pass the MEG dataset path as an input; inside the script, extract the subject ID from the dataset name:
   filename = os.path.basename(meg_dataset)   #requires import os
   subjid = filename.split('_')[0]            #OR, for a custom fixed-length ID: subjid = filename[0:N]  (N = number of characters)
 -Build output filenames using f-strings: 
   outfile_base = f'{subjid}_{taskname}.nii'
   outfilename = os.path.join(outdir, outfile_base)
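
A minimal sketch of such a script using argparse, combining the hints above. The option names (-dataset, -task, -outdir), the function names, and the SUBJID_task_....ds naming convention are assumptions for illustration; the actual MNE processing is left as a placeholder comment.

```python
#!/usr/bin/env python
import argparse
import os


def get_subject_id(meg_dataset: str) -> str:
    """Derive the subject ID from a dataset named like SUBJID_task_....ds (assumed convention)."""
    # normpath drops a trailing slash (common with tab-completed .ds folders);
    # without it, os.path.basename would return an empty string
    filename = os.path.basename(os.path.normpath(meg_dataset))
    return filename.split('_')[0]


def build_output_path(meg_dataset: str, taskname: str, outdir: str) -> str:
    """Build <outdir>/<subjid>_<taskname>.nii using an f-string."""
    subjid = get_subject_id(meg_dataset)
    outfile_base = f'{subjid}_{taskname}.nii'
    return os.path.join(outdir, outfile_base)


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description='Preprocess one MEG dataset (one subject per call).')
    parser.add_argument('-dataset', required=True, help='Path to the .ds dataset')
    parser.add_argument('-task', default='task', help='Task name used in output filenames')
    parser.add_argument('-outdir', default='.', help='Output directory')
    return parser.parse_args(argv)


def main(argv=None) -> str:
    args = parse_args(argv)
    outfilename = build_output_path(args.dataset, args.task, args.outdir)
    print(f'Processing {args.dataset} -> {outfilename}')
    # ... the MNE processing for this one subject would go here ...
    return outfilename
```

To call it from the command line, save it as my_process.py and end the file with the usual if __name__ == '__main__': main() guard. Because the swarm loop below calls my_process.py without python in front, the file also needs execute permission (chmod +x) and must be on your PATH; the shebang line covers the interpreter.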

Build, test, and submit your swarm file

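 #Write one command per dataset to the swarm file (>> appends, so remove any old swarm file first)
 rm -f swarm_file_preprocess.sh
 for i in ${GROUP_FOLDER}/*.ds; do echo "my_process.py -in1 input1 -in2 input2 -dataset $i" >> swarm_file_preprocess.sh ; done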
 
 #Make sure the process runs on at least one subject/dataset (on an sinteractive node)
 #This runs the last line of the swarm file
 tail -n 1 ./swarm_file_preprocess.sh | bash
 
 #Verify the single-subject results, and note how much RAM and CPU the run used, before submitting the full batch to swarm
 #-g = GB of memory per command, -t = threads (CPUs) per command
 #Optional: -b N bundles N commands to run one after another on one node; useful when each command is fast
 swarm -f ./swarm_file_preprocess.sh -g ${GigsOfRAM} -t ${CPUcores}
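
If you prefer to stay in Python, the swarm file can also be generated there. A minimal sketch, assuming the same hypothetical my_process.py options as the shell loop above; write_swarm_file is an illustrative name, not a swarm utility. Opening the file in write mode replaces an earlier swarm file instead of appending to it, and shlex.quote protects paths that contain spaces.

```python
import glob
import os
import shlex
from typing import Sequence


def write_swarm_file(group_folder: str, swarm_path: str,
                     extra_args: Sequence[str] = ()) -> int:
    """Write one 'my_process.py <extra_args> -dataset <ds>' line per .ds dataset.

    Returns the number of lines written. Mode 'w' overwrites an old swarm
    file, so re-running does not duplicate lines the way '>>' would.
    """
    # Sorted so the order (and therefore the last line used for testing) is predictable
    datasets = sorted(glob.glob(os.path.join(group_folder, '*.ds')))
    with open(swarm_path, 'w') as fh:
        for ds in datasets:
            cmd = ['my_process.py', *extra_args, '-dataset', ds]
            fh.write(' '.join(shlex.quote(part) for part in cmd) + '\n')
    return len(datasets)
```

For example, write_swarm_file(group_folder, 'swarm_file_preprocess.sh', ['-in1', 'input1', '-in2', 'input2']) produces the same kind of file as the shell loop, which is then tested and submitted with the same tail and swarm commands.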

ADVANCED: Making your own python module

Build the python conda environment

It is recommended to write the install command to a script so that it can be submitted as a Slurm job (here via swarm) instead of running on the head node.

 # Load conda (this should work if conda was set up following the NIH HPC instructions)
 source /data/${USER}/conda/etc/profile.d/conda.sh; conda activate base
 
 # echo mamba create -p ${PATH_TO_OUTPUT} condaPackage1 condaPackage2 conda-forge::condaForgePackage1  -y  > installFile.sh
 # Include -y, or the job will hang waiting for a user response
 # Also make sure conda is activated in the shell you submit the swarm from, or the job will fail
 echo mamba create -p /data/ML_MEG/python_modules/mne0.24.1 jupyter ipython conda-forge::mne -y  > python_install.sh
 swarm -f ./python_install.sh -g 4 -t 4

Make a module file

To display most of the contents of an existing module file, which can serve as a template, run

 module display python  #For the python module

Output:

 ----------------------------------------------------------------------------------
  /usr/local/lmod/modulefiles/python/3.8.lua:
 ----------------------------------------------------------------------------------
 family("python")
 prepend_path("PATH","/usr/local/Anaconda/envs/py3.8/bin")
 pushenv("OMP_NUM_THREADS","1")

Copy Template to your module folder

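 #${MyModule} (the directory) is the module name; ${Version}.lua (the file) is its version
 mkdir -p ${myModuleFilesDir}/${MyModule}
 cp /usr/local/lmod/modulefiles/python/3.8.lua  ${myModuleFilesDir}/${MyModule}/0.1.lua
 #Then edit the copy so that prepend_path("PATH", ...) points to the bin directory of your conda environment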
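
Instead of copying and hand-editing the template, the module file can be written from Python, with a check that the environment's bin directory really exists (module load itself gives little feedback when that path is wrong). A minimal sketch: the three lines mirror the python/3.8.lua output shown above, and write_module_file and its arguments are hypothetical names.

```python
import os

# Same three statements as the python/3.8.lua template displayed above
TEMPLATE = '''family("python")
prepend_path("PATH","{bin_dir}")
pushenv("OMP_NUM_THREADS","1")
'''


def write_module_file(module_dir: str, name: str, version: str, env_prefix: str) -> str:
    """Write <module_dir>/<name>/<version>.lua with PATH pointing at <env_prefix>/bin.

    Raises FileNotFoundError if <env_prefix>/bin does not exist, so a wrong
    conda prefix is caught here rather than silently at module load time.
    """
    bin_dir = os.path.join(env_prefix, 'bin')
    if not os.path.isdir(bin_dir):
        raise FileNotFoundError(f'No bin directory at {bin_dir}; check the conda prefix')
    target_dir = os.path.join(module_dir, name)
    os.makedirs(target_dir, exist_ok=True)
    lua_path = os.path.join(target_dir, f'{version}.lua')
    with open(lua_path, 'w') as fh:
        fh.write(TEMPLATE.format(bin_dir=bin_dir))
    return lua_path
```

For the environment built above, write_module_file(my_module_files_dir, 'mne', '0.24.1', '/data/ML_MEG/python_modules/mne0.24.1') would create mne/0.24.1.lua, where my_module_files_dir stands for your own module folder.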

Add module files to the search path

 #Add this line to ${HOME}/.bashrc to keep it across sessions
 module use --append ${myModuleFilesDir}

Final step: load your module

 # !! module load gives little feedback when a module does not load correctly; make sure the path in the .lua file is correct !!
 module load ${MyModuleName}
 #Example
 [$USERd@$NODE python_modules]$ module load mne
 
 [$USERd@$NODE python_modules]$ ipython
 Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11) 
 Type 'copyright', 'credits' or 'license' for more information
 IPython 8.1.1 -- An enhanced Interactive Python. Type '?' for help.
 
 In [1]: import mne
 
 In [2]: mne.__path__
 Out[2]: ['/data/ML_MEG/python_modules/mne0.24.1/lib/python3.9/site-packages/mne']
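
mne.__path__ shows where the package was imported from; a similar check on the interpreter itself confirms that ipython is running from your environment rather than from the system Python. A minimal sketch using only the standard library; running_in_env is a hypothetical helper.

```python
import os
import sys


def running_in_env(env_prefix: str) -> bool:
    """True if the current interpreter's prefix is env_prefix or lies inside it (symlinks resolved)."""
    current = os.path.realpath(sys.prefix)
    expected = os.path.realpath(env_prefix)
    return current == expected or current.startswith(expected + os.sep)


if __name__ == '__main__':
    print('Interpreter:', sys.executable)
    print('Prefix:     ', sys.prefix)
```

In the session above, running_in_env('/data/ML_MEG/python_modules/mne0.24.1') should return True if the module's PATH entry is correct.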