MEG analysis on Biowulf
Latest revision as of 11:03, 26 January 2024
Biowulf brief intro
Biowulf (biowulf.nih.gov) is the head node of the Biowulf cluster at NIH - https://hpc.nih.gov/docs/userguide.html
    # For data processing, reserve an sinteractive session
    sinteractive --mem=6G --cpus-per-task=4
    # Depending on your usage, you may need more --mem and more CPU cores. For a jupyter notebook, you will also need the --tunnel option
Helix is the storage server attached to the Biowulf cluster. !!Do not upload/download data directly to Biowulf - use Helix!!
    # Use scp for transfer, or rsync -av (slower, but it skips files that are already up to date, so an interrupted transfer can be re-run)
    scp -r ./my_local_results ${USERNAME}@helix.nih.gov:/data/mydatafolder/....
    # Getting results from the Biowulf cluster
    scp -r ${USERNAME}@helix.nih.gov:/data/mydatafolder/... ${PATH_OFSTUFF}/my_local_stuff
Analysis of data should not be performed on the biowulf head node, but run through an sinteractive node or swarm process.
To start with, there are a limited number of commands loaded on the system. To access more programs use module load. To search, use module spider.
e.g. module load afni
Configuring your bash shell environment
Before editing your .bashrc, open two terminals on Biowulf. If you misconfigure your .bashrc, you will not be able to log into Biowulf. Keeping a second terminal open lets you fix anything that errors out.
Edit .bashrc file in your home drive
    umask 002  # Gives automatic group permissions to every file you create; very helpful for working with your team
    # Add the modules bin path to access the MEG modules
    PATH=/data/MEGmodules/bin:$PATH
    # Set up some aliases, so you don't have to type these out
    alias sinteractive_small='sinteractive --mem=8G --cpus-per-task=4 --gres=lscratch:30'
    alias sinteractive_medium='sinteractive --mem=16G --cpus-per-task=12 --gres=lscratch:100'
    alias sinteractive_large='sinteractive --mem=24G --cpus-per-task=32 --gres=lscratch:150'
Edit .bash_profile in your home drive
    # Set your default group (normally your default is your userID, which isn't helpful for your group)
    # Type `groups` to see which groups you are part of
    newgrp <<YOUR GROUP ID>>
SAM MEG Data Analysis
    module load afni
    module load ctf
    module load samsrcv3/20180713-c5e1042
MNE python data analysis
To Access Additional MEG modules
    # Add the following line to your ${HOME}/.bashrc
    module use --append /data/MEGmodules/modulefiles
Load MNE modules
    # The module will not load on the Biowulf head node because it loads freesurfer
    # Create an sinteractive or spersist node; adjust memory and CPU core number accordingly
    sinteractive --mem=6G --cpus-per-task=4
    module load mne/0.24.1  # OR: module load mne (defaults to the most current version)
    ipython
    import mne
Load Commandline MNE scripts for Processing Data
!Currently in Beta Testing!
    [stoutjd@cn1023 modules]$ module list
    No modules loaded
    [stoutjd@cn1023 modules]$ module load mne_scripts
    [+] Loading freesurfer 7.1.1 on cn1023
    [+] Loading mne 0.24.1 ...
    [+] Loading mne_scripts 0.1_dev ...
    Available:
    spatiotemporal_clustering_stats.py
    [stoutjd@cn1023 modules]$ spatiotemporal_clustering_stats.py -h
    usage: spatiotemporal_clustering_stats.py [-h] [-topdir TOPDIR] [-search SEARCH]
                                              [-outfname OUTFNAME]
                                              [-subjects_dir SUBJECTS_DIR]
    options:
      -h, --help            show this help message and exit
      -topdir TOPDIR        The directory w/ stc files
      -search SEARCH        The search term to find the specific stc files for clustering
      -outfname OUTFNAME    Output filename for the cluster nifti file
      -subjects_dir SUBJECTS_DIR
                            Freesurfer subjects dir. Will download freesurfer average if not
                            already present. Defaults to os.environ['SUBJECTS_DIR'] if not
                            provided
MNE bids creation and MNE bids pipeline processing on biowulf
Start interactive session with scratch to render visualization offscreen
sinteractive --mem=6G --cpus-per-task=4 --gres=lscratch:50
Create BIDS data from data off the scanner
    module load mne_scripts
    make_meg_bids.py -meg_input_dir MEGFOLDER -mri_brik AFNI_COREGED+orig.BRIK
Process data using MNE Bids Pipeline
    module purge
    module load mne_bids_pipeline
    mne-bids-pipeline-run.py --config=CONFIG.py  # Optional: --steps=preprocessing,source --subject=SUBJECTID (without the sub- prefix)
Best Practices for Group Data Preprocessing
Process your project data
Make your python script commandline callable
#It is typical to run 1 subject per commandline call and to parallelize over subjects in the swarm file
-Use argparse to manually build a full commandline call with keyword inputs and a function description
-Use fire or click to automatically create a commandline call based on function inputs
-Use sys.argv[] to create a simple commandline input (sys.argv[0] is the filename, sys.argv[1] is the first argument, sys.argv[2] is the second, ...)
Helpful Hints:
-Use the MEG dataset as an input; inside the python module, use the MEG dataset path to extract the subject ID:
    import os
    filename = os.path.basename(meg_dataset)
    subjid = filename.split('_')[0]
    # OR, if you have a custom ID
    subjid = filename[0:n_chars]  # n_chars = number of characters in your ID
-Build output filenames using f-strings:
    outfile_base = f'{subjid}_{taskname}.nii'
    outfilename = os.path.join(outdir, outfile_base)
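The hints above can be combined into one small commandline-callable script. This is a minimal sketch using argparse, not the actual processing code: the option names (-dataset, -taskname, -outdir) and the helper functions are hypothetical, and the subject ID is assumed to be the text before the first underscore in the dataset name.

```python
import argparse
import os
import sys


def get_subjid(meg_dataset):
    """Return the subject ID, assumed to be the text before the first underscore."""
    # Strip a trailing path separator so a folder path like 'X.ds/' still has a basename
    filename = os.path.basename(meg_dataset.rstrip(os.sep))
    return filename.split('_')[0]


def build_outfilename(meg_dataset, taskname, outdir):
    """Build the output path as <outdir>/<subjid>_<taskname>.nii."""
    subjid = get_subjid(meg_dataset)
    return os.path.join(outdir, f'{subjid}_{taskname}.nii')


def build_parser():
    # Hypothetical options; rename them to match your own script
    parser = argparse.ArgumentParser(description='Process one MEG dataset (example).')
    parser.add_argument('-dataset', required=True, help='Path to a single MEG dataset')
    parser.add_argument('-taskname', default='task', help='Task label for the output filename')
    parser.add_argument('-outdir', default='.', help='Directory for the output file')
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    outfilename = build_outfilename(args.dataset, args.taskname, args.outdir)
    # Replace this print with the real processing for one subject
    print(f'Would process {args.dataset} and write {outfilename}')
    return outfilename


if __name__ == '__main__':
    if len(sys.argv) > 1:
        main()
    else:
        # Called without arguments: show the help text instead of an error
        build_parser().print_help()
```

Each line of the swarm file is then one call to this script with a different -dataset value. Keeping the filename logic in separate functions makes it easy to test on a single dataset before submitting the batch.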
Build, test, and submit your swarm file
    for i in ${GROUP_FOLDER}/*.ds; do echo my_process.py -in1 input1 -in2 input2 -dataset $i >> swarm_file_preprocess.sh ; done
    # Make sure the process runs on at least one subject/dataset
    # This will run the last line of the swarm file
    $(tail -1 ./swarm_file_preprocess.sh)
    # Verify the results on the single subject, and possibly check how much RAM / CPU was used, before submitting the full batch to swarm
    swarm -f ./swarm_file_preprocess.sh -g ${GigsOfRAM} -t ${CPUcores}  # -b ${How many subjects to run in a row on 1 computer} can be useful if you have a fast process
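As an alternative to the bash loop, the swarm file can also be written from Python. This is a minimal sketch, assuming the datasets are *.ds folders directly under the group folder. my_process.py is the placeholder script name from the loop above, and the function names are hypothetical.

```python
import glob
import os
import shlex


def build_swarm_lines(datasets, script='my_process.py'):
    """Return one command line per dataset, with the path quoted for the shell."""
    return [f'{script} -dataset {shlex.quote(ds)}' for ds in sorted(datasets)]


def write_swarm_file(group_folder, swarm_fname='swarm_file_preprocess.sh'):
    """Write one line per *.ds dataset in group_folder; return the number of lines."""
    datasets = glob.glob(os.path.join(group_folder, '*.ds'))
    lines = build_swarm_lines(datasets)
    # Mode 'w' overwrites the file, so re-running does not append duplicate lines
    with open(swarm_fname, 'w') as f:
        f.writelines(line + '\n' for line in lines)
    return len(lines)
```

Calling write_swarm_file(os.environ['GROUP_FOLDER']) produces the same kind of file as the loop. The returned count is a quick check that the expected number of subjects was found before testing one line and submitting to swarm.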
ADVANCED: Making your own python module
Build the python conda environment
It is recommended to create an install script so that this can be sent to a slurm job
    # Load conda - if set up according to the HPC page, this should work
    source /data/${USER}/conda/etc/profile.d/conda.sh; conda activate base
    # echo mamba create -p ${PATH_TO_OUTPUT} condaPackage1 condaPackage2 conda-forge::condaForgePackage1 -y > installFile.sh
    # Make sure to include the -y or the job will hang waiting for user response
    # Also make sure you have an active conda prompt when submitting the swarm, or else it will fail
    echo mamba create -p /data/ML_MEG/python_modules/mne0.24.1 jupyter ipython conda-forge::mne -y > python_install.sh
    swarm -f ./python_install.sh -g 4 -t 4
Make a module file
To display most of the contents of a module file run
module display python #For the python module
Output:
    ----------------------------------------------------------------------------------
    /usr/local/lmod/modulefiles/python/3.8.lua:
    ----------------------------------------------------------------------------------
    family("python")
    prepend_path("PATH","/usr/local/Anaconda/envs/py3.8/bin")
    pushenv("OMP_NUM_THREADS","1")
Copy Template to your module folder
    # MyModule is the family name of the code / ${Version}.lua
    cp /usr/local/lmod/modulefiles/python/3.8.lua ${myModuleFilesDir}/${MyModule}/0.1.lua
Add module files to the search path
module use --append ${PathToUserModuleFiles}
Final Step: Load Your Module
    # !! module load does not give clear feedback when a module fails to load - make sure the path in the lua file is correct !!
    module load ${MyModuleName}
    #Example
    [$USERd@$NODE python_modules]$ module load mne
    [$USERd@$NODE python_modules]$ ipython
    Python 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 8.1.1 -- An enhanced Interactive Python. Type '?' for help.
    In [1]: import mne
    In [2]: mne.__path__
    Out[2]: ['/data/ML_MEG/python_modules/mne0.24.1/lib/python3.9/site-packages/mne']
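Because module load gives little feedback when the path is wrong, it can help to check that every prepend_path directory in the lua file actually exists. This is a minimal sketch, assuming the file uses the prepend_path("VAR","/path") form shown in the module display output above. missing_module_paths is a hypothetical helper, not part of Lmod.

```python
import os
import re

# Matches lines such as: prepend_path("PATH","/usr/local/Anaconda/envs/py3.8/bin")
PREPEND_RE = re.compile(r'prepend_path\(\s*"([^"]+)"\s*,\s*"([^"]+)"\s*\)')


def missing_module_paths(lua_fname):
    """Return the prepend_path directories in a .lua module file that do not exist."""
    with open(lua_fname) as f:
        text = f.read()
    # findall yields (variable, path) pairs; keep only paths that are not directories
    return [path for _var, path in PREPEND_RE.findall(text) if not os.path.isdir(path)]
```

An empty list means every listed directory was found. Any returned path is a likely reason that the module loads without putting your environment on the PATH.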