Project structure

At iHuman Lab, we follow the project structure shown below. It is a flexible guide rather than a strict mandate, intended to give every research project a clear and consistent layout. Documentation lives in docs/, configuration files in config/, and datasets in data/. The notebooks/ directory holds Jupyter notebooks for exploratory analysis, while src/ contains modular source code for data processing, dataset creation, feature engineering, model development, visualization, and utility functions. Unit tests are stored in tests/ to help keep our implementations reliable. Adapt and refine the structure as needed to fit the specific needs of each project.

research_project_name/
├── README.md                       # Project overview and instructions
├── docs/                           # Documentation
├── config/                         # Configuration files
│   ├── config.yaml                 # Configuration file for parameters
│   └── logging.yaml                # Configuration file for logging
├── data/                           # Directory for datasets
│   ├── raw/                        # Raw data files (immutable)
│   ├── processed/                  # Processed data files (generated)
│   └── interim/                    # Intermediate data files (temporary)
├── notebooks/                      # Jupyter notebooks for exploration
│   └── exploratory_analysis.ipynb
├── src/                            # Source code for the project
│   ├── __init__.py
│   ├── main.py                     # Main script to run the project
│   ├── data/                       # Module for data processing
│   │   ├── __init__.py
│   │   ├── preprocess.py           # Preprocessing functions
│   │   └── load_data.py            # Data loading functions
│   ├── dataset/                    # Module for creating datasets
│   │   ├── __init__.py
│   │   └── create_dataset.py       # Functions to create datasets
│   ├── features/                   # Module for creating features
│   │   ├── __init__.py
│   │   └── feature_engineering.py  # Feature engineering functions
│   ├── models/                     # Module for defining models
│   │   ├── __init__.py
│   │   └── model.py                # Model architecture and training functions
│   ├── visualization/              # Visualization module
│   │   ├── __init__.py
│   │   └── visualize.py            # Visualization functions
│   └── utils.py                    # Utility functions used across modules
├── tests/                          # Unit tests
└── environment.yml                 # Conda env file specifying dependencies
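
For an existing project, a quick way to see how closely it follows this layout is to check for the top-level entries listed in the tree. The sketch below is only an illustration: the names EXPECTED_ENTRIES and missing_entries are hypothetical and are not part of the template, and the check covers only the top level.

```python
from pathlib import Path

# Top-level entries copied from the tree above; a trailing "/" marks a directory.
# EXPECTED_ENTRIES and missing_entries are illustrative names, not part of the template.
EXPECTED_ENTRIES = [
    "README.md",
    "docs/",
    "config/",
    "data/",
    "notebooks/",
    "src/",
    "tests/",
    "environment.yml",
]


def missing_entries(project_root):
    """Return the expected top-level entries that are absent from project_root."""
    root = Path(project_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories must exist as directories, files as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

Calling missing_entries(".") from a project root returns an empty list when every entry is present. Because the structure is a guide rather than a mandate, a non-empty result is a prompt to review the layout, not necessarily an error.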

To make this structure easy to create and adopt across the lab, we use iHuman Lab’s Cookiecutter template.

To use the template, first install Cookiecutter with pip:

$ pip install cookiecutter

or with conda:

$ conda config --add channels conda-forge
$ conda install cookiecutter

To start a new project, run:

$ cookiecutter https://github.com/iHuman-Lab/cookiecutter-data-science.git
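
The same template can also be generated from Python, which may be convenient inside a setup script. The sketch below assumes Cookiecutter is installed and uses its documented cookiecutter.main.cookiecutter function; the wrapper name create_project and the TEMPLATE_URL constant are our own illustrative choices.

```python
# Template URL taken from the command above.
TEMPLATE_URL = "https://github.com/iHuman-Lab/cookiecutter-data-science.git"


def create_project(output_dir="."):
    """Generate a new project from the lab template inside output_dir.

    create_project is a hypothetical helper, not part of the template.
    """
    # Imported here so this module still loads when Cookiecutter is not installed.
    from cookiecutter.main import cookiecutter

    # Prompts for the template's variables interactively, as the CLI does,
    # and returns the path of the generated project directory.
    return cookiecutter(TEMPLATE_URL, output_dir=output_dir)
```

Running create_project() requires network access to fetch the template and asks the same questions as the command-line call.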