hipom_data_mapping/data_preprocess
Latest commit: 1f3970459f by Richard Wong, 2024-11-20 15:07:47 +09:00
  Chore: re-organized train folders to have standardized naming schemes
  Feat: introduced BERT-based binary classification
Name                   Last commit                                                             Date
abbreviations          Chore: re-organized train folders to have standardized naming schemes   2024-11-20 15:07:47 +09:00
check_data             Feat: added abbreviation expansion rules                                2024-11-10 20:28:47 +09:00
exports                Chore: changed ipynb to py files in the data_preprocess folder          2024-10-29 22:55:22 +09:00
no_preprocess          Chore: changed ipynb to py files in the data_preprocess folder          2024-10-29 22:55:22 +09:00
rule_base_replacement  Chore: changed ipynb to py files in the data_preprocess folder          2024-10-29 22:55:22 +09:00
.gitignore             Chore: changed ipynb to py files in the data_preprocess folder          2024-10-29 22:55:22 +09:00
README.md              Chore: changed ipynb to py files in the data_preprocess folder          2024-10-29 22:55:22 +09:00
split_data.py          Feat: added train and test directories                                  2024-10-31 15:58:20 +09:00

README.md

Data Preprocess

What is this folder

This folder contains the data pre-processing code.

Each pre-processing method lives in its own folder. This modular layout makes it easier to test different methods and reduces coupling between stages.

Instructions

First, apply a pre-processing method by running the code in the corresponding folder.

Using the no_preprocess directory as an example:

  • cd no_preprocess
  • Follow the instructions found in the sub-directory
  • After the code finishes, the processed file is written to exports/preprocessed_data.csv
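
As a rough illustration of the contract each method folder follows (read the data, transform it, write exports/preprocessed_data.csv), here is a minimal pass-through sketch. The function names run_preprocess and identity, the CSV input format, and the row-by-row design are assumptions made for illustration; the actual scripts in no_preprocess may be organized differently.

```python
import csv
from pathlib import Path


def identity(row):
    """Pass-through transform: return the row unchanged (hypothetical helper)."""
    return row


def run_preprocess(input_csv, output_csv, transform=identity):
    """Apply transform to every row of input_csv and write the result to output_csv.

    The transform must keep the same column names as the input.
    Returns the number of rows written.
    """
    with open(input_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = [transform(dict(row)) for row in reader]

    # Create the output directory (for example exports/) if it does not exist yet.
    Path(output_csv).parent.mkdir(parents=True, exist_ok=True)
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A different method would only swap the transform argument, which is one way to keep the stages loosely coupled.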

Then run the data split script to create the k-fold splits.

  • cd back to the data_preprocess directory
  • python split_data.py

The datasets are now available in exports/dataset/group_{1,2,3,4,5}.
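
To make the splitting step concrete, the following is a minimal sketch of a 5-fold split that writes one group_N directory per fold. It is not the code in split_data.py: the function names k_fold_split and write_splits, the train.csv and test.csv file names, the fixed seed, and the plain row-level shuffle are all assumptions for illustration. The real script may, for example, split by a grouping key rather than by row.

```python
import csv
import random
from pathlib import Path


def k_fold_split(rows, k=5, seed=42):
    """Return k (train, test) pairs; each row lands in exactly one test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [rows[i] for i in indices if i not in held_out]
        test = [rows[i] for i in test_idx]
        splits.append((train, test))
    return splits


def write_splits(input_csv, out_dir, k=5, seed=42):
    """Write group_1..group_k under out_dir, each with train.csv and test.csv."""
    with open(input_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for n, (train, test) in enumerate(k_fold_split(rows, k, seed), start=1):
        group_dir = Path(out_dir) / f"group_{n}"
        group_dir.mkdir(parents=True, exist_ok=True)
        for name, subset in (("train.csv", train), ("test.csv", test)):
            with open(group_dir / name, "w", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(subset)
```

Under these assumptions, calling write_splits("exports/preprocessed_data.csv", "exports/dataset") from the data_preprocess directory would produce the five group directories.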