hipom_data_mapping/data_preprocess
Richard Wong c5760d127d Feat: added post_processing based on rules
others:
- added basic data analysis to get histograms of text differences
- added new final delivery model
2024-12-18 13:43:56 +09:00
abbreviations Feat: added more classification and mapping variations 2024-11-25 18:15:28 +09:00
check_data Chore: removed unnecessary output files 2024-11-25 18:19:52 +09:00
exports Chore: changed ipynb to py files in the data_preprocess folder 2024-10-29 22:55:22 +09:00
no_preprocess Chore: changed ipynb to py files in the data_preprocess folder 2024-10-29 22:55:22 +09:00
rule_base_replacement Chore: changed ipynb to py files in the data_preprocess folder 2024-10-29 22:55:22 +09:00
.gitignore Chore: changed ipynb to py files in the data_preprocess folder 2024-10-29 22:55:22 +09:00
README.md Chore: changed ipynb to py files in the data_preprocess folder 2024-10-29 22:55:22 +09:00
split_data.py Feat: added post_processing based on rules 2024-12-18 13:43:56 +09:00

README.md

Data Preprocess

What is this folder

This folder contains the code for data pre-processing.

Each pre-processing method lives in its own folder. Keeping the methods modular makes it easier to test different methods and reduces coupling between stages.

Instructions

First, apply the pre-processing by running the code in the desired folder.

Using the no_preprocess directory as an example:

  • cd no_preprocess
  • Follow the instructions found in the sub-directory
  • After the code runs, the processed file is written to exports/preprocessed_data.csv

Then run the data split code to create the k-fold splits.

  • cd back to the data_preprocess directory
  • python split_data.py

You will now have the datasets in exports/dataset/group_{1,2,3,4,5}
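
For readers who want a feel for what the split step produces, the following is a minimal sketch of a 5-fold split over a CSV file. It is not the contents of split_data.py, which may split on different criteria. The function names (k_fold_indices, write_splits), the train.csv / test.csv file names inside each group directory, and the fixed seed are all assumptions made for illustration. Only exports/preprocessed_data.csv and the exports/dataset/group_N layout come from the instructions above.

```python
import csv
import random
from pathlib import Path


def k_fold_indices(n_rows, k=5, seed=42):
    """Shuffle row indices and deal them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Striding by k puts every index in exactly one fold.
    return [indices[i::k] for i in range(k)]


def write_splits(input_csv, output_dir, k=5, seed=42):
    """Write group_1..group_k, each with a train.csv and a test.csv.

    The file names train.csv and test.csv are hypothetical.
    """
    with open(input_csv, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    folds = k_fold_indices(len(rows), k=k, seed=seed)
    for fold_number, test_idx in enumerate(folds, start=1):
        held_out = set(test_idx)
        group_dir = Path(output_dir) / f"group_{fold_number}"
        group_dir.mkdir(parents=True, exist_ok=True)
        parts = {
            "train.csv": [r for i, r in enumerate(rows) if i not in held_out],
            "test.csv": [r for i, r in enumerate(rows) if i in held_out],
        }
        for name, part in parts.items():
            with open(group_dir / name, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(part)
```

Under these assumptions, calling write_splits("exports/preprocessed_data.csv", "exports/dataset") from the data_preprocess directory would create group_1 through group_5, with each row appearing in the test file of exactly one group.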