# Data Preprocess

## What is this folder

This folder contains the files used for pre-processing.

We put each pre-processing method in its own folder to keep the methods modular. This makes it easier to test different methods and reduces coupling between stages.

## Instructions

First, we apply the pre-processing by running the code in the desired method's folder.

Using the `no_preprocess` directory as an example:

- `cd no_preprocess`
- Follow the instructions in that sub-directory.
- After the code runs, the processed file is written to `exports/preprocessed_data.csv`.
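
The per-folder commands are not listed here, but every method folder shares the same output contract: it produces `exports/preprocessed_data.csv`. Below is a minimal sketch of a pass-through step in the spirit of `no_preprocess`. It assumes the raw input is a CSV with a header row. The function name `passthrough_preprocess` and any input path you pass to it are hypothetical, not taken from this repository.

```python
import csv
from pathlib import Path


def passthrough_preprocess(src: Path, dst: Path) -> int:
    """Hypothetical pass-through step: copy every row of `src` to `dst` unchanged.

    Returns the number of data rows written (header excluded).
    """
    dst.parent.mkdir(parents=True, exist_ok=True)  # make sure exports/ exists
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: the output file is left empty
        writer.writerow(header)
        count = 0
        for row in reader:
            writer.writerow(row)  # no transformation applied
            count += 1
    return count
```

For example, `passthrough_preprocess(Path("raw_data.csv"), Path("exports/preprocessed_data.csv"))`, where `raw_data.csv` is a placeholder name. A real method folder would transform the rows between reading and writing.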

We then run the data split code to create the k-fold splits:

- `cd` back to the `data_preprocess` directory
- `python split_data.py`

You will now have the datasets in `exports/dataset/group_{1,2,3,4,5}`.
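
How `split_data.py` assigns rows to groups is not described here. For illustration only, here is a minimal sketch of a standard shuffled 5-fold split that produces the same `group_1` ... `group_5` layout. It assumes each group holds one held-out fold plus the remaining rows for training. The function names and the `train.csv`/`test.csv` file names are hypothetical.

```python
import csv
import random
from pathlib import Path


def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Return k (train_idx, test_idx) pairs; every row index is in exactly one test fold."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed -> reproducible folds
    folds = [idx[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


def write_groups(csv_path: Path, out_dir: Path, k: int = 5, seed: int = 0) -> None:
    """Write out_dir/group_{1..k}/{train,test}.csv (hypothetical file names)."""
    with csv_path.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    for g, (train, test) in enumerate(kfold_indices(len(rows), k, seed), start=1):
        group_dir = out_dir / f"group_{g}"
        group_dir.mkdir(parents=True, exist_ok=True)
        for name, ids in (("train", train), ("test", test)):
            with (group_dir / f"{name}.csv").open("w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(rows[i] for i in ids)
```

Called as `write_groups(Path("exports/preprocessed_data.csv"), Path("exports/dataset"))`, this sketch would fill `exports/dataset/group_1` through `group_5`. Fixing the seed keeps the folds identical across runs, so results from different pre-processing methods are compared on the same splits.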