Data Preprocess
What is this folder
This folder contains the files for pre-processing.
Each pre-processing method lives in its own folder, which keeps the methods modular. This makes it easier to test different methods and reduces coupling between stages.
Instructions
First, we apply the pre-processing by running code from the desired folder.
Using no_preprocess directory as an example:
- cd no_preprocess
- Follow the instructions found in the sub-directory
- After code execution, the processed file will be placed into exports/preprocessed_data.csv
We then run the data split code to create our k-fold splits.
- cd back to the data_preprocess directory
- python split_data.py
You will now have the datasets in exports/dataset/group_{1,2,3,4,5}
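For readers who want a feel for what a 5-fold split does, here is a minimal sketch. It is not the contents of split_data.py: the function names, the fixed seed, and the train.csv/test.csv files inside each group folder are all assumptions made for illustration.

```python
import csv
import random
from pathlib import Path


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and deal them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def write_groups(rows, header, out_dir, k=5, seed=0):
    """Write group_1..group_k under out_dir.

    Assumed layout (hypothetical): each group holds a test.csv with one
    held-out fold and a train.csv with the remaining rows.
    """
    folds = k_fold_indices(len(rows), k, seed)
    for group_number, test_idx in enumerate(folds, start=1):
        held_out = set(test_idx)
        group_dir = Path(out_dir) / f"group_{group_number}"
        group_dir.mkdir(parents=True, exist_ok=True)
        splits = {
            "test.csv": [rows[i] for i in sorted(held_out)],
            "train.csv": [r for i, r in enumerate(rows) if i not in held_out],
        }
        for name, subset in splits.items():
            with open(group_dir / name, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(subset)
```

To try it, read exports/preprocessed_data.csv with csv.reader, treat the first row as the header, and pass the remaining rows to write_groups with exports/dataset as out_dir.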