Data Preprocess
What is this folder
This folder contains the files for pre-processing.
Each pre-processing method lives in its own folder. Keeping the methods modular makes it easier to test different methods and reduces coupling between stages.
Instructions
First, apply the pre-processing by running the code in the desired folder.
Using the no_preprocess directory as an example:
- cd no_preprocess
- Follow the instructions found in the sub-directory
- After the code runs, the processed file will be placed in exports/preprocessed_data.csv
We then run the data split code to create our k-fold splits.
- cd back to the data_preprocess directory
- python split_data.py
You will now have the datasets in exports/dataset/group_{1,2,3,4,5}
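
For readers who want a feel for what the split step produces, here is a minimal sketch of a 5-fold split. It is not the contents of split_data.py, which this README does not show. It assumes that exports/preprocessed_data.csv has a header row, that rows are shuffled with a fixed seed, and that each group folder holds a train.csv and a test.csv. Those file names and the helper names k_fold_indices and write_splits are hypothetical.

```python
import csv
import random
from pathlib import Path


def k_fold_indices(n_rows, k=5, seed=0):
    """Return k (train, test) index pairs; every row lands in exactly one test fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    pairs = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        pairs.append((train, test))
    return pairs


def write_splits(src="exports/preprocessed_data.csv", out_root="exports/dataset", k=5):
    """Write group_1..group_k under out_root.

    Assumption: the CSV has a header row. The train.csv/test.csv file names
    are hypothetical and may differ from what split_data.py writes.
    """
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    for group, (train, test) in enumerate(k_fold_indices(len(rows), k), start=1):
        group_dir = Path(out_root) / f"group_{group}"
        group_dir.mkdir(parents=True, exist_ok=True)
        for name, idx in (("train.csv", train), ("test.csv", test)):
            with open(group_dir / name, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(rows[i] for i in idx)
```

Calling write_splits() from the data_preprocess directory would create exports/dataset/group_1 through group_5 under these assumptions. Each row appears in the test set of exactly one group and in the training set of the other four.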