	Data Import
What is this folder
This folder contains the files needed to import data from the remote database into local CSV files.
This folder contains the following files:
- select_db.py:
  - use this to pull the raw datasets data_mapping.csv and data_model_master_export.csv
- make_csv.py:
  - perform basic processing
  - produces the following files:
    - raw_data.csv: data_mapping.csv without some fields
    - data_mapping_mdm.csv: mdm subset of raw_data.csv
- make_figures sub-directory:
  - plot_class_token.ipynb: get number of thing-property combinations, and plot the histogram of thing-property counts along with the tag_description character counts
  - plot_count.ipynb: get counts of ship-data and platform-data
- exports sub-directory:
  - this folder stores the files that were produced from import
- outputs sub-directory:
  - this folder stores the exported figures from make_figures
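As a rough illustration of the two derived files described above, the sketch below drops columns from a mapping table and then selects a subset of rows. It is not the actual make_csv.py: the column names internal_id and mdm, the sample rows, and the helper names are all hypothetical, since this README does not say which fields are removed or how the mdm subset is defined.

```python
import csv
import io


def drop_fields(rows, fields_to_drop):
    """Return copies of the rows without the named columns."""
    return [
        {key: value for key, value in row.items() if key not in fields_to_drop}
        for row in rows
    ]


def select_subset(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]


def to_csv_text(rows):
    """Serialise a non-empty list of dict rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical stand-in for data_mapping.csv (real columns are not listed here).
SAMPLE = (
    "thing,property,tag_description,internal_id,mdm\n"
    "Engine,Speed,engine rpm,1,True\n"
    "Pump,Pressure,pump outlet pressure,2,False\n"
)

mapping_rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# "raw_data.csv": the mapping without some fields (assumed: internal_id).
raw_rows = drop_fields(mapping_rows, {"internal_id"})

# "data_mapping_mdm.csv": a subset of raw_data (assumed: rows flagged mdm == "True").
mdm_rows = select_subset(raw_rows, "mdm", "True")

print(to_csv_text(mdm_rows))
```

The real script presumably reads from and writes to files under exports; the in-memory version here only shows the shape of the two steps.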
Instructions
Check the following:
- Remember to activate your python environment
- Ensure that db_connection_info.txt is linked into this directory
  - e.g. ln -s /some/directory/db_connection_info.txt .
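The link check above can also be done from Python before running the import. The following is a minimal sketch, assuming only that the file must be reachable as db_connection_info.txt in this folder; the function name check_connection_file is hypothetical and is not part of the repository.

```python
from pathlib import Path


def check_connection_file(folder, name="db_connection_info.txt"):
    """Return (ok, message) describing whether `name` is usable in `folder`."""
    path = Path(folder) / name
    # A symlink whose target is missing reports is_symlink() but not exists().
    if path.is_symlink() and not path.exists():
        return False, f"{name} is a symlink, but its target does not exist"
    if not path.is_file():
        return False, f"{name} not found in {folder}; create the link with ln -s"
    return True, f"{name} found, resolves to {path.resolve()}"


if __name__ == "__main__":
    ok, message = check_connection_file(Path.cwd())
    print(message)
```

Distinguishing a broken symlink from a missing file makes the most likely mistake (linking to a path that has since moved) easier to spot.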
To import data, execute the following:
- cd into this folder.
- python select_db.py
- python make_csv.py
Exported files will be found in the exports sub-directory. This helps to keep the folder clean.
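If you prefer a single entry point, the two commands above could be wrapped in a small runner. This is only a sketch: the helper names build_commands and run_all are hypothetical, and it assumes both scripts are run from this folder with the currently active Python interpreter.

```python
import subprocess
import sys

# The two scripts named in the instructions, in the order they must run.
STEPS = ("select_db.py", "make_csv.py")


def build_commands(steps=STEPS, python=sys.executable):
    """Build one command line per script, using the active interpreter."""
    return [[python, step] for step in steps]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)
    return len(commands)
```

Calling run_all(build_commands()) from inside this folder would then run select_db.py followed by make_csv.py, raising subprocess.CalledProcessError if either script exits with a non-zero status, so make_csv.py never runs on a failed pull.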