Richard Wong
4715999005
Doc: added descriptions and instructions for the data_preprocess folder
Data Import
What is this folder
This folder contains the scripts needed to import data from the remote database into local CSV files.
This folder contains the following files:
- select_db.py:
  - use this to pull the raw datasets data_mapping.csv and data_model_master_export.csv
- make_csv.py:
  - perform basic processing
  - produces the following files:
    - raw_data.csv: data_mapping.csv without some fields
    - data_mapping_mdm.csv: mdm subset of raw_data.csv
- make_figures sub-directory:
  - plot_class_token.ipynb: get number of thing-property combinations, and plot the histogram of thing-property counts along with the tag_description character counts
  - plot_count.ipynb: get counts of ship-data and platform-data
- exports sub-directory:
  - this folder stores the files that were produced from import
- outputs sub-directory:
  - this folder stores the exported figures from make_figures
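The two processing steps described for make_csv.py (dropping some fields, then taking the mdm subset) could look roughly like the sketch below. It is not the actual make_csv.py: the helper names drop_fields and select_subset, the columns internal_id and mdm, and the sample rows are all invented for illustration, and the real script may select rows and fields differently.

```python
import csv
import io

def drop_fields(rows, unwanted):
    """Return copies of each row without the columns named in `unwanted`."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

def select_subset(rows, flag_column, flag_value):
    """Keep only the rows whose `flag_column` equals `flag_value`."""
    return [row for row in rows if row.get(flag_column) == flag_value]

# Invented stand-in for data_mapping.csv; the column names are assumptions.
sample = io.StringIO(
    "thing,property,tag_description,internal_id,mdm\n"
    "Engine,Speed,main engine rpm,17,True\n"
    "Pump,Flow,cooling pump flow,18,False\n"
)
rows = list(csv.DictReader(sample))

# Analogue of raw_data.csv: the mapping without some fields.
raw_data = drop_fields(rows, {"internal_id"})

# Analogue of data_mapping_mdm.csv: the mdm subset of raw_data.
mdm_rows = select_subset(raw_data, "mdm", "True")
```

Keeping the two steps as separate functions mirrors the two output files: the subset is computed from the already-trimmed rows, as the description above states.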
Instructions
Check the following:
- Remember to activate your Python environment
- Ensure that db_connection_info.txt is linked to this directory
  - e.g. ln -s /some/directory/db_connection_info.txt .
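As an optional pre-flight check, a short helper can confirm that the link resolves before any script is run. This is a minimal sketch using only the standard library; the function name check_connection_file is hypothetical and nothing in this folder is known to provide it.

```python
from pathlib import Path

def check_connection_file(folder=".", name="db_connection_info.txt"):
    """Return the resolved path of the connection file, or raise if unusable."""
    path = Path(folder) / name
    # Path.exists() follows symlinks, so a dangling link reports False here.
    if not path.exists():
        state = "a dangling symlink" if path.is_symlink() else "missing"
        raise FileNotFoundError(f"{path} is {state}; create the link with ln -s first")
    # resolve() follows the link and returns the real file location.
    return path.resolve()

# Example: check_connection_file() inspects the current directory.
```

Running the check from this folder reports a broken or absent link up front, rather than leaving it to be discovered partway through the import.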
To import data, execute the following in order:
- cd into this folder.
- python select_db.py
- python make_csv.py
Exported files will be found in the exports sub-directory, which keeps this folder clean.
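The two commands can also be driven from a single Python entry point. The sketch below is one possible wrapper, not part of this folder: the function name run_import is hypothetical, and it assumes both scripts run without arguments and write into exports relative to this folder.

```python
import subprocess
import sys
from pathlib import Path

def run_import(folder=".", scripts=("select_db.py", "make_csv.py")):
    """Run each script in order from `folder`, stopping at the first failure."""
    folder = Path(folder)
    for script in scripts:
        # check=True raises CalledProcessError if the script exits non-zero,
        # so make_csv.py never runs on top of a failed database pull.
        subprocess.run([sys.executable, script], cwd=folder, check=True)
    exports = folder / "exports"
    # Report what is now in exports/ (an empty list if the folder is absent).
    return sorted(p.name for p in exports.iterdir()) if exports.is_dir() else []

# Example: print(run_import()) from inside this folder.
```

Using sys.executable keeps the child processes on the same interpreter as the activated environment, which matches the first item in the checklist above.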