hipom_data_mapping/data_import
Richard Wong 4715999005 Chore: changed ipynb to py files in the data_preprocess folder
Doc: added descriptions and instructions for the data_preprocess folder
2024-10-29 22:55:22 +09:00

README.md

Data Import

What is this folder

This folder contains the scripts needed to import data from the remote database into local CSV files.

This folder contains the following files:

  • select_db.py:
    • use this to pull the raw datasets data_mapping.csv and data_model_master_export.csv
  • make_csv.py:
    • perform basic processing
    • produces the following files:
      • raw_data.csv: data_mapping.csv without some fields
      • data_mapping_mdm.csv: mdm subset of raw_data.csv
  • make_figures sub-directory
    • plot_class_token.ipynb: get number of thing-property combinations, and plot the histogram of thing-property counts along with the tag_description character counts
    • plot_count.ipynb: get counts of ship-data and platform-data
  • exports sub-directory:
    • this folder stores the files produced by the import scripts
  • outputs sub-directory:
    • this folder stores the exported figures from make_figures
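
As a rough illustration of the make_csv.py step described above, the sketch below drops some fields from a mapping table and then keeps only the mdm rows. It is not the actual script: the `MDM` flag column, its `True`/`False` encoding, and the `internal_note` column are assumptions made up for this example, and the sample rows are invented. Only the file names come from this README.

```python
import csv
import io

# Hypothetical schema: the real data_mapping.csv columns are not documented here.
DROP_FIELDS = {"internal_note"}   # assumed fields left out of raw_data.csv
MDM_FLAG = "MDM"                  # assumed column that marks mdm rows


def drop_fields(rows, fields_to_drop):
    """Return copies of the rows without the named fields."""
    return [{k: v for k, v in row.items() if k not in fields_to_drop} for row in rows]


def select_mdm(rows, flag_column=MDM_FLAG):
    """Keep rows whose flag column reads 'True' (assumed encoding)."""
    return [row for row in rows if row.get(flag_column) == "True"]


def to_csv_text(rows):
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Tiny in-memory stand-in for data_mapping.csv (invented rows)
SAMPLE = """thing,property,tag_description,MDM,internal_note
Engine1,Speed,main engine rpm,True,a
Pump2,Pressure,pump outlet pressure,False,b
"""

mapping_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
raw_data = drop_fields(mapping_rows, DROP_FIELDS)   # plays the role of raw_data.csv
data_mapping_mdm = select_mdm(raw_data)             # plays the role of data_mapping_mdm.csv
```

Keeping the two steps as separate functions mirrors the two output files: the second file is always derived from the first, so the mdm subset can never contain a field that raw_data.csv lacks.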

Instructions

Check the following:

  • Remember to activate your python environment
  • Ensure that the db_connection_info.txt is linked to this directory
    • e.g. ln -s /some/directory/db_connection_info.txt .
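
A small pre-flight check can catch a missing or dangling link before the import scripts run. The sketch below is a hypothetical helper, not part of this repository; it only assumes the file name db_connection_info.txt given above.

```python
from pathlib import Path

CONNECTION_FILE = "db_connection_info.txt"


def connection_file_status(directory="."):
    """Describe whether db_connection_info.txt is usable from `directory`."""
    path = Path(directory) / CONNECTION_FILE
    if path.is_symlink() and not path.exists():
        return "broken link"   # the link is present but its target is missing
    if not path.exists():
        return "missing"
    return "ok"


if __name__ == "__main__":
    print(f"{CONNECTION_FILE}: {connection_file_status()}")
```

`Path.exists()` follows symbolic links, so a link whose target has moved is reported separately from a file that was never linked.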

To import data, execute the following:

  • cd into this folder.
  • python select_db.py
  • python make_csv.py
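
The two commands above can also be chained from Python so that make_csv.py only runs if select_db.py succeeds. This is a minimal sketch of a hypothetical wrapper, not a script that ships with this folder; `run_steps` is a name introduced here for illustration.

```python
import subprocess
import sys
from pathlib import Path

STEPS = ["select_db.py", "make_csv.py"]   # order taken from the instructions above


def run_steps(steps=STEPS, folder="."):
    """Run each script in order from `folder`; stop at the first failure."""
    folder = Path(folder)
    for script in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps are skipped when an earlier one fails
        subprocess.run([sys.executable, script], cwd=folder, check=True)
    return list(steps)
```

Calling `run_steps()` from this folder is equivalent to running the two commands by hand. Using `sys.executable` keeps the scripts on the same interpreter as the activated environment.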

Exported files are written to the exports sub-directory, which keeps this folder clean.