2. ML pipeline for classification of Mercury surface composition
2.1. General description
Repository for the science case from DLR, aimed at understanding the composition and evolution of planetary surfaces.
We aim to extract the underlying information from the MESSENGER/MASCS infrared dataset using unsupervised classification of the spectral data, and to combine/compare the results with the chemical composition and surface ages inferred from crater counting.
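As an illustration of the intended classification step, here is a minimal sketch, assuming the MASCS spectra have already been exported to a flat table with one spectrum per row; the file names, column names, and number of clusters are placeholders, not part of the actual pipeline.

```python
# Minimal sketch of unsupervised classification of spectra with k-means.
# All paths and column names below are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

spectra = pd.read_csv("data/processed/mascs_spectra.csv")    # placeholder path
X = spectra.drop(columns=["lat", "lon"]).to_numpy()          # reflectance per wavelength channel

# Normalise the channels so that no single wavelength dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Cluster the spectra into candidate surface classes; the number of clusters
# is a free parameter that would have to be tuned against the data.
kmeans = KMeans(n_clusters=5, random_state=0, n_init=10)
spectra["class"] = kmeans.fit_predict(X_scaled)

spectra.to_csv("data/processed/mascs_spectra_classified.csv", index=False)
```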
2.2. Analysis result
Expectations for the analysis result:
Minimal: assessment of whether the current remote sensing data from NASA/MESSENGER can resolve different surface regions of Mercury, and ideas on the potential of the upcoming ESA/BepiColombo mission for improvement.
Perfect: classification of Mercury surface regions with uncertainty assessment, matched with laboratory measurements.
2.3. Information about data set
Type of data: MESSENGER/MASCS infrared spectral data
Format of data: NASA PDS3 Data Standards
Size of data set: 452 GB
Access to data set: public, from the NASA/PDS Geosciences Node
Input to the processing pipeline: csv/geojson
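As a hedged sketch of how the csv/geojson inputs could be read at the start of the pipeline (the file names below are placeholders, not actual pipeline products):

```python
import pandas as pd
import geopandas as gpd

# Tabular input, e.g. spectra or footprint metadata derived from the PDS3 products
# (placeholder file name).
table = pd.read_csv("data/interim/mascs_footprints.csv")

# Spatial input with footprint geometries, e.g. for matching against
# crater-counted surface units (placeholder file name).
footprints = gpd.read_file("data/interim/mascs_footprints.geojson")

print(table.head())
print(footprints.crs, len(footprints))
```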
2.4. Directory structure
The directory structure was created automatically with the cookiecutter-data-science template of the cookiecutter tool.
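The skeleton is normally generated from the cookiecutter command line; as a minimal sketch, the same can also be done through its Python API:

```python
# Minimal sketch: generate the project skeleton from the cookiecutter-data-science
# template via the cookiecutter Python API (interactive prompts ask for the
# project name, author, etc.).
from cookiecutter.main import cookiecutter

cookiecutter("https://github.com/drivendata/cookiecutter-data-science")
```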
Basic concepts are listed here (extended):
Data is immutable. Don’t ever edit your raw data, especially not manually, and especially not in Excel. Don’t overwrite your raw data. Don’t save multiple versions of the raw data. Treat the data (and its format) as immutable.
Notebooks are for exploration and communication. Follow a naming convention that shows the owner and the order the analysis was done in. Refactor the good parts. Don’t write code to do the same task in multiple notebooks.
Analysis is a directed acyclic graph (DAG). Often in an analysis you have long-running steps that preprocess data or train models. If these steps have been run already (and you have stored the output somewhere like the data/interim directory), you don’t want to wait to rerun them every time (see the caching sketch after this list).
Keep secrets and configuration out of version control. You really don’t want to leak your AWS secret key or Postgres username and password on GitHub.
Be conservative in changing the default folder structure. To keep this structure broadly applicable for many different kinds of projects, we think the best approach is to be liberal in changing the folders around for your project, but be conservative in changing the default structure for all projects.
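As a minimal sketch of the DAG/caching idea above (the paths and the preprocessing function are placeholders, not actual pipeline code):

```python
from pathlib import Path
import pandas as pd

INTERIM = Path("data/interim/preprocessed_spectra.parquet")  # placeholder path

def preprocess_raw_spectra() -> pd.DataFrame:
    # Placeholder for a long-running step (reading PDS3 products, calibration, ...).
    return pd.read_csv("data/raw/spectra.csv")

if INTERIM.exists():
    # The step was already run: reuse the cached intermediate result.
    df = pd.read_parquet(INTERIM)
else:
    df = preprocess_raw_spectra()
    INTERIM.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(INTERIM)  # cache for the next run
```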
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│       └── test       <- minimal dataset for example and test purpose
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io