Radiant Earth Foundation is hosting an international expert workshop to discuss how best to apply machine learning (ML) techniques to NASA’s Earth Observation (EO) data to address environmental challenges. In particular, the workshop will cover the generation and use of training datasets for ML applications using EO data. Participants will evaluate recent advancements, identify existing obstacles, and develop best-practice guidelines to enhance the adoption of these techniques.

This workshop is sponsored by the NASA Earth Science Data Systems (ESDS) program.

Thursday, January 23 • 9:30am - 11:00am
Breakout Session 4 (WG 1): Training data generation and accounting for errors/uncertainties


Proposed Questions:
  1. How to define, measure, and document uncertainty in training data?
  2. How to treat sparsity in training data? If the data lacks spatial or temporal completeness, what are the ways to augment the data?
  3. How to decide on the size of training data required for a problem? This question cannot be answered before building a model, but what steps should be followed to check whether the sample size is reasonable?
  4. How to understand and quantify representativeness and class balance/imbalance of training data? What are the metrics to assess geographical diversity and representativeness of training data?
  5. What are the requirements for compiling “benchmark training datasets” to advance model development in each science discipline? For example, if the community is building a pollution estimation model that will be integrated into a larger climate model and used in CMIP comparisons, how can we benchmark the training data for this model and progressively improve it over time?
  6. How do we deal with class imbalance issues in training data for Earth science machine learning classifications?
  7. From the Earth Science Data Systems (sponsor) perspective, how can we leverage the wealth of available data for training models? For example, multiple data sources can sometimes be fused to create a labeled dataset without a manual process.
  8. What are the recommendations to map gaps in training data catalogs (based on science discipline or application area)?


Lyndon Estes

Assistant Professor, Clark University

Thursday January 23, 2020 9:30am - 11:00am
Cosmos Club (Crentz Room), 2121 Massachusetts Ave NW, Washington, DC 20008, USA
