Estimating information in earth system data with machine learning

  1. Johnson, Juan Emmanuel
Supervised by:
  1. Valero Laparra Pérez-Muelas Supervisor
  2. Gustavo Camps Valls Co-supervisor

Defended at: Universitat de València

Defense date: 21 June 2021

Committee:
  1. Lars Kai Hansen Chair
  2. Luis Gómez Chova Secretary
  3. Katalin Blix Member
Department:
  1. ENG. ELECTRÒN.

Type: Thesis

Teseo: 671715

Abstract

Machine learning has made great strides in science and engineering in general and in the Earth sciences in particular. However, Earth data poses particularly challenging problems for machine learning, owing not only to the volume of data but also to its nonlinear spatial-temporal correlations, its many sources of noise and uncertainty, and its heterogeneous sources of information. More data does not necessarily imply more information. Extracting knowledge and information content through data analysis and modeling is therefore important, and especially so in an era in which data volume and heterogeneity are steadily increasing. This calls for advances in methods that can quantify information and characterize distributions accurately. Quantifying the information content of our system's data and models remains an unresolved problem in statistics and machine learning. This thesis introduces new machine learning models to extract knowledge and information from Earth data. We propose kernel methods, Gaussian processes and multivariate Gaussianization to handle uncertainty and information quantification, and we apply these methods to a wide range of Earth system science problems. These involve many types of learning problems, including classification, regression, density estimation, synthesis, error propagation and the estimation of information-theoretic measures. We also demonstrate how these methods perform with different data sources, including sensory data (radar, multispectral, hyperspectral, infrared sounders), data products (observations, reanalysis and model simulations) and data cubes (aggregates of various spatial-temporal data sources). The presented methodologies allow us to quantify and visualize which salient features drive kernel classifiers, regressors or dependence measures, how to better propagate errors and distortions of input data with Gaussian processes, and where and when more information can be found in arbitrary spatial-temporal data cubes.
The presented techniques open a wide range of possible use cases and applications and we anticipate a wider adoption in the Earth sciences.