CoDa-PhD thesis (20250110): Advancements in the Mathematical Foundations of Compositional Data Analysis: Convexity and Lp Norms. Application to LASSO Regression with Compositional Covariate.
CoDa-PhD Thesis in Statistics and Optimization
UNIVERSITAT DE GIRONA - 2025 01 10 - Spain
- Title: Advancements in the Mathematical Foundations of Compositional Data Analysis: Convexity and Lp Norms. Application to LASSO Regression with Compositional Covariate.
- Author: Jordi Saperas Riera
- Supervised by: Josep Antoni Martín Fernández and Glòria Mateu Figueras
- University: Polytechnic School, Department of Computer Science, Applied Mathematics and Statistics, University of Girona
- Abstract:
This doctoral thesis contributes to the development of the mathematical foundations of compositional data analysis. In particular, it adapts the definitions of convexity and Lp norms to the simplex. Convex optimization plays a crucial role in numerous statistical techniques, especially in solving minimization problems to find optimal solutions. In the context of compositional data, convex sets and functions must be redefined within the simplex so that they respect the structure of Aitchison’s geometry. The present work addresses this by adapting convex optimization to the compositional case. It presents rigorous definitions of convex sets and functions in the simplex, with examples that allow for a coherent application to real-world compositional data sets. Many convex optimization problems, such as penalized regression and principal component analysis, involve metrics. Lp norms are therefore redefined for the simplex, and their main properties are explored in the compositional context. Finally, this thesis applies these advancements to the LASSO regression methodology. The Least Absolute Shrinkage and Selection Operator (LASSO) regularization method is widely recognized for its effectiveness in fitting linear models while performing variable selection. However, applying the LASSO to compositional data entails new challenges, since the penalty term must respect Aitchison’s geometry. In response, the thesis proposes a novel compositional norm, named L1-plr, that is consistent with the structure of the simplex. The resulting LASSO model effectively reduces dimensionality by selecting meaningful logratios between parts, representing a significant advancement in the application of the LASSO to compositional data.
Furthermore, a comparison is made between LASSO regression models obtained by using different norms in the penalty term, specifically investigating how the regularization process affects the subcompositional structure of the fitted linear model.
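
The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis. It assumes the standard Aitchison operations (perturbation and powering) and the centred logratio (clr) coefficients, and it reads "plr" as "pairwise logratio", which is an assumption. All function names are hypothetical, and the unscaled sum of absolute pairwise logratios used here may differ from the thesis's exact definition of the L1-plr norm, for example by a normalizing constant.

```python
import math
from itertools import combinations

def closure(x):
    """Rescale positive parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centred logratio coefficients: log parts minus their mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def perturb(x, y):
    """Perturbation, the addition of Aitchison geometry."""
    return closure([a * b for a, b in zip(x, y)])

def power(t, x):
    """Powering, the scalar multiplication of Aitchison geometry."""
    return closure([a ** t for a in x])

def segment_point(x, y, t):
    """Point at parameter t (0 <= t <= 1) on the compositional segment from x to y."""
    return perturb(power(1 - t, x), power(t, y))

def l1_clr(x):
    """Sum of absolute clr coefficients (one possible L1-type norm)."""
    return sum(abs(c) for c in clr(x))

def l1_pairwise_logratio(x):
    """Sum of |ln(x_i / x_j)| over all pairs i < j.

    Illustrative and unscaled; the thesis's L1-plr norm may be defined differently.
    """
    return sum(abs(math.log(a / b)) for a, b in combinations(x, 2))

x = closure([1.0, 2.0, 4.0])
y = closure([3.0, 1.0, 1.0])
m = segment_point(x, y, 0.5)  # compositional midpoint of x and y

# A convex function lies below its chord along the compositional segment.
print(l1_pairwise_logratio(m),
      0.5 * (l1_pairwise_logratio(x) + l1_pairwise_logratio(y)))
```

Both functions depend only on ratios between parts, so they give the same value for a composition and for any positive rescaling of it. In a penalized regression, a penalty of the second kind would be applied to the coefficient vector rather than to the data, which is where the selection of logratios between parts described in the abstract would arise.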