Francisco Pérez Galarce

Research Lines

I am highly interested in the following areas of research and am always eager to collaborate with others who share these interests:

Biases and uncertainty in Machine Learning: Focus on developing advanced machine learning techniques that address and reduce biases, improving the fairness and robustness of predictions in uncertain environments.
Optimization models with uncertainty: Specialize in creating optimization models that tackle complex, real-world issues, accounting for uncertainty to provide reliable and efficient solutions.
Advanced Analytics: Implement cutting-edge analytics to analyze and solve intricate problems in various real-world settings, ensuring practical and impactful outcomes.

Journal papers

Alfredo Candia-Véjar, Álvaro Faúndez, Camila Rojas, Isidora Jeria, Natacha Benítez, María Ignacia Tupper, Romina Inostroza, Etienne Bellenger, Pérez-Galarce, F. (2026). The Formation of Collaborative Learning Teams in Schools Through Mathematical Optimization for Improving the Classroom Climate. IEEE Access. https://doi.org/10.1109/ACCESS.2026.3653114

School climate plays a fundamental role in teaching and learning processes, directly influencing the achievement of educational objectives. Moreover, climate-related issues often manifest as bullying, which remains a persistent challenge for education systems worldwide. In this context, teamwork emerges as a key strategy to foster student interaction, strengthen friendship networks, and reduce bullying and aggression. However, traditional team assignment methods—such as student self-selection, teacher allocation, or random distribution—fail to account for the complexity of student interactions and individual characteristics. This study introduces a web-based decision support system designed to assist in team formation within educational settings by utilizing optimization models and algorithms, thereby promoting a more interconnected and inclusive student network. The models incorporate factors such as team diversity, victim protection, and student preferences. Its primary input is a social network along with derived metrics (e.g., betweenness centrality and triadic relationships), which feed into three non-linear integer optimization models that aim to consolidate, create, or enhance student interpersonal relationships. Additionally, the models were evaluated through a large quasi-experiment using both quantitative and qualitative analyses across four dimensions particularly relevant to educational activities: team dynamics, attitude toward the team, team cohesion, and team performance. Empirical results demonstrate that it is possible to improve the classroom climate without compromising the four team-related dimensions. This collaborative learning technology serves as a valuable resource for enhancing collaborative educational strategies and fostering a more inclusive and supportive classroom environment.
Fernando Montenegro‐Dos Santos, Pérez-Galarce, F., Carlos A Monardes‐Concha, Sergio Cruz‐Zárate, Alfredo Candia‐Véjar (2026). A bi‐objective optimization model to plan vaccination campaigns aided by temporary centers. International Transactions in Operational Research. https://doi.org/10.1111/itor.70154

Vaccination campaigns have saved thousands of lives and reached even the world's most remote areas. However, the COVID-19 pandemic significantly changed the strategies for such campaigns. It is now crucial to rigorously consider new factors when planning the operational aspects of these campaigns, particularly for last-mile vaccine dispensing. First, a large share of the population must be vaccinated, with emphasis on vulnerable groups at higher risk. Second, if the virus is highly contagious and capable of asymptomatic transmission, implementing public health measures at vaccination sites becomes paramount. To address these challenges, we propose a bi-objective optimization model to support healthcare decision-making. It defines a schedule for locating temporary vaccination centers and assigning populations to permanent and temporary sites over a planning horizon. Moreover, we formalize and validate a minimum set of properties for a function-based prioritization mechanism to ensure consistent prioritization. One objective prioritizes vaccinating vulnerable groups according to our novel function-based approach, while the other minimizes the operational costs of temporary centers. In the complex environment of a pandemic, the strategic integration and use of temporary and permanent vaccination centers is vital for creating a robust and responsive vaccination strategy. Temporary centers, whose locations are determined by the model, improve access for hard-to-reach groups, contribute to minimizing crowding at permanent sites, and lower infection risks. Our results, demonstrated through a case study, show that the model effectively allocates resources and generates various vaccination plans that cater to different priority levels assigned to each objective, thereby offering valuable decision support.
Pérez-Galarce, F., Martinez-Palomera, J., Pichara, K., Huijse, P., Catelan, Márcio. (2025). A self-regulated convolutional neural network for classifying variable stars. Monthly Notices of the Royal Astronomical Society. https://doi.org/10.1093/mnras/staf840

Over the last two decades, machine learning models have been widely applied and have proven effective in classifying variable stars, particularly with the adoption of deep learning architectures such as convolutional neural networks, recurrent neural networks, and transformer models. While these models have achieved high accuracy, they require high-quality, representative data and a large number of labelled samples for each star type to generalise well, which can be challenging in time-domain surveys. This challenge often leads to models learning and reinforcing biases inherent in the training data, an issue that is not easily detectable when validation is performed on subsamples from the same catalogue. The problem of biases in variable star data has been largely overlooked, and a definitive solution has yet to be established. In this paper, we propose a new approach to improve the reliability of classifiers in variable star classification by introducing a self-regulated training process. This process utilises synthetic samples generated by a physics-enhanced latent space variational autoencoder, incorporating six physical parameters from Gaia Data Release 3. Our method features a dynamic interaction between a classifier and a generative model, where the generative model produces ad-hoc synthetic light curves to reduce confusion during classifier training and populate underrepresented regions in the physical parameter space. Experiments conducted under various scenarios demonstrate that our self-regulated training approach outperforms traditional training methods for classifying variable stars on biased datasets, showing statistically significant improvements.
Klundert, J. V. D., Pérez-Galarce, F., Olivares, M, Pengel, L, de Weerd, A. (2025). The comparative performance of models predicting patient and graft survival after kidney transplantation: A systematic review. Transplantation Reviews. https://doi.org/10.1016/j.trre.2025.100934

Cox proportional hazard models have long been the model of choice for survival prediction after kidney transplantation. In recent years, a variety of novel model types have been proposed. We investigate the prediction performance across different model types, including machine learning models and traditional model types. A systematic review was conducted following PROBAST and CHARMS, also considering extensions to TRIPOD+AI and PROBAST+AI, for data collection and risk of bias assessment. The review only included publications that reported on prediction performance for models of different types. A comparative analysis tested performance differences between the model types. The review included 37 publications which presented 134 comparative studies. The designs of many studies left room for improvement and most studies had a high risk of bias. The collected data allowed testing of performance differences for 22 pairs of model types, ten of which yielded significant differences. Support Vector Machines and Logistic Regression were never found to outperform other model types. Other comparisons, however, provide inconclusive comparative performance results, and none of the model types performed consistently and significantly better than the alternatives. A rigorous review of the current comparative performance evidence finds no significant evidence that Cox Proportional Hazard models are being outperformed in kidney transplant survival prediction. The design of many of the studies implies a high risk of bias, and more and better-designed studies that reuse the best-performing models are needed. Such studies would help resolve model biases and reporting issues and increase the power of comparative performance analysis.
Klundert, J. V. D., De Vries, H., Pérez-Galarce, F., Valdes, N., & Simon, F. (2025). The Effectiveness, Equity and Explainability of Health Service Resource Allocation -With Applications in Kidney Transplantation & Family Planning. Frontiers in Health Services. https://doi.org/10.3389/frhs.2025.1545864

Halfway to the deadline of the 2030 agenda, humankind continues to face long-standing yet urgent policy and management challenges to address resource shortages and deliver on Sustainable Development Goal 3: health and well-being for all at all ages. More than half of the global population lacks access to essential health services. Additional resources are required and need to be allocated effectively and equitably. Resource allocation models, however, have struggled to accurately predict effects and to present optimal allocations, thus hampering effectiveness and equity improvement. The current advances in machine learning present opportunities to better predict allocation effects and to prescribe solutions that better balance effectiveness and equity. The most advanced of these models tend to be 'black box' models that lack explainability. This lack of explainability is problematic as it can clash with professional values and hide biases that negatively impact effectiveness and equity. Through a novel theoretical framework and two diverse case studies, this manuscript explores the trade-offs between effectiveness, equity, and explainability. The case studies consider family planning in a low-income country and kidney allocation in a high-income country. Both case studies find that the least explainable models hardly offer improvements in effectiveness and equity over explainable alternatives. As this may more widely apply to health resource allocation decisions, explainable analytics, which are more likely to be trusted and used, might better enable progress towards SDG3 for now. Future research on explainability, also in relation to equity and fairness of allocation policies, can help deliver on the promise of advanced predictive and prescriptive analytics.
Sotelo, C., Santa-Gonzalez, R., Pérez-Galarce, F., Monardes, C. (2024). An optimization model for planning of emergency shelters after a tsunami. Socio-Economic Planning Sciences. https://doi.org/10.1016/j.seps.2024.101909

Vertical evacuation helps people escape tsunami risks by elevating them above the level of tsunami inundation, usually by moving to higher ground or taking refuge in tall buildings or other elevated structures. Unlike horizontal evacuation, which involves moving away from the coast to higher ground, vertical evacuation reduces the demand for horizontal evacuation routes that can become congested and impede evacuation efforts. Therefore, investing in critical infrastructure that enables vertical evacuation is crucial in tsunami-prone areas. This study proposes a multi-objective optimization model to help decision-makers assign critical infrastructure for vertical evacuation in tsunami-prone areas. Critical infrastructure includes buildings that can provide shelter during a tsunami and road networks for rapid access to shelter points. The proposed model balances three objectives: (1) minimizing investment costs in critical infrastructure, (2) maximizing the population covered by shelters, and (3) minimizing the evacuation time for evacuees to reach the shelters. This model is tested on real-world data from the Coquimbo-La Serena coastal conurbation in the Coquimbo region of Chile. The study contributes to the literature on tsunami evacuation modeling and provides valuable information for decision-makers to plan and invest in critical infrastructure for vertical evacuation during tsunamis. A sensitivity analysis of various parameters is conducted, and managerial insights are provided.
Montenegro, F., Pérez-Galarce, F., Monardes, C., Candia, A., Nagano, M. (2023). A non-myopic Rolling Horizon scheme for rescheduling in agricultural harvest. Computers and Electronics in Agriculture. https://doi.org/10.1016/j.compag.2023.108392

Over the last decade, agriculture has evolved from a human-intensive activity to a highly automated process. Multiple technological advances (e.g., harvest machines, sensors, and drones) have been incorporated to collect and transmit information, increasing harvest efficiency and enabling more accurate and timely decisions. These advances have opened new opportunities to apply optimization models during the harvest season. To apply these models, it is necessary to consider the underlying uncertainty in agricultural operations, which comes mainly from weather conditions and the biological characteristics of crops. One of the traditional strategies used to reactively manage these uncertainties in optimization models is the Rolling Horizon (RH) strategy. However, RH is typically myopic about the future, and it can be challenging to implement this approach when commitments with suppliers are signed. This work proposes a non-myopic rolling horizon method to reschedule the agricultural harvest plan. Our RH scheme is exemplified by means of olive oil harvesting and production. The method first generates a baseline plan and then applies an adaptive rescheduling scheme under new conditions. A bi-objective rescheduling problem seeking to maximize production and minimize plan variability is formulated. Computational experiments are conducted to study our methodology’s impact in several rescheduling periods. Good performance in two challenging agricultural scenarios is highlighted. This proposal offers the community a framework for reactively managing complex harvest operations.
Pérez-Galarce, F., Pichara, K., Huijse, P., Catelan, M., Mery, D. (2023). Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift. Astronomy and Computing. https://doi.org/10.1016/j.ascom.2023.100694

In recent decades, machine learning has provided valuable models and algorithms for processing and extracting knowledge from time-series surveys. Different classifiers have been proposed and performed to an excellent standard. Nevertheless, few papers have tackled the data shift problem in labeled training sets, which occurs when there is a mismatch between the data distribution in the training set and the testing set. This drawback can damage the prediction performance in unseen data. Consequently, we propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem during the training of a multi-layer perceptron for RR Lyrae classification. We collect ranges for characteristic features to construct a symbolic representation of prior knowledge, which was used to model the informative regularizer component. Simultaneously, we design a two-step back-propagation algorithm to integrate this knowledge into the neural network, whereby one step is applied in each epoch to minimize classification error, while another is applied to ensure regularization. Our algorithm defines a subset of parameters (a mask) for each loss function. This approach handles the forgetting effect, which stems from a trade-off between these loss functions (learning from data versus learning expert knowledge) during training. Experiments were conducted using recently proposed shifted benchmark sets for RR Lyrae stars, outperforming baseline models by up to 3% through a more reliable classifier. Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
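The alternating, mask-restricted update described in this abstract can be illustrated with a small sketch. This is not the paper's implementation: the two gradient functions are placeholder quadratic losses standing in for the classification loss and the informative regularizer, and the names `masked_step`, `train`, `data_mask`, and `prior_mask` are hypothetical. It only shows the general idea of giving each loss its own subset of parameters and applying one step per loss in every epoch.

```python
def masked_step(params, grads, mask, lr):
    """Gradient step that only moves the parameters selected by `mask`."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


def grad_data_loss(params):
    # Placeholder for the classification-loss gradient: d/dp of sum((p - 1)^2).
    return [2.0 * (p - 1.0) for p in params]


def grad_prior_loss(params):
    # Placeholder for the regularizer gradient: d/dp of sum(p^2).
    return [2.0 * p for p in params]


def train(params, data_mask, prior_mask, epochs, lr=0.1):
    """Two steps per epoch: one for the data loss, one for the prior loss."""
    for _ in range(epochs):
        params = masked_step(params, grad_data_loss(params), data_mask, lr)
        params = masked_step(params, grad_prior_loss(params), prior_mask, lr)
    return params


# Assumed split: the first two parameters learn from data, the third from the prior.
final_params = train(
    [0.5, 0.5, 0.5],
    data_mask=[True, True, False],
    prior_mask=[False, False, True],
    epochs=50,
)
```

Because the masks are disjoint in this toy setting, neither step overwrites what the other has learned, which is one simple way to picture how a mask per loss can limit forgetting.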
Salinas, Y., Pichara, K., Brahm, R., Pérez-Galarce, F., Mery, D. (2023). Distinguishing a planetary transit from false positives: A Transformer-based classification for planetary transit signals. Monthly Notices of the Royal Astronomical Society. https://doi.org/10.1093/mnras/stad1173

Current space-based missions, such as the Transiting Exoplanet Survey Satellite (TESS), provide a large database of light curves that must be analysed efficiently and systematically. In recent years, deep learning (DL) methods, particularly convolutional neural networks (CNN), have been used to classify transit signals of candidate exoplanets automatically. However, CNNs have some drawbacks; for example, they require many layers to capture dependencies on sequential data, such as light curves, making the network so large that it eventually becomes impractical. The self-attention mechanism is a DL technique that attempts to mimic the action of selectively focusing on some relevant things while ignoring others. Models, such as the Transformer architecture, were recently proposed for sequential data with successful results. Based on these successful models, we present a new architecture for the automatic classification of transit signals. Our proposed architecture is designed to capture the most significant features of a transit signal and stellar parameters through the self-attention mechanism. In addition to model prediction, we take advantage of attention map inspection, obtaining a more interpretable DL approach. Thus, we can identify the relevance of each element to differentiate a transit signal from false positives, simplifying the manual examination of candidates. We show that our architecture achieves competitive results concerning the CNNs applied for recognizing exoplanetary transit signals in data from the TESS telescope. Based on these results, we demonstrate that applying this state-of-the-art DL model to light curves can be a powerful technique for transit signal detection while offering a level of interpretability.
Moya, J., Pérez-Galarce, F., Taramasco, C., Astudillo, C., Candia, A. (2023). Machine learning models for severity classification and length-of-stay forecasting in emergency units. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2023.119864

Length-of-stay (LoS) prediction and severity classification for patients in emergency units in a clinic or hospital are crucial problems for public and private health networks. An accurate estimation of these parameters is essential for better planning of resources, which are usually scarce. Although several works propose traditional Machine Learning (ML) models to face these challenges, few have exploited advances in Natural Language Processing (NLP) on Spanish raw-text vector representations. Consequently, we take advantage of those advances, incorporating sentence embeddings in traditional ML models to improve predictions. Moreover, we apply a strategy based on SHapley Additive exPlanations (SHAP) values to provide explanations for these predictions. The results of our case study demonstrate an increase in the accuracy of the predictions using raw text with minimal preprocessing. The precision increased by up to 2% in the classification of the patient’s post-care destination and by up to 8% in the prediction of LoS in the hospital. This evidence encourages practitioners to use available text to anticipate the patient’s need for hospitalization more accurately at the earliest stage of the care process.
Govea, Z., Pérez-Galarce, F., Candia-Véjar, A. (2022). An optimization model for the fair distribution of prize money in ATP tournaments. International Journal of Computer Science in Sport. https://doi.org/10.2478/ijcss-2022-0002

The Association of Tennis Professionals (ATP) distributes a considerable amount of money in prizes each year. Studies have shown that only the top 100 ranked players can self-finance; hence, it is desirable to change the prize distribution to promote a more sustainable system. A Linear Programming model to distribute the tournament’s budget under a new concept for the fair distribution of prize money is proposed. Additionally, to distribute the prizes, a function based on the effort of the players is designed. The model was applied to tournaments to demonstrate its impact on improving the distribution of players’ prizes.
Pérez-Galarce, F., Pichara, K., Huijse, P., Catelan, M., Mery, D. (2021). Informative Bayesian model selection for RR Lyrae star classifiers. Monthly Notices of the Royal Astronomical Society, 503(1), 484-497. https://doi.org/10.1093/mnras/stab320

Machine learning has achieved an important role in the automatic classification of variable stars, and several classifiers have been proposed over the last decade. These classifiers have achieved impressive performance in several astronomical catalogues. However, some scientific articles have also shown that the training data therein contain multiple sources of bias. Hence, the performance of those classifiers on objects not belonging to the training data is uncertain, potentially resulting in the selection of incorrect models. Besides, it gives rise to the deployment of misleading classifiers. An example of the latter is the creation of open-source labelled catalogues with biased predictions. In this paper, we develop a method based on an informative marginal likelihood to evaluate variable star classifiers. We collect deterministic rules that are based on physical descriptors of RR Lyrae stars, and then, to mitigate the biases, we introduce those rules into the marginal likelihood estimation. We perform experiments with a set of Bayesian logistic regressions, which are trained to classify RR Lyraes, and we find that our method outperforms traditional non-informative cross-validation strategies, even when penalized models are assessed. Our methodology provides a more rigorous alternative to assess machine learning models using astronomical knowledge. From this approach, applications to other classes of variable stars and algorithmic improvements can be developed.
Pérez-Galarce, F., Maculan, N., Candia-Véjar, A. (2021). Improved robust shortest paths by penalized investments. RAIRO-Operations Research. https://doi.org/10.1051/ro/2021086

Connectivity after disasters has become a critical problem in the management of modern cities. This comes from the need of the decision-makers to ensure urgent medical attention by providing access to health facilities and to other relevant services needed by the population. Managing congestion could help maintain some routes operative even in complex scenarios such as natural disasters, terrorist attacks, protests, or riots. Recent advances in Humanitarian Logistics have handled this problem using different modeling approaches but have principally focused on the response phase. In this paper, firstly, we propose a penalized variant of an existing mathematical model for the robust s–t path problem with investments. With the aim of solving the robust several-to-one path problem with investments, and due to the high complexity of this new problem, a heuristic is proposed. Moreover, this approach allows us to improve travel times in both specific paths and in a set of routes in a systemic framework. The new problem and the proposed heuristic are illustrated by an example, which corresponds to a typical city network, that provides a concrete vision of the potential application of the framework. Lastly, some managerial insights are given by the analysis of results exhibited in the example network.
Revillot, D., Pérez-Galarce, F., Álvarez-Miranda, E. (2020). Optimizing the storage assignment and order-picking for the compact drive-in storage system. International Journal of Production Research. https://doi.org/10.1080/00207543.2019.1687951

One of the most common systems in non-automated warehouses is drive-in pallet racking with a shared storage policy (which is usually based on the duration-of-stay). Such a scheme aims at an efficient use of storage space, since its operation costs are directly related to the size and layout of the warehouse. In this paper, two mathematical programming models and two greedy-randomised based heuristics for finding (nearly) optimal storage and retrieval operation sequences for this type of storage system are proposed. The computational effectiveness of the proposed approaches is measured by considering two sets of synthetic instances. The obtained results show that the proposed heuristics are not only able to compute high-quality solutions (as observed when compared with the optimal solutions attained by the mathematical programming models), but are also capable of providing solutions in very short running times, even for large instances for which the mathematical programming model failed to find feasible solutions. In light of these results, the best heuristic is also tested using a rolling-horizon planning strategy in a real-world case study obtained from a Chilean company. It turns out that the attained results are more effective than the company's current storage policy.
Pérez-Galarce, F., Candia-Véjar, A., Astudillo, C., Bardeen, M. (2018). Algorithms for the Minmax Regret Path Problem with Interval Data. Information Sciences. https://doi.org/10.1016/j.ins.2018.06.016

The Shortest Path in networks is an important problem in combinatorial optimization and has many applications in areas like telecommunications and transportation. It is known that this problem is easy to solve in its classic deterministic version, but it is also known that it is an NP-Hard problem for several generalizations. The Shortest Path Problem consists in finding a simple path connecting a source node and a terminal node in an arc-weighted directed network. In some real-world situations the weights are not completely known, and the problem then becomes one of optimization under uncertainty. It is assumed that an interval estimate is given for each arc length and no further information about the statistical distribution of the weights is known. Uncertainty has been modeled in different ways in optimization. Our aim in this paper is to study the Minmax Regret Path problem with interval data by presenting a new exact branch and cut algorithm and, additionally, new heuristics. A set of difficult, large-size instances is defined, and computational experiments are conducted to analyze the different approaches designed to solve the problem. The main contribution of our paper is to provide an assessment of the performance of the proposed algorithms and empirical evidence of the superiority of a simulated annealing approach based on a new neighborhood over the other heuristics proposed.
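To make the regret criterion concrete, the sketch below evaluates the maximum regret of a fixed path under interval arc lengths. It relies on a standard property of interval data: the worst case for a given path occurs when its own arcs take their upper bounds and every other arc takes its lower bound. The toy network and the function names are illustrative assumptions, and the code only evaluates a candidate path; it is not the branch and cut algorithm or any of the heuristics from the paper.

```python
import heapq


def shortest_path_length(arcs, source, target):
    """Dijkstra on a dict {(u, v): length}; returns the source-target distance."""
    adjacency = {}
    for (u, v), length in arcs.items():
        adjacency.setdefault(u, []).append((v, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adjacency.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def max_regret(path, intervals, source, target):
    """Worst-case regret of `path` (a list of arcs) given {(u, v): (lower, upper)}."""
    on_path = set(path)
    # Worst-case scenario: path arcs at their upper bound, all others at their lower bound.
    scenario = {
        arc: (upper if arc in on_path else lower)
        for arc, (lower, upper) in intervals.items()
    }
    path_length = sum(scenario[arc] for arc in path)
    return path_length - shortest_path_length(scenario, source, target)


# Hypothetical four-arc network with two source-target paths.
intervals = {
    ("s", "a"): (1, 4),
    ("a", "t"): (1, 3),
    ("s", "b"): (2, 3),
    ("b", "t"): (2, 5),
}
regret_top = max_regret([("s", "a"), ("a", "t")], intervals, "s", "t")
regret_bottom = max_regret([("s", "b"), ("b", "t")], intervals, "s", "t")
```

In this toy instance the path through `a` has the smaller maximum regret, so it would be the minmax regret choice between the two. Evaluating one path is cheap; the difficulty of the problem lies in searching over all paths.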
Pérez-Galarce, F., Canales, L. J., Vergara, C., Candia-Véjar, A. (2017). An optimization model for the location of disaster refuges. Socio-Economic Planning Sciences, 59, 56-66. https://doi.org/10.1016/j.seps.2016.12.001

We developed a flexible planning tool, based on an optimization approach, that locates and assigns refuge centers, using facilitating buildings to provide shelter and medical and psychological assistance to victims while taking into account the quality of service. The article proposes variants within the optimization model that allow the decision maker to evaluate different realities in the plan. The performance of the models is illustrated inside the red zone of the 2010 earthquake in Chile.
Herrera-Cáceres, C., Pérez-Galarce, F., Álvarez-Miranda, E., Candia-Véjar, A. (2017). Optimization of the harvest planning in the olive oil production: A case study in Chile. Computers and Electronics in Agriculture, 141, 147-159. https://doi.org/10.1016/j.compag.2017.07.017

In this work, a mathematical programming model for aiding the decision-making process of olive harvest planning is proposed. The model aims at finding a harvest schedule of different land units that maximizes the total amount of the oil extracted in the mill. Such a harvest plan must ensure quality standards, respect technological limitations, coordinate operations between the field and the mill, and satisfy a budget associated with the harvest operations. Moreover, the presented approach considers the effect of climatological phenomena (rain and frost) during the harvest season, which results in a reduction of olive crops. The model was tested on a real problem of a company located in the central zone of Chile. The experiments with the model show that it is able to obtain better solutions than those obtained by the traditional operation planning when it is tested with real datasets from the company. The optimization model is flexible, allowing the management of several parameters like the project budget and the risks generated by the climate. Thus, it can provide alternative harvest plans in a short time by simulating different climatological scenarios. From a managerial point of view, some lessons about the advantages and difficulties of the model were learned from its use in the company.
Pérez-Galarce, F., Álvarez-Miranda, E., Candia-Véjar, A., Toth, P. (2014). On exact solutions for the Minmax Regret Spanning Tree Problem. Computers & Operations Research, 47, 114-122. https://doi.org/10.1016/j.cor.2014.02.007

The Minmax Regret Spanning Tree problem is studied in this paper. This is a generalization of the well-known Minimum Spanning Tree problem, which considers uncertainty in the cost function. Particularly, it is assumed that the cost parameter associated with each edge is an interval whose lower and upper limits are known, and the Minmax Regret is the optimization criterion. The Minmax Regret Spanning Tree problem is an NP-Hard optimization problem for which exact and heuristic approaches have been proposed. Several exact algorithms are proposed and computationally compared with the most effective approaches of the literature. It is shown that a proposed branch-and-cut approach outperforms the previous approaches when considering several classes of instances from the literature.
Álvarez-Miranda, E., Candia-Véjar, A., Carrizosa, E., Pérez-Galarce, F. (2014). Vulnerability assessment of spatial networks: models and solutions. Lecture Notes in Computer Science (pp. 433-444). https://doi.org/10.1007/978-3-319-09174-7_37

In this paper we present a collection of combinatorial optimization problems that allow us to assess the vulnerability of spatial networks in the presence of disruptions. The proposed measures of vulnerability, along with the model of failure, are suitable for many applications where the consideration of failures in the transportation system is crucial. By means of computational results, we show how the proposed methodology allows us to find useful information regarding the capacity of a network to resist disruptions and under which circumstances the network collapses.