Measuring the Confidence of Forecasting Models
Although AI technologies are sufficiently mature, their adoption by policy makers and public servants is very uneven and, in general, much lower than one would expect. There are obstacles, both cultural and technical, that hinder the widespread adoption of AI technologies. These technologies will not spread massively until the scientific community is able to deliver technology that users and the different data providers can rely on. On the other hand, the use of these technologies involves risks that must be managed appropriately. To ensure that we are on the right track, it is necessary to follow a human-centred approach to AI, without losing sight of the goal of improving human well-being.
The term Actionability [1] is defined as the characteristic that any system based on data analysis or artificial intelligence must exhibit in order to yield insights of practical value, so that managers can harness them in their decision-making processes. This implies, among other non-functional requirements, several key properties: usability (adequacy of the tool to the user's skills), confidence (quantification of the uncertainty of the estimation and prediction results), interpretability (understanding of how information is processed along the data modelling pipeline), self-sustainability (ability of the models to evolve) and scalability (capacity to be deployed at large scale or in other scenarios).
Specifically, data-based models are usually subject to uncertainty: non-deterministic stochastic processes are involved in the input data and in the learning, training and application mechanisms, and the uncertainty is also present in the results. Uncertainty makes decision-making more complex, but ignoring it means ignoring reality. As a result, it is essential to provide an objective measure of the reliability and precision of the results, thereby gaining trust.
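As an illustration of such an objective measure, the sketch below attaches a bootstrap confidence interval to a test-set error metric; the data, metric and settings are assumptions made for the example, not part of any particular pipeline.

```python
# Minimal sketch (placeholder data, not a project implementation): attaching an
# objective measure of precision to an evaluation result by bootstrapping the
# test-set RMSE of a forecasting model.
import numpy as np

def bootstrap_rmse_interval(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Return the point RMSE and a (1 - alpha) bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(y_true) - np.asarray(y_pred)
    rmse = float(np.sqrt(np.mean(errors**2)))
    boot = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(errors, size=errors.size, replace=True)
        boot[b] = np.sqrt(np.mean(sample**2))
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return rmse, (lo, hi)

# Usage with synthetic placeholder predictions
rng = np.random.default_rng(1)
y_true = rng.normal(size=500)
y_pred = y_true + rng.normal(scale=0.4, size=500)
rmse, (lo, hi) = bootstrap_rmse_interval(y_true, y_pred)
print(f"RMSE = {rmse:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval alongside the point estimate gives decision makers a sense of how much the headline metric could vary under resampling of the test data.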
There are two main types of uncertainty. Aleatoric (random) uncertainty captures the inherent noise of the observations, for example sensor noise, and therefore cannot be reduced even if more data are collected. Epistemic uncertainty, on the other hand, accounts for the uncertainty in the model parameters and represents the lack of knowledge about the model generated from the collected data; it can be explained away given a sufficient amount of data and is often referred to as model uncertainty. In any case, the methods that try to determine the quality and reliability of the models have something in common: they are based on estimates. This stochastic nature implies the need to carry out a statistical analysis of the results obtained when evaluating the models.
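To make the distinction concrete, the following sketch separates the two components on a toy one-dimensional problem: the spread across a bootstrap ensemble of simple regressors approximates the epistemic part, while the average residual variance gives a rough estimate of the aleatoric part. All data and modelling choices are assumptions made only for this example.

```python
# Minimal sketch (toy data and models assumed for illustration only): separating
# epistemic from aleatoric uncertainty in a 1-D regression problem with a
# bootstrap ensemble of simple polynomial regressors.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: smooth signal plus irreducible observation noise (aleatoric).
x = rng.uniform(0.0, 10.0, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.shape)
x_grid = np.linspace(0.0, 10.0, 100)

# Bootstrap ensemble: each member sees a resampled training set, so the spread
# of the members' predictions reflects uncertainty about the model (epistemic).
n_members, degree = 30, 5
preds, noise_estimates = [], []
for _ in range(n_members):
    idx = rng.integers(0, len(x), size=len(x))
    coeffs = np.polyfit(x[idx], y[idx], degree)
    preds.append(np.polyval(coeffs, x_grid))
    residuals = y[idx] - np.polyval(coeffs, x[idx])
    noise_estimates.append(residuals.var())        # rough aleatoric estimate
preds = np.array(preds)

epistemic_std = preds.std(axis=0)                          # shrinks as data grow
aleatoric_std = float(np.sqrt(np.mean(noise_estimates)))   # irreducible noise level
total_std = np.sqrt(epistemic_std**2 + aleatoric_std**2)

print(f"mean epistemic std: {epistemic_std.mean():.3f}")
print(f"aleatoric std:      {aleatoric_std:.3f}")
print(f"mean total std:     {total_std.mean():.3f}")
```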
Classically, such a statistical analysis has been addressed with statistical hypothesis tests, although the statistics community has raised doubts about this approach. More recently, Bayesian alternatives (e.g. Monte Carlo dropout or Markov chain Monte Carlo) have been proposed to carry out the analysis, offering the advantage of an intuitive framework for analysing the uncertainty associated with the evaluation of the algorithms. Other notable techniques include conformal prediction, ensemble methods and model-specific approaches such as deep Gaussian processes or Laplace approximations [2].
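As a concrete illustration of one of these options, the sketch below applies Monte Carlo dropout: dropout is kept active at prediction time and the model is evaluated several times, so that the spread of the outputs approximates the epistemic uncertainty. The architecture, sizes and inputs are placeholder assumptions, not the project's code.

```python
# Minimal sketch of Monte Carlo dropout, assuming a PyTorch regression model;
# the architecture, sizes and inputs are placeholders chosen for the example.
import torch
import torch.nn as nn

class DropoutRegressor(nn.Module):
    def __init__(self, in_dim: int = 8, hidden: int = 64, p: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, passes: int = 100):
    """Run `passes` stochastic forward passes with dropout kept active."""
    model.train()                            # keeps the Dropout layers stochastic
    samples = torch.stack([model(x) for _ in range(passes)])   # shape (T, N, 1)
    return samples.mean(dim=0), samples.std(dim=0)  # predictive mean, epistemic std

# Usage with random placeholder inputs (untrained model, for illustration only)
model = DropoutRegressor()
x_new = torch.randn(16, 8)
mean, std = mc_dropout_predict(model, x_new)
print(mean.shape, std.shape)
```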
It is crucial to identify the candidate uncertainty quantification methods and to select the most adequate one for each use case and prediction problem, taking into account the kind of input data and the underlying models.
[1] URBANITE Consortium, Deliverable D4.1: Strategies and algorithms for data modelling and visualizations, 2021.
[2] M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya, et al., A review of uncertainty quantification in deep learning: Techniques, applications and challenges, Information Fusion, vol. 76, 2021.