Glossary

This glossary describes the essential concepts used throughout Relevance-Based Predictions, ordered by how soon a reader or API user is likely to need them.

Observation

One element among many that are described by a common set of attributes (independent variables), distributed across time or space, and which collectively provide guidance about an outcome that has yet to be revealed. Classical statistics often refers to an observation as a multivariate data point.

Attributes

Recorded values, each used individually or alongside other attributes to describe an observation. Otherwise known as independent variables, predictive variables, factors, or features. In the API parameters, attributes are typically passed in a variable named X.

Outcome

A measurement of interest that is usually observed alongside the attributes, and which one wishes to predict. In classical statistics, this is otherwise known as the dependent variable. In the API parameters, this is typically a variable named y.

Circumstance

A set of attribute values that collectively describes an observation. In the API parameters, this is typically a variable named theta.

Informativeness

A measure of the information conveyed by the circumstances of an observation, based on the inverse relationship of information and probability. Informativeness is a component of relevance and it does not depend on units of measurement.

Similarity

A measure of the closeness between one circumstance and another, based on their attributes.

Relevance

A measure of the importance of an observation to forming a prediction. Its components are the informativeness of past circumstances, the informativeness of current circumstances, and the similarity of past circumstances to current circumstances.
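The three definitions above can be sketched numerically. This glossary does not give formulas for them, so the sketch below assumes the Mahalanobis-distance forms commonly used in the relevance-based prediction literature: informativeness as the squared Mahalanobis distance from the sample mean, similarity as the negative half of the squared Mahalanobis distance between two circumstances, and relevance as similarity plus the average of the two informativeness terms. The function names and the sample data are hypothetical and are not part of the API.

```python
import numpy as np

def informativeness(x, mean, inv_cov):
    # Assumed form: squared Mahalanobis distance of x from the sample mean.
    d = x - mean
    return float(d @ inv_cov @ d)

def similarity(x_i, x_t, inv_cov):
    # Assumed form: negative half of the squared Mahalanobis distance
    # between a past circumstance x_i and the current circumstance x_t.
    d = x_i - x_t
    return float(-0.5 * d @ inv_cov @ d)

def relevance(x_i, x_t, mean, inv_cov):
    # Similarity plus the average informativeness of the two circumstances.
    info = informativeness(x_i, mean, inv_cov) + informativeness(x_t, mean, inv_cov)
    return similarity(x_i, x_t, inv_cov) + 0.5 * info

# Hypothetical data: four past observations described by two attributes.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
theta = np.array([3.5, 3.5])  # current circumstance

mean = X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))

# One relevance value per past observation.
r = np.array([relevance(x, theta, mean, inv_cov) for x in X])
print(r)
```

Under these assumed definitions, relevance simplifies algebraically to the cross product of the two circumstances' deviations from the mean, scaled by the inverse covariance matrix. When the mean and covariance are estimated from the same sample, the relevance values therefore sum to zero across that sample.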

Asymmetry

Asymmetry measures the extent to which predictions differ when they are formed from a subsample that includes the most relevant observations compared to one that includes the least relevant observations. It is computed as the average dissimilarity of the predictions from these two methods.

asymmetry = \frac{(\rho(w_{included}, y) - \rho(w_{excluded}, y))^2}{2}

Partial-sample regression will produce a more reliable prediction than full-sample linear regression if outcomes have an asymmetric relationship with the independent variables. Full-sample linear regression always assumes a symmetric relationship, whereas relevance-based prediction recognizes that the relationship between outcomes and variables can change depending on circumstances.
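As a minimal sketch, the asymmetry formula can be evaluated directly once the two weight vectors are available. The function name and the weight and outcome values below are hypothetical, and the sketch assumes that \rho denotes the Pearson correlation and that w_{included} and w_{excluded} are per-observation weights aligned with y. How those weights are produced is outside the scope of this example.

```python
import numpy as np

def asymmetry(w_included, w_excluded, y):
    # Pearson correlation of each weight vector with the outcomes.
    rho_in = np.corrcoef(w_included, y)[0, 1]
    rho_ex = np.corrcoef(w_excluded, y)[0, 1]
    # Half the squared difference, as in the formula above.
    return 0.5 * (rho_in - rho_ex) ** 2

# Hypothetical outcomes and weights for five observations.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w_included = np.array([0.0, 0.0, 0.1, 0.3, 0.6])
w_excluded = np.array([0.6, 0.3, 0.1, 0.0, 0.0])

print(asymmetry(w_included, w_excluded, y))
```

If both weight vectors relate to the outcomes in the same way, the two correlations coincide and the asymmetry is zero.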

Fit

Fit measures the relationship between relevance and outcomes for a single prediction, so it is specific to that prediction. The average fit across all predictions gives a general measure of a model's reliability.

fit = \rho(w, y)^2

Fit also provides a principled way to evaluate the relative reliability of alternative calibrations for each prediction task. A larger value indicates that observations that are similarly relevant have similar outcomes, in which case one should have more confidence in the prediction. A smaller value indicates that relevance does not line up with the outcomes, in which case one should view the prediction more cautiously.
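The comparison of alternative calibrations can be illustrated with a short sketch. It assumes that \rho is the Pearson correlation and that each calibration yields one weight per observation. The function name, the weight vectors, and the outcomes are hypothetical.

```python
import numpy as np

def fit(w, y):
    # Squared Pearson correlation between prediction weights and outcomes.
    return np.corrcoef(w, y)[0, 1] ** 2

# Hypothetical outcomes and two alternative sets of weights.
y = np.array([0.5, 1.0, 1.5, 3.0, 4.0])
w_calibration_a = np.array([0.05, 0.10, 0.15, 0.30, 0.40])  # tracks outcomes closely
w_calibration_b = np.array([0.30, 0.10, 0.25, 0.15, 0.20])  # weakly related to outcomes

fit_a = fit(w_calibration_a, y)
fit_b = fit(w_calibration_b, y)
print(fit_a, fit_b)
```

In this example the first calibration would be preferred for this prediction task, because its weights line up more closely with the outcomes.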

Adjusted Fit

Adjusted fit provides a measure of expected reliability for each grid cell's prediction. Adjusted fit recognizes the benefit of censoring non-relevant observations that obscure the patterns that exist among the relevant observations.

adjusted fit = K(fit + asymmetry)

Here K is the number of predictive variables. Scaling by K recognizes that reliance on a small number of predictive variables is more likely to produce a spurious result than reliance on a collection of many variables.

Variable Importance

Relevance-based prediction has a built-in measure of variable importance, which is called relevance-based importance. It measures the extent to which a predictive variable contributes to the reliability of a prediction. It is calculated as the average adjusted fit of the grid cells that include a given variable minus the average adjusted fit of the grid cells that omit that variable.

This measure of variable importance accounts for conditionality, which t-statistics fail to address. It closely resembles a Shapley value, but it accounts for an individual prediction's reliability in addition to reliability on average.
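The calculation described above can be sketched as follows. The representation of a grid cell as a dictionary, the variable names "a" and "b", and the fit and asymmetry values are all hypothetical. The sketch also assumes that K in the adjusted fit formula is the number of predictive variables in a cell.

```python
from statistics import mean

def adjusted_fit(k, fit, asymmetry):
    # adjusted fit = K(fit + asymmetry), with K assumed to be the
    # number of predictive variables used by the grid cell.
    return k * (fit + asymmetry)

def relevance_based_importance(cells, variable):
    # Average adjusted fit of cells that include the variable,
    # minus the average adjusted fit of cells that omit it.
    with_var = [c["adjusted_fit"] for c in cells if variable in c["variables"]]
    without_var = [c["adjusted_fit"] for c in cells if variable not in c["variables"]]
    return mean(with_var) - mean(without_var)

# Hypothetical grid of three cells over two variables.
cells = [
    {"variables": {"a"}, "fit": 0.10, "asymmetry": 0.02},
    {"variables": {"b"}, "fit": 0.04, "asymmetry": 0.01},
    {"variables": {"a", "b"}, "fit": 0.12, "asymmetry": 0.03},
]
for c in cells:
    c["adjusted_fit"] = adjusted_fit(len(c["variables"]), c["fit"], c["asymmetry"])

importance_a = relevance_based_importance(cells, "a")
importance_b = relevance_based_importance(cells, "b")
print(importance_a, importance_b)
```

The sketch requires at least one cell that includes the variable and at least one that omits it. Otherwise `statistics.mean` raises an error on the empty list.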

Outlier Influence

The fit of observations with themselves. It is always greater than zero, owing to the inherent bias of comparing observations with themselves, and it is large to the extent that unusual circumstances coincide with unusual outcomes.

Agreement

The fit of observations with their peers. It may be positive, negative, or zero, and is not systematically biased.
