Inria Lille, SequeL Team
Wednesday, September 29th, 2021
We aim to maximise a target function
based on a sequence of function values...
A/B/C Testing
(Source: Julie Paci)
Design of clinical trials
(Source: cbinsights.com)
Hyper-parameter optimization (HPO)
(Source: Ramraj Chandradevan)
A bandit model is a collection of arm distributions \(\nu_1,\dots,\nu_K\) with unknown mean vector \(\boldsymbol{\mu} = (\mu_1,\dots,\mu_K)\)
Learning protocol: at each round \(t\), the learner chooses an arm \(I_t\) to play and observes a stochastic/noisy reward \(r_{I_t}\sim\nu_{I_t}\) drawn from the corresponding underlying distribution.
$$\mu^\star = \textcolor{red}{\max}_i \mu_i$$
$$I^\star = \textcolor{red}{\operatorname{arg\,max}}_i \mu_i$$
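For concreteness, a minimal sketch of this interaction protocol on a hypothetical Bernoulli instance (the uniform sampling rule is only a placeholder, not one of the algorithms discussed here):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.45, 0.6, 0.5])      # hypothetical arm means, so I* = 1
K = len(mu)

counts, sums = np.zeros(K), np.zeros(K)
for t in range(1000):
    I_t = rng.integers(K)            # placeholder sampling rule (uniform)
    r = rng.binomial(1, mu[I_t])     # noisy reward r_{I_t} ~ nu_{I_t}
    counts[I_t] += 1
    sums[I_t] += r

print(np.argmax(sums / counts))      # empirical guess of I* = argmax_i mu_i
```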
Formally, a BAI algorithm is composed of a sampling rule \((I_t)\), a stopping rule \(\tau_\delta\), and a recommendation rule \(J_t\).
We are interested in Top-Two Thompson Sampling (Russo, 2016)
Why?
What is known about TTTS... (posterior convergence)
Theorem (Russo, 2016) Under TTTS, and under some boundedness assumptions on the prior and the parameter space, it holds almost surely that \[ \lim_{n\rightarrow{\infty}} -\frac{1}{n}\log(1-\alpha_{n,I^\star}) = \textcolor{lightskyblue}{\frac{1}{T_{\beta}^\star(\boldsymbol{\mu})}}, \] where \[ \alpha_{n,i} \triangleq \Pi_{n}(\theta_i \gt \max_{j\neq i}\theta_j). \]
What is known about TTTS... (characteristic time)
Definition Let \(\Sigma_K = \{\omega : \sum_{k=1}^K \omega_k = 1, \omega_k \geq 0\}\) and define for all \(i\neq I^\star\) \[ C_i(\omega,\omega') \triangleq \min_{x\in\mathcal{I}} \omega d(\mu_{I^\star};x) + \omega' d(\mu_i;x), \] where \(d(\mu,\mu')\) is the KL-divergence. We define \[ \textcolor{lightskyblue}{T_{\beta}^\star(\boldsymbol{\mu})^{-1}} \triangleq \max_{\substack{\omega \in \Sigma_K\\\omega_{I^\star}=\beta}}\min_{i\neq I^\star} C_i(\omega_{I^\star},\omega_i). \]
In particular, for Gaussian bandits... \[ \textcolor{lightskyblue}{T_{\beta}^\star(\boldsymbol{\mu})^{-1}} = \max_{\omega:\omega_{I^\star}=\beta}\min_{i\neq I^\star} \frac{(\mu_{I^\star}-\mu_i)^2}{2\sigma^2(1/\omega_i+1/\beta)}. \]
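This quantity can be computed numerically: when all means are distinct, the optimal allocation equalizes the costs \(C_i\), so one can bisect on their common value. A minimal sketch for the Gaussian case, assuming a known common variance \(\sigma^2\) (the function name and the bisection scheme are ours):

```python
import numpy as np

def gaussian_char_time_inv(mu, beta=0.5, sigma=1.0, iters=100):
    """Evaluate T_beta^*(mu)^{-1} for a Gaussian bandit with variance sigma^2.

    If all costs C_i equal a common value c, then 1/omega_i = Delta_i^2 / (2 sigma^2 c) - 1/beta;
    we bisect on c until the suboptimal weights omega_i sum to 1 - beta.
    """
    mu = np.asarray(mu, dtype=float)
    i_star = int(np.argmax(mu))
    gaps = np.delete(mu[i_star] - mu, i_star)                  # Delta_i for i != I*
    lo, hi = 0.0, beta * np.min(gaps) ** 2 / (2 * sigma ** 2)  # omega_i > 0 forces c < hi
    for _ in range(iters):
        c = (lo + hi) / 2
        omega = 1.0 / (gaps ** 2 / (2 * sigma ** 2 * c) - 1.0 / beta)
        lo, hi = (c, hi) if omega.sum() < 1.0 - beta else (lo, c)
    return c

# Example: characteristic time of a 3-armed instance with beta = 1/2
print(1 / gaussian_char_time_inv([1.0, 0.5, 0.3]))
```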
Limitations of TTTS
Sample complexity
Theorem 3 (AISTATS 20) The TTTS sampling rule, coupled with the Chernoff stopping rule, forms a \(\delta\)-correct BAI strategy. If all arm means are distinct, \[ \limsup_{\delta\rightarrow{0}} \frac{\mathbb{E}[\tau_{\delta}]}{\log(1/\delta)} \leq \textcolor{lightskyblue}{T_{\beta}^\star(\boldsymbol{\mu})}. \]
Lower bound (Garivier and Kaufmann, 2016): \[\liminf_{\delta \rightarrow 0}\frac{\mathbb{E}[\tau_\delta]}{\log(1/\delta)} \geq \textcolor{lightskyblue}{T_{\beta}^\star(\boldsymbol{\mu})}.\]
\(\delta\)-correctness
Stopping rule: \[ \tau_\delta^{\text{Ch.}} \triangleq \inf \left\lbrace n \in \mathbb{N} : \max_{i \in \mathcal{A}} \min_{j \in \mathcal{A} \setminus \{i\} } \textcolor{lightskyblue}{W_{n}(i,j)} \gt d_{n,\delta} \right\rbrace. \]
Transportation cost: let \(\mu_{n,i,j} \triangleq (T_{n,i}\mu_{n,i} +T_{n,j}\mu_{n,j})/(T_{n,i}+T_{n,j})\); we then define \[ \textcolor{lightskyblue}{W_n(i,j)} \triangleq \left\{ \begin{array}{ll} 0 & \text{if } \mu_{n,j} \geq \mu_{n,i},\\ W_{n,i,j}+W_{n,j,i} & \text{otherwise}, \end{array}\right. \] where \(W_{n,i,j} \triangleq T_{n,i}\, d\left(\mu_{n,i},\mu_{n,i,j}\right)\) for any \(i,j\).
In particular, for Gaussian bandits: \[ \textcolor{lightskyblue}{W_n(i,j)} = \dfrac{(\mu_{n,i}-\mu_{n,j})^2}{2\sigma^2(1/T_{n,i}+1/T_{n,j})}\mathbb{1}\{\mu_{n,j}<\mu_{n,i}\}. \]
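A minimal sketch of this stopping test for Gaussian bandits (function name ours); since \(W_n(i,j)=0\) whenever \(\mu_{n,j}\geq\mu_{n,i}\), the outer maximum is attained at the empirical best arm, so only that candidate needs to be checked:

```python
import numpy as np

def chernoff_stop(means, counts, sigma, threshold):
    """Return True when max_i min_{j != i} W_n(i, j) > d_{n,delta} (Gaussian case).

    means[i], counts[i]: empirical mean mu_{n,i} and number of pulls T_{n,i} at round n.
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    best = int(np.argmax(means))          # only the empirical best can attain the outer max
    w_min = np.inf
    for j in range(len(means)):
        if j == best:
            continue
        gap = means[best] - means[j]
        w = gap ** 2 / (2 * sigma ** 2 * (1 / counts[best] + 1 / counts[j])) if gap > 0 else 0.0
        w_min = min(w_min, w)
    return w_min > threshold
```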
Theorem (AISTATS 20) The TTTS sampling rule, coupled with the Chernoff stopping rule with a threshold \(d_{n,\delta} \simeq \log(1/\delta) + c\log(\log(n))\) and the recommendation rule \(J_t = \operatorname{arg\,max}_i \mu_{n,i}\), forms a \(\delta\)-correct BAI strategy.
Bayesian stopping rule: \[\tau_{\delta} \triangleq \inf \left\{ n\in\mathbb{N}:\max_{i\in\mathcal{A}} \textcolor{limegreen}{\alpha_{n,i}} \geq c_{n,\delta} \right\}\,.\]
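The quantities \(\alpha_{n,i}\) have no closed form in general, but they can be estimated by Monte Carlo over posterior samples. A minimal sketch, assuming Bernoulli rewards with Beta posteriors (function name ours):

```python
import numpy as np

def posterior_best_probs(posteriors, n_samples=10_000, rng=None):
    """Estimate alpha_{n,i} = Pi_n(theta_i > max_{j != i} theta_j) for each arm i.

    posteriors: list of (a, b) Beta parameters, one per arm.
    The Bayesian stopping rule fires once max_i alpha_{n,i} >= c_{n,delta}.
    """
    rng = rng or np.random.default_rng()
    draws = np.column_stack([rng.beta(a, b, n_samples) for (a, b) in posteriors])
    winners = np.argmax(draws, axis=1)
    return np.bincount(winners, minlength=draws.shape[1]) / n_samples

# Example: three arms after a few observations
print(posterior_best_probs([(6, 4), (3, 7), (5, 5)]))
```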
Sufficient condition for \(\beta\)-optimality
Lemma (AISTATS 20) Let \(\delta,\beta\in (0,1)\). For any sampling rule that satisfies \(\mathbb{E}[\textcolor{lightskyblue}{T_{\beta}^\epsilon}] \lt \infty\) for all \(\epsilon \gt 0\), coupling it with the Chernoff stopping rule yields \[ \limsup_{\delta\rightarrow{0}} \frac{\mathbb{E}[\tau_{\delta}]}{\log(1/\delta)} \leq T_{\beta}^\star(\boldsymbol{\mu}). \]
\(\textcolor{lightskyblue}{T_{\beta}^\epsilon} \triangleq \inf \left\{ N\in\mathbb{N}: \max_{i\in\mathcal{A}} \vert T_{n,i}/n-\omega_i^\beta \vert \leq \epsilon, \forall n \geq N \right\}\).
Core theorem
Theorem (AISTATS 20) Under TTTS, \(\mathbb{E}[\textcolor{lightskyblue}{T_{\beta}^\epsilon}] \lt +\infty\).
TTTS
T3C (Top-Two Transportation Cost)
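To make the two sampling rules concrete, a minimal sketch of one round of each, assuming a Gaussian bandit with known variance and Gaussian posteriors \(\mathcal{N}(\mu_{n,i}, \sigma^2/T_{n,i})\); both pick the same Thompson-sample leader, but TTTS finds its challenger by re-sampling whereas T3C uses the transportation cost \(W_n\) defined above (function names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def leader(means, counts, sigma):
    """Arg-max of one Thompson draw from the posteriors N(mu_{n,i}, sigma^2 / T_{n,i})."""
    return int(np.argmax(rng.normal(means, sigma / np.sqrt(counts))))

def ttts_arm(means, counts, sigma=1.0, beta=0.5):
    """TTTS: play the leader w.p. beta; otherwise re-sample until a different leader appears."""
    b = leader(means, counts, sigma)
    if rng.random() < beta:
        return b
    c = b
    while c == b:                     # can require many re-samples once posteriors concentrate
        c = leader(means, counts, sigma)
    return c

def t3c_arm(means, counts, sigma=1.0, beta=0.5):
    """T3C: same leader; the challenger minimizes the transportation cost W_n(b, j)."""
    b = leader(means, counts, sigma)
    if rng.random() < beta:
        return b
    W = [np.inf if j == b else
         max(means[b] - means[j], 0.0) ** 2 / (2 * sigma ** 2 * (1 / counts[b] + 1 / counts[j]))
         for j in range(len(means))]
    return int(np.argmin(W))
```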
Sample complexity
Theorem (AISTATS 20) The T3C sampling rule, coupled with the Chernoff stopping rule, forms a \(\delta\)-correct BAI strategy. If all arm means are distinct, \[ \limsup_{\delta\rightarrow{0}} \frac{\mathbb{E}[\tau_{\delta}]}{\log(1/\delta)} \leq \textcolor{lightskyblue}{T_{\beta}^\star(\boldsymbol{\mu})}. \]
Time consumption
| Sampling rule | T3C | TTTS | Uniform |
|---|---|---|---|
| Exec. time (s) | \(1.6 \times 10^{-5}\) | \(2.3 \times 10^{-4}\) | \(6.0 \times 10^{-6}\) |
Average stopping time (Bernoulli)
Average stopping time (Gaussian)
Common assumptions
Linear estimator
Definition For any arm \(\boldsymbol{x}\in\mathcal{X}\), we define its alternative set, denoted by \(\operatorname{Alt}(\boldsymbol{x})\), as the set of parameters under which \(\boldsymbol{x}\) is not the best arm, i.e. \[ \operatorname{Alt}(\boldsymbol{x}) \triangleq \left\{ \boldsymbol{\theta}' : \exists \boldsymbol{x}'\neq\boldsymbol{x},\ \langle \boldsymbol{\theta}', \boldsymbol{x}'-\boldsymbol{x}\rangle \gt 0 \right\}. \]
It can fail...
A hard problem instance for linear best-arm identification
Ordering (ICML 20)
\(T^\star(\boldsymbol{\theta})^{-1}\) can be regarded as the value of a zero-sum game between a learner playing an action \(\boldsymbol{x}\) according to a weight vector \(\boldsymbol{\omega}\) and nature playing an alternative parameter \(\boldsymbol{\theta}'\):
This leads to the LinGame algorithm, which is asymptotically optimal... (ICML 20)
Ensures an \(\epsilon\)-approximation of the saddle point of the lower-bound game.
The usual hard instance (with different \(\delta\))
Random unit sphere (with different dimension)
Continuum-armed bandits: arms lie in some metric space and the mean-reward function \(f\) satisfies a regularity assumption
Infinitely-armed bandits (Berry et al., 1997; Wang et al., 2008; Bonald and Proutière, 2013; Carpentier and Valko, 2015): arms drawn from some reservoir distribution
We tackle hyper-parameter tuning for supervised learning tasks
We see the problem as BAI in a stochastic infinitely-armed bandit: arm means are drawn from some reservoir distribution \(\nu_0\)
Goal: output an arm with mean close to \(\mu^\star\)
HPO as a BAI problem
| BAI | HPO |
|---|---|
| query \(\nu_0\) | pick a new configuration \(\textcolor{yellow}{\lambda}\) |
| sample an arm | train the classifier |
| reward | cross-validation loss |
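A minimal sketch of this correspondence (an assumed scikit-learn setup with an SVM on a built-in dataset; here the observed reward is the cross-validation accuracy, i.e. one minus the loss):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)

def query_reservoir():
    """Query nu_0: pick a new configuration lambda (log-uniform C and gamma)."""
    return {"C": 10 ** rng.uniform(-5, 5), "gamma": 10 ** rng.uniform(-5, 5)}

def sample_arm(lam):
    """Sample the arm: train the classifier and observe a reward (cross-validation accuracy)."""
    return cross_val_score(SVC(**lam), X, y, cv=3).mean()

lam = query_reservoir()
print(lam, sample_arm(lam))
```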
Dynamic Top-Two Thompson Sampling (ICML-AutoML 20)
In summary: in each round, query a new arm endowed with a Beta(1,1) prior, without sampling it, and run TTTS on the new set of arms
Order statistic trick
With \(\mathcal{L}_{t-1}\) the list of arms that have actually been sampled before round \(t\), we run TTTS on the set \(\mathcal{L}_{t-1} \cup \{\mu_0\}\), where \(\mu_0\) is a pseudo-arm with posterior Beta(\(t-|\mathcal{L}_{t-1}|, 1\)).
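Why the pseudo-arm works: the \(t-|\mathcal{L}_{t-1}|\) queried-but-never-sampled arms all carry a Beta(1,1) posterior, and the maximum of that many independent Uniform(0,1) draws follows a Beta(\(t-|\mathcal{L}_{t-1}|, 1\)) distribution, so a single pseudo-arm reproduces their contribution to the Thompson draw. A minimal sketch (function name ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_draw_with_pseudo_arm(posteriors, t):
    """One Thompson draw for Dynamic TTTS using the order-statistic trick.

    posteriors: list of (a, b) Beta parameters for the arms in L_{t-1}.
    The pseudo-arm mu_0 replaces the t - |L_{t-1}| arms that were queried but
    never sampled (all Beta(1, 1)): the max of their draws ~ Beta(t - |L_{t-1}|, 1).
    """
    theta = [rng.beta(a, b) for (a, b) in posteriors]
    theta.append(rng.beta(t - len(posteriors), 1))   # pseudo-arm mu_0
    return int(np.argmax(theta)), theta              # leader index and all draws
```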
Setting
| Classifier | Hyper-parameter | Type | Bounds |
|---|---|---|---|
| SVM | C | \(\mathbb{R}^+\) | \(\left[ 10^{-5}, 10^{5} \right]\) |
| | \(\gamma\) | \(\mathbb{R}^+\) | \(\left[ 10^{-5}, 10^{5} \right]\) |
| MLP | hidden layer size | Integer | \(\left[ 5, 50 \right]\) |
| | \(\alpha\) | \(\mathbb{R}^+\) | \(\left[ 0, 0.9 \right]\) |
| | learning rate init | \(\mathbb{R}^+\) | \(\left[ 10^{-5}, 10^{-1} \right]\) |
Some results (I)
Some results (II)