Abstract

This paper focuses on the empirical derivation of regret bounds for mobile systems that can optimize their locations in real time within a spatiotemporally varying renewable energy resource. The case studies in this paper focus on an airborne wind energy system, in which the replacement of towers with tethers and a lifting body allows the system to adjust its altitude continuously, with the goal of operating at the altitude that maximizes net power production. While prior publications have proposed control strategies for this problem, often with favorable results based on simulations that use real wind data, they lack any theoretical or statistical performance guarantees. In this work, we make use of a very large synthetic dataset, generated from statistical parameters identified from real wind data, to derive probabilistic bounds on the difference between optimal and actual performance, termed regret. The results are presented for a variety of control strategies, including maximum probability of improvement, upper confidence bound, greedy, and constant-altitude approaches. In addition, we use dimensional analysis to generalize the aforementioned results to other spatiotemporally varying environments, making the results applicable to a wider variety of renewably powered mobile systems. Finally, to deal with more general environmental mean models, we introduce a novel approach that modifies calculable regret bounds to accommodate any mean model through what we term an "effective spatial domain."

1 Introduction

Spatiotemporally varying environments are common in mobile robotics (see Refs. [1] and [2]), unmanned aerial vehicles (see Ref. [3]), and the emerging field of renewably powered robotic systems (see Refs. [4] and [5]). In these instances, the objective is to utilize one or more mobile robots to perform a mission in an unknown, spatiotemporally varying environment. Within the renewable energy community, mobile systems can also be used as energy-harvesting devices in spatiotemporally varying environments. In this work, we focus on one such system that possesses mobility to explore and exploit its spatiotemporally varying environment: an airborne wind energy system. Airborne wind energy (AWE) systems, such as the Altaeros Buoyant Airborne Turbine [6], replace conventional towers with tethers and a lifting body (wing or aerostat) that holds the airborne generator aloft (Fig. 1). These systems are effective due to their ability to reach high altitudes, where wind speeds can be much higher than those typically seen by ground-mounted turbines. In addition, AWE systems can adjust their operating altitudes and flight configurations. This capability can generally be broken down into two operational modes: crosswind flight, whereby AWE systems fly in optimized high-speed motions perpendicular to the wind, and altitude optimization. Both lead to challenges involving control in a partially observable, spatiotemporally varying environment, but in this paper we focus on altitude optimization. Altitude optimization involves an important tradeoff between exploration and exploitation, which is common to mobile systems operating in spatiotemporally varying environments. Specifically, because the wind speed is unknown at all altitudes besides the current altitude of the AWE system, the controller must balance exploring other altitudes to search for better wind conditions (for more power generation later) against exploiting the current estimate of the maximum wind speed (for more power generation now).

Fig. 1 Altaeros buoyant airborne turbine [6]

A large body of research has examined different control strategies for balancing exploration and exploitation in the context of AWE systems (see Refs. [7–10]). Each of these results has demonstrated, based on simulations driven by real data, that appropriately designed altitude optimization algorithms can significantly outperform omniscient, constant-altitude flight.

An examination of the aforementioned results reveals an expansive set of data-driven simulation results, but a lack of theoretical or statistical performance guarantees. It is intuitive to focus on the difference between optimal and realized performance, termed regret, in formulating such guarantees. In fact, a number of papers investigate the formulation of regret bounds in classical bandit problems [11–13]. These problems are often explained via a slot machine (i.e., bandit) analogy, in which a gambler must, at every instance, select one of many arms to pull, where each arm yields a reward drawn from some stochastic distribution. In the context of our problem, the gambler is the mobile (AWE) system, the arms are spatial locations (altitudes), and the rewards are performance outputs (power output or wind speed). Additionally, a few papers investigate the use of Gaussian process modeling to determine regret bounds in bandit problems [14,15]. Most of these papers specify bounds in terms of "big O" notation as functions of time, which does not yield calculable bounds that can be used to compare different control strategies in practice. The papers that do provide calculable bounds assume stationary rewards, which do not account for temporally varying environments. This paper will focus on using statistical analyses to provide meaningful bounds on regret that are designed for temporally varying environments.

This paper first provides an empirical quantification of regret using real-time altitude setpoint optimization of an AWE system as a case study. We use wind speed versus altitude data to identify the statistical hyperparameters of a Gaussian process (GP) model, which are used to generate a synthetic wind dataset. For several altitude control strategies, including maximum probability of improvement (MPI), upper confidence bound (UCB), greedy, and baseline constant altitude strategies, we compute statistical regret bounds using a Monte Carlo simulation setup. In particular, we generate empirical cumulative distribution functions for regret under each control strategy. This represents the first rigorous effort at "scoring" each strategy. Because prior experimentally validated efforts have already addressed lower-level tracking of altitude setpoints (see Ref. [16]), the content of this paper focuses exclusively on the higher-level real-time altitude setpoint optimization.

These contributions represent an extended version of our recent conference publication [17]. We focus in this paper on extending our earlier work to generalize statistical regret bounds to environments characterized by different statistical parameters. We first use dimensional analysis to show that under a constant mean model, regret bounds can be expressed in terms of a small number of dimensionless parameters. To lift the constant mean assumption, we introduce the novel concept of an “effective spatial domain” size, which we show to be useful in generalizing regret bounds to spatiotemporally varying environments with different, nonconstant mean characteristics.

1.1 Notation.

In this paper, $\Pr[A]$ represents the probability of event $A$ occurring, $E[B]$ represents the expected value of random variable $B$, and $\mathcal{N}(C, D)$ represents a normal distribution with mean $C$ and variance $D$.

2 Mathematical Preliminaries

In this paper, the variable $z(T)$ will be used to represent the spatial decision variable (the chosen altitude in the AWE application), and $v(z(T))$ will be used to represent the performance (wind speed as a surrogate for power generation) at the chosen $z$ for a given time-step $T$. In a more general setting, $z(T)$ will be a vector for multidimensional decision variables.

To quantify performance, we work with a quantity known as regret. Put simply, regret is the difference in performance between the optimal strategy given perfect knowledge and the strategy chosen by a controller based on imperfect knowledge. Mathematically
$r(T) = v(z^*(T)) - v(z(T))$
(1)
where r(T) is the instantaneous regret, v(·) is the performance function evaluated at a location, z(T) is the location chosen by the control strategy, and z*(T) is the maximizer of the performance function. In most instances, cumulative regret and average regret are more informative statistics than instantaneous regret. These quantities are given by
$R(T) = \sum_{\bar{T}=1}^{T} r(\bar{T}) = \sum_{\bar{T}=1}^{T} \left[ v(z^*(\bar{T})) - v(z(\bar{T})) \right]$
(2)
$R_{\mathrm{avg}}(T) = \dfrac{R(T)}{T}$
(3)

where $R(T)$ represents cumulative regret and $R_{\mathrm{avg}}(T)$ represents average regret. These metrics are especially useful because it is often beneficial to trade off some instantaneous performance for improved future performance that results from better exploration of the environment. This tradeoff is often referred to as exploration versus exploitation [18].
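For concreteness, these regret metrics can be computed directly from logged performance traces. A minimal Python sketch (the function and variable names are our own illustration, not from the original study):

```python
import numpy as np

def regret_metrics(v_opt, v_chosen):
    """Regret metrics from Eqs. (1)-(3).

    v_opt:    v(z*(T)) at each time-step (omniscient-optimal performance)
    v_chosen: v(z(T)) at each time-step (controller-achieved performance)
    """
    r = np.asarray(v_opt) - np.asarray(v_chosen)  # instantaneous regret, Eq. (1)
    R = np.cumsum(r)                              # cumulative regret, Eq. (2)
    R_avg = R / np.arange(1, len(r) + 1)          # average regret, Eq. (3)
    return r, R, R_avg
```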

Having defined regret, we turn to the quantification of regret bounds. Because the spatiotemporally varying environment is stochastic, there always exists a chance that any control strategy will yield zero reward at any time. Consequently, it is impossible to derive an upper bound on regret (other than a trivial zero-reward bound) with 100% confidence. To circumvent this issue, we examine regret bounds in a probabilistic sense as encoded in the following axiom:

Axiom 1. For every $\varepsilon \in (0,1)$, there exist $\tau > 0$ and $\delta(\varepsilon) > 0$ such that
$\Pr[r(T) \leq \delta(\varepsilon)] = 1 - \varepsilon, \quad \forall T \geq \tau$
(4)
This relationship can also be expressed in terms of average regret
$\Pr[R_{\mathrm{avg}}(T) \leq \bar{\delta}(\bar{\varepsilon})] = 1 - \bar{\varepsilon}, \quad \forall T \geq \tau$
(5)

In words, this axiom states that there exists some upper bound $\delta$ on regret that holds with probability $1 - \varepsilon$ (after some number of initial time steps $\tau$).

A key contribution of this paper lies in the estimation of $\delta(\varepsilon)$ (and $\bar{\delta}(\bar{\varepsilon})$ in the case of average regret) for different control strategies; the regret bound acts as a scoring mechanism for each strategy. It is worth noting that the relationship between $\delta$ and $1 - \varepsilon$ is equivalent to the cumulative distribution function for regret. Two sample relationships between $\delta$ and $\varepsilon$ are illustrated in Fig. 2. By definition, $\varepsilon$ is bounded between 0 and 1. Given that low regret is desirable, an $\varepsilon$–$\delta$ curve closer to the $\varepsilon$-axis describes a superior control strategy.
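Since the $\delta$ versus $1 - \varepsilon$ relationship is exactly the cumulative distribution function of regret, $\delta(\varepsilon)$ can be estimated from Monte Carlo regret samples as an empirical quantile. A minimal sketch, assuming a one-dimensional array of sampled regret values:

```python
import numpy as np

def delta_of_eps(regret_samples, eps):
    """Empirical regret bound delta(eps) such that
    Pr[regret <= delta] = 1 - eps (Eqs. (4) and (5)).
    delta(eps) is the (1 - eps)-quantile of the sampled regret."""
    return np.quantile(regret_samples, 1.0 - np.asarray(eps))

# Example: the bound that holds with probability 0.95 (eps = 0.05):
# delta_95 = delta_of_eps(avg_regret_samples, 0.05)
```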

Fig. 2 Expected shape of the ε–δ curve

2.1 Airborne Wind Energy Case Study.

For our AWE case study, regret represents the difference between the optimal value of the wind speed given omniscient knowledge and the achieved value of the wind speed under a given control algorithm. A regret bound in this case represents a measure of how close the performance of a given algorithm would be to the optimal omniscient performance in a stochastic environment. The goal of this paper is to define methods for calculating the probabilities with which a variety of regret bounds will hold, resulting in figures similar to Fig. 2.

3 Control Architectures

Ultimately, each candidate control strategy must make a decision (operating altitude in the case of the AWE system) based on available data. In all candidate strategies, the chosen spatial location is a function of the prediction mean and prediction variance, based on the data at spatial locations that have been visited up to that time. In this work, we use GP modeling with a squared exponential covariance kernel and a power law wind shear model as the mean function to compute the prediction mean and variance. Readers are referred to Ref. [19] for full details of GP modeling and to Ref. [17] for full details of the AWE application. For the AWE application, the hyperparameters used in the GP model were identified based on one year of training data from a 900-MHz Doppler wind profiler deployed in Lewes, Delaware [20], using the methodology from Ref. [19], as
$l_t = 22\ \mathrm{min}, \quad l_z = 270\ \mathrm{m}, \quad \sigma = 5.1\ \mathrm{m/s}, \quad a = 3.77, \quad b = 0.14$
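For reference, a minimal sketch of the GP prediction step is given below. It assumes a squared-exponential kernel that is separable in altitude and time with the length scales above, and a power-law mean of the assumed form $m(z) = a z^b$; the exact kernel and mean parameterizations are detailed in Refs. [17] and [19].

```python
import numpy as np

l_t, l_z, sigma = 22.0, 270.0, 5.1   # identified hyperparameters (min, m, m/s)
a, b = 3.77, 0.14                    # power-law mean parameters (assumed form)

def mean_fn(z):
    # Assumed power-law wind shear mean, m(z) = a * z**b
    return a * z**b

def se_kernel(X1, X2):
    # Squared-exponential covariance over (altitude, time) inputs;
    # column 0 is altitude (m), column 1 is time (min)
    dz = (X1[:, None, 0] - X2[None, :, 0]) / l_z
    dt = (X1[:, None, 1] - X2[None, :, 1]) / l_t
    return sigma**2 * np.exp(-0.5 * (dz**2 + dt**2))

def gp_predict(X_obs, y_obs, X_new, jitter=1e-6):
    # Standard GP posterior mean and variance (Ref. [19])
    K = se_kernel(X_obs, X_obs) + jitter * np.eye(len(X_obs))
    Ks = se_kernel(X_new, X_obs)
    alpha = np.linalg.solve(K, y_obs - mean_fn(X_obs[:, 0]))
    mu = mean_fn(X_new[:, 0]) + Ks @ alpha
    var = sigma**2 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, var
```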

3.1 Candidate Control Strategies.

We consider three strategies for controlling the system's spatial location: MPI, UCB, and a "greedy" algorithm, with full details in Ref. [17]. These three control strategies are used often in the literature for similar applications (see, for example, Refs. [11,21,22]). We also introduce two benchmark strategies that maintain a constant spatial location (altitude, in the case of the AWE application), defined as "fixed location—omniscient" (FLO) and "fixed location—average" (FLA). FLA uses the constant location that results in mean performance and should thus be beaten by any reasonable control strategy. FLO uses the constant location that maximizes performance and is potentially beatable due to the temporal variation of the reward function. Both FLA and FLO require omniscient knowledge.
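The following sketch illustrates one common form of each acquisition rule; the constants $\beta$ and $\xi$ and the exact parameterizations here are illustrative assumptions rather than the tunings of Ref. [17]. Each rule maps the GP prediction mean and variance over candidate altitudes to a score, and the next setpoint is the score maximizer:

```python
import numpy as np
from scipy.stats import norm

def ucb(mu, var, beta=2.0):
    # Upper confidence bound: mean plus an exploration bonus
    return mu + beta * np.sqrt(var)

def mpi(mu, var, v_best, xi=0.01):
    # Probability of improving on the best value observed so far
    # (assumes var > 0 at all candidate locations)
    return norm.cdf((mu - v_best - xi) / np.sqrt(var))

def greedy(mu, var):
    # Greedy: prediction mean only, no exploration incentive
    return mu

def next_setpoint(z_grid, score):
    # Next altitude setpoint is the candidate maximizing the acquisition score
    return z_grid[np.argmax(score)]
```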

Figure 3 shows sample altitude trajectories followed by the AWE system under an omniscient optimal control strategy (where full spatiotemporal knowledge of the wind is assumed, thereby representing an upper bound on attainable performance) in the top plot, along with the UCB, MPI, and greedy strategies in the bottom three plots, respectively. The contours represent the on-board estimates of wind speed versus altitude, which in the case of the bottom three plots are based on a partially observable wind field. It can be seen that the greedy approach, which does not incentivize exploration, maintains the worst estimate of the wind profile.

Fig. 3 Comparison of several on-board models. The red line is a sample trajectory chosen by each algorithm. The contour plot represents the on-board Gaussian process estimate of the wind speed. The top panel's contour plot visualizes the actual wind speed.

4 Generating Synthetic Data

Although data are available for certain applications, including wind speed versus altitude data for the AWE application, the quantity of available real data is not sufficient to run large-scale Monte Carlo simulations of competing algorithms, and real data alone do not lend insight into how well the regret bound estimates will extrapolate to other applications for which data are not available. To generate a vast amount of data with customizable spatial and temporal qualities, we leverage existing data to extract spatiotemporal statistical parameters, then generate a very large synthetic dataset that adheres to these identified parameters.

Using the identified covariance kernel parameters, synthetic data are generated by applying a coloring filter to Gaussian white noise so that the covariance matrix of the generated data matches that of the initial training data. This is done by using the Cholesky decomposition of the desired covariance matrix as a coloring filter, according to the following process. It is important to note that white noise is used as a starting point in the process to capture the variance in the environmental variable (wind speed in the AWE example), not to reflect noise in the measurements available to the control system. The coloring filters are used to capture spatial and temporal correlation.

First, the square matrix $L \in \mathbb{R}^{n \times n}$ (where $n$ is the length of the coordinate vector $x$) is calculated from the Cholesky decomposition of the covariance matrix. Here, $x$ is a vector (or matrix) spanning the (possibly multidimensional) domain of the desired synthetic data, and $K(x, x)$ is the associated square covariance matrix between all points in $x$
$L L^T = K(x, x)$
(6)
Next, a white noise signal $N \in \mathbb{R}^n$ with the desired variance of the synthetic data ($\sigma^2$) is premultiplied by $L$ to correlate the noise according to the desired covariance matrix
$N \sim \mathcal{N}(0, \sigma^2)$
(7)
$v_N = L N$
(8)
Finally, this filtered noise signal $v_N$ is overlaid on the mean model $m(x)$ to obtain a set of synthetic data that can be used for simulation, $v_{\mathrm{syn}}$
$v_{\mathrm{syn}}(x) = m(x) + v_N(x)$
(9)
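A minimal sketch of this coloring procedure is shown below. Note one convention choice: Eq. (7) draws white noise with variance $\sigma^2$ against a unit-normalized kernel, whereas the equivalent sketch here draws unit-variance noise and lets the covariance function carry the $\sigma^2$ scale:

```python
import numpy as np

def synthetic_data(x, mean_fn, cov_fn, rng=None):
    """Synthetic data generation per Eqs. (6)-(9): color white noise with the
    Cholesky factor of the desired covariance, then overlay the mean model."""
    rng = rng or np.random.default_rng()
    K = cov_fn(x, x)                                    # desired covariance K(x, x)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(x)))  # Eq. (6), jitter for stability
    N = rng.standard_normal(len(x))                     # white noise, Eq. (7)
    v_N = L @ N                                         # colored noise, Eq. (8)
    return mean_fn(x) + v_N                             # synthetic data, Eq. (9)
```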

5 Dimensional Analysis for Extrapolation to Other Environments

One goal of determining regret bounds is to extrapolate those bounds to environments with different sets of hyperparameters, under a given controller. Here, we consider this extrapolation in the case of constant mean environments, whereas Sec. 6 introduces an additional variable that allows for the generalization to variable mean environments. To start this process, we begin with the following Proposition, which follows from the Buckingham Pi Theorem [23]:

Proposition 1. By the Buckingham Pi Theorem, the interdependence of the variables in Table 1 (which describe the system under consideration in this work, under a given controller) can be modeled as in Eqs. (10) and (11)
$\Pi_1 = f(\Pi_2, \Pi_3, \Pi_4)$
(10)
$\dfrac{E(R_{\mathrm{avg}}(T))\, l_t}{l_z} = f\!\left(\dfrac{\sigma l_t}{l_z},\ \dfrac{\delta_t}{l_t},\ \dfrac{z_{\max} - z_{\min}}{l_z}\right)$
(11)

where $\Pi_1, \Pi_2, \Pi_3, \Pi_4$ are given in Table 2.

Table 1

Control and environmental variables and associated units

Variable : Dimensions
$E(R_{\mathrm{avg}}(T))$ : m/s
$\sigma$ : m/s
$\delta_t$ : s
$z_{\max} - z_{\min}$ : m
$l_t$ : s
$l_z$ : m

Note that $E(\cdot)$ denotes an expectation, which is required for dimensional analysis, since $R_{\mathrm{avg}}(T)$ is a random variable, whereas the other five variables are deterministic.

Table 2

Pi groups

Pi group : Definition
$\Pi_1$ : $E(R_{\mathrm{avg}}(T))\, l_t / l_z$
$\Pi_2$ : $\sigma\, l_t / l_z$
$\Pi_3$ : $\delta_t / l_t$
$\Pi_4$ : $(z_{\max} - z_{\min}) / l_z$

Proof. Under a given controller, the system under consideration in this work involves $n = 6$ variables, namely, the expected average regret ($E(R_{\mathrm{avg}}(T))$), the size of the spatial domain ($z_{\max} - z_{\min}$), the control time-step ($\delta_t$), and the environmental hyperparameters ($\sigma$, $l_t$, and $l_z$). Across these variables, we have $k = 2$ fundamental dimensions, namely, length and time (see Table 1).

From the Buckingham Pi Theorem, the number of nondimensional quantities, $\Pi_i$, in the governing equation is $p = n - k = 4$. Therefore, there exists a function $g(\Pi_1, \Pi_2, \Pi_3, \Pi_4)$ for which:
$g(\Pi_1, \Pi_2, \Pi_3, \Pi_4) = 0$
(12)

By rearranging, we get Eq. (10). By taking $l_t$ and $l_z$ as fundamental quantities and $E(R_{\mathrm{avg}}(T))$, $z_{\max} - z_{\min}$, $\delta_t$, and $\sigma$ as derived quantities, we can calculate $\Pi_1, \Pi_2, \Pi_3, \Pi_4$ by nondimensionalizing with $l_t$ and $l_z$ (see Table 2). This yields Eq. (11). ▪

Further analysis of synthetic data using Monte Carlo simulations indicates that average regret $R_{\mathrm{avg}}(T)$ varies linearly with respect to $\sigma$, the standard deviation of the reward in the environment, as corroborated by Fig. 4. In this figure, each of the lines corresponds to a single set of system and environment parameters $\delta_t$, $z_{\max} - z_{\min}$, $l_t$, and $l_z$ with variable $\sigma$. Each of the lines tested has a coefficient of determination of at least 0.995, strongly suggesting that regret does in fact vary linearly with $\sigma$. The consequence of this observation is that $\Pi_2$ can be factored out of the unknown function, and $l_t$ and $l_z$ can be canceled, resulting in Eq. (13)
$E(R_{\mathrm{avg}}(T)) = \sigma\, g\!\left(\dfrac{\delta_t}{l_t},\ \dfrac{z_{\max} - z_{\min}}{l_z}\right)$
(13)
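In practical terms, Eq. (13) allows a regret estimate obtained in one environment to be rescaled to another environment whose nondimensional spatial and temporal parameters match. A small sketch (our own illustration) of computing the $\Pi$ groups of Table 2 and applying the linear-in-$\sigma$ scaling:

```python
def pi_groups(E_Ravg, sigma, dt, z_range, l_t, l_z):
    # Nondimensional groups from Table 2
    return (E_Ravg * l_t / l_z,   # Pi_1
            sigma * l_t / l_z,    # Pi_2
            dt / l_t,             # Pi_3
            z_range / l_z)        # Pi_4

def extrapolate_regret(E_Ravg_ref, sigma_ref, sigma_new):
    # Eq. (13): with matched dt/l_t and (z_max - z_min)/l_z, expected
    # average regret scales linearly with sigma
    return E_Ravg_ref * (sigma_new / sigma_ref)
```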
Fig. 4 Linearity of average regret with respect to $\sigma$. Each line corresponds to a unique combination of $\delta_t$, $(z_{\max} - z_{\min})$, $l_z$, and $l_t$.

To show that the theoretical results from the dimensional analysis hold up in simulation, Figs. 5 and 6 show the individual effects of the spatial and temporal parameters on regret. The clustering of points in these plots suggests that regret depends only on the values of the nondimensional spatial and temporal parameters, regardless of the individual values of the domain size, the time-step, and either length scale.

Fig. 5 Average regret versus the spatial parameter; note that unique $(z_{\max} - z_{\min})/l_z$ values do not correspond to unique points because $(z_{\max} - z_{\min})$ and $l_z$ vary independently
Fig. 6 Average regret versus the temporal parameter; note that unique $\delta_t/l_t$ values do not correspond to unique points because $\delta_t$ and $l_t$ vary independently

Figure 5 shows how regret changes as both the domain size and the spatial length scale are varied. Specifically, either increasing the size of the spatial domain or decreasing the spatial length scale increases average regret, as expected.

Figure 6 is similar to Fig. 5 but examines the effects of the temporal parameter instead of the spatial parameter. Here, decreasing mobility (by increasing the control time-step) and shortening the temporal length scale both cause regret to grow, as expected. Note that regret appears to be reaching an asymptote in this plot. This is because high values of the temporal parameter mean that successive data points are essentially uncorrelated due to the severity of the temporal variation in the environment. Thus, exploration is no longer useful at this point, and further increases in δt or decreases in lt do not further increase regret.

6 Effective Spatial Domain

A number of applications, including the AWE system, are characterized by mean functions that vary spatially. We present the concept of an effective spatial domain, a useful tool for generalizing results to nonconstant mean environments. Figure 7 shows three separate mean functions, one of which is constant. A high-performing algorithm will prioritize visiting locations where the mean function is high. Thus, in environments with larger spatial variability in the mean function (e.g., the dotted curve in Fig. 7), such an algorithm can focus on a smaller subset of the full spatial domain while still achieving strong performance. Consequently, for a given spatial domain size, larger spatial variation in the mean will result in a smaller effective spatial domain that must be explored.

Fig. 7 Sample mean functions

To quantify the concept of an effective spatial domain and begin to understand how it impacts average regret, recall the spatial parameter $(z_{\max} - z_{\min})/l_z$ presented in Eq. (11). This parameter can be thought of as a nondimensional measure of the size of the domain in which the control system may operate. Because certain environments make only a smaller fraction of the domain worth exploring, this parameter should be scaled to account for different mean function models, as described through the following Conjecture:

Conjecture 1. With a spatially varying mean function $m(z)$, the dimensional analysis-based dynamic similarity results of Sec. 5 can be modified by replacing Eq. (13) with
$E(R_{\mathrm{avg}}(T)) = \sigma\, f\!\left(\dfrac{\delta_t}{l_t},\ \dfrac{k_{\mathrm{eff}}(z_{\max} - z_{\min})}{l_z}\right)$
(14)
using a modified nondimensional spatial parameter given by $k_{\mathrm{eff}}(z_{\max} - z_{\min})/l_z$, where $k_{\mathrm{eff}}$ is the effective domain multiplier and can be calculated as
$k_{\mathrm{eff}} = \dfrac{2}{z_{\max} - z_{\min}} \int_{z_{\min}}^{z_{\max}} \Pr[v(z, t_a) \geq \mu_{\max}]\, dz$
(15)

Here, $t_a$ is some arbitrary time, $[z_{\min}, z_{\max}]$ is the spatial domain, and $\mu_{\max}$ is the maximum of the mean function (as opposed to the maximum of the actual data). For a power law mean (and any other strictly increasing mean), this is simply the value of the mean function at $z_{\max}$.

Equation (15) averages, over the spatial domain, the tail probability that the environmental variable lies above the maximum value of its mean function, scaled such that the maximum value of $k_{\mathrm{eff}}$ is unity. It can be thought of as a probabilistic measure of the fraction of the spatial domain likely to be visited by an efficient control algorithm.

A visualization of this calculation for the solid mean profile from Fig. 7 can be seen in Fig. 8. This mean function is a sample of the traditionally accepted power law mean wind shear profile. Shown in blue and green in Fig. 8 are sample probability density functions of the environmental resource at two different altitudes. The highlighted area under each density corresponds to the probability that the resource at that altitude will exceed the maximum value of the mean function; this is the integrand of Eq. (15). Integrating this quantity and dividing by the domain size averages the probability across all altitudes. Noting that the maximum value of this integrand for a symmetric probability distribution is 1/2, we multiply by 2 so that values of $k_{\mathrm{eff}}$ fall in the range [0, 1].
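Because the environment is modeled as a GP, the marginal distribution at each altitude is Gaussian, $v(z, t_a) \sim \mathcal{N}(m(z), \sigma^2)$, so the integrand of Eq. (15) has a closed form and $k_{\mathrm{eff}}$ can be evaluated numerically. A minimal sketch under that Gaussian-marginal assumption (the function below is our own illustration):

```python
import numpy as np
from math import erfc, sqrt

def k_eff(z_min, z_max, mean_fn, sigma, n=1001):
    """Effective domain multiplier, Eq. (15), assuming Gaussian marginals
    v(z, t_a) ~ N(mean_fn(z), sigma**2) at every altitude z."""
    z = np.linspace(z_min, z_max, n)
    m = mean_fn(z)
    mu_max = m.max()  # maximum of the mean function (not of the data)
    # Pr[v(z, t_a) >= mu_max] for a Gaussian marginal
    p = np.array([0.5 * erfc((mu_max - mi) / (sigma * sqrt(2.0))) for mi in m])
    # Uniform-grid average approximates the normalized integral; the factor
    # of 2 scales a constant mean (p = 1/2 everywhere) to k_eff = 1
    return 2.0 * p.mean()

# Example with a power-law mean (parameters illustrative):
# k = k_eff(100.0, 1000.0, lambda z: 3.77 * z**0.14, sigma=5.1)
```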

Fig. 8 Schematic for the effective domain calculation; note that the shaded regions correspond to the integrand of Eq. (15)

Figure 9 shows contours of regret versus the mean function parameters on the left and plots of $k_{\mathrm{eff}}$ versus the same domain on the right. The parameters varied are the leading coefficient $v_0$ and the exponent $b$ from a power law wind shear model, $v(z) = v_0 (z/z_{\mathrm{ref}})^b$. By inspection, lines of constant regret on the left plot closely follow lines of constant effective spatial domain on the right plot, lending credibility to Conjecture 1. The same data are plotted as a scatter plot in Fig. 10. This plot shows a strong correlation between regret and the effective domain multiplier for power law mean models, with a Spearman rank correlation coefficient of 0.984, strongly implying the validity of this approach for power law mean profiles.

Fig. 9 Regret and effective domain versus mean function parameters
Fig. 10 Average regret versus effective domain multiplier for various power law mean models. Note the strong correlation between regret and effective domain (Spearman correlation coefficient = 0.984).

To further analyze the relationship between the effective domain and regret, Fig. 11 shows lines of constant effective domain size with variable power law coefficients and exponents. The horizontal axis plots the power law leading coefficient $v_0$, but the exponent $b$ varies as well to keep the effective domain constant along each line. The fact that each of these lines is nearly horizontal strongly supports the hypothesis that effective domain size can replace domain size to account for nonconstant mean functions. That is, for nonconstant mean environments, Eq. (13) can be replaced by Eq. (14).

Fig. 11 Average regret versus a single mean parameter for constant effective domain with a power law mean. Note that regret remains constant for a constant effective domain, invariant to the varying shape of the mean function.

Figure 12 shows Monte Carlo simulation results of regret versus effective domain for three separate mean models. All three follow near-identical trends. The fact that three completely independent mean models have results that match when accounting for effective domain size shows the generalizability of the effective domain parameter in accounting for varying mean models in regret bound calculations.

Fig. 12 Average regret versus effective domain for three mean models. This method is effective for variations on a single mean function as well as for entirely different function structures.

7 Empirical Comparison of Control Strategies—Airborne Wind Energy System

We now return to our AWE system case study, namely, the Altaeros Buoyant Airborne Turbine (BAT). Using each of the acquisition functions defined above, 100,000 simulations were run, each with a different set of synthetic wind data generated using the standard power law wind shear model and the covariance and mean function hyperparameters given in Sec. 3.

For every set of synthetic data, each control strategy's performance was simulated independently. Instantaneous and average regret were logged for each run of each control strategy. These logs were used to generate the regret bound comparisons of Figs. 13 and 14, which describe the relationship between the upper bound on regret, $\delta$, and its associated probability, $1 - \varepsilon$, as defined in Eqs. (4) and (5).
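The overall scoring procedure can be summarized by the compact, self-contained sketch below. It colors white noise over a small space-time grid (Sec. 4), runs a simple placeholder policy (the full study uses the UCB, MPI, and greedy rules of Sec. 3 and 100,000 iterations), logs average regret, and extracts the empirical ε–δ curve; the grid sizes, mean form, and policy here are all illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)
l_t, l_z, sigma = 22.0, 270.0, 5.1                 # Sec. 3 hyperparameters
z = np.linspace(100.0, 500.0, 9)                   # candidate altitudes (m), illustrative
t = np.arange(0.0, 360.0, 10.0)                    # time grid (min), illustrative
Z, Tt = np.meshgrid(z, t, indexing="ij")
X = np.column_stack([Z.ravel(), Tt.ravel()])

# Space-time squared-exponential covariance and its Cholesky factor (Sec. 4)
d2 = (((X[:, None, 0] - X[None, :, 0]) / l_z) ** 2
      + ((X[:, None, 1] - X[None, :, 1]) / l_t) ** 2)
L = np.linalg.cholesky(sigma**2 * np.exp(-0.5 * d2) + 1e-8 * np.eye(len(X)))

avg_regret = []
for _ in range(200):                               # Monte Carlo trials (reduced here)
    v = (L @ rng.standard_normal(len(X))).reshape(len(z), len(t))
    v += 3.77 * z[:, None] ** 0.14                 # assumed power-law mean overlay
    i, regret = len(z) // 2, 0.0
    for k in range(len(t)):
        lo, hi = max(i - 1, 0), min(i + 1, len(z) - 1)
        # Placeholder one-step local policy; substitute UCB/MPI scores here
        i = (lo, i, hi)[int(np.argmax(v[[lo, i, hi], k]))]
        regret += v[:, k].max() - v[i, k]          # instantaneous regret, Eq. (1)
    avg_regret.append(regret / len(t))             # average regret, Eq. (3)

eps = np.linspace(0.01, 0.99, 50)
delta = np.quantile(avg_regret, 1.0 - eps)         # empirical eps-delta curve
```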

Fig. 13 Average regret ε–δ relation calculated by a 100,000-iteration Monte Carlo study with hyperparameters from Sec. 3
Fig. 14 Instantaneous regret ε–δ relation calculated by a 100,000-iteration Monte Carlo study with hyperparameters from Sec. 3

Several conclusions can be drawn from these results. First, each algorithm outperforms the average stationary strategy (FLA) by a significant margin, so each has some merit. However, the greedy strategy fails to outperform the omniscient stationary strategy (FLO), reinforcing the idea that the explicit incentivization of exploration in the acquisition function is important to long-term exploitation. Both UCB and MPI outperform the omniscient stationary strategy (FLO) by a significant margin, showing that each is an effective algorithm in a wide range of cases.

Both plots have an asymptote along the $\delta$ axis, since there is no way to provide a hard finite upper bound on regret. Furthermore, each graph has a different trend as it approaches $\varepsilon = 1$. Because it is possible to obtain zero instantaneous regret (in a discretized spatial environment), there is a range of $\varepsilon$ over which $\delta$ is zero for each algorithm. In the average regret plot, by contrast, $\bar{\delta}$ increases extremely quickly in going from $\bar{\varepsilon} = 1$ to $\bar{\varepsilon} = 0.99$, because it is nearly impossible to obtain zero cumulative (and therefore average) regret.

8 Conclusions

Novel contributions from this paper include dimensional analysis for the expected form of regret bounds in a nonstationary environment, as well as the introduction of an "effective spatial domain" multiplier to account for nonconstant environmental means. The dimensional analysis showed that for any environment that can be characterized by a variance, a spatial parameter, and a temporal parameter, the regret incurred through operation in such an environment can also be characterized as a function of three similar, nondimensional parameters. This analysis also considered the mean model of the environment, characterizing an "effective spatial domain" multiplier that allows a variety of mean models to be handled under the single analysis in this paper. Real-time optimization algorithms and regret bound results were concretely assessed based on an AWE system, focusing on several candidate control algorithms (UCB, MPI, and greedy).

Funding Data

  • National Science Foundation Award Number 1711579, entitled “Collaborative Research: Multi-Scale, Multi-Rate Spatiotemporal Optimal Control with Application to Airborne Wind Energy Systems” (Funder ID: 10.13039/100000001).

References

1. Solanas, A., and Garcia, M., 2004, "Coordinated Multi-Robot Exploration Through Unsupervised Clustering of Unknown Space," IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, Sept. 28–Oct. 2. 10.1109/IROS.2004.1389437
2. Quann, M., Ojeda, L., Smith, W., Rizzo, D., Castanier, M., and Barton, K., 2017, "An Energy-Efficient Method for Multi-Robot Reconnaissance in an Unknown Environment," American Control Conference, Seattle, WA, May 24–26. 10.23919/ACC.2017.7963292
3. Bentz, W., Hoang, T., Bayasgalan, E., and Panagou, D., 2018, "Complete 3-D Dynamic Coverage in Energy-Constrained Multi-UAV Sensor Networks," Auton. Robots, 42(4), pp. 825–851. 10.1007/s10514-017-9661-x
4. Meinig, C., Lawrence-Slavas, N., Jenkins, R., and Tabisola, H., 2015, "The Use of Saildrones to Examine Spring Conditions in the Bering Sea: Vehicle Specification and Mission Performance," OCEANS 2015, Washington, DC, May 18–21, pp. 1–6. 10.23919/OCEANS.2015.7404348
5. Hartl, A. E., and Mazzoleni, A. P., 2010, "Dynamic Modeling of a Wind-Driven Tumbleweed Rover Including Atmospheric Effects," J. Spacecr. Rockets, 47(3), pp. 493–502. 10.2514/1.45174
6. Yallop, O., 2014, "Inflatable Turbines: The Windfarms of the Future?," accessed Mar. 29, 2019, https://www.telegraph.co.uk/news/earth/energy/windpower/10774688/Inflatable-turbines-the-windfarms-of-the-future.html
7. Bafandeh, A., and Vermillion, C., 2016, "Real-Time Altitude Optimization of Airborne Wind Energy Systems Using Lyapunov-Based Switched Extremum Seeking Control," American Control Conference, Boston, MA, July 6–8, pp. 4990–4995. 10.1109/ACC.2016.7526144
8. Baheri, A., Bin-Karim, S., Bafandeh, A., and Vermillion, C., 2017, "Real-Time Control Using Bayesian Optimization: A Case Study in Airborne Wind Energy Systems," Control Eng. Pract., 69, pp. 131–140. 10.1016/j.conengprac.2017.09.007
9. Bafandeh, A., Bin-Karim, S., Baheri, A., and Vermillion, C., 2018, "A Comparative Assessment of Hierarchical Control Structures for Spatiotemporally-Varying Systems, With Application to Airborne Wind Energy," Control Eng. Pract., 74, pp. 71–83. 10.1016/j.conengprac.2018.02.008
10. Dunn, L., Vermillion, C., Chow, F. K., and Moura, S., 2019, "On Wind Speed Sensor Configurations and Altitude Control in Airborne Wind Energy Systems," American Control Conference, Philadelphia, PA.
11. Slivkins, A., and Upfal, E., 2008, "Adapting to a Changing Environment: The Brownian Restless Bandits," 21st Conference on Learning Theory (COLT), Helsinki, Finland, July 9–12, pp. 343–354. https://www.learningtheory.org/colt2008/papers/45-Slivkins.pdf
12. Besbes, O., Gur, Y., and Zeevi, A., 2014, "Stochastic Multi-Armed-Bandit Problem With Non-Stationary Rewards," Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds., Curran Associates, Inc., Montreal, QC, Canada, pp. 199–207.
13. Garivier, A., and Moulines, E., 2011, "On Upper-Confidence Bound Policies for Switching Bandit Problems," Algorithmic Learning Theory, J. Kivinen, C. Szepesvári, E. Ukkonen, and T. Zeugmann, eds., Springer, Berlin, Germany, pp. 174–188.
14. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W., 2009, "Gaussian Process Bandits Without Regret: An Experimental Design Approach," Proceedings of the 27th International Conference on Machine Learning, 2010, Haifa, Israel.
15. Krause, A., and Ong, C. S., 2011, "Contextual Gaussian Process Bandit Optimization," Advances in Neural Information Processing Systems, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, eds., Curran Associates, Inc., Granada, Spain, pp. 2447–2455.
16. Bafandeh, A., and Vermillion, C., 2017, "Altitude Optimization of Airborne Wind Energy Systems Via Switched Extremum Seeking–Design, Analysis, and Economic Assessment," IEEE Trans. Control Syst. Technol., 25(6), pp. 2022–2033. 10.1109/TCST.2016.2632534
17. Haydon, B., Cole, J., Dunn, L., Keyantuo, P., Chow, T., Moura, S., and Vermillion, C., 2019, "Empirical Regret Bounds for Control in Spatiotemporally Varying Environments: A Case Study in Airborne Wind Energy," ASME Paper No. DSCC2019-9068. 10.1115/DSCC2019-9068
18. Audibert, J.-Y., Munos, R., and Szepesvári, C., 2009, "Exploration–Exploitation Tradeoff Using Variance Estimates in Multi-Armed Bandits," Theor. Comput. Sci., 410(19), pp. 1876–1902. 10.1016/j.tcs.2009.01.016
19. Williams, C. K., and Rasmussen, C. E., 2006, Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA.
20. Archer, C., 2014, Wind Profiler at Cape Henlopen, Report.
21. Kushner, H. J., 1964, "A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise," ASME J. Basic Eng., 86(1), pp. 97–106. 10.1115/1.3653121
22. Cox, D., and John, S., 1992, "SDO: A Statistical Method for Global Optimization," Multidisciplinary Design Optimization: State of the Art, Vol. 2, IEEE International Conference on Systems, Man, and Cybernetics, Chicago, IL, Oct. 18–21, pp. 1241–1246. https://www.researchgate.net/publication/2764365_SDO_A_Statistical_Method_for_Global_Optimization
23. Buckingham, E., 1914, "On Physically Similar Systems; Illustrations of the Use of Dimensional Equations," Phys. Rev., 4(4), pp. 345–376. 10.1103/PhysRev.4.345