3 Fixed duration state constrained optimal control

The construction of a discontinuous near optimal feedback control law universal on a prescribed set first appeared in Krasovskii [31] and was elaborated upon in [32] and [33]; see also Krasovskii and Subbotin [34], [35]. Unlike these motivating seminal works, we take a proximal analytic approach in our constructions, in line with the results of the previous section on stabilizability.

Suppose that $S\subset \mathbb{R}^{n}$ is a compact set which is weakly invariant (or in alternate terminology, viable or holdable); that is, for any $(\tau ,\alpha )\in \mathbb{R}\times S$ there exists a control $u(\cdot )$ such that $x(t)=x(t;\tau ,\alpha ,u(\cdot ))\in S$ for all $t\geq \tau$. Let $\ell :\mathbb{R}^{n}\to \mathbb{R}$ be continuous, and let $T\in \mathbb{R}$ be fixed. For an initial phase $(\tau ,\alpha )\in (-\infty ,T]\times S$, consider the following fixed time endpoint cost optimal control problem $P(\tau ,\alpha )$ with state constraint:

minimize $\ell (x(T))$

subject to

$\dot{x}=f(x,u),\quad x(\tau )=\alpha ,\quad x(t)\in S ~~\forall \,t\in [\tau ,T]$ .

Then, by standard ``sequential compactness of trajectories'' arguments, the following facts are readily verified:

As before, by a feedback we simply mean any selection of $U$ of the form $k:\mathbb{R}\times\mathbb{R}^{n}\rightarrow U$. We now sketch the method in Clarke, Rifford, and Stern [18] for producing a feedback $k$ which generates a near optimal trajectory which nearly satisfies the state constraint, with respect to the $\pi$-trajectory discretized solution concept; complete details can be found in that reference. This feedback will be operative universally for all initial phases in a specified bounded subset of $(-\infty ,T]\times S$, and it is robust with respect to measurement and external errors. The main idea is to adapt the arguments employed in [12] in proving the stabilizability result given by Theorem 2.4 above to the present problem, with the value function taking over the role played by the CLF in our prior feedback stabilizability considerations. Note well, however, that a serious technical difficulty must be overcome in achieving this: the method in Theorem 2.4 required local Lipschitzness of the CLF (in obtaining (2.11)), but the value function in the present problem may not even be continuous, as was pointed out in Remark 3.1.

We require the following notation for ``enlarged'' dynamics. For $\varepsilon > 0$ , we denote

Theorem 3.2 Let $t_{0}\in (-\infty ,T)$ be specified. Then, for any given $\varepsilon > 0$, there exists a feedback $k$ along with positive numbers $\delta _{0}$ and $E_{q}$ such that, for every $\delta \in (0,\delta _{0})$, there exists $E_{p}(\delta )>0$ as follows: for every initial phase

$\begin{displaymath} (\tau ,\alpha )\in \lbrack t_{0},T]\times S \end{displaymath}$

(3.2)

and any partition $\pi$ of $[\tau ,T]$ with

$\begin{displaymath} \frac{\delta }{2}\leq t_{i+1}-t_{i}\leq \delta ,\quad i=0,1,\ldots ,N_{\pi }-1,~~t_{N_{\pi }}=T, \end{displaymath}$

(3.3)

the error bounds

$\begin{displaymath} \Vert p(t_{i})\Vert \leq E_{p}(\delta ),\quad i=0,1,\ldots ,N_{\pi }-1, \end{displaymath}$

(3.4)

$\begin{displaymath} \Vert q\Vert _{\infty }\leq E_{q} \end{displaymath}$

(3.5)

imply that the associated $\pi$-trajectory $x_{\pi }$, with respect to (3.1), satisfying $x_{\pi }(\tau )=\alpha$ also satisfies

$\begin{displaymath} \ell (x_{\pi }(T))\leq V(\tau ,\alpha )+\varepsilon \end{displaymath}$

(3.6)

and

$\begin{displaymath} x_{\pi }(t)\in S+\varepsilon B_{n}\quad \forall \,t\in [\tau ,T]. \end{displaymath}$

(3.7)

Hence, it is asserted that the feedback $k$ produces a $\pi$-trajectory for the enlarged dynamics (3.1), which is $\varepsilon$-optimal and which remains $\varepsilon$-near $S$ in a manner which is robust and effective universally for any initial phase in the generalized rectangle $[t_0,T]\times S$.
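The sampling scheme behind Theorem 3.2 can be illustrated numerically. The following Python sketch is not taken from [18]; it is a toy under stated assumptions. The scalar system $\dot{x}=u$ with $U=[-1,1]$, the cost $\ell(x)=|x|$, the set $S=[-1,1]$, and the feedback `k` are all hypothetical. It is further assumed that the measurement error $p$ enters through the argument of the feedback and that the external error $q$ is added to the velocity; the enlargement of the dynamics in (3.1) is not modeled. The helper `make_partition` builds a uniform partition satisfying (3.3), and `pi_trajectory` holds the control constant on each partition interval.

```python
import math

def make_partition(tau, T, delta):
    """Uniform partition of [tau, T] with every step in [delta/2, delta], as in (3.3)."""
    length = T - tau
    if length < delta / 2:
        raise ValueError("[tau, T] is too short for any partition satisfying (3.3)")
    n = math.ceil(length / delta)          # fewest equal steps of length <= delta
    return [tau + i * length / n for i in range(n)] + [T]

def pi_trajectory(f, k, alpha, partition, p, q, substeps=50):
    """Nodes x(t_i) of the sample-and-hold trajectory starting at x(t_0) = alpha.

    On [t_i, t_{i+1}) the control is frozen at k(t_i, x(t_i) + p(t_i)) and the
    velocity is f(x, u) + q(t); this error model is an assumption of the sketch.
    Each interval is integrated with explicit Euler substeps, so the nodes are
    approximate in general.
    """
    x = alpha
    nodes = [x]
    for t0, t1 in zip(partition[:-1], partition[1:]):
        u = k(t0, x + p(t0))               # control computed from the measured state
        h = (t1 - t0) / substeps
        for j in range(substeps):
            x = x + h * (f(x, u) + q(t0 + j * h))
        nodes.append(x)
    return nodes

# Hypothetical data: dx/dt = u with U = [-1, 1], cost l(x) = |x|, S = [-1, 1].
f = lambda x, u: u
k = lambda t, x: -1.0 if x > 0 else 1.0    # steer toward the minimizer of l

partition = make_partition(0.0, 2.0, 0.25)
exact = pi_trajectory(f, k, 1.0, partition, p=lambda t: 0.0, q=lambda t: 0.0)
noisy = pi_trajectory(f, k, 1.0, partition, p=lambda t: 0.01, q=lambda t: 0.05)
print(abs(exact[-1]), abs(noisy[-1]))
```

In this toy problem the optimal value from the initial phase $(0,1)$ is zero, so the printed numbers are the optimality gaps of the two sampled trajectories. The discontinuous feedback chatters around the origin with an amplitude governed by the mesh size and the error bounds, which is the qualitative behaviour quantified by (3.4)-(3.7).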

Without loss of generality, we shall for notational ease assume

and take

in the statement of Theorem 3.2.

Define a lower semicontinuous extended real valued function $\widetilde V:\mathbb{R}\times \mathbb{R}^n \to (-\infty,\infty]$ as

Given $\beta >0$, we now define the lower semicontinuous extended real valued function $\widetilde{V}^{\beta }:\mathbb{R}\times \mathbb{R}^{n}\to (-\infty ,\infty ]$ as

For a parameter value $\lambda >0$, we denote by $\widetilde V^\beta_\lambda$ the quadratic inf-convolution of $\widetilde V^\beta$; that is

The idea of using the quadratic inf-convolution in order to construct near optimal strategies goes back to Subbotin and his coworkers, where it was employed in a differential games context; see, e.g., Subbotin [51].
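The regularizing effect of the quadratic inf-convolution can be seen on a grid. The Python sketch below is illustrative only: it assumes the normalization $g_\lambda(x)=\min_y\{g(y)+|y-x|^{2}/(2\lambda^{2})\}$, which may differ from the one used in the text, and it works with a hypothetical lower semicontinuous step function `g` of one variable standing in for $\widetilde V^\beta$ on $\mathbb{R}\times\mathbb{R}^{n}$.

```python
def inf_convolution(g_vals, grid, lam):
    """Grid approximation of g_lam(x) = min_y { g(y) + |y - x|^2 / (2 lam^2) }."""
    return [
        min(gy + (y - x) ** 2 / (2 * lam ** 2) for gy, y in zip(g_vals, grid))
        for x in grid
    ]

# Hypothetical data: a step function that is lower semicontinuous at the jump.
grid = [i / 100 for i in range(-100, 101)]
g = [0.0 if y <= 0 else 1.0 for y in grid]
g_lam = inf_convolution(g, grid, lam=0.2)

# Largest change between neighbouring grid points, before and after smoothing.
jump_g = max(abs(b - a) for a, b in zip(g[:-1], g[1:]))
jump_g_lam = max(abs(b - a) for a, b in zip(g_lam[:-1], g_lam[1:]))
print(jump_g, jump_g_lam)
```

Three features of the construction are visible here: the inf-convolution lies below `g` (take $y=x$ in the minimum), it has the same minimum value as `g`, and the unit jump of `g` is replaced by a steep but gradual rise.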

These extrema are attained due to the compactness of $S$, continuity of $\ell$, lower semicontinuity of $\widetilde V^\beta$, and continuity of $\widetilde V^\beta_\lambda$. The fact that the second equality involving $m_\ell$ holds for any $\lambda >0$ is evident from (3.14). Note also that

Suppose $\partial _{P}\widetilde{V}_{\lambda }^{\beta }(t,x)\neq \phi$ at some $(t,x)\in \mathbb{R}\times \mathbb{R}^{n}$. Basic proximal analytic facts about the quadratic inf-convolution (see Clarke, Ledyaev, and Wolenski [17] as well as the exposition in [20]) are that

We now fix $\hat T \in (0,T)$ ; subsequently it is required that $T-\hat T$ be taken sufficiently small. The next lemma follows easily from the previous one and (3.11).

We now introduce notation for the sublevel sets of $\widetilde V^\beta$ and $\widetilde V_\lambda^\beta$

We shall also require the following lemma, which asserts how the sublevel sets of $\widetilde V^\beta$ are approximated by those of its quadratic inf-convolution. (We denote the Hausdorff metric by ``haus''.)
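The approximation of sublevel sets in the Hausdorff metric can also be observed numerically. The Python sketch below does not reproduce the lemma; it only illustrates the phenomenon for a hypothetical step function on a grid, with the assumed normalization $g_\lambda(x)=\min_y\{g(y)+|y-x|^{2}/(2\lambda^{2})\}$ and an arbitrarily chosen level `0.45`. The function `haus` computes the Hausdorff distance between finite subsets of the real line.

```python
def inf_convolution(g_vals, grid, lam):
    """Grid approximation of g_lam(x) = min_y { g(y) + |y - x|^2 / (2 lam^2) }."""
    return [
        min(gy + (y - x) ** 2 / (2 * lam ** 2) for gy, y in zip(g_vals, grid))
        for x in grid
    ]

def sublevel(vals, grid, c):
    """Grid points at which the tabulated function is at most c."""
    return [x for v, x in zip(vals, grid) if v <= c]

def haus(A, B):
    """Hausdorff distance between two nonempty finite subsets of the real line."""
    d_ab = max(min(abs(a - b) for b in B) for a in A)
    d_ba = max(min(abs(a - b) for a in A) for b in B)
    return max(d_ab, d_ba)

# Hypothetical data: lower semicontinuous step function and a fixed level.
grid = [i / 100 for i in range(-100, 101)]
g = [0.0 if y <= 0 else 1.0 for y in grid]
level = 0.45
base = sublevel(g, grid, level)

lams = (0.4, 0.2, 0.1)
distances = []
for lam in lams:
    g_lam = inf_convolution(g, grid, lam)
    distances.append(haus(base, sublevel(g_lam, grid, level)))
print(distances)
```

Since the inf-convolution lies below `g`, its sublevel sets contain those of `g`; in this example the excess shrinks as $\lambda$ decreases.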

Now fix $\eta >0$ ; we will not require the smallness of this parameter. It is easy to see that for any $\beta$ and $\lambda$ one has

This puts us in a position to adapt the general technique used in proving Theorem 2.4 to the function $\widetilde V^\beta_\lambda$ with $\lambda$ chosen as above. For the given $\varepsilon$ in the statement of Theorem 3.2, the parameters $\varepsilon^{\prime}$ and $\beta$ are taken sufficiently small, $\tilde T$ near

, and $T^{\prime}$ near $\tilde T$, in such a way that further estimates lead to the required conclusion. The idea of the proof is to use (3.21) and (3.22) in order to show that $x_\pi$ achieves appropriate nonincrease while never leaving the set ${\rm int} \{\widetilde S^\beta_\lambda(m_u+\eta)\}$; a shell based construction is employed, as described in connection with Theorem 2.4.

Remark 3.6

When $S=\mathbb{R}^{n}$ (no state constraint) and the error functions $p$ and $q$ are both zero (no measurement or external errors), the above result was proven in Nobakhtian and Stern [42] without enlarging the dynamics. (Euler polygonal arcs were employed in [42] as opposed to $\pi$-trajectories here; we need not dwell upon the distinction.) In that less general version of Theorem 3.2, (3.3) is replaced by the one-sided condition

$\begin{displaymath} t_{i+1}-t_{i}\leq \delta ,\quad i=0,1,\ldots ,N_{\pi }-1. \end{displaymath}$ (3.23)

As was pointed out earlier, it is the presence of the state measurement error which necessitates the lower bound in (3.3) of Theorem 3.2.

Berkovitz [6] provided a method of universal feedback construction for optimal control, quite different from those mentioned above, but one which also relies upon a nonsmooth Hamilton-Jacobi approach. In the context of the present article, Berkovitz's approach can be described as follows. Since the value function of the problem is known to satisfy the generalized Hamilton-Jacobi inequality

$\begin{displaymath} \min_{v\in f(x,U)}DV(t,x;1,v) = 0,\quad (t,x) \in (-\infty,T)\times \mathbb{R}^n, \end{displaymath}$ (3.24)

one approach (which is known to work when $V$ is smooth) is to consider a set-valued ``feedback map'' $U(t,x)$ such that

$\begin{displaymath} f(x,U(t,x)) = \mathop{\rm argmin}_{v\in f(x,U)} DV(t,x;1,v). \end{displaymath}$ (3.25)

One is then led to consider the differential inclusion

$\begin{displaymath} \dot x \in f(x,U(t,x)). \end{displaymath}$ (3.26)

It transpires that under the present hypotheses, any solution of this differential inclusion corresponds to an optimal trajectory of the optimal control problem. On the other hand, as is noted in [6], the multifunction on the right-hand side of (3.26) in general lacks the regularity (most notably, convexity, compactness, and upper semicontinuity) needed to guarantee the existence of solutions or, for that matter, the applicability of discretized solution procedures. However, it is known (see Subbotina [53]) that under sufficient smoothness of the dynamics and cost functional $\ell$, the feedback map is compact valued and upper semicontinuous, but convexity of its values can still fail.

An approach to feedback construction related to [6] and [53] was undertaken by Cannarsa and Frankowska in [8]; in that work, additional conditions on the cost functional and dynamics were given which provide the requisite regularity in Berkovitz's original procedure, namely, smoothness of $V$.

In Rowland and Vinter [45], a modification of Berkovitz's method is given which overcomes this lack of regularity without imposing extra conditions. Rowland and Vinter provided a discretization procedure (but not a feedback law) which in the limit produces an optimal trajectory for any initial phase.

If $V$ is known, then a special case of Theorem 4.8.1 of [20] (which first appeared as Theorem 10.1 in [14]) provides a proximal aiming method for constructing a feedback, such that all its limiting discretized (in this case, Euler polygonal arcs) solutions are optimal (that is, $\varepsilon =0$), for a given initial data pair $(\tau ,\alpha )$. Actually, the invariance-based proof shows that a somewhat better result holds: the feedback produces optimal limit solutions for any initial data in the set

$\begin{displaymath} S:=\{(\tau ^{\prime },\alpha ^{\prime })\,\ldots\,:V(\tau ^{\prime },\alpha ^{\prime })\leq V(\tau ,\alpha )\}. \end{displaymath}$

The universality property of the feedback produced in Theorem 3.2 is an important distinction, and in a sense, the weakening of ``optimal'' to ``$\varepsilon$-optimal for any given $\varepsilon > 0$'' in Theorem 3.2 can be viewed as the price paid for universality, albeit a small one in any practical sense. Whether this price is truly unavoidable is an open question, since we do not at present have a counterexample to the $\varepsilon =0$ case (either for $\pi$-trajectories or for limiting $\pi$-trajectories). On the other hand, Subbotina [52] (see also Krasovskii and Subbotin [35]) has provided an example of a fixed duration differential game which does not possess a universal saddle point, under hypotheses which imply the existence of a saddle point for each individual startpoint.

In Theorem 10.2 of [14], a sufficient condition is given for the existence of a universal $\varepsilon$-optimal feedback, in the classical ordinary differential equations (as opposed to the discretized or limiting discretized) solution sense. This condition requires finding a Lipschitz semisolution to a strict Hamilton-Jacobi inequality, but with the proximal subdifferential $\partial _{P}V$ replaced by the generalized subdifferential $\partial _{C}V$ of Clarke, which is in general a larger object than the $P$-subdifferential. Because of this, the value function in general does not satisfy this condition, so there is the difficulty of finding an appropriate semisolution if one seeks to apply this result.

In Clarke, Ledyaev, and Subbotin [15], a proximal analytic method is given for constructing universal $\varepsilon$-optimal feedback controls in differential games of pursuit, in the Krasovskii-Subbotin framework; see also [16]. This work is related to that of Garnysheva and Subbotin [27], [26], who constructed suboptimal discontinuous feedback by using what they called aiming at ``quasi-gradients''; see also Subbotin [51]. The feedbacks in [15] were constructed with the aid of the quadratic inf-convolutions of a not necessarily continuous proximal semisolution to a Hamilton-Jacobi inequality; this lack of continuity is a natural feature of the value function in time-optimal and pursuit problems, as it is in the fixed duration state constrained control problem considered above.

For maximum principle based approaches to the general problem of optimal control in the presence of state constraints, see Ferreira, Fontes, and Vinter [23] as well as Vinter and Zheng [55].

3.1 A strengthening under additional assumptions on $S$

Let us now posit the following additional geometric assumptions on the state constraint set

In Clarke, Rifford, and Stern [18], the following result is proven by means of a state constrained tracking lemma.

We denote $\hat S := {\rm cl}[{\rm comp} (S)]$, and for

, we denote the r-inner approximation of

Proposition 3.7 Let the additional assumptions on $S$ posited above hold. Let $t_{0}\in (-\infty ,T)$ and $\varepsilon > 0$ be specified. Then for $r>0$ taken sufficiently small, there exists a feedback $k_{r}$ along with positive numbers $\delta _{0}^{r}$ and $E_{q}^{r}$ such that, for every $\delta \in (0,\delta _{0}^{r})$, there exists $E_{p}(\delta )>0$ as follows: for every initial phase

$\begin{displaymath} (\tau ,\alpha )\in \lbrack t_{0},T]\times S_{r} \end{displaymath}$

(3.28)

and any partition $\pi$ of $[\tau ,T]$ with

$\begin{displaymath} \frac{\delta }{2}\leq t_{i+1}-t_{i}\leq \delta ,\quad i=0,1,\ldots ,N_{\pi }-1, \end{displaymath}$

(3.29)

the error bounds

$\begin{displaymath} \Vert p(t_{i})\Vert \leq E_{p}(\delta ),\quad i=0,1,\ldots ,N_{\pi }-1, \end{displaymath}$

(3.30)

and

$\begin{displaymath} \Vert q\Vert _{\infty }\leq E_{q}^{r} \end{displaymath}$

(3.31)

imply that the associated $\pi$-trajectory $x_{\pi }$ of the original (not enlarged) dynamics satisfying $x_{\pi }(\tau )=\alpha$ also satisfies

$\begin{displaymath} \ell (x_{\pi }(T))\leq V(\tau ,\alpha )+\varepsilon \end{displaymath}$

(3.32)

and

$\begin{displaymath} x_{\pi }(t)\in S\quad \forall \,t\in \lbrack \tau ,T]. \end{displaymath}$

(3.33)

In other words, under the strengthened hypotheses on $S$, for a given tolerance $\varepsilon > 0$, if one considers any sufficiently tight inner approximation $S_r$, there exists a robust feedback $k_r$ effective universally for initial phases in $[t_0,T]\times S_r$, such that for each such initial phase, the $\pi$-trajectory produced (for the original, i.e. not enlarged dynamics) is $\varepsilon$-optimal and remains in $S$. This is in contrast to Theorem 3.2, where the $\pi$-trajectory only remains $\varepsilon$-near $S$ under enlarged dynamics.

Further results, including a Hamilton-Jacobi characterization of the state constrained value, are to appear in [18].