CFD-based design optimization of ducted hydrokinetic turbines

Hydrokinetic turbines extract kinetic energy from moving water to generate renewable electricity, thus contributing to sustainable energy production and reducing reliance on fossil fuels. It has been hypothesized that a duct can accelerate and condition the fluid flow passing the turbine blades, improving the overall energy extraction efficiency. However, no substantial evidence has been provided so far for hydrokinetic turbines. To investigate this problem, we perform a CFD-based optimization study with a blade-resolved Reynolds-averaged Navier–Stokes (RANS) solver to explore the design of a ducted hydrokinetic turbine that maximizes the efficiency of energy extraction. A gradient-based optimization approach is utilized to effectively deal with the high-dimensional design space of the blade and duct geometry, with gradients being calculated through the adjoint method. The final design is re-evaluated through higher-fidelity unsteady RANS (URANS) simulations. Our optimized ducted turbine achieves an efficiency of about 54% over a range of operating conditions, higher than the typical 46% efficiency of unducted turbines.


Introduction
The increasing demand for renewable energy has motivated extensive research on hydrokinetic energy conversion systems that extract energy from natural riverine and oceanic flows.Various types of conversion systems have been investigated for decades, including horizontal-and vertical-axis turbines and oscillating hydrofoils [2,3,4].Horizontal-axis turbines have been studied the most because of the relatively mature technology [5,6,7,8,9].
A popular benchmark for horizontal-axis hydrokinetic turbines is the Bahaj model, which has been experimentally tested in a cavitation tunnel and a towing tank [1].The unducted Bahaj model generates power with an efficiency of about 46% (the ratio of generated power to the inflow power) at the optimal operating condition.This is the typical efficiency level of well-designed hydrokinetic turbines [10].To evaluate this efficiency, we can compare it to the well-known Betz's limit of 59.3% [11], which is derived based on the one-dimensional (1D) momentum theory in an unbounded flow domain.There is no general consensus on whether the Betz's limit should be considered as a hard upper bound on the efficiency of practical energy conversion systems in unbounded flow due to simplifications in the theory.However, it seems clear that further improvement can be sought regarding the current 46% efficiency of horizontal-axis hydrokinetic turbines.
One idea to improve the efficiency is to use a duct (also known as a shroud or diffuser) to accelerate the fluid flow passing the turbine blades, thus improving the efficiency of the device.Some researchers have incorporated the duct effect in the 1D momentum theory (or its extended version), with some of which predicting an efficiency well above the Betz's limit [12,13,14,15,16,17,18,19,20,21].In spite of the insight into the duct effects, the physics may be oversimplified (sometimes misrepresented) meaning the efficiency predicted by these models may not be achievable in practice.The complex turbine-duct interaction involves flow features, such as flow separation, that cannot be captured by the analytical models.These phenomena can significantly affect the mass flow through the duct and the system efficiency [19].
To account for complex turbine-duct interaction and more reliably evaluate ducted turbine performance, we must resort to computational fluid dynamics (CFD) simulations or experiments.Table 1 lists research efforts that used such approaches.Most studies were conducted for wind turbines, but a few were specific to hydrokinetic turbines.As shown in the table, var-Researchers Type Shape Method C P,Ab C P,Amax Re Aranake and Duraisamy [22] Wind Foil Numerical (RANS + BET) 1.57 0.85 ∼ 5 × 10 6 Venters et al. [23] Wind Foil Numerical (RANS + actuator disk) - Table 1: Previous publications on ducted wind and hydrokinetic turbines.For research evaluating the performance with blade swept area as the reference area, we provide the values of C P,A b and the converted values C P,Amax (which we calculate when necessary).
The Reynolds numbers are based on the maximum diameter of a whole device as defined in Eq. (2).BEMT is the blade element momentum theory, which is a body-force model combined with RANS to account for two-way coupling with the duct.BET is the blade element theory, which is combined with RANS to calculate separately the flow in an empty duct and turbine performance under such flow (i.e., a simpler version of RANS+BEMT with only one-way coupling).
ious duct shapes have been proposed and tested for wind turbines, with reported efficiencies ranging from 0.41 to 0.85, surpassing the Betz's limit (0.59).However, these results must be interpreted in the context of the limitations of the analyses.The CFD models used for evaluations include steady Reynolds-averaged Navier-Stokes (RANS) and unsteady RANS (URANS) solvers, where the turbine blades are modeled using a blade-resolved or a body-force approach.Within these approaches, steady RANS may have difficulties with flow separation along the duct surface in many designs, as well as in capturing transient wake flow patterns and turbine-duct interactions [34,35,36].In the body-force approach the actuator disk model measures the extracted power using the product of velocity and thrust at the blade section, which usually results in an over-prediction of the efficiency because only a fraction of the computed power can be converted to the actual (rotational) power.
In addition, the definitions of efficiency for ducted turbines are inconsistent in the studies listed in Table 1.The inflow power is defined with respect to either the blade swept area or the maximum projection area of the device (or duct).The efficiency based on the blade swept area, C P,A b , can be significantly higher than that based on the device area, C P,Amax , but C P,A b does not provide a fair comparison with the efficiency of an unducted turbine as explained later.In evaluating and comparing the performance of ducted turbine designs, we must use the same metric, so we convert all power coefficients using the maximum area as the reference-C P,Amax in Table 1.
Considering the above two points, there are significant caveats in the results listed in Table 1.The highest fidelity simulation in Table 1 (the blade-resolved URANS approach performed by Knight et al. [19]) predicts a C P,Amax of 45%, which does not show an advantage of using the duct.The experimental evaluations of Oka et al. [26] may be more credible, but they also suffer from uncertainties, such as measurement errors and proximity of the device to the floor, which causes blockage that affects the measured efficiency [37].Finally, most of the results obtained for wind turbines do not translate to hydrokinetic turbines.For example, to sustain higher loads in water, a wind turbine design with a large flange may not be feasible for a hydrokinetic turbine.Additionally, a hydrokinetic turbine blade requires a lower aspect ratio and larger sectional thickness to sustain the higher loads in water [38].
Another limitation of the research listed in Table 1 is that the duct designs were not optimized.Instead, these designs were generated by human intuition or a grid search in a low-dimensional design space.The only exception is the design by Aranake and Duraisamy [22], who performed gradient-based optimization to develop a ducted wind turbine design.However, they used low-fidelity blade element theory to model the ducted turbine performance without adequately taking turbine-duct interaction into account.An optimal ducted turbine requires numerical optimization that simultaneously considers the blade and duct geometry with detailed shape parametrization.This is challenging because of the high computational cost of CFD evaluations and the high-dimensional design space.Another challenge is selecting the appropriate CFD model in the optimization process.As mentioned earlier, steady RANS is relatively inexpensive but may lead to inaccuracies in predicting the performance of designs where boundary layer separation occurs in the duct.
In this paper, we perform CFD-based design optimization of a ducted hydrokinetic turbine.We use 21 parameters to control the shape of the duct (length and multiple sectional radii) and turbine blades (pitch and spanwise twist/chord distributions).We perform gradient-based optimization with gradients computed by a discrete adjoint method [39] coupled with steady RANS blade-resolved simulations.This effort builds on previous design optimizations of unducted wind turbine [40,41].Because of the potential inaccuracies of steady RANS for separated flow, the success of this approach hinges on whether our gradient-based optimization induces a design free of flow separation.This is, fortunately, indeed the case since designs with flow separation tend to be associated with lower efficiency, even when evaluated by the less accurate RANS solver (given enough grid resolution).Our optimized design is re-evaluated by a higher-fidelity URANS blade-resolved solver.The benefits of the duct are demonstrated upon a comparison with the unducted Bahaj turbine, optimized unducted turbine, and our baseline ducted turbines.We follow up with discussions to provide insights on the optimized geometry and the associated flow mechanisms that contribute to improved energy extraction efficiency.
The paper is organized as follows.In Section 2, we discuss the problem statement, including the description of the physical problem of turbine energy extraction and the setup of the optimization problem.Section 3 introduces methodology in CFD simulations and optimization process.The results of optimization and higher-fidelity re-evaluation are described in Section 4, where we discuss the optimized duct geometry and flow mechanisms.Finally, conclusions are provided in Section 5.The computations involved in this work are implemented in open-source codes OpenFOAM [42] and DAFoam [43].

Physical Problem
Consider a turbine operating in a uniform inflow U ∞ in an unbounded fluid domain, as shown in Figure 1.The turbine converts inflow power (energy) into rotational power, where the effectiveness of this conversion is characterized by the power coefficient, where P is the generated rotational power that is given by P = QΩ (torque Q times the rotational speed of the blades Ω). ρ is the fluid density and A is some reference area.For an unducted turbine, A can be chosen as either the blade swept area A b or the maximum projection area of the device A max , which are identical.For a ducted turbine, however, using the two values A max and A b as A leads to different C P 's, since A max is greater than A b .We argue that A max is the appropriate choice for ducted turbines in order to have a fair comparison of their performance with unducted turbines.The reason is that, with A = A max , we are essentially comparing the generated power when the inflow power is the same for unducted and ducted turbines.On the other hand, using A b for ducted turbines results in a larger value of C P (even above 1) that can be misleading when compared to the efficiency of unducted turbines (see examples in [31,32,33]).For the above reasons, we adopt A = A max for the evaluation of C P in this work, and we will hereafter simply write it as A, referring to the maximum device area for both ducted and unducted turbines.Given a turbine, its efficiency C P is in general a function of two other non-dimensional parameters, namely the tip-speed ratio (λ) and Reynolds number Re (based on the diameter of the device), defined as where R is the turbine blade radius, ν is the fluid kinematic viscosity.In this paper, we fix U ∞ = 1.4m/s, ν = 1 × 10 −6 m 2 /s, A = 1.853m 2 , and D max = (4/π)A = 1.536m, leading to Re ≈ 2 × 10 6 for both ducted and unducted turbines (see Figure 1).For Reynolds number of O(10 6 ), the flow is considered fully turbulent, and the dependence of C P on Re in this range is expected to be relatively weak.We will evaluate C P for a broad range of λ at this Reynolds number for both ducted and unducted turbines.

Optimization Problem
Our objective is to optimize a ducted turbine geometry to maximize its hydrodynamic efficiency C P at given U ∞ (= 1.4m/s) and Ω(= 17.5rad/s).This design process is applied to both ducted and unducted turbines for a fair performance comparison.In the following, we present the mathematical optimization problem for a ducted turbine, which is the more sophisticated case.The optimization for an unducted turbine can be conducted similarly but with a simpler setup that does not include the duct parameters and the tip clearance constraint.
The constrained optimization problem for a ducted turbine can be stated as where {θ i } 8 i=1 are the twist angles at 8 sections of the blade, controlling the blade root pitch and twist profile as shown in Figure 2a.The cross-sectional areas of 8 sections of the blade, normalized with respect to their baseline, are denoted as {b i /b B i } 8 i=1 .Modifying these variables leads to a change in the size of the blade section, but the sectional (foil) shape remains unchanged.Thus, b i /b B i gives the scaling factor for each section, as shown in Figure 2b.The variables {d j } 4 j=1 are the diameters at 4 sections along the duct as illustrated in Figure 3a, with d 3 being the throat section, where the rotor is installed.This section is located at 26.4% of the duct length, following a baseline design of the duct [19].The bound (3d) ensures that d 3 is always located at the throat in the optimization and that all duct diameters do not exceed the exit diameter D exit = D max = (4/π)A, as depicted in Figure 3a.A large exit area reduces flow velocity at the exit through the streamtube expansion, which in turn increases the flow momentum extraction at the blades.
Our setup prevents the optimizer from unrealistically increasing the size of the exit section and ensures a fair comparison between different designs, as discussed in Sec. 1.The variable l, representing the duct length (with l B the baseline value), governs the scaling of the duct with respect to the fixed point at the throat (Figure 3b).The constraint (3f) keeps the tip gap ratio as a constant of 9% throughout the optimization process, consistent with the baseline design.It leads to the blade radius R changing with the variation of throat diameter d 3 .
The design variable bounds are set up in a trial-and-error manner to make sure that the optimized variables do not reach the bounds on the optimized design.The full turbine geometry morphs smoothly throughout the optimization.This continuous morphing is ensured through the Free-Form Deformation method which will be discussed later in Section 3.5.1.
Since we use a gradient-based optimization method (3), local optima potentially exist in the design space.Hence, we adopt the multistart strategy, using two different baseline designs (hereafter named baseline design A and B) with drastically different performances.Both baseline designs adopt the same thin-wall curved-shaped duct as in [19].The two baseline designs differ in the blade geometry (see Figure 4).Design A adopts the original Bahaj model with a 0.44m radius and a 20 • root pitch.Design B adopts the Bahaj model with a 0.44m radius, a 45 • root pitch, and a modified twist profile as in [19].This modified twist profile is obtained by matching the local angle of  attack of each blade section in the duct to that of the unducted turbine counterpart through an iterative procedure.When evaluating with the unsteady RANS solver, baseline designs A and B yield C P = 28% and C P = 45%, respectively, at λ = 5.5.The hub is not included in the model to simplify the geometry parametrization using the Free-Form Deformation method.The optimization problem is summarized in Table 2.

Methodology
This section describes the methodology in optimization and CFD evaluations.An overall flowchart is shown in Figure 5.The whole process involves  the optimization and the re-evaluation of the optimized design using higherfidelity simulations.The optimization and high-fidelity re-evaluation use DAFoam [43] and OpenFOAM [42], respectively.In what follows, we will describe each component of the methodology in subsections, as outlined in Figure 5.To provide a self-contained but easy-to-follow paper for readers, we put additional details in the appendix and keep the main paper as concise as possible.We start from CFD models involved in both the optimization and re-evaluations and then follow up with other components in the optimization framework.

CFD models
The governing equations for the flow field around the turbine (Figure 1) are the Navier-Stokes equations where U is the flow velocity and p is the pressure.We apply velocity inlet and pressure outlet boundary conditions, and no-slip boundary conditions on the blade and duct surfaces.We consider the Reynolds-averaged Navier-Stokes (RANS) equations with grids only resolving the averaged components of the flow.One can apply the Reynolds decomposition U = U + u (and the same for pressure) to Eq. ( 4).U denotes the averaged velocity in a time window or by an ensemble and u represents the zero-mean turbulent fluctuation.This leads to the unsteady RANS equation: where u u is the Reynolds stresses that need to be approximated by turbulence models.In this work, we use the k − ω SST turbulence model together with the automatic near-wall treatment (see Appendix A for details on both).
The rotating blades are handled in simulations by two blade-resolved approaches: the Multiple Reference Frames (MRF) and the rotating-sliding mesh approach (RS).The former is used for steady RANS solutions (i.e., a solution with time-derivative terms set to zero) with multiple different reference frames, while the latter is used directly in the unsteady solution of Eq. ( 5).The MRF is used in optimization.The RS is defined as the higherfidelity approach and used for optimized result re-evaluations (see Figure 5).We include a detailed introduction of the two approaches in the following sections.

Multiple Reference Frames (MRF) Method
The MRF method is an efficient method for modeling turbomachinery flow.In the MRF method, the computational mesh stays stationary, and the rotational effect is handled through a rotational reference frame.In particular, the fluid domain is separated into two regions: a rotational region surrounding the turbine blades with a blade-fixed reference frame, and the remaining stationary region with an inertial reference frame, as shown in Figure 6.In both regions, the flow is considered steady with respect to the corresponding reference frame, so only steady RANS equations need to be solved.
To be more specific, in the rotational region, the blades are stationary and experience a steady inflow.The flow velocity in the blade-fixed reference frame can be expressed by where U is the velocity in the inertial reference frame, Ω is the rotation vector of the turbine blades, and r is the distance vector from the axis of rotation to the point of interest (position vector).The steady RANS equations in the rotational region need to be established with the blade-fixed reference frame, which requires further formulations of both Eq. ( 5) and the k − ω SST model equations.Although the implementation in OpenFOAM/DAFoam solves this complete set of equations, here we only present the rotational-region formulation regarding Eq. ( 4) to provide the key insights of the method.Combining Eq. ( 4) and Eq. ( 6), we obtain (see Appendix B for a detailed derivation) The steady equations solved in the rotational region are Eq.( 7), with ∂U R /∂t = 0. Therefore, in the MRF method, steady versions of Eq. ( 4) and Eq. ( 7) are solved in stationary and rotating regions.Solving these steady equations can be done using the SIMPLE algorithm [44] implemented as simpleFoam in OpenFOAM.
Although the RANS-MRF method provides an efficient numerical solution for the turbine problem (i.e., only two steady RANS equations need to be solved), its accuracy can be compromised because of two issues.First, the rotational region and stationary region are usually chosen in a subjective manner.There is no guarantee that the rotational region covers all the flow features resulting from the rotating and discrete blades.Any mismatch between the choice of the region and the nature of the flow can lead to errors at the interface and thus in the final results.Secondly, for many designs, the flow can be unsteady in nature, especially when flow separation occurs from the duct and/or blade surfaces.Assuming a steady state solution, as in the RANS-MRF method, can lead to significant errors for this type of unsteady flow.As a result, the RANS-MRF method is considered a lower-fidelity model in the context of this paper.

Rotating-sliding mesh approach
The rotating-sliding mesh (RS) allows for the direct simulation of the unsteady RANS (URANS) equations (Eq.( 5)) with mesh domains that exhibit relative motion.This is needed for modeling rotating geometries.The underlying idea of this method is to allow a region of the computational mesh to rotate with the turbine blades, as illustrated in Figure 7.The rotating mesh also creates a technical problem that the mesh at the rotating/non-rotating interface becomes non-conformal, i.e., the nodes at two sides of the interface do not match up.The data transfer across the interface, therefore, needs to be handled by a special interpolation method involving a "supermesh" [45], as described in Appendix C. Coupling URANS with the RS is, in principle, much more accurate than the RANS-MRF method as it captures the unsteady nature of the flow.This can be critical in simulating flow around a ducted turbine because the possible flow separation from the duct surface can be captured better.However, in the URANS-RS, a small time step is needed to resolve the blade rotation, so a long simulation time is required for the solution to reach a quasi-steady state.The computational cost of the URANS-RS is hence much higher than that of the RANS-MRF method.In this paper, the URANS-RS is considered a higher-fidelity method and is only used in re-evaluating the performance of optimized designs.

Mesh Configuration
The unstructured computational mesh is generated using the OpenFOAM meshing tool snappyHexmesh.A mesh overview is shown in Figure 8.The size of the computational domain is 10.4D × 10.4D × 23.7L, where D = (4/π)A = 1.536m is the maximum diameter of the duct, and L = 2.107m is the length of the duct that is taken from the baseline design.This domain size is sufficient to avoid a blockage effect upon tests.
To model flow near the turbine and immediately downstream with higher accuracy, we use a refinement region of 2D × 2D × 2.5L around the turbine.Within the refinement region, we first apply a level-4 refinement, i.e., each cell of the original mesh is divided into (2) 4 cells in each direction.Then we add around the duct and blade surfaces prism layers that contain further continuously refined cells toward the surfaces (2 and 3 layers are used for the former and latter, both with the expansion ratio of 1.1).The prism layer provides better resolution for boundary layers and is critical for us to obtain well-convergent results in the grid sensitivity study shown later in Section 4. In this work, we use three grid resolutions, M0, M1, and M2, with an increasing number of cells, i.e., further refinement from M0 to M2.The coarsest grid M0 is used in the RANS-MRF in the optimization process and has 2-3 million cells (the exact number depends on turbine geometry and re-mesh procedure in optimization).In M1 and M2, the full mesh region (including background mesh and refinement region) is uniformly refined in each direction by the factors of about 1.3 and 1.6, respectively, leading to 4-5 million cells for M1 and 7-8 million cells for M2.All grids M0, M1, and M2 are used in the URANS-RS re-evaluation, including the grid sensitivity study.

Optimization
We use Sequential Quadratic Programming (SQP) [46], implemented in SNOPT [47], to solve the optimization problem.One of the challenging tasks is to obtain the gradient of the objective function ∇C P with respect to all design variables.Sections 3.5.1 and 3.5.2discuss respectively two components (see Figure 5) involved in the gradient computation: (1) geometry parametrization and mesh deformation; (2) adjoint method.The gradient information is then used in the SQP algorithm to obtain the next design  points.We iterate the procedures until satisfying convergence criteria or further design improvements are not achievable.

Geometry parametrization via FFD method
We need to parametrize the geometry to deform the surface mesh.In this work, we use the Free-Form Deformation (FFD) method [48] for the geometry parametrization, implemented in the package pyGeo [49,50].The principle of the FFD method is to enclose the surface mesh nodes in an FFD box with a specified number of control points (also known as FFD points).The FFD points are analytically connected to the enclosed surface nodes using tri-variate B-splines.More details are presented in Appendix D. Controlling the FFD points enables smooth deformation of the enclosed geometry.Figure 9 shows two examples of geometry deformation controlled by the FFD method in two and three dimensions.
Figure 10 shows an overview of the FFD setup for our ducted turbine.Two levels of FFD boxes are used, with one parent box (black) enclosing all   The child FFD box for the duct contains 112 FFD points placed on seven sections, but overall only one variable is defined to control all FFD point to change the duct length l.When changing the duct length, all duct FFD points move in the axial direction with perturbations proportional to their distances from the throat.This movement is to ensure that the throat is consistently located at 26.4% of the overall duct length.Note that seven sections are not necessary.This choice is mainly for convenience during setup.
The parent FFD box handles the constraint (3f) on the tip-gap ratio and the condition D exit = (4/π)A = 1.536m.Twenty-eight FFD points are placed on seven sections in the parent box, in which the FFD points for the last three sections are closely packed horizontally and remain stationary throughout the optimization, such that D exit = (4/π)A is guaranteed.The 16 FFD points on the first 4 sections from the duct inlet are used to control the duct diameters, i.e., design variables d i .As these FFD points move radially, the child FFD boxes (enclosed in the parent FFD box) deform and move the embedded surface geometry accordingly.Therefore, the constraint (3f) is automatically satisfied since the blade expands/contracts proportionally to the duct throat.This complex FFD setup for ducted turbines, we believe, is novel in the engineering application of the FFD method.

Adjoint method for derivative computation
To compute the derivative of an objective function (in this case, the power coefficient C P ) with respect to design variables, the adjoint method [46] is used.In order to clearly explain the adjoint method, we first introduce notations as follows: Let x ≡ {{θ i , b i } 8 i=1 , {d j } 4 j=1 , l} ∈ R Nx with N x = 21 as the design variables.Let s ∈ R M be the state variables in the solution of the RANS-MRF equation.Here M ∼ O(10 7 ) includes three velocities and pressure at each cell in the computational grid.Our goal is to compute dC P /dx ∈ R 21 .If a finite-difference method is used to compute the derivative, one needs at least 22 CFD simulations for derivative computation, even with the lowest-order approximation.This is computationally prohibitive for our application.
For the adjoint method to compute dC P /dx, the first step is to write the function C P (x, s(x)), and express its total derivative with respect to x as where ∂C P /∂x should be considered as the change of power coefficient C P as the design variables (i.e.geometry) are varied, with flow solution s remaining unchanged.∂C P /∂s is the change of C P as the flow solution s changes with a fixed turbine geometry.These partial derivatives are relatively easy to compute, with more details presented in Appendix E.
The term that is difficult to compute in Eq. ( 8) is ds/dx.To compute it, one needs to further involve the RANS-MRF state equations in terms of their discretized residual form R(x, s(x)) = 0.Here R(x, s(x)) ∈ R M considering the same number of equations as the number of unknowns in s.Since R(x, s(x)) should remain zero with a change of x (if the flow solution is correctly obtained), we have Direct solution of Eq. ( 9) gives It is worthwhile to discuss the computational cost associated with Eq. ( 10) at this point.The matrix multiplication in Eq. ( 10) leads to a computational complexity of O(M 2 N x ) that is very expensive since M ∼ O(10 7 ) and N x is also large.This has to be added by the cost to invert a M × M matrix, which is, in general, more expensive.Even if one uses some iterative solver for linear systems to solve Eq. ( 9), the procedure needs to be repeated for N x times since ds/dx (as well as the RHS) has N x columns.The computation is, therefore, also very expensive.
On the other hand, the computational cost can be significantly reduced by simply substituting Eq. (10) to Eq. ( 8) and considering a re-grouping of the multiplications: Instead of computing Eq. ( 10), we first compute the multiplication grouped in the parenthesis in Eq. ( 11).This computation can be done by solving the so-called adjoint equation (the adjoint is equivalent to the transpose of a real matrix in our case) whose solution transpose provides as the parenthesis term in Eq. (11).The solution of Eq. ( 12) involves solving a linear system only once, instead of N x times as needed for Eq. ( 9), and hence is much less expensive (also compared to the direct computation of Eq. ( 10)).The computational cost to solve Eq. ( 12) is generally similar to the RANS-MRF computation.Therefore, in each iteration of the optimization, the computational cost is in the same order as one RANS-MRF solution.The only remaining component is the calculation of partial derivatives ∂R/∂s in Eq. ( 12), which can be found in Appendix E with other derivatives mentioned above.

Volume mesh deformation
When marching to the next design point, the turbine geometry is deformed.The entire computational volume mesh is deformed accordingly.
The volume mesh deformation is computed based on the analytic inversedistance weighting method [51], implemented in the IDWarp package [52].Given a 2D surface, for example a blade surface, with N surface mesh nodes, the geometry deformation leads to the movement of each node.We assign two quantities (M i , b i ) for each node with i = 1, 2, ..., N , where b i is the translation distance of the node and M i is the rotation matrix such that n new i = M i n old i with n new i and n old i the normal vectors at the node.In particular, both n i 's are computed by a weighted average of the normal vectors for all surrounding cell faces of the node.After (M i , b i ) are obtained for i = 1, 2, ..., N , we can compute the deformation of any volume mesh by summing the contribution from each surface node, i.e., ∆r = N i=1 w i (M i r + b i − r) with r any volume node and ∆r its movement.The weighting factor w i has the empirical form [51,52] that grows in a polynomial form with the inverse distance between the volume and surface nodes.Figure 11 shows the deformation of the computational mesh during a ducted turbine optimization as an example.

Results and Discussion
In this section, we present the results of optimization (3), followed by re-evaluation using the higher-fidelity URANS solver, as well as discussions on the optimized geometry and flow mechanism.Before showing the optimization results, we present two additional studies conducted in our work.
The first is the validation of the RANS-MRF and the URANS-RS solvers with experimental data.Since we are not aware of systematic measurements of the performance of ducted hydrokinetic turbines, we use the experimental results of the unducted Bahaj turbine for validation.Figure 12 shows the power coefficient C P and thrust coefficient C T (the axial force on the turbine normalized by the momentum of the inflow) for the Bahaj turbine at a range of λ, obtained from the RANS-MRF and the URANS-RS, in comparison with the experimental results [1], all at experimental Reynolds number Re = 1 × 10 6 .The RANS-MRF solver is run with a second-order numerical scheme for the convection term in the RANS equations, which will be changed in the optimization process as described later in detail.From Figure 12, it is clear that for the unducted Bahaj turbine, both solvers predict similar results mostly consistent with the experimental data.The URANS solver seems more accurate in evaluating C T for all λ and C P at lower λ (e.g., the value on which our optimization is based).The second is a grid-search study with 5 design parameters using the RANS-MRF and coarse-grid (even coarser than M0) flow solvers that we conducted before the gradient-based optimization.This search, as detailed in Appendix F, does not provide a successful design of the ducted hydrokinetic turbine with improved efficiency (compared to the unducted Bahaj model).This failure is very likely due to the low-dimensional parameter space, which is insufficient to explore effective designs.It also implies that designs with more parameters using a gradient-based method, as we present below, are necessary for designing complex geometries such as ducted turbines.

Optimization and Re-evaluation
We solve the optimization problem (3) using methods described in Section 3.5.In the RANS-MRF solver, a first-order numerical scheme is used for computing the convective term (i.e., to construct the flux in cell faces).We note that the first-order scheme is more dissipative than the normallyused second-order scheme, but the former is critical to obtain convergent flow solutions for many duct designs, especially those associated with flow separation.Specifically, upon extensive tests, we find that the second-order scheme shows fluctuating flow solutions in many cases, which in turn affects the accuracy of the adjoint method, preventing an accurate gradient computation.On the other hand, while the first-order scheme may provide less accurate solutions for cases with separated flow, the obtained C P is usually low for these cases and the optimization leads to designs with no flow separation and with improved efficiency.In each RANS-MRF simulation, we consider the solution converged as the residuals stop dropping, which in general occurs when residuals of momentum equation reach O(10 −4 ∼ 10 −5 ).
Figure 13 shows the change of C P as the optimization progresses, starting from both baseline designs A and B (hereafter optimizations A and B).We first notice that the two starting points yield C P = 25% and 41%, respectively, evaluated by the RANS-MRF.Both values are lower than the counterparts (28% and 45%) reported earlier from the URANS-RS.We see that optimization of both baseline designs leads to a fast increase of C P at the beginning until both C P values (almost) plateau.The small bumps on both curves in Figure 13 correspond to the restarting/re-meshing procedure.This remeshing step is necessary because geometry deformations that are too large lead to mesh quality degradation and hence deteriorate the quality of the flow and adjoint solutions.A manual restarting/re-meshing procedure improves the optimization behavior, leading to additional increases of C P .Both optimizations A and B are stopped when the C P value plateaus even with further restarting/re-meshing.In practice, we find that this convergent situation corresponds to the SNOPT optimality metric [47] of approximately 10 −1.7 , which is consistent with cases in an unducted wind turbine optimization [40].For such optimality conditions, although ∇C P does not completely vanish, the benefit of further optimizing the turbine is compromised by the mesh deformation so that some practical optimal points are reached.The two optimized designs yield similar values of C P , namely 0.4822 from A and 0.4782 from B achieved respectively at λ = 6.39 and λ = 6.18 (since the blade radius R is optimized, which affects λ).
Since the RANS-MRF is the low-fidelity solver (due to issues mentioned in Section 3.2 and the first-order convection scheme), we re-evaluate the optimized designs A and B using the high-fidelity URANS-RS.The obtained values from the URANS-RS on the M0 grid are added to Figure 13, which yield 54.6% and 52.9% for the optimized designs A and B. Here, the difference between two solvers for ducted turbines (due to the complexity of the flow) is remarkably larger than that for unducted turbines as shown in Figure 12.
A grid sensitivity study is also conducted, which evaluates C P for the optimized designs A and B using the URANS-RS on meshes M0, M1, and M2 with increasing resolution.In these URANS solutions, an adaptive time step is used (with an average time step of 1 × 10 −5 seconds, i.e., about 0.01 • turbine rotation for 1 time step) with total simulation times of 15 seconds to reach the quasi-steady state.With the NSF Stampede2 cluster, the simulations using the M2 grid take about 240 hours on 576 CPUs.The results are shown in Table 3, which indicates that C P values vary only by O(1%) (in terms of the absolute value), i.e., they are not very sensitive to the large range of variation of grid resolutions.Based on results from M2, the two final designs yield similar C P ≈ 54% that is much higher than 46% of the unducted Bahaj model (or standard unducted turbines).Table 3: URANS-RS re-evaluation of the optimized design A (left) and B (right) with grid sensitivity studies, including cell numbers, C P , as well as y + values on both duct and blades.These y + values are in the applicable range of the automatic wall treatment (Appendix A).
We further evaluate C P of the two optimized designs at a range of λ using the URANS-RS with the M0 grid.The results are shown in Figure 14 together with C P of the unducted Bahaj model, as well as the optimized unducted turbine using the same setup.We note that the unducted turbine is optimized for fixed Ω = 21 rad/s, corresponding to λ = 6.We see that the optimized ducted turbine designs not only work well for the designed value of Ω but also perform with high efficiency for a large range of λ.Moreover, the maximum C P for each design is in fact not achieved at the designed λ (marked by stars in the figure) but at some larger value of λ.Considering Figure 14, the maximum C P for the two designs are 56% and 54% achieved at λ = 6.94 and 6.99, respectively.
We finally examine the geometries of the optimized designs A and B. Figure 15 shows the optimized duct shapes of the two designs, laid on top of the baseline duct design [19].Overall, we see that the optimized designs have shorter duct lengths and enlarged throat area.The optimized design A has a duct with length of 0.861m (59.1% reduction compared to the baseline ducted turbine) and a throat radius of 0.559m (16.5% increase).The optimized design B has a duct with length of 1.350m (35.9% reduction compared to the baseline ducted turbine) and a throat radius of 0.554m (15.5% increase).The twist and chord length profiles of the optimized/baseline designs are shown in Figure 16.We see that the chord lengths of the optimized designs do not vary much from the baseline, but significant changes occur in the twist profile through the optimization.Moreover, starting from drastically different twist profiles in baseline designs A and B, the two optimized designs converge to very similar twist distributions, especially for r/R > 0.35, where most torque is generated.Different geometries of the optimized designs A and B may indicate that they locate on two local optima in the design parameter space.Given the comparable performance despite the different duct lengths, we conclude that the blade twists and duct throat areas are the driving design parameters.

Analysis of Flow Mechanism
In this section, we analyze the flow fields of unducted turbines and baseline/optimized ducted turbines in order to understand the major flow mechanism leading to the improvement of performance.The optimized design A is used as an illustration.We first plot in Figure 17 variations of some relevant performance metrics together with the variation of C P (first row) in the first 22 iterations of the optimization process (before the first restarting/remeshing).These metrics include the flow rate J passing the turbine blades and the thrust coefficient C T = T /(0.5ρU 2 ∞ A) on the duct and blades, shown respectively in the second, third, and fourth rows of Figure 17.
We also divide the 22 iterations of the optimization process into three stages I, II, and III, respectively: stages with the fast growth of C P (marked by '+' in the figure), slow growth of C P ( * ), and plateau of C P (×).
In stage I, while the duct C T remains almost unchanged, the blade C T increases rapidly.This is the most favorable situation to improve C P since clearly more and more loading from the total is distributed on the blades.In stage II, this favorable variation of C T cannot be maintained (i.e., its potential has been exhausted in stage I), and the opposite trend is observed with increased duct C T and decreased blade C T .The further (slow) increase of C P in stage II, therefore, must be associated with a different mechanism that is perhaps the more effective transition from blade loading to rotational motion (or torque).Both duct and blade C T become unchanged in stage III as C P plateaus.The overall increase of C T in the whole process is 14% and the C P increase is 87%.This is another favorable feature of the current optimization since it would be much more demanding for supporting structures with a much higher C T .Finally, the flow rate J is highly correlated with C P in the whole optimization process and increases constantly until its simultaneous plateau with C P .
The above analysis motivates us to study further the flow rate metric, which remains consistent with the trend of C P in the optimization process.
In Figure 18, we show flow visualizations for the optimized unducted turbine, baseline turbine A, and optimized turbine A, obtained in the quasi-steady solution from the URANS solver.To facilitate a fair comparison, we only show the streamlines in the flow tube that passes the turbine blades.Since the inflow velocity is fixed at 1.4m/s for all cases, the flow rate in the tube is proportional to the area of the tube at the inlet.
From Figure 18, it is clear that the optimized design corresponds to the case with the largest flow tube inlet area.Physically, this indicates that a well-designed duct draws a larger volume of water (compared to unducted and baseline turbines) into the throat, which is accompanied by a higher flow rate (2.25m 3 /s compared to 1.80m 3 /s and 1.55m 3 /s for the other two cases) across the blades.This metric of flow rate is the most effective indicator of the ducted turbine performance, instead of the flow speed at the throat.As an example, the baseline design A is associated with an accelerated flow speed at the throat but not improved efficiency.This analysis explains the  enhanced performance of the optimized ducted turbine through the improved flow conditioning provided by the duct, confirming the long-existing hypothesis in the field of hydrokinetic turbines.

Conclusions
In this paper, we conduct gradient-based design optimization of ducted hydrokinetic turbines using CFD and the adjoint method.Two baseline designs with drastically different performances are chosen as starting points of the optimization.The resultant designs for both cases yield similar performance with C P ≈ 54% when evaluated by the high-fidelity URANS solver.Both designs capture similar critical geometrical features in terms of the duct throat area and blade twist profile.This value of C P is 8% higher than standard unducted turbines, including the Bahaj model.We further demonstrate that the optimized designs not only achieve high C P at the design rotational speed, but also yield high performance over a wide range of rotating speeds and thus λ.Finally, we study the flow mechanism associated with performance improvement and show that C P among different designs is correlated to the flow rates passing the turbine blades.The optimized design corresponds to the case with a maximum flow rate due to the suction of a well-designed duct.
The current work demonstrates the great potential of gradient-based optimization with the adjoint method in designing geometrically complex renewable energy devices such as ducted turbines.Nevertheless, the current optimized design needs further modifications for model tests and real-world applications.Two major issues of the current design are: (1) it has a thinwall-shaped duct that is difficult to manufacture; (2) it does not include a hub that is necessary for installation.Both issues result from the limitation of the FFD method for geometry parametrization that one needs to improve for better designs.We are now working on the Engineering Sketch Pad (ESP) [53] for geometry parametrization, which can, in principle, overcome the above two issues.With the ESP method replacing the FFD method, we expect our next-round design to result in some geometry that is ready for a model test in the towing tank at the University of Michigan.tion grant number #1548562 and also by the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.flow prediction in the near-wall region is insensitive and robust for y + from O(0.1) to O(100), and the automatic wall treatment has been practically applied for cases with the first cell up to the range in the logarithmic layer, i.e., y + < 300.

Appendix B. Derivation of equations in the rotating reference frame
We start by stating transformations of velocity and acceleration in inertial and rotating reference frames where variables with subscript R are evaluated in the rotating reference frame.Here we consider the case dΩ/dt = 0 that is consistent with our application.Derivations of Eq. (B.2) can be found in standard textbooks on dynamics, such as [62].Substituting Eq. (B.1) into Eq.(4a) and making use of the fact that divergence of a curl of a vector is zero, we obtain the continuity equation in rotating reference frame ∇ • U R = 0.In order to derive momentum equations in the rotating reference frame, we consider Eq. (4b) with material derivatives replacing the unsteady and convection terms, to which we can then substitute Eq. (B.2) to obtain The viscous term ∇ • (ν∇U ) can be transformed to ∇ • (ν∇U R ) using the fact that ∇ • [ν∇ (Ω × r)] = 0. We then expand the material derivative in Eq. (B.3) to obtain the momentum equations in the rotating reference frame: While Eq. (B.4) can be considered the final equation in a rotating reference frame, it can be simplified further for easier implementation in the finite volume method.For this purpose, we re-arrange the convection term as where the following equality is used for the third line The final equations in the rotating reference frame therefore yield which is Eq. ( 7) in the main paper.For a steady solution of Eq. (B.7), we can consider U as the unknowns that are consistent with the outer stationary region and use one unified code for the solution with some minor modifications (in terms of forcing Ω × U and the U R term leading to a correction in computing the cell face flux) for the rotating region.
the cell face flux on the target side can be computed by interpolation between the supermesh and target mesh.
In general, the interpolation via supermesh involves a Galerkin projection method described in detail in [45].Here we provide a simple example to illustrate the gist of the approach, as sketched in  in terms of derivative, can be analytically expressed as ∂X(u, v, w) ∂P i,j,k = B i,mu (u)B j,mv (v)B k,mw (w).(D.4) Since X ∈ R 3 and P ∈ R 3 , the derivative on the LHS of Eq. (D.4) requires further clarification.It essentially means that the three piecewise derivatives of one R 3 vector with respect to another R 3 vector are equal with one another and also equal to the RHS.In our case, linked FFD points are used as one degree of freedom (DoF).The derivative of the geometry coordinates with respect to the particular DoF motion can be computed by summing the contribution of each FFD point using Eq.(D.4).

Appendix E. Partial Derivative Computation in the Adjoint Method
This section presents the underlying principle of how the partial derivatives in the adjoint method can be computed.All these derivatives are computed by backward automatic differentiation [46] in DAFoam with graph coloring method for acceleration [64].
Four types of partial derivatives are involved: For the former two, we consider P = Ω r×pdS, where p is the pressure, r is the distance to the rotating axis, the integration is over the blade surface with differential element dS.It is clear that C P depends on both the state variable s (i.e., pressure p) and design variable x (i.e., the integration surface).The derivative ∂C P /∂s can be simply computed by perturbing the pressure field in the flow solution.For computing ∂C P /∂x, we write P = Ω i r i × p i ∆S i which is in discrete form.A perturbation in x results in perturbations of the FFD points, which deforms the geometry.This, in turn, leads to the surface mesh deformation that affects the r i and ∆S i .The partial derivative ∂C P /∂x can therefore be computed using the chain rule to connect all processes.
For the latter two, we consider the discretized governing equation in residual form, i.e.R = 0. Perturbation of s changes the velocity field, pressure field, and the velocity flux (at cell faces) so that R is perturbed.The derivative ∂R/∂s can then be computed correspondingly.Perturbation of x leads to the deformation of all computational mesh in the full fluid domain (discussed in Section 3.5.3).The construction of velocity flux (or more precisely, the coefficients in the algebraic governing equation) is thus affected, leading to the computation of ∂R/∂x correspondingly.

Appendix F. Discrete grid-search in a lower-dimensional design space
This section describes our effort in a brute-force grid search of a lowerdimensional design space.Taking the baseline duct design as a reference [19], we first create 5 sets of duct configurations, each set representing one type of duct with 5 varying designs each.As sketched in Figure F.20, the 5 sets are: (1) varying outlet area from the baseline design; (2) varying outlet area from the baseline design with throat section as the duct inlet; (3) varying both inlet and outlet areas from the baseline design with a finite-thickness duct (outer surface simply as a straight line); (4) varying outlet area from the baseline design with a flange (inspired by Ohya and Karasudani [25]); (5) varying inlet area from the baseline design.In total, these represent 25 duct designs, with the ratio of maximum and minimum duct areas ranging from 1.25 to 2.75.For each of these duct designs, other design variables considered are listed below with discrete values: The RANS-MRF solver is used for each design to evaluate its performance on a coarse grid with 1-2 million cells (even coarser than M0).In total, about 450 cases are run with results of their efficiency C P shown in Figure F.21. Since it is difficult to visualize results in the design space of more than three dimensions, we simply plot all results as functions of λ, i.e., for each λ, a large number of C p values resulting from varying other design variables are stacked.While it is not our goal to further distinguish designs at each λ, it is clear that the maximum C p among all O(450) cases is only 38% evaluated by the RANS-MRF.We pick up a few designs with relatively high C p and re-evaluate their performances using the higher-fidelity URANS solver on the  M0 grid, with results also shown in Figure F.21.We see that the maximum efficiency computed by the URANS is 45%, which happens to correspond to our baseline design B (in terms of both the duct and blade geometry) but is still lower than 46% from the unducted Bahaj model.Therefore, a simple brute-force grid search, as performed here, does not provide any duct turbine design with higher efficiency than that of standard unducted turbines.

Figure 1 :
Figure 1: Physical problem of (a) an unducted turbine and (b) a ducted turbine, with the same device area A, subject to inflow U ∞ in an unbounded domain.

Figure 2 :
Figure 2: Design variables of blades.Gray and blue colors represent original and modified designs, respectively.(a) Blade twist angles ({θ i } 8 i=1 ).(b) Blade section areas ({b i } 8 i=1 ), where we also include the baseline {b B i } 8 i=1 .Blade sections can be contracted (0 < b i /b B i < 1) or expanded (b i /b B i > 1).
(a) Duct sectional radii change.Expansion and contraction (b) Duct length change.Elongation and shortening

Figure 3 :
Figure 3: Design variables of a duct.Gray and blue colors represent original and modified designs, respectively.(a) Duct radial expansion (left) and contraction (right), controlled by sectional diameter variables {d i } 4 i=1 , which are bounded above by D exit and below by the variable d 3 at the throat.The blade radius R is scaled to maintain the tip gap ratio with 2R/d 3 = 0.91.(b) Duct elongation (left) and shortening (right), controlled by the design variable l, with respectively l/l B ≥ 1 and 0 ≤ l/l B ≤ 1.

Figure 4 :
Figure 4: Baseline designs of a ducted turbine.(a) Different views of baseline ducted turbine; (b) Twist distributions of baseline designs A and B.

Figure 5 :
Figure 5: Diagram of the overall process, which includes optimization and re-evaluation of the optimal designs.

Figure 6 :
Figure 6: MRF method illustration.The computational domain is split into stationary and rotating (shaded blue) regions.The inertial and rotating reference frames are denoted by gray and yellow axes, respectively.
Zoomed-in side view

Figure 8 :
Figure 8: Mesh configuration for a ducted turbine, with (a) front view, (b) side view, and (c) zoomed-in side view to show the refinement region.
(a) 2D example of the FFD method (b) 3D example of the FFD method
duct and blade geometries and two children boxes (red and blue) enclosing the duct and blades.The 21 design variables in Section 2.2 can now be represented by 21 degrees of freedom (DoF) associated with the FFD points.The child FFD box for an individual blade has 32 FFD points placed on 8 sections.Twist variables rotate the four FFD points about the reference axis located at the quarter chord line.Scale variables scale the cross-section by moving the four control points to expand or contract simultaneously.The FFD points across different blades are linked to ensure the same deformation for all blades.

Figure 10 :
Figure 10: Geometry parametrization of a ducted turbine through the FFD method.Top: different views of the overall FFD setup, where the parental FFD box (black) controls the radial scales of both duct and blades, the child duct FFD box (red) controls the length of the duct, and the child blade (blue) FFD box controls the pitch/twist angles and sectional areas of the blades.Bottom: Closer views of the FFD setup for the blades.

Figure 11 :
Figure 11: Computational mesh deformation during the geometry deformation of a ducted turbine.

Figure 13 :
Figure 13: History of C P for optimizations A (green) and B (blue).The URANS evaluation of C P for optimized designs A and B is shown by diamond symbols.The restarting points are denoted as asterisks( * ).

Figure 14 :
Figure 14: C P as a function of λ for unducted Bahaj turbine (gray), optimized unducted turbine (dark gray), optimized ducted turbine A (green), and optimized ducted turbine B (blue).The design points (at a given rotating speed) are marked by stars.

Figure 16 :
Figure 16: Distributions of twists (top) and chord lengths (bottom) for baseline designs A (dashed green) and B (dashed blue), optimized designs A (solid green) and B (solid blue).

Figure 17 :
Figure 17: Variations of C P (first row), flow rate J (second row), duct C T (third row), and blade C T (fourth row) in the first 22 iterations of the optimization.Three stages of the fast growth of C P , slow growth of C P , and plateau of C P are marked by plus sign(+), asterisk( * ), and cross(×).
(a) Optimized unducted turbine (b) Baseline design A (c) Optimized design A

Figure 18 :
Figure 18: Streamlines in a flow tube passing the blades of (a) the optimized unducted turbine, (b) baseline design A, and (c) optimized design A.
Figure C.19.With donor mesh T D and target mesh T T that are non-conformal (Figure C.19(a)), our goal is to construct the supermesh T S that serves as a common ground for interpolation.The supermesh T S has to satisfy the properties: 1. Nodes on T S contains all the nodes on T D and T T (and intersections of their edges).2. For every (face cell) element in T S , the intersection of it with any element of T D or T T must either be zero or the whole element.The connection between T D and T S is shown in Figure C.19(b), with the shaded area denoting cell faces of T D .Since each cell face of T D is now perfectly split into triangle cell faces of T S , we can assign properties to the center of the volume supermesh (associated with triangle faces) according to the area ratio of the triangles.In other words, the property at T D is distributed into fractions according to the area fraction of triangles in T S .The properties at T S can then be used for interpolation with that at the target cell T T since each target cell face is also perfectly split into triangle faces of T S (Figure C.19(c)).Specifically, we interpolate between the volume cell of T S and the corresponding volume cell at the T T side to obtain the flux value (which is associated with the corresponding triangle area).These fluxes are then added to form the value for a single cell face of T T .

Figure C. 19 :
Figure C.19:An example of supermesh construction at a non-conformal interface.(a) Donor mesh T D and target mesh T T are non-conformal.A triangular supermesh T S with the shaded area denoting elements (b) on T D and (c) on T T .

Figure F. 20 :
Figure F.20: Five sets of duct designs in the grid-search study.

Figure F. 21 :
Figure F.21: C P 's obtained from O(450) the RANS-MRF simulations for discrete points in the design space, with results stacked to each λ.The URANS re-evaluations of some designs with relatively high C P 's from the RANS-MRF yield results shown by diamonds.

Table 2 :
Setup of optimizations A and B