Mitigating sub-synchronous oscillation using intelligent damping control of DFIG based on improved TD3 algorithm with knowledge fusion

The occurrence of sub-synchronous oscillation (SSO) in wind turbines based on doubly-fed induction generators (DFIGs) threatens the secure and stable operation of the power grid. Conventional sub-synchronous damping controllers struggle to adapt to the changing operating conditions of power systems. This paper introduces an Intelligent Sub-Synchronous Damping Controller (I-SSDC) for DFIGs that integrates deep reinforcement learning (DRL) with domain knowledge to address the limitations of conventional SSO mitigation methods. First, a framework for the I-SSDC is formulated using an improved twin delayed deep deterministic policy gradient (TD3) algorithm that incorporates the Softmax operation. Next, a surrogate model based on weighted linear regression and regularization is constructed to identify the dominant factors influencing SSO; it is used to select the controller's output signal (installation position) and thereby improve decision-making in the I-SSDC. The objective is to enhance the controller's environmental adaptability and interpretability. Moreover, knowledge and experience related to SSOs are integrated into agent training to improve the agent's exploration efficiency. Case studies under various operating conditions of the test power system validate the efficacy of the proposed I-SSDC in suppressing SSOs.

An Intelligent Sub-Synchronous Damping Controller (I-SSDC) based on an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is developed for DFIGs to mitigate SSOs. The inclusion of the Softmax operation addresses the underestimation bias of the TD3 algorithm and enhances its efficacy in the damping control process. A training method employing multiple samples is adopted, tailored to the suppression of time-varying SSOs under diverse operating conditions. A surrogate model is constructed using weighted linear regression and regularization; by explaining the regression model of the participation factors, it enables the selection of the installation position for the I-SSDC. As a result, the I-SSDC offers better interpretability than purely data-driven models. Improvement strategies based on knowledge fusion are proposed to address the low training efficiency of current purely data-driven methods for intelligent agents. These strategies significantly accelerate training convergence, which is beneficial for practical engineering applications. The performance of the I-SSDC is compared with that of a traditional SSDC (T-SSDC) under multiple operating conditions, including wind speed, active/reactive power output of the wind farm, number of wind turbines, and line series compensation degree.

This paper is organized as follows. Section "Methods principle of proposed I-SSDC" presents the system model with the I-SSDC and the mitigation principle. Section "Design of I-SSDC based on the improved TD3 algorithm with knowledge fusion" then proposes a data-driven SSO mitigation method using an intelligent sub-synchronous damping controller. The simulation results under multiple operating conditions are presented in Section "Case study and experimental design". Finally, Section "Conclusion" gives the major conclusions and potential directions for further investigation.

Methods principle of proposed I-SSDC
The equivalent model for SSO damping in a DFIG-based wind farm with the I-SSDC is shown in Fig. 1. The DFIG model comprises the wind turbines (WTs), the DFIGs, the DC link, and the rotor-side and grid-side converter controllers. The series-compensated transmission system consists of a 220 kV line and a 500 kV line, with series capacitor compensation on the 500 kV line. Tables 1 to 3 in Appendix A list the parameters of each WTG, as well as the transmission line and transformer parameters.

Modelling of the DFIG-based WT and its conversion system
The WT is the primary energy-conversion link in the wind power generation system. The mechanical output power and torque generated by the WT can be expressed as follows 24.
Here, ρ represents air density, V_w denotes wind speed, R is the radius of the WT rotor, β is the pitch angle of the variable-pitch system, λ is the tip-speed ratio of the rotor, and c_p is the WT's power coefficient. λ and c_p can be expressed by the following equations:
The DFIG is a high-order, multivariable, nonlinear, and strongly coupled system. In a DFIG, the stator is connected directly to the grid, and the rotor is connected to the grid via a back-to-back converter for AC excitation. The stator and rotor voltage equations in the d-q reference frame can be expressed as follows 25:
Here, u_ds, u_qs, u_dr, and u_qr represent the d-axis and q-axis components of the stator and rotor voltages, respectively; ω_s is the angular velocity of the synchronously rotating magnetic field; ω_r denotes the rotor angular velocity; and ψ_ds, ψ_qs, ψ_dr, and ψ_qr represent the d-axis and q-axis components of the stator and rotor magnetic fluxes.
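Because the aerodynamic equations referenced above are not reproduced in this text, the following Python sketch illustrates the standard relations P_m = 0.5ρπR²V_w³c_p, T_m = P_m/ω_t and λ = ω_tR/V_w. The rotor speed ω_t is introduced here for illustration, and the c_p(λ, β) curve uses widely published textbook coefficients; both are assumptions and may not match the paper's own expressions. The numerical values in the example are likewise illustrative.

```python
import math

def tip_speed_ratio(omega_t: float, R: float, V_w: float) -> float:
    """lambda = omega_t * R / V_w, with omega_t the WT rotor speed in rad/s."""
    return omega_t * R / V_w

def power_coefficient(lam: float, beta: float) -> float:
    """Assumed empirical c_p(lambda, beta) curve (common textbook coefficients).

    The paper's own c_p expression may differ; this is only a placeholder form.
    """
    inv_lam_i = 1.0 / (lam + 0.08 * beta) - 0.035 / (beta ** 3 + 1.0)
    return (0.5176 * (116.0 * inv_lam_i - 0.4 * beta - 5.0) * math.exp(-21.0 * inv_lam_i)
            + 0.0068 * lam)

def mechanical_power(rho: float, R: float, V_w: float, c_p: float) -> float:
    """P_m = 0.5 * rho * pi * R^2 * V_w^3 * c_p, in W."""
    return 0.5 * rho * math.pi * R ** 2 * V_w ** 3 * c_p

def mechanical_torque(P_m: float, omega_t: float) -> float:
    """T_m = P_m / omega_t, in N*m."""
    return P_m / omega_t

# Illustrative operating point (assumed values): 40 m rotor radius, 9 m/s wind.
rho, R, V_w, omega_t, beta = 1.225, 40.0, 9.0, 1.8, 0.0
lam = tip_speed_ratio(omega_t, R, V_w)
c_p = power_coefficient(lam, beta)
P_m = mechanical_power(rho, R, V_w, c_p)
T_m = mechanical_torque(P_m, omega_t)
```

With these coefficients c_p peaks near 0.48 at a tip-speed ratio of about 8 with zero pitch, which is why the example operating point is chosen there.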
The electromagnetic torque is given by Eq. (10), where n_p is the number of pole pairs:
Rotor-side converter (RSC) control consists of an inner current loop and an outer power loop. The references of the inner current loop are generated by the outer power loop from the torque reference given by the maximum power point tracking (MPPT) curve (T_e_ref) and from the reactive power reference (Q_e_ref), respectively. The difference between each reference and the rotor current feedback (i_dr, i_qr) is sent to a PI controller, which yields the rotor voltage control signals u_dr and u_qr. After transformation from the d-q frame to the a-b-c frame and PWM modulation, decoupled power control of the rotor side is achieved. The grid-side converter also employs double closed-loop decoupled control. The references of its inner current loop are obtained by PI controllers from the deviations of the DC voltage (V_dc) and reactive power (Q_g) in the outer loop. The difference between the converter current reference and the feedback (i_dg, i_qg) is fed into the inner-loop PI controller to obtain the grid-side converter voltage control signals (u_dg, u_qg).
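A minimal sketch of the rotor-side inner current loop described above, using a forward-Euler discrete PI controller. The torque function assumes the common amplitude-invariant form T_e = 1.5·n_p·(ψ_ds·i_qs − ψ_qs·i_ds), where the stator currents i_ds and i_qs are not defined in the text; Eq. (10) in the paper may use a different sign or scaling convention. Cross-coupling compensation terms of a practical RSC are omitted, and `u_damp_d`/`u_damp_q` are hypothetical inputs for a supplementary damping signal injected at the voltage level, which is only one of the candidate injection points.

```python
from dataclasses import dataclass

def electromagnetic_torque(n_p, psi_ds, psi_qs, i_ds, i_qs):
    """Assumed amplitude-invariant d-q form: T_e = 1.5*n_p*(psi_ds*i_qs - psi_qs*i_ds)."""
    return 1.5 * n_p * (psi_ds * i_qs - psi_qs * i_ds)

@dataclass
class PI:
    kp: float
    ki: float
    integral: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Forward-Euler discrete PI: u = kp*e + ki*integral(e dt)."""
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral

def rsc_inner_loop(i_dr_ref, i_qr_ref, i_dr, i_qr, pi_d, pi_q, dt,
                   u_damp_d=0.0, u_damp_q=0.0):
    """Rotor current errors -> PI controllers -> rotor voltage commands (u_dr, u_qr).

    u_damp_d / u_damp_q: hypothetical supplementary damping signals added to the
    voltage commands (cross-coupling feed-forward terms are omitted in this sketch).
    """
    u_dr = pi_d.step(i_dr_ref - i_dr, dt) + u_damp_d
    u_qr = pi_q.step(i_qr_ref - i_qr, dt) + u_damp_q
    return u_dr, u_qr
```

Keeping one `PI` instance per axis preserves each integrator's state between control steps, which is what a sampled-data controller needs.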

Mitigation principle of I-SSDC for DFIGs
The mitigation principle of the proposed I-SSDC for DFIGs resembles that of a traditional SSDC, as illustrated in Fig. 1. An electrical signal with a significant sub-synchronous component is selected from the measured electrical quantities at the grid-connected node as the output signal y(t) of the controlled system. This signal is fed into the I-SSDC, which produces a control signal u(t). SSOs in DFIG-based wind farm grid-connected systems stem primarily from the interaction between the RSC controller and the series-compensated line 26. Consequently, u(t) is injected into the RSC control loops as a supplementary control signal, generating damping torque/power and providing positive damping to the system. In contrast to a traditional SSDC, the I-SSDC is designed to adapt to continuously changing environments and operating conditions. DRL, known for its learning and adjustment capabilities, suits complex and dynamically changing systems such as wind farms because it does not rely on a precise system model. TD3 is a DRL algorithm with a deterministic policy and is well suited to decision-making tasks with continuous action spaces. Because the environmental state variables of DFIG-based wind farms are continuous, the I-SSDC adopts an improved TD3 algorithm driven by measurement data to adjust the damping control parameters through reinforcement learning, thereby mitigating SSOs.
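One simple way to check whether a measured signal carries a significant sub-synchronous component, and is therefore a candidate for y(t), is to inspect its spectrum below the fundamental frequency. The sketch below assumes a 50 Hz fundamental and a 5 Hz lower search bound; both are assumptions, and `dominant_subsync_component` is a hypothetical helper, not part of the paper.

```python
import numpy as np

def dominant_subsync_component(x, fs, f0=50.0, f_min=5.0):
    """Return (frequency in Hz, amplitude) of the largest spectral peak in [f_min, f0).

    x: uniformly sampled measurement (e.g. active power at the grid-connected node)
    fs: sampling rate in Hz; f0: assumed grid fundamental frequency.
    """
    n = len(x)
    spectrum = np.fft.rfft(x - np.mean(x))          # remove the DC operating point
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_min) & (freqs < f0)          # sub-synchronous search band
    idx = np.flatnonzero(band)[np.argmax(np.abs(spectrum[band]))]
    amplitude = 2.0 * np.abs(spectrum[idx]) / n     # single-sided amplitude estimate
    return freqs[idx], amplitude
```

The amplitude estimate is exact only when the oscillation falls on an FFT bin; in practice a window function or a parametric method would reduce leakage.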

Principle of I-SSDC Based on DRL
Reinforcement learning is a continuous interaction process between an agent and its environment that seeks an optimal policy maximizing the expected return. Its key components are the environment, the agent, a set of states (s) describing the environment, a set of actions (a) available to the agent, and the rewards (r) given to the agent. This interactive process is depicted in Fig. 2. In the I-SSDC based on DRL, the DFIG-based wind farm grid-connected system serves as the environment, and the measured electrical quantities serve as the agent's state. Based on the state and the received reward, the agent determines its action, i.e., the additional damping control signal.
3. The reward function is the key driving signal for the agent's search for the optimal action strategy. Because the oscillation amplitudes of active and reactive power at the grid-connected node are critical for oscillation suppression, the agent's reward function is designed as Eq. (13), in which the two weighting coefficients require repeated experimentation and tuning during training.
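Eq. (13) itself is not reproduced here. As a hedged illustration only, a reward that penalizes the active and reactive power oscillation amplitudes at the grid-connected node could take a negative weighted-sum form. The half peak-to-peak amplitude measure, the function names, and the weights `w_p` and `w_q` are assumptions that stand in for the paper's tuned coefficients.

```python
import numpy as np

def oscillation_amplitude(x):
    """Half peak-to-peak value over a measurement window (one simple amplitude measure)."""
    return 0.5 * (np.max(x) - np.min(x))

def reward(p_window, q_window, w_p=1.0, w_q=0.5):
    """Hypothetical stand-in for Eq. (13): larger P/Q oscillations give a lower reward.

    w_p and w_q are placeholder weights; the paper tunes its coefficients during training.
    """
    return -(w_p * oscillation_amplitude(p_window) + w_q * oscillation_amplitude(q_window))
```

A reward of this shape is zero only when both windows are flat, so maximizing it drives the agent toward damping the oscillation.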

Design of I-SSDC based on the improved TD3 algorithm with knowledge fusion
Framework of I-SSDC
Figure 3 illustrates the overall schematic diagram of the I-SSDC. The left side depicts the power system environment requiring additional damping, while the DRL agent on the right uses an S-TD3 algorithm, casting the controller design as a Markov decision process (MDP). The S-TD3 algorithm, an enhancement of TD3, incorporates the Softmax operation to control the gap between the value function and the optimal value, improving decision-making. A regression model is established between the electrical quantities of the rotor-side control loops and the contributing factors so that the controller's suppression effect can be adapted to diverse scenarios. On this basis, a surrogate model extracts key features from the electrical quantities to determine the optimal installation position for the I-SSDC. The framework integrates knowledge and experience related to SSO parameters to restrict the agent's exploration space. This is achieved by dividing the experience replay into a successful-experience replay and a failure-experience replay, followed by mixed sampling for agent training. In each episode, the agent attempts an action based on the input state and runs one time-domain simulation. The episode ends if the system's oscillation amplitude is less than ϵ; otherwise, the search continues. Using the samples generated by the interaction between the agent and the environment, the agent is trained to find the optimal action strategy for suppressing SSO with as few action attempts as possible.
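The episode logic and the split experience replay described above can be sketched as follows. This is a minimal illustration under stated assumptions: `MixedReplay`, `run_episode`, the default mixing ratio, and `toy_simulate` (a stand-in for one time-domain simulation, in which each attempt scales the amplitude by 1 − action) are hypothetical and not taken from the paper.

```python
import random
from collections import deque

class MixedReplay:
    """Two replay pools: transitions from successful episodes (amplitude < eps) and
    from failed ones; each training batch mixes both at a chosen ratio."""

    def __init__(self, capacity=100_000, success_ratio=0.5, seed=0):
        self.success = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)
        self.success_ratio = success_ratio
        self.rng = random.Random(seed)

    def add_episode(self, transitions, succeeded):
        (self.success if succeeded else self.failure).extend(transitions)

    def sample(self, batch_size):
        n_succ = min(round(batch_size * self.success_ratio), len(self.success))
        n_fail = min(batch_size - n_succ, len(self.failure))
        n_succ = min(batch_size - n_fail, len(self.success))  # top up if one pool is short
        batch = (self.rng.sample(list(self.success), n_succ)
                 + self.rng.sample(list(self.failure), n_fail))
        self.rng.shuffle(batch)
        return batch

def run_episode(policy, simulate, state, eps, max_attempts=20):
    """Attempt actions until the oscillation amplitude drops below eps, or give up."""
    transitions = []
    for _ in range(max_attempts):
        action = policy(state)
        next_state, reward, amplitude = simulate(state, action)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if amplitude < eps:
            return transitions, True
    return transitions, False

def toy_simulate(amplitude, action):
    """Toy environment: state is the oscillation amplitude; reward is its negative."""
    nxt = amplitude * (1.0 - action)
    return nxt, -nxt, nxt
```

For example, a constant policy of 0.5 starting from an amplitude of 1.0 with eps = 0.1 halves the amplitude on each attempt and succeeds after four attempts.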

Adaptive output signal selection of I-SSDC based on a surrogate model
When an additional damping controller is deployed to mitigate SSO, the choice of the controller's output signal affects its control efficacy. Moreover, the damping settings often need to be adjusted to the operating state of the system 27,28. Candidate output signals for an SSDC acting on the RSC include the power control loops, the d-axis and q-axis current control loops, and the d-axis and q-axis voltage control loops. Selecting the output signal with traditional model-based observability and controllability indices requires extensive online calculation. In this paper, critical features are instead extracted from rotor-side measurement data to identify the optimal controller output, thereby enhancing the model's performance and interpretability.
Identifying effective data features requires understanding the relationship between the electrical quantities and SSO. The DFIG-based grid-connected system is described by linearized state-space equations, in which X denotes the system's state variables, ΔV represents the grid-side input voltage at the WT connection point, and Y = [P, Q] denotes the active and reactive power injected by the WT into the power system. With the conventional modal analysis method 29, the small-disturbance stability of the system is characterized by the eigenvalues of the system state matrix A, and each pair of complex-conjugate eigenvalues corresponds to an oscillatory mode. Participation factors (P_ki) describe the influence of each state variable on each oscillatory mode, as given in Eq. (16).
In Eq. (16), ν_ki and u_ki are the elements in the k-th row and i-th column of the left and right eigenvector matrices, respectively, corresponding to the i-th eigenvalue. P_ki measures the correlation between the i-th mode (associated with the i-th eigenvalue) and the k-th state variable; a larger P_ki indicates a greater influence of the state variable on this mode 30.
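The participation-factor calculation in Eq. (16) can be sketched with NumPy. This is a minimal illustration assuming that the left eigenvectors are normalized so that ν_iᵀu_i = 1, which holds when they are taken from the inverse of the right-eigenvector matrix; the helper names are hypothetical. The second function reports the frequency and damping ratio of an eigenvalue σ + jω, which is how each oscillatory mode is usually characterized.

```python
import numpy as np

def participation_factors(A: np.ndarray):
    """Eigenvalues of A and the matrix P with P[k, i] = v_ki * u_ki.

    Columns of U are right eigenvectors u_i; V = inv(U).T holds the left
    eigenvectors v_i as columns, normalized so that v_i^T u_i = 1.
    """
    eigvals, U = np.linalg.eig(A)
    V = np.linalg.inv(U).T
    P = V * U                      # element-wise product
    return eigvals, P

def mode_frequency_damping(lam: complex):
    """Oscillation frequency (Hz) and damping ratio of an eigenvalue sigma + j*omega."""
    sigma, omega = lam.real, abs(lam.imag)
    return omega / (2 * np.pi), -sigma / np.hypot(sigma, omega)
```

Because of this normalization, each column of P sums to 1, so the magnitudes |P[k, i]| can be compared directly across state variables for a given mode.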
Given the characteristic matrix A of the closed-loop system, the participation factors can be determined as P_ki = ∂(A), where ∂(·) denotes the mapping from the system characteristic matrix to the participation factors. The characteristic matrix A of the closed-loop system depends on the system's operating point M. Sub-synchronous control interaction (SSCI) is caused primarily by the interaction between the RSC control and the series-compensated line, so the influences of the grid-side converter, filters, DC capacitor, and phase-locked loop can be disregarded when analyzing SSCI. Therefore, M is expressed as M = [T_e, Q_e, i_dr, i_qr, u_dr, u_qr], where T_e, Q_e, i_dr, i_qr, u_dr, and u_qr are the electrical quantities of the rotor-side control system. Consequently, the participation factors can be expressed as a function of the system's electrical quantities, P_ki = g(h(M)) = l(M), as in Eq. (17), where g(·) denotes the mapping from the system characteristic matrix to the participation factors, h(·) denotes the mapping from the system operating point to the system characteristic matrix, and l(·) denotes the mapping from the system operating point to the participation factors.
A regression model of the dominant participation factors of the primary SSO mode is established using a neural network method 31 to cover various operating scenarios. Building on this regression model, this section employs a local surrogate model to extract the critical features among multiple input characteristics for the sample under study, thereby determining where the controller is installed.
The linear surrogate model g(z) approximates the original model f (x) 32 , and its form is as Eq. ( 18).
Here, x is the input of the sample, i.e., the rotor-side electrical quantities, and z represents the n important variables in x that significantly affect SSO. The parameters w_i of the surrogate model g(z), obtained by machine learning, serve as the interpretation result, and the electrical quantity with the highest weight is chosen as the injection position for the I-SSDC output.
The training data of the surrogate model are sampled from the input space of the original model, centered on the decision-making data (x_0, y_0), and labeled with the original model's estimates for the sampled data. A linear model g′(x) is first constructed using weighted linear regression with L1 regularization and trained on the sampled data; the sparsity of its parameters highlights the important state variables z that influence SSO. Because the L1 penalty in g′(x) is relatively strong, it introduces a large bias in the fitted parameters. Therefore, with z as input, a surrogate model g(z) is then constructed using weighted linear regression with L2 regularization and trained so that g(z) ≈ f(x). The objective function of the linear surrogate model is given in Equation (19), where L(w) denotes the regularization term and ρ(z_i) is the weight coefficient of the sampled data. L(w) comprises the L1 and L2 regularization terms. The weighting coefficients of the samples can be determined using logistic regression, as shown in Equation (22): the closer a sample is to x_0, the larger its weight during training. σ is a free parameter; the smaller the value of 2σ, the smaller the neighborhood of x over which the linear surrogate model fits.
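The two-stage procedure above (sample around x_0, weight samples by proximity, select features with a weighted L1 fit, then refit the selected features with a weighted L2 fit) can be sketched in NumPy in the spirit of LIME-style local surrogates. Equation (22) is not reproduced here, so the Gaussian proximity kernel exp(−d²/(2σ²)) is an assumption, as are the perturbation scale, the penalty strengths, and all function names; `f` stands for the trained regression model of the participation factors.

```python
import numpy as np

def kernel_weights(X, x0, sigma):
    """Sample weights rho: larger for samples closer to x0 (assumed Gaussian kernel)."""
    d2 = np.sum((X - x0) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def _weighted_center(X, y, w):
    # Weighted centring removes the intercept from both fits.
    return X - w @ X / w.sum(), y - w @ y / w.sum()

def weighted_lasso(X, y, w, alpha, n_iter=200):
    """Coordinate descent on sum_i w_i (y_i - x_i.b)^2 + alpha * ||b||_1."""
    Xc, yc = _weighted_center(X, y, w)
    b = np.zeros(X.shape[1])
    z = w @ (Xc ** 2)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = yc - Xc @ b + Xc[:, j] * b[j]          # partial residual without feature j
            c = np.sum(w * Xc[:, j] * r)
            b[j] = np.sign(c) * max(abs(c) - alpha / 2.0, 0.0) / z[j]
    return b

def weighted_ridge(X, y, w, lam):
    """Closed form of sum_i w_i (y_i - x_i.b)^2 + lam * ||b||_2^2."""
    Xc, yc = _weighted_center(X, y, w)
    A = Xc.T @ (w[:, None] * Xc) + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, Xc.T @ (w * yc))

def explain(f, x0, scale=0.1, sigma=0.5, alpha=1.0, lam=1e-3, n_samples=500, seed=0):
    """Perturb around x0, label with f, select features (L1), refit them (L2)."""
    rng = np.random.default_rng(seed)
    X = x0 + scale * rng.standard_normal((n_samples, x0.size))
    y = f(X)
    w = kernel_weights(X, x0, sigma)
    selected = np.flatnonzero(np.abs(weighted_lasso(X, y, w, alpha)) > 1e-8)
    if selected.size == 0:
        return selected, np.zeros(0)
    return selected, weighted_ridge(X[:, selected], y, w, lam)
```

The index `selected[np.argmax(np.abs(coef))]` then plays the role of "the electrical quantity with the highest weight". Refitting with L2 only on the selected features removes most of the shrinkage bias that the L1 stage introduces.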

S-TD3
The TD3 algorithm, structured as an Actor-Critic system as depicted in Fig. 3, interacts continuously with the power system environment. Through this interaction it learns the parameters of its six neural networks (the Actor, two Critics, and their three target networks), thereby arriving at an optimal configuration of the damping controller. This process is commonly referred to as offline training. TD3 extends DDPG with clipped double-Q learning, delayed policy updates, and target policy smoothing 33. During training, the parameters θ and w of the Actor network (π_θ) and the Critic network (Q_w) are updated by gradient-based optimization of their respective objectives.
The objective of the Actor network is to maximize the value function; its parameters θ are optimized by gradient ascent.
In Eq. (23), n is the number of training samples drawn from the experience replay, and s_t and a_t denote the state and action at time t, respectively. Following the deterministic policy gradient, the parameters θ of π_θ are updated as θ = θ + μ_θ∇_θJ(θ), where μ_θ is the learning rate of the Actor network. Simultaneously, the parameters θ′ of the Actor target network are updated as θ′ = τθ + (1 − τ)θ′, where τ is the update coefficient.
The Critic network optimizes its parameters w by minimizing the loss function Loss(w), defined in Eq. (24), where y_t is the target Q value at time t. TD3 learns two Critic networks, each with a target network (Q′_w1 and Q′_w2), and uses the smaller of the two target estimates when computing y_t. Although this clipped double-Q learning mechanism prevents overestimation of Q values, it can introduce an underestimation bias that degrades performance. To address this drawback, this paper introduces the S-TD3 algorithm, which uses the Softmax operator to estimate the value function. The Softmax operator can regulate the gap between the estimated and optimal value functions, reduce the frequency of converging to local optima, and decrease the algorithm's sensitivity to initialization parameters 34. The target Q value y_t can be expressed as:

$$y_t = r(s_t, a_t) + \gamma\,\mathrm{softmax}\big(Q'(s_{t+1}, a_{t+1})\big) \tag{26}$$

$$\mathrm{softmax}\big(Q'(s_{t+1}, a_{t+1})\big) = \frac{\mathbb{E}_{a_{t+1}\sim p}\left[\dfrac{\exp\big(\beta Q'(s_{t+1}, a_{t+1})\big)\,Q'(s_{t+1}, a_{t+1})}{p(a_{t+1})}\right]}{\mathbb{E}_{a_{t+1}\sim p}\left[\dfrac{\exp\big(\beta Q'(s_{t+1}, a_{t+1})\big)}{p(a_{t+1})}\right]} \tag{27}$$

Here, Q′(s_{t+1}, a_{t+1}) is the minimum of the values given by the two Critic target networks; p(·) is the probability density function of the Gaussian distribution; β is the Softmax parameter; and γ is the reward discount factor. a_{t+1} is computed by the Actor target network π′_θ as a_{t+1} = π′_θ(s_{t+1}) + ε, where ε is noise drawn from a normal distribution. The parameters w of Q_w are updated by the gradient rule w = w − μ_w∇_wLoss(w), where μ_w is the learning rate of the Critic network. The parameters w′ of the Critic target network are updated as w′ = τw + (1 − τ)w′.
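The Softmax target in Eqs. (26) and (27) can be estimated by importance sampling: draw noisy next actions around π′_θ(s_{t+1}), take the clipped double-Q minimum for each, and form the ratio of the two expectations with weights exp(βQ′)/p(a). The NumPy sketch below uses plain callables in place of the Critic target networks; the sample count, noise level, and function names are assumptions, and noise clipping and terminal-state handling are omitted.

```python
import numpy as np

def gaussian_pdf(a, mean, std):
    """Density of independent Gaussian noise, product over action dimensions."""
    z = (a - mean) / std
    return np.prod(np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi)), axis=-1)

def softmax_target(reward, q1_target, q2_target, next_action, beta, gamma,
                   noise_std=0.2, n_samples=50, rng=None):
    """Monte Carlo estimate of y_t = r + gamma * softmax(Q'(s_{t+1}, a_{t+1})).

    q1_target, q2_target: callables mapping an (n, act_dim) array of candidate next
    actions (for one fixed s_{t+1}) to Q' values of shape (n,).
    next_action: output of the Actor target network pi'_theta(s_{t+1}).
    """
    rng = np.random.default_rng() if rng is None else rng
    a = next_action + noise_std * rng.standard_normal((n_samples, next_action.size))
    q = np.minimum(q1_target(a), q2_target(a))          # clipped double-Q minimum
    p = gaussian_pdf(a, next_action, noise_std)
    weights = np.exp(beta * (q - q.max())) / p            # max-shift cancels in the ratio
    soft_q = np.sum(weights * q) / np.sum(weights)
    return reward + gamma * soft_q

def soft_update(target, source, tau):
    """theta' <- tau*theta + (1 - tau)*theta'; the same rule is used for w'."""
    return tau * source + (1.0 - tau) * target
```

As β grows, the weighted average approaches the maximum sampled Q′, and as β shrinks it approaches an unweighted average, which is how β trades off the underestimation of the plain minimum against overestimation.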

Agent training based on knowledge fusion
Knowledge fusion involves merging prior domain expertise with deep learning methodologies to enhance model performance and interpretability.This section presents an improved strategy that integrates relevant knowledge into the training of an agent within a data-driven approach, employing knowledge constraints to guide the agent's exploration space.
The S-TD3 algorithm, a form of DRL, involves the agent interacting with a simulation environment, generating samples subsequently placed into an experience replay for training 35,36 .However, during the initial training phase, the agent is randomly initialized, posing a challenge for the agent to produce high-quality samples during interaction.This challenge results in a slow convergence of the agent toward an approximately optimal decision.To expedite the convergence speed of the algorithm, as depicted in Figure 3, knowledge rules related to SSO analysis are integrated into the decision-making process of the intelligent agent.
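The paper does not spell out the knowledge rules here, so the following is only a hedged sketch of one common way to inject prior knowledge into early exploration: with a probability that decays over episodes, the agent takes the action suggested by a rule instead of its own noisy Actor output, and every action is clipped to a knowledge-derived admissible range. The `rule` callable, the bounds, the decay schedule, and the function name are all hypothetical.

```python
import numpy as np

def knowledge_guided_action(state, actor, rule, episode, low, high, rng,
                            noise_std=0.1, p0=0.8, decay=0.99):
    """Choose the rule-based action with probability p0 * decay**episode, otherwise the
    actor output plus Gaussian exploration noise; always clip to [low, high].

    rule, low, high: hypothetical knowledge inputs (e.g. a damping setting and
    admissible range suggested by prior SSO analysis).
    """
    p_rule = p0 * decay ** episode
    if rng.random() < p_rule:
        a = np.asarray(rule(state), dtype=float)
    else:
        a = np.asarray(actor(state), dtype=float) + noise_std * rng.standard_normal(np.shape(low))
    return np.clip(a, low, high)
```

Early episodes lean on the rule, which supplies better-than-random samples for the replay buffer, while the decaying probability hands control back to the learned policy as training progresses.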
In instances of SSOs, the model of the actually collected measurement signals can be represented as Equation (28) 37.
… feature under the prevailing conditions, as depicted in Figure 4. The sub-synchronous oscillation mode exhibits a pronounced correlation with the stator power of the DFIG-based WT. Consequently, the power loops of the RSC control are selected for sub-synchronous oscillation suppression. To assess the real-time dynamic control efficacy of the I-SSDC under varying system operating conditions, the initial operating state of the wind farm is set to a wind speed (v_wind) of 9 m/s, a DFIG active power output (P) of 0.5 pu, and a DFIG reactive power output (Q_e) of 0. At t = 0.2 s, the series compensation degree (K_c) is increased to 40%, and at t = 1 s, the reactive power (Q_e) changes to −0.5 pu. Figure 5 illustrates the time-domain waveform of the wind farm's active power output.
As depicted in Fig. 5a, the active power oscillation diverges after 0.2 s owing to the increased series compensation degree. The oscillation amplitude is further aggravated by the change in reactive power output after 1 s, which superimposes multiple oscillation frequencies. Figure 5b shows that, with the I-SSDC participating in control, the amplitude of the active power oscillation decreases substantially after the increase in series compensation degree and remains relatively stable after 0.45 s. When the reactive power output changes after 1 s, the I-SSDC adapts to the system variation, and the active power returns to its previous value once the oscillation subsides after 1.4 s. The oscillation frequency of the active power fluctuates after 1.2 s.
When the operating state changes suddenly, the adaptive updating capability of the I-SSDC rapidly reduces the amplitude of the generator's active power oscillations. Once the oscillations are suppressed, the active power output returns to its previous value, ensuring stable operation of the system under varying conditions. As the system operating point continues to change, the I-SSDC exhibits good dynamic tracking control capability and consistently suppresses the SSCI triggered by each disturbance.

Analysis of I-SSDC adaptability across a wide operating range
To assess the adaptability of the I-SSDC, it is compared with a T-SSDC based on the phase compensation principle 9,10 across a broad operating range. The controllers' efficacy is evaluated under diverse operating conditions, including wind speed (V_wind), DFIG active power output (P), DFIG reactive power output (Q_e), a controller parameter (I_d-k_p), and series compensation degree (K_c). A stable state is taken as the baseline operating point A, and operating points B to I are obtained by adjusting parameters relative to it. The operating points correspond to different levels of system stability: points B to E each modify a single operating parameter, points F to H alter multiple operating parameters simultaneously, and point I is an operating point not used in training. Because the operating parameters vary substantially, these points cover a wide range of SSO frequencies, posing a considerable challenge to the controller's suppression capability. The operating conditions for each point are detailed in Table 1, and the comparison between the I-SSDC and the T-SSDC at points B to I is presented in Fig. 6.
As shown in Fig. 6a, operating points B-E change the wind speed, series compensation degree, controller parameter, and reactive power output of the system, respectively. After 0.4 s, the system oscillates at a single frequency. At operating points F-H, several superimposed oscillation frequencies appear because multiple operating parameters change, resulting in larger oscillation amplitudes. Fig. 6b shows that, once the system begins to oscillate, the oscillation amplitude at operating points B-E starts to decrease as the T-SSDC participates in control. A significant reduction in amplitude is observed 0.2 s after the controller engages, and the system returns to stability after 1.2 s. The damping speed varies slightly among operating points B-E because of their different initial amplitudes. At operating points F-H, the system state changes after 0.4 s and the oscillation amplitude decreases only gradually; even after 2 s the oscillation is not fully suppressed, with the worst suppression at operating point H. As depicted in Fig. 6c, the oscillations at operating points B-H subside 0.6 s after the system state change, and the amplitude decreases faster.
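Statements such as "the system returns to stability after 1.2 s" depend on how settling is measured. A minimal sketch of one common definition, assuming that a tolerance band around the final value is chosen by the analyst: the settling time is the first sample after which the signal never leaves the band again. The band value and the function name are illustrative assumptions, not the paper's metric.

```python
import numpy as np

def settling_time(t, x, band, final_value=None):
    """Time after which |x - final_value| stays within `band` for the rest of the record.

    Returns None if the signal is still outside the band at the last sample.
    """
    final_value = x[-1] if final_value is None else final_value
    outside = np.flatnonzero(np.abs(x - final_value) > band)
    if outside.size == 0:
        return t[0]
    if outside[-1] == len(x) - 1:
        return None
    return t[outside[-1] + 1]
```

For a decaying oscillation e^(−t)·sin(2π·10t) with a band of 0.05, this gives a settling time just under 3 s, close to ln(20), as expected from the envelope.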
Although the T-SSDC restores system stability at operating points B to E, the suppression process is slow and the initial oscillation amplitude during control is notable. At operating points F to H, however, SSCI is inadequately suppressed, indicating the limited adaptability of an SSDC with fixed parameters.
The I-SSDC suppresses SSO at operating points B to I with faster convergence and smaller overshoot than the T-SSDC. Notably, the suppression strategy remains effective even at operating point I, which was not included in training, demonstrating the robustness of the I-SSDC. In summary, this comparative analysis across diverse operating environments demonstrates the superior adaptability of the proposed I-SSDC over the T-SSDC.

Conclusion
This paper addresses SSO in a DFIG-based wind farm grid-connected system by introducing an intelligent damping controller that combines domain knowledge with an improved TD3 algorithm, supporting the secure and stable operation of the power grid. A deterministic policy learned by deep reinforcement learning determines the control variable, namely the additional damping control signal, during action exploration. The Softmax operation improves the accuracy of the trained model.

1. The state is the perceptual information that the environment provides to the agent; in the SSO suppression problem considered in this paper, it is the input control signal to the I-SSDC. Common input signals for additional damping controllers include rotor speed, DFIG terminal voltage, and rotor current. The input control signal for an SSDC should be easy to acquire and fast to transmit so as to minimize signal acquisition delays. The state set is defined as the oscillation amplitudes of the electrical quantities of the grid-connected DFIG-based WTs:
2. The action space comprises the relevant decision variables of the optimization model. To suppress SSOs, the RSC controller can be augmented with additional damping control, which injects an extra damping control signal into the control system. The selected controller output signals include the inner- and outer-loop control output signals of the DFIG's RSC. Dual-loop control is advantageous for suppressing SSOs. The action set is defined as the additional damping control signals injected into the control system:

Figure 2. Interaction process between agent and environment.

Figure 3. Overall schematic diagram of I-SSDC.

Figure 4. Influence weights of RSC control electrical quantities.

Figure 5. Suppression effect of the I-SSDC as the operating state changes over time.

Figure 6. Suppression effect of the I-SSDC across a wide operating range.

Table 1. Operating conditions of each operating point.