Main

During courtship, Drosophila males chase and sing to females10,11 (Extended Data Fig. 1a); song is generated via wing vibration and composed into bouts of two primary modes termed ‘pulse’ and ‘sine’ (Fig. 1a). Male song patterning, timing and intensity are known to be modulated by feedback cues stemming from the female3,12. Here we investigate how song production neurons in the brain and VNC4,5,6,7,13,14 are functionally organized to generate different song patterns in different contexts. We utilize a combination of broad-range optogenetic activation in freely behaving animals, automated behavioural quantification, neural recordings and manipulations, and circuit modelling.

Fig. 1: Context-dependent differences in song sequencing in D. melanogaster.
figure 1

a, Drosophila male courtship song is structured into bouts comprising two main modes: ‘pulse’ (p) and ‘sine’ (s). We focus on song bout patterning, although the duration, amplitude and spectral modulation of pulse and sine trains constitute other sources of song variability11,12 (Extended Data Fig. 1c). b, Song bouts consist of either simple pulse or sine trains, or complex sequences involving continuous alternations between modes. c, Population-averaged probability (median ± median absolute deviation from the median) of wild-type males singing simple pulse, simple sine or complex bouts at a given mfDist. The grey vertical line indicates the distance threshold of 4 mm used to define far and near song bouts. d, The distribution of song sequence types differs far versus near the female. Complex p are complex bouts starting in pulse mode. Complex s are complex bouts starting in sine mode. Both far from and near the female, simple pulse bouts constituted the majority of all bouts (more than 95% and around 55%, respectively), followed by complex ‘ps...’ bouts near the female (around 30%). Simple sine bouts constituted the minority of bouts at all distances. e, P(female location) during the production of simple pulse (red, right half) versus complex bouts (purple, left half) in male-centric coordinates (male at origin), averaged across recordings. mfAngle is the angle of the female thorax relative to the body axis of the male. Complex bouts are more likely to be produced when females are close and in front of the male. f,g, Average mfDist (f) and mfAngle (g) during simple and complex song bouts. h, Average duration of simple and complex song bouts. For ch, n = 20 wild-type males (biological replicates) courting wild-type females (see Supplementary Table 2 for genotypes). For d, fh, central mark indicates the median; the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively.Whiskers extend to 1.5 times the interquartile range away from the box edges. For fh, Wilcoxon rank-sum test for equal medians. *P < 0.05, **P < 0.01, ***P < 0.001, NS, not significant.

Source data

Context alters sequencing of male song

Each song bout consists of either simple trains of a single mode (pulse or sine only) or complex trains of rapid alternations between song modes (Fig. 1a,b), and males continually switch between singing simple and complex songs throughout courtship (Extended Data Fig. 1b). Previous work has demonstrated that males produce pulse song, the louder mode, at larger distances to the female, and sine song once closer3,12,13,15, and has suggested that alternation between modes involved dedicated descending pathways for pulse and sine song that mutually inhibit each other to control song output16. We collected a large dataset of courtship interactions, combining high-resolution video and audio3,17 (Extended Data Fig. 1d). When examining song bout composition, we found that at close proximity to the female (less than 4 mm), males sing longer, complex bouts composed of alternations between pulse and sine elements, but beyond 4 mm, they sing shorter pulse-only bouts (Fig. 1c–f); these two contexts occur throughout courtship and also correspond to differences in male forward velocity (Extended Data Fig. 1e–g). Although song bout composition is a smooth function of distance, we term these two contexts ‘near’ and ‘far’ throughout the study, for simplicity. Song bout complexity may be desirable to the female, as the majority of bouts immediately preceding copulation are complex (Extended Data Fig. 1h,i).

Song at all distances is biased to bouts with leading pulse song (‘p’ for pulse-only bout or ‘ps…’ for complex bout starting with pulse; Fig. 1d), suggesting that the song pathway is organized to drive activity in pulse-generating neurons initially, in both contexts. The production of complex sequences might then arise via reciprocal interactions between pulse-producing and sine-producing neurons, but only in the near context. Finally, as the change in song complexity near the female is coupled with longer song bouts (Fig. 1f,h), inhibition to the song pathway (to suppress song when no female is present, or to keep song bouts short when far from a female) may be lifted when the male is near the female. Below we test these hypotheses.

VNC rebound circuits enable complex song

We expressed csChrimson18 in two types of song-producing neurons, either pIP10 brain-to-VNC descending neurons4,14 (one neuron per hemisphere; Fig. 2a) or TN1 VNC neurons5,19 (a population of roughly 30 neurons in the wing neuropil of the VNC divided into 5 subtypes (TN1A–E); Fig. 2a), and analysed song produced following bilateral activation. Even though Drosophila males sing via unilateral wing vibration, both the extended wing and the closed wing receive similar motor activity during song production, indicating that song patterning is independent of wing choice20.

Fig. 2: Reciprocal interactions between pulse-producing and sine-producing neurons in the presence of a female.
figure 2

a, Broad-range optogenetic stimulation of song neurons pIP10 and TN1 (see Methods). One block is 15 trials for 8 s each. Neuron schematic in a was adapted from ref. 5, Elsevier, and ref. 14, Elsevier, under a Creative Commons licence CC BY 4.0. b, Song production per trial and time-resolved song probabilities across trials following optogenetic activation of pIP10 neurons in a solitary male. Responses are shown for 3 out of 20 randomized stimulus blocks. Pulse and sine probability for the third example stimulus block, averaged across n = 20 recordings (bottom). Rebound sine is the production of sine song immediately following pulse song production. Opto stim, optogenetic stimulation. ce,hj, Average song probabilities (b) for all stimulus blocks (distinct stimuli per pair of rows); activation of pIP10 (ce) or TN1 (hj) in solitary males (c,h), males paired with a wild-type female (d,i) and decapitated solitary males (e,j). f, Rebound sine probability following activation of pIP10 neurons (highest irradiance level only) in solitary, female-paired or headless males. Female presence promotes complex bout (pulse followed by rebound sine) generation following pIP10 activation. g,l, Average duration of song bouts generated via activation of pIP10 (g) or TN1 (l) neurons in solitary, female-paired or headless males. Female presence promotes longer song bouts following activation of either neuron type. k, Rebound pulse probability following activation of TN1 neurons (highest irradiance level only) in solitary, female-paired or headless solitary males. Female presence promotes complex bout generation (sine followed by rebound pulse) following TN1 activation. m, Simplified circuit model of song pathway; female cues ‘unlock’ complex bout generation via modulation of post-inhibitory (post-inh) rebound excitability (exc) in pulse-driving and sine-driving neurons of the ventral nerve cord (VNC). Disinh., disinhibition. Pir, post-inhibitory rebound. For cg, n = 20 (solitary), 20 (with female) and 10 (headless) males (biological replicates). For hl, n = 23 (solitary), 28 (with female) and 10 (headless) males (biological replicates). For f,g,k,l, Wilcoxon rank-sum test for equal medians. *P < 0.05, **P < 0.01 and ***P < 0.001. Central mark indicates the median; the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively.Whiskers extend to 1.5 times the interquartile range away from the box edges.

Source data

By utilizing an optogenetic stimulation protocol that spanned multiple orders of magnitude in both irradiance and duty cycle (Fig. 2a,b), we explored how varying activity in these two cell types affected song production. Consistent with previous findings4,5,14, activation of either pIP10 or TN1 neurons in solitary males drove stimulus-locked pulse or sine song, respectively (Fig. 2c,h). However, in a fraction of males, strong optogenetic stimuli drove pIP10 neurons to produce ‘rebound’ sine song following the offset of pulse song (Fig. 2b,c and Extended Data Fig. 2a; consistent with the observation in ref. 13). Strong stimulation of TN1 neurons drove reliable sine song with some intermittent pulse song (Fig. 2h and Extended Data Fig. 2b), as expected given that the TN1 population (see Methods) comprises some pulse-driving neurons5,21.

The restriction of rebound sine following activation of pIP10 to high optogenetic activation levels suggests that the activity dynamics that generate complex bouts are under inhibition in solitary males, possibly due to a lack of male arousal. Consistent with this hypothesis, optogenetic activation of either TN1 or pIP10 in males paired with females reliably drove long bouts of complex song across a broader range of stimulus parameters versus in solitary males (Fig. 2d,f,g,i,k–l and Extended Data Fig. 2c,d) and this complex song was driven predominantly when near the female (Extended Data Fig. 2e,f), suggesting that female sensory cues unlock the ability of pIP10 or TN1 neurons to produce complex song; rebound song is not produced in the presence of males (Extended Data Fig. 2g,h).

Mutual inhibition between pulse-producing and sine-producing neurons (red and blue nodes in Fig. 2m), combined with cell-intrinsic rebound excitability8, could account for complex bout generation in the functionally disinhibited circuit; activity of the pulse node (depicted as a single node, but comprising multiple pulse-driving cell types) would drive pulse song production and inhibit sine production, whereas termination of activity in the pulse nodes would stop pulse song production and release inhibition of the sine node (again, probably comprising multiple sine-driving cell types), leading to post-inhibitory rebound activity and production of rebound sine song. In this simplified model (in which pIP10 provides input primarily to the neurons of the pulse node), the pulse and sine nodes consist of two units: one that provides excitation (to drive motor output) and another that provides inhibition (to suppress the other song mode). As activation of either pIP10 or TN1 neurons in decapitated male flies still resulted in rebound sine or pulse song, respectively, comparable with that produced in intact males (Fig. 2e,f,j,k and Extended Data Fig. 2i,j), the rebound circuit must be fully contained within the VNC.

Neural signatures of the rebound circuit

We next activated pIP10 neurons while recording from Doublesex (Dsx) neurons in the VNC (Fig. 3a,b; see Methods). VNC Dsx+ neurons include both TN1 neurons5 and dPR1 neurons4, and all are excitatory22. Although TN1A neurons drive sine song, other subtypes of Dsx+ TN1 neurons probably contribute to pulse song production5,21. We therefore expected to observe neural activity among the TN1 population both correlated and anti-correlated with pIP10 activation (and therefore implicated in pulse or rebound sine song production, respectively); importantly, these subsets should be distinct from each other across repeated optogenetic stimulation.

Fig. 3: Investigation of rebound dynamics among Dsx+ TN1 neurons of the VNC.
figure 3

a, Two-photon calcium imaging from VNC Dsx+ TN1 neurons combined with optogenetic activation of pIP10 descending neurons (see Methods for details; see Supplementary Table 2 for genotypes). The numbers of pulse-related and sine-related subpopulations are according to refs. 5,21. Schematic in a was created using BioRender (https://biorender.com). b, TN1 neurons show diverse calcium response dynamics following pIP10 activation (123 TN1 neurons across 3 biological replicate flies). Each row shows the normalized calcium response (dF/F) of a single soma, averaged over seven trials (and each trial contained four stimulus presentations), and responses are sorted by their correlation (corr) with the optogenetic stimulus. The optogenetic stimulus pattern was chosen to produce pulse song followed by rebound sine (see Fig. 2c) in solitary males. c, Pairwise correlation of trial-averaged activity between any two TN1 neurons from one hemisphere in one male (to avoid any cross-hemisphere effects due to, for example, wing choice), following pIP10 activation. Examples of strong anti-correlation and correlation are shown on the right (I/II; P < 1 × 10−50), where light-red boxes indicate stimulus intervals. d, Time-averaged fluorescence of the calcium indicator GCaMP6s expressed in Dsx+ TN1 neurons. The red and blue regions of interest correspond to anti-correlated or correlated pairs shown in c. a.u., arbitrary units. Schematic in d was adapted from ref. 5, Elsevier. e, Distribution of calcium response correlation coefficients (computed per hemisphere and per male) across TN1 recordings in n = 3 males. Colours are as in c. Owing to the near-perfect anti-correlation observed for some neuron pairs, we assume that the intermediary neurons of the rebound circuit (Fig. 2m) are mutually inhibitory, in addition to providing inhibition to song-generating neurons. For c,e, ‘i’ and ‘j’ denote indices to pairs of recorded TN1 neurons.

Source data

We observed a broad range of temporal response patterns within the TN1 population following pIP10 activation. The activity of nearly half of the recorded TN1 neurons was positively correlated to the pIP10 stimulus, whereas a smaller fraction showed activity perfectly anti-correlated to the stimulus (Fig. 3b), with alternating peaks in activity persisting beyond the stimulation (Fig. 3c–e and Extended Data Fig. 3a,b). We found these anti-correlated pairs on both sides of the VNC (as expected, to drive pulse–sine rebound activity in both wings). We had expected to detect only a small number of sine-producing neurons, given that the fraction of stimulus presentations with rebound sine rarely exceeded 30% in solitary or headless males (Fig. 2c,e,f). Imaging from either pIP10 axons or Dsx+ dPR1 neurons showed tight correlation to the optogenetic stimulus (not shown). We hypothesize the existence of intermediary inhibitory neurons that are responsible for coupling in the rebound circuit (Fig. 2m).

To confirm the proposed role of rebound excitability in driving sine and complex song, we recorded song of homozygous mutant males lacking the rebound-facilitating hyperpolarization-activated cation current Ih23. These mutant males were able to sing, but sang mostly simple pulse bouts, independent of distance to the female (Extended Data Fig. 3c–f). Reducing expression of either Ih or Rdl (GABA-A receptor, required for post-inhibitory rebound) in TN1 neurons also reduced song complexity (Extended Data Fig. 3g–i).

Female sensory cues enable complex song

To determine what brain mechanisms drive the rebound circuit in the VNC, we explored the role of P1a6,24, a subset of pC1 neurons25 and pC2 (refs. 7,22,26) cell types, previously implicated in song production.

P1a neurons are driven by taste cues collected during tapping27; these neurons in turn can drive a persistent arousal state28. Because P1a neurons have been suggested to be upstream of pIP10 neurons4, we hypothesized that activating P1a neurons in solitary males would mimic our results with pIP10 activation in the presence of a female (Fig. 2d). By contrast, we found that activation of P1a neurons in solitary males produced persistent and variable song (Fig. 4a and Extended Data Fig. 5a) along with suppression of wing extension during the optogenetic stimulus (Extended Data Fig. 5b). Although activation of additional pC1 neurons13 or longer P1a activation28 can drive stimulus-locked song, our data indicate that P1a activity alone is probably insufficient for temporally precise initiation of complex song.

Fig. 4: Acute female sensory cues promote complex song bout generation.
figure 4

a,b,d, Pulse and sine song probabilities following optogenetic activation of P1a (a), pC2 (b), or both pIP10 and P1a neurons (d), in solitary male flies (n = 17, 16 and 16 biological replicates; genotypes are available in Supplementary Table 2). Schematic in a was adapted from ref. 24, eLife Sciences, under a Creative Commons licence CC CY 4.0. Schematic in b was adapted from ref. 7, Elsevier. c, Peak song probability per optogenetic stimulus duration for pC2 neurons (25 μW mm−2). e,f, Peak rebound sine probability (e) and average bout duration (f) for optogenetic activation (25 and 205 μW mm−2) of pC2, pIP10 or P1a neurons in solitary males or males paired with a wild-type female (data shown in a,b,d; Fig. 2c,d). gi, Song amount (g), proportion of simple pulse bouts (h) and song complexity (mean number of pulse–sine or sine–pulse alternations) (i) in pC2 > TNT males paired with wild-type females. j, Automated tap detection (green; see Methods) and mfDist (black) with a 4-mm threshold for far or near context (grey horizontal dashed line) from an example recording (left). Male locations during tap and no tap events, in female (f)-centric coordinates (recording is the same as on the left) (right). k, Examples of simple (top) and complex (bottom) pulse bouts along with detected taps (green). l, Average tap rate (see Methods) before and during simple and complex pulse-leading bouts, driven by activation of pIP10 in males paired with a wild-type female (Fig. 2d; n = 18 biological replicates). m, To fit generalized linear models (GLMs) predicting pulse bout type (simple versus complex), we used movement features or P1 rate (shades of cyan; see Methods) over the 5 s preceding the end of the first pulse train. fmAngle, female–male angle; mfAngle, male–female angle. n, GLM relative deviance reduction for features predicting bout type (m). Input features are ranked by their predictive power (n = 51 model fits on random subsets of data from n = 18 biological replicates; see Methods). fFV, female forward velocity; fLS, female lateral speed; fRS, female rotational speed; mFV, male forward velocity; mLS, male lateral speed; mRS, male rotational speed. o, GLM filters for the four most predictive features in n. p, Updated model of the brain circuitry involved in male song sequencing. In e,f,h,i,l,n, Wilcoxon rank-sum test for equal medians. *P < 0.05, **P < 0.01 and ***P < 0.001. For g, **P < 0.01, two-sample Kolmogorov–Smirnoff test for equal distributions. For gi, n = 21 and n = 17 biological replicates for experimental and control groups. For g,k,m, red and blue indicate pulse and sine song, respectively. For ei,l,n, central mark indicates the median; the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. Whiskers extend to 1.5 times the interquartile range away from the box edges.

Source data

pC2 neurons in males29 consist of two subtypes (pC2l and pC2m; Extended Data Fig. 4a,b) and detect both visual and auditory cues7,26; in the female FlyWire connectome30,31, pC2l neurons receive direct inputs from both visual (lobula columnar neurons) and auditory projection neurons. Activation of pC2 neurons in solitary males drove pulse song followed by rebound sine, similar to pIP10 activation in the presence of a female (Fig. 4b and Extended Data Fig. 5c). We also observed persistent and variable song in the period outside of optogenetic activation, as well as a near-linear relationship between the duration of the optogenetic stimulus and the amount of rebound sine song (Fig. 4c), suggesting that pC2 neural activity controls the transition from simple to complex bout generation. Pulse song is also composed of two main types (Pfast and Pslow)13, and the duration of pC2 activity determined the selection of pulse type: brief activity mainly drove Pfast (like activation of pIP10 in solitary males), whereas more sustained activity increased the relative amount of Pslow (like activation of pIP10 near a female; Extended Data Fig. 5d). These results support the conclusion that pC2 neurons serve as a main determinant of song composition.

Together, these results suggest that pC2 neurons directly drive pulse song production via pIP10, but simultaneously drive P1a neurons to generate persistent song and functionally disinhibit the rebound circuit in the VNC to enable complex song bouts (Fig. 4p). In line with this hypothesis, we found that simultaneous activation of pIP10 and P1a neurons in solitary males produced highly reliable and long complex bouts, well beyond the levels observed for activation of the individual neuron types, including pIP10 activation in males near a female (Fig. 4d–f and Extended Data Fig. 5e).

We next focused on whether pC2 neurons are required for singing. A previous study7 has reported increased amounts of song in males with blocked synaptic transmission in pC2 (via expression of tetanus toxin light chain (TNT))32, so we recreated the pC2 > TNT flies and re-ran the silencing experiment in our new behavioural rig (Extended Data Fig. 1d). By contrast, we found that silencing pC2 chemical synapses led to an overall reduction in song (Fig. 4g), an increase in the relative amount of simple pulse bouts and a reduction in song complexity (Fig. 4h,i). These new results support a model with pC2 at the top of the song circuit hierarchy; a direct connection from pC2 to pIP10 has been confirmed via expansion microscopy33.

We propose that P1a neurons mediate functional disinhibition (rather than direct excitation) of the VNC rebound circuit. Although similar mechanisms, the former is computationally favourable, as disinhibitory gating preserves the dynamic range for processing of sensory information in target neurons, reduces spurious responses9 and is more consistent with our observation that P1a activity does not directly drive song bouts (Fig. 4a). If P1a neurons disinhibit the rebound circuit, then a separate source of excitation is needed to drive song sequences, now identified as pC2 neurons that mediate parallel drive to both pIP10 and P1a neurons (Fig. 4p).

Although male brain connectome data are not yet publicly available, we analysed the female FlyWire connectome30,31 for GABAergic disinhibitory motifs downstream of pC1 neurons (P1a neurons are a subset of male pC1 neurons; pC1 neurons also exist in females). We found that disinhibition is a common motif downstream of all subtypes of pC1 neurons in females (Extended Data Fig. 5f,g). We activated P1a neurons while imaging from all GABAergic neurons in male flies (Extended Data Fig. 5h), and found regions of interest corresponding to neurons with either activity immediately following P1a activation (we term these ‘F1 follower neurons’) or inhibited by F1 follower activity (‘F2 follower neurons’; Extended Data Fig. 5i–k). F2 followers were dispersed, suggesting the existence of multiple disinhibitory circuits (Extended Data Fig. 5f,g).

We investigated the contribution of both visual3,12 and chemosensory27 cues in driving complex song. We found that male tap rate (see Methods; Fig. 4j and Extended Data Fig. 5l) is higher during complex song bouts versus either before these bouts or during or before simple bouts (Fig. 4k,l; also true for wild-type song, Extended Data Fig. 5m), suggesting that acute (tap-triggered) activation of P1a during an ongoing bout, rather than P1a-mediated arousal on longer timescales, promotes complex bout generation. Consistent with this, priming the male (and driving P1a) via exposure to a female (Extended Data Fig. 5o) only weakly enhanced the complexity of optogenetically driven song compared with solitary males not subject to priming (Extended Data Fig. 5p–u). These results corroborate that P1a neurons have a modulatory effect on behaviour at short timescales34.

Using generalized linear modelling3 (Fig. 4m–o; see Methods), we found that reductions in the angle of the female’s body relative to the body axis of the male (mfAngle; see Fig. 1e), in addition to male–female distance (mfDist), within the 1 s leading up to the end of the first pulse train in a song bout, were the most predictive of whether a pulse bout ended and remained simple, or continued to become a complex bout (Fig. 4m–o). Compared with mfAngle, an estimate of P1a activity derived from the tap detection data (see Methods) had only roughly 40% predictive power, suggesting that tapping and the resulting activity of P1a neurons alone do not fully predict bout complexity. These results imply that combined sensory modalities contribute to song bout complexity: probably visual activity (encoding female distance and angle) relayed through pC2 and tap rate relayed through P1a both contribute to driving complex song sequences (Fig. 4p). pC2 neurons can also be driven by auditory activity in the presence of another male7. For wild-type song, these results hold, but in addition the male’s own speed (his forward velocity) is predictive of bout complexity (Extended Data Fig. 5n), consistent with previous work showing that speed influences song choice, even in blind males3. Indeed, in the absence of a female, persistent and variable song driven by P1a activation (Fig. 4a) is preceded by an increase or decrease in self motion, respectively (Extended Data Fig. 7i). Together, these results support a model (Fig. 4p) in which different sensory cues (for example, vision or taste) and parallel pathways contribute to the choice of simple versus complex bouts during male–female courtship.

A circuit model of song patterning

Our behavioural and neural imaging results suggest how naturalistic song statistics arise from the specific functional architecture of the male song circuit. First, we found evidence for a core rebound circuit in the VNC with mutual inhibition between pulse-producing and sine-producing neurons and rebound dynamics in the inhibited nodes (Fig. 2m and Extended Data Fig. 3d–f,h,i). Second, we found evidence for a direct pulse pathway from the brain to the VNC that integrates sensory signals from the female (Fig. 4p). Third, our results suggest a disinhibitory brain pathway onto both nodes of the core circuit, which is driven by sensory input of different modalities (for example, taste and vision via P1a and pC2, respectively). Two mechanisms to drive P1a and downstream circuitry could facilitate continuous complex song production during different aspects of courtship (Extended Data Fig. 1a). To test whether these few computational features are sufficient to explain naturalistic song statistics, we implemented them in a spiking neural circuit model (see Methods; Fig. 5a), comprising only four nodes (termed ‘pC2’, ‘inh’ for inhibitory, ‘p’ for pulse and ‘s’ for sine). Sensory input to the pC2 node was modelled as naturalistic mfDist (see Methods and Supplementary Table 3 for details). This simple model was sufficient to recapitulate naturalistic song bout statistics both far from and near the female (Fig. 5b–e; compare Fig. 1d,f).

Fig. 5: Neural circuit model of context-dependent song patterning.
figure 5

a, Circuit model for male song patterning far from and near a female. mfDist (top), the only input to the model, enters the circuit via the pC2 node (see Methods), which drives the pulse pathway. Strong input (near the female) additionally disinhibits the VNC rebound circuit, enabling complex song production (alternating activity of the pulse and sine nodes). Grey indicates nodes becoming inactive at far or near conditions. Here, bout termination mainly relies on increases in mfDist (Extended Data Fig. 6a,b), consistent with ref. 3. b, Spiking neuronal network of four nodes (pC2, inh, p and s) representing the key computational features of the circuit in a, disinhibition, rebound excitability and mutual inhibition, fit to wild-type courtship data (see Methods). Model simulations with brief and weak (top) or long and strong (bottom) input to pC2 (corresponding to mfDist = 4.2 and 1.5 mm) result in either simple (‘p’) or complex (‘psp...’) song outputs. c, Song statistics for genetic algorithm fits of the model in b to song data at far (top) or near (bottom) distance (see Methods; experimental distributions shown in Extended Data Fig. 7k). The model reproduces bout statistics of courting wild-type flies (see Fig. 1d). d,e, Average mfDist (d) or population-averaged probability (mean ± mean absolute deviation from the mean) at a given mfDist (e) of simulated simple pulse, simple sine or complex bouts (models as in c) matches observations in courting wild-type flies (see Fig. 1c,f). Vertical grey line in e separates near and far contexts. f, Fit error (genetic algorithm objective function) for the full model versus models with individual computational features knocked out (see Methods), or disinhibition replaced with an excitatory motif (‘exc modulation’; see Methods; Extended Data Fig. 7a–c). For cf, n = 24 (ce) and n = 93 (f) genetic algorithm model fits to song (400 and 200 s each for cf) randomly chosen from n = 20 wild-type recordings (biological replicates). For d,e, Wilcoxon rank-sum test for equal medians. *P < 0.05, **P < 0.01 and ***P < 0.001. For c,d,f, central mark indicates the median; the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. Whiskers extend to 1.5 times the interquartile range away from the box edges.

Source data

Removing individual computational features in the model (see Methods) resulted in overall worse fits to the data than the full model (Fig. 5f), especially when removing disinhibition or rebound excitability of the sine node. Fit performance for a model lacking rebound pulse but capable of rebound sine was similar to that of the full model, highlighting the relative importance of rebound excitability of the sine node (compared with the pulse node) as a computational feature of the song circuit. This is consistent with our conclusion that the pulse production pathway is driven directly via sensory input to pC2 and subsequently to pIP10 (Fig. 4p), but that the sine node does not require direct drive. Indirect drive of the sine node explains the small amount of simple sine song observed in both experiments and simulations (Figs. 1d and 5c), as disinhibition-mediated rebound activity can occasionally drive the sine neuron first (Extended Data Fig. 6a), depending on the internal (membrane voltage) states of the sine and pulse neurons. One possible advantage of the proposed song circuit design based on dominant or leading input to one node of a core rebound circuit is simplicity of control, as theoretically, this architecture allows for switching between simple pulse song and arbitrarily complex pulse–sine sequences, by solely adjusting the level and timing of pIP10 activity. To test this, we used closed-loop optogenetic activation of pIP10 during courtship, triggered on the real-time detection of sine song (see Methods; Extended Data Fig. 7d,e), and found that such activation increased both bout complexity and duration (Extended Data Fig. 7f), uncovering that in Drosophila, patterned activity of a single descending neuron (acting on a disinhibited VNC circuit due to female presence) suffices to generate highly complex song outputs.

Experimental data were best described when our circuit model comprised a disinhibitory motif, not a quasi-equivalent excitatory motif, as this failed to produce song bouts with leading sine song (Fig. 5f and Extended Data Fig. 7a–c). In principle, context-dependent (dis-)inhibition could also be achieved via combinations of descending neuromodulatory or peptidergic systems, and ionotropic systems, although such modulation would need to be on timescales of milliseconds to seconds. In addition, in the biological circuit, other factors such as spike-frequency adaptation (present but not explicitly modelled here; Fig. 5b) could have a role. In line with this hypothesis, we performed in vivo patch-clamp recordings of pIP10 and found clear signs of spike-frequency adaptation (Extended Data Fig. 8a–c).

Our circuit model predicts that blocking descending inputs to the core pulse node should strongly reduce the amount of bouts with leading pulse song. To test this prediction, we re-examined published data13 with expression of TNT32 or inward-rectifying potassium channels (Kir2.1)35 in pIP10 neurons. As both TNT and Kir2.1 prevent chemical synaptic transmission, as expected, the amount of simple pulse-only bouts during courtship with a female was significantly reduced compared with genetic controls (Extended Data Fig. 7g). However, males expressing TNT (but not Kir2.1) in pIP10 produced more sine-leading bouts than controls (Extended Data Fig. 7h), suggesting a potential role for electrical synapses (which remain intact in TNT flies) in mediating sine song generation. Electrical synapses between pIP10 and the inhibitory interneurons of the pulse-rebound circuit (Extended Data Fig. 7j) might help to generate the near-perfect anti-correlation between subsets of TN1 neurons that we observed (Fig. 3c–e and Extended Data Fig. 3a,b).

Discussion

The ability to alter the sequencing of actions to match the current environmental context is observed across animals and behaviours, including for social interactions36,37,38. Here we provide insights into the underlying mechanisms by focusing on song production in two contexts in Drosophila melanogaster: near versus far from a female. Using quantitative behaviour, modelling, broad-range optogenetics, circuit manipulations and neural recordings, we found that simple song (of primarily the pulse mode) is driven by low-level or brief activation of pC2 brain neurons, which drive a pair of pIP10 brain-to-VNC descending neurons. To generate complex bouts, stronger, longer-duration pC2 neuron activity simultaneously drives pIP10 and recruits P1a neurons to functionally disinhibit core circuitry in the VNC, allowing pIP10 descending signals to produce rapid alternations of pulse and sine song. Song alternations are facilitated by combination of mutual inhibition and rebound excitability in pulse-driving and sine-driving neurons of the VNC, allowing for sine song production without the need for excitatory drive. Here, the sensory context, encoded ultimately by acute P1a neural activity, determines which song repertoire (simple pulse or complex) is accessible to descending commands, effectively implementing context dependence via two operational modes of a single circuit39.

Context dependence of acoustic communication is known in other species, including songbirds40 and primates41; the circuit mechanisms that we have uncovered here may therefore serve as a useful template in investigating those systems at the cellular level. The presence of the female has opposing effects on song variability in flies and birds, species in which females prefer either variable42 or stereotyped43 song, respectively. In flies, we showed that female proximity relieves the core song circuit from inhibition to promote song variability (rapid pulse–sine alternations of varying length), whereas in birds, female presence suppresses song variability via direct inhibition of basal ganglia neurons44.

Context dependence has also been reported for escape responses in noctuid moths, crickets and flies; in the moth, two distinct wing motor patterns (directed turning away from low-intensity ultrasound and power dive to escape high-intensity ultrasound) arise from continuous changes in sensory cues45, similar to our finding of context-dependent changes in song output. In crickets and flies, context dependence of escape behaviours is achieved via gating of a single ascending interneuron by the flight motor pattern generator46, or via state-dependent gating of descending neuron activity47, similar to our proposed role of P1a brain neurons in mediating context-dependent song patterning via functional disinhibition of the VNC circuit.

Relating our results with previous work on song production and patterning in Drosophila, we show that first, previous work has suggested that pIP10 neurons drive only the pulse mode of song4,14; however, those studies did not explore the broad range of optogenetic activation parameters used here, highlighting the value of varying neural activity levels during behaviour to uncover circuit dynamics.

Second, although our computational model of the song circuit can recapitulate song dynamics using only mfDist as contextual information, previous work has demonstrated that the male’s own locomotor speed is also highly predictive of song patterning; however, although we do not yet know where self-motion information enters the song pathway, our model predicts that it should be integrated at the level of pIP10 or downstream, pushing the song pathway towards pulse song production, without engaging the disinhibition arm of the pathway (via P1a neurons) that would lead to sine song production.

Third, previous work uncovered that there are two distinct types of pulse song termed Pfast and Pslow, and that the choice of pulse type depends on distance to the female13: males produce Pfast (the louder mode of song) at further distances, switching to Pslow (the softer pulse type) when close. Our data indicate that the relative amount of Pfast and Pslow is ultimately controlled by the activity of brain pC2 neurons (Extended Data Fig. 5d). How VNC neurons4 coordinate the production of the two pulse types remains to be elucidated, but they must ultimately act via the ps1 motor neuron5, which has been shown to be required for males to switch from Pslow to Pfast when far from females13.

Fourth, our study also provides a mechanistic explanation for a previous discovery of two hidden internal states in the male brain underlying song production, termed ‘close’ and ‘chasing’48. Our work suggests that the P1a disinhibition arm of the pathway underlies the difference in these two states; in the close state, in which sine song dominates and males are close to females, the P1a disinhibition circuit is engaged and sensory-driven pIP10 activity drives pulse–sine complex bouts. In the chasing state, in which males are farther from females and moving faster, the P1a disinhibition circuit is not engaged and pIP10 activity drives primarily pulse-only simple bouts. This interpretation explains the observation that males continually toggle between close and chasing states throughout courtship, that close-state durations are longer than chasing-state durations, and why activation of pIP10 neurons in the presence of a female paradoxically both drove pulse song and pushed males into a state (close) that promoted sine song production48.

Last, our work also adds to the range of roles of the P1a neural cluster in modulating social behaviour at different timescales24,26,28,34,49. Although previous work emphasized the role of P1a in gating and sustaining male courtship behaviour by controlling a minutes-long arousal state, here we identified an acute role for P1a in shaping behaviour, similar to ref. 34. We showed that recent activation (timescales of milliseconds to seconds) of P1a neurons unlocks the potential for males to produce complex song (whereas separately, P1a neurons promote persistent singing). This may explain why males continually tap females throughout courtship: not only to maintain arousal but also to gate the production of long (complex) song bouts preferred by the female42.

Our computational model of the song circuit reveals that few key features (mutual inhibition, rebound excitability and disinhibition) are sufficient, in combination with excitatory drive from fluctuating contextual cues, to recapitulate natural song dynamics (Fig. 5). These same features have been shown to contribute to motor pattern generation in both invertebrates and vertebrates50,51,52,53, although they are combined in new ways within the male song circuit. Such a minimalist circuit design both offers a simple control mechanism for reacting to rapid changes in sensory context, and requires only few developmental changes to either derive this circuit from a unisex template16 or alter the circuit to generate new song types in other species14. Yet, we do not rule out the existence of redundant or additional pathways, including descending connections to sine-driving neurons in the VNC. Although emerging connectomes for the male brain and VNC19 will reveal additional neurons and circuit elements that shape male song patterning (for example, uncovering the circuits that mediate functional disinhibition downstream of P1a excitatory neurons or the detailed connectivity between VNC neurons downstream of pIP10), our study highlights how hypotheses about circuit function can be tested via quantitative analysis and modelling of natural, context-dependent behaviour.

Methods

Fly strains and rearing

See Supplementary Tables 1 and 2.

Behavioural apparatus

Behavioural experiments were performed in two custom-made circular chambers (modified from ref. 13) within black acrylic enclosures. Ambient light was provided through an LED pad inside each enclosure (3.5′′ × 6′′ white, Metaphase Technologies). For each chamber, video was recorded at 60 fps (FLIR Blackfly S Mono 1.3 MP USB3 Vision ON Semi PYTHON 1300, BFS-U3-13Y3M-C, with TechSpec 25 mm C Series VIS-NIR fixed focal length lens) using the Motif recording system and API (loopbio GmbH), run via Python 2.7, and using infrared illumination of around 22 μW mm2 (Advanced Illumination High Performance Bright Field Ring Light, 6.0′′ O.D., wash down, IR LEDs, iC2, flying leads) and an infrared bandpass filter to block the red light used for optogenetics (Thorlabs premium bandpass filter; diameter 25 mm, central wavelength = 850 nm, full width at half maximum = 10 nm). Sound was recorded at 10 kHz from 16 particle velocity microphones (Knowles NR-23158-000) tiling the floor of each chamber. Microphones were hand-painted with IR absorbing dye to limit reflection artefacts in recorded videos (Epolin Spectre 160). Temperature was monitored inside each chamber using an analogue thermosensor (Adafruit TMP36).

Optogenetics

Flies were kept for 3–5 days on regular fly food or food supplemented with all-trans retinal (ATR) at 1 ml ATR solution (100 mM in 95% ethanol) per 100 ml of food. ATR-fed flies were reared in the dark. CsChrimson was activated at 1−205 μW mm2, using 627-nm LEDs (Luxeon Star).

Behavioural assays

For all behavioural experiments, virgin males and virgin females were used 3–5 days after eclosion. Experiments were started within 120 min of the incubator lights turning on. Males and females were single and group housed, respectively. Flies were gently loaded into the behavioural chamber before an experiment, using a custom-made aspirator. Females were placed first for paired experiments. Chamber lids were painted with Sigmacote (SL2, Sigma-Aldrich) to prevent flies from walking on the ceiling, and kept under a fume hood to dry for at least 50 min before an experiment. Videos were manually scored for copulation. Data beyond copulation were excluded from analysis, unless statistical biases required exclusion of the entire recording.

Free courtship

Free courtship recordings were performed for 30 min, as previously described3.

Optogenetic neural activation

A fixed stimulus frequency of 1/8 Hz was used for optogenetic neural activation. Stimulus irradiance could take four distinct values (0, 1, 25 and 205 μW mm−2), spanning three orders of magnitude, and stimulus duty cycle could take five distinct values (1/64, 1/32, 1/16, 1/8, and 2/8), and both irradiance and duty cycle were combined in a full factorial design, resulting in 16 distinct blocks (pooling blocks with zero irradiance) that were presented in pseudo-randomized order for 120 s each.

Offline song segmentation

For subsequent offline analysis, song was segmented as previously described11,13, using a modified sine detection parameter to account for different acoustics in the setup used here (Params.pval = 1 × 10−7). For a given recording, the output of the song segmentation algorithm included information about the start and end of each bout and each sine train, as well as the centre of each detected pulse, and a snippet of noise not including song. To reduce the risk of contaminating bout statistics with artificially split bouts due to low amplitude of sine song (the softer song mode), we excluded all bouts containing sine song with amplitude below a chosen signal-to-noise (SNR) threshold. Specifically, we estimated the noise amplitude using the noise segment that is automatically detected and returned by the song segmentation software (thus not containing song), by first reducing the 16-dimensional (for 16 microphones) noise segment to a one-dimensional vector by storing the noise value of the loudest microphone at each time point, and then defining noise amplitude as the 99th percentile of the absolute value of the one-dimensional noise vector. Sine amplitude was calculated similarly, such that the SNR for a given sine bout was the ratio of the sine amplitude and the noise amplitude. We excluded bouts containing sine song with an SNR below 1.3 from further analysis. Furthermore, the song segmenter occasionally split individual sine trains, due to intermittent noise. Uncorrected, this could, for example, split a ‘psp’ bout into one ‘ps’ and one ‘sp’ bout very close in time. This allowed us to use a simple temporal threshold to merge such bouts if the inter-bout interval was below 0.5 s. The segmentation software is freely available at https://github.com/murthylab/MurthyLab_FlySongSegmenter.

Tracking

Male and female poses (locations of head, thorax, and left and right wing tip) were automatically estimated and tracked, and manually proofread for all videos using SLEAP17 (sleap.ai).

Song behaviour analysis

Song probabilities

For experiments with open-loop optogenetic neural activation, the probability for a male to sing pulse or sine song at any point in time during a trial of a given stimulus block was computed as the fraction of trials containing pulse or sine song. For analyses separating song probabilities into far and near contexts, the average mfDist within a trial was thresholded to assign the trial to one of the two contexts. Song probabilities for each context were then calculated using only those trials assigned to that context.

Song sequences

Song segmentation provided information about the start and end of each bout, and all pulse and sine events within a bout, allowing to assign each bout a label describing the sequence of contained pulse and sine trains (‘p’ for a bout containing only pulse song, ‘spspspsp’ for a bout starting with sine song followed by several alternations between pulse and sine). For statistics, we reduced the amount of different bout types by abbreviating all bouts with one or more song alternations as ‘ps...’ or ‘sp...’ and referred to these as ‘complex p’ or ‘complex s’. ‘Acute’ and ‘persistent’ bouts were defined as bouts starting during a stimulus or after stimulus offset, respectively. Rebound song was defined as song that started after stimulus offset, in a bout that started during a stimulus (for example, if the initial pulse train in a ps bout starts during a stimulus, but the following sine train starts after stimulus offset, that is considered rebound sine).

Tap detector model

The tap detector model was constructed using a convolutional neural network. The convolutional neural network consisted of two two-dimensional convolutional layers followed by two fully connected layers. The two convolutional layers had 32 output and 64 output channels, respectively, a kernel size of 5 and a stride of 1. The outputs of each convolutional layer were passed through a rectified linear unit nonlinearity and a two-dimensional max pooling layer with a kernel size of two and stride of two. The first fully connected layer had 53,824 input and 32 output features followed by a rectified linear unit nonlinearity, and the second fully connected layer had 32 input and 2 output features corresponding to scores for a tap or non-tap. The model was trained using the AdamW algorithm for 100 epochs with a batch size of 16 and a learning rate of 0.0001. The model was constructed and trained using the PyTorch library54.

To train the convolutional neural network, video frames (size 128 × 128) of courting flies centred on the male were manually labelled as a tap or non-tap event using a custom graphical user interface. Ten videos were used for creating the tap dataset, with 12,606 manual annotations total. Of these annotated frames, 70% were used for training and 30% were held out for model validation. Receiver-operating characteristic analysis was performed on held out data to determine the relationship between model recall (true-positive rate) and fallout (false-positive rate) as a function of tap detection threshold.

Tap rate analysis

Tap rate was quantified as the number of taps within a song bout, divided by the duration of the bout (to compare with time before a bout, we used the number of taps within an equally sized window preceding the bout, divided by bout duration).

Tap-based model of P1a neural activity

We convolved the binary output of the tap detection network (tap = 1/no tap = 0, using a threshold on tap probability of P(tap) ≥ 0.9) with the known calcium fluorescence of P1 neurons in response to a single tap of the female abdomen (tap-triggered average27) to get an estimate of P1a neural activity in freely courting males on a moment-to-moment basis. We deconvolved the estimated calcium fluorescence signal with a kernel of the GCaMP6s calcium response (time constant of 2.6 s)55 to obtain an estimate of P1a rate, which we used for further analysis.

Bout-triggered analysis of tap rate

For a given recording, the binary tap detector output at video resolution was first upsampled to audio resolution, using the camera trigger signal for synchronization. For each song bout with leading pulse song (simple p or complex ps...), the number of detected taps occurring during the bout, nduring, was counted, and this was divided by the duration of the bout, B, to produce tap rate during the bout, Rduring = nduring/B. As a control, the tap rate before the bout was computed as the number of taps occurring in an equally sized time window B immediately preceding the bout, Rbefore = nbefore/B. Tap rates were averaged (using the mean) per animal across simple and complex bouts and used for further analysis.

Generalized linear model analysis

To estimate the relative predictive power of different sensory features on the choice of bout (here, complex versus simple p), we used the generalized linear modelling framework with a sparse before penalize non-predictive history weights, as previously described3,56. In brief, ten sensory features (male and female forward velocity (mFV and fFV), lateral speed (mLS and fLS), rotational speed (mRS and fRS), the angle of the male (female) thorax relative to the female (male) body axis (fmAngle and mfAngle), the distance between the male and female thorax (mfDist), and the instantaneous rate of P1a neurons estimated from detected taps (P1 rate)) were first smoothed using a moving average filter with a width of 20 video frames (0.33 s). Then, 21 uniformly distributed samples were extracted from the smoothed features within the 5 s of history leading up to the end of the first pulse train of each bout with the leading pulse song (for simple pulse bouts, this corresponded to the end of the bout). Extracted features were z-scored per feature, to account for different feature dimensions and scales. Inputs to the generalized linear model (GLM) were the transformed features and a corresponding binary vector indicating whether a given feature history corresponded to a simple or complex pulse bout, and outputs were estimated filters for each feature (providing information on which dynamics in the feature, within the history window, were most predictive for bout type) and the relative deviance reduction (a measure of model performance). To estimate fit robustness, we repeated GLM fitting 51 times, each time using 70% of the input data (sampled randomly without replacement). For each feature, the mean across fits and the mean absolute deviation from the mean across fits were calculated and used for display.

Two-photon functional imaging

We imaged the activity of Dsx+ cells in the VNC following pIP10 optogenetic activation using a custom-built two-photon laser scanning microscope57,58. Virgin male flies (5–8 days old) were mounted and dissected as previously described59, with minor differences. In brief, we positioned the fly ventral head and thorax side facing up to the underside of the dissection chamber, exposing both the ventral side of the central brain and the ventral side of the VNC. From the head, we removed the proboscis, surrounding cuticle, air sacks, tracheas, and additional fat or soft tissue. From the VNC, we removed thoracic tissue ventral to the VNC (for example, legs and cuticle), exposing the first and second segments of the VNC. Perfusion saline was continuously delivered to the meniscus between the objective and the dissection chamber throughout the experiment. We imaged Dsx+ TN1 cells (one hemisphere at a time), located in the ventral side of the second segment of the VNC. Specifically, although we used flies that express the calcium indicator GCaMP6s in all Dsx+ neurons, we only imaged the prothoracic and mesothoracic neuromeres, and the accessory mesothoracic neuropil of the VNC. Together, these regions house the Pr1–3, Pr4, Ms1–3 and TN1 cluster of neurons60, whose somas have distinct and identifiable locations. We manually segmented somas from these regions that, based on their anatomical location, were unambiguously identified as TN1 neurons. TN1 can be distinguished from dPR1 (which belongs to the Pr1–3 cluster) based on the position of the somas in the anteroposterior axis. Similarly, TN1 can be readily distinguished from its neighbouring clusters (Pr4 and Ms1–3) based on its more lateral and ventral location relative to the accessory mesothoracic neuropil, as well as the smaller size of its somas. Our manual segmentation was based on these criteria rather than on neural responses. We recorded 3–4 subvolumes of approximately 70 × 70 × 20 µm3 at a speed of 1 Hz (0.3 × 0.3 × 2 µm3 to 0.4 × 0.4 × 2 µm3 voxel size), covering the full ventral-to-dorsal extent of the TN1 cluster (~70 µm). Volumetric data were collected using ScanImage 2017 and processed using FlyCaIMan58 (https://github.com/murthylab/FlyCaImAn) via Matlab 2018b. In brief, volumetric time series of the GCaMP6s signal was motion corrected in the xyz axes using the NoRMCorre algorithm61, and temporally resampled to correct for different slice timing across planes of the same volume and to align timestamps of volumes relative to the start of the optogenetic stimulation (linear interpolation). Subvolumes consecutively recorded along the z axis were stitched along the z axis using NoRMCorre. Dsx+ TN1 somas were segmented by using the constrained non-negative matrix factorization algorithm to obtain temporal traces and spatial footprints of each soma as implemented in CaImAn58,62 (the initial number and xyz location of all TN1 somas were manually pre-defined). For pIP10 activation, we used an optogenetic protocol that combined long stimuli driving strong pulse and weaker rebound sine when activating pIP10 in solitary, freely behaving males (Fig. 2c,e). Specifically, we used a stimulus of 2 s ON (at 13 µW mm−2 irradiance) and 2 s OFF repeated four times to maximize the magnitude of evoked GCaMP responses. Imaging started 10 s before stimulus onset, where baseline activity was measured, and lasted 10 s after stimulus offset.

Neural circuit model of song bout statistics

Network simulations were performed using the Brian2 package63 with Python3. Individual neurons were defined as variants of the Izhikevich model64 with known spiking properties (such as rebound or tonic spiking) that matched experimental predictions. In brief, the neuronal membrane potential v was modelled via three ordinary differential equations:

$$\frac{{\rm{d}}v}{{\rm{d}}t}=0.04{v}^{2}+5v+140-u+I+\frac{{g}_{e}+{g}_{i}}{{\tau }_{{\rm{syn}}}}$$
(1)
$$\frac{{\rm{d}}u}{{\rm{d}}t}=a(bv-u)$$
(2)
$$\frac{{\rm{d}}{g}_{e,i}}{{\rm{d}}t}=-\frac{{g}_{e,i}}{{\tau }_{e,i}},$$
(3)

with the membrane recovery variable u, the timescale a and the sensitivity b to subthreshold fluctuations of the membrane potential of the recovery variable, and the input current I. ge and gi are excitatory and inhibitory conductances, and τsyn is the synaptic time constant. Whenever the membrane potential reached 30 mV, this was considered an action potential and the membrane variables were reset via

$$v=c,\qquad u=u+{\rm{d}}.$$
(4)

The full song circuit model comprised four Izhikevich neurons, termed p (pulse), s (sine), pC2 and inh. Parameters a, b, c and d were chosen to enable post-inhibitory rebound dynamics for the pulse and sine node, and tonic spiking for the pC2 and inh nodes (Supplementary Table 3). Inhibitory connections were defined mutually between pulse and sine, from inh to both pulse and sine, and from pC2 to inh. A single excitatory connection was defined from pC2 to pulse. Together, pC2 provided excitatory input to the pulse node and functional inhibition to the inh node, mimicking the direct pulse pathway from pC2 via pIP10 to the VNC, and the proposed disinhibitory pathway from pC2 via P1a (activated for short mfDist and strong input to pC2; Fig. 5a), respectively. For each spike in a presynaptic neuron, the synaptic conductance ge,i was incremented by we,i. we,i were free parameters that were fit during genetic algorithm optimization. The remaining free parameters were the amount of tonic input current into the inh node (Itonic, regulating the amount of tonic inhibition onto the core pulse–sine circuit, mimicking the male’s default, unaroused, state), and a multiplicative factor Ie that controlled the gain of the sensory input current into pC2. The sensory input current into pC2 was the mfDist during a given recording of wild-type courtship, subjected to nonlinear (NL) transformation via

$${I}_{{\rm{pC2}}}={I}_{e}\cdot \,{\rm{NL(mfDist)}},$$
(5)
$$\,{\rm{NL(mfDist)}}\,=\frac{\alpha }{1+\exp \left(-\beta \cdot ({x}_{0}-{\rm{mfDist}})\right)},$$
(6)

to facilitate strong/weak input current to pC2 at short/large distance. Numerical simulations of the network were performed using Euler integration, and spike times of each node were recorded for further analysis. Specifically, ‘song sequences’ of the model were defined based on the activity of the pulse and sine node, such that a coherent spike train of one node that was at least 300 ms separated from the next spike of the other node was considered a simple bout, whereas alternating activity of the two nodes within 300 ms was considered a complex bout. This simplifying assumption allowed us to fit the model to experimental song statistics, using genetic algorithm optimization (see below). We did not explicitly model a mechanism to control bout duration, and we expect that additional features such as recurrent excitation in the pulse and sine nodes are required to sustain pulse or sine trains. All model parameters are specified in Supplementary Table 4.

Genetic algorithm optimization

The distribution of model bout types in response to a given naturalistic stimulus was directly comparable with the actual distribution of male song bouts corresponding to the sensory stimulus, which we exploited to fit the four free parameters of the model (a scalar gain factor for the input to the pC2 node, the strength of a constant input current to the inh node, and one global weight each for all excitatory and inhibitory connections) to the experimental data. Specifically, we used genetic algorithm optimization (the geneticalgorithm package in Python, https://pypi.org/project/geneticalgorithm/) to minimize the root-mean-squared difference between the experimental and simulated bout distribution (using six bout types, ‘p’, ‘ps’, ‘psp...’, ‘s’, ‘sp’ and ‘sps...’, to provide more information to the algorithm than when using the four categories ultimately used for analysis; this led to slightly better model fits), as well as the absolute difference between the number of experimental and simulated bouts (Δnbout), via the objective function root-mean-squared difference + 0.1  Δnbout (see Supplementary Table 4 for optimization parameters and ranges). The relative scaling of the two objectives was chosen to prioritize reproducing the bout distribution over the number of bouts. All genetic algorithm parameters are specified in Supplementary Table 4. Four hundred-second pieces of song data, randomly chosen from all 20 wild-type recordings with at least 10% of song bouts produced far from the female (mfDist > 4 mm), were used as input to the genetic algorithm.

Knockout simulations

To test the relevance of different computational features of the circuit model, we compared genetic algorithm fit performance for the full model (here using 200-s song snippets, randomly chosen from all wild-type recordings) to fit performance for versions of the model with individual computational features ‘knocked out’ or replaced. Specifically, although in the full model both the p and the s nodes were rebound excitable (by choosing the appropriate values for parameters a, b, c and d (see Supplementary Table 3), rebound excitability was knocked out in the pulse (no rebound pulse), sine (no rebound sine) or both nodes (no rebound) by adjusting parameters a, b, c and d (to turn these nodes from ‘rebound spiking’ into ‘tonic spiking’; see Supplementary Table 3). Disinhibition was knocked out by removing the inhibitory synapses of the inh node onto the pulse and sine nodes. To compare fits to experimental data for the default model comprising disinhibition and a model comprising excitatory modulation of the pulse and sine nodes, we replaced the inhibitory weights onto and from the inh node with excitatory weights, forming an excitatory node (‘exc’) for which we removed the tonic input that was present for the inh node in the disinhibitory model.

Irradiance measurements

Irradiance levels reported for optogenetic neural activation in freely behaving flies were measured (using a Thorlabs PM100D power meter) at the centre of the experimental chamber, with the chamber lid in place. Two identical experimental setups were used for behavioural experiments, and irradiance levels were calibrated to have uniform voltage-to-irradiance conversion across setups.

Irradiance reported for optogenetic stimuli during two-photon calcium imaging was measured (also using a Thorlabs PM100D power meter) at approximately the level of the preparation (after the objective).

Statistics

Statistical analyses were performed either in Matlab 2019a or Python 3.7. The two-sided Wilcoxon rank-sum test (Mann–Whitney U-test) for equal medians was used for statistical group comparisons unless noted otherwise. Error bars indicate mean ± mean absolute deviation from the mean unless otherwise specified. Sample sizes were not predetermined but are similar to those reported in previous publications13,34. Experimenters were not blinded to the conditions of the experiments during data collection and analysis. Experimental groups were defined based on genotype, and data acquisition was randomized with respect to different genotypes. All attempts at replication were successful. For box plots, the central mark indicates the median, the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. Whiskers extend to 1.5 times the interquartile range away from the box edges.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.