## Abstract

We consider the question of how sensory networks enable the detection of sensory stimuli in a combinatorial coding space. We are specifically interested in the olfactory system, wherein recent experimental studies have reported the existence of rich, enigmatic response patterns associated with stimulus onset and offset. This study aims to identify the functional relevance of such response patterns (i.e., what benefits does such neural activity provide in the context of detecting stimuli in a natural environment). We study this problem through the lens of normative, optimization-based modeling. Here, we define the notion of a low-dimensional latent representation of stimulus identity, which is generated through action of the sensory network. The objective of our optimization framework is to ensure high-fidelity tracking of a nominal representation in this latent space in an energy-efficient manner. It turns out that the optimal motifs emerging from this framework possess morphologic similarity with prototypical onset and offset responses observed *in vivo* in locusts (*Schistocerca americana*) of either sex. Furthermore, this objective can be exactly achieved by a network with reciprocal excitatory–inhibitory competitive dynamics, similar to interactions between projection neurons and local neurons in the early olfactory system of insects. The derived model also makes several predictions regarding maintenance of robust latent representations in the presence of confounding background information and trade-offs between the energy of sensory activity and resultant behavioral measures such as speed and accuracy of stimulus detection.

**SIGNIFICANCE STATEMENT** A key area of study in olfactory coding involves understanding the transformation from high-dimensional sensory stimulus to low-dimensional decoded representation. Here, we examine not only the dimensionality reduction of this mapping but also its temporal dynamics, with specific focus on stimuli that are temporally continuous. Through optimization-based synthesis, we examine how sensory networks can track representations without prior assumption of discrete trial structure. We show that such tracking can be achieved by canonical network architectures and dynamics, and that the resulting responses resemble observations from neurons in the insect olfactory system. Thus, our results provide hypotheses regarding the functional role of olfactory circuit activity at both single neuronal and population scales.

## Introduction

We consider the question of how early sensory networks produce actionable neural representations in response to sensory stimuli arriving in a dynamic fashion. The specific focus here is on the olfactory system, wherein the architecture is schematically conserved across species (Strausfeld and Hildebrand, 1999). In this system, early networks receive external excitation from the periphery and transform it into intermediate representations, which are then routed to higher-level brain areas for further processing and for the generation of behavioral and motor responses (Kay and Stopfer, 2006; Martin et al., 2011; Aldworth and Stopfer, 2015). Our goal is to identify the functional significance of certain characteristic neural response patterns observed as animals encounter sensory stimuli.

We are specifically motivated by experimental findings illustrating a rich taxonomy of stimulus-evoked, time-varying responses or sensory trajectories associated with both stimulus onset and offset (Stopfer et al., 2003; Kay and Stopfer, 2006; Bathellier et al., 2008; Saha et al., 2013b, 2017). It has been reported that the most noticeable change in activity of sensory neurons occurs immediately following stimulus onset and then following stimulus offset (Stopfer et al., 2003; Mazor and Laurent, 2005; Bathellier et al., 2008; Raman et al., 2010), and these responses are in fact orthogonal to each other at a population level (Saha et al., 2017). In this work, we are not attempting to “pattern match” these responses per se, but rather provide a functional interpretation of what these prototypical spatiotemporal response motifs achieve in terms of their ability to mediate stimulus detection. Through both theory and validation, we address the following: (1) whether there are particular decoding objectives that explain the manifestation of observed sensory response motifs; and (2) whether there are physiologically plausible neural circuit architectures that are capable of realizing motifs generated by optimization of the objective in (1).

The central theoretical premise of our paper is that activity of the early sensory network drives a latent representation of accrued evidence regarding presence or absence of stimulus. The neural responses thus encode not only detection but also “undetection” or withdrawal of the source of excitation. Adapting from approaches used in behavioral neuroscience, we formulate a dynamical decoder that uses high-dimensional, combinatorial input to generate a low-dimensional latent representation. Thereafter, we use a top-down, normative approach wherein we generate the response motifs by minimizing an objective function. We formulate this objective function so that it emphasizes that latent representations should quickly and accurately convey information about peripheral stimulus in an energy-efficient manner. Neural response motifs borne out of such an optimization problem bear morphologic similarity to those observed *in vivo*, leading us to assert our claim of ascribed functional relevance.

Our work builds on a rich theory of olfactory coding focused on the question of how reliable and informative representations propagate through the sensory hierarchy (Bhandawat et al., 2007; Schaefer and Margrie, 2007; Zhang and Sharpee, 2016). In particular, we note recent efforts to ascribe particular functions to the circuit architectures, including enabling stimulus reconstruction (Qin et al., 2019), categorization (Dasgupta et al., 2017), and novelty detection (Dasgupta et al., 2018). However, these findings have mostly been pursued in a static input–output domain (i.e., with the tacit assumption of an instantaneous, algebraic signal transformation). In contrast, we work in the space of sensory dynamics and time-varying representations, a topic that has received much attention from a descriptive standpoint (Laurent, 1996; Rabinovich et al., 2000; Laurent et al., 2001) but less so from the perspective of normative synthesis. In particular, through formal mathematical arguments, we investigate what advantages the pronounced switch between onset and offset responses achieves in the biological world, where organisms encounter stimuli in a dynamic fashion.

Our results show that our normative model produces emergent phenomena that predict many nuanced features of actual sensory network activity as observed in locusts (Saha et al., 2017; Mazor and Laurent, 2005; Nizampatnam et al., 2018). Further, our normative synthesis procedure yields a set of network dynamics that is highly compatible with the known physiology of these circuits *in vivo*. In what follows, we formulate our theoretical setup before presenting our key synthesis and validation results.

## Materials and Methods

### Computational model

#### Decoder and latent space

Our problem setup is premised on the canonical architecture of the insect early olfactory system (Kay and Stopfer, 2006; Masse et al., 2009), wherein chemical cues are transduced to neural signals through olfactory receptor neurons (ORNs), which then propagate activity via glomeruli to projection neurons (PNs) en route to higher brain areas. Our focus is on the dynamical transformations mediated by PNs and their local circuitry.

Our model for sensory tracking hinges on the definition of a latent space, $\mathcal{V} \subseteq \mathbb{R}^m$, which contains information about stimulus identity. Each dimension of the latent space indicates accrued evidence regarding the presence of high-level stimulus features (Raman et al., 2011). The assumption is that such features provide an actionable representation of the stimulus that can then be used to enable higher-level processing and behavior. Specifically, PN activity, $x(t) \in \mathbb{R}^n$, is linearly decoded into the latent space via Equation 1:

$$\dot{\nu}(t) = A\nu(t) + Bx(t), \qquad (1)$$

where $\nu(t) \in \mathbb{R}^m$ is the latent state and $B \in \mathbb{R}^{m \times n}$. Here, $A = -aI_m$, where $a > 0$ and $I_m$ is the $m \times m$ identity matrix, so that the spectrum of $A$ lies in the open left half-plane. The matrix $B$ linearly mixes the contribution of PNs onto each dimension of the latent vector. Equation 1 imposes that the change in accrued evidence at any time $t$ is proportional to the difference between the information encoded by representative PNs and the evidence lost due to intrinsic leaky dynamics of the decoder. Figure 1*A* illustrates our problem setup, noting that we are focused on activity downstream of the ORNs (i.e., the dashed vertical line). Afferent activity from the receptor neurons is denoted by $s(t)$. It is important to note that our approach does not assume any network structure or dynamics for the PNs; rather, our goal is to synthesize these dynamics.
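As a concrete illustration, the leaky decoder of Equation 1 can be simulated by forward-Euler integration. The sketch below is illustrative only; the function name `decode`, the step input, and the parameter values are our own choices, not part of the model specification.

```python
import numpy as np

def decode(x_t, B, a=1.0, dt=0.01):
    """Forward-Euler integration of the leaky decoder
    nu_dot(t) = -a * nu(t) + B x(t)  (cf. Eq. 1)."""
    T = x_t.shape[0]
    m = B.shape[0]
    nu = np.zeros((T, m))                    # decoder starts at the neutral state
    for t in range(1, T):
        dnu = -a * nu[t - 1] + B @ x_t[t - 1]
        nu[t] = nu[t - 1] + dt * dnu
    return nu

# a sustained pulse on one PN drives the latent state toward (1/a) * B x
B = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])              # m = 2 latent dims, n = 3 PNs
x = np.tile([1.0, 0.0, 0.0], (2000, 1))      # step input on PN 1, held 20 time units
nu = decode(x, B, a=1.0, dt=0.01)            # nu[-1] approaches [1, 0]
```

With $a = 1$, the latent state converges to the steady state $B x / a$, mirroring the leaky evidence accumulation described above.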

### Enabling labile, accurate latent representations via sensory tracking

In particular, the formulation in Equation 1 allows us to introduce our normative premise (i.e., that the PN activity x drives accurate representations of accrued evidence, thus conveying information about stimulus presence and identity). Further, the dynamics of x should allow for fast transitions in ν.

This idea is readily captured through the notion of tracking in the latent space. From an initial state $\nu(0)$, we seek to construct dynamics for the PN activity $x$ such that $\nu$ can quickly and accurately reach an arbitrary point in the latent space, $z$. Mathematically, we formulate a quadratic objective function:

$$J = \int_0^{\infty} \left[ (\nu - z)^{\top} Q (\nu - z) + x^{\top} S x + \dot{x}^{\top} R \dot{x} \right] dt. \qquad (2)$$

The terms in this objective function are highly interpretable from a biological perspective. The first term in the integrand penalizes the deviation of the decoded state from z. The second term penalizes the energy of the PN response, whereas the last term penalizes large fluctuations in the PN response. The objective is formulated over an infinite horizon (i.e., there is no bias or expectation regarding the amount of time that the representation is to be tracked).
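The three penalty terms can be evaluated numerically on a discretized trajectory. The following is a minimal sketch (a finite-horizon approximation of the infinite-horizon integral); `tracking_cost` is a hypothetical helper name, not part of the model.

```python
import numpy as np

def tracking_cost(nu, x, z, Q, S, R, dt):
    """Discretized finite-horizon surrogate of the objective (cf. Eq. 2):
    integral of (nu - z)' Q (nu - z) + x' S x + x_dot' R x_dot."""
    dev = nu - z                              # tracking error at each time step
    xdot = np.gradient(x, dt, axis=0)         # finite-difference firing-rate change
    track = np.einsum('ti,ij,tj->t', dev, Q, dev)
    energy = np.einsum('ti,ij,tj->t', x, S, x)
    fluct = np.einsum('ti,ij,tj->t', xdot, R, xdot)
    return float(np.sum(track + energy + fluct) * dt)

# perfect tracking with silent PNs incurs zero cost; any activity adds cost
z = np.array([0.5, -0.2])
nu = np.tile(z, (100, 1))
x0 = np.zeros((100, 3))
Q, S, R = np.eye(2), np.eye(3), np.eye(3)
zero_cost = tracking_cost(nu, x0, z, Q, S, R, dt=0.01)          # 0.0
busy_cost = tracking_cost(nu, x0 + 1.0, z, Q, S, R, dt=0.01)    # > 0
```

The example illustrates the interpretation given above: deviation from $z$, firing-rate energy, and rapid fluctuations each contribute additively to the cost.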

The objective and normative optimization are schematized in Figure 1*B*,*C*. At stimulus onset (Fig. 1*B*), the latent state is to track some nominal representation z. On termination of the stimulus, the latent state continues tracking z, which is now positioned at the origin of the m-dimensional latent space, signifying the absence of a stimulus. Here, we considered a pulsatile stimulus structure in accordance with the natural olfactory environment, where an organism perceives olfactory stimulation as pockets of odor plumes (Vickers, 2000). Furthermore, in this work, our focus was on the neural dynamics downstream of the receptor neurons (Fig. 1*A*); therefore, we assumed that each stimulus is presented at a fixed intensity. The matrix Q in our setup enforces that tracking in the latent space occurs accurately, without transient or asymptotic misrepresentation (see Fig. 1*B*). The matrix S imposes a penalty on energy expenditure, whereas the matrix R regularizes rapid fluctuations in firing rate activity. Upon withdrawal of the stimulus (Fig. 1*C*), the representation returns to neutral. In essence, this model embeds the idea of active stimulus detection and "undetection." The question at hand is how x should be specified (according to Equation 2) for robust and accurate detection of odor stimuli.

Table 1 summarizes all mathematical symbols used in this paper as well as their interpretation.

### Reduction of optimization framework to infinite horizon linear quadratic regulator problem

The objective function (as in Eq. 2) subject to the decoder dynamics (Eq. 1) constitutes our optimization problem.

We denote $u = \dot{x}$, treating the rate of change of PN activity as the control variable. Thus, the problem can be rewritten as follows:

$$\min_{u} \int_0^{\infty} \left[ (\nu - z)^{\top} Q (\nu - z) + x^{\top} S x + u^{\top} R u \right] dt \quad \text{subject to } \dot{\nu} = A\nu + Bx, \;\; \dot{x} = u. \qquad (3)$$

Now, we define $\tilde{\nu} = \nu - z$. We assume that the fixed representation $z$ remains constant as long as the same stimulus conditions prevail at the periphery (i.e., $\dot{z} = 0$). Therefore, we can write the following:

$$\dot{\tilde{\nu}} = A\tilde{\nu} + Bx + Az. \qquad (4)$$

With this, Equation 3 reduces to:

$$\min_{u} \int_0^{\infty} \left[ w^{\top} \tilde{Q}\, w + u^{\top} R\, u \right] dt \quad \text{subject to } \dot{w} = \tilde{A} w + \tilde{B} u, \qquad (5)$$

where $w = [\tilde{\nu}^{\top} \; x^{\top}]^{\top}$, $\tilde{Q} = \mathrm{blkdiag}(Q, S)$, $\tilde{A} = \begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix}$, and $\tilde{B} = \begin{bmatrix} 0 \\ I_n \end{bmatrix}$; the constant offset $Az$ enters the optimal solution as a feedforward term. The transformed optimization problem (OP) (Eq. 5) amounts to an infinite horizon linear quadratic control problem (Boyd and Barratt, 1991; Anderson and Moore, 2007), a classical control theoretic framework for finding the optimal control input to a dynamical system. In our context, the "controls" are the activity of the PNs, and the task is to realize these controls by means of a dynamical network.

### Solving the optimization problem

As the matrices $Q$, $S$, and $R$ in Equation 3 are positive definite, it is straightforward to establish that the pair $(\tilde{A}, \tilde{B})$ is stabilizable and the pair $(\tilde{A}, \tilde{Q}^{1/2})$ is detectable, and hence, problem Equation 5 has a unique solution (Anderson and Moore, 2007). Indeed, the solution to Equation 5 is the linear state feedback

$$u(t) = -R^{-1} \tilde{B}^{\top} K\, w(t), \qquad (6)$$

where $K$ is the solution of the algebraic Riccati equation (ARE), given as follows:

$$\tilde{A}^{\top} K + K \tilde{A} - K \tilde{B} R^{-1} \tilde{B}^{\top} K + \tilde{Q} = 0. \qquad (7)$$

Therefore, we can now write

$$\dot{x}(t) = -L_1 \nu(t) - L_2 x(t) + s(t), \qquad (8)$$

where $[L_1 \;\; L_2] = R^{-1} \tilde{B}^{\top} K$ is the partition of the optimal gain conforming to the augmented state, $L_1 \in \mathbb{R}^{n \times m}$, $L_2 \in \mathbb{R}^{n \times n}$, and $s(t)$ collects the $z$-dependent (feedforward) terms.
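For completeness, the ARE can be solved numerically via the stable invariant subspace of the associated Hamiltonian matrix. The sketch below is a textbook construction, not the authors' code; in practice a library routine such as `scipy.linalg.solve_continuous_are` would serve the same purpose.

```python
import numpy as np

def solve_are(A, B, Q, R):
    """Solve A'K + K A - K B R^{-1} B' K + Q = 0 via the stable
    invariant subspace of the Hamiltonian matrix (textbook sketch)."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = V[:, w.real < 0]                # n eigenvectors with Re(lambda) < 0
    X, Y = stable[:n, :], stable[n:, :]
    return np.real(Y @ np.linalg.inv(X))

# scalar sanity check: A=-1, B=Q=R=1 gives K^2 + 2K - 1 = 0, i.e. K = sqrt(2) - 1
A1 = np.array([[-1.0]])
B1 = Q1 = R1 = np.array([[1.0]])
K = solve_are(A1, B1, Q1, R1)
```

For well-posed LQR problems the Hamiltonian has no purely imaginary eigenvalues, so exactly half of its spectrum is stable and the construction is well defined.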

The excitation to the PN layer arrives from the first-level ORNs. We denote the afferent input from these receptor neurons as $s(t)$. In the model,

$$s(t) = L_1 z, \qquad (9)$$

that is, the afferent input encodes the nominal representation associated with the stimulus currently present at the periphery.

It is important to note that the model does not predict ORN dynamics or connectivity per se, only that $s(t)$ is provided to the network of PNs. There may be multiple ways to synthesize an ORN network that realizes this transformation, and specifying such a construction is not the primary focus of this paper.

The instantaneous decoded evidence can be written as follows (recall Eq. 1):

$$\nu(t) = e^{At} \nu(0) + \int_0^t e^{A(t-\tau)} B x(\tau)\, d\tau. \qquad (10)$$

We initialize the decoder to a zero initial state (i.e., $\nu(0) = 0$), leading to

$$\nu(t) = \int_0^t e^{A(t-\tau)} B x(\tau)\, d\tau. \qquad (11)$$

Using Equations 11 and 9 in Equation 8, we obtain the optimal neural dynamics:

$$\dot{x}(t) = -L_2 x(t) - L_1 \int_0^t e^{A(t-\tau)} B x(\tau)\, d\tau + s(t). \qquad (12)$$

When the latent decoder is a noisy one, or when the neural entities are modeled as being subject to background noise, we can posit a similar problem. In such a model, the objective function is written as follows:

$$J = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[ \int_0^{T} \left( (\nu - z)^{\top} Q (\nu - z) + x^{\top} S x + \dot{x}^{\top} R \dot{x} \right) dt \right]. \qquad (13)$$

The reduction technique for this stochastic problem is similar to what has been shown above (Boyd and Barratt, 1991) for the deterministic case, and the solution of the ARE for this stochastic linear quadratic regulator is identical to what we have derived above, except for the addition of a noise term (see Eq. 14 below):

$$\dot{x}(t) = -L_2 x(t) - L_1 \int_0^t e^{A(t-\tau)} B x(\tau)\, d\tau + s(t) + \eta(t), \qquad (14)$$

where $\eta(t)$ denotes additive background noise. We have used this stochastic version to generate the figures depicting principal component (PC) trajectories and correlations of the ensemble response.

### Extracting a first-order network to realize the optimal solution

The primary step in extracting a first-order network from the derived optimal motif is to introduce a population of auxiliary neurons whose activity $y(t) \in \mathbb{R}^k$ is given by the following dynamics:

$$\dot{y}(t) = -a y(t) + W_1 x(t), \qquad (15)$$

where $x(t) \in \mathbb{R}^n$ is the PN activity, $n$ is the number of PNs, and $W_1 \in \mathbb{R}^{k \times n}$.

Thus, the function of the matrix $W_1$ is to compute a weighted sum of the PN activity and propagate it to the auxiliary neurons. Using Equations 12 and 15, we can write the following:

$$\dot{x}(t) = -L_2 x(t) + W_2 y(t) + s(t). \qquad (16)$$

Here, $W_2 \in \mathbb{R}^{n \times k}$, and the pair $(W_1, W_2)$ must satisfy the condition $W_2 W_1 = -L_1 B$ to realize the deduced optimal dynamics. We choose $k < n$ (Laurent, 1996). Therefore, $W_1$ is a wide matrix, and the matrix decomposition problem is ill posed: it might have no solution or an infinite number of solutions. Instead of solving this underdetermined set of equations for an arbitrary choice of $W_1$, we reformulate the problem in a tractable format. We introduced a constrained iterative optimization framework (Shlizerman et al., 2014) that minimizes $\| W_2 W_1 + L_1 B \|_F$ (see Algorithm 1) to derive the matrices $W_1$ and $W_2$. The constraints in question are motivated by reports of the existence of distinct excitatory/inhibitory neural units in the insect antennal lobe (AL; Wilson and Laurent, 2005; Kay and Stopfer, 2006; Shang et al., 2007). Together, Equations 15 and 16 provide the first-order rate equations of the sensory network (see Figure 2 for a schematic of the network).

### Parameter selection

We chose the decoder parameter as in our simulations (exceptions mentioned in relevant Figure caption). The penalty matrices are chosen as , , . Also, unless mentioned otherwise, we have chosen (number of neural units driving the decoder) and (dimension of the latent space). For simulations using the stochastic model, we have considered to be unit variance Gaussian noise with = 0.05.

The secondary iterative optimization that generates can be outlined as follows.

Algorithm 1

Choose $W_1^{(0)}$ s.t. $(W_1^{(0)})^{\dagger}$ exists; set $W_2^{(0)} = -L_1 B \, (W_1^{(0)})^{\dagger}$; set iter $= 0$

**While** iter ≤ maxIters **do**

$\quad W_2^{(\text{iter}+1)} = \arg\min_{W_2} \| W_2 W_1^{(\text{iter})} + L_1 B \|_F + \lambda_2 \| W_2 \|_F$, subject to $W_2 \leq 0$ (elementwise)

$\quad W_1^{(\text{iter}+1)} = \arg\min_{W_1} \| W_2^{(\text{iter}+1)} W_1 + L_1 B \|_F + \lambda_1 \| W_1 \|_F$, subject to $W_1 \geq 0$ (elementwise)

$\quad$**If** score improves **then**

$\qquad$set iter $=$ iter $+ 1$;

$\quad$**else**

$\qquad$continue;

**Result: If** the residual $\| W_2 W_1 + L_1 B \|_F$ falls below tolerance **then**

$\quad$return $W_1$, $W_2$

**Else**

$\quad$**If** the constraints are satisfied at the best attained score **then**

$\qquad$return $W_1$, $W_2$

$\quad$**Else**

$\qquad$No solution for this initialization.

Here, $\| \cdot \|_F$ is the Frobenius matrix norm, and $\lambda_1$ and $\lambda_2$ are regularization coefficients. We initialize the algorithm with an arbitrary wiring for $W_1$ such that its pseudoinverse exists, and we initialize the matrix $W_2$ as $-L_1 B W_1^{\dagger}$, as the pseudoinverse yields the best-fit least-squares solution. Thereafter, we proceed using Algorithm 1 to find a pair of matrices $W_1$ and $W_2$ such that the resultant network realizes the optimal motifs while satisfying the well-characterized network constraints. The optimization problem is solved using the disciplined convex optimization package CVX implemented in MATLAB (MathWorks; Grant et al., 2008).
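A lightweight stand-in for the sign-constrained factorization at the heart of Algorithm 1 can be sketched with projected alternating least squares (the published procedure uses CVX; the function below is our own simplification, and the variable names are hypothetical).

```python
import numpy as np

def factor_with_signs(T, k, iters=300, seed=0):
    """Projected alternating least squares: find W_ln2pn (n x k, <= 0)
    and W_pn2ln (k x n, >= 0) with W_ln2pn @ W_pn2ln ~ T.
    A simplified stand-in for the CVX-based steps of Algorithm 1."""
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    W_pn2ln = rng.random((k, n))                     # excitatory: nonnegative
    for _ in range(iters):
        # unconstrained least-squares update, then project onto the sign cone
        W_ln2pn = np.minimum(T @ np.linalg.pinv(W_pn2ln), 0.0)
        W_pn2ln = np.maximum(np.linalg.pinv(W_ln2pn) @ T, 0.0)
    return W_ln2pn, W_pn2ln

# target coupling that is exactly factorable with the required signs
rng = np.random.default_rng(1)
k, n = 3, 6
T = (-rng.random((n, k))) @ rng.random((k, n))       # inhibitory-times-excitatory
W_l, W_p = factor_with_signs(T, k)
rel_err = np.linalg.norm(W_l @ W_p - T) / np.linalg.norm(T)
```

By construction the returned factors respect the excitatory/inhibitory sign constraints; the residual depends on the target and initialization, mirroring the nonuniqueness discussed in the Results.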

### Dimensionality reduction analysis for simulated PN response

Similar to methods by Saha et al. (2017), we applied principal component analysis (PCA) on our simulated PN response data to visualize the n-dimensional time-varying activity of PNs. Each exposure to a stimulus of given identity continues for τ seconds, followed by a period of no excitation of equal duration. As we synthesized PN responses at 10 ms intervals, we generated a time-series data matrix (neurons × time bins) for each stimulus encounter. We concatenated such data matrices obtained for different odorants to compare and contrast the evoked trajectories. The resulting matrix was used to generate an $n \times n$ response covariance matrix.

To visualize the neural activity during the on and off response phases, a time window comprising activity at stimulus onset and withdrawal was selected (4 s on followed by 4 s off). These high-dimensional, time-varying data were then projected along the first three eigenvectors of the response covariance matrix. Low-dimensional data points at consecutive time instances were connected to construct trajectories in the three-dimensional space. In contrast, to compare the activities evoked by the red/blue stimuli in the on and off phases, a 4 s window comprising either the on or the off response was used.
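The dimensionality reduction steps described above can be sketched as follows (illustrative only; the synthetic "odor" inputs stand in for simulated PN responses).

```python
import numpy as np

def pca_trajectory(X, n_pc=3):
    """Project a (time x neurons) PN response matrix onto its first n_pc
    principal components, as in the trajectory analysis described above."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)              # neurons x neurons covariance
    evals, evecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_pc]       # leading eigenvectors
    return Xc @ evecs[:, order]

# two synthetic "odors" with different spatial patterns across 10 neurons
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 400)[:, None]              # 4 s at 10 ms resolution
odor_a = np.exp(-t) * rng.random(10)             # pattern-a transient
odor_b = np.exp(-t) * rng.random(10)             # pattern-b transient
traj = pca_trajectory(np.vstack([odor_a, odor_b]))
```

Connecting consecutive rows of `traj` reproduces the kind of low-dimensional trajectory plots described above.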

### Definition of metrics to analyze decoder performance

We analyze the quality of the latent representation generated by action of PNs through the following quantities.

• **Accuracy**: Accuracy quantifies the deviation of the latent space representation produced by PN activity from the true representation. This quantification was made at the end of the stimulus period, given mathematically by

$$\text{accuracy} = d\left( \nu(T_{\text{off}}), z \right), \qquad (17)$$

where $d(u, v)$ is a distance metric between the points *u* and *v*, and $T_{\text{off}}$ is the time instant when the pulse is withdrawn. We use the Euclidean metric for distance in this case.

• **Latency**: Latency quantifies the time required by the synthesized network model to provide an accurate representation of the stimulus. It is computed as the time necessary for the latent state to reach a fraction $\gamma$ of the nominal representation $z$. We used a fixed value of $\gamma$ across simulations.

• **Similarity**: Yet another way of evaluating the quality of the latent representation produced by neural activity is by measuring the cosine of the angle between $\nu$ at time $T_{\text{off}}$ and $z$. This can be interpreted as computing the similarity between the latent representation at the end of the stimulus period and the desired nominal representation, and is given mathematically as follows:

$$\text{similarity} = \frac{\nu(T_{\text{off}})^{\top} z}{\left\| \nu(T_{\text{off}}) \right\|_2 \left\| z \right\|_2}. \qquad (18)$$

However, we find that the trends observed via this quantification closely match those produced by *accuracy*. This is intuitive, as the cosine metric can be written in terms of the Euclidean norm. Therefore, in this paper, we provide results using only the accuracy and latency measures.
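The three metrics can be expressed compactly; the functions below are hedged sketches (in particular, the latency threshold `gamma` is treated as a free parameter, and the implementation reflects one plausible reading of "reaching a fraction γ of z").

```python
import numpy as np

def accuracy(nu_T, z):
    """Euclidean deviation of the decoded state from z at stimulus offset."""
    return float(np.linalg.norm(nu_T - z))

def latency(nu_t, z, times, gamma=0.9):
    """First time at which the decoded trajectory comes within
    (1 - gamma) * ||z|| of the nominal representation z (one reading)."""
    dist = np.linalg.norm(nu_t - z, axis=1)
    hit = np.nonzero(dist <= (1.0 - gamma) * np.linalg.norm(z))[0]
    return float(times[hit[0]]) if hit.size else np.inf

def similarity(nu_T, z):
    """Cosine of the angle between nu at stimulus offset and z."""
    return float(nu_T @ z / (np.linalg.norm(nu_T) * np.linalg.norm(z)))

# a trajectory that approaches z linearly over one second
z = np.array([1.0, 0.0])
times = np.linspace(0.0, 1.0, 101)
nu_t = times[:, None] * z                        # nu(1.0) == z exactly
```

On this toy trajectory the decoder reaches perfect accuracy and unit similarity at offset, with a latency set by the chosen threshold.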

### Experimental design

#### Odor stimulation

We followed a protocol similar to that of our previous studies for odor stimulation (Saha et al., 2013b, 2017). The odor solutions were diluted in mineral oil (Sigma-Aldrich) to achieve 1% dilution (v/v). Twenty milliliters of diluted odor solution was placed in a 60 ml sealed glass bottle with inlet and outlet lines. A constant volume (0.1 L/min) from the odor bottle headspace was injected into the carrier stream using a pneumatic pico-pump (PV-820; WPI Inc.) during odor presentations. A vacuum funnel was placed directly behind the animal preparation to ensure the removal of odor vapors. Odor presentations were 4 s long. Each trial was 40 s in duration, with the intertrial interval set to 20 s.

#### Extracellular recordings

Post fifth instar locusts (*Schistocerca americana*) of either sex were selected and immobilized, and then the brain was exposed and desheathed as reported in previous studies (Laurent and Davidowitz, 1994; Brown et al., 2005; Saha et al., 2013a). Extracellular recordings of PNs were performed using NeuroNexus 16-channel, 4 × 4 silicon probes. Impedances of all electrodes were kept in the 200–300 kΩ range. Raw extracellular signals were amplified at 10k gain using a custom-made 16-channel amplifier (Biology Electronics Shop; Caltech), filtered between 300 Hz and 6 kHz, and acquired at a 15 kHz sampling rate using custom-written LabVIEW (National Instruments) software.

#### Intracellular recordings

Animal preparation for intracellular recordings was the same as that used for extracellular recordings. In vivo patch-clamp recordings from PNs were performed using pipettes filled with locust intracellular solution (in mM; Laurent et al., 1993): 155 K aspartate, 1.5 MgCl2, 1 CaCl2, 10 HEPES, 10 EGTA, 2 ATP disodium salt, 3 d-glucose, 0.1 cAMP. All of these chemicals were purchased from Sigma-Aldrich. The pH of the patch solution was adjusted to 7.0 using 1 M NaOH, and osmolarity was adjusted to the 320–325 mOsm range using sucrose. The impedances of the electrodes ranged between 5 and 15 MΩ. Raw voltage traces were amplified (Axoclamp 900A; Molecular Devices) and acquired at a 16 kHz sampling rate using LabVIEW software.

## Results

### Sensory tracking is achieved by a biologically plausible network architecture

The posited optimization problem is as follows:

$$\min_{x} \int_0^{\infty} \left[ (\nu - z)^{\top} Q (\nu - z) + x^{\top} S x + \dot{x}^{\top} R \dot{x} \right] dt \quad \text{subject to } \dot{\nu} = A\nu + Bx. \qquad (19)$$

The solution of Equation 19 yields the following PN dynamics (see Materials and Methods for details):

$$\dot{x}(t) = -L_2 x(t) - L_1 \int_0^t e^{A(t-\tau)} B x(\tau)\, d\tau + s(t). \qquad (20)$$

Equation 20 reveals the existence of a network that incorporates both slow and fast processing (via the integral term scaled by $L_1$ and the instantaneous term $L_2$, respectively) in response to an external stimulus. Also present is dependence on the exogenous stimulus via $s(t)$. However, how the slower time scale is achieved *in vivo* is not explicitly explained by Equation 20. The question here is whether a biologically plausible first-order network architecture can be realized by decomposition of Equation 20 into first-order rate equations. We find that this can be achieved by introducing an auxiliary population of neurons that shares reciprocal connections with the second-order PNs. Equations 21 and 22 below are a representative network architecture arising from Equation 20 that produces the optimal motif as seen in Figure 3:

$$\dot{x}(t) = -L_2 x(t) + W_2 y(t) + s(t), \qquad (21)$$

$$\dot{y}(t) = -a y(t) + W_1 x(t). \qquad (22)$$

Here, $x(t) \in \mathbb{R}^n$ is the PN activity, $y(t) \in \mathbb{R}^k$ is the activity of the auxiliary population, and $n$ and $k$ are the numbers of neurons belonging to each population subgroup. The connectivity matrices in Equations 21 and 22 are analytically specified in terms of the optimal gains, with $W_1$ and $W_2$ required to satisfy $W_2 W_1 = -L_1 B$ (see Materials and Methods for details). In the insect AL, PNs extend excitatory connections onto an auxiliary population of neurons known as local neurons (LNs), which in turn provide inhibitory control on the PNs (Wilson and Laurent, 2005; Kay and Stopfer, 2006; Shang et al., 2007). In our network architecture, connections between PNs and the auxiliary population are encapsulated in the matrices $W_1$ (PN to LN) and $W_2$ (LN to PN). The structure of these matrices depends on the choice of $k$, which we set such that $k < n$, as cellular studies have reported that typically the size of the population of LNs is smaller than that of PNs (Laurent, 1996). $W_1$ and $W_2$ are chosen through a constrained iterative optimization scheme (see Algorithm 1). The constraints for this secondary optimization problem are stated on the basis of the canonical excitatory–inhibitory structure found in the insect antennal lobe: (1) $W_1$ (PN to LN) must be non-negative, and (2) $W_2$ (LN to PN) must be nonpositive. We initialized our proposed algorithm by selecting $W_1$ to be a positive random block matrix such that its pseudoinverse existed, and we set the initial choice for $W_2$ as $-L_1 B W_1^{\dagger}$, where $W_1^{\dagger}$ is the Moore–Penrose pseudoinverse of $W_1$. The choice of such an initialization was fashioned after our prior knowledge regarding the existence of stimulus-specific tuning of the PNs (Assisi et al., 2011; Saha et al., 2017). It is important to emphasize here that the solution of this iterative scheme is nonunique due to the structure of the problem ($W_1$ is a wide matrix, so its pseudoinverse is only a one-sided inverse). In other words, in general there will be many network architectures that realize the optimal motif from Equation 19.
The common trend observed across the various solutions (each a mathematically valid network architecture) is that the connections between PNs and LNs are essentially random (salt and pepper). This is consistent with *in vivo* observations wherein each PN extends its efferent terminals onto only a subset of the auxiliary population and receives inhibitory control from only a subset of that population (Bazhenov et al., 2001; Carey and Carlson, 2011; Saha et al., 2017). It is worthwhile to point out that the response motifs for the LNs indeed depend on the resulting $W_1$ matrix, as it encapsulates how each PN projects its synapses onto the putative LNs. But the motifs for the PNs remain the phasic/tonic ones derived directly from Equation 19, because the cost function of the secondary optimization ensures $W_2 W_1 \approx -L_1 B$.

Intrapopulation interactions of PNs and LNs are captured in the matrices $-L_2$ and $-aI_k$, respectively. The diagonal terms in these matrices represent the self-decay dynamics associated with each neuron. Our construction also revealed very weak inhibitory connections within the PN population pool. Such connections are not known to exist experimentally, and we surmised that these small negative weights could be proxies for fast inhibitory synapses (Bazhenov et al., 2001, 2005) occurring elsewhere in the antennal lobe (see also Discussion).

### Network dynamics produce stimulus onset and offset responses that are observed *in vivo*

Although the architecture of the normative model is consistent with that of the antennal lobe, it is not yet clear whether its dynamics and activity also match the types of responses observed *in vivo*. To study this issue, we set up an illustrative model for binary discrimination in a combinatorial setting. We emphasize here that the choice of a two-dimensional state space is made without loss of generality (see “Results” for an example of a system with higher-dimensional latent space).

The combinatorial nature of the encoding space is embedded in $B$ (see Eq. 1), which determines the tuning of individual neurons. In the example network, we used Gaussian tuning curves (Fig. 3), so that some neurons respond preferentially to one stimulus or the other (here, visualized as red vs blue), whereas some do not exhibit a tuning preference (referred to as untuned). It is observed that the PNs generate two significant phasic bursts of activity: one following odor onset and the other on odor termination (see Fig. 3*B*). These motifs are comparable to observations made *in vivo* in the locust antennal lobe (see Figure 4). Moreover, the on and off responses have orthogonal orientations in a dimensionality-reduced space and are negatively correlated (see Figure 5*A*–*C* and Materials and Methods for dimensionality reduction details). Although the spatiotemporal patterns of stimulus-evoked activity in the AL of insects have been studied previously in the literature (Laurent, 1996; Raman et al., 2010; Saha et al., 2017), our normative formulation provides insight into the functional relevance of such response patterns (i.e., these motifs allow for robust and accurate stimulus representations for downstream processing in an energy-efficient manner).
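A toy construction of a combinatorial decoding matrix with Gaussian tuning might look as follows (the centers, width, and neuron count are arbitrary illustrative choices, not those used in the paper's simulations).

```python
import numpy as np

def gaussian_tuning(n_neurons, centers, width=0.15):
    """Build a (len(centers) x n_neurons) decoding matrix with Gaussian
    tuning: neuron i prefers location i/(n_neurons - 1) on a 1-D feature
    axis, and each latent dimension reads out neurons near its center."""
    pref = np.linspace(0.0, 1.0, n_neurons)      # preferred feature of each PN
    c = np.asarray(centers)[:, None]
    return np.exp(-0.5 * ((pref[None, :] - c) / width) ** 2)

# "red" readout centered at 0.25, "blue" at 0.75; neurons near 0.5
# respond weakly to both, i.e., are effectively untuned
B = gaussian_tuning(20, centers=[0.25, 0.75])
```

Neurons near a readout center contribute strongly to that latent dimension, while mid-axis neurons contribute weakly to both, mirroring the tuned/untuned split described above.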

Depending on their tuning, PN responses fall into one of the following categories: (1) a rapid increase in firing at the onset of stimulus that subsequently settles to a steady-state response of reduced amplitude; on removal of stimulus, there exists a brief period of inhibited activity (below baseline) before the system returns to baseline; or (2) a state of suppressed firing activity through the entire duration of stimulus but excitatory (above baseline) firing activity when the stimulus is withdrawn (see Fig. 3*B*,*C*). On the other hand, the auxiliary neurons (i.e., the putative LNs) display tonic activity in response to stimuli, returning to baseline gradually when the stimulus is no longer active. Although the tuning of PNs has been studied extensively (Stopfer et al., 2003; Wilson et al., 2004), whether similar combinatorial architecture exists in the LNs is yet to be fully characterized. In Figure 3*C*, the blue/red coloration does not imply prior tuning but simply indicates whether the neurons were excited/inhibited by the stimulus (Saha et al., 2017).

Interestingly, these predicted response dynamics have been observed in recordings from the locust olfactory system, wherein recent findings suggest both phasic and tonic temporal responses associated with stimulus onset and offset (Saha et al., 2017). Specifically, experimental observations indicate that when a pulse of stimulus is provided, the neurons in the antennal lobe exhibit a phasic transient followed by a tonic activity for the duration of the stimulus (referred to as *on response*). On termination of the pulse, there is another short-lived burst of activity before activity slowly returns to the quiescent regime (an *off response*; Saha et al., 2017). Figure 4 shows intracellular voltage traces recorded from four representative PNs in the locust antennal lobe. As can be noted, the stimulus-evoked responses in these PNs occur either when the stimulus is present followed by hyperpolarization (ON type), or following the termination of the stimulus (OFF type; note that membrane potential is hyperpolarized when the odor was presented). This segregation of stimulus-evoked PN responses into these distinct categories is maintained even when a larger number of neurons probed with a wider panel was considered (Saha et al., 2017). It should be mentioned here that the ensemble response vectors during onset and offset responses have distinct geometric orientations and are negatively correlated with each other as reported by both model simulations and experimental recordings (Fig. 5*A*–*C*).

### Orthogonalization of population response achieves robust latent representation

In the previous section, we explored how the PNs achieve stimulus detection in a latent space via spatiotemporal response patterns. We sought to understand how PN response patterns that are optimized for stimulus detection affect other aspects of sensory processing. One prediction made by the model pertains to processing stimuli that are encountered in sequence, often without a sufficient interstimulus interval for the system to recover fully. Our model revealed that in such situations, the resultant PN responses emphasize distinct orientations for distinct stimuli even more strongly (in the reduced dimension space; Fig. 6*A*–*C*). This finding has been previously referred to as a contrast enhancement computation (Nizampatnam et al., 2018) that highlights the uniqueness of the current stimulus with respect to prior distractions (Fig. 6*E*). Here, we used our defined latent space of intermediate representation to analyze the need for such patterning. It turns out that what appeared as contrast enhancement/novelty detection through the activity of PNs in fact enabled maintenance of a robust latent representation (Fig. 6*D*) in the face of a variety of background distractions.

Importantly, the objective function in our optimization framework (Equation 2) stressed only the formation of an accurate latent representation, in an energy-efficient manner, for a single stimulus at a time. Regardless, this framework predicted response features that are observed in the biological world, where stimuli are often perceived under a variety of contexts (Nizampatnam et al., 2018). This functional interpretation of what is achieved by neural response modulation when stimuli appear in conjunction with a multitude of distractions is an interesting outcome of this study.

### Active reset enables consecutive fast and accurate representations

As suggested by the above finding, the synthesized network embodies the notion of *active reset* via its off response. We further analyzed the functional advantage of such a distinct reversal of firing trends at the conclusion of the active stimulus period by systematic simulations of our model. Here, we considered two model alternatives: (a) the synthesized network and (b) a network whose activity ceases immediately on withdrawal of the stimulus, so that the decoder relies only on its natural decay to return to neutral (“passive reset”). We simulated each of these models with a sequence of stimuli (Fig. 6*A*, right). It turned out that the active reset mechanism mediated via the off response promptly prepares the system for the next incoming excitation. In the absence of such a swift reversal of firing activity at the conclusion of a stimulus, the system “recovered” at a much slower rate, leading to confounded latent representations of succeeding stimuli (Fig. 7*A*). Clearly, the extent of misrepresentation depends on the degree of passive reset (i.e., the parameter a) as well as the time allowed for the system to recover between pulses (Fig. 7*B*; the definition of accuracy in this context is included in the Materials and Methods).
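The contrast between the two alternatives can be conveyed by a minimal scalar sketch (not the paper's synthesized network): a leaky decoder z' = −a·z + u(t) driven by a stimulus pulse, with and without a brief negative off-pulse at stimulus offset standing in for the active reset. All parameter values are hypothetical.

```python
# Illustrative comparison of active vs passive reset for a leaky scalar
# decoder z' = -a*z + u(t). Parameters are hypothetical, not fitted.
import numpy as np

def simulate(a, active_reset, dt=1e-3, T=1.0):
    t = np.arange(0.0, T, dt)
    z = np.zeros_like(t)
    for k in range(1, len(t)):
        u = 1.0 if t[k] < 0.5 else 0.0           # stimulus pulse on [0, 0.5) s
        if active_reset and 0.5 <= t[k] < 0.55:  # brief "off response" pulse
            u = -4.0
        z[k] = z[k - 1] + dt * (-a * z[k - 1] + u)
    return t, z

_, z_active = simulate(a=2.0, active_reset=True)
_, z_passive = simulate(a=2.0, active_reset=False)

idx = int(0.7 / 1e-3)  # 200 ms after stimulus offset
# residual evidence: smaller under active reset, so the decoder sits
# closer to neutral when the next stimulus arrives
print(abs(z_active[idx]), abs(z_passive[idx]))
```

In this toy setting the passive system carries residual evidence into the next pulse, echoing the slow recovery and misrepresentation illustrated in Figure 7*A*.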

### The optimal response motif predicts the existence of a fixed-point attractor

It has been found that the PN population response comprising phasic/tonic activity patterns, when analyzed in the reduced principal component space, traces distinct, odor-specific trajectories. Starting from the origin or a neutral point, each odor-evoked trajectory quickly coalesces to a specific “fixed point” in the neural state space (Galán et al., 2004; Mazor and Laurent, 2005). The model in this study is designed to “track” a nominal representation in the low-dimensional decoded latent space with high confidence. In doing so, the derived optimal neural responses displayed sharp transient activity at both stimulus onset and offset (Fig. 3*B*); in the period between these transients, the PNs responded to the stimulus with a steady-state response of low amplitude. Therefore, the model indeed predicts the existence of an odor-specific fixed-point attractor in the neural state space (see Figure 8). Beginning from a neutral state, the system took only a few milliseconds to reach the proximity of the attractor determined by stimulus identity; likewise, on withdrawal of the stimulus, it is the transient activity that propelled the system quickly back to its neutral regime. The interesting observation here is the speed with which the neural response mediated detection and “undetection,” as our optimization framework (Eq. 2) is formulated to produce representations over an infinite time horizon without any bias toward when specific events (i.e., detection) should occur.
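The fixed-point prediction can be illustrated with a generic stable linear rate model: for x' = Ax + bu with a constant input u, the trajectory converges to the stimulus-specific point x* = −A⁻¹bu, and returns toward the origin once u = 0. The matrices below are arbitrary illustrations, not the synthesized network's weights.

```python
# Convergence of a stable linear rate model x' = A x + b*u to the
# stimulus-specific fixed point x* = -A^{-1} b u. A and b are illustrative.
import numpy as np

A = np.array([[-2.0, -0.5],
              [ 0.5, -2.0]])            # eigenvalues -2 +/- 0.5i (stable)
b_in = np.array([1.0, 0.3])             # input weights for this "odor"
u = 1.0                                 # constant stimulus drive

x_star = -np.linalg.solve(A, b_in * u)  # odor-specific fixed point

x = np.zeros(2)                         # neutral initial state
dt = 1e-3
for _ in range(5000):                   # 5 s of sustained stimulus (Euler)
    x = x + dt * (A @ x + b_in * u)

print(np.allclose(x, x_star, atol=1e-3))
```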

### Encoding performance degrades gracefully with respect to increasing tuning overlap

Combinatorial coding in olfaction is ubiquitous (Menini, 2009; Kundu et al., 2016), as this scheme provides efficient utilization of limited neural resources for identification and processing of a very large number of odors. The question is, how robust is the performance of the model across different degrees of overlap in the tuning curves? In our mathematical setup, C is the fraction of neurons that respond to both red and blue stimuli with no overt preference for either stimulus; therefore, it also indicates the extent of overlap between the tuning curves, or the similarity between the feature spaces of the afferent stimuli. The synthesized model predicted that the network can maintain accurate representations of the stimuli over a large range of values of C (see Fig. 9*A*,*B*). Beyond this range of overlap, however, there is a rapid decline in the quality of representation. It is evident that systems that endured large expenditures of energy and maintained strict demands for error minimization were more tolerant of increased overlap in the combinatorial space (see Fig. 9*B*) than their counterparts.
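The role of C can be sketched with a binary tuning model, which is a deliberate simplification of the setup in the paper: a fraction C of neurons responds to both stimuli and the rest split evenly between them, so the cosine similarity of the two ensemble response vectors grows with C.

```python
# Binary-tuning sketch: overlap fraction C controls the similarity of
# the two stimuli's ensemble response vectors. Illustrative only.
import numpy as np

def response_vectors(n=100, C=0.2):
    shared = int(round(C * n))        # neurons tuned to both stimuli
    private = (n - shared) // 2       # neurons tuned to one stimulus only
    r_red, r_blue = np.zeros(n), np.zeros(n)
    r_red[:private] = 1.0
    r_blue[private:2 * private] = 1.0
    r_red[2 * private:2 * private + shared] = 1.0
    r_blue[2 * private:2 * private + shared] = 1.0
    return r_red, r_blue

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sims = [cosine(*response_vectors(C=c)) for c in (0.0, 0.4, 0.8)]
print(sims)  # ensemble similarity rises monotonically with overlap C
```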

In this context, the next question is, can this graceful deterioration in performance be attributed to some embedded feature of the optimal motif? On analysis, we found that it is in fact the activity of the competing neurons that alleviated both transient and asymptotic misrepresentation over a considerable range of overlap in the tuning curves. We previously identified that the neurons preferentially selective to a given stimulus remain in a state of suppressed activity in the presence of a competing stimulus (Figure 3*B*). When this activity is eliminated by reducing the contribution of the red-tuned neurons to the latent decoder to baseline, we observed that the latent state trajectory tracked z poorly. Furthermore, the inherent graceful degradation of the synthesized network also disappeared in the absence of competitive inhibition (see Fig. 9*C*,*D*).

### Optimal motifs mediate a trade-off between latency, energy, and accuracy

The choice of the error penalty matrix Q and the energy regularization matrix S in Equation 2 shaped the optimal neural responses and subsequently influenced the performance of sensory tracking in the latent space. Intuitively, as Q is scaled to higher values, stricter demands are made to ensure high-fidelity tracking, whereas increasing the scale of S encourages a greater degree of energy conservation. The notion of a performance trade-off is therefore embedded in the choice of the matrices Q and S (Fig. 10*A*,*B*), as they share an antipodal relationship with respect to accuracy as well as latency (time to reach the nominal evidence level). In the Materials and Methods section of this paper, we describe how a cosine metric can be used to measure the quality of the latent representation. Figure 10*C*,*D* illustrates the response morphology associated with different penalties. Low penalty on tracking error and high regularization on energy produced smaller overshoots at stimulus onset. Such responses, although still capable of forming latent space representations, did so more slowly and less accurately (Fig. 10*A*,*B*, cyan squares on surface plots). Phasic overshoot, conversely, is indicative of a network designed to promote accurate and fast representations (Fig. 10*A*,*B*, pink circles on surface plots). Thus, our finding is compatible with the argument that short stimulus exposures are sufficient for olfactory discrimination (Uchida and Mainen, 2003), at the cost of higher energy expenditure within sensory networks.
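The trade-off can be reduced to a scalar caricature: for a pure-integrator plant z' = u with cost ∫(q z² + s u²) dt, the LQR feedback gain is k = √(q/s), so scaling q against s directly exchanges latency for energy. The numeric values below are illustrative only, not the paper's Q and S matrices.

```python
# Scalar caricature of the Q-vs-S trade-off: track z_star with the
# LQR feedback u = k*(z_star - z), where k = sqrt(q/s).
import numpy as np

def track(q, s, dt=1e-3, T=3.0, z_star=1.0):
    k = np.sqrt(q / s)                   # scalar LQR feedback gain
    z, energy, latency = 0.0, 0.0, None
    for step in range(int(T / dt)):
        u = k * (z_star - z)
        energy += dt * u**2              # accumulated control energy
        z += dt * u
        if latency is None and z >= 0.9 * z_star:
            latency = step * dt          # time to 90% of nominal level
    return latency, energy

fast = track(q=100.0, s=1.0)   # strict error penalty: quick but costly
slow = track(q=1.0, s=1.0)     # lax error penalty: slow but cheap
print(fast, slow)
```

Larger q/s shortens the latency to the nominal evidence level while inflating the accumulated energy, matching the antipodal relationship in Figure 10*A*,*B*.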

### Higher-dimensional generalization of the latent space

In this section, we address how the latent space can be generalized to higher dimensions so that a larger number of odor inputs can be represented. Here, we consider six odors (blue/magenta/red/olive/green/cyan) that map to a three-dimensional high-level feature space (Fig. 11*A*,*B*) through the matrix b. As in the two-dimensional case, neural activity accrues evidence to track z in the latent space (Fig. 11*C*). Through this simulation, we found that, with an appropriate choice of tuning matrix, it is possible to represent a relatively large number of odors in a low-dimensional latent space.
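As a sketch of this construction (with a randomly generated tuning matrix standing in for the paper's), one can solve for a 3 × N decoding matrix b that sends each of six odor tuning vectors to a distinct target direction in the three-dimensional latent space.

```python
# Solving for a hypothetical 3 x N decoding matrix b that maps six odor
# tuning vectors to six distinct latent targets. R and Z are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_odors = 60, 6

# hypothetical nonnegative tuning vectors, one column per odor
R = rng.random((n_neurons, n_odors))

# six well-separated unit targets in the 3-D latent space (3 x 6)
Z = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [-1, 0, 0], [0, -1, 0], [0, 0, -1]], dtype=float).T

# solve b @ R = Z in the least-squares sense; with 60 neurons and only
# 6 odors, R generically has full column rank and the fit is exact
b = Z @ np.linalg.pinv(R)

print(np.allclose(b @ R, Z, atol=1e-8))
```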

## Discussion

In this work, we have proposed a mechanistic model for sensory detection that identifies the functional significance of spatiotemporal motifs observed ubiquitously across species in the context of olfaction. Previous studies on this topic (Stopfer et al., 2003; Mazor and Laurent, 2005; Bathellier et al., 2008; Raman et al., 2010; Saha et al., 2013b, 2017) have used *post hoc* analysis to surmise the role of the aforementioned responses. In contrast, we present here a normative modeling perspective: we formally optimize PN response dynamics to achieve input separation in an intermediate latent space in an energy-efficient manner.

The key contributions of this paper are summarized as follows:

1. We showed that sensory onset and offset responses can be explained by a latent decoding objective function and the associated mathematical optimization thereof.

2. We showed that a biologically plausible network architecture is capable of producing the optimal responses elucidated in (1).

3. We showed that this optimization leads to several emergent predictions regarding the mapping from stimulus to latent representation for more complex stimuli.

### Normative interpretation of PN response dynamics

We adopted a top-down approach and formulated a representation tracking problem wherein PNs must fluidly and efficiently encode the stimulus such that it generates a robust and coherent latent representation for downstream processes. It turned out that, to achieve this objective, the optimal PN dynamics exhibited two distinct overshoots above baseline activity, one at stimulus onset and the other at its withdrawal (Fig. 3). The analytical responses also exhibited distinct geometric orientations during stimulus on and off phases, analogous to what has been reported *in vivo* (Fig. 5). It is worth noting here that the optimization problem we have solved amounts to the linear quadratic regulator problem, a classical construct in control theory (Anderson and Moore, 2007).
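In the scalar case, the linear quadratic regulator admits a closed-form solution, which conveys the structure of the problem: for a plant x' = ax + bu with cost ∫(qx² + su²) dt, the algebraic Riccati equation 2ap + q − (bp)²/s = 0 has a unique positive root p, and the optimal feedback is u = −(bp/s)x. The numeric values below are arbitrary illustrations.

```python
# Closed-form scalar LQR: positive Riccati root and optimal gain for
# the plant x' = a*x + b*u with cost integral(q*x^2 + s*u^2) dt.
import math

def scalar_lqr(a, b, q, s):
    p = s * (a + math.sqrt(a * a + q * b * b / s)) / (b * b)
    k = b * p / s                        # optimal feedback gain
    return p, k

p, k = scalar_lqr(a=-1.0, b=1.0, q=4.0, s=1.0)

riccati_residual = 2 * (-1.0) * p + 4.0 - (1.0 * p) ** 2 / 1.0
closed_loop = -1.0 - 1.0 * k             # a - b*k, stable when negative
print(riccati_residual, closed_loop)
```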

We have shown here that an excitatory–inhibitory network architecture with first-order dynamics can implement the motifs generated by the optimization framework. We derived the connectivity weights between the second-order PNs and an auxiliary population of LNs by means of an iterative optimization scheme (see Fig. 2 and Materials and Methods for details). Our results, however, indicate that there might be several plausible ways to mathematically specify such a network.

We probed our model further to highlight the relevance of certain response features that appear both in simulations and in experiments. The tractable construct facilitates a comprehensive understanding through rigorous mathematical simulations across different ranges of parameters and model alternatives. We analyzed the quality of the latent space representations using metrics of accuracy and latency, to interpret what benefits neural response features, such as competitive inhibition and contrast enhancement, provide in detecting a stimulus. A key result of this study was that the constructed sensory networks were capable of distinguishing between highly similar inputs (see Fig. 9). There is in fact no inherent trade-off between accuracy (i.e., input separation) and speed; rather, both are monotonic with respect to energetic considerations (Fig. 10). Several studies in the literature have discussed these notions through experimental observations and data-driven, bottom-up modeling approaches (Wehr and Laurent, 1999; Rabinovich et al., 2000; Ito et al., 2008; Chittka et al., 2009; Saha et al., 2017). Here, we presented a top-down normative framework that began from the specification of a mathematical form for the decoding objective and systematically worked through its consequences to provide a functional interpretation of patterns observed *in vivo*.

### Normative model produces known sensory neural dynamics beyond onset and offset responses

From our high-level detection objective, our normative model predicted several complex neural computations observed in the early olfactory networks beyond onset and offset responses.

For instance, experimental studies (Galán et al., 2004; Mazor and Laurent, 2005) have established that the trajectory in the neural state space quickly converges to a stimulus-specific fixed point, with the transient activity being crucial in driving the trajectories toward these fixed-point attractors. Persistence of a weak response (tonic activity) after the initial strong onset response (phasic activity) emerged in the synthesized model from the interplay between the first and second terms in the objective function, namely, the accuracy of the latent representation and the energy of sensory activity (Fig. 8). Thus, the model indeed provides an objective way to think about the need for these fixed-point attractors.

Additionally, our model also provided a novel interpretation of contrast enhancement computation (Nizampatnam et al., 2018) known to exist at a population level in PNs. It turned out that by highlighting the uniqueness of the current stimulus while sensing stimuli in series with very small interstimulus intervals, the PNs ensure robustness of latent representation required for reliable functioning of downstream processes (Fig. 6).

To summarize, we found that our framework, built only to ensure accurate representation tracking in a low-dimensional latent space, exhibits (emergent) observations reported in the activity of olfactory systems, thus providing a unified explanation for these activity patterns.

### Generality of the decoding model

The decoder we have used is a multivariate linear dynamical system driven by PN activity. Mathematically, this is similar to drift-diffusion-type models that have been used to study high-level decision-making [e.g., the two-alternative forced choice task (Bogacz et al., 2006; Colman, 2015)]. In addition to the obvious difference in the level of behavioral abstraction being considered, there are important differences in formulation between our work and these prior results. In particular, our framework caters to the need for representations to be fluidly created and then abolished: they must *persist* through the duration of the stimulus and *dissipate* quickly after its withdrawal. This departs from theoretical decision-making paradigms such as “interrogation” or “free response” (Bogacz et al., 2006), where a subject must make a detection within a set amount of time and where the decoding process essentially terminates at the time of decision. Indeed, there is no “threshold” in our model, only a latent representation that continuously evolves in its state space.

In the scenario where the number of excitation sources (i.e., odors) is very large (possibly infinite), we can continue to use this normative framework for analysis by choosing an appropriate b matrix. In such a case, we could formulate b so that each row contains weights corresponding to a member of an appropriate basis set. Then, it is possible to translate the infinite set of inputs to their corresponding latent representations by linear combination of the chosen basis functions. The exact weight distributions for such a basis set are yet to be determined and remain a question for future study.
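The basis idea rests on linearity of the decoder: if a mixture's tuning vector is a weighted combination of basis tuning vectors, its latent image is the same combination of the basis images. The basis, weights, and decoding matrix below are all hypothetical.

```python
# Linearity sketch: the latent image of a mixture equals the mixture of
# the basis odors' latent images. All matrices here are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_basis = 40, 3

basis = rng.random((n_basis, n_neurons))   # basis tuning vectors (rows)
b = rng.standard_normal((2, n_neurons))    # hypothetical decoding matrix

w = np.array([0.5, 0.2, 0.3])              # mixture weights of a new odor
r_mix = w @ basis                          # the mixture's tuning vector

# decoding the mixture = mixing the decoded basis targets
print(np.allclose(b @ r_mix, w @ (basis @ b.T)))
```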

### Features not explained

There are a number of important caveats and limitations that should be pointed out regarding the interpretation of the normative model presented through a set of first-order firing rate dynamics (Dayan and Abbott, 2001). In particular, our synthesized network has linear dynamics wherein neurons can assume both positive and negative firing rates. Such a formulation is acceptable if we envision the rate variables as changes relative to some positive baseline rate of activity. Harder to reconcile in the linear model is the lack of upper and lower bounds on firing rate. Solving the optimization problem with such constraints is a mathematically harder problem (Chachuat, 2007) that we have not yet resolved.

Our construction produces local neurons that are exclusively inhibitory in nature and that interact only via the excitatory neurons. However, there are reports of the existence of excitatory local neurons in the antennal lobe of some insects (Shang et al., 2007; Masse et al., 2009; Assisi et al., 2012). The functional role of such excitatory LNs is not accounted for in our model. However, in the mathematical formulation, these excitatory components can easily be realized by relaxing the nonpositivity constraint in the iterative optimization procedure (see Materials and Methods for details). Our construction also produces very weak lateral inhibition directly between PNs (Fig. 2). Although there is no experimental evidence for the existence of such PN-PN inhibitory synapses, fast GABAergic synapses (Bazhenov et al., 2001, 2005) are known to exist in the insect antennal lobe (between LN-LN and LN-PN) and are thought to be functionally relevant for maintaining the synchrony of PN responses (Bazhenov et al., 2001). We theorize that the inhibitory synaptic weights between PNs produced by solving the reduced linear quadratic regulator problem (Eq. 5) are in fact proxies for these fast inhibitory synapses that have not been explicitly considered in the synthesized network.

In the objective function (Eq. 2), we use a Euclidean distance to evaluate the quality of the latent representation as it tracks the incoming stimulation. However, downstream neurons might achieve input separation before higher-level processing by more involved computations. Experimental studies identify spatial and temporal filtering, decorrelation, etc. (Friedrich, 2013) as operations that could plausibly be executed by sensory neurons. In the present work, we choose to use the well-characterized ℓ_{2} norm as a surrogate for specialized computations that might occur *in vivo*. It is as yet an open question how incorporating such functions would affect our synthesis results.

Additionally, we have not considered any form of adaptation or habituation in our network, choosing to focus on a single exposure regime. The question of how the optimal weights come to be (e.g., via development or learning) is not considered herein.

Finally, our model makes no statements about the translation from latent representations to behavior. Behavior can exhibit nonlinear and sometimes paradoxical characteristics with respect to stimulus intensity, composition, and context. Accounting for these factors within the optimization problem is highly intriguing, but is a more complex formulation that is left for future study.

## Footnotes

The authors declare no competing financial interests.

This work has been supported by Grants EF-1724218 (S.C., B.R.), IIS-1453022 (B.R.), and CMMI-1653589 (S.C.) from the National Science Foundation. S.C. holds a Career Award at the Scientific Interface from the Burroughs-Wellcome Fund.

- Correspondence should be addressed to ShiNung Ching at shinung{at}wustl.edu