Loading metrics

Open Access

Peer-reviewed

Research Article

Attractor neural networks with double well synapses

Yu Feng,

Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

Affiliation Department of Physics, Duke University, Durham, North Carolina, United States of America

https://orcid.org/0000-0003-3688-0284

⨯
Nicolas Brunel

Roles Conceptualization, Formal analysis, Methodology, Project administration, Supervision, Validation, Writing – review & editing

* E-mail: nicolas.brunel@duke.edu

Affiliations Department of Physics, Duke University, Durham, North Carolina, United States of America, Department of Neurobiology, Duke University, Durham, North Carolina, United States of America

https://orcid.org/0000-0002-2272-3248

⨯

Attractor neural networks with double well synapses

Yu Feng,
Nicolas Brunel

Published: February 7, 2024
https://doi.org/10.1371/journal.pcbi.1011354

Figures

Abstract

It is widely believed that memory storage depends on activity-dependent synaptic modifications. Classical studies of learning and memory in neural networks describe synaptic efficacy either as continuous or discrete. However, recent results suggest an intermediate scenario in which synaptic efficacy can be described by a continuous variable, but whose distribution is peaked around a small set of discrete values. Motivated by these results, we explored a model in which each synapse is described by a continuous variable that evolves in a potential with multiple minima. External inputs to the network can switch synapses from one potential well to another. Our analytical and numerical results show that this model can interpolate between models with discrete synapses which correspond to the deep potential limit, and models in which synapses evolve in a single quadratic potential. We find that the storage capacity of the network with double well synapses exhibits a power law dependence on the network size, rather than the logarithmic dependence observed in models with single well synapses. In addition, synapses with deeper potential wells lead to more robust information storage in the presence of noise. When memories are sparsely encoded, the scaling of the capacity with network size is similar to previously studied network models in the sparse coding limit.

Author summary

A long-lasting question in neuroscience is whether synaptic efficacies should be described as continuous variable or discrete variables. Recent experiments indicate that it is a combination of both: synaptic efficacy changes continuously, but its distribution peaks at several discrete values. In this study, we introduce a synapse model described by a double well potential, and investigate the memory properties of networks of neurons connected with such synapses. Our results show in networks with a bimodal weight distribution, the storage capacity depends on network size as a power law. In addition, we demonstrate that networks with such synapses store information more robustly in the presence of noise, compared to networks with synapses with a single well potential.

Citation: Feng Y, Brunel N (2024) Attractor neural networks with double well synapses. PLoS Comput Biol 20(2): e1011354. https://doi.org/10.1371/journal.pcbi.1011354

Editor: Thomas Serre, Brown University, UNITED STATES

Received: July 16, 2023; Accepted: January 23, 2024; Published: February 7, 2024

Copyright: © 2024 Feng, Brunel. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data and computational code is provided: https://github.com/YuFengDuke/double-well-model.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Synapses can change their strength in response to neuronal activity [1–5], and synaptic changes are thought to be critical for the brain to build memories [6]. A popular theoretical framework for studying learning and memory is the attractor neural network scenario [7–9]. In this class of models, memories correspond to the stable fixed points of the neuronal activity dynamics. External stimuli can modify the synaptic connectivity of the network following certain plasticity rules, and these synaptic modifications can create new stable fixed points of neuronal dynamics. An external input is said to be stored in the network if there exists a stable fixed point that is highly correlated with it. An extensively studied question is the storage capacity of such networks [10–12], i.e. the number of stored memories, or the amount of information that can be stored. One key question in theoretical neuroscience is to find biologically plausible learning rules that lead to reasonably large storage capacities.

Multiple synaptic plasticity experiments have suggested that the strengths of individual synapses are modified in a discrete manner. A few experiments have shown that synaptic plasticity is well described by switches between two discrete states (all-or-none), rather than by arbitrary continuous changes in efficacy [13, 14]. In addition, super-resolution imaging experiments have shown that dendritic spines contain a small discrete (1-4) number of nanomodules (i.e., clusters of receptors) [15], consistent with plasticity experiments and indicating that synapses are quantized. This picture of discrete synapses is in contrast with most studies of learning and memory in neuronal network models, in which synapses are taken to be real continuous variables.

Multiple efforts have been made to compute the storage capacity of networks with discrete synapses. Sompolinsky studied a network with a specific binarized Hebbian rule [16], and showed its capacity is close to the capacity of the Hopfield network [17]. Sompolinsky’s model assumes synapses can still store all continuous changes in synaptic efficacy during the learning phase, and synapses get clipped after they have learned all patterns. In contrast, Tsodyks [18], and Amit and Fusi [19, 20] introduced online learning rules in which synapses are always discrete. These studies found that discreteness of synapses can cause a large drop in storage capacity [18, 19], unless stored patterns become extremely sparse [20, 21]. Later studies introduced models with hidden states [22–25], but it remains unknown how these synaptic plasticity models perform in attractor neural networks.

While theoretical work has focused on either fully analog or fully discrete synapses, recent experimental results suggest an intermediate scenario in which the synaptic efficacy can be described by a continuous variable, but whose distribution is peaked around a small set of discrete values [26]. Motivated by this data, we introduce and explore a model in which synapses are described by a continuous variable that evolves in a potential with multiple minima (‘potential wells’, see Fig 1A for an example with two wells). Synapses remain in the same potential well when receiving weak or short-lasting stimuli, but can switch to a new state when receiving strong or long-lasting stimuli. Our analytical and numerical results show this model can connect models with discrete synapses which correspond to the deep potential limit [20], and models in which synapses evolve in a single quadratic potential [27]. We show that the storage capacity of the network with double well synapses (and thus a bimodal weight distribution) has a power law dependence on network size. In contrast, models with a single-well potential show a logarithmic dependence when the parameters of the potential are not allowed to vary with network size N, as shown by previous studies [28]. We also show that the networks with double well synapses store information more robustly in the presence of noise.

Download:

Fig 1.

(A) Sketch of the synaptic model. In the presence of external input, a synapse can stay in the same well (I), or jump into another well (II), depending on its current state and the amplitude of the input. C describes the distance between the two wells, while r₁ characterizes its depth. The larger r₁, the faster synapses decay towards the minima of the potential. (B) Overlap between the input presented at time v and its corresponding attractor state, m_v, as a function of time elapsed since the presentation of this input u − v, for different values of r₁. The solid lines represent the theoretical prediction and squares represent the simulation results with a network of size N = 30, 000 (mean and standard deviation computed over ten independent realizations). Dashed lines mark the storage capacity where m_v vanishes. For all lines, the value of C is chosen to optimize storage capacity. Other parameters are: r₂ = 1, r₃ = 0, θ = 0, f = 0.5, c = 0.05.

https://doi.org/10.1371/journal.pcbi.1011354.g001

2 Model

We study a sparsely connected network of N binary neurons whose states V_i = 0, 1 (i = 1, …, N) are given by a parallel update rule: (1) where dt ≪ 1 is the timescale of neuronal dynamics relative to synaptic dynamics, θ is a threshold, Θ is the Heaviside step function, and h_i(t) is the local field of neuron i: (2) where J_ij is the strength of the synapse from neuron j to neuron i, and c_ijs are i.i.d Bernoulli random variables, with c_ij = 1 with probability c ≪ 1, and 0 otherwise for all i ≠ j, while c_ii = 0 for all i.

Synapses are assumed to be evolving on a much slower time scale than neuronal dynamics, and synaptic dynamics are supposed to be driven by external inputs that drive the network to specific states. Each synapse evolves according to (3) where U is a potential function with multiple minima, δ is the Dirac delta function, are i.i.d random binary variables that represent the state of the neuron imposed by external inputs at time , with , , and are i.i.d random Gaussian variables describing the noise at synapse j → i at time v. Note that the chosen time unit is the interval between presentations of two successive external inputs. The dynamics of J_ij is determined by three terms, the potential U, external inputs I and noise ϵ, of respective strengths r₁, r₂ and r₃.

The first term in the r.h.s. of Eq (3) determines the dynamics of the synapse in the absence of inputs. For simplicity, we use a double well potential U in which each well is given by a quadratic function, (4) Thus, in the absence of inputs, each synapse decays to one of the two potential wells—a low efficacy state J_ij = −C, or a high efficacy state J_ij = C—exponentially, with a time constant 1/(2r₁). Small values of r₁ describe shallow potentials (and hence long decay times towards the minima) while large values of r₁ describe deep potentials (and hence fast decay times towards the minima).

The second term in the r.h.s. of Eq (3) is activity-dependent. For binary neurons, the function I can take at most four values, one for each combination of pre and post-synaptic activity. We study two scenarios: In the balanced input case, the coding level is f = 0.5, and we use (5) Note that for U = 0 (i.e. the flat potential case), and in the absence of noise, the model leads to a connectivity matrix J that is identical to the Hopfield model [17]. This model however suffers from a ‘blackout’ catastrophe after a critical number of inputs, all memories suddenly become irretrievable. In the case of a potential with a single well (C = 0) the model leads to a connectivity matrix that is identical to a ‘palimpsest’ model introduced by Mézard et al [27]. This model instead can retrieve the most recently presented memories, but forgets memories some time after they have been presented.

In the sparse input case, the coding level is f ≪ 1. In this case it is no longer a good idea to use a balanced function I, since this choice would lead to the vast majority of synapses being in the high efficacy potential well. In this scenario, we use two different I functions. The first one is inspired by the Tsodyks-Feigelman model [29, 30] (6) In the sparse input case, we also use another I function, motivated by the plasticity rule in [20]: (7) where I₁ > 0 sets the strength of potentiation when both pre and post-synaptic neurons are active, I₂ < 0 sets the strength of depression when one neuron is active while the other is inactive, and no changes occur when both neurons are inactive. Here, we focus on I₁ = 2C, I₂ = −f(1 − f). The rationale is that when the distribution of synaptic strengths peaks around ±C, a potentiation strength of 2C leads to a transition from a lower to a higher state with high probability. When the connection is depressed, I₂ takes the same value as in Eq (6), leading to a small transition probability from a higher to a lower state. This scheme mirrors the learning rule described in [20] for binary synapses.

The third term in the r.h.s. of Eq (3) is the noise term caused by fluctuations. These fluctuations could be of multiple origins: Fluctuations of neural activity; or fluctuations in the state of the synapse, due to fluctuations in vesicle release, numbers of activated receptors at the post-synaptic side, and fluctuations in biochemical reactions involved in synaptic efficacy changes.

Once the input patterns are given, the learning dynamics (3) are determined by four non-negative parameters: C determines the distance between two minima of U; r₁ how strongly the potential U affects the dynamics (i.e. how fast synapses decay to the minima of the potential); r₂ is the input signal strength and r₃ is the noise strength. In the presence of inputs, a synapse can stay in its current state, or jump into another state depending on its position in the potential well. This dynamics is illustrated in Fig 1A.

With such synaptic plasticity dynamics and appropriate parameters, the network described by Eq (1) has multiple fixed points, that are close to the patterns imposed by the external inputs that were presented most recently. A pattern is said to be stored in memory if a network configuration close to the pattern is a stable fixed point of the dynamics. For this online synaptic plasticity rule Eq (3), we found that the most recently shown patterns η^u, η^u−1, …, η^u−p can be retrieved as stable fixed-point attractors at a given time , while older patterns cannot be retrieved. p is defined as the maximal number of patterns that are still retrievable or, equivalently, the maximal age at which patterns can still be retrieved. This scenario is similar to previously studied attractor neural network models with online learning and decay term in the synaptic updates [27, 31], or networks with binary synapses [18, 32, 33] in which memories are forgotten exponentially.

In order to calculate the storage capacity for the network defined by Eqs (1) and (3), we consider a network at a particular time u when pattern u has just been presented, and investigate whether a pattern shown in the past v < u is still stored in the connectivity matrix. To determine whether there still exists an attractor state close to pattern v, we introduce the overlap between the pattern shown at time v and the network state at time u: (8) where represent the network state at time u. The overlap m_v measures the retrieval quality of the pattern η^v stored in memory (see e.g. [30, 34]). States with m_v ∼ 1 mean that the network can correctly retrieve the stored memory. In practice, in numerical simulations, the network is initialized at , and the dynamics is run until convergence to a fixed point. Simulations reveal that sometimes, the network converges to the most recently shown pattern instead of the pattern v. Thus, we need to consider also the overlap with the most recently shown pattern, m_u. In the balanced input case, we use analytical calculations to compute the mean-field equations for the overlap m_v, and whether solutions to these equations with m_v > 0 are stable with respect to perturbations in the direction of the most recently shown pattern [31]. Details of analytical calculations are described in Methods I and II. These calculations rely on approximating distributions of synaptic inputs, conditioned on , , by Gaussian distributions. In the sparse input case, we used numerical simulations only, since this approximation is not accurate in this case for the values of f and N used in this paper.

The two overlaps m_u and m_v can be obtained by calculating the distribution of the local field h_i at the time t = u. In the large N limit (i.e., cN ≫ 1), the local field defined in Eq (2) follows a Gaussian distribution and its mean and variance are determined by the distribution of J_ij. The time-dependent distribution of synaptic weights whose dynamics obey Eq (3) can be found numerically (see Methods I for details). Eventually, we obtain two self-consistency equations for m_v and m_u: (9) where , and μ¹¹, μ¹⁰, d are functions of m_u, m_v defined in Eqs (32) and (33) (see Methods II for details). In practice, one can fix v and gradually increase u until there is no solution satisfying m_v ≠ 0, or this solution becomes unstable when a small overlap m_u is present. The storage capacity p is then obtained at the maximum value of u − v where a stable solution m_v ≠ 0 exists.

3 Results

3.1 Learning and retrieval dynamics for balanced input in the absence of noise

We start by considering the balanced input case (f = 0.5), and zero noise (r₃ = 0). Without loss of generality, we can fix the value of r₂ = 1, since the dynamics are invariant with respect to the transformation We also set θ = 0, since in this case the distribution of synaptic weights is symmetric around zero. For a given set of parameters, we use both mean field and numerical simulations to compute the overlap of a retrieved memory as a function of its age, and the storage capacity, defined as the maximum age at which memory can be retrieved. Fig 1B shows the overlap of a memory learned at time v, m_v as a function of age (time u−v elapsed since the pattern was learned), for different potential depths r₁ (other parameter values are indicated in the caption). We can see that the overlap decreases with age until it drops abruptly. The figure shows that analytic results are in good agreement with simulations, using a network with 30,000 neurons. It also shows that increasing r₁ (i.e. the depth of the potential wells) decreases storage capacity. This is consistent with previous observations that clipping synaptic strength reduces capacity [35], and that online plasticity rules with discrete synapses typically have much lower capacity than rules with continuous synapses [18–20].

The storage capacity p thus strongly depends on the depth of potential wells as measured by r₁, but it also depends on the width of the potential wells C relative to the jumps caused by input patterns. To investigate the storage capacity of the network, we begin by examining the single potential case, where the width of the potential wells C is set to zero.

Single well potential.

When C = 0, the potential defined in Eq (4) becomes a single parabolic function. In this case, each synapse decays to zero exponentially with a time constant τ = 1/(2r₁) in the absence of inputs, according to the dynamics defined in Eq (3). This scenario is similar to the previously investigated palimpsest model [27, 36] or the pure forgetting model of ref. [28]. The storage capacity p in this case depends on the time constant τ as [28]: (10) where τ₀ = 2A(f)N, and where A(f) is a function of the coding level and thus is a constant in the balanced input case. When τ > τ₀, no patterns are retrievable, similar to the Hopfield model scenario when the number of patterns exceeds the critical value. When τ < τ₀, the model can always retrieve recent patterns and becomes a palimpsest model. By allowing the time constant τ in Eq (10) to scale linearly with the network size N, the storage capacity p can have a linear dependence with N, as investigated in previous studies [27, 36]. However, this scaling assumes an unrealistically large synaptic decay time for large networks. In this study, we consider the decay constant τ or potential depth r₁ to be a property of individual synapses, rather than being dependent on network size, and thus r₁ ∼ O(1) in the large N limit. In this scenario, the weight dynamics with a single well potential only leads to a storage capacity that increases logarithmically with N, as shown in Fig 2C.

Download:

Fig 2. Dependence of storage capacity p on potential width C and network size N, for representative examples of the decay rate r₁.

(A) Dependence of p on C for . In the small r₁ limit, the optimal potential width C* is zero (i.e., a single well potential), and p decreases weakly with C. The number of stored patterns increases logarithmically with network size, as demonstrated in (C). (B) Dependence of p on C for . The number of stored patterns p reaches its maximum at a nonzero value of C. The optimal storage capacity increases as a power law of N, as demonstrated in (D). (C) Storage capacity of the single-well potential model (C = 0) as a function of with network size N. The storage capacity decreases when r₁ increases as indicated by Eq (10). (D) Storage capacity of the double well potential model (C = C*, full lines) as a function of network size N. When , the optimal storage capacity is much larger than the storage capacity of the single well potential model (dashed lines with the same color). The dashed-dotted black line represents for reference. Other parameters in this figure are r₂ = 1, r₃ = 0, θ = 0, f = 0.5, c = 0.05.

https://doi.org/10.1371/journal.pcbi.1011354.g002

In the following, we will demonstrate that the storage capacity of the palimpsest model with fixed τ can be substantially enhanced by introducing the double well potential in the synaptic dynamics.

Double well potential.

The capacity of the model with double well potential is influenced by the width of the potential wells C, and the model displays distinct behaviors depending on r₁. When r₁ is small, the storage capacity p is optimized for C = 0 (single well potential), but decreases only weakly with C, as shown in Fig 2A. In this case, the model is similar to the previously studied pure forgetting model [28], and the storage capacity only increases logarithmically with N, as demonstrated by the solid lines in Fig 2C.

However, when r₁ is larger than a critical value , so that the potential significantly influences the dynamics of the weights in between presentations, the capacity is no longer maximized at C = 0. Rather, there is an optimal value C* that maximizes capacity, as demonstrated in Fig 2B. When C = 0, the storage capacity increases logarithmically with the network size. However, the storage capacity at the optimal value of C increases much faster than the storage capacity with C = 0, as shown in Fig 2B. When C is increased beyond the optimal value, the storage capacity drops abruptly since jumps induced by inputs are too small to cross the boundary between the two wells. The optimal C depends on both r₁ and N. We find that the critical value of r₁ at which the optimal C becomes nonzero is as shown in Fig 3A. In a broad parameter region, the optimal value C is larger than one, which means that multiple pattern presentations are required to induce the switch between two stable states of synapses. Our results also indicate that smaller r₁ or larger N lead to a higher value of the optimal C when .

Download:

Fig 3.

(A) Dependence of optimal potential width C* on network size N and potential depth r₁. There is a critical value that separates two regimes: When , capacity is optimized for a single well potential, C* = 0. When , C* is non-zero, and increases as N increases and r₁ decreases, indicating more pattern presentations are required to induce the switch between two stable states of synapses. Our analytical and numerical computation suggest that , as indicated by the boundary of the black region. (B) Scaling exponent between p and N as a function of r₁, with r₃ = 0. Solid red lines: theoretical prediction for optimal C. Dashed red line: Theoretical prediction for C = 0. Blue circles: Simulation results for optimal C (mean and standard deviation computed over ten independent realizations). For , p ∼ log(N) gives a = 0. When , the model with C = C* has a power law dependence on N (red solid line), while the model with C = 0 gives a = 0 (dashed red line). The exponent a decreases with r₁ and converges to 0.5 in the large r₁ limit. Insets show distributions of synaptic weights for representative examples. The distribution of J_ijs goes from unimodal when C = 0, to bimodal for the optimal C and to quasi-discrete when r₁ becomes large.

https://doi.org/10.1371/journal.pcbi.1011354.g003

For each value of r₁ and N, we focused on the value of C that maximizes capacity. For the region , we found that the optimal storage capacity p with double well potential can be well described by a power law of the network size N (11) as indicated in Fig 2D. We show the dependence of the exponent a on r₁ in Fig 3B, together with representative distributions of weights at a few values of r₁. This plot shows three qualitatively different regions:

Shallow potential region. In the flat potential limit (i.e., ), p has a logarithmic dependence on N, and thus the exponent a ∼ 0. The distribution of the weights in this region is unimodal. The model in this limit is equivalent to models with exponentially decaying memories [27, 36].

Intermediate region. When r₁ increases beyond , the storage capacity of the network can be significantly enhanced by optimizing the potential width C. In this region, the network exhibits a bimodal weight distribution, and the exponent a is slightly larger than 0.5 (solid line in Fig 3B), while the network with unimodal weight distribution (C = 0) still gives a = 0 (dashed line in Fig 3B).

Deep potential region. When r₁ further increases, the weights are more and more attracted to the minima of the potential well in between pattern presentations, and the model becomes more and more similar to a binary synapse model. In the large r₁ limit, the exponent a tends to 0.5, and the distribution of weights becomes close to the sum of two delta functions δ(J + C) + δ(J − C). This scenario is similar to the model with stochastic binary synapses studied in [19, 20].

3.2 Comparison between double well synapses and binary Markovian synapses

We now turn to a comparison between our model and models with binary Markovian synapses. For a fair comparison between these two models, we binarize weights in the double well model, (12) w_ij(t) is thus a binarized connectivity matrix whose entries take + 1 or −1 values, depending on which well the current value of J_ij belongs to. The binary Markov model has two states represented by + 1 and −1, and each weight has a probability of switching from −1 to + 1 when it is potentiated (i.e., ), or from + 1 to −1 when it is depressed (i.e., ). The transition probability between the two states is set to be the same as the transition probability between the wells in the double well model. Note that while the dynamics of J_ij in the double well model is Markovian, the dynamics of binarized weights is not, since transitions between states do not depend only on the current presented pattern and the current value of w_ij, but also on the current value of J_ij which is influenced by patterns shown in the past.

For the comparison between the two models, we use two measures that quantify how much information is stored in such models. In attractor neural networks, stored information is typically measured by the number of stored patterns p, as introduced in the previous section. Another common measure is the signal-to-noise ratio (SNR) [22, 24, 37] defined for a connectivity matrix w at some time t after a particular pattern has been presented as (13) The SNR describes the degree to which the connectivity matrix at time t is correlated with an ideal connectivity matrix, in which all synaptic weights are entirely determined by a particular pattern stored at some time in the past. For instance, for binary inputs and binary synapses, where is given by Eq (5). In Eq (13) the standard deviation is calculated over all elements in the matrix while w_ij(∞) is the state of the connectivity matrix at t → ∞. is the ideal state of the weight determined by the learning rule Eq (5). At time t = 0, is 1 if it experiences a potentiation event and is −1 if it experiences a depression event (i.e., in Eq (5)). Note that SNR ignores the neuronal dynamics and is usually considered an upper bound on how memories stored in synaptic connectivity can be read out using neural activity. In the binarized double well model, the SNR can be calculated using methods described in Methods III. The SNR and the upper bound of SNR for the Markov model are calculated using Eqs (41) and (43) [24].

Fig 4 shows comparisons between the two models. Fig 4A shows that the storage capacity, defined by the number of patterns that can be retrieved as fixed point attractors, is much larger in the model with double well synapses than in the model with binary synapses, with transition probabilities chosen to match the probabilities of switching potential well in the double well model. Fig 4B shows a comparison of the SNR of the double well model with that of a corresponding Markov model. The Markov model and the double well model have the same initial SNR when a pattern is first learned, because transition probabilities are identical in both models. However, the SNR of the Markov model initially decays slowly and then rapidly drops off after t ∼ 10, while the SNR of the double well model does not decrease monotonically due to the analog depth provided by the double well synapses, which allows the synaptic state to have a richer dependence on neuronal activity history. Consequently, the SNR of the double well model can surpass the upper bound of SNR in the Markov model and decay more slowly than the SNR of the Markov model, as shown in Fig 4B.

Download:

Fig 4.

(A) Comparison between the storage capacity of Markovian binary synapses and binarized double well synapses defined in Eq (12). The parameters of the double well model are r₁ = 0.1, r₂ = 1, r₃ = 0, θ = 0, f = 0.5, cN = 2000. The width of the potential C is chosen to maximize storage capacity. The transition probabilities of binary Markovian model are chosen to match probabilities of switching wells in the double well model. (B) Comparison between the SNRs of Markovian synapses and binarized double well synapses, as a function of time elapsed since the presentation of a given pattern. Green curve: SNR of the double well model with C = C* calculated using Eq (40). Red line: SNR of the Markov model with the same transition probability. Black line: Upper bound of SNR for two-state Markov models. Dashed line: critical SNR below which the memories are no longer retrievable [28]. The intersection between the SNR curve and this dashed line determines the storage capacity of the network. (C) Cartoon of the temporal evolution of the distribution of synapses that undergo potentiation at t = 0, i.e. those for which just before potentiation (top) and after (bottom). The highlighted region denotes the region around the maximum of the potential where synapses that reside in one of the wells can make a transition to the other well (r₂ = 1). Just before potentiation (t = 0), the distribution is symmetric. The distribution is then shifted up by r₂, before decaying towards the respective wells. As the distribution becomes asymmetric near 0 (within the range of 0 ± r₂), the next presented uncorrelated patterns will cause more probability mass to shift towards the positive side than towards the negative side, as demonstrated in the bottom figure. (D) Distribution of connections that have been potentiated at time t = 0, at later times t = 2 and t = 3. Black dots indicate ±r₂ = ±1, while red dots indicate the minima of the potential, ±C = ±2.7. As demonstrated in (C), an uncorrelated pattern leads to a larger probability of switching from the low well to the high well than the opposite transition, resulting in an increase in SNR from t = 2 to t = 3, as seen in (B). At longer times, the distribution eventually returns to a symmetric distribution, leading to a decrease in SNR.

https://doi.org/10.1371/journal.pcbi.1011354.g004

To better understand the history-dependent nature of double well synapses, we show the changes in the distribution of potentiated connection over time in Fig 4C. After a pattern is learned, the distribution of becomes asymmetric, and subsequent uncorrelated patterns can cause a probability flux towards the positive side, as demonstrated in Fig 4C. This happens in particular for the parameters of Fig 4 at t = 2, which leads to an increased SNR from t = 2 to t = 3, as seen in Fig 4B. After a sufficient amount of time, the distribution eventually returns to a symmetric asymptotic distribution, resulting in a decay of the SNR towards zero.

3.3 Robustness of stored information to noise

Neural circuits are required to not only learn new information but also to retain old memories. Recent experiments suggest that a significant fraction of synaptic plasticity is random noise, independent of learning [38, 39]. How can neural networks maintain long-lasting memories in the presence of noise? One long-lasting hypothesis is that discrete-like synapses could benefit neural networks by increasing their robustness with respect to noise (see e.g. [40]). Here, we can verify this hypothesis in the model with double well synapses by adding noise to the model. This noise is described by the last term in Eq (3), where r₃ quantifies the magnitude of the noise term.

We first investigated how the potential width affects the network’s storage capacity in the presence of noise. We set the potential depth to a fixed value of r₁ = 0.1 and calculated the storage capacity for every combination of r₃ and C. As illustrated in Fig 2B, there is an optimal value C* that maximizes the storage capacity of the network without noise. If the potential width increases beyond C*, the Hebbian learning term defined in Eq (5) is insufficient to induce the transition between different states, resulting in an abrupt decrease in storage capacity. For models with C < C*, the network’s storage capacity decreases monotonically with the noise strength r₃, as shown in Fig 5. However, for models with C > C* (e.g., C = 4, C = 5), there is a peak in the storage capacity as r₃ increases, indicating that noise facilitates the learning of new memories.

Download:

Fig 5. Dependence of Storage Capacity on Noise Strength r₃ for different values of Potential Width C.

When C is small (e.g., C = 0, 1, 2, 3), the storage capacity p monotonically decreases with the noise strength r₃. However, for larger values of C (e.g., C = 4, 5), the storage capacity first increases with noise intensity, reaches a peak and then decreases. Thus, noise facilitates the learning of new memories in such cases. Other parameters are given as N = 10, 000, r₁ = 0.1, r₂ = 1, θ = 0, f = 0.5, c = 0.05.

https://doi.org/10.1371/journal.pcbi.1011354.g005

In summary, there are two potential ways in which noise can impact the double-well model. In instances where the potential width C is small, noise interferes with the information stored in the synapses, resulting in a reduction of storage capacity. When C is larger, the Hebbian learning term is insufficient by itself to induce transitions of sufficient numbers of synapses, leading to an inability of the network to learn new memories. In this scenario, introducing noise can help synapses make transitions between wells, therefore increasing the storage capacity of the network. This scenario is comparable to the model presented in [19, 20].

We also analyzed the effect of potential depth on the robustness of stored memories. For each fixed potential depth r₁, we can choose the potential width C that optimizes the storage capacity. With the optimal C, the storage capacity of the network always decreases when the noise strength increases, as shown in Fig 6. However, the storage capacity decreases more slowly if the potential is deeper, which indicates the model is more robust to input noise. To quantify this effect, we introduced a robustness parameter as (14) where and M is the standard deviation of weight strength defined in Eq (22). measures the perturbation to the weight caused by the input noise. A larger R means the storage capacity decreases more slowly in the presence of input noise.

Download:

Fig 6. Trade-off between capacity and robustness when varying potential depth r₁.

(A) The storage capacity decreases when the noise strength increases strongly depends on r₁. Small values of r₁ lead to large storage capacity, but storage is highly sensitive to noise. Large values of r₁ lead to small capacity, but storage is highly robust. (B) The trade-off between p₀ and R, where p₀ is the storage capacity in the absence of noise (r₃ = 0), and R quantifies robustness to noise, Eq (14). With deeper potentials, the storage capacity is smaller but storage is more robust. Here, C is optimized for each value of r₁ and other parameters are given as: N = 10, 000, r₂ = 1, θ = 0, f = 0.5, c = 0.05.

https://doi.org/10.1371/journal.pcbi.1011354.g006

We explored how the robustness R depends on the depth of the potential r₁. As shown in Fig 6, when r₁ increases, the storage capacity of the network in the absence of noise decreases. However, the robustness R becomes larger, indicating a trade-off between storage capacity and robustness as potential wells become deeper. From a biological perspective, each minimum of the potential can correspond to a cluster of receptors [15]. The parameter r₁ may be related to the rate at which these clusters form. The timescale τ = 1/2r₁ provides synapses with the analog depth to retain the history information of neuronal activity. Our results indicate that the balance between the memory storage capacity and the robustness of stored information would lead to an optimal r₁.

3.4 Learning dynamics in the sparse coding limit

We next turn to a more biologically realistic scenario in which neurons are described by 0,1 variables, and the probability that a neuron is active in a given pattern, f, is f ≪ 0.5. We study the storage capacity of the network with double well synapses with the learning rules described in Eqs (6) and (7) in the sparse coding limit f ∼ ln N/N [41]. In this limit, it was shown that networks with binary synapses and online learning have a storage capacity that is comparable to the optimal capacity, unlike with other scalings of f with N where the capacity is sub-optimal [20, 41–43]. We first focus on the noiseless case (r₃ = 0). For each value of r₁, we optimize the storage capacity by adjusting the neuronal threshold θ and the potential width C. The relationship between the storage capacity p and the network size N is shown in Fig 7 for different choices of the learning rule and different potential depths r₁.

Download:

Fig 7. Storage capacity of the model with double well synapses in the sparse coding limit.

(A) Storage capacity of the network as a function of r₁ with the learning rule defined in Eq (6) (B) Storage capacity of the network as a function of r₁ with the learning rule defined in Eq (7) (C) p as a function of N in the flat potential case (r₁ = 10⁻³) for the model with learning rule defined in Eq (6). Optimal p is plotted as a function of N in a semi-log plot. Dashed red line: p = alog(N/b), a = 396.5, b = 294.5, R² = 0.9964. (D) p as a function of N in the deep potential case (r₁ = 0.5) for the model with the learning rule defined in Eq (7). Dashed red line: p = a(N/log(N))^b, a = 0.212, b = 1.91, R² = 0.9927. In all panels, data points are averaged over 10 realizations, and error bars represent the standard deviation. C and θ are chosen to optimize for storage capacity. Other parameter are given as: r₂ = 1, r₃ = 0, c = 1, f = 4 log(N)/N.

https://doi.org/10.1371/journal.pcbi.1011354.g007

We start with the learning rule defined in Eq (6), which is similar to the Tsodyks-Feigel’man learning rule [29, 30]. For this learning rule, we found that the optimal C is zero for all r₁, and the distribution of synaptic strengths is unimodal. As demonstrated in Fig 7, the storage capacity of the network increases logarithmically with the network size N, with a prefactor that is proportional to τ = 1/(2r₁) as indicated by Eq (10). This case is similar to the model with continuous synapses and exponentially decaying memories, as discussed in [27, 28, 31].

In contrast, the learning rule described in Eq (7), which is similar to the rule introduced by Amit and Fusi [20], exhibits a large storage capacity when the potential is deep, as illustrated in Fig 7B. In the deep potential regime, the weight distribution becomes strongly bimodal, with two peaks centered around ±C. When there is a coincidence of pre and post-synaptic activity, the learning rule in Eq (7) allows the synaptic connection to cross the potential barrier with probability ∼ 1. When one neuron is active while the other is silent, synapses switch to the low state only with low probability. This scenario is similar to the model discussed in [20]. As shown in Fig 7D, the storage capacity of this learning rule leads to p ∼ (N/log(N))² in the sparse coding limit.

Our result demonstrates that the double well model can achieve a much larger storage capacity in the sparse coding scenario than when memories are densely coded, similarly to previously studied models. In the flat potential case, the storage capacity scales as p ∼ log(N), but with a prefactor that increases when coding becomes sparse. However, in the deep potential case, the capacity scales supralinearly with network size, p ∼ (N/log(N))², in the sparse coding limit, while it is sublinear when coding is dense.

We also investigated the robustness of the network with learning rules defined in Eqs (6) and (7) in the sparse coding case. We explored how the storage capacity changes in the presence of noise, in both shallow and deep potential scenarios. As shown in Fig 8, when the noise strength r₃ increases, the storage capacity of the network with deep potentials decays much more slowly than the storage capacity of the network in the flat potential case. This result is consistent with the result for the balanced input case. Notice that for the given network size N = 1, 500, the storage capacity in the deep potential case is smaller than the storage capacity in the flat potential case for small r₃, despite the fact that it should increase faster with N. This is because the flat potential case has a much larger decay constant, τ = 1/2r₁, leading to a larger prefactor in the relationship between N and p as indicated in Eq (10).

Download:

Fig 8. Robustness of the double well model in the sparse coding case.

Orange squares represent the storage capacity for the model with the learning rule defined in Eq (6) in the flat potential case (r₁ = 10⁻³). Blue squares represent the storage capacity for the model with the learning rule defined in Eq (7) in the deep potential case (r₁ = 0.5). Each point is averaged over 10 independent realizations, and the error bar represents the standard deviation. Other parameters are given as: r₂ = 1, c = 1, N = 1, 500, f = 4log(N)/N.

https://doi.org/10.1371/journal.pcbi.1011354.g008

4 Discussion

A long-standing question in neuroscience is whether individual synaptic plasticity events are better described as small changes on a continuum of possible values that are then stable over long time scales, or as large changes among a small discrete set of states [13, 26, 44]. These two scenarios have been studied essentially independently of one another, in networks with unconstrained analog synapses [17, 27, 30], or in networks with binary synapses [18–21].

In this study, we introduced a model that can interpolate between these two scenarios. Synapses in our model evolve in a double well potential, whose shape is described by its depth r₁ and width C. When the depth goes to zero, the model becomes equivalent to classic models with continuous synapses, while in the opposite limit of a large depth, the model becomes similar to models with binary synapses. In the absence of inputs, a synapse decays to one of the potential minima exponentially, with a time constant τ = 1/2r₁. External input can cause synapses to jump from one well to the other, leading to a stable change in the absence of noise or additional external inputs leading to subsequent jumps. When the size of the jumps is smaller than the distance between the minimum of a well and the barrier that separates the two wells, repetitive stimulation is needed for a synapse to jump from one well to the other. The number of required stimuli is determined by the potential width C. These dynamics are reminiscent of phenomena such as synaptic consolidation or late long-term potentiation. Experimental studies on plasticity in the hippocampus have shown that weak stimulation leads to synaptic changes that decay to their initial value in a relatively short period. However, a strong stimulation results in an enhanced synaptic connection, that persists for the entirety of the recording [45]. This synaptic consolidation process involves multiple complex biochemical dynamics, and our double well potential model could be seen as a minimal implementation of this idea in the framework of attractor neural networks.

There are multiple lines of evidence that suggest that synapses have a small set of stable states. Synaptic plasticity has been shown to be implemented by the addition of unitary synaptic ‘nanomodules’ to dendritic spines, the loci of excitatory synapses onto pyramidal cells, the main excitatory neuronal type in the cortex and hippocampus [15]. These nanomodules contain spatially localized clusters of pre-and postsynaptic proteins that are aligned with each other. These clusters form due to a reversible diffusion-trapping process of receptors and scaffold proteins on the membrane [46]. Our model can be seen as a minimal model for this scenario, where the two minima of our potential well could correspond to spine configurations with one and two clusters of receptors, which represent close to 90% of the observed spines in [15]. The depth of the potential could be related to the rate at which these clusters form, which depends on various factors, such as the fluidity of the synaptic membrane and the affinity of the receptors for scaffold proteins. The potential width C determines the synaptic plasticity gap, describing the number of stimuli required to induce the formation of new nanomodules. The idea of discrete stable states is also consistent with electron microscopy data showing distributions of spine sizes in the cortex are bimodal [26], and models of synaptic plasticity relying on biochemical interaction networks of the post-synaptic density [47–49]. Our model predicts that measurements of the efficacy of single synapses on long time scales, in conditions in which learning occurs, should reveal a bimodal distribution.

Our results show that a network in which synapses have two potential wells, and consequently there is a bimodal distribution of synaptic strengths, has a larger storage capacity than a model in which synapses evolve in a single well, and therefore the distribution of weights is unimodal, provided or equivalently . Furthermore, our results show that C*, the optimal potential width, is greater than one whenever r₁ is larger than its critical value, as shown in Fig 3A. The optimal value of C increases with both network size N and the time constant τ. For large networks, this implies that many repetitive stimuli are needed to induce transitions between potential wells. In this regime, the storage capacity of the network increases as a power law of the network size, instead of a logarithmic relationship. The exponent is close to 0.5 for densely encoded memories, but it becomes equal to 2 (with logarithmic corrections) with sparsely encoded memories when the number of active neurons per memory is logarithmic in network size. These exponents are similar to networks with binary synapses [20, 21]. Our results also suggest that synaptic noise can facilitate the learning of new memories, in situations in which the Hebbian learning term is insufficient to induce transitions between potential wells. Overall, this suggests that a network with such synapses can store memories robustly without requiring high precision in synaptic dynamics.

Our model captures the process of synaptic consolidation in an abstract fashion, as the decay of the synaptic state towards one out of two potential minima. The decay is characterized by a single time constant, τ, which results in a storage capacity that depends sublinearly on the network size in the balanced input case. Experimental and theoretical work has revealed that synaptic plasticity contains several phases across multiple time scales [50–54]. It will be interesting to generalize the model we have proposed to include dynamics with multiple timescales, which would account for various biochemical processes during synaptic consolidation. Such an extension may increase the storage capacity of the network, leading to an almost linear dependence on network size, as suggested by previous research [37]. The phenomenon of synaptic consolidation can be well described by the theory of synaptic tagging and capture [50, 52, 55]. According to this theory, stimulation can result in the tagging of synapses. When weak stimulation is applied, the tag gradually decays. Strong stimulation instead can tag more synapses, initiating the production of plasticity-related proteins that lead to the consolidation of the tagged synapses. Although our double well synapse model has some similarities with the theory of synaptic tagging and capture, it does not account for a number of biological details. For instance, our model only considered independent synapses, while in reality, different synapses on the same dendritic tree can share tagging, since plasticity-related proteins are synthesized in the soma of postsynaptic neurons, which are available to all presynaptic pathways. Moreover, experiments suggest that synapses have varying stimulation thresholds for discrete changes in their efficacy [13], while we assumed homogeneous synapses. Future research will be necessary to include these biological details or investigate these phenomena with more realistic neuron and network models. Another extension of our model would be to increase the number of wells of the potential. We also note that other forms of memory consolidation are likely to play an important role. A recent study demonstrated that adding a pattern reactivation mechanism in attractor neural networks with synapses evolving in a single well potential, mimicking systems memory consolidation [56], can also produce power law scaling storage capacity and lead to lifelong learning scenarios [28]. It will be interesting to incorporate synaptic consolidation and pattern reactivation together in an attractor neural network to explore how double well synapses affect memory function in the context of lifelong learning.

5 Methods

I. The probability density function of synaptic weights

The goal of this section is to derive an equation for g(J, t), the probability density of synaptic strength J at time t, and distributions of synaptic strengths conditioned by inputs experienced in the past. We focus on the noiseless case (i.e., r₃ = 0) and the balanced input case with input function defined in Eq (5). From Eq (3), one can derive a master equation for g(J, t): (15) (16) where t^−,+ denote times right before or after a new pattern is presented. Eq (15) describes the transitions from states g(J + r₂, t⁻) and g(J − r₂, t⁻) to g(J, t⁺) when an input is presented. Eq (16) describes the temporal evolution of the distribution g in between presentations, due to the potential term. With the piece-wise parabolic potential defined by Eq (4), the solution of Eq (16) is given as: (17) where (18) Note that g(J, t + Δt) = 0 when . After a sufficiently long time, the distribution of synaptic weights should only depend on the time elapsed since the last input presentation, and should therefore be periodic with period 1: (19)

In addition, the probability density function should satisfy the usual boundary conditions, (20) and the normalization condition

With Eqs (15) and (20), one can solve the asymptotic distribution of the weights right before a new pattern is presented, g_∞(J) ≡ g(J, t⁻) for sufficiently large . To obtain the asymptotic distribution of weights, we discretize J to take N_d = 4000 regularly spaced values in the range of [J_min, J_max]. The value of J_max (J_min) is large (small) enough so that, values outside the range are never visited. In practice, we set J_min = −20, J_max = 20. The initial values of the distribution are taken to be random uniform i.i.d., such that the initial distribution is normalized to 1. We then use a discretized version of equations (16,18,19) to update the distribution until convergence.

The mean and root mean square (RMS) of asymptotic distribution g_∞(J) are given as (21) (22)

We next turn to the distribution of synaptic strengths, conditioned on plasticity events at time t = v and t = u. We use the η^u to denote the most recently presented pattern and η^v to denote the previously stored pattern where v < u. Distributions of synaptic strengths at time t, conditioned on a plasticity event at t = v, are denoted by , where σ^v = I(v) = ±1 denotes potentiation or depression. Just after the presentation of the pattern, these distributions are given by: (23) The distributions then evolve according to Eqs (15) and (16), replacing g(J, t) by . At any given time, one can calculate the mean and RMS of the weight distributions : (24) (25) Distributions of synaptic strengths at the time t = u,conditioned on plasticity events at time t = v and t = u are denoted by , where σ^v = I(v) = ±1 and σ^u = I(u) = ±1. The distribution of is given by (26) Its mean and RMS are (27) (28)

II. Storage capacity of the double well potential model

We now turn to the question of whether at time u,the network is still able to retrieve a pattern η^v that was stored at a previous time v < u. Here, we focus on the standard coding (‘balanced input’) scenario f = 0.5, and θ = 0. Simulations show that sometimes, when initialized at a pattern stored in the past, the network mistakenly retrieves the most recently shown pattern η^u instead [31]. To calculate the storage capacity, we thus need to define two overlaps: (29) where represents the fixed point of Eq (1) with initial condition , and η^u represent the most recently shown pattern.

When the network has overlaps m_v and m_u with patterns shown at times u and v, the probability that a neuron is in a given state V_i, conditioned on can be written as: (30) where s_v, s_u ∈ {0, 1}. In Eq (30), the first(second) parenthesis on the r.h.s. gives the probability that when () in a network state with overlap m_u (m_v) with this pattern.

To proceed, we need to compute the distribution of local fields conditioned on the state of the neuron in patterns shown at times u and v. The local fields are given by (31) where on the right hand side, the distribution of J_ij are given by where , and the distribution of are given by Eq (30). For large cN, (cN ≫ 1), we can use the central limit theorem and approximate the local field distributions by a normal distribution , where () is the mean(standard deviation) of the local field , conditioned on , . These two moments are given by (32) (33) where s_v, s_u, s, s′ ∈ {0, 1}. Here we used the fact that since the pattern u only changes the average of the local field at t = u.

The overlaps m_v, m_u can be written as: (34) In the balanced input case, , and thus (35) The Eq (34) can be simplified to: (36) Using the notation defined in Eqs (32) and (33), we obtain self-consistency equations for m_v and m_u: (37) where . Eq (37) can be solved numerically by updating Eqs (30), (32) and (33) using m_v, m_u. The pattern η_v is retrievable if there exists a solution m_v ∼ 1, m_u = 0, that is stable to small perturbations m_u = ϵ. In order to solve this mean-field equation, we convert Eq (37) to a map, in which the values of m_u,v at a given time step are given by the r.h.s. of Eq (37) where μ^11,10 and d are computed using the values of m_u,v at the previous time step. To check the stability of the state with m_v > 0, m_u = 0, we take as initial conditions m_v as 1 and m_u as ϵ. We then proceed to update the values of m_v and m_u iteratively until convergence is reached. In practice, we used ϵ = 5 × 10⁻².

III. Signal-to-Noise Ratio

We focus on the calculation of the signal-to-noise ratio in the balanced input case. We first derive the SNR for the binarized double well model. In this case, w_ij has the equal probability of being + 1 and −1, thus we have: (38) Therefore, the SNR can be simplified as follows: (39) w_ij(t) is drawn from the distribution of and the distribution of with the same probability in the balanced input case. The average can be calculated by: (40) where are defined through Eqs (15), (18) and (23), and we used the identity in the balanced input case.

The SNR for Markovian binary synapses SNR_b is given in [57]: (41) where (42) where I is a 2 × 2 identity matrix, w = [−1, 1] is the efficacy of the synapse, p^∞ = [0.5, 0.5] is the stationary distribution of synapses. M^pot(M^dep) is the 2 × 2 transition matrix when the synapse is potentiated(depressed). Their value is determined by the transition probability between two states.

SNR_b depends on M^pot and M^dep, and the upper bound is given as [57]: (43)

References

1. Bliss TVP, Lomo T. Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. J Physiol London. 1973;232:331–56. pmid:4727084
- View Article
- PubMed/NCBI
- Google Scholar
2. Dudek SM, Bear MF. Bidirectional long-term modification of synaptic effectiveness in the adult and immature hippocampus. J Neurosci. 1993;13:2910–8. pmid:8331379
- View Article
- PubMed/NCBI
- Google Scholar
3. Markram H, Lubke J, Frotscher M, Sakmann B. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science. 1997;275:213–5. pmid:8985014
- View Article
- PubMed/NCBI
- Google Scholar
4. Sjöström PJ, Turrigiano GG, Nelson SB. Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron. 2001;32:1149–64. pmid:11754844
- View Article
- PubMed/NCBI
- Google Scholar
5. Bittner KC, Milstein AD, Grienberger C, Romani S, Magee JC. Behavioral time scale synaptic plasticity underlies CA1 place fields. Science. 2017;357:1033–6. pmid:28883072
- View Article
- PubMed/NCBI
- Google Scholar
6. Hayashi-Takagi A, Yagishita S, Nakamura M, Shirai F, Wu YI, Loshbaugh AL, et al. Labelling and optical erasure of synaptic memory traces in the motor cortex. Nature. 2015;525:333–8. pmid:26352471
- View Article
- PubMed/NCBI
- Google Scholar
7. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79:2554–8. pmid:6953413
- View Article
- PubMed/NCBI
- Google Scholar
8. Amit DJ. Modeling brain function. Cambridge University Press; 1989.
9. Khona M, Fiete IR. Attractor and integrator networks in the brain. Nat Rev Neurosci. 2022;23:744–66. pmid:36329249
- View Article
- PubMed/NCBI
- Google Scholar
10. Amit DJ, Gutfreund H, Sompolinsky H. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys Rev Lett. 1985;55:1530–1. pmid:10031847
- View Article
- PubMed/NCBI
- Google Scholar
11. Amit DJ, Gutfreund H, Sompolinsky H. Information storage in neural networks with low levels of activity. Phys Rev A. 1987;35:2293–303. pmid:9898407
- View Article
- PubMed/NCBI
- Google Scholar
12. Gardner EJ. The phase space of interactions in neural network models. J Phys A: Math Gen. 1988;21:257–70.
- View Article
- Google Scholar
13. Petersen CC, Malenka RC, Nicoll RA, Hopfield JJ. All-or-none potentiation at CA3-CA1 synapses. Proceedings of the National Academy of Sciences. 1998;95(8):4732–7. pmid:9539807
- View Article
- PubMed/NCBI
- Google Scholar
14. O’Connor DH, Wittenberg GM, Wang SSH. Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proceedings of the National Academy of Sciences. 2005;102(27):9679–84. pmid:15983385
- View Article
- PubMed/NCBI
- Google Scholar
15. Hruska M, Henderson N, Le Marchand SJ, Jafri H, Dalva MB. Synaptic nanomodules underlie the organization and plasticity of spine synapses. Nature neuroscience. 2018;21(5):671–82. pmid:29686261
- View Article
- PubMed/NCBI
- Google Scholar
16. Sompolinsky H. The theory of neural networks: The Hebb rule and beyond. In: Heidelberg colloquium on glassy dynamics. Springer; 1987. p. 485–527.
17. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences. 1982;79(8):2554–8. pmid:6953413
- View Article
- PubMed/NCBI
- Google Scholar
18. Tsodyks M. Associative Memory in Neural Networks with Binary Synapses. Mod Phys Lett B. 1990;4:713.
- View Article
- Google Scholar
19. Amit DJ, Fusi S. Constraints on learning in dynamic synapses. Network: Computation in Neural Systems. 1992;3(4):443–64.
- View Article
- Google Scholar
20. Amit DJ, Fusi S. Learning in neural networks with material synapses. Neural Computation. 1994;6(5):957–82.
- View Article
- Google Scholar
21. Dubreuil AM, Amit Y, Brunel N. Memory capacity of networks with stochastic binary synapses. PLoS computational biology. 2014;10(8):e1003727. pmid:25101662
- View Article
- PubMed/NCBI
- Google Scholar
22. Fusi S, Drew PJ, Abbott LF. Cascade models of synaptically stored memories. Neuron. 2005;45(4):599–611. pmid:15721245
- View Article
- PubMed/NCBI
- Google Scholar
23. Fusi S, Abbott L. Limits on the memory storage capacity of bounded synapses. Nature neuroscience. 2007;10(4):485–93. pmid:17351638
- View Article
- PubMed/NCBI
- Google Scholar
24. Lahiri S, Ganguli S. A memory frontier for complex synapses. Advances in neural information processing systems. 2013;26.
- View Article
- Google Scholar
25. Benna MK, Fusi S. Computational principles of synaptic memory consolidation. Nature neuroscience. 2016;19(12):1697–706. pmid:27694992
- View Article
- PubMed/NCBI
- Google Scholar
26. Dorkenwald S, Turner NL, Macrina T, Lee K, Lu R, Wu J, et al. Binary and analog variation of synapses between cortical pyramidal neurons. Elife. 2022;11. pmid:36382887
- View Article
- PubMed/NCBI
- Google Scholar
27. Mézard M, Nadal J, Toulouse G. Solvable models of working memories. Journal de physique. 1986;47(9):1457–62.
- View Article
- Google Scholar
28. Shaham N, Chandra J, Kreiman G, Sompolinsky H. Stochastic consolidation of lifelong memory. Scientific Reports. 2022;12(1):13107. pmid:35907920
- View Article
- PubMed/NCBI
- Google Scholar
29. Tsodyks MV, Feigel’man MV. The enhanced storage capacity in neural networks with low activity level. EPL (Europhysics Letters). 1988;6(2):101.
- View Article
- Google Scholar
30. Tsodyks M. Associative memory in asymmetric diluted network with low level of activity. EPL (Europhysics Letters). 1988;7(3):203.
- View Article
- Google Scholar
31. Pereira U, Aljadeff J, Brunel N. Forgetting leads to chaos in attractor networks. Phys Rev X. 2022;13:011009.
- View Article
- Google Scholar
32. Amit DJ, Fusi S. Constraints on learning in dynamics synapses. Network. 1992;3:443–64.
- View Article
- Google Scholar
33. Amit DJ, Fusi S. Dynamic learning in neural networks with material synapses. Neural Computation. 1994;6:957–82.
- View Article
- Google Scholar
34. Amit DJ, Gutfreund H, Sompolinsky H. Statistical mechanics of neural networks near saturation. Annals of physics. 1987;173(1):30–67.
- View Article
- Google Scholar
35. Sompolinsky H. Neural networks with nonlinear synapses and a static noise. Physical Review A. 1986;34(3):2571.
- View Article
- Google Scholar
36. Derrida B, Nadal JP. Learning and forgetting on asymmetric diluted neural networks. J Stat Phys. 1987;49:993–1009.
- View Article
- Google Scholar
37. Benna MK, Fusi S. Computational principles of synaptic memory consolidation. Nat Neurosci. 2016;19:1697–706. pmid:27694992
- View Article
- PubMed/NCBI
- Google Scholar
38. Yasumatsu N, Matsuzaki M, Miyazaki T, Noguchi J, Kasai H. Principles of long-term dynamics of dendritic spines. Journal of Neuroscience. 2008;28(50):13592–608. pmid:19074033
- View Article
- PubMed/NCBI
- Google Scholar
39. Holtmaat AJ, Trachtenberg JT, Wilbrecht L, Shepherd GM, Zhang X, Knott GW, et al. Transient and persistent dendritic spines in the neocortex in vivo. Neuron. 2005;45(2):279–91. pmid:15664179
- View Article
- PubMed/NCBI
- Google Scholar
40. Petersen CC, Malenka RC, Nicoll RA, Hopfield JJ. All-or-none potentiation at CA3-CA1 synapses. ProcNatlAcadSciUSA. 1998;95:4732–7. pmid:9539807
- View Article
- PubMed/NCBI
- Google Scholar
41. Willshaw D, Buneman OP, Longuet-Higgins H. Non-holographic associative memory. Nature. 1969;222:960–2. pmid:5789326
- View Article
- PubMed/NCBI
- Google Scholar
42. Nadal JP. Associative memory: on the (puzzling) sparse coding limit. J Phys A: Math Gen. 1991;24:1093–101.
- View Article
- Google Scholar
43. Dubreuil AM, Amit Y, Brunel N. Memory capacity of networks with stochastic binary synapses. PLoS Comput Biol. 2014;10:e1003727. pmid:25101662
- View Article
- PubMed/NCBI
- Google Scholar
44. Montgomery JM, Madison DV. Discrete synaptic states define a major mechanism of synapse plasticity. Trends Neurosci. 2004;27:744–50. pmid:15541515
- View Article
- PubMed/NCBI
- Google Scholar
45. Frey U, Morris RG. Synaptic tagging and long-term potentiation. Nature. 1997;385(6616):533–6. pmid:9020359
- View Article
- PubMed/NCBI
- Google Scholar
46. Choquet D. Linking nanoscale dynamics of AMPA receptor organization to plasticity of excitatory synapses and learning. Journal of Neuroscience. 2018;38(44):9318–29. pmid:30381423
- View Article
- PubMed/NCBI
- Google Scholar
47. Zhabotinsky AM. Bistability in the Ca²⁺/calmodulin-dependent protein kinase-phosphatase system. Biophys J. 2000;79:2211–21. pmid:11053103
- View Article
- PubMed/NCBI
- Google Scholar
48. Miller P, Zhabotinsky AM, Lisman JE, Wang XJ. The stability of a stochastic CaMKII switch: dependence on the number of enzyme molecules and protein turnover. PLoS Biol. 2005;3:e107. pmid:15819604
- View Article
- PubMed/NCBI
- Google Scholar
49. Graupner M, Brunel N. STDP in a bistable synapse model based on CaMKII and associated signaling pathways. PLoS Comput Biol. 2007;3:e221. pmid:18052535
- View Article
- PubMed/NCBI
- Google Scholar
50. Clopath C, Ziegler L, Vasilaki E, Büsing L, Gerstner W. Tag-trigger-consolidation: a model of early and late long-term-potentiation and depression. PLoS computational biology. 2008;4(12):e1000248. pmid:19112486
- View Article
- PubMed/NCBI
- Google Scholar
51. Ziegler L, Zenke F, Kastner DB, Gerstner W. Synaptic consolidation: from synapses to behavioral modeling. Journal of Neuroscience. 2015;35(3):1319–34. pmid:25609644
- View Article
- PubMed/NCBI
- Google Scholar
52. Barrett AB, Billings GO, Morris RG, Van Rossum MC. State based model of long-term potentiation and synaptic tagging and capture. PLoS computational biology. 2009;5(1):e1000259. pmid:19148264
- View Article
- PubMed/NCBI
- Google Scholar
53. Gastaldi C, Muscinelli S, Gerstner W. Optimal stimulation protocol in a bistable synaptic consolidation model. Frontiers in computational neuroscience. 2019;13:78. pmid:31798436
- View Article
- PubMed/NCBI
- Google Scholar
54. Yasuda R, Hayashi Y, Hell JW. CaMKII: a central molecular organizer of synaptic plasticity, learning and memory. Nat Rev Neurosci. 2022;23:666–82. pmid:36056211
- View Article
- PubMed/NCBI
- Google Scholar
55. Redondo RL, Morris RG. Making memories last: the synaptic tagging and capture hypothesis. Nature Reviews Neuroscience. 2011;12(1):17–30. pmid:21170072
- View Article
- PubMed/NCBI
- Google Scholar
56. Hofer SB. Structural traces of past experience in the cerebral cortex. Journal of molecular medicine. 2010;88:235–9. pmid:19904481
- View Article
- PubMed/NCBI
- Google Scholar
57. Lahiri S, Ganguli S. A memory frontier for complex synapses. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, editors. Advances in Neural Information Processing Systems. vol. 26. Curran Associates, Inc.; 2013. p. 1034–42.

Subject Areas
?

For more information about PLOS Subject Areas, click here.
We want your feedback. Do these Subject Areas make sense for this article? Click the target next to the incorrect Subject Area and let us know. Thanks for your help!

Synapses
Is the Subject Area "Synapses" applicable to this article?

Thanks for your feedback.
Learning
Is the Subject Area "Learning" applicable to this article?

Thanks for your feedback.
Memory
Is the Subject Area "Memory" applicable to this article?

Thanks for your feedback.
Neural networks
Is the Subject Area "Neural networks" applicable to this article?

Thanks for your feedback.
Synaptic plasticity
Is the Subject Area "Synaptic plasticity" applicable to this article?

Thanks for your feedback.
Neurons
Is the Subject Area "Neurons" applicable to this article?

Thanks for your feedback.
Coding mechanisms
Is the Subject Area "Coding mechanisms" applicable to this article?

Thanks for your feedback.
Markov models
Is the Subject Area "Markov models" applicable to this article?

Thanks for your feedback.