Naftali (Tali) Tishby נפתלי תשבי
I work at the interface between computer science, physics, and biology, which provides some of the most challenging problems in today's science and technology. My group focuses on the organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods from statistical physics, information theory, and computational learning theory to analyze biological data and to build biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.
Our Information Bottleneck Theory of Deep Learning has recently been noticed - at last!
Recent & popular talks
ACDL, Siena, July 2019
MPI, Göttingen, July 2019
ISIT, Paris, July 2019
CBMM, Italy, June 2019
Gatsby triCenter meeting, London, June 2019
IPAM, Geometry of Big Data, UCLA, May 2019
Columbia University Economics, May 2019
IMVC, Tel-Aviv, April 2019
CERN, Geneva, ML for HEP, April 2019
BrainTech, Tel-Aviv, March 2019
ICERM, Brown University, February 2019
Deep Learning & the Brain, ELSC, Jerusalem, January 2019
Directions in Theoretical Physics, Edinburgh, January 2019
Courses given this year
I'm teaching only during the fall semester this year.
Selected Research Projects
A central example of the organizing principles we pursue is the Information Bottleneck method, which provides a general principle for extracting relevant structure from multivariate data, characterizes complex processes, and suggests a general approach to understanding optimal adaptive biological behavior.
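As a rough sketch of how the Information Bottleneck principle is computed in practice, the self-consistent equations of Tishby, Pereira, and Bialek (1999) can be iterated on a small joint distribution. The data below are a hypothetical toy example, not from any of our studies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution p(x, y): 8 input values, 2 labels (hypothetical data).
p_xy = rng.random((8, 2))
p_xy /= p_xy.sum()
p_x = p_xy.sum(axis=1)        # p(x)
p_y_x = p_xy / p_x[:, None]   # p(y|x)

def ib(p_x, p_y_x, n_t=3, beta=5.0, iters=200):
    """Iterate the IB self-consistent equations for a soft encoder p(t|x)."""
    n_x = len(p_x)
    p_t_x = rng.random((n_x, n_t))
    p_t_x /= p_t_x.sum(axis=1, keepdims=True)
    for _ in range(iters):
        p_t = p_t_x.T @ p_x                                       # p(t)
        p_y_t = (p_t_x * p_x[:, None]).T @ p_y_x / p_t[:, None]   # p(y|t)
        # D_KL[p(y|x) || p(y|t)] for every (x, t) pair
        d = (p_y_x[:, None, :] *
             np.log(p_y_x[:, None, :] / p_y_t[None, :, :])).sum(axis=2)
        # Update rule: p(t|x) proportional to p(t) * exp(-beta * D_KL)
        p_t_x = p_t[None, :] * np.exp(-beta * d)
        p_t_x /= p_t_x.sum(axis=1, keepdims=True)
    return p_t_x

p_t_x = ib(p_x, p_y_x)
print(p_t_x.round(3))  # each row is a soft assignment of x to clusters t
```

Larger values of the tradeoff parameter beta favor accuracy (preserving information about the label) over compression.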
A Deeper Theory of Deep Learning
Information Bottleneck theory of Deep Neural Networks
The success of artificial neural networks, in particular Deep Learning (DL), poses a major challenge for learning theory. Over recent years we have developed a fundamental theory of Deep Neural Networks (DNNs), based on a complete correspondence between supervised deep neural networks trained by Stochastic Gradient Descent (SGD) and the Information Bottleneck framework. This correspondence provides a much-needed mathematical theory of Deep Learning, and a "killer application", with a large-scale implementation algorithm, for the Information Bottleneck theory. The essence of our theory is that stochastic gradient descent training, in its popular implementation through error back-propagation, pushes the layers of any deep neural network, one by one, to the Information Bottleneck optimal tradeoff between sample complexity and accuracy, for large enough problems. This happens in two distinct phases. The first can be called "memorization": the layers "memorize" the training examples, including many details that are irrelevant to the labels. In the second phase, which starts when the training error essentially saturates, the noise in the gradients pushes the weights of every layer toward a Gibbs (maximum entropy) distribution subject to the training-error constraint. This causes the layers to "forget" irrelevant details of the inputs, which dramatically improves the generalization ability of the network.
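The two phases above are visible in the "information plane", which tracks each layer's mutual information with the input, I(X;T), and with the label, I(T;Y), over training. A minimal sketch of such an estimate, using discretization (binning) of activations on synthetic stand-in data (the variables here are hypothetical, not our experimental setup):

```python
import numpy as np

def discrete_mi(a, b):
    """Empirical mutual information (in bits) between two discrete sample arrays."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1)   # count co-occurrences
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

rng = np.random.default_rng(1)
x = rng.integers(0, 12, size=5000)                     # toy discrete inputs
y = x % 2                                              # labels, a function of x
t = np.tanh(0.5 * x + 0.1 * rng.normal(size=x.size))   # a noisy "layer activity"
t_binned = np.digitize(t, np.linspace(-1.0, 1.0, 30))  # discretize activations

mi_tx = discrete_mi(t_binned, x)   # estimate of I(X; T)
mi_ty = discrete_mi(t_binned, y)   # estimate of I(T; Y)
print(mi_tx, mi_ty)
```

Because the label is a function of the input (the Markov chain Y - X - T), the data-processing inequality guarantees I(T;Y) never exceeds I(X;T); repeating such estimates across layers and training epochs traces the memorization and compression phases.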
Our theory makes the following predictions, which are also the main research thrusts of this project:
Information constrained control and learning
Information flow governs sensing, acting, and control. We develop the theory to understand how.
We study how information constraints on sensory perception, working memory, and control capacity affect optimal control and reinforcement learning in biological systems. Our basic model is a POMDP, represented by a directed graphical model consisting of world states W, the organism's memory states M, local observations O, and actions A. We consider typical such models that achieve a given value (expected future reward) while minimizing the information flow in all adaptable channels, under the value constraint. This is equivalent to the simplest organism that achieves a certain value through interactions with its environment. It is also the most robust, or fastest to evolve, organism, according to the Information Bottleneck framework. The optimal performance of the organism is determined by the past-future Information Bottleneck tradeoff, or by the predictive information of the environment.
The simplest organism of this type is the Szilard information engine, with a thermal bath as the environment and extracted mechanical work as the value. In this case the observation, memory, and action channels each have single-bit capacity. We also study how sub-extensivity of the predictive information can explain both the discounting of rewards and the emergence of hierarchical internal representations.
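As a toy illustration of predictive information, the one-step predictive information I(X_t; X_{t+1}) of a stationary Markov environment can be computed directly from its transition matrix. The two-state chain below is a hypothetical example:

```python
import numpy as np

# Hypothetical two-state environment: P[i, j] = p(next state j | current state i).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution: the left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# One-step predictive information I(X_t; X_{t+1}) in bits.
joint = pi[:, None] * P        # p(x_t, x_{t+1})
marg = joint.sum(axis=0)       # p(x_{t+1}) (equals pi at stationarity)
mi = (joint * np.log2(joint / (pi[:, None] * marg[None, :]))).sum()
print(round(float(mi), 4))
```

For longer windows, the growth of I(past; future) with window length (extensive, logarithmic, or bounded) distinguishes qualitatively different environments; the sub-extensive case is the one relevant to the discussion above.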
Figure from Ortega et al. (2016), based on Tishby and Polani (2009).
The Information Bottleneck approach in Brain Sciences
Cognitive functions such as perception, decision making and planning, memory, and language are dominated by information constraints and can be quantified within the Information Bottleneck framework. Learn how.
We argue that perception, memory, and cognitive representations of the world (semantics) are governed by information-theoretic tradeoffs between complexity and accuracy, more than by any metabolic or physical constraints. In a recent study we showed that color names in different languages can be explained by this principle, as part of an ongoing study of the semantic structure of natural languages, which traces back to our original ideas on distributional representations of words (an early version of word2vec) and the first formalization of the Information Bottleneck as distributional clustering.
Figure from Zaslavsky et al. (2017).
Lab Alumni: graduate students