Naftali Tishby’s Site on Strikingly

Naftali (Tali) Tishby נפתלי תשבי
Physicist, professor of computer science and computational neuroscientist
The Ruth and Stan Flinkman professor of Brain Research

Benin School of Engineering and Computer Science
Edmond and Lilly Safra Center for Brain Sciences (ELSC)
Hebrew University of Jerusalem, 96906 Israel
I work at the interfaces between computer science, physics, and biology, which provide some of the most challenging problems in today’s science and technology. We focus on organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.
News
The 2019 IBT Award in Mathematical Neuroscience - THANKS!
Our Information Bottleneck Theory of Deep Learning has recently been noticed - at last!
See the Quanta-Magazine article on our work and my June 2017 Berlin Deep Learning Workshop talk which triggered it.
A longer talk given at Yandex, Moscow, October 10, 2017.
Recent & popular talks
ACDL, Siena, July 2019
MPI, Göttingen, July 2019
ISIT, Paris, July 2019
CBMM, Italy, June 2019
Gatsby triCenter meeting, London, June 2019
IPAM, Geometry of Big Data, UCLA, May 2019
Columbia University Economics, May 2019
IMVC, Tel-Aviv, April 2019
CERN, Geneva, ML for HEP, April 2019
BrainTech, Tel-Aviv, March 2019
ICERM, Brown University, February 2019
Deep Learning & the Brain, ELSC, Jerusalem, January 2019
Directions in Theoretical Physics, Edinburgh, January 2019
Statistical Physics and Machine Learning: A 30 Year Perspectives, APS Physics Next Workshop, October 2018
Part I of a Mini-Course on The Information Theory of Deep Learning, ICTP, Trieste, November 2018
Part II of a Mini-Course on The Information Theory of Deep Learning, ICTP, Trieste, November 2018
Part III of a Mini-Course on The Information Theory of Deep Learning, ICTP, Trieste, November 2018
"What Makes Us Human: From Genes to Machines", June 6, 2018, Jerusalem.
A Talk at the Israel Academy of Science, June 6, 2018, Jerusalem
Interview on the future of Artificial Intelligence, The AI Summit, Berlin, June 2018.
Interview at SISSA, Trieste, Italy, before my physics colloquium, May 2018.
Perimeter Institute Physics Colloquium, Waterloo, Canada, April 2018.
Stanford CSE Colloquium, April 4, 2018.
Simons Institute, Berkeley, Public talk, April 9, 2018.
Simons Institute, Berkeley, "Brain & Computation program", March 2018.
Human creativity, Music & Deep Learning, BIU, Nov. 29, 2017.
The Synergy between Information and Control
Art and Brain Week 2016: David Grossman and Prof. Naftali Tishby in dialogue on brain, thought, imagination, and creativity
My current Lab
Gal Keynan
Etam Benger
Zoe Piran
Shlomi Agmon
Ravid Shwartz-Ziv
Noga Zaslavsky
Nadav Amir
Ron Hecht (MSc 2007)
Hadar Aharoni Levi
Courses given this year
I'm teaching only during the fall semester this year.

Intro to inference, information, and learning
ELSC 76915 (fall 2017-18)
The Information Bottleneck seminar
ELSC 76929 (fall 2017-18)
Selected Research Projects
We work at the interface between computer science, physics, and biology, which provides some of the most challenging problems in today’s science and technology. We focus on organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations. An example is the Information Bottleneck method, which provides a general principle for extracting relevant structure in multivariate data, characterizes complex processes, and suggests a general approach for understanding optimal adaptive biological behavior.
A Deeper Theory of Deep Learning
Information Bottleneck theory of Deep Neural Networks
The success of artificial Neural Networks, in particular Deep Learning (DL), poses a major challenge for learning theory. Over recent years we have developed a fundamental theory of Deep Neural Networks (DNN), which is based on a complete correspondence between supervised Deep Neural Networks, trained by Stochastic Gradient Descent (SGD), and the Information Bottleneck framework. This correspondence provides a much-needed mathematical theory of Deep Learning, and a "killer application" with a large-scale implementation algorithm for the information bottleneck theory. The essence of our theory is that stochastic gradient descent training, in its popular implementation through error back-propagation, pushes the layers of any deep neural network, one by one, to the information bottleneck optimal tradeoff between sample complexity and accuracy, for large enough problems. This happens in two distinct phases. The first can be called "memorization", where the layers "memorize" the training examples with a lot of details that are irrelevant with respect to the labels. In the second phase, which starts when the training error essentially saturates, the noise in the gradients pushes the weights, for every layer, to a Gibbs (maximum entropy) distribution subject to the training error constraint. This causes the layers to "forget" irrelevant details of the inputs, which dramatically improves the generalization ability of the network.
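For reference, the tradeoff mentioned above is usually written as the Information Bottleneck Lagrangian. The symbols below (input X, label Y, layer representation T, tradeoff parameter \beta) follow the common notation of the information bottleneck literature and are an assumption here, since this page does not fix a notation:

```latex
\min_{p(t \mid x)} \; \mathcal{L}\big[p(t \mid x)\big]
  = I(X;T) \;-\; \beta\, I(T;Y),
\qquad \beta > 0,
\qquad Y \leftrightarrow X \leftrightarrow T \ \text{(Markov chain)}.
```

Here I(X;T) measures how much the representation retains about the input (complexity), and I(T;Y) measures how much it retains about the label (accuracy).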

Our theory has the following predictions, which are also the main research thrusts of this project:
The sample complexity and accuracy of the DNN are determined by the mutual information of the encoder and decoder of the last hidden layer. For large enough problems they achieve the information-theoretic optimal tradeoff, which depends only on the input-label distribution. In that sense DNNs are optimal learning machines.
The convergence time is dominated by diffusion (in a non-convex space!). The compression time is exponentially boosted by the hidden layers!
The hidden layers converge to very special points in the information plane (see figure), which depend on the phase transitions (bifurcations) of the information bottleneck theory.
How much of this theory is specific to the SGD optimization?
How much of it is relevant for biological learning and "real brains"?
Figure from the September 21 Quanta-Magazine article on our work.
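As a rough illustration of the "information plane" coordinates referred to above, the sketch below computes I(X;T) and I(T;Y) for a small discrete problem. It is only a toy under stated assumptions: the function names, the nested-list representation, and the four-input example are hypothetical and are not taken from our papers or code; real layers are continuous and their mutual information has to be estimated.

```python
from math import log2


def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution joint[a][b]."""
    row = [sum(r) for r in joint]          # marginal over the first variable
    col = [sum(c) for c in zip(*joint)]    # marginal over the second variable
    mi = 0.0
    for a, r in enumerate(joint):
        for b, p in enumerate(r):
            if p > 0.0:                    # convention: 0 * log 0 = 0
                mi += p * log2(p / (row[a] * col[b]))
    return mi


def information_plane_point(p_xy, encoder):
    """Return (I(X;T), I(T;Y)) for a joint p_xy[x][y] and an encoder[x][t] = p(t|x).

    Assumes the Markov chain Y - X - T, i.e. T depends on Y only through X.
    """
    n_x, n_y, n_t = len(p_xy), len(p_xy[0]), len(encoder[0])
    p_x = [sum(r) for r in p_xy]
    # p(x, t) = p(x) p(t|x)
    p_xt = [[p_x[x] * encoder[x][t] for t in range(n_t)] for x in range(n_x)]
    # p(t, y) = sum_x p(x, y) p(t|x)
    p_ty = [[sum(p_xy[x][y] * encoder[x][t] for x in range(n_x))
             for y in range(n_y)] for t in range(n_t)]
    return mutual_information(p_xt), mutual_information(p_ty)


# Hypothetical toy problem: four equiprobable inputs, two labels.
P_XY = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]
MEMORIZING = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
COMPRESSED = [[1, 0], [1, 0], [0, 1], [0, 1]]

if __name__ == "__main__":
    print(information_plane_point(P_XY, MEMORIZING))
    print(information_plane_point(P_XY, COMPRESSED))
```

In this toy case the memorizing encoder keeps 2 bits about the input and 1 bit about the label, while the compressed encoder keeps only 1 bit about the input and still 1 bit about the label: it has "forgotten" label-irrelevant detail without losing accuracy.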
Information constrained control and learning
Information flow governs sensing-acting and control. We develop the theory to understand how.
We study how information constraints on sensory perception, working memory, and control capacity affect optimal control and reinforcement learning in biological systems. Our basic model is a POMDP, represented by a directed graphical model consisting of world states, W, the organism's memory states, M, local observations, O, and actions, A. We consider typical models of this kind that achieve a given value (expected future rewards) by minimizing the information flow in all adaptable channels, under the value constraint. This is equivalent to the simplest organism that achieves a certain value through interactions with its environment. It is also the most robust, or fastest to evolve, organism according to the information bottleneck framework. The optimal performance of the organism is determined by the past-future information bottleneck tradeoff, or by the predictive information of the environment.
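The idea of minimizing information flow under a value constraint can be sketched in a much simpler setting than the POMDP above. The example below assumes a single decision step in which the world state W is fully observed and there is no memory; it trades the information I(W;A) against the expected utility with a multiplier beta, using Blahut-Arimoto style alternating updates. The function names and the two-state example are hypothetical illustrations, not the model used in our work.

```python
from math import exp, log2


def info_constrained_policy(p_w, utility, beta, n_iter=200):
    """Alternating minimisation of I(W;A) - beta * E[utility] (I in nats).

    p_w[w] is the world-state distribution, utility[w][a] the reward.
    Returns (policy, p_a) with policy[w][a] = p(a|w) and p_a its marginal.
    """
    n_w, n_a = len(utility), len(utility[0])
    p_a = [1.0 / n_a] * n_a                       # initial action marginal
    policy = [[1.0 / n_a] * n_a for _ in range(n_w)]
    for _ in range(n_iter):
        # Policy update: p(a|w) proportional to p(a) * exp(beta * U(w, a))
        policy = []
        for w in range(n_w):
            weights = [p_a[a] * exp(beta * utility[w][a]) for a in range(n_a)]
            z = sum(weights)
            policy.append([x / z for x in weights])
        # Marginal update: p(a) = sum_w p(w) p(a|w)
        p_a = [sum(p_w[w] * policy[w][a] for w in range(n_w))
               for a in range(n_a)]
    return policy, p_a


def information_and_value(p_w, policy, p_a, utility):
    """Return (I(W;A) in bits, expected utility) for a given policy."""
    info, value = 0.0, 0.0
    for w, pw in enumerate(p_w):
        for a, pa_w in enumerate(policy[w]):
            if pw > 0.0 and pa_w > 0.0:
                info += pw * pa_w * log2(pa_w / p_a[a])
                value += pw * pa_w * utility[w][a]
    return info, value


# Hypothetical example: two equiprobable world states, reward 1 for a == w.
P_W = [0.5, 0.5]
UTILITY = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    for beta in (0.0, 1.0, 10.0):
        policy, p_a = info_constrained_policy(P_W, UTILITY, beta)
        print(beta, information_and_value(P_W, policy, p_a, UTILITY))
```

Small beta gives a policy that ignores the world state (no information, chance-level value); large beta gives a nearly deterministic policy that uses close to one bit and achieves close to the maximal value.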

The simplest organism of this type is the Szilard information engine, with a thermal bath as the environment and extracted mechanical work as value. In this case the observation, memory, and action channels have single-bit capacities. We also study how sub-extensivity of the predictive information can explain both discounting of rewards and the emergence of hierarchical internal representations.
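For the Szilard engine mentioned above, the value that one bit can buy is bounded by k_B T ln 2 of work per cycle. The sketch below evaluates this bound and, as an added assumption not stated on this page, uses the standard generalization in which the extractable work is at most k_B T times the mutual information (in nats) between the particle position and a noisy one-bit measurement with error probability `error_prob`. The function names are hypothetical.

```python
from math import log

BOLTZMANN_K = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def binary_entropy_nats(p):
    """Entropy in nats of a binary variable with probabilities (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log(p) - (1.0 - p) * log(1.0 - p)


def max_work_per_cycle(temperature_kelvin, error_prob=0.0):
    """Upper bound (in joules) on the work extracted per engine cycle.

    Assumes a symmetric one-bit measurement of a particle that is equally
    likely to be on either side, so the measured information is
    ln 2 - H(error_prob) nats.
    """
    info_nats = log(2.0) - binary_entropy_nats(error_prob)
    return BOLTZMANN_K * temperature_kelvin * max(0.0, info_nats)


if __name__ == "__main__":
    print(max_work_per_cycle(300.0))        # error-free measurement
    print(max_work_per_cycle(300.0, 0.1))   # noisy measurement
```

At room temperature the error-free bound is a few zeptojoules per bit, and it falls to zero as the measurement becomes pure noise (error probability 0.5).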

Figure taken from Ortega et al. (2016), based on Tishby and Polani (2009).
The Information Bottleneck approach in Brain Sciences
Cognitive functions, such as perception, decision making and planning, memory, and language, are dominated by information constraints and quantified by the Information Bottleneck framework. Learn how.
We argue that perception, memory, and cognitive representations of the world (semantics) are governed by information-theoretic tradeoffs between complexity and accuracy, more than by any other metabolic or physical constraints. In a recent study we show that color names in different languages can be explained by this principle, as part of an ongoing study of the semantic structure of natural languages, which goes all the way back to our original ideas on distributional representations of words (an early version of word2vec) and the first formalization of the information bottleneck as distributional clustering.

Figure from Zaslavsky et al. (2017).
Lab Alumni: graduate students

Stas Tiomkin (PhD 2019)
Michal Moshkovich (PhD 2018)
Roy Fox (PhD 2016)
Nori Jacoby (PhD 2014, Co-advisor: Merav Ahissar)
Jonathan Rubin (PhD 2013, Co-advisor: Eli Nelken)
Sivan Sabato (PhD 2012)
Asaf Gal (PhD 2012, Co-advisor: Shimon Marom)
Yuval Tassa (PhD 2010, Co-advisor: Emo Todorov)
Ohad Shamir (PhD 2010)
Dan Rosenbaum (MSc 2010)
Uri Heinemann (MSc 2009)
Naama Parush (PhD 2009. Co-advisor: Hagai Bergman)
Yevgeny Seldin (PhD 2009, MSc. 2002)
Roi Weiss (MSc 2007)
Eyal Krupka (PhD 2008)
Meital Rabani (MSc 2007)
Hani Neuvirth (Co-advisor: Gideon Schreiber)
Amir Navot (PhD 2006)
Ran Gilad-Bachrach (PhD 2005)
Amir Globerson (PhD 2005) (Co-advisor: Eilon Vaadia)
Yaki Engel (PhD 2005) (Advisor: Ron Meir)
Shmuel Brody (MSc 2005)
Amit Rosner (Co-advisor: Udi Shapiro)
Gill Bejerano (PhD 2003. Co-advisor: Hanah Margalit)
Gal Chechik (PhD 2003. Co-advisor: Eli Nelken)
Noam Slonim (PhD 2002)
Elad Schneidman (PhD 2001. Co-advisor: Idan Segev)
Adi Schreibman (MSc 2000)
Shai Fine (MSc 1996, PhD 1999)
Itay Gat (MSc 1995, PhD 1999. Co-advisor: Moshe Abeles)
Golan Yona (PhD 1998. Co-advisors: Nati & Michal Linial)
Lidror Troyansky (PhD 1997)
Shlomo Dubnov (PhD 1996. Co-advisor: Dalia Cohen)
Dana Ron (PhD 1995)
Yoram Singer (PhD 1995)
Tzvika Svinik (MSc 1994)

Postdocs & visitors
Amichai Painsky
Pedro Ortega
Felix Creutzig
Michal Rosen-Zvi
Ran El-Yaniv
Shahar Mendelson
Jan Stiller
Ilya Nemenman
Yoav Freund
Past Courses
Introduction to Information Processing and Learning, 76915 (Noga Zaslavsky, Fall 2014).
Music and Brain, 76939 (Roni Granot, Naphtali Wagner, Israel Nelken, Naftali Tishby, Nori Jacoby. Fall 2009).
Introduction to Linear Systems, 67310 (Tal El-Hai, spring 2010).
Principled models of Perception-Action-Cycles 76911 (Spring 2009).
Machine learning seminar 67168 (2009-10).
Dynamical Systems and Control, 76929 (Fall 2009).
Intro to Information Theory 67548 (Talya Meltzer, spring 2006)
Statistical and Computational Learning Theory, 67583 (Ofer Dekel, spring 2006).
Workshop in Neural Coding (For ICNC students – with data) 76928.
The learning club
