Deep learning is very powerful at a variety of tasks, from driving cars autonomously to playing Go beyond human level. Despite these engineering successes, why deep learning works remains unclear. This question has many facets, and I will discuss two of them. (i) Deep learning is a fitting procedure, achieved by defining a loss function that is high when the data are poorly fitted. Learning corresponds to a descent in the loss landscape. Why does this descent not get stuck in bad local minima, as occurs when cooling glassy systems in physics? What is the geometry of the loss landscape? (ii) In recent years it has been realised that deep learning works best in the over-parametrised regime, where the number of fitting parameters is much larger than the number of data points to be fitted. This runs contrary to intuition and to the usual views in statistics. I will propose a resolution of these two problems, based both on an analogy with the energy landscape of repulsive particles and on an analysis of asymptotically wide nets.