I think I finally understand the practical difference between the frequentist and Bayesian approaches. Had to watch a few additional videos, though.
Frequentist: Finds the single model under which the observed data is most probable (maximum likelihood). Works without prior knowledge, but without enough data, the choice can be extremely off.
Bayesian: Updates the belief distribution across models based on priors and incoming data. Works without much data, but without good priors, the updated beliefs can be extremely off.
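To make the contrast concrete, here is a minimal sketch using a coin-flip model. The function names, the Beta priors, and the data (3 heads in 3 flips) are my own assumptions for illustration, not something from the lecture. The Bayesian side is summarized by the posterior mean only, although the full posterior is a whole distribution.

```python
from fractions import Fraction

def mle_heads(heads, flips):
    """Frequentist point estimate: the heads probability that maximizes
    the likelihood of the observed flips."""
    return Fraction(heads, flips)

def posterior_mean_heads(heads, flips, prior_a, prior_b):
    """Bayesian update with a Beta(prior_a, prior_b) prior.
    The posterior is Beta(prior_a + heads, prior_b + tails);
    its mean is returned as a one-number summary."""
    return Fraction(prior_a + heads, prior_a + prior_b + flips)

# Hypothetical tiny data set: 3 flips, all heads.
heads, flips = 3, 3

freq = mle_heads(heads, flips)                          # too little data: claims heads is certain
mild = posterior_mean_heads(heads, flips, 2, 2)         # mild prior centered on a fair coin
bad = posterior_mean_heads(heads, flips, 1, 50)         # strongly wrong prior dominates the data

print("MLE:", freq)
print("Posterior mean, Beta(2, 2) prior:", mild)
print("Posterior mean, Beta(1, 50) prior:", bad)
```

With so few flips the maximum-likelihood estimate jumps to an extreme, while the Bayesian estimate is only as sensible as the prior it started from.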
N-1 formal proof: https://www.youtube.com/watch?v=D1hgiAla3KI
N-1 intuition: https://www.youtube.com/watch?v=Cn0skMJ2F3c
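Assuming "N-1" here refers to dividing by N-1 instead of N in the sample variance (Bessel's correction), the effect can be checked exactly on a small example. This sketch is my own, not taken from the videos: it enumerates every sample of size 2 drawn with replacement from a six-sided die and averages both variance estimators.

```python
from fractions import Fraction
from itertools import product

def variance(xs, ddof):
    """Sum of squared deviations from the mean, divided by (len(xs) - ddof).
    ddof=0 divides by N, ddof=1 divides by N-1."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

# Hypothetical population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
true_var = variance(population, ddof=0)

# Every possible sample of size n, drawn with replacement (all equally likely).
n = 2
samples = list(product(population, repeat=n))

avg_biased = sum(variance(s, 0) for s in samples) / len(samples)
avg_unbiased = sum(variance(s, 1) for s in samples) / len(samples)

print("True variance:        ", true_var)
print("Average, divide by N:  ", avg_biased)
print("Average, divide by N-1:", avg_unbiased)
```

Averaged over all samples, dividing by N underestimates the true variance by the factor (N-1)/N, while dividing by N-1 matches it.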
As far as I know, probabilistic models are not alternatives to deep learning but complements to it. In probabilistic programming, the domain expert encodes their knowledge of the process, while deep learning also works in the absence of (formalizable) domain knowledge. Anyway, it doesn't make sense to use machine learning (e.g. DNNs) to approximate a model that we already know exactly. So I think the two approaches are meant for two different scenarios.
Mixed cases arise when domain knowledge is available but incomplete. A typical mixed case is when the expert defines the graph structure of the process but cannot provide the exact parameters, so those have to be learned.
Or am I missing something? Please feel free to correct me.
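To make the mixed case above concrete, here is a minimal sketch in which the graph structure is fixed by hand and only the parameters are estimated from data. The two-node structure (Rain -> WetGrass), the observations, and the function name are all hypothetical, chosen just for illustration.

```python
from collections import Counter

# Expert-supplied structure (assumed): Rain -> WetGrass.
# Each observation is a (rain, wet_grass) pair; the data is made up.
observations = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
    (False, False), (False, False),
]

def learn_parameters(data):
    """Estimate P(Rain) and P(WetGrass | Rain) by simple counting
    (maximum likelihood), given the fixed Rain -> WetGrass structure."""
    rain_counts = Counter(rain for rain, _ in data)
    joint_counts = Counter(data)
    p_rain = rain_counts[True] / len(data)
    p_wet_given_rain = {
        rain: joint_counts[(rain, True)] / rain_counts[rain]
        for rain in (True, False)
    }
    return p_rain, p_wet_given_rain

p_rain, p_wet_given_rain = learn_parameters(observations)
print("P(Rain) =", p_rain)
print("P(WetGrass | Rain) =", p_wet_given_rain[True])
print("P(WetGrass | no Rain) =", p_wet_given_rain[False])
```

The expert contributes which variable depends on which; the data only fills in the numbers in the conditional probability tables.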
Regarding typical neural network sizes ranging from bee to mouse (for neuron count and synapse count, respectively): I have heard somewhere, and find it quite likely, that the actual limit on network growth is no longer a scarcity of computational power. Rather, the limit is either that we cannot dream up problems that require a more complex network, or that we cannot gather data sets for such complex problems. Bear in mind that the human brain itself is not a monolithic "one-trick pony" network geared towards a single task. Or, conversely, maybe we are just unable to define and formulate the single grand task that would require the complexity of our whole brain.
So I'm somewhat skeptical about human-brain-sized networks by 2050. Maybe combining many simple problems in one network (a.k.a. multi-task learning, as already practiced in vision, where a single network effectively combines the capability of 1000 little nets to classify 1000 different objects while reusing low-level features) could be the main driver behind further network growth. Any thoughts on this?