I am currently using "Introduction to Linear Algebra" by Gilbert Strang, who is a very well-known professor of the subject. The way he describes the concepts really helps in building solid intuition. At many points he also connects linear algebra with ideas from calculus, statistics, geometry, and so on. The exercises are really good too. I would suggest trying that book.
You can also try the video lectures from his class on MIT OCW.
On page 20, the book says:
"Even today’s networks, which we consider quite large from a computational
systems point of view, are smaller than the nervous system of even relatively
primitive vertebrate animals like frogs."
I found this fact quite interesting. This is a wild question, but is there any study that compares how a well-trained machine or algorithm performs against an animal with a similar number of neurons? And can we be sure that once we build a machine capable of learning, with a network whose neuron count is comparable to that of the human brain (estimated to be reached around 2050), it will be as intelligent as a human on a variety of tasks?
I wish to join. I hope it's not too late.
I'm having trouble understanding how equations 1.57 and 1.58 were obtained. What does it mean to take the expected values of the mean and variance?
As for your second question, I had some doubts too. Things became clearer after I tried exercise 1.4. The author perhaps wanted to make clear that, unlike ordinary functions, probability density functions behave differently under nonlinear transformations. As the exercise puts it, if x = g(y) is a nonlinear transformation of the variable y, then y_hat, the point where the probability density over y is maximum, need not correspond to x_hat, the point where the probability density over x is maximum, i.e. in general x_hat != g(y_hat).
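To make the point about maxima concrete, here is a rough numerical sketch. The choices are mine and not from the book: I assume y is Gaussian with mean 0 and standard deviation 1, and I pick g(y) = exp(y) as the nonlinear map, so x follows a log-normal density. The names `p_y`, `p_x`, and `g` are just for illustration.

```python
import numpy as np

# Assumed for illustration: density over y is Gaussian(mu, sigma^2).
mu, sigma = 0.0, 1.0

def p_y(y):
    """Gaussian density over y."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def g(y):
    """Assumed nonlinear transformation x = g(y)."""
    return np.exp(y)

def p_x(x):
    """Density over x by change of variables.

    With y = ln(x), p_x(x) = p_y(ln x) * |dy/dx| = p_y(ln x) / x.
    The extra 1/x factor is what moves the maximum.
    """
    return p_y(np.log(x)) / x

# Locate both maxima on fine grids.
y_grid = np.linspace(-5.0, 5.0, 100001)
x_grid = np.linspace(1e-3, 5.0, 100001)

y_hat = y_grid[np.argmax(p_y(y_grid))]
x_hat = x_grid[np.argmax(p_x(x_grid))]

print("y_hat    =", y_hat)
print("g(y_hat) =", g(y_hat))
print("x_hat    =", x_hat)
# For a log-normal density the mode is exp(mu - sigma^2), not exp(mu).
print("exp(mu - sigma^2) =", np.exp(mu - sigma ** 2))
```

If you drop the `/ x` factor in `p_x`, the two maxima line up again, which is exactly the "ordinary function" behaviour that densities do not have.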
I was thinking of implementing Gaussian distribution parameter estimation using MLE in code. This is the only thing I could find to implement in the first 28 pages. It could be a good start.
There are also other graphs, especially the ones in the curve fitting section, that can be reproduced in code, but that would require reading the "curve fitting revisited" section first. This also seems like a good idea to implement.
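For the Gaussian MLE idea, something like the following could be a starting point. It is only a sketch: the true parameters, sample size, and the function name `gaussian_mle` are my own assumptions, and the data is synthetic.

```python
import numpy as np

def gaussian_mle(x):
    """Maximum likelihood estimates of the mean and variance of a 1-D Gaussian.

    The ML variance divides by N (not N - 1).
    """
    x = np.asarray(x, dtype=float)
    mu_ml = x.mean()
    var_ml = np.mean((x - mu_ml) ** 2)
    return mu_ml, var_ml

# Synthetic data with assumed "true" parameters, for checking the estimates.
rng = np.random.default_rng(0)
true_mu, true_sigma = 2.0, 1.5
samples = rng.normal(true_mu, true_sigma, size=10_000)

mu_ml, var_ml = gaussian_mle(samples)
print("estimated mean:    ", mu_ml)
print("estimated variance:", var_ml, "(true:", true_sigma ** 2, ")")
```

The variance here matches `np.var(samples)`, which also divides by N. Passing `ddof=1` to `np.var` gives the version that divides by N - 1 instead, which is worth comparing on small samples.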
This helped me understand maximum likelihood estimation.
It is certainly feasible (and quite easy if the topic is well understood) to implement a small neural network using mathematical tools such as NumPy or MATLAB.
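As a rough idea of how small such an implementation can be, here is a sketch in NumPy. Everything specific in it is my own assumption: the XOR toy data, one hidden layer of 8 tanh units, a sigmoid output, cross-entropy loss, and plain full-batch gradient descent with a hand-picked learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data set (assumed for illustration): XOR inputs and targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer; sizes and initial scale are arbitrary choices.
n_hidden = 8
W1 = rng.normal(0.0, 1.0, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1))
b2 = np.zeros(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X):
    """Return hidden activations and output probabilities."""
    h = np.tanh(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    return h, y

def cross_entropy(y, t):
    eps = 1e-12  # avoids log(0)
    return -np.mean(t * np.log(y + eps) + (1.0 - t) * np.log(1.0 - y + eps))

lr = 0.5
losses = []
for step in range(5000):
    h, y = forward(X)
    losses.append(cross_entropy(y, t))

    # Backpropagation. For sigmoid + cross-entropy, the gradient with respect
    # to the output pre-activation is (y - t), averaged over the batch.
    d_out = (y - t) / len(X)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
    dW1 = X.T @ d_hid
    db1 = d_hid.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
print("predictions: ", forward(X)[1].ravel())
```

The whole thing is matrix products plus elementwise functions, so it should translate to MATLAB almost line for line.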
I would like to join