neuroplausible — I am a cognitive computational neuroscientist and this blog is about my research.
http://neuroplausible.com
<h1>I hate Matlab: How an IDE, a language, and a mentality harm</h1>
<p>This blog post is inspired by a few Matlab-related tweets of mine, which turned into days-long discussions with fellow science and non-science tweeps.
Those tweets of mine in turn are motivated by two main things: my desire for programming in psychology, neuroscience, and science in general to be taught and taught well, and my desire for students to learn transferable skills more generally.
This blog post is premised on a number of themes which came up on Twitter.
The great need for scientists to be able to code.
The fact that Matlab is akin to bad training wheels on a bicycle, which never aid learning to ride, but are used over and over again because they are better than walking.
And the idea that while there is a best tool for every job, not every tool is best for any job.
The discussion on Twitter was motivating and so I promised everybody I would write up what I think.
So this blog post is about how, in my experience, teaching Matlab (the whole ecosystem, not just the language) within psychology often harms students more than it helps them.</p>
<p>To clarify, Matlab used to be the best tool for many things.
Before things like the <a href="http://www.numpy.org/">NumPy</a>/<a href="http://matplotlib.org/">Matplotlib</a>/<a href="http://jupyter.org/">Jupyter</a> trilogy, it was probably the only tool that had “everything”.
When Matlab first came out, the alternative was <a href="https://en.wikipedia.org/wiki/Fortran">Fortran</a> (which has <a href="http://stackoverflow.com/questions/3517726/what-is-wrong-with-using-goto">goto statements</a>, if you don’t know why this is scary, never mind, you’re lucky).
But I believe it is now more a cause of brain-rot than mind-expanding awesomeness (please do not watch <em>Arrival</em> just to get this <a href="https://en.wikipedia.org/wiki/Linguistic_relativity">Sapir-Whorf</a> reference).
It is now more user- and science-jail than a freeing experience that allows us to make prototypes fast (it of course still does the latter).</p>
<div class="float-right figure">
<img class="image" src="/img/posts/matlab.jpg" />
<div class="figure-caption">
The Matlab logo is a visually appealing render of an <a href="https://uk.mathworks.com/company/newsletters/articles/the-mathworks-logo-is-an-eigenfunction-of-the-wave-equation.html">eigenfunction of the wave equation</a>.
</div>
</div>
<p>If you are a proficient coder and love Matlab, then this blog post is <em>not</em> really for you.
Importantly, my intended audience are those who wish to see an improvement in the teaching of programming within psychology.
I am talking from the perspective of my experiences within my field: psychology and cognitive science.
I have designed from scratch: <a href="https://github.com/oliviaguest/connectionism">a course</a>, that I taught when I was working as a postdoc at Oxford; and <a href="https://sites.google.com/site/introcompcog/">a workshop</a>, while I was a PhD student; both with the aim of teaching the principles of coding before diving into Python specifically for psychology students.
I also want people in science to have dependable transferable skills, to be able to <a href="https://erikbern.com/2017/03/15/the-eigenvector-of-why-we-moved-from-language-x-to-language-y.html">move to other languages</a>, and to have as much fun as possible while learning.
Because of <a href="https://www.ucl.ac.uk/pals/research/experimental-psychology/blog/women-experimental-psychology-olivia-guest/">my training</a>, I am privileged enough to be able to pick up a new language in a couple of hours.
I want others to have such skill-related opportunities too, not only because it is useful for science as an endeavour to have skilled researchers, but for us as individuals: if one emerges from their degree a coder one will have more opportunities (both within and outside science).</p>
<p>To reiterate my titular claim: the way we teach Matlab in psychology appears to be more harmful than helpful.
I would like us to move beyond Matlab because the ecosystem it provides is a dangerous attractor, which many of my peers and my students involuntarily get sucked into.
In this post I will outline the main reasons why the Matlab ecosystem and language are as provocatively described above.
I intend to use “Matlab” to mean the whole ecosystem: the IDE, the language, and the mentality it brings about because I think they are inseparable.
In the same way “<a href="http://journal.stuffwithstuff.com/2013/07/18/javascript-isnt-scheme/">C programmers [allocate] their own damn memory, probably right after building their own computer out of rocks and twigs</a>”, Matlab coders within psychology also have and create a culture around them aided by the IDE and the pre-existing community they have joined.</p>
<h2 id="limited-skill-transfer"><a name="limited-skill-transfer"></a>Limited Skill Transfer</h2>
<p>Firstly, Matlab is not sufficient to provide us with a transferable programming skillset.
Matlab provides a programming environment in which nothing, at least superficially, seems hard — and thus nothing meaningful about coding itself is learned.
We do not need to worry about namespaces, nor even functions too much.
And we do not need to learn anything too complex to get some OK-looking figures.
This is great for prototyping — we can produce something that works well enough impressively quickly.
But this comes at a huge cost to us as a newbie coder.
We have not learned any of the important skills that would enable us to pick up another language.
And we will undeniably need to pick up other languages because that is the state psychology is in — e.g., R is becoming the standard for statistical analyses.
Yet we just learned a language that does not help us do that since it did not push us to learn the basics of what other languages have at their core.</p>
<div class="float-right figure">
<img class="image" src="/img/posts/Emacs-screenshot.png" />
<div class="figure-caption">
<a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDEs</a> are extremely useful if you are a proficient coder already. However, they can act more like bad training wheels on a bicycle, hindering deeper learning.
</div>
</div>
<p>To put this another way, when one is learning to drive they do not tend to learn to drive using an automatic gearbox.
They learn to drive with a manual gearbox and it is tough.
Learning the harder of the two types, manual, allows us to then easily transfer to the easier of the two if need be.
In the case of USAmericans, <a href="https://www.quora.com/Why-do-Americans-mostly-drive-automatic-transmission-vehicles">they mostly learn to drive an automatic gearbox</a> and almost never learn manual (because their skills do not transfer easily).
Although the metaphor is simplistic, it suffices to explain why Matlab is not the best language to learn: it is a car with an automatic gearbox.
We cannot easily transfer what we have learned to driving stick. In fact, automatic-only licences exist in my home country and in the UK: if you learn just automatic you cannot be expected to know stick, whereas if you learn manual transmission you know “everything”.</p>
<p>Furthermore, I posit that Matlab knowledge can make it harder than absolutely no programming knowledge for us to shift to another language.
Matlab has an <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a> that provides <a href="https://en.wikipedia.org/wiki/Graphical_user_interface">GUI</a> functionality that allows us to edit variables dynamically like in <a href="http://www.sciencemag.org/news/sifter/one-five-genetics-papers-contains-errors-thanks-microsoft-excel">Excel, which we know causes demonstrable problems</a>.
It causes some of our students to think that the Matlab IDE is what programming is, in much the same way some of our students think <a href="https://en.wikipedia.org/wiki/SPSS">SPSS</a> is what statistics is.
Furthermore, high dependence on manually editing things is extremely bad because our workflow will not be <a href="http://oliviaguest.com/doc/guest_rougier_16.pdf">reproducible nor replicable</a>.</p>
<p>In addition, all the bells and whistles of the IDE and the GUI never force us to think about variables deeply (since we can always visualise them).
This exercise in keeping a mental model of what the code is doing, writing down what the code should be doing, imagining the data structures, etc., is a skill one needs to be developing.
More than once I have been asked to help people who were editing their variables in the GUI and hence did not properly understand their own code nor how to debug it.
This is not their fault, but had they learned to code without this they would never have picked up such terrible habits.
They had not learned exactly what a loop was, and since a lot of their other helper scripts worked just fine, they had no feedback that editing in the GUI is maladaptive per se.</p>
<p>In most other languages: there is no GUI and there is no IDE that has the language baked in.
This results in many of us using Matlab by just pressing buttons and hoping something useful will come out the other end.
And this observation, shocking though it may seem, has been backed up by so many of you over chat and Twitter: this is indeed what we and our students do.
The GUI and IDE crutches will be snatched away from us and we will have to learn to code all over again, something we would never have needed to do had we learned with a manual gearbox, i.e., not with Matlab.</p>
<p>Matlab puts a ceiling on what kinds of projects we can do both in size and in scope.
Optimising for hardware, needing to lower <a href="https://en.wikipedia.org/wiki/DSPACE">space</a> and <a href="https://en.wikipedia.org/wiki/Time_complexity">time complexity</a>, wanting something very specific like web-scraping, etc., are all tougher within Matlab.
This is because Matlab is more a <a href="https://en.wikipedia.org/wiki/Domain-specific_language">domain-specific</a> than a domain-general language; it is centrally controlled; and the GUI and IDE cannot cope with large projects easily (although there is <a href="http://blogs.mathworks.com/community/2010/02/22/launching-matlab-without-the-desktop/">a command line mode</a>, which most of us will be uncomfortable with given we only know Matlab).</p>
<p>To further underline my point, Matlab explicitly teaches us some very unorthodox programming principles.
Some “features” do not exist in (m)any other languages, and certainly not in any we will likely want to learn in the near future (<a href="https://www.python.org/">Python</a>, <a href="https://en.wikipedia.org/wiki/Compatibility_of_C_and_C%2B%2B">C/C++</a>, <a href="https://www.r-project.org/">R</a>, <a href="https://julialang.org/">Julia</a> — even <a href="https://www.latex-project.org/">LaTeX</a>).
For example, we are not allowed to have more than <a href="https://uk.mathworks.com/help/matlab/matlab_prog/create-functions-in-files.html">a single externally accessible function per file</a>, and that file must have the same filename as the function we wish to access.
In essence this means we cannot have more than one externally accessible function per file if we are, e.g., trying to code up a library in a clear way.
Matlab does not permit us to store all our global variables in one file, e.g., if we need constant values.
Due to all this, Matlab promotes <a href="https://en.wikipedia.org/wiki/Spaghetti_code">spaghetti code</a>.
This adds to why many of us feel embarrassed to share our code online.
We never learned to write neat code because Matlab allows us to be quick and dirty without any repercussions.</p>
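<p>For contrast, consider what most other languages allow. Below is a minimal, hypothetical Python module (the file name, constants, and functions are all invented for illustration): one file exposing several functions plus shared constants, which is precisely what Matlab's one-public-function-per-file rule forbids.</p>

```python
# stats_helpers.py: a hypothetical module exposing several functions
# and shared constants; in Matlab each function would need its own file.
ALPHA = 0.05           # module-level constants, importable from anywhere
N_PERMUTATIONS = 1000

def mean(values):
    """Arithmetic mean of a sequence."""
    return sum(values) / len(values)

def standardise(values):
    """Centre values on their mean."""
    m = mean(values)
    return [v - m for v in values]

print(standardise([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

<p>Any script can then do <code class="highlighter-rouge">from stats_helpers import mean, ALPHA</code>, so related helpers and constants live together in one clearly named place.</p>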
<p>Perhaps most flagrantly, <a href="https://nickhigham.wordpress.com/2017/03/15/tracing-the-early-history-of-matlab-through-siam-news/">arrays</a> <a href="https://www.mathworks.com/company/newsletters/articles/the-origins-of-matlab.html">in Matlab</a> <a href="https://www.mathworks.com/company/newsletters/articles/the-growth-of-matlab-and-the-mathworks-over-two-decades.html">start</a> <a href="http://stackoverflow.com/questions/22546787/why-does-matlab-have-1-based-indexing">at 1</a>.
One has no idea how maladaptive this is until one moves outside Matlab.
Computer science <a href="https://www.johndcook.com/blog/2008/06/26/why-computer-scientists-count-from-zero/">starts from zero for a reason</a>.
If we want to learn generalisable skills, learning that indexing starts at 1 will hinder us, perhaps even cause us to introduce very nasty hard-to-find bugs when we move outside the Matlab ecosystem.
All these put together cause us to get more confused by new languages as the baggage we carry with us from learning Matlab needs to be actively unlearned and inhibited.</p>
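<p>A few lines of Python illustrate the trap (the list and variable names are invented): zero-based languages index the first element with 0, so a 1-based habit produces a silent off-by-one error rather than a loud one.</p>

```python
scores = [7, 3, 9]

first = scores[0]               # zero-based: index 0 is the first element
last = scores[len(scores) - 1]  # the last element is at len - 1, not len

# A habit carried over from 1-based Matlab: asking for index 1 when you
# mean "the first element" silently returns the second element instead.
not_the_first = scores[1]
print(first, last, not_the_first)  # 7 9 3
```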
<h2 id="closed-source-means-closed-science"><a name="closed-source-means-closed-science"></a>Closed Source Means Closed Science</h2>
<p>Secondly, Matlab is closed source, proprietary, and prohibitively expensive if you have to buy it yourself.
They obfuscate their source code in many cases, meaning bugs are much <a href="https://uk.mathworks.com/matlabcentral/answers/79714-how-do-we-know-that-matlabs-algorithms-are-working-properly">harder to spot</a> and impossible to <a href="http://stackoverflow.com/questions/2470765/can-i-distribute-my-matlab-program-as-open-source">edit ourselves without risking court action</a>.
Moreover, using Matlab for science results in <a href="https://github.com/openjournals/joss/issues/142">paywalling our code</a>.
We are by definition making our computational science closed.</p>
<div class="float-right figure">
<img class="image" src="/img/posts/Open_Science_-_Prinzipien.png" />
<div class="figure-caption">
The principles of open science, by <a href="https://commons.wikimedia.org/wiki/User:Gegensystem">Andreas E. Neuhold</a>.
</div>
</div>
<p>Many people in the mutually inclusive <a href="https://en.wikipedia.org/wiki/Open_science">open science</a> and <a href="https://en.wikipedia.org/wiki/Free_software_movement">open software</a> movements hope to see <a href="https://www.software.ac.uk/blog/2016-09-12-quick-and-dirty-analysis-software-being-used-research-python-matlab-and-r">Matlab surpassed</a> sooner rather than later and some even think it is inevitable.
By extension, people in these movements tend to think freely deciding to use Matlab (and indeed any closed source software) in science is <a href="http://academia.stackexchange.com/questions/80790/is-it-ethical-to-use-proprietary-closed-source-software-for-scientific-computa">at least questionable and at most unethical</a>.
I believe in free and open software and science, so I am in principle opposed to Matlab’s grip on science.
This does not mean I believe the science done with Matlab is in any way worse in and of itself.
By the same token, scientists who believe in open access do <em>not</em> think that science published in closed access journals is “bad science” — they think it is not the best publishing practice.
Sadly, one can either be for open science or against it.
So unless Matlab’s “<a href="https://www.mathworks.com/company/aboutus/soc_mission.html">core values and conviction to “Do the Right Thing”</a>” start to also include open source and science, Matlab is incompatible with our aims.</p>
<p>Something that pains me immensely, and indirectly affects all Matlab coders is the incompatibility between Matlab versions.
The main reason for this is, unlike <a href="https://docs.python.org/3/reference/grammar.html">Python</a> or <a href="http://www.nongnu.org/hcb/">C++</a> or pretty much all languages out there, there is no <a href="https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form">Backus-Naur form</a> for Matlab to my knowledge.
This means that Matlab has no official and formally-specified grammar, <a href="https://www.quora.com/Do-programming-languages-have-grammar">it could</a>, but it does not.
This is incredibly bad if true, and explains the compatibility problems, making Matlab more like Microsoft Word (which is not backwards compatible and not a programming language).
It also means Mathworks does not have to stick to any rules for the grammar of Matlab, they can change it on the fly.
And by the same token, <a href="https://en.wikipedia.org/wiki/GNU_Octave">Octave</a> compatibility is hard to maintain because the language is not defined.</p>
<p>Importantly, over and above the fact it is not <a href="https://opensource.org/docs/osd">open source</a>, I propose Matlab (and thus similar languages like Octave and <a href="http://www.scilab.org/">SciLab</a> which <em>are</em> open) should not be our go-to languages for the reasons outlined herein.
To re-iterate, Matlab is not the best language to teach our students and peers for pedagogical, skill transfer, and practical reasons <em>over and above</em> the ethical/openness reasons.
These reasons in and of themselves serve to discredit Matlab and demote it from its place as the primary programming language for teaching in psychology.</p>
<h2 id="conclusion"><a name="conclusion"></a>Conclusion</h2>
<p>In a nutshell, Matlab creates an environment where we learn how to code without ever doing anything too difficult, without ever developing skills that really transfer, and without ever understanding the core of what coding is about.
I want us to be better equipping ourselves and our students for both life in science and giving them useful skills for life outside science.
The default position in my part of science is that you teach Matlab and then that is it.
I <em>do not</em> levy these criticisms against those of us who use and teach a multitude of languages (including Matlab).
I am focussing on the majority of us who teach and use for all intents and purposes <em>only</em> Matlab.</p>
<p>GUIs and IDEs are great — just like once we already know how to drive using a manual transmission we can easily switch to automatic — but they predominantly do not push us to develop our skills further.
If we want to we can switch to a fancy IDE after we already know the tougher stuff.
We learn multiplication tables off by heart <em>before</em> we switch to using our smartphone as a calculator.
I am assuming we all want to develop our technical skills appropriately, so inevitably we will need to carry out much more complex tasks, like writing a bash script or compiling something from source — all these things are skills we need to be building up slowly over time.
Matlab allows us to live in a lovely world where everything is easy but from which we cannot escape.
Research will throw harder programming tasks at us than quickly making graphs or fast matrix multiplication.
Thus we need to accept that sometimes learning new things can be hard (as well as fun).</p>
<p>Some will <a href="http://lorenabarba.com/blog/why-i-push-for-python/">push for their own favourite language, e.g., Python</a>.
Nonetheless, as long as we move away from the paradigms the Matlab ecosystem enforces, we will have made serious gains pedagogically.
I hope I have convinced my intended audience that even though Matlab has been the go-to language things should be and are rightfully changing.
For example, even within engineering, where Matlab has a historically strong hold, its widespread use is being eroded — <a href="http://to.eng.cam.ac.uk/teaching/committee/SSJC_mins/ssjc_computing.pdf">the engineering department at Cambridge decided to teach Python instead</a>.</p>
<p>Programming education in psychology can be better.
Other languages provide more replicable and reproducible workflows, more opportunity to learn transferable skills, and communities centred around open source and open science.
If we can teach the Matlab ecosystem, then we can make a small step for great gains and teach a better, more open ecosystem.
We <a href="https://www.wired.com/2017/03/biologists-teaching-code-survive/">must teach the core concepts of programming</a> and we must teach them well.
We are in the midst of transition from closed source to open source, closed science to open science, black box workflows to reproducible and replicable workflows.
Let’s make this transition happen by equipping our students and ourselves with the most appropriate skills.</p>
<h2 id="thanks">Thanks</h2>
<p>This blog post would not have been possible without discussions with my <a href="http://software.ac.uk">Software Sustainability Institute</a> co-fellows and the institute’s staff, nor without the <a href="https://twitter.com/o_guest/status/841671820575162368">many</a> <a href="https://twitter.com/o_guest/status/842794088315404288">many</a> tweets from you all.</p>
Fri, 17 Mar 2017 00:00:00 +0000
<h1>Artificial Neural Networks with Random Weights are Baseline Models</h1>
<p>Where do the impressive performance gains of deep neural networks come from?
Is their power due to the learning rules which adjust the connection weights or is it simply a function of the network architecture (i.e., many layers)?
These two properties of networks are hard to disentangle.
One way to tease apart the contributions of network architecture versus those of the learning regimen is to consider networks with randomised weights.
To the extent that random networks show interesting behaviours, we can infer that the learning rule has not played a role in them.
At the same time, examining these random networks allows us to evaluate what learning does add to the network’s abilities over and above minimising some loss function.</p>
<div class="float-right figure">
<object class="image" data="/img/posts/ann_models_correlation.svg" type="image/svg+xml">
<img src="/img/posts/ann_models_correlation.png" />
</object>
<div class="figure-caption">
<a href="https://elifesciences.org/content/6/e21397#fig2">Figure 2A</a> from Guest and Love (2017): "For the artificial neural network coding schemes, similarity to the prototype falls off with increasing distortion (i.e., noise). The models, numbered 1–11, are (<i>1</i>) vector space coding, (<i>2</i>) gain control coding, (<i>3</i>) matrix multiplication coding, (<i>4</i>) perceptron coding, (<i>5</i>) 2-layer network, (<i>6</i>) 3-layer network, (<i>7</i>) 4-layer network, (<i>8</i>) 5-layer network, (<i>9</i>) 6-layer network, (<i>10</i>) 7-layer network, and (<i>11</i>) 8-layer network. The darker a model is, the simpler the model is and the more the model preserves similarity structure under fMRI."
</div>
</div>
<p>In <em><a href="http://dx.doi.org/10.7554/eLife.21397">What the Success of Brain Imaging Implies about the Neural Code</a></em>, we examined an artificial deep neural network, Inception-v3 GoogLeNet.
This deep trained network preserves the similarity of the input space and thus is <a href="https://elifesciences.org/content/6/e21397#s2">functionally smooth</a>.
Importantly, however, we found that functional smoothness in this deep network breaks down at later layers.
Is this because of the depth of the network, the many layers, or the specific learning regimen?
We sought to explain why this happens by using a baseline, a model with random weights.</p>
<p>To answer this question, let us consider some much simpler plausible contenders for the neural code — a rudimentary set of models — the components of artificial neural networks: <a href="https://en.wikipedia.org/wiki/Matrix_multiplication">matrix multiplication</a> and some kind of squashing (<a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid</a>, <a href="https://en.wikipedia.org/wiki/Step_function">step</a>, <a href="https://en.wikipedia.org/wiki/Activation_function">etc</a>.) function (in our case, the <a href="https://en.wikipedia.org/wiki/Hyperbolic_function#Hyperbolic_tangent">hyperbolic tangent</a>).</p>
<p>The first basic model, matrix multiplication, is how neural networks propagate activation from layer <script type="math/tex">\mathbf{m}</script> to the next <script type="math/tex">\mathbf{n}</script> via the weights <script type="math/tex">\mathbf{w}</script>.
For simplicity, our toy network contains layers <script type="math/tex">\mathbf{m}</script> and <script type="math/tex">\mathbf{n}</script>, which both contain three units.
Thus to calculate the states for <script type="math/tex">\mathbf{n}</script>, we take the matrix product of the previous layer <script type="math/tex">\mathbf{m}</script> and the weights <script type="math/tex">\mathbf{w}</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
\mathbf{m} \times \mathbf{w}
=
\\
\begin{pmatrix}
x_1 & x_2 & x_3 \\
\end{pmatrix}
\times
\begin{pmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
w_{31} & w_{32} & w_{33}
\end{pmatrix}
=
\\
\begin{pmatrix}
x_1 w_{11} + x_2 w_{21} + x_3 w_{31} \\
x_1 w_{12} + x_2 w_{22} + x_3 w_{32} \\
x_1 w_{13} + x_2 w_{23} + x_3 w_{33}
\end{pmatrix}\
=
\begin{pmatrix}
y_1 & y_2 & y_3
\end{pmatrix}
=
\mathbf{n}
\, %]]></script>
<p>where <script type="math/tex">x</script>s represent the units in layer <script type="math/tex">\mathbf{m}</script>, <script type="math/tex">w_{ij}</script> represents a weight in <script type="math/tex">\mathbf{w}</script> from unit <script type="math/tex">i</script> in layer <script type="math/tex">\mathbf{m}</script> to unit <script type="math/tex">j</script> in <script type="math/tex">\mathbf{n}</script>, and <script type="math/tex">y_j</script> is a unit in <script type="math/tex">\mathbf{n}</script>. For example, <script type="math/tex">w_{31}</script> is the weight on the connection between the third unit of the shallower/earlier layer and the first unit of the deeper/later layer (others use other notations).</p>
<p>Matrix multiplication calculates the states of a layer — easily done in Python using <a href="http://www.numpy.org/">NumPy</a>, specifically <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html"><code class="highlighter-rouge">numpy.dot()</code></a>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">1.3</span><span class="p">])</span> <span class="c"># layer m with some dummy input</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c"># random weights from m to n</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="c"># pre-synaptic states in n</span>
<span class="k">print</span><span class="p">(</span><span class="n">n</span><span class="p">)</span></code></pre></figure>
<p>To apply a squashing function, <script type="math/tex">\tanh</script>, to <code class="highlighter-rouge">n</code> above, we may use <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.tanh.html"><code class="highlighter-rouge">numpy.tanh()</code></a>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">n</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="c"># post-synaptic states in n</span>
<span class="k">print</span><span class="p">(</span><span class="n">n</span><span class="p">)</span></code></pre></figure>
<p>Non-linear transformations like hyperbolic tangent allow the network to have non-linear decision boundaries, e.g., between classes, making it capable of capturing the statistics of the training set (more <a href="http://www.kdnuggets.com/2016/08/role-activation-function-neural-network.html">here</a> and <a href="https://www.quora.com/Why-do-neural-networks-need-an-activation-function/answer/Chomba-Bupe">here</a>).</p>
<p>In <a href="http://dx.doi.org/10.7554/eLife.21397">Guest and Love (2017)</a> we presented the above as two separate models as well as a combined model, here I have cut to the part where they are combined to form a traditional two-layer network (also known as the <a href="https://en.wikipedia.org/wiki/Perceptron">perceptron</a> model).
As you might have guessed, from two layers we can generalise to many, by continuing to take the matrix product of the output (<code class="highlighter-rouge">n</code> in the code above) with some new weights, and so on.</p>
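<p>That generalisation can be sketched in a few lines of NumPy (a toy illustration; the depth and the function name below are my own invention, not the setup from the paper): repeatedly multiply by fresh random weights and squash with <code class="highlighter-rouge">tanh</code>.</p>

```python
import numpy as np

np.random.seed(0)  # make the random weights reproducible

def deep_random_network(x, n_layers=8):
    """Propagate input x through n_layers of random weights plus tanh."""
    activation = np.asarray(x, dtype=float)
    for _ in range(n_layers):
        # fresh untrained weights for each layer
        w = np.random.randn(activation.size, activation.size)
        activation = np.tanh(np.dot(activation, w))  # matrix product, then squash
    return activation

print(deep_random_network([0.1, 0.2, 1.3]))
```

<p>No learning rule is involved anywhere; whatever structure the output states carry comes from the architecture alone.</p>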
<p>Running an untrained neural network with random weights allows us to compare more complex (i.e., trained models) with their untrained selves.
We can thus pick apart what aspects of the model are inherent to the architecture itself and which emerge as a function of training.
Networks that have random weights can be given the same training and test sets, although importantly no training has happened yet, and we can examine their internal states and outputs.
This can serve as a guide to understand what the network “knows” a priori.</p>
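<p>As a toy illustration of such an examination (the dimensionality, noise levels, and weight scale below are arbitrary choices of mine, not those of the paper), we can probe a single random layer with increasingly distorted versions of a prototype and inspect the distances in its internal space:</p>

```python
import numpy as np

np.random.seed(2)
dim = 100
prototype = np.random.randn(dim)
# items at increasing levels of distortion from the prototype
items = [prototype + sd * np.random.randn(dim) for sd in (0.1, 0.5, 2.0)]

w = np.random.randn(dim, dim) * 0.1  # one layer of untrained weights

def hidden(x):
    """Internal state of the random layer; no learning has happened."""
    return np.tanh(np.dot(x, w))

h_prototype = hidden(prototype)
distances = [np.linalg.norm(hidden(item) - h_prototype) for item in items]
print(distances)
```

<p>Even with random weights, mildly distorted items should stay closer to the prototype's internal state than heavily distorted ones, which is the sense in which an untrained network already preserves similarity structure.</p>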
<p>As we noted in <a href="http://dx.doi.org/10.7554/eLife.21397">Guest and Love (2017)</a>, networks naturally place items close together in their internal representational space that are similar/proximal in the input space. Hence why artificial neural networks are a plausible candidate for the neural code, i.e., they give rise to <a href="https://elifesciences.org/content/6/e21397#s2">functionally smooth</a> representations.
The simple network above can be made deeper and deeper, and we can inspect every layer in it for smoothness for every pattern.
Extending the above, we can do just that, and run the network on two very simple categories:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><table style="border-spacing: 0"><tbody><tr><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="n">prototypes</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span> <span class="c"># two toy categories</span>
<span class="n">members</span> <span class="o">=</span> <span class="mi">10</span> <span class="c"># how many items per category</span>
<span class="n">patterns</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">prototypes</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">m</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">members</span><span class="p">)):</span>
<span class="c"># for each item, create a pattern that has noise as a function of the</span>
<span class="c"># number of items. First item in category has no noise, then 0.01 SD of</span>
<span class="c"># noise, then 0.02 SD, and so on.</span>
<span class="n">patterns</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">p</span> <span class="o">+</span> <span class="mf">0.01</span> <span class="o">*</span> <span class="n">i</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">p</span><span class="p">))))</span>
<span class="n">layers</span> <span class="o">=</span> <span class="mi">20</span> <span class="c"># how many layers we want, i.e., how deep is the network</span>
<span class="c"># random weights:</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">layers</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">prototypes</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="nb">len</span><span class="p">(</span><span class="n">prototypes</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span> <span class="o">*</span> <span class="mf">0.1</span>
<span class="k">for</span> <span class="n">pat</span> <span class="ow">in</span> <span class="n">patterns</span><span class="p">:</span>
<span class="c"># for each pattern</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">l</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">layers</span><span class="p">)):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="c">#if we are at the input layer, then set units to pattern</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">pat</span>
<span class="c"># propagate through each layer</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">w</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="c"># pre-synaptic states in n</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="c"># post-synaptic states in n</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">layers</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="c"># print the layer, the first five features of the pattern applied at</span>
<span class="c"># input and the first five activations in the last layer</span>
<span class="k">print</span> <span class="n">i</span><span class="p">,</span> <span class="n">pat</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">],</span> <span class="n">n</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span><span class="w">
</span></pre></td></tr></tbody></table></code></pre></figure>
<p>Even just by eyeballing the output in the terminal using <a href="https://github.com/oliviaguest/random-network">the code above</a>, we can see that similar items (items within the same category) do indeed map to similar outputs, i.e., the network is functionally smooth without any training. We used a <a href="https://github.com/oliviaguest/brain-imaging-and-the-neural-code/tree/master/random-network">more complex version of the above</a> to demonstrate this principle in <a href="http://dx.doi.org/10.7554/eLife.21397">Guest and Love (2017)</a>, where we calculate the correlations between the representations in the input space and in each layer.
However, as we move deeper into the network, functional smoothness breaks down: the network gives, for all intents and purposes, identical outputs for each item within a category, thus losing all structure within the category.
Looking just at the output, we cannot predict which input generated it, only which category it came from.</p>
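<p>One minimal way to put a number on this collapse (a toy sketch, not the analysis from the paper; the category sizes, noise levels, and the use of Euclidean distance are assumptions for illustration) is to compare the mean pairwise distance within a category to the mean distance between categories, at the input and after the deepest layer:</p>

```python
import numpy as np

rng = np.random.RandomState(0)

# Two toy categories, ten members each, with noise that grows with the
# item index (mirroring the snippet above).
prototypes = rng.randn(2, 100)
members = 10
patterns = np.array([p + 0.01 * i * rng.randn(len(p))
                     for p in prototypes for i in range(members)])
labels = np.repeat([0, 1], members)

layers = 20
w = rng.randn(layers, 100, 100) * 0.1  # random, untrained weights

def mean_distances(acts, labels):
    """Mean pairwise Euclidean distance within and between categories."""
    d = np.linalg.norm(acts[:, None, :] - acts[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(acts), dtype=bool)
    return d[same & off_diagonal].mean(), d[~same].mean()

n = patterns
for i in range(layers):
    n = np.tanh(np.dot(n, w[i]))  # propagate through layer i

print('input:', mean_distances(patterns, labels))
print('deepest layer:', mean_distances(n, labels))
```

<p>If the within-category distances shrink towards zero at the deepest layer while the between-category distances remain comparatively large, the items within a category have become, for all intents and purposes, identical.</p>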
<p>From this result we can infer that the property of Inception-v3 GoogLeNet, and indeed of any similar deep network, of displaying functional smoothness at early layers and gradually losing it at deeper layers, is due to the nature of the architecture and not the learning rule.
Because this property is present in simple untrained networks, it cannot be a byproduct of training.</p>
<p>Importantly, randomising weights can be done to any network with any topology, including Inception-v3 GoogLeNet itself, recurrent networks, and so on.
We hope this idea proves to be a useful exercise to others too, as many connectionist and deep network accounts would benefit from an understanding of the inherent properties of the topological configuration versus the fully-trained model.</p>
Mon, 06 Mar 2017 00:00:00 +0000
http://neuroplausible.com/random-network
Using the Gini Coefficient to Evaluate Deep Neural Network Layer Representations<p>Sparsity is an issue in neural representation, and we think it should be measured in artificial neural networks to understand how they represent information at each layer.
For example, are a few units doing the work, or is there a distributed pattern across all units (i.e., overlapping units taking part in the representations of <i>cat</i>, <i>car</i>, etc.)?
So in <em><a href="http://dx.doi.org/10.7554/eLife.21397">What the Success of Brain Imaging Implies about the Neural Code</a></em> we decided to use the <a href="https://en.wikipedia.org/wiki/Gini_coefficient">Gini coefficient</a>, inspired by its use in evaluating voxel activations, to uncover the degree of sparsity within each of the layers of Inception-v3 GoogLeNet.</p>
<p>The Gini coefficient is primarily used to give an idea of how wealth is distributed within a group of people, usually a whole nation.
But it can also be used more generally on any vector of numbers, i.e., a distribution, to describe how evenly its values are spread out (more on this below).</p>
<div class="float-right figure">
<object class="image" data="/img/posts/brain.svg" type="image/svg+xml">
<img src="/img/posts/brain.png" />
</object>
<div class="figure-caption">
<a href="https://elifesciences.org/content/6/e21397#fig2">Figure 2B</a> from Guest and Love (2017): "A deep artificial neural network and the ventral stream can be seen as performing related computations. As in our simulation results, neural similarity should be more difficult to recover in the more advanced layers."
</div>
</div>
<p>I looked around online for a dependable and fast Gini coefficient calculator in Python. Unfortunately, the implementations I did find, while useful, were neither fast <a href="http://planspace.org/2013/06/21/how-to-calculate-gini-coefficient-from-raw-data-in-python/">nor bug-free</a>. So I decided to write <a href="https://github.com/oliviaguest/gini">one</a> myself!</p>
<p>We were dealing with relatively big data, as Inception-v3 GoogLeNet has quite a few layers, so I needed something with relatively low space and time complexity.
In terms of speed, my Gini calculator is quite a lot faster than (the <a href="https://github.com/pysal/pysal/issues/855">current implementation of</a>) PySAL’s Gini coefficient function (see <a href="http://pysal.readthedocs.io/en/latest/_modules/pysal/inequality/gini.html">the documentation</a>), and the outputs are indistinguishable to approximately 6 decimal places. It is also slightly faster than the <a href="http://www.ellipsix.net/blog/2012/11/the-gini-coefficient-for-distribution-inequality.html">Gini coefficient function by David on Ellipsix</a>.</p>
<p>The <a href="https://github.com/oliviaguest/gini/blob/master/gini.py">Gini calculator function</a> I wrote is based on the third equation <a href="http://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm">here</a>, which defines the Gini coefficient as:</p>
<script type="math/tex; mode=display">G = \dfrac{ \sum_{i=1}^{n} (2i - n - 1) x_i}{n \sum_{i=1}^{n} x_i},</script>
<p>where <script type="math/tex">i</script> is the index for each data point <script type="math/tex">x_i</script> and <script type="math/tex">n</script> is the total number of data points.
For a very unequal sample, e.g., one with 999 zeros and a single one, the Gini coefficient is very high (close to 1). For uniformly distributed random numbers, it will be low, around 0.33. For a homogeneous sample, the Gini coefficient is 0. In other words, the lower <script type="math/tex">G</script> is, the more equal the distribution of wealth/numbers is. Check out the <a href="https://github.com/oliviaguest/gini/blob/master/README.md">readme file</a> for <a href="https://github.com/oliviaguest/gini/blob/master/README.md#examples">examples</a> of what can be passed to the <code class="highlighter-rouge">gini()</code> function.</p>
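<p>As a quick sanity check of the equation, we can compute <script type="math/tex">G</script> by hand for a tiny made-up sample, e.g., <script type="math/tex">x = (1, 2, 3)</script>:</p>

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # already sorted, positive, non-zero
n = len(x)
i = np.arange(1, n + 1)  # 1-based indices, as in the equation

# Numerator terms (2i - n - 1) are -2, 0, 2, so the sum is -2 + 0 + 6 = 4;
# the denominator is n times the sample sum, 3 * 6 = 18.
G = np.sum((2 * i - n - 1) * x) / (n * np.sum(x))
print(G)  # 4/18, i.e., approximately 0.222
```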
<p>The Gini calculation by definition requires non-zero, positive values sorted in ascending order within a 1-dimensional vector. These requirements are dealt with inside the <a href="https://github.com/oliviaguest/gini/blob/master/gini.py">gini function</a>, so all four assumptions can be violated by the caller, as they are controlled for:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def gini(array):
    """Calculate the Gini coefficient of a numpy array."""
    # All values are treated equally, arrays must be 1d:
    array = array.flatten()
    if np.amin(array) &lt; 0:
        # Values cannot be negative:
        array -= np.amin(array)
    # Values cannot be 0:
    array += 0.0000001
    # Values must be sorted:
    array = np.sort(array)
    # Index per array element:
    index = np.arange(1, array.shape[0] + 1)
    # Number of array elements:
    n = array.shape[0]
    # Gini coefficient:
    return ((np.sum((2 * index - n - 1) * array)) / (n * np.sum(array)))
</code></pre></figure>
<p>And that is all there is to it! The only two inviolable assumptions it makes are that you have <a href="http://www.numpy.org/">numpy</a> installed and that you pass it something like a numpy array (use <code class="highlighter-rouge">np.asarray()</code> to check whether what you have is <a href="https://docs.scipy.org/doc/numpy/user/basics.creation.html#converting-python-array-like-objects-to-numpy-arrays">array-like</a>).</p>
<p>But what does this have to do with artificial neural networks? Well, instead of people within a nation, we can consider the units within a layer. And instead of people’s wealth we can look at units’ activations after we have propagated input to the layer. So given an input to a layer, we can measure how sparse (unequal) the distribution of activations is. A single number can give us an idea of how localist or distributed the representation the layer has learned is. Averaging over the Gini coefficients for all the possible inputs to a layer, we can calculate how localist or distributed the representations within a layer are in general.</p>
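<p>As a sketch of that procedure (a hypothetical 50-input-by-200-unit activation matrix stands in for real propagated activations; the <code class="highlighter-rouge">gini()</code> function from above is repeated so the snippet is self-contained):</p>

```python
import numpy as np

def gini(array):
    """Gini coefficient of a numpy array (the function defined above)."""
    array = array.flatten()
    if np.amin(array) < 0:
        array -= np.amin(array)  # values cannot be negative
    array += 0.0000001  # values cannot be 0
    array = np.sort(array)  # values must be sorted
    index = np.arange(1, array.shape[0] + 1)
    n = array.shape[0]
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))

# Hypothetical layer: 50 inputs propagated to a layer of 200 units; random
# positive values stand in for the real activations.
rng = np.random.RandomState(0)
activations = np.abs(rng.randn(50, 200))

# One Gini coefficient per input, then the average across inputs gives an
# overall idea of how localist (high) or distributed (low) the layer is:
per_input = np.array([gini(row) for row in activations])
print(per_input.mean())

# For comparison, a one-hot vector (like the output layer's training
# targets) is almost maximally sparse:
one_hot = np.zeros(200)
one_hot[7] = 1.0
print(gini(one_hot))  # close to 1
```

<p>A high average suggests a more localist code in that layer; a low one suggests a more distributed code.</p>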
<p>Inception-v3 GoogLeNet has output that is trained to be completely sparse/localist, since it uses <a href="https://en.wikipedia.org/wiki/One-hot">one-hot coding</a> for the classes. Representing the output classes using one-hot coding ensures that outputs are trained to be both orthogonal and localist (two properties which are not by definition mutually inclusive). In terms of the targets it learns per input image, the network’s output will have a Gini coefficient of approximately 1. And in general, we can expect the output’s Gini to be close to 1, except in the very rare cases where the network is completely unsure of what we have shown it.</p>
<p>In other/lower layers, on the other hand, we find that the Gini coefficient can be high or low: it decreases and increases non-monotonically as a function of layer depth.
Although it shows a rough trend of becoming higher as we move deeper, this is by no means a given.
What this implies is that the network does not necessarily represent things in a more localist way as we move towards deeper/later layers.
In the two layers we discussed in the aforementioned <a href="http://dx.doi.org/10.7554/eLife.21397">Guest and Love (2017)</a>, the network has a Gini coefficient of 0.579 for the penultimate layer and 0.947 for the shallower layer (on the specific stimuli we used). In the end, the average Gini for the output is, as expected given the training regime, 0.941. These and other points with respect to the representational contents of each layer are discussed in depth in <a href="http://dx.doi.org/10.7554/eLife.21397">Guest and Love (2017)</a>.</p>
<p>See here for a translation of this article by Daniel Morales into Spanish: <a href="http://www.neuromexico.org/2017/03/18/el-coeficiente-de-gini-como-herramienta-para-evaluar-las-representaciones-de-las-capas-en-redes-neuronales-profundas/">El coeficiente de Gini como herramienta para evaluar las representaciones de las capas en redes neuronales profundas</a>.</p>
Sun, 26 Feb 2017 00:00:00 +0000
http://neuroplausible.com/gini