I have the beginnings of some thoughts about teaching statistical modelling

One of my fabulous colleagues has started a book club on campus where a group of us work through Advanced R by Hadley Wickham. After the day I learned about the tidyverse, this Advanced R book club has been the biggest set of leaps I’ve been making in my R skills, and I’m probably only understanding about a fifth of it.

This week we began the chapter on functional programming – and Ian’s code and examples are on GitHub. I went home and spent the evening doing this:

There was one example that Ian drew up that I can’t stop thinking about from a teaching perspective. Teaching stats is really, really intimidating, because the more you know about it, the more you recognise how subjective it can be. I often see people take refuge in complexity where they refuse to answer a learner’s question in favour of reiterating the memorised textbook response. I’ve done this myself! At the same time, I’ve had a really intriguing stats challenge with a colleague where I’ve gone around the houses trying to make sure I can justify our choices.

This comes down to model selection, which is one of the most Fun(™) conversations you can ever have about statistics. The more I learn about statistics the more I feel that model selection is the personification of this tweet from my colleague:

You see, there really are no ‘right’ answers in model selection, just ‘less wrong’ ones. This is the subject of a lot of interesting blogs. One of them is David Robinson’s excellent ‘Variance Explained’.

Another of @drob’s posts that I’ve linked to before I’m sure is this one: Teach tidyverse to beginners. This idea fascinates me. David (and I feel I can call him David because I once asked him a question at a demo and he said it was a good question and it was honestly one of the highlights of my life) suggests that students should have goals, and they should be doing those goals as soon as possible.

I don’t know how much educational training the Data Camp/RStudio folks have but I’m always really impressed with the way they teach.

(It’s important here to take a moment to acknowledge the problems Data Camp is having at the moment regarding how they addressed a sexual harassment complaint. I have the utmost sympathy for all involved, and at the moment I don’t feel that boycotting Data Camp is the answer, but it’s worth pointing towards blog posts like this one to give a different opinion.)

‘Doing’ as soon as possible is something we struggle with in higher education. I’ve just had to rewrite a portion of a paper to defend why I think authentic assessment is so vital for science. We put ‘doing’ at the top of our assessment pyramids, and talk about how it takes us a long time to get there.

During this week’s book club, my colleague Ian had a great example of using the broom and purrr packages in R to fit multiple models to a dataset quickly and easily. And I had to derail the conversation in the room for a bit. Why don’t we teach this to our students straight away? At present, the way I teach model selection is a laborious process of fitting each model one by one, examining the results individually, and then trying to get those results into some kind of comparable format. After some brief discussion, with all the usual sciencey caveats, everyone in our Advanced R book club was keen to use this as a way of introducing model selection to students.

I feel as though this is tickling at the edge of something quite important for higher education, especially for the sciences. Something about empowering students, and getting them to ask me about things I don’t know the answer to more quickly. I also feel just a little irate about the fact I can’t formalise this as nicely as I know David Robinson and the RStudio lot can. I kind of feel like some of the most useful stuff I’m doing lately is in the Open Educational Resources range, such as my Media Hopper channels and on my GitHub. There’s a freedom in OERs to push the boat out, and to start teaching the complex things first.

And ultimately, my disjointed ramblings might just help someone else connect a few dots. Happy spring, people!

And May All Your Dreams Come True

For as long as I can remember, I have wanted to be a writer.

As a child, I filled endless notebooks with my stories. They were mostly stories about animals, or thinly veiled replicas of Lord of the Rings. I may even have tried my hand at the odd love story. At school, I kept a private tally of how often my essays were read aloud, or made a teacher cry. I love the written word.

When I was 29 years old, an editor approached me and asked me to write a book. That book, Animal Personalities, is currently available for pre-order.

Of course, when you achieve your childhood dreams, a weight lifts from your heart, a divine confidence settles in your soul, and you never again doubt yourself or your abilities. You become as happy as you always believed you would be . . .

I recently wrote a short case study about being a postdoc for Edinburgh’s “Thriving in Your Research Position” document from the Institute of Academic Development. In the case study, I talk about a spectral figure who has haunted me throughout my whole career: the Perfect Postdoc. She is always better than me. When I wrote my book, she somehow wrote a better one. She’s like a funhouse mirror version of me, and when I change, so does she. I’ll never be able to outdo her.

If you’re a long-term reader of this blog, you’ll know I’ve been thinking about failure lately. I explored my failures as an animal trainer, and meditated on how academia breeds an anti-failure culture. I’m also critical of the idea that all scientists have to be specialists – I’m not a specialist. I’m interdisciplinary and I love it. This leads me to another area of my academic life where the Perfect Postdoc is always one step ahead of me.

The Perfect Postdoc understands R much better than I do. I’ve spoken before on this blog about my frustrations while trying to learn R. While I have taught research methods and statistics for several years now, I’ve always hesitated to teach R. I’ve hesitated because, well . . . because I’m not brilliant at it. My code is ugly and often cobbled together, and I often find that the community around R, in places like Stack Exchange and Stack Overflow, is hideously unfriendly.

I’ve been lucky enough to enrol on the Leadership Foundation for Higher Education’s woman-only Aurora programme this year. The first session was called Identity, Impact and Voice, where we explored how we can make a difference in our workplaces and communities. There were two hundred plus women at the Aurora event in Edinburgh this month, and so many of us spoke about being afraid of ‘not being the best’.

The curious thing is, when I was listing my strengths, I never said I was “the best at [thing]”. My strengths are my communication skills, the fact I’m approachable, and my willingness to try new things. I firmly believe that in five years’ time anyone who doesn’t have R skills is going to find it very difficult to get a job in academia. Hiding my bad code means I’m not contributing to the R conversation happening right now. I have a voice. And I can have an impact too.

Hadley Wickham, who wrote some fabulous R packages, says:

So with that in mind, I’m going to start sharing my own R teaching materials more widely. You can find my resources on GitHub (scroll down to find direct links to the exercises). The worst that can happen is that someone tells me my code is ugly. The Perfect Postdoc’s code is of course much prettier, but do you know what? Just like writing my book, writing that exercise was pretty fun.

Glory in your bad code. Glory in saying “I don’t know how to do that” in your local programming club meetings. Glory in your voice. There is nothing else like it.


Statistics Continued

Interestingly enough, after last week’s post, there is a brilliant article in the BBC magazine about doctors and their understanding of statistics.

Gerd Gigerenzer is one of those names in statistics I trust. His discussion of risk is fascinating. Take the example mentioned in the article:

As a doctor, you know the following facts to be true:


  1. The probability that a woman has breast cancer is 1% (“prevalence”)
  2. If a woman has breast cancer, the probability that she tests positive is 90% (“sensitivity”)
  3. If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9% (“false alarm rate”)



When a 50 year old female patient, who has no other symptoms of breast cancer, has a routine mammogram, she tests positive. Alarmed, she asks you what her risk is? Which of the following is the best answer?

  • nine in 10
  • eight in 10
  • one in 10
  • one in 100


If, like me, you read this at lunch with a box of strawberries with one eye on your MOOC numbers, you probably said ‘nine in ten’. In fact the answer is ‘one in ten’. Why is this the case?

Well, first remember that if there are a hundred random women in a room, the prevalence of the disease in the population suggests that one of them will have breast cancer. Second, remember that if we test the same hundred women, we will have about nine women testing positive who don’t have the disease, and the woman who does have the disease has a 90% chance of testing positive (meaning that it’s possible she won’t test positive).

So with no other symptoms to go on, and remembering that it’s likely that 10 of our hundred random women would test positive (one because she does have cancer and the other nine because they get false positives), the best estimate of whether this patient has cancer is actually one in ten. She might be the true positive. But nine times out of ten she’s the false positive.
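If you would rather see the sum than count women in a room, here is a minimal Python sketch of the same reasoning written out as Bayes' rule. The variable names are my own; the only inputs are the three figures quoted in the example (prevalence, sensitivity and false alarm rate).

```python
from fractions import Fraction

# The three facts from the example above.
prevalence = Fraction(1, 100)        # P(cancer)
sensitivity = Fraction(90, 100)      # P(positive | cancer)
false_alarm_rate = Fraction(9, 100)  # P(positive | no cancer)

# Two ways to end up with a positive test:
true_positive = prevalence * sensitivity             # has cancer AND tests positive
false_positive = (1 - prevalence) * false_alarm_rate # no cancer AND tests positive

# Bayes' rule: P(cancer | positive) = true positives / all positives
posterior = true_positive / (true_positive + false_positive)

print(posterior)                   # exact fraction: 10/109
print(round(float(posterior), 3))  # as a decimal: 0.092
```

The exact answer is 10 in 109, or roughly 9%, which is a touch under one in ten because 9% of the 99 healthy women is slightly fewer than nine. 'One in ten' is still by far the closest of the four options on offer.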


It’s an excellent teaching opportunity and the maths make sense when you think about it, but it’s keeping the populations separate in your head that makes it difficult.

In other news, I picked up Andy Field’s ‘Discovering Statistics Using R’ and I’m really enjoying it so far.

How To Teach Me Statistics

A few weeks ago I was swearing at my computer and had to go buy a Twix bar from the canteen to calm myself. There was some frantic chocolate scoffing that afternoon.

The source of my irritation? Statistics. I am not a great wielder of statistical power, but I am very interested in their dark arts. This leads to the common situation where I know I’m doing something wrong, such as using stepwise regressions to build a model, using frequentist rather than Bayesian probabilities, and even relying too heavily on p-values to communicate scientific results, but I just don’t know how to do it better.

I’m expecting there are three reactions to that sentence. The first is “I don’t have a clue what any of that means”. Don’t worry, my grasp of it is very shaky, and it’s not something I’ve ever been taught. It’s something I’ve discovered through hanging out with statisticians.

The second is “Man, I have that exact same problem, but every time I try and learn how to do it, I can’t figure it out.” My friends, we are in the same boat. I do not feel I have enough statistical training to tackle these problems.

And lastly the third kind of person is reading that and thinking “Well obviously the answer is *string of gibberish*”

I have had good stats teachers, but they are sadly few and far between, and there are a lot of poor stats teachers who get in there in the meantime and deeply confuse me. I have a lot of good friends who try to teach me and I end up glazing over. What I mean to say is that the following is not personal – and it’s as much a criticism of myself as those who have tried to teach me . . .

Loads of statistically savvy people are willing to teach, they just don’t seem to get it through to me. So seeing as I’m supposed to be quite good at this education malarkey, here’s my guide to teaching me statistics.


Make Sure We’re Speaking a Common Language

Yes, we really have to start with the basics here. Statistical language is incomprehensible to me. And that’s because we’re all taught differently.

As an example, I refer to response variables as ‘y’ and explanatory variables as ‘x’. A good friend of mine refers to explanatory variables as ‘y’ and response variables as ‘a’ or ‘b’. This causes huge confusion whenever we ask one another stats questions off the cuff.

And the common language refers to more than just making sure I understand what your big formulas are saying. This is what the homepage of R looks like. R is a sophisticated and free statistical tool that we should all be using. I’ve seen more intuitive GeoCities layouts. This is written by and for coders and I have to explain how to extract a zip file to some of my colleagues.

Why are you writing your R manual or your page about your fancy new statistical technique? Are you trying to share it with others who think like you? Fine, carry on. Are you trying to improve the statistical techniques used by frustrated, busy scientists who haven’t had more than a few weeks’ stats CPD a year?

Use your words.

Now the R Book is a good start for people wanting to learn R but I still wish it was written by Andy Field, whose Discovering Statistics book is still my favourite bible, even though I don’t use SPSS anymore. If you’ve read both, you’ll see the difference in style is extreme, and I think it’s because, as a social scientist, Field has a better grasp of how people think. (Although speaking of GeoCities sites . . . I still love the book!)

Edited to Add: I lie! Andy Field has written an R textbook, which I have just bought! Thanks to Comparatively Psyched for the heads up! 


Teach Me Something I Can Use

This may seem counterintuitive to what I said further up, but if you’re trying to teach me, say, an alternative method to a stepwise regression, don’t just give me a dataset and tell me the code to run.

Tell me how to arrange my dataset in the way it’s needed. Ask me questions about my data – get me thinking about the complexities of the experiment I designed. And then tell me the code to run. Don’t forget to walk me through the output. For example, the documentation for the lars package in R explains how I can run a least angle regression on a sample dataset. Great. I can copy and paste that code ad libitum. Can I get it to work on my data? Even though to the best of my knowledge I’ve arranged it in the same way? Nope.

Get me to work through the whole process and you show me where your new method fits into my life.


What’s the Application?

I recently sat through a stats seminar where someone was showing off a new method. In the same presentation they briefly glossed over ternary plots as a way of showing off new data.

Applied scientists work in a world that judges us on the number of papers we produce and the impacts our papers have. That is literally how we get our baseline funding.

I don’t disagree that there are lots of problems with publishing but you’re asking me to relearn how I think about statistics, and then to communicate all this in a real-world paper with real-world data (that doesn’t always play nicely). If you’re asking somebody to use an amazing new technique, you’re asking them to get that past reviewers (who more often than not will not know your new stats).

If you have a great technique but it won’t actually give me a conclusion that I can use to improve animal welfare, then it’s not going to help me. And related to this . . .


What does it Mean?

The truth of the matter is that the statistical tests we commonly use are ‘plug and play’. We get into the habit of checking the things we want to look at and noting the laundry list of caveats in a footnote.

Walk me through an example of what my results mean. If you’ve got me using my own data, tell me if this result confirms or denies my hypothesis, show me why, give me some indication of the next step.

I’m amazed at how many people don’t do this when trying to explain stats to me. You’re interested in the method, I get that. I’m fascinated by recording aggression in groups, but there’s a time and a place to discuss this, or just to tell you what aggression means.


Don’t Assume I’m Stupid

I see this all the time when statisticians are trying to teach something to scientists. They spend a very long time on the basics because our fundamentals are so scattered. This is not the most helpful approach. The other method I often see is that, when I say I don’t understand or even hesitate, the statistician repeats what they’ve said, more slowly and slightly louder.

We’re not stupid. Try teaching us a complex problem in an environment we’re familiar with (i.e. with our own data) and you’ll be surprised how many fundamental skills we’ll pick up because of it. To use a simple analogy, if you wanted to teach me how to maintain a car, wouldn’t you be better off showing me how to take an engine apart rather than build one from scratch?

Don’t spend half our time explaining the problem to me – I get that there is a problem with the statistics I already use, it’s why I’ve sought you out. Is a finer understanding of the theory really going to help me use this test in future?


Finally – Why Are You Teaching Me?

This blog post sounds very whiny. Trust me, I know.

I know I should have learned all this earlier in my career. I know I should use R every day until I’m fluent. I know I shouldn’t be using all these out-of-date stats. But the sad truth is that I haven’t, I don’t and I can’t.

I want to change, and I need the great community of statisticians to help me. So if you’re a statistician who wants to help me and people like me, this is how I’d suggest doing it.

Good luck!

Personalities – Part One

One of my all time favourite topics is that of animal personality. In fact my PhD was centred around animal personality, using some nifty new technology to explore the phenomenon. Most of my papers are about how personality affects the lives of cows.

Don’t laugh. That’s genuinely what my PhD is in.

There are actually plenty of production and welfare reasons to study this in cattle, but today I want to talk to you about one of the basic concepts of personality.

Let’s Talk Science

You’ve heard people talk about personality traits or dimensions (I’ll use traits for the rest of this article), but what do you know about personality traits? I’m going to give you a very complicated sciency sounding sentence here, and by the end of this article, I think you’ll understand it.

Are you ready?

Are you sure?

Personality traits are a statistical construct based on the behavioural variation displayed by the number of individuals sampled.

Let me explain . . .
