Complexity

One of my fabulous colleagues has started a book club on campus where a group of us work through Advanced R by Hadley Wickham. Second only to the day I learned about the tidyverse, this book club has been the biggest leap forward in my R skills, and I’m probably only understanding about a fifth of it.

This week we began the chapter on functional programming (Ian’s code and examples are on GitHub). I went home and spent the evening doing this:

I was playing about with some #purrr tonight and after some truly questionable cross-validation I believe that the greatest predictor of #GBBO winners is them having a lower % of times in the bottom of the technical #rstats #imsobadatstats #andvisualisations #rstats pic.twitter.com/1JOHevwSZQ
— Jill MacKay (@jilly_mackay) April 17, 2019

There was one example Ian drew up that I can’t stop thinking about from a teaching perspective. Teaching stats is really, really intimidating, because the more you know about it, the more you recognise how subjective it can be. I often see people take refuge in complexity, declining to answer a learner’s question and reciting the memorised textbook response instead. I’ve done this myself! At the same time, I’ve had a really intriguing stats challenge with a colleague where I’ve gone around the houses trying to make sure I can justify our choices.

This comes down to model selection, which is one of the most Fun(™) conversations you can ever have about statistics. The more I learn about statistics, the more I feel that model selection embodies this tweet from my colleague:

As someone who has a job *today* and who indeed has had one for a few years, it has never been about “recall and recognition”. I learned all the skills listed and at a time when technology was minimal. This is not new, innovative, or disruptive. https://t.co/44T5XyemcB pic.twitter.com/2GdWYTia3u
— Anne-Marie Scott (@ammienoot) April 17, 2019

You see, there really are no ‘right’ answers in model selection, just ‘less wrong’ ones. This is the subject of a lot of interesting blogs, including David Robinson’s excellent ‘Variance Explained’.

Another of @drob’s posts, which I’m sure I’ve linked to before, is this one: Teach tidyverse to beginners. This idea fascinates me. David (and I feel I can call him David because I once asked him a question at a demo and he said it was a good question, and it was honestly one of the highlights of my life) suggests that students should have goals, and should start working towards those goals as soon as possible.

I don’t know how much educational training the DataCamp/RStudio folks have, but I’m always really impressed with the way they teach.

(It’s important here to take a moment to acknowledge the problems DataCamp is currently having over how they handled a sexual harassment complaint. I have the utmost sympathy for all involved, and for now I don’t feel that boycotting DataCamp is the answer, but it’s worth pointing towards blog posts like this one for a different view.)

‘Doing’ as soon as possible is something we struggle with in higher education. I’ve just had to rewrite a portion of a paper to defend my argument that authentic assessment is vital for science. We put ‘doing’ at the top of our assessment pyramids, and then talk about how long it takes us to get there.

During this week’s book club, my colleague Ian had a great example of using the broom and purrr packages in R to fit multiple models to a dataset quickly and easily. And I had to derail the conversation in the room for a bit. Why don’t we teach this to our students straight away? At present, the way I teach model selection is a laborious process of fitting each model one by one, examining the results individually, and then trying to get those results into some kind of comparable format. After some brief discussion, with all the usual sciencey caveats, our Advanced R book club was keen to use this as a way of introducing model selection to students.
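Ian’s own code is on his GitHub and isn’t reproduced here. What follows is only a minimal sketch of the same idea, assuming R’s built-in mtcars data and a few candidate formulas I’ve picked purely for illustration. purrr::map() fits every model in one go, and broom::glance() and broom::tidy() turn each fitted model into a row-per-model or row-per-term table, so the results land in one comparable format instead of being copied out by hand.

```r
library(purrr)
library(broom)
library(dplyr)

# Illustrative candidate models on the built-in mtcars data
# (these formulas are an assumption for the example, not Ian's)
formulas <- list(
  wt_only    = mpg ~ wt,
  wt_hp      = mpg ~ wt + hp,
  wt_hp_cyl  = mpg ~ wt + hp + factor(cyl)
)

# Fit every candidate model in one step; map() keeps the list names
models <- map(formulas, ~ lm(.x, data = mtcars))

# One row of fit statistics per model, stacked into a single table
comparison <- map_dfr(models, glance, .id = "model") %>%
  select(model, r.squared, adj.r.squared, AIC, BIC) %>%
  arrange(AIC)

# One row per coefficient per model, for comparing estimates side by side
coefs <- map_dfr(models, tidy, .id = "model")

print(comparison)
print(coefs)
```

Sorting by AIC is just a convenient way to line the candidates up next to each other; it doesn’t make the modelling decision for you, which is rather the point about ‘less wrong’ answers.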

I feel as though this is tickling at the edge of something quite important for higher education, especially for the sciences. Something about empowering students, and getting them to ask me about things I don’t know the answer to more quickly. I also feel just a little irate that I can’t formalise this as nicely as I know David Robinson and the RStudio lot can. I feel like some of the most useful things I’ve been doing lately fall under Open Educational Resources (OERs), such as my Media Hopper channels and my GitHub. There’s a freedom in OERs to push the boat out, and to start teaching the complex things first.

And ultimately, my disjointed ramblings might just help someone else connect a few dots. Happy spring, people!
