
Episode 140 - Why the US Election Polls Are Tricky with Alex Andorra

With the presidential election coming up soon, the US election polls are on everyone's minds. In this episode, Max talks to Alex Andorra of PollsPosition to discuss why election polling is so tricky. Together, they look at the US elections and tackle what could go wrong with the polls for the US presidential election.

Listen to this episode to get a better picture of how US election polls work!

About Alex Andorra

Alex Andorra is a data scientist, a core developer of two Python packages (ArviZ and PyMC3), and the host of the podcast Learning Bayesian Statistics. He is also an election modeler at PollsPosition, which forecasts France's elections.

Connect with Alex on Twitter!

Here are three reasons why you should listen to the full episode:

  1. Learn about how polling works and why it isn't easy.

  2. Understand how electoral modeling works to address uncertainty.

  3. Get an idea of how you can watch and interpret the upcoming US election polls.

Resources

Related Episodes

  • Episode 98 — Bayesian Inference and Political Trends with Alex Andorra

  • Episode 126 — Electoral Systems: Models vs. Reality

Episode Highlights

What Alex Does with Poll Data

  • PyMC (Python Monte Carlo) is an open-source Python package that allows people to do Bayesian, or probabilistic, programming (a minimal sketch follows this list).

  • He uses these packages for models such as electoral forecasting.

  • On PollsPosition, Alex and his friends use socio-economic data and polls to forecast elections.
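
To make the probabilistic-programming idea concrete, here is a minimal sketch of estimating a candidate's latent support from a single poll with PyMC. The poll numbers are made up, and this is not the PollsPosition model, just the simplest possible Bayesian poll model.

    import pymc as pm  # the episode discusses PyMC3; this sketch uses the current PyMC API

    # Hypothetical poll: 1,000 respondents, 520 say they will vote for candidate A.
    n_respondents, n_support = 1_000, 520

    with pm.Model() as poll_model:
        # Flat prior on the latent share of the electorate supporting candidate A.
        true_support = pm.Beta("true_support", alpha=1, beta=1)
        pm.Binomial("obs", n=n_respondents, p=true_support, observed=n_support)
        idata = pm.sample(1000, tune=1000, chains=2, return_inferencedata=True)

    # The output is a full posterior distribution, not a single point estimate,
    # which is what lets a forecast carry its uncertainty around.
    print(idata.posterior["true_support"].mean().item())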

Why Achieving a Consensus Is Difficult

  • The difficulty in agreeing on the polls arises from both selection bias and the nature of polling itself.

  • Alex disregards any politician or non-data-focused media talking about polls as they tend to exhibit selection bias.

  • People may not understand how polls are conducted.

  • There is also a tendency to be opinionated on whether a poll result is good or bad.

Problems with Polling

  • Random sampling is never truly random.

  • Selection bias arises through the method used to contact people or which subpopulations pollsters choose to survey.

  • Systematic bias may also come up when an election forecasting model's assumptions turn out to be false.

  • Non-response bias might exclude specific demographics of people who were unable to answer the poll.

  • Tune into this episode to hear Alex talk about challenges such as partisan non-response bias!

Adjusting for Uncertainty

  • Statistical models can take into account the sources of uncertainty in a way that your mind cannot.

  • Don’t overreact to immediate results after debates; these still have to be adjusted for biases and the like.

  • You have to look at the average result of polls and keep the magnitude of the effect in mind (see the sketch after this list).
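
As a rough illustration of why the average of polls is more informative than any single poll, the sketch below (with made-up poll numbers) computes each poll's nominal sampling error and a precision-weighted average. Note that this margin of error covers random-sampling noise only; none of the biases discussed in this episode are included, and real aggregators also weight by recency and pollster quality.

    import numpy as np

    # Hypothetical recent polls: candidate A's share in percent, plus each poll's sample size.
    shares = np.array([52.0, 50.5, 53.0, 49.5, 51.5])
    sample_sizes = np.array([900, 1200, 800, 1500, 1000])

    # Nominal 95% margin of error of each poll, assuming a pure random sample.
    se = 100 * np.sqrt(0.25 / sample_sizes)   # standard error of a ~50% share, in points
    moe = 1.96 * se

    # Precision-weighted average: polls with smaller sampling error count for more.
    weights = 1 / se**2
    avg_share = np.sum(weights * shares) / np.sum(weights)
    avg_se = 1 / np.sqrt(np.sum(weights))

    print(f"individual polls: +/- {np.round(moe, 1)} points each")
    print(f"average: {avg_share:.1f} +/- {1.96 * avg_se:.1f} points (sampling noise only)")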

Reweighting Practices in Polling Samples

  • Pollsters use reweighting practices so samples can accurately represent the population (a minimal example follows this list).

  • Having a smaller sample of an underrepresented group may skew your results or lead to a higher chance of outliers.

  • You must also choose which factors are reweighted, which may also affect your polls.

  • A statistical model's assumptions can be revisited and corrected in later election cycles.

  • These practices are needed because raw data on its own is not actionable.
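
Here is a minimal sketch of the reweighting idea Alex describes, using the same hypothetical numbers as the transcript: 18-24-year-olds are 20% of voters but only 10% of the sample, so each of those respondents ends up counting roughly twice. The group support figures are invented purely for illustration.

    # Hypothetical one-factor reweighting by age group.
    target_share = {"18-24": 0.20, "25+": 0.80}      # assumed population proportions
    sample_share = {"18-24": 0.10, "25+": 0.90}      # proportions the poll actually reached
    support_in_group = {"18-24": 0.60, "25+": 0.48}  # candidate A's support in each group (made up)

    raw = sum(sample_share[g] * support_in_group[g] for g in sample_share)

    # Weight each group so the weighted sample matches the target population.
    weights = {g: target_share[g] / sample_share[g] for g in sample_share}
    reweighted = sum(sample_share[g] * weights[g] * support_in_group[g] for g in sample_share)

    print(f"raw estimate:        {raw:.1%}")        # 49.2%
    print(f"reweighted estimate: {reweighted:.1%}") # 50.4%
    print(f"weight on each 18-24 respondent: {weights['18-24']:.1f}")  # 2.0

Real pollsters reweight on several factors at once (age, education, region, and so on), which is exactly where the choices and assumptions discussed above come in.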

A New Method of Statistical Modeling

  • Andrew Gelman's method, multilevel regression and poststratification (MRP), was designed to address the problems with polling samples.

  • You can use demographic data to reweight the polling sample.

  • Using MRP will allow you to forecast at the state level, for example, even if your poll was on a national scale.

  • To do this, you will need detailed, good-quality polling data (see the hedged sketch after this list).
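
The sketch below is a hedged, toy version of the MRP idea: a multilevel (partially pooled) model of support by state and education, fit with PyMC, then poststratified against a census table. Everything here (group counts, data, census cells) is simulated for illustration; it is not Gelman's published model or the PollsPosition code.

    import numpy as np
    import pymc as pm  # the episode mentions PyMC3; the same model works in current PyMC

    # Simulated national poll: each respondent tagged with a state and an education level.
    n_states, n_edu, n_resp = 5, 3, 400
    rng = np.random.default_rng(0)
    state_idx = rng.integers(0, n_states, size=n_resp)
    edu_idx = rng.integers(0, n_edu, size=n_resp)
    support = rng.integers(0, 2, size=n_resp)          # 1 = supports candidate A

    with pm.Model() as mrp:
        # Multilevel regression: state and education offsets share common priors,
        # so sparse cells borrow strength from the rest of the data.
        mu = pm.Normal("mu", 0.0, 1.0)
        sigma_state = pm.HalfNormal("sigma_state", 1.0)
        sigma_edu = pm.HalfNormal("sigma_edu", 1.0)
        a_state = pm.Normal("a_state", 0.0, sigma_state, shape=n_states)
        a_edu = pm.Normal("a_edu", 0.0, sigma_edu, shape=n_edu)
        pm.Bernoulli("obs", logit_p=mu + a_state[state_idx] + a_edu[edu_idx], observed=support)
        idata = pm.sample(1000, tune=1000, chains=2, return_inferencedata=True)

    # Poststratification: reweight each state x education cell by how many voters
    # actually live in it (here an invented census table).
    census = rng.integers(1_000, 50_000, size=(n_states, n_edu))
    post = idata.posterior
    logit = (post["mu"].values[..., None, None]
             + post["a_state"].values[..., :, None]
             + post["a_edu"].values[..., None, :])
    cell_p = 1 / (1 + np.exp(-logit))
    state_support = (cell_p * census).sum(axis=-1) / census.sum(axis=-1)
    print(state_support.mean(axis=(0, 1)))   # posterior-mean support per state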

Poll Fiascos in Politics

  • During the French presidential elections of 2002, a far-right candidate reached the second round.

  • This event was a massive shock in French politics, as pollsters did not see this coming in the data.

  • Alex studied polling averages for elections from 2002 to 2017 and noted that pollsters now overcompensate by overestimating the far-right party in the polls.

  • Having a model enables you to address polling errors.

  • To hear more about Alex's insights on polling in the United States, listen to the episode!

Alex's Thoughts on the 2016 US Elections

  • The 2016 national polls were better than in 2012 but predicted the wrong winner.

  • This result has led people to believe that the national polls were terrible, when the issue was with state polls.

  • The problem arose from polls being off in a few key states, since the US vote is decided state by state.

  • The national polls help give us an idea of what people think nationally but do not account for all the uncertainty and correlation between states.

Watching the US 2020 Elections

  • Look at how the averages and electoral models are moving instead of focusing on individual polls.

  • As the elections draw near, there is less uncertainty around the polls.

  • Every day Biden stays ahead by 8 or 9 points is a good day for him, as it gives his opponent less time to catch up.

  • Alex notes that the current news cycle involving COVID-19 is not beneficial for Trump.

  • Jump into this episode to hear more of Alex's thoughts on how the US 2020 elections may turn out!

Final Thoughts on Polling

  • There is always uncertainty that electoral models cannot fully capture.

  • Instead of having set opinions on a poll’s quality, it is better to think in terms of uncertainty and probability (see the toy calculation after this list).
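
As a toy illustration of thinking in probabilities rather than verdicts, the lines below turn a polling lead into a win probability by assuming the final margin is the polling average plus a normally distributed error. The lead, the size of the error, and the normal shape are all illustrative assumptions; a real forecast also has to model the Electoral College and the correlations between states, which is why the models discussed in the episode give a lower number than this naive national calculation.

    from scipy.stats import norm

    lead, error_sd = 8.0, 3.0   # assumed national lead and combined polling/model error, in points

    # Probability that the true national margin stays positive under a normal error model.
    p_lead_holds = 1 - norm.cdf(0, loc=lead, scale=error_sd)
    print(f"P(national lead holds): {p_lead_holds:.0%}")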

5 Powerful Quotes

“That's why it's interesting to have models to forecast the elections, because a model can take all of that uncertainty in consideration.”

“Keep in mind that we have to do [reweighting], because the raw data is not actionable [...]. If the raw data was a pure random sample of their population, then you can be sure that pollsters wouldn't care for all this muddling.”

“National polls are interesting to get a sense of where the race is at the national level, but they don't account for all the uncertainty and all the correlations between states.”

"[...] What I look for is more, 'How are the averages and models moving?' I usually don't pay that much attention to individual polls again, because it's more noise than signals, and especially because you have so many polls in the US."

“This is a hard job, and I think having really set opinions on how polls are good or bad, and how models are useful or not, is really not warranted here. And I think we would do better if we all had more uncertainty and felt more probabilistically.”

Enjoy the Podcast?

Are you hungry to learn more about polling and electoral modeling? Subscribe to this podcast to get regular insights about A.I., technology, and society.

Leave us a review! If you loved this episode, we want to hear from you! Please help us reach more audiences to bring them fresh perspectives on society and technology.

Do you want more people to understand the US election polls? You can help us reach them by sharing the takeaways you've learned from this episode on social media!

You can tune in to the show on Apple Podcasts, Soundcloud, and Stitcher. If you want to get in touch, visit the website, or find me on Twitter.

To expanding perspectives,

Max

Max talks to Alex Andorra of pollsposition.com to discuss why election polling is so difficult, and what could go wrong with the polls in the US presidential election.

Transcript

Max Sklar: You're listening to The Local Maximum Episode 140.

Time to expand your perspective. Welcome to The Local Maximum. Now here's your host, Max Sklar. 

You’ve reached another Local Maximum. Welcome to the show. Today we are going to welcome back a guest on the program to talk about why all these election polls are so tricky. I know I’ve had a lot on the election recently. We've only got a few weeks to go.

He is a data scientist, a developer of PyMC3, and an election modeler at pollsposition.com, which focuses on elections in France. So we'll have a look at our US election from a distance, perhaps. So let's welcome back to the show, Alex Andorra. Hi, Alex, you've once again reached The Local Maximum. Welcome to the show.

Alex Andorra: Yeah, thanks, Max. And I'm worried because it seems like I'm stuck at The Local Maximum. And usually that's not good.

Max: Hopefully it's a higher local maximum than last time.

Alex: I hope so.

Max: But yeah. Well, you do a lot of data analysis with polls. And guess what? There's been a lot of interest in that this month.

Alex: Oh really? Is that what’s happening?

Max: I don't know. I don't know. I get a lot of requests for this kind of thing. First of all, before we go into that, tell me a little bit about what you do. You take polling data. What do you do with this? Remind the audience or tell the audience for the first time what exactly it is that you do?

Alex: Yeah, so it's tied, actually, to the open source development that I do. So I'm going to start by saying that I'm a core developer of two Python packages, which are really great packages, of course, which are called ArviZ and PyMC. And PyMC stands for Python Monte Carlo. So it's a package to do Bayesian, or more generally, probabilistic programming in Python.

And so I do open source development on this package. And I use this package a lot on some models that I do, and some of these models are about electoral forecasting. And so I have this project in France, which is called pollsposition.com. And this is a project where I and a bunch of friends are trying to forecast elections that happen in France. And to do that, we use some socio-economic data. And also, of course, we use polls. And so that's why I guess you were interested to talk again today.

Max: Yeah. So I think this will be a good discussion, because I feel like it's hard for anyone to be objective about the US election, but maybe you could be a little more objective, because we're not going to get into some of the more specific – well, I mean, I want a more general view of what can and does go wrong and how this time might be different in the US. But we're just going to kind of, we're just going to understand the field that we're diving into a little bit better.

Okay, so first of all, there's been a lot of discussions about the polls for the 2020 election here in the US. Can we trust them? A lot of people ask. And people can't even agree on whether the polls were right or wrong in 2016. You know, people thought Hillary Clinton was gonna win. The pollsters did, but you know, some people say, well, the polls were way off. And other people are saying, “Well, no, they were just off by a tiny bit.”

So why is it so hard for people to come to a consensus here? Is it, do you think it's just people trying to spin it either way? Or do you think it's actually, there's actually something more, the nature of polling and the mathematics and the statistics of it that make it so difficult?

Alex: Yeah. Yeah, that's an interesting question. I think it's a bit of both. Well, first, I tend to completely discard any politician that's talking about polls, because you have usually a high selection bias there. They usually talk about the polls that go in their way. And some do that more than others. And it's not only the US. I can assure you in France, we have a lot of that too.

Max: But the media outlets do it, too. They'll say a new poll says, blah blah blah. And really, there have been 10 polls on that thing, but they decided to focus on that one.

Alex: Yeah. Yeah. Exactly.

Max: And you don't know. I would like them to focus on the one that is like, you know, that is the best. Like, tell me why this is significant. Like this one has the best methodology, but you almost get that in your head, but that might not be the case.

Alex: Yeah, yeah. You see, it’s exactly that. And I mean, you can see that they were in France, the same thing. And so, and what's funny also is that in France, sometimes I go to some French TV media, and they asked me about polls in the US, and so often they tell me, “Oh okay, because this last poll, in this last poll, it seems like Donald Trump is having a bump in his popularity, you know, and so and so.” It's always very funny.

So I think there is a bit of that. You know, this horse race thing that we all have in our brain. We like novelty. And we like things that seem new and shocking. So this is the sport. And yeah, to go back to my train of thought, I usually discard any politician or any non-data focused media that's talking about polls.

And also the second part of that, and that's also why I discard that it’s that, basically, polling is hard. Polling is very hard and it's not intuitive. It's something that's quite hard to explain. And people usually don't really know how a poll is conducted and how then the different weightings are taken care of by the pollsters and so on. So maybe it's because also it's a difficult job, and it's a difficult industry, and it's a real challenge to have good polls consistently for all elections and so on. And the problem often is that people have their mind set on how polls are good or not, but they don't know how polls are conducted. So it's also difficult there. Because, yeah, people tend to have opinions on this when I think we should not have that strong opinions usually.

Max: Yeah. I'm gonna go over some ways polling might be hard here, but if I miss something, by all means, let me know. So I guess, in the ideal situation, you'd contact a random sample of voters and you ask who they are going to vote for, and they give you an answer. But well, first of all, even if you do that, you can have random variations just by pure statistical variability. And I think that's what oftentimes the margin of error is only talking about – your statistical variability.

Alex: Exactly.

Max: But there's a couple things that could go wrong, even if you get that right. So first, there is no random sample. The sample is never completely random and is that a problem? And then secondly, the voters themselves, the people you're contacting, they might not even be voters. They might not be telling you exactly what they're going to do. And there is a lot of speculation here that Trump voters aren't telling the pollsters their true preference. Maybe some Biden voters are doing the same. I don't know. Is this something that you've seen before, say in French elections? Do these problems happen to creep in sometimes?

Alex: Well, yeah. All the time. I mean, that's also why it's super hard. And first, I want to caveat all that by saying that I'm in no way a pollster. I'm a statistical modeler. So I use both as a raw material in my models and then I do a bunch of statistical operations on these polls and other data to get some forecasts of the elections.

So that being said, I do, I am curious about polls and how they are conducted. And also what's good is that I don't have any skin in the game. I mean, I don't care if polls – when I say that polls are good or bad, you know I’m just using polls and see whether they have some predictive power. 

Max: Do you ever look at a poll though, and say, “Well, I wish this was conducted a little differently?”

Alex: Oh, yeah. Definitely. In France, especially. I do that a lot. But to go back to your question, to your original question. Yeah, so these are really important topics. Because as you say, you can't truly have a random sample, a true random sample as you have in the statistics books. So one of the first and obvious ways is that you get some selection bias in how you contact people. I mean, if you use only landlines, then it's a big problem, because most of the young people, usually they live in the city, and maybe they're from poor households, and maybe they're from some minority. So then you have a problem because your bias is associated with some demographics in the population. And that's a big problem. So clearly, honestly, if I see a poll that uses only landline phones, I'm usually more skeptical of it.

Max: Yeah, it's amazing they still do that.

Alex: Yeah. So it's good. I mean, what's good is to have a blend basically, but it costs more, because there is fees also. You know polls cost a lot of money. And so someone has to pay for the polls that you see somewhere in the media, and you don't pay for your media outlets usually. These cost a lot of money. The best is to have you know, a blend. I love, in France, sometimes you have polls that use face-to-face interviews, landline phones, mobile phones, internet. So that's awesome, but that's super rare. And it's really the best institutes, the best pollster that do that. So that's the first kind of bias that you have.

But you have also a lot of other biases that are also interesting. And as you say, there is this voter versus just all adults. You know you have polls that interview the population of all adults, and they don't differentiate whether these people will vote or not, and they don't have a model of whether these people will turn out or not. I think in the US, from what I know in the US, it's quite rare now. And this is usually really well-written in the study and usually pollsters can also do both. They have one poll with all adults, and then one poll with the potential voters, but this is super important because –

Max:  I think they say they either do registered voters or likely voters.

Alex: Yeah, exactly. See, you have that. So you have all adults, then you have registered voters, then you have likely voters. So you have to be careful about that. Usually, it's better to use polls that use likely voters when you're close to the election, like right now. Because people know more about whether they will vote or not. But one thing that you have to keep in mind there is that this is kind of like a statistical model that pollsters use. You know, they have models of election turnout. And, of course, if you do a model, you're making some assumptions. And if these assumptions turn out to be wrong, well, then your polls can have a systematic bias. So you have to be careful about that.

But basically, that's also why Bayesian Statistics is super important in elections and election forecasting. It's that you have uncertainty sources everywhere that creep in. And so you have really to keep in mind that one poll is really uncertain and you have to look at the average of all of them. And even this average has a lot of uncertainty around it. So really, that's what makes it a very difficult job, and that's also what makes it, I think, a very interesting job. But you have to keep that in mind all the time.

And there are also, if you want, I can also talk about, I think, two interesting challenges here with non-response bias. You can have that because first, as we said, maybe you take a sub-sample of the population where people don't answer the phone, or they don't have the internet. So if they don't have the internet, usually they're quite old. But then if you don't get the old people in your sample, that's a problem. So that's one part.

And there is another part that is interesting to have in mind. This is called partisan non-response bias. And this is actually a phenomenon where partisans of a candidate tend to not answer polls when their candidate is in a bad news cycle.

Max: Oh, interesting. So if some bad news about your candidate is out, you're not going to be likely to talk to the pollsters?

Alex: Yeah, usually they're like, you can see that as they are demoralized for a bit. And so they tend to not want to answer their phone and not want to answer a pollster.

Max: So when the polls go up and down, it might not actually be people changing their minds. It might be just people changing their behavior in terms of whether they're talking to the pollsters or not.

Alex: Yeah, and actually you can adjust for that in the models that you do. Well, that's why it's interesting to have models to forecast the elections, because a model can take all of that uncertainty in consideration, whereas your Homo sapiens brain can't do that consistently. So that's why it's interesting to have the model.

But yeah, basically, that’s also why it's important to not overreact to so-called convention bumps or after-debate bumps, etc., because you have this partisan non-response bias, but you can adjust for that. And also, I mean, we're not talking of an effect at the magnitude of, I don't know, 10 points. I mean, if a candidate goes from, I don't know, plus five to plus 15 after some event in the average of polls, this means there is probably signal there. You have to keep the magnitude of the effect in mind.

Max: Right, right. So you have to kind of, you have to be able to balance between them. I've realized this with a lot of people now, including myself, that asked like, why I haven't taken any polls? Who is being polled? And we used to - like, 10 years ago, 20 years ago - we used to get calls all the time to do polls. And now it's like, is it because we're in states that don't matter? But that doesn't make any sense. Because even some of the states that don't matter do have polls that come out and there are national polls. I don't know what's going on.

Alex: Yeah, well, funny thing. I've never answered any poll either.

Max: So you're messing up your own models, huh?

Alex: Yeah. But I think it's just, this is one of the funny things of randomness. Basically, I think we both live in huge cities with millions of inhabitants. We're probably not in an underrepresented group. And so all of these factors, you know, come in and basically, yeah. If you're a white male living in a big city, I think it's quite normal that it's super rare that you're contacted for a poll, because keep in mind that polls usually have a sample of around 1000 people. And these 1000 people are not all people living in New York, because otherwise the poll is really weird. And so yeah, if you do the math, I think it's 20 million people living in New York and 1000 people in the sample.

Max: Well, less than that, but we have a metro area. So another strategy we've seen is kind of these like observational and corrective studies.

Alex: Yeah.

Max: And that's a little bit, you know, you've done it. And I'm not sure how widespread that is. It might be everybody does this, but they kind of attempt to correct for that imbalance. They're like, well, if we don't have a, well, first of all, you do want to have underrepresented groups, like you said. You don't want to have, if you have 90% of the population that's all very similar and then 10%, where they're just very different types of people, then I think you want to focus on those 10%. Because among the 90%, you could probably get a good signal by polling them just a little bit.

So I used to do this in attribution for online marketing, where we were trying to figure out what people would have done had they seen the ad or what they would have done had they not seen the ad, and we were, we had to kind of focus on underrepresented groups and that sort of thing. So how common is this practice first of all? Do all pollsters do this? And what could go wrong there? Because I could tell you in marketing attribution, there's a lot that could go wrong.

Alex: Yeah. Yeah. So I think you're talking about reweighting practice that pollsters use. And yeah, of course, this is used by every pollsters, I'm guessing. Because, as we said in the beginning, basically, you can't have a random sample and polling is more and more difficult because people don't answer the phone anymore. And also, just getting people via the phone is not representative. So it's super hard now. It's harder to reach people randomly and to reach than random sample, a true random sample of the population.

So you have to re-weight your sample. So that means that, for instance, imagine that 20% of the population of voters are between 18 and 24. And in your poll, you're getting only 10% of the sample that are between 18 and 24. So that means that you will re-weight your poll, and you will basically count as two every person that is between 18 and 24. I'm simplifying a bit. But that's basically what this means - reweighting. That's quite transparent as a term. It's just that each person between 18 and 24 in the poll will count as two. So the problem here, there are many of them, of course. First, you mentioned that if, only by chance, you get the 10% of people between 18 and 24 that have a weird behavior compared to their class of age, then you were like –

Max: Just by randomness. Now your sample is much smaller, because it's not your total sample. It's like just your sample in this group. And even if your total sample you think is enough to be significant, this is, this could have some random –

Alex: Yeah. Because if you look at the people between 18 and 24, there won’t be a lot of them in your poll. And the lower the number, the higher the chance to have outliers. You know, it's like –

Max: If it has a big weight, then that outlier could outlie the whole poll.

Alex: Exactly. If you get, there was a story like that in 2016. I think it was a newspaper in LA, I don't remember the name, that did some polls, and they had, I think only one black voter in their sample. And this one black voter that said that he would vote for Trump, which was like super weird, because usually, the black voters really dislike Trump in the polls.

Max: But it was about 8%, so not impossible.

Alex: Yeah. And so that messed up, that messed up their sample, right? So that's one of the risks. Another risk is that you have to choose what, which factors you are re-weighting on, because you can't re-weight on everything. It's like you can’t say, yeah, I'm re-weighting for these and for that and for this and we’re basically re-weighting for every possible factor.

Max: Yeah, that's what we were trying to do for attribution. And it got like, it got crazy.

Alex: Exactly. You can’t do that. It's like trying to prove that something is not dangerous. Like, you can't prove that something that like, I don't know, a chemical has no risk. It's impossible. So you have to choose which weights, which factors you are re-weighting.

And so these can mess up your polls. Like, for instance, in 2016, one of the conventional wisdom is that polls didn't reweight enough for education. And this is because education has been more and more indicative and predictive of what you're gonna vote for, which party you're going to vote for, but this is quite new in the US, because before it wasn't as polarized on education. But it seems like it's becoming more and more predictive. And so you have to reweight on education. And if you don't do that enough, then your polls will have some problems. So that's basically at least two of the big problems.

So this second problem, by the way, is tied to what I was saying earlier. This is basically a statistical model. So you have to make choices, and you have to make assumptions. Because if you don't make assumptions, you can't have a model. And so if your assumptions are wrong, then you're gonna have problems. But the good thing is that, then you can go back on your assumptions on the next election cycle, and you know, correct course. So that's how you do science. Basically, that’s how you learn.

Max: But then I feel like the next election cycle. They're always like, “This time, we're right.” And then there’s still more problems.

Alex: Yeah, there is the risk. There is a risk, which is fighting the last battle, you know, and overcorrecting, overcorrecting your course. And like, it's true that if you draw conclusions only on one sample, like, “Oh, yeah, in 2016, we were underweighting for education. So now, let's be super aggressive and weight a lot for education.” And you only have one election where you had this problem, then probably you're overfitting, and then you could have the inverse problem in 2020. You know, so I'm not sure about that. Because if you look at 2018, for instance, like education was super predictive, but these can happen like imagine for other predictors.

Max: Yeah, we don't know what features are going to come in and out of importance over the years.

Alex: Yeah, yes. That's what makes it a hard job but an interesting job. You know, but –

Max: Well, yeah, makes it interesting, too.

Alex: But also what I want to say is that, you know, often people are kind of shocked by these re-weightings and so on. It’s as if, you know, you get an unpure poll or stuff like that. And people want the raw version, you know, like an organic vegetable. They want the rawest version possible of the poll.

But keep in mind that we have to do that, because the raw data is not actionable, you know. Because the raw data is bad. The raw data is messy. And you have to do all that precisely because the raw data is not good enough, you know. If the raw data was a pure random sample of the population, then you can be sure that pollsters wouldn't care for all this muddling, you know. Modeling voter turnout is super hard. And then re-weighting and so on, this is super hard and it costs money. So you can be sure that they would like not to do all that, I guess. So the bottom line here is, answer the polls when you are asked to do it, you know?

Max: Can you tell us about, do you know any situations that have come to mind in like France, or where you have done some modeling where things went off the rails a little bit?

Alex: Oh, yeah. Yeah, of course. First, before talking about that, there is a quite new and interesting method for statistical modeling that appeared based on this problem of random sampling and so on. It’s a method that was spearheaded by Andrew Gelman, who is one of the most renowned researchers in the Bayesian framework.

Max: Sure.

Alex: And who is in New York, by the way, at Columbia.

Max: Yeah.

Alex: And so he spearheaded a method called multilevel regression and poststratification. And this method, basically, is designed to balance exactly the problems that we were talking about with polling samples. So basically, you get a polling sample. And then what do you do with that? How can you reweight this poll to draw inferences on the true latent support for this party in the population?

And basically, just very broadly, you can use demographic data in, for instance, each state of the US, and then you do some reweighting and so on based on these demographic data and the sample from the poll, and then you can get inferences and forecasts at the state level, even if your poll was at a national level, for instance. So that's a super nice method. It's really interesting. You have to have nice polling data, and you have to have really detailed samples from the polls, which basically I can't do in France. You know, I don't have enough detail from the pollsters to apply the method. So I can’t do that. But in the US, I think you can, because the details from the pollsters are better. So this was kind of a nerdy detour, but I wanted to mention that because it's kind of nice –

Max: Yeah, no. Our audience likes that stuff.

Alex: Yeah, it's a cool way. It shows that you can get inventive, because you have these drawbacks from pollings and so on. And then you can. Yeah, I mean, it's not lost. It's just hard. It takes time, but it's not lost. And so to answer your question now, so about some –

Max: Yeah, tell me about a time things went off the rails in France or some polling got . . . What happened?

Alex: Well, a funny thing is like, one of the biggest shockwaves that we have known in French politics recently is in 2002, when, for the first time, the candidate from the far right party got to the second tour, the second round of the presidential elections. So presidential elections in France are two rounds, and the top two from the first round go to the second round. And it had never happened that a candidate from an extremist party went to the second round.

And for the first time in 2002, and pollsters didn’t really see that in the data, because, actually the race was really, really close. Actually, it was, like super close between the leftist candidate, the traditional left candidate, and the far right candidate. So basically, it was like a half a percentage point or something like that, you know. So it's super hard. But this was like a huge shock in French politics. And so pollsters had to react to that.

And so now the funny thing, actually, is that I think they are overcompensating for that. Because I did a study about that for Polls Position. And basically, I looked at all the polling averages for all the elections that happened between 2002 and 2017. And in almost all of these elections now, the National Front, so the far right party is overestimated by polls, like almost all the time. So it's probably a sign that pollsters are overcompensating for this error. And also, because, you know, everybody's focusing on that. So I think pollsters also have incentives to overestimate the far right party, because it's less, it's not as bad for them to overestimate the far right party than to underestimate it. And so, it's true that if I were a pollster, I would tend to prefer to overestimate the far-right party.

Max: That was even in the news in the US when it happened because I remember it.

Alex: Yeah. Yeah, that was like super huge. And so that's an interesting thing. And that's where having a model, and especially a Bayesian model, is interesting, because you can put that assumption into the model, and then the model can take that into account. And since there is uncertainty around the polls and around all of your model, then your forecast can account for that.

And surely the last forecasting model that I did, which was in 2019 for the European elections, it took that into account and it was quite good. I anticipated that the far right would be overestimated, and it was. And the polls were kind of wrong. I think it was quite a big mistake if I remember correctly. But the model got it right, because you had these uncertainty around the polls.

And also, what's interesting at the same election was that there was a polling error also for the far left party, which was also overestimated. And the model got that right, too, because the polling error was within the historical bounds of polling errors. So it was, if I remember correctly, something like a two or three point error. And historically, French pollsters are quite good, you know. And so that means that it's very rare that you have a polling error of more than three points. But that means a confidence interval of plus or minus three. So it's already quite big, you know. But what's interesting is that they did a huge polling error on the traditional right party, and they overestimated it by eight points, I think, if I remember correctly.

Max: That’s a lot.

Alex: Which is like super huge.

Max: In a divided election, especially.

Alex: Yeah, exactly. So it means that if you put that into historical perspective, it's like more than twice the average historical polling error. So it's super, super large.

Max: Yeah. I feel like we have this issue in the US though, where third-party candidates often, you know, a month before we get close to the election, actually poll very high. They do well in polls. And then, as we get closer to the election, their polling just collapses to like 1%. I'm never sure what's going on there.

Alex: Yeah, I think it's also bigger in the US, since, I mean, voter choice is constrained because you basically only have two parties. I mean, I think you have some, you know, aspirational people that give aspirational answers to the polls. You know, people saying, “Oh, yeah, I'm gonna vote for, I think, [unintelligible 00:34:24] or something like that, you know.”

Max: Yeah. Well, I remember last time, there was a – Evan McMullin was running in Utah and people were like “Utah's not gonna vote for Trump. They're gonna vote for Evan McMullin.” And, he was ahead in one poll. And then no, not at all. And Gary Johnson was getting 20% in some states, which is like, which would be huge, but like, no, they all dropped. Sub five percent, two percent, one percent.

Alex: And I mean it’s also, I guess one of my hypotheses would be that people don't want to waste their votes, you know. And they know that it's like super, super rare that these kinds of things happen in the US. And so maybe they would really like to vote for a third or fourth party candidate. But they see that 15 days before the election, the candidate is at like 10% or 12%. You know, and it's quite high. It's quite honorable, but they're like, “Yeah, but he's never gonna make it. So I have to vote for the Democratic or Republican candidate, because otherwise I'm just wasting my vote.” Which is actually not what you want in an election. It's not good that you guys are in this position.

Max: And sometimes, I mean, like, here in New York State, I feel like I could do whatever I want, because it doesn't matter.

Alex: Yeah and that's a problem. That's kind of a big problem. You should be able, I think, you should be able to vote for whoever you want, you know, but that's –

Max: We’ve talked a lot about different voting systems. We'll have more discussions about that in the future. Episode 126 is the one I'm going to point people back to. Let me just make sure that's correct, because there was one – all right. Yes. I talked about the Electoral College in 126, but I talked about other things as well, a lot of election-related possibilities.

Alright, so coming back to my first point, you know, I don't know if you have an opinion on this. If not, we could skip the question. But I've seen people say, the polls were way off in 2016. And then, I see other people say, well, they weren't that far off. They were just a little sparse in some of the key states, like Wisconsin threw everything off. Did you look into this? Like, what is your sense of that?

Alex: So I think both actually, both assessments are true. It's just that we're not talking about exactly the same poll. National polls, they were quite right. They were like, I think the error in the end was two points, something like that, which is, like better than what they did in 2012. I mean in 2012, I think they were off by, I think, four points if I remember correctly.

Max: I feel like that could have been accidentally right, though. You know, because if it was overestimating in some states, and underestimating in other states, it's like, well, I made one error here and one error there, and they just happen to cancel out.

Alex: Yeah, yeah. But that's why also, that's why it’s interesting to take an average, you know. You're hedging your bets. But so at the national level, polls were really not that bad, and actually better than in 2012. The problem is that, at the national level, in 2012, they predicted the right winner. In 2016, they predicted the wrong winner. And people tend to think, you know, in a binary fashion, and they just look at, “Oh okay. They had the wrong winner. So polls were crap.” Which is not true at the national level. At the state level, though, the situation is much more disparate. And you had, indeed, in some states, they were quite good. But in a few of the key states, they were really, really off and that's a problem, because you vote locally in the US. You vote by states.

And so national polls are interesting to get a sense of where the race is at the national level, but they don't account for all the uncertainty and all the correlations between states. So, again, that's why your model is interesting, because it takes into account all the correlations between the states and also the Electoral College math that you have to do. And so usually, you have to take more, pay more attention to state polls, and yes, state polls in 2016, they were off by quite a large margin in some key states. And that was a problem.

Max: All right. So the 2020 election here in the US is less than a month away. I'm not going to ask you to make a prediction, don't worry. What are you going to look for in the polling as it comes out in the next couple weeks? And later, in the election returns, let's say on election day, what are going to be some of the indications of how things are playing out, in terms of whether the polls are accurate or not, or that will tell us which way it's gonna go?

Alex: Well, so there are, I think, several, of course, several things to say here. But I think usually, what I look for is more how are the averages and models moving. And I usually don't pay that much attention to individual polls. Again, because it's more noise than signals and especially because you have so many polls in the US. I mean, it's impossible to follow all the polls, national, state level polls and so on.

So what I'm gonna look for is, are there any movements in the averages, and in the polling, and in the modeling? And I think a good heuristic here is that every day that passes where Biden is still ahead by eight or nine points is a good day for him. Because as you get closer to the election, there is less uncertainty around the voting and around the polls. And so if Biden stays up by nine or eight points, this is quite a big lead. And it becomes less uncertain. And it also means that Trump has less time to gain more ground, so it's really good. It's really good for him if things stay as they are. Every day where things stay as they are is good for him. And so –

Max: It is. It's 2020, though. It's tough to say something can stay where it is.

Alex: Oh, yeah, yeah. 

Max: But I would be happy if it were like two days to go and nothing's changed then. If I were Biden, I’d be like, “Okay, we're...”

Alex: Yeah, yeah. Clearly, it's not finished. Because if I remember correctly, in 2016, at this stage of the race, Hillary Clinton was still ahead, though by less than Biden.

Max: Not only was she ahead. Not only was she ahead, but this is when the Access Hollywood tapes came out. And so it was like, oh, now he's gonna be even worse or totally done. I’m like, what happened? 

Alex: Yeah. But she was at plus six again, which is not as comfortable as plus eight or nine, because the Democrats have these systematic disadvantage in the Electoral College. So plus six is good, but if you get to plus four, then you get in weird, you can get into weird territories. And then if you are a plus two, basically, it's like 50/50.

Max: All bets are off.

Alex: Yeah, so. But plus eight is really comfortable, you know, and things are nonlinear in probabilities, because you have floors and ceilings and so on. So. Plus eight, plus nine, I’d feel good right now. But yeah, I think there are still three weeks to go, right. So we'll see. I think as long as COVID is in the news cycle, this is not good for Trump. So I bet.

And if you look at the history of the evolution of the averages and the models in the campaign, they're mainly good for Biden. At the beginning, Biden had a two in three chance of winning the Electoral College, which was exactly the same probability as Hillary Clinton on election day. And now he has an 85% chance of winning, so it's much better. And Trump didn't really make any improvements since then. He was always at one third at best.

So honestly, if I had to bet today, I would bet on Biden. Of course, if you allow me to change my bet two days before the election, maybe.

Max: It might be different.

Alex: But I mean, I'm a Bayesian. So I like to bet and I think in bets.

Max: Right.

Alex: So I will not have any problem. I know you –

Max: I think you would agree, you're not going to – it's imprudent to veer too far off from 50% at this point. Like, okay, maybe 60–40, but, you know.

Alex: Oh, you think? I don't think so. I would be more aggressive here. 

Max: Really?

Alex: Yeah. Yeah. I mean eight points is really big.

Max: Yeah.

Alex: So honestly, if you ask me, if I have to bet today, I would bet on Biden, and I would feel pretty confident in that. I mean, even if the polls are off by two points, so like in 2016, this will still be a six-point lead, which is quite big. I know you didn't ask me for a prediction, but I think predictions are actually really good. And it's good to do predictions publicly. And I know you do that on the podcast, and I really love that, because I think it's kind of an intellectual laziness when people go to some media and then say, “Oh, no, I'm not making predictions because it's too hard, blah, blah, blah, but I think this is gonna happen,” you know, I mean.

Max: But I also don't like it when people make predictions and then they don't come back to them. You know, it's just to score points at the time, but then they don't come back and say, “Was I right or was I wrong? I have to examine things.” And if you’re right, then you could say, “Hey, I told you so.” And if it was wrong.

Alex: Yeah. That's why. Exactly. That's how you learn, you know? That's right. It's interesting to do electoral forecasting. So we'll see. But if you asked me to bet money today, I would bet on Biden.

Max: What would your spread be if it's not 60–40? How high would you go?

Alex: Well, for now, the models have Biden winning between 85% and 90%. So if I hedge my bets, I would say 80–20.

Max: Okay. All right. Good to know. Alex, thanks for coming on the show. Any last thoughts on this? And please remind people where they could find you online?

Alex: Yeah. Last thought is related to, you know, the polling misses that I was talking about earlier. In France, for instance, in the 2019 election, this big miss in the polls, for the right party threw off my model for this party. Why? Because the polling error was so huge in historical comparison, that, basically, the model doesn't know what to do. What do you do when you observe such a polling error that you didn't observe before? And so the model doesn't know what to do, but it doesn't know what to do, because you wouldn't know what to do. I mean, what do you do? What's the good thing to do in these cases? You know, you should have a huge uncertainty in these cases.

So it's often, it's really easy to go and say, “Oh, yeah, the polls are crap, and the models are stupid, and they were completely off in 2016.” But this is a hard job. And I think having really set opinions on how polls are good or bad, and how models are useful or not, is really not warranted here. And I think we would do better if we all had more uncertainty and felt more probabilistically you know. That would make my day easier.

Max: Agreed.

Alex: And so just to finish on that, I'd be happy to talk about all that with people. They can get in touch with me at Twitter @alex_andorra. Like the country, tiny little country in Europe. And yeah, basically, I'm also doing a podcast. You were there, Max, on Learning Bayesian Statistics podcast, and so people can check it out. The link is on my Twitter or otherwise, you just type “Learning Bayesian Statistics” in your favorite podcatcher and it should pop up.

Max: Alright. Great. All of that will be on the show notes page, localmaxradio.com/140. We have three more episodes until the election. So if you're getting tired of the election stuff, three more episodes left. That’s all you have to go through.

Alex: Yeah. I am not, I really don’t.

Max: I don't know if I can talk about the election in all three episodes. Yeah, I’ll probably do a check in. So, Alex Andorra, thanks for coming on the show.

Alex: Yeah. Thanks a lot, Max, for having me again.

Max: All right, that was great. I just have two follow-ups from previous episodes. One of the stories that we followed, that we covered in the last episode, was Coinbase. And if you remember, Coinbase gave their employees a very, very generous package to leave the company if they were uncomfortable with the rules that restricted political activism in the workplace. At the time, only three employees had taken it, but Ars Technica now reports that 60 Coinbase employees have taken the severance, or about 5% of the company. It's interesting. Some of those might not have cared about Coinbase's new rules. It might have just been that they were incentivized by the really nice package. They might have been like, “Hey, I could collect double salary for six months. Maybe I'll do it!” Or maybe anyone who was already thinking about leaving was on the fence.

So is that a bad thing? I don't necessarily know. It's a very interesting gamble. It's not clear how it will play out. Will the company have more cohesion after this very interesting experiment? So I will continue to follow this and we'll follow up when we have more to say.

Another one is, we talked about the Viking DNA last episode, and we were talking about a potential stat, I don't know if it's real, that all Ashkenazi Jews are sixth cousins or closer. And a friend of mine, actually the one who retweeted the Viking DNA story, so he's interested in that stuff, sent me a 2017 Medium article entitled “No, You Don't Really Have 7,900 4th Cousins.” The article is saying that a lot of these DNA tests will say, if you're an Ashkenazi Jew, that you have a lot of fourth cousins, when really it's not necessarily fourth cousins, it's people who share multiple ancestors further up the family tree. So further back in time.

That is an interesting idea. There's actually a lot of interesting mathematics to genealogy, and a lot of interesting things happen when you trace people back in time and look at shared ancestors. I'm sure this is true with other ethnic groups as well. So yeah, maybe I can do a whole episode on genealogy math. That'd be pretty cool. All right. So with that, have a great week, everyone.

That's the show. Remember to check out the website at localmaxradio.com. If you want to contact me, the host or ask a question that I can answer on the show, send an email to localmaxradio@gmail.com. The show is available on iTunes, Soundcloud, Stitcher, and more. If you want to keep up, remember to subscribe to The Local Maximum on one of these platforms and follow my Twitter account @maxsklar. Have a great week.
