Episode 292 - Copyright Clashes, Literary Science, and Rational Distributions
Max and Aaron do an August news update, with a discussion of the closure of Prosecraft, a small site dedicated to literary analysis that was shut down over copyright and ownership concerns. Will these types of internet mobs and legal actions have a chilling effect on innovation?
Probability Distribution of the Week: Max's First Idea on Rational Distributions
Links
The San Francisco Standard - Robotaxis: California Regulators OK 24/7 Self-Driving Car Expansion in San Francisco
The New York Times - Superconductor Scientist Faces Investigation as a Paper Is Retracted
WIRED - Why the Great AI Backlash Came for a Tiny Startup You’ve Probably Never Heard Of
The Mary Sue - Guess What, If You Build an AI Project That Steals Data From Thousands of Books, Authors Are Going To Be Mad!
CBS News - Robin Williams restricted use of his image for 25 years after his death
Medium - The Intellectual Yet Idiot
Transcript
Max Sklar: You're listening to Local Maximum episode 292.
Narration: Time to expand your perspective. Welcome to the Local Maximum. Now here's your host, Max Sklar.
Max: Welcome everyone. Welcome. You have reached another Local Maximum. I'm joined today by Aaron. Aaron, thank you so much for coming back on the show. I really need you, because after these two solo shows, you know, there's only so much I can talk on my own. And I'm sorry, I need to get some more guests on here. But we're working on it.
Aaron: I thought last week's solo show was a good one. I was getting really animated listening to it. And I think it'll actually tie into our main topic today. But before we get to that, well, yeah, you had some stuff you wanted to talk about?
Max: Yeah. Well, last week's episode was twofold. Right? It was about archive.org. And I think that's the part that's going to tie in today about how man, people trying to give away free information, especially the little guys who are doing it are getting crushed, because a lot of these articles are about, you know, not major corporations using you know, people's works for free and stuff. It's about nonprofits and small players.
And that to me is somewhat strange, alarming. I don't know what it is. But again, if you listen to it, I don't have a full prescription for it. I have feelings. I have facts. I don't have a full story to go along with this.
But these issues of copyright and fair use have been important in the past, and they're going to take on new meaning in the next few years, I think. All right. So before we get into that update, you remember I wrote a new constitution. I am almost ready to publish what I have here. But as you can see, this front page has a bit of markings. So I'm still writing.
The second page has a lot of markings, but the rest of them have almost no markings. And I went through it. So I just need to rewrite. It's always like those first two pages, I want to be perfect. And I don't know how to do that. But I think there's a lot that we could talk about.
There's a lot I want to talk to you about. Maybe we could do an episode on apportionment and different electoral systems that we can try that might work that I've found through this. And I think we enjoy talking about that. I assume you enjoy talking about that.
So maybe we could talk about that in a future episode. Last week, there was the paper on the Bertrand Paradox. I don't know if you remember that one as well. So that was kind of interesting. I don't know what else to say about that other than, did you get the impression that the Bertrand Paradox was resolved by this fourth option?
Aaron: I don't think I was following it closely enough to be confident in that.
Max: Yeah, yeah, me neither. Alright, so we're gonna go through some headlines, and then we're going to talk about copyright and data science. Headline number one: a California commission allows Waymo and Cruise to operate taxi service in San Francisco, removing most of the restrictions. You know, the restrictions before were that they had to operate between this time and this time.
And, you know, if you wanted to charge money, you had to use a safety driver, and things like that. Apparently, according to these articles, a lot of these have been removed, and they could be 24/7 throughout San Francisco. The commission is the California Public Utilities Commission. I'm surprised that San Francisco is, well, maybe I'm not surprised that San Francisco is doing this.
But man, San Francisco has a lot of other problems to deal with, that they are not dealing with effectively. You know, such as, you know, all their stores shutting down. I don't know where you're going to drive other than driverless car, get me out of here. But no, I'm glad they're doing it. And I feel like everyone's like, this is the turning point. This is it. And I say that a lot. But again, this is always a step by step industry.
Aaron: Yeah, and I don't have it in front of me right now. But I was under the impression that one of these companies had also been given permission to operate in a couple of additional cities.
Max: Oh, yeah, that's good, too.
Aaron: What was it? Yeah, back in July. It looks like Cruise got permission to begin operation in some East Coast cities.
Max: Anywhere near us?
Aaron: I don't think so. Oh, Miami is mentioned so not near us. But someplace that you occasionally do go.
Max: Yes. Occasionally as in once every couple years, maybe. So it might be a while. But yeah, okay, exciting. Maybe I want to say like, you know, the real East Coast, northeast, they'll hopefully be coming soon.
Aaron: Well, there's some challenges there that may not be present in Miami or California when it comes to winter driving, right? I don't know if that's actually a heavy lift for autonomous driving, but I could see it being something that they haven't yet optimized for.
Max: Yeah. Although California, particularly Southern California this week might be a little rough with it, hurricane and all that. All right, semiconductor flop, semiconductor scientists, according to the superconductor. Oh, right. Semiconductor, what's a semiconductor?
Aaron: So semiconductors is just the whole silicon industry, right?
Max: That's true. We talked about semiconductors, didn't we, when I...
Aaron: Semiconductor actually has a very specific definition, but I don't have it at my fingertips.
Max: I did a whole show; I covered semiconductors. All right, sorry, superconductors. Superconductor scientist faces investigation as paper is retracted. I don't know what a superconductor is, to be honest with you. But apparently, we're not getting room-temperature ones. Yeah. And so that was a big deal. And now it's not a big deal anymore. So I'm glad we did.
Aaron: It was at least fascinating to see attempts at replication happening in real time, and a resolution to it. Now. Now, let's just do that for everything else in science.
Max: Yes. You know what? That's right. Because it kind of goes to show how a lot of the "oh, it's been scientifically shown there's a link between this and this" claims, like, none of that's replicated. And how, okay, well, you could make these claims, and oftentimes they turn out not to be true if they can't be replicated.
Aaron: Yeah, it was exciting to live through. Disappointing that it didn't pan out.
Max: Yeah. Well, we'll live through some other cool stuff in the future, I'm sure. All right. So tomorrow, I'm going to the Soho Forum. This is a debate between Corey DeAngelis and Stephan Kinsella; the debate is on school choice. But what's interesting is that I know Kinsella's big thing is that he argues against all copyright law and all patent law; he thinks it all should go away. I'm sort of, you know, I kind of am interested in that position.
And that's, that's kind of interesting for our discussion on Prosecraft today. I think that, you know, we don't have to take that extreme position to look at what's going on here. And, you know, try to try to pull it apart. But let's talk about Prosecraft. So there's this article in Wired about a small company called Prosecraft that was shut down after being pounded down by the hordes of the internet mob. Does that sound like what the article was going for at least?
Aaron: Yeah, and to refer to it as a company is perhaps overstating it.
Max: Yeah, it's a guy with a project which is like me with a lot of different things. So I kind of feel for this guy.
Aaron: Yeah, there's — I guess David versus Goliath is not the right metaphor here. It's more like David versus the unruly mob.
Max: David lost though,
Aaron: Yeah, yeah. Well, he's, he's, he's walking away. But he's, he's taken the L.
Max: Right, right. He's not, he wasn't torn apart by the mob. He just said, okay, I'm out. Alright, so the article opens: Hari Kunzru wasn't looking for a fight on August 7. The Brooklyn-based writer, that's your problem right there, sat on the subway scrolling through social media when he noticed several authors grumbling about a linguistic analysis site called Prosecraft.
It provided breakdowns of writing and narrative styles for more than 25,000 titles, offering linguistic statistics like adverb count, and ranking word choices according to how vivid or passive they appeared. Kunzru pulled up Prosecraft's website to check whether any of his work appeared. Yep, there it was.
Now, first of all, I should point out, you know, this is making the forefront of the news because of, you know, generative AI, LLMs, GPT, all that stuff. This kind of stuff has been available for decades. These algorithms are not particularly sophisticated, is my understanding.
Aaron: Yeah. You certainly could use some AI for this, but then digging a little bit deeper, it sounds like that's barely the case in this implementation here. Yeah, maybe he talked about AI in his, you know, in his funder deck, trying to get investors. But this is not a large language model. This is not OpenAI or ChatGPT. This is very different.
Max: Right. But, I mean, I do think that the connection is those tools and those breakthroughs or those innovations have put the spotlight on this kind of thing.
Aaron: Absolutely, people are a lot more sensitive about how their work, their data, their material might be used because of concerns that have been raised by LLMs in the last six to 18 months.
Max: So many authors saw their books on there, and they wanted their books or their works taken down. All he was doing was analyzing the writing styles, it sounds like. And people use the loaded term "steals." I saw there's another article on themarysue.com where it says Benji Smith steals thousands of books. I kind of picture him running into a bookstore and just, like, you know, putting some in his backpack and carrying a stack out and hobbling away, just stealing the books like that.
That's very, very Hamburglar-esque. Yeah, I know. But no, you know, a lot of this stuff is just laying around on the internet. And people want to treat it the same way. And I think over time, I don't think that analogy is going to necessarily hold up, but we'll see. Some more stuff from the article.
Somewhat harsh words for Smith: entitled tech bro, soulless troll, scavenger, shitstain. If I may say that on the podcast. Bloody hemorrhoid. Yeah, this is already you know, this is the stuff for the podcast. Sorry for those of you who are listening to us over meals.
Aaron: These are published authors, I expect better from their insults. This is very pedestrian.
Max: Yeah. Well, you know, I think the problem here is that who is um, look, I don't know this person, Benji Smith. But he doesn't sound like- he sounds like a software engineer who decided to build a project. He doesn't sound like some troll or some tech bro, as they say.
Aaron: But he's not a liberal arts major. Therefore, he is a tech bro. And part of you know, big brother and-
Max: But what kind of tech bro is like, you know how we're going to screw people over and make lots of money? We're going to do literary analysis.
Aaron: Stop asking reasonable questions.
Max: Okay, until this past May, Smith had held a full-time job as a software engineer, but he quit to focus on his startup, a desktop word processor aimed at literary types called Shaxpir. Pronounced Shakespeare, but spelled Shaxpir. Shaxpir doesn't make much money, not enough to cover its cloud expenses yet, Smith says, less than $10,000 annually, but he's feeling optimistic about it. You know, don't we all.
This doesn't sound like someone that's out to rip people off, you know. And there are a lot of people around who rip people off, but it doesn't sound like one of them. Another quote: "Smith saw this as a grimy means to a justifiable end. He doesn't defend his behavior." Now, I understand why everyone's upset, I should do voiceprint. But he wants to explain how he defended it to himself at the time: "What I believe would happen in the long run is if I can show people this thing, people would say, well, it's so cool. And it's never been done before. And so fun and useful and interesting. And then people would submit their manuscripts willfully and generously. And publishers want to have their books on Prosecraft," he said, "but there was no way to convey what this thing could be without building it first. So I went about getting the data the only way I knew how, which was it's all there on the internet."
So I don't think he did anything wrong. In fact, it sounds like he fully intended it to be ultimately some kind of a voluntary thing, you know, once he started to make money off it, which he never ended up doing. You know, he wasn't profiting off their work, and it didn't seem like he had.
Aaron: So to be fair, yeah, there's a lot of fair use ambiguity, and part of the problem with fair use is, you know, if you were to, say, sample a piece of music, you know, if we wanted to use commercial music for the intro to the podcast, there's not a lawyer who will tell you definitively "Oh, you can use 10 seconds but not 20. Or you can use 15, but not 30," because there's not a clearly defined set of criteria and rules for where fair use ends and copyright infringement begins.
It's a multi-point test that involves a lot of balancing and feel. So this is a huge gray area. That being said, sorry, where I was going with that is one of the criteria is whether it's a commercial or, you know, a non-commercial use. And while he's not yet making money off of this, and at this point probably never will.
His intent was for this to be some sort of business, to build a business around this idea, this concept, this tool. So he was using it, or attempting to use it, in a commercial fashion, and that weakens some of the fair use arguments. But I think, given what he was doing with that information, he still very much had a leg to stand on if he had so chosen to stand and fight.
Max: Well, if you just step back a little bit and just think, economically, or just think in terms of innovation, in terms of the ability to bring products to people: is this a project that's basically taking someone else's work and repackaging it in some unfair way? Or is this an innovative project that's bringing new information into the world?
Aaron: I would say it's absolutely the latter, unless there is a way to access the database that he's used to test and train this and then extract the full text of a novel from it, which I don't know if that's even possible. And if it was, I'm sure that there are ways that you could protect against that. I mean, is it not like Google Scholar, or...
Max: Google Books?
Aaron: Maybe Google Books or Gutenberg or something that they have the full text of, of books in their database.
But when you go to Google Books, you can only view a couple of pages from it. You know, if you're searching for something, it'll find, oh, the part you're interested in is on page 65, and you can view from page 63 to 67. But it's not gonna let you read through the whole book, at least not in that search.
Max: Can’t you get all these books for free in the library?
Aaron: Well, that was one of my big questions. I think the concern here, putting aside the transformative-work-or-not issue for later, is not so much that he's using our material and not paying for it, because there are ways that he could access this without having to purchase it directly himself: he could go to the library, he could check out the books, he could read them.
Now the question then becomes, because he's not doing this with his human eyes, counting words and counting adverbs and uses of the passive voice or whatever; instead, he's processing this with an algorithm and automatically analyzing it.
How slow would he have to do this for them to not be upset that a computer is doing it instead of a human being? Which quickly degrades into a how many angels can dance on the head of the pin question? It's people being upset because it’s computers doing it, divorced of what actually is being done.
And much like quantity has a quality all its own, the speed of doing something simple can be somewhat transformative in the nature of the output. But I think you have to step back in this case and consider what is actually being done. It's an analysis that could be done in a much more painstaking and less rewarding fashion by an intern or a graduate student, but he's built an algorithm that does it.
Max: But what I'd like to know is, how do these authors make money? And what are they worried about? Are they just like, "We don't see that this is hurting our wallet directly, we just have some kind of ideological 'you're stealing my stuff, I need to protect my stuff'" kind of thing?
Aaron: I think it's a little bit of both. I mean, you could make the argument that, marginally, even if he went out and bought a copy of every single book that he used in his training set or for his proof of concept, most authors are making, you know, pennies, less than a buck a copy on this. So to each individual author, this is meaningless.
But if you take that marginal difference and apply that argument to you know, thousands and thousands of potential sales then actually it starts adding up into real money. So you gotta be careful with that argument.
Max: But is there a good argument to be made that his website, if it were popular, would be harming book sales? I think it would increase the sales of books!
Aaron: No, in fact, I would think it would be the opposite. Unless, I could see a case being made for: your website has analyzed my work, given it a poor rating, and said that it's a piece of literary garbage, and so that is reducing my sales. But as the kids say, get good, you know? Yeah.
Max: The kids are saying that nowadays?
Aaron: Maybe that was kids a decade ago. But it's either the algorithm, you know, his tool will be determined to be not useful and flawed, or it'll be providing valuable insight. And the only way to determine that is to actually apply it and use it and get the results.
What it is absolutely not doing: people are not going to say, oh, I can just use Prosecraft and look up, I don't know, the next George R.R. Martin book, and now I don't have to buy a copy of it and read it, because I've looked at the Prosecraft analysis. That absolutely isn't the case. It's not reducing sales or demand in that way.
Max: I mean, it might be good for recommendations. Like, here's a book that I liked, and maybe I'll read a book that has a similar analysis, because it fits. Absolutely.
Aaron: And I think authors are concerned that someone is using my work, and they found a way to extract value out of it, and I'm not getting a slice of that value. Which, I mean, there's an angle to that that is very similar to what we've talked about in general in terms of our personal data: that, you know, the Googles of the world and all the other large social media and big tech companies are collecting our data, and they are extracting value out of it and reselling it, and we don't get a cut of it.
And so this is just that through the eyes of the fine arts. And so I think they're not entirely unreasonable there. But I think they're definitely barking up the wrong tree with this approach.
Max: Yeah, I mean, I think by cutting off innovation, you're ultimately harming the industry you're in. It's like they're saying, oh, if we allow all this stuff, it's going to harm the publishing industry, but it actually hurts the publishing industry if you don't innovate at all. And I think they're missing that. I think someone's missing that.
Aaron: Yeah, well, and I certainly think that what they're looking for, and one of the things that was mentioned in the article, is that a lot of authors are inserting clauses into their contracts basically prohibiting the use of their published works for training of any large language models or anything AI related. They're looking to protect those rights, which, okay, that's fine.
But I think what they ultimately want is a pricing model for that, where they can sell those rights they're now reserving. Something that's going to be more than, you know, Google needs to buy one copy of my book so that they can use it to train their AI tools or feed it into their database, because, like we said before, that's pennies at the margin for the author. They want something where every time data from my book is used to support, you know, a query of some sort, I get maybe a fraction of a penny, but if you're doing that millions of times, it starts to add up.
Max: Is that a practical…?
Aaron: I'm sure that there is a model that can be reached with an agreement, but I think they're delusional about how it's going to be structured. I mean, it's not entirely unlike what we're seeing with the writers and the Screen Actors Guild, who are striking currently: they're not happy with the way that profit sharing is working in the new streaming world, and they want a new model that's going to give them a bigger chunk.
What are they providing that will justify that? Well, we'll see how the negotiations go there. But I think they're, they're delusional to think that they, they hold all the cards there.
Max: Now that we've talked about people putting these clauses in their contracts about fair use, that reminds me of that thing you posted about Nassim Taleb, at the end of his-
Aaron: Yeah, so I recently read a post of his, which I think is an excerpt from one of his books, I want to say Skin in the Game. But yeah, he calls out at the end that basically this article can be reproduced, you know, under fair use, just make sure you do it in its entirety and credit the source, except that the following publications are prohibited from reproducing my material without explicit permission. And he only lists one media outlet: it's the Huffington Post.
Max: Huffington Post, all languages, publications banned. It's the only one.
Aaron: I'm curious what the backstory is. Yeah, I'm sure that they have burned him and he has a justifiable reason for holding that grudge. But also, I feel a little bit of schadenfreude there. Like, just on principle, I feel like they're deserving of that kind of a grudge.
Max: I mean, yeah. All I can say is, I mean, I'm sure there's a lot of things that could be listed where I'd be like, Yeah, you know, I don't like them. I don't like them. But it's funny that he has one like, they did something to him in the past that really ticked him off. And maybe he'll keep on including them in like every publication through the future, even after the Huffington Post is long gone.
Aaron: So the one other thing I want to mention is we talked about how some authors are including clauses in their contracts about AI and large language models. It harkens back to, I think it was in 2014, when Robin Williams passed away, and it was revealed that his will, or I guess the trust that managed his estate, had a clause in there that prevented his likeness from being used for 25 years after his death.
And at the time, the big context that people were talking about was: this means there will be no Robin Williams holograms, because I guess the Tupac hologram had just made an appearance at Coachella or something. Yeah, so those did not become as ubiquitous as we may have thought at the time.
But I'm wondering if this is just kind of the next wave of that. And we will see kind of a tapering that we will find solutions for how to deal with this type of licensing and move forward. Or only people who have particularly valuable legacies to protect will be able to put these restrictions in place.
And also, it's interesting that it had a 25-year sunset on it. Maybe authors will allow their works to be used, but with some sort of a grace period, you know? I mean, that's to a large extent how copyright and patent law works. They all have expirations on them. Granted, at least in terms of copyright, what is it, the life of the author plus some number of years or something? So it's a little bit different there.
Max: And Disney keeps extending it out. Yeah. And yeah, so I mean, my two thoughts. Robin Williams was in that movie, Night at the Museum, that came out after his death. So I guess they can't put him in another sequel to that. But hey, in 2039, maybe we'll get another Robin Williams movie.
Aaron: I wonder if the 25-year thing was specifically like, he doesn't want it to happen, you know, during the lifetime of his children. Or if it was simply that 25 years from now, if they stop using me, nobody's gonna care, and so there's not going to be a point to take it off the shelf. I wonder where they came up with that specifically.
Max: He won't be as famous in 25 years, but people will still know who he is. I mean, surprisingly, you know, I mean.
Aaron: In 2040, do you think AI Robin Williams will be able to help move more bottles of Coke? I don't know.
Max: Possibly. All right. So just one tweet, I'll read from @inkbitspixels. Who is this person? I don't know who they are. Nate Hoffelder, they’re a web designer. The one thing that some are missing from the fear over Prosecraft is that it was very likely using works legally.
The TL;DR version is that the app did not create a competing work or reduce the value of a work. So its use of a work for analysis was legal. And that's one analysis. He said it's kind of fuzzy. But you know, there are people who are making that point.
Aaron: Yeah, I think what he says there is true, in terms of it's not competing, it doesn't reduce the value of the work, and it was used for analysis. If it went to court, what the judge would say is an open question, but there's a legit argument to be made there. It's not like this is an open and shut case of, yeah, he was obviously using this inappropriately.
Max: Judges do sometimes, you know, not side with the industry, sometimes they side with the individual. So there you go, can only help. Alright, so now we will start our segment.
Narration: And now the probability distribution of the week.
Max: All right, the probability distribution of the week. So we've done a lot of continuous probability distributions this year, haven't we? They're getting increasingly complex. I don't think I need to go over more of them. I mean, maybe we'll think of some more, but I don't think I need to.
Aaron: A lot of narrow corner cases of oh, this is this is a special case of one that we've talked about before type of situation.
Max: We might be able to find some of those. But will we have anything interesting to say about them? I don't know. And then, of course, last year we did a lot of discrete distributions, you know, distributions on a set of numbers, or maybe the set of all positive integers, or non-negative integers I should say, starting at zero: the natural numbers, let's call them.
So now I want to talk about something that is not covered very often, but I think should be, and that is: could you have a probability distribution on the rational numbers? That is, a probability distribution where, you know, there'd be a certain probability that something is a half, there'd be a certain probability that something is a third, but there would be no probability that it's, like, pi or e or something like that.
Now, the interesting thing. So I think there's a few questions that come up. First of all, like, why would you want to do this? And again, I couldn't find any good examples that I liked, so I'm kind of making up, well, how would I do it? So we're going to talk about Max's distributions over the rational numbers, and I have a couple ideas. So I'm going to start with one today, and maybe we'll go through another one, and maybe I'll come up with a third or fourth as we go.
So rational numbers. Interestingly enough, there are a countable number of rational numbers, which means that the infinity size of the rational numbers is the same as the infinity size of the natural numbers. Now, that seems kind of strange, because you think, well, rational numbers, you know, there's so many of them, if you go between one and zero, there's like an infinite amount. If you take any little bit of space, there's still an infinite amount of rational numbers. But it turns out that you could come up with some sequence that enumerates them, which you can't do with the real numbers, which means most real numbers are inaccessible to us.
I don't know if I want to get into the mathematical kind of reasons for that. But the basic implication of that is that you could design a probability distribution on the rational numbers where any given point has a nonzero probability. So there's like a nonzero probability of a half, a nonzero probability of a third, you know, a nonzero probability of nine fifths, all of that.
And all of the probabilities still add up to one, even though you're adding up an infinite number of probabilities. So that's kind of interesting. That's what you can't really do in the continuum: you can't have every possibility between zero and one carry some nonzero probability.
Because then, you know, it's gonna blow up to infinity; what you really need is a probability density, and you do integration and calculus and all that. So that's an interesting point about rational numbers, I think. I have never run a model on rational numbers, but I'd like to. And the question is, why would you want to do this?
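(A quick aside before the "why": here is a minimal sketch of that countability point, with everything in it chosen purely for illustration. The Calkin-Wilf recurrence below enumerates every positive rational exactly once, and attaching geometric weights to that enumeration gives each individual rational a nonzero probability while the total still sums to one. This is not the construction Max describes later, just a generic way to see that such a distribution can exist.)

```python
from fractions import Fraction
from math import floor

def calkin_wilf(n_terms):
    """Yield the first n_terms positive rationals, each appearing exactly once."""
    q = Fraction(1, 1)
    for _ in range(n_terms):
        yield q
        q = 1 / (2 * floor(q) + 1 - q)  # Calkin-Wilf recurrence

# Illustrative weighting: the n-th rational in the enumeration gets probability
# (1/2)**(n+1), so the weights sum to 1 in the limit.
weights = {q: Fraction(1, 2 ** (n + 1)) for n, q in enumerate(calkin_wilf(20))}

print(sum(weights.values()))    # 1048575/1048576, i.e. 1 - 2**-20
print(weights[Fraction(1, 2)])  # 1/4: an individual rational with nonzero mass
```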
I think, first of all, there's this interesting concept that I am thinking about, which is like snap-to-place: okay, you know, we're running these models, we're coming up with these numbers. Wouldn't it be nice if it snapped to a nice, neat rational number from time to time? Secondly, you know, I think when it comes to priors, real-life processes often involve nice round numbers. Like, okay, the gravitational force in physics, right, what is it, related to the square of the distance? Am I right on that?
Aaron: Yeah, that sounds right. Yep.
Max: So why is it the square of the distance? Why, when you're raising it to a power, is it the power of two? Why is it not the power of 2.1? Why is it not the power of 1.99? Now, it would be kind of difficult; if you were dealing purely on the basis of real numbers, you know, we'd never think that it's exactly two.
But given the fact that we've done so many experiments and it's so astronomically close to that, it's like, okay, it's exactly two, or at least there's some physical reason why it's exactly two. Maybe there's some tiny little error on that that we can't detect, but there's some physical reason why it's exactly at that point. So I feel like there should be some reason, there should be something that snaps you to the important fractions in certain cases.
And then there are also physical processes where it's like, okay, I know I have the ratio of two things. So I know there's one physical process producing integers, another physical process is producing integers. And I know I'm finding some ratio of that.
So I know it's going to be rational anyway, if that makes sense. So, all right. So let's try to think of some ideas on how to generate a probability distribution over the rationals, or a prior over the rationals. You know, like, hey, I'm thinking of a rational number, I don't know which one it is, what's your prior?
And so it seems like, just like when you do integers, at some point the larger ones have to be a lot less likely than the smaller ones. Now, there could be a bump; it doesn't have to be that zero is more probable than one. But at some point, you get so far out, billions really.
Aaron: You got to have a tail eventually.
Max: Yeah, eventually. So now for rationals, that's twofold. There's a tail when you go far out, and I'm only thinking of positive rational numbers here. There's a tail when you go out towards infinity, when they get very large.
But there's also a tail, you know, when you're dealing with very large numbers in the numerator and the denominator, where you can't really reduce it to lower terms. So if you have something like, oh, it's some 18-digit number divided by some 11-digit number that's prime.
So it can't be reduced. Well, yeah, okay, that's a number, it might be close to an integer for all we know, but that seems to be a very unlikely one. Or maybe it's an 18-digit number over some other 18-digit number, so it's some number close to one. But it seems very irrational-like even though it's rational, if that makes sense.
So my first idea on this, and again, this is just the first idea, I want to run it by you and see what you think, is to take a distribution on the natural numbers. And we went through a lot of them. There was the exponential or the geometric distribution, where it was like, okay, zero is a half, one is a fourth, two is an eighth, and so on and so forth.
There's the Poisson distribution, which has a known rate, like, I think this happens exactly 10 times per hour, but sometimes, obviously, it's going to be nine, and sometimes it could be 11. And then there's the gamma-Poisson, which is a little bit more dispersed than that. So you take something like that, and then you say, okay, the numerator is going to be distributed that way, and the denominator is going to be distributed that way. That way, high numerators and denominators have very low probability.
And then the probability of any rational number p over q is just the probability of p times the probability of q, with one caveat: okay, what's the probability of one half? Well, there's also a probability of two fourths, and there's a probability of three sixths, and there's a probability of four eighths, so you have to kind of add all those together. But I think if you use one of these distributions on the positive integers, or on the natural numbers, then those series will converge, and you'll get a number.
So that's my current idea. I don't know what that distribution is called. But I'd like to use it. What do you think?
Aaron: It's intriguing. I'm having difficulty visualizing what exact use case you would apply it to, but what you said so far? Makes sense.
Max: Right? So it has some interesting properties. One is that it is going to be symmetric around the number one when you take the reciprocal. So the probability of one half is going to be the same as the probability of two, and the probability of two thirds is going to be the same as the probability of three halves. So I kind of like that symmetry of it. And what's interesting is, you know, how do you have that long tail with that symmetry?
Well, one side of the symmetry is stuck between zero and one, and the other side gets spread out to infinity. So I feel like if we're designing a probability here, maybe that symmetry is very important. I don't know. Particularly when you're talking about rates of things, but that's just one of my hunches.
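(For the curious, here is a rough sketch of that first idea in code. It assumes the base distribution sits on the positive integers, since we're only considering positive rationals, and it uses the geometric option with P(n) = (1/2)^n; the function names are just for illustration. The probability of the value a/b in lowest terms is the sum over all of its representations ka/kb, and the reciprocal symmetry Max describes shows up because the numerator and denominator share the same base distribution.)

```python
from fractions import Fraction

def p_nat(n):
    # Geometric base distribution on the positive integers: P(n) = (1/2)**n.
    # One of the options mentioned; a Poisson or gamma-Poisson would also work.
    return Fraction(1, 2 ** n)

def p_rational(r, terms=50):
    # Probability that the value equals r = a/b in lowest terms: the numerator and
    # denominator are drawn independently from p_nat, so we add up every
    # representation (k*a)/(k*b) of the same value, truncated at `terms` terms.
    a, b = r.numerator, r.denominator
    return sum(p_nat(k * a) * p_nat(k * b) for k in range(1, terms + 1))

# Reciprocal symmetry: P(1/2) matches P(2), and P(2/3) matches P(3/2).
print(float(p_rational(Fraction(1, 2))), float(p_rational(Fraction(2, 1))))  # both ~1/7
print(float(p_rational(Fraction(2, 3))), float(p_rational(Fraction(3, 2))))  # both ~1/31
```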
Aaron: Interesting. So the question that immediately arose when you said, let's do a probability distribution on the rational numbers is: could you do a probability distribution on just the irrational numbers?
Max: Well, you know, that is an interesting question, I think. What you could do is take a continuous distribution, like, let's say, the uniform distribution: just a uniform number somewhere between zero and one. We don't know what it is, but that's a pretty straightforward one. And first of all, if you pick something from that distribution, it's almost certainly going to be irrational. And I think if you remove the rational numbers from that, it still has probability one.
So I think, I think, yes, you could do it. But it's not too interesting. Yeah. So I think the rational case is far more interesting. The irrational one, I can't think of a good use for it. It seems like you're removing the rational numbers from a distribution where the answer is going to be estimated with rational numbers anyway. So what's the point? And that's another reason for doing this.
Every time we're making these models in the wild, we are using floating point numbers. So we're estimating them using some rational number, which means, you know, why don't we just do a distribution on the rational numbers anyway? Fair enough; I mean, that's one argument. I'm not saying that's necessarily the right way to go, but I think it is one way to go that I would like to explore more.
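(A small illustration of that last point: any floating-point number we compute with is already an exact rational, in fact one whose denominator is a power of two, which the standard library will confirm.)

```python
from fractions import Fraction
import random

x = random.random()   # a "uniform" draw between 0 and 1, stored as a 64-bit float
r = Fraction(x)       # the exact value of that float is a rational p / 2**k
print(r)              # prints an exact fraction, not an approximation
print(r.denominator)  # a power of two: the float was a rational number all along
```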
So all right. I think that's what I'd say about that. I don't know if there are any philosophical questions about that, but it's certainly very interesting. What do you guys think? Log on to our Locals, maximum.locals.com, or just email us at localmaxradio@gmail.com. Aaron, do you have any other comments on today's episode before we log off?
Aaron: All I can think of is, you know, I'm a big fan of the whole information wants to be free. And I think you mentioned Aaron Swartz in last week's episode, and he's just bouncing around in the back of my head between that and what we talked about today. And while maybe not always free as in beer, free as in distribution.
I want to see more of that, and maybe work towards, once you've solved this constitution thing, then we can look at another piece of the constitution that deals with, you know, patents and copyrights, and how we can better address those systems in a modern information world.
Max: I think, you know, when you said information wants to be free, another term I hear a lot in the tech industry, or you used to hear it a lot, you don't hear it anymore, is permissionless innovation, and how absolutely powerful that is. You open something up to the world, like the internet, like an open protocol, and anyone can build off it. You get massive, massive gains in, you know, productivity, human understanding, innovation.
And it seems like everyone these days has just been programmed to crush it. And so we need to unprogram them. And so one step at a time here on the Local Maximum. So that's what I think. All right. Without further ado, I can adjourn this episode. If there be no objection?
Aaron: I second the motion.
Max: All right. Have a great week, everyone.
That's the show! To support the Local Maximum, sign up for exclusive content and their online community at maximum.locals.com. The Local Maximum is available wherever podcasts are found. If you want to keep up, remember to subscribe on your podcast app. Also, check out the website with show notes and additional materials at localmaxradio.com. If you want to contact me, the host, send an email to localmaxradio@gmail.com. Have a great week.