Do LLMs employ variable rewards, Spike Lee’s hat, and a chilling video [Friday Wrap-Up]
Welcome to the Friday Wrap-Up for May 15, 2026. This is a short newsletter where I talk about 3 things: What’s on my mind this week, Recommended Reading, and Recommended Media. Here’s what’s on my mind…
Earlier this week I found myself fighting Claude on something I felt was a pretty basic problem — one that I had used it to solve before. I kept going back and forth with Claude. I would ask it questions. It would then do things I didn’t even remotely ask it to do. I started to form a weird theory in my head that Opus 4.7 is designed to waste tokens. But I’m actually worried it’s worse than that.
Recommended Reading: The colorful impact of Spike Lee’s red Yankees hat request 30 years ago: I’m a chronic Yankees hat collector. I suspect my collection pales in comparison to some, but I have over a dozen hats emblazoned with the classic Interlocking NY that has persisted for over 100 years. In other words, I love a dope hat.
Recommended Media: I Tracked Down the Hidden Workers Secretly Powering ChatGPT: And now for something totally different. This video talks about companies that recruit people who train LLMs. The problem it highlights is twofold: the predatory nature of recruiting experts in a way that’s dehumanizing, and the chilling mindset behind AI companies who basically want to own knowledge and sell it back to us.
Get the full article and a free automation of the week by signing up for the newsletter: https://streamlined.fm/wrap
- (00:00) – Intro
- (00:31) – What’s on my mind: Are LLMs employing variable rewards?
- (05:24) – Recommended reading: Spike Lee’s red Yankees hat
- (08:22) – Recommended media: The hidden workers powering ChatGPT
- (12:29) – Outro
————
Streamlined Solopreneur is the podcast for solopreneurs who want to automate their business and take time off worry-free. Each week, Joe Casabona shares practical systems, tools, and strategies to help you reclaim your time and run your business without sacrificing the rest of your life or your health.
Start with the free Solopreneur Sweep — a step-by-step method for finding where your business is losing time: https://streamlined.fm/sweep
If this episode helped you, leaving a review on Apple Podcasts helps other solopreneurs find the show — it only takes a minute and means a lot.
Connect with Joe on LinkedIn: https://www.linkedin.com/in/jcasabona/
00:00:00.660 –> 00:00:13.020
Welcome to the Friday Wrap-Up for May 15th, 2026. This short episode is where I talk about three things: what’s on my mind, recommended reading, and recommended media.
00:00:13.540 –> 00:00:29.960
Streamlined Solopreneur is the show to help you build more reliable solopreneur systems so you can take time off worry-free. I hope this roundup and reflection will help you think more about your own systems. I’m Joe Casabona, and here’s what’s on my mind.
00:00:31.160 –> 00:00:40.260
Okay, so earlier this week, I found myself fighting with Claude about something I felt was pretty basic.
00:00:41.000 –> 00:00:46.000
A problem that I’ve actually used Claude to solve before.
00:00:47.480 –> 00:00:55.900
I kept going back and forth with the LLM and I would ask it questions, and then it would do things that I didn’t even remotely ask it to do.
00:00:57.180 –> 00:01:05.440
And so I started to form this weird theory that Opus 4.7 is designed to waste tokens,
00:01:06.140 –> 00:01:09.380
which I know sounds like a weird conspiracy theory, but maybe it’s not that weird. I don’t know.
00:01:09.820 –> 00:01:10.940
Sound off in the comments.
00:01:11.860 –> 00:01:16.300
But I’m actually worried that it’s worse than that.
00:01:17.260 –> 00:01:22.040
And, first of all, this is wild, unsubstantiated speculation.
00:01:23.880 –> 00:01:25.080
But we’ve seen it before.
00:01:25.700 –> 00:01:36.260
So in his book Hooked, Nir Eyal talks about how social media sites and addictive products employ variable rewards.
00:01:37.120 –> 00:01:42.880
The general idea is that our craving for the reward is stronger than the reward itself.
00:01:43.840 –> 00:01:50.360
So we invest time and money in pursuit of the reward.
00:01:50.520 –> 00:01:52.500
And that’s what actually satiates the craving.
00:01:52.740 –> 00:02:01.560
So if you are scrolling on a social media site like TikTok, maybe 75% of the videos you don’t care about.
00:02:02.060 –> 00:02:07.480
But 25% of the time, you’re going to get a video that you really like.
00:02:07.580 –> 00:02:08.820
And it gives you that dopamine hit.
00:02:08.920 –> 00:02:12.560
And so you keep scrolling on TikTok.
00:02:14.220 –> 00:02:21.959
This is why social media sites, gambling sites, prediction markets, and so many other things are so
00:02:22.380 –> 00:02:33.280
addictive. As I had my argument with Claude, convinced that I could get a computer program to act
00:02:33.420 –> 00:02:41.900
logically, which I don’t think is a wild request, I started wondering if large language models
00:02:42.620 –> 00:02:51.520
offer a sort of variable reward system. After all, they actually do perform certain tasks really,
00:02:51.780 –> 00:02:52.460
really well.
00:02:54.580 –> 00:03:01.580
And what if that variable reward is enough to convince most people, as well as momentarily
00:03:01.860 –> 00:03:08.140
trick others, like me, that LLMs are actually good at a lot more than they really are?
00:03:09.120 –> 00:03:14.860
We are pleasantly surprised by the results of a single task, or a category of tasks, like
00:03:15.680 –> 00:03:21.600
vibe coding something, or going out to our calendar and
00:03:21.660 –> 00:03:26.100
grabbing stuff, or going to our email and sending stuff to our to-do list, all very
00:03:26.280 –> 00:03:32.640
computery things. So we start to crave that feeling. The “I can’t believe a robot actually did
00:03:32.760 –> 00:03:38.700
this so I don’t have to do this anymore” feeling. And we pursue that craving for efficiency.
00:03:39.700 –> 00:03:48.760
We pursue it with time and tokens. And again, this is totally unsubstantiated. It’s probably a weird
00:03:48.860 –> 00:03:56.220
theory. Whereas with social media sites and gambling sites and prediction
00:03:57.290 –> 00:04:05.840
markets, it’s in the best interest of those websites to employ variable rewards because it’s in
00:04:05.840 –> 00:04:12.360
their best interest for people to stay on those sites. I actually don’t think it is in the best
00:04:12.600 –> 00:04:18.739
interest of large language models to employ variable rewards. They
00:04:18.760 –> 00:04:28.580
want a higher hit rate because the productivity or efficiency angle is the thing that keeps
00:04:28.660 –> 00:04:29.300
people coming back.
00:04:29.400 –> 00:04:30.940
But it’s just something I noticed.
00:04:31.240 –> 00:04:37.340
It was so weird: at the beginning of the week, I had successfully vibe coded something and gotten it
00:04:38.100 –> 00:04:38.320
running.
00:04:38.780 –> 00:04:41.700
And then I tried to do it again with a different project.
00:04:41.880 –> 00:04:43.520
And it just like went off the rails.
00:04:44.860 –> 00:04:52.240
And so as I was wasting time trying to bend the LLM to my will, I thought, is this a variable
00:04:52.520 –> 00:04:56.460
rewards thing, where it pleases me a certain amount of the time?
00:04:56.620 –> 00:04:58.040
And so I keep coming back to it.
00:04:58.800 –> 00:05:00.700
Again, I don’t know.
00:05:00.860 –> 00:05:05.880
I don’t think that large language models would actually benefit from that.
00:05:05.960 –> 00:05:07.080
But it’s something I was thinking about.
00:05:07.720 –> 00:05:08.300
What do you think?
00:05:08.360 –> 00:05:09.540
Is it plausible? Likely?
00:05:10.080 –> 00:05:11.040
Way off base?
00:05:12.040 –> 00:05:14.820
Let me know either in the comments below or over
00:05:14.840 –> 00:05:23.200
at streamlinedfeedback.com. I’d love to hear your thoughts on variable rewards in large language
00:05:24.610 –> 00:05:30.840
models. Now, moving on to recommended reading. Usually I like to make the recommended reading
00:05:30.940 –> 00:05:36.660
a heavier, more interesting think piece and the recommended media something fun and light to bring you
00:05:36.670 –> 00:05:44.800
into your weekend. But it’s reversed this week. So the recommended reading
00:05:44.880 –> 00:05:50.180
is called “The colorful impact of Spike Lee’s red Yankees hat request 30 years ago.”
00:05:51.340 –> 00:05:54.520
I am a chronic Yankees hat collector.
00:05:55.780 –> 00:06:04.160
And I suspect that my collection of a dozen or so Yankees hats pales in comparison to some.
00:06:04.280 –> 00:06:11.300
But still, I have those dozen or so Yankees hats emblazoned with the classic Interlocking
00:06:11.340 –> 00:06:15.560
NY that has persisted for over a hundred years.
00:06:17.110 –> 00:06:18.820
In other words, I love a dope hat.
00:06:20.420 –> 00:06:29.840
And arguably, we would not have the vibrant dope hat market that we have today without Spike Lee.
00:06:31.020 –> 00:06:39.560
Thirty years ago, in 1996, Spike Lee, a diehard Yankees fan, wanted a red Yankees hat to match his
00:06:39.480 –> 00:06:40.020
red jacket.
00:06:41.700 –> 00:06:47.200
And when he tried to get it made, he was unable to due to licensing.
00:06:48.180 –> 00:06:53.720
New Era (I believe it’s New Era), the company that makes the on-field
00:06:53.920 –> 00:07:00.900
hats, only had a license to make hats the teams actually wore on the field.
00:07:02.520 –> 00:07:08.320
So Spike Lee, doing something that only Spike Lee and a handful of other people could do,
00:07:09.100 –> 00:07:12.500
went to the Boss, Yankees owner George Steinbrenner.
00:07:13.340 –> 00:07:18.380
And he got George to approve a red Yankees hat for him to wear.
00:07:19.560 –> 00:07:25.960
And now, of course, we have all sorts of hats, all sorts of on-field hats, different colors.
00:07:26.180 –> 00:07:32.280
I have that red Yankees hat, and I wore it on red Yankees hat day,
00:07:32.340 –> 00:07:34.200
or whatever that was called, a couple of weeks ago.
00:07:35.160 –> 00:07:41.220
And I have blue hats and yellow, a yellow Yankees hat to match my brand.
00:07:41.920 –> 00:07:47.100
And it’s just a fun thing to have.
00:07:47.930 –> 00:07:53.280
And I love this article because it provides such an interesting bit of history and context
00:07:54.320 –> 00:07:59.660
to something that wouldn’t necessarily seem to have an interesting backstory, right?
00:08:00.060 –> 00:08:07.120
You would think, oh, MLB realized they could make a lot more money if they made hats in different colors.
00:08:09.360 –> 00:08:15.640
But Spike Lee kind of pioneered this because he also wanted to wear a dope hat.
00:08:16.100 –> 00:08:17.000
So love that story.
00:08:17.440 –> 00:08:19.080
I’ll link it in the description and the show notes.
00:08:19.720 –> 00:08:20.440
I think it’s a fun read.
00:08:22.340 –> 00:08:29.680
And now: I’m not following the news cycle playbook of hitting you with the bad story and then
00:08:29.780 –> 00:08:35.860
ending with a nice one (that’s the recency effect, or recency bias, where you remember the last
00:08:35.860 –> 00:08:47.620
thing someone talked about). But this video, I think, is too important not to mention. It’s by More Perfect Union;
00:08:47.620 –> 00:08:55.060
I linked to something else of theirs recently. It’s called “I Tracked Down
00:08:55.160 –> 00:08:58.560
the Hidden Workers Secretly Powering ChatGPT.”
00:09:00.140 –> 00:09:01.760
This is something totally different, right?
00:09:01.920 –> 00:09:07.820
The video talks about companies that recruit people who train large language models.
00:09:08.880 –> 00:09:14.420
And so a lot of large language models are trained with what’s called
00:09:14.580 –> 00:09:18.660
reinforcement learning from human feedback, or RLHF.
00:09:19.260 –> 00:09:25.520
This used to happen in countries that were struggling economically.
00:09:27.000 –> 00:09:41.040
Which is why, for example, ChatGPT would use “delve” so much: the humans who were reinforcing the training, I think in South Africa, used the word “delve” a lot.
00:09:41.720 –> 00:09:47.580
So there’s a little tidbit as to why “delve” is used so much in large language models.
00:09:48.300 –> 00:09:54.840
But now, the companies are trying to be more PhD-level, right?
00:09:54.920 –> 00:09:56.720
You’ve heard Sam Altman say this.
00:09:56.840 –> 00:10:03.160
You’ve heard Dario Amodei from Anthropic say this, right?
00:10:03.780 –> 00:10:08.540
That older models could do these things, but now they have PhD-level knowledge.
00:10:08.740 –> 00:10:09.660
And that is not true.
00:10:10.360 –> 00:10:12.860
They are hype men trying to make billions of dollars.
00:10:13.360 –> 00:10:15.860
And so they need to paint their products in the best light possible.
00:10:17.020 –> 00:10:24.820
But they are hiring PhDs to do human-feedback training on these
00:10:24.900 –> 00:10:33.460
large language models. And so the problem this video highlights is twofold. It is the predatory
00:10:33.800 –> 00:10:41.200
nature of recruiting experts in a way that is dehumanizing, like selling to the lowest bidder,
00:10:42.160 –> 00:10:47.760
or offering higher-paying contracts but requiring them to be available at all
00:10:47.940 –> 00:10:54.820
hours to do it. No price negotiation; you get what we give you.
00:10:54.840 –> 00:11:01.480
It’s terrible work, and the people who are doing it are possibly in dire economic straits.
00:11:02.740 –> 00:11:03.840
So that’s one side of it.
00:11:04.440 –> 00:11:12.020
But the other side of it is this chilling mindset behind AI companies who want to own knowledge
00:11:12.380 –> 00:11:13.940
and sell it back to us.
00:11:15.560 –> 00:11:22.820
So, you know, I think Sam Altman is quoted in this video as saying, yeah, we want to make knowledge available,
00:11:23.100 –> 00:11:25.400
and you purchase tokens to get a little bit of that knowledge.
00:11:26.580 –> 00:11:27.340
And that’s awful.
00:11:28.900 –> 00:11:32.320
It cuts against what got us here, right?
00:11:32.440 –> 00:11:36.140
The reason the internet was created
00:11:36.900 –> 00:11:40.780
was so that researchers could quickly share knowledge with each other.
00:11:42.060 –> 00:11:47.060
And large language models appear to be trying to paywall that knowledge.
00:11:48.880 –> 00:11:50.580
So this video is really interesting.
00:11:50.860 –> 00:11:52.380
And I’m intrigued by More Perfect Union itself.
00:11:53.380 –> 00:11:58.640
I’ve been watching a few of their videos, and I want to dig a little deeper into them, where they came from and so on.
00:12:00.520 –> 00:12:06.160
But at the very least, it seems like they do deeply researched work, and it’s well produced.
00:12:09.200 –> 00:12:16.320
And it’s an interesting thing that one might not think about when it comes to large language models.
00:12:16.540 –> 00:12:20.300
Now there is a hopeful call to action at the end.
00:12:21.380 –> 00:12:27.520
But ultimately, I’m sharing this because I think it’s an important message for anybody who uses large language models to hear.
00:12:29.900 –> 00:12:35.080
All right, that is it for the Friday Wrap-Up for May 15th, 2026.
00:12:35.500 –> 00:12:43.440
If you want to get a written version of this delivered directly to your inbox, as well as an exclusive automation of the week,
00:12:44.460 –> 00:12:48.900
join my newsletter over at streamlined.fm slash wrap.
00:12:49.960 –> 00:12:52.700
This week, maybe somewhat hypocritically,
00:12:52.880 –> 00:12:53.880
given what I just talked about,
00:12:54.440 –> 00:12:57.420
I am sharing an automation I’ve set up in Claude.
00:12:58.280 –> 00:13:02.180
But that is it for this episode of Streamlined Solopreneur
00:13:02.180 –> 00:13:03.100
and the Friday Wrap-Up.
00:13:03.100 –> 00:13:04.000
I hope you enjoyed it.
00:13:04.480 –> 00:13:08.880
If you did, again, sign up for my newsletter.
00:13:09.480 –> 00:13:10.480
Thanks so much for listening.
00:13:11.260 –> 00:13:13.940
And until next time, I hope you find some space
00:13:14.540 –> 00:13:15.160
in your weekend.
