548: Infinite Canvas, Luro + Figma, and Scraping or Crawling

What's going to happen to homework with AI? Thoughts on infinite canvas, which leads into Luro and Figma integration, and Chris gets nerd sniped and tries to scrape (or should he crawl?) websites for data.

Guests

Chris Coyier and Dave Rupert

This episode is with just Chris & Dave, ShopTalk Show's hosts. Chris is the co-founder of CodePen and creator of CSS-Tricks, and Dave is lead developer at Paravel.

Transcript

[Banjo music]

MANTRA: Just Build Websites!

Dave Rupert: Hey there, Shop-o-maniacs. You're listening to another episode of the ShopTalk Show with Davey and the Chris. [Humming tunes]

Chris Coyier: [Laughter]

Dave: [Humming tunes]

Chris: I can tell what was spinning on Dave's desk five minutes ago.

Dave: We'll have to pay some royalties for that one, but... hey! We're back.

Chris: [Laughter]

Dave: 2023, here we go. How are you doing, Chris?

Chris: I wonder if any kind of AI could recognize that yet? You know? Like when you're at a bar. What's that app where it listens to the song for a minute and then figures out what it is? Incredible technology. I think we should all be pretty amazed by that thing existing.

Dave: Yeah, like Shazam.

Chris: Right. Yeah, Shazam. I don't know if Google owns it or what. I don't know. My wife, when she had an Android phone, it was always just telling her what song was playing. It was just listening constantly, which entered the realm of creepatude pretty quickly.

Dave: Yeah.

Chris: Anyway, it's just amazing how that stuff evolves and stuff. It makes me think.

I'll just hijack this right away. We'll get into a bunch of stuff.

Dave: Go for it.

Chris: The Washington Post had this great explanation of how all this AI art stuff works, and it finally made it click for me. Way to go, journalism, I guess.

Dave: [Laughter] Good. Okay. Yeah.

Chris: The idea is -- and I feel like everybody should know this because it feels like something you could almost explain to your parents or something -- this machine, it has an image of cat riding a skateboard.

Dave: Yep.

Chris: Then the piece of text that says, "Cat riding a skateboard," and knows how to connect those two things. That image is a cat riding a skateboard. Great.

Then it applies noise to the image. Then it still knows that that's a cat riding a skateboard. It adds noise, and it adds noise, and it adds noise, and it adds noise, and its goal is to continue to identify that as cat.

Then it has the second job, which is removing noise from images. Can it de-noise an image? Can it get really good at that? It does those two things.

Then the way that the art is randomized, because if you go to one of these art generators and you type in "Cat riding a skateboard," every time you do it, it's going to be different.

Dave: Yeah.

Chris: It feels like it's being creative, which that's the part that is goddam magic. But the point is, it starts out with just TV static. Just absolute total random noise.

Dave: Mm-hmm.

Chris: Then you tell it, that image, that's a cat riding a skateboard. Then it's like, "Oh, it is? I will de-noise this 100% noise to be a cat riding a skateboard." It'll just keep removing noise, remove noise, remove noise until what has started as absolute random noise is now a cat riding a skateboard.

It just blows my mind that that's how it works. It actually makes it understandable in a way that I just did not understand it before.
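The loop Chris describes can be sketched as a toy in Python. This is only an illustration under loud assumptions: `TEMPLATES` and `predict_clean` are hypothetical stand-ins for a trained network, and a real diffusion model learns its denoiser from many captioned images rather than looking up a stored answer.

```python
import random

# Hypothetical stand-in for training data: a caption mapped to a tiny 8-pixel "image".
TEMPLATES = {"cat riding a skateboard": [0.0, 1.0, 1.0, 0.0, 0.5, 0.5, 1.0, 0.0]}


def add_noise(pixels, amount, rng):
    """Blend pixels toward random static. amount=0 keeps the image, amount=1 is pure noise."""
    return [(1 - amount) * p + amount * rng.random() for p in pixels]


def predict_clean(noisy, prompt):
    """Toy 'model': guess the clean image for a prompt.

    A real denoiser would inspect `noisy`; this stand-in ignores it
    and simply returns the stored template.
    """
    return TEMPLATES[prompt]


def denoise_step(noisy, prompt, strength=0.2):
    """Remove a little noise by moving part of the way toward the model's guess."""
    guess = predict_clean(noisy, prompt)
    return [n + strength * (g - n) for n, g in zip(noisy, guess)]


def generate(prompt, steps=50, seed=None):
    """Start from pure static and keep removing noise, step after step."""
    rng = random.Random(seed)
    image = [rng.random() for _ in TEMPLATES[prompt]]  # "TV static"
    for _ in range(steps):
        image = denoise_step(image, prompt)
    return image
```

In this toy every run ends up at the same template. Real generators give a different picture each time because the learned denoiser reacts to the particular static it starts from.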

Dave: Yeah, that's really interesting. Well, and it's weird how it knows. Or you can say, like, in the style of Van Gogh, you know? It's just like, "Oh, okay. I know what Van Gogh is. That noise kind of works like this."

Chris: Yeah, that noise works like this. That's exactly it. It's the confidence that's so interesting.

I'm sure you've heard all the ChatGPT stuff. It doesn't say, "Oh, maybe I think this." It's always very confident in what it's telling to you, even if it's absolutely not true.

Dave: Yeah. Yeah.

Chris: That confidence is expressed the same way in the art generators.

Dave: Yeah.

Chris: It's very confidently telling you that this random noise is a cat riding a skateboard in the style of Van Gogh.

Dave: Oh, man.

Chris: It has that same bravado. You know?

Dave: It's got my Uncle Bill vibes, just like, "I am an artist." [Laughter] Or "I'm writing a screenplay and it's going to be good." [Laughter]

Chris: God-dang right.

Dave: Are you? Yeah, okay.

Chris: [Laughter]

Dave: Yeah.

Chris: Good.

00:04:19

Dave: No, that's interesting. Not to be--

We've been doomsday about AI in the past, but it's really interesting times right now. I've been in lots of conversations.

Over the weekend, I was in three conversations about AI, ChatGPT specifically. People are just like, "How is that going to work for you? How is that changing your life?"

It's wild because this guy who was asking me, I was like, "You know it's pretty good, and it does clever results."

He was like, "Yeah, I used it. I had it write a feature for my website," that instead of paying offshore to go develop this feature for 20 hours, 40 hours--

Chris: No!

Dave: He built this little calculator they wanted using Chat, using prompts.

Chris: Wow. That's wonderful.

Dave: He committed it and put it. I'm like, that's just like, "Whoa!" You know?

I always thought AI would be coming for my job in 2070, like way after I'm out of the industry. But now it's like, now here. And my job is probably different, but it's just shocking. But that also means I don't have to ever code up somebody's weird calculator. I can talk into the machine and then build them for that. Or I can have them talk into the machine.

But what's interesting to me, the fallout is if all of us are prompt engineers and prompt graphic artists, how do we fix things is sort of my question. If somebody is like, "I want robot playing guitar in the style of Van Gogh." I'm like, "Great. I did that for you." They're like, "No, but I want the moon over on the left side." It's like, "Oh... I don't know how to do that." Isn't that weird?

Chris: Yeah. I would think, in that case--

I really am impressed by the ones that take an image as input. You'd take that. Take your futuristic tablet. Draw a little circle in the upper left-hand corner, and be like, "I know what you mean. Move the moon."

Dave: Yeah. Yeah. Yeah, that's interesting, like fill in the blank, kind of, or just figure the rest out.

00:06:52

Chris: You know what's interesting. We have GitHub Copilot, but there's not a ChatGPT that's specifically designed for code yet. You'd think that's coming real fast, especially because the most recent news was - whatever - Microsoft is going to give them $10 billion.

Dave: Oh, yeah.

Chris: And have 75% of their revenue be paid back right away. Very interesting.

Dave: I'm sure. I think the ChatGPT was just this extension of the DaVinci 002 language model. It's now 003 or something. But I think we're on the cusp of, like, if they can apply that downstream into the code completion AI, that's going to be massive.

Chris: Yeah.

Dave: Very different.

Chris: It's very interesting that it's been so effective already. You're talking to a guy who got a free calculator or whatever. That's not going back in the jar.

Dave: Well, can it get weird? He also told me the story. He showed it to his high schooler. Right?

Chris: Mm-hmm.

Dave: Very dangerous here. She took it and wrote thank you notes to her grandparents for Christmas gifts.

Chris: Oh, wow.

Dave: Like, "Write a note to nana that says thank you for the - whatever - Lego set."

Chris: Yeah, so I don't have to cough up the words, because they feel very rote anyway, right? "I really appreciate the presents."

Dave: Now, grossly, [laughter] scamming your grandparents. Yes.

Chris: [Laughter]

Dave: Grandparents should be upset if they find out. However, it's a lot of effort to write grandma. You're talking to a 2023 teen who lives by texting in 80 characters at a time. That's summoning a lot of mental acumen to write the full letter to grandma. You know what I mean? Maybe that's great.

Then you extrapolate that for writing essays in college or whatever. My opinion is, in one sense, "No, I want my kids to do it." But then my other side of my brain is just like--

Chris: Just get it done.

00:09:07

Dave: Yeah, why should my son ever have to write a paper in his whole life? Why should he? Computers do that now. Why should he ever have to write a single paper or a report in his entire life?

Chris: But surely, you want him to do something. What's the something then if it's not that?

Dave: I don't know. What is it? Yeah.

Chris: I don't know. I don't know.

Dave: The parenting game has changed for me, or just education entirely, because a paper is just regurgitating facts - more or less. It's like, "Find five facts about Britain."

Chris: Well, what's the counterargument there? Is it, "Well, yeah, but you're learning to craft prose"? You're learning where the periods and the commas go. You're learning to, in a sense, understand what your teacher expects of you and connect the dots between expectations and results and stuff. I'm sure it's not just the final product of the paper that matters there. There's a whole process that is kind of being learned.

Dave: I agree. My friend Taylor, he kind of had that argument. He doesn't care about school. He just cares that his kids know how to do the thing, like how to do something.

I think it's cool. I don't need to know how to Photoshop anymore. I can just ask the AI to make the photo for me.

Chris: Mm-hmm.

Dave: But knowing how to Photoshop is a very cool skill. That's a good thing to know. But I don't know.

In my brain, I'm just like, "Oh, cool. My son gets to learn how to write an essay when everyone else's kids are just asking Google for essays." He has to not graduate?

Chris: Yeah. It's happening really fast, too. School is notoriously moving slowly, too. Meaning that they need to react to this really quickly, and they almost surely won't, so it's going to be a weird couple of years in that regard. Not that I can speak to it. I'm not really in that industry. But hey, good luck. You know? [Laughter]

Dave: I know. Well, education is different. Work is now different. Any email or job application you get from here to eternity is possibly a robot.

Chris: [Laughter] Yeah.

Dave: How do you--? You know?

00:11:31

Chris: You've got to home in on what's uniquely human, which I think is interesting. If you want to tell your grandparents, "Thank you so much for the presents," the words are one thing, but wouldn't it be neat if we found another way to do that? In a sense, that's kind of what FaceTime or something is.

You get to see each other's faces, and you can express those little shy moments where your little kid is hiding behind your legs because they haven't seen their grandma in a while. But that's the kind of human connection moment that ChatGPT can't do.

Is there some kind of way to express to your grandparents, "When you gave me that present, my heart felt warm. I want to tell you that I appreciate you."

Who cares. If it's words that do that, meh. If it's a photo or something. There's got to be something that a computer can't do that can express that human-to-human moment.

Maybe it's not a letter anymore. Maybe it changes. It's something else now. We all strap little electrical things around our chest and--

Dave: Heart monitors.

Chris: Yeah. [Laughter]

Dave: Yeah. The Apple Health integration. That's awesome.

Chris: Yeah.

Dave: That'll bust you big time. "Hey, honey. Do you like my outfit?" "Ugh..."

Chris: [Laughter]

Dave: Meh! [Laughter]

Chris: Yeah. Was it The Circle or something? "There is no more lying in the future. Lying is deprecated."

Dave: Uh...

00:13:01

Chris: All right. Here's another one for you. You know Figma tool, design tool.

Dave: Figma. Love it. I $20 billion love it. Yes.

Chris: Yeah. Me too. They changed the game in a number of ways, and some that are positive for our industry because it's a Web app, Dave. Wow! The best design tool in the world is on the Web. Who fricken' would have guessed?

Dave: I remember we used it for a ShopTalk redesign five years ago, or something.

Chris: Mm-hmm.

Dave: It was like, "I don't know if this is going to work, man." You know? [Laughter]

Chris: Yeah.

Dave: It was on the edge. But now it's awesome.

Chris: Mm-hmm. Yeah, totally good. Conceptually, there's a thing going on that we'll call infinite canvas.

Dave: Mm-hmm.

Chris: You can hold - whatever - spacebar, I guess, and just drag in any direction anywhere you want to go. Interesting approach, right? Different, in a way, than tools that we're used to in the past. Even Sketch or some modern tools like that.

A lot of things that they wanted to know upfront was, what is the canvas size?

Dave: Mm-hmm.

Chris: They wanted you to pick it. Certainly, in Photoshop, Illustrator, and stuff. Illustrator was a little bit more infinite because you could have multiple canvases and stuff like that. And it's true that Figma has it both ways in that they allow you to draw. What do they call it? Is it a canvas?

Dave: Art board. Art board.

Chris: Art board. Yeah, there you go.

Dave: Yeah.

Chris: That has a fixed size. But while you're working, it's very infinite-like.

Dave: Mm-hmm.

Chris: Interesting. [Laughter] It's one of those things that feels like it's just in the water and that lots of things all of a sudden work in this way. It's kind of like - I don't know - firing people, I guess. That got very en vogue for a minute there to do that.

Dave: Yeah. Apparently, yeah. Just fire everybody now.

Chris: Yeah. The same with infinite canvas tools. I also think of them, in my brain, just to attach it to CSS is like position absolute tools in that these type of tools, they just don't have any sense of flow. You don't drag something around and it pushes something else.

I know there's auto layout and stuff in Figma, but that's very opt-in and not the default mode and requires a little special setup and stuff. But I kind of like position absolute tools because it gives you this opportunity. You can always just pick up and grab something, and it puts your brain in this different position that is good for design work. Good job.

Dave: Mm-hmm.

Chris: I just wrote a little bit about it as a blog post, a while back now, but I think it's interesting how many tools are like that. I wrote that, and then all of a sudden, Apple Freeform drops. It might have dropped shortly before I published, so I got to sneak it in there. You know. Not every day Apple releases a new app, and it's the same thing. It's just drag little sticky notes anywhere. Whatever.

Dave: Oh, yeah. Interesting.

00:15:48

Chris: Interesting. Then I happen to be using Arc. I'm still on Arc. You know?

Dave: Yeah. Me too. Same.

Chris: I'm loving Arc. They have this concept of an art board. It's a little bit of a wildcard feature for a browser. But they dogfood it because they put their release notes in it. I just saw today, as we record this, they released their January update that had some stuff in there.

If you open that up, it has that same Apple Freeform kind of feel to it. It gives you this page that's not like a Notion page because Notion is like blocks, and you can drag the blocks up and down. But not just absolutely anywhere. Not like position absolute.

Dave: Mm-hmm.

Chris: Their easel app is very much more position-absolute-like. And you can draw in there and drop little images in there and put text wherever you want it and stuff. It works really nicely for their release notes, I think.

Dave: Mm-hmm.

Chris: It makes for a good example of why would I use this. Pretty neat.

But they publish it and it's almost like a website. Now, they have this advantage that Arc is only a desktop website, so they know, generally, that you have a pretty big screen in front of you to do this. That's unique on the Web because it's almost been the opposite story on the Web.

On the Web it's like, "You should pretty much assume it's a tiny little screen at this point."

Dave: Mm-hmm. Yeah. You kind of have to. Yeah.

Chris: Yeah.

Dave: You have to assume it's 320 and it can grow - kind of.

Chris: Yeah. Yeah, so that's interesting about these infinite... these tools like this is that they are pretty desktop-focused as far as I'm concerned.

Now, I don't have any huge point to build, but I was looking at this. This has been around a while, but I think it just had a big upgrade. The URL is mmm.page. They make these.

It's essentially this tool like I just described, like Arc's easels or Figma, really.

Dave: Mm-hmm.

Chris: That just allows you to just drag and drop and put crap anywhere. It all feels very position absolute, and then have that be a literal website. You publish it.

[Laughter] I hesitate to call it a CMS. It's more like just a site-building tool. And I think it's really cool.

The point behind it is, "Go crazy! Be weird! Put weird GIFs. Really express yourself." But you don't have to. You can be pretty classy about it, and they have some classy examples, too.

But what struck me about this is this is for the Web then. That assumption we just talked about, like, assume it's 320 and grow, it's a little weird here.

Dave: Mm-hmm.

00:18:23

Chris: What is their approach? It's interesting. Their builder kind of puts you into a not 320 column, but 600 - maybe.

Dave: It's like a 600 flexi-column. I'm seeing it. Yeah.

Chris: Then they just smash it down by scale. That's how the Arc ones work, too. As you resize your browser, you'll see the whole damn thing just scales.

Dave: Hmm... Interesting.

Chris: Which is something we just did not... We just did not choose that path in Web design, generally. If you want that behavior, that's a little tricky to pull off. It's probably some JavaScript and some transform magic and stuff to make it do that.

I do think that's interesting. I've just been seeing it more and more as just a responsive Web design via scale. [Laughter] You know?
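The arithmetic behind "responsive design via scale" is small enough to sketch. This is a guess at the general idea, not how mmm.page or Arc actually do it: the 600px design width is the rough figure mentioned above, and the `max_scale` cap is an assumption added for illustration.

```python
def scale_to_fit(design_width, viewport_width, max_scale=1.0):
    """Return the factor that shrinks a fixed-width layout to fit the viewport.

    max_scale is a hypothetical cap so the layout never grows past its
    designed size on wide screens.
    """
    if design_width <= 0:
        raise ValueError("design_width must be positive")
    return min(viewport_width / design_width, max_scale)


def css_transform(design_width, viewport_width):
    """Build the CSS a script might apply to the layout's wrapper element."""
    factor = scale_to_fit(design_width, viewport_width)
    return f"transform: scale({factor:.3f}); transform-origin: top left;"
```

For a 600px column on a 300px-wide phone, `scale_to_fit(600, 300)` gives 0.5, so everything (text included) renders at half size. That is the trade-off compared with reflowing the content.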

Dave: Yeah. Yeah. You know it's funny. As I'm tabbing around one of their example sites -- sorry, judgy zone. Welcome to the judgment zone.

Chris: Yeah.

Dave: Tab order is a mess. Tab order is not what you want.

Chris: Wicked mess?

Dave: Wicked mess. Anyway, it actually works, I guess, is kind of the--

Chris: That's bad. Yeah, I mean the text is text.

Dave: It's tabbable. It's not just... Yeah, it's not just PNGs. You know?

Chris: I wonder if they could apply some algorithm to it, kind of a top left to bottom right. At least if the language adheres to that thing.

Dave: Order.

Chris: Attempt to not tab index, but maybe just straight up... Because all these things probably are position absolute.

Dave: Yeah.

Chris: Do it with source order. Anyway.
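A minimal sketch of the "top left to bottom right" idea, assuming each absolutely positioned element is known by its `left` and `top` offsets. The `Box` type and the `row_tolerance` value are hypothetical, and this only suits left-to-right, top-to-bottom languages.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    left: float
    top: float


def reading_order(boxes, row_tolerance=20.0):
    """Order boxes top-to-bottom, then left-to-right within a row.

    Boxes whose tops are within row_tolerance pixels of the first box
    in a row are treated as being on that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: (b.top, b.left)):
        if rows and abs(box.top - rows[-1][0].top) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b.left)]
```

Writing the elements into the HTML in this order would give a sensible tab order without touching `tabindex`, which is the source-order approach suggested above.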

Dave: Anyway.

Chris: Just interesting.

Dave: Different world. I am pro let's get weird, I guess, is what I want to say.

Chris: Let's get weird accessibly.

00:20:10

Dave: But let's get weird accessibly. Let's not leave people out in our weirdness. But infinite canvas tools are interesting. I like them.

I also sometimes feel totally lost when you're on the slippery ice of an infinite canvas trying to find the thing you want. That can be a bad experience sometimes.

Chris: Mm-hmm. Mm-hmm. Don't make me take meeting notes like that. I don't want to take meeting notes like that. You know what I mean?

Dave: Right.

Chris: I need some structure.

Dave: Right. I used OneNote for a long time, and it infinite canvased your notes. It was like, "Cool, except not cool." [Laughter] I kind of want structure to my notes. You know?

Chris: Yeah.

Dave: Or at least when I add a new text field, I just want it to left align to where everything else was, unless it's a totally free-form thing. I guess that's just where it gets a little weird. It's like, "Am I taking notes? Am I making a document? Or am I making a collage?" You know?

Chris: Yeah. [Laughter] I mean I guess it kind of depends. If you knew ahead of time that you were just going to be writing down a series of bullet points, fine.

Dave: Yeah.

Chris: But a lot of times you don't really know what's happening.

Dave: Right. Right. It's kind of just a big question. No, I don't know. Let's go. Let's get weird in 2023.

00:21:44

Chris: I got a question for you.

Dave: I love questions.

Chris: A little similarity. Well, yeah, stop me if I'm giving away too much about Luro, but there's a thing that Luro can do.

Dave: [Gasp] Stop! No, I'm just... [Laughter] Go ahead. 2023, talk about Luro.

Chris: [Laughter] Yeah. Yeah. No ads in this show anymore. Just Luro and CodePen.

It connects to Figma, I guess. I don't know if there are other sources. I don't know if scrape is the right word for it. If they have an API, it's not really scraping.

Dave: Yeah. Yeah, we find all your components and stuff from your Figma, and we document that. We capture that: styles, colors, tokens, and all that stuff too.

Chris: Yeah.

Dave: Then another feature we have is pages, right? I'm kind of a firm believer of, like, let's connect the design system to the actual thing we're building. Let's not just have a design system. Let's have the thing we're building, the target that we're trying to build with the design system.

We allow you to have these pages, and we crawl your site.

Chris: A page is not in Figma. I'd put in literally codepen.io/designsystem or /about.

Dave: Yeah. URLs might be another. Yeah.

Chris: Yeah. URLs. There you go. That is a scraper, then, of sorts. You need to connect to my website. You need to see whatever, the whole DOM, I guess.

Dave: Yeah. Yeah. There's probably some nuance here, like scraper would be yoinking content.

Chris: Mm-hmm.

Dave: This is a crawler. It's just really indexing content.

Chris: Okay.

Dave: Does that sound good? Can we make that distinction?

Chris: It does.

Dave: Crawler, but--

Chris: You want all of it. Not some of it.

Dave: Technically, we do scrape the title. [Laughter]

Chris: Oh...

Dave: The title tag. Yeah. But other than that--

Chris: That's cool. I remember when we were setting up some CodePen stuff -- as time goes on, I feel more embarrassed by this -- it is entirely client-rendered, a lot of CodePen. Not absolutely all of it, but when we were Rails, it was server-side rendered.

As we moved to React, we really went all in on client-side rendering because we're not using a framework for a lot of our Next stuff. It's just a page.

We load React. React loads and it does stuff like load Apollo client, and Apollo client connects to an API, and we pull data from the API and build the page. There's no Next.js or the like to do that stuff server-side and then spit out HTML.

Now, there's going to be.

Dave: Mm-hmm.

Chris: We have all that in development branches and stuff. But we're just neck deep in a lot of different stuff, and so some of the pages that I might want to crawl. Even our about page is just client-side rendered.

Dave: Yeah. Yeah.

Chris: If I pointed Luro at it... At one time, it just got nothing, essentially. The old empty div problem.

Then two days later you're like, "Oops. We fixed that."

Dave: Yeah.

Chris: It now gets all of it.

Dave: We'll invoice you for that later.

Chris: Yeah.

00:24:57

Dave: But yeah. We were using this crawler, and I really liked it. I'm kind of managing speed and accuracy versus just robustness, I guess.

Chris: Yeah.

Dave: The goal was to get it to return something in a minute or two. A pretty deep crawl or breadth, I think, we're breadth-first. But do a pretty breadthful crawl and then surface that as quickly as possible.

HTML is awesome. And if there was an argument -- I know there's currently HTML-JavaScript beef going on right now on the Internet -- HTML is very awesome for crawlability. We go through a site very fast with just a static HTML crawler.
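To show why static HTML is so cheap to crawl, here is a breadth-first crawler sketch using only Python's standard library. It is not Luro's code: the `fetch` callable is a hypothetical hook (it could wrap an HTTP client, or a dictionary of pages as in the test data), and `max_pages` is an assumed safety limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collect anchor hrefs and the <title> text from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl of one site. Returns {url: title} in visit order.

    fetch(url) must return the page's HTML as a string, or None on failure.
    """
    origin = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    index = {}
    while queue and len(index) < max_pages:
        url = queue.popleft()  # FIFO queue is what makes this breadth-first
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        index[url] = parser.title.strip()
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == origin and link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

Pages one click from the start URL are indexed before anything two clicks away, which is one way to surface the most prominent pages quickly.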

But CodePen is not alone. There are a lot of sites that do JavaScript or take a lot of JavaScript to render the page. And so, we ended up using Puppeteer, so we're scripting Puppeteer now with Crawlee. It's by Apify, I think.

Chris: Crawlee?

Dave: Crawlee with two E's. Yeah.

Chris: Yeah, because right away I'm like, "Okay. If you need to execute the page, you have to run a browser."

Dave: You have to run a browser.

Chris: You're basically in Puppeteer territory, or whatever the other one is.

Dave: Yeah. Playwright, I guess.

Chris: Playwright.

Dave: Could even do Firefox, I guess. But yeah, and then I think there's maybe a world where there's these not-so-standard browsers like Flow or whatever. But I don't think that really exists yet.

Chris: But it's a dramatic difference. If you don't need to execute the JavaScript, it's a fricken' cURL. You need nothing to crawl the page.

Dave: Nothing. Nothing.

Chris: It's so light and so fast. It's great. Then the second you're like, "Oh, sorry, this is client rendered," you're like, "Oh, okay. Now I need a gigabyte of dependencies to deal with that." [Laughter] That's crazy.
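One way a crawler might decide when the cheap path is enough is to check whether the raw HTML contains any visible text at all. The sketch below is a heuristic under assumptions, not anything described in the episode: the 50-character threshold and the list of skipped tags are arbitrary choices.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Count text characters inside <body>, ignoring script-like elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.chars = 0
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth:
            self.chars += len(data.strip())


def needs_browser(html, min_chars=50):
    """Guess whether a page is client-rendered (the 'empty div problem').

    Returns True when the static HTML has almost no visible body text,
    suggesting a headless browser is needed to see the real content.
    """
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chars < min_chars
```

A crawler could fetch every page the light way first and only start a headless browser for pages where `needs_browser` returns True.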

Dave: For sure. I mean quite literally. It's in that territory, but it's a better product.

I think Puppeteer scripting is actually pretty awesome compared to bespoke crawling service or something like that.

Chris: Yeah. It does feel good to just use kind of browser-ish APIs.

Dave: Yeah.

Chris: Like query selector or whatever to get what you want. That's kind of nice.

00:27:36

Dave: Yeah. Puppeteer is not going to be super far from a browser or even the JS console or something like that.

Chris: Mm-hmm.

Dave: Like $$ works for query selector all. You'd have to use Cheerio or something like that the other way.

Chris: Yeah. Oh, yeah. If you pull it in Node, you'd have to use some kind of--

Dave: DOM parser, JS DOM parser thing.

Chris: Yeah. That's the worst of both worlds. It's still a huge amount of dependency and it sucks. [Laughter]

Dave: Yeah, so anyway. But the tradeoff is it's slower. And so that actually spun out into a whole other thing where we were doing a synchronous task like hit the API, hit a function, boom, it comes back. Now we have to hit the API and queue a job. The job subscriber worker then runs, and it can take a minute, two minutes, three minutes, five minutes. It depends on how big your site is.
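The shift Dave describes, from "call a function and wait" to "queue a job and check back", can be sketched with an in-process queue. The `CrawlJobs` class and its method names are hypothetical, and a production setup would more likely use a separate queue service and worker processes than a thread.

```python
import queue
import threading
import uuid


class CrawlJobs:
    """Accept crawl requests immediately and run them on a background worker."""

    def __init__(self, crawl_fn):
        self._crawl_fn = crawl_fn  # the slow function, e.g. a browser-based crawl
        self._queue = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, url):
        """Queue a crawl and return a job id right away."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._status[job_id] = {"state": "queued", "result": None}
        self._queue.put((job_id, url))
        return job_id

    def status(self, job_id):
        """Return a copy of the job's current state for the caller to poll."""
        with self._lock:
            return dict(self._status[job_id])

    def wait_all(self):
        """Block until every queued job has finished (handy in tests)."""
        self._queue.join()

    def _worker(self):
        while True:
            job_id, url = self._queue.get()
            with self._lock:
                self._status[job_id]["state"] = "running"
            try:
                update = {"state": "done", "result": self._crawl_fn(url)}
            except Exception as exc:  # record the failure instead of killing the worker
                update = {"state": "failed", "result": str(exc)}
            with self._lock:
                self._status[job_id] = update
            self._queue.task_done()
```

The API handler only calls `submit` and returns the job id. The "come back and we'll have it for you" experience is then the UI polling `status` until the state is `done` or `failed`.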

It's maybe a better UX overall, just like, "Hey, come back and we'll have it for you." But it has been a whirlwind tour. Then the next frontier, which is sort of like where we've had problems literally this week is authentication.

Cool. I have an app, but it's behind a login. [Laughter] And so, it's like, "Oh, boy."

Chris: Well, but now that you've switched to a Puppeteer-like setup, it can't be that bad. Yeah? You can get it done?

00:29:04

Dave: We can do it. It's just there are a lot of ways that login happens. Is your login button on one screen? Is it on two screens? The page that you go to after login, is it an actual page or does it depend on a question mark redirect thing?

We did one client, a customer. The login goes to a page with no anchor links. It's just divs and buttons, an Angular app. Whoops.

Chris: Well, doesn't Puppeteer have scripting stuff and be like, "Hey, if you need to login, you write how to log in, in Puppeteer code. Give it to us. Put it in this text area, and we'll execute that"?

Dave: Yeah. Generally, from a security standpoint, you don't want to execute arbitrary code. That's usually...

Chris: Not familiar with that...

[Laughter]

Dave: CodePen maybe has some experience. You guys kind of went... just gated it and said, "Why not?" Yeah.

Chris: Yeah, but these are paying enterprise customers doing it to their own website.

Dave: Right. Right.

Chris: It's not that big of a deal.

Dave: No, and so anyway, we are... But it's been very interesting, and it's a challenge.

I think we were talking in the D-d-d-d-discord about crawling because you kind of wrote a crawler over the holidays.

Chris: I didn't write a crawler, but I had an idea that I wanted to crawl stuff - just to see what it was like. I'm familiar with the concept of crawling. It's come up many times in my life. But yeah, I was thinking of you because of this exact moment where you kind of switched from essentially a cURL. I don't know what you were using, but it was probably something like that.

Dave: Yeah. Yeah, it was basically just cURL. Like, cURL go, cURL go.

Chris: Right, and then this rendering thing. And I was like, so, okay. This is what happened to me. It's interesting because it had a big up and down. At the moment, it's a down, and I'm looking for inspiration.

Dave: Oh, no.

Chris: To salvage the idea, but almost like an emotional down. Okay, I'm done caring about this. I got nerd sniped. You know how fast a nerd snipe comes on? It comes off. You know? [Laughter]

Dave: Mm-hmm. Oh, yeah. Yeah.

00:31:20

Chris: The nerd snipe was my wife being like, "Oh, you know--" It was, I think, this Bluey. Y'all know Bluey, right? If you have kids, you know.

Dave: Yeah, I know Bluey. She's great.

Chris: [Laughter] I like it too.

Dave: Oh, yeah, Bluey. Whack-a-doo, Bluey.

Chris: [Laughter]

Dave: Oh, Bingo. What are you doing, Bingo? Oh, whizza-whizza.

Yeah. I don't know. We know Bluey.

Chris: Perfect. Yeah. They have a stage show, live in-person.

Dave: Oh, live!

Chris: I'm sure it's coming to Austin. Austin is a big city. It's coming to Portland, but we found out about it just in time for it to be sold out. You know?

Dave: Ah... nuts!

Chris: Yeah. Bummer. But you know how you find out about stuff like that is in the local rag.

Dave: Mm-hmm.

Chris: Or you're lucky enough to catch it online or something. You're a little bit late, but those things, they get their source probably right from the websites of who knows what. But the idea is that maybe if you knew what all these sources were, like, for example, the websites of the venues of these things, and you were just looking directly at them all the time, if a new thing was published, you could have a crawler that's like, "Ooh, they just published something on the Portland fancy venue place."

I want to know. Send me an email or whatever or just have a dashboard I can look at. Maybe get a daily email, digest, something that scrapes known websites that publish events and then has that information.
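The "tell me when something new is published" part reduces to remembering what has been seen and reporting the difference. A minimal sketch, assuming each scraped event is a dictionary with `venue`, `title`, and `date` keys (a made-up shape for illustration):

```python
def event_key(event):
    """Build a stable identity for an event so repeat scrapes match up."""
    return (event["venue"], event["title"].strip().lower(), event["date"])


def find_new_events(seen_keys, scraped):
    """Return events not seen before, and remember them in seen_keys."""
    new_events = []
    for event in scraped:
        key = event_key(event)
        if key not in seen_keys:
            seen_keys.add(key)
            new_events.append(event)
    return new_events


def format_digest(events):
    """Render a plain-text body for a daily email."""
    if not events:
        return "No new events today."
    lines = [f"{len(events)} new event(s):"]
    lines += [f"- {e['date']}: {e['title']} at {e['venue']}" for e in events]
    return "\n".join(lines)
```

In practice `seen_keys` would need to persist between runs (a file or a small database), since a scheduled function starts fresh each time.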

I was like, okay, this is an opportunity for me to pick all kinds of fancy stuff. I also had a friend who was like, "You know what we--" He has a shop. I won't say too many details. It's not illegal. It's just one of those weed-selling stores.

The idea is like wouldn't it be nice to have information about the prices of all this stuff. In that market, things are all over the place.

Dave: Sure. Okay.

Chris: It'd be really nice to know what's up, like who is selling what for what.

Dave: For sure.

Chris: That's a scraping job.

Dave: Yeah.

Chris: It's all publicly available. You go right to their website and be like, "Oh, this weed sells for this much." You could write a very advanced scraper for that.

I was always like, "Oh-" You know he'd talk to me, and I'd be like, "Ah, it's just a scraping job. You can outsource that. Meh..." You know?

Dave: Mm-hmm.

Chris: But I'm like, "Is it?" I'd like to experience a little bit of how hard really is this job of scraping. And so, my wife's idea of--

Dave: Bluey, a Bluey tracker.

Chris: Yeah. Yeah, right. Not just Bluey, but just other events in that category. I'm like, "Oh, I got this." So, I was like, "I'll just pick all new technologies here." I'll pick a scraper.

How would I write this thing as a cron job? Well, I'll just do the Netlify thing because they've got scheduled functions.

Dave: Yep.

Chris: They can run them in JavaScript, TypeScript, or Go. I'm like, "Well, why don't I write it in Go?" It's fun to write something... Because I write Go at CodePen all the time, but I rarely write it outside of the context of our own codebase, so maybe that'll be an excuse.

Dave: Yeah.

00:34:17

Chris: I'm like, "Oh, surely there's a Go scraper. Oh, sure. Here's one called Colly. I'll just use that."

It's like cURL, though.

Dave: Yeah.

Chris: I knew that going in, though. I made that choice upfront. I was like, "I don't actually care that much about this. If this becomes a real thing, I'm happy to switch scrapers or whatever."

Dave: Mm-hmm. Mm-hmm.

Chris: But it is an API that's very weird. There's no querySelectorAll in Colly. It's a very Go-like API.

It was nice enough, but there are six, seven, eight lines of code for each website I want to scrape that are so bespoke to scraping. It's that moment where you're like, "Oh, what do you call a UL with events in it?"

Dave: Right. Yeah.

Chris: You call it some weird Angular something. And that's all you got is the DOM. That's all you got. If you had to Nth child that thing, that's what you'd have to do.

Dave: Mm-hmm.

Chris: That requires somebody with just deep kind of understanding of how to query for things on the Web and know that it could just break at any time. These companies are not beholden to you in any way.

Dave: Yeah.

Chris: That is the most brittle code you'll ever write is some scraping of somebody else. You'd probably have to wake up and validate your scrapers every day if you were really serious about it.
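The daily validation idea can be sketched roughly as follows. This is a minimal illustration, not the code from the project: `SiteScraper`, `extractVenue`, and `Validate` are hypothetical names, and a regular expression stands in for Colly's selectors so the sketch needs only Go's standard library. Each scraper is tied to one site's markup, and the check flags any scraper that suddenly finds nothing.

```go
package main

import (
	"fmt"
	"regexp"
)

// Event is a hypothetical record for one scraped listing.
type Event struct {
	Title string
}

// SiteScraper pairs a site name with its bespoke extraction logic.
type SiteScraper struct {
	Name    string
	Extract func(html string) []Event
}

// titleRe is tied to one site's markup: it only matches <li class="event-title">.
// A regexp stands in for real CSS selectors here; it is not a proper HTML parser.
var titleRe = regexp.MustCompile(`<li class="event-title">([^<]+)</li>`)

// extractVenue pulls event titles out of one (imaginary) venue's page.
func extractVenue(html string) []Event {
	var events []Event
	for _, m := range titleRe.FindAllStringSubmatch(html, -1) {
		events = append(events, Event{Title: m[1]})
	}
	return events
}

// Validate runs every scraper against its page and returns the names of
// scrapers that found nothing, which usually means the markup changed.
func Validate(scrapers []SiteScraper, pages map[string]string) []string {
	var broken []string
	for _, s := range scrapers {
		if len(s.Extract(pages[s.Name])) == 0 {
			broken = append(broken, s.Name)
		}
	}
	return broken
}

func main() {
	scrapers := []SiteScraper{{Name: "venue", Extract: extractVenue}}
	before := map[string]string{"venue": `<ul><li class="event-title">Bluey Live</li></ul>`}
	after := map[string]string{"venue": `<ul><li class="evt__title">Bluey Live</li></ul>`}

	fmt.Println("broken before redesign:", Validate(scrapers, before))
	fmt.Println("broken after redesign:", Validate(scrapers, after))
}
```

A renamed class is enough to make the scraper return nothing, which is why a check like this would have to run on a schedule rather than once.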

Dave: Yeah, and that's something we had to do because we do a little CSS selector stuff, like, "What's your input?" You know?

Chris: Yeah.

Dave: But I had to do a lot of validation this week just around, like, oh, if there are two buttons named the same thing or two lists named "event list," which one? Which one do you mean?

Chris: Yeah. Well, even that.

Dave: It's hard.

Chris: Yeah. Yeah.

Dave: Yeah.

Chris: Well, a wrapper and its internal have the same class. Oh, great. [Laughter] Cool.

Dave: Yeah. Blah!

Chris: [Laughter] Anyway, so, okay. I'll write in Go. I'll use Colly. I did it. It's fine. It let me, you know, play with types and stuff, so I had to make an event type in Go and then pull all the information and append to the array and then have it crawl all the sites, make one big array of all the events, and then return it as JSON. Of course, Go has no problem with that. It's a very Web-friendly language in that way.

I just do it on the fly. Every time I hit this function, it does the cURLs. It crawls the stuff. It puts it all together - or whatever.
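As a rough sketch of that shape, assuming a hypothetical `Event` type (the field names here are guesses, not the ones from the project), each site's results are appended to one slice and the whole thing is encoded as JSON with the standard library. The crawling itself is left out; the per-site slices are hard-coded stand-ins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical shape for one scraped event; the field names are
// assumptions for illustration.
type Event struct {
	Title string `json:"title"`
	URL   string `json:"url"`
	Venue string `json:"venue"`
}

// CollectEvents appends each site's results to one combined slice.
func CollectEvents(perSite ...[]Event) []Event {
	all := []Event{} // non-nil so an empty result encodes as [] rather than null
	for _, events := range perSite {
		all = append(all, events...)
	}
	return all
}

// EventsJSON encodes the combined slice as a JSON array.
func EventsJSON(events []Event) (string, error) {
	b, err := json.Marshal(events)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Stand-ins for what two site-specific scrapers might return.
	venueA := []Event{{Title: "Bluey's Big Play", URL: "https://venue-a.example/bluey", Venue: "Venue A"}}
	venueB := []Event{{Title: "Kids Concert", URL: "https://venue-b.example/kids", Venue: "Venue B"}}

	out, err := EventsJSON(CollectEvents(venueA, venueB))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```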

I ship it to Netlify. There's the function. But of course, that's the least responsible thing you can do. You should not be crawling per URL hits.

Dave: ...request. Yeah.

Chris: Yeah, that's horrible. So, I'm like, ah, maybe an on-demand builder.

Dave: ODB, yeah.

Chris: That's the Netlify thing saying only run this function and then cache it and de-cache it sometimes. You can probably do it on a schedule. I don't know.

Dave: Yeah. Once a day or something. Yeah, you can set that up. Yeah.
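The general idea of "run it, cache it, and refresh it about once a day" can be sketched in plain Go. This is not Netlify's on-demand builder API; `Cache` and `NewDailyCache` are hypothetical names, and the cache lives in memory, which on a serverless platform may not survive between invocations. It only shows the difference between crawling per request and crawling per day.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Cache remembers the result of an expensive fetch (such as crawling several
// sites) and only repeats the fetch once the stored copy is older than ttl.
type Cache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetch   func() ([]string, error)
	data    []string
	fetched time.Time // zero until the first successful fetch
}

// NewDailyCache wraps fetch so that it runs at most about once every 24 hours.
func NewDailyCache(fetch func() ([]string, error)) *Cache {
	return &Cache{ttl: 24 * time.Hour, fetch: fetch}
}

// Get returns the cached data, refreshing it first if it is missing or stale.
func (c *Cache) Get() ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.fetched.IsZero() && time.Since(c.fetched) < c.ttl {
		return c.data, nil
	}
	data, err := c.fetch()
	if err != nil {
		return nil, err
	}
	c.data, c.fetched = data, time.Now()
	return c.data, nil
}

func main() {
	crawls := 0
	cache := NewDailyCache(func() ([]string, error) {
		crawls++ // stands in for the real crawl of every site
		return []string{"Bluey's Big Play"}, nil
	})
	for i := 0; i < 3; i++ {
		if _, err := cache.Get(); err != nil {
			panic(err)
		}
	}
	fmt.Println("requests served: 3, crawls performed:", crawls)
}
```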

Chris: Yeah. Unfortunately, Go feels like a little bit of a second-class citizen in Netlify land - a little bit. They don't have on-demand builders for Go.

Dave: Gah!

00:37:14

Chris: Blah! Nor do they have the internal scheduling of the things. You can do it via their Netlify toml file. Be like, "Run this once a day," or whatever.

But anyway, I'm doing it. But just the scheduling doesn't help in this case because the data isn't being stored anywhere.

I'm like, "Ah, maybe I'll spin up a PostgreSQL." We've talked about that enough times on this show. You're like, "Where would you put a little bit of data?"

Dave: Yep.

Chris: It comes up all the time on this show and in our Discord and stuff.

Dave: Yeah.

Chris: But I'm like, I don't know. I should know PostgreSQL better because we use that at CodePen and it's kind of like a very good choice these days - it feels like.

Dave: It's MySQL with muscles is how I'd describe it.

Chris: Ah, there you go.

Dave: You know? It's like, oh, I wish MySQL did JSON. It might now, but PostgreSQL was like, "Yeah, I do it and I'm really good at it."

Chris: Yeah. Yeah, it is good at it. It's got all kinds of cool little features. I needed none of these cool features. I just like the idea of, like, there's a table and it has columns of types. You put stuff in the columns.

Dave: Sure. Yeah.

Chris: Not that hard. So, I spin up a Supabase kind of for the first time ever that I've spun it up, which is all PostgreSQL all the time. Every Supabase has a PostgreSQL in it.

I didn't even have to type the SQL to make the table. They have a whole UI just for being like, "What columns do you want? What types are the columns? Hit the save button." Just that alone, I'm like, "Hell, yeah, Supabase."

Dave: Yeah.

Chris: Well done.

Dave: Supabase is very cool. I kind of wish it was around when I started or maturing when I started. But anyway--

Chris: Yeah, clutch. And they have little -- I don't know. What would you call it? An SDK, I guess, where it's like, "Just import Supabase from Supabase." Then if you want to update information and stuff in the database, you're not writing SQL statements to do that. You're using their little nice API for it.

Dave: Yeah. It's an ORM.

Chris: Yeah. Hell, yeah.

Dave: In the Active Record sense. It's just like, whatever, await post or Supabase.post. Then it's just like, pew.

Chris: Dot update this.

Dave: Yeah.

Chris: Great. Super cool.

Dave: Great.

00:39:20

Chris: And no bindings from Supabase to Go, but there's a userland. There's a userland one and it's fine. And it just mimics theirs identically, for the most part. So, just yank that in, in my Netlify Go function. And instead of just returning the JSON, I just plunk the data into Supabase. Then I change.

Then I have a second function in this case. And all that does is connect to Supabase and pull that data.

Dave: Mm-hmm.

Chris: I do it in TypeScript because I've been doing everything in TypeScript lately. So, welcome to the party, me. Meh.

Dave: Not good, huh?

Chris: Yeah. Yeah. [Laughter]

Dave: Glowing review.

Chris: I just can't even talk about it.

Dave: Okay.

Chris: Because it's just so... just... [groans]. There are very rare moments that I even like it at all, but anyway. Whatever. Now it exists. It's a thing. It's another very complicated language.

Dave: I'm probably heading that way in 2023, and I'm mad already about it. But it's fine.

Chris: [Laughter] I am mad about it.

Dave: Yeah.

Chris: Now that I know it, I'm trying to stop my brain from being like, "I know it, so I like it now." Once you know stuff, there's this tendency to like what you know. But I still have this lingering thought of, like, "Yes, but how many problems are actually about types?" So few. Bugs are usually about something else.

Dave: The dependency I was using, I think it was Crawlee was written in TypeScript. I clicked the function name, and it took me to the function definition in the file. I was like, "Okay. That's cool."

Chris: That's great. If you're writing something for other people, it should be in TypeScript because it just makes it so fricken' great to use. Yeah.

Now I've been writing Go for long enough, so the type thing doesn't really bug me because everything in Go is typed. But TypeScript almost feels like it has more robust types, like it can do a little more. But by virtue of that, it is more obnoxious.

Dave: Yeah.

Chris: [Groans] I have a million thoughts, but we won't go there. But I write it in TypeScript, so now I have types in TypeScript for my events and types in Go for my events.

Dave: Mm-hmm.

Chris: They sort of match, but not all the way. It was relatively satisfying. But then kept running into little bugs because I'm like, hmm...

Dave: Yeah. I was going to say, you didn't seem stoked on the project, but it sounds like you were able to do it. What's the--?

Chris: Yeah. It's like done-ish. But I had little data problems because I kind of want to use... I know so little about databases that it's hard. Like, here you go, Dave. You can just answer for me because I never got to the bottom of this.

Somebody in the Discord... I think Andrew had the good idea of using the URL to the event on the canonical website as the primary key, essentially. If you scrape again, because you wouldn't just scrape, put it in the database, scrape, put it in the database. You can't do that. You'll have duplicates all day. You need to know if it's a duplicate when you scrape the second time.

The idea is, well, I'll just find in the database based on the URL, and if that's a match, I'll just overwrite the data there on the new scraping, which is fine, so no duplicates.

Dave: Yeah.

Chris: But by default, every table in the world just has an ID, and it's a bigint - or whatever - that's the primary key. But would I make, would I actually make the primary key the URL instead?

00:42:47

Dave: Uh... I would not. But that's basically how Luro works. We have a URL and that's... Actually, I don't think it's unique. But if you... URL, the name means uniform resource locator, so its job is to locate resources. That's what it's for.

We kind of made the decision or the assumption that that would always be unique. Does that make sense? Then maybe instead of making that your primary key or whatever, you can just kind of say this is unique or we're just going to assume it's unique and sort of query off of that - or whatever - go do our fetches based on what we think is the unique URL.

Chris: Right.

Dave: Your events would have URLs, probably, like a deep link to the page or whatever. And so, you could almost always assume that that is unique to at least that event center.

Chris: You can also just say, in PostgreSQL, this column thing has to be unique.

Dave: Yeah.

Chris: And it will just panic if it's not, if you try to write a new one with the same value.

Dave: Yeah. Then when you have a unique, you can actually... I think Supabase will probably do this. I know Prisma does this. Once you have unique, you now have... You can be like, find by URL - or whatever. Then pass it, and it works just like an ID.
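The arrangement being described (keep the numeric ID as the primary key, mark the URL as unique, and look rows up by URL) can be sketched with an in-memory stand-in for the table. `Store`, `Insert`, `Upsert`, and `FindByURL` are hypothetical names, not Supabase or Prisma calls. In a real PostgreSQL table, a write that violates a unique constraint is rejected with an error, which the `Insert` method mimics.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical row: ID is the primary key, URL must be unique.
type Event struct {
	ID    int
	URL   string
	Title string
}

// ErrDuplicateURL mimics a unique-constraint violation.
var ErrDuplicateURL = errors.New("duplicate url")

// Store is an in-memory stand-in for a table with a unique index on URL.
type Store struct {
	nextID int
	byURL  map[string]*Event
}

// NewStore returns an empty store whose IDs start at 1.
func NewStore() *Store {
	return &Store{nextID: 1, byURL: map[string]*Event{}}
}

// Insert adds a new row and fails if the URL is already present.
func (s *Store) Insert(url, title string) (Event, error) {
	if _, exists := s.byURL[url]; exists {
		return Event{}, ErrDuplicateURL
	}
	e := &Event{ID: s.nextID, URL: url, Title: title}
	s.nextID++
	s.byURL[url] = e
	return *e, nil
}

// Upsert overwrites the row with a matching URL, or inserts a new one,
// so scraping the same event twice never creates a duplicate.
func (s *Store) Upsert(url, title string) Event {
	if e, ok := s.byURL[url]; ok {
		e.Title = title
		return *e
	}
	e, _ := s.Insert(url, title) // cannot fail: the URL was just checked
	return e
}

// FindByURL looks a row up by its unique URL, much like a lookup by ID.
func (s *Store) FindByURL(url string) (Event, bool) {
	e, ok := s.byURL[url]
	if !ok {
		return Event{}, false
	}
	return *e, true
}

// Len reports how many rows are stored.
func (s *Store) Len() int { return len(s.byURL) }

func main() {
	s := NewStore()
	s.Upsert("https://venue.example/bluey", "Bluey Live")
	s.Upsert("https://venue.example/bluey", "Bluey Live (new date)") // second scrape

	_, err := s.Insert("https://venue.example/bluey", "Bluey Live")
	fmt.Println("rows:", s.Len(), "insert error:", err)
}
```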

Chris: Yep. That part was working. But there were a couple of problems. One, if I did a find or whatever it was, like, "Hey, I'm looking for one single row."

Dave: Mm-hmm.

Chris: "Look for this by URL," it was just failing. It would always return nothing found. I was like, "Ugh! Is it escaping or something? Why is it not finding this exact string search in the database?" I could not figure it out. That was kind of a low moment.
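The cause of the failed lookup is never identified in the conversation, so the following is only a general precaution, not a diagnosis. When a URL is used as a lookup key, one common source of mismatches is that the stored and searched strings differ in small ways (a trailing slash, a fragment, host casing). A minimal sketch, assuming a hypothetical `NormalizeURL` helper applied both before writing and before searching:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// NormalizeURL returns a canonical form of raw for use as a lookup key:
// surrounding whitespace removed, lowercase host, no fragment, and no
// trailing slash on the path. Which differences to ignore is an assumption;
// a real project would pick rules that fit the sites it scrapes.
func NormalizeURL(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	u.Host = strings.ToLower(u.Host)
	u.Fragment = ""
	u.Path = strings.TrimSuffix(u.Path, "/")
	return u.String(), nil
}

func main() {
	stored, err := NormalizeURL("https://venue.example/events/1/")
	if err != nil {
		panic(err)
	}
	searched, err := NormalizeURL(" https://VENUE.example/events/1#tickets ")
	if err != nil {
		panic(err)
	}
	fmt.Println("keys match:", stored == searched)
}
```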

Dave: Oh...

Chris: But then I could search by title. If I had the title of the event, that would turn up just fine. I'm like, "Fine. I'll just use title for now as my uniqueitude search."

Dave: Hmm... Yeah.

Chris: Then I was starting to write. I'd clear the database and be like, "Let me just run a fresh scrape." Do it. The API I was using had some problem with writing or something. This was after a long Saturday where somehow I got the freedom to just code on nonsense all day, and I took advantage of it.

Now it's like 10:00 p.m., and I'm drooling, staring at the screen. I kind of left it at that. Then I was kind of like, "I don't know that I have any more motivation anymore to care about this." Then it's Sunday and I don't really want to work on it. Then Monday is back to work.

The thing is just kind of sitting there super close to functional. But then I think back on, well, it's this cURL scraper. Obviously, it has super big limitations that way. And it has this idea that there's all this code in there that's so specific to these few websites that have it, and I didn't really find a collection of amazing websites to scrape anyway.

Dave: Mm-hmm.

Chris: I found one, basically, and then I added a couple more just to prove out the concept. But I don't really care about the other two. I'm like, "How useful is it to scrape one website?" You're like, why don't you just go to that website? [Laughter]

Dave: Yeah. Yeah.

Chris: Between all those kind of things, my nerd sniping energy just disappeared. I was like, "Meh."

00:46:18

Dave: Yeah. I mean maybe it's not the answer you want, but I feel like that's an awesome way for the project to end. You had an idea. You prototyped it. You built it out. It's functioning. Then you have an idea of what it might take to go full.

Chris: Yeah.

Dave: A full experience. You're like, "Well, no. I'm out of energy," and that's great.

Chris: Yeah, I think you're kind of right. TypeScript, Next.js, and Next.js that would do the server-side props thing, so it would connect to Supabase, get the data, render the page as HTML on the first response.

Dave: Oh, I still have to try that.

Chris: So, to SSR, which is nice.

Dave: Yeah.

Chris: I used Open Props, Adam Argyle's thing, to style the whole page.

Dave: Yeah.

Chris: With just custom properties and shit, and then TypeScript. It was a real interesting collection of technologies. I'm like, "Wow! Apparently, I'm a full stack dev now," because that's about as full stack as it gets if you ask me.

Dave: That's full stack, my guy. Yeah. Nice!

Chris: [Laughter]

Dave: You did it. That's wonderful.

Chris: Yeah.

Dave: No more front of the front for you. You're full stack.

Chris: But you have a real product. That was just screwing around. Yeah. It was most impressive to me that you're like, "Oh, crawler? I'll just rewrite our entire crawler with this new thing in two seconds."

Dave: [Laughter] Well, yeah. I got help from Kyle Zinter, who is our dev on the team. But I think he did most of that rewrite, but we hit this... It happened...

Your problem happened for us at the right time where it was just like we were thinking, like, "Oh, we need another thing to go and--" whatever. I forget what it was exactly. It might have been the authenticated crawls just kind of happening all at the same time.

Chris: Oh, so you are doing authenticated crawls now.

Dave: We have the thing, the technology. But yeah, it was a little... We found out it busted. Either Puppeteer updated - and that's another whole deal is dependencies, right? But Puppeteer updated and crawls were just falling apart. We had to kind of redo some of the logic and make sure we handled any kind of race conditions.

It's weird. If you simulate a click on a button, you're actually clicking it in the headless browser. But you're waiting for something. You can get in a situation where the wait resolves before the click. Anyway, we're kind of fighting that right now. Anyway.

What's weird about authentication is just the myriad of ways that can exist is just massive.

Chris: Yeah. Yeah. Yeah.

Dave: It's just stupid.

[Laughter]

00:49:08

Dave: This is what I'm realizing, too. Maybe this is my big criticism. We look at design systems a lot, too, like Storybooks and all that stuff. We're not exactly replacing your Storybook. We think it's cool that you have one. We want you to have that, right?

How people set up their Storybooks and how people set up their Figmas and how people set up--

We talked design systems for ages, right? But how people are setting this stuff up is just wild west.

Chris: [Laughter]

Dave: It's like, "I invented it." You know? There's a human element in it. It's all centered around human things. It's like, "No, I need a story with a button that has too many characters in it. I need to have that." And so, that goes in the system.

Chris: Isn't it the story of all time, this opinionated versus unopinionated? You don't get the luxury of being opinionated because you're dealing with other people's stuff.

Dave: Yeah, so I have to kind of like ocean boil quite a bit. [Laughter] That's hard. That's hard to do, but I don't know.

I think we're the 2023 thing. Design systems are great. I think they're a little ripe for some disruption and having tokens in a W3C thing. I think that's big. It's just interesting.

The whole thing about design systems is reducing duplicated effort. Right? App one and app two inside your company shouldn't be making their own tabs.

Well, cool, but we are duplicating that effort across thousands of companies. It's wild. It's a lot of action is what I'm trying to say.

00:50:50

Chris: But now that you have the word thousands in your brain, you can know that you can always make decisions around that. If any one of them is like, raise his hand, "Bespoke thing, please," you can be like, "No." [Laughter]

Dave: Right. Right.

Chris: [Laughter]

Dave: It's hard when you're a small company.

Chris: Saying no sucks in that way.

Dave: Like very few companies right now. Yeah. Yeah, so we're trying to say yes as much as we can.

Chris: But a yes is $100,000. [Laughter]

Dave: [Laughter]

Chris: Or something. You know?

Dave: What happened was... Yeah. No. Yeses can be expensive.

Chris: Yeah, indeed. They can be... When Alex talks about it for CodePen, he's like, "Through the first years of our life, all we did was Twitter-driven development," he calls it.

Dave: Yeah.

Chris: Because people submitted one Tweet. Somebody would like, "You should do this," and we'd be like, "Okay."

Dave: Well, I'm sure.

Chris: And then we paid for it forever.

Dave: I'm sure that's my fault, too. You can invoice me.

[Laughter]

Chris: Then we grew up. Not that you've seen a lot of releases lately, but it's kind of cool to be working on this big, new release as adults. We have all these challenges ahead of us, but we can do it with, like, "I'd rather fricken' get this perfect." Perfect is not the right word but solid. Do it with all the right principles in mind and then ship it, or not ship it. Not that that's any risk, but I really feel seriously that putting out junk is not on the radar anymore because of the pain that it causes, real physical and emotional pain to ship a busted ass thing.

Dave: Nah...

Chris: And live under the burden and weight of that, no. No thanks.

00:52:30

Dave: There's a really good Shigeru Miyamoto (creator of Mario) quote that is, "Get the fundamentals solid first and then do with whatever time and ambition allow." You get it solid, and then you just keep iterating - whatever - spinning up new levels (in his context).

They figure out how to make Mario jumping feel really good. What an enemy fight, what an enemy stomp is. Okay, cool. Now we can build out the game.

Almost every Mario game has been late because I think they just realize, okay, our fundamentals are solid first. But then once they have it, then boom. What is it? Mario 64, 120-something levels.

Chris: Oh...

Dave: Just shitting out levels, Chris.

[Laughter]

Dave: You know?

Chris: Right.

Dave: Once you have the fundamentals in place, now you can do a lot of big stuff. That's the principle I try to abide in.

Chris: Nice. Nice. That's good. I love that. That stuff... The older I get, the more it appeals to me.

We had a principles discussion just on Monday at work. I won't steal Alex's, but he found one, or it was called The Principle of Least Surprise, that we tried to adhere to. It was cool.

Dave: Ooh... I want to know more.

Chris: I know.

Dave: It sounds like a CodePen Radio or blog post I'd love to hear.

Chris: Yeah. I hope so.

Dave: Awesome.

Chris: All right, man.

Dave: Let's wrap it up. Thank you, dear listener, for downloading this in your podcatcher of choice. Be sure to star, heart, favorite it up. That's how people find out about the show.

We have a Twitter. Sure. @ShopTalkShow.

Chris: Maybe we should pick an instance.

Dave: Maybe we'll find an instance. Yeah.

Then head over to patreon.com/shoptalkshow and join the D-d-d-d-discord. Popping off. Yeah.

Chris, do you got anything else you'd like to say?

Chris: Uh... ShopTalkShow.com.