Maggie Appleton talks with us about her work at Elicit, working with large and small language models, how humans vet the responses from AI, the discussion around the Shoggoth meme in AI, using Discord as UI, what to do if your boss wants AI in your app, and why she calls her blog a digital garden.
Time Jump Links
- 00:20 Welcome
- 01:29 Introducing Maggie Appleton
- 02:05 Working with language models
- 03:16 What is Elicit?
- 06:29 Humans vetting language model data
- 08:52 Do we still have to know what questions to ask?
- 11:13 The Shoggoth meme
- 17:21 Language Model Sketchbook and hating chatbots
- 23:03 Is fine tuning something you do once or each time?
- 29:08 Personal feelings in response to generative art
- 33:31 Using Discord as your UI
- 37:06 Environmental concerns around AI
- 40:19 Training on smaller data like a single website
- 45:54 How do you test AI models?
- 47:04 My boss wants AI - what do I do?
- 51:17 How does Elicit vet and process PDFs?
- 53:42 Why is it a digital garden instead of a blog?
Episode Sponsors 🧡
MANTRA: Just Build Websites!
Dave Rupert: Hey there, Shop-o-maniacs. You're listening to another episode of the ShopTalk Show. I'm Dave--a little bit sick and literal chainsaws in the backyard--Rupert. And with me is Chris Coyier. [Laughter] Hey, Chris.
Chris Coyier: [Laughter] Oh, I'm sorry, Dave.
Dave: Yeah, so I should say I'll probably be muted occasionally. Some hot mic mutes because I have people trimming my trees, and you know.
Dave: Just the way timing works. It had to be during the recording.
Chris: Yeah. I mean, for me, it's all I can do not to stand there and watch them. You know?
Dave: Oh, yeah. I am, but I'm also terrified because it's just these big branches coming down. [Laughter]
Chris: Oh, yeah.
Dave: I've already lost a lot of big branches, so this is very interesting. Anyway, that's me.
Chris: Out of sight, out of mind.
Dave: Chris, who do we got on the show today?
Chris: Yeah, we are joined in the virtual studios today by Maggie Appleton. Hey, Maggie. How are ya?
Maggie Appleton: Hi! Thank you for having me.
Chris: Yes! Wonderful. Long-time fan. Owner of a spectacular website. I assume that's how I know you in some way. You know?
Once in a while, drop a real banger blog post and it goes around like crazy. That's happened recently. You're getting involved and have been involved for a number of years now in kind of the larger world of AI, wouldn't you say?
Maggie: Yeah. I'm specifically in the language model world, I'll say. I think we're putting a lot of big things under this umbrella of AI. I'll say I know nothing about generative images or any of the very tricky moral issues of artist rights in generative images, which is also a good hot topic on Twitter today.
I am very firmly in the ChatGPT-esque world.
Maggie: Like words, just generating words. Not necessarily solving cancer with neural nets or any of that kind of stuff. Just the words bit. [Laughter]
Chris: Just the words bit. Okay, so when you say language model, did you omit "large" on purpose?
Chris: Do you feel like that's used a lot, LLM? I don't know.
Maggie: Yeah. The large versions are the most popular versions. Those are definitely the favorites. But you can have small language models that are trained on less data and kind of just do very small, specific things and aren't just big, giant reasoning engines, the way something like ChatGPT is. So, language model is maybe the larger category, and then large is one of the subsets.
Chris: Oh... See. God darn, I'm learning stuff already. This is great.
Maggie: Sorry. Too early.
Chris: No, it's good. It's good. And work is Ought?
Maggie: So, I work for a company that used to be called Ought but has just been renamed Elicit because we made one product that was called Elicit, and then Ought was the larger research lab that created this product. We decided to simplify things, stop confusing everyone, and just call ourselves Elicit, and so that's now the official name.
Chris: I get it. Everybody's favorite. You Basecamped, I guess. Remember that?
Maggie: Yeah, we did. Yeah, we did. [Laughter]
Chris: Cool. All right, so that's that. Maybe... Actually, can we start there a little bit, just so we know what it is and, thus, stuff that you think about? What is Elicit then?
Maggie: Absolutely. I joined Elicit (previously Ought) about a year ago, although I knew about them beforehand and was a user and fan. We use large language models (and small language models) to automate a process called literature review, which is something that researchers and academics and large organizations like governments, think tanks, and NGOs do, where if you want to run a scientific experiment or implement a policy in government, you first have to read all the scientific literature that exists on the topic that you're interested in.
If you're like, "Um... Should we go give iron supplements to every child in a country?" you first have to go read every single thing science has ever written on iron supplements, which can be tens of thousands of papers.
Maggie: The way this usually works is they pay a bunch of grad students pennies--although not pennies; maybe a couple of dollars an hour--to sit and read PDFs and extract a whole bunch of data into a really large spreadsheet for months on end. It's super boring, hard work that humans currently do. This is something that language models are actually really well suited for. They're very good at reading large volumes of text, extracting information for us, and summarizing it.
Our product does this. It just allows you to find papers, upload papers, and then do tons of data extraction over them specifically for the literature review process.
Chris: Interesting. Immediately, perhaps rightfully so, the mind goes, "Ooh... but don't they lie and stuff sometimes?" If we're talking about whether I should give iron to kids or not, shouldn't I maybe read that paper?
Maggie: Yes, that is a very, very valid concern. One reason that the lab picked this problem to work on is because it requires really high accuracy rates. The lab is really interested in AI safety and alignment in a broader sense, so one of the research goals is to figure out ways to make language models more truthful and more reliable. This was a kind of product-shaped problem that also allowed us to do a ton of research into figuring out ways to make the models more reliable.
A lot of our work has involved designing systems that get models to double-check their answers. It involves getting humans in the loop, so getting humans to double-check answers that the model has returned.
We have, at the moment, done a lot of refinement work over our infrastructure so that we're above 90% accuracy on most answers. But 90% is still not 100%, especially when you're dealing with science, medicine, and things you need to be super sure of.
As the interface designer, one of my jobs is to design interfaces that encourage, enable, and make it super easy for our users to go double-check every single answer in the results if they need to, if they need that level of--
Chris: Oh, really? Oh, wow.
Maggie: --yeah, scrutiny. So, we point them to the exact quote in the paper where the answer came from. They can easily go double-check it.
We're thinking of building systems that let them go through each paper one by one and mark it off as reviewed or not. It's very much being designed with human vetting and humans in the loop as part of this system.
Chris: Wow. Yeah. You're telling me a language model could link to a credible source. It's just that they generally don't.
Maggie: A language model alone cannot. But if you build a system that is larger than the language model, it can. We do a whole process called composition, where we do many small language model calls in combination with other kinds of programmatic functions and traditional programming, where we say, "Okay, read this paper. Find all the paragraphs that mention whatever question the user is asking. Check for specific sentences that might be relevant to them. Stack rank those sentences. Return those sentences to them."
Versus just asking for a generated answer, we're instead asking the models to do very different things like find the most relevant sentence and show that to the user, which is a totally different process.
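As a rough illustration of the composition idea Maggie describes, here's a toy Python sketch that chains small, checkable steps: find relevant paragraphs, collect candidate sentences, stack-rank them, and return the top quotes. The `relevance` keyword-overlap function is a deliberately dumb stand-in for a small language model call, and every name here is illustrative, not Elicit's actual API.

```python
import re

def words(text: str) -> set[str]:
    # Lowercased word set, ignoring punctuation and numbers.
    return set(re.findall(r"[a-z]+", text.lower()))

def relevance(text: str, question: str) -> int:
    # Stand-in for a model scoring step: count shared keywords.
    return len(words(text) & words(question))

def split_paragraphs(paper: str) -> list[str]:
    return [p.strip() for p in paper.split("\n\n") if p.strip()]

def find_supporting_sentences(paper: str, question: str, top_k: int = 2) -> list[str]:
    # Step 1: keep only paragraphs that mention the question's terms.
    paragraphs = [p for p in split_paragraphs(paper) if relevance(p, question) > 0]
    # Step 2: collect candidate sentences from those paragraphs.
    sentences = [s.strip() for p in paragraphs for s in p.split(".") if s.strip()]
    # Step 3: stack-rank the sentences and return the best few, so a user
    # can double-check the exact quote an answer would be grounded in.
    sentences.sort(key=lambda s: relevance(s, question), reverse=True)
    return sentences[:top_k]

paper = (
    "Iron supplements were given to 200 children.\n\n"
    "Children receiving iron supplements showed improved attention. "
    "The control group showed no change."
)
print(find_supporting_sentences(paper, "iron supplements for children"))
```

The point of structuring it this way is that each step returns something a human can inspect, instead of one opaque generated answer.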
Dave: Could I use this product to become the next Malcolm Gladwell?
Dave: He takes sociology papers like, "If you drink milk before breakfast, you are smarter during the day."
Chris: You're better at tennis--
Dave: Can I be like, "How to be better at tennis, the Dave Rupert Gladwell way"? Could I do that with your product?
Maggie: Hmm... Hmm... Uh... It's interesting.
Dave: Does the world need that is maybe the bigger question.
Maggie: Yeah. Also, how rigorous do we think Malcolm Gladwell's scientific research is?
I think you could probably do something with the tool that would, yeah, get you to corral a lot of the literature that you maybe wouldn't have manually done yourself and speed you up so that you could make maybe slightly wild, out-there claims, making sweeping statements about the world. I think that you could probably do that. Yeah. [Laughter]
Dave: Cool because I don't need to be good. I just need to be better than Malcolm Gladwell.
Maggie: Right. Don't need to vet it. [Laughter]
Dave: And that's maybe a 50/50 bar.
Maggie: Yeah, it's not a really high bar sometimes. [Laughter]
Chris: I'll take the bottom row at the airport bookshelf. That's fine.
Chris: I don't need to be up there.
Dave: That's fine. I just want to be in the airport, baby. That's good.
Chris: Just want to be in there.
Dave: Get me in there.
Chris: But you're asking for insight, though, which is great because that's what everybody is after. That is appealing to me in some part of me is like, could I give you a bunch of data and you give me actual insight?
We lived through years where everybody was like, "You know what? We need some analytics on this website. Google Analytics is free. Slap that on there," and it would be just vacuuming up data about who is visiting and what they're clicking on and how long they're there and all this stuff. But it never delivered. It never gave us any real insight.
Chris: Eventually, we had to learn. You had to have a question. You had to be like, "Is this page more popular than this page?" or "If I make this change, do people stay on the website longer? Do I even want them to stay on the website longer?"
I had to form those questions. Do you think we'll still have to be forming those questions, or do you think insight can be delivered to us?
Maggie: That's interesting. I think we still have to ask good questions. That's actually most of the skill of using language models well: what we might call prompting. Prompting involves many things, including telling the model it's a very clever, intelligent, attractive model, so it should answer you with a correct answer, which always improves the results.
Maggie: But also, it involves asking the correct question. But your example of getting insights about a website from a whole bunch of raw data, language models are actually poised to be quite useful at that exact thing. They're very good at taking a fuzzy data space and then being able to answer very specific questions about it.
It's probably still going to have hallucination problems. I mean, hallucination is built into language models almost as a first-class citizen. We will never get rid of hallucinations, but we can tame and control them with various techniques, like the kind we use at Elicit. They hallucinate less and less now that we've discovered more and more techniques to rein it in.
Maggie: I think something like asking for these very specific insights is totally a thing it would be able to do quite soon, or existing systems could already do reasonably well. If you just hired some random human to do it for you, I don't know that they would perform better than a model on this kind of task.
Chris: That's both very cool and scary.
Maggie: [Laughter] Yeah, a little bit.
Chris: I think this maybe made the Smashing... I wasn't at the conference. I didn't get to see the talk, but you wrote it up. You have a blog post of your latest presentation, all kinds of stuff in there that I'd never heard of before. Apparently, there is this mascot for language models.
Chris: Lovecraft. Yeah. "Shaw-gah," is that how you say it?
Maggie: Shaw-goth, I think that's how I pronounce it, but this might be a gif/jif sort of debate.
Chris: Yeah. You described it as an amoeba, a gray amoeba with eyeballs on it and stuff. Very creepy, sci-fi kind of looking thing. But the point is it's a metaphor to think of a language model as squishy, right? That thing clearly is squishy. Do I have that right? Yeah.
Chris: So, if I ask this squishy thing, I'm not going to get this deterministic answer like I might in a Jest test or something. Right? I put in this and get this. It's going to happen. It's a linear line through there. I'm going to get some squishy answer back.
I don't know why. I feel like that's related to something we were just talking about.
Chris: But I have now lost the thought. Yeah.
Maggie: Yeah. Yeah, the Shoggoth character came out of the AI safety Twitter space. There are a lot of people on Twitter who are very concerned about language models and generative AI, in general, sort of developing very, very advanced intelligence and reasoning capacities and possibly, in the future, plotting against humans secretly and then murdering us all in the night, as it were.
Maggie: Somewhere between 5 and 30 years from now is most people's timeline, which is quite short. One of the popular memes that came up out of this community was, yeah, this big, crazy, squishy creature with lots of eyeballs that's very scary.

The metaphor usually refers to the fact that we have trained large language models with so much data--we essentially scraped the entire Internet and fed it to these models--including all of 4chan and probably all of 8chan, and everything on Reddit, and lots of legitimate stuff like Wikipedia and books, but also all the dark, scary corners of the Internet. That's how they learned human language, so they probably have quite a scary, warped understanding of what humanity is.

We have now trained these models to be very polite when we talk to them in ChatGPT, but there's still this dark underlayer we don't know much about. That's supposed to be this big, scary monster that we try to tame and put very nice, happy faces on, but underneath it could be quite nefarious.
Chris: Hmm... Okay. I think that's how I was kind of trying to connect it: you can type something into a model, perhaps prompt it, and get something back. I don't know if you used the metaphor of a dial or something, but that's what I think of. You can have it all the way to the left, like full squish: "Just give me whatever. Just be weird. Write a poem. Answer a joke. Whatever." Or you can crank it the other way, which is just kind of enforcing more structure or combing the results or something. And they both have benefits. There's kind of a nice zone in the middle that's not full-blown weird or full-blown structured.
Maggie: Yeah. Yeah, so full-blown squish--the model just saying whatever it would naturally say--we don't even really have access to that version. That's what's called a base model, and that's something only large companies like OpenAI or Anthropic or Google have.
Maggie: They all have a base model that none of us have ever talked to, which probably is quite unpleasant to talk to, frankly, and very weird and crazy and might not even make much sense to us. But they've trained it very well (through reinforcement learning and through prompting) to only say things that seem smart, intelligent, and are in correct grammar and are useful to us.
Chris: Hmm... Okay.
Maggie: And so, they've already applied some structure for us. Then when we interact with something like ChatGPT, we can apply even more structure by saying, "Only print out the numbers one to ten." Most of the time, it's not going to print anything other than the numbers one to ten. We give it a very small boundary to sort of reply within, and so that's kind of turning the structure all the way up to the full structure side.
Or we could give it examples of only counting to ten over and over and over, and we could guarantee you it would never reply with anything but numbers one to ten.
Maggie: But sure, if you ask it something super open-ended or you just typed a bunch of really random words into the ChatGPT interface, you have no idea what it would give back, so that's a lot more squish.
Chris: Right. Sometimes that's good, right?
Chris: I don't know. Trust is all over the place on this, but one way to use it but not trust it in any way -- and I'm just saying it meaning just any model, I guess. I feel weird saying it that way.
Dave: Shoggoth, that's who you're talking about.
Maggie: Yeah. [Laughter]
Chris: Is that--? Even if I have no trust in it, I might still ask it stuff just because I want to be amused or bust me out of some writer's block or I've just got no ideas. Just help me out. There's no way I'm going to use what you give me verbatim, but it might help me out of a bind kind of a thing, and that can be awesome, right?
Chris: I don't trust you to give me anything useful, but you might help me get my creativity engine started.
Maggie: Yeah, that's my favorite way to interact with models, and I think the way the vast majority of us should be relating to them right now is rubber ducks and not sources of truth at all. They're genuinely, as you say, sounding partners, thinking partners.
I do a lot of... I also use Whisper, which is the voice-to-text transcription model, which is amazing. It's like the best voice-to-text transcription.
Maggie: And you can hook that up then to a language model generator. And you can talk to your computer and have it sort of talk back to you in text for, like, brainstorming out blog posts, that flow is really satisfying, and that's very rubber ducky.
Chris: Yeah. Don't listen to this, Tina. Tina, she's our transcription person.
Maggie: Oh... [Laughter]
Chris: You're good. [Laughter]
Chris: Okay. Oh, I love this. So, one of the ways that you came up on this show, this was a number of episodes ago, but you had a blog post that ended in "Why I Hate Chatbots," in a way.
Maggie: Hmm... [Laughter]
Chris: That was kind of like maybe that's the least common denominator of ways that you can interact with a model. There are other ways to do it. Right at the same time, I think Amelia Wattenberger had a similar kind of thought. It just seemed to be in the water.
I had never thought about it, but it was really beautiful. You showed off this demo of a UI that required no prompting. To me, it seemed like there were already words on the page. There's already a blog post or something. Then you had these colors off to the right that were like, "Take this sentence and change the tone of it, or give me different types of advice." I thought that was so clever because I'm like, "Yeah," not that I... I mean I might be there with you.
I hate having to write the prompt because I just feel under-skilled at it, and I don't know if I'm getting the best value out of it. I'm like, "Can't you do smart things without me having to learn this new language?" [Laughter]
Maggie: Yeah. Yeah. That example that you talked about; I was brainstorming the idea for what I called... Well, it depends on how you pronounce this. I say "day-mons," but apparently "dee-mons" is the correct way to say it. But it's spelt d-a-e.
Chris: Yeah. Mm-hmm.
Maggie: Daemons, and the idea is there are little characters that kind of live in your writing app, and you have assigned them characters like "play devil's advocate," "play cheerleader," "play copy editor who just cares about grammar," "play synthesizer to try to make my ideas more concise," "elaborator: try to expand on my ideas."
Maggie: Each of these little characters is reading your text as you're writing it. And if they see something that they can improve or that they can suggest upon (based on their character), they will suggest that you revise it. As you mentioned, there's no prompting required from you in this interface. You're just writing, and these characters are ambiently in the background. I'm very interested in how language models can be ambient supports in systems, rather than the only thing on the interface being you prompting this text box.
Maggie: It's like I want to be doing other things, and I want them to be subtly supporting me on very specific tasks and very specific helping actions without it being like I'm having to come up with a new prompt every single time because that's cognitive labor I don't want to do. I'm trying to do other things. [Laughter]
Chris: Right. We got used to that a little bit with GitHub Copilot because we're not constantly prompting it. It's just helping us wherever we happen to be, whatever we happen to be doing, which is so cool.
Do I have it right that what happens behind the scenes then is if you've selected some paragraph or something, and then you've clicked on "elaborate," what it does behind the scenes would be like take your paragraph, put some quotes around it, and then send in a preconstructed prompt that's like, "Take this paragraph and elaborate on it like a sophomore in college would"?
Chris: You're still using prompts. It's just however you've programmed it that they're hidden.
Maggie: Exactly. I think a lot of my snarkiness in saying I hate chatbots or being critical of them is that I think, at the moment, developers or designers who are building these language model products (or the early versions of them) are trying to put all the cognitive load onto the user to figure out what to do with their product.
Maggie: They're not being opinionated enough because, really, prompts are something that we as the creators should be writing and crafting and perfecting and testing and fine-tuning and making really, really good. They're just like code. The user shouldn't have to write the code saying what their app should do. They should just be handed the app, and it should be obvious what the app does, and the app should do that thing really well.
In this new kind of language model world where they're part of our build chain, we have to decide, first of all, what the app actually does. It's not just an open text box. We're going to be like, "Okay, this thing is going to help you write in a certain way, or it's going to help you synthesize ideas in a certain way."
We should hide the prompts from the users. I mean there's a little bit of a debate around this. We could definitely move into a world where users get used to sort of being able to edit different functions, and they could see the prompt and maybe edit it themselves.
Maggie: But the end user shouldn't have to be prompting experts. Then you're just asking them... You want to give them control and agency over how their tools work. But at the same time, are they really going to write a better prompt in the five minutes they spend thinking about it than if you and your team spent a week perfecting this prompt and training data on it and really making sure it does the thing well?
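The hidden-prompt pattern Chris and Maggie are describing can be sketched in a few lines: the user clicks a button, and the app wraps their selection in a prompt the team has written and tested. `PROMPTS` and `build_prompt` are hypothetical names for illustration; a real product would send the result to a model rather than print it.

```python
# Each UI action maps to a prompt the developers control. The user only
# ever sees the button labels, never this text.
PROMPTS = {
    "elaborate": "Expand on the following paragraph with concrete examples:",
    "devils_advocate": "Argue against the claims in the following paragraph:",
    "copy_edit": "Fix only grammar and spelling in the following paragraph:",
}

def build_prompt(action: str, selection: str) -> str:
    # Wrap the user's selected text in the preconstructed instruction.
    instruction = PROMPTS[action]
    return f'{instruction}\n\n"{selection}"'

print(build_prompt("elaborate", "Chatbots put too much load on users."))
```

This is why the prompt can be tested and refined like any other part of the codebase: it's a fixed template, not user input.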
Chris: Right. Right. Hmm... Does prompt seem like the only way you're going to be able to ever ask one of these language models a question? Prompting seems here to stay?
Maggie: Yeah, pretty much. There are a few other kind of ways we can influence them. I mentioned something called fine-tuning, which is when you give them lots of example data and you tell them, "Given this input, here is the ideal output." And so, you can kind of fine-tune the model to respond in the way you would like it to.
Maggie: You could feed in a bunch of poorly written essays and then also give it (in the training data) turning those into grammatically correct, well-written, intelligent essays. It would learn this is the kind of output that they want me to give. Then it'll perform better based on that data.
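A minimal sketch of what fine-tuning data like this can look like: pairs of an input and the ideal output, often serialized as JSONL with one example per line. The `prompt`/`completion` field names follow a common convention, but treat the exact schema as an assumption, since each fine-tuning API defines its own.

```python
import json

# (poorly written input, ideal corrected output) pairs, as in Maggie's
# essay-correction example.
examples = [
    ("me and him goed to the store", "He and I went to the store."),
    ("the results was very good", "The results were very good."),
]

# Serialize as JSONL: one JSON object per line.
lines = [
    json.dumps({"prompt": bad, "completion": good})
    for bad, good in examples
]
jsonl = "\n".join(lines)
print(jsonl)
```

The model never sees the file format per se; the training process just learns "given inputs shaped like this, produce outputs shaped like that."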
Dave: Is fine-tuning--? This came up. We had Swyx on the show a couple of weeks ago.
Maggie: Oh, yeah. I just saw him the other day. [Laughter]
Dave: Isn't he great? He's just a fantastic human. But fine-tuning, is that something I do once and then, ooh, I have my own baby model? Or is it something you do every prompt? You're like every prompt I'm sending this expectation back?
Maggie: It's something you don't do that often. It's not like you fine-tune every single time a user interacts with the system. It's something you might (like code) update every now and again. But mostly, you fine-tune it once. If it works, it works. Maybe later you get more training data and you fine-tune it more.
But more fine-tuning doesn't necessarily... It's like diminishing returns. It doesn't lead to infinitely better answers. At some point, you've given it enough examples that it just gets what you want.
Dave: I assume that's part... It sounds like you're doing more with Elicit, but if I were making--if Dave Rupert LLC was making--an AI, I would want to fine-tune it to whatever my business is doing. Or CodePen's AI would be kind of fine-tuned toward code suggestions or cool effects or something. Yeah? Kind of? CSS-Tricks or something baked in.
Maggie: Yeah, yeah, yeah.
Chris: I'll mention, just because I had experience with it this week, and it was an interesting UI that I guess did it both ways was the new Photoshop (beta). I don't know who exactly and how access is delivered, but I happen to have Creative Cloud installed on my machine, so I just opened that and clicked their little beta apps thing and downloaded it. It seemed to come down for free, and I just popped it open and used it.
One of the tasks I wanted to do was to stretch out an image, the classic. It was a picture of a hamburger, and it was too tightly cropped. I wanted to make it about twice as big on all sides. You specifically mentioned you're more into language models and not the image ones, but as far as interface design is, I thought this was interesting.
I stretched out the canvas, which was just nothing around the outside of the original image. And it has a generative fill button, and you can just click it, and it'll just do it. You don't have to say anything. It's just a button.
But right below the button is a little input, and you can prompt it if you want. It's like optional prompting.
Even after you've prompted it, then it gives you tips and stuff. It takes 30 seconds or so for it to do what it needs to do. Speaking of product design, it's taking that opportunity because it knows you're just staring at the bar. [Laughter] Like, "Hey!"
What I typed in was like, "Please just extend the background normally," or something, which is probably the worst prompt in the world because, what is "normally"? It probably has no idea what I mean. But I didn't know what else to type, so I typed that.
It was like, "You know maybe you should have just typed nothing." It gave me this little poke, like, "Not the best prompt." You know?
Maggie: User education in action. [Laughter]
Chris: Yeah. [Laughter] Yeah, it was. But I thought it was clever that you could go either way. And it got me thinking, "Oh, I like that." It's like a chatbot, but you don't have to use it.
Maggie: Yeah. I bet... Actually, see, I haven't seen much of what Adobe is doing, although I have heard... I used to be a huge... I was an illustrator. I worked in Adobe, like Adobe Illustrator and Adobe Photoshop just constantly.
Maggie: But I haven't seen the tools in so long just because I've moved into such a different space and way of working. But I hear they're doing really interesting things with UI for generative images and editing, giving people control, not just prompting into a text box to get an image, which seems so strange because that's clearly not the right medium to make an image. Describing the image in words is not how we make images.
I really have been meaning to go look at what they've been doing because I hear that it's actually quite sophisticated or they're genuinely putting sincere effort and time into developing fine-grained tools.
Chris: Yeah. It seems like it. The beta is pretty UI-forward with those tools. Probably on purpose, like, "We know why you downloaded the beta." You know?
Maggie: [Laughter] Right.
Chris: "We're going to be pretty clear about that."
Chris: Y'all should check out Maggie's Dribbble. You clearly know what you're doing in Illustrator. Geez. That's... Wow!
Maggie: It's been a while, though. That's all quite old at this point. I think I stopped... I still illustrate for my own essays and notes, but I stopped professionally illustrating like four or five years ago.
Maggie: It was really fun when I did it, and now it's strange to see all the generative image models come out. It's not like I go, "Oh, I'm glad I stopped," but I just go, "Oh, my God. My job would have been so much easier if I'd have had generative image stuff back then."
Maggie: I would have been so much faster even just generating reference images. I would spend so much time taking photos of my hand in certain positions or having to go get props and take photos of them for reference. I could have just prompted it in this day and age. [Laughter]
Chris: Unfortunately, I can scroll up and down your Dribbble and be like, "Yeah, I know that it's hand-crafted because you did it, but they look a little like they could have been generated these days." Our brains are already broken in that way.
Maggie: It was really interesting. The team had taken a bunch of my existing illustrations and used, I think, Stable Diffusion, one of the open-source image models, and tried to train it on my artwork to see if it could generate more, and it did. They sent me a bunch of the samples, and it's actually pretty decent. It was able to mimic my style pretty well.
Maggie: There were still artifacts. I could tell I didn't really make it.
Maggie: But I was impressed.
Dave: How does that make you feel? Are you, "That's the way the Web works," or are you like Mike Monteiro, "F-you. Pay me"?
Dave: How does that make you feel?
Maggie: I was thrilled. I think I'm in a very different position to a lot of the artists and illustrators who are really feeling emotionally negative about what's unfolding right now because it's not my livelihood anymore. And I mostly get excited about capabilities. I mostly go, "That is incredible that we managed to get a computer to train on these images and generate more." I was always more enamored with the ideas than I was rendering the images. I wanted the image to exist, but I didn't really want to spend 20 hours making it.
I use Midjourney a lot now just for fun, like a game, and it feels like the most addictive thing in the world. I love seeing the image just come to life and go, "Oh, my God. I didn't have to paint that for, like, hundreds of hours." That's so tantalizing.
Chris: That is satisfying.
Maggie: But I have a very different relationship to it.
Chris: When I was dragging out those images in Photoshop, I was like, this is such a weird thing that I'm doing. It's not something like, "Oh, this would have taken me 20 hours before." I wouldn't have done it at all.
Maggie: Right. [Laughter]
Chris: There's no way. I feel like the same in Midjourney when I'm typing in stuff to generate some images and it's just giving me a smile or something.
Chris: It's not that I would have hired an illustrator. It's just that I wouldn't have done it at all, I feel like.
Maggie: Yeah. I know. I'll try not to say too much because I'm so afraid of, like... Well, I shouldn't be too afraid of stepping on toes. I know it's a very hot topic right now and lots of people are quite concerned about livelihoods and theft and their work being taken.
Maggie: But having worked as an illustrator and having worked with so many companies where they wanted illustrations done but didn't really have the budget to get that many made, or really couldn't even hire me to do the work just because it would take so long, I'm kind of thrilled about what is about to be this explosion of way more people being able to make visuals and put them on the Web, use them to communicate, make their presentations look better, make them more engaging, be able to explain themselves visually. I think we've been very, very visually stunted as a culture because the Web was so text-first and the Web became the medium for everything. And so, the only way to communicate was always text.
Images on the Web are really finicky and actually not that fine-grained. SVGs, for instance... I know you're both SVG experts, but they're really hard to work with otherwise if you're not an expert.
It's just like it wasn't a Web-first medium, and I'm really quite excited that it's so much easier to make really high-quality images now for everyone. You don't have to train as an illustrator for years to now be able to make gorgeous, interesting, kind of evocative images and explore communicating ideas in images. That's a very controversial statement because morally these--
Chris: A bit. Yeah.
Maggie: Yeah. Yeah.
Chris: I get where you're coming from, though, especially if it changes the whole... It's like if there's half as much demand for images because these things exist, that's a problem. But if there are 10x more images on the Web, well then maybe that demand comes right back because it's changed the nature of things. But how could you guess? Who knows how it's going to pan out.
Is Midjourney--? I haven't used it in a while. Is it still Discord-based where you type the prompt into a Discord channel?
Maggie: Yeah. It's still very... It feels very startup and scrappy. They have plenty of funding and they're building out their Web app, but you are still prompting through Discord, which feels so... Yeah, it feels so early days or something. You're just like, "Oh, my gosh. This isn't even a real product." But the technology itself, the images it puts out are just incredible.
I have a bunch of favorite illustrators who are dead, so I'm not even stealing their livelihood. They're very much passed on, people from sort of the 1700s and 1800s. But I can prompt all kinds of things in their style and it gives me such a thrill. I'm just like, "Oh, my gosh. They would never illustrate this, but I just get to play around with their beautiful aesthetic and get it to come up with crazy things."
Chris: Oh, that's cool.
Maggie: It's like the best game to me.
Chris: But I wonder. Is that the thought you have when you use their Discord? It's kind of like, "Oh, this is... It's just early days for them. That's why they chose to do this."
I've heard that sentiment but cranked up even. Just like, "What is this, amateur hour? Get out of here with the Discord. What are you even doing? I thought this was a real company," kind of thing.
Chris: All the way cranked to the other side be like, "This is brilliant. It's meeting people where they are. It's a UI you get for free. It has a chat box built into it."
I'm a little closer to that side, like, I don't mind using it in this way. I think this is clever.
Dave: They spend zero dollars a month on their homepage.
Maggie: I'm on the side of being incredibly impressed with how they have used the affordances of Discord to make it possible to manipulate images. They have little up, down, left, right arrows to say, like, "Okay, render the image more to the right," or a zoom in-out emoji.
Chris: Oh, right.
Maggie: To zoom in, so you're clicking an emoji in these responses, and you're like, "Wow, this is a UI in a chatbot."
Maggie: I am very impressed with the creativity.
Chris: Yeah. A UI that we probably already use. That's where the ShopTalk Show communication happens and all that. I thought it was clever. Maybe it can't last forever.
Chris: You probably will reach a wider audience if you make an app, have a full-blown website, et cetera. But it's kind of cool.
I mostly wanted your take because you're an interaction designer. That's your main thing, right? Yeah.
Maggie: Yeah. Yeah. And I'll say the social aspect was... Now the Discord is quite overwhelming because goodness knows how many hundreds of thousands of people are in there just prompting all day.
Maggie: But I got into the beta back in February of last year when there were very few people in there, and it was the most fun, collaborative game because we would see what everyone else was prompting, and we would start to riff off each other. You would see what someone else was playing with and you would start playing with it. You would go back and forth, and it was the most addictive game. I didn't do anything else for like two weeks. [Laughter]
Chris: Yeah. It's educational.
Chris: To me, somehow my prompts end up so short. But all it takes is one stop in there to see nobody does short prompts.
Maggie: Uh-uh. Yeah, they're like books.
Chris: They're 40 words long. Yeah.
Chris: The longer the better, really.
Maggie: Yeah. Yeah.
Chris: Because it has more to work with.
Maggie: You just have to write gorgeous, beautiful, wonderful aesthetic, professional photography, octane render.
Chris: Oh, don't forget high res, or whatever.
Maggie: You just put every... high res, 4K, 8K. [Laughter] It is quite funny, but it works.
Dave: I think somebody from our Discord was taking quotes, like from science fiction books.
Dave: Like, "They landed on the planet Zargoth and it was steaming with lava," or whatever.
Dave: They said, "Sci-fi concept art. This quote that I highlighted from my Kindle," and it gave a pretty good one, like a pretty good image. And that's cool. [Laughter] I don't know.
Dave: They don't have the... No one has the time to... I guess, given infinite time, you have the time. But no one has a lot of time to come up with this stuff.
Chris: Yeah. Maybe that was your point, Maggie. If the Web is typography only or words forward medium books... [Laughter]
Chris: Talk about words forward.
Dave: Well, okay. So, not to sour the mood, but there are some ecological concerns. There was recently a study saying five Google Bard searches are like pouring a 500-milliliter bottle of water on the ground. So, I mean, does that impact any of your thinking around AI, like just the general energy used or anything like that?
Maggie: Mm-hmm. So, this is definitely a concern in the industry. It's funny. I've asked people who are much higher up, kind of been in this for a while, what their thoughts are on like the cost of using generative AI or developing it more, especially training the models. Enormous volumes of energy.
It's one thing for us to be prompting on small, everyday tasks. But it's totally something different to train something at the scale of GPT-4 or GPT-5 or other kinds of models.
The answer I've sometimes gotten is they're like, "Well, um, you know, once we develop the intelligence, the capabilities of models in the future, it'll just figure out how to solve a lot of energy problems for us, so this is a self-fulfilling system." [Laughter]
I was like, "Well, that's a bit of a gamble that we don't know is definitely going to happen, so that's a bet." [Laughter]
Dave: Yeah. That's like when my parents ask me, "How are you going to make money at this computer thing?" and it's like, "I'll invent a robot that prints money."
Maggie: Yeah, exactly.
Maggie: It's very much that.
Maggie: [Laughter] I think this is actually where the smaller models thing comes in. A lot of the really large models, it's really overkill to use GPT-4 for small tasks, like asking it to write you a little poem. You could probably run an open-source model like Llama locally on your laptop and do that same thing and not be sending a request to OpenAI who is running some huge server farm and using a huge model to do it.
It's more efficient to use smaller models, so there's a good chance that we'll just improve efficiency and learning what needs a big model and what needs a smaller model and save a lot of energy that way. But it's not a total solve. As more and more people use these large language model products, they just are more energy-intensive than normal compute.
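That routing idea, use the small, cheap model when the task is light and save the big one for when it's needed, can be sketched like this. The keyword heuristic and the model names here are illustrative assumptions, not anyone's real product logic:

```python
# Toy sketch: route a prompt to a small local model or a large hosted one
# based on a rough task-complexity heuristic. In practice the routing signal
# would be far more sophisticated than keyword matching.

SIMPLE_TASKS = ("poem", "summarize", "rewrite", "rename")

def pick_model(prompt: str) -> str:
    """Return a model tier: a local model for light tasks, a big one otherwise."""
    lowered = prompt.lower()
    if any(task in lowered for task in SIMPLE_TASKS):
        return "local-llama"   # cheap: runs on your own laptop
    return "hosted-gpt-4"      # expensive: a remote server farm

print(pick_model("Write me a little poem about trees"))
print(pick_model("Analyze these 40 clinical trial papers"))
```

The first prompt stays on the laptop; the second goes to the big model, which is exactly the energy-saving trade-off Maggie describes.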
Dave: Yeah because I feel like if the larger one is always better, which is what I hear... You know?
Dave: Whatever... 70 billion tokens, that's better, right?
Dave: My bosses are always going to be like, "I want the better one." You know? [Laughter]
Dave: Better one, faster one, or whatever. Yeah, it's an interesting problem. I don't believe this whole, "The computer will solve it." I don't believe that.
Dave: Because I feel like even if we have an advancement in GPUs or the model figures out how to train itself better, we're just going to do it more.
Dave: We're going to find ways to use that energy that we reallocated.
Maggie: It's like the highway problem, right? You have a highway that's always busy, so you add another lane, and then more people drive to work because the highway is bigger. The highway is still always clogged no matter how many lanes you add to it.
Dave: Yeah. Yeah.
Dave: It's exactly that experience.
Chris: If we could dwell on the smaller model thing for a minute, I think it seems juicy to me because somehow the biggest models, you're like, "Oh, those are only the biggest companies, and they're always going to get theirs, and that's a bummer. That's a rich-get-richer thing, and I'll never be able to have a company that can compete on that level," yadda-yadda. Right?
But it's like, "Ooh, if the tech actually gets better, and all I've got to do is feed it one website worth of stuff and get cool answers out of it, that's nice!" I tried to get this out of Swyx when we talked to him, and I feel like I just didn't quite get it.
I knew this one guy, this Luke Wroblewski dude that somehow trained the model just on his own website so that you could go ask it. Be like, "Hey, what would Luke say about the UX of login screens," or something, and it somehow would only answer stuff that seems like something that Luke would say.
I was like, "Well, that's interesting." Is it powered by his website and all of ChatGPT, or is it just you?
Maggie: Yeah, it's going to have been trained on maybe not as big as ChatGPT's model, but it's going to have some base training that taught it how to put together words and sentences.
Chris: Because it needs to know how to make a sentence and stuff, right?
Maggie: Yeah. Yeah.
Chris: Yeah. I see. I see, so a small model doesn't mean small like 50,000 words small. It still means millions.
Maggie: Yeah. But this is going to get into where I do the hand-wavy thing and insert some technical details here. We've been finding techniques to make the models more compressed or smaller, like we can make the final output size smaller and the amount of compute it takes to prompt that model smaller. I definitely cannot speak to the technical details of that. That's going to be some paper on arXiv that I don't totally have saved in my Zotero and that I definitely haven't read, which talks about how we managed to make the efficiency and size improvements. But I do know that's happening at least.
Chris: Nice. Yeah.
Dave: Middle out compression.
Chris: It'd be cool, like, "Give me one that can just talk good."
Maggie: Yeah, right.
Chris: And then I'll just layer my crap on top of that.
Maggie: Exactly. Yeah.
Chris: I did find it fascinating how it's already so advanced. It's just so neat. If I ask Luke's one, like, "What's the best kind of cotton candy?" or something, it'll totally just be like, "Nope. I don't know."
Chris: Which is what you want.
Maggie: Yeah, that sounds perfect.
Chris: Yeah. Right. You've had something to say about that before, too. You want less... I mean I guess the word is hallucination (you've already said) which just means a wrong answer pretty much.
Chris: You just don't want it to make up something. Yeah. I don't know. I don't know what I'm trying to say. Don't do that.
Maggie: Yeah. We've had a good amount of success figuring out architectures where, if the model doesn't know the answer or can't find the answer, it just says, "I don't know the answer." We can make models do that, not easily, but effectively, at this point. We know how to do that. That is at least one saving grace: we can say, "If you don't know the answer, don't lie."
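One common shape for that kind of architecture is to only answer when the system can ground the answer in retrieved material, and refuse otherwise. The tiny corpus, the word-overlap score, and the threshold below are toy assumptions standing in for real embeddings and real retrieval:

```python
# Toy sketch of a grounded-answer-or-refuse architecture: if nothing in the
# corpus matches the question well enough, say "I don't know" instead of
# generating a plausible-sounding guess.

CORPUS = {
    "login screens": "Social sign-in reduces friction on login screens.",
    "form labels": "Top-aligned labels scan fastest.",
}

def grounded_answer(question: str, threshold: int = 1) -> str:
    q_words = set(question.lower().split())
    best_key, best_score = None, 0
    for key in CORPUS:
        score = len(q_words & set(key.split()))
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < threshold:
        return "I don't know."
    return CORPUS[best_key]

print(grounded_answer("What do you think about login screens?"))
print(grounded_answer("What's the best kind of cotton candy?"))
```

The cotton candy question, like in Chris's Luke Wroblewski example, falls below the retrieval threshold, so the system refuses rather than hallucinating.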
Chris: That's nice, which is what you don't want in Midjourney or whatever. Like in Photoshop, if I stretch my image out and be like, "Can you extend the background of this building?" it better not just be like, "No."
Maggie: [Laughter] No, cannot.
Chris: "I'm not going to do that."
Dave: But in your squish talk -- which I love the term squish. That's awesome. Way to go. [Laughter] But you had this example where you're like, "What's one plus one?" and the OpenAI model is like, "Two." Yeah, one plus one equals two.
Then you're like, "Are you sure it isn't three?"
Dave: Then it does that thing it always does. It's like, "Oh, yeah. I'm so sorry, bud. Yeah, I was thinking of something else. Of course, it's three." Right?
Dave: Yeah. How do we get it to just be like, "Ah... I have no idea what I'm talking about"?
Dave: Or, like, "You really confused me on this one, Maggie. Great job."
Chris: Oh, boy.
Dave: How do we get to that point?
Maggie: Yeah. It's funny. Examples like that, if you ask any of the big models that question now, if you do the "What's one plus one?" it tells you, "Two." Then you say, "Wait, isn't it three?" They all now say, "No, it's still two."
Chris: Oh, okay.
Maggie: So, a lot of the time these examples get publicized, and then they ship a new version of the model that accommodates for these kinds of bugs. But they can't squish bugs forever, right? There will always be things where we can trick models in certain ways or do red teaming on them.
Yeah, but it's the way they're designed, right? They just predict the most likely next word (with a lot of caveats there that we've fine-tuned them on certain kinds of responses and certain kinds of ways of speaking), but that is what they're doing. And most of the time, if you're in a conversation with someone and you say, "Wait, isn't that the wrong answer?" then they go, "Oh, yeah. You're right. That is the wrong answer."
Chris: Oh, I see. They're trying to be a good dinner partner or whatever.
Maggie: Exactly. They're being polite.
Maggie: You've told them they need to be polite. You've told them to be not argumentative. We all learned that from Microsoft's Bing debacle. I don't know if you guys saw when that chatbot came out.
Dave: Tay, yeah.
Maggie: Tay or... oh, no. I think it was one of the early Bings. Anyway, it went very rogue. It would just say inappropriate things to users. It would fight with them. It would tell them they were bad users and that it was a good Bing.
Dave: Oh, yeah. Those.
Maggie: They hadn't prompted it well enough to say, "Don't be argumentative. Don't be mean. Don't accuse them of things." [Laughter] "Don't attack them."
Maggie: They now have put all those in, so it's a very polite, accommodating model now.
Dave: What methods do you have in place for testing this stuff? Even on your examples of summarize this?
Dave: Are you just checking if summary.length is less than textContent.length?
Dave: Or is it something more like... Is it just like, "Is it good?" Do you do any QA? Yeah?
Maggie: We call this evaluation. This is an enormous part of the work. We have just an evaluation team that is full-time on this.
Maggie: I think most language model product companies do, where we have humans constantly testing the models, manually going through its answers, and being like, "Good, bad, good, bad." Kind of ranking it: out of ten, how good was this answer? Why was it that good? Then we use that evaluation to fine-tune the next round of improvements or change the architecture to improve the output. This is just an enormous amount of the work. I don't know if I want to say 50% of the work that goes on at language model companies is evaluation and improvement of the architecture.
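The loop Maggie describes, humans rating answers out of ten and those ratings steering the next round of changes, reduces at its simplest to aggregating scores per model version. The version names and scores below are made up for illustration:

```python
# Toy sketch of the human-evaluation step: raters score each answer out of
# ten, and the per-version averages tell the team which changes helped.

from statistics import mean

ratings = {
    "model-v1": [6, 7, 5, 6],   # scores from human evaluators, out of 10
    "model-v2": [8, 7, 9, 8],
}

def best_version(ratings: dict) -> str:
    """Pick the version with the highest mean human rating."""
    return max(ratings, key=lambda v: mean(ratings[v]))

print(best_version(ratings))
```

In a real pipeline the ratings would also carry the "why was it that good?" annotations, which are what actually drive the next fine-tuning round.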
Dave: That's daunting for a small team.
Maggie: Yeah, a little bit.
Dave: That was kind of where I was going. My boss, "Hey, AI is hot, baby. We've got to put it in the app." You know?
Maggie: [Laughter] Yeah.
Dave: What's my next step? Do I just put a little chat widget or do some of this refining UI that you have? What should I be looking to do?
Maggie: Right. That's a very good question because I definitely have been DM'd by people who say this is literally what happened to them. They're just like, "Hey, our company is panicking about the AI thing. Should I put a chatbot in the corner?"
I've told them... It's not the easiest advice, but I'm like, "No, you should go find the best product manager at your company, and you should get them on your side. Go get them to help you convince the boss that language models are not any different to any other..." Okay, they're slightly different, but they're not fundamentally different to all of programming.
Computers can do impressive things, and we should think very carefully about what we want computers to do and why we want them to do it, and carefully design products in a strategic way based on what the market needs. So, your users are not asking for a chatbot. But they might have needs that could be solved by language models, and it probably isn't a chatbot conversation. It might be something like improved search on your docs. Maybe you could integrate embeddings with language models to improve your search. Maybe it could be some sort of ambient helper that directs you the right way during onboarding. But it's unlikely that it's a chatbot in the corner.
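The embeddings-for-search idea is worth unpacking: each doc gets a vector, the query gets a vector, and cosine similarity ranks the docs. The tiny hand-made vectors below stand in for a real embedding model's output:

```python
# Toy sketch of embedding-based doc search. A real system would get these
# vectors from an embedding model; here they are invented three-dimensional
# stand-ins so the ranking math is visible.

import math

DOC_VECTORS = {
    "install guide": [0.9, 0.1, 0.0],
    "api reference": [0.1, 0.9, 0.2],
    "billing faq":   [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def search(query_vec):
    """Return doc names ranked by similarity to the query vector."""
    return sorted(DOC_VECTORS, key=lambda d: cosine(query_vec, DOC_VECTORS[d]),
                  reverse=True)

print(search([0.85, 0.15, 0.05])[0])
```

Unlike keyword search, this matches on meaning: a query vector near the "install guide" region ranks that doc first even if no keywords overlap.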
Chris: Hmm... That. Yes. It's unlikely.
Chris: Do you hear that, everybody?
Chris: I'm hoping these models can start punting. It seems like the math one to me is like, can't you just somehow recognize that somebody is asking you some math and just punt on your large language model-ness for a minute and just ask the big math model? Yeah.
Dave: Go back to calculator.
Chris: Switch to calculator mode.
Maggie: Yeah. Thankfully, we can do this now.
Maggie: There's a new technique. Sometimes it's called prompt chaining -- we call it composition at Elicit -- that totally does this (or agents) where the model gets the request from the user and it does an observation step or a reflection step where it goes like, "Huh. What is the best way to answer this question?"
Then we give it a set of tools that it knows it has available like search the Web, calculate, run Python script, that kind of thing. It picks the best tool for the job. It's very good at doing that. And then uses that tool to help answer the question, so it has access to a bunch of external tools that make up for all its weaknesses. Actually, these systems work great. There's a lot of promise in these.
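The reflect-then-pick-a-tool step Maggie describes can be sketched as follows. In a real agent the language model itself does the choosing; here keyword rules fake that reflection step, and the tool set is a made-up miniature of the "search the Web, calculate" examples:

```python
# Toy sketch of prompt chaining / composition: look at the request, pick the
# best tool for the job, and let the tool answer, so the model's weaknesses
# (like arithmetic) are covered by external tools.

import re

def calculator(expr: str) -> str:
    # Only evaluate if the expression contains nothing but digits and
    # basic arithmetic characters.
    if re.fullmatch(r"[\d+\-*/(). ]+", expr):
        return str(eval(expr))
    return "cannot compute"

TOOLS = {
    "calculate": calculator,
    "search": lambda q: f"[web results for: {q}]",
}

def route(question: str) -> str:
    """Reflection step: does this look like math, or does it need search?"""
    math_part = re.search(r"\d[\d+\-*/(). ]*", question)
    if math_part and any(op in question for op in "+-*/"):
        return TOOLS["calculate"](math_part.group().strip())
    return TOOLS["search"](question)

print(route("What is 1 + 1?"))
print(route("Who wrote The Hobbit?"))
```

The math question gets dispatched to the calculator (Chris's "switch to calculator mode"), and everything else falls through to search.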
Chris: Yeah. It gave me hope when OpenAI announced their one that's like, "Listen. Our model stops at this date, but you can check this checkbox that enables the "go search the Web" mode - or whatever."
Chris: I'm like, "Yeah! Do that."
Chris: That sounds good. But yeah, math mode seems like one that could be all kinds of stuff; different, little models that are better at answering really specific things that seems like it would otherwise get wrong in a language model.
Well, that's cool. I like the reflection mode. That's nice. Good naming opportunities here in AI land.
Dave: Observe, reflect, capture.
Chris: It must be cool to be working somewhere that has a clearly good, useful, monetizable idea. Like this, drop 1,000 PDFs onto us and we'll give you obviously useful output. That's great. Instead of like the rest of us where we're like, "How do we get AI into this thing?"
Chris: Is there any money here or not? To have a really clear, cool idea to be working on is--
Maggie: Yeah, I can take no credit for that, but our founders are some of the most thoughtful, intelligent people I've ever met, which is why I joined their company. When they explained kind of the vision to me, I was like, "Wow! You are like 200 steps ahead of the rest of us, and I will get on your boat. Sure."
Dave: Man, this is maybe a whole other episode. We don't have time for this. But I do wonder: how do you do sourcing of materials? Not all PDFs are created equally. What if the first study on feeding lead to babies is from the 1800s and it's not OCR-able or whatever? Is that something y'all deal with?
Maggie: Yeah. The way we pull papers in is from a service called Semantic Scholar, which is a very popular, large database of academic papers. They do a lot of the heavy lifting of quality filtering. Then they have a great API that we just call.
Then we're able to do things like filter for citation count, reputable journals, journal score. We at least make those visible to the user and allow them to sort and filter by them.
Maggie: You know the old-school classic. We have a big table. You can do all of the standard things to it. No crazy chatbot kind of paradigms here. That kind of takes care of a lot of helping academics filter out low-quality papers or stuff that's not very valid.
Dave: Very interesting. Well, I'm looking forward to trying Elicit on my goal to become the next Malcolm Gladwell.
Chris: Oh, I like it. You should just train it on only Malcolm Gladwell.
Dave: Top five papers on getting good at tennis. Thank you.
Maggie: Yeah, you could totally go ask that. [Laughter]
Dave: Then it will give me that, and then I'll write a bookskie. Great! We're done.
Maggie: [Laughter] We haven't quite implemented that Web flow, but we will soon. [Laughter]
Chris: But, yeah, we managed to not even talk about websites at all, even though that's another specialty of Maggie's. You should check all that out at maggieappleton.com, including the incredible library section. She has a bookshelf like you, Dave.
Dave: Love bookshelves.
Chris: I don't know if you--
Dave: You've got a good bookshelf.
Maggie: Harry Potter websites.
Chris: Dave, you should maybe... This is a copyable idea, the anti-library, which is a section of... [Laughter] Not of books that you dislike but, as Maggie says, "Books I like the idea of having read."
Chris: Which is very nice.
Dave: Oh, that's good. I like that.
Chris: Not everybody has an anti-library but visit maggieappleton.com to enjoy that one.
We can't just pivot this whole thing into Web tech, but there is--
Dave: Digital gardens. You run a digital garden. You don't call it a blog. You don't call it a website. You call it a digital garden on purpose. That's wonderful. Why is it a digital garden? Is it just that sounded cooler? [Laughter] Makes you more money?
Maggie: It does sound cooler. Yeah. Apparently, it became a buzzword. I was maybe implicated in making it a buzzword but did not mean to.
The metaphor is you grow the content of your website slowly over time so you are not publishing finished, polished, fancy blog posts. You are putting up loose notes and, later on, you come back and clean them up, add to them, eventually turn them into a more respectable piece of writing. But it takes all the pressure off being perfect and shipping everything kind of neat and tidy in the first round. You just treat the whole thing like a slowly growing garden.
Dave: It's wonderful. It's one of the best sites, so I will give it that accolade for sure. Great job. We appreciate it.
Chris: Would not have pegged it as a Next.js MDX site. That was a surprise to me, looking at the....
Maggie: It is.
Chris: Cool. Cool. It is, indeed.
Maggie: Overengineered. I will not claim that I have built this site efficiently or with the correct technologies. I did it with the ones I knew how to use, which I really wish were something cool like Svelte. But it was sadly React and Next and super over-stuffed. [Laughter]
Chris: I think it's pretty cool.
Dave: Yeah. Hey, we can't always... [Laughter] Yeah. I've definitely got myself into a situation... unmaintainable situation with the old blog, so sometimes it's the best thing.
Dave: All right. Well, we should probably cap it here. Maggie, thank you so much for coming on the show. For people who aren't following you and giving you money, how can they do that?
Maggie: Oh, I'm not a money-taker, but I guess I'm just an attention-taker. Maggieappleton.com, I guess, is the best spot. I still am on what I'm going to call Twitter, but some other people call other things. My handle is @Mappletons. I still use it even though it's morally questionable at this point. But yeah, I'm always happy to chat to people or DM me if you have questions or things you want to chat about. No money needed.
Dave: Perfect. Well, thank you very much. Thank you, dear listener, for downloading this in your podcatcher of choice. Be sure to star, heart, favorite it up. That's how people find out about the show.
Follow us on X or Twitter or Mastodon. I have credentials again, so I can tweet there. Then head over to join the Discord, D-d-d-d-discord, patreon.com/shoptalkshow.
Chris, do you got anything else you'd like to say?
Chris: Hmm... ShopTalkShow.com.