DataTopics Unplugged: All Things Data, AI & Tech
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
#66 From Will Smith to Meta's MovieGen: How AI Video Got Real. Plus Claude 3.5’s “Computer Use” & Open Source Tools
Welcome to Datatopics Unplugged, where the tech world’s buzz meets laid-back banter. In each episode, we dive into the latest in AI, data science, and technology—perfect for your inner geek or curious mind. Pull up a seat, tune in, and join us for insights, laughs, and the occasional hot take on the digital world.
In this episode, we are joined by Vitale to discuss:
Meta’s video generation breakthrough: Explore Meta’s new “MovieGen” model family that generates hyper-realistic, 16-second video clips with reflections, consistent spatial details, and multi-frame coherence. Also discussed: OpenAI’s Sora, and a sneak peek at whether Meta might eventually open-source MovieGen.
For a look back, check out this classic AI-generated video of Will Smith eating spaghetti.
Anthropic’s Claude 3.5 updates: Meet Claude 3.5 and its “computer use” feature, letting it navigate your screen for you.
Easily fine-tune & train LLMs faster with Unsloth: Discover tools that simplify model fine-tuning and deployment, making it easier for small-scale developers to harness AI’s power. Don’t miss Georgi Gerganov’s GitHub contributions in this space, too.
Deno 2.0 release hype: With a splashy promo video, Deno’s JavaScript runtime enters the scene as a streamlined, secure alternative to Node.js.
All right, you have taste in a way that's meaningful to software people. Hello, I'm Bill Gates. I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong.
Speaker 2:I'm reminded, incidentally, of Rust here. Rust. This almost makes me happy that I didn't become a supermodel. Kubernetes. Well, I'm sorry guys, I don't know what's going on. Thank you for the opportunity to speak to you today about large neural networks.
Speaker 1:It's really an honor to be here. Rust. Data Topics.
Speaker 2:Welcome to the Data Topics podcast. Welcome to the Data Topics.
Speaker 1:Podcast. Hello and welcome to Data Topics Unplugged, your casual corner of the web where we discuss what's new in data every week, from movie generation to clicking bots, everything goes. Check us out on LinkedIn, Twitch, X, I don't know if we're still there. Feel free to leave a comment or a question, or send us something on datatopicsdatawithio. Today is the 25th of October of 2024. My name is Murilo, I'll be hosting you today. I am not joined by Bart, as he's away, so I'll just start with all this.
Speaker 2:But I am joined by a good friend, friend of the pod as well, Vitale. Thank you, thanks a lot. How are you? I'm actually very happy to be here. Thanks for inviting me. Always a pleasure to come and join you guys.
Speaker 1:Yes, yes, it's always a pleasure to have you, Vitale, so really happy that you're here with a lot of cool stuff to share. How have you been, actually? I feel like it's been a while. I feel like you had some updates, if you want to share.
Speaker 2:Yeah, it has been a while.
Speaker 1:Um, it's okay, it's a safe space. We're all friends here, it's okay.
Speaker 2:No, I don't know which kind of updates, but, uh, yeah, everything is okay. Everything is okay. I'll leave it at that, then. I'll leave it at that.
Speaker 1:But okay, what do we have for today? Maybe let's see here Meta MovieGen. What is Meta MovieGen? I guess it's not like a, it's like a meta topic in a way. What is this about?
Speaker 2:Yeah, lately I got really interested in generative AI models for video generation, because I think after, I don't know how many years, after GPT-3 and ChatGPT, it's already three years, more or less, maybe. It goes by so fast, it goes super fast, yeah. But now we are not super impressed anymore. If there is a new model from OpenAI that is 0.2% better than the previous one, we are like, okay, fair enough.
Speaker 1:Yeah, we almost expect it at this point. It's like a new iPhone, right? Like, yeah, okay, it's a bit better, it's a bit bigger, it's okay.
Speaker 2:Definitely, definitely. And until, I don't know, they release probably GPT-5 and it will be artificial general intelligence, we don't know yet. But so far the world of LLMs is, I don't know, less exciting than in the past, at least for me. Before it was way more exciting. Why do you think that? Because I think now models are reaching a level where it's almost impossible to distinguish the previous version from the new one. Maybe you can feel that it's a bit better than before in certain tasks. Sometimes it's the same, sometimes it's even worse. So you need to find the edge cases where it's really, wow, way better than before. While when we had, for example, GPT-3 compared to GPT-2, it was another planet.
Speaker 2:Yeah, yeah, and also the same I will say with GPT-4. GPT-4 was, uh, okay, fair, I can accept this. Of course, let's wait again for GPT-5, maybe it will be a game changer again.
Speaker 1:But, uh, do you think it's exactly like iPhones? Like, from one iPhone to the other you don't see it. But also, at the same time, to be fair, if you skip a few generations, you can notice a big difference between iPhones. And I don't know if it's the same between models, right? Like, if you skip a few generations, maybe you're going to be like, okay, I didn't see the incremental difference, but if you take a few steps back you can see it more clearly.
Speaker 2:No, definitely, definitely. Maybe it's just for us, because we are in the field, so we constantly see new things happening every week. So we see all the intermediate steps from one big thing to the other.
Speaker 1:Yeah.
Speaker 2:But I didn't say this because I want to discredit the work of OpenAI or others.
Speaker 1:Sam, if you're listening, you know, Vitale is talking smack here.
Speaker 2:Exactly. No, it's just because I think there are still fields where we see really impressive progress from a few months ago to now. For example, fields that are still, let's say, unexplored, or where we didn't have the real big thing yet. And video generation is one of these, because there are many companies working in the field, some commercial companies, some well-known companies like OpenAI. They announced their model, Sora, already a few months ago, but Meta is the first, let's say, big company investing a lot of money in the field and releasing, well, it's not released yet, the model, but they are already disclosing all the information around this model, the results, and also how good it is compared to the others, with some benchmarks. They are also releasing datasets to compare video generation models, to have uniform benchmarks. So I found it really interesting and I got excited about the topic. That's why I wanted to explore it a bit with you.
Speaker 1:So, real quick, Sora, the OpenAI model that you mentioned, I put it on the screen for people that are just listening. This is not as new, right? But I remember, well, it's still impressive, right? I guess it's like that fatigue effect, right? This is from, how long ago was this? Do you remember?
Speaker 1:I think it was, uh, spring, beginning of summer '24. So yeah, basically, on the page that I'm sharing, I actually haven't used it, but just a small recap: it's pretty impressive. There were a lot of rumors, let's say, that they were using game-engine-generated videos to help with creating data. It's pretty impressive in terms of, yeah, the physics, all these things, but there were a few glitches. Let's say that, like, I think this is what I'm sharing on the screen now, I think it was puppies or wolves appearing, or something, someone running backwards on the treadmill and stuff like that. But it was very, very cool. And the one that you're talking about now is from Meta, which is, what's the name of this model? MovieGen.
Speaker 2:MovieGen, yeah. It's actually a family of models. Um, I would like to share with you also another video. It's a comparison between what we used to do in 2023 and now, 2024. Because, I don't know if you remember the video where Will Smith was eating spaghetti? It was the first AI video generated with a powerful enough machine learning model. It was a bit disturbing, but you had the feeling of a person doing something, performing an activity, on the screen. And now, when you see the new models, there is like a wow effect, because we
Speaker 1:went from this to, oh my god, we are getting closer and closer. Yeah, I think in the beginning, hold on, I'm just taking the video here. I think in the beginning that was a bit of, uh, oh, wow. Yeah, if you kind of look, you can kind of see a person's hand here, like, whoa, this is crazy.
Speaker 1:And now I'm going to put up the video that you just shared. Let me just make sure there's no audio, and we'll put this all in the show notes for people that are just listening. So this shows, ah, 2023. So it's not even that old. And then you see that the face is, uh, yeah, looking back now, it's like crazy. Yeah, exactly, exactly, this is 2023.
Speaker 2:Yeah, it's not that long ago. And if you check, for example, if you go back to the Sora web page and you see their demo videos, they are crazy. Like, yeah, the first one, yeah, this one.
Speaker 1:So, what is insane? Can you describe what you were seeing, for people that are just listening?
Speaker 2:Yeah, basically it's a woman, a person walking around in a street that resembles Tokyo, I would say. Yeah, so full of lights and neon, and, uh, you can see all the lights reflected from the water on the ground.
Speaker 1:Yeah, so the scene is really dynamic. Yeah, and also the lady's wearing sunglasses, and now it's zooming in on her face and you can see the reflection in her sunglasses, and it looks like it reflects the actual street, right? Exactly, like the perspective and all these things, and it's pretty crazy. Yeah, the level of detail, it's impressive.
Speaker 2:It's insane, it's impressive. And because, for example, we now have a lot of machine learning models to generate images, and there, of course, images can be represented by a matrix. We can imagine it like a canvas, right, where pixels are what we actually see, so the colors on the screen, and it should have some spatial, let's say, uniformity, because you should preserve the context.
Speaker 2:If a person should be there, okay, you know that the eyes are here, then you have the nose, the mouth. So it's a sort of spatial representation of the knowledge of a person, and you only need context in space, because you simply see one frame of the scene, for example. While for video it's way more difficult, because you have the spatial context, the spatial uniformity, but also over time, because if you generate multiple frames, you need to keep, let's say, the same context across time. So, for example, the person there was walking in the street, turning around. If you had, for example, some lights in the background before, and the lights were reflecting somehow in the sunglasses, the model needs to remember that something was there. So you need to preserve this information not only in space, in pixel space, but also over time, and this is way more difficult and it's still an open challenge. But it seems that OpenAI and Meta are getting closer and closer to the solution for that.
Speaker 1:And is this paper, sorry, this model, or family of models, right, from Meta, is this open source? Because I know Llama is also a very, very good general model. I guess that is also open source, right? It actually became the go-to for a lot of people, right?
Speaker 2:So is this also open source? No, Llama 3.2, I think, is the best open source model now. This family of models is not open source. Ah, no. Hopefully it will be in the future. But what is crazy, compared to, for example, the release of Sora, is that now I have all the information about what they did: the training data, the training process, the model. So it's not open source, but it's at least open knowledge, while OpenAI, that basically should be open, is not open at all.
Speaker 2:If you go back to, for example, the Sora webpage, they have a report or technical blog or something where they say, okay, here we trained the model, we do stuff, the model does this, more or less. It's fine, it's magic. Just trust us, it works, it's cool. Exactly. So you don't know which training data they used.
Speaker 1:Did they?
Speaker 2:use YouTube videos? Maybe. Did they use 3D-generated videos, for example from game engines? We don't know.
Speaker 2:While here, if you go to download the paper, yes, these people from Meta created a 92-page report paper with a lot of technical details, exactly. And I think for the community, for the research community, it's really important to know what they are doing and how they are doing it. Not only to say, okay, maybe here you're doing something wrong, but at least to give back to academia, to all the other researchers working on the same topic and publishing results, intermediate results, that are leading us to this kind of model.
Speaker 1:Really cool indeed. And for people that are just listening, we'll put all the links in the show notes. There is the paper and there is also an accompanying blog post, so something that's more easily digestible for people that are not as into the research part. And there are very cool video snippets, like there's a baby hippo, and now it's a lady holding something with a parrot in Greece, but the videos are very, very, very high quality.
Speaker 2:Yeah, they are incredible. And we are discussing a model, but that's not exactly correct, because in reality they trained a family of models, MovieGen. The biggest one has 30 billion parameters. Nowadays, more or less, we compare everything to, I don't know, GPT models that have 175 billion parameters, so it's relatively small compared to large language models, but you still need a big, let's say, GPU cluster to train it. They also specified the GPU compute to train these kinds of models and also to run inference on them, so it probably won't run on your laptop, but it's really interesting.
Speaker 2:So they released three models. The biggest has 30 billion parameters, and then they used this as a sort of foundational model to create three other models specific for certain tasks and activities. One of the tasks is video generation, so text-to-video: you can describe something via natural language and then you get a video as output. But then they also created models for precise video editing. For example, you already have a video, if you put the page up again, the presentation page. For example, there is a guy running and then they describe, like, okay, now add, um, blue gadgets.
Speaker 1:I don't remember what. But is it this page?
Speaker 2:Yes, exactly, if you go down. Uh, I think, indeed, yeah, for example, you can personalize videos, for example.
Speaker 1:There you can say, add, uh, change the style or add elements to the page, and then it will edit your video, the input video, with your elements. And so basically you have a video and then, like that. So, for the people following again on the audio, there's precise video editing and they're just showing basically a whole bunch of examples, right? So, this one is a penguin on the ice, surrounded by ice. Basically, you can say, dress the penguin with a nice dress, and then it will understand what you mean and change the pixels, basically the pixel values, of just the part that needs to be changed, right? This is very, very impressive. So there's also a very big component of text understanding. Exactly.
Speaker 2:Actually, in the paper it's very well described, so I will invite the listeners to take a look. For an image model, it's easy to find captions for images already online, or maybe it's easy to create, for example, a caption for an image, and then you can train your model to take another similar caption, a short one, 10 words for example, and generate a consistent image. While for video it's way more difficult, because you need a lot of information to generate video captioning good enough to represent the same information you want to see on the screen in the video. Because the main idea behind all these models is that the concept of a penguin, for example, that we saw before.
Speaker 2:It doesn't matter how you represent it. You can describe it with words or via an image, but the concept is the same. So the model is trying to understand what a penguin is and what it means for a penguin to be walking on ice, for example. So you describe it with text, you describe the image, you create a representation of this information, and then the model can generate it back, maybe with some variations, something different. So, to do the same with videos, you need more textual information to let the model understand, okay, this group of frames, the meaning is this, so that you can describe it later. And to do that, they used Llama to enhance the video captions, for example.
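For listeners who want to picture the caption-enrichment idea, here's a hypothetical sketch: take a short human-style caption and ask an LLM to expand it into the kind of dense description a video model can be trained against. The call_llm function is a stand-in placeholder, not the Llama-based captioner described in the MovieGen paper.

```python
# Hypothetical sketch of caption enrichment; call_llm() is a placeholder stub,
# not Meta's actual pipeline.
def call_llm(prompt: str) -> str:
    # A real system would send the prompt to an LLM; here we return a canned reply.
    return ("A penguin waddles slowly across cracked sea ice, wings held out, "
            "low camera angle, overcast light, icebergs in the background.")

def enrich_caption(short_caption: str) -> str:
    # The enriched answer would be paired with the video clip as its training caption.
    prompt = (
        "Expand this video caption with details about subjects, motion, "
        f"camera framing, lighting and background: '{short_caption}'"
    )
    return call_llm(prompt)

print(enrich_caption("a penguin walking on ice"))
```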
Speaker 1:Okay, to be a bit more descriptive. So Llama already had some understanding of images, because I think I saw on the landing page for AI at Meta that they do talk about how Llama is multimodal. And multimodal meaning it's one model that can serve many purposes, right, different modalities, meaning text, video. Uh, yeah, I'm not sure about video, but text, images, etcetera, right? Exactly. Yeah, here, the big one goes up to 90 billion parameters. Um, this is very cool.
Speaker 2:So they created a pipeline, basically because, uh, the scientific idea, let's say the formula, the mathematical formula of what they want to optimize by training this model, is relatively simple to understand. But then the actual implementation is hard, and they needed to create some workarounds, some tricks, some, you know, handcrafted methods in order to train these massive models. And it's really interesting to read the paper, because you also see the challenges of these big organizations and what they did. They basically used a lot of resources.
Speaker 1:Yeah.
Speaker 2:I can imagine, To overcome these issues. For example, for me it was impressive to get to know the cluster they used to train the MovieGen models. They used, I think, 6,100 NVIDIA H100 GPUs, so the latest one.
Speaker 1:Okay.
Speaker 2:Each of them consumes 700 watts of power, so you can have a general idea of how many, let's say, millions of dollars this requires. Yeah, also the work of all these people managing the infrastructure, coding this, collecting the data. And now they are writing everything up nicely in a report paper, and I hope that they will release the model as well.
Speaker 1:Yeah, well, because, do they have any proprietary models so far?
Speaker 2:Not yet, not yet. But at this stage they are working together with, let's say, creatives, with artists. Oh, okay. In order to give these models to professionals, I would say. I'm not sure. Are they going to create some sort of product around that, maybe for, uh, you know, these people that need it for work? Or is it just that they want to test the model with experts and then release it to users?
Speaker 1:I still don't know. But then, if it does, it will be another product that Meta offers. Could be.
Speaker 2:Unfortunately I live in Europe, I couldn't try it. No, unfortunately? Thankfully, I live in Europe. I couldn't try Meta AI, their platform for, uh, all their generative AI models, because it's still blocked for people living in Europe. So I don't know, but it could be that it will be a feature there. Maybe, I don't know, maybe you will have X amount of minutes of video every day and you need to pay something to get more, I don't know, a bit like OpenAI does. It is really important to mention, just to finish, because this seems amazing and, uh, yeah, maybe tomorrow we can create a movie from start to finish or, why not, your whole podcast.
Speaker 2:AI will do the podcast, exactly. But, uh, there are still some limitations. For example, if you look, you will see some artifacts in the video, and also the maximum length at the moment is 16 seconds. So it's not super long, because it already takes a lot of resources to generate those 16 seconds of video. And it can also be really dangerous, because with one of their models, we have the precise editing, but you can also somehow add some context, some, let's say, conditions to the video generation model, and, for example, a condition can be your picture.
Speaker 2:So you can say, a man joining a podcast with my face, and it will probably generate a person with my face there.
Speaker 1:I see.
Speaker 2:And you can already imagine how dangerous this could be if they release it in the wild.
Speaker 1:I think also there's the elections in the US coming up soon. I feel like it's.
Speaker 2:Yeah, the deepfakes are already here, and somehow, especially for people that are not aware of these technologies, they are impossible to identify. So imagine if we have something even more powerful.
Speaker 1:So yeah, maybe also to, uh, illustrate what you're describing now. So this is, they call it, I mean, on the blog post they put it here as personalized videos, and the idea is that they put in a picture of a woman and then they basically change it to have her as a DJ in a pink jacket, spinning records and all these things. So, yeah, it looks very easy to manufacture, and I think if it's something that you see, it's more appealing, right? I think people have a higher tendency to believe something is real if there is a video of it, right? A realistic video.
Speaker 2:Exactly exactly.
Speaker 1:But do you think that's enough reason for Meta to be cautious and not open source?
Speaker 2:this. It's hard to say, because as an AI enthusiast, I would be like, no, no, no, please open source it as soon as possible, give people the opportunity to play around with it, improve it, so that we can go faster towards, you know, a very nice tool. We can do a lot of cool stuff. For example, I'm not a creative person, but sometimes, I don't know, I have some ideas. So for people like me, it would be important to quickly describe what I mean and then generate an image, text or audio, for example. But I also understand that there are people that can take advantage of this technology.
Speaker 1:So it's like that with everything in the world, I see. Yeah, yeah, so true. Curious to see, and maybe if they do release it, if they do open source it, I'll be very curious. Maybe you can take it for a spin and tell us what your experience was. Exactly.
Speaker 1:You can come back here. One other thing related to this. So we've talked a lot about Llama, so open source models, and there are other ones as well. One thing that I kind of see a bit more is Ollama. I don't know if you ever heard of Ollama, maybe I'll put it also here, and I never used it.
Speaker 1:So I'll start with that. But what I understood from Ollama is that it's basically a way to run models locally. What I heard as well is that it's from the same people that created Docker, so you have like a Modelfile, it's really analogous to Docker, right? So basically you have a Modelfile and then it makes it very easy for you to run models locally. Exactly as you mentioned, Llama models are really big, so it's not something you can probably just run on your machine, and that's where I came across this framework, I guess it's called Unsloth. The tagline is "easily fine-tune and train LLMs, get faster with Unsloth", and it has a super cute mascot, by the way, super cute.
Speaker 1:Super cute. And the idea, then, this is for people that want to fine-tune their models, right? So right below it says train your own ChatGPT in 24 hours instead of 30 days, and then I think it also quantizes the models and all these things, so faster inference. So, since you're fine-tuning on your data, you get better accuracy, less data usage. And it is also an open source project. So, going on the GitHub page, you can see here, fine-tune for free, Llama 3.2, 3 billion parameters. There's a notebook here, and they also claim that it's 2x faster with 60% less memory usage. Wow, right? So this is really like, you take one of these open source models, and since the models are open, the weights are open, you can start from there and fine-tune for the task that you want it to be optimized for.
Speaker 2:That's crazy.
Speaker 1:So I haven't tried this yet, but it does look cool. But yeah.
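For listeners who want to see roughly what that looks like in code, here is a minimal sketch of the fine-tuning setup, following Unsloth's documented quickstart pattern; the model id, sequence length and LoRA settings below are illustrative assumptions, not values from the episode.

```python
# Hedged sketch of an Unsloth fine-tuning setup; the exact model id and
# hyperparameters are assumptions for illustration.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # assumed model id on the Hub
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantized weights: less memory, cheaper fine-tuning
)

# Attach small LoRA adapters so only a fraction of the weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, training typically proceeds with a standard trainer
# (for example TRL's SFTTrainer) on your own dataset.
```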
Speaker 2:That's what I like, for example, about open source AI. Because it's true, nowadays, let's say, even universities or common people like us cannot train the model, right? It cannot be as good as what people at Meta and OpenAI are doing. But, uh, when they release it open source, there are crazy people in the world that will take it, study it, improve it, make it faster, more efficient, and release it again. Yeah.
Speaker 1:So I think it's very cool. Yeah, the way I also see it is like there's a bit of a pendulum. There's research that kind of pushes the boundary of what's possible, like the Meta models, and even though, well, I wouldn't say OpenAI because it's not open source. And then the pendulum swings back and says, okay, we know that this is possible, but people still cannot use it because of operational things, right? Maybe they don't have the hardware, maybe they don't have this, maybe they don't have that. And then this is an example, right, of how to make these things more accessible. Another popular one, from a while ago already, is llama.cpp.
Speaker 2:I think this was the first, uh actually, example of this.
Speaker 1:It was crazy at the time. Yes, maybe for the people that don't know what this is, do you want to give us a quick TLDR?
Speaker 2:Yeah, well, as I see it, Meta released Llama, they open sourced it, even the first model. Yeah, they have a nice repository, but, for example, you needed to clone it, have the right NVIDIA GPU, you need a lot of resources. Then there was this superstar programmer, I don't know how to define him. Yeah, this guy was like, no worries, I can take care of it, I got it, challenge accepted. Now this model will run on your, let's say, MacBook or laptop, whatever, or on a potato. So basically, they created a new project called llama.cpp, where they basically recreated the full API, rewritten in C++, to take advantage of low-level hardware instructions in order to speed up model inference, so making it accessible to more and more people.
Speaker 1:Yeah, I think so, the first project. I think this is the guy that did it. So this is the star programmer, Georgi Gerganov, I'm not sure, probably butchering his name. For the people following, we have his GitHub profile, and he also did whisper.cpp, which was something similar. Whisper was, I think, audio to text, right? So basically he just ported this to be able to run on your own hardware, basically, right?
Speaker 1:And yeah, I think he also did some other tricks, like, um, quantizing models, right? And as I understand it, quantization is like, you have a whole bunch of numbers, the numbers are floating point numbers, right, so it's like 3.01272256, whatever, but maybe you don't need all those digits after the comma, so you can just kind of make it shorter, and then you have faster inference and you use less memory as well. There was a whole project about it, and I think it was one of the first ones, like you said, to really make a lot of noise, because now you can run on your own hardware, you don't need anything special, you don't need cloud and all these things. And yeah, that's kind of how I see the pendulum going, right, which I think is really cool. I think it relates a lot to MLOps, and I don't have to tell you, you're an ambassador of MLflow, one of the leading tools in the MLOps space, right? But I think that also excites me, you know.
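To make the quantization idea mentioned here concrete, below is a small, self-contained sketch of the simplest variant, symmetric 8-bit quantization with a single scale factor. Real schemes like the ones used in llama.cpp are more elaborate (block-wise scales, mixed bit widths), but the trade-off is the same: a little precision for a lot less memory.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric 8-bit quantization: store int8 values plus one float scale."""
    scale = np.abs(weights).max() / 127.0          # map the largest weight to 127
    q = np.round(weights / scale).astype(np.int8)  # 4x smaller than float32
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights for use at inference time.
    return q.astype(np.float32) * scale

w = np.random.randn(4).astype(np.float32)
q, s = quantize_int8(w)
print(w)
print(dequantize(q, s))  # close, but not identical: precision traded for size and speed
```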
Speaker 2:Yeah.
Speaker 1:How we can make these things more practical, right? I think, yeah, maybe that's a good way to put it: research makes it possible, and then these initiatives, which I guess are a different branch of research, make it, what do they say, practical. So possible and practical.
Speaker 2:I like the pendulum analogy and also this sentence. Let's note it down, you're going to steal it. We can reuse it for something else.
Speaker 1:Yeah, indeed. Someone's going to be like, oh yeah, research makes it possible. It's like, I know what you listen to.
Speaker 2:But it's very cool, let's make t-shirts.
Speaker 1:Data topics unplugged.
Speaker 2:Exactly and in the back. Research makes it possible.
Speaker 1:So, yeah, this is only possible for open source models, right, things that we can actually see through end to end. But there are some models as well that are not open source but are super cool, and this week, actually, there was a model that was released: Claude 3.5, Sonnet or Haiku.
Speaker 2:Yeah, the full family, I think. But wasn't there already a Claude 3.5? Or no, Claude 3? Yes, Claude 3, so it wasn't 3.5. Yeah. Uh, what is this about? Because I think there is a war happening in the US between, let's say, these kinds of big companies that are releasing LLMs and software around LLMs, and it's funny, because every time there is a new release from OpenAI that claims to be better, Anthropic is like, I can do that too, yeah.
Speaker 2:Just wait another month. Yeah, so indeed, they released a new family of models, Haiku or Sonnet, I don't remember which is the largest one. Yeah, it's the Sonnet. Yeah, and they claim to be better than GPT-4o in all the usual benchmarks that we all know by now. Yeah, a lot better, right? A lot better, way better than Gemini. Gemini is like the brother that nobody cares about.
Speaker 2:It's like the ugly duckling, yeah, the black sheep of the family. It's like, it was almost embarrassing, like, get it together, bro. They put it in the benchmark just for, you know. Yeah, it's still Google, right? So maybe, just to, I think even if tomorrow they do something crazy, like the best model ever, people are like, yeah. But I also think, yeah, for Google, even when they released a model as well, the demo looked really cool.
Speaker 1:But then it turns out it was very, you know, it was just marketing, you know.
Speaker 2:So I think they lost a lot of credibility, right? Yeah, it's true.
Speaker 1:They're trying, but, uh, yeah. But so this model, Claude 3.5 Sonnet, that's the bigger one, then there's Haiku. From what they're showing, it's actually much better, right? So maybe an example: on a high school math competition benchmark, GPT-4o has 9.3% and Claude has 16%.
Speaker 2:Exactly.
Speaker 1:So it's not like a 1% or half a percent, it's a significant difference apparently, right? Actually, I don't remember where I saw, ah, yeah, for code. So zero-shot, 90% for GPT-4o and 93% for Claude. So apparently, according to this benchmark, if you're a developer like we are, this is the model that you should be using. So, are you using this already? I think you are. I tried it yesterday quickly.
Speaker 2:Yeah.
Speaker 1:Amazed or yeah, a new iPhone Next year I'll buy another one.
Speaker 2:It was good, but, like, a new iPhone, next year I'll buy it. It was good, but, uh, like, nothing special, because it's, I don't know, 0.2 better than the previous one, and you don't feel it, right? It's not like, yeah, exactly, the code it produced is 0.2 better.
Speaker 1:Now, 0.2 better, right? Like, yeah, you kind of feel it, I think. Yeah, okay. But, um, maybe for the people that are not familiar, so ChatGPT is from OpenAI and Claude is from Anthropic, which is a very big player in this space. But I want to be fair, and we have to say that in this comparison they didn't mention the o1 model from OpenAI. But do you think they should?
Speaker 2:They should reason a bit better.
Speaker 1:But do you think they should? No, because I feel like this doesn't do reasoning, right? Because o1, like, for example, for code completion, if you have VS Code, you're not going to have the o1 model. Yeah, that's also true. Right, like, you're typing and then halfway through typing it freezes and it's like, oh, let me think about that, you know, it takes like 10 seconds, and then it's like, did you want to say X? It's like, yeah, okay, you know. So I'm not sure if it would be a fair comparison. But it is true, I think, for some of these tasks, right, like a high school math competition, whatever, like these more analytical things where you probably could afford to wait a bit longer for a response, maybe you could also compare that.
Speaker 2:But, um, yeah, sorry, I didn't mean to interrupt you. No, I wanted to ask you, because I think the real juicy reveal was something else. Yes, I was going to say, this is not the only thing, right? So they have, um, computer use, for automating operations.
Speaker 1:Is this the actual name? Ah, yeah, "we're also introducing", and for the people just listening, I have the announcement post from Anthropic about these models, and in the second paragraph they have, "we're also introducing a groundbreaking new capability in public beta: computer use". And then they go on, right, I'm not going to read the whole thing, but computer use is what, basically? They are enabling the model to click through your computer, right? It has access to what's visible on your screen and then it can kind of send operations, you know, like, oh, click here, click there, type this, do this, do that, right? Which looks cool, but also weird. It's also weird, and it's also a bit like, I don't know, are you going to just trust it? You know, is it responsible use?
Speaker 2:But uh, I'm not sure what what this means for them right, like uh.
Speaker 1:It's like, trust me, bro. Um, maybe, while we're talking here, I have the page I'm going to put up, hopefully we won't get flagged for showing this. And I also saw a video from, uh, maybe I can actually show the Fireship one, they actually showed quick snippets. Oh yeah. So maybe, for the people following: it's like you have Google Sheets and then you can also ask, like, do this, do that. You can give it tasks and it will kind of figure it out, click through, create formulas, input text, and then come out with the spreadsheet as you want it. From the Fireship YouTube video, and for people that don't know Fireship, it's a very cool YouTuber that talks about, like, tech topics.
Speaker 1:I guess it's a lot of JavaScript, a lot of programming stuff, not as focused on data and AI, but he did cover this and he did try it. So one of the tasks that he gave to this model was to get the SVG for his YouTube channel. And the first thing it did was, on a landing page, it finds that there's a Firefox icon and it clicks on it, then it goes to the Fireship website, it sees the logo, it right-clicks the logo and it gets the SVG. So it looked very complete, it looked very cool.
Speaker 1:But he also says that a lot of the time it just crashes. Okay. And he also mentioned that, because, as I understand it, you have the input, which is like the screen, kind of, and then you have the model that kind of processes it, and then it takes an action, and then there's another screen, and so basically there's a loop, right, and it keeps iterating until either it crashes or it completes the task. So he also said that you spend a lot of money on this. Wow, okay. Like, a lot of money, that's what he was saying, right? Especially, I guess, if it doesn't find the answer, it just kind of keeps going. Then it just crashes.
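To visualize the loop being described, here is a hypothetical sketch of that observe-act cycle. The helper functions are placeholder stubs, not Anthropic's actual API; a real agent would send the screenshot to Claude's computer-use tool and translate its responses into mouse and keyboard events, and the step cap is exactly what keeps a confused model from looping (and billing) forever.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # e.g. "click", "type", "done"
    payload: dict

# Placeholder stubs so the sketch runs on its own; they stand in for
# screen capture, a model call, and OS-level input control.
def capture_screen() -> bytes:
    return b"<png bytes>"

def ask_model(goal: str, screenshot: bytes, history: list) -> Action:
    return Action("done", {})  # pretend the model finished immediately

def execute(action: Action) -> None:
    print("executing", action.kind)

MAX_STEPS = 20  # cap iterations so a stuck run doesn't spin indefinitely

def run_task(goal: str) -> bool:
    history: list[Action] = []
    for _ in range(MAX_STEPS):
        screenshot = capture_screen()                  # observe the environment
        action = ask_model(goal, screenshot, history)  # model proposes the next step
        if action.kind == "done":
            return True
        execute(action)                                # act, then observe again
        history.append(action)
    return False  # gave up: stuck, or the task was too ambitious

print(run_task("grab the SVG logo from the Fireship site"))
```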
Speaker 2:But how do you feel about this, Vitale? Pff, we were discussing this during lunch yesterday, and people were like, oh, this kind of technology will take our jobs. And I was like, finally.
Speaker 1:I was like damn, I was tired, I need a break anyways.
Speaker 2:No, but on a serious note, I think there are situations where it could be useful. I'm thinking of, for example, companies that sell the software that is called, like, robotic process automation, something like that. If you have repetitive tasks on your laptop, on your computer, and you don't have a proper way to automate them, maybe you could use a technology like this to automate a boring process over time. True. While for daily activities, for, you know, the normal usage of a laptop, I think it's useless. Also because, I don't know, maybe it's just my point of view, but I enjoy using the laptop myself. So if there is software doing stuff for me,
Speaker 2:okay, what should I do? I don't know, these are my first impressions, but of course, let's see how it goes. Because I remember last year, I think it was in December, there was a company releasing the Rabbit R1 that was supposed to automate everything, if you want to listen to Spotify and so on. It had some actions, the large action model, all these kinds of things. In the end it was a bit of a scam. So I'm curious to see if Anthropic is doing something different, like how to use these language models to understand the right action to take and execute it properly on the system. I think this can be useful to know.
Speaker 1:I'm curious for sure.
Speaker 1:Well, I also think it could be very powerful. I think, again, they're kind of pushing the boundaries a bit, right? By the way, one thing: if you're going to run this, I would recommend you run it in the Docker image, which is also something that is possible, right? So you don't have the thing having access to your laptop and whatever files, so you can restrict it a bit and safeguard it. That's what I would recommend. So I think, yeah, indeed, they're trying to push a bit towards this. Like, you have an environment, it's almost, I think, a bit like reinforcement learning as well, how you have an environment that reacts to a certain action, and then you have your agent that takes another action, and then you have to kind of reach your goal. Very analogous, right? It's not exactly like that.
Speaker 1:But I also wonder if this is a bit of a hacky way to do it, because, from what I saw, it really looked like it was a screenshot of your screen, right? That's the impression I got from the videos and the announcement and all these things. But I wonder if there will be a more proper way to do it. Because, for example, everything that you do on your UI, right, on your desktop, if you open a folder, you could do this in a terminal, for example, right? If you're navigating the web, you can also do it in headless mode, right? And I know for us it's not nice, but I would imagine that's a better...
Speaker 1:It's better for machines, I guess, right? Yeah. And I would imagine that even if you were to do these things, wouldn't it be better to kind of have something that doesn't use the UI? I don't know, for some reason I feel like people do this because that's how we reason and that's what was there, and it was just easier to kind of do that.
Speaker 2:But I wonder if the performance wouldn't be better for understanding, and maybe you wouldn't need as big a model to do these things, if you actually had a machine representation, yeah, I see what you mean, of this environment, you know. Yeah, um, yeah, I'm not sure.
Speaker 1:I'm not sure if the performance would be better or not. I mean, granted, there's probably a lot of work that you need to do, right, because now you need to understand, like, a headless web page and you need to do a bunch of stuff, but I'm not sure. Still cool, definitely cool, but I feel like it works.
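As an illustration of the "machine representation" alternative being discussed, here is a minimal sketch using Playwright to drive a headless browser through the page structure rather than through screenshots. It assumes Playwright and a Chromium build are installed (`pip install playwright` then `playwright install chromium`), and the URL and selector are illustrative.

```python
from playwright.sync_api import sync_playwright

# Minimal sketch: interact with a page via its DOM instead of pixels.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)   # no visible UI involved
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())                          # structured access to page metadata
    first_link = page.locator("a").first         # query elements directly
    print(first_link.inner_text())
    browser.close()
```

The trade-off raised in the conversation applies: this requires site-specific work up front, whereas a screenshot-driven agent can, in principle, point at anything on screen.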
Speaker 1:Yeah, yeah, curious to see as well if there's going to be any story in the news, you know, like, oh, this guy deleted a prod database because he was using computer use. It will happen, for sure. Who do you blame?
Speaker 2:yeah?
Speaker 1:Exactly. It's like, oh, it wasn't me, blame Anthropic. You know, they said it was a good bot, you know. Yeah, yeah, that's cool. But, I don't know, if AI is trying to exterminate humanity,
Speaker 2:I think this could be the first step. Right, yeah, right, it's like. Oh yeah, just give it, it's fine, it's cool bro, it's fine, just just.
Speaker 1:Just.
Speaker 2:Just, let me just, give me, let me just click. Exactly, give full access to your laptop. Got it, yeah, it's cool.
Speaker 1:Well, yeah, yeah, yeah, let's see.
Speaker 2:Let's see, in a few years we'll know. Exactly. But it's interesting to see how, in the end, we are not changing our human-to-machine interfaces, but we are trying to adapt AI to use them. Yeah, a bit.
Speaker 1:Yeah. Indeed, I'm not sure, yeah, for the Rabbit R1, for example, there are other AI devices that in the end just kind of look like an iPhone, and I'm wondering as well if we already optimized all there was to optimize in the interface. Like, I don't know what would be a better device, like an AI phone, kind of, right? Like the iPhone is already touchscreen, it's already any layout you want. So, yeah, not sure. I'm not sure how I'd make a bet, do you?
Speaker 2:think we will see a better device anytime soon.
Speaker 1:It's difficult, right? Like, uh, I don't know. It's like, I remember there was the Humane AI Pin, and I guess the only difference between that and my iPhone is that that one you could clip on your shirt and it just stays with you. But yeah, right, like, if you have an iPhone, it's like a hand-sized device that is touchscreen. So, whatever, you know, you want to have a button here, you can create a button here. You want to do that, then you do that. I'm not sure if there's a lot of wiggle room there to optimize, I don't know.
Speaker 2:I'm curious to see how the new Meta glasses will evolve. Yeah, the Ray-Bans, right? Like, um, maybe they can...
Speaker 1:They can add some AI to that. That is true, that is true, somehow, but I'm not sure. But I think it's also, yeah, because it's like a Ray-Ban, right? Yeah. So I also think it needs to be a bit more stylish, to be honest. What was the, so there was one picture, it was even a joke, I think it was, I don't remember, Saturday Night Live or something, where it was Mark Zuckerberg with the glasses and they made a joke that he looked like a Minion. Ah, yeah, yeah, you know. But they already look better than what we got before, like, yeah, the Google Glasses
Speaker 1:were weird. Or like, I'm glad that someone is doing it, but I wouldn't wear it in public yet. Like, exactly. So I put a picture on the screen, you know, it's still super bulky. I mean, it's cool, but I saw videos of people using the Vision Pro in the US, I think it was New York, and they're just walking with that on the street, you know, and it's a bit weird. It's a bit like, uh, I don't know, Black Mirror, you know, it's a bit strange. But also their commercial was weird, because, uh, sometimes I think commercials for AI are super weird.
Speaker 2:You know it's a bit strange, but also their commercial was weird because, uh, sometimes I think commercials for ai are super weird. Like, uh, with this, uh, apple device, the guy was taking pictures of his daughter during a birthday party and, okay, it's cool that you can take pictures with your VR device, but your daughter is looking at you and you have a weird thing in your face. Or another weird commercial was when they announced Apple, where, when they announced the apple intelligence, so one guy was walking around he met another person. The person had the dog, so the guy, instead of asking to the person, for example, oh murillo, this is a cute dog, what is, uh, you know?
Speaker 2:I want to know all the characteristics of it. He took a picture, he asked Apple Intelligence, like, in front of the dog owner. It was like, okay, you can simply ask the guy, right?
Speaker 1:so it was like for the extreme introverts right like the guy.
Speaker 2:It's funny, because if there is a follow-up to the commercial, it's like, yeah, you can pet the dog, it's fine, he won't kill you.
Speaker 1:It's okay, I can talk too. Exactly. Maybe one of the last topics, then, before we call it a pod. You mentioned these weird AI advertisements. I did talk about Deno a while back, which is, let's call it, a Node.js 2.0 from the same creator, and Deno 2 was making some noise, right? There was a lot of expectation, and they actually did release it. But not only did they release it, they released it with a marketing video, which I'm putting here for the people following.
Speaker 1:I'm putting this on the screen right now with no audio, but it's basically the actual creator of Deno, who is also the creator of Node, and he just kind of talks about how Deno is supposed to uncomplicate JavaScript.
Speaker 1:It should make things simpler with, like, uh, TypeScript, JavaScript, all these things. So the video is really, really funny. I think it was like the best framework marketing video kind of thing, right? Like, there are a lot of jokes, there's a lot of, yeah, like even the thing with Bun, you know, they take a little jab at Bun, which is a competitor, you know. And they have some, I don't even know if they're actual developers or just actors, right, but they just kind of show everything. They also talk about how Node is not safe, and then, as he's saying that, a guy puts a USB in his laptop and some people come and, like, rip his shirt off and steal everything. So I thought it was really cool, and I think tech would be way more popular if all the marketing videos for releases of new versions were like this.
Speaker 1:So it was amazing it was really cool, I think, especially if you're in the javascript space.
Speaker 2:I think, um, you probably get a lot of the stuff in the end. We are simple people, programmers in general, so if we hear jokes about, uh, what we do daily, yeah, it's funny for us.
Speaker 1:We are simple person, like programmers in general, so if we hear jokes about, uh, what we do daily, yeah, it's funny for us I think it was, yeah, even the style of the video and all these things Like there's even one like what was it In the end? Some girl asks something and then it's like yeah, and it's like like super cringy, you know, it's just like smiling yeah, exactly, yeah, this one, right now, everything.
Speaker 2:Yeah, there are even jokes in the background. In the background, right, because it's like Node.
Speaker 1:But then you sort the letters, and it becomes Deno, right?
Speaker 2:Also, I think they were doing something with a string in JavaScript there. Ah, and you need to, I don't remember if it's about using an external library or something, and Deno comes with a very powerful standard library, so every time you need to do something, it's already there, basically. And they were displaying in the background a weird line of JavaScript to do something super simple. Yeah, I saw that, it looks really cool. For me it worked. After that I was like, let me try this Deno framework.
Speaker 1:That's it. I'm a JavaScript developer now.
Speaker 2:I used it a bit for a Slack bot before.
Speaker 1:Ah, yeah, sure they are using this API.
Speaker 2:And it was not as good as it is now. You still needed to, when you needed, for example, an npm package, it was not so straightforward. But now it's fully compatible. They don't use node_modules and all the rest. Also, the configuration is very simple, so I like it.
Speaker 1:One thing that I saw, and now, like, in the Python world uv is making all the noise, I also drew parallels in my mind. I know that Deno is also a runtime, so it's different. But one thing that I thought was cool is that from the JavaScript or TypeScript code they can create a binary that runs on different architectures. This is something that I would really like Python to have, so if someone from uv is listening, I would really like to leave a suggestion there. What do you mean?
Speaker 2:Give us, give the developers, more details.
Speaker 1:Yeah, so, for example, a binary is basically, like, ones and zeros, right, it's just an executable. So you just have it on your machine and you click and it runs. In Python, you don't have that. Basically, you still need to ship the code with the interpreter, right? So you send the instructions together with the thing that can read the instructions and execute them on your machine. But sometimes it's a bit of a pain. So if you have a CLI tool, and CLI is the command line interface, right, so it can be the AWS one, it can be Azure, it can be Google Cloud, a lot of these things are actually written in Python, right? So basically, whenever you're running a command in the terminal, there's actually Python code that gets executed, interpreted line by line.
Speaker 1:But if you change the default Python that you're running, you may have issues, which actually happened to me. I installed the latest Python and my gcloud CLI tool broke, because there were some standard library things that it didn't support. So basically, you always have to kind of keep track of these things, and then you have, like, pyenv to manage all these things, and you have pipx to run tools with a different Python version. So, basically, people figured out ways around this, but I think the simplest solution would be to just have a binary, right? Or, for example, if I want to have a simple desktop application and I want to send it to you, Vitale, and you, I think you have a different Mac than I do, I need to send Python, you need to have installed the same Python that I have, and this is for the same version but a different architecture, blah blah. And I think it would be much nicer if you could just distribute these applications as a binary.
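One existing workaround in the Python world, not a uv feature, is a bundler such as PyInstaller, which freezes a script plus an interpreter into a single executable. Here is a minimal, hedged sketch; it assumes PyInstaller is installed and that there is an app.py next to the script, and note that PyInstaller does not cross-compile, so you still build once per OS and architecture, which is exactly the gap being wished for here.

```python
# Hypothetical bundling step; assumes `pip install pyinstaller` and an app.py
# in the current directory.
import PyInstaller.__main__

PyInstaller.__main__.run([
    "app.py",
    "--onefile",         # produce a single self-contained executable in ./dist
    "--name", "mytool",  # illustrative name for the resulting binary
])
```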
Speaker 2:Okay for multiple platforms, like Multiple platforms.
Speaker 1:Yeah, like, you do uv compile whatever, which is what Deno does: win64, and, yeah, it can be Linux, it can be ARM, it can be this, it can be that, it can be Windows. If you had something like this, I think it would be really, really cool. Yeah, that's cool. I would be a happy person.
Speaker 2:And also today we discussed about UV.
Speaker 1:Yeah.
Speaker 2:You need your calendar like the video, yeah the company is.
Speaker 1:Let's see, there's a lot of stuff there. There were some other things, like lock files, that we won't have time to get into today. But yeah, for the Python world, I hope that things keep going in a better direction, because the packaging state today in Python is not great. Do you see in Python a similar situation as in the JavaScript world?
Speaker 2:Because in JavaScript that's the reality. Everybody wants to be like, okay, we will release this framework, whatever, to fix the problems of all the other frameworks, and then there is yet another framework, basically.
Speaker 1:So now we have.
Speaker 2:UV to be the package manager.
Speaker 1:Yeah, I think, yeah, but that's why I hope that at least one becomes the mainstream, right, and that it's a good one. Because I feel like Poetry was the mainstream for a while, but I'm not as happy with Poetry as I am with other tools, for example. But then, yeah, you don't want to have 30 tools to choose from, right? So let's see. If I have a new project, I will use uv.
Speaker 2:Also, let's say in a professional environment or just for yourself, Okay.
Speaker 1:As long as, yeah, if I have teammates that don't like uv for some reason, I will try to convince them, but that's the only thing. But I was using Rye, actually. I still see one benefit of using Rye over uv. But okay, everyone's going towards uv, I'll just go towards uv. Let's, you know, just be on the same page. We'll see.
Speaker 2:We're still in early days. We want someone from uv to come discuss it on this podcast.
Speaker 1:I'll work on that. But I think that's all we have time for today. Thanks, Vitale. Thanks a lot.
Speaker 2:Thanks for joining.
Speaker 1:It's always a pleasure to have you with us, and, yeah, maybe you can come back after you try Claude, click, click, click, or, no, what was it called? Was it Meta? Or both? Try both, both.
Speaker 2:If Claude doesn't do anything weird and send some emails to my boss so they fire me, I will come.
Speaker 1:You get yeah, you get called by the Secret Service.
Speaker 2:They arrest me. What did you try to do with the FBI server?
Speaker 1:All right, buddy, thanks a lot. Thank you, and thanks everyone for listening, for following. I'll see y'all next time. Bye. You have taste in a way that's meaningful to software people. Hello, I'm Bill Gates. You can put it there if you want. I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong.
Speaker 2:I'm reminded, incidentally, of Rust here. Rust. This almost makes me happy that I didn't become a supermodel. Kubernetes. Well, I'm sorry, guys, I don't know what's going on. Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here. Rust. Data Topics. Welcome to the Data Topics Podcast. Ciao.