DataTopics Unplugged: All Things Data, AI & Tech

Generative AI

Dataroots

Datatopics is a podcast presented by Kevin Missoorten to talk about the fuzzy and misunderstood concepts in the world of data, analytics, and AI and get to the bottom of things.

We are back with a new episode, exploring the fascinating and rapidly evolving field of generative AI. In this episode, your host Kevin is joined by three experts in the field: Murilo, Tim, and Vitale, to discuss the future of generative AI and its potential impact on our world. The conversation delves into the ethical considerations, the challenges that need to be overcome for it to reach its full potential, and the impact it will have on people's lives. Kevin and his guests also explore the exciting possibilities that generative AI presents, such as its potential to transform the way we work and live our lives. Join Kevin, Murilo, Tim, and Vitale for a thought-provoking discussion about the future of generative AI and its implications for society.

Datatopics is brought to you by Dataroots

SPEAKER_03:

Welcome to another episode of the Data Topics podcast, a forum where we explore concepts popping up in the world of data and try to see what they are all about. Today, we'll be discussing a topic that has been making big waves these last weeks: generative AI, with high-profile examples like ChatGPT, DALL·E, and, today I learned, Bard. Joining us today are Vitale, Murilo, and Tim. Welcome. Can you briefly introduce yourselves for our listeners? Hold up, before...

SPEAKER_01:

Before we carry on with this, can you explain what that is?

SPEAKER_03:

Yes, yes, yes. Okay, fair enough. So actually, this is what today's talk is about. It's a very specific generative model where you can upload basically one minute of yourself speaking, and then it can make you say whatever you want. So it's a voice-cloning thing. So I thought it was very... appropriate to, uh, open today's talk.

SPEAKER_00:

Very meta. Kevin is on another level, I know.

SPEAKER_03:

And that's your voice then? That's my voice. That's my cloned voice, yeah.

SPEAKER_00:

It sounded a bit like Lex Fridman's, you know? Like, I was like, oh god, that's not him, you know?

SPEAKER_05:

Okay, cool. I was going to say, you sound a bit more American, like you conditioned it on "podcast person" or something like that.

SPEAKER_03:

I was paying attention to my pronunciation. Okay. Sorry, Kevin.

SPEAKER_00:

I think Tim is trying to say that Kevin's English is not that good, that he has a thick accent and it didn't sound like him. We'll move on from this. We'll move on from this. But maybe I can already introduce myself since I'm on the mic. My name is Murilo. I'm a tech lead in the AI business unit at Dataroots.

SPEAKER_05:

I'm Tim. I'm a machine learning engineer

SPEAKER_00:

at Dataroots.

SPEAKER_02:

Yes, and I'm Vitale. I'm also a tech lead in the AI business unit at Dataroots, and I'm a machine learning engineer.

SPEAKER_03:

Awesome. But so, yeah, why today's episode, right? High-profile cases like ChatGPT have been making headlines these last couple of weeks, ever since it was released in December. So we could not not talk about it. But I wanted to explore further, because of course this is part of a larger family of generative AI models. So the idea is to take one step back and look at generative AI as a whole. And maybe that's also the place where we can start: generative AI, what is it, according to you guys?

SPEAKER_02:

Yes. So we have, let's say, a standard definition. Usually in machine learning we distinguish between, for example, supervised and unsupervised learning, then classification, regression, reinforcement learning, and so forth. But at a higher level, we have discriminative models: models able to classify the information you bring to them. So, for example, an image classification model distinguishes between cats and dogs, a very common example. Then we have generative models, hence the term generative AI. These kinds of models are able to create new, potentially original data after being trained on a large dataset with a specific, let's say, scope, a specific meaning. For example, we saw a lot of new generative models in 2022: GPT and GPT-3 to generate text, DALL·E to generate images. Now we have ChatGPT, which is very hot at the moment, so we will see more and more of these examples in the future. And for me, I think the main idea, and I read a few blog articles from OpenAI, which is a pioneer in this area, is to train models able to understand the world we live in, in order to generate new samples that are coherent with that world. So images which are coherent with our reality: not random pixels or almost-correct images, but realistic images, for example. I think it's a very nice idea, a very nice intuition.
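The discriminative-versus-generative distinction Vitale describes can be illustrated with a deliberately tiny sketch, nothing like a real generative model: fit one Gaussian per class, use the fitted densities to discriminate (classify), and reuse the same parameters to generate new, never-seen samples.

```python
import math
import random
import statistics

# Toy 1-D feature (say, animal weight in kg) for two classes.
cats = [3.6, 4.1, 3.9, 4.4, 3.8]
dogs = [9.8, 11.2, 10.5, 12.0, 10.9]

def fit_gaussian(xs):
    """Fit a class-conditional Gaussian: a tiny 'model of the world' per class."""
    return statistics.mean(xs), statistics.stdev(xs)

cat_mu, cat_sd = fit_gaussian(cats)
dog_mu, dog_sd = fit_gaussian(dogs)

def log_density(x, mu, sd):
    # Log of the Gaussian density, up to a shared constant.
    return -((x - mu) ** 2) / (2 * sd ** 2) - math.log(sd)

def classify(x):
    """Discriminative use: pick the class whose distribution explains x best."""
    if log_density(x, cat_mu, cat_sd) > log_density(x, dog_mu, dog_sd):
        return "cat"
    return "dog"

def generate(cls, rng):
    """Generative use: draw a brand-new, plausible sample for a class."""
    mu, sd = (cat_mu, cat_sd) if cls == "cat" else (dog_mu, dog_sd)
    return rng.gauss(mu, sd)

rng = random.Random(0)
print(classify(4.0))         # a weight near the cat cluster -> "cat"
print(generate("cat", rng))  # a new "cat" weight, not in the training data
```

Same fitted parameters, two uses: comparing densities classifies, sampling from them generates.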

SPEAKER_05:

I mean, I think that's a really great way of defining what generative AI is. The way I understand it is maybe a bit from a statistical perspective. The simplest way to describe something, data in the world, is to use a statistical distribution, for example a normal one. And generative AI, or ChatGPT, is essentially thousands, millions of those kinds of distributions interacting with each other, capturing the relationships between one variable and another. So ChatGPT is a model that captures a lot of real-world distributions. And because of that, you can pull stuff from that distribution by saying, hey, okay, this variable is fixed. For example: I want to learn more about cats, tell me something about cats. And then it will pull from its huge understanding of the world, the distributions it has learned, to create something new for you. So that's my more layman sort of perspective.
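Tim's "fix one variable, then pull from the distribution" picture can be made concrete with a toy conditional distribution. This is only an illustration: word counts per topic stand in for the learned distributions, and fixing the topic to "cats" conditions the sampling.

```python
import random
from collections import Counter

# A tiny "world model": word counts conditioned on a topic variable.
corpus = {
    "cats": "cats purr cats sleep cats purr cats hunt".split(),
    "cars": "cars drive cars brake cars drive".split(),
}
conditional = {topic: Counter(words) for topic, words in corpus.items()}

def sample_word(topic, rng):
    """Fix the topic variable, then draw from P(word | topic)."""
    counts = conditional[topic]
    words = list(counts)
    weights = [counts[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(42)
print([sample_word("cats", rng) for _ in range(5)])  # only cat-topic words
```

A real model learns millions of such interacting distributions over continuous representations, but the mechanics of conditioning then sampling are the same in spirit.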

SPEAKER_00:

Yeah, I mean, I agree with everything that's been said, but maybe just one point to highlight: this is a very goal-oriented way of classifying models, but the way they actually work is very different, right? A model that generates images works very differently from something like ChatGPT. And I think the first ones I can remember, and correct me if I'm wrong, were GANs, generative adversarial networks. I think those were among the first to start making more noise. From there, there was a lot of focus on images, and then there were Transformers, and with Transformers they started creating more NLP stuff, and now it's just really exploded. You cannot go on LinkedIn, go on Twitter, go anywhere, and everyone's just like, oh, ChatGPT, ChatGPT.

SPEAKER_03:

Or Bard. I read that in one of the news articles, I think yesterday: Google announcing they're introducing Bard as a competing solution to what Microsoft is going to try and achieve with ChatGPT.

SPEAKER_05:

Yeah, exactly. Vitale, when you were saying that OpenAI is the pioneer in this area, I could hear some people at Google screaming. Yeah. Because they've been working on it for a while.

SPEAKER_03:

Based on papers they wrote as well.

SPEAKER_05:

Yeah. But I mean, it's a joint effort and OpenAI has definitely taken the lead here. So I agree.

SPEAKER_02:

Well, it's one of the pioneers, right? I think their merit is in democratizing these kinds of models. Because, for example, I already have a question for you. We heard about the Google engineer a few months ago who had a strange experience, let's define it like that, interacting with LaMDA, the model probably behind Bard, indeed. So I felt a bit the same when I tried ChatGPT for the first time, because it was able to be clear, be correct, be aligned with what I was trying to ask. So I felt a bit like that engineer. I was like, okay, something huge is happening here.

SPEAKER_00:

Vitale is going to break his computer and look for the person inside, you know? He's like,

SPEAKER_02:

where are you? Yeah, but it was just to say that, yeah, true, Google basically invented the Transformer, if I'm correct, in a paper from them, along with other, let's say, technology used in these kinds of generative models. But OpenAI was able to mass-distribute access to those interfaces, and that allowed people to try it out, to check what it is about, its potential, and its limitations, of course.

SPEAKER_03:

And people seem to be quite understanding of the fact that it's not perfect. I mean, we've all seen people finding glitches and putting them on Twitter or LinkedIn. But in contrast to what was done before, where it was either a chatbot or something very deliberate in its purpose, I have the impression this one is different. I've seen so many different examples: asking it to write code, asking it to write articles, asking it to review something. It seems like the most multi-purpose model that we've come across, that most people have come across.

SPEAKER_05:

Is this the iPhone moment for AI then?

SPEAKER_03:

Could be, I

SPEAKER_00:

don't know. Maybe,

SPEAKER_05:

yeah.

SPEAKER_00:

I just wanted to, because you mentioned glitches in ChatGPT, but when you say glitches, I don't know if that does it justice, because I feel like most of the glitches are still very convincing. In the majority of cases I see, you ask about something and it gives you a very detailed, very elaborate, very rational-sounding explanation, but when you actually go and check whether it makes sense, it doesn't, right? I was reading... I think for us, we play with code a lot in ChatGPT, another thing that you mentioned. And there was one guy who was trying to learn a new programming language using both Copilot and ChatGPT, and he said he sees it as a delusional, conspiratorial teacher. Most of the time it tells you, yeah, this, that, but from time to time it goes off on these delusions, so you always have to ask, yeah, but are you sure it's like that? I thought that was a very good way of looking at it. I think it's a very impressive tool, and it's so impressive that we almost have to be careful. For people who are not as much in the field, it's very easy to get fooled and just say, oh, I believe what it says, it must be true, it's a machine, all of that. And it's not quite like that, right? And another thing you mentioned is that it's very multi-purpose, and I think that's one of the things that made ChatGPT so popular: it was very accessible to everyone. It's a well-put-together application, open to anyone. Anyone can go sign up and just start talking to it. So I think that also made it super popular. But I wonder if this is the way forward for industry applications of generative AI.

SPEAKER_03:

What do you see as an alternative?

SPEAKER_00:

Tailor-made things, right? So, for example, I imagine ChatGPT is related to Transformers, right? And there are pre-trained models. Usually you train on a lot of data and then you fine-tune for your task. That means you first use a lot of data that is not labeled, which you can take from anywhere, like books. I think BERT is a big, popular model: it was trained on fill-in-the-blank tasks, on the whole of Wikipedia, books, everything. By doing so, it kind of tries to learn semantics, tries to learn grammar, what makes sense and what doesn't. And once you have that, you can transpose that knowledge into, say, classifying whether a sentence is a happy or a negative one when receiving emails. That was the big step with BERT: how to leverage unlabeled data to do transfer learning. And now, so, you can take this pre-trained model, but some people also further pre-train on domain-specific things, like medical data or legal data, because if you open Wikipedia, it's probably going to read very differently from a medical report, right? And I do feel that's more useful. Those models are probably better, more reliable, more suited for industry. I wonder if, instead of ChatGPT, you could have a fine-tuned model for writing emails for you. And Bard, I think, is going to be more catered towards search, because you probably cannot have these hallucinations if you're doing a search engine thing. So, I mean, it's too early to tell. I wouldn't have predicted the popularity of ChatGPT, so don't crucify me if I'm wrong. But my guess is that something more tailor-made, multiple of these ChatGPT-like versions or flavors, will be what's more industry-suited.
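The "fill in the blank" pre-training objective Murilo describes for BERT can be mimicked with plain counting. This is a deliberately naive sketch, not a Transformer: we record which word tends to appear between a given pair of neighbours, then "predict" a masked word from its context.

```python
from collections import Counter

# Tiny corpus for a fill-in-the-blank (masked word) objective.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat slept on the mat ."
).split()

# For each (left, right) neighbour pair, count the words seen in between.
context_counts = {}
for left, word, right in zip(corpus, corpus[1:], corpus[2:]):
    context_counts.setdefault((left, right), Counter())[word] += 1

def fill_blank(left, right):
    """Predict the masked word: the most frequent filler for this context."""
    counts = context_counts.get((left, right))
    return counts.most_common(1)[0][0] if counts else None

print(fill_blank("sat", "the"))  # "sat ___ the" -> "on"
```

Real masked language models learn the same kind of regularity from billions of tokens, with far richer context than a two-word window, and that learned representation is what later gets fine-tuned for tasks like sentiment classification.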

SPEAKER_03:

Industry-grade. But then, does that mean... Because this is a foundation model, right? So it's a basis that others can build upon. And you can still make more tailor or more specific models based on those foundation models. So doesn't it serve the purpose of being a foundation model

SPEAKER_00:

I think so, but I think it wouldn't be as popular, because before ChatGPT there was GPT-3, which I think we got very excited about. But if I talked to my friends back home, my family, no one cared, right? Like, oh, what is this? Oh, this is really cool. But now everyone's talking about ChatGPT. And I think the popularity is because it's general-purpose, you know? So maybe we wouldn't have this hype otherwise. And now ChatGPT is super popular, Microsoft invested in it for, I don't know how much, and then Google, a couple of days later... You really advance the industry; people are investing a lot of money in this. And, yeah, Yann LeCun, right? They released their version of ChatGPT. I mean, I think the tailored models are still more useful, but in terms of advancing this area of AI, maybe it's these big general-purpose things, because more people talking about it attracts more money.

SPEAKER_03:

But is this going to make... I mean, there's so much attention going to it, which also somehow translates into so much money going into it. Does that mean 2023 is going to be the year of generative AI?

SPEAKER_00:

I would be even more specific than that, maybe. Because, again, generative AI is very broad, right? And it seems like the attention is very NLP-focused, very chatbot-oriented, you know? I think that's going to die out quickly as well, right? Because now they have this shiny tool: oh yeah, you can write emails with this; oh yeah, maybe you can do a search engine; maybe you can do this, maybe you can do that. There are going to be a lot of applications, but generative AI as a whole has a lot more potential that isn't getting as much attention. So maybe the year of chatbots, or of NLP generative models, something along those lines. But...

SPEAKER_03:

A lot of money does mean a lot of effort will go into continuing to evolve the field. It might be a kind of jumpstart, or... well, not a jumpstart, because it's already there, but a booster that advances the field faster than it would have otherwise.

SPEAKER_00:

Yeah. And the attention too, right? More people are talking about it, and you have more people putting their minds to it: maybe we can use this, maybe we can use that. So you get way more applications as well. If people weren't talking so much about it, people wouldn't be thinking of all these things you could do at such a fast pace.

SPEAKER_03:

Tim, you wanted to add something? Yeah, absolutely. So

SPEAKER_05:

I think you touched upon some really good points here. And I just want to take a step back for a moment and say that ChatGPT is not really just about search, or even chatbots; it's much, much more than that, in the sense that it can really transform and change anything we do with information. A lot of our jobs are based on processing information: curating it, doing that efficiently, presenting it in a certain way. And all of those things ChatGPT can make at least a lot easier, and at best, maybe in the future, completely automate. Because of that, I actually have a few predictions, and one of them is really, really extreme, but we'll get to that later. I think I agree with Murilo overall. Google and OpenAI, and probably some other companies sitting on similar models, the big vertical stacks like Amazon probably as well, are looking at how to monetize this. In the short term, simply because of the capabilities you need to run a model like ChatGPT, the huge amount of compute that these big vertical stacks are sitting on, they can monetize it right now. But it's really difficult for smaller companies to start unbundling existing services, because they just don't sit on the absurd amount of GPUs, computing units, all the infrastructure needed to run a model like this efficiently and still make money. But what is really interesting is that the cat is out of the bag. A lot of it is open source, and the open source community can create some of these models themselves; we already have a lot of the ingredients. So what's going to be really interesting to see is just how fast the open source community can start making ChatGPT truly open source, start recreating that infrastructure and experience: not just the general purpose, but also the accessibility of not needing a degree in prompt engineering, not having to learn how to manipulate the model into doing what you want, but having this, well, I'm sorry to say it again, iPhone moment of just saying, hey, do this, and it will do exactly what you want. I believe we can get there, and there are a lot of really cool initiatives already ongoing. Once we have that, it's going to have a massive impact in many ways that are difficult to imagine right now. But let's talk about that a bit more later.

SPEAKER_00:

And just to add, because you mentioned open source as well, and I mentioned that there are pre-trained Transformers already available: one thing I saw, I think at the end of last year, was an overview of cool open source packages, and one of them was basically about how you can take these big models and build applications with them. Because it's still just a model in the end, right? If you want to retain state between messages, say you asked about a product and then ask, oh, how much does it cost?, you have to somehow retain the information from the previous message, and all these things get more complicated. I've already seen open source tools catered towards this: taking what we call large language models, these huge pre-trained models, and assembling them into nice applications. So that's something that is already starting to happen, at least in open source. But I also think ChatGPT is going to become an API that you can call, and you build your application around it. So really AI as a service, but then you build the stuff around it as well. And that's something people are going to take advantage of, basically.
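Murilo's "retain state between messages" point can be sketched in a few lines. The `call_model` function below is a hypothetical stand-in for whatever LLM endpoint an application would use (it just echoes, so the sketch stays runnable); the real point is that a stateless completion API "remembers" earlier turns only because the application re-sends the whole history.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a stateless LLM completion endpoint.
    return f"(model reply to {len(prompt)} chars of context)"

class Conversation:
    """Keep conversational state around a stateless text-completion API."""

    def __init__(self):
        self.history = []  # list of (role, text) turns

    def ask(self, user_text: str) -> str:
        self.history.append(("user", user_text))
        # Re-send the whole history so earlier turns are part of the context.
        prompt = "\n".join(f"{role}: {text}" for role, text in self.history)
        reply = call_model(prompt)
        self.history.append(("assistant", reply))
        return reply

chat = Conversation()
chat.ask("Tell me about product X.")
chat.ask("How much does it cost?")  # context still includes the first question
```

Open source frameworks for building on large language models do essentially this bookkeeping (plus truncation, prompt templating, tool calls) so that "it" in the second question can resolve to the product from the first.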

SPEAKER_03:

Because if you look at the applications, if we dive into how it can be used and what companies are already exploring, the list is quite impressive. One of the most prominent examples, the thing that comes to mind now, is obviously what Microsoft is going to do with it. Probably, indeed, you will be assisted in writing your emails, and maybe even in creating your slides and whatever. They'll make that promptable and accelerate your work. So productivity is definitely going to be one area where these models help us become faster and more efficient. Writing code: I saw some examples where they asked it to write the code for a website and it just auto-generated a template, boilerplate, something you can start from and then elaborate on.

SPEAKER_00:

Also, GitHub Copilot is a good example of something like that: you write a function, you write a little description of what it does, and it pre-fills a lot, right? And I can see a future where you write a subject line in your email and it pre-populates some of the content. So, just...

SPEAKER_03:

So yeah, productivity is definitely going to be one area. Another one I saw, which I found quite fascinating as well, was about enhancing. So, the generative part: imagine a good old 1995 movie, I don't have one in mind at the moment, but whatever. If you want to scale it up to a 4K movie, you're missing a lot of pixels, a lot of granularity, maybe even whole frames to be able to render it at, I don't know how many hertz. These models can help you complete the images so that you can more easily scale up to those quality levels. That translates to examples in the medical sector, where they're going to try to improve medical imagery, using generative models to try and filter out some of the wrong parts. And then the one I found quite mind-blowing, again in the medical industry: they were going to try to enhance MRI scans into CT scans. Because CT scans require radiation, which is not the healthiest thing to be confronted with, and apparently there's a way to upgrade MRI scans, you need several of them, into something like a CT scan. If you just look at some of the examples, it's pretty mind-blowing. And it's not all chatbots, it's not all DALL·E and image generation. It goes...
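To ground the upscaling idea, here is the dumbest possible baseline: 2x upscaling by pixel duplication, which adds pixels but no new detail. Learned super-resolution models exist precisely to replace this duplication with plausible, "invented" detail that is coherent with the rest of the image.

```python
def upscale_2x(img):
    """Naive 2x upscaling: duplicate every pixel horizontally and vertically."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]  # duplicate each pixel
        out.append(wide)
        out.append(list(wide))  # duplicate each row
    return out

small = [[10, 20],
         [30, 40]]
big = upscale_2x(small)  # a blocky 4x4 image with no added information
```

A generative model trained on natural images would instead predict what the missing in-between pixels most plausibly looked like, which is why the results can approach real higher-resolution footage.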

SPEAKER_00:

But even the... I've seen the upscaling of image pixels as well, but I saw it a while ago, so it's not terribly new. I think now, because of the hype, people are thinking more about the applications. Especially for the computer vision things, it takes some time for people to come up with applications and actually see the value, because in the beginning I followed it because it was cool, and then you start seeing, ah, you can use it like this. I saw something on game design even. I've also seen colorizing old pictures. So I was like, ah, this is pretty cool. Now you see automatic editing of images and all these things. So I think it's pretty cool, and it's exciting to see...

SPEAKER_03:

All

SPEAKER_02:

the different applications. Exactly, exactly. There are no limits, right? It's just about using imagination, trying to get as much data as possible related to a problem: to manipulate, to shape the task, the realities involved in a particular task, and recreate it using AI. For example, another nice application around image enhancement and generating new pixels: a technology has now been integrated into NVIDIA gaming GPUs to create frames in between two different frames, at two different moments in a game. The effect is to have more frames per second, so a smoother game, even if you are playing on a low-end GPU. You potentially don't need to spend a lot of money on the latest and greatest NVIDIA GPU; even with a low-end one, you will get nice results. These are applications we are already seeing now in products, in software, in services. And it will expand in the future, I think, into all fields, because there are so many applications we can think of. So indeed, the only limits are imagination and data, because data is also another limit.
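The frame-generation idea Vitale mentions can be sketched with the simplest possible interpolator: the in-between frame as a pixel-wise blend of its two neighbours. NVIDIA's frame generation uses a learned model that also accounts for motion, but the goal, synthesising frames between two rendered ones, is the same.

```python
def interpolate(frame_a, frame_b, t=0.5):
    """Blend two frames (2-D grids of pixel intensities) at position t in [0, 1]."""
    return [
        [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

frame1 = [[0, 0], [100, 100]]
frame2 = [[100, 100], [200, 200]]
mid = interpolate(frame1, frame2)  # the synthesised in-between frame
```

Naive blending ghosts any moving object, which is exactly why production systems use learned motion-aware models instead; but either way, half the displayed frames were never rendered by the game engine.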

SPEAKER_03:

But even there, with data: I saw that they used it, for example, in fraud detection to create synthetic fraud cases. When there aren't enough fraud cases to build a model on, you can create synthetic ones to get a sufficiently representative sample. So even when you lack data, you can use it to create synthetic data and then build other models on that.
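A simple relative of this idea, long predating today's generative models, is SMOTE-style oversampling: make synthetic minority-class rows by interpolating between real ones. A toy sketch with made-up feature rows (the feature names and values here are purely illustrative):

```python
import random

# Toy minority-class rows, e.g. [transactions_per_hour, amount_eur] for fraud.
fraud_cases = [[1.0, 900.0], [1.2, 1100.0], [0.9, 950.0]]

def synthesize(samples, n, rng):
    """SMOTE-style idea: new rows as random convex blends of two real rows."""
    out = []
    for _ in range(n):
        a, b = rng.sample(samples, 2)
        t = rng.random()
        out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return out

synthetic = synthesize(fraud_cases, 5, random.Random(0))
```

Because every synthetic row is a blend of real ones, it stays inside the region the real fraud cases occupy, which is both the strength (plausible samples) and the limitation Tim raises next (it cannot invent genuinely new kinds of anomaly).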

SPEAKER_05:

Okay, that's wild. I did not see that coming, because anomalies are, by definition, so difficult to find, and these models do not necessarily learn anomalies in a way. So that's interesting. I'm skeptical, though.

SPEAKER_03:

But I can send you the article. I'm not sure if it's just a brain fart and somebody wants to go that way, or if they're already there, let's make some money. But it definitely sounded interesting, because that's another application: everything around synthetic data. I think that was also an area where these models came up as a potential solution, where you could use them to create synthetic, privacy-compliant datasets for medical research, for example. I think I saw some others too, but so, just a lot of creating synthetic data to help you do stuff.

SPEAKER_02:

But there is a problem around this. I agree with you, there are a lot of possible applications where synthetic data can help you train specific models, like Murilo was saying before. But to generate synthetic data, first you need real data from which the models can take inspiration, right? And DeepMind, I think, a few months ago, studied the impact of scaling these kinds of models. Because so far we got boosts in performance going from GPT to GPT-2 to GPT-3; now we have ChatGPT, which is a sort of 3.5, maybe, and probably we'll see GPT-4 in the future. But from a sound analysis, we saw that we are basically using all the public data available on the internet, and sooner or later it will run out. So to scale up again, to GPT-5, 6, 7, which will probably have billions of parameters, we will need much, much more data. And that can be a problem. Either they have to pay people to create new data for a particular application, and they are already doing this, for example, with ChatGPT, or they need to wait until we ingest more data into the public internet. And going back to Tim's point, I think open source can help there, because if we make a joint effort to create not only the code but also the data, and make models available for the community, that can be beneficial to everybody. What do you think, Tim?

SPEAKER_05:

Yeah, absolutely. I'm a huge fan of this initiative. So I think you're talking about, for example, Open Assistant.

SPEAKER_02:

Yeah.

SPEAKER_05:

There's a very famous person behind it. What was his name

SPEAKER_02:

again? Well, one of the promoters is a YouTuber. Yeah. That's the... Kilcher, was it? Yeah, yeah, yeah. But the organization behind it is LAION. They've already released a few models and a lot of datasets, about computer vision in particular. Now they are moving towards text.

SPEAKER_05:

But they're also not really focused on profit, so they're like the OG OpenAI in a sense: they're nonprofit and actually being open. Yes, I'm saying that; I'm putting it out there. That's cool. I didn't know that about Open Assistant, actually.

SPEAKER_02:

Yeah. Maybe to explain it a bit better: we discussed generative models, and we also mentioned some definitions like Transformers, GANs, and so forth, but these are just model architectures. The main idea is that to train a generative model, we need to replicate the knowledge, the reality, somehow. So we need to collect a lot of data, then use the right model architecture, then train for a long time using a lot of compute power, in particular GPUs, and then we get our model. To collect data, we have a few options: go online and scrape public web pages and so forth, or use one of the many datasets available. For images, one very famous dataset is ImageNet, released a few years ago already, with more than a million images. But to train ChatGPT, they needed real data, because they wanted to have a sort of natural conversation with people. This is a particular approach, probably Tim has more experience with it than me: reinforcement learning, assisted by human feedback. The main idea is to allow people to interact with the model, give new input data, give new responses, and then verify when the model is right or wrong. And this open initiative is about collecting the same kind of dataset, using open source technology and the contribution of the community. They will then use that dataset to train a possible clone of ChatGPT, or something similar; they want to replicate the paper from OpenAI about creating conversational AIs. So it's a very nice initiative. Everybody can join and contribute by interacting and giving new information about a particular task. They say they need only 5,000 samples per task to train a decent ChatGPT-like model. So it's not a lot. If a lot of people interact with it, it will be possible to have this dataset very soon. I'm very curious to see the outcome.

SPEAKER_03:

I was going to say, 5,000 is not that much.

SPEAKER_02:

No, sorry, 50,000. But if we consider the large open source community, I think they can get there really easily.
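To make the sample-collection idea concrete, here is a hypothetical sketch of the kind of record a human-feedback effort gathers: human-rated candidate responses per prompt, from which a reward signal can later be derived. The field names are illustrative, not Open Assistant's actual schema.

```python
from dataclasses import dataclass

@dataclass
class FeedbackSample:
    """One human-labeled example: a prompt, a candidate reply, and a rating."""
    prompt: str
    response: str
    rating: int  # e.g. 1 (bad) .. 5 (good), assigned by a human contributor

samples = [
    FeedbackSample("What is a GAN?", "A generative adversarial network ...", 5),
    FeedbackSample("What is a GAN?", "A type of fish.", 1),
]

def best_response(samples, prompt):
    """Pick the highest-rated response for a prompt, the seed of a reward signal."""
    candidates = [s for s in samples if s.prompt == prompt]
    return max(candidates, key=lambda s: s.rating).response
```

At 50 samples per contributor, a thousand contributors already reach the 50,000-per-task figure mentioned above, which is why a community effort is plausible here.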

SPEAKER_05:

So if our listeners contribute 50 samples each, I'd say we can boost this open source project. I want to start using this, please. Let's get going. No, but it's really interesting what Vitale mentioned. It's also a really nice initiative in the sense that it might prevent some of the ethical issues that arise; I'm sure we've all heard in the news about how this data was sourced. It may need some reconsideration of how we do this in the future. Essentially, it's really interesting what is going to happen to data. As you mentioned, there may not be enough, or there may not be enough of a specific type, especially not reachable by smaller companies that are not Google. So what is this going to do to the data economy? That is really interesting. How is data ownership going to change?

SPEAKER_03:

I think at some point there's also, and maybe this is a prequel to the ethical discussion, the question of what's the risk of being flooded by generated stuff, of not being able to distinguish fact from fiction. There's already been a lot of talk about students looking this up and using these technologies to write papers, and then professors being like, what should I do about this? Is this okay? Is this not okay? Probably not. But you're going to create a lot of material that is artificially created and not stamped as artificially created.

SPEAKER_00:

I think whenever I think a little bit about this, I get almost philosophical. So, you know, if this happens, stop me. Yeah, yeah, just...

SPEAKER_01:

You know, smack me in the head or...

SPEAKER_00:

Something, yeah. Um, no, I think... One, I mean, I saw this especially in schools, right? And I remember seeing that someone already created something to detect whether text was written by ChatGPT, you know? So it's a little arms race already. I even saw one, the latest: it was a TikToker or something, I don't know, and they actually used a 3D printer plus ChatGPT, and the 3D printer would write in a notebook whatever ChatGPT was telling it to. The whole thing automated. But I think, well, maybe the philosophical part is this: what if you have automatically generated emails, right? Basically the way I'm thinking is, you write the subject line and then it pre-populates the whole email for you. And then I'm wondering: do you even need to read the email? Everyone's email is going to kind of sound the same, right? It's the same information, just some formalities before and after. Imagine.

SPEAKER_03:

Imagine the flood of marketing material that you'll get. I mean, if all of that can also be automatically generated, you get a lot of content thrown your way, right?

SPEAKER_00:

Yeah, I even heard that BuzzFeed is going to start using generative AI to help writers with quizzes and all that. I think I saw it; I couldn't find it again, I looked for it. But I think it was a YouTuber or a podcast that pulled an article from some source that was clearly AI-generated, because the article was really well written, but then out of nowhere they just kind of threw in aliens or something. It was insane. But it makes you wonder, right? How much of the stuff I'm reading is already AI-assisted or AI-generated?

SPEAKER_03:

A lot of articles are, and I mean, that we know, right? A lot of sports articles and things like that are automatically generated. So there's already a lot of content that is artificial. This is going to take it to the next level, I think.

SPEAKER_00:

And I don't know, maybe going back to my philosophical part, I really want to say this. I'm wondering, for example: if the stuff that we're teaching in schools is stuff an AI model can just do, are we teaching the right things? You know? Because it's the same thing with... I mean, I heard it a lot when I was younger: oh, why do I need to learn how to multiply by hand? You're not going to have a calculator all the time. Oh, really? What's my phone doing now? Got you, you know? And I mean, I'm not saying it's not valuable. I wouldn't go as far as to tell my kid that he doesn't need to learn how to multiply by hand, right? But it makes me wonder: if the things we assess students on can be done by a machine learning model, a very elaborate, complex, sophisticated one, admittedly, then are those the right things to focus on?

SPEAKER_03:

Yeah, but I think it's a fair question. Part of it is that some skills just form you, right? Being able to do some math triggers your brain to think in logical terms. And although your calculator might do it a lot faster and easier, and you can ask your phone now and it will do it for you, I think it still helps, but then we're going philosophical, to develop yourself as a child. Where I think it's interesting is how it can help in education, because there are two sides to the coin. There's the side of being able to automatically generate your papers and such, of course, but there's also the tutoring aspect. If you have questions like: I don't really understand this, can you explain it to me like I'm five? And then it gives you a simpler explanation, or you ask for a summary or for flashcards. That way it can help students.

SPEAKER_02:

But this is funny, because then we'll have professors creating the essay prompt or whatever with ChatGPT, and then students creating the response with ChatGPT. Are you

SPEAKER_05:

making the argument that we should be letting children perform certain tasks so that we can... train them so that they will work well

SPEAKER_03:

afterwards? To train your model? Is this what

SPEAKER_05:

we're...? No, but it's actually funny, because ChatGPT can do so much, but it can't calculate. It's really funny. It needs a calculator. You need to give it a calculator if you want it to do any exact math. So it's just funny how that works. No, I think going back to that question about fake content, and just how easy it is now to create text that seems somewhat qualitative...
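Tim's point about ChatGPT needing a calculator is the idea behind what's often called tool use: instead of letting the language model guess at arithmetic, the application extracts the math and evaluates it exactly. A toy sketch; the routing logic and function names here are invented for illustration:

```python
import ast
import operator

# Safe evaluator for plain arithmetic expressions (no eval()).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg}

def calculate(expr: str) -> float:
    """Evaluate an arithmetic expression exactly: the model's 'calculator' tool."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def answer(question: str) -> str:
    """Route exact math to the calculator; everything else would go to the language model."""
    try:
        return str(calculate(question))
    except (ValueError, SyntaxError):
        return "(hand off to the language model)"

print(answer("123 * 456"))            # 56088
print(answer("Why is the sky blue?")) # (hand off to the language model)
```

Real systems do the routing with the model itself deciding when to call a tool, but the division of labor is the same: exact computation goes to exact machinery.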

UNKNOWN:

Mm-hmm.

SPEAKER_05:

I think it's a short-term problem, actually, but I might be naive there. In the sense that, yes, right now it's really easy to create noise in a web that is already flooded, and we don't know how to search through it. But in a little bit, hopefully, we'll get so much better search that it will really help us curate things and filter through that noise. That's what I believe. I'm optimistic in this sense, but I may be wrong.

SPEAKER_02:

So maybe the risk is not in the consumer but in the producer, the producer of something. Like Murilo was saying: we need to do multiplication to learn how it works, and then maybe we can use the calculator. And I think technology is already influencing us in certain ways, because while Murilo was making the philosophical consideration, let's say, I was thinking that, for example, if we asked our fathers, or older people, to go to Brussels, to a particular street, they probably know where to go, which is the fastest path and so forth. I can only go there using Google Maps. And maybe if I go there 10 times, then I will be able to replicate the exact same route.

SPEAKER_03:

I'm the same. I go nowhere without my GPS.

SPEAKER_02:

Yeah. So these things are already influencing us, right? Relying on technology and not taking out the map, let's say, and searching for the path.

SPEAKER_00:

And I also think that with technology in general, so not only generative AI, but with fake news and everything... I have also seen in the news how some schools are focusing more on identifying good candidates for fake news, basically, right? So thinking

SPEAKER_03:

critically and being able to...

SPEAKER_00:

Yeah. Because I think it's something that nowadays is relatively new, right? We're flooded with information. I mean, I remember my dad saying that when he was in university, if you wanted to know something, you went to the library and looked it up in the index, you know? Jesus Christ, that's so much work. And I think it takes some time. Now we have all this information, which is really good, okay, but now we also have a lot of bad information. So how do you parse through it all? I think that's also a skill, something that is already changing. When you were talking about use cases for generative AI, one thing I saw somewhere, and I wasn't sure how I felt about it, I think it was an AI chatbot to act as a therapist or something.

SPEAKER_03:

That hits close to home for someone here.

SPEAKER_05:

Yeah, so a little background about me. I am a former... I guess you could call it psychologist. I studied psychology, well, the research part; I'm by no means a clinical therapist. So I'm maybe not fit to evaluate an AI chatbot giving therapy, but yes, it's a common theme, right? New technology: let's do something that already exists, but much better, and let's not consider all of the other stuff, you know, those pesky ethical issues and

SPEAKER_03:

whatnot. Go for the ethical issues, because of course there's being flooded by fake news, and probably IP-related discussions. I think those are maybe the more logical ones. What were the ones you had in mind or wanted to bring up earlier?

SPEAKER_05:

Where do I begin?

SPEAKER_03:

What's the biggest concern?

SPEAKER_05:

Bias. Let's go there. Bias. I've been using ChatGPT to write stuff for myself, not just code, but also creative writing. Let's call it being a thought leader on LinkedIn, writing technical articles, something ChatGPT is really good at, by the way. So something I have to do is push ChatGPT away from that. I have to stop it from going in thought-leader-esque ways and using all of these buzzwords and superficial ways of thinking. So I tell it: no, no, hey, that part is way too much. Yes, people will click on it, but it's ridiculous. Please stop. Be serious and focus on this part; this is what we're really interested in. And you can see that it's just not capable of doing that sometimes. And that's fine; that's a level of granularity it cannot reach, because maybe it doesn't learn in that much detail yet. But then there's, of course, the issue with things like gender bias and racial disparities. The fact that a lot of people's culture is being commodified in a way, but maybe now I'm going too far. It's a dangerous trend, potentially. It can be misused in so many creative ways; as many ways as there are to use it for good, there are as many ways to use it for bad. That's the essence of any great new leap in technology. Practically, I think the example Vitale already mentioned about data collection, and why I said the Open Assistant initiative is really good, is because the data being used, for example, by OpenAI is collected basically in data sweatshops. Low-income countries, that's where data is being collected. People are being asked to give feedback on a specific prompt from ChatGPT, telling it how it should answer. So the whole reinforcement learning with human feedback process, the part where it really became so powerful, so accessible, is outsourced and commodified. And maybe people are not treated as well as they should be.
On top of the fact that our data is being used without any permission in many cases. So yeah, again, I don't know where to begin. So let's ask ChatGPT.

SPEAKER_00:

Maybe I will... I agree with Tim, but I'll be devil's advocate here. Maybe, yes, you outsource, but in a way you're also generating income and jobs, right? And also, when I was traveling, I remember talking to someone in Cabo Verde, and they were saying: yeah, my grandfather built these roads and everything. I was like, oh man, that must have sucked, that's such hard labor. And they said: yeah, but back then everyone had jobs, right? I mean, I'm not agreeing with this whole thing by any means, but I do have these moments. I feel like the solution is not to stop, but maybe to change how it's done. But one thing that's been on my mind as well, more for computer vision generative AI, is again the whole data thing: how people are using data, and how much of that data is reflected in the generated examples. And I think you had a good example, right, Kevin, with images that were used for training.

SPEAKER_03:

Yeah, indeed. So if you tune your prompt, you can get back the original images it was trained on. If you have really the exact prompt to ask for that specific image, somehow the training data is still in there. So if that's proprietary, you can extract proprietary stuff from there.

SPEAKER_05:

Yeah. So one of the main messages that Timnit Gebru, who was the former head of ethical AI at Google, brought at one of the keynotes at NeurIPS, I think it was, is that before we start building products, before we start building AI models, we should ask ourselves: what are we doing this for? What will be the impact? Does this move us towards the utopia we're trying to build? Because of the general nature of this model and its lack of guardrails, that's a serious concern, because it can be used for almost anything. And looking back at one of the examples you gave, the fraud one, the anomaly one: if you're going to start generating synthetic data using this type of model, it's going to make use of these biases to generate those particular anomaly cases. And that's a very dangerous thing. And that goes back to its maturity as well. I think in the near future there will need to be some sort of staging before we use the output of ChatGPT. By staging, I mean that before we actually use it in an automated way in any high-impact scenario, we will first have it reviewed by a human to some extent.
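The "staging" Tim describes, keeping a human between the model and any high-impact action, often amounts to a simple review queue in practice. A minimal sketch; the risk labels and names are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    """A piece of model-generated output awaiting a decision."""
    text: str
    risk: str            # "low" or "high", assigned by the application
    approved: bool = False  # set True once a human has signed off

def stage(draft: Draft) -> str:
    """Low-risk output ships directly; high-impact output waits for human sign-off."""
    if draft.risk == "high" and not draft.approved:
        return "queued for human review"
    return "published"

print(stage(Draft("weekly newsletter summary", risk="low")))   # published
print(stage(Draft("automated fraud decision", risk="high")))   # queued for human review
```

The interesting design question is the one the episode raises: who decides what counts as "high risk", and whether regulation like the AI Act ends up defining those categories.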

SPEAKER_03:

I think it was also the CTO of OpenAI, she helped build the thing, who was advocating for regulation on AI to be put in place. I mean, the AI Act is going to be officially released very soon; it's a first step in that direction. So they're building it and at the same time advocating

SPEAKER_02:

for it. Does it have any points related to these kinds of generative models, the use of ChatGPT to create something that can be dangerous for people?

SPEAKER_03:

But like Tim said, I think it can be used for good and bad. Somehow you need to put guardrails on what you, as a society, want it to do. Does it build up to your utopia, as you said? Or is it going to go into the dystopia? They're two opposite pictures.

SPEAKER_02:

For example, to go a bit more practical: I saw there was probably a judge, or someone in a similar role in the US, I think, who created something for a possible open case using ChatGPT. I'm not fully informed about this, but maybe we should also insert in the AI Act a limitation on using AI for this kind of application, right? When you are dealing with the law, cases should be evaluated case by case by an expert, not by a robot.

SPEAKER_03:

The AI Act, we'll probably go into it in the next episode, because I think it's risk-based, so it should cover these cases when applied properly. But that's food for another episode, because I think we can talk for hours on that one, and we're already quite far along. Maybe, to sum it up: what are some of the applications you're most excited about, up and coming? What do you see?

SPEAKER_00:

Tim looks very excited, and he... Yeah, he gave a preview earlier that he has some very extremist, I think, predictions for the future. That's how he described

SPEAKER_05:

it? Maybe extremist is not

SPEAKER_00:

a term I would use. An extreme position, yes. Extreme position. So without further ado, you know, I'll just give you the whole floor.

SPEAKER_05:

Wow. Thank you, Murilo, for that intro. That was great. Okay, so yes, I have an extreme prediction to make, but I'm really curious what you think of it, actually. I think this evolution is going to unbundle search. Right now it's a service that is very much offered by big, big companies; people go to these big companies because they offer convenience. But as these information-processing agents like ChatGPT become more accessible, open source, commodified, maybe even usable on your laptop or your phone, everybody will have one of these assistants for themselves. They will no longer need Google to offer this service for them. Or Bing. Who cares? Nobody cared before; nobody will care again in the future. I think these AI assistants will become much more integrated into our daily life, not just in search, and will continue to unbundle many, many things. Anything that involves a certain level of information processing will to some extent be fed into our own personal assistant, because that assistant, you know, has her own biases, the ones we like, and makes us more efficient at what we do on an everyday level. Yeah, yeah, yeah. Yes, let's go for it. Full steam ahead towards confirmation bias. No, no, no.

SPEAKER_03:

And I think... I'm into just fun facts and whatever, and so I google a lot. I mean, when people talk about something, it's like: oh, let me google that. So if I now no longer need to do that, if there's an easier way to get to that same information, why not?

SPEAKER_05:

Exactly. But what is really interesting is that it will destroy business for companies like Google. And others. Probably why they issued the code red. So I'm just thinking back to the times people were criticizing OpenAI. They were open, right? They were non-profit even, I think, before. Then they shifted status because they found something very profitable indeed. Became Closed AI. But still, eventually, they're starting to release their models, after people were able to replicate them from their papers. Thank you for sharing it, at least academically. That's nice.

UNKNOWN:

Yeah.

SPEAKER_05:

But so the cat is out of the bag, as I said before, and the competitive advantage that these companies hold may just disappear. As soon as we're able to reduce the amount of compute we need, from ten massive GPUs running all day to generate 20 words per second, costing hundreds of dollars per person per day to use full time. As soon as we can shift away from that and make it accessible, it will really, really change the way we interact with the internet, with applications, with digital. It's going to eat digital. That's my prediction. That's

SPEAKER_02:

a nice prediction. Yeah, indeed.

SPEAKER_05:

And I have lots of other hallucinations after

SPEAKER_02:

that, but let's give the word to the others. We will listen to this podcast again in 10 years. Tim knew. Tim knew, yeah. The new Simpsons.

SPEAKER_00:

Simpsons sometimes,

SPEAKER_02:

yeah.

SPEAKER_00:

Maybe one comment as well on the confirmation bias, or, I don't know, something very catered to you. I felt that, because, you know, a lot of people are leaving Twitter and going to Mastodon, and the whole thing with Mastodon is that you see everyone you follow chronologically. And at first it's like: oh, this is really nice, you see more people, more of this, more of that. And it is nice. But at the same time, I see a lot of stuff that I don't really care for. And the Twitter algorithm did give me a lot of interesting things, sometimes from people I don't follow at all: oh, maybe you're going to like this, and it actually worked. But I'm starting to feel it's two sides of the same coin, right? Because on the other extreme you have, I don't know, all that stuff with Facebook, with the leaked research about how the algorithm is designed to keep your attention and get you addicted to it. And a lot of the time it's not healthy stuff, and it causes a lot of depression and whatever. It goes downhill very quickly, you know, like suicide rates and things like that. Okay, too much. But yeah. I think it could be very nice. On the Google service part: I think it is going to be for search. If I were to look ahead, and I kind of mentioned this earlier, I think there are going to be different flavors of ChatGPT, and one of them is going to be search. But I do think there are going to be other flavors as well. And I also think there are going to be some jobs that will have to adapt very quickly. For example, now we talked about prompt engineering, right? With Midjourney, say you want to have a logo. I think even the logo for this podcast is AI-generated, right?
And it's like, well, what about those agencies where before you had to send: oh, I'm trying to do this, I'm trying to do that? And then they would come up with a sketch, and you'd say: okay, can you change this? And they would change it. And that was their income. And now it's more like: what do I have to tell Midjourney to get exactly what I want?

SPEAKER_03:

I've read an article about an artist who's doing that already. He's creating art, and his art is asking the right prompts, fine-tuning the prompts to create impressive pieces.

SPEAKER_02:

Yeah, but I think, more realistically, yes, indeed, in the near future we'll see all these kinds of applications. For example, democratizing graphics and art, composing emails. So I think, for example, a large company like Microsoft, they invested a lot in OpenAI, not only for their search engine Bing, but also to provide AI support in all their software. So Excel, Word, even Paint, probably, will be super powerful. That would have been great; I would have used it in my past, because I used to draw tiny houses and the sun, you know, with Paint. Nowadays it could be something impressive. So I like this kind of application, honestly. Or, for example, I think we will see AI-generated videos in the future. If you are reading a book or a story and maybe you want to visualize it, how cool would it be to enter a few lines and see an animation of your favorite characters? Probably we will see this kind of application in the future.

SPEAKER_05:

It's also really interesting because it will really change the creator economy in the near future. It will give a new meaning to the term handmade: this was authentically handmade by a human, they painted it on their own canvas. Can you believe it, people?

SPEAKER_03:

Do you have an NFT for it? Yeah, I

SPEAKER_05:

mean, okay. Let's not delve into that; that's another episode. But yeah, it's also really interesting, the word you used, Vitale: democratize. It's really interesting that you put it that way and not "steal jobs", but I see what you mean. I think what people were always imagining, especially in the sci-fi utopias of the 40s, 50s, 60s, is that manual labor would be automated: there would be robots walking around, I won't need to cook anymore or do all kinds of physical labor. And now, here we are: my job is in jeopardy, and data scientists may be no more soon, perhaps. Yeah, go ahead.

SPEAKER_00:

I was just going to make a... You predicted that the machines would take over labor, you know? And I remember there was even an old cartoon where they worked like three days a week, because the machines did everything, so people worked less. Everything was going to be chill. And I feel like it's almost the opposite: it feels like people are working more hours, longer hours, and all these things. And I also think that indeed we're going to have, hopefully, these services, and we can actually build applications. And one thing maybe I should have mentioned earlier with the search engine is that, especially for the NLP stuff, the fact-checked stuff, I really don't think the model is just going to give you answers purely on its own. I think people are going to come up with complex applications to safeguard what the facts are, what the citations are, where the sources are. They're going to have to build more manual stuff around it to make sure it's reliable and people can use it.

SPEAKER_05:

Maybe a last question. What are you planning to do with ChatGPT soon? Anything creative?

SPEAKER_03:

One of the cases I actually find exciting, and it's a bit similar to the experience one you mentioned, Vitale: of course medical research, hugely important, very exciting stuff, and probably going to have yet another big effect on improving longevity. That's definitely the obvious exciting stuff. But the experience one is actually one I'm very fond of as well. Like you said, you read a book and you get images related to it, or an ambience related to it. I was reading something, and people know I like to run, about how even the music you listen to can adapt to how you run. It can pick up pace if you're going a bit slower, to trigger you to go faster. So you can use dynamically generated music to adapt to your pace, depending on what you want to do. So I think it can transform experiences as well. That's actually what I'm quite excited about.

SPEAKER_05:

And why stop there, right? As soon as you buy Google Glass, you can just style-transfer your environment. You can be running in a Van Gogh painting. Imagine. It's not far off, you know. Real-time style transfer is there; now we just need the glasses to go with it. True, true, true.

SPEAKER_03:

But on that note, I found my outro. Are you ready? Yes. Thank you for your very interesting insights. The near future has definitely taken yet another new turn. Hopefully there's exciting stuff ahead, as long as we stay careful and attentive to the challenges and biases.
