DataTopics Unplugged

#48 How Can We Define DevRel in the Tech World? Tech Insights with Mehdi Ouazza

DataTopics

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.

Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, we're thrilled to have special guest Mehdi Ouazza diving into a plethora of hot tech topics.



Speaker 1:

You have taste in a way that's meaningful to software people.

Speaker 2:

Hello.

Speaker 3:

I'm Bill Gates.

Speaker 4:

I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong.

Speaker 3:

I'm reminded, incidentally, of Rust here.

Speaker 2:

Rust, congressman. iPhone is made by a different company and so you know you will not learn Rust while skydiving. Well, I'm sorry guys, I don't know what's going on.

Speaker 4:

Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here.

Speaker 3:

Rust. Data Topics. Welcome to the Data Topics Podcast.

Speaker 4:

Hello and welcome to Data Topics Unplugged, your casual corner of the web where we discuss what's new in data every week, from IBM to Tupac, Bart, anything goes. We're also on YouTube, LinkedIn, X, Twitch. Don't expect much from Twitch yet, but it's there. Check us out. Feel free to leave a comment or a question. Today is the, what's the day? The 3rd of April, yes, of 2024. My name is Murilo. I'll be hosting you together with Bart, my partner in crime. Hi. And we have a very special guest today, Bart. Yes, Mehdi, the one.

Speaker 4:

The one and only there you go.

Speaker 1:

How are you? I'm good. I don't know. I was going to London. I got lost in Belgium somewhere.

Speaker 4:

And you're in. Well, maybe before I start with all the questions, would you mind introducing yourself a bit, for people that haven't heard of you? I mean, I think we were discussing just before, I think it's hard to never have heard of Mehdi. I know, right?

Speaker 1:

So I feel like, if you live in Belgium, which is a small country.

Speaker 2:

I would go as far if you are active in the data community and live in Europe.

Speaker 1:

Yeah.

Speaker 4:

Because you don't live in Belgium, do you?

Speaker 1:

No, no, I'm based in Berlin. So, yeah, I can give a few words about myself. I've been working in data for 10 years now. I started in the ugly days of Hadoop, on-premise clusters and all those things. And, yeah, mostly as a data engineer. I've been doing different roles, but data engineering stuck with me. More recently I moved to Berlin, that's where I've been based for the past five years, joining different tech companies over there: Klarna, Back Market, the refurbished-goods website, and Trade Republic. And now, more recently, I switched to developer advocacy at MotherDuck, which is on a mission to ship DuckDB in the cloud. So, developer advocate. But I'm first and foremost a data engineer. I need to build things.

Speaker 4:

Yes, oh yeah, thanks for joining us. And this is your website, right there. I put it on the screen for everyone to see. I need to update this one, but it's still doing okay, no? It's still nice.

Speaker 1:

What do you need to update? I don't know. I usually update my personal websites every four or five years.

Speaker 4:

It's been.

Speaker 1:

I think I have a few more projects.

Speaker 2:

The blog is not really up to date, but yeah. You need to put DuckDB there. Yeah, yeah, yeah.

Speaker 1:

So that's that's probably. I need to put one or two ducks, yeah probably, and there's a rumor that I'm just creating.

Speaker 4:

Now that you decided to come to the podcast after you've seen all the ducks around and you were fed up, you're like this is absurd. People are just appropriating these ducks. Um, I need to go there that's a really good point.

Speaker 1:

Like, there is a duck there. I mean, this one belongs to me, it's always traveling with me, but it's fun to have so many ducks around Dataroots.

Speaker 4:

Yeah, exactly, that's what's the reason behind because it's fun it's fun to have a lot of ducks around, right, right. I feel like Bart one day.

Speaker 2:

He just put it there. But did you ever in your life hear someone say it's not fun to have a lot of ducks around?

Speaker 4:

I have never heard that.

Speaker 1:

That's my point, I mean nobody said it's not fun to have a lot of monkeys around. That's not why they have tons of monkeys around. Not yet, not yet.

Speaker 4:

Next week is gonna be a completely different podcast, all about monkeys. Yeah, cool. And then, so we mentioned ducks, DuckDB, MotherDuck, right. For people that haven't heard of MotherDuck or DuckDB, would you like to just give a...?

Speaker 1:

Yeah, I mean, that's my job. Yes, I didn't expect I had to do that, but DuckDB is basically an embedded OLAP database, so that means it runs in process. In other words, there are multiple clients, CLI, Java, but if you're working with Python, you just do a pip install duckdb. It is the equivalent of SQLite for analytics, if you want, and so it can process large datasets pretty fast. It has been getting a lot of traction. We passed 1.5 million downloads on the Python client alone, which is pretty crazy. Close to 20K stars on GitHub. I know those are vanity metrics, we shouldn't rely on those, but still, that gives you an idea.
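
[Editor's note: as a rough illustration of the "in-process" idea described above, here is a minimal sketch using Python's built-in sqlite3 module. This is SQLite, the database the comparison is made to, not DuckDB itself, and the table name and rows are made up for illustration. The point is only that the engine runs inside the Python process, with no separate server to start; DuckDB's own Python client would need to be installed separately.]

```python
import sqlite3

# In-process: the database engine lives inside this Python process.
# ":memory:" keeps everything in RAM, so no file or server is involved.
con = sqlite3.connect(":memory:")

# Hypothetical table and rows, purely for illustration.
con.execute("CREATE TABLE trips (city TEXT, distance_km REAL)")
con.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Berlin", 12.5), ("Berlin", 7.5), ("Amsterdam", 3.0)],
)

# An analytics-style aggregate: count and total distance per city.
rows = con.execute(
    "SELECT city, COUNT(*), SUM(distance_km) "
    "FROM trips GROUP BY city ORDER BY city"
).fetchall()

print(rows)
con.close()
```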

Speaker 1:

And I probably should mention that MotherDuck is not the creator of DuckDB, which is the open source project. It was born in Amsterdam, at the CWI, which is the research center for mathematics and computer science, created by Hannes and Mark, and it is mostly maintained by the DuckDB Foundation and DuckDB Labs, and we have a tight partnership with them. So MotherDuck is basically the commercial company which provides DuckDB in the cloud. It's an interesting business model, because we don't see that often; mostly the creator of the open source project is the one that's going to commercialize it. But the creators of DuckDB had different goals, and I think it's nice. It's really, you know, a research project at the end, because it's the first embedded OLAP database. So they have different goals, for good and bad. For us at MotherDuck, I would say it's a different relationship. But yeah, I think it's really interesting, and I think it's always important to highlight that MotherDuck is the US company commercializing DuckDB, and DuckDB was born in Amsterdam. So, Hannes, if you listen to this, hi.

Speaker 1:

Shout out Exactly. I think you will appreciate.

Speaker 4:

Very cool. I was actually talking about MotherDuck and DuckDB earlier today. I have a Brazilian friend I chat with about these things, and he was asking more about DuckDB, and I mentioned Mehdi. So shout out to Rudy. Talking about the Brazilian audience, I recently learned Mehdi has a huge following of Brazilians. Huge, huge, also in Brazil.

Speaker 1:

You mean... Also in Brazil, exactly, without speaking a single word of Portuguese, which makes it all the more impressive, I'd say. Yeah, so actually it's not like a huge part, because most of my audience, I can share,

Speaker 1:

it's like 40 to 45% US, then 15% Europe, and the rest is the rest of the world. And the Brazilian thing was coming from someone following me, who is Brazilian, and kind of using all my content and, you know, giving me credit, but doing his own blogs. He even, I did a design of a project, and he did a better design than me.

Speaker 3:

Oh, wow, and so that was like I said okay, actually my design is not clear enough.

Speaker 1:

I don't know, the way that the shapes were done, or the links, so he redid it. So it's really fun how things can go around. And yeah, I did a live stream with him where he was translating things into Portuguese while I was giving the content in English, and it was a really interesting experience for me to have such an engagement in the Brazilian communities, and, yeah, not understanding a word. And also, when he was translating,

Speaker 1:

you know, he could say, "Yes, so this dumb guy said that." Like, I don't know what he was saying.

Speaker 2:

Yes, yes, really like a big following through a proxy.

Speaker 1:

Yeah, yeah. And I think that just leads me to what MrBeast is doing in entertainment on YouTube: he's translating all his YouTube channels.

Speaker 2:

Exactly.

Speaker 1:

His biggest audience is actually not in the US, and I think we kind of have this bias in tech that everybody is learning in English. True, yeah, if they are smart enough to get into tech. But I think that's not true. I think there are really smart people for whom language is still a barrier, and I think we should solve that in some way. I think AI can help, but having champions in different cultures and countries helping with that, that's an amazing thing, I think. Yeah.

Speaker 4:

And I was actually thinking, I was talking to Mehdi before we went live, that I didn't realize the importance of having blog posts in Portuguese as well, right? Because in my head it's like maybe the video format is more interesting. Because people can translate it all, like, have a Chrome extension. I think actually Google Chrome does it natively, right? But there's still a big aspect of searchability, right?

Speaker 2:

Yeah, true.

Speaker 4:

So, interesting point. Coming from Brazil, I do see that even the people that are very knowledgeable in the tech community, a lot of times they do content in English. So there's a big group of people in Brazil, like, sometimes they do speak English. Most Brazilians, I believe, do speak English, but it's not like they're fluent, and talking about tech is not easy either. Right? Like, the English you learn in school is not about scalability and whatever, right? It's a different type of thing. You know, you can speak English and still... Proof in point is that, if I have to talk about work in Portuguese.

Speaker 1:

Sometimes it's the same for me, like, it doesn't come naturally. Yeah, exactly. So I do see a lot of value in trying to just make it more accessible to people. And I learned a whole new vocabulary in English since I've had a kid, like five years, like all the animals and so on. I have no clue, even in French I almost don't know, like, what the hell is this? But now I know it in English, because I never used those words. And you're actually studying Dutch. Yeah, yeah.

Speaker 2:

So we are going to do, like, a special edition later this year in Dutch. Yeah, we can. Almost won't be a stalker. We're gonna record it in two languages. In two languages, first name.

Speaker 4:

Then we redo it. Maybe you can do it in three. Like, Mehdi, you can come and then you can do it in French as well. That's much more difficult for me.

Speaker 1:

I think French will have more audience, because people in Flanders and the Netherlands speak really good English on average, so this is a good counterexample where it's probably not that much worth it. But on the French side, there is a lot of pure French content happening right now, and it's still useful. I notice that also a lot when I do meetings in the Netherlands or in Belgium versus France, for example. Doing a presentation in English, everybody's comfortable in the Netherlands.

Speaker 2:

Yeah, when I do this in France, it really stretches people. You have people that don't follow anymore. The barrier is bigger.

Speaker 4:

I do think some people, too, they're just not comfortable. I've been in meetings where the person was like, oh, I understand, but I don't feel comfortable speaking, so I'll just speak in French, and they were like, okay, that's fine.

Speaker 1:

We did a DuckDB meetup in Paris. It's the only one we did in French so far. A hundred people showed up, all the questions were in French, and I couldn't answer anything.

Speaker 4:

It's really hard for me to kind of translate every single, you know... But it's true, the terms of the content, I never translated them. So yeah, and now I live in a world where, if I meet another person from Portugal or Brazil and we're going to talk about machine learning or something, I always have to preface the conversation with: I'm Brazilian and I speak Portuguese, I'm not claiming I'm not fluent, but I may trip up here and there because of this, this and this. And I think it's also because sometimes, if I do, my friends in Brazil, every time I trip up on something, they're like, oh, Murilo thinks he's European now. Oh, now he doesn't know how to speak Portuguese anymore. Not me, but my friends would say that to me.

Speaker 4:

They were like, oh, Murilo thinks he's better than us now. That's not true at all. I just get sad. But yeah, talking about sharing content, you mentioned that's a big part of what you do, no? How would you describe your role? You mentioned advocate.

Speaker 1:

Yeah, how would you describe it? Well, I think it's an interesting topic, because a lot of people have a misconception about it. But I was just saying before we started the podcast that I actually code as much as I was doing as a staff engineer. So I was working in tech as a staff engineer, and as a staff engineer, once you pass senior, you're basically leading projects.

Speaker 1:

You're still coding, but coding for bootstrapping. There are different structures in tech companies and different kinds of staff engineer archetypes, but the bottom line is that you're not coding as much as a senior engineer. That's definitely true. You're writing RFCs, reviewing things, reviewing PRs.

Speaker 1:

You do a lot of reviews, and that's the same here in DevRel. It's just that the other side of the work is different. I do content, but I still need to run stuff in production. Who would I be, teaching other people data engineering concepts or DuckDB concepts, if I'm not running production pipelines, right? So this is really important to me. And I think it's just that there are different kinds of DevRel. Some of them are less technical, depending on the product they sell. If they sell, like, a UI-friendly tool to design pipelines, we could argue that they code, because, you know, it's building blocks. But the point is that they're probably less technical than someone selling a database product.

Speaker 2:

But, to understand correctly, in your day-to-day job you're still coding as a data engineer? Yeah.

Speaker 1:

So I need to be able to code and to do deeper projects, and those inspire my content. And then I also have surface content that I know the audience needs, which is kind of boring for me, because I won't learn anything specifically new. So it's kind of finding a good middle ground, where I have deep projects that keep my technical skills sharp and address more the intermediate and advanced user base, but that base of users is rather small versus what you need to raise awareness, which is the beginner. So yeah.

Speaker 1:

So it's finding the right match in the overlap and making compromises between those. But yeah, sometimes I'm like, okay, I need to go back to code because I've done too much hello-world stuff. Some other companies have found opportunities where you do a custom integration, where you're really part of the product team. Here, we have our own data pipelines internally, we do our dogfooding, we use our own tools. So I think that's a clear opportunity for me. And I've heard that in other data companies where they have DevRel, they mostly work

Speaker 2:

also, I know, for example, at dbt, on their internal data pipelines and so on. I think within the team it's also a bit of a fine line to tread, in the sense that if you're too far removed from the content, then you just become someone from the marketing team, right?

Speaker 1:

Yeah, yeah, and that's the fine line, especially in a startup, which is often hard to find. I think as you grow, because MotherDuck is now like a 50-person company, but I was the 19th employee a year ago, so we are growing, and actually I have two new teammates in my team who are more on the marketing side to help me out. But there's this kind of thing where I feel like, okay, this is too much marketing for me.

Speaker 3:

I shouldn't be doing that.

Speaker 1:

And I think that's one of the big challenges. If you work on DevRel, you need to say no, you need to say where is your added value? Is it like at the end? That's why I'm defining myself. I'm an engineer. And this is what should be valued, not like. You know social media, you know a community manager or whatsoever.

Speaker 4:

It's a part of my work, but, you know, that's not where I add the most value. Yeah, I see what you're saying. Well, I also have some questions on developer relations. Maybe, just to go over, you linked this post as well. Yeah, I linked this post. I saw some funny quotes. I haven't read this thoroughly, right, but DevRel is not, quote unquote, someone who travels to a lot of conferences, right?

Speaker 3:

But, he does, though right Not anymore.

Speaker 1:

So, again, startup, you need to wear multiple hats. I think you can see, especially in the tech world. There is DevRel that you only see in a specific channel, which I mean blog or YouTube as you grow, or conference. So, for example, in Confluent there is a DevRel that is doing a lot of meetups, like mostly 90% of the time she's doing meetups and conference, but then she has less time, of course, for blogging.

Speaker 1:

You cannot just do everything, right? And so I think, as you grow as someone in DevRel, or as your team grows, you're going to specialize in one or another area where you are the most comfortable. That's why saying that it's someone who's traveling, I think it's completely untrue if you're not comfortable with big conferences or speaking. Applying to conferences is also, like, I hate that. It's taking a lot of time: picking the right conference, sending a topic, getting a rejection, and like, what the hell? And so I think, especially now with online content picking up, you can really compromise: I'm not going to cover this, and I'm going to cover that, which I find is fine. So yeah.

Speaker 2:

Well, maybe for the people: DevRel stands for developer relations. We've been talking about it. To me, DevRel is a function in an organization. It would be interesting to hear your input. To me, it's a function in your company that is tasked with building both external and internal community around either your product, your service, or your brand. Is that a correct representation?

Speaker 1:

Yeah, I think so. There are different sub-roles, as you were asking just before. Developer advocate, developer relations, they all overlap somehow. A developer advocate, you could say, focuses on awareness. In marketing, we say the first part of the funnel: people getting to know the product. And then there is maybe developer relations or developer experience, focusing on the docs, the UI, and the feedback loop. So there are different areas, but in general, you basically want to build a strong and vibrant community that helps each other, as you said, around a product or service. And I like to say that, in general, I tell technical stories to help people achieve that, because at the end of the day, whether it's a blog, a video, or in person, you tell a technical story to help people use a given product.

Speaker 4:

You mentioned tech advocate. We talked about DevRel. Evangelist, yeah. Would you equate that to advocate?

Speaker 1:

Yeah, I think it's all overlapping. I mean, I don't have that much experience in DevRel. I have my manager, who has more, so that's a question I can take back to him. But in my opinion, I would say they kind of overlap. Evangelism, I see that more around a specific area rather than a product. Like, I don't know, if you're a consultancy in sustainable energy, then you're an evangelist around that. Yeah, I see, around the full domain. And advocacy,

Speaker 1:

I would say it's more around a specific, narrow product. But again, to be honest, I interviewed with like eight companies before accepting the offer from MotherDuck, and no one had a clear idea. They all asked me, like, yeah, how do you define DevRel? How do you measure, you know, the efficiency

Speaker 1:

of your DevRel? And it's super hard, yeah, and so they all have their own idea. But I'm curious, how do you see DevRel in your day-to-day tech world?

Speaker 4:

Bart's looking at me. I mean, we have had some light discussions about developer relations. But I don't know. I mean, I have no idea. But I'll just say what I kind of understood up until now: for Dataroots, it would be kind of the ambassador of Dataroots, in a way. You know, when someone thinks of Dataroots, they think of that person building connections, building community, knowing what it is about. So it's almost like the one facing externally, to the community, right? So it's a bit like putting the name out there and all these things.

Speaker 2:

That's kind of how, going to events and so on. Yeah, but we don't have anyone with a dedicated role, for example. But I do think that we are to some extent conscious about building community. It's more because we have a lot of people that are interested in that and that take up that role a bit out of their own passion, out of their own interest, and not necessarily because they are assigned a task.

Speaker 2:

But maybe that goes hand in hand. I think Murilo does this to some extent as well, with the Data Topics podcast. And I think what you mentioned is something important, like, we don't have someone dedicated.

Speaker 1:

It's exactly like when you start your data journey. You have your software engineers doing small reports or the business doing things in Excel, and you don't have dedicated people in data, right? When you start, some people just take that responsibility, and then you scale. And, you know, you don't have a specific SaaS product.

Speaker 1:

So I think it's different. But I'm also curious, as a consumer, what's your experience so far with DevRel in other products than MotherDuck, for example?

Speaker 4:

So I think you're a good example, because when I think of MotherDuck, I think of you, right? You're really the face, the ambassador. You have a lot of content, you do the workshops, you're very engaged. But I've also seen people at conferences who were developer advocates for companies, and their talks had nothing to do with the product at all. They're like, oh, I'm a developer advocate at this company. They were talking about Python, at a Python conference, so it makes sense, but it had nothing to do with the product, and they mentioned it very little, just, I work for this company. And then after the talk I asked them, oh, can you tell me more about it? The colors matched, but not even the logo was there, and I'm like, it's cool, but I'm wondering if this is what being a tech advocate is like.

Speaker 1:

So some people said that devrel is like marketing without people knowing it's marketing. You've been trapped.

Speaker 3:

Got it.

Speaker 1:

So I'm doing kind of the same, like I mean I hate like having this cap of marketing, but I think there is some truth to it right.

Speaker 1:

Because, again, I'm an engineer first and foremost, but I also sometimes don't mention MotherDuck at all, or only on the last slide. And Jordan, don't fire me if you listen to this, because I think that's not the main point. If people are interested in or convinced by my technical story, which is somehow related to the sector, you know, data or data engineering, they'll come back and ask questions. Exactly as you said, like, oh, I'm curious, what is it that he's doing, and so on. And those are the people you most want to attract, because you're just providing clear value to them, and afterwards they're going to look up the product and so on. And that's always my goal.

Speaker 2:

so I think that is uh. If we see it as part of marketing, that's good marketing, like you want to. To say, someone has a problem, think about, think about this problem. I know you can. You can potentially use this for that.

Speaker 3:

I'm not saying I'm selling this.

Speaker 2:

You can use it for these problems like, like it's a different way of of connecting with people.

Speaker 1:

And the second part I want to tell... I forgot about it. So the first was basically that it's marketing without knowing it, and the second part, it's gonna come back later. But just an important one: basically, developer advocates always side with the developer.

Speaker 1:

So I'm the one who's always a bit grumpy internally about the product, like, oh, we should do that. I always formulate it internally as: I've run those things and I'm frustrated, because I'm also a heavy user of the product, which is not always the case. There is a lot of R&D when you build a database, and all those software engineers have a different background; they get disconnected from the actual usage of this database in context, and I'm kind of one of the users. So if I get feedback from someone at my workshop and I completely agree, I'm going to surface that a lot. And I think that's one thing: the developer advocate should always be on the side of the developer and not on the side of the product.

Speaker 1:

So marketing or product will tell you, yeah, we can work on that, or this is not a bug, this is a feature. But I will be transparent: this is a bug. Yeah, definitely. Because I value the trust that people have in me outside my job at MotherDuck, because I'm also doing content on the side, on my personal channels, and I don't want that trust to be broken. But I think that dynamic with the community is also super relevant to improving a product. Yeah, because you know what is in the minds of the people: what works, what the frustrations are, what people are happy about.

Speaker 2:

and, like you, could you also have a better understanding of someone that is far removed from the technology?

Speaker 1:

Yeah, to translate that into new features. And so this is where there's kind of an overlap. Coming back to the role definition, there is developer relations, which is really close to the product, and their role is really about being all ears and feeding that back in a clear way. For me it's more informal; I share a couple of things here and there. We're still a small company, but yeah, that's kind of an example where advocacy can be on the awareness side and relations more on the product side.

Speaker 4:

And it also mentions here, just with the blog post they do mention here that DevRel is a translator of feedback.

Speaker 1:

Yeah, it's a good one. I mean, there is a reference to an older blog in that blog, it's kind of an inception thing, but they're good references. I would recommend the audience give it a read to understand, and also, you know, to give some perspective when you meet someone, because there is actually backlash when I introduce myself as a DevRel. People are like, oh, he's going to sell us something.

Speaker 4:

So sometimes I hide it. Yeah, I just say, I'm a data engineer, which is true. And so this is why my job title is, like, data engineering and DevRel. And maybe one question: we talked a lot about the product, and you being on the developer side and not the product side. Does it make sense to have a DevRel for a company that doesn't have a product that they're selling?

Speaker 1:

Yeah, that's a good question. I think, if you are a service company, then the people you're going to talk to are not necessarily developers directly, right? So it feels hard to have a DevRel, because that's usually someone who is pretty technical. So if you don't have a product, you're providing a service. Correct me, but I don't have any example coming to mind. It's sales or technical sales that's going to play kind of that role, like, yeah, those people would like our service to be packaged like this, can we do something? So they feed this back, but it's not like a technical product, where they can really report an error message, a log, or something directly, and that needs to be technical.

Speaker 4:

I see. Interesting. Maybe, and I don't want to spend too much time, but I couldn't get out of this recording without mentioning another title: tech influencer.

Speaker 1:

I know, yeah, so that's, I think, a new um, a new trend. Uh, like people see, and there is a lot of backlash on all the social, on Reddit, and it's been on me also. But, as I said, like I have a tough skin, I'm just crying for like three days and then I'm good, then it's fine as soon as we stop the stream. The point is that we were discussing is I started doing content just by sharing, like sharing information, sharing learning, and actually it's a good process for an engineer to put your thoughts on paper.

Speaker 1:

And the first blog I wrote was for the Spark Summit, so it's already almost four or five years ago, because it's now called the AI Summit by Databricks. Anyway, it was just data engineering highlights: my top talks and my key takeaways. And I needed to do that because my ticket was paid for and not everybody in the team had the opportunity to go to the conference. So they asked me, can you do an internal presentation? I'm like, yeah, I could write a blog, actually, and share it. And Matei from Databricks, the creator of Spark, shared my blog on social media, not because I was a rock star and it was an awesome blog, but maybe because there was nobody else doing it four or five years ago.

Speaker 1:

The landscape has been changing drastically, and so if you roll back to that time, there are a lot of people around who have been growing their audience, you know, far bigger than me, and they all look for monetization in different ways. They accept sponsored posts. They're being transparent or not. You know, they say yeah for free, so they're like, it's like a marketing announcement, it's kind of clear. Others, they give the disclosure. I never had to. I accepted one time, you know, a sponsoring thing: it's when dbt invited me to their conference, and I was like, just say, yeah, they pay the trip and I can do whatever I want. That's the deal, um, on content at the conference. But it's just to say that there are multiple ways to monetize.

Speaker 1:

You know, as you grow your audience, people, you know, have different responses to this kind of content or change, because people at the end, you know, need to pay their bills. Yeah, exactly. And so I think that's the bad thing about "tech influencer": it's a group of all those people. Like people still doing that for free, having a full-time job and doing it on weekends, right? People like me, which is kind of in between: does have a DevRel role, you know, and some bias against their employer, but still has some time also to do content. And others that embrace it, like, yeah, we're a media tech influencer, so we will, you know, create reach for your event or for your product. And I think that's where it gets, um, you know, a bit bad for the community: there is no clear definition. It's still relatively new.

Speaker 2:

I personally maybe I'm thinking more about influencers, quote unquote in the podcast world, because I listen to a lot of podcasts- yeah.

Speaker 2:

Like, I have no issue whatsoever with monetization as long as it's transparent. Maybe that's my... like, as long as you know that there's a reason that people are promoting something, I think that is important. Um, but what I do notice is, a lot of the time, people have built an audience and at some point it reaches a certain size where monetization becomes relevant, and they do activate one way or another, one form or another of monetization, and they get backlash from the community. Yeah, because the community is not used to it.

Speaker 2:

But at the same time, in my opinion, these people invest a lot of time and effort to build this, right, to build this for the community that listens to this or watches this or reads this for free, right? Yeah, I think that's also a logical thing to do. At the end of the day, everyone needs to pay their bills. But to me, the core thing is: be transparent. Like, what is your honest opinion, and what is something that you're promoting?

Speaker 4:

I think, but I also... I have the impression, I don't know if this is legit or not, but I do feel like people get used to the free content in a way.

Speaker 3:

Yes, definitely.

Speaker 4:

Once it's not free, they feel like they're entitled to it somehow. And I feel like, if you have something, like you have a course and it was free and now you have to pay for it, for me it's fine, right, it's your decision. Ooh, traitor, yeah, but you're not taking money from my pocket and now I have to. You know, like you don't want to do it, it's fine, don't do it, you know. If you think it's too much money not worth it.

Speaker 2:

It's also fine, right? Like, there's no hard feelings. Right, are you on Patreon or something?

Speaker 1:

No, but I'm gonna start a couple of courses. Um, but again, the reason I'm doing courses, yeah, there is monetization behind the scenes, but foremost it's that I get too many requests, and how can I scale it? Yeah, so I could do courses for free. Yeah, but that takes a lot, a lot of time. Right, exactly. And so, uh, I would rather...

Speaker 1:

My intention is like, okay, if I'm breaking even, then, you know, based on the time invested for those kinds of things. Um, but people say, oh, you know, he started selling me courses. And I think there are a lot of people, in AI for example, jumping just on the hype and claiming AI engineer and selling courses, and I think there is a lot of delusion on that. But the one advice I can give to the listeners here and the community is: just step back and make your own opinions. Like, tech influencer is a really big basket. So just sort it out on people: see, you know, what's their background, how much content they've been putting out there, where they are working, and, you know, form your own opinion. But just don't put everybody in the same basket. That's not the case.

Speaker 4:

Yeah, and maybe segueing my way into the next topic. It's a bit of a forced one, so get ready. A forced one, a bit of a forced one. No one is noticing, yeah.

Speaker 4:

I think I put the story in my head and I'm like, yeah, that's forcing it a bit, so I'm just letting the audience know. But sometimes it's not about content; maybe there's something similar with open source projects, right? They start as free open source, they get bigger, and then at one point it requires way more time, and then people decide that they need to monetize this somehow, right? And it's also a fair thing, I mean, there are many open source projects that kind of go through that trajectory, right? But one of the things that could have been what happened with Terraform.

Speaker 4:

Oh yeah, see, that was very forced, that was a forced segue yeah.

Speaker 1:

Yeah, no, but I think what happened is that we were in a golden era where money was free and all the tech companies were starting their projects with open source and really permissive licenses: we're just there to help the community and just burn the VC cash. And I think, again, to the same extent that some tech influencers need to pay their bills at the end, there is kind of this disillusion happening now, where we see a lot of companies either increasing their price, their cloud price, or changing the license. I mean, OpenAI is a bigger one, but Terraform, I think, was just kind of inevitable.

Speaker 4:

It wasn't that bad of a segue. I feel like I could have done a better job, but there was some connection there.

Speaker 2:

So what is the news around Terraform?

Speaker 4:

Why am I bringing this up? Thanks for putting me back on track. Well, for a quick recap, right: Terraform changed the license. A lot of the community was really upset, and I do think, just to highlight, it is different from content online, because a lot of people contributed to Terraform because it was an open source project. So it's almost like you, Mehdi, create a course that is free, and then I contribute to the course, somehow I make a few, uh, lessons there, and then Mehdi says, oh, actually, now it's going to be paid, and I won't give you a penny and you won't give me a penny. So I think that would be a bit closer, right? So it's different, for sure. Um, so then people got mad. There's OpenTofu, there's the cease and desist, and now, uh, HashiCorp, so the company that created Terraform, announced that "HashiCorp joins IBM to accelerate multi-cloud automation". So this is the big one. What's your take on that, Bart? I'm curious.

Speaker 2:

It's a big one. It's a difficult one. I think what we will see is maybe even more money going to the Terraform CLI. I think, because there's a lot, like, IBM themselves, like the Terraform provider for IBM, they will invest more in it. I think they need to keep investing in it because all the other cloud providers have a vested interest.

Speaker 1:

But do you see any customer with IBM Cloud in Belgium?

Speaker 2:

No, I don't, but I think.

Speaker 3:

IBM will invest in it for this reason.

Speaker 2:

I think the managed services of HashiCorp.

Speaker 4:

Like Vault and whatnot. Yeah, to be seen.

Speaker 2:

How it will play out in the coming years.

Speaker 1:

Yeah, so I've heard that 80% of their revenues were coming from like 20% of their customers.

Speaker 1:

So, basically, to put it simply, it was really slow growth and, you know, things were going down. And so Red Hat, you know, has been acquired by IBM, and, yeah, it's this thing where it's not a matter of how many, you know, customers you have if it doesn't translate into paid customers and, most of all, enterprise customers, which is where, you know, the biggest revenue is going to come from.

Speaker 1:

I think this is where, um, Red Hat was challenging, but now it's doing, I think, pretty well in IBM. And I think HashiCorp, basically, in the blog I linked, they mention that there are two options: either they're going to be part of, kind of, IBM Cloud and do the things that you just said, like investing into the provider, or they're going to join Red Hat, um, and there might be some stuff licensed back to the open source community. To be seen, because Red Hat has still been operated, you know, pretty independently since they joined. But I mean, HashiCorp is not as big as Red Hat.

Speaker 1:

So it's a bit of a question mark on what's gonna happen.

Speaker 2:

Yeah, it's difficult to look into the future. I think, if we look at the open source community, if you bring up Red Hat, what everybody will also bring up is CentOS. Yeah. That was more or less discontinued a few years ago. That's true, yeah. CentOS being the more or less open source version of Red Hat Linux, that is, um, which was binary compatible with Red Hat Linux, which a lot of people also attribute to the acquisition by IBM, but no one clearly knows, of course. But yeah, it's, um... yeah, I think.

Speaker 4:

Uh, indeed, on the blog here that they announced, they do mention that Terraform HashiCorp will be a division. I think right here somewhere. Yeah, basically a division within IBM, and they also mentioned the open source community. Let me see, where is it?

Speaker 1:

Yeah, I think, I mean, Microsoft is also a good example. Like IBM, we see that as a big, you know, corporate company not doing anything for open source. But, you know, times have changed, and Microsoft is a good example where no one would have thought that they would be, you know, the biggest contributor. So who knows how they're gonna play their cards.

Speaker 2:

Exactly, exactly. And I think there you can make the parallel with GitHub, where, when the acquisition of GitHub by Microsoft happened, yeah, everybody thought it was the end of the open source era of GitHub. Like, yeah, tomorrow everybody was going to use GitLab. Did it happen? Like, the default for open source is still GitHub, and I think they're still very good at keeping that position. Very true. Despite, or maybe even thanks to, being part of a big player like Microsoft.

Speaker 2:

But yeah, my own, my personal thing: I think the Terraform CLI will continue for the foreseeable future. Yeah, the managed services, I don't know. I think there's also a lot of overlap if you look at the big landscape of IBM. Maybe they have some, maybe some convergence plans, uh, for the future.

Speaker 4:

But let's see. And IBM is a winner in this acquisition?

Speaker 1:

I think it's a good move from IBM. I mean, now, yes, because the economy is in a downturn. So I mean, it was going down for HashiCorp, but it's, like, you know, there are other companies that have been devalued and are being acquired. So for those big boats, I think, it's a good moment.

Speaker 3:

It's a good moment.

Speaker 1:

To do acquisitions. Like, we small citizens just buy, you know, stocks when they're low, and they just buy a company when it's low.

Speaker 4:

Normal. Yeah, yes, true, true, true. And you mentioned, um, GitHub, Bart. Have you ever heard of this, uh, GitHub Copilot thing?

Speaker 2:

I think I've heard of it.

Speaker 4:

Yeah, I think you've heard of it. Um, there's this new thing, Copilot Workspace. Well, new thing, I don't know.

Speaker 1:

Oh yeah, two hours ago, super new thing. So the thing is that it's been announced at GitHub Universe in December.

Speaker 4:

Yes.

Speaker 1:

But, there was no, it was just a marketing teaser.

Speaker 2:

Okay, this is the first.

Speaker 1:

This is a technical preview now, and this is really impressive.

Speaker 2:

I put like what does it do? What is the proposition?

Speaker 1:

It has access to the whole code base.

Speaker 2:

Of your code base?

Speaker 1:

Of your code base. So whatever you ask. So here you can ask, you know, to plan something, add, you know, a button somewhere. It knows which files it needs to interact with, to update and add a button to the app, and so you can extend that to, like, yeah, an automated PR for a change or a bug fix. And I think it's like, you know, you heard about the Devin marketing thing, and I think here, um, I haven't... it's in technical preview, but my guess is that there are kind of fewer promises but better commitment than what, you know, Devin did, where it was like, it's an AI engineer and it's going to replace everybody. But you see the added value of GitHub Copilot: it's not replacing any developer, it's, you know, improving their productivity. Yeah, I think that's, uh, that's clear.

Speaker 2:

Next step. So Murillo has the teaser video here on the screen. Maybe for the listeners: do I understand correctly from this, like, you go to your GitHub project page, you ask a question somewhere, like you say, I have this UI, please add a button with this functionality, and it's going to create a PR for you, basically?

Speaker 1:

Yeah, but there is also integration within VS Code, so you can ask the current chat and it will update the files. But this is, I would say, the end-to-end use case, where you just go to the repo, ask something and it creates the PR. Versus, you could actually do... like, I do stuff like this: I give context, you know, um, I basically, uh, change the files by prompting, and I give, like, this is how this file looks, this is how... I don't give everything. And then I say, can you implement this, and where should I put the file? No, those pre-prompts are not needed anymore because it has the full context.

Speaker 2:

Because when I code in VS Code, when I use Copilot, I use it a bit as a very smart autocomplete, yeah, but it's always in a very local way. Yeah, exactly, yeah. I want to introduce this logic, but I'm now in this file, so I'm going to create this little part of this logic, and then I'm going to move to another file and I'm going to add that part.

Speaker 2:

This is basically a way to hopefully do.

Speaker 1:

Yeah, that's why it's called GitHub Copilot Workspace: it's covering... uh, yes. I mean, you can join the waiting list. Uh, I just joined this, uh, this morning.

Speaker 1:

It's a technical preview, so it's going to take some time, I think, uh, before it gets into the hands of multiple people. But yeah, I think it could be nice. The first thing I thought about with this is documentation websites. I am also responsible for a documentation website, and sometimes you have less technical people. So everything is in Git, there are PRs. Sometimes there are people who, you know, want to fix a typo or rephrase something. You give them access to the repo, but it's not someone that's comfortable with Git. So you could say, hey, give me a PR, fix this typo on this, and it just does the commit on the right Markdown file and so on.

Speaker 4:

That's true, that's true, there's a lot of possibilities there. Yeah, and yeah. I also think that this smart code completion in a way that they'll move more towards, indeed, like the whole, you have the context of your whole project as a whole.

Speaker 4:

There's also, just to be complete here, I have heard of, I haven't used this, another AI assistant, I guess code assistant, called Cody. It's from Sourcegraph. So I don't know too much about it, but even the tagline here is "AI that knows your entire code base", right? So there's a VS Code extension and, I think, Sourcegraph, also, the product is related to having the whole project as a graph, I guess, like the dependencies and whatnot.

Speaker 2:

So I do see like this is one, but this is if I understand correctly like this is this is uh, you're using this as a smart autocomplete with the full context?

Speaker 4:

yes, but you're still making the change locally.

Speaker 2:

Well with the workspace like you're you're changing it everywhere where it's relevant. I feel.

Speaker 1:

I feel so bad for all the startups, really, those projects.

Speaker 1:

Yeah, because the point is that GitHub, even if GitHub gives, like, half-baked features of this compared to, you know, for example, this case, they just have the integration to make the developer experience just delightful, right? The same with VS Code and dev containers and all those things. So, uh, yeah, I think it's going to be really hard to compete. And I think in general, that's something I've been advocating in data: developer experience over features. Like, how easy it is to use and understand your product, versus the number of features that there are. Because there is a plethora of tools, and when you pick a library, a framework, if it's difficult to understand, if the documentation is not clear for you, the productivity is going to go down.

Speaker 1:

But someone else also has to maintain that. Like, you're not going to stay forever there or in the project. And so I think, in the same way, I think GitHub has a really big edge over there.

Speaker 4:

Yeah, but I think also features are probably easier to sell to the managers. Like, I'm going to work on this feature and this feature. Because I do feel, I agree with you 100%, but I do feel like developer experience is something harder to quantify. But I've also seen projects that I feel have so many features that the project is almost bloated in a way. Yeah, you know, like they do a lot of things, nothing is amazing. They do a lot of stuff. It's like the Swiss knife, or jack of all trades and master of none kind of thing, right? And to me, sometimes I'd rather pick like five different tools where each tool does one thing really well and has a very nice experience than pick one that does all the things.

Speaker 1:

Yeah, no, that's true, but that's the way, like, how you sell it to business there, and then there is. I really feel like, as you know, with your developer view, when you pick things.

Speaker 4:

Yeah, I agree.

Speaker 1:

And they're like, yeah, developer experience is just going to beat any features that it provides. Yeah. So I would say yes: 10 tools where every one is super easy to use and does one thing could actually be better than...

Speaker 4:

I agree. Maybe before the kerfuffle in AI part... Kerfuffle? I was just adding the notes live. Before all the commotion. Editing in production. Just wanted to share a bit about this as well, because you mentioned the code completion: Snowflake actually released an LLM for enterprise AI. That's what they call it: efficiently intelligent, truly open. So this is from Snowflake research, and I read this article as well.

Speaker 2:

They have such a great name. You can do so many things with it.

Speaker 1:

Arctic. Now it's...

Speaker 2:

Snowflake Arctic.

Speaker 1:

Yeah, but Polars is not belonging to us, but that could be a good acquisition for them.

Speaker 2:

That is true, that's true, oh wow, yeah, like their M&A strategy is like purely based on.

Speaker 3:

Fortress game yeah, no, but.

Speaker 4:

TechDB makes more sense.

Speaker 1:

Maybe it's one part of the strategy of the creator of Polars, right? Way ahead of their game.

Speaker 4:

Yeah, this shit. So it's again. It's an LLM. I was a bit like, okay, is this a service on Snowflake? Apparently it's not. It's like a purely open source model, right.

Speaker 2:

And it's a domain specific model.

Speaker 4:

So it is domain specific, and that's the thing that I was a bit... because they mention Arctic is efficiently intelligent, truly open. I think they even made some more bold claims that it's the best or whatever. What do they mean by efficiently intelligent and truly open? Truly open is the Apache 2.0 license, so truly open in that sense. Efficiently intelligent: they called it enterprise AI because, according to them, this excels in tasks that are related to enterprise AI, which is SQL generation, or "sequel", depending on what you prefer, and, uh, code generation. So one of the benchmarks is actually what you're saying, Bart: there's a dataset where, based on docstrings or function names, you have to create the code. So this is the benchmark, and also I think it was like breaking down prompts into instructions, right? So that's what I was really looking into. So "top-tier enterprise intelligence at incredibly low training cost", right? So they mention here the datasets, and I also looked into them to really see.

Speaker 2:

So they're also open datasets. They are Transparent, transparent.

Speaker 4:

Yeah like basically the benchmark. So here below you can see the scores.

Speaker 2:

Oh, these are the benchmark sets.

Speaker 4:

okay, yeah, yeah yeah, and for those they used these datasets right, and basically they called Enterprise AI an average between SQL generation using the Spider dataset, coding, which is human, eval plus and MPP MPP is most basic Python programming or something, and instruction following so if eval, which I think is cool, I think. But to be honest, they do have other differences. On the architectures right, and one thing I thought was pretty cool is that they do have a series of blog architectures right, and one thing I thought it was pretty cool is that they do have a series of blog posts on dissecting lms and what is their use case for this?

Speaker 4:

it's really just to create sql. I mean, they're really saying like this is I think they're going to integrate this on the ui.

Speaker 1:

I know what their use case is marketing I think, I think like, if you, if you miss the ai train right now just by sharing free content or doing free content like this, people are not going to perceive you as like yeah, are you following this up? Yeah, and so they've been, they've been they've been integrating stuff in their UI on SQL generation and so on. But I think today, like a lot of other data companies, it's mostly about awareness and we'll see how it gets. But yeah, I'm a bit skeptical.

Speaker 1:

I'm happy to try it out regarding SQL generation, because I think SQL and coding... like, we worked at MotherDuck on a text-to-SQL model, and that's really specific compared to coding. Because with coding, your prompt is going to be much more accurate, easily, as a software engineer that says, can you give me a class or whatever. But with SQL, you need to have, you know, a good metadata catalog to feed it, to say, yeah, what's the, you know, churn this month, for example. And I think Databricks also released last year something, I don't know the name, to train on company data, right? Basically to have all the semantic information.

Speaker 1:

But again, if you don't feed that semantics and you don't define what a customer is for you... there is manual work to be done there. So, yeah, I think there is a lot of, you know, smoke and mirrors over there, but I do think it's nice. I think it's nice that people just see the blog. First thing is AI. I know it's... sorry, I misled, at Snowflake. But I wouldn't be surprised if people start to just highlight more, um, you know, AI work at different companies.

Speaker 4:

Yeah, I think so. But I do believe that part of the reason why they tried to build this as well was to try to integrate it in their UI, like autocomplete as well, because even looking at the benchmarks and the metrics, it's only Python datasets, like writing, coding, Python, or SQL, which is exactly what Snowflake has.

Speaker 2:

And I think what I would have interest in is not necessarily generating the SQL, but more asking a question, not caring about the SQL, and I think they have the right product to be able to do that as well, to put it in front of business users and say: how many customers did we have last month? But for that you have the challenge that you just mentioned. You have these tables with, very often, very undescriptive columns, like abbreviations as column names. So you need some metadata on what this column actually, yeah, actually contains, right? Like, the column is ctr_name: maybe it means customer name, maybe it means click-through rate, right? You need some extra information, and that's indeed something that, uh, when it comes to code, you can extract the logic from the code around it. Yeah, indeed, but you don't have the pipelines coming before that.
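To make the point about column metadata concrete, here is a minimal sketch of how a text-to-SQL prompt could carry column descriptions next to the question. Everything in it is an assumption for illustration: the `ColumnDoc` class, the `build_prompt` helper, the `orders` table and the `ord_dt` column are hypothetical, and this is not how Snowflake Arctic or any specific product builds its prompts.

```python
from dataclasses import dataclass


@dataclass
class ColumnDoc:
    """Hypothetical catalog entry: a column name plus a human-written description."""
    name: str
    description: str


def build_prompt(question: str, table: str, columns: list[ColumnDoc]) -> str:
    """Assemble a plain-text prompt that spells out what each column means."""
    lines = [f"Table: {table}", "Columns:"]
    for col in columns:
        # The description is what disambiguates an abbreviation like ctr_name.
        lines.append(f"- {col.name}: {col.description}")
    lines.append(f"Question: {question}")
    lines.append("Write one SQL query that answers the question.")
    return "\n".join(lines)


# Illustrative catalog; the table and column names are made up.
catalog = [
    ColumnDoc("ctr_name", "customer name (not click-through rate)"),
    ColumnDoc("ord_dt", "order date"),
]
prompt = build_prompt("How many customers did we have last month?", "orders", catalog)
print(prompt)
```

Without the description line, a model only sees `ctr_name` and has to guess; with it, the intended meaning travels with the question.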

Speaker 1:

Yeah, and you don't really need... you don't have this hard dependency on data, which is different. And I think this is where we're getting into the disillusionment phase, right? We started with the hype, and now people say, yeah, this is not really working. Yeah, no clue: if your data model or your data is, you know, not working out,

Speaker 1:

Uh, you know your output is not working out, but still it. It is interesting to see that it starts to be embedded, you know, uh, here and there. So, yeah, I'm really like looking forward that those are being pushed back to the product and especially with, like those smaller models coming in, yeah, that we we really start to have, you know, lower, lower cost usage of this within your existing data warehouse product.

Speaker 4:

Yeah, yeah, I also put this on the screen for people following the live stream. This is kind of how the dataset looks, which is very similar to what you were mentioning: there's a question in natural language, like "what is the maximum and minimum budget for the departments?", and then there's a SQL query. This is their benchmark dataset. Yeah, probably. I mean, I don't know if they tested on this, well, if they trained on this.

Speaker 1:

There's leakage, right? But, uh, yeah, I mean, you see the column, like here, max, just the line, yeah, budget in billions.

Speaker 2:

That's the column name. Like, that's a super obvious column name. That's not real life, right? If you've ever looked at a table coming from SAP, that is not real life. Exactly, yeah.

Speaker 1:

So yeah, that's. That's actually interesting to look.

Speaker 4:

That could be like an entire podcast or blog: let's look at the training dataset and how real it is, like how close to real life. Yeah, I mean, even this, this is the Python stuff: they put MBPP and it's like "coding", right, like code intelligence, but even here it's like most basic Python programming. So it's like, yeah, okay, it's not... But their audience...

Speaker 1:

I don't think their audience is pretty advanced in python in general. I think even if they are, it's mostly an opportunistic. But I bet that most people that have python pipeline there, if they move to snowflake they might be, you know, just saying okay, we have, it's a small company, or they don't want to have another compute runtime. True, but yeah, I bet it's mostly data analysts that do a bit of Python or some data engineer maybe, but so for them I think that's good enough. True, true.

Speaker 4:

But I think for me, after reading this and thinking because they also talk about the inference cost and all that and the number of parameters but to me it really just feels like they specialized on one domain and that's why it's higher performance. Yeah, I think that's also natural right. And more on AI, namely AI kerfuffle what do you got Bart?

Speaker 2:

some legal commotion. I think. Interesting piece of news. I think it's from yesterday. Let me just quickly check from the Financial Times that the Financial Times and OpenAI have a licensing deal, a content licensing deal. Do you have a subscription?

Speaker 4:

for Financial Times.

Speaker 1:

That's the big question of the podcast.

Speaker 2:

So what?

Speaker 3:

is it here? Oh, I can actually open it.

Speaker 2:

I can actually open the article without a subscription, yeah. Um, so it's an article in the Financial Times, but we'll link another one, uh, where there is no paywall.

Speaker 2:

But this is interesting, especially with the legal, uh, stuff going on in the background, with the New York Times suing both OpenAI and Microsoft for using their content without a licensing deal. And basically what happens here, most likely, because we don't have a lot of the details, is that this is after the fact: probably OpenAI also used Financial Times data, and now they have struck a deal where there is a mutual benefit. There is very little information on the exact terms, but it allows OpenAI to, one, use their data for training, but also to use live excerpts, live summaries from articles to answer queries in ChatGPT. Do you know the requirement from the Times?

Speaker 2:

Like to quote maybe the source? Because it's not in the article, and I don't think it's very new, so we will see the implementation in the coming weeks.

Speaker 1:

I had two big things happening to me. This, like, I saw last week: I was chatting on concepts around LLMs, okay, and, uh, ChatGPT referred me to a blog of someone in my direct network that I know. Oh yeah, like, it was cool. With a quote. I usually ask in my prompt, like, if you have a source, give me the information. But it was actually a blog from a friend that I know, and I had read that blog, so I felt a bit stupid. And I saw another tweet from Gravtex, I know someone over there, that said that, um, they get leads: ChatGPT is recommending graph techs, the company's product, for a specific use case. So it's kind of like the new SEO, you know, game within, but no one knows still how it works, it's still the black box. So I think for the Times it would be, uh, you know, still a big deal. "If you like coding and tend to, like, get one month for free, uh, you know, if you click here." I'm sure we are just a couple of months away from getting ads within the chat. Let's see.

Speaker 2:

I actually in the. There was a very recent interview with Sam Altman. Who was? It A long podcast interview. I forgot who the podcast host was, but he actually mentioned this advertisement, that he was very against advertisement and that he wants a model without advertisement. But let's see what the future holds. Right yeah.

Speaker 4:

Who knows?

Speaker 2:

Who knows. But it's interesting because it's apparently, and I didn't know, the fifth deal this year where OpenAI struck a deal with a press agency. So it has similar agreements with Associated Press, with Axel Springer, a Germany-based news agency, Le Monde in France and Prisa Media in Spain.

Speaker 2:

Oh, wow, Spain. All for which we don't know any financial terms. Yeah, this is a bit scary, I don't know, that they don't share anything. But I think it's also a bit scary for everybody in this space, in the sense that what the New York Times is doing can go either way. Right, because what OpenAI is saying is that their use of this data falls under fair use in copyright law. That's what they're stating, and the New York Times is saying, no, we want to take this to court, we do not believe that this is fair use. But if the court decides that this is fair use, then there is a precedent to say we don't even need all these collaborations with these players.

Speaker 3:

We can just go for it.

Speaker 2:

So maybe there is this gap period where there's also an opportunity for these companies to partner with OpenAI, because now there is this uncertainty, where there is also an incentive for

Speaker 4:

OpenAI to partner. Yeah, that's true, that's true.

Speaker 2:

Yeah, I feel like there's a lot riding on that decision. Yeah, and I'm very interested to see where we will be a year from now.

Speaker 1:

Yeah, because all of these legal procedures take a lot of time, yeah, and they're not ready, I mean, to license stuff. I know, for example, in another area, I don't know what's the name of the company, but it's, you know, the deepfake Tom Cruise. The guy created the company, yeah, Metamorph or something.

Speaker 2:

Yeah, he made a few of these.

Speaker 1:

But there is one company which is the best in deep fake advertisement.

Speaker 1:

So they license, and they did a lot of research and work on how to license the face of someone, because this is where the opportunity is, exactly as mentioned, so that someone from basketball can say, yeah, you can use my face to create video ads, or, you know, my voice and so on. But there is nothing legal that's been, you know, built around this, and so I think, yeah, that's a lot of work. What I have also seen in fintech, for example, is that it's a super struggle to be the first one and to work with government and legal procedures.

Speaker 3:

Being the second is the best, because all the other things have been done already and you just, yeah, you just do better than

Speaker 1:

Your competitor. Sometimes, you know, they are the first, so that's their marketing advantage. But yeah, I think the point is, with OpenAI, as we see all those deals being done, it's going to be, you know, maybe a healthier market where other companies like Cohere and so on are going to be able to build those relationships with different people.

Speaker 4:

You know, major newspapers. Yeah, let's see, let's see. And maybe on to the most awaited piece of news. Yeah, because this is a very interesting segue.

Speaker 2:

Actually, it's very long. So we discussed last time that there is currently a big hip-hop beef going on, a bit of a fight between Kendrick and Drake. Too niche to go into here. But we brought it up last time because Drake, in his diss track, used an AI-generated voice. Well, he sang something, rapped something, and transferred it to someone else's voice, both Tupac's and Snoop Dogg's. And the interesting thing about the hip-hop beef is that now, every time something comes up, there's this discussion: is this real or is this AI generated? People don't know anymore. It's so good.

Speaker 2:

And now Drake used these deepfakes, basically, and it's a bit similar to the thing that you were just saying, Mehdi. So he used this deepfake of Tupac Shakur, who passed away a long time ago already, but his estate basically threatened legal action against Drake, yeah, and Drake has taken it down by now. That's what happened. But the difficult thing with deepfakes, of voice but also of face recognizability, and I'm definitely not an expert in this, is that, how I understand it, a person's likeness is not covered by the Copyright Act, so there is even less of a legal framework around a person's likeness. Yeah, and here there is

Speaker 2:

Also, you have a bit of this same fear. No one really wants to take this to court because, even much more than with copyright, if a court would rule that, for example, Drake's use of Tupac's voice is allowed, then you have this huge economic shock to the music industry. Yeah, that's true. So everybody is very... There's also a rumor that that is why there's no actual legal action.

Speaker 2:

There's just a threat of going into legal action, yeah, because everybody's super hesitant to move towards getting a precedent, because no one knows what, yeah, what's going on.

Speaker 1:

Yeah, so it's mostly just to have a settlement. Yeah, a bit more money. Yeah, there is a lot of stuff. I think there is a

Speaker 1:

On YouTube, there is a channel I really like, which is There I Ruined It, and it takes, like, a common sound like Michael Jackson and does, like, a country version. Oh, I heard that, yeah, it's really good. And that's a really fun thing, but it's true that there is no legal framework for this around music, and we'll need it. And there is also a dystopia where people say, no, we don't want to have a new hit from Michael Jackson, which I found, like, yeah. That's probably, I think, to come back to what I said earlier in the podcast: you know, transparency is key here.

Speaker 2:

So I think, if you mentioned.

Speaker 1:

Like, "this is AI generated", and you're okay with that, then that's fine, but yeah, you need to be aware, because otherwise it's not fair to the consumer, it's not fair to the, you know, original artist. And I think it's like, you know, you get some stuff from IKEA or you buy homemade furniture: we're going to get homemade music, homemade video, because a lot of things are going to be generated by AI. Right, and you also very quickly get into this ethical discussion, like with, for example, deceased artists like

Speaker 1:

Tupac Shakur.

Speaker 2:

What if the estate sells his likeness, and the estate is okay for it to be used, that they strike a licensing deal? I mean, there's a very ethical component to that as well.

Speaker 1:

Like, yeah, but there was an article saying, would you watch a podcast with Abraham Lincoln? Like, we have a question, discussing philosophy about today's society with his experience. This is really crazy, but we could, you know, potentially do that, and which legal framework are you going to put around that? It's pretty difficult.

Speaker 4:

There was also one video, I think it was like Joe Rogan having a podcast with Steve Jobs. Yeah, yeah, that got famous as well. But yeah, it's tricky, because I also think, if you had Abraham Lincoln, you really would like the person to be there, right? Like the ideas. It's not just the image and the voice, right? It's like, yeah.

Speaker 1:

But with the metaverse, you're going to be there.

Speaker 3:

Yeah, yeah, yeah.

Speaker 4:

Yeah, it feels like this is the beginning of a Black Mirror episode. Of course, yeah. Why? Of course. Why do you think I have

Speaker 2:

a dart. There it all comes together. Yeah, let's see, that's what I'm saying. I'm very interested to see where all this goes from the moment that we have legal precedent, and what it will actually mean, and for companies also to have a bit of guidance on what is okay and what is not okay to do. And today I have the feeling that, like, 90% is gray zone.

Speaker 1:

Yeah, it is Indeed.

Speaker 4:

I think this is a good place to wrap up, If everyone's okay with that, unless you have something that you really want to.

Speaker 2:

Do you have a hot take?

Speaker 4:

There is a hot take. You want me to hit the button there.

Speaker 2:

Oh yeah.

Speaker 1:

Oh, hot, hot, hot, hot, hot, hot, hot, hot, hot, hot, hot, hot. This is a hot take that Mehdi brought. Yeah, actually it's a really small project, so it's mostly just to express my opinion, but I haven't seen prompt data engineering, meaning, like, embedded in the tools of data engineers.

Speaker 1:

I think it's coming. Actually, we talked a bit about it with, like, Snowflake, where basically I'm writing a data frame and I say, so I think there is an example here, "return the country". So I have a data set with, you know, a key code for the country, and I want the string of the country. For example, we can do that with an LLM. I think the limit today is that we are relying on third-party API calls, yeah, and the size of the LLM is still a bit too big. But I think we haven't seen anything invested into the data engineering landscape with LLMs.

Speaker 2:

I saw one thing, but you're talking here about, like querying a data set or really setting up data pipelines from one source to target.

Speaker 1:

Yeah, I think that, but I think really just, you know, helping us to create data sets with.

Speaker 2:

LLMs directly. Okay, okay.

Speaker 1:

It's like, "remove any punctuation problems", rather than us having to look at the data set and encode those things, for example.

Speaker 2:

So saying in natural language, like, "please create this data set with, per month, these KPIs, these values, use these sources for us and keep it up". Building it into the tools that we are using today, because today we still, like, write manually, like monkeys. It's like, okay, do this, do that, and we copy-paste, but it's not integrated.

Speaker 1:

I don't see it. For me it's like, I'm calling a function and this function is, you know, doing magic stuff behind the scenes, rather than me having to

Speaker 2:

And how would you see that to be really like, like, what kind of tools are you thinking about?

Speaker 1:

Spark or any of the data frame libraries. I think it's good to embrace that because, like, that's more function based, so you can say, yeah, you know, remove nulls if there are any, and so you may have a column, you know, which doesn't have nulls, has zero, yeah, and it gives you a report back and adds to the code base. So, yeah, a bit more interactivity to help data engineers, rather than having a context switch where I have a chat box. I have seen, like, actually Spark.
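The "remove nulls and give me a report back" idea can be sketched as an ordinary function. This is only an illustration in plain Python, not something shown in the episode: the names `drop_null_rows` and `CleanupReport` are hypothetical, rows are modelled as dictionaries rather than a real Spark or pandas data frame, and no LLM is involved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CleanupReport:
    """Summary of what the cleanup step found (hypothetical structure)."""
    rows_in: int
    rows_out: int
    nulls_per_column: dict[str, int]


def drop_null_rows(rows: list[dict]) -> tuple[list[dict], CleanupReport]:
    """Drop every row that contains a None value and report what was dropped."""
    nulls_per_column: dict[str, int] = {}
    kept: list[dict] = []
    for row in rows:
        missing = [column for column, value in row.items() if value is None]
        for column in missing:
            nulls_per_column[column] = nulls_per_column.get(column, 0) + 1
        if not missing:
            kept.append(row)
    return kept, CleanupReport(len(rows), len(kept), nulls_per_column)


if __name__ == "__main__":
    data = [
        {"country": "BE", "temp_c": 21.0},
        {"country": None, "temp_c": 18.5},
        {"country": "FR", "temp_c": None},
    ]
    cleaned, report = drop_null_rows(data)
    print(cleaned)
    print(report)
```

The point of returning a report next to the cleaned rows is the interactivity described above: a column with zero nulls simply does not show up in `nulls_per_column`, so the caller sees at a glance whether the step did anything.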

Speaker 4:

Yeah, the English SDK, exactly. So I think this is a good example. I mean, I actually heard about it, but you still need an API key, right? So it's still like a plugin in a way. I haven't used this myself, but you have, like, the PySpark AI, you activate it and whatnot, and then you have something similar. So this is just SQL, but then you have this df, so the data frame, df.ai.transform, and then you have basically a prompt.

Speaker 1:

So this is one example. Actually, this is the only one that I saw. I should have mentioned this one, because I saw the small side project.

Speaker 4:

But this is the only one I've seen as well. Yeah, and actually I've seen it, but I haven't. I saw this once and I haven't seen any updates or anything.

Speaker 2:

But this is also, arguably, even though we discussed a lot of challenges with metadata before, the easy case. Like, you have a data set and you want to query it in natural language.

Speaker 4:

Yeah.

Speaker 2:

So behind the scenes, you build up some SQL to query that, right? If you have metadata, you can assume maybe we're going to get relatively close. But it's much harder to say: I'm going to build this data pipeline, I want to stream that from there to there, I want to fetch these data sources. Yeah, I see what you're saying.

Speaker 2:

So you just make a prompt and then it would actually get the data sources, join the stuff, and already transform and push it into the other one. Yeah, like, for example, if I'm just thinking about how I would do that, if I'm building a Spark-based data pipeline in VS Code, what I would like to have is a lot of context on what sources I'm connecting to. Yeah, yeah. Where am I writing out to? What is my orchestration tool? How do I deploy them, orchestrate, what options do I have there? To have all that in your context, basically, so that you can prompt for that. Yeah, but I still think there isn't

Speaker 1:

I haven't seen anything. I mean, this is a small thing, but I think that the technology is maybe not ready to be embedded in the tools. But yeah, I would love to. Like, sometimes I'm like, oh, I need a country code, or I need, you know, a Celsius to Fahrenheit conversion. Yeah, and I build those things and I'm like, I should be able to just ask it. That's a really common thing.

Speaker 2:

I think a very fair point is what you mentioned: typically, what we miss in this whole chain is rich metadata. Yeah, on the data that you have. Yeah, yeah.

Speaker 1:

But again, there are small things, like Celsius to Fahrenheit. That should be a function, and, yeah, it's built in. And I think building, like, a repository of common data tasks: cleaning up, you know, parsing, or you have an address and fuzzy matching, all those kinds of things that you use. You should just have one prompt and it should be doing it for you. Regex, yes. That's, I think, the first thing I started to use ChatGPT for: regex.
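A "repository of common data tasks" could look roughly like the sketch below. Everything here is an assumption made for illustration: the registry `TASKS`, the helper `run_task`, and the four-entry `COUNTRY_NAMES` table are hypothetical, and the lookup is by an explicit task name, whereas the idea discussed in the episode would put an LLM in front to map a free-form prompt to the right task.

```python
import string

# Illustrative subset only; a real table would cover every ISO 3166-1 alpha-2 code.
COUNTRY_NAMES = {"BE": "Belgium", "FR": "France", "DE": "Germany", "ES": "Spain"}


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def country_name(code: str):
    """Return the country name for a two-letter code, or None if unknown."""
    return COUNTRY_NAMES.get(code.strip().upper())


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters from a string."""
    return text.translate(str.maketrans("", "", string.punctuation))


# Hypothetical registry: one place where common cleanup tasks live.
TASKS = {
    "celsius_to_fahrenheit": celsius_to_fahrenheit,
    "country_name": country_name,
    "strip_punctuation": strip_punctuation,
}


def run_task(name: str, value):
    """Look up a task by name and apply it to a single value."""
    if name not in TASKS:
        raise ValueError(f"unknown task: {name!r}")
    return TASKS[name](value)


if __name__ == "__main__":
    print(run_task("celsius_to_fahrenheit", 100))
    print(run_task("country_name", "be"))
    print(run_task("strip_punctuation", "Hello, world!"))
```

Keeping the tasks as small, deterministic functions means a prompt-driven layer would only have to pick a task, not generate the conversion logic each time.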

Speaker 4:

Yeah, man. How often have you visited regex101.com? Every time I use regex, like 100% of the time, I go there. But really, I avoid regex as much as possible. Even if I go to GPT: how do I do this? "Oh yeah, you can use this regular expression."

Speaker 2:

It's like, no, I'm going to look for some other way, like, I just can't. There was one point in my career where I had the feeling I could become good at regex, and then I thought maybe I went too far. Yeah, maybe I need a change of pace.

Speaker 4:

You're bragging to your friends and then everyone's looking at you weird, like, you use it that much? Yeah, no, I'm not a big fan of regex. But cool, let's see. Maybe there's a side project there to build some more stuff with LLMs. True, true, true. And I think with that we can wrap up. Actually, maybe just take a couple of steps back. Maybe we can end the podcast with a different outro. I know you talked about GenAI music and you released a very tear-dropping song.

Speaker 2:

Oh, not sure if I can do this on the spot. I generated a song on, yeah, suno.ai, and it is crazy good.

Speaker 4:

Yeah, yeah, it's crazy good. Do you have anything you can share already?

Speaker 2:

I think so. I hope that people will hear it. Maybe I'll also put suno.ai on the screen. Wake, wake.

Speaker 3:

Are you ready? Yes, you hear it right? Maybe yeah, but just real quick.

Speaker 4:

This is the website that you use, right? You just put a prompt and they'll create a song for you.

Speaker 2:

This is what I did, and with super minimal effort. If this is where we are today, it's hard to imagine a world where we can distinguish this from actual songs two years from now. Yeah, all right, play it. I'll play it.

Speaker 3:

I'll play it Bye. They move in harmony symphony. They pursue Great song.

Speaker 4:

That's great.

Speaker 1:

That was great, that was great.

Speaker 4:

Yeah, I think it's a hit. Billboard, top of the charts. I listened to the song on repeat. Now, for some reason, it spoke to me, I don't know. It's your song, and you saw that it talked about the two dogs that this guy, the hero of the song, has, which, as Mehdi knows... Mehdi, thanks a lot for joining us. It was a pleasure. Super cool to have you here. Yeah, I'll see you around.

Speaker 1:

Hope you had as much of a good time as we did. Of course, if there is ducks, I'm out.
