DataTopics Unplugged: All Things Data, AI & Tech
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
DataTopics Unplugged: All Things Data, AI & Tech
#63 What’s Next for Open Source? Astral’s business model, WordPress, Deno 2.0 & One Year of DataTopics!
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.
In this special one-year anniversary episode, we reminisce about our journey and dive into some intriguing tech stories:
- WordPress Governance Drama: We discuss recent issues with WordPress. Find out what’s behind the Automattic and WP Engine tension.
- Astral’s Business Model: Charlie Marsh shares insights into how Astral plans to balance open-source ideals with profitability.
- Deno 2.0 Release: Deno 2.0 claims to be a “Cargo for JavaScript.” Check out its new features and see how it compares to Node.js.
- OpenAI’s Soaring Valuation: OpenAI has hit a staggering $150 billion valuation after raising $6.5 billion in new funding.
- Adobe’s GenAI Policy: Adobe clarified their stance on GenAI, ensuring Firefly is only trained on stock images to support creators.
- Instructor Library for LLMs: Discover the Instructor library for turning unstructured data into structured outputs with ease.
- Repo2txt Tool: Convert your GitHub repo into a single text file using Repo2txt for easy analysis.
- Retro PC Fonts Galore: Explore a treasure trove of vintage fonts with the Ultimate Old-School PC Font Pack.
- Bop Spotter – Cultural Surveillance: Bop Spotter uses Shazam to capture the music trends and cultural vibes of San Francisco’s Mission District.
You have taste In a way that's meaningful to someone. Same jingle.
Speaker 2:Hello, I'm Bill Gates.
Speaker 1:Someone's doing too many items that I didn't have time for. I was back in Important stuff. You know A lot of government usually gets slightly wrong.
Speaker 2:I'm reminded it's a rust here. Rust.
Speaker 1:This almost makes me happy that I didn't become a supermodel.
Speaker 2:You think the Rust stuff is?
Speaker 1:too old now. I need something flasher.
Speaker 2:Well, I'm sorry guys, I don't know what's going on.
Speaker 1:Thank you for the opportunity to speak to you today.
Speaker 2:I like Rust.
Speaker 1:It's really an honor to be here.
Speaker 2:Rust Data topics. Welcome to the data topics. Welcome to the data topics podcast.
Speaker 1:Hello and welcome to Data Topics Unplugged, your casual corner of the web where I discuss what's new in data every week, from gaming to blogging, everything goes. Today is the Monday, october 7th of 2024. My name is Murillo. I'll be hosting you today together by trying to get this stuff together. Bart Hi, hello, bart, behind the scenes we have Alex. Hey, alex, yeah, okay, we need to get your mic, alex, so at least people can hear your cheerful voice. He's happy to be here, maybe. First things first, we have a slightly different decor.
Speaker 2:Decor. Yeah, for the people watching the video it looks slightly different than otherwise.
Speaker 1:Yeah, I would say so.
Speaker 2:A bit different. Maybe it is, it's our, it's a one-year. The Data Topics one-year anniversary. Data Topics Unplugged Data Topics Unplugged bit different. Maybe it is our, it's a one year. The data topics one year anniversary.
Speaker 1:Their topics unplugged. Unplugged yes different versions before indeed, we had different versions before. You want to go back? Take it back. Do we have the harp? Like you know, like once upon a time, you know, yeah, oh, wow, take it away, bart, how, how? What's the story of?
Speaker 2:the how did we get here? Uh, I should have prepared a bit because I don't know. The timeline is a bit vague for me. But we started um with uh it had a different name back then tour the tools, which um was more or less a monthly episode where we interviewed someone that was either involved in a startup or was building a it was a library maintainer or something like this.
Speaker 1:Uh, that we had a bit of a casual interview with a lot of times was a demo right, so it was a lot of like.
Speaker 2:The video component was also always there, right, exactly, exactly, yeah, um, and that was uh, more like started relatively frequently, like, say, monthly, and then we became a bit more ad hoc and we restarted again and then, uh, in october of last year, we uh together re-engineered it a bit to have the data topics unplugged. That is our casualitchat, which is more or less? Weekly on everything that is new in data and AI.
Speaker 1:Yes, and I think that between there was also a bit of data topics plugged.
Speaker 2:let's say yeah, it was also another attempt Well attempt, I think we had a few successful ones, with a lot of listeners that were very well prepared, topical discussions with experts indeed indeed, um, but took a lot, a lot of time to prepare and this is not hard to really have that momentum going indeed indeed this is not something we do full-time by any means, right, so it's more of a hobby project, right, yeah, yeah, it still is.
Speaker 1:it's fun, fun hobby project. But and I still feel like there was a lot of interesting discussions we had stuff we saw in the news and I think we also had that like, oh yeah, we can just discuss this with people, right? I think it's something that we would already do or we were already doing, right, discussing new things. So it made sense. So here we are, one year unplugged and, yeah, we moved it to weekly. Actually, I think we started biweekly or no.
Speaker 1:We switched very quickly to weekly, it's been a year and it's crazy, like the volume of episodes, because I think already, like the well, we can actually look up Datatop topicsio. This is the last episode, but if we see all the episodes here, the first one was we should have done this sooner. Yeah, wow. The first one on buzzsprout right, because to the tools used to be a youtube thing yeah, right, and there was a little break and then came back. This is march 25th of 2022 and the what was it's like? Ai powered AI powered Coca-Cola. Yeah, this was the first unplugged episode, which was October 2nd Exactly, or now October 7th Exactly, so we passed the one year anniversary, so hence the decoration, and actually, like One year older, one year wiser, exactly, time flies up when discussing the news of data, ai technology. Great company, right, bart, right, right, alex, yeah, yeah, yeah, okay, great, indeed. So quite a lot of stuff and actually in this one year, I think most of the episodes are already on the unplugged style, right, yeah, but we are already planning for this next year to have a bit more of a deep dive, so something a bit more like what data topics into the tools was Right, so stay tuned. We have some cool guests already, not as a replacement, but as an add on, exactly as an add on. So, if you like this as well, feel free to reach out to data topics at dataio. Yeah, if you know any cool guests as well, feel free to reach out, and that's it.
Speaker 1:So now to the business at hand, right? Um, wordpress, maybe I thought we could start here. I think, uh, what, what? What did you put on the show notes here, bart? We shouldn't, we should not talk about wordpress. So next topic uh, no, but why shouldn't we talk about WordPress?
Speaker 2:Because everybody's already talking about it. Okay, I don't think we can give more information on what is happening.
Speaker 1:But maybe for people that if it turns out, this is the first time someone is hearing about it. Can you give a TLDR?
Speaker 2:I'll try, but there's a lot of angles to this.
Speaker 1:Well, maybe even take a step back. I'm assuming most people know WordPress, or I don't know if it's our niche as well. Do you know what WordPress is? Okay?
Speaker 2:So WordPress is a content management system, I'd say, to easily build websites. Roughly 40% of the public websites runs on WordPress, which is huge. It's a lot. It's a lot.
Speaker 1:It's a lot More than it feels like more than you should, I like, feels like more than two.
Speaker 2:That's true, Okay, and I'll try to give a summary because I followed it here and there, so roughly two weeks ago. I want to say that Matt Mullenweg Mullenweg I don't know how to pronounce it he is the founder of WordPress. He's also one of the directors behind Automatic, which is the commercial company behind WordPress. Automatic has its own hosting service for WordPress, which is WordPresscom. Wordpress is open source in itself, or no? Wordpress is open source. Yeah, so WordPresscom is the website, the hosting services that are being offered by Automatic, a company that Matt is behind, that he founded. Then you also have WordPressorg, which is the quote-unquote non-profit organization to govern the whole open-source aspect of WordPress, and Matt basically put a statement on WordPressorg, which is important here. Matt basically put a statement on WordPressorg, which is important here, that basically said that WP WordPress Engine WP Engine, which is another hosting service of WordPress, which has become very big and rumors says that revenues surpass Automattic's own hosting revenues.
Speaker 2:Okay, so it's a very big player, and Matt went very public saying that they're a bad actor in the community because by default and that's an explanation by default they disable a number of features that make WordPress what it is. So, for example, they disable the ability to have revision history stuff like this. He also called out that employees of WP Engine are not allowed to contribute to open source or not allowed is maybe overstating it they don't get the time to contribute to open source. I see this triggered a lot of discussions in all directions and I think what is most in the public eye now is that the governance of the open source part of WordPress is not what it's. What is most the public eye now is, like the, the governance of the open source part of wordpress is not what it should be, because matt, as founder of automatic commercial company behind wordpress, should not use the platform wordpressorg to make these kind of statements so it's like just just because he has influence, he just decided to state his opinion.
Speaker 1:But in that sense he's just someone giving his opinion. He doesn't have any other authority. I guess he has a reputation.
Speaker 2:Well, that's the problem. He has a lot of things to say at WordPressorg as well, which makes that maybe there are not the right incentives at play here to correctly govern an open source community.
Speaker 1:Yeah, and I think also well, again, we don't want to beat the how does what's the same? Beat the horse? No, something to that. Beat a dead horse, yeah, anyways. Uh, it's that, um, because we talked a lot about like open source, sustainability and open source versus commercial projects and all these things, um, but also, yeah, like it's an open source project, right, so people could argue that it's fair game that they're doing whatever they're doing.
Speaker 2:Yeah, Right, so, and I think we should maybe leave it at that. There was actually a very interesting article which we will link in the show notes, by uh uh, to say uh, which is uh, the founder of Drupal, which is the, I think, the WordPress of the corporate world, which has a stance on, or an explanation on how they handled it the makers versus takers challenge.
Speaker 1:Yeah.
Speaker 2:Okay, interesting, which I think this is a bit about. And if you say it's fair game, then you're a taker.
Speaker 1:Yeah, indeed, indeed. I think there was also the elastic search thing that happened over the summer that we didn't cover, but we did notice as well. Right, that Open Source sorry, open Source Elasticsearch I think it was Open Source and it was like MIT. And then I think AWS was a taker in that sense, and then they switched the license so it was not Open Source anymore, and then AWS had diverged a bit from the Elasticsearch because they couldn't rely on it anymore. And then Elasticsearch, they were saying like no, we were always open source at heart, so they went back to open source, and now there's a big discrepancy, deviated enough from AWS engine to Elasticsearch that they said, yeah, it served its purpose. So I thought it was an interesting thing and even made me think because we talked a lot about the OpenTofu stuff made me wonder as well if this could also be a potential move from HashiCorp in the future. Let's see Interesting stuff. Interesting stuff, maybe, talking about open source and business models.
Speaker 1:One thing that I came across was also the Astral Astral. Do you remember the Astral part?
Speaker 2:Astral is about the company, but I know it's as the github organization behind uh, uv and rye. Yes, you're gonna rough right yes, and rough indeed indeed.
Speaker 1:So it started with rough. So this is just uh from simon willinson's uh blog and, um, basically u. Uv is making a lot of noise in the Python community. A lot of projects are moving to UV, including FastAPI and all these things.
Speaker 2:You're a believer, right A believer.
Speaker 1:I don't know, I feel like I'm not, don't crawl back. I think I use it Today. That's what I would tell everyone to just go because but not because it's my opinion necessarily just because I feel like everyone's doing it and I think we need to have consensus. And I'm just saying that compared to Rai, for example. But anyways, I digress. There was some hesitation. Right that UV is that. Uh, uv is part of astral.
Speaker 1:Astral was a company that was founded, so the creator, you're saying uv as a, as a package manager is a core component in the ecosystem yes, so yeah, maybe backtracking a bit uv is, uh, it was like a pip tools replacement, but now is a package manager, so is in the same equivalence of poetry of pm, of a hatch, all of Hatch, all these things, right, and it made a lot of noise. And also, the thing that stands out is that it's written in Rust, right, and even advertised itself as the cargo which is the Rust package manager for Python, and a lot of people are very enthusiastic about it. But some people were a bit hesitant and one of the reasons why people were hesitant is because Astro is a private, like, is a for-profit company, right, they made a lot of money, they had a lot of investors, right, and they were a bit hesitant on what's going to happen, right, am I betting my whole project, infrastructure and templates and all these things on? Maybe they'll be private later on. One of the comments from, I think, armin so that's the creator of Rai he said well, even if Astro does pull the rug, right, they just make it like private again. At least you can always fork the project right, because the project up to this state it is open source, so it's still a big step forward in the community, so people shouldn't be too stressed about it, right? But Charlie Marsh. So he's the creator of UV, he's the founder of Astral and all these things.
Speaker 1:He discussed a bit. What's the Astral's business proposition, let's say? And he said he doesn't want to charge people to use the tools. And this is I'm just quoting from from him on macedon. Well, according to the, the blog here, I don't want to charge people money to use our tools and I don't want to create incentive structure whereby our open source offerings are competing with any commercial offerings, which is what you see with a lost of hosted open source SaaS business models. What I want is a good software that vertically integrates with the open source tools, blah, blah, blah. So I'm not going to go over the rest.
Speaker 1:So, basically, he said he doesn't want to have his company to be the managed version of this, but then, because it's open source, you create the friction because, even on the WordPress example, there are two companies that host WordPress websites. What they would like to do is to have it's open source. You create the friction because, like even on the WordPress example, right, there are two companies that host WordPress websites. What they would like to do is to have this as open source but then create other tools that those are private, that integrate with these things. And which are these tools? I don't know.
Speaker 2:He doesn't mention it, at least not that I could tell.
Speaker 1:He just hopes to have an ID in the future. Well, I hope he. That's what he mentioned, right?
Speaker 2:so an example he gives here is to have an enterprise focused private package registry which kind of exists already.
Speaker 1:no, like an Anaconda alternative, yeah, but I feel like there's also. There's a Nexus, I think is one, I think there are a few already, but yeah. So basically he wants everyone to be using UV and then he wants to create something private that integrates super well with UV and that would be for sale. Basically, okay, let's see.
Speaker 2:Yeah, I'm not sure how optimistic I feel about it would have been more interesting if there would be a clear plan like we're building this, yeah, like by March this is the roadmap.
Speaker 1:Yeah, by by march, you're gonna see something like this x and y. But also, if it was a bit like, why would you like what? Yeah, okay, you create a uv, but any company can create something that integrates uv if it's uv's open source, right, it's not like I don't know for me, like it doesn't I mean, but as an enterprise customer, if you are going to depend on a private repository, for example, if you're willing to pay for this private repository, you're probably going to trust the company that has the biggest package manager Like out there.
Speaker 2:Biggest community following and I think there is something to.
Speaker 1:And then we go back to another episode that you had. I guess in this case open source is just marketing.
Speaker 2:I think how they're doing it now.
Speaker 1:yes, they're building a huge community Because they're saying like the building community has their eyes on it.
Speaker 2:now. What is that still going to do? Exactly, and people trust it. It's like a good product. Without this, no one will be looking at the restaurant Exactly. So in this case, so open sources.
Speaker 1:Yeah, marketing. Yeah, actually, and maybe this is for people that are Good to see you People that are wondering what's the inside joke here. Is that November 6th 2023.
Speaker 2:Oh wow, it's almost a year ago.
Speaker 1:Yeah, we released an episode that we discussed open source and the title of that episode was OSS Open Source Software Equals Marketing. Bit clickbaity, but it's a bit clickbaity. Yeah, a bit clickbaity, but it did the trick. It did the trick. You know. So cool Another thing rolling off of Astral and UV as well. One thing that made some noise was Deno 2.0.
Speaker 2:Yeah, I didn't see the noise actually. I saw that you added it to the.
Speaker 1:I saw that on different podcasts or even YouTube channels and some stuff. So Deno is on JavaScript land and TypeScript land, right? So, barton, you know this better than I do, so feel free to jump in and correct me if I say something wrong.
Speaker 2:It was released 19th of September, Dino 2.
Speaker 1:19th of.
Speaker 2:September and I think it's not stable.
Speaker 1:It's release candidate it was at least, so it's not like the stable.
Speaker 2:It's in RC.
Speaker 1:yeah, so in JavaScript land there's a lot of different run times.
Speaker 2:If we understand, one of them is node like the standard, I guess yeah, if you think about javascript on the back end, so not on your browser.
Speaker 1:Yes, think about the awesome in the back end, like the biggest player probably is node nodejs um and maybe just for people that like, yeah, when we say the on your browser, is really because some javascript runs on your chrome application. Pretty much right, like on the actual browser on your laptop. A lot of javascript, yeah. And then when you say the server side, we're saying there's a computer somewhere else that processes some data and then sends the process data to your browser. Like you're not going to do a lot of heavy computations on your machine, on your browser, it's not made for that. So when we say server, it's just like a computer somewhere else, right.
Speaker 2:Yeah, exactly Like. Basically, if you would write a script in Python, you can also write it in JavaScript and typically you do that with Nodejs which it has been around the longest, I'd say which probably for a lot of people just getting into JavaScript on the backend will default to Nodejs. You have two other big players. One is Deno, which came around, I want to say 2020.
Speaker 1:And it's the same. Well, we'll talk more about Deno, but it's the same creator as Node.
Speaker 2:It's the same creator as Node. And then you have Bun yeah, bun is, and so Nodejs and Deno yeah bun is, uh yeah, and so no js and uh dino. They are both built on the v8 javascript engine by google and bun has uh has their own implementation of the javascript engine, so really focused on uh being very, very fast yeah, I think it is today less mature than dino is yeah, I did.
Speaker 1:Actually there was a project that well, I was just following the I think it was for the Python user group website that I was using Reflex, which is basically you write Python, it transpiles it to Nextjs, and on their documentation and all this stuff they were also using Bun. So I did try a bit but it was even like yeah, so it was faster and I think that was a bit the promise. And I think Deno is also a bit the promise that it's faster. But from what I gathered, bun was compatible with Node, all the Node modules and all these things, so it was a bit ahead. But Deno 2.0 also brings parity there, so like yeah, and so D.
Speaker 2:So dino is from the beginning a bit focused on uh, on security. Yeah, a bit more opinionated on how to do uh, package management, dependency resolution stuff, like they have their own system. That's why they were not fully uh, fully compatible um and uh something else. I lost my train of thought.
Speaker 1:But that's fine. But so Deno 2, and again, yeah, Deno is the same creator as Node. Yeah, and if you really look closely, they just rearranged the letters a bit right Deno, Node and it's, and now Node 2.0, it is compatible with the Node packages. So that's a big step. And yeah, yeah, it comes with a lot of features and all these things, right. So basically, any Node project you have today, you can just kind of have Deno, and they also have the standard library that is already built in, yeah.
Speaker 2:And what is nice and it's actually what I wanted to say is that a lot of people don't write JavaScript, but they write in TypeScript yes, somewhat statically typed. I'd say yeah, somewhat. And normally when you do this in nodejs, you have like a. You have a transpilation step which translates your typescript to javascript, and with dino it's built in, so you don't need to do the transpiling.
Speaker 1:Indeed, indeed, it's a good, that's a good comment and um, yeah, so now, basically, dino seems to be all the hype, or at least seems like yeah, it is, it is faster as well than node. Uh, there was a apparently a talk from the node creator that, like mistakes or things he regrets about creation of node that he, I think he learned from and started from scratch on dino.
Speaker 2:Um, so I think the nice thing with um, with dino um as compared to bun but this is very subjective is that I have the feeling that there is a lot of support for uh dino already out there for in hosted services like, I think, fly to the uh dino support when you slack has defaulted to dino for it's for when, but when?
Speaker 1:yeah, yeah so it seems like it's being picked up as a standard, indeed, indeed, well, I think also the name, right, the creator of Node. I think it brings a lot of weight. Deno is also part of a company, a for-profit company, and Deno is also written in Rust. So I think a lot of the stuff that I heard. People are excited about this because they also speed up the V8 engine, right, so there are also things that could be reused, right, but it's retaining rust, so it also should be faster. And I mean the other thing that I also saw that I thought was interesting is that you can create binary applications with Deno, apparently Okay. So is that you can create binary applications with Deno, apparently Okay. So you can have like for different architectures and different platforms as well, so you can have a JavaScript code and then it just kind of outputs a binary, which is something that Python does not have. I think. Well, maybe there are some packages and stuff.
Speaker 2:There are wrappers, yeah.
Speaker 1:Yeah, exactly, but it's not something like really built in which I thought it was cool. Again, maybe people need to fact check me on this, but I remember seeing this and looking at Deno and listening to it it made me wonder is Deno 2 kind of like QV Part of a for-profit company written in Rust? Package manager kind of resolves fast and all these things?
Speaker 2:Is it just in terms of functionality or in terms of the premise? That is also part of, I think here it's easier to. They want to commercialize because Deno is a runtime right. True, and it's easier to. I think they were actually working on it. Can you go to products? They have a platform where you can very easily deploy these apps built in Deno.
Speaker 2:Deno for enterprise maybe, which is a very logical step to commercialize this right, like, if you're a company that goes heavy on Deno, that you subscribe to denocom to deploy your apps. True, if you don't have a lot of complexity in terms of infrastructure. That's true. That's true, makes a lot of sense. I was also wondering because, like Dino, it's more, let's say, realistic short term, that you can believe that they will be break even and it will actually generate money and that they will be able to sustain the development of Dino.
Speaker 1:But I think it's also the way that it was, like the way that Astra was created. It was also a bit the impression that you get is like oh, the impression that you get is like oh, I created this package, that is popular, hugely popular, hugely popular.
Speaker 2:And then his phone rang and then he's like oh yeah, you want money and I was like oh yeah, why not?
Speaker 1:So that's why I feel like that's a bit of a difference.
Speaker 1:I mean, it's an impression. We're not in these discussions, of course, but that's a bit the feeling that we get. And also, I was wondering I think UV got very popular because it was like cargo for Python, but if someone said, like Dino is cargo for JavaScript, I wonder if there'll be more commotion, like more, Actually, because I'm not sure. I also have an impression a lot of people that go to Rust come from Python as well. Actually, I did think I saw a survey, well, some time ago that most of the people that got into Rust came from the Python developers. So I'm not sure if the correlation between JavaScript and Rust, or JavaScript, iScript and Rust is less present than Python to Rust.
Speaker 1:Getting attacked by balloons yeah right, Falling balloons, but yeah. So I haven't tried Deno yet, but I do think I'm going to switch as soon as there is a stable thing there. So I'm curious about it. I don't do a lot of JavaScript either, but you know, got a key step today with the coolest and the latest but yeah, Maybe timely news as well Open AI valuation. Won't talk too much about it, I guess.
Speaker 2:Yeah, it has been in the news so many times by all channels, but they raised capital. I think they raised $6 billion, if I'm not mistaken. $6.5 billion in funding. Funding, which already on its own is an amount that is hard to even, uh, imagine. Yeah, right, and that puts their evaluation at 150 billion dollars it's a lot of money which is crazy yeah, right what is the gdp of belgium? Like maybe five times that or something. Let's see, let's see gdp of belgium oh, five times that roughly indeed yeah, it's big numbers, huh.
Speaker 2:But that's absurd. Yeah, five open eyes, and then we can close down Belgium.
Speaker 1:Belgium starts to sweat.
Speaker 2:But that's absurd, huh? Yeah, indeed, Apparently it's linked to like they're raising the capitals linked to go more towards a four-profit company like Anthropic. There's bound to be some changes there.
Speaker 1:And do you think they will keep up with? Well, they are ahead today. Right Of the other players, I would say they're still ahead. Yeah, do you think they will keep this for a while? Because, yeah, I heard some arguments that if it's just a matter of compute or, in time, people, people are going to get there difficult to say, like they have a lot of talents that actually went to entropic.
Speaker 2:Um, but the amount of capital if you say, is it just compute, you need this capital, right, there are not a lot of others that have this capital, which is maybe also not even true, because, xai, how much did they get funding? I think it's also around this amount. Series B funding it's also 6, six billion. A lot of money, xei, by uh, by uh, tesla, guy right, elon, elon, but yeah, so yeah, but a lot of coverage on this, on all challenge, so let's, uh, maybe move along.
Speaker 1:A lot of coverage on this on all challenges, so let's maybe move along. Maybe, I don't know, adobe releases a stance on Gen AI.
Speaker 2:Yeah, I think that's an interesting one. So we had Adobe roughly four or five months update their terms and conditions which gave a lot of backlash from the community, which, more or less and I'm very much summarizing what I have in mind. I'm not exactly sure how much of that is correct, but in their terms and conditions that were updated, it basically said that Adobe Firefly, which is their Gen AI model, that it was allowed to train on everybody's data.
Speaker 1:Like, regardless if you use GNI or not.
Speaker 2:Regardless if you use NAI or not. Yeah, and there was no, I think it was. You were opted in by default, like there was a lot of discussion on this, and especially because it was a bit vague the definition. So they released this statement last week somewhere where they explained their position on Gen AI that Gen AI is a tool to support the creative process, that is not a creative process on its own, and that they also adjusted their terms and conditions to make it very clear that Adobe Firefly is never trained on any user or customer creations, so it's trained on any user or customer creations, so it's trained on stock images. Basically, I haven't checked the updated terms and conditions, but I think this is a message that you would like to see from Adobe.
Speaker 1:But maybe I don't follow 100%. They said that people are opted in that your data is going to be used to train a model. That's how it looked like in the old terms and conditions, but they're saying like, oh, there was some miscommunication. That's not actually what we do Exactly.
Speaker 2:That's what they're saying now. Yeah, well, it looks like. Let's try it, See what it gives. See if we can get away with it, okay, if try it, see what it gives yeah, see if we can get away with it. Okay if we can't get away with it. Let's, let's take a step back and uh, just in terms of conditions.
Speaker 1:Probably someone just put something. Yeah, just include it there.
Speaker 2:Opted in no one's gonna check exactly. And then?
Speaker 1:it was like well, what the you know? And then it's like whoa, whoa, whoa.
Speaker 2:We misunderstood, that's not what we meant, you know we're all for the community exactly, we love you guys and it's fine but I mean, this is what the community think wants to see, right? Yeah, I think for sure. The stance of Adobe and you use Adobe Especially because Adobe is as like when it comes to creative tools I think they have I don't know there's numerous, but just a good feeling is like 80% of the market. It does feel like they are.
Speaker 2:It's crazy if you work with it like if you're not a hobbyist, like if you work with these things, you kind of need to go for I think so, yeah, I think, unless you're in a very specific niche.
Speaker 2:Yeah, um, but it's also the like, the, the connections between the different tools that they have yeah, yeah, you make a you make a design in a vector-based thing and then you can move it to somewhere else to finish it, and that's a bit uh, and it's very hard to find this in any other it's polished kind of like what apple products are as well it's, uh, it's. I think there are niche products that are more polished than any of adobe's. Okay, but because you have this whole ecosystem, you go for the ecosystem, yeah indeed, I don't want to resist too much.
Speaker 1:And you use Adobe? No, I use.
Speaker 2:Adobe. Yeah, I tried the other things as well, but in the end you go back to Adobe.
Speaker 1:All roads lead to Adobe.
Speaker 2:Yeah, that is a bit the feeling I have. I don't think it's good for the community, but that's where we're at.
Speaker 1:Yeah, no, I know what you're saying. I know what you're saying. Cool, maybe more on AI, maybe some stuff from the tech corner, because a wise man once said that a library keeps the mind at peak. We should have a bell like a ping More on Gen AI. Then this is a package that I came across, so actually, well, maybe a prelude. I don't know if that's the right word. So when I'm talking Gen AI now, I'm talking about text, natural language processing, and one thing that well, it's not well, is it Gen AI In the sense of generative? It's more like LLMs, I guess. So Gen AI is generative AI, so it's to create new things with AI.
Speaker 1:A lot of the times, this uses LLMs. So large language models, right. But large language models, they're just language models that are large, so you don't need to use it for creating text, right. And what I've noticed is that in NLP natural language processing there are a lot of different tasks. So, for example, one is sentiment analysis.
Speaker 1:You have a piece of text and you see how angry the person are, how happy the person is, right. One thing is like name identity recognition. So you have a piece of text and then from that text, for example, what are the company names? What are the person names, right? So a tricky example is like if you say Apple Apple in certain context, maybe it's a company name or maybe just a fruit. So there are a few examples in which it's not as obvious, and LLM is actually pretty good at these things as well.
Speaker 1:So one common task is to go from unstructured and unstructured just mean text to something that is structured, right. So if you have, if you tell me a story about yourself, that's just text. But what I want to get out of it is what's the name of the main character? How old is the main character? Uh, what's the name? Well, did I say the name? Well, name, age, uh, gender, whatever. So, basically, have these fields and you already know that if the LLM says his age is Bob, you know it's wrong. Right, because you expect a natural number. Or sometimes, if you have a user input, maybe you know the user needs to be legal age, above legal age, so you can add some validation. It needs to be above 18. And if it's less than 18, you know something's wrong.
Speaker 2:So let's say you're reading a biography, like I've written down my biography into two pages of text. Yes, you want to move this your life. You want to move this to like a structured like what's this age? Where's he from? Exactly what is he doing? Yes, that's very good.
Speaker 1:So, um, let's imagine you have this, all this text all over the place, and you want to have something more structured. This is actually actually a very well going from this unstructured to structured. It's not very like. A lot of people figure this out already, and there are many different Python libraries that you use to do this. So the first one that I came across is called Magentic.
Speaker 1:Ah, and the other thing that people a lot of times do is that they use the PyDentic models. So PyDentic models is basically like a Python class. Actually, it's very popular because it's what's used in FastAPI, right, and basically whenever you put the data in there, you can actually already run some validation. So whenever you create something from text or from a json, uh, it says it tries to parse. So if you have like 1.02 as a string, but you know you expect a floating point, it already converted for you, okay, and if you cannot, then you just raise an error, a validation error, and it's a bit the standard I feel in in Python. Like, if you want to have these validation things To use PyDantic, you mean PyDantic, yeah, yeah, yeah. So one of the things that they do is like yeah, like Magentic is one of these packages, right, that does this, right, so, et cetera, et cetera, so basically.
Speaker 2:So this does the. I have my biography. You want to move this to a pidentic class? Yes, an instance of a pidentic class which has what is the age Exactly. What is the nationality? Yes. What's the career path?
Speaker 1:And it can be also more complex, right, if you have kids, you can have like a sub item, let's say, and then for each so how do you go from a text to this pydantic class where you say what is the h?
Speaker 2:that needs to be an integer? Like what is this magentic? It's called yes. Like what does it do in?
Speaker 1:between. So this, well, what I'm assuming that magentic does and I don't know all the internals it will basically send a prompt to say to well, probably has a templated prompt right Saying this is the question that I want, this is the object in response, and it probably tells ChagPT saying whatever, just answer the class instance right With the values.
Speaker 2:So you have a prompt where you say I want to extract this and this and this, and the Magentic basically adjusted that prompt to say give answer to this, but do it in this shape in this format, indeed, indeed.
Speaker 1:Um, so again in the python api and that's what I didn't like as much because it feels too magic, right? Um, basically, in magentic you have a decorator on a python function, um, and then this is additional prompt that you can add, right. So simple example here and for people following the video, you can see the people in the audio you have a class, superhero with name, age, power and enemies. Which enemies is a? Well, name is a string, age is an integer, power is a string and enemies is a list of strings. And then you just create a class, but a class doesn't have, no, sorry, a function and the function doesn't have anything. You just have a little prompt create a superhero named name. You pass the name in the function here and then it returns a superhero and then whenever you call create superhero, it will return that for you like magic. Um, yeah, I don't like this so much because it feels like there's a lot of stuff. You have to kind of keep it in your head. I feel Like, what is it doing, right?
Speaker 1:Another one that I came across, and that's the one that I actually wanted to share today, is called Instructor. It's similar, but the way they do. It is like you have usually. You have a let's see if I can. Cookbook is like you have usually. Have a see if I can cookbook. You have uh, whenever using python, uh, open a open ai, for example, you should have a client right and in instructor you basically wrap the client with the instructor method. Let's say Right and then once you do that, you can basically say for this prompt, I want the response model to be document extraction Right.
Speaker 2:So it's a bit like yeah, it does a bit less things for you, but it just basically makes the opening eye client a bit richer, in a sense that you can say very much specify like how should the output look like? Exactly?
Speaker 1:So, and then, like again, because you're using the OpenAI client, you can actually have a little conversation, right, because basically, in ChargePT, if you ask two questions in a row, the third question is actually gonna, you're gonna pass everything above in the conversation, right? And that's something you can also mimic with the API, with the Python SDK, right? So you can also do that. You can also have like a like you're a helpful bot, and this and this and this, and you can also pass all that in and then you can just say the output of this prompt is going to be in the shape of document extraction Interesting, right. Yeah, document extraction Interesting, right. Yeah, you have a question? No, I'm good. Yeah, there's more stuff. I guess graph visualization Not too, I didn't, but it does.
Speaker 1:For example, maybe the model does make a mistake, right, and maybe in the validation that you have, you already know the model made a mistake. So in that case. So in that case, what you can do, you can already give back the model with the error that he found. So, for example, if I say, oh, what's bart's age is bob? Then he will say like no, this is wrong, and it will return this error to chat gpt and we'll give it another try to get the right answer and then usually have like a max retries is three and if it's more than three it would just say error, right. And then the nice thing as well about this is well, this is I guess a broadly I'll call this a guardrail right. So you basically have something that the out, the chat gpt like model gives. But after you pass into this validation logic, you know that it's not going to be complete gibberish. Right, it will conform to some logic that you're specifying, right, for example, and yeah, and because you say it's a guardrail.
Speaker 2:Isn't it also a bit of a risk for hallucination? Let's take the example of the biography, and you're going to have a class that says what's his name, what's his nationality, what's his age, and you're going to enforce that structure. But let's say the h is not in there, isn't it just gonna come up with an h?
Speaker 1:well, in pidentic models you can say give an h or none.
Speaker 2:So basically, let's say you need to give an h, but document your processing. You didn't know there was no h yeah, like, what will?
Speaker 1:what will happen in that case? I think you just get an error in the end. You hope well, because you will retry a few times. Well, yeah, that's true, it could hallucinate. You could just come up with a name. Right, that is true, that is true. But I think in a way is like if you.
Speaker 2:It really depends, I think, to how the interaction with openai goes. If it's like this is the prompt, this is the output, and then you have a correction on that, but you still have the full history. Like, let's say, the H is not in there. You get asked and the first reaction you get is there is no H, you're going to retry? Yeah, please open it, give an.
Speaker 1:H, he's like yeah, okay.
Speaker 2:Oh yeah, you're right, there is no H Third time. Just come, Just come up with an H, yeah yeah, yeah, it really depends on how these retries are done right. I would like to see the internals of that.
Speaker 1:That's true, Maybe you can also try retrying. No, we don't need to go into the details now.
Speaker 2:Yeah, but it's a good point If it's a risk to trigger hallucination or not.
Speaker 1:Yeah, that's true. That's true, Maybe true, maybe another thing yeah, like on pidentic, you can also add description of the classes. So even if the class, like the parameter name, is very, for example, name, you just have a name you can also add a description saying the name is the name of the city where bart grew up when he was between 8 and 10 years old. Right, and it would take that in consideration when creating the, the response, as well. So it's also rich. Um, it's a good. It's a good, it's a good question. Huh, stop after time. Yeah, so there's, there's some some other things here, but, uh, to be tried and to, to see how how much you induce hallucination by doing these things, right, uh, pom-pom, what else we have? What is this?
Speaker 2:repo to text part uh, just something that I saw passing by somewhere, I think, on lobsters um, which I think is interesting if you want to play around with this stuff. It is a very small script and with a small GitHub pages, I think, hosted API, hosted UI, where you can say you're going to input a repository, a GitHub repository, and you're going to get all the text in a single text file, okay, meaning that you can very easily drop it in chat, gpt or in tropics cloth or that's why you used it.
Speaker 2:Well, I didn't use it. I just started passing by uh, and I think it's uh an easy way. Like this is just more for experimentation, a bit hacking to see, like what can I do with if I have the full code in the context, like, uh, what does it do versus a single file? Like how much to play around with it? This is easy. I would never use this in a in a more If I have the full code in the context, like, what does it do versus a single file? Like how much to play around with it? This is easy. I would never use this in a more automated or production setting, but I, just as an experimentation tool, like, just give me my full code base in one TXT file. I thought it was an interesting one.
Speaker 1:Indeed. Actually, this is the use case for them as well, right.
Speaker 2:Yeah, I think this is the use case for them as well.
Speaker 1:Right, yeah, I think this is the use case, Because I was thinking like why else would you do this?
Speaker 2:This is why this is the only way. Yeah, yeah, yeah.
Speaker 1:Yeah, cool, Cool. I'm also wondering if this is a bit how, like if you ask Copilot or something like this code assistance, they give a bit the context of the project. If it's a bit the way that they, they would tackle it good question.
Speaker 2:Good question, but you haven't used this. I haven't used it. Do you have a use case already for it? Uh, well, I sometimes uh drop in uh, let's say uh function definitions into just gpt and ask it something about it. Would be interesting to see, like, if I just drop in the full code base and then ask something about it, like true, what would be the difference in the quality of the output?
Speaker 1:well, I'm wondering if, like sometimes, there's a function that depends on another script in the same project. Right, a bit more understanding of what does the function actually do in the context, yeah, because if you send all of this, you're sure that everything that is not there is going to be a third-party library that maybe chad gpt already already has the context right Cool, cool, cool, cool, cool cool. So maybe S Well a few more topics. I think we have a bit more time still.
Speaker 2:The ultimate old-school PC font pack Old-school.
Speaker 1:Yeah, yeah, I know you're a big font guy.
Speaker 2:In fact, we have an episode that typography will be the death of you, bart. Yeah, someday it will. Someday, someday. Yeah, this um int10horg old school pc fonts will link it. Um very cool website. Uh, I think I saw it passing by on hacker news. Um, really is showing it. You can actually like on the top right, like uh, change some fonts on the website and it's just, uh, it's just beautiful huh this is actually quite you have like, uh, a lot of fonts from different PC vendors, hardware etc.
Speaker 2:There's like the video hardware. There's a Trident font. I used to have a Trident video card back in the day, really Way back in the day. My first computer that I built had a Trident card.
Speaker 1:Oh really, yeah, how old were you when you built a computer?
Speaker 2:That's a good question. I want to say like two, when I build it no, no, no, no bit of a genius it's fine, no, no no, but I yeah, I don't know. To be honest, I think anywhere between 10 and 12, something like that, yeah cool maybe.
Speaker 1:Um so, when you were like, uh, when you were a kid, this is how websites look like. This is how programs look like.
Speaker 2:Yeah, we had MS-DOS. Actually, the first UI like the UI-based operating system I used was not Windows, it was OS2 Warp.
Speaker 1:How was that it?
Speaker 2:was very warpypy thanks, that answers more questions I think it was ahead of its time yeah, yeah, but it died no I think it lived on a long time in uh in like uh, more uh like uh like corporate systems like main mainframes stuff like that Okay.
Speaker 1:Interesting.
Speaker 2:But very cool If you have any use case for old school retro looking fonts, which I will now have to come up with again.
Speaker 1:Check out this website, right? It's a bit the opposite for you, right, Bart? You find this, and then you kind of come up with a reason why you need to use it. I think this is a very good reason.
Speaker 2:Like the fact that this exists, that this exists. You need to do something with it, right yeah? Like this big repository, like it's not just one retrofonte, Like it's I want to say hundreds yeah there's a lot of.
Speaker 1:look at this Whoa font index. Whoa Cool, huh Much wow.
Speaker 2:There's actually even the. I think everybody's had this somewhere in their educational career. Uh, texas instruments calculator. Yes, the texas instrument font is on there that is cool.
Speaker 1:You know, actually I thought like, so I don't think I'm qualified, but I thought it would be really nice to hear a talk about fonts, just fonts, that's true. You know like what they are, and because even the whole the llama ttf you know that could be like the very end you know like how it starts, and then you have this and then you need wasm, because you want to have the arrow when the thing, and then you can extrapolate this and have wasm, what is wasman? Then you have the number.
Speaker 2:There's an engine to render the phone exactly. Yeah, it would be very cool.
Speaker 1:Yeah, like going from the very beginning, like what was fonts all about? You can even go from that story from the steve jobs, you know how like he was saying, like yeah, he was you know the story from the studio with the fonts and stuff, because he was like calligraphy, I guess.
Speaker 1:He said that he had quit the degree and then he's like, well, I don't have to do the degree, he was just on campus. He said, well, I guess I can just um go to the courses that I'm interested in. So he saw this calligraphy course, yeah, and he just went there and then he learned about the different fonts and stuff and he really liked it. So when he was programming the os, I guess he wanted to include this different font. And then, according to him on the speech, he says, yeah, and since microsoft copies everything from apple, they also include the fonts on the word document thing. So now nowadays we only have all these fonts.
Speaker 1:Because I went to that class, so and that's how it starts, right, I actually patched. I did a little experiment. I wanted to patch the data roots icon as a character on my terminal because, like you know, with the power 10k you have usually the max symbol in the beginning. Yeah, actually show. And I wanted to change that to the data roots icon to see if I could do it. And I just followed some tutorials and actually I was pretty simple with, like svg, you create this and you create like a code for that and then you just put there.
Speaker 2:It's pretty cool but yeah, but like you need to create a new unicode code for that, yeah, and you have to include that on your power 10k profile yeah okay, yeah, it's cool, it's cool.
Speaker 1:But uh, from that to llama ttf, I think it's still a big step. So, yeah, cool if you ever uh use floppies floppy disks yeah, yeah, yeah, did you ever use it, alex?
Speaker 2:she just knows them off. The icon, the same just knows the icon, the save icon.
Speaker 1:Yeah, she's like wow, the save icon, you printed it Cool 3D print of the save icon. Yeah, yeah, it's like wow, why are you walking with those Cool? Maybe one last topic for today, something that intrigued me when I read this on the notes Bob Spotter oh, oh, that's cool yeah this is cultural surveillance. No one notices, no one consents. But it's not about catching criminals, it's about catching vibes. Oh, not what I expected. Cool huh yeah cool.
Speaker 2:What is this? Um cool, huh, yeah, cool. What is this? Um, the wait, wait, let me just stick finding my website. So the a guy, riley waltz, installed somewhere high up on a pole in the mission of san francisco. I'm not sure what the mission is. Is it the district or uh, I'm not exactly. Uh, he installed a android phone with which is constantly just shazamming, recognizing the music, basically, okay, um, and it's uh. When you go to the website waltzercom slash bob spotter, uh, you see everything more or less real time from uh music that picks up and it gives a bit of a like uh, what are the music slash cultural vibes of? Uh, the mission in san francisco?
Speaker 1:which is really cool, right? So you just go to a place and there's just a tower or something that is just picking up the music that is playing he just literally installed, just uh, an android phone somewhere high up that picks up the music cool, but then like people just listening or like it's like a square that has a lot of speakers. I don't know. I don't know the location.
Speaker 2:That's what I'm saying. I'm not sure like the mission of san francisco there. I actually quickly googled it, like there is a mission district but there is also, like, uh, a church. So I'm not. I think it's a district. They probably mean somewhere high in the mission district. Wow, but it's cool, we could do that here in leuven at the other market, that's true, you could do it.
Speaker 1:This is also the not sure.
Speaker 2:If you want to know, though, the agrit but the website is also very cool website is also a very retro looking.
Speaker 1:You probably went to the old fonza website for sure, for sure it's the other hype and this battery concept, right 87, that's the the battery of the phone, I guess. Apparently, yeah, yeah.
Speaker 2:So he just like goes there and charges, plugs that battery well, it's night that you see there right like it's a lost, uh music lost number is funky town by lips at 2 51 am yeah so it's probably draining the battery and I would assume it's linked to a, to a small uh solar panel or something oh true to uh to charge it. That would be cool. That's what. That's what I do with my bird cam yeah okay, that works like a small in belgium.
Speaker 1:It works. You can do it's a cloudy all the time no, I like I have a.
Speaker 2:I have a small uh wi-fi camera that is powered by, but it has an internal battery, but it's powered by, like it's charged with a solar panel, cool.
Speaker 1:Yeah, it's installed in my birdhouse do you have to often uh recharge it like manually or? No, it's just a solar panel that does it the solar panel does it all you can.
Speaker 2:Oh, that's super cool I assume that he has something like that, otherwise he wouldn't show the percentage here. It was just plugged into the.
Speaker 1:Well, maybe there's a right ah, look, and then you have some stats here. Total shazams ever detected 1400. I'll make it bigger. Alex 1436. That's an average of 165.7 songs per day, barely curling at 87%, a decrease of 7% in the last four hours. Last ping was three minutes ago. Less than 10 minutes ago is good. Download a CSV of all Shazam results ever. Shout out to Nikhil Rafi and Shihao for advice how building these meetings Cool huh, really cool.
Speaker 2:I think what now? Someone should stab up and to actually take this link to the CSV and make a Spotify playlist jam out of it. Oh yeah, To like live. Keep playing this. Are you volunteering?
Speaker 1:or back to her If I have time it would be cool, right, no, it would be cool, right, no it would be really cool.
Speaker 2:It would be really cool. Just follow this location from anywhere. Yeah right, like what are people listening there? That's cool.
Speaker 1:That is cool, really cool. I think that's all we have time for today, unless there's something you really want to talk about now.
Speaker 2:Bart, no, I think I'm good, I think I'm good. No hot takes today Maybe I think what was interesting is the contextual retrieval of Anthropic. We'll touch upon it next week.
Speaker 1:Yeah, when we have more time we'll read it over. All these things, the world of data and AI is never a single, no, never a boring day. Is that the same? Never a boring day, never a boring day in the world of data, ai and tech and data topics in one year. Thanks y'all, you have taste in a way that's meaningful to software people hello, I'm bill gates I would. I would recommend, uh, typescript.
Speaker 2:Yeah, it writes a lot of code for me and usually it's slightly wrong. I'm reminded incident's a bust here Rust, Rust this almost makes me happy that I didn't become a supermodel. Cooper and Nettix Well, I'm sorry guys.
Speaker 1:I don't know what's going on. Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here Rust, rust Data topics Welcome to the data. Welcome to the data topics podcast.