DataTopics Unplugged: All Things Data, AI & Tech

#65 The Art of Data Storytelling: A Deep Dive with Angelica Lo Duca

DataTopics

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.

In this episode, we dive into the world of data storytelling with special guest Angelica Lo Duca, a professor, researcher, and author. Pull up a chair as we explore her journey from programming to teaching, and dive into the principles of turning raw data into compelling stories.

Key topics include:

Angelica’s background: From researcher to professor and published author

Why write a book?: The motivation, process, and why she chooses books over blogs

About the book: Data Storytelling with Altair and Generative AI

Overview of the book: Who it’s for and the key insights it offers

What is data storytelling and how it differs from traditional dashboards and reports

Why Altair? Exploring Altair and Vega-Lite for effective visualizations

Generative AI’s role: How tools like ChatGPT and DALL-E fit into the data storytelling process, and potential risks like bias in AI-generated images

DIKW Pyramid: Moving from raw data to actionable wisdom using the Data-Information-Knowledge-Wisdom framework

Where to buy her books:
https://www.amazon.com/stores/Angelica-Lo-Duca/author/B0B5BHD5VF
https://www.amazon.com/Become-Great-Data-Storyteller-Change/dp/1394283318
https://www.amazon.com/Data-Storytelling-Altair-Angelica-Duca/dp/1633437922/

Snippet: https://livebook.manning.com/book/data-storytelling-with-altair-and-ai/chapter-10/16

Connect with Angelica on Medium for more articles and insights: https://medium.com/@alod83/about

Speaker 1:

You have taste in a way that's meaningful to software people.

Speaker 2:

Hello, I'm Bill Gates.

Speaker 1:

I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong.

Speaker 2:

I'm reminded, incidentally, of Rust here. Rust.

Speaker 1:

This almost makes me happy that I didn't become a supermodel. Kubernetes.

Speaker 2:

Boy. I'm sorry guys, I don't know what's going on.

Speaker 1:

Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here.

Speaker 2:

Rust Data topics.

Speaker 1:

Hello and welcome to DataTopics Unplugged Deep Dive, your casual corner of the web, where today we discuss all about data storytelling with Altair and AI. My name is Murillo, I'll be hosting you today, and I'm joined by Angelica Lo Duca. Did I say your name right?

Speaker 2:

Yes, hi everybody, I'm really excited to be here with you today.

Speaker 1:

All right, we're excited to have you here. You're speaking with us from Italy, is that correct?

Speaker 2:

Yes, yes, all right.

Speaker 1:

In Pisa. Very cool. How's the weather there, by the way?

Speaker 2:

Today it's very sunny, but yesterday there was heavy rain, so, yeah, it's strange weather.

Speaker 1:

Yeah, so happy to have you here. We were chatting a bit before we started the episode as well, so there's a lot of cool stuff to talk about. But maybe for people that haven't heard about you yet, could you give a little bit of an introduction: your background, how you got into programming and data science, your past experiences and all these things?

Speaker 2:

Yes, I am a researcher at the Institute of Informatics and Telematics of the National Research Council in Italy. My research interests are data science, data storytelling, data engineering and also web applications. My field is research, but I also have a connection with industry. For this reason, I also look for collaborations with industry and for study topics that are more related to industry rather than research, strictly speaking. I am also a professor of data journalism at the University of Pisa, and I also like writing very much. I have a blog. I have written almost four books: I am currently writing my fourth book, and I'm really excited about the topics I study and investigate.

Speaker 1:

Very cool. Thank you for the introduction. So maybe a quick question, for my curiosity: data journalism, what is it about? If someone is attending your classes, what can they expect?

Speaker 2:

Yes, data journalism is a subfield of data storytelling, because it's about communicating news derived from data. Data journalism is a combination of the technical skills required by data, such as data analysis and data exploration, and communication skills, because you have to communicate what you learn from the data. Compared to data storytelling, which is a broader field, data journalism is more focused on news, so you have to build stories that are fresh, because news is always fresh, and you have to extract this news from data. Yes, this is very interesting. My students always enjoy the course, I have different theses and, yes, they are very interested.

Speaker 1:

And is this course for people that are following a data science track, or is it more for people that are following a journalism kind of track?

Speaker 2:

No, the strange thing is that this course is part of the master's degree in digital humanities, so it's very strange, because it's a connection between humanities and technology, and I think this aspect is very fascinating.

Speaker 1:

Yeah, that's true. Do you have a lot of people from different backgrounds, like people that are more technical and less technical? So I guess it's probably a challenge for you as well, right? Having such different backgrounds in the same class.

Speaker 2:

Yes, it's very difficult to manage both. While in previous years I focused more on technical programming languages such as Python, over the years I have realized that it's better to use other types of tools, such as Tableau or Power BI, to do the visual representations of data, because some students have difficulties with writing code, debugging and so on. But I also encourage them to use, maybe, ChatGPT to correct errors, although it's not strictly academic.

Speaker 1:

Yeah, I don't know if that's controversial, but yeah, I think ChatGPT is relatively new, I would say. So I think you were pretty quick in adapting your course if you're already teaching the ins and outs of Gen AI. And I think we're going to talk a bit more about it when we talk about your latest book, right? So, Data Storytelling with Altair and AI. I think it's "Altair" that you pronounce it. But before we go there, you also mentioned you wrote other books as well. Would you like to?

Speaker 2:

Yes, since I love exploring many different fields. My first book was about data science, and it was focused especially on experiments and how to track your experiments. The title is Comet for Data Science. It was published in 2022 by Packt Publishing, and the topic is Comet, which is an experimentation platform to track your experiments. Since then, Comet has evolved: now it also supports MLOps and, I think, it's also integrated with generative AI, but I don't know the details.

Speaker 1:

Yes, this is the book, right?

Speaker 2:

Yes, this one. And the second book: I co-authored this book with other people. It's Learning and Operating Presto, by O'Reilly Media, and it's more focused on data engineering and how to use Presto as a query engine for your system. This one is the book. And now O'Reilly also provides on its platform the possibility to read the book in German and Spanish with an automatic translation, and this is very interesting because you can read the book in your own language, so you don't have to write in any other languages.

Speaker 1:

O'Reilly takes care of the translation.

Speaker 2:

Yes, great, and I think that in the near future they will also provide other languages.

Speaker 1:

How does this work, actually? Is it automatic, but then someone validates the translation?

Speaker 2:

Yes, they use a tool, but I don't remember the name. It's an automatic translation using Gen AI, I think. Yes, and in fact, they also write under the book: if you find some errors, please send us feedback about them, so they can correct the errors.

Speaker 2:

The third book is this one. This is Data Storytelling with Altair and AI, which was recently released by Manning Publications. It's about the last part of the data science workflow, which is data communication, and the focus is on data storytelling. I think that this book fills a gap in the literature because it's focused on Python, and I have met different people who say that they build boring charts in Python. In this book you find different interesting charts and how to declutter them, how to modify them and how to tailor them for your audience, all in Python. But you are not alone: you can use Gen AI to help you, and this, I think, is interesting. And the last book: I'm currently writing a fourth book, which is Become a Great Data Storyteller, by Wiley, which will be released by the end of January 2025. It's a more theoretical book about the concepts of story and how to extract a story from data. And this is my overview.

Speaker 1:

Wow, this is really cool. So you kind of go end-to-end right. You have the more data engineering focus, you have the data science focus or the MLOps data science with Comet, and then all the way to the data storytelling part.

Speaker 2:

Yes, Very nice very nice.

Speaker 1:

So maybe a personal curiosity from me, right? What motivated you to write so many books, and why did you choose these topics as well? I think if someone wants to share some knowledge today, there are different ways they can do this. They could write blog posts, they could write a book. So what was the motivation to go for books in this case?

Speaker 2:

Well, the truth is that since I was a child, my dream was to become a writer. Yes, I wrote many tales, short stories, also fictional stories, which were published by local publishers. But then I realized that I also had to write in my field. And it's a personal idea to share knowledge in a book, because I have a blog where I publish many things, I share content and so on, but a book is more complete. I have many books. I like reading, and so I also like writing, and I think that in a book people find a more comprehensive guide to doing something. The drawback of a book is that the technology may become outdated while the book is still there. With an old technology, the code maybe doesn't work because it's old, but the principles described in the book are still valid. And I believe in the importance of books because you can read them, and especially if they are hard copies, they are better.

Speaker 1:

Yeah, I also saw that in the book as well, I skimmed through it. You also mentioned that there is a structure to the topics, right? They kind of progress, so it's prepared in a way for you to go step by step and up your game there. So I agree, it's really cool. And indeed, you also mentioned the challenge that things move fast in the tech industry. Do you do anything to mitigate the changes? Is there something you can do aside from releasing new versions, new editions?

Speaker 2:

Yes, firstly, you can write a second edition of a book, but I think that the most important aspect here is to have a book repository with the code, because every technical book usually has a GitHub repository with the code, and you can update it. Maybe in the book the code is still old, but in the repository you can find the new version. It would be very good if readers could contribute to the code maintenance, and this is a dream that every writer has, because you don't write for the masses, you write for the single reader, and when a reader reads a book, there is a connection between you and the reader. It's a personal connection. While in a presentation you maybe talk to a large audience, here in a book you talk with just one person, and it's a very interesting relationship that the reader establishes with the writer.

Speaker 1:

Very cool. And I also saw that, well, I think if you buy the physical book, you also have access to the discussions on, I think it's the Manning platform. So you also have that, right? Which I thought was really, really cool.

Speaker 2:

Yes, yes. Unfortunately, these discussions are not very appreciated by readers, because usually we don't have the time to write, but I think that engaging with the author could be very interesting. I personally read many books, many technology books, and sometimes after reading a book I contact the author and ask. Yes, on my blog I also have some interviews with other authors. I ask them many questions about the book, and then sometimes I also ask the author to answer my questions and I publish them as a blog post, because it's very fun to see their writing style in their answers. It seems that you are reading again.

Speaker 2:

You are still reading the book. It's very fun.

Speaker 1:

No, that's a great idea. Actually, do you have any book recommendations there, any book that surprised you when you asked these questions?

Speaker 2:

Many books. One of the books that I really like is Chart Spark by Alli Torban. The name seems to suggest something related to charts, to building graphs, but in fact this book shows you how to be creative in your job. I suggest you read it, because it changes your...

Speaker 1:

Yes, this is the one I have here. I put the book on the screen for people that are just listening. But yes, Chart Spark, yes: "Harness your creativity in data communication to stand out and innovate."

Speaker 2:

Yes, and maybe it could be very interesting for people dealing with data, especially those who have a technical background and believe that creativity belongs only to geniuses, and it's not so.

Speaker 1:

Really cool call-out there. Yes, and the process of writing a book: I'm assuming it's very labor intensive, right? I'm assuming it takes time. There's sending it to the publishers and all these things. Could you talk a bit about the work that goes into publishing a book?

Speaker 2:

Yes, the first step is to have an idea, and the idea is the most difficult thing, because the market is full, so it's very difficult to have an idea approved by a publisher. There are books about all the topics, and you have to be innovative in some way. You can do a market analysis of the topics that are trending at the moment, and if you know at least one of these topics, you can combine your idea, your topic or your experience with this trending topic. This is the first possibility.

Speaker 2:

Once you have the idea, you have to write the outline of the book, which should be innovative. All the chapters should be innovative. If you write the same things as others, you will be rejected, so you have to write something innovative. Then you propose your idea, and when the publisher has approved the idea, you start writing. You enjoy this, but in some periods you can also see writing as a rock, as something very heavy, because you have deadlines and you have to meet them. You can plan to write some pages each day, so that before the deadline you have the chapter ready.

Speaker 2:

But you have also to study, because you don't know everything To write a book you need to study, and if you don't like this, don't write a book.

Speaker 1:

So usually the deadlines are based on chapters, like deliver the first chapter, then the second chapter? It's more like a linear thing?

Speaker 2:

Yes, you see everything chapter by chapter, because maybe you have to write 300 pages. If you look at all the pages, you go crazy. You have to see a single chapter, a single page: today I will write three pages, okay.

Speaker 1:

Yeah, yeah, I think I wouldn't be good at writing a book, because sometimes I have a lot of motivation, but then it goes away the next day, and I'm like, I don't think it would be for me. I do like studying, though, and I think it's interesting you mentioned this, because I'm sure you learn a lot in writing a book. I'm also a believer that in trying to teach people, and I think writing a book is a way of teaching, you also learn a lot, right? So a lot of the time I try to do some presentations, and sometimes they are on topics that I'm not very familiar with, but in trying to present them to people, I definitely learn a lot. So it's one of the ways that I really like to learn stuff, and I also feel like it's useful because you're also sharing your knowledge with other people.

Speaker 2:

Yes, I think this is the most valuable thing: you share knowledge and you hope to leave something to others. Yeah, to give something to others. And maybe with the effort that you put into doing something, you can pave the way for other people.

Speaker 1:

Yes, yeah, definitely. Cool. Maybe, how long does the whole process take, from having an idea and contacting the publisher all the way to the publishing date? How long does it usually take?

Speaker 2:

It depends on the publisher, but usually you contact a senior acquisition editor. Personally, I did it this way: you go to LinkedIn, you search for the publishing house that you want to publish with, and you look for senior acquisition editors. Then you send them an email, and usually they answer very quickly. Yes, because you talk directly with a person. You don't contact the general publishing house, you contact just one person, and so they answer. Yes, it's fast. But then you negotiate the topic with the acquisition editor. Maybe they are not interested in your topic and you need to modify it. And if they are finally interested, they accept your proposal and they send the proposal for review to the editorial board. This usually takes two weeks. After these weeks, they answer yes or no.

Speaker 1:

And then from that it takes what, maybe a year for you to have the whole book written out? How is it, more or less?

Speaker 2:

To write the whole book, you usually need one year.

Speaker 1:

Okay, so it's a lot of work.

Speaker 2:

Yes, but sometimes also two years. It depends.

Speaker 1:

And what's your favorite part of writing a book?

Speaker 2:

Writing and studying, everything. But it depends on the book, because in some books I also like to insert some personal anecdotes, and I think this is the most fun thing, because every time you go out with friends or with family, you see everything as a possible anecdote to insert in your book, and all your life becomes your book. Yes, because you can write a very technical book without any emotion, any mood, but I don't like this style. I like to add a little spice. Yes, I like to enjoy the writing, so it's not sad, boring writing.

Speaker 1:

This is nice. And maybe, what's the thing you like the least about writing a book? The thing that you don't like in writing a book.

Speaker 2:

Okay, the deadlines. Sometimes they are very strict and you have to respect them. You can ask the editor for a delay of some days, but you can't delay by one month. So sometimes you are under pressure, because you have to write this before that date. The chapter should be ready.

Speaker 1:

Yeah, I can imagine. Cool, so maybe we can dive more into the book, right? So, Data Storytelling with Altair and AI. Maybe the first question is: why Altair? For the people that are not very familiar with all the plotting libraries in Python, there are quite a few. I don't know if it's the most popular, but the basic one is Matplotlib, right? And then there's Seaborn, which I think is built on top of that and has somewhat fancier graphs. Why Altair? I'm not even sure how to say it. How do you pronounce it, actually? Do you know?

Speaker 2:

I don't know, I say Altair, but I know it's wrong, but I like it. Okay, we can go.

Speaker 1:

We roll with Altair for this chat. Why Altair?

Speaker 2:

Because I think it's the only Python library which is suitable for data storytelling. Altair provides different functions and methods you can use to transform your data dynamically while building your chart, and it also takes a pandas DataFrame directly as input. In Matplotlib, instead, you have to modify something before using the data frame, so Altair is more versatile. I have also tested Matplotlib, but when you want to build a story, when you want to insert annotations or context or text, Altair is better, more user-friendly.

Speaker 1:

Sorry, what do you mean? It is user-friendly, like it's easier to use, the API is nicer and all these things?

Speaker 2:

Yes, and for this reason I have a student now who is building a Python library for data storytelling on top of Altair. Because, yes, in Python there is no library for data storytelling, and we are defining this with methods such as "add context" and "add annotation". For example, if you look at D3, which is a JavaScript library for data visualization, there is an additional plugin which makes it possible to add annotations very easily. In Python, instead, you can't: you have to manually calculate the position of a text within a chart. If you have a library made directly for data storytelling, you can do it very quickly, and we are defining this.

Speaker 1:

I see, so this is the library you mentioned, right? D3 is JavaScript, so you cannot use it in Python, and it has some really nice features. There's also the interactivity to it, right?

Speaker 2:

Yes, and Altair is built on top of Vega-Lite, which is a grammar which uses D3. And for this reason, Altair is very user-friendly, because D3 allows you to build many, many visual things in all kinds of ways.

Speaker 1:

Interesting. Yeah, I think this is the GitHub page for Altair. Yes, actually it's Vega-Altair. So indeed, they actually take the name from the, how do you call it, the grammar, right? So maybe, for people that never heard of this visualization grammar, how would you explain it to them?

Speaker 2:

It's a set of rules to describe a chart. Vega uses JSON to describe all the parts of the chart. For example, if you have an axis, you use a key for the x axis and the value of the axis. And Altair is a Python interface to build Vega graphs.
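
To make the idea of a JSON chart grammar concrete, here is a minimal sketch of a Vega-Lite specification written as a plain Python dictionary. The data and the field names "month" and "sales" are made up for illustration and do not come from the episode or the book. Altair generates specifications of this general shape from Python code.

```python
import json

# Hypothetical toy data; the field names "month" and "sales" are invented for this sketch.
data = [
    {"month": "Jan", "sales": 120},
    {"month": "Feb", "sales": 150},
    {"month": "Mar", "sales": 90},
]

# A Vega-Lite chart is a JSON object: the data, a mark type, and an encoding
# that maps data fields to visual channels such as x and y.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": data},
    "mark": "bar",
    "encoding": {
        "x": {"field": "month", "type": "nominal", "title": "Month"},
        "y": {"field": "sales", "type": "quantitative", "title": "Sales"},
    },
}

# Serialize the dictionary to the JSON text a Vega-Lite renderer would consume.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed JSON can be pasted into the online Vega editor to see the resulting bar chart.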

Speaker 1:

So it's like under the hood, it's still Vega, which you said uses D3. So it's JavaScript, but then Altair is like a translation layer between these programming languages, kind of. This is cool.

Speaker 2:

In fact, an advanced use of Altair could be to insert the Vega code directly in the chart. It's an advanced use, but I don't cover it in the book.

Speaker 1:

Yes, maybe also to touch a bit on the book, which I skimmed through, like I mentioned. It's a mix: there's theory, there are also some examples and, like you mentioned, some anecdotes, but there is also some code, right? So it is also for people to get practical with it. It's not just the foundations, there is also some hands-on, let's say. So I thought it was a really nice mix. And one thing you mentioned that I thought was interesting: you said Matplotlib is a plotting library, right, but it's not very well suited for data storytelling. So how would you compare the two? What makes a good plotting library, and what makes a good data storytelling library? You already mentioned some things, like adding the context easily and finding the locations of objects in your plot, but is there something else that you would add to it?

Speaker 2:

Yes, essentially, a data storytelling library should tell a story with your chart, and a story has characters and a plot, but this is a deep level of usage of data storytelling. In terms of a chart telling a story...

Speaker 2:

A chart telling a story should have a good title, a subtitle and some annotations. You also have to put credits in your chart. And, the most important thing: every story has an end, and the end in data storytelling is adding next steps. What should we do after reading the story? This is answered by the next steps. Examples of next steps could be "learn more", maybe a button to click to learn more about the topic, or an action that you can take. For example, for decision makers, you could provide some different options in your next-steps part. So a story should have these parts. In Matplotlib, you have to integrate all these parts separately. You still can do it, but it's more difficult. In Altair, you can build layers, so each layer can contain a part of the story, and you can use just the first layer or add more layers with annotations, next steps, credits and so on. This is the difference: you have different layers you can work on.
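
The layering idea described above can be sketched at the Vega-Lite level, where a chart may hold a list of layers under the "layer" key. This is a minimal sketch under stated assumptions, not the book's code: the data, the annotation text, the titles and the helper `build_story` are all hypothetical. In Altair itself, charts are combined into layers with `alt.layer(...)` or the `+` operator.

```python
import json

# Hypothetical toy data, invented for this sketch.
sales = [
    {"month": "Jan", "sales": 120},
    {"month": "Feb", "sales": 150},
    {"month": "Mar", "sales": 90},
]

# Layer 1: the basic chart.
base_layer = {
    "data": {"values": sales},
    "mark": "bar",
    "encoding": {
        "x": {"field": "month", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative"},
    },
}

# Layer 2: a text annotation placed by data coordinates, nudged 8 pixels upward.
annotation_layer = {
    "data": {"values": [{"month": "Feb", "sales": 150, "note": "Highest month"}]},
    "mark": {"type": "text", "dy": -8},
    "encoding": {
        "x": {"field": "month", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative"},
        "text": {"field": "note", "type": "nominal"},
    },
}


def build_story(layers, title, subtitle):
    """Hypothetical helper: wrap a list of layers in one layered Vega-Lite spec."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": {"text": title, "subtitle": subtitle},
        "layer": layers,
    }


# The same base chart can be shown alone or with the storytelling layer on top.
plain = build_story([base_layer], "Monthly sales", "Toy data")
annotated = build_story(
    [base_layer, annotation_layer],
    "February was the best month",
    "Next step: review what changed in February",
)
print(json.dumps(annotated, indent=2))
```

Credits and next steps could be added in the same way, as further text layers or as extra subtitle lines, without touching the base chart.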

Speaker 1:

So I'm going to try to repeat what you said in my own words to make sure that I'm understanding. Altair is set up in a way that makes it easier to define these different layers, and when you're trying to build a data story, it's useful to have this concept of layers, because there are different components that you want in your plot to tell the story. Matplotlib, on the other hand, is more like building blocks, let's say. So you could still do this, but it becomes harder to achieve because it's too low level, right? The way that it's organized, it's not made for this. Is that a good rephrasing, a good summary? Great, got an A. Cool, very nice.

Speaker 1:

So maybe about your book. It's structured in different parts, right? The first one is introducing Altair and generative AI to data storytelling. We already talked about Altair. What about the generative AI part? How does generative AI come into this? That's actually what I thought was the most surprising, I think, because the title is just Data Storytelling with Altair and AI, and when I saw that it was generative AI, I was a bit surprised. I was like, oh okay, cool.

Speaker 2:

Yes, just a quick note. The title has changed over the course of writing the book. The first title was Data Storytelling with Altair and GitHub Copilot.

Speaker 2:

Yes, because at the beginning the book included only GitHub Copilot. GitHub Copilot is an assistant for writing code. It's very famous. But then I decided to also include other aspects of generative AI, in particular ChatGPT and DALL-E to generate images, and the title moved to AI-Assisted Data Storytelling. This was the second title, but AI-Assisted Data Storytelling was not good. There was a third, a fourth title, and finally the title was this one: Data Storytelling with Altair and AI. So the types of AI included in the book are ChatGPT, DALL-E and GitHub Copilot.

Speaker 2:

GitHub Copilot is used as an assistant to build the raw chart, so you use Copilot to build the chart in Altair. ChatGPT, instead, is used in two ways. The first way is to generate more engaging text, for example the title, the subtitle or the annotations, for different types of audiences. And in the book you also use RAG, Retrieval-Augmented Generation, to use generative AI to adapt the content, for example of your data, to a specific audience. You extract the main content from a set of documents you have, and you can adapt it, for example, to an audience of executives, of professionals, or to a general audience.

Speaker 1:

So you mentioned the Copilot first. Copilot is a coding assistant, right, but it's generative-AI powered. But then you mentioned that in the book you use Copilot for setting the best titles, the best subtitles. So it's about the content, it's not necessarily about the code?

Speaker 2:

No, this is ChatGPT for the code.

Speaker 1:

Ah, ChatGPT.

Speaker 2:

Sorry, for the content. ChatGPT is for the content. Copilot, instead, is for the code.

Speaker 1:

Yes, and you also mentioned the RAG component. Could you just I'm not sure if I fully understood, so maybe for the people that are listening that haven't heard of RAG, could you also explain a bit what RAG is and how does the RAG fit in the data storytelling?

Speaker 2:

Yes, RAG is a technique, generally speaking, to fine-tune a large language model on a specific topic. So in the book you take your data. You have, for example, a set of documents about the topic you are investigating. You give this set of documents to your RAG model and you extract a summary or a description for your chart, for example an annotation or a title based on the content of your documents, and you can decide if you want to use this data story for different types of audiences. For example, with the same documents, you can decide to extract a title for a general audience or a title for an audience of professionals, and the model generates different titles based on your needs. This is very interesting especially for developers, who sometimes don't have communication skills, and they can automate this process using generative AI.
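
As a rough illustration of the workflow described above, the sketch below retrieves the most relevant documents for a chart and builds an audience-specific prompt. Everything in it is an assumption made for illustration: the documents, the function names, and the simple word-overlap scoring, which stands in for the embedding-based search a real system would likely use. In the usual sense of the term, retrieval-augmented generation supplies retrieved text to the model at query time rather than changing the model's weights. No language model is called here; the resulting prompt would be sent to one.

```python
import re

# Hypothetical document collection, invented for this sketch.
DOCUMENTS = [
    "Product A sales doubled after the spring campaign.",
    "The office moved to a new building last year.",
    "Customer surveys link product A sales to the new packaging.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query.

    Word overlap is a deliberately simple stand-in for a real retriever.
    """
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(chart_description, audience, documents, k=2):
    """Combine retrieved context with an audience-specific instruction."""
    context = retrieve(chart_description, documents, k)
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        f"Context:\n{context_block}\n\n"
        f"Chart: {chart_description}\n"
        f"Write a chart title for an audience of {audience}."
    )


# Same chart and documents, two audiences: only the instruction changes.
prompt_general = build_prompt("product A sales by month", "general readers", DOCUMENTS)
prompt_executives = build_prompt("product A sales by month", "executives", DOCUMENTS)
print(prompt_executives)
```

Because the retrieval step is separate from the audience instruction, the chart and its supporting context stay the same while the requested title changes per audience.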

Speaker 1:

I see so it's like you have still the graphs, the plots, the data, the foundations are the same, but then you want to tweak a bit the type of title to cater more to a general audience or to more business audience or more technical audience, but the values of the plot, it's they're still the same. It's just about the story, like in, how you hook people in and the links and all these things.

Speaker 2:

Yes, yes. The chart is the same, but you can decide to use different titles, annotations and much more based on your audience and also on your objective. For example, if you want to entertain an audience, you can ask it to generate a joke based on your data.

Speaker 1:

Have you tried that? Are the jokes good?

Speaker 2:

No, I didn't. I usually use it to inform or to make the title more persuasive, not for jokes. But you can.

Speaker 1:

Okay, cool. We talked about Copilot. We talked about ChatGPT and DALL-E. Maybe for the people who don't know exactly what DALL-E is, because I think today it's built into ChatGPT...

Speaker 2:

Yes, but just another thing: in the book you can also use ChatGPT for another purpose, and that is to generate ideas.

Speaker 1:

Ah, like a brainstorming tool.

Speaker 2:

Yes, because you can discuss a topic with ChatGPT to extract new ideas about it. I don't use ChatGPT to extract insights from data, because at the time of writing the book this was not possible, but maybe an updated version of the book could also include this part.

Speaker 1:

Yeah, I can imagine. If it takes one year, it's probably very hard, because this GenAI stuff is moving very quickly these days.

Speaker 2:

Yes, and also DALL-E. DALL-E is for generating images, and maybe the images generated in the book are less engaging than the ones you can see in DALL-E now, but the techniques and the concepts are still valid.

Speaker 1:

I see. Very, very cool. You were talking a lot about data storytelling before, and I think you mentioned that in your classes you also talk about Power BI and Tableau. You also talked about some plotting libraries. Could you compare a bit all the things we've mentioned so far, like the dashboarding tools and data storytelling? What is the difference between them? Could you use something like Tableau or Power BI to do the data story? Would you say it's a good tool for that or not?

Speaker 2:

Yes, indeed. Tableau, I think, has a steep learning curve, because it's very difficult to learn, but once you have learned it, it's a wonderful tool to represent charts, maps, and so on. In the final part of the book there is also a comparison with Tableau, and you can also incorporate your charts in Tableau.

Speaker 2:

I describe how to incorporate your charts in Tableau and Power BI and also in Comet, because I wrote my previous book on Comet. The difference is that in Tableau you can still write stories. I think maybe it's even better than Python for writing stories, but it's not for developers. In the book I developed the idea that if you write all your analysis in Python, the best solution could be to build your communication part in Python too. Usually you write the code in Python and the report in another tool, but instead you can write everything in Python, and you can use Altair to do this. The fun thing is that I have tried to implement some of the case studies I propose in the book in Tableau as well, and the result is quite similar.

Speaker 1:

Ah, really? Okay. So I guess if someone is doing their analysis in Python but already knows Tableau, it's also a good tool for them to apply these principles.

Speaker 2:

Yes, you can apply the principles I describe in the book to other tools as well, also to Matplotlib if you want. The principles carry over; the code is specific to Python and Altair.

Speaker 1:

Yeah, and you talked a bit about these principles, and I see that in the book you refer to the D-I-K-W pyramid. Maybe you want to explain a bit what this pyramid is and how it ties to data storytelling.

Speaker 2:

Yes, indeed. I met this pyramid for the first time in a book by Jose Berengueres, which is Data Visualization something, and I loved the idea. The name stands for data, information, knowledge, wisdom, starting from the bottom. The pyramid moves from data to wisdom, and the idea in the book is also to move from raw data to wisdom. When you reach the wisdom level, you have built a data-driven story. Yes, this is the pyramid: data, information.

Speaker 1:

For the people following on the audio, I just put it on the screen; it's from Wikipedia. It's a pyramid and, as you mentioned, at the bottom you have data, then it goes up to information, then it stacks up to knowledge and wisdom. And I guess what you're saying is that if you do a good data story, people end up at the wisdom part. You start from the facts, but they acquire wisdom, let's say.

Speaker 2:

Yes. And when you move from data to information, you are extracting an insight from data, because you have answered a question: you have filtered your raw data and extracted just one insight from it. At the information level you have an insight, and at this level you represent it as your chart. The next level is from information to knowledge.

Speaker 2:

When you move from information to knowledge, you add context to your data, which means that you are enriching your data, your information, with the background. The background is all the relevant things that the audience must know to understand your chart, and the context depends on your audience. If you have an audience of technical people, you don't need to add the technical details, because they already know them, but if you have a general audience, you have to add a general overview of your data: what they represent, what they are, and so on. Finally, when you add the final level, from knowledge to wisdom, you are tailoring your chart for a specific audience, a specific culture. Maybe you are telling your story to people located, I don't know, in China, who have a different background than me, and so you need to adapt the story to make them understand it.

Speaker 1:

And then I guess, if you're changing that layer, so I think we're at the knowledge layer, right? Going from data and information to knowledge, can you reuse the bottom of the pyramid for other cultures? The example you gave is China, but say Brazil; I'm from Brazil, by the way. The idea is that you can still use the bottom two layers?

Speaker 2:

Yes, and you also add a call to action to your story. You invite the audience to do something after encountering your story. I really like this pyramid because it's a conceptual framework that helps you transform data into a story.
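The steps of the pyramid, as described in this part of the conversation, can be sketched as a small pipeline. This is an illustrative reading of the framework, not code from the book: the `Story` class, the function names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Story:
    insight: str              # information: one answer extracted from raw data
    context: str = ""         # knowledge: background the audience needs
    call_to_action: str = ""  # wisdom: what the audience is invited to do


def to_information(rows):
    """Data -> information: answer one question (which product sold the most?)."""
    best = max(rows, key=lambda r: r["units"])
    return Story(insight=f"Product {best['product']} sold the most units ({best['units']}).")


def to_knowledge(story, audience):
    """Information -> knowledge: add background only for audiences that need it."""
    if audience == "technical":
        return story  # technical readers already know the details
    return replace(story, context="Units are items shipped to customers during the year.")


def to_wisdom(story, action):
    """Knowledge -> wisdom: close the story with a call to action."""
    return replace(story, call_to_action=action)


# Invented raw data for illustration.
rows = [
    {"product": "A", "units": 120},
    {"product": "B", "units": 340},
    {"product": "C", "units": 90},
]

info = to_information(rows)
general_story = to_wisdom(to_knowledge(info, "general"), "Stock more of product B.")
technical_story = to_wisdom(to_knowledge(info, "technical"), "Review the forecast for product B.")
print(general_story)
```

Both stories share the same `insight`, which mirrors the point that the bottom layers can be reused while the context and the call to action change with the audience.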

Speaker 1:

All right, that's really cool, and that's what you dive into in part two of the book, right? And then you have the part on delivering the data story, where I guess you talk a bit more about GenAI and some of its dangers. Is that correct?

Speaker 2:

Yes, there are some ethical problems you can have when you use AI. You must always check the produced content, because it could contain bias, hallucinations, and so on, and so you have to control the produced output. Recently, I attended a conference where the main topic was explainable AI. I don't deal with this topic in my book, but explainable AI tries to explain what the AI produces and why it generates this type of content. I think that this is a good direction of study, especially in research.

Speaker 1:

Yeah, I do think so, especially with the GenAI models: they get bigger and less explainable, and there's the risk of hallucination a lot of the time, right? So I think, now more than ever, this is a very relevant field of research. And you mentioned some examples, so just to give a sneak peek of the book: if you put people in different colors, color is sometimes associated with moods, right? So maybe you get some things that you did not intend. This is just to give an example.

Speaker 1:

If you say, give me a picture of a man in a yellow chair, maybe there are other things that the model will output that are not intended, right? If it says yellow, does it specify a mood? Does it specify a race? Does it specify something there that you also need to take into account? And you also give some practical tips here, because in ChatGPT there is also the problem of reproducibility, right? There is a temperature parameter that you can set if you want something that is very stochastic, in a way. You can opt into that, and you also give some practical examples of how you can make things reproducible and how you can verify these things. Maybe a question: how do you come up with these examples? Is it just from your experience?

Speaker 2:

Yes, from my experience. I did some tests and I realized that there were these problems. But regarding reproducibility, I have read a very exciting thing. Maybe the listeners already know it, but I read it in a book entitled Adam's Fine Book of Stable Diffusion, or something like this. The book is about generating images with Stable Diffusion, and you can pass these models a parameter called the seed, which always generates the same images, and so it's a deterministic approach. From this I realized that, in my view, ChatGPT also generates the same outputs with the same seed, but you don't see the seed.

Speaker 1:

Hmm, but then is the seed produced at runtime? How do you specify the seed?

Speaker 2:

In Stable Diffusion, you pass it as a parameter of the function, and so you can decide.
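The idea of a seed, together with the temperature parameter mentioned earlier, can be shown with a toy sampler. This is only an illustration built on Python's `random` module, not the Stable Diffusion or ChatGPT API; the function names and the scores in `logits` are invented.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature) using the given generator."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(seed, length=8, temperature=1.0):
    """Generate a sequence of token indices from a fixed, invented score table."""
    rng = random.Random(seed)  # the seed fully determines the random stream
    logits = [2.0, 1.0, 0.5, 0.1]
    return [sample_token(logits, temperature, rng) for _ in range(length)]


first = generate(seed=42)
second = generate(seed=42)
print(first == second)  # same seed, same sequence

# A very low temperature concentrates almost all probability on the top score.
print(generate(seed=7, temperature=0.01))
```

With the same seed the two runs match exactly, while a different seed is free to produce a different sequence. Lowering the temperature reduces randomness in a different way, by making the highest-scoring option dominate.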

Speaker 1:

Yeah, because Stable Diffusion is open source, right?

Speaker 2:

Yes, right. ChatGPT is not open source, so in ChatGPT you can't. I think that this parameter exists, but you cannot set it. It's not open to people.

Speaker 1:

I see, yeah, could be. The reproducibility part of GenAI also interests me when it comes to building applications that you want to test, right? One of the first things that happens when you have a bug is that people say, oh, I cannot reproduce the bug, it works on my machine, so I cannot help. And I think with GenAI there's a bit of that: oh, ChatGPT told me something that violates whatever, and it's like, well, how did you do this? If you cannot reproduce it, it's not that the report is useless, but it's a bit less helpful for really debugging. So that's also something that I'm personally interested in.

Speaker 1:

As GenAI becomes more and more industry ready, let's say, how can we test these things? So, very cool stuff as well. I see that you really dive into the data storytelling part. If you had to summarize for someone, like an elevator pitch, what makes a good data story? How do you know that a data story is good, according to you?

Speaker 2:

There are a few elements. The first is that the audience gets the message and retells it to another audience, or acts based on what the story tells. This is the first thing. The second is that you can recognize a good data story if it has characters and a plot, and this is what I explain in my next book, which is Become a Great Data Storyteller, where you will see how to extract characters and plots from data. You will build a sidekick. You will extract a hero from data who has an objective to reach, but there is a problem between the hero and the object of desire. You will extract an antagonist, and so on. This is my next book. I will reveal more when the book is published.

Speaker 1:

And then next year we can sit down again and chat about your new book as well. Okay, great. No, I think it's cool. I also really like the idea. I do think that stories are very powerful in the sense that they're sticky.

Speaker 1:

I think I've also read a book, Make It Stick, which is basically about stories that are memorable. I was thinking about it when you said a good data story is one that's retold to someone else, or that touches people, right? It's not just something you hear and then forget about as you go on with your day. It's something that stays with you. In the Make It Stick book they also talk about some things in that realm. They try to identify some patterns, and one of the things they talk about is putting stuff in a story. I think we relate a lot to stories, and it's something that we can empathize with.

Speaker 1:

So if you have something good to say and you can say it well, I think it's way more powerful than just saying stuff, right? So I definitely think it's a very worthwhile topic. I think there's a lot there to unravel, so it's really cool that you do all this nice work and put everything in a nice structured format for all of us. You already mentioned that you have your next book coming up in January, so good luck with that. Is there anything else that you want to share before we wrap it up and call this a pod?

Speaker 2:

I suggest to the audience also to read a lot, to study, and to enjoy coding, because it's very fun to try new things and to code.

Speaker 1:

Yes, I agree, definitely agree. I subscribe to that. How can people find you? You mentioned you have a blog. Can we share it here as well? Or maybe we can put it in the show notes, if you can send it to me, so people can find you on your blog there. The book is Data Storytelling with Altair and GenAI, or rather and AI, sorry, not GenAI. And I think that's it.

Speaker 1:

Let's see. How can people buy your book, by the way? What's the best way?

Speaker 2:

Amazon.

Speaker 1:

I think Amazon, that's the best bang for your buck there.

Speaker 2:

Yes, or the Manning website, where there are special offers. Sometimes there are promotions, and maybe the books are also on the O'Reilly platform.

Speaker 1:

I just opened the first one, but you heard it here first: go to Amazon, get your copy there. Thank you very much for taking the time to chat with us. Good luck. I imagine you have a lot of deadlines for your next book as well, so I'll let you get to it. Thanks a lot, thank you.

Speaker 2:

Thanks to you. It was a pleasure to chat with you today.

Speaker 1:

Likewise, likewise. Thank you.