by Angelica Lo Duca (CNR-IIT)
At CNR-IIT, we are investigating whether generative AI can be used to improve Data Storytelling and produce more engaging and informative data stories.
Data Storytelling (DS) is the practice of communicating data through narratives. Traditionally, DS is done manually. We can use the Data-Information-Knowledge-Wisdom (DIKW) framework to transform raw data into a data story [1]. First, we extract an insight from the data and represent it graphically (from data to information). Next, we add context, that is, the additional information the audience needs to understand the data (from information to knowledge). Finally, we include the next steps in the story, inviting the audience to do something. This is the call-to-action phase, which must always be anchored to an ethical framework (from knowledge to wisdom).
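The three DIKW steps can be pictured as a small data structure. The following is a minimal sketch, not the method described in [1]: the `DataStory` class and the `extract_insight` function are hypothetical names, and the insight is reduced to a one-sentence percentage change rather than a chart, purely to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DataStory:
    """Hypothetical container mirroring the three DIKW transitions."""
    insight: str     # from data to information
    context: str     # from information to knowledge
    next_steps: str  # from knowledge to wisdom (call to action)

    def render(self) -> str:
        # Join the three parts in the order the story presents them.
        return f"{self.insight} {self.context} {self.next_steps}"


def extract_insight(label: str, values: Sequence[float]) -> str:
    """Summarise a raw series as the percentage change from first to last value.

    Assumption: a textual insight stands in for the graphical
    representation mentioned in the text.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute a change")
    first, last = values[0], values[-1]
    if first == 0:
        raise ValueError("percentage change is undefined when the first value is 0")
    change = (last - first) / first * 100
    if change == 0:
        return f"{label} did not change."
    direction = "rose" if change > 0 else "fell"
    return f"{label} {direction} by {abs(change):.0f}%."
```

A storyteller would fill `context` and `next_steps` by hand in the traditional workflow; the rest of the article discusses how GenAI might assist with each field.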
Over the past year, Generative Artificial Intelligence (GenAI) has emerged as a trending technology [L1]. GenAI, a subfield of AI, can generate new content, such as text, images and voice, based on the examples it has learned from. It can perform a range of tasks, such as automating repetitive operations. One field of application could be DS. If we view a data story as a set of tasks (e.g. story planning, execution and communication), GenAI could take on different roles in a given task: creator, optimiser, reviewer or assistant [2]. The most straightforward approach is to use GenAI to generate the content of a story under the storyteller’s supervision.
Combining GenAI and DS
Figure 1 (a) shows a possible integration of GenAI into the DIKW pyramid. Starting from the bottom of the pyramid, GenAI can help data storytellers extract insights by discovering patterns and correlations among data samples and by identifying anomalies (from data to information). Next, GenAI can generate context for the extracted insights in the form of textual annotations, images that reinforce the described concepts, and voice (from information to knowledge). Finally, GenAI can fine-tune the proposed call to action by anchoring it to an ethical framework (from knowledge to wisdom) [3].
Figure 1: (a) How GenAI can be applied to the DIKW pyramid to build a data-driven story. (b) An example of question/answer steps to make ChatGPT generate an annotation to include in a data-driven story.
The proposed approach is just one way to incorporate GenAI into the DIKW pyramid. These tools can be leveraged in many other ways, such as synthesising large amounts of data, developing personalised predictive models, and constructing personalised recommendations based on data.
As an example of using GenAI to generate a textual annotation, we consider a case study in which we want to build a data-driven story about homelessness. We started a conversation with a GenAI tool (i.e. ChatGPT [L2]) to obtain a possible context describing the situation in which homeless individuals live. The context should be a short and engaging sentence. Figure 1 (b) shows the steps involved in the conversation. Q denotes the user’s questions (which form the basis of our prompts), and A denotes ChatGPT’s answers.
ChatGPT generated the text to include in our story after four steps. We used the following strategy to make ChatGPT generate the desired text:
- Describe: ask ChatGPT to describe the problem in general, in this case the condition of homelessness. In response, ChatGPT generates a long text.
- Shorten: ask ChatGPT to write a summary of the generated text.
- Transform: ask ChatGPT to make the summary more engaging for the audience.
- Shorten: if the text is still too long, ask ChatGPT to shorten it further.
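The four-step strategy above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the exact conversation shown in Figure 1 (b): the prompt wording is invented, `ask` is a hypothetical callable standing in for whatever chat interface is used (no real API is assumed), the previous answer is pasted into each new prompt so that `ask` can be stateless, and the 30-word limit for deciding that a text is "still long" is an arbitrary choice.

```python
from typing import Callable, List, Tuple

# Assumed threshold for "still long"; the article does not specify one.
MAX_WORDS = 30


def generate_context(
    topic: str,
    ask: Callable[[str], str],
    max_words: int = MAX_WORDS,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run the Describe, Shorten, Transform, Shorten conversation.

    `ask` is a hypothetical function that sends one prompt to a GenAI
    tool and returns its answer as plain text.
    Returns the final text and the list of (question, answer) pairs.
    """
    transcript: List[Tuple[str, str]] = []

    def step(question: str) -> str:
        answer = ask(question)
        transcript.append((question, answer))
        return answer

    # 1. Describe: obtain a long, general description of the problem.
    text = step(f"Describe {topic} in general.")
    # 2. Shorten: summarise the long description.
    text = step(f"Write a summary of the following text:\n{text}")
    # 3. Transform: make the summary more engaging.
    text = step(
        f"Make the following summary more engaging for the audience:\n{text}"
    )
    # 4. Shorten again, only if the result is still too long.
    if len(text.split()) > max_words:
        text = step(
            f"Shorten the following text to at most {max_words} words:\n{text}"
        )
    return text, transcript
```

Returning the transcript alongside the final text keeps every intermediate answer available for review, so the storyteller can check each step rather than only the end result.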
Without realising it, we applied the DIKW model to the use of ChatGPT. Starting from a long text (data), we extracted the information (summary) and then converted it into knowledge and wisdom (engaging text). In other words, when we talk to ChatGPT to generate context, we can organise the conversation as a story.
Ethical Considerations
It is worth noting that combining GenAI and DS may produce biased or inaccurate stories. In addition, data storytellers may use GenAI to build fake but realistic-looking data stories in order to manipulate their audiences. For this reason, data storytellers must continuously check the output produced by GenAI.
To mitigate these risks, using GenAI responsibly and ethically is essential. Data storytellers should be transparent about using GenAI in their work and should take steps to ensure that the stories they generate are accurate and unbiased. This includes using high-quality data, training GenAI models on diverse datasets, and being aware of the potential for bias in the data and the models.
Conclusions
In conclusion, GenAI can be a powerful tool for DS to generate content for data stories, such as text, images and voice. However, using GenAI responsibly and ethically is essential, as it can be used to generate biased or inaccurate stories. Data storytellers should be transparent about using GenAI in their work and should take steps to ensure that the stories they generate are accurate and unbiased. Future research about AI-assisted DS should address these ethical challenges.
Links:
[L1] https://www.techtarget.com/searchenterpriseai/definition/generative-AI
[L2] https://chat.openai.com/
References:
[1] K. McDowell, “Storytelling wisdom: Story, information, and DIKW,” J. of the Association for Information Science and Technology, vol. 72, no. 10, pp. 1223–1233, 2021. https://doi.org/10.1002/asi.24466
[2] H. Li et al., “Why is AI not a panacea for data workers? An interview study on human-AI collaboration in data storytelling,” 2023. https://doi.org/10.48550/arXiv.2304.08366
[3] A. Lo Duca, “Data Storytelling with Generative AI using Python and Altair,” Manning Publications, 2024. https://www.manning.com/books/data-storytelling-with-generative-ai
Please contact:
Angelica Lo Duca, CNR-IIT, Pisa, Italy