Let us begin with the essential question of why the use of Natural Language Generation (NLG) in the context of big data projects is crucial for the business success of companies in the future. To answer this question, we will take a brief look into the past.
Over the last 15 years, decision-makers all around the world concluded that data would change the way we do business. As early as 2010, media outlets such as the Financial Times and Forbes noted the importance of analysing real-time data in addition to collecting historical, transaction-based data, and how heavily companies were investing in data management in order to make better and more informed decisions. ‘Data is the new oil’ has become a common analogy.
Why is data the new oil?
Today, no one doubts that big data will open new dimensions in knowledge transfer, decision-making and the solving of societal and business problems of all kinds. The business models of today’s most valuable companies are built on data, and internet users generate an astonishing 2.5 quintillion bytes of data worldwide every day.
Yet 90% of the data generated is still unstructured. It comes from emails, social media posts, chats, voice messages or videos. On top of that, the software systems we develop produce countless data points every day, and storing all of it drives up enormous costs. All this information is useless unless it is processed to make its content usable for business.
Let’s face it: How many successful big data projects do you have, and how much effort has already been invested in them?
As Google’s Chief Economist Hal Varian put it: “The challenge is to understand, process, visualise and communicate data in order to create value from it.” So how do we connect the dots?
To make this sheer volume of data manageable, companies today are building huge enterprise data warehouses and investing heavily in hardware. To manage all of this, a plethora of new roles has emerged in companies, e.g., Data Analysts, Solution Architects, Data Integration Engineers, and Data Strategists. And above all, it takes business intelligence, applied in data processing and data analysis, to make the masses of data we collect usable. There are countless capable vendors and BI solutions such as Tableau, Power BI, Qlik and others.
Extracting insights from data with natural language generation
But here, too, the question must be asked: Who, apart from the experts mentioned, actually uses BI solutions today? Who, apart from a few experts in your company, is able to understand the machine-generated analyses? In business processes that are critical to companies, there are many stakeholders with different requirements, with a wide variety of sometimes contradictory information needs, and with diverse knowledge backgrounds. Do these stakeholders really understand what the visualisations of data analyses are telling them?
To use Hal Varian’s words again: Besides all the other points, communication is needed to extract value from big data. And communication needs natural language because language is the fundamental layer for human interaction.
If we consider that almost every business model and every business-relevant process relies at various points on human interaction, and thus on language, then the importance of Natural Language Processing (NLP) becomes apparent. There is much evidence to suggest that a large proportion of the written or spoken communication between machine and human, human and machine, and also between machine and machine will become increasingly analysable and automatable with NLP in the future.
Almost all large companies are investing in their own NLP tech stack (e.g., GPT-3 from OpenAI, BERT from Google, Turing-NLG from Microsoft). This can also be clearly seen in the number of publications at the most important NLP conference, the ACL: in 2020, 165 of the 780 papers published were submitted by teams from Microsoft, Facebook, Google, IBM, Amazon, and Salesforce.
These companies invest so heavily in NLP because it is the essential driver for any kind of machine-to-human or human-to-machine communication. So, back to the question: are there successful data projects? Yes, there are already many examples of data projects that successfully use NLG to communicate data insights. To make this more tangible, here are four examples:
Four NLG use cases, endless benefits
Based on Natural Language Generation technology, the German Football Association generates automatic football reports for all matches in non-professional football, across all leagues in women’s, men’s, youth, and senior football. Counting pre- and post-match reports, this adds up to as many as 124,000 reports every single week.
Let’s take this example to briefly explain how NLG works. Under the rules set by the German Football Association, every referee has to submit a match report right after the end of the match. In professional football, data vendors even collect and transmit data continuously during a match. This data is sent via API to Retresco’s NLG solution, textengine.io, which analyses the data and extracts the relevant information, e.g., who scored a goal and when, who was substituted, and who was dismissed from the game with a red card.
This data is enriched with historical data, such as the fact that one of the teams is the current champion or has the league’s top scorer in its ranks. Based on this information, the textengine generates different text types such as match previews, post-match reports, player portraits or social media snippets, depending on the use case and the customer. In professional football, texts can be generated with a high level of narrative detail if desired, in different languages, and in real time as soon as the data is available.
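To make the pipeline more concrete, here is a minimal sketch of a template-based data-to-text generator that mirrors the steps described above: take structured match data, extract the relevant events, enrich them with historical facts, and fill a report template. All names (MatchData, generate_report, and so on) are illustrative assumptions for this sketch, not Retresco’s actual API.

```python
# Hypothetical sketch of a template-based data-to-text pipeline:
# structured data in, natural-language match report out.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    minute: int
    scorer: str
    team: str


@dataclass
class MatchData:
    home: str
    away: str
    goals: List[Goal] = field(default_factory=list)
    # Enrichment facts, e.g. looked up in a historical database.
    facts: List[str] = field(default_factory=list)


def score(match: MatchData) -> str:
    """Derive the final score from the goal events."""
    home_goals = sum(1 for g in match.goals if g.team == match.home)
    away_goals = sum(1 for g in match.goals if g.team == match.away)
    return f"{home_goals}:{away_goals}"


def generate_report(match: MatchData) -> str:
    """Fill a simple post-match report template from the structured data."""
    sentences = [f"{match.home} played {match.away} and the match ended {score(match)}."]
    # Narrate the goal events in chronological order.
    for g in sorted(match.goals, key=lambda g: g.minute):
        sentences.append(f"{g.scorer} scored for {g.team} in minute {g.minute}.")
    # Append enrichment facts, e.g. "FC A remains league leader."
    sentences.extend(match.facts)
    return " ".join(sentences)
```

A production system would of course add grammatical variation, multiple languages and many more event types, but the basic pattern of analysing, enriching and verbalising structured data is the same.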
Automatically generated real estate exposés, product descriptions & BI reports
The real estate platform Immobilienscout24 generates real estate descriptions in real time as users enter the features of the properties they want to list on the platform. In addition to the user’s input data, Immobilienscout also uses geodata from third-party databases to comprehensively describe not only the property itself but also its surroundings and the area it is in. Approximately 75% of the text suggestions are accepted as-is or only marginally changed by the user.
The multinational consumer electronics retail chain MediaMarktSaturn creates product descriptions for more than 500,000 of its products across almost all categories, in unique versions on two different platforms.
And finally, NLG is increasingly being integrated into BI solutions in order to create relevant business reports for various stakeholders in real time.
Five key values of Natural Language Generation
As you can see, there are many existing use cases with enormous business potential in very different business areas. The overview below shows the real benefits of NLG that are already possible today.
What’s next: Can NLG also be creative?
Probably yes. Right now, NLG vendors are developing solutions to realise the benefits of NLG more easily. Language models such as GPT-3 from OpenAI enable the fully automated creation of texts based on minimal human input. We ourselves are developing solutions in which the software independently suggests formulations for texts.
These are exciting times, and we are still at the beginning of NLG development. We at Retresco strongly believe that soon almost every company in the world will use NLG technology to optimize their business processes. Consciously or unconsciously.
Retresco empowers companies to automatically generate high-quality texts based on data. As a pioneer in the field of AI-based language technologies, the Berlin-based tech company has been developing cross-industry solutions for the efficient and future-oriented automation of business processes since 2008.