From NLG research: How does neural network text generation work?


Several times a year, Retresco holds an in-house hackathon. That means two days away from day-to-day work – time to engage with new topics, think outside the box, and try out new approaches. This helps us look at our own technologies from different perspectives and make innovative advances.

Inspired by the E2E NLG Challenge, set by Heriot-Watt University in 2017, a team of developers from Retresco set about replacing the traditional text generation process with a neural network.

The objective of the end2end approach is to enable an NLG system to learn the process of text generation independently, from one end (structured data) through to the other (a finished text). The ultimate hope is to produce a system that could in future simply be supplied with data to learn and abstract from, enabling the production of new expressions and texts without human intervention.

To understand the scale and advantages of end2end text generation, it is helpful to take another look at how things are today – the ‘traditional’ text generation process.

How does automatic text generation work?
Before a system such as rtr textengine can start generating texts, an initial set-up phase is required, during which the structural framework of the desired text outputs is defined in the program code. At this stage, all of the input parameters, from subject and text type to the length of the final product, are defined in a ‘story plot’. The text generation process itself follows and can be divided into several steps that flow into one another.

In the first step, an algorithm decides what information the text needs to convey. To do this it searches the structured data for relevant content such as extreme values and aggregates it into ‘messages’. As it is extremely difficult to construct a generally applicable model for doing this, extracting information that is genuinely relevant and interesting is often the most challenging part of the text generation process.
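As a rough illustration of this first step, the sketch below scans a small table for the highest and lowest value of each field and wraps every finding in a ‘message’. The `Message` class, the function name and the sample data are invented for this example and are not taken from rtr textengine.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str      # "max" or "min"
    field: str     # which column of the data the message is about
    entity: str    # which record holds the extreme value
    value: float


def find_extreme_messages(records, name_key, fields):
    """Turn the highest and lowest value of each field into a message."""
    messages = []
    for field in fields:
        highest = max(records, key=lambda r: r[field])
        lowest = min(records, key=lambda r: r[field])
        messages.append(Message("max", field, highest[name_key], highest[field]))
        messages.append(Message("min", field, lowest[name_key], lowest[field]))
    return messages


# Invented sample data, not taken from a real match.
players = [
    {"name": "Player A", "goals": 2, "passes": 41},
    {"name": "Player B", "goals": 0, "passes": 67},
    {"name": "Player C", "goals": 1, "passes": 23},
]

messages = find_extreme_messages(players, "name", ["goals", "passes"])
for message in messages:
    print(message)
```

A real system would also have to decide which of these findings are actually worth reporting, which is the hard part described above and is not attempted here.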

Next, the ‘text planner’ puts the statements into a meaningful order using the predefined story plots, then the ‘sentence planner’ assigns language-specific information to the messages. To do that, the system selects the most suitable expressions from its repertoire and populates them with the selected data values.
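The two planners can be pictured with a minimal sketch like the one below. It assumes that a story plot is simply an ordered list of message types and that the repertoire of expressions is a set of string templates; both are simplifications made for this example, and all names and values are hypothetical.

```python
import random

# Hypothetical story plot: the order in which message types should appear.
STORY_PLOT = ["result", "top_scorer", "attendance"]

# Hypothetical repertoire of expressions for each message type.
TEMPLATES = {
    "result": [
        "{home} beat {away} {home_goals}:{away_goals}.",
        "{home} won {home_goals}:{away_goals} against {away}.",
    ],
    "top_scorer": ["{player} scored {goals} goals."],
    "attendance": ["{spectators} spectators watched the match."],
}


def plan_text(messages, plot):
    """Text planner: keep the message types named in the plot, in plot order."""
    by_type = {m["type"]: m for m in messages}
    return [by_type[t] for t in plot if t in by_type]


def plan_sentence(message, rng):
    """Sentence planner: pick an expression and fill in the data values."""
    template = rng.choice(TEMPLATES[message["type"]])
    return template.format(**message["data"])


# Invented messages, deliberately supplied in the wrong order.
messages = [
    {"type": "attendance", "data": {"spectators": 30000}},
    {"type": "top_scorer", "data": {"player": "Player A", "goals": 2}},
    {"type": "result", "data": {"home": "Team X", "away": "Team Y",
                                "home_goals": 2, "away_goals": 1}},
]

rng = random.Random(0)  # seeded so that repeated runs pick the same templates
ordered = plan_text(messages, STORY_PLOT)
sentences = [plan_sentence(m, rng) for m in ordered]
print(" ".join(sentences))
```

Choosing a template at random is only a stand-in for ‘selecting the most suitable expression’; how suitability is really judged is not described in this article.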

This is about much more than simply inserting those data values. The data is ‘lexicalized’: it is mapped to words and phrases that express its meaning. At a basic level, textengine must use words in the grammatically correct form as dictated by the sentence structure. But lexicalization can also involve textengine using descriptors such as ‘serial champions’ or ‘record holders’ in a text about FC Bayern Munich to give the text more variety.
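Both aspects of lexicalization can be hinted at in a few lines. The sketch below is only an assumption about how such rules might look: it picks singular or plural depending on the data value and swaps the club name for a descriptor on later mentions. The two descriptors are the ones quoted above; everything else is invented.

```python
import random

# Hypothetical descriptor list; the two entries are the examples from the text.
DESCRIPTORS = {
    "Bayern Munich": ["the serial champions", "the record holders"],
}


def noun_form(count, singular, plural):
    """Choose the grammatical number that the data value requires."""
    return singular if count == 1 else plural


def refer_to(entity, already_named, rng):
    """Use the plain name on first mention and a descriptor afterwards."""
    options = DESCRIPTORS.get(entity)
    if not already_named or not options:
        return entity
    return rng.choice(options)


def capitalize_first(text):
    """Upper-case the first character only."""
    return text[:1].upper() + text[1:]


rng = random.Random(1)
goals = 1

subject = refer_to("Bayern Munich", False, rng)
first = f"{subject} scored {goals} {noun_form(goals, 'goal', 'goals')}."

subject = refer_to("Bayern Munich", True, rng)
second = capitalize_first(f"{subject} remain unbeaten.")

print(first)
print(second)
```

English needs little more than singular and plural here; languages with richer inflection would require far more than this toy rule.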

In the last step, the text is put into its final format. This includes finishing touches such as inserting paragraph breaks and checking capitalization to produce a publication-ready text.
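The finishing touches might look something like the following sketch, which normalizes whitespace, capitalizes the start of each sentence and inserts a paragraph break after a fixed number of sentences. The fixed paragraph length is an assumption made purely to keep the example short.

```python
import re


def capitalize_sentence(sentence):
    """Upper-case the first character and leave the rest untouched."""
    sentence = sentence.strip()
    return sentence[:1].upper() + sentence[1:]


def realize(sentences, sentences_per_paragraph=2):
    """Clean up whitespace, fix capitalization and insert paragraph breaks."""
    cleaned = [capitalize_sentence(re.sub(r"\s+", " ", s)) for s in sentences]
    paragraphs = [
        " ".join(cleaned[i:i + sentences_per_paragraph])
        for i in range(0, len(cleaned), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)


# Invented raw sentences with typical blemishes.
text = realize([
    "team X  won 2:1 against Team Y.",
    "player A scored twice.",
    "the next match is on Saturday.",
])
print(text)
```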

The entire process happens in real time. In automated football reporting, for example, this means that numerous unique texts are ready to access right after the final whistle. But for this process to take place at all, users must repeat the set-up process for each new area: the desired output format must be specified, thresholds defined and varied expressions entered. Only then is scalable text generation possible.

So what would happen if a neural network was used for this middle part of the process? Then you would have data at one end, texts at the other – and a neural network in the middle taking care of everything else.

How does end2end text generation work?
In end2end text generation, the traditional generation pipeline is replaced by a neural network. That means amalgamating all of the steps in the traditional text generation process described above into a single model. That model would learn automatically what to say, how to order its statements, and which linguistic forms to use.

This approach originates from machine translation, which uses a model like this to translate a French sentence into an English one, for example. In end2end text generation, the French sentence is replaced with appropriately coded structured data, and the model learns how to ‘translate’ from the data into a corresponding text directly. As well as machine translation, this approach is also already in use in speech-to-text and automated text summarization applications.
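What ‘appropriately coded’ might mean can be sketched as follows: the structured record is flattened into a sequence of tokens, which then plays the role of the French sentence. The tag format, the function name and the restaurant record are assumptions chosen for illustration, not the encoding used at the hackathon.

```python
def linearize(record):
    """Flatten a record into a token sequence a translation-style model could read."""
    tokens = []
    for key, value in record.items():
        tokens.append(f"<{key}>")           # marker token naming the attribute
        tokens.extend(str(value).split())   # the attribute value, word by word
    return tokens


# Invented record and reference text forming one training pair.
record = {"name": "The Eagle", "food": "French", "area": "city centre"}
target_text = "The Eagle serves French food in the city centre."

source_tokens = linearize(record)
training_pair = (source_tokens, target_text.split())

print(source_tokens)
# → ['<name>', 'The', 'Eagle', '<food>', 'French', '<area>', 'city', 'centre']
```

The model itself is left out here: any sequence-to-sequence architecture that maps one token sequence to another could, in principle, be trained on pairs of this shape.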

Applied to NLG, an end2end system of this kind, supplied with large volumes of structured data and the relevant texts, would learn for itself how to produce new expressions and texts. The major advantage of this is obvious: new use cases could be developed within a very short time, boosting performance significantly – without an initial human set-up phase.

For this approach to work, however, the model would have to be trained with vast quantities of paired examples, each consisting of structured data and a matching text, and preparing these would involve considerable effort.

Does end2end text generation work?
More or less.

Initial results:

“What about the all-in-one PC Apple iMac with German keyboard. The computer is already supplied with the Windows 10 Home (64-bit) operating system. Fast, silent and insensitive to vibrations – the computer’s SSD hard drive.”

The impression gained from previous research was confirmed: while the texts are largely fluent and readable and new expressions are sometimes produced, the weaknesses of the technology quickly become apparent. The intention of the generated text is discernible, but it contains grammatical mistakes as well as factual errors – since when have Macs used Windows 10?

The major advantage of autonomy also means surrendering opportunities for intervention: in the best-case scenario an autonomous system needs no correction, but it does not permit correction either. Factual and linguistic errors therefore pass straight through into the finished text.

These errors could only be prevented with suitable training data. For a more realistic result, staggeringly large volumes of training data would be needed – in particular, data that is consistent and actually matches the texts it is paired with. It is this requirement that makes end2end text generation unrealistic.
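One simple way to picture what ‘matching’ means is a check that flags texts mentioning attribute values the data does not contain. The sketch below is a toy under stated assumptions: the input record is invented (the article does not say what data the model was given), and the list of known values would have to come from somewhere in practice.

```python
def unsupported_values(record, text, known_values):
    """Return known attribute values that the text mentions but the record lacks."""
    text_lower = text.lower()
    record_values = {str(v).lower() for v in record.values()}
    return sorted(
        value for value in known_values
        if value.lower() in text_lower and value.lower() not in record_values
    )


# Assumed input record; only the sentence below is quoted from the generated text.
record = {"product": "Apple iMac", "operating_system": "macOS"}
text = ("The computer is already supplied with the "
        "Windows 10 Home (64-bit) operating system.")
known_values = ["macOS", "Windows 10 Home"]

print(unsupported_values(record, text, known_values))
# → ['Windows 10 Home']
```

A filter like this could only catch exact mentions of known values; it says nothing about grammar or about facts that are phrased differently.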

But the project was not without its uses – far from it: while end2end is no replacement for the sophisticated technology in rtr textengine, we did gain a deeper understanding of the underlying technology. And we could apply this new knowledge at various points to improve individual components in our pipeline. It might be possible to use the end2end architecture to combine known phrases to automatically generate new expressions or to optimize the lexicalization process by learning a language’s word formation rules. So the end2end architecture could help us make aspects of our technology better and more efficient.

Once again, the hackathon showed how important it is for our team to take time out of their day-to-day work to engage with new approaches, gain a deeper understanding of our technologies and develop new ideas.
