
Whisper Hackathon: Evaluating how OpenAI's speech recognition model can be used in our company


Anastasia Linnik

Chief Artificial Intelligence Officer, Retresco


Twice a year, we at Retresco organise a hackathon to give our colleagues the opportunity to work on their own ideas or on new features they want to test. They come up with great projects that often form the basis for features we actually integrate into our products and services.

We asked one of our colleagues to give us some insight into their project from our last hackathon, held from October 12th to 14th. This is what Marco, our Computational Linguist Developer, told us:

For our last hackathon, I teamed up with four colleagues to work with OpenAI's open-source model "Whisper". Whisper is a multitask model that promises very good performance in speech recognition and is also trained for speech translation and language identification. Our main goal was to find out whether Whisper was suitable for use in our company.

After transcribing our first audio files and familiarising ourselves with the model, we decided as a team to build a search engine for videos: the speech in a video becomes searchable via text search, and the results show links to the matching timestamps or embed the video starting at those timestamps. Whisper divides its transcription into segments, each with a start and an end time. We indexed these segments in Elasticsearch, which makes it possible to implement an intelligent search. During the hackathon, we built a working proof of concept (POC), including a frontend, and the skills within our team complemented each other very well.
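The indexing and linking steps can be sketched roughly as follows. This is a minimal, stdlib-only sketch, not the team's actual code. It assumes the result dict returned by openai-whisper's `model.transcribe()`, whose `segments` entries carry `id`, `start`, `end` (in seconds) and `text`. The index name `video-segments`, the document fields and the `_id` scheme are hypothetical. The sketch builds a request body for Elasticsearch's `_bulk` API (newline-delimited JSON, one action line plus one source line per segment), a simple `match` query, and a link that starts playback at a hit using a Media Fragments `#t=start,end` suffix, which HTML5 video players understand. Video platforms such as YouTube use their own timestamp parameters instead.

```python
import json
import math
from urllib.parse import urldefrag


def segments_to_bulk_ndjson(result, video_id, video_url, index="video-segments"):
    """Build an Elasticsearch _bulk body with one document per Whisper segment.

    `result` is assumed to be the dict returned by whisper's model.transcribe();
    each entry in result["segments"] has "id", "start", "end" (seconds) and "text".
    """
    lines = []
    for seg in result["segments"]:
        # Action line; the "<video_id>-<segment id>" _id scheme is hypothetical.
        action = {"index": {"_index": index, "_id": f"{video_id}-{seg['id']}"}}
        # Source line: searchable text plus what the frontend needs to build a link.
        doc = {
            "video_id": video_id,
            "video_url": video_url,
            "start": seg["start"],
            "end": seg["end"],
            "text": seg["text"].strip(),  # Whisper segment texts usually start with a space
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def search_body(query, size=10):
    """Full-text `match` query on the segment text; hits come back ranked by relevance."""
    return {"query": {"match": {"text": query}}, "size": size}


def timestamp_link(source):
    """Link for a hit's `_source` that plays the segment via a Media Fragments URI."""
    base, _ = urldefrag(source["video_url"])  # drop any existing #fragment
    # Round outwards so the linked range never cuts off the start or end of the segment.
    return f"{base}#t={math.floor(source['start'])},{math.ceil(source['end'])}"
```

The bulk body can be POSTed to `/_bulk` with the header `Content-Type: application/x-ndjson`, and the query body to `/video-segments/_search`. Each hit's `_source` can then be passed to `timestamp_link` to render a result link or an embedded player.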

We worked not only on the backend and the frontend, but also on the details, to better understand the model and its outputs. Whisper's transcriptions were surprisingly good. The only minor shortcoming was proper names: some were misspelled or transcribed with the wrong sounds.

Whisper Hackathon - Screenshot

Overall, the POC shows that a video search engine built on Whisper can work very well, and we had a lot of fun making the hackathon a success. To use the POC in everyday work, for example for transcribing our weeklies, the last step would be to extend the upload endpoint so that it also accepts the link to the video.
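A minimal sketch of such an extended upload endpoint follows. It is framework-agnostic because the post does not say which web framework the POC used. The form field names `file` and `video_url`, the http(s)-only validation rule and the injected `transcribe` callable are all hypothetical; in practice `transcribe` could wrap openai-whisper, for example `whisper.load_model("base").transcribe`. The handler rejects uploads without a usable video link and keeps that link alongside the segments, so search results can point back to the original video.

```python
from urllib.parse import urlparse


def validate_video_link(raw):
    """Return the trimmed link if it is an absolute http(s) URL (hypothetical rule)."""
    url = raw.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"video_url must be an absolute http(s) URL, got {raw!r}")
    return url


def handle_upload(form, transcribe):
    """Hypothetical upload handler extended with a video link.

    `form` maps field names to values: "file" (path of the uploaded media) and the
    new "video_url" field. `transcribe` is any callable that returns a Whisper-style
    result dict with a "segments" list.
    """
    if not form.get("file"):
        raise ValueError("missing 'file' field")
    # Validate the link before spending time on transcription.
    video_url = validate_video_link(form.get("video_url", ""))
    result = transcribe(form["file"])
    return {
        "video_url": video_url,  # stored with the segments so hits can link back to the video
        "segments": [
            {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
            for s in result["segments"]
        ],
    }
```

Injecting `transcribe` keeps the handler independent of any particular model size or web framework, and makes it easy to test with a stub instead of a real Whisper model.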

If you want to learn more about our work and team, please visit our “about us” section: https://www.retresco.com/about-us
