How OpenAI and LangChain helped me migrate my church's sermons to Spotify
Part 2 - AI-Generated Summary
First, what about that empty description field?
In Part 1 of this article series, I shared how I have been working to migrate my church’s sermon recordings from a website to Spotify. I used some common tools like Python and Excel to download the recordings to my local machine, and then used UIPath to automate the Spotify website and upload them. But in that process, I noticed this big, empty description field.
And then I went on to describe the conversation I had with myself:
Me: Wouldn’t it be nice to provide a brief description of each episode, so a listener gets more information than just what’s in the title?
Me: Definitely! I listen to a lot of podcasts and the descriptions let me search for topics I’m particularly interested in.
Me: But the original site didn’t contain a description so where am I supposed to get it? I don’t have time to listen to nearly 300 episodes and create my own summary.
Me: Hmmm… good question… what about the speakers? Could they provide the summaries?
Me: I’m sure they don’t have time, and some of those sermon recordings go back 6 years, so they would likely need to re-listen to their sermon to provide a summary.
Me: Ewww! Who likes listening to themself speak?
Me: Not me! I’d rather listen to a teacher’s nails on the chalkboard!
Me: Can’t AI create summaries?
Me: I think so…
I’ve been using AI in a number of ways lately, including GitHub Copilot to help me code, ChatGPT to answer questions and help with code, and Midjourney to create images. Since I’d heard AI could summarize text, I figured: how hard could it be? All I need to do is convert the audio recordings to text, then submit the text to GPT-4 or some other AI model to get a summary. But while in the first part of this series I mentioned how easy it was to use UIPath, a new tool for me, to accomplish my task, that was definitely not the case with creating a summary.
All I need to do… (famous last words)
The first step, converting the recordings to text, did end up being delightfully easy and extremely affordable. Since I wanted to avoid uploading all of the recordings, and because I didn’t want to spend any extra money, I looked for an open-source tool that I could use to transcribe the recordings locally. The first tool I found is called “Speech Translate”, and I was able to use it to bulk transcribe the nearly 300 sermons in just a few hours.
To get started, I followed the readme in the GitHub repo to download and run Speech Translate using my GPU (Yes! I finally get to put my new-to-me GPU to good use!). Then I set the Mode to “Transcribe”, left the default Model of “Tiny (32x speed)” and selected a file. Speech Translate quickly downloaded the 72MB model and transcribed my recording in just 1 minute! That seemed way too fast to be correct, so I didn’t even review the result.
Next, I downloaded the model “Medium (2x speed)”, selected the same file, and it ran for 7 minutes. I reviewed the transcribed text and was delighted by how accurate it looked, even though the recording equipment at the church is nothing fancy. So, I selected the rest of the files and let it run through the rest of the day and into the next. I estimate it took about 34 hours to transcribe all the files.
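That estimate lines up with simple back-of-the-envelope arithmetic, assuming roughly 290 recordings (“nearly 300”) at about 7 minutes each with the “Medium” model:

```python
files = 290           # roughly "nearly 300" sermons
minutes_per_file = 7  # observed run time with the "Medium (2x speed)" model

total_hours = files * minutes_per_file / 60
print(f"{total_hours:.1f} hours")  # → 33.8 hours
```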
Want to know a secret? As I was writing this post, I decided to compare a couple of “Tiny” 1-minute transcriptions to the corresponding “Medium” 7-minute transcriptions, and you know what? One was identical, and the other was nearly identical only differing by a few dozen words. Maybe when AI is doing all my work, I’ll have time to listen to the recordings and figure out which of the models actually produced the best transcription, but for now I would definitely consider the “Tiny” model for tasks where speed is important. Obviously, your mileage may vary.
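If you want to run the same comparison yourself, here’s a minimal sketch using Python’s standard-library difflib to score how similar two transcription files are (the file paths are placeholders for your own “Tiny” and “Medium” outputs):

```python
from difflib import SequenceMatcher


def transcript_similarity(path_a: str, path_b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two transcript files."""
    with open(path_a, encoding="utf-8") as f:
        words_a = f.read().split()
    with open(path_b, encoding="utf-8") as f:
        words_b = f.read().split()
    # Compare word sequences rather than raw characters so small
    # punctuation differences don't dominate the score.
    return SequenceMatcher(None, words_a, words_b).ratio()


# Example (hypothetical file names):
# ratio = transcript_similarity("sermon_tiny.txt", "sermon_medium.txt")
# print(f"Similarity: {ratio:.1%}")
```

A ratio near 1.0 means the two models produced nearly identical transcripts, which is exactly what I observed.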
Summarizing the transcription
I learned quickly that transcriptions with 8,000-10,000 words can’t just be dropped into ChatGPT and summarized. They are too long, at least with today’s token limits.
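To put that in perspective, OpenAI’s rough rule of thumb is that one token is about 0.75 English words. A quick sketch using that approximation (the tiktoken library gives exact counts) shows why a sermon transcript blows past a 4,096-token context window:

```python
def estimate_tokens(text: str) -> int:
    """Roughly estimate an OpenAI token count using the common
    rule of thumb of ~0.75 English words per token."""
    return round(len(text.split()) / 0.75)


# A 9,000-word transcription works out to roughly 12,000 tokens:
print(estimate_tokens("word " * 9000))  # → 12000
```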
I researched and tried a number of different online summarizers, but for all their marketing the results were garbage, and many would have been prohibitively expensive. I also tried a variety of solutions that ran locally, some using OpenAI and some using open-source models that could be downloaded and run on my machine. I was particularly hopeful that Serge would work because it makes it super easy to download and use a variety of open-source models, but the results I saw were either empty or way off track.
In the end, what worked best for me was heavily refactoring a simple sample app from shub.codes called PDFSummarizer. It pointed me to LangChain, which allowed me to split the long transcription text into bite-sized chunks with some overlap, then submit the chunks to OpenAI for summarization.
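The core idea behind that split-with-overlap step can be illustrated in a few lines of plain Python. This is just a sketch of the concept, not LangChain’s actual implementation; the overlap carries a little context from the end of one chunk into the start of the next so sentences aren’t cut off blind:

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share `overlap` characters of context."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


# A 10,000-character transcript with 4,000-char chunks and 200-char overlap
chunks = split_with_overlap("a" * 10000, chunk_size=4000, overlap=200)
print(len(chunks))  # → 3
```

Each chunk is then small enough to summarize on its own, and the per-chunk summaries get combined into one final summary.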
Online Summarizers
I don’t want this post to get too long, so suffice it to say that I tried a number of online summarizers. Some offered free trials and some didn’t, but none of them worked for me. The ones with free trials didn’t give me the kind of simple summary I was looking for, and the others would have been too expensive, because all of the costs for this migration project are coming out of my pocket. I’m confident the quality of these solutions will go up and the prices will come down over time, so don’t discount them - they may work great for you.
Serge
From the Serge GitHub readme “Serge is a chat interface crafted with llama.cpp for running Alpaca models.” Serge has a web interface and runs in a Docker container, so installation is extremely easy. Once it’s running you can choose which model to download and run.
The first challenge with Serge is figuring out which model to use. At the time I’m writing this, Serge supports 51 different models that are intended for different purposes. Unfortunately, the interface in Serge where you download the models doesn’t tell you anything about them. However, I did find a few of the models that Serge supports on HuggingFace’s Open LLM Leaderboard, so I tried the ones that looked appropriate for my task, as well as a couple of others I could find some info on. All in all, I tried about 8 different models.
Another challenge, albeit a minor one in my case, is that Serge doesn’t yet support using your GPU. There are some issues indicating that this support may be coming, but for now you’ll have to settle for using your much slower CPU. However, even running on CPU (Intel Core i5 12900K), it typically took about 1 minute for Serge to complete a summary.
Finally, the real blow to using Serge wasn’t Serge’s fault: the models just weren’t giving me useful results. Many times I would get a summary of just the last 10 or so sentences of the transcript. Some of the models returned only the very last word of the transcript, and others seemed to be trying to continue the transcript. I tried a variety of system prompts and chat prompts, and even tried using a prompt generator in ChatGPT to craft a good prompt, but none of the results were useful. However, I’m going to keep my eye on this, as I love the idea of running open-source models on my own machine.
LangChain FTW
I decided to go back to using an OpenAI model to get the quality of results I was looking for, so I searched for a solution to summarizing large volumes of text with OpenAI and I found this article: Build an AI-Powered PDF Summarizer with LangChain and OpenAI – shub.codes. Its subject is summarizing PDFs, but I figured extracting the text from the PDF was a step I could skip. It also demonstrates the use of embeddings and that’s something I’ve wanted to learn more about.
Well, I’m not sure what transpired between the publishing of that article and now, but some things in LangChain have changed. For example, the article shows langchain having a PDF() method, like this:
# Load the PDF into LangChain
pdf = langchain.PDF(pdf_file)
That led me to think there might be a TEXT() or similar method, but that does not appear to be the case in the current version. Even the PDF() method is gone.
Soooo… I rewrote the whole thing using the methods I could find, reshaping it to fit my mental model. While it took way longer than anticipated, I’m pretty happy with the results.
Here is the code I ended up with; it will summarize a text file called C:\temp\mp3_transcribed.txt.
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain import PromptTemplate
from os import getenv
from dotenv import load_dotenv, find_dotenv

# Define the prompt template
prompt_template = """Write a concise summary of the following transcription:
{text}
CONCISE SUMMARY:"""

# Load the .env file into the environment
load_dotenv(find_dotenv())

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

# Read the transcription file
filePath = "C:\\temp\\mp3_transcribed.txt"
with open(filePath, "r") as f:
    fileContents = f.read()

# Connection for OpenAI
llm_openai = OpenAI(
    openai_api_key=getenv("OPENAI_API_KEY"),
    temperature=0.5
)

# Split the fileContents into chunks and create a document from each chunk
texts = CharacterTextSplitter(".", chunk_size=4000, chunk_overlap=200).split_text(fileContents)
docs = [Document(page_content=t) for t in texts]

# Construct the map-reduce chain: summarize each chunk, then combine
openai_chain = load_summarize_chain(
    llm_openai,
    chain_type="map_reduce",
    return_intermediate_steps=False,
    map_prompt=PROMPT,
    combine_prompt=PROMPT,
)

# Get the summary
summary = openai_chain.run(docs)
print(summary)
You’ll need to do the following for the code to run:
Install the prerequisites:
pip install langchain openai python-dotenv
Create a .env file with your OpenAI API Key:
OPENAI_API_KEY=[YOUR OPENAI API KEY]
Affordable Data Summarizer
While I was doing all the work with LangChain, I really wanted the ability to compare the summary results from both OpenAI and from Azure OpenAI, which I had recently worked with. And I wanted to see which was more affordable.
Inspired by the original app’s use of Streamlit, Affordable Data Summarizer was born!
It’s a super easy-to-use app that will create summaries with both OpenAI and Azure OpenAI for you to compare.
You can read more about the Affordable Data Summarizer and get instructions for running it yourself at Affordable Data.