A Coding Implementation to Build an Interactive Transcript and PDF Analysis with Lyzr Chatbot Framework

By Josh
May 28, 2025


In this tutorial, we extract, process, and analyze YouTube video transcripts using Lyzr, an AI framework for working with textual data. Combining Lyzr’s ChatBot interface with youtube-transcript-api and FPDF, we convert video transcripts into structured PDF documents and then query them interactively. The workflow suits researchers, educators, and content creators who want summaries, key insights, and generated questions from video content.

!pip install lyzr youtube-transcript-api fpdf2 ipywidgets
!apt-get update -qq && apt-get install -y fonts-dejavu-core

We set up the necessary environment for the tutorial. The first command installs essential Python libraries, including lyzr for AI-powered chat, youtube-transcript-api for transcript extraction, fpdf2 for PDF generation, and ipywidgets for creating interactive chat interfaces. The second command ensures the DejaVu Sans font is installed on the system to support full Unicode text rendering within the generated PDF files.
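As a quick sanity check after installation, the sketch below looks for the font file before any PDF is generated. The first path is the one used later in this tutorial; the second path and the helper name `first_existing_font` are hypothetical and only illustrate how extra locations could be probed.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing_font(candidates: Iterable[str]) -> Optional[str]:
    """Return the first path in `candidates` that exists on disk, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


# The first path is the one the tutorial uses; the second is a hypothetical
# alternative location added purely for illustration.
FONT_CANDIDATES = [
    "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
    "/usr/local/share/fonts/DejaVuSans.ttf",
]

font_path = first_existing_font(FONT_CANDIDATES)
print(font_path or "DejaVu Sans not found; PDF output would fall back to a core font")
```

If the check reports that the font is missing, rerunning the apt-get command above is the first thing to try.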

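import os
import openai


# Keep a key that is already exported; otherwise fall back to the placeholder.
os.environ.setdefault("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY_HERE")
# Read the key only after the environment variable is guaranteed to exist.
openai.api_key = os.environ["OPENAI_API_KEY"]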

We configure OpenAI API key access for the tutorial. We import the os and openai modules, expose the key through the OPENAI_API_KEY environment variable, and assign it to openai.api_key. Replace the placeholder with your own key, or set the variable outside the notebook so the key is never saved alongside the code. This setup is required for the OpenAI models that the Lyzr framework calls.
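A minimal sketch of a stricter key check follows. It assumes that any value starting with `YOUR_` is a leftover placeholder. The helper name `load_api_key` is hypothetical and not part of Lyzr or the openai package.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var_name`, rejecting missing or placeholder values."""
    key = os.environ.get(var_name, "").strip()
    # Assumption: values that start with "YOUR_" are unedited placeholders.
    if not key or key.startswith("YOUR_"):
        raise RuntimeError(f"{var_name} is not set to a real key")
    return key


try:
    api_key = load_api_key()
    print("API key loaded")
except RuntimeError as err:
    print(f"[!] {err}")
```

Failing early here gives a clearer message than an authentication error raised later, in the middle of the chat calls.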

import json
from lyzr import ChatBot
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript
from fpdf import FPDF
from ipywidgets import Textarea, Button, Output, Layout
from IPython.display import display, Markdown
import re

Check out the full Notebook here

We import the libraries required for the tutorial: json for saving responses, Lyzr’s ChatBot for AI-driven chat, and YouTubeTranscriptApi (with its exception classes) for extracting transcripts from YouTube videos. We also bring in FPDF for PDF generation, ipywidgets for interactive UI components, IPython.display for rendering Markdown in notebooks, and the re module for regular-expression text cleanup.

def transcript_to_pdf(video_id: str, output_pdf_path: str) -> bool:
    """
    Download YouTube transcript (manual or auto) and write it into a PDF
    using the system-installed DejaVuSans.ttf for full Unicode support.
    Fixed to handle long words and text formatting issues.
    """
    try:
        entries = YouTubeTranscriptApi.get_transcript(video_id)
    except (TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript):
        try:
            entries = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
        except Exception:
            print(f"[!] No transcript for {video_id}")
            return False
    except Exception as e:
        print(f"[!] Error fetching transcript for {video_id}: {e}")
        return False


    text = "\n".join(e['text'] for e in entries).strip()
    if not text:
        print(f"[!] Empty transcript for {video_id}")
        return False


    pdf = FPDF()
    pdf.add_page()


    font_path = "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"
    try:
        if os.path.exists(font_path):
            pdf.add_font("DejaVu", "", font_path)
            pdf.set_font("DejaVu", size=10)
        else:
            pdf.set_font("Arial", size=10)
    except Exception:
        pdf.set_font("Arial", size=10)


    pdf.set_margins(20, 20, 20)
    pdf.set_auto_page_break(auto=True, margin=25)


    def process_text_for_pdf(text):
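        # Collapse every run of whitespace (newlines included) into one space,
        # so the transcript is laid out as a single wrapped paragraph.
        text = re.sub(r'\s+', ' ', text)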


        processed_lines = []
        for paragraph in text.split('\n'):
            if not paragraph.strip():
                continue


            words = paragraph.split()
            processed_words = []
            for word in words:
                if len(word) > 50:
                    chunks = [word[i:i+50] for i in range(0, len(word), 50)]
                    processed_words.extend(chunks)
                else:
                    processed_words.append(word)


            processed_lines.append(' '.join(processed_words))


        return processed_lines


    processed_lines = process_text_for_pdf(text)


    for line in processed_lines:
        if line.strip():
            try:
                pdf.multi_cell(0, 8, line.encode('utf-8', 'replace').decode('utf-8'), align='L')
                pdf.ln(2)
            except Exception as e:
                print(f"[!] Warning: Skipped problematic line: {str(e)[:100]}...")
                continue


    try:
        pdf.output(output_pdf_path)
        print(f"[+] PDF saved: {output_pdf_path}")
        return True
    except Exception as e:
        print(f"[!] Error saving PDF: {e}")
        return False


This function, transcript_to_pdf, converts a YouTube video transcript into a readable PDF document. It retrieves the transcript with YouTubeTranscriptApi and returns False when the transcript is disabled, missing, or empty. Before writing, it collapses whitespace and splits any word longer than 50 characters so that a single long token cannot break the PDF layout. It uses the DejaVuSans font when the font file is present (falling back to Arial otherwise), sets 20 mm margins, and enables automatic page breaks. It returns True if the PDF is written successfully and False if an error occurs.
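The long-word handling described above can be tried in isolation. The sketch below is a standalone variant, not the tutorial's nested helper: the name `split_long_words` and the `max_len` parameter are introduced here for illustration, with 50 as the default to match the tutorial's threshold.

```python
import re
from typing import List


def split_long_words(text: str, max_len: int = 50) -> str:
    """Collapse whitespace and break any word longer than `max_len` into chunks."""
    words = re.sub(r"\s+", " ", text).strip().split(" ")
    pieces: List[str] = []
    for word in words:
        # range() yields one start index per chunk; a short word gives one chunk.
        pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return " ".join(pieces)


sample = "see  " + "a" * 12 + "\nnow"
print(split_long_words(sample, max_len=5))  # prints: see aaaaa aaaaa aa now
```

Because each chunk is joined back with a space, the PDF writer always has a break opportunity within `max_len` characters.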

def create_interactive_chat(agent):
    input_area = Textarea(
        placeholder="Type a question…", layout=Layout(width="80%", height="80px")
    )
    send_button = Button(description="Send", button_style="success")
    output_area = Output(layout=Layout(
        border="1px solid gray", width="80%", height="200px", overflow='auto'
    ))


    def on_send(btn):
        question = input_area.value.strip()
        if not question:
            return
        with output_area:
            print(f">> You: {question}")
            try:
                print("<< Bot:", agent.chat(question), "\n")
            except Exception as e:
                print(f"[!] Error: {e}\n")


    send_button.on_click(on_send)
    display(input_area, send_button, output_area)


This function, create_interactive_chat, creates a simple chat interface within Colab. Using ipywidgets, it provides a text input area (Textarea) for typing questions, a send button (Button) to submit them, and an output area (Output) to display the conversation. When the user clicks Send, the question is passed to the Lyzr ChatBot agent and the response is printed in the output area; empty input is ignored, and agent errors are printed instead of raised. This lets users run a Q&A session over the transcript as if conversing with the model.
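The handler logic can be exercised without widgets or API calls. In the sketch below, `EchoAgent` is a hypothetical stand-in for the Lyzr agent (the only assumption is a `chat(question)` method, as used in the tutorial), and `handle_question` is an illustrative helper that returns the lines the widget handler would print.

```python
class EchoAgent:
    """Hypothetical stand-in for the Lyzr agent; only chat(question) is assumed."""

    def chat(self, question: str) -> str:
        return f"(echo) {question}"


def handle_question(agent, raw_text: str) -> list:
    """Mirror the on_send logic and return the lines that would be printed."""
    question = raw_text.strip()
    if not question:
        return []  # blank input is ignored, as in the widget handler
    try:
        return [f">> You: {question}", f"<< Bot: {agent.chat(question)}"]
    except Exception as exc:  # report agent failures instead of crashing the UI
        return [f">> You: {question}", f"[!] Error: {exc}"]


for line in handle_question(EchoAgent(), "  What is the video about?  "):
    print(line)
```

Keeping this logic in a plain function makes it testable outside a notebook; the widget callback would then only need to print the returned lines.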

def main():
    video_ids = ["dQw4w9WgXcQ", "jNQXAC9IVRw"]
    processed = []


    for vid in video_ids:
        pdf_path = f"{vid}.pdf"
        if transcript_to_pdf(vid, pdf_path):
            processed.append((vid, pdf_path))
        else:
            print(f"[!] Skipping {vid} — no transcript available.")


    if not processed:
        print("[!] No PDFs generated. Please try other video IDs.")
        return


    first_vid, first_pdf = processed[0]
    print(f"[+] Initializing PDF-chat agent for video {first_vid}…")
    bot = ChatBot.pdf_chat(
        input_files=[first_pdf]
    )


    questions = [
        "Summarize the transcript in 2–3 sentences.",
        "What are the top 5 insights and why?",
        "List any recommendations or action items mentioned.",
        "Write 3 quiz questions to test comprehension.",
        "Suggest 5 creative prompts to explore further."
    ]
    responses = {}
    for q in questions:
        print(f"[?] {q}")
        try:
            resp = bot.chat(q)
        except Exception as e:
            resp = f"[!] Agent error: {e}"
        responses[q] = resp
        print(f"[/] {resp}\n" + "-"*60 + "\n")


    with open('responses.json','w',encoding='utf-8') as f:
        json.dump(responses,f,indent=2)
    md = "# Transcript Analysis Report\n\n"
    for q,a in responses.items():
        md += f"## Q: {q}\n{a}\n\n"
    with open('report.md','w',encoding='utf-8') as f:
        f.write(md)


    display(Markdown(md))


    if len(processed) > 1:
        print("[+] Generating comparison…")
        _, pdf1 = processed[0]
        _, pdf2 = processed[1]
        compare_bot = ChatBot.pdf_chat(
            input_files=[pdf1, pdf2]
        )
        comparison = compare_bot.chat(
            "Compare the main themes of these two videos and highlight key differences."
        )
        print("[+] Comparison Result:\n", comparison)


    print("\n=== Interactive Chat (Video 1) ===")
    create_interactive_chat(bot)


Our main() function drives the entire tutorial pipeline. It processes a list of YouTube video IDs, converting each available transcript into a PDF file with transcript_to_pdf and skipping videos that have none. It then initializes a Lyzr PDF-chat agent on the first PDF and asks it a set of predefined questions: a short summary, the top insights, action items, quiz questions, and creative prompts. The answers are saved to responses.json and formatted into a Markdown report (report.md), which is also rendered in the notebook. If at least two PDFs were created, a second agent is built over the first two and asked to compare their main themes. Finally, main() launches the interactive chat interface on the first video's agent so the user can continue with their own questions.
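Since main() expects bare video IDs, a small helper can turn pasted URLs into IDs first. The sketch below is an illustration under stated assumptions: it covers only the watch, youtu.be, embed, and shorts URL forms, requires URLs to include a scheme such as https://, and assumes bare IDs are 11 characters drawn from letters, digits, "_" and "-". The name `extract_video_id` is hypothetical and not part of youtube-transcript-api.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL form, or None if not found."""
    value = url_or_id.strip()
    parsed = urlparse(value)
    if not parsed.netloc:
        # No host part: accept the input only if it looks like a bare ID.
        # Assumption: IDs are 11 characters from [A-Za-z0-9_-].
        return value if re.fullmatch(r"[A-Za-z0-9_-]{11}", value) else None
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None


print(extract_video_id("https://youtu.be/jNQXAC9IVRw"))  # prints: jNQXAC9IVRw
```

With such a helper, the video_ids list could be built from URLs and filtered to drop any entry that comes back as None.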

if __name__ == "__main__":
    main()

We ensure that the main() function runs only when the script is executed directly, not when it’s imported as a module. This is the standard Python idiom for separating a script's entry point from its importable definitions.

In conclusion, this tutorial shows how Lyzr can turn YouTube videos into queryable documents. Its PDF-chat capability handles extracting core themes and generating summaries, and the chat interface supports interactive exploration of the same content. The approach can save time when working with video transcripts for academic research, teaching, or content analysis.




Asif Razzaq is the CEO of Marktechpost Media Inc. As an entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which covers machine learning and deep learning news in a way that is both technically sound and understandable to a wide audience. The platform has over 2 million monthly views.


