A Python CLI tool that transforms audio/video meeting recordings into structured transcriptions and to-do lists using OpenAI APIs (Whisper for transcription and GPT for analysis).
- Automatic transcription of audio and video files using Whisper/GPT-4o
- Intelligent extraction of tasks, decisions, and action items
- Large file handling with automatic chunking
- Multi-format support (MP4, MOV, WAV, MP3, M4A, etc.)
- Parallel processing to speed up transcription
- Automatic retry on API errors with exponential backoff
- Precise timestamps for each section
- Structured output in Markdown format
- 💰 Cost optimization with transcription-only mode (`-t` flag)
- Enhanced error handling with response validation and diagnostics
- Python 3.8+ installed on your system
- ffmpeg installed:
  ```bash
  # macOS
  brew install ffmpeg

  # Ubuntu/Debian
  sudo apt update && sudo apt install ffmpeg

  # Windows
  # Download from https://ffmpeg.org/download.html
  ```
- OpenAI account with API key
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd RecordingToTasks
  ```
- Run the setup script:

  ```bash
  ./setup.sh
  ```
  The script will automatically:
  - Install ffmpeg (if not present)
  - Create a Python virtual environment
  - Install all required dependencies
  - Create the `.env` file from the template
- Configure your API key:

  Edit the `.env` file and insert your OpenAI API key:

  ```bash
  OPENAI_API_KEY=sk-your-actual-api-key-here
  ```
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd RecordingToTasks
  ```
- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Configure environment variables:

  ```bash
  cp env.example .env
  ```

  Edit the `.env` file and insert your OpenAI API key:

  ```bash
  OPENAI_API_KEY=sk-your-actual-api-key-here
  ```
Verify the installation:

```bash
python main.py --help
```

Or run the setup test:

```bash
python test_setup.py
```

Process an audio/video file (transcription + analysis):

```bash
python main.py /path/to/your/recording.mp4
```

If you already have a transcription file, you can skip the expensive transcription step and run only the task analysis:

```bash
# Use existing transcription (SAVES MONEY!)
python main.py -t output/recording_transcription.txt
python main.py --transcription transcription.txt
```

Cost Comparison:
- Full processing (37 min video): ~$0.23 (transcription) + ~$0.01 (analysis) = $0.24
- Transcription-only mode: ~$0.01 (analysis only) = 96% cheaper!
This is perfect for:
- Testing prompt changes
- Debugging analysis issues
- Re-running analysis with different settings
- Processing the same recording multiple times
```bash
# Transcribe a video file
python main.py meeting_2024_01_15.mp4

# Transcribe an audio file
python main.py call_with_client.wav

# Process multiple files in sequence
python main.py file1.mp4 file2.wav file3.m4a

# Use existing transcription (cost-saving mode)
python main.py -t output/meeting_transcription.txt

# Show help
python main.py --help
```

Supported formats:

- Audio: `.wav`, `.mp3`, `.m4a`, `.flac`, `.aac`, `.ogg`, `.wma`
- Video: `.mp4`, `.mov`, `.avi`, `.mkv`, `.wmv`, `.flv`, `.webm`, `.m4v`
- Transcriptions: `.txt` (with `-t`/`--transcription` flag)
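As a rough illustration, routing an input file by extension could look like this (a hypothetical helper, not the actual main.py code; the extension sets are taken from the lists above):

```python
# Hypothetical helper: decide how an input file should be handled.
from pathlib import Path

AUDIO = {".wav", ".mp3", ".m4a", ".flac", ".aac", ".ogg", ".wma"}
VIDEO = {".mp4", ".mov", ".avi", ".mkv", ".wmv", ".flv", ".webm", ".m4v"}

def input_kind(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext in AUDIO:
        return "audio"
    if ext in VIDEO:
        return "video"  # video is converted to audio (ffmpeg) before transcription
    if ext == ".txt":
        return "transcription"  # used with -t/--transcription
    raise ValueError(f"Unsupported format: {ext}")

print(input_kind("meeting_2024_01_15.mp4"))  # video
```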
The tool generates files in the output/ folder:
- `filename_transcription.txt` - Complete transcription with timestamps (not generated in `-t` mode)
- `filename_tasks.md` - Structured AI-powered analysis:
  - Executive Summary - Concise 2-3 sentence summary of concrete decisions
  - Action Items / To-Do List - Tasks organized by:
    - Category (development, testing, documentation, infrastructure, meeting, other)
    - Priority (high 🔴, medium 🟡, low 🟢)
    - Detailed description, responsible party, deadline, and context
    - Markdown checkbox format for tracking: `- [ ] Task`
  - Decisions Made - List of concrete decisions
  - Next Steps - Identified future actions
  - Additional Notes - Technical references, links, relevant information
Intelligent Filtering: The AI distinguishes between generic discussions and actionable tasks, automatically ignoring casual conversation and including only concrete commitments.
```bash
# API Configuration
OPENAI_API_KEY=your_api_key_here
OPENAI_ORG_ID=your_org_id_here  # Optional

# ============================================
# PRESET CONFIGURATIONS (2025 Optimized)
# ============================================

# PRESET 1: PREMIUM (Maximum Quality) ⭐ DEFAULT
# Cost: $0.367/hour | Quality: 10/10
TRANSCRIPTION_MODEL=gpt-4o-transcribe
ANALYSIS_MODEL=gpt-5-mini

# PRESET 2: RECOMMENDED (Best Value)
# Cost: $0.181/hour | Quality: 9/10 (51% cheaper)
# TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
# ANALYSIS_MODEL=gpt-5-nano

# PRESET 3: BALANCED
# Cost: $0.187/hour | Quality: 9.5/10
# TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
# ANALYSIS_MODEL=gpt-5-mini
# ============================================

# Language for transcription (ISO-639-1 format, improves accuracy)
# it=Italian, en=English, es=Spanish, fr=French, de=German
LANGUAGE=en

# Speaker Diarization (optional, requires pyannote-audio)
ENABLE_DIARIZATION=false  # Identify speakers (true/false)
NUM_SPEAKERS=2            # Number of speakers (if diarization=true)

# Processing Configuration
MAX_RETRIES=3             # Retry attempts on errors
MAX_PARALLEL_TASKS=3      # Parallel transcription workers
SIZE_LIMIT_MB=20          # File size limit for chunking (MB)
```

| Preset | Transcription | Analysis | Cost/hour | Quality | When to Use |
|---|---|---|---|---|---|
| Premium ⭐ | gpt-4o-transcribe | gpt-5-mini | $0.367 | 10/10 | Default - Critical meetings, maximum accuracy |
| Recommended | gpt-4o-mini-transcribe | gpt-5-nano | $0.181 | 9/10 | Limited budget, frequent use (51% savings) |
| Balanced | gpt-4o-mini-transcribe | gpt-5-mini | $0.187 | 9.5/10 | Complex task analysis, budget-conscious |
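At runtime these presets are just environment variables; reading them could be sketched like this (a stdlib-only illustration — the project itself loads `.env` via python-dotenv, and `env_int` is a hypothetical helper, not the actual main.py code):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    try:
        return int(os.environ.get(name, default))
    except ValueError:
        return default

# Defaults mirror the Premium preset shown in env.example above.
TRANSCRIPTION_MODEL = os.environ.get("TRANSCRIPTION_MODEL", "gpt-4o-transcribe")
ANALYSIS_MODEL = os.environ.get("ANALYSIS_MODEL", "gpt-5-mini")
MAX_RETRIES = env_int("MAX_RETRIES", 3)
MAX_PARALLEL_TASKS = env_int("MAX_PARALLEL_TASKS", 3)
SIZE_LIMIT_MB = env_int("SIZE_LIMIT_MB", 20)
```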
Transcription Models:
| Model | Price | WER | Quality | Notes |
|---|---|---|---|---|
| gpt-4o-transcribe ⭐ | $0.006/min | 2.46% | 10/10 | Near-human accuracy, excellent for difficult audio |
| gpt-4o-mini-transcribe | $0.003/min | 8.9% | 9/10 | Excellent quality/price ratio, 50% cheaper |
| whisper-1 (legacy) | $0.006/min | 7.88% | 7/10 | Deprecated, no advantage vs new models |
WER = Word Error Rate (lower is better)
Analysis Models:
| Model | Input/Output (per 1M tokens) | MMLU | Quality | Notes |
|---|---|---|---|---|
| gpt-5-mini ⭐ | $0.25 / $2.00 | ~88% | 10/10 | Advanced reasoning, excellent for complex tasks |
| gpt-5-nano | $0.05 / $0.40 | ~82% | 9/10 | Perfect for structured task extraction |
| gpt-4o-mini | $0.15 / $0.60 | 82% | 8/10 | Economical alternative, proven and reliable |
MMLU = Massive Multitask Language Understanding (higher is better)
Costs depend on recording length.
- Transcription: gpt-4o-transcribe ($0.006/min)
- Analysis: gpt-5-mini ($0.25/$2.00 per 1M tokens)
- Total 1 hour: $0.367 (maximum quality, WER 2.46%)
| Component | Model | Calculation | Cost |
|---|---|---|---|
| Transcription | gpt-4o-transcribe | 60 min × $0.006/min | $0.360 |
| Analysis | gpt-5-mini | 12,500 input × $0.25/1M + 1,750 output × $2.00/1M | $0.003 + $0.004 |
| TOTAL | | | $0.367 |
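The totals above are simple arithmetic; reproducing the Premium calculation (token counts are the estimates from the table, prices as listed):

```python
minutes = 60
transcription = minutes * 0.006              # gpt-4o-transcribe: $0.006/min -> $0.360
input_tokens, output_tokens = 12_500, 1_750  # typical 1-hour meeting transcript
analysis = input_tokens * 0.25 / 1e6 + output_tokens * 2.00 / 1e6  # ~$0.007
total = transcription + analysis
print(f"${total:.3f}")  # $0.367
```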
| Preset | Transcription | Analysis | Total | Quality | Savings |
|---|---|---|---|---|---|
| Premium ⭐ (default) | gpt-4o-transcribe | gpt-5-mini | $0.367 | 10/10 | - |
| Recommended | gpt-4o-mini-transcribe | gpt-5-nano | $0.181 | 9/10 | 51% |
| Balanced | gpt-4o-mini-transcribe | gpt-5-mini | $0.187 | 9.5/10 | 49% |
| Legacy (old) | whisper-1 | gpt-4o-mini | $0.363 | 7/10 | 1% |
Cost Notes:
- Premium preset offers maximum quality (WER 2.46% vs 8.9%)
- Recommended preset saves 51% while maintaining excellent quality (9/10)
- Costs calculated for 1-hour meetings (~12,500 input tokens, 1,750 output)
- For frequent use, consider Recommended preset to optimize costs
- Use the `-t` flag with existing transcriptions to save 96% on re-processing
The tool automatically handles large files:
- Automatic chunking: Files > 20MB are split into chunks
- Parallel processing: Multiple chunks processed simultaneously
- Timeline reconstruction: Timestamps preserved in final output
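The chunk-and-parallelize strategy can be sketched as follows (illustrative only; `transcribe_chunk` is a stand-in for the real per-chunk API call, not the actual main.py code):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_TASKS = 3  # mirrors the .env setting

def transcribe_chunk(index: int) -> str:
    # Stand-in for the real API call on one audio chunk.
    return f"[chunk {index}]"

def transcribe_all(chunks: list) -> str:
    # pool.map() yields results in input order even though the work runs
    # in parallel -- that is what lets the timeline be reconstructed.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_TASKS) as pool:
        return "\n".join(pool.map(transcribe_chunk, chunks))

print(transcribe_all([0, 1, 2, 3]))
```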
```
RecordingToTasks/
├── main.py            # Main script
├── requirements.txt   # Python dependencies
├── setup.sh           # Automatic installation script
├── test_setup.py      # Setup verification test
├── .env               # Configuration (not committed)
├── env.example        # Configuration template
├── README.md          # Documentation
├── CLAUDE.md          # Technical documentation
├── .gitignore         # Files to ignore
├── venv/              # Virtual environment
├── temp/              # Temporary files
├── output/            # Output files
└── tests/             # Test scripts and samples
```
- openai: OpenAI API client
- python-dotenv: Environment variable management
- ffmpeg: Audio/video processing (external dependency)
- pyannote.audio (optional): Speaker diarization to identify who is speaking
- Requires Hugging Face account and token
- Enable with `ENABLE_DIARIZATION=true` in the `.env` file
- Fork the repository
- Create a feature branch: `git checkout -b feature/feature-name`
- Commit your changes: `git commit -am 'Add new feature'`
- Push the branch: `git push origin feature/feature-name`
- Open a Pull Request
"ffmpeg not found"

```bash
# Verify installation
ffmpeg -version
# If not installed, follow the prerequisites
```

"OpenAI API key not found"

```bash
# Verify the .env file
cat .env
# Make sure the key is correct
```

"File too large"

- The tool automatically handles large files
- Increase `SIZE_LIMIT_MB` in `.env` if needed
Transcription errors
- The tool automatically retries with exponential backoff
- Check internet connection
- Verify OpenAI API rate limits
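The retry behavior follows the standard exponential-backoff pattern, roughly like this (a sketch, not the actual main.py code; `base_delay` is an illustrative parameter):

```python
import time

MAX_RETRIES = 3  # mirrors the .env setting

def with_retries(fn, *args, base_delay=1.0, **kwargs):
    """Call fn, retrying on failure with exponentially growing delays."""
    for attempt in range(MAX_RETRIES):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == MAX_RETRIES - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```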
JSON parsing errors
- Now includes enhanced diagnostics with `finish_reason` validation
- Increased token limit from 3000 to 8000 for long transcriptions
- Check logs for detailed error information
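The `finish_reason` check amounts to refusing to parse a truncated response, along these lines (a sketch using plain dicts in place of the OpenAI response objects; `parse_analysis` is hypothetical):

```python
import json

def parse_analysis(choice: dict) -> dict:
    # finish_reason "length" means the model hit the output token limit,
    # so the JSON is almost certainly truncated -- parsing it would fail
    # or, worse, silently drop tasks.
    if choice["finish_reason"] != "stop":
        raise ValueError(
            f"Incomplete response (finish_reason={choice['finish_reason']}); "
            "try raising the output token limit"
        )
    return json.loads(choice["message"]["content"])
```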
For more detailed debugging, temporarily modify main.py:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

MIT License - see LICENSE file for details
For bug reports or feature requests, open an issue on GitHub.
Note: This tool is optimized for Italian and English meetings. For other languages, you may need to modify the analysis prompts in main.py.