Fully automated Reddit story scraping → AI voice synthesis → viral short-form video generation

Features • Architecture • Installation • Usage • Tech Stack
ViralContent Factory is an end-to-end automated content generation system that transforms Reddit stories into professionally edited, viral-ready short-form videos for TikTok, YouTube Shorts, and Instagram Reels. The pipeline handles everything from content discovery to final video rendering with zero manual intervention.
- Fully Autonomous: Set it and forget it. The system runs via scheduled tasks (3 videos per batch)
- AI-Powered Intelligence: Multi-provider LLM router with automatic failover across 5+ AI services
- Production-Ready: Includes failover systems, persistent database, and email alerting
- Optimized Performance: Word-level subtitle sync, smart caching, and resource management
- Scalable Architecture: Modular phase-based design for easy extension and maintenance
- Smart LLM Routing: Automatic failover between Groq, Cerebras, Gemini, HuggingFace, and OpenRouter
- Multi-Source Scraping: Waterfall system across 30+ high-engagement subreddits (AITA, TIFU, TrueOffMyChest, confessions, pettyrevenge, etc.)
- Smart Filtering:
  - Language detection (English-only)
  - Optimal word count (120-380 words for 60-180 second videos)
  - Duplicate prevention via persistent JSON database
  - Automatic removal of deleted/removed posts
- AI Enhancement:
  - Multi-provider LLM router with automatic quota management
  - Gender detection for voice matching (fast models)
  - Viral hook generation with creative reasoning (strong models)
  - Hook A/B testing (AI-generated vs original title ranking)
  - Dynamic SEO tag generation (5 keywords per video)
  - Slang/acronym normalization (AITA → "Am I the jerk", 19F → "a 19 year old woman", etc.)
- Failover System: Falls back to local cold storage if all live sources fail
- Upload Automation: YouTube and Instagram automation modules (setup required)
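The slang/acronym normalization is essentially a lookup-and-replace pass run before narration. The table and helper below are an illustrative sketch; the real mapping lives in `phase1.py` and covers more cases:

```python
import re

# Hypothetical subset of the normalization table; patterns use word
# boundaries so "AITA" inside another word is left alone.
NORMALIZATIONS = {
    r"\bAITA\b": "Am I the jerk",
    r"\bTIFU\b": "Today I messed up",
    r"\b(\d{1,2})F\b": r"a \1 year old woman",
    r"\b(\d{1,2})M\b": r"a \1 year old man",
}

def normalize_slang(text: str) -> str:
    """Expand Reddit acronyms so the TTS voice reads them naturally."""
    for pattern, replacement in NORMALIZATIONS.items():
        text = re.sub(pattern, replacement, text)
    return text
```

Age/gender markers use a capture group, so "19F" and "32M" both expand with the age preserved.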
- Edge TTS Integration: Microsoft's neural voices for natural-sounding narration
- Dynamic Voice Selection: Gender-matched voices (3 female variants: Jenny/Michelle/Aria, 1 male: Christopher)
- Word-Level Timing: Precise timestamp extraction for perfect subtitle synchronization
- Sync Offset System: Configurable timing adjustment (-0.3s default) for perfect alignment
- Fallback Mechanisms: Sentence-level heuristics if word boundaries fail
- JSON Export: Word-by-word timing data saved for video compositor
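edge-tts streams `WordBoundary` events whose `offset` and `duration` fields are expressed in 100-nanosecond ticks. A sketch of the conversion into per-word records follows; the output shape is an assumption, not necessarily the exact JSON `phase2.py` writes:

```python
TICKS_PER_SECOND = 10_000_000  # edge-tts offsets/durations are 100 ns ticks

def boundaries_to_timings(events: list[dict]) -> list[dict]:
    """Convert raw WordBoundary events into {word, start, end} records in seconds."""
    timings = []
    for ev in events:
        if ev.get("type") != "WordBoundary":
            continue  # skip audio chunks and other event types
        start = ev["offset"] / TICKS_PER_SECOND
        end = (ev["offset"] + ev["duration"]) / TICKS_PER_SECOND
        timings.append({"word": ev["text"], "start": round(start, 3), "end": round(end, 3)})
    return timings
```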
- 9:16 Vertical Format: Optimized for mobile-first platforms
- Dynamic Background Selection: Random gameplay footage (Minecraft, GTA 5)
- Animated Subtitles:
  - Impact font with stroke for maximum readability
  - 3-word chunks with pop-in animations
  - Mathematically synced to word-level audio timestamps
  - Configurable sync offset for perfect timing
- Smart Cropping: Automatic center-crop from 16:9 to 9:16
- Random Start Points: Prevents repetitive background footage
- Test Mode: 10-second preview rendering for quick testing
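The smart crop boils down to a single width calculation: keep the full height and take a 9:16-wide slice from the middle. A hypothetical helper (not the exact `phase3.py` code):

```python
def center_crop_box(src_w: int, src_h: int, target_ratio: float = 9 / 16) -> tuple[int, int]:
    """Return the (x1, x2) bounds of a center crop with the target aspect ratio."""
    crop_w = round(src_h * target_ratio)  # width that gives 9:16 at full height
    x1 = (src_w - crop_w) // 2            # center the slice horizontally
    return x1, x1 + crop_w
```

For 1080p source footage this yields a 608-pixel-wide slice, which can then be cropped and scaled up to the final 1080x1920 frame.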
- Multi-Provider Architecture: Supports 5 AI providers with automatic failover
- Intelligent Task Routing:
  - Fast models (OpenRouter, HuggingFace, Gemini) for classification and tagging
  - Strong models (Groq, Cerebras) for creative writing and reasoning
- Quota Management: Automatically detects rate limits (429, 400 errors) and switches providers
- Error Recovery: Retry logic with provider fallback chain
- Cost Optimization: Routes cheap tasks to free tiers, expensive tasks to premium models
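Stripped to its core, the failover chain is a loop over provider callables that treats quota errors as "try the next one". A minimal sketch; the provider wrappers and the `QuotaExceeded` signal are illustrative stand-ins for the real `llm_router.py` internals:

```python
class QuotaExceeded(Exception):
    """Raised by a provider wrapper when it sees a 429/400 quota response."""

def route(prompt: str, providers: list) -> str:
    """Try each provider in order, falling through on quota errors."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except QuotaExceeded as exc:
            last_error = exc  # rate-limited: move on to the next provider
    raise RuntimeError("All providers exhausted") from last_error
```

Passing the fast-model list for classification tasks and the strong-model list for creative tasks gives the routing behavior described above.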
- Automated Cleanup: Removes temporary audio/JSON files after each run
- Batch Management: Collects 7+ videos before triggering upload alert
- Email Notifications: Gmail SMTP alerts when batch threshold reached
- Sanitized Filenames: OS-safe naming with Reddit ID-based uniqueness
- Error Handling: Comprehensive try-catch blocks with detailed logging
- Video Path Utilities: Batch processing helpers for upload automation
- Persistent Database: JSON-based story tracking with "used" flag system
- Sleep Prevention: Windows execution state management to prevent system sleep
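Filename sanitization can be sketched as stripping the characters Windows rejects and suffixing the Reddit post ID for uniqueness. This helper is illustrative; the exact scheme in the pipeline may differ:

```python
import re

def safe_filename(title: str, reddit_id: str, max_len: int = 60) -> str:
    """Build an OS-safe .mp4 filename from a post title and its Reddit ID."""
    stem = re.sub(r'[<>:"/\\|?*]', "", title).strip()  # drop reserved characters
    stem = re.sub(r"\s+", "_", stem)[:max_len]         # collapse whitespace to underscores
    return f"{stem}_{reddit_id}.mp4"
```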
```
┌─────────────────────────────────────────┐
│       MAIN PIPELINE ORCHESTRATOR        │
│           (main_pipeline.py)            │
│    Prevents system sleep during run     │
└──────┬──────────────────────────────────┘
       │
       ▼
┌───────────┐   ┌───────────┐   ┌────────────┐
│  Phase 1  │──▶│  Phase 2  │──▶│  Phase 3   │
│  Scraper  │   │   Audio   │   │   Video    │
│  +AI LLM  │   │  +Timing  │   │ Compositor │
└───────────┘   └───────────┘   └─────┬──────┘
                                      │
                                      ▼
                             ┌─────────────────┐
                             │    Cleanup &    │
                             │  Notification   │
                             └────────┬────────┘
                                      │
                                      ▼
                             ┌─────────────────┐
                             │     Upload      │
                             │   Automation    │
                             │  (Manual/API)   │
                             └─────────────────┘
```
```
AutoContent/
├── main_pipeline.py          # Orchestrator - coordinates all phases, prevents sleep
├── phase1.py                 # Content acquisition & AI processing (30+ subreddits)
├── phase2.py                 # Audio synthesis & word-level timestamp extraction
├── phase3.py                 # Video composition & subtitle rendering
├── llm_router.py             # Multi-provider LLM failover system (5 providers)
├── yt_downloader.py          # Background footage downloader (yt-dlp wrapper)
├── reminder.py               # Batch management & email alerts (7-video threshold)
├── yt_automation.py          # YouTube upload automation (OAuth setup required)
├── ig_login.py               # One-time run: log into Instagram once
├── insta_automation.py       # Instagram upload automation (Graph API setup required)
├── get_videopaths.py         # Video path utility for batch processing
├── run_factory.bat           # Windows Task Scheduler entry point (3 videos per run)
├── requirements.txt          # Python dependencies
├── scripts.json              # Persistent story database with "used" tracking
├── hidden_depedencies.txt    # System dependency checklist
├── TrendingDescription.txt   # Sample trending content reference
├── downloads/                # Background video assets (2 videos included)
├── reels/                    # Final rendered videos (staging area)
└── ready_to_upload/          # Batched videos ready for upload (7 videos)
```
| Category | Technology | Purpose |
|---|---|---|
| Language | Python 3.11+ | Core runtime |
| AI/LLM | Multi-Provider Router | Groq, Cerebras, Gemini, HuggingFace, OpenRouter |
| Voice Synthesis | Edge-TTS | Neural text-to-speech (streaming) |
| Video Processing | MoviePy 1.0.3 | Compositing & rendering |
| Image Processing | ImageMagick | Text rendering backend for subtitles |
| Web Scraping | Requests | Reddit JSON API interaction |
| NLP | langdetect | Language filtering |
| Video Download | yt-dlp | Background footage acquisition |
| Notifications | smtplib | Gmail SMTP alerts |
| Environment | python-dotenv | Secure credential management |
# Required System Dependencies
- Python 3.11 or higher
- FFmpeg (for audio/video processing)
- ImageMagick (for subtitle rendering)
- Deno or Node.js (for yt-dlp YouTube signature extraction)

```bash
git clone https://github.com/indiser/ViralContent-Factory.git
cd viralcontent-factory
pip install -r requirements.txt
```

Dependencies installed:
- requests
- python-dotenv
- langdetect
- edge-tts
- moviepy==1.0.3
- yt-dlp
- groq
- openai
- google-genai
- huggingface_hub
Windows (via winget):

```bash
winget install Gyan.FFmpeg
winget install ImageMagick.ImageMagick
winget install DenoLand.Deno
```

macOS (via Homebrew):

```bash
brew install ffmpeg imagemagick deno
```

Linux (Ubuntu/Debian):

```bash
sudo apt update
sudo apt install ffmpeg imagemagick
curl -fsSL https://deno.land/install.sh | sh
```

Create a `.env` file in the project root:
```env
# LLM API Keys (at least one required, more = better failover)
GROQ_API_KEY=your_groq_api_key_here
CEREBRAS_API_KEY=your_cerebras_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
HUGGINGFACE_API_KEY=your_huggingface_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Gmail SMTP (for batch notifications)
EMAIL_USER=your_email@gmail.com
EMAIL_APP_PASS=your_gmail_app_password
```

Note: For Gmail, you need to generate an App Password (not your regular password).
LLM Keys: You only need ONE API key to start, but having multiple provides better reliability through automatic failover
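At startup the pipeline only needs to know which providers actually have keys. A sketch of that check (the variable names match the `.env` sample above; the function name is hypothetical):

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv, listed in requirements.txt
    load_dotenv()                   # copy .env entries into the environment
except ImportError:
    pass                            # fall back to plain environment variables

def active_providers() -> list[str]:
    """Return the LLM provider names that have an API key configured."""
    names = ("GROQ", "CEREBRAS", "GEMINI", "HUGGINGFACE", "OPENROUTER")
    return [n for n in names if os.getenv(f"{n}_API_KEY")]
```

A router can refuse to start when `active_providers()` comes back empty, then order the remaining names by task type.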
```bash
python yt_downloader.py "https://youtube.com/watch?v=MINECRAFT_VIDEO_ID"
python yt_downloader.py "https://youtube.com/watch?v=GTA5_VIDEO_ID"
```

Or manually place 9:16 or 16:9 gameplay videos in the `downloads/` folder.
Current background videos:
- Insanely Crazy GTA 5 Mega Ramp Gameplay (4K 60fps)
- Minecraft Parkour Gameplay No Copyright (4K)
Edit `phase3.py` line 5 to match your ImageMagick installation:

```python
os.environ["IMAGEMAGICK_BINARY"] = r"C:\Program Files\ImageMagick-7.1.2-Q16-HDRI\magick.exe"
```

Edit `run_factory.bat` lines 5 and 17 to match your project location and Python installation:

```bat
cd /d "C:\Users\YOUR_USERNAME\Desktop\AutoContent"
"C:\Path\To\Your\python.exe" main_pipeline.py
```

To run the pipeline manually:

```bash
python main_pipeline.py
```

- Open Task Scheduler
- Create a new task:
  - Trigger: Daily at 3:00 AM (or your preferred time)
  - Action: Run `run_factory.bat`
- The system will automatically:
  - Generate 3 videos per run (configurable in batch script)
  - Collect videos until 7+ are ready
  - Send email alert when batch threshold is reached
Batch script configuration:
- Edit `run_factory.bat` line 9 to change the video count: `FOR /L %%A IN (1,1,3)` (change 3 to the desired count)
```bash
python reminder.py
```

This checks if 7+ videos are ready and moves them to the `ready_to_upload/` folder.
```bash
python get_videopaths.py
```

Returns absolute paths of all videos in `ready_to_upload/` for batch upload scripts.
1. [03:00 AM] Task Scheduler triggers run_factory.bat
2. [03:00:05] Phase 1 scrapes random subreddit from 30+ sources
3. [03:00:12] LLM Router tries OpenRouter → generates viral hook
4. [03:00:15] Gender detected: Female → Voice: en-US-AriaNeural (random from 3 variants)
5. [03:00:18] Hook ranking: AI vs Original → Winner selected
6. [03:00:22] SEO tags generated: ["reddit", "storytime", "drama", ...]
7. [03:00:45] Phase 2 generates audio + word-level timestamps
8. [03:01:30] Phase 3 renders vertical video with animated subtitles
9. [03:02:00] Cleanup removes temporary audio/JSON files
10. [03:02:05] Loop repeats 2 more times (3 videos total per run)
11. [03:06:15] Reminder script checks inventory (9/7 videos)
12. [03:06:20] Email sent: "FACTORY ALERT: Weekly Batch Ready"
13. [03:06:25] 9 videos moved to ready_to_upload/ folder
14. [Manual] Run upload automation scripts or manual upload
Edit `phase1.py` lines 30-65:

```python
SUBREDDITS = [
    "AmItheAsshole",
    "AITAH",
    "YourNewSubreddit",  # Add here
]
```

Current subreddits (30+): AmItheAsshole, AITAH, TrueOffMyChest, confessions, confession, tifu, pettyrevenge, entitledparents, MaliciousCompliance, EntitledPeople, relationships, relationship_advice, Vent, stories, moraldilemmas, self, PointlessStories, TwoHotTakes, dating, offmychest, UnsentLetters, SeriousConversation, Adulting, lonely, BreakUps, TalesFromTheFrontDesk, legaladvice, RBI, UnresolvedMysteries, Glitch_in_the_Matrix, raisedbynarcissists, dadjokes, Jokes
Edit `phase2.py` lines 6-10:

```python
WOMAN_VOICE_LIST = [
    "en-US-JennyNeural",
    "en-US-MichelleNeural",
    "en-US-AriaNeural",
    "en-GB-SoniaNeural",  # Add British accent
]
```

The male voice is set on line 19: `"en-US-ChristopherNeural"`.
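The gender-matched pick itself is tiny. A sketch using the voices listed above; the selection logic here is illustrative, not the exact `phase2.py` code:

```python
import random

WOMAN_VOICE_LIST = ["en-US-JennyNeural", "en-US-MichelleNeural", "en-US-AriaNeural"]
MAN_VOICE = "en-US-ChristopherNeural"

def pick_voice(gender: str) -> str:
    """Female narrators get a random variant; male narrators use the single voice."""
    if gender.lower() == "female":
        return random.choice(WOMAN_VOICE_LIST)
    return MAN_VOICE
```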
Edit `phase1.py` line 175:

```python
if 120 < len(words) < 380:  # Change word count range (current: ~60-180 seconds)
```

Edit `phase3.py` lines 33-44:
```python
txt_clip = TextClip(
    chunk_text,
    font="Impact",           # Change font
    fontsize=85,             # Adjust size
    color="white",           # Change color
    stroke_color="black",    # Outline color
    stroke_width=5,          # Outline thickness
    method="caption",
    size=(video_width * 0.9, None)
)
```

Edit `phase3.py` line 20:
```python
chunk_size = 3  # Words per subtitle (current: 3 words)
```

If subtitles appear too early or late, edit `phase2.py` line 15:
```python
SYNC_OFFSET = -0.3  # Negative = earlier, Positive = later
```

Edit `llm_router.py` lines 125-127:
```python
CHEAP_PROVIDERS = [openrouter_chat, hf_chat, gemini_chat]
STRONG_PROVIDERS = [groq_chat, cerebras_chat]
```

Edit `main_pipeline.py` line 18:
```python
TEST_MODE = True  # Renders only first 10 seconds
```

**ImageMagick not found**

Solution: Update the path in `phase3.py` line 5 to match your installation.
**No suitable stories found**

Solution: The subreddit may have no posts matching the criteria. The system automatically tries the next subreddit in the randomized list.
**FFmpeg not found**

Solution: Ensure FFmpeg is in your system PATH. Run `ffmpeg -version` to verify. The `yt_downloader.py` script includes dependency checks.
**Email notifications not sending**

Solution:
- Enable 2FA on Gmail
- Generate an App Password
- Use the App Password in `.env`, not your regular password
**All LLM providers failing**

Solution:
- Check that at least one API key is valid in `.env`
- Verify API quotas haven't been exceeded
- Check your internet connection
- The router automatically tries all 5 providers before failing
**Word-level timestamps unavailable**

Solution: The system automatically falls back to sentence-level timing. This is expected behavior for some voices.
**Background video download failing**

Solution: Install Deno or Node.js for YouTube signature extraction. The script checks dependencies automatically.
- Average Runtime: 2-3 minutes per video (single-threaded)
- Batch Runtime: ~6-9 minutes for 3 videos (run_factory.bat default)
- Video Quality: 1080x1920 @ 30fps (9:16 vertical)
- Audio Quality: Edge TTS neural voices (streaming)
- Storage: ~15-25MB per final video
- Success Rate: 95%+ (with multi-subreddit + LLM failover)
- LLM Failover: <2 seconds between provider switches
- Subtitle Sync: ±0.3s accuracy with configurable offset
- Content Sources: 30+ subreddits with randomized selection
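The ±0.3s subtitle figure comes from grouping the word-level timings into 3-word chunks and shifting each chunk by the sync offset. A sketch of that step, assuming `{word, start, end}` records; the exact `phase3.py` code may differ:

```python
def chunk_subtitles(timings: list[dict], chunk_size: int = 3, offset: float = -0.3) -> list[dict]:
    """Group word timings into subtitle chunks and apply the sync offset."""
    chunks = []
    for i in range(0, len(timings), chunk_size):
        group = timings[i:i + chunk_size]
        chunks.append({
            "text": " ".join(w["word"] for w in group),
            "start": max(0.0, group[0]["start"] + offset),  # clamp so nothing starts before 0
            "end": group[-1]["end"] + offset,
        })
    return chunks
```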
- ✅ No user data collection
- ✅ API keys stored in `.env` (gitignored)
- ✅ Reddit scraping complies with API terms
- ✅ All content is publicly posted (Reddit posts)
- ✅ No personal information in generated videos
- ✅ Multi-provider LLM routing prevents vendor lock-in
- Multi-provider LLM router with automatic failover (5 providers)
- Batch video management system (7-video threshold)
- Word-level subtitle synchronization with timing offset
- Hook A/B testing (AI vs Original title ranking)
- Dynamic SEO tag generation
- Gender-based voice selection
- Automated cleanup system
- Email notification system
- YouTube upload automation (OAuth setup required)
- Instagram Reels upload automation
- TikTok upload automation (no official API - Selenium needed)
- Thumbnail generation with text overlay
- Analytics dashboard (views, engagement tracking)
- GPU-accelerated rendering (NVENC support)
- Cloud deployment (AWS Lambda + S3)
- Web UI for manual overrides
- Multi-language support (Spanish, French, etc.)
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Reddit API - Content source
- Microsoft Edge TTS - Neural voice synthesis
- Groq, Cerebras, Gemini, HuggingFace, OpenRouter - LLM infrastructure
- MoviePy - Video processing framework
- yt-dlp - Video download utility
Project Link: https://github.com/indiser/ViralContent-Factory