"Thinking Machine" is an interactive AI-powered media art installation that creates a symbiotic relationship between artificial intelligence and human presence. The system uses computer vision to detect human subjects and responds with visual and auditory feedback, creating a heartbeat-like pulse effect synchronized with philosophical reflections generated by a local language model.
- Real-time Person Detection: Uses YOLO for accurate human detection via camera input
- Dynamic Visual Effects: Apple image with pulsing heartbeat animation triggered by human presence
- AI-Generated Philosophy: Local LLM (Ollama + TinyLlama) creates philosophical maxims from observed objects
- Immersive Audio: Synchronized heartbeat sound effects with volume optimization
- Fullscreen Experience: Borderless fullscreen display for exhibition environments
In addition, face detection and speech examples are included:
- Face detector: Detects facial components such as lips and eyes
- Speech to text and text to speech: Converts between spoken audio and text
- PersonDetector: YOLO-based computer vision for human detection
- HeartbeatAudio: Audio management with MP3/WAV support
- LLMProcessor: Local language model integration via Ollama
- Visual Engine: Pygame-based rendering with real-time effects
- Python: 3.8 or higher
- Hardware:
  - NVIDIA GPU recommended (for YOLO acceleration)
  - USB camera or webcam
  - Audio output device
- Storage: ~2GB for models and dependencies
- RAM: Minimum 4GB, recommended 8GB+
Install Ollama and download the TinyLlama model:

```bash
# Install Ollama (see https://ollama.ai)
ollama serve

# Download the language model
ollama pull tinyllama
```

Install the Python dependencies:

```bash
pip install -r requirements.txt
```

Ensure these files are in the project directory:

- `apple.png` - The main image for visual effects
- `heartbeat.mp3` - Audio file for heartbeat sound
Run the application:

```bash
python thinking_machine.py
```

- ESC Key: Exit the application
- Camera Detection: Automatic - system activates when person occupies >10% of screen
- Audio: Automatic - heartbeat plays when human presence detected
Key parameters in the source code:
```python
HEARTBEAT_BPM = 40            # Heartbeat frequency
PULSE_DEPTH = 0.15            # Visual pulse intensity
PERSON_AREA_THRESHOLD = 0.1   # Detection sensitivity (10%)
```

Detection pipeline (a code sketch follows the list):

- Camera Input → OpenCV capture
- Object Detection → YOLO v8 person detection
- Area Calculation → Person-to-screen ratio analysis
- Trigger Logic → Heartbeat activation threshold
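A minimal sketch of this detection and trigger step, assuming the ultralytics YOLOv8 API and OpenCV; the exact structure of the PersonDetector class may differ:

```python
import cv2
from ultralytics import YOLO

PERSON_AREA_THRESHOLD = 0.1       # person must cover >10% of the frame
model = YOLO("yolov8n.pt")        # pretrained model name is an assumption
cap = cv2.VideoCapture(0)

ok, frame = cap.read()
if ok:
    frame_area = frame.shape[0] * frame.shape[1]
    results = model(frame, verbose=False)[0]

    # Sum the bounding-box area of every detected person (COCO class 0).
    person_area = 0.0
    for box in results.boxes:
        if int(box.cls[0]) == 0:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            person_area += (x2 - x1) * (y2 - y1)

    # Trigger the heartbeat when the person-to-screen ratio crosses the threshold.
    heartbeat_active = (person_area / frame_area) > PERSON_AREA_THRESHOLD
    print("heartbeat active:", heartbeat_active)

cap.release()
```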
Visual pipeline (a code sketch follows the list):

- Base Image → apple.png rendering
- Heartbeat Function → Mathematical pulse simulation
- Image Scaling → Dynamic resize based on pulse
- Color Tinting → Blue/red overlay effects
- Screen Composition → Centered fullscreen display
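A sketch of the pulse simulation; the exact waveform in thinking_machine.py is not documented here, so the "lub-dub" shape below is only an illustrative assumption built from the configuration values above:

```python
import math

HEARTBEAT_BPM = 40    # beats per minute (see configuration above)
PULSE_DEPTH = 0.15    # maximum relative size change of the image

def pulse_scale(t: float) -> float:
    """Return an image scale factor in [1.0, 1.0 + PULSE_DEPTH] at time t (seconds).

    Models a simple "lub-dub" beat: a strong peak followed by a weaker one
    within each heartbeat period.
    """
    period = 60.0 / HEARTBEAT_BPM                         # seconds per beat
    phase = (t % period) / period                         # 0..1 within the beat
    lub = math.exp(-((phase - 0.10) ** 2) / 0.002)        # first, stronger peak
    dub = 0.6 * math.exp(-((phase - 0.35) ** 2) / 0.002)  # second, weaker peak
    return 1.0 + PULSE_DEPTH * min(1.0, lub + dub)

# Example use with pygame (apple_img is the loaded apple.png surface):
# scale = pulse_scale(time.time())
# size = (int(apple_img.get_width() * scale), int(apple_img.get_height() * scale))
# frame_img = pygame.transform.smoothscale(apple_img, size)
```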
Text pipeline (a code sketch follows the list):

- Object Classification → YOLO class detection
- Text Generation → Format: "class[count: confidence]"
- LLM Processing → Philosophical maxim generation
- Text Rendering → Overlay on visual output
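A minimal sketch of the maxim-generation step against Ollama's local HTTP API on localhost:11434, using the requests package; the prompt wording is an assumption, not the exact prompt used by LLMProcessor:

```python
import requests

def generate_maxim(observation: str) -> str:
    """Ask the local TinyLlama model for a short philosophical maxim about
    what the camera sees, e.g. the string "person[1: 0.87]"."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "tinyllama",
            "prompt": f"Write one short philosophical maxim about: {observation}",
            "stream": False,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["response"].strip()

print(generate_maxim("person[1: 0.87]"))
```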
If the camera is not detected:

- Check if other applications are using the camera
- Try different camera indices (0, 1, 2)
- Verify camera drivers are installed
If the LLM does not respond:

- Ensure the Ollama server is running: `ollama serve`
- Verify the model is downloaded: `ollama list`
- Check network connectivity to localhost:11434
If audio does not play:

- Convert MP3 to WAV if pygame doesn't support MP3 (a conversion snippet follows this list)
- Check system audio settings
- Verify audio file exists in project directory
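One possible conversion uses pydub, which is not listed in requirements.txt and needs ffmpeg installed, so treat it as an optional workaround:

```python
# Optional workaround: convert heartbeat.mp3 to WAV with pydub (requires ffmpeg).
from pydub import AudioSegment

AudioSegment.from_mp3("heartbeat.mp3").export("heartbeat.wav", format="wav")
```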
To improve performance:

- Use a CUDA-enabled GPU for YOLO acceleration
- Reduce camera resolution for better FPS (see the sketch after this list)
- Adjust detection confidence thresholds
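A short tuning sketch for the last two items, assuming OpenCV capture properties and the ultralytics confidence/class filters; the values shown are starting points, not the project's defaults:

```python
import cv2
from ultralytics import YOLO

cap = cv2.VideoCapture(0)
# Lower the capture resolution to raise the frame rate.
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

model = YOLO("yolov8n.pt")
ok, frame = cap.read()
if ok:
    # Raise the confidence threshold and restrict detection to persons (class 0)
    # to cut false positives and per-frame work.
    results = model(frame, conf=0.5, classes=[0], verbose=False)

cap.release()
```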
Project structure:

```
f:\projects\art_ai\
├── thinking_machine.py   # Main application
├── requirements.txt      # Python dependencies
├── README.md             # Documentation
├── apple.png             # Visual asset
└── heartbeat.mp3         # Audio asset
```
Exhibition setup:

- Mount camera at eye level facing the exhibition space
- Connect to display monitor/projector
- Ensure audio output is properly configured
- Test detection range and sensitivity
Pre-exhibition checklist:

- Install all dependencies
- Test camera detection: `python -c "import cv2; print(cv2.VideoCapture(0).isOpened())"`
- Verify the Ollama service: `curl http://localhost:11434/api/tags`
- Run a full system test before the exhibition
Venue calibration (a sketch follows this list):

- Detection Sensitivity: Adjust `PERSON_AREA_THRESHOLD` for the space size
- Audio Volume: Set an appropriate level for the venue
- Display Resolution: Configure for target screen/projector
- LLM Timing: 5-second minimum intervals prevent spam
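A small calibration sketch, assuming the pygame mixer is used for playback; the specific values are venue-dependent assumptions, not the project's defaults:

```python
import time
import pygame

# Venue-dependent calibration values (illustrative assumptions).
PERSON_AREA_THRESHOLD = 0.05   # lower for large spaces where visitors appear smaller
LLM_MIN_INTERVAL = 5.0         # minimum seconds between LLM queries, as noted above

pygame.mixer.init()
pygame.mixer.music.load("heartbeat.mp3")
pygame.mixer.music.set_volume(0.6)   # 0.0 (mute) .. 1.0 (full), set per venue

last_llm_query = 0.0

def llm_allowed() -> bool:
    """Rate-limit LLM queries to at most one every LLM_MIN_INTERVAL seconds."""
    global last_llm_query
    now = time.monotonic()
    if now - last_llm_query >= LLM_MIN_INTERVAL:
        last_llm_query = now
        return True
    return False
```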
Typical performance:

- YOLO inference: ~30-60ms per frame (GPU accelerated)
- LLM generation: ~2-5 seconds per query
- Audio latency: <100ms for responsive feedback
- Visual rendering: 60 FPS target framerate
Reliability features:

- Thread-safe LLM processing with daemon threads (see the sketch after this list)
- Camera error handling with fallback messages
- Audio system graceful degradation
- Fullscreen escape mechanism (ESC key)
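A sketch of the daemon-thread pattern mentioned in the first item, so a slow LLM query (2-5 seconds) never stalls the 60 FPS render loop; the queue-based hand-off is an assumption about how thinking_machine.py organizes it:

```python
import queue
import threading

requests_q = queue.Queue()   # observations waiting for the LLM
results_q = queue.Queue()    # generated maxims ready to render

def generate_maxim(observation: str) -> str:
    # Placeholder for the Ollama call sketched earlier in this README.
    return f"Every {observation} carries a question it cannot answer."

def llm_worker() -> None:
    """Background worker: blocks on the slow LLM call instead of the render loop."""
    while True:
        observation = requests_q.get()
        results_q.put(generate_maxim(observation))

# daemon=True: the thread dies with the main program, so ESC still exits cleanly.
threading.Thread(target=llm_worker, daemon=True).start()

# Inside the render loop: submit work and poll for results without blocking.
requests_q.put("person[1: 0.87]")
try:
    latest_maxim = results_q.get_nowait()
except queue.Empty:
    latest_maxim = None
```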
This project is developed for the AI x ART media exhibition 2025. Please contact the author for usage permissions and exhibition licensing.
Taewook Kang
Email: laputa99999@gmail.com
Project: Thinking Machine - AI Art Exhibition 2025