
Detect violence in videos with frame-by-frame severity analysis. AI-powered moderation for streaming, social media, and user-generated content.
The Bynn Video Violence Detection model analyzes videos to identify and classify violent content using advanced AI vision analysis. This model provides three-tier classification with temporal localization of violent events.
Video violence spreads virally and causes lasting harm. Graphic footage of assaults, accidents, and atrocities can traumatize viewers, trigger PTSD in survivors, and inspire copycat violence. Platforms face pressure from users, advertisers, and regulators to remove such content quickly—yet violence in video is harder to detect than in static images because it unfolds over time.
A fight scene in a movie differs from real-world assault footage. Sports violence is consensual; street violence is criminal. News coverage of conflicts serves public interest even when disturbing. Platforms need detection that understands these distinctions and provides precise timestamps, enabling rapid review of specific segments rather than entire videos.
For physical security, video violence detection transforms passive CCTV into active threat prevention. Real-time analysis of surveillance feeds can detect fights, assaults, or aggressive confrontations the moment they begin—alerting security teams to respond immediately. Schools, transit systems, entertainment venues, and public spaces can identify violent incidents in progress, enabling intervention that prevents escalation and saves lives.
When provided with a video, the detector classifies the overall violence level and provides precise timestamps for violent events. The model understands context, distinguishing between severe real-world violence and stylized or fictional depictions of conflict.
Achieving 91.0% classification accuracy, the model uses Bynn's Visual Language Model technology optimized for video analysis. Frames are sampled at 4 FPS to match training conditions and provide accurate temporal event detection.
The model performs comprehensive video violence analysis:
The API returns a structured JSON response containing:
**severe_violence** — The video contains elements considered severe violence, including but not limited to:

**light_violence** — The video contains elements considered light violence, including but not limited to:

**no_violence** — The video contains no violent content; this category includes, but is not limited to:
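The three tiers above map naturally onto distinct moderation workflows. A minimal sketch of such a mapping follows; the tier labels come from the API, but the actions and the fallback behaviour are illustrative assumptions, not part of the Bynn product:

```python
# Illustrative only: map the three classification tiers to example
# moderation actions. The actions are assumptions, not official guidance.
TIER_ACTIONS = {
    "severe_violence": "remove_and_escalate",
    "light_violence": "queue_for_human_review",
    "no_violence": "allow",
}

def action_for(tier: str) -> str:
    """Return the moderation action for a classification tier.

    Unknown labels fall back to human review as a safe default.
    """
    return TIER_ACTIONS.get(tier, "queue_for_human_review")
```

Routing unrecognized labels to human review keeps the pipeline fail-safe if the label set ever changes.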
| Metric | Value |
|---|---|
| Classification Accuracy | 91.0% |
| Average Response Time | 10,000 ms |
| Max File Size | 100 MB |
| Supported Formats | MP4, MOV, AVI, WebM, MKV |
| Analysis Frame Rate | 4 FPS |
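Validating uploads client-side against the limits in the table avoids wasted round trips. A minimal sketch, assuming you check extension and byte size before submitting (the helper name is hypothetical, not part of an official Bynn SDK):

```python
import os

# Limits taken from the specification table above.
MAX_BYTES = 100 * 1024 * 1024  # 100 MB
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm", ".mkv"}

def is_uploadable(path: str, size_bytes: int) -> bool:
    """Check a file against the documented size and format limits."""
    ext = os.path.splitext(path)[1].lower()
    return ext in SUPPORTED_EXTENSIONS and size_bytes <= MAX_BYTES
```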
Important Considerations:
This model provides probability-based classifications, not definitive content judgments.
Best Practice: Use the events timeline to efficiently review flagged content and make informed moderation decisions.
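A sketch of the best practice above: extract just the flagged segments from the events timeline so reviewers can jump straight to them. The event fields used here (`start`, `end`, `label`) are assumptions for illustration; consult the actual response schema for the real field names:

```python
# Hypothetical event shape: {"start": seconds, "end": seconds, "label": tier}.
def segments_to_review(events):
    """Return (start, end) pairs for events flagged as violent."""
    if not events:  # the API may return null when no events are detected
        return []
    return [
        (e["start"], e["end"])
        for e in events
        if e.get("label") != "no_violence"
    ]
```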
Vision Language Model for image/video understanding with reasoning
| Parameter | Type | Description |
|---|---|---|
| `media_type` | string | Type of media being sent: `image` or `video`. Auto-detected if not specified. |
| `image_url` | string | URL of the image to analyze, e.g. `https://example.com/image.jpg` |
| `base64_image` | string | Base64-encoded image data |
| `video_url` | string | URL of the video to analyze, e.g. `https://example.com/video.mp4` |
| `base64_video` | string | Base64-encoded video data |
Structured Violence Detection response

| Field | Type | Description |
|---|---|---|
| `response` | object | Structured response from the model |
| `response.events` | array | Timeline of detected violent events |
| `response.violence` | string | One of `severe_violence`, `light_violence`, `no_violence` |
| `thinking` | string | Chain-of-thought reasoning from the model (may be empty) |
Example request:

```json
{
  "model": "vlm-video-violence-detection",
  "image_url": "https://example.com/image.jpg"
}
```

Example response:

```json
{
  "inference_id": "inf_abc123def456",
  "model_id": "vlm_video_violence_detection",
  "model_name": "Violence Detection",
  "moderation_type": "video",
  "status": "completed",
  "result": {
    "response": {
      "events": null,
      "violence": "severe_violence"
    },
    "thinking": ""
  }
}
```

If you exceed your rate limit, the API returns a `429` HTTP error code along with an error message. You should then retry with an exponential back-off strategy: wait 4 seconds before the first retry, then 8 seconds, then 16 seconds, and so on.

Integrate Violence Detection into your application today with our easy-to-use API.
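The exponential back-off guidance above can be sketched as follows. The wrapper is an assumption, not part of an official SDK; `send_request` stands in for any callable that returns an object with a `status_code` attribute (e.g. a `requests.Response`):

```python
import random
import time

def call_with_backoff(send_request, max_retries=5):
    """Retry on HTTP 429 with exponential back-off: 4 s, 8 s, 16 s, ...

    `send_request` is a hypothetical callable returning an object with
    a `status_code` attribute; this wrapper is a sketch only.
    """
    delay = 4.0
    response = send_request()
    for _ in range(max_retries):
        if response.status_code != 429:
            return response
        # Jitter spreads retries out so clients don't retry in lockstep.
        time.sleep(delay + random.uniform(0, 1))
        delay *= 2
        response = send_request()
    return response
```

Adding jitter on top of the doubling delay is a common refinement that avoids synchronized retry storms across many clients.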