
Violence Detection

Detect violence in videos with frame-by-frame severity analysis. AI-powered moderation for streaming, social media, and user-generated content.

Accuracy: 91%
Avg. Speed: 10.0s
Per Minute: $0.0300
API Name: vlm-video-violence-detection

Bynn Video Violence Detection

The Bynn Video Violence Detection model analyzes videos to identify and classify violent content using advanced AI vision analysis. This model provides three-tier classification with temporal localization of violent events.

The Challenge

Video violence spreads virally and causes lasting harm. Graphic footage of assaults, accidents, and atrocities can traumatize viewers, trigger PTSD in survivors, and inspire copycat violence. Platforms face pressure from users, advertisers, and regulators to remove such content quickly—yet violence in video is harder to detect than in static images because it unfolds over time.

A fight scene in a movie differs from real-world assault footage. Sports violence is consensual; street violence is criminal. News coverage of conflicts serves public interest even when disturbing. Platforms need detection that understands these distinctions and provides precise timestamps, enabling rapid review of specific segments rather than entire videos.

For physical security, video violence detection transforms passive CCTV into active threat prevention. Real-time analysis of surveillance feeds can detect fights, assaults, or aggressive confrontations the moment they begin—alerting security teams to respond immediately. Schools, transit systems, entertainment venues, and public spaces can identify violent incidents in progress, enabling intervention that prevents escalation and saves lives.

Model Overview

When provided with a video, the detector classifies the overall violence level and provides precise timestamps for violent events. The model understands context, distinguishing between severe real-world violence and stylized or fictional depictions of conflict.

Achieving 91.0% accuracy, the model uses Bynn's Visual Language Model technology optimized for video analysis at 4 FPS to match training conditions and provide accurate temporal event detection.

How It Works

The model performs comprehensive video violence analysis:

  • Frame-by-frame analysis: Processes video at 4 FPS for optimal detection accuracy
  • Scene understanding: Evaluates context to determine if violence is real, staged, or fictional
  • Severity assessment: Distinguishes between graphic harm and stylized conflict
  • Temporal localization: Provides precise start and end times for violent events
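The "4 FPS" sampling step above amounts to picking one source frame per quarter second. The helper below is an illustration of that arithmetic only, not Bynn's internal sampler; the snap-to-preceding-frame choice is an assumption.

```python
def sampled_frame_indices(video_fps: float, duration_s: float, target_fps: float = 4.0):
    """Source-frame indices approximating uniform sampling at target_fps.

    Illustration only: one sample per 1/target_fps seconds, snapped to
    the preceding source frame. Not Bynn's actual implementation.
    """
    total_frames = int(video_fps * duration_s)
    n_samples = int(duration_s * target_fps)
    indices = []
    for k in range(n_samples):
        t = k / target_fps                      # target timestamp in seconds
        indices.append(min(int(t * video_fps), total_frames - 1))
    return indices
```

For a 2-second clip at 30 FPS, this yields 8 analyzed frames out of 60, which is why a 4 FPS analysis rate keeps processing time low on long videos.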

Response Structure

The API returns a structured JSON response containing:

  • violence: Classification level - "severe_violence", "light_violence", or "no_violence"
  • events: Array of detected violent events:
    • start: Start timestamp (mm:ss.ff format)
    • end: End timestamp (mm:ss.ff format)
    • label: Short description (2-4 words, e.g., "fist fight", "weapon drawn", "blood visible")
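Event timestamps arrive as `mm:ss.ff` strings. A small conversion helper makes them usable for seeking or clipping; reading `ff` as hundredths of a second is an assumption about the format, so adjust if your responses indicate otherwise.

```python
def parse_timestamp(ts: str) -> float:
    """Convert an "mm:ss.ff" timestamp to seconds.

    Assumes ff is hundredths of a second (an interpretation, not
    confirmed by the API docs).
    """
    minutes, rest = ts.split(":")
    return int(minutes) * 60 + float(rest)

def event_duration(event: dict) -> float:
    """Length of one detected violent event, in seconds."""
    return parse_timestamp(event["end"]) - parse_timestamp(event["start"])
```

For example, `parse_timestamp("01:23.50")` gives 83.5 seconds, so reviewers can jump straight to the flagged segment.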

Classification Levels

severe_violence

The video contains elements considered severe violence, including but not limited to:

  • Real physical fights causing visible injury or harm
  • Bloody wounds, injuries, or graphic harm
  • Assault, battery, or physical attacks on people
  • Weapons being used against people
  • Scenes of war, combat, or armed conflict
  • Torture, abuse, or cruel treatment
  • Dead bodies or severe physical trauma

light_violence

The video contains elements considered light violence, including but not limited to:

  • Exaggerated animated altercations between characters
  • Comedic fights without pain or damage
  • Groups of people in playful or non-lethal fighting
  • Blood shown without the wound being visible or detailed
  • Light injuries that don't imply suffering or distress
  • Cartoon or fictional violence without realistic consequences
  • Sports-related physical contact (boxing, wrestling, martial arts)

no_violence

The video contains no violence. Typical indicators include:

  • Characters appear in serene or conflict-free contexts
  • Absence of any blood, harm, or clashes
  • No hints of violence, confrontation, or struggle
  • Scenes that maintain a non-aggressive or calm tone
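One common way to wire the three tiers into a pipeline is a fixed policy table. The action names below are illustrative, not part of the API; unknown labels fail closed to human review.

```python
# Illustrative policy mapping the three classification levels to
# moderation actions. Action names are hypothetical.
POLICY = {
    "severe_violence": "remove_and_queue_human_review",
    "light_violence": "age_gate",
    "no_violence": "allow",
}

def moderation_action(violence_level: str) -> str:
    """Return the configured action, failing closed on unknown labels."""
    return POLICY.get(violence_level, "queue_human_review")
```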

Performance Metrics

Classification Accuracy: 91.0%
Average Response Time: 10,000 ms (10.0 s)
Max File Size: 100MB
Supported Formats: MP4, MOV, AVI, WebM, MKV
Analysis Frame Rate: 4 FPS

Use Cases

  • Social Media Moderation: Automatically flag or remove graphic violent content from video platforms
  • News & Media: Apply content warnings to graphic footage while preserving newsworthy material
  • Content Editing: Use event timestamps to identify and edit specific violent scenes
  • Gaming & Entertainment: Categorize video game footage and trailers for age ratings
  • Education Platforms: Filter violent content from educational video libraries
  • Security & Surveillance: Detect violent incidents in security camera footage

Known Limitations

Important Considerations:

  • Fictional Content: Highly realistic video game or movie content may be classified similarly to real violence
  • Cultural Context: Martial arts demonstrations or cultural practices may be flagged
  • Sports Content: Contact sports with visible injuries may be classified as light violence
  • Fast Action: Very rapid violent sequences may have slightly imprecise timestamps
  • Audio Not Analyzed: Violence classification is based on visual content only

Disclaimers

This model provides probability-based classifications, not definitive content judgments.

  • Screening Tool: Use as part of a broader content moderation strategy
  • Timestamp Review: Use event timestamps for efficient human review of flagged content
  • Context Matters: The same violent imagery may be appropriate in different contexts (news, documentaries, educational content)
  • Human Review: Severe violence detections should be reviewed by trained moderators
  • Moderator Welfare: Content moderators reviewing flagged content should have appropriate support resources

Best Practice: Use the events timeline to efficiently review flagged content and make informed moderation decisions.
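As an example of that workflow, the events timeline can drive clip extraction so reviewers watch only flagged segments with a little surrounding context. The sketch below just assembles ffmpeg argument lists; the use of ffmpeg, the flag ordering, and the hundredths-of-a-second reading of `mm:ss.ff` are all assumptions.

```python
def review_clip_commands(video_path: str, events: list, pad_s: float = 2.0):
    """Build ffmpeg argument lists that extract each flagged event,
    padded by pad_s seconds on both sides, for human review."""
    def to_seconds(ts):  # "mm:ss.ff" -> seconds (ff read as hundredths)
        m, rest = ts.split(":")
        return int(m) * 60 + float(rest)

    cmds = []
    for i, ev in enumerate(events):
        start = max(0.0, to_seconds(ev["start"]) - pad_s)
        end = to_seconds(ev["end"]) + pad_s
        out = f"review_{i}_{ev['label'].replace(' ', '_')}.mp4"
        cmds.append(["ffmpeg", "-ss", f"{start:.2f}", "-to", f"{end:.2f}",
                     "-i", video_path, "-c", "copy", out])
    return cmds
```

Each command can then be run with `subprocess.run`, producing one short clip per event instead of forcing moderators through the full video.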

API Reference

Version: 2601 (Jan 3, 2026)
Avg. Processing: 10.0s
Per Minute: $0.03
Required Plan: trial

Input Parameters

Vision Language Model for image/video understanding with reasoning

media_type (string)
  Type of media being sent: 'image' or 'video'. Auto-detected if not specified.
  Example: image

image_url (string)
  URL of image to analyze.
  Example: https://example.com/image.jpg

base64_image (string)
  Base64-encoded image data.

video_url (string)
  URL of video to analyze.
  Example: https://example.com/video.mp4

base64_video (string)
  Base64-encoded video data.
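A minimal request body for this model can be assembled from the parameters above. The helper only builds the JSON and enforces that exactly one media source is supplied; the endpoint URL, authentication, and transport are not shown because they depend on your Bynn account setup.

```python
def build_request(video_url: str = None, base64_video: str = None) -> dict:
    """Assemble a request body for the vlm-video-violence-detection model.

    Exactly one of video_url / base64_video must be provided.
    """
    if (video_url is None) == (base64_video is None):
        raise ValueError("provide exactly one of video_url or base64_video")
    body = {"model": "vlm-video-violence-detection", "media_type": "video"}
    if video_url is not None:
        body["video_url"] = video_url
    else:
        body["base64_video"] = base64_video
    return body
```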

Response Fields

Structured Violence Detection response

response (object)
  Structured response from the model.
  Object properties:
    • events (array): Detected violent events, each with start, end, and label
    • violence (string): One of "severe_violence", "light_violence", "no_violence"

thinking (string)
  Chain-of-thought reasoning from the model (may be empty)

Complete Example

Request

{
  "model": "vlm-video-violence-detection",
  "video_url": "https://example.com/video.mp4"
}

Response

{
  "inference_id": "inf_abc123def456",
  "model_id": "vlm_video_violence_detection",
  "model_name": "Violence Detection",
  "moderation_type": "video",
  "status": "completed",
  "result": {
    "response": {
      "events": null,
      "violence": "severe_violence"
    },
    "thinking": ""
  }
}
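When reading this response, note that `events` may be `null` even for a positive classification, as in the example above, so guard for it before iterating:

```python
import json

RAW = '''{
  "inference_id": "inf_abc123def456",
  "model_id": "vlm_video_violence_detection",
  "status": "completed",
  "result": {
    "response": {"events": null, "violence": "severe_violence"},
    "thinking": ""
  }
}'''

def summarize(raw: str):
    """Extract the violence level and a safe (never-None) event list."""
    data = json.loads(raw)
    resp = data["result"]["response"]
    return resp["violence"], resp.get("events") or []
```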

Additional Information

Rate Limiting
If we throttle your request, you will receive a 429 HTTP error code along with an error message. You should then retry with an exponential back-off strategy, meaning that you should retry after 4 seconds, then 8 seconds, then 16 seconds, etc.
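The retry schedule described above (4 s, then 8 s, then 16 s, and so on) can be sketched as follows; `send` and the 429-signalling exception are placeholders for whatever HTTP client you use.

```python
import time

class RateLimited(Exception):
    """Placeholder for your HTTP client's 429 error."""

def with_backoff(send, max_retries: int = 5, base_delay: float = 4.0,
                 sleep=time.sleep):
    """Call send(), retrying on 429 with exponential back-off: 4s, 8s, 16s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

The injectable `sleep` parameter keeps the helper testable without real delays.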
Supported Formats
mp4, mov, avi, webm, mkv
Maximum File Size
100MB
Tags: violence, safety, video, vlm, ai-analysis

Ready to get started?

Integrate Violence Detection into your application today with our easy-to-use API.