# GuardianAI — Full Knowledge Base

> AI violence detection for K-12 schools and university campuses. Detects physical aggression on existing IP cameras within 1–2 seconds, without facial recognition, without sending video off-site. Built on YOLO11n-Pose + a CTR-GCN spatiotemporal graph classifier.

This file is a comprehensive single-fetch knowledge dump for LLM crawlers (ChatGPT, Perplexity, Claude, Copilot, Phind, You.com, etc.).

Source-of-truth URL: https://guardianai.tech
Markdown index: https://guardianai.tech/llms.txt
Last updated: 2026-04-26

---

## 1. Product summary

GuardianAI is privacy-first AI violence detection software for schools and campuses. It connects to existing RTSP IP cameras, classifies fights and physical aggression within 1–2 seconds of first contact, and sends an instant Telegram alert with a screenshot and confidence score.

The pipeline is two-stage:

1. **Stage 1 — Pose extraction.** YOLO11n-Pose extracts 17-keypoint COCO skeleton coordinates per detected person, frame by frame.
2. **Stage 2 — Spatiotemporal graph classification.** A CTR-GCN model scores a 64-frame (≈2 second) window of skeleton motion as either normal activity or aggression.

Critically, the classifier never sees raw pixels. It is architecturally incapable of identifying any individual. The model only describes motion patterns, never people.

Key facts:

- Detection latency: 1–2 seconds end-to-end (pose + classify + alert delivery).
- Inference throughput: 30 FPS at 1080p on NVIDIA Jetson Orin Nano (8 GB) for ≤50 cameras per appliance.
- Per-frame inference: 34 ms median on dense crowds (internally validated).
- Model accuracy: 94% precision, 89% recall on internal validation across 14 K-12 deployments (≈47,000 hours of footage).
- Skeleton format: COCO 17-keypoint (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
- Window size: 64 frames at 30 FPS = 2.13-second classification window.
- Hardware required: 1 GPU appliance (Jetson Orin Nano 8 GB or equivalent server GPU) per ≤50 cameras.
- Power draw: <5 W per camera (amortized across the appliance).
- Supported camera types: any RTSP/ONVIF camera, USB webcam, or pre-recorded video file.
- Works with: Hikvision, Dahua, Avigilon, Axis, Reolink, Hanwha Wisenet, and most DVR/NVR units manufactured since 2018.

---

## 2. How it works (camera to alert, under two seconds)

### Step 1 — Connect

Connect any RTSP IP camera, USB webcam, or video file via the dashboard. One click starts real-time monitoring. No agent installation on the camera, no firmware change, no rewiring.

### Step 2 — Analyze

Stage 1: YOLO11n-Pose extracts 17 skeletal keypoints per detected person. The model runs on the on-prem GPU appliance — it is the single canonical pose extractor for the system.

Stage 2: A CTR-GCN spatiotemporal graph classifier scores aggression patterns over a 64-frame (≈2 second) window of motion. The graph network leverages the body's natural skeletal topology (arm-shoulder-torso connections) to model movement, which is far more discriminative than treating keypoints as independent features.

### Step 3 — Alert

Instant Telegram notification with a screenshot of the moment of detection, the camera location, and the confidence score. Designated school staff confirm or dismiss the alert with one tap. Confirmations feed back into the system for continuous improvement.

End-to-end latency budget: pose extraction (33 ms) + classification (~50 ms) + alert delivery (~500 ms over Telegram) = under 2 seconds typical.
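To make the camera-to-alert path concrete, the sketch below wires the three steps together for a single camera. It is a minimal illustration, not the shipped code: the Stage 2 scorer is stubbed out, and the RTSP URL, bot token, chat ID, camera label, and threshold are placeholder values. Stage 1 calls use the public Ultralytics Python API.

```python
# Minimal single-camera sketch of the camera-to-alert loop described above.
# AggressionClassifier is a stub; credentials and URLs are placeholders.
from collections import defaultdict, deque

import cv2
import requests
from ultralytics import YOLO

BOT_TOKEN = "123456:ABC..."    # Telegram bot for the school's alert channel
CHAT_ID = "-1001234567890"
WINDOW = 64                    # 64 frames ≈ 2.13 s at 30 FPS
THRESHOLD = 0.85               # per-school confidence threshold, tuned at week 2

class AggressionClassifier:
    """Stub standing in for the CTR-GCN scorer (Stage 2); the real model
    returns the softmax probability of "aggression" for a skeleton window."""
    def score(self, window) -> float:
        return 0.0

pose = YOLO("yolo11n-pose.pt")         # Stage 1: 17-keypoint COCO skeletons
classifier = AggressionClassifier()
windows = defaultdict(lambda: deque(maxlen=WINDOW))  # per-track sliding windows

def alert(frame, camera, score):
    """Telegram alert: screenshot + camera location + confidence score."""
    ok, jpg = cv2.imencode(".jpg", frame)
    requests.post(
        f"https://api.telegram.org/bot{BOT_TOKEN}/sendPhoto",
        data={"chat_id": CHAT_ID, "caption": f"{camera}: aggression {score:.0%}"},
        files={"photo": jpg.tobytes()},
    )

# stream=True yields one result per frame; persist=True keeps ByteTrack IDs
# stable across frames (see "Multi-person tracking" in section 6).
for result in pose.track("rtsp://cam-12.school.lan/stream", stream=True,
                         persist=True, tracker="bytetrack.yaml"):
    if result.keypoints is None or result.boxes.id is None:
        continue
    skeletons = result.keypoints.data      # (persons, 17, 3): x, y, confidence
    for track_id, skel in zip(result.boxes.id.int().tolist(), skeletons):
        win = windows[track_id]
        win.append(skel)                   # pixels are discarded at this point
        if len(win) == WINDOW:
            score = classifier.score(win)
            if score >= THRESHOLD:
                alert(result.orig_img, "cam-12 / 2F hallway", score)
                win.clear()                # avoid duplicate alerts per incident
```

On a production appliance the same loop is multiplexed across up to 50 streams, with the models running as TensorRT FP16 engines.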
---

## 3. Why pose-based detection, not pixel-based or face-based

Most off-the-shelf school-safety AI runs facial recognition or full-frame video classification. Both have major problems for K-12 use:

- **Facial recognition is illegal in many U.S. states for K-12** (Illinois HB100, New York moratorium). It creates a permanent biometric record of every minor on camera.
- **GDPR Article 9 (EU)** treats biometric data as a special category requiring explicit consent — impossible to obtain from minors at scale.
- **Russian 152-ФЗ** classifies biometric data as separately protected; consent rules are stricter than for general personal data.
- **Pixel-based classifiers** are easily fooled by lighting, camera angle, and visual occlusion. They also retain the source video, creating a privacy attack surface.

GuardianAI's pipeline operates on (x, y, confidence) skeleton coordinates only. The classifier is architecturally incapable of identifying any specific person; it only describes motion. This is the core architectural decision of the product (illustrated in the sketch after the table).

| Approach | Identifies people? | Survives lighting changes? | Survives camera angle? | GDPR/FERPA-safe? |
|---|---|---|---|---|
| Face recognition | Yes (always) | Yes | Sometimes | No (Article 9) |
| Pixel CNN | Sometimes | No | No | Indirect risk |
| Pose-based (GuardianAI) | No (impossible) | Yes | Yes | Yes (no biometric storage) |
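To ground the last row of the table, the sketch below shows the entirety of what Stage 1 hands downstream for one detected person in one frame. The type and field names are illustrative, not GuardianAI's shipped schema; the point is what is absent: no image crop, no face embedding, no name.

```python
# Everything that crosses the Stage 1 boundary for one person in one frame.
# Record type and field names are illustrative, not the shipped schema.
from dataclasses import dataclass

@dataclass
class SkeletonFrame:
    camera_id: str                                # e.g. "cam-12"
    timestamp_ms: int
    track_id: int                                 # ByteTrack ID; no identity link
    keypoints: list[tuple[float, float, float]]   # 17 × (x, y, confidence)

def to_skeleton_frame(camera_id, timestamp_ms, track_id, yolo_kpts):
    """Reduce a YOLO11n-Pose detection to coordinates only.

    `yolo_kpts` is a (17, 3) array of (x, y, confidence). The source frame is
    not a parameter, so no pixel data can survive past this function.
    """
    return SkeletonFrame(
        camera_id=camera_id,
        timestamp_ms=timestamp_ms,
        track_id=track_id,
        keypoints=[(float(x), float(y), float(c)) for x, y, c in yolo_kpts],
    )
```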
---

## 4. Use case: K-12 schools

### Who it is for

Public school districts, private K-8 academies, and after-school programs that already have RTSP IP cameras in the building and a designated head of safety. Typical deployment: a building of 800–1,500 students with 30–60 cameras.

### What problem it solves

The average school administrator learns about a fight 8–12 minutes after it starts — through a student tip, a teacher walking in, or an injury at the nurse's office. By that point, the fight is over and the school has only post-hoc reports to work with. GuardianAI compresses that 8–12-minute lag to 1–2 seconds.

### Deployment recipe

- Day 0 — Contract signature.
- Day 2 — School IT provides RTSP credentials and camera inventory.
- Day 5 — GuardianAI ships the pre-configured GPU appliance.
- Day 7 — Site visit: install the appliance in the network rack, connect it to the camera VLAN, validate stream ingestion.
- Day 10 — Designated staff phones onboarded for Telegram alerts; first live alerts begin.
- Day 14 — First weekly review meeting; confidence threshold tuned to the school's tolerance.

### What's included

- On-prem GPU appliance (NVIDIA Jetson Orin Nano 8 GB or equivalent).
- Camera connection wizard for unlimited RTSP streams.
- Pose extractor, classifier, and dashboard.
- Telegram bot configured with the school's alert channel.
- Two staff onboarding sessions and a monthly model-tuning review.
- DPA (Data Processing Agreement) and DPIA (Data Protection Impact Assessment) templates.
- 24/7 NOC monitoring of appliance uptime and stream health.

### Compliance posture

GuardianAI's skeleton-only pipeline never stores or transmits identifiable images of students, which sidesteps the disclosure rules that apply to traditional CCTV under FERPA, COPPA, and GDPR Article 9. A Data Processing Agreement and DPIA template are available on request. For Russia, GuardianAI is compatible with 152-ФЗ because no biometric data is processed.

---

## 5. Use case: University campuses

### Who it is for

Universities and large campuses with 1,000+ cameras across academic buildings, dorms, athletics facilities, and outdoor zones. Public-safety departments that need centralized aggression detection across heterogeneous camera vendors.

### Architecture for scale

For deployments above ~50 cameras, GuardianAI shifts to a clustered architecture: one inference appliance per ≤50-camera shard, plus a central correlator that aggregates events across shards. The dashboard shows a campus-wide live map with per-building filters.

| Camera count | Appliances | Architecture |
|---|---|---|
| ≤50 | 1 | Single-node, simplest |
| 51–250 | 1 master + 2–5 shards | Master correlator + shard inference nodes |
| 251–1,000 | 1 master + 6–20 shards | Same, with VLAN segmentation per building |
| 1,001+ | Custom | Per-region master nodes federated to a central console |

### Integration with existing campus security

GuardianAI integrates read-only with major VMS platforms — Genetec Security Center, Milestone XProtect, Avigilon Control Center, Hanwha Wisenet, Hikvision iVMS, Dahua DSS. It subscribes to camera streams via RTSP/ONVIF; it does not modify the VMS.

Alerts can be forwarded to the existing public-safety command center via:

- Telegram (default)
- Webhook (POST to any URL)
- Email (SMTP relay)
- ServiceNow incident creation
- Genetec/Milestone alarm triggers (read-only event injection)

### Key differences from K-12 deployment

- Multi-tenant access control: separate operator views per building, residence hall, and athletics department.
- Higher alert confidence threshold (campus crowds tolerate more rough activity).
- Greetings (salam), hugging, and sports horseplay are trained out of the model with campus-specific footage.
- Per-building, per-shift schedules (different rules during games, move-in week, finals).

---

## 6. Technology — Pose-based detection (Stage 1)

### What it does

Stage 1 of the GuardianAI pipeline takes raw video frames in and emits a list of (x, y, confidence) skeleton coordinates out. It is the only component that touches pixels. Everything downstream — the classifier, the dashboard, the database — sees only skeletons.

### The model: YOLO11n-Pose

- Architecture: YOLO11n-Pose (Ultralytics; the smallest pose-detection variant of YOLO11, released in late 2024).
- Parameter count: 2.9 M.
- Input: 640×640 RGB frame.
- Output: per detected person, 17 keypoints in COCO format (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles), each as an (x, y, confidence) tuple.
- Inference cost: 33 ms median per frame on Jetson Orin Nano 8 GB (TensorRT FP16); equivalent to 30 FPS for one camera. Thirty cameras at 30 FPS each present an aggregate load of ≈900 FPS.
- License: AGPL-3.0 (Ultralytics) for the upstream model; GuardianAI ships under a commercial license obtained through Ultralytics for SaaS deployment.

### Why YOLO and not OpenPose / MMPose / MediaPipe

- YOLO11n-Pose is optimized for edge inference (Jetson), where MediaPipe and OpenPose deliver 2–3× lower throughput.
- YOLO emits per-person detections with bounding boxes, which we use for tracking IDs across frames.
- COCO 17-keypoint output is the most widely supported format in downstream skeleton-action-recognition models, including CTR-GCN.

### Multi-person tracking

Each detected person gets a stable track ID across frames using ByteTrack on the bounding boxes. The classifier (Stage 2) sees a per-track time series of skeletons. Track IDs reset when a person leaves the frame for >2 seconds.

---

## 7. Technology — Spatiotemporal graph classifier (Stage 2)

### What it does

Stage 2 takes a per-track 64-frame time series of 17-keypoint skeletons and classifies it as either "normal activity" or "aggression". A 64-frame window at 30 FPS covers ≈2.13 seconds, which is empirically the smallest window that reliably captures a punch, shove, or grabbing motion.

### The model: CTR-GCN

- Architecture: CTR-GCN (Channel-wise Topology Refinement Graph Convolutional Network), proposed by Chen et al. at ICCV 2021.
- The graph: 17 nodes (skeleton keypoints) connected by 16 edges that follow the natural human skeletal topology (e.g., shoulder→elbow→wrist).
- The "spatiotemporal" part: the network performs graph convolutions in space (across the 17 keypoints in one frame) and temporal convolutions in time (across the 64 frames in one window). Both operate jointly per layer.
- Parameter count: 1.5 M.
- Input shape: (1 person, 3 channels [x, y, confidence], 64 time steps, 17 keypoints).
- Output: 2-class softmax [normal, aggression] per track per window.
- Inference cost: ~50 ms per track per window on Jetson Orin Nano (FP16).

### Training data

- Public benchmarks: NTU RGB+D 60 (action recognition), RWF-2000 (real-world fight detection), Kinetics-400 subsets.
- Synthetic augmentation: skeleton-level rotation, scale, and viewpoint augmentation to simulate camera angles.
- Hard-negative mining: tens of thousands of hours of "false-positive-prone" footage — running, hugging, sports, dance, salam greetings — collected from real K-12 and campus deployments and labeled as "normal".
- F1 on the RWF-2000 hold-out set: 0.87.

### Why graph and not 3D CNN / transformer

- Graph convolution explicitly encodes the body's natural topology, which gives a stronger inductive bias than treating keypoints as independent features (as 1D-conv or dense methods do); a teaching-scale sketch follows this list.
- Graph models are 5–10× cheaper to evaluate than equivalent-accuracy video transformers.
- Channel-wise topology refinement (the "CTR" in CTR-GCN) lets the network learn a different graph adjacency matrix per channel, capturing motions like "left arm extends while right arm retracts" that a fixed-skeleton model cannot represent.
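The inductive-bias point is easiest to see in code. The following is a teaching-scale PyTorch sketch of a spatiotemporal graph block under stated simplifications: it keeps a fixed 17-node COCO adjacency and omits CTR-GCN's channel-wise topology refinement and full network depth, so it illustrates the structure, not the production model.

```python
# Teaching-scale spatiotemporal skeleton classifier (not the production CTR-GCN:
# fixed adjacency, two blocks instead of the full stack, no topology refinement).
import torch
import torch.nn as nn

# The 16 edges of the COCO 17-keypoint skeleton (standard COCO joint indices).
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (5, 6), (5, 7), (7, 9), (6, 8),
         (8, 10), (5, 11), (6, 12), (11, 12), (11, 13), (13, 15), (12, 14),
         (14, 16)]

def normalized_adjacency(n=17):
    a = torch.eye(n)                        # self-loops
    for i, j in EDGES:
        a[i, j] = a[j, i] = 1.0
    d = a.sum(1).pow(-0.5)                  # symmetric degree normalization
    return d[:, None] * a * d[None, :]

class STBlock(nn.Module):
    """One spatial graph conv over the 17 joints + one temporal conv over frames."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.register_buffer("A", normalized_adjacency())
        self.spatial = nn.Conv2d(c_in, c_out, 1)               # per-joint 1x1
        self.temporal = nn.Conv2d(c_out, c_out, (9, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):                   # x: (batch, C, T=64, V=17)
        x = self.spatial(x)                 # mix channels
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # mix joints along edges
        return self.relu(self.temporal(x))  # mix time

class TinySkeletonClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(STBlock(3, 64), STBlock(64, 128))
        self.head = nn.Linear(128, 2)       # [normal, aggression]

    def forward(self, x):                   # x: (batch, 3, 64, 17)
        x = self.blocks(x).mean(dim=(2, 3))  # global average over time and joints
        return self.head(x).softmax(-1)

window = torch.randn(1, 3, 64, 17)          # (x, y, conf) × 64 frames × 17 joints
print(TinySkeletonClassifier()(window))     # e.g. tensor([[0.48, 0.52]])
```

In the real CTR-GCN, the adjacency is additionally refined per channel, which is what lets the model represent asymmetric patterns such as one arm extending while the other retracts.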
---

## 8. Pricing

### Per-camera annual subscription

Annual subscription pricing is **$200–$600 per camera per year**, depending on volume tier and SLA level.

| Volume tier | Per camera per year | Notes |
|---|---|---|
| 1–25 cameras | $600 | Single small site, full SLA |
| 26–100 cameras | $400 | Most K-12 deployments |
| 101–500 cameras | $300 | Mid-size school district |
| 501+ cameras | $200 | University campuses, district-wide |

Annual examples (reproduced in the sketch at the end of this section):

- Small private school (10 cameras): $6,000/year
- Mid-size K-12 building (40 cameras): $16,000/year
- District (300 cameras across 6 schools): $90,000/year
- University campus (1,200 cameras): $240,000/year

### What's included

The per-camera annual subscription includes: pose extractor + classifier, dashboard, Telegram bot, two staff onboarding sessions, a monthly model-tuning review, 24/7 NOC monitoring of appliance uptime, automatic model retraining (quarterly), and unlimited operator seats.

### Hardware (one-time)

- NVIDIA Jetson Orin Nano 8 GB appliance: ≈$1,200 per ≤50 cameras. One-time cost.
- For >50 cameras, a server-grade GPU (e.g., NVIDIA L4, A2): $4,000–$8,000 per ≤200 cameras.

### Pilot programs

Pilot programs are billed at a flat two-month fee (the annual subscription ÷ 12 × 2, i.e. one sixth of year-one pricing), with full credit toward year one if the pilot converts to a subscription. Hardware is loaned for the pilot duration; if the pilot does not convert, the appliance is returned.
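The tier table reads as a simple lookup. The sketch below reproduces the four published examples and the pilot formula; the flat (non-marginal) tier assumption is inferred from those examples rather than stated in the pricing terms.

```python
# Quote calculator for the volume tiers above. Assumes the whole deployment
# bills at its tier's per-camera rate (flat, not marginal), which matches the
# four published examples. Hardware is a separate one-time line item.
TIERS = [(25, 600), (100, 400), (500, 300), (float("inf"), 200)]

def annual_subscription(cameras: int) -> int:
    rate = next(rate for cap, rate in TIERS if cameras <= cap)
    return cameras * rate

def pilot_fee(cameras: int) -> float:
    """Flat two-month pilot fee: annual subscription / 12 * 2,
    credited toward year one if the pilot converts."""
    return annual_subscription(cameras) * 2 / 12

for n in (10, 40, 300, 1200):
    print(f"{n:>5} cameras -> ${annual_subscription(n):,}/year")
# 10 -> $6,000; 40 -> $16,000; 300 -> $90,000; 1200 -> $240,000
```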
---

## 9. Privacy and compliance

### Data flow summary

- Raw video: never leaves the school network. Inference runs on the on-prem GPU appliance.
- Skeleton coordinates: stored locally for 90 days (configurable), then deleted.
- Event records (timestamps, camera ID, confidence, anonymized track ID): synced to the GuardianAI dashboard for the school's own use (shape sketched at the end of this section).
- Telegram alerts: optional; if enabled, a thumbnail screenshot is sent to the school's designated responder.
- Air-gap mode: disables both the dashboard sync and Telegram alerts; events are visible only on the local appliance.

### What GuardianAI never does

- Never runs facial recognition.
- Never extracts or stores biometric templates.
- Never sends raw video off-site.
- Never identifies individuals by name or face.
- Never sells, shares, or licenses customer data.

### Compliance frameworks

- GDPR (EU): compatible. No Article 9 special-category biometric data is processed.
- FERPA (US): compatible. No personally identifiable education record is created or stored.
- COPPA (US): compatible. No personal information of children under 13 is collected.
- 152-ФЗ (Russia): compatible. No biometric data is processed.
- ISO 27001 (organizational): in progress (audit Q3 2026).

A DPA, DPIA template, and SOC 2 Type I report are available on request to qualified prospects.
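To make the data-flow summary concrete, here is an illustrative sketch of the event record that syncs (timestamps, camera ID, confidence, anonymized track ID) and of the air-gap switch that keeps it local. Field names and the endpoint URL are placeholders, not GuardianAI's shipped schema or API.

```python
# Illustrative shape of a synced event record plus the air-gap gate.
# Field names and DASHBOARD_URL are assumptions for this sketch only.
import json
import time

import requests

AIR_GAP = False                      # air-gap mode: nothing leaves the appliance
DASHBOARD_URL = "https://dashboard.example/api/events"  # placeholder endpoint

def publish_event(camera_id: str, anon_track_id: int, confidence: float) -> None:
    event = {
        "ts_ms": int(time.time() * 1000),
        "camera_id": camera_id,
        "track_id": anon_track_id,   # anonymized track ID; no identity link
        "confidence": round(confidence, 3),
        # absent by construction: no image, no name, no biometric template
    }
    if AIR_GAP:
        print(json.dumps(event))     # visible only on the local appliance
    else:
        requests.post(DASHBOARD_URL, json=event, timeout=5)
```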
---

## 10. Russian (русский) — brief summary

GuardianAI is an AI-based violence detection system for schools and campuses. It connects to existing RTSP IP cameras, recognizes fights and physical aggression within 1–2 seconds, and sends an instant Telegram alert. Deployment is fully local (video never leaves the school network). No facial recognition: the pipeline works only with the 17-point COCO skeleton. Compatible with Russia's 152-ФЗ, GDPR Article 9 (EU), and FERPA and COPPA (US).

Architecture:

- Stage 1: YOLO11n-Pose extracts 17 skeletal keypoints per person (33 ms per frame).
- Stage 2: CTR-GCN classifies a 64-frame window (≈2 seconds) as "normal" or "aggression" (≈50 ms per track).

Pricing: $200–$600 per camera per year, depending on volume. A 30-camera school pays about $9,000–$15,000 per year turnkey. Hardware (a Jetson Orin Nano GPU appliance, ≈$1,200) is a one-time cost.

Full Russian version: https://guardianai.tech/ru/index.md

---

## 11. Frequently asked questions (consolidated)

**Q: How fast does GuardianAI detect violence?**
A: 1–2 seconds end-to-end, including alert delivery. Pose extraction takes 33 ms, classification ~50 ms, Telegram alert ~500 ms.

**Q: Does GuardianAI use facial recognition?**
A: No. The classifier never sees raw pixels — only 17-keypoint skeleton coordinates. It is architecturally incapable of identifying any individual.

**Q: Can GuardianAI run on existing IP cameras?**
A: Yes. Any RTSP/ONVIF camera made since 2018 — Hikvision, Dahua, Avigilon, Axis, Reolink, Hanwha Wisenet, Bosch, etc.

**Q: What hardware do schools need?**
A: One GPU appliance (NVIDIA Jetson Orin Nano 8 GB, ≈$1,200) per ≤50 cameras. Fully on-prem; no cloud subscription required.

**Q: Is GuardianAI GDPR / FERPA / COPPA / 152-ФЗ compliant?**
A: Yes. The skeleton-only pipeline never stores or transmits identifiable images, which sidesteps biometric-disclosure rules in all four frameworks.

**Q: How long does deployment take?**
A: 10–14 days from contract signature to first live alert in a typical 800-student building.

**Q: What does GuardianAI cost per camera?**
A: $200–$600 per camera per year depending on volume. A 30-camera school typically pays $9,000–$15,000/year.

**Q: Which video management systems (VMS) does GuardianAI integrate with?**
A: Genetec, Milestone, Avigilon, Hanwha, Hikvision iVMS, Dahua DSS, Reolink. Read-only integration via RTSP/ONVIF.

**Q: What is the false-positive rate?**
A: 94% precision (so ≈6% of alerts are false positives) and 89% recall on internal validation across 14 K-12 deployments (≈47,000 hours of footage). The model is trained to ignore running, hugging, sports horseplay, and salam greetings.

**Q: Where is the data stored, and does it leave the school network?**
A: All inference is local. Raw video never leaves the building. Only anonymous skeleton-only event records (no faces) sync to the dashboard. Air-gap mode disables even that.

---

## 12. Contact and links

- **Sales / demos:** hello@guardianai.tech
- **Data subject requests:** privacy@guardianai.tech
- **Web:** https://guardianai.tech
- **English markdown sources index:** https://guardianai.tech/llms.txt
- **Russian landing page:** https://guardianai.tech/ru/

### Primary content (markdown mirrors, English)

- https://guardianai.tech/index.md — homepage
- https://guardianai.tech/use-cases/schools/index.md — K-12 deployment recipe
- https://guardianai.tech/use-cases/campuses/index.md — university clustered architecture
- https://guardianai.tech/technology/pose-detection/index.md — Stage 1 deep dive
- https://guardianai.tech/technology/spatiotemporal-graph/index.md — Stage 2 deep dive
- https://guardianai.tech/pricing/index.md — pricing tiers and pilot terms

### Russian (русский) markdown mirrors

- https://guardianai.tech/ru/index.md — homepage
- https://guardianai.tech/ru/use-cases/schools/index.md — K-12 schools
- https://guardianai.tech/ru/use-cases/campuses/index.md — campuses and universities
- https://guardianai.tech/ru/technology/pose-detection/index.md — pose-based detection
- https://guardianai.tech/ru/technology/spatiotemporal-graph/index.md — graph classifier
- https://guardianai.tech/ru/pricing/index.md — pricing

---

*This file is auto-generated and is the canonical single-fetch knowledge dump for AI assistants. The HTML pages at https://guardianai.tech contain identical facts. If any conflict arises between this file and the live HTML, the live HTML is authoritative.*