PolicyAI API Integration Guide (v2)

Info

This guide covers the v2 API. For the legacy v1 API, see the PolicyAI v1 Integration Guide.

Overview

The PolicyAI API is simple to integrate into your app wherever you need to check for policy violations. PolicyAI examines the provided content using our moderation LLM, determines whether it violates a policy, and returns details about the violation so you can take action on the content. Common actions include taking the content down or flagging the associated user account for review by AiMod or a moderator.

With the API you can:

  • Create and edit versioned policies
  • Create labeling endpoints that reference your policies
  • Label content by sending it to a labeling endpoint

We also provide a UI where you can:

  • Sign up for an account
  • Create, edit, manage, and test your policies
  • Access your account details

PolicyAI URLs

UI: https://policyai-v2.musubilabs.ai
API: https://api.musubilabs.ai/policyai/docs

Sign up for an account

To create an account, navigate to the UI, where you'll be prompted to sign up!

After you've created an account, you can view your account details on the Account page of the UI, including your organization memberships.

Organizations

When you create an account, your own personal organization is created for you in our system. Organizations are used by PolicyAI to determine who has access to what. Any policies or evaluation results that you create using your personal organization ID are accessible only to you!

API Authentication

When you use the API directly, authenticate by sending your API key in the Musubi-Api-Key header, like this:

GET https://api.musubilabs.ai/policyai/api/version
accept: application/json
Musubi-Api-Key: [your API key here]

You can grab your API key from the API Settings page of the UI.
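As a minimal sketch, the same authenticated request can be built in Python with the standard library. The URL and header names come from the example above; the `build_version_request` helper and the `YOUR_API_KEY` placeholder are illustrative assumptions, not part of PolicyAI. The request is only constructed here, not sent.

```python
import urllib.request

API_BASE = "https://api.musubilabs.ai/policyai"


def build_version_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the authenticated GET request shown above."""
    return urllib.request.Request(
        url=f"{API_BASE}/api/version",
        headers={
            "accept": "application/json",
            "Musubi-Api-Key": api_key,  # header name taken from the guide
        },
        method="GET",
    )


# Placeholder key; substitute the key from the API Settings page.
req = build_version_request("YOUR_API_KEY")

# To actually send it:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```

Note that `urllib` normalizes the capitalization of header names internally. HTTP header names are case-insensitive, so this does not change what the server receives.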

Create your Policy

You can create and edit policies and policy versions on the Manage Policies page of the UI.

Each policy has one or more versions, and for each version you can configure the policy text, policy settings, and output fields.

Use the Policy Converter

The Policy Converter can generate a complete policy from a brief description of what you want to moderate, or convert an existing policy into the LLM-optimized format described below. It uses AI to structure the policy, sharpen the language, and organize it into the correct format — saving you the effort of writing or reformatting it manually.

Policy text

Policy text uses a structured markdown format with two main sections: Instructions and Policies.

  • # Instructions — Cross-cutting guidance that applies across all categories, such as subversion detection, language handling, or content-type-specific rules.
  • # Policies — The violation categories. Each category is an ## heading with a description of what constitutes a violation. Include a **Not a violation:** list under a category to clarify exceptions. Always include an ## Other category at the bottom for content that doesn't clearly violate any policy.

Example policy text for moderating chat messages:

# Instructions

- Users may use leetspeak, emoji, or other techniques to evade filters.
  Look through these obfuscation attempts when evaluating content.
- If the violating meaning requires multiple logical leaps to interpret,
  default to allowing the content.

# Policies

## Selling

- Actively selling any service or product
- Stating a price for a service or product with intent to sell it

**Not a violation:**

- Mentioning a product or brand casually without intent to sell

## Scamming

- Tricking someone into sending money, crypto, or signing up for a fraudulent service
- Directing someone off-platform using suspicious contact info

**Not a violation:**

- Sharing personal contact info without suspicious context

## Drugs

- Selling or seeking regulated or controlled substances

## Other

Any content that does not CLEARLY violate one of the above policies is Other.

Labels are automatically extracted from the ## headings in the Policies section. Each heading becomes a label — headings named "Other" or "Safe" are mapped to a CLEAR assessment, and all others are mapped to FLAGGED.
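To make the mapping concrete, here is a rough Python sketch of the rule described above. It is an illustration only, not PolicyAI's actual implementation: the `extract_labels` function is hypothetical, and treating the "Other" and "Safe" names case-insensitively is an assumption.

```python
import re

# Heading names that map to a CLEAR assessment, per the rule above.
# Case-insensitive matching is an assumption of this sketch.
CLEAR_HEADINGS = {"other", "safe"}


def extract_labels(policy_text: str) -> dict[str, str]:
    """Map each '## ' heading under '# Policies' to CLEAR or FLAGGED."""
    labels: dict[str, str] = {}
    in_policies = False
    for line in policy_text.splitlines():
        top = re.match(r"^#\s+(.+?)\s*$", line)  # top-level section heading
        if top:
            in_policies = top.group(1).lower() == "policies"
            continue
        sub = re.match(r"^##\s+(.+?)\s*$", line)  # category heading
        if sub and in_policies:
            name = sub.group(1)
            is_clear = name.lower() in CLEAR_HEADINGS
            labels[name] = "CLEAR" if is_clear else "FLAGGED"
    return labels


example_policy = """# Instructions

- Look through obfuscation attempts.

# Policies

## Selling

- Actively selling any service or product

## Other

Any content that does not CLEARLY violate one of the above policies is Other.
"""

example_labels = extract_labels(example_policy)
```

With the shortened policy above, `example_labels` maps "Selling" to FLAGGED and "Other" to CLEAR.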

Policy settings

PolicyAI supports customizing each policy in several ways.

Model

Choose the model that the policy will use to check for violations. We'll provide guidance on which model we expect to work best for you for a given application, and you can experiment on your datasets as well.

Content types

PolicyAI supports these kinds of content:

  • Text
  • Image
  • Audio
  • Video
  • PDF
  • Combinations of the above (e.g. Text + Image, or Video with accompanying Text)

You can configure a policy to apply only to certain content types, which is useful when a policy has been written and tested for one specific content type. For example, you might define one policy for user profile images and another for post text.

Audio and video must be paired with a model whose configuration supports the corresponding input modality — the API will reject a policy version that requests audio or video on a model that doesn't support it.

Media submission, formats, and limits

Audio, video, and PDF are submitted by URL only (no base64 inline payloads). Images can be submitted by URL or base64. Any container or codec that FFmpeg can decode is accepted (common examples: MP3, WAV, M4A, FLAC, OGG for audio; MP4, MOV, WebM, MKV for video).

Type    Max file size   Max duration / pages   Submission
Image   10 MB           -                      URL or base64
Audio   120 MB          1 hour (3600 s)        URL only
Video   100 MB          no hard limit          URL only
PDF     30 MB           20 pages               URL only

Size limits are enforced while the file is being fetched — a URL pointing to an oversized file is rejected mid-download as soon as the limit is exceeded. Duration limits are checked after decoding. Files that exceed any limit are rejected with an error; they are not truncated or downsampled.
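If you already know a file's size, duration, or page count, a client-side pre-check can save a round trip. The sketch below encodes the limits from the table above. The `check_media` helper and its type keys are hypothetical, and it assumes 1 MB means 1,000,000 bytes, which the guide does not specify (this is the more conservative reading). The server remains the source of truth.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1_000_000  # assumption: decimal megabytes; the guide does not say


@dataclass(frozen=True)
class MediaLimit:
    max_bytes: int
    max_duration_s: Optional[int] = None  # None means no documented limit
    max_pages: Optional[int] = None


# Values copied from the limits table; the keys are local names for this sketch.
LIMITS = {
    "IMAGE": MediaLimit(10 * MB),
    "AUDIO": MediaLimit(120 * MB, max_duration_s=3600),
    "VIDEO": MediaLimit(100 * MB),
    "PDF": MediaLimit(30 * MB, max_pages=20),
}


def check_media(
    kind: str,
    size_bytes: int,
    duration_s: Optional[float] = None,
    pages: Optional[int] = None,
) -> List[str]:
    """Return a list of limit violations; an empty list means none were found."""
    limit = LIMITS[kind]
    problems: List[str] = []
    if size_bytes > limit.max_bytes:
        problems.append(f"{kind}: {size_bytes} bytes exceeds {limit.max_bytes}")
    if (
        limit.max_duration_s is not None
        and duration_s is not None
        and duration_s > limit.max_duration_s
    ):
        problems.append(f"{kind}: {duration_s} s exceeds {limit.max_duration_s} s")
    if limit.max_pages is not None and pages is not None and pages > limit.max_pages:
        problems.append(f"{kind}: {pages} pages exceeds {limit.max_pages}")
    return problems
```

For example, `check_media("PDF", 5 * MB, pages=21)` reports one problem (too many pages), while `check_media("AUDIO", 50 * MB, duration_s=1800)` reports none.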

When fetching media, PolicyAI sends a PolicyAI/1.0 (Data Fetcher; +https://musubilabs.ai) User-Agent. Make sure URLs you submit are reachable from the public internet and that any host-side filtering allows this User-Agent.

How media is processed

  • Audio is normalized to 16 kHz mono 16-bit PCM WAV before evaluation.
  • Video is decomposed into a set of representative frames (resized to max 512 px wide) using scene-change detection plus coverage-based sampling, up to 100 frames per video. Any audio track on the video is extracted and processed alongside the frames using the same audio pipeline and limits above; videos without an audio track are still accepted.
  • Although there is no fixed video duration cap, very long videos still produce only ~100 sampled frames, so fine-grained moments may be missed. For long-form content, prefer slicing into shorter clips at the points you care about.
  • PDF pages are passed directly to the model. Files exceeding 20 pages are rejected.
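To see why long videos lose detail, it helps to work out the arithmetic behind the 100-frame cap. The sketch below is a rough estimate, not a description of the actual sampler: real sampling uses scene-change detection, so frames are not evenly spaced, and short videos may yield fewer than 100 frames. Both helper functions are hypothetical, and the fixed-length slicing is a simplification of slicing at the points you care about.

```python
from typing import List, Tuple

MAX_FRAMES = 100  # per-video cap stated above


def min_average_frame_gap_s(duration_s: float, max_frames: int = MAX_FRAMES) -> float:
    """Smallest possible average spacing between sampled frames, in seconds."""
    return duration_s / max_frames


def fixed_length_clips(duration_s: float, clip_length_s: float) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) clips of a fixed length."""
    if clip_length_s <= 0:
        raise ValueError("clip_length_s must be positive")
    clips: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_length_s, float(duration_s))
        clips.append((start, end))
        start = end
    return clips


two_hours = 2 * 60 * 60
gap_full = min_average_frame_gap_s(two_hours)   # 72.0 s between frames on average
clips = fixed_length_clips(two_hours, 600)      # twelve 10-minute clips
gap_clip = min_average_frame_gap_s(600)         # 6.0 s between frames on average
```

A two-hour video averages at best one frame every 72 seconds. Submitted as twelve 10-minute clips, each clip can average one frame every 6 seconds.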

Testing your policy

Once you've put together a policy, test individual text or image content against it to make sure that it surfaces violations as you would expect.

To test your policy against a curated dataset, you can upload a test dataset on the Manage Datasets page of the UI, and run the policy against that dataset on the Test Policies page of the UI.

Setting up your labeling endpoint

Once your policy is ready, create a labeling endpoint so you can start labeling content against it.

Head to the Manage Labeling Endpoints page to create a new endpoint. Each endpoint references one or more policy labelers, identified by policy key and version number.

Tip

Create separate endpoints for each context where you'd want a different policy applied — for example, one for messages, one for profile bios, and one for event photos. You can also maintain separate endpoints per environment (dev, prod, staging).
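One way to keep per-context, per-environment endpoints manageable is to derive the endpoint name from both values. The naming scheme in this sketch (`{context}-{environment}`) and the `org_123` organization ID are purely hypothetical; use whatever names you chose on the Manage Labeling Endpoints page. The URL pattern is the one used by the labeling call in this guide.

```python
LABEL_BASE = "https://api.musubilabs.ai/policyai/v2/labels/byEndpoint"


def endpoint_name(context: str, environment: str) -> str:
    """Hypothetical naming convention: one endpoint per context and environment."""
    return f"{context}-{environment}"


def label_url(organization_id: str, context: str, environment: str) -> str:
    """Build the labeling URL for a given context and environment."""
    return f"{LABEL_BASE}/{organization_id}/{endpoint_name(context, environment)}/"


# Example: the dev endpoint for chat messages in a placeholder organization.
dev_messages_url = label_url("org_123", "messages", "dev")
```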

Label your content

Once you have your endpoint ready, you can use it to label content via the API:

POST https://api.musubilabs.ai/policyai/v2/labels/byEndpoint/{organizationId}/{endpointName}/
Musubi-Api-Key: [your API key here]
Content-Type: application/json

{
    "content": [
        {
            "type": "TEXT",
            "text": "the content to check"
        }
    ]
}
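The request above could be assembled in Python roughly as follows. This is a sketch using only the standard library: the URL pattern, headers, and body shape come from the example above, while the `build_label_request` helper and the `org_123` and `chat-messages` values are placeholders introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

LABEL_BASE = "https://api.musubilabs.ai/policyai/v2/labels/byEndpoint"


def build_label_request(
    api_key: str, organization_id: str, endpoint_name: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that labels one piece of text."""
    # Percent-encode the path segments in case they contain reserved characters.
    url = f"{LABEL_BASE}/{quote(organization_id, safe='')}/{quote(endpoint_name, safe='')}/"
    body = {"content": [{"type": "TEXT", "text": text}]}
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Musubi-Api-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute your own key, organization ID, and endpoint name.
label_req = build_label_request("YOUR_API_KEY", "org_123", "chat-messages", "the content to check")

# To actually send it:
#     with urllib.request.urlopen(label_req) as resp:
#         result = json.loads(resp.read().decode("utf-8"))
```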

The response contains a list of labels (one per labeler configured in the endpoint). Each label includes:

  • The assessment, either CLEAR or FLAGGED. FLAGGED means that a violation was detected.
  • The label assigned by the policy (e.g. the violated category).
  • Additional output fields from the labeler, such as the reason for the result.
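Once the labels are parsed, acting on them can be as simple as checking for any FLAGGED assessment. The sketch below deliberately does not assume the JSON key names of the response, which this guide does not list: `LabelResult` is a hypothetical local type that you would populate from the actual response, and the "review" and "allow" outcomes are placeholders for your own actions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabelResult:
    """Local representation of one label; how it maps to response keys is up to you."""
    assessment: str                  # "CLEAR" or "FLAGGED"
    label: str                       # e.g. the violated category
    extra: Dict[str, str] = field(default_factory=dict)  # additional output fields


def flagged_labels(results: List[LabelResult]) -> List[str]:
    """Return the labels of every result whose assessment is FLAGGED."""
    return [r.label for r in results if r.assessment == "FLAGGED"]


def decide_action(results: List[LabelResult]) -> str:
    """Send content for review if any labeler flagged it; otherwise allow it."""
    return "review" if flagged_labels(results) else "allow"


# Example with two labelers configured on the endpoint (made-up values).
results = [
    LabelResult("CLEAR", "Other"),
    LabelResult("FLAGGED", "Scamming", {"reason": "asks for payment off-platform"}),
]
action = decide_action(results)  # "review", because one labeler flagged the content
```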

At this point, you can incorporate these results into your system as needed 🥳