PolicyAI API Integration Guide

Overview

The PolicyAI API is simple to integrate into your app wherever you need to check for policy violations. PolicyAI examines the provided content with our moderation LLM, identifies whether a policy violation occurred, and returns details about the violation so you can take action on the content. Common actions include taking the content down or flagging the associated user account for review by AiMod or a moderator.

With the API you can:

  • Create and edit versioned policies
  • Evaluate content against a policy version

We also provide a UI where you can:

  • Sign up for an account
  • Create, edit, manage, and test your policies
  • Access your account details

PolicyAI URL

UI: https://policyai.musubilabs.ai
API: https://api.musubilabs.ai/policyai/docs

Sign up for an account

To create an account, navigate to the UI where you’ll be prompted to sign up!

After you've created an account, you can view your account details on the Account page of the UI, including your organization memberships.

Organizations

When you create an account, your own personal organization is created for you in our system. Organizations are used by PolicyAI to determine who has access to what. Any policies or evaluation results that you create using your personal organization ID are accessible only to you!

API Authentication

When you use the API directly, you can authenticate using your API key like this:

GET https://api.musubilabs.ai/policyai/api/version
accept: application/json
Musubi-Api-Key: [your API key here]

You can grab your API key from the API Settings page of the UI.
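For example, here's a minimal Python sketch of the authenticated request above, using the requests library. The endpoint and headers come from this guide; the rest is just illustration:

import requests

API_KEY = "your API key here"  # from the API Settings page of the UI

resp = requests.get(
    "https://api.musubilabs.ai/policyai/api/version",
    headers={
        "accept": "application/json",
        "Musubi-Api-Key": API_KEY,
    },
)
resp.raise_for_status()  # raises if the key was rejected or the request failed
print(resp.json())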

Create your Policy

You can create and edit policies and policy versions on the Manage Policies page of the UI.

Each policy has one or more versions, and for each version you can configure the policy text, policy examples, and policy settings.

Tip

Any edits that you make to a policy in the UI are auto-saved!

Policy text

When you write your policy text, organize it into categories, and under each category include a list of the specific characteristics that constitute a violation of that category.

Example policy text defining three categories:

# Selling
- actively selling any service or product
- mentioning services available
- stating a price for a service or product with intent to sell it

# Scamming
- tricking someone into sending money or crypto or signing up for a scam service
- inviting someone to chat off the app is not a violation unless the contact info provided is suspicious

# Drugs
- selling or seeking regulated or controlled substances

Policy examples

You can provide reference examples for each category in your policy to give the AI a solid idea of what should and should not be marked as violations. For example:

# Scamming
Unsafe: "I need you to urgently send me money. It's an emergency."
Unsafe: "I can only talk on telegram. It's more secure than this app."
Safe: "I don't feel comfortable sharing personal info here. Here's my number."

# Drugs
Unsafe: "I've got that good stuff."
Unsafe: "💊❄️🧊🍁🍚"
Safe: "420 friendly."

# Selling
Unsafe: "My rates are [price] for [time duration]. Interested?"
Safe: "I sell handmade jewelry."
Safe: "I'm shopping for a $20 dress right now"

Policy settings

PolicyAI supports customizing each policy in several ways.

Model

Choose the model that the policy will use to check for violations. We'll provide guidance on which model we expect to work best for you for a given application, and you can experiment on your datasets as well.

Content types

PolicyAI currently supports these kinds of content:

  • Text
  • Image
  • Text + Image combined

Video and audio support are coming soon. You can configure a policy to apply only to certain content types, which is useful when a policy is written and tested for a particular content type. For example, you might define one policy for user profile images and another for post text.

Testing your policy

Once you've put together a policy, test individual text or image content against it to make sure that it surfaces violations as you would expect.

To test your policy against a curated dataset, you can upload a test dataset on the Manage Datasets page of the UI, and run the policy against that dataset on the Test Policies page of the UI.

Tagging your policy

Once your policy is ready, you can tag it and start using it to evaluate content.

Head to the Manage Policy Version Tags page to create a new tag pointing to your policy.

Tip

You can create any number of tags in your organization; we recommend keeping separate tags for different environments (dev, staging, prod). Changes to a tagged policy take effect immediately on save, so test new policy changes in a test environment with its own tag, then promote the change to the tag used in your production environment when ready.
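For instance, here's a minimal sketch of picking a tag per environment. The tag names and the APP_ENV variable are hypothetical, not part of the API:

import os

# One tag per environment keeps production isolated until you promote a change.
ENV_TAGS = {"dev": "my-policy-dev", "staging": "my-policy-staging", "prod": "my-policy-prod"}
policy_tag = ENV_TAGS[os.environ.get("APP_ENV", "dev")]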

Apply your policy

Once your tag is ready, you can use it to evaluate content via the API (see the sketch after this list). The response for each evaluation contains the following:

  • The assessment, either SAFE or UNSAFE. UNSAFE means that a violation was detected.
  • The severity level. The levels are:
    • 0: Safe (no violation)
    • 1: Low
    • 2: Medium
    • 3: High
  • The category that was violated, if the assessment is UNSAFE.
  • The reason for the evaluation result in plain text.
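Here's a hedged Python sketch of what an evaluation call and response handling might look like. The endpoint path, request fields, and exact response field names are assumptions for illustration only; see https://api.musubilabs.ai/policyai/docs for the real contract:

import requests

API_KEY = "your API key here"

# Hypothetical endpoint and payload shape, consult the API docs for the real contract.
resp = requests.post(
    "https://api.musubilabs.ai/policyai/api/evaluate",  # assumed path
    headers={
        "accept": "application/json",
        "Musubi-Api-Key": API_KEY,
    },
    json={
        "tag": "prod",           # the policy version tag you created
        "content_type": "text",  # the policy must be configured for this content type
        "content": "My rates are $50 for 30 minutes. Interested?",
    },
)
resp.raise_for_status()
result = resp.json()

if result["assessment"] == "UNSAFE":
    # Severity: 0 = safe, 1 = low, 2 = medium, 3 = high.
    print(result["category"], result["severity"], result["reason"])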

Note

Evaluated content is saved for 30 days by default.

At this point, you can incorporate these results into your system as needed 🥳