Diffbot Natural Language Processing

Bridge402 provides access to Diffbot's Natural Language Processing API through endpoints protected by x402 payments. Extract entities, sentiment, facts, and relationships from freeform raw text, paying per request with crypto.

Overview

The Natural Language API is a pre-trained classifier, named entity recognition model, sentence tokenizer, and sentiment analyzer rolled into a single service. It allows you to understand any piece of freeform raw text programmatically.

What it does:

  • Extract entities (e.g., people, organizations, products) and data about them (e.g., sentiment, relationships) from raw text
  • Analyze sentiment at both document and entity levels
  • Extract structured facts and relationships
  • Build knowledge graphs from text
  • Discover open-domain facts beyond predefined schemas

API Endpoint

Natural Language Processing

POST /diffbot/nl

Process raw text documents using Diffbot Natural Language API. Extract entities, sentiment, facts, and relationships from any freeform text.

Query Parameters:

Parameter | Type | Required | Description | Example
fields | string | No | Comma-separated list of fields to include | entities,sentiment,facts,records,sentences
network | string | No | Payment network preference | base or sol/solana

Default Fields: If not specified, the API returns: entities, sentiment, facts, records, sentences
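
The query parameters above can be assembled and checked before a request is sent. The following is a minimal sketch, not part of the Bridge402 API: `build_query_params` is a hypothetical helper, and treating the five documented field names and three network values as the complete accepted set is an assumption.

```python
# Field and network names are taken from the parameter table above.
# Treating them as the complete accepted set is an assumption.
ALLOWED_FIELDS = ("entities", "sentiment", "facts", "records", "sentences")
ALLOWED_NETWORKS = ("base", "sol", "solana")


def build_query_params(fields=None, network=None):
    """Return a dict of query parameters for POST /diffbot/nl (hypothetical helper)."""
    params = {}
    if fields:
        unknown = [name for name in fields if name not in ALLOWED_FIELDS]
        if unknown:
            raise ValueError(f"Unknown fields: {', '.join(unknown)}")
        # The API expects a single comma-separated string
        params["fields"] = ",".join(fields)
    if network:
        if network not in ALLOWED_NETWORKS:
            raise ValueError(f"Unknown network: {network}")
        params["network"] = network
    return params


print(build_query_params(["entities", "sentiment"], "sol"))
```

Omitting `fields` leaves the parameter out entirely, so the API falls back to its default field set.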

Request Body:

{
  "documents": [
    {
      "text": "Your raw text content here..."
    }
  ]
}

Headers:

Header | Type | Required | Description
X-PAYMENT | string | Yes* | Base64-encoded x402 payment data
Content-Type | string | Yes | application/json

*Required to process documents. If the header is omitted, the endpoint returns a payment invoice (HTTP 402) instead of results.

Features & Terminology

Entity. Anything in the real world. Example: Apple Inc, Steve Jobs.

Entity Type. A class of an entity. Example: organization, person. The list of entity types we support can be found in the Diffbot documentation.

Fact. A fact defines a relationship between entities (Apple Inc; founder; Steve Jobs) or an entity and a literal (Apple Inc; number of employees; 137,000).

Property. A property defines the relationship type (founder, number of employees) of a fact. The list of properties we support can be found in the Diffbot schema documentation.

Open Fact. Unlike a regular fact, an open fact does not follow a pre-defined list of properties. An open fact's property is extracted directly from the text. This enables new properties to be discovered. NOTE: This feature is currently disabled as we work to improve its capabilities.

Sentiment of a document. This value represents the overall sentiment of the text. It ranges from -1.0 (very negative) to 1.0 (very positive). Sentiment around 0.0 is considered neutral.

Sentiment of an entity. This value represents the sentiment of the text towards an entity. Example: "I love Apple products, but the iMac Pro is too pricey." is positive towards Apple and negative towards the iMac Pro.

Salience. This value helps answer the question: "What is this text mainly about?". Salience of 1.0 means the entity is the main topic of the document, while salience of 0.0 means that the entity is unnecessary to understand the document.
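
Salience can be used to pick out what a document is mainly about. A minimal sketch, assuming a document result shaped like the entries of the `data` array shown later on this page; `main_topics` and the 0.5 threshold are illustrative choices, not part of the API.

```python
def main_topics(doc_result, min_salience=0.5):
    """Return (name, salience) pairs for the most central entities, highest first."""
    entities = doc_result.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [
        (e["name"], e.get("salience", 0.0))
        for e in ranked
        if e.get("salience", 0.0) >= min_salience
    ]


# Hand-written sample in the shape of a document result
sample = {
    "entities": [
        {"name": "Apple Inc", "salience": 0.9},
        {"name": "Cupertino", "salience": 0.1},
        {"name": "Steve Jobs", "salience": 0.7},
    ]
}
print(main_topics(sample))
```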

Supported Languages

NLP feature support may vary with each language.

Feature | Languages Supported
Sentiment | Over 100 languages. View the full list
Entity | English (en), French (fr), Spanish (es), Chinese (zh), German (de), Russian (ru), Japanese (ja), Dutch (nl), Polish (pl), Norwegian (no), Danish (da), Swedish (sv), Italian (it)
Salience | English (en), French (fr), Spanish (es), Chinese (zh), German (de), Russian (ru), Japanese (ja), Dutch (nl), Polish (pl), Norwegian (no), Danish (da), Swedish (sv), Italian (it)
All Others (Facts, Open Facts, etc.) | English (en) only

Credit Usage & Limits

Credit Usage:

  • Each document consumes 1 credit up to 10,000 characters
  • Additional blocks of 10,000 characters consume 1 credit each

Limits:

  • Maximum of 100,000 characters per document
  • Maximum of 1,000,000 total characters per API request
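
The credit and limit rules above amount to a ceiling division plus two size checks. A minimal sketch under two assumptions: characters are counted with Python's `len` (the API's exact counting method is not stated here), and an empty document still costs one credit. The function names are hypothetical.

```python
import math

CHARS_PER_CREDIT = 10_000
MAX_CHARS_PER_DOCUMENT = 100_000
MAX_CHARS_PER_REQUEST = 1_000_000


def credits_for_document(text):
    """One credit per started block of 10,000 characters, minimum one (assumed)."""
    return max(1, math.ceil(len(text) / CHARS_PER_CREDIT))


def check_request(documents):
    """Validate the documented limits and return the total credits for a request."""
    total_chars = 0
    total_credits = 0
    for index, doc in enumerate(documents):
        length = len(doc["text"])
        if length > MAX_CHARS_PER_DOCUMENT:
            raise ValueError(f"Document {index} has {length} characters (limit {MAX_CHARS_PER_DOCUMENT})")
        total_chars += length
        total_credits += credits_for_document(doc["text"])
    if total_chars > MAX_CHARS_PER_REQUEST:
        raise ValueError(f"Request has {total_chars} characters (limit {MAX_CHARS_PER_REQUEST})")
    return total_credits


docs = [{"text": "a" * 25_000}, {"text": "short"}]
print(check_request(docs))
```

Running the check locally before paying avoids spending a request on input the API would reject.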

Request Examples

Get Payment Invoice (Without Payment)

curl -X POST "https://bridge402.tech/diffbot/nl?network=sol" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "text": "Sample text for invoice request"
      }
    ]
  }'

Response (402 Payment Required):

{
  "x402Version": 1,
  "error": "X-PAYMENT header is required",
  "accepts": [
    {
      "scheme": "exact",
      "network": "solana",
      "maxAmountRequired": "10000",
      "asset": "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v",
      "payTo": "BjxbJg48jQmoBLJnRunB1CMY5SZwvcUmnXCaWNeSXBei",
      "resource": "https://bridge402.tech/diffbot/nl",
      "description": "Diffbot Natural Language processing [Solana/USDC]",
      "mimeType": "application/json",
      "maxTimeoutSeconds": 120,
      "extra": {
        "product": "Bridge402 Diffbot — Natural Language Processing (Solana)",
        "extractionType": "nl",
        "feePayer": "2wKupLR9q6wXYppw8Gr2NvWxKBUqm4PPJKkQfoxHDBg4"
      }
    }
  ],
  "extractionType": "nl"
}
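
The invoice fields can be read programmatically before a payment is built. A minimal sketch: `summarize_invoice` is a hypothetical helper, and the conversion assumes the asset is USDC with 6 decimal places, which matches the pricing of 10,000 atomic units for $0.01.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # assumption: the invoice asset is USDC


def summarize_invoice(invoice, preferred_network=None):
    """Pick a payment option from a 402 response and convert the amount to USDC."""
    options = invoice.get("accepts", [])
    if not options:
        raise ValueError("Invoice contains no payment options")
    chosen = options[0]
    if preferred_network:
        for option in options:
            if option.get("network") == preferred_network:
                chosen = option
                break
    amount = Decimal(chosen["maxAmountRequired"]) / (Decimal(10) ** USDC_DECIMALS)
    return {
        "network": chosen["network"],
        "pay_to": chosen["payTo"],
        "amount_usdc": amount,
        "timeout_seconds": chosen["maxTimeoutSeconds"],
    }


# Trimmed copy of the 402 response shown above
invoice = {
    "x402Version": 1,
    "accepts": [
        {
            "scheme": "exact",
            "network": "solana",
            "maxAmountRequired": "10000",
            "payTo": "BjxbJg48jQmoBLJnRunB1CMY5SZwvcUmnXCaWNeSXBei",
            "maxTimeoutSeconds": 120,
        }
    ],
}
print(summarize_invoice(invoice, preferred_network="solana"))
```

`Decimal` is used instead of `float` so the converted amount stays exact.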

Process Documents with Payment

curl -X POST "https://bridge402.tech/diffbot/nl?fields=entities,sentiment,facts&network=sol" \
  -H "Content-Type: application/json" \
  -H "X-PAYMENT: <base64-encoded-x402-payment>" \
  -d '{
    "documents": [
      {
        "text": "Apple Inc was founded by Steve Jobs in 1976. The company revolutionized personal computing with the introduction of the Macintosh."
      }
    ]
  }'

Response (200 Success):

{
  "extractionType": "nl",
  "data": [
    {
      "errors": [],
      "entities": [
        {
          "name": "Apple Inc",
          "diffbotUri": "http://diffbot.com/entity/Apple_Inc",
          "confidence": 0.95,
          "salience": 0.9,
          "sentiment": 0.0,
          "allUris": ["http://diffbot.com/entity/Apple_Inc"],
          "allTypes": [
            {
              "name": "Organization",
              "diffbotUri": "http://diffbot.com/entity/Organization"
            }
          ],
          "mentions": [
            {
              "text": "Apple Inc",
              "beginOffset": 0,
              "endOffset": 9,
              "isPronoun": false,
              "confidence": 0.95
            }
          ]
        },
        {
          "name": "Steve Jobs",
          "diffbotUri": "http://diffbot.com/entity/Steve_Jobs",
          "confidence": 0.92,
          "salience": 0.7,
          "sentiment": 0.0,
          "allTypes": [
            {
              "name": "Person",
              "diffbotUri": "http://diffbot.com/entity/Person"
            }
          ]
        }
      ],
      "sentiment": 0.2,
      "facts": [
        {
          "humanReadable": "Apple Inc was founded by Steve Jobs",
          "entity": { "name": "Apple Inc" },
          "property": { "name": "founder" },
          "value": { "name": "Steve Jobs" },
          "confidence": 0.9,
          "evidence": [
            {
              "passage": "Apple Inc was founded by Steve Jobs in 1976."
            }
          ]
        }
      ],
      "records": [],
      "categories": {},
      "sentences": [
        {
          "beginOffset": 0,
          "endOffset": 50
        }
      ],
      "language": "en",
      "summary": "Apple Inc was founded by Steve Jobs in 1976 and revolutionized personal computing."
    }
  ],
  "payment": {
    "verified": true,
    "settled": true,
    "txHash": "5xK...",
    "network": "solana"
  },
  "metadata": {
    "provider": "Diffbot",
    "endpoint": "nl",
    "timestamp": 1703123456.789
  }
}

Response Format

The Natural Language API returns an array of processed documents. Each document in the data array contains:

Entities

{
  "entities": [
    {
      "name": "Entity Name",
      "diffbotUri": "http://diffbot.com/entity/Entity_Name",
      "confidence": 0.95,
      "salience": 0.8,
      "sentiment": 0.6,
      "allUris": ["http://diffbot.com/entity/Entity_Name"],
      "allTypes": [
        {
          "name": "Organization",
          "diffbotUri": "http://diffbot.com/entity/Organization"
        }
      ],
      "mentions": [
        {
          "text": "Entity",
          "beginOffset": 0,
          "endOffset": 6,
          "isPronoun": false,
          "confidence": 0.95
        }
      ],
      "location": {
        "latitude": 37.7749,
        "longitude": -122.4194
      }
    }
  ]
}
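
Entities can be filtered by type using `allTypes`. A minimal sketch, assuming the entity shape shown above; `entities_of_type` is a hypothetical helper. It reads keys with `.get` because the sample 200 response earlier on this page shows that some keys (for example `mentions`) are not present on every entity.

```python
def entities_of_type(doc_result, type_name):
    """Return the names of entities whose allTypes list includes type_name."""
    matches = []
    for entity in doc_result.get("entities", []):
        type_names = {t.get("name") for t in entity.get("allTypes", [])}
        if type_name in type_names:
            matches.append(entity["name"])
    return matches


# Hand-written sample in the shape of a document result
sample = {
    "entities": [
        {"name": "Apple Inc", "allTypes": [{"name": "Organization"}]},
        {"name": "Steve Jobs", "allTypes": [{"name": "Person"}]},
        {"name": "Untyped"},
    ]
}
print(entities_of_type(sample, "Person"))
```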

Sentiment

{
  "sentiment": 0.5  // Document-level sentiment (-1.0 to 1.0)
}

Facts

{
  "facts": [
    {
      "humanReadable": "Entity relationship description",
      "entity": { "name": "Entity Name" },
      "property": { "name": "propertyName" },
      "value": { "name": "Value Name" },
      "confidence": 0.9,
      "evidence": [
        {
          "passage": "Text passage where fact was found"
        }
      ],
      "entityMentions": [...],
      "valueMentions": [...]
    }
  ]
}
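
Because each fact is an (entity, property, value) triple, a list of facts maps directly onto a small knowledge graph. A minimal sketch, assuming every fact carries `entity`, `property`, and `value` objects with a `name` key as in the example above; `facts_to_graph` and the 0.5 confidence cutoff are illustrative choices.

```python
from collections import defaultdict


def facts_to_graph(doc_result, min_confidence=0.5):
    """Group facts by subject entity as {subject: [(property, value), ...]}."""
    graph = defaultdict(list)
    for fact in doc_result.get("facts", []):
        if fact.get("confidence", 0.0) < min_confidence:
            continue  # skip low-confidence facts
        subject = fact["entity"]["name"]
        prop = fact["property"]["name"]
        value = fact["value"]["name"]
        graph[subject].append((prop, value))
    return dict(graph)


# Hand-written sample in the shape of a document result
sample = {
    "facts": [
        {
            "entity": {"name": "Apple Inc"},
            "property": {"name": "founder"},
            "value": {"name": "Steve Jobs"},
            "confidence": 0.9,
        },
        {
            "entity": {"name": "Apple Inc"},
            "property": {"name": "location"},
            "value": {"name": "Cupertino"},
            "confidence": 0.3,
        },
    ]
}
print(facts_to_graph(sample))
```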

Records

{
  "records": [
    {
      // Entities with attributes extracted according to KG schema
      // See https://docs.diffbot.com/docs/en/kg-ont-diffbotentity
    }
  ]
}

Sentences

{
  "sentences": [
    {
      "beginOffset": 0,
      "endOffset": 50
    }
  ]
}
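
Sentence offsets can be used to slice the original text. A minimal sketch, assuming `beginOffset` and `endOffset` are zero-based character positions with an exclusive end; this page does not state the convention, so verify it against a real response. `sentence_texts` is a hypothetical helper.

```python
def sentence_texts(text, doc_result):
    """Slice the submitted text into sentences using the returned offsets."""
    return [
        text[s["beginOffset"]:s["endOffset"]]
        for s in doc_result.get("sentences", [])
    ]


# Offsets are computed from the strings so the sample stays consistent
first = "Apple Inc was founded by Steve Jobs in 1976."
second = "The company grew."
text = first + " " + second
sample = {
    "sentences": [
        {"beginOffset": 0, "endOffset": len(first)},
        {"beginOffset": len(first) + 1, "endOffset": len(text)},
    ]
}
print(sentence_texts(text, sample))
```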

Integration Examples

Python Example

import asyncio
import httpx

async def process_natural_language(documents, payment_data, fields=None):
    """Process documents using Diffbot Natural Language API with x402 payment"""
    async with httpx.AsyncClient() as client:
        params = {"network": "sol"}
        if fields:
            params["fields"] = ",".join(fields)

        response = await client.post(
            "https://bridge402.tech/diffbot/nl",
            params=params,
            headers={
                "X-PAYMENT": payment_data,
                "Content-Type": "application/json"
            },
            json={"documents": documents}
        )

        if response.status_code == 200:
            data = response.json()
            return data
        elif response.status_code == 402:
            # Payment required - get invoice
            invoice = response.json()
            print(f"Payment required: {invoice['accepts'][0]['maxAmountRequired']} atomic units")
            return invoice
        else:
            raise Exception(f"Request failed: {response.status_code} - {response.text}")

# Usage: await is only valid inside a coroutine, so wrap the call in main()
async def main():
    documents = [
        {"text": "Bitcoin reached a new all-time high today. Ethereum network upgrade scheduled for next month."}
    ]

    result = await process_natural_language(
        documents,
        "<your-x402-payment>",
        fields=["entities", "sentiment", "facts"]
    )

    # A 402 invoice has no "data" key, so the loop is skipped in that case
    for doc_result in result.get("data", []):
        print(f"Sentiment: {doc_result.get('sentiment')}")
        print(f"Entities: {len(doc_result.get('entities', []))}")
        print(f"Facts: {len(doc_result.get('facts', []))}")

asyncio.run(main())

JavaScript/Node.js Example

import { request } from 'undici';

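async function processNaturalLanguage(documents, paymentData, fields = null) {
    // Build the query string with URLSearchParams so values are encoded correctly
    const params = new URLSearchParams({ network: 'sol' });
    if (Array.isArray(fields) && fields.length > 0) {
        params.set('fields', fields.join(','));
    }

    const res = await request(`https://bridge402.tech/diffbot/nl?${params}`, {
        method: 'POST',
        headers: {
            'X-PAYMENT': paymentData,
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({ documents })
    });

    const data = await res.body.json();

    // Any non-200 response (including a 402 invoice) is raised as an error,
    // so callers can always treat the return value as an array
    if (res.statusCode !== 200 || !Array.isArray(data.data)) {
        throw new Error(`Request failed: ${res.statusCode} - ${JSON.stringify(data)}`);
    }

    return data.data.map(doc => ({
        sentiment: doc.sentiment,
        entities: doc.entities || [],
        facts: doc.facts || [],
        language: doc.language
    }));
}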

// Usage
const documents = [
    {
        text: 'Bitcoin reached a new all-time high today. Ethereum network upgrade scheduled for next month.'
    }
];

const results = await processNaturalLanguage(
    documents,
    '<your-x402-payment>',
    ['entities', 'sentiment', 'facts']
);

results.forEach((result, index) => {
    console.log(`Document ${index + 1}:`);
    console.log(`  Sentiment: ${result.sentiment}`);
    console.log(`  Entities: ${result.entities.length}`);
    console.log(`  Facts: ${result.facts.length}`);
});

Using the Bridge402 SDK

import { DiffbotClient } from '@bridge402/sdk';
import { Keypair } from '@solana/web3.js';

// Load your wallet
const wallet = Keypair.fromSecretKey(/* your keypair */);

// Create Diffbot client
const client = new DiffbotClient({
  wallet: wallet,
  baseUrl: 'https://bridge402.tech',
  network: 'sol'
});

// Process documents with Natural Language API
const documents = [
  {
    text: 'Bitcoin reached a new all-time high today. Ethereum network upgrade scheduled for next month.'
  }
];

const result = await client.extractNaturalLanguage(
  documents,
  ['entities', 'sentiment', 'facts'] // Optional fields
);

// Access results
result.data.forEach((doc, index) => {
  console.log(`Document ${index + 1}:`);
  console.log(`  Sentiment: ${doc.sentiment}`);
  console.log(`  Entities: ${doc.entities.length}`);
  console.log(`  Facts: ${doc.facts.length}`);

  // Access specific entities
  doc.entities.forEach(entity => {
    console.log(`  - ${entity.name} (${entity.allTypes[0]?.name})`);
  });
});

Use Cases

Sentiment Analysis

Analyze sentiment of user reviews, social media posts, or customer feedback:

documents = [
    {"text": "I love this product! It's amazing and works perfectly."},
    {"text": "Terrible quality. Would not recommend to anyone."}
]

# Run inside an async function; process_natural_language is defined in the Python example above
result = await process_natural_language(documents, payment_data, ["sentiment"])

for doc in result["data"]:
    sentiment = doc["sentiment"]
    if sentiment > 0.5:
        print("Positive review")
    elif sentiment < -0.5:
        print("Negative review")
    else:
        print("Neutral review")

Entity Extraction

Extract and identify entities from news articles or documents:

const documents = [
    {
        text: "Apple Inc announced a new iPhone model. CEO Tim Cook presented the device at the company's headquarters in Cupertino."
    }
];

const result = await processNaturalLanguage(documents, paymentData, ['entities']);

result[0].entities.forEach(entity => {
    console.log(`${entity.name} - ${entity.allTypes[0]?.name}`);
    console.log(`  Salience: ${entity.salience}`);
    console.log(`  Sentiment: ${entity.sentiment}`);
});

Fact Extraction

Extract structured facts and relationships:

documents = [
    {
        "text": "Apple Inc was founded by Steve Jobs in 1976. The company is headquartered in Cupertino, California."
    }
]

# Run inside an async function; process_natural_language is defined in the Python example above
result = await process_natural_language(documents, payment_data, ["facts"])

for fact in result["data"][0]["facts"]:
    print(fact["humanReadable"])
    print(f"  Entity: {fact['entity']['name']}")
    print(f"  Property: {fact['property']['name']}")
    print(f"  Value: {fact['value']['name']}")

Multi-Document Processing

Process multiple documents in a single request:

const documents = [
    { text: "First document about Bitcoin..." },
    { text: "Second document about Ethereum..." },
    { text: "Third document about Solana..." }
];

const result = await processNaturalLanguage(
    documents,
    paymentData,
    ['entities', 'sentiment', 'facts']
);

// Process each document result
result.forEach((docResult, index) => {
    console.log(`Document ${index + 1}:`);
    console.log(`  Sentiment: ${docResult.sentiment}`);
    console.log(`  Entities: ${docResult.entities.length}`);
});
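
When a corpus is larger than one request allows, documents can be packed into batches that stay under the documented limit of 1,000,000 total characters per request. A minimal greedy sketch, assuming characters are counted with `len`; `batch_documents` is a hypothetical helper, not part of the Bridge402 SDK.

```python
MAX_CHARS_PER_REQUEST = 1_000_000


def batch_documents(documents, max_chars=MAX_CHARS_PER_REQUEST):
    """Greedily pack documents, in order, into batches of at most max_chars characters."""
    batches = []
    current = []
    current_chars = 0
    for doc in documents:
        length = len(doc["text"])
        if length > max_chars:
            raise ValueError(f"Document of {length} characters cannot fit in any batch")
        # Start a new batch when adding this document would exceed the limit
        if current and current_chars + length > max_chars:
            batches.append(current)
            current = []
            current_chars = 0
        current.append(doc)
        current_chars += length
    if current:
        batches.append(current)
    return batches


# A small limit makes the behaviour easy to see
docs = [{"text": "aaaa"}, {"text": "bbbb"}, {"text": "cccc"}, {"text": "d" * 10}]
for batch in batch_documents(docs, max_chars=10):
    print([len(d["text"]) for d in batch])
```

Each batch is then sent as its own request, so fewer, fuller batches mean fewer payments.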

Error Handling

Common Errors

400 Bad Request

{
  "detail": "documents must be a non-empty array"
}

402 Payment Required

{
  "x402Version": 1,
  "error": "X-PAYMENT header is required",
  "accepts": [...]
}

500 Internal Server Error

  • Diffbot API may be unavailable
  • Invalid document format
  • Retry the request

502 Bad Gateway

  • Upstream Diffbot API error
  • Verify Diffbot API key is configured on the server
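
Retrying on 500 and 502 can be wrapped in a small helper with exponential backoff. A minimal sketch: `call_with_retry` is hypothetical and takes any zero-argument `send` callable that returns a `(status_code, body)` pair, so it does not depend on a particular HTTP client. Whether the same `X-PAYMENT` value can be reused after a failed attempt is not stated on this page; check that before retrying a paid request.

```python
import time

RETRYABLE_STATUS = {500, 502}


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))


# Simulated responses stand in for real HTTP calls
simulated = iter([(500, "error"), (502, "bad gateway"), (200, "ok")])
waits = []
print(call_with_retry(lambda: next(simulated), sleep=waits.append))
print(waits)
```

Responses such as 400 and 402 are returned immediately, since repeating the same request would not change the outcome.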

Best Practices

  1. Batch Processing: Process multiple documents in a single request to reduce API calls
  2. Field Selection: Only request fields you need to reduce response size and processing time
  3. Character Limits: Stay within 100,000 characters per document and 1,000,000 total per request
  4. Error Handling: Always handle 402 responses to get payment requirements
  5. Network Selection: Choose network based on your wallet capabilities (Base or Solana)
  6. Caching: Cache results for identical text inputs to avoid redundant processing
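
The caching practice above can be sketched with an in-memory dictionary keyed by a hash of the text and the requested fields. `ResultCache` and `cache_key` are hypothetical names, and `compute` stands in for whatever function performs the paid request. A production setup would likely add persistent storage and expiry.

```python
import hashlib
import json


def cache_key(text, fields):
    """Stable key for a (text, fields) pair; field order does not matter."""
    payload = json.dumps({"text": text, "fields": sorted(fields)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory cache that only calls compute() for inputs it has not seen."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, text, fields, compute):
        key = cache_key(text, fields)
        if key not in self._store:
            self._store[key] = compute(text, fields)
        return self._store[key]


# fake_request stands in for a paid API call
calls = []

def fake_request(text, fields):
    calls.append(text)
    return {"sentiment": 0.1}

cache = ResultCache()
cache.get_or_compute("hello", ["sentiment", "entities"], fake_request)
cache.get_or_compute("hello", ["entities", "sentiment"], fake_request)
print(len(calls))
```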

Pricing

  • Cost: $0.01 USDC per request (10,000 atomic units)
  • Payment Networks: Base or Solana (USDC)
  • No Subscription Required: Pay-per-use model perfect for AI agents and intermittent access
  • Credit Usage: 1 credit per 10,000 characters (additional blocks consume 1 credit each)

Support

For questions about Natural Language Processing or integration help, refer to:

  • Payment Integration Guide
  • Diffbot Extraction: for URL-based content extraction
  • Contact the Bridge402 development team