Pelaikan API Documentation

Basics

The production API base is currently:

https://pelaikan-api-772587166095.europe-west9.run.app

Firebase Hosting also rewrites /api/** to the Cloud Run service, so website calls can use same-origin paths such as /api/web/audit/resolve.

Authentication

Every main endpoint requires an access token:

Authorization: Bearer <access_token>

The walkthrough token is a short-lived HS256 JWT with issuer footcheck, audience footcheck-api, a subject user id, a client id, and roles such as api or user.

For API access with a bearer token, email pelekandc@gmail.com.
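
Because the token is short-lived, a client may want to inspect it before starting a long request. The following is a minimal sketch, not part of the API: `decodeJwtClaims` and `isExpired` are hypothetical helper names, and the presence of a standard `exp` claim is an assumption. The payload is decoded without verifying the signature, so it is suitable only for client-side inspection.

```javascript
// Decode the payload segment of a JWT WITHOUT verifying its signature.
// Hypothetical helper for client-side inspection only.
function decodeJwtClaims(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a three-part JWT");
  // JWT segments are base64url-encoded; convert to standard base64 first.
  const base64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Assumption: the token carries a standard `exp` claim (seconds since epoch).
// Returns false when no numeric `exp` is present.
function isExpired(claims, nowMs = Date.now()) {
  return typeof claims.exp === "number" && claims.exp * 1000 <= nowMs;
}
```

If `isExpired` returns true, request a fresh token before calling the expensive endpoints.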

Common Fields

document_id: stable id used for OCR, parsing, resolution, and result caches.
data_source: usually internet. folder is for desktop bridge clients with a local database.
internet_policy: off, smart, or on. Hosted /web/* resolution behaves as on.
client_id: normally read from the token; only pass it for bridge-style desktop operations.

Health and Identity

GET

/version

Returns the backend version payload. This endpoint is useful for smoke checks and does not use the full workflow body.

GET

/session/whoami

Returns token claims such as sub, cid, and roles. Use it to verify that the bearer token is accepted before running expensive steps.
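
A token check along these lines can run before any expensive step. This is a sketch under assumptions: `checkToken` and its `fetchImpl` parameter are hypothetical names (the latter exists only so the function can be exercised without a network), and the response is assumed to be JSON.

```javascript
// Hypothetical helper: call GET /session/whoami and return the claims payload.
// `fetchImpl` defaults to the global fetch and can be swapped out for testing.
async function checkToken({ apiBase, accessToken, fetchImpl = fetch }) {
  const response = await fetchImpl(`${apiBase}/session/whoami`, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) {
    // 401 means the bearer token is missing, invalid, or has no user id.
    throw new Error(`whoami failed with HTTP ${response.status}`);
  }
  return response.json();
}
```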

One-Go Endpoint

Use POST /v1/verify_pdf when you want the server to run OCR, parse citations, resolve sources, and analyze congruence in one request. The response is a Server-Sent Events stream, so clients receive progress and result rows as work completes.

POST

/v1/verify_pdf

Input	`multipart/form-data` with required `file` PDF, required `document_id`, and optional `config` as JSON text.
Output	`text/event-stream`. Events are `start`, `progress`, `row`, `complete`, and `error`.
Auth	Requires `Authorization: Bearer <access_token>`. The token subject is used for document registration, OCR cache, and credit checks. To request a bearer token, email pelekandc@gmail.com.

Config

The config form field is JSON. You can also fetch the live schema from GET /v1/verify_pdf/config-schema.

Field	Default	Meaning
`allowed_kinds`	`null`	Source kinds to verify, for example `["precedent","legal_norm"]`. `null` verifies all kinds.
`skip_cross_references`	`true`	Skip references like `ibid.` or `see supra`.
`require_claim`	`true`	Skip citations without an extracted claim.
`skip_dry`	`false`	Skip bare/dry citations if set to `true`.
`include_footnotes`	`true`	Parse and verify footnote citations.
`include_inline`	`true`	Parse and verify inline citations from page body text.
`inline_pages`	`null`	Optional 1-based page numbers to parse inline. `null` means all pages.
`congruence_method`	`micro_tasks`	`micro_tasks` or `standard`.
`internet_policy`	`smart`	`off`, `smart`, or `on`.
`concurrency`	`4`	Internal parallelism, clamped server-side to the endpoint cap.
`max_footnotes`	`null`	Optional cost-control cap on parsed footnotes.
`max_inline`	`null`	Optional cost-control cap on inline citations.
`force_ocr`	`false`	Force OCR even if a cached result exists.
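
The defaults in the table above can be mirrored client-side to catch typos before uploading a PDF. The sketch below is illustrative only: `VERIFY_PDF_DEFAULTS` and `buildVerifyConfig` are hypothetical names, and rejecting unknown keys is a client-side choice, not documented server behavior. The live schema from GET /v1/verify_pdf/config-schema remains the authority.

```javascript
// Defaults copied from the config table; the server's live schema is authoritative.
const VERIFY_PDF_DEFAULTS = Object.freeze({
  allowed_kinds: null,
  skip_cross_references: true,
  require_claim: true,
  skip_dry: false,
  include_footnotes: true,
  include_inline: true,
  inline_pages: null,
  congruence_method: "micro_tasks",
  internet_policy: "smart",
  concurrency: 4,
  max_footnotes: null,
  max_inline: null,
  force_ocr: false,
});

// Hypothetical helper: merge overrides into the defaults, run a few local
// sanity checks, and return the JSON text for the `config` form field.
function buildVerifyConfig(overrides = {}) {
  for (const key of Object.keys(overrides)) {
    if (!(key in VERIFY_PDF_DEFAULTS)) throw new Error(`unknown config field: ${key}`);
  }
  const config = { ...VERIFY_PDF_DEFAULTS, ...overrides };
  if (!["micro_tasks", "standard"].includes(config.congruence_method)) {
    throw new Error("congruence_method must be micro_tasks or standard");
  }
  if (!["off", "smart", "on"].includes(config.internet_policy)) {
    throw new Error("internet_policy must be off, smart, or on");
  }
  if (config.inline_pages !== null &&
      !(Array.isArray(config.inline_pages) &&
        config.inline_pages.every((p) => Number.isInteger(p) && p >= 1))) {
    throw new Error("inline_pages must be null or an array of 1-based page numbers");
  }
  return JSON.stringify(config);
}
```

The returned string can be passed directly as the `config` form field, for example `form.append("config", buildVerifyConfig({ max_footnotes: 20 }))`.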

cURL

curl -N -X POST "$API_BASE/v1/verify_pdf" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -F "file=@brief.pdf;type=application/pdf" \
  -F "document_id=brief-2026-06-05" \
  -F 'config={
    "allowed_kinds": ["precedent", "legal_norm"],
    "include_footnotes": true,
    "include_inline": true,
    "inline_pages": [1, 2, 3],
    "max_footnotes": 20,
    "max_inline": 30,
    "concurrency": 4
  }'

JavaScript

async function verifyPdf({ apiBase, accessToken, file, documentId }) {
  const form = new FormData();
  form.append("file", file);
  form.append("document_id", documentId);
  form.append("config", JSON.stringify({
    allowed_kinds: ["precedent", "legal_norm"],
    include_footnotes: true,
    include_inline: true,
    require_claim: true,
    concurrency: 4,
  }));

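  const response = await fetch(`${apiBase}/v1/verify_pdf`, {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}` },
    body: form,
  });
  // A non-2xx response (for example 401) is not an event stream, so fail early.
  if (!response.ok) {
    throw new Error(`verify_pdf failed with HTTP ${response.status}`);
  }

  const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader();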

  let buffer = "";
  const rows = [];
  let summary = null;

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;

    const chunks = buffer.split("\n\n");
    buffer = chunks.pop() || "";

    for (const chunk of chunks) {
      const event = chunk.match(/^event: (.+)$/m)?.[1];
      const dataLine = chunk.match(/^data: (.+)$/m)?.[1];
      if (!event || !dataLine) continue;
      const data = JSON.parse(dataLine);

      if (event === "row") rows.push(data);
      if (event === "complete") summary = data.summary;
      if (event === "error") throw new Error(data.detail || data.error);
    }
  }

  return { rows, summary };
}

Stream Events

Event	Payload
`start`	`document_id`, page count, footnote count, `ocr_cached`, elapsed seconds.
`progress`	`stage`, `completed`, and `total`. Stages include `parse_footnotes`, `parse_inline`, and `resolve_analyze`.
`row`	One verified citation row with `kind`, `footnote_id`, `page_number`, `cite_id`, `cite_raw`, `source_kind`, `claim`, resolution fields, `congruence_label`, `calibration`, `quote_label`, and truncated `rationale`.
`complete`	All `rows`, plus `summary.total`, `summary.by_label`, and elapsed seconds.
`error`	`error` and optional `detail`. Insufficient credit is reported here when the initial balance check fails.

This endpoint is synchronous from the client's point of view, but it streams progress to avoid holding a silent long-running request. Keep the HTTP connection open until the complete event arrives.
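
Network chunks do not necessarily align with event boundaries, so a client has to buffer text until a blank line closes an event. The sketch below shows one way to do that incrementally. `createSseParser` is a hypothetical helper name, and it assumes every `data:` payload is JSON, as the events listed above suggest.

```javascript
// Hypothetical incremental parser for a text/event-stream body.
// Call the returned function with each decoded text chunk; `onEvent`
// receives (eventName, parsedJsonData) once per completed event.
function createSseParser(onEvent) {
  let buffer = "";
  return function push(text) {
    // Normalize CRLF on the whole buffer so a "\r\n" split across chunks is handled.
    buffer = (buffer + text).replace(/\r\n/g, "\n");
    let boundary;
    while ((boundary = buffer.indexOf("\n\n")) !== -1) {
      const block = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      let event = "message"; // SSE default when no event: line is present
      const dataLines = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
      }
      // Multiple data: lines in one event are joined with newlines.
      if (dataLines.length > 0) onEvent(event, JSON.parse(dataLines.join("\n")));
    }
  };
}
```

A caller would feed it each value read from the response stream and react to `row`, `complete`, and `error` inside `onEvent`.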

Workflow

The API follows the same four-stage split as the root walkthrough notebook.

1. Upload a PDF and extract pages plus footnotes.
2. Parse citations from a footnote or from inline page text.
3. Resolve one citation to an actual source.
4. Analyze whether the source supports the claim.

curl -X POST "$API_BASE/web/audit/resolve" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "citation": { "...": "one parsed citation" },
    "document_id": "example-document",
    "data_source": "internet",
    "internet_policy": "smart",
    "parsed_citations": []
  }'
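
The four stages can be chained for a single citation roughly as follows. This is a sketch under stated assumptions, not a reference client: `auditFirstCitation` and `fetchImpl` are hypothetical names, `footnotes` is assumed to be an array whose items can be passed back as `footnote`, and the top-level `claim` from the parse step is assumed to be the claim to analyze. The endpoint paths and field names come from the Web App Endpoints section.

```javascript
// Hypothetical walk through the four stages for the first citation of the
// first footnote. Returns the analyze response, or null if nothing was found.
async function auditFirstCitation({ apiBase, accessToken, file, documentId, fetchImpl = fetch }) {
  const auth = { Authorization: `Bearer ${accessToken}` };

  async function postJson(path, body) {
    const response = await fetchImpl(`${apiBase}${path}`, {
      method: "POST",
      headers: { ...auth, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!response.ok) throw new Error(`${path} failed with HTTP ${response.status}`);
    return response.json();
  }

  // Stage 1: upload the PDF, run OCR, and extract footnotes.
  const form = new FormData();
  form.append("file", file);
  form.append("document_id", documentId);
  const upload = await fetchImpl(`${apiBase}/web/pdf/footnotes_md`, {
    method: "POST",
    headers: auth,
    body: form,
  });
  if (!upload.ok) throw new Error(`/web/pdf/footnotes_md failed with HTTP ${upload.status}`);
  const { footnotes } = await upload.json();
  if (!Array.isArray(footnotes) || footnotes.length === 0) return null;
  const footnote = footnotes[0];

  // Stage 2: parse structured citations from that footnote.
  const parsed = await postJson("/web/audit/parse_citations", {
    footnote,
    document_id: documentId,
  });
  const citation = parsed.citations?.[0];
  if (!citation) return null;

  // Stage 3: resolve the citation to a source.
  const resolved = await postJson("/web/audit/resolve", {
    citation,
    document_id: documentId,
    footnote,
    parsed_citations: parsed.citations,
  });

  // Stage 4: check whether the resolved source supports the claim.
  return postJson("/web/audit/analyze", {
    citation,
    resolve_result: resolved.result,
    claim: parsed.claim,
    document_id: documentId,
  });
}
```

A real client would loop over every footnote and citation and handle HTTP 402 from the analyze step; the sketch stops at the first of each to keep the data flow visible.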

Web App Endpoints

Use these endpoints for the Firebase-hosted web app and for normal HTTP API clients. They store OCR and verification cache data in cloud storage under the authenticated user.

POST

/web/pdf/footnotes_md

Runs OCR on a PDF, caches OCR in cloud storage, registers the document, and extracts footnotes.

Input	`multipart/form-data`: `file` PDF, required `document_id`, optional `document_path`, optional `force_ocr` string `true`/`false`.
Output	`headings`, `footnotes`, `pages`, `cached`, `document_id`, and `ocr_config`. Each page includes `page_number`, full OCR `markdown`, and `main_text` with footnote definitions removed.

POST

/web/audit/parse_citations

Parses structured citations from one extracted footnote and caches the parse result.

Input	JSON with `footnote`, required `document_id`, optional `notes`, `data_source`, `internet_policy`, and `client_id`.
Output	`status`, normalized `footnote`, `citations`, `claim`, and `cached`. Each citation includes an id, source object, raw citation text, and often an extracted claim.

POST

/web/audit/parse_inline

Parses citations from page body text for documents that cite inline instead of in footnotes. The response is streamed as Server-Sent Events.

Input	JSON with `pages`, required `document_id`, optional `document_path`, and `data_source`. Page objects usually include `page_number`, `text`, `context`, `prev_context`, `rich_text`, and `reference_id`.
Output	`text/event-stream` progress events. The final `complete` event contains `results`, `cached_count`, `fresh_count`, `total_count`, and `inline_profile`.

POST

/web/audit/parse_selection

Parses user-selected text as a manual inline citation.

Input	`selected_text`, `page_number`, `page_text`, required `document_id`, optional surrounding page text, `heading_id`, and `data_source`.
Output	`status`, one parsed `citation` with manual metadata, a `reference_id`, and a message.

POST

/web/audit/resolve

Resolves one parsed citation to a source using citation-specific APIs, web search, and the cloud library.

Input	JSON with required `citation` and `document_id`. Optional `data_source`, `internet_policy`, `notes`, `footnote`, and `parsed_citations`. For inline citations, omit `footnote` and pass sibling citations from the same page.
Output	`status`, `result`, and `cached`. The result can include `file_id`, `method`, `score`, `url`, `source_type`, agent reasoning fields, `failure_reason`, and `failed_urls`.

POST

/web/audit/analyze

Checks whether the resolved source supports the citation claim. This is the verdict step. Web app calls always run a fresh analysis; the result is then cached so the document can be reopened later.

Input	JSON with `citation`, `resolve_result`, `claim`, and `document_id`. Optional `congruence_method` defaults to `micro_tasks`; optional `footnote` and `parsed_citations` add context.
Output	`status`, `evidence_snips`, `quote_result`, `quote_label`, `congruence_result`, `cite`, `citation_type`, and `source_url`. Insufficient credit can return HTTP `402`.

GET

/web/documents/{document_id}/cached-results

Returns saved parse, resolve, congruence, and inline results for restoring UI state.

Input	`document_id` path parameter.
Output	`status`, `document_id`, `cached_results`, `footnote_count`, `inline_page_count`, and `inline_parsed`.

GET

/web/proxy-pdf?url=...

Fetches an external PDF server-side for authenticated users when the browser cannot load it because of CORS. Only public HTTP(S) hosts are allowed, redirects are revalidated, and responses are size-limited.

Input	`url` query parameter pointing to a PDF.
Output	`application/pdf` bytes, or an HTTP error if the URL is unsafe, too large, unreachable, or not a PDF.
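
The target address has to be percent-encoded when it is placed in the `url` query parameter, otherwise its own query string would be misread as parameters of the proxy call. A small sketch, with `proxyPdfUrl` as a hypothetical helper name; the client-side scheme check merely mirrors the documented HTTP(S) restriction and does not replace the server's validation.

```javascript
// Hypothetical helper: build a /web/proxy-pdf request URL for an external PDF.
function proxyPdfUrl(apiBase, pdfUrl) {
  const target = new URL(pdfUrl); // throws a TypeError on malformed input
  if (target.protocol !== "http:" && target.protocol !== "https:") {
    throw new Error("only http(s) URLs can be proxied");
  }
  // encodeURIComponent keeps the target's own ?, &, and = out of the outer query.
  return `${apiBase}/web/proxy-pdf?url=${encodeURIComponent(target.href)}`;
}
```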

Documents, Reports, and Cached Files

These endpoints manage the authenticated user's document registry, generated reports, OCR cache files, and database artifacts.

Endpoint	Input	Output
`GET /documents?limit=50&order_by=last_accessed_at`	Optional query parameters.	`documents` list and `total`.
`GET /documents/stats`	No body.	`document_count`, `total_reports`, `total_db_size_bytes`.
`GET /documents/{document_id}`	Document id path parameter.	One document record with name, counts, cache flags, and timestamps.
`GET /documents/{document_id}/open-data`	Document id path parameter.	Reconstructed OCR pages, footnotes, cached results, and document metadata for reopening a web document.
`GET /documents/{document_id}/reports`	Document id plus optional `limit`.	`reports` list and `total`.
`POST /documents/reports`	`document_id`, `report_name`, optional `report_url`, `report_type`, and citation counts.	Registered report metadata.
`DELETE /documents/{document_id}`	Document id path parameter.	Deletion status and storage cleanup details.
`DELETE /documents/{document_id}/reports/{report_id}`	Document and report ids.	Deletion status and storage cleanup details.
`GET /documents/{document_id}/ocr`	Document id path parameter.	List of cached OCR file names.
`GET /documents/{document_id}/ocr/{pdf_filename}`	Document id and PDF file name.	Cached OCR pages for that PDF.
`POST /documents/{document_id}/ocr/{pdf_filename}`	OCR cache JSON body.	Upload status and storage blob path.
`DELETE /documents/{document_id}/ocr/{pdf_filename}`	Document id and PDF file name.	Deletion status.
`GET /documents/{document_id}/ocr/{pdf_filename}/download-url`	Optional `expiration_minutes`.	Signed or token-backed download URL.
`GET /documents/{document_id}/database/download-url`	Optional `expiration_minutes`.	Download URLs and sizes for `sources.db` and `sources.faiss`.
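
Several of these routes take query parameters or a file name in the path. The sketch below builds such URLs safely; `documentsUrl` and `ocrCacheUrl` are hypothetical helper names, and percent-encoding the path segments is ordinary URL hygiene rather than a documented requirement.

```javascript
// Hypothetical helper: URL for GET /documents with optional query parameters.
function documentsUrl(apiBase, { limit, orderBy } = {}) {
  const params = new URLSearchParams();
  if (limit !== undefined) params.set("limit", String(limit));
  if (orderBy !== undefined) params.set("order_by", orderBy);
  const query = params.toString();
  return `${apiBase}/documents${query ? `?${query}` : ""}`;
}

// Hypothetical helper: URL for the OCR cache routes of one PDF.
// Path segments are percent-encoded so spaces or slashes in names stay intact.
function ocrCacheUrl(apiBase, documentId, pdfFilename) {
  return `${apiBase}/documents/${encodeURIComponent(documentId)}/ocr/${encodeURIComponent(pdfFilename)}`;
}
```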

Batch Endpoints

Endpoint	Input	Output
`POST /documents/batches`	`items` array of `document_id`, optional `data_folder`, and `inline_citations`; optional `metadata`.	Created batch with item statuses and status counts.
`GET /documents/batches`	Optional `limit`.	Recent batch summaries.
`GET /documents/batches/{batch_id}`	Batch id.	Batch status with per-item progress.
`GET /documents/batches/{batch_id}/results`	Batch id.	Batch result item records.
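
A request body for POST /documents/batches could be assembled as in the sketch below. `buildBatchRequest` is a hypothetical helper name. The type of `inline_citations` is not stated in this document, so the value is passed through unchanged.

```javascript
// Hypothetical helper: build the JSON body for POST /documents/batches.
// Each input entry uses camelCase keys and is mapped to the documented fields.
function buildBatchRequest(documents, metadata) {
  if (!Array.isArray(documents) || documents.length === 0) {
    throw new Error("at least one document is required");
  }
  const items = documents.map((doc) => {
    if (!doc.documentId) throw new Error("each item needs a document_id");
    const item = { document_id: doc.documentId };
    if (doc.dataFolder !== undefined) item.data_folder = doc.dataFolder;
    // Passed through as-is; its expected type is an open question here.
    if (doc.inlineCitations !== undefined) item.inline_citations = doc.inlineCitations;
    return item;
  });
  return metadata === undefined ? { items } : { items, metadata };
}
```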

Specialized Endpoints

POST

/fr/resolve

Resolves French legal references through Legifrance and Judilibre.

Input	`query` is required. Optional `source`: `legifrance`, `judilibre`, or `auto`; `kind`: `law`, `admin_case`, `judicial_case`, or `any`; optional `supplies`, `jurisdictions`, `days_back`, and `page_size`.
Output	`status` as `ok`, `not_found`, or `error`, plus `source`, `kind`, `title`, `url`, `external_id`, optional `raw`, and optional `rationale`.
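
The enumerated `source` and `kind` values can be checked before sending a request. This is a minimal sketch: `buildFrResolveRequest` is a hypothetical helper name, and the remaining optional fields are forwarded without validation because their formats are not described here.

```javascript
const FR_SOURCES = ["legifrance", "judilibre", "auto"];
const FR_KINDS = ["law", "admin_case", "judicial_case", "any"];

// Hypothetical helper: build the JSON body for POST /fr/resolve.
function buildFrResolveRequest({ query, source, kind, ...optional }) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query is required");
  }
  if (source !== undefined && !FR_SOURCES.includes(source)) {
    throw new Error(`source must be one of: ${FR_SOURCES.join(", ")}`);
  }
  if (kind !== undefined && !FR_KINDS.includes(kind)) {
    throw new Error(`kind must be one of: ${FR_KINDS.join(", ")}`);
  }
  // supplies, jurisdictions, days_back, and page_size pass through untouched.
  const body = { query, ...optional };
  if (source !== undefined) body.source = source;
  if (kind !== undefined) body.kind = kind;
  return body;
}
```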

POST

/metadata/extract

Extracts and saves source-document metadata for files in a document database. This is mainly for desktop or internal database-building flows.

Input	Query/body parameters accepted by FastAPI: required `document_id`, optional `document_path`, `client_id`, and `max_files`.
Output	`status`, `processed`, `total_files`, and optional `errors` or an explanatory message.

POST

/pdf/generate_html_report and /pdf/generate_html_report_preview

Generates a citation report from already parsed pages and verified footnotes. The preview endpoint returns HTML directly; the report endpoint returns a downloadable HTML file.

Input	JSON with `pages`, `footnotes`, optional `pdf_name`, `document_id`, and `document_path`.
Output	`text/html` preview or a downloadable `_citation_report.html` file.

Desktop and Bridge Routes

The plain /pdf/* and /audit/* routes mirror much of the web workflow, but they are designed for the desktop app, Word add-in, local catalog databases, and bridge clients. Use the /web/* routes for hosted HTTP API clients unless you are explicitly integrating with the desktop bridge.

Endpoint	Purpose
`POST /pdf/ocr_markdown`	Run OCR on an uploaded PDF and return structured pages.
`POST /pdf/footnotes_md`	Extract footnotes and pages from a PDF, including local cache results when a document database is available.
`POST /audit/all`	Decode a base64 DOCX and return all footnotes plus headings.
`POST /audit/parse_citations`	Parse citations from one footnote.
`POST /audit/parse_inline`	Parse inline citations from page payloads; streams progress.
`POST /audit/resolve`	Resolve a citation, optionally with local bridge context.
`POST /audit/analyze`	Gather evidence and evaluate congruence.
`POST /audit/manual_resolution`	Persist a user-chosen citation-to-file mapping.
`POST /audit/single` and `GET /audit/result/{footnote_id}`	Start and poll background single-footnote audits.

Bridge routes can require a live desktop client, a matching token client id, and local database paths. For hosted web usage, prefer the web workflow above.

Errors and Operational Notes

401: missing or invalid bearer token, or no user id in the token.
402: insufficient credits, checked before /web/audit/analyze runs.
403: authenticated but not allowed for the requested scoped operation.
404: document, report, OCR cache, or database artifact was not found.
413: proxied PDF exceeds the allowed size.
415: proxied URL did not return a PDF.
500/502: backend, upstream fetch, OCR, LLM, or bridge failure.

Long-running endpoints can take several minutes. Keep client timeouts high for OCR, resolution, and analysis, and consume streaming endpoints as text/event-stream.
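
Client code can translate these status codes into readable failures in one place. The following is a sketch only: `STATUS_MEANINGS`, `explainStatus`, and `ensureOk` are hypothetical names, and the messages simply restate the list above.

```javascript
// Status meanings restated from the error list; not an exhaustive mapping.
const STATUS_MEANINGS = new Map([
  [401, "missing or invalid bearer token, or no user id in the token"],
  [402, "insufficient credits"],
  [403, "authenticated but not allowed for the requested scoped operation"],
  [404, "document, report, OCR cache, or database artifact was not found"],
  [413, "proxied PDF exceeds the allowed size"],
  [415, "proxied URL did not return a PDF"],
  [500, "backend, upstream fetch, OCR, LLM, or bridge failure"],
  [502, "backend, upstream fetch, OCR, LLM, or bridge failure"],
]);

function explainStatus(status) {
  return STATUS_MEANINGS.get(status) ?? `unexpected HTTP status ${status}`;
}

// Hypothetical guard: return the response if it succeeded, otherwise throw
// an Error that carries the numeric status for callers that branch on it.
function ensureOk(response) {
  if (response.ok) return response;
  const error = new Error(`HTTP ${response.status}: ${explainStatus(response.status)}`);
  error.status = response.status;
  throw error;
}
```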