Basics
The production API base is currently:
https://pelaikan-api-772587166095.europe-west9.run.app
Firebase Hosting also rewrites /api/** to the Cloud Run service, so website calls can use same-origin paths such as /api/web/audit/resolve.
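The two ways of addressing the API described above can be sketched as one small helper. The names apiUrl and useSameOrigin are hypothetical, and the sketch assumes the /api prefix maps one-to-one onto backend paths, as the /api/web/audit/resolve example suggests.

```javascript
const PRODUCTION_BASE = "https://pelaikan-api-772587166095.europe-west9.run.app";

// Build a request URL either against the Cloud Run base or as a
// same-origin path that Firebase Hosting rewrites to the service.
function apiUrl(path, { useSameOrigin = false, base = PRODUCTION_BASE } = {}) {
  const normalized = path.startsWith("/") ? path : `/${path}`;
  if (useSameOrigin) return `/api${normalized}`; // assumed 1:1 prefix mapping
  return `${base.replace(/\/+$/, "")}${normalized}`;
}
```

A website served by Firebase Hosting would likely call apiUrl("/web/audit/resolve", { useSameOrigin: true }), while a script or server-side client would use the default absolute form.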
Authentication
Every main endpoint requires an access token:
Authorization: Bearer <access_token>
The walkthrough token is a short-lived HS256 JWT with issuer footcheck, audience footcheck-api, a subject user id, a client id, and roles such as api or user.
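To inspect such a token on the client, the payload segment can be decoded without verifying the signature. This is a sketch only: decodeJwtClaims and isExpired are hypothetical helpers, the exp claim is assumed to be present because the token is described as short-lived, and decoding is not a substitute for server-side verification.

```javascript
// Decode (NOT verify) the payload segment of a JWT to read its claims.
function decodeJwtClaims(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Expected a three-part JWT");
  // Convert base64url to base64 and restore the stripped padding.
  const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Assumes a standard numeric exp claim (seconds since the epoch).
function isExpired(claims, nowSeconds = Math.floor(Date.now() / 1000)) {
  return typeof claims.exp === "number" && claims.exp <= nowSeconds;
}
```

A client could check that iss is footcheck and aud is footcheck-api, and refresh the token when isExpired returns true, before starting a long request.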
To request a bearer token for API access, email pelekandc@gmail.com.
Common Fields
- document_id: stable id used for OCR, parsing, resolution, and result caches.
- data_source: usually internet. folder is for desktop bridge clients with a local database.
- internet_policy: off, smart, or on. Hosted /web/* resolution behaves as on.
- client_id: normally read from the token; only pass it for bridge-style desktop operations.
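A small client-side check of these shared fields might look like the sketch below. The helper name commonFields is hypothetical, and treating internet and folder as the only data_source values is an assumption based on the two values named here.

```javascript
const DATA_SOURCES = ["internet", "folder"]; // assumed to be the full set
const INTERNET_POLICIES = ["off", "smart", "on"];

// Assemble the shared request fields, validating enum-like values early.
function commonFields({ documentId, dataSource, internetPolicy, clientId } = {}) {
  if (typeof documentId !== "string" || documentId === "") {
    throw new Error("document_id is required");
  }
  const fields = { document_id: documentId };
  if (dataSource !== undefined) {
    if (!DATA_SOURCES.includes(dataSource)) throw new Error(`Unknown data_source: ${dataSource}`);
    fields.data_source = dataSource;
  }
  if (internetPolicy !== undefined) {
    if (!INTERNET_POLICIES.includes(internetPolicy)) throw new Error(`Unknown internet_policy: ${internetPolicy}`);
    fields.internet_policy = internetPolicy;
  }
  // client_id is normally taken from the token, so only send it when given.
  if (clientId !== undefined) fields.client_id = clientId;
  return fields;
}
```

The returned object can be spread into a JSON request body alongside endpoint-specific fields.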
Health and Identity
Returns the backend version payload. This endpoint is useful for smoke checks and does not use the full workflow body.
Returns token claims such as sub, cid, and roles. Use it to verify that the bearer token is accepted before running expensive steps.
One Go Endpoint
Use POST /v1/verify_pdf when you want the server to run OCR, parse citations, resolve sources, and analyze congruence in one request. The response is a Server-Sent Events stream, so clients receive progress and result rows as work completes.
| Input | multipart/form-data with required file PDF, required document_id, and optional config as JSON text. |
|---|---|
| Output | text/event-stream. Events are start, progress, row, complete, and error. |
| Auth | Requires Authorization: Bearer <access_token>. The token subject is used for document registration, OCR cache, and credit checks. To request a bearer token, email pelekandc@gmail.com. |
Config
The config form field is JSON. You can also fetch the live schema from GET /v1/verify_pdf/config-schema.
| Field | Default | Meaning |
|---|---|---|
| allowed_kinds | null | Source kinds to verify, for example ["precedent","legal_norm"]. null verifies all kinds. |
| skip_cross_references | true | Skip references like ibid. or see supra. |
| require_claim | true | Skip citations without an extracted claim. |
| skip_dry | false | Skip bare/dry citations if set to true. |
| include_footnotes | true | Parse and verify footnote citations. |
| include_inline | true | Parse and verify inline citations from page body text. |
| inline_pages | null | Optional 1-based page numbers to parse inline. null means all pages. |
| congruence_method | micro_tasks | micro_tasks or standard. |
| internet_policy | smart | off, smart, or on. |
| concurrency | 4 | Internal parallelism, clamped server-side to the endpoint cap. |
| max_footnotes | null | Optional cost-control cap on parsed footnotes. |
| max_inline | null | Optional cost-control cap on inline citations. |
| force_ocr | false | Force OCR even if a cached result exists. |
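
The defaults in the table above can be mirrored client-side so that overrides are checked before upload. This is a sketch under assumptions: buildVerifyConfig is a hypothetical helper, the validation rules are inferred from the table, and the live schema from GET /v1/verify_pdf/config-schema remains the authority.

```javascript
// Defaults copied from the config table; the server may change them.
const VERIFY_PDF_DEFAULTS = Object.freeze({
  allowed_kinds: null,
  skip_cross_references: true,
  require_claim: true,
  skip_dry: false,
  include_footnotes: true,
  include_inline: true,
  inline_pages: null,
  congruence_method: "micro_tasks",
  internet_policy: "smart",
  concurrency: 4,
  max_footnotes: null,
  max_inline: null,
  force_ocr: false,
});

// Merge overrides onto the defaults and return JSON text for the config form field.
function buildVerifyConfig(overrides = {}) {
  for (const key of Object.keys(overrides)) {
    if (!(key in VERIFY_PDF_DEFAULTS)) throw new Error(`Unknown config field: ${key}`);
  }
  const config = { ...VERIFY_PDF_DEFAULTS, ...overrides };
  if (!["micro_tasks", "standard"].includes(config.congruence_method)) {
    throw new Error("congruence_method must be micro_tasks or standard");
  }
  if (!["off", "smart", "on"].includes(config.internet_policy)) {
    throw new Error("internet_policy must be off, smart, or on");
  }
  const pages = config.inline_pages;
  if (pages !== null && !(Array.isArray(pages) && pages.every((p) => Number.isInteger(p) && p >= 1))) {
    throw new Error("inline_pages must be null or an array of 1-based page numbers");
  }
  return JSON.stringify(config);
}
```

For example, form.append("config", buildVerifyConfig({ max_footnotes: 20 })) sends every default explicitly with one override, which keeps client behavior stable if server defaults shift.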
cURL
    curl -N -X POST "$API_BASE/v1/verify_pdf" \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      -F "file=@brief.pdf;type=application/pdf" \
      -F "document_id=brief-2026-06-05" \
      -F 'config={
        "allowed_kinds": ["precedent", "legal_norm"],
        "include_footnotes": true,
        "include_inline": true,
        "inline_pages": [1, 2, 3],
        "max_footnotes": 20,
        "max_inline": 30,
        "concurrency": 4
      }'
JavaScript
    async function verifyPdf({ apiBase, accessToken, file, documentId }) {
      // Build the multipart body: the PDF, its stable id, and the JSON config.
      const form = new FormData();
      form.append("file", file);
      form.append("document_id", documentId);
      form.append("config", JSON.stringify({
        allowed_kinds: ["precedent", "legal_norm"],
        include_footnotes: true,
        include_inline: true,
        require_claim: true,
        concurrency: 4,
      }));
      const response = await fetch(`${apiBase}/v1/verify_pdf`, {
        method: "POST",
        headers: { Authorization: `Bearer ${accessToken}` },
        body: form,
      });
      // Fail early on auth or validation errors instead of parsing an error body as SSE.
      if (!response.ok || !response.body) {
        throw new Error(`verify_pdf failed with HTTP ${response.status}`);
      }
      const reader = response.body
        .pipeThrough(new TextDecoderStream())
        .getReader();
      let buffer = "";
      const rows = [];
      let summary = null;
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        buffer += value;
        // Events are separated by a blank line; keep any partial tail for the next read.
        const chunks = buffer.split("\n\n");
        buffer = chunks.pop() || "";
        for (const chunk of chunks) {
          const event = chunk.match(/^event: (.+)$/m)?.[1];
          const dataLine = chunk.match(/^data: (.+)$/m)?.[1];
          if (!event || !dataLine) continue;
          const data = JSON.parse(dataLine);
          if (event === "row") rows.push(data);
          if (event === "complete") summary = data.summary;
          if (event === "error") throw new Error(data.detail || data.error);
        }
      }
      return { rows, summary };
    }
Stream Events
| Event | Payload |
|---|---|
| start | document_id, page count, footnote count, ocr_cached, elapsed seconds. |
| progress | stage, completed, and total. Stages include parse_footnotes, parse_inline, and resolve_analyze. |
| row | One verified citation row with kind, footnote_id, page_number, cite_id, cite_raw, source_kind, claim, resolution fields, congruence_label, calibration, quote_label, and truncated rationale. |
| complete | All rows, plus summary.total, summary.by_label, and elapsed seconds. |
| error | error and optional detail. Insufficient credit is reported here when the initial balance check fails. |
This endpoint is synchronous from the client's point of view, but it streams progress to avoid holding a silent long-running request. Keep the HTTP connection open until the complete event arrives.
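Because the stream arrives in arbitrary network chunks, a client may want a parser that tolerates events split across reads. The sketch below is one way to do that; createSseParser is a hypothetical helper, and it assumes every data payload is JSON, as the events listed above suggest.

```javascript
// Incremental Server-Sent Events parser: feed it decoded text, get back whole events.
function createSseParser() {
  let buffer = "";
  return {
    push(text) {
      // Normalize CRLF on the combined buffer so a split "\r" + "\n" is still handled.
      buffer = (buffer + text).replace(/\r\n/g, "\n");
      const blocks = buffer.split("\n\n");
      buffer = blocks.pop(); // possibly incomplete tail, kept for the next push
      const events = [];
      for (const block of blocks) {
        let event = "message"; // SSE default when no event field is present
        const dataLines = [];
        for (const line of block.split("\n")) {
          if (line.startsWith(":")) continue; // comment or keep-alive line
          if (line.startsWith("event:")) event = line.slice(6).trim();
          else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
        }
        if (dataLines.length > 0) {
          events.push({ event, data: JSON.parse(dataLines.join("\n")) });
        }
      }
      return events;
    },
  };
}
```

Each value read from the response stream would be passed to push, and the returned events dispatched on their event name (start, progress, row, complete, or error).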
Workflow
The API follows the same four-stage split as the root walkthrough notebook.
- Upload a PDF and extract pages plus footnotes.
- Parse citations from a footnote or from inline page text.
- Resolve one citation to an actual source.
- Analyze whether the source supports the claim.
For example, the resolve stage can be called directly:
    curl -X POST "$API_BASE/web/audit/resolve" \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "citation": { "...": "one parsed citation" },
        "document_id": "example-document",
        "data_source": "internet",
        "internet_policy": "smart",
        "parsed_citations": []
      }'
Web App Endpoints
Use these endpoints for the Firebase-hosted web app and for normal HTTP API clients. They store OCR and verification cache data in cloud storage under the authenticated user.
Runs OCR on a PDF, caches OCR in cloud storage, registers the document, and extracts footnotes.
| Input | multipart/form-data: file PDF, required document_id, optional document_path, optional force_ocr string true/false. |
|---|---|
| Output | headings, footnotes, pages, cached, document_id, and ocr_config. Each page includes page_number, full OCR markdown, and main_text with footnote definitions removed. |
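
The multipart input described above could be assembled as follows. The endpoint path is not shown in this section, so the sketch only builds the form; buildOcrForm is a hypothetical name, and always sending force_ocr (rather than omitting it when false) is a choice made here, not a documented requirement.

```javascript
// Build the multipart/form-data body for the OCR upload step.
function buildOcrForm({ file, documentId, documentPath, forceOcr = false }) {
  if (!file) throw new Error("file is required");
  if (!documentId) throw new Error("document_id is required");
  const form = new FormData();
  form.append("file", file);
  form.append("document_id", documentId);
  if (documentPath) form.append("document_path", documentPath);
  // The API expects the strings "true" or "false", not a JSON boolean.
  form.append("force_ocr", forceOcr ? "true" : "false");
  return form;
}
```

The form is passed as the body of a POST with the Authorization header set; the Content-Type header should be left for the runtime to generate so the multipart boundary is correct.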
Parses structured citations from one extracted footnote and caches the parse result.
| Input | JSON with footnote, required document_id, optional notes, data_source, internet_policy, and client_id. |
|---|---|
| Output | status, normalized footnote, citations, claim, and cached. Each citation includes an id, source object, raw citation text, and often an extracted claim. |
Parses citations from page body text for documents that cite inline instead of in footnotes. The response is streamed as Server-Sent Events.
| Input | JSON with pages, required document_id, optional document_path, and data_source. Page objects usually include page_number, text, context, prev_context, rich_text, and reference_id. |
|---|---|
| Output | text/event-stream progress events. The final complete event contains results, cached_count, fresh_count, total_count, and inline_profile. |
Parses user-selected text as a manual inline citation.
| Input | selected_text, page_number, page_text, required document_id, optional surrounding page text, heading_id, and data_source. |
|---|---|
| Output | status, one parsed citation with manual metadata, a reference_id, and a message. |
Resolves one parsed citation to a source using citation-specific APIs, web search, and the cloud library.
| Input | JSON with required citation and document_id. Optional data_source, internet_policy, notes, footnote, and parsed_citations. For inline citations, omit footnote and pass sibling citations from the same page. |
|---|---|
| Output | status, result, and cached. The result can include file_id, method, score, url, source_type, agent reasoning fields, failure_reason, and failed_urls. |
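
A resolve request along these lines might be built as in the sketch below. It uses the /web/audit/resolve path from the Workflow cURL example; buildResolveRequest and its parameter names are hypothetical, and the internet and smart defaults are copied from that example rather than from a documented default.

```javascript
// Build the URL and fetch options for resolving one parsed citation.
function buildResolveRequest({
  apiBase,
  accessToken,
  citation,
  documentId,
  footnote, // omit for inline citations
  siblingCitations = [], // other citations from the same footnote or page
  dataSource = "internet",
  internetPolicy = "smart",
}) {
  if (!citation || !documentId) throw new Error("citation and document_id are required");
  const body = {
    citation,
    document_id: documentId,
    data_source: dataSource,
    internet_policy: internetPolicy,
    parsed_citations: siblingCitations,
  };
  if (footnote !== undefined) body.footnote = footnote;
  return {
    url: `${apiBase}/web/audit/resolve`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

Usage would be const { url, init } = buildResolveRequest({ ... }) followed by fetch(url, init); keeping the builder separate from the network call makes the inline-versus-footnote branching easy to check.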
Checks whether the resolved source supports the citation claim. This is the verdict step and is intentionally fresh for web app calls, then cached for reopening the document later.
| Input | JSON with citation, resolve_result, claim, and document_id. Optional congruence_method defaults to micro_tasks; optional footnote and parsed_citations add context. |
|---|---|
| Output | status, evidence_snips, quote_result, quote_label, congruence_result, cite, citation_type, and source_url. Insufficient credit can return HTTP 402. |
Returns saved parse, resolve, congruence, and inline results for restoring UI state.
| Input | document_id path parameter. |
|---|---|
| Output | status, document_id, cached_results, footnote_count, inline_page_count, and inline_parsed. |
Fetches an external PDF server-side for authenticated users when the browser cannot load it because of CORS. Only public HTTP(S) hosts are allowed, redirects are revalidated, and responses are size-limited.
| Input | url query parameter pointing to a PDF. |
|---|---|
| Output | application/pdf bytes, or an HTTP error if the URL is unsafe, too large, unreachable, or not a PDF. |
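
Before sending a URL to the proxy, a client can run a cheap pre-check to avoid obviously rejected requests. The sketch below is a rough client-side approximation only: looksPubliclyFetchable is a hypothetical helper, its rules are assumptions rather than the server's actual policy, and the server's own validation remains authoritative.

```javascript
// Heuristic pre-check: is this plausibly a public HTTP(S) URL?
function looksPubliclyFetchable(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not parseable as an absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host.endsWith(".local")) return false;
  if (host.startsWith("[")) return false; // conservative: skip all IPv6 literals
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a === 0 || a === 10 || a === 127) return false; // unspecified, private, loopback
    if (a === 169 && b === 254) return false; // link-local
    if (a === 172 && b >= 16 && b <= 31) return false; // private
    if (a === 192 && b === 168) return false; // private
  }
  return true;
}
```

A true result only means the request is worth trying; the proxy can still refuse it for size, redirects, or content type.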
Documents, Reports, and Cached Files
These endpoints manage the authenticated user's document registry, generated reports, OCR cache files, and database artifacts.
| Endpoint | Input | Output |
|---|---|---|
| GET /documents?limit=50&order_by=last_accessed_at | Optional query parameters. | documents list and total. |
| GET /documents/stats | No body. | document_count, total_reports, total_db_size_bytes. |
| GET /documents/{document_id} | Document id path parameter. | One document record with name, counts, cache flags, and timestamps. |
| GET /documents/{document_id}/open-data | Document id path parameter. | Reconstructed OCR pages, footnotes, cached results, and document metadata for reopening a web document. |
| GET /documents/{document_id}/reports | Document id plus optional limit. | reports list and total. |
| POST /documents/reports | document_id, report_name, optional report_url, report_type, and citation counts. | Registered report metadata. |
| DELETE /documents/{document_id} | Document id path parameter. | Deletion status and storage cleanup details. |
| DELETE /documents/{document_id}/reports/{report_id} | Document and report ids. | Deletion status and storage cleanup details. |
| GET /documents/{document_id}/ocr | Document id path parameter. | List of cached OCR file names. |
| GET /documents/{document_id}/ocr/{pdf_filename} | Document id and PDF file name. | Cached OCR pages for that PDF. |
| POST /documents/{document_id}/ocr/{pdf_filename} | OCR cache JSON body. | Upload status and storage blob path. |
| DELETE /documents/{document_id}/ocr/{pdf_filename} | Document id and PDF file name. | Deletion status. |
| GET /documents/{document_id}/ocr/{pdf_filename}/download-url | Optional expiration_minutes. | Signed or token-backed download URL. |
| GET /documents/{document_id}/database/download-url | Optional expiration_minutes. | Download URLs and sizes for sources.db and sources.faiss. |
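
Document ids and file names end up in URL paths, so they should be encoded. The helpers below sketch this for two of the routes in the table; documentsUrl and ocrDownloadUrl are hypothetical names, and the paths and query parameters are taken from the table as written.

```javascript
// GET /documents with optional limit and order_by query parameters.
function documentsUrl(apiBase, { limit, orderBy } = {}) {
  const params = new URLSearchParams();
  if (limit !== undefined) params.set("limit", String(limit));
  if (orderBy !== undefined) params.set("order_by", orderBy);
  const query = params.toString();
  return `${apiBase}/documents${query ? `?${query}` : ""}`;
}

// GET /documents/{document_id}/ocr/{pdf_filename}/download-url
function ocrDownloadUrl(apiBase, documentId, pdfFilename, expirationMinutes) {
  const path = `/documents/${encodeURIComponent(documentId)}/ocr/${encodeURIComponent(pdfFilename)}/download-url`;
  const query = expirationMinutes === undefined
    ? ""
    : `?expiration_minutes=${encodeURIComponent(expirationMinutes)}`;
  return `${apiBase}${path}${query}`;
}
```

Both return plain strings that can be passed to fetch together with the Authorization header.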
Batch Endpoints
| Endpoint | Input | Output |
|---|---|---|
| POST /documents/batches | items array of document_id, optional data_folder, and inline_citations; optional metadata. | Created batch with item statuses and status counts. |
| GET /documents/batches | Optional limit. | Recent batch summaries. |
| GET /documents/batches/{batch_id} | Batch id. | Batch status with per-item progress. |
| GET /documents/batches/{batch_id}/results | Batch id. | Batch result item records. |
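
The body for POST /documents/batches could be assembled as in this sketch. buildBatchBody is a hypothetical helper; the type of inline_citations is not stated here, so the value is passed through unchanged.

```javascript
// Build the JSON body for creating a batch from a list of documents.
function buildBatchBody(items, metadata) {
  if (!Array.isArray(items) || items.length === 0) {
    throw new Error("items must be a non-empty array");
  }
  const body = {
    items: items.map(({ documentId, dataFolder, inlineCitations }) => {
      if (!documentId) throw new Error("each item needs a document_id");
      const item = { document_id: documentId };
      if (dataFolder !== undefined) item.data_folder = dataFolder;
      // Passed through as-is; its expected shape is an open question here.
      if (inlineCitations !== undefined) item.inline_citations = inlineCitations;
      return item;
    }),
  };
  if (metadata !== undefined) body.metadata = metadata;
  return body;
}
```

The result is sent as JSON.stringify(buildBatchBody(...)) with a Content-Type of application/json, and the returned batch id is then polled through GET /documents/batches/{batch_id}.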
Specialized Endpoints
Resolves French legal references through Legifrance and Judilibre.
| Input | query is required. Optional source: legifrance, judilibre, or auto; kind: law, admin_case, judicial_case, or any; optional supplies, jurisdictions, days_back, and page_size. |
|---|---|
| Output | status as ok, not_found, or error, plus source, kind, title, url, external_id, optional raw, and optional rationale. |
Extracts and saves source-document metadata for files in a document database. This is mainly for desktop or internal database-building flows.
| Input | Query/body parameters accepted by FastAPI: required document_id, optional document_path, client_id, and max_files. |
|---|---|
| Output | status, processed, total_files, and optional errors or an explanatory message. |
Generates a citation report from already parsed pages and verified footnotes. The preview endpoint returns HTML directly; the report endpoint returns a downloadable HTML file.
| Input | JSON with pages, footnotes, optional pdf_name, document_id, and document_path. |
|---|---|
| Output | text/html preview or a downloadable _citation_report.html file. |
Desktop and Bridge Routes
The plain /pdf/* and /audit/* routes mirror much of the web workflow, but they are designed for the desktop app, Word add-in, local catalog databases, and bridge clients. Use the /web/* routes for hosted HTTP API clients unless you are explicitly integrating with the desktop bridge.
| Endpoint | Purpose |
|---|---|
| POST /pdf/ocr_markdown | Run OCR on an uploaded PDF and return structured pages. |
| POST /pdf/footnotes_md | Extract footnotes and pages from a PDF, including local cache results when a document database is available. |
| POST /audit/all | Decode a base64 DOCX and return all footnotes plus headings. |
| POST /audit/parse_citations | Parse citations from one footnote. |
| POST /audit/parse_inline | Parse inline citations from page payloads; streams progress. |
| POST /audit/resolve | Resolve a citation, optionally with local bridge context. |
| POST /audit/analyze | Gather evidence and evaluate congruence. |
| POST /audit/manual_resolution | Persist a user-chosen citation-to-file mapping. |
| POST /audit/single and GET /audit/result/{footnote_id} | Start and poll background single-footnote audits. |
Bridge routes can require a live desktop client, a matching token client id, and local database paths. For hosted web usage, prefer the web workflow above.
Errors and Operational Notes
- 401: missing or invalid bearer token, or no user id in the token.
- 402: insufficient credits before /web/audit/analyze.
- 403: authenticated but not allowed for the requested scoped operation.
- 404: document, report, OCR cache, or database artifact was not found.
- 413: proxied PDF exceeds the allowed size.
- 415: proxied URL did not return a PDF.
- 500/502: backend, upstream fetch, OCR, LLM, or bridge failure.
Long-running endpoints can take several minutes. Keep client timeouts high for OCR, resolution, and analysis, and consume streaming endpoints as text/event-stream.
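A client can turn these status codes into readable messages with a small lookup. This is a sketch: describeStatus is a hypothetical helper, and the notes simply restate the meanings listed in this section.

```javascript
// Status meanings as described in this section.
const STATUS_NOTES = {
  401: "missing or invalid bearer token, or no user id in the token",
  402: "insufficient credits",
  403: "authenticated but not allowed for the requested scoped operation",
  404: "document, report, OCR cache, or database artifact was not found",
  413: "proxied PDF exceeds the allowed size",
  415: "proxied URL did not return a PDF",
  500: "backend, upstream fetch, OCR, LLM, or bridge failure",
  502: "backend, upstream fetch, OCR, LLM, or bridge failure",
};

// Map an HTTP status to an ok flag and a human-readable note.
function describeStatus(status) {
  if (status >= 200 && status < 300) return { ok: true, note: "success" };
  return { ok: false, note: STATUS_NOTES[status] ?? `unexpected HTTP status ${status}` };
}
```

For the long-running endpoints, a generous timeout can be attached with fetch(url, { ...init, signal: AbortSignal.timeout(10 * 60 * 1000) }) in runtimes that support AbortSignal.timeout; the ten-minute figure is an illustrative assumption, not a documented limit.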