Send and search documents

How to index documents, retrieve them, and find the REST API documentation.

Interacting with the API

Once Memoire is up and running (the logs will indicate this), you can start ingesting documents and searching them. Memoire listens on port 3003.

To interact with the API, you always need to set the Authorization header:

    bash
    curl http://localhost:3003/endpoint -H "Authorization: Bearer my_API_KEY"

You should have set the API key in the installation step.
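
The same header can be set from code. Below is a minimal sketch using only Python's standard library. It assumes the placeholder path `/endpoint` and the placeholder key `my_API_KEY` from the curl example; `build_request` is a hypothetical helper name, and the request is only built here, not sent.

```python
import urllib.request

BASE_URL = "http://localhost:3003"  # Memoire's port, as stated above


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a request carrying the Authorization header (hypothetical helper)."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# "/endpoint" and "my_API_KEY" are the placeholders from the curl example.
request = build_request("/endpoint", "my_API_KEY")
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; replace the placeholder path with a real endpoint from the REST documentation first.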

Ingestion

The first step is to send documents to Memoire. There are three ways of sending documents: raw text, URLs that Memoire will download, or multipart files.

With raw text

You need to send a JSON array of objects, one for each document you would like to ingest. Only two fields are required: the documentID, which MUST contain only letters, numbers, dashes or underscores; and the content of the document (ideally in markdown).

    json
    [
      {
        "content": "# A test document\nhello world",
        "documentID": "abc-123",
        "metadata": {
          "meta": "data"
        },
        "title": "File 1"
      }
    ]
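
Since the documentID format is strict, it can help to validate documents before sending them. The sketch below is one possible way to do that, not Memoire's own validation: `build_raw_text_payload` is a hypothetical helper, and the pattern assumes that "letters" means ASCII letters only.

```python
import json
import re

# Assumption: only ASCII letters, digits, dashes and underscores are accepted.
DOCUMENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def build_raw_text_payload(documents: list[dict]) -> str:
    """Check the two required fields, then serialise the documents as a JSON array."""
    for document in documents:
        document_id = document.get("documentID", "")
        if not isinstance(document_id, str) or not DOCUMENT_ID_PATTERN.fullmatch(document_id):
            raise ValueError(f"invalid documentID: {document_id!r}")
        if not document.get("content"):
            raise ValueError(f"missing content for documentID {document_id!r}")
    return json.dumps(documents)


payload = build_raw_text_payload(
    [
        {
            "content": "# A test document\nhello world",
            "documentID": "abc-123",
            "metadata": {"meta": "data"},
            "title": "File 1",
        }
    ]
)
print(payload)
```

The returned string is the request body; an ID such as `"bad id!"` raises a `ValueError` before anything is sent.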

With URLs

Note: the example below is fully functional, and will download one of the TXT files used in our unit tests.

    json
    [
      {
        "documentID": "abc-123",
        "metadata": {
          "meta": "data"
        },
        "title": "A test txt file",
        "url": "https://raw.githubusercontent.com/A-star-logic/memoire/refs/heads/main/src/parser/tests/sampleFiles/test.txt"
      }
    ]

With multipart data

Coming very soon...

Ingestion speed & failure

Memoire has been optimised for fast retrieval; ingesting new documents is much slower.

For now, you have to wait for Memoire to finish ingesting documents, because there is not yet a retry mechanism in case of failure. You can follow this issue to know when this has changed.

Updating documents

To update documents, call the same ingestion endpoint with the same documentID; Memoire will execute an upsert automatically.

Retrieval

Once your documents have been ingested, you are ready to do a search. Memoire will automatically execute a hybrid search and apply re-ranking on the results.

The results are sorted by score and have the following structure:

    json
    {
      "results": [
        {
          "content": "this is a sample",
          "documentID": "my-document-1",
          "highlights": "this is the extracted context",
          "metadata": {},
          "score": 1,
          "title": "Test"
        }
      ]
    }

Here's a run-through of those fields:

The title, score and documentID should be self-explanatory.
The metadata is the additional metadata you added in the ingestion step. This object is always present (even if you did not set any metadata).
The highlights field is the chunk or specific context of the document that matches the query. In rare cases it can be missing (for example, a document that ranks on keywords but not on vectors).
The content is the full content of the document.

You will still need a little bit of logic before sending these results to your user or LLM.

The same document might appear more than once in the results, because the search is done by context (and not by document). This means you get much more focused text (more relevant to the query) in the highlights, with the content field as a fallback when there is no highlight.

In a simplified version:

    json
    {
      "results": [
        {
          "content": "Table of content.... Chapter 1: page 3.... Chapter 2: page 7......",
          "documentID": "my-document-1",
          "highlights": "Chapter 3....",
          "metadata": {},
          "score": 1,
          "title": "My book"
        },
        {
          "content": "Table of content.... Chapter 1: page 3.... Chapter 2: page 7......",
          "documentID": "my-document-1",
          "highlights": "Chapter 1....",
          "metadata": {},
          "score": 0.5,
          "title": "My book"
        }
      ]
    }
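
The post-processing described above could look like the following sketch. It is one possible approach, not part of Memoire: `extract_contexts` is a hypothetical helper that keeps each highlight as its own context and falls back to the full content only once per document.

```python
def extract_contexts(results: list[dict]) -> list[dict]:
    """Pick one piece of text per result: the highlight if present, else the content.

    Highlights are kept as-is, since each one is a distinct chunk. The full
    content is used as a fallback at most once per documentID, so the same
    whole document is never passed on twice.
    """
    contexts = []
    seen_fallbacks = set()
    for result in results:  # the API returns results already sorted by score
        text = result.get("highlights")
        if not text:
            if result["documentID"] in seen_fallbacks:
                continue
            seen_fallbacks.add(result["documentID"])
            text = result["content"]
        contexts.append(
            {
                "documentID": result["documentID"],
                "title": result["title"],
                "score": result["score"],
                "text": text,
            }
        )
    return contexts


sample_results = [
    {"content": "Table of content....", "documentID": "my-document-1",
     "highlights": "Chapter 3....", "metadata": {}, "score": 1, "title": "My book"},
    {"content": "Table of content....", "documentID": "my-document-1",
     "highlights": "Chapter 1....", "metadata": {}, "score": 0.5, "title": "My book"},
]
for context in extract_contexts(sample_results):
    print(context["score"], context["text"])
```

Both entries for `my-document-1` are kept here because each carries a different highlight; only results without a highlight are collapsed to a single fallback per document.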

Deleting documents

If you need to delete documents, it is preferable to do this in batches, as with ingestion. You only need to send an array of document IDs:

    json
    ["my-document-1", "my-document-2"]
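
For a long list of IDs, the request bodies can be prepared in batches ahead of time. A minimal sketch follows; `batch_document_ids` is a hypothetical helper, and the default batch size of 100 is an arbitrary assumption, since no limit is stated here.

```python
import json
from collections.abc import Iterator


def batch_document_ids(document_ids: list[str], batch_size: int = 100) -> Iterator[str]:
    """Yield JSON arrays holding at most batch_size document IDs each.

    The default batch_size is an arbitrary assumption, not a documented limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(document_ids), batch_size):
        yield json.dumps(document_ids[start : start + batch_size])


bodies = list(
    batch_document_ids(["my-document-1", "my-document-2", "my-document-3"], batch_size=2)
)
for body in bodies:
    print(body)
```

Each yielded string is the body of one delete request.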

Fetching documents / verifying they exist

Memoire can also be used as a primary store for your documents (although remember that we only keep the text, not the original file), so it might be useful to get a document back, or simply to check that it exists.

Two GET endpoints are available just for this.

REST documentation

The REST documentation is also served directly by the API at http://localhost:3003/docs when the environment variable SHOW_DOC is set to true.

The doc is also available online at https://memoire.apidocumentation.com (this is temporary, and it will soon be available on astarlogic.com).