Send and search documents
How to index and retrieve documents, plus the REST API documentation.
Interacting with the API
Once Memoire is up and running (the logs should indicate this), you can start ingesting documents and searching them. Memoire listens on port 3003.
To interact with the API, you will always need to set the Authorization header:
```bash
curl http://localhost:3003/endpoint -H "Authorization: Bearer my_API_KEY"
```
You should have set the API key in the installation step.
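For callers that are not using curl, the same header can be attached from code. The following is a minimal Python sketch using only the standard library; `BASE_URL` and the `build_request` helper are illustrative names, and `/endpoint` is the same placeholder path as in the curl example, not a real route.

```python
import urllib.request

# Default address from this guide; adjust if Memoire runs elsewhere.
BASE_URL = "http://localhost:3003"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return a Request for `path` that carries the bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# "/endpoint" is a placeholder, exactly as in the curl example.
request = build_request("/endpoint", "my_API_KEY")
# urllib.request.urlopen(request) would send it; it is not called here.
```

Building the request in one helper keeps the API key handling in a single place.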
Ingestion
The first step is to send documents to Memoire. There are three ways of sending documents: raw text, URLs that Memoire will download, or multipart files.
With raw text
You need to send a JSON array of objects (one for each document you would like to ingest). Only two fields are required: the documentID, which MUST contain only letters, numbers, dashes, or underscores; and the content of the document (ideally in markdown).
```json
[
  {
    "content": "# A test document\nhello world",
    "documentID": "abc-123",
    "metadata": { "meta": "data" },
    "title": "File 1"
  }
]
```
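Because an invalid documentID is easy to produce by accident, it can help to check IDs before sending anything. This is a minimal Python sketch, not part of Memoire: `build_text_document` is a hypothetical helper, and the pattern assumes that "letters" means ASCII letters, which this guide does not spell out.

```python
import json
import re

# Assumption: only ASCII letters, digits, dashes and underscores are allowed.
DOCUMENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def build_text_document(document_id, content, title=None, metadata=None):
    """Build one raw-text document object, rejecting invalid IDs early."""
    if not DOCUMENT_ID_PATTERN.fullmatch(document_id):
        raise ValueError(f"invalid documentID: {document_id!r}")
    document = {"documentID": document_id, "content": content}
    # title and metadata are optional, so only include them when provided.
    if title is not None:
        document["title"] = title
    if metadata is not None:
        document["metadata"] = metadata
    return document


# The request body is a JSON array, even for a single document.
body = json.dumps(
    [
        build_text_document(
            "abc-123",
            "# A test document\nhello world",
            title="File 1",
            metadata={"meta": "data"},
        )
    ]
)
```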
With URLs
You need to send a JSON array of objects (one for each document you would like to ingest). Only two fields are required: the documentID, which MUST contain only letters, numbers, dashes, or underscores; and the url, which must be reachable (it can be internal or external, with or without authentication, but Memoire needs to have access to it).
Note: the example below is fully functional, and will download one of our TXT files used in unit tests.
```json
[
  {
    "documentID": "abc-123",
    "metadata": { "meta": "data" },
    "title": "A test txt file",
    "url": "https://raw.githubusercontent.com/A-star-logic/memoire/refs/heads/main/src/parser/tests/sampleFiles/test.txt"
  }
]
```
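A malformed URL is cheaper to catch on the client than after submission. The sketch below is a hypothetical Python helper (`build_url_document` is not part of Memoire) that performs a basic sanity check before building the object. Restricting URLs to http and https is an assumption of this sketch; this guide does not list the schemes Memoire accepts, and the check cannot tell whether Memoire can actually reach the URL.

```python
from urllib.parse import urlparse


def build_url_document(document_id, url, title=None, metadata=None):
    """Build one URL document object after a basic client-side URL check."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are worth sending.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    document = {"documentID": document_id, "url": url}
    if title is not None:
        document["title"] = title
    if metadata is not None:
        document["metadata"] = metadata
    return document


document = build_url_document(
    "abc-123",
    "https://raw.githubusercontent.com/A-star-logic/memoire/refs/heads/main/src/parser/tests/sampleFiles/test.txt",
    title="A test txt file",
)
```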
With multipart data
Coming very soon...
Ingestion speed & failure
The application has been optimised for fast retrieval, so ingesting new documents is much slower than searching.
For now, you have to wait for Memoire to finish ingesting documents, because there is not yet a retry mechanism in case of failure. You can follow this issue to know when this changes.
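Since there is no retry mechanism yet, one way to cope is to send documents in small batches, wait for each call to return, and keep track of what failed so it can be resubmitted later. The Python sketch below illustrates that idea only. The `send` callable is hypothetical: it stands for whatever function posts one batch to the ingestion endpoint and raises on failure. The default batch size is arbitrary.

```python
from typing import Callable, Iterator, List


def batches(items: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ingest_sequentially(
    documents: List,
    send: Callable[[List], None],
    batch_size: int = 10,
) -> List:
    """Send batches one at a time and return the documents that failed.

    `send` is expected to block until ingestion of the batch has finished.
    """
    failed = []
    for batch in batches(documents, batch_size):
        try:
            send(batch)
        except Exception:
            # No server-side retry exists, so remember the whole batch.
            failed.extend(batch)
    return failed
```

The returned list can be passed back to `ingest_sequentially` once the cause of the failure has been fixed; because ingestion is an upsert, resending a document that was partially processed should be safe.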
Updating documents
To update documents, simply call the same endpoint with the same ID; Memoire will execute an upsert automatically.
Retrieval
Once your documents have been ingested, you are ready to do a search. Memoire will automatically execute a hybrid search and apply re-ranking on the results.
The results are sorted by score and have the following structure:
```json
{
  "results": [
    {
      "content": "this is a sample",
      "documentID": "my-document-1",
      "highlights": "this is the extracted context",
      "metadata": {},
      "score": 1,
      "title": "Test"
    }
  ]
}
```
Here's a rundown of those fields:
- title, score and documentID should be self-explanatory.
- The metadata is the additional metadata you added in the ingestion step. This object is always present (even if you did not set metadata).
- The highlights field is the chunk or specific context of the document that matches the query. In rare cases it can be missing (e.g. a document that ranks on keywords, but not on vectors).
- The content is the full content of the document.
You will still need a little bit of logic before sending this raw to your user or LLM.
Some documents might be duplicated, because we are searching by context (and not by document). This means you will have much more focused text (i.e. relevant to the query) in the highlights, with the content field as a fallback when there is no highlight.
A simplified example:
```json
{
  "results": [
    {
      "content": "Table of content.... Chapter 1: page 3.... Chapter 2: page 7......",
      "documentID": "my-document-1",
      "highlights": "Chapter 3....",
      "metadata": {},
      "score": 1,
      "title": "My book"
    },
    {
      "content": "Table of content.... Chapter 1: page 3.... Chapter 2: page 7......",
      "documentID": "my-document-1",
      "highlights": "Chapter 1....",
      "metadata": {},
      "score": 0.5,
      "title": "My book"
    }
  ]
}
```
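The post-processing described above can be sketched in a few lines of Python. This is one possible approach, not something Memoire does for you: `best_text` and `collapse_by_document` are hypothetical helpers that prefer the highlights, fall back to the content, and merge hits that belong to the same document. The sketch relies on the results already being sorted by score.

```python
def best_text(result: dict) -> str:
    """Prefer the focused highlights; fall back to the full content."""
    return result.get("highlights") or result["content"]


def collapse_by_document(results: list) -> list:
    """Merge hits that share a documentID, preserving the score order."""
    grouped = {}
    for result in results:
        entry = grouped.setdefault(
            result["documentID"],
            {
                "documentID": result["documentID"],
                "title": result.get("title"),
                # The first hit seen has the highest score for this document.
                "score": result["score"],
                "passages": [],
            },
        )
        text = best_text(result)
        if text not in entry["passages"]:
            entry["passages"].append(text)
    # Dicts keep insertion order, so documents stay ranked by best score.
    return list(grouped.values())
```

Applied to the simplified response above, this yields a single entry for my-document-1 with two passages, "Chapter 3...." followed by "Chapter 1....".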
Deleting documents
If you need to delete documents, it is preferable to do this in batches, as with ingestion. You only need to send an array of document IDs:
```json
["my-document-1", "my-document-2"]
```
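When the list of IDs to delete is long, it can be split into several request bodies. The Python sketch below only prepares those bodies; `delete_bodies` is a hypothetical helper, the default batch size of 100 is an arbitrary assumption rather than a documented limit, and sending each body to the delete endpoint is left to the caller.

```python
import json


def delete_bodies(document_ids, batch_size=100):
    """Return one JSON array body per batch of unique document IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # dict.fromkeys removes duplicates while keeping the original order.
    unique_ids = list(dict.fromkeys(document_ids))
    return [
        json.dumps(unique_ids[start:start + batch_size])
        for start in range(0, len(unique_ids), batch_size)
    ]
```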
Fetching documents / verifying they exist
Memoire can also be used as a primary store for your documents (although remember we only keep text, not the original file), so it might be useful to get a document back (or simply check that the document exists).
Two GET endpoints are available just for this.
REST documentation
Remember that this documentation is also available directly from the API at http://localhost:3003/docs when the environment variable SHOW_DOC is set to true.
The documentation is also available online at https://memoire.apidocumentation.com (this is temporary, and it will soon be available on astarlogic.com).