Best practices

What to look out for when using Memoire.

Ingestion speed & batch operations

Memoire is built on the assumption that you will:

  • Have plenty of time to index.
  • Search very often, with speed being a critical factor.
  • Index a lot of documents the first time (e.g., onboarding a customer, deploying your app).
  • Periodically add a few documents over time (e.g., a customer adds a new record to their CRM).
  • Delete many documents at once at fixed intervals (e.g., a job queue cleans stale data).

Whenever you can, add or delete documents in batches. Adding or deleting a single document is fast, but each single-document operation pays its own indexing cost, so the total time grows linearly with the number of operations. Batch operations are... well, less linear.
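As a rough illustration of batching on the client side, the sketch below groups documents into chunks and hands each chunk to a single call. It does not use Memoire's actual API: `send_batch` is a hypothetical callable standing in for whatever bulk add or delete request you use, and the batch size of 500 is an arbitrary assumption, not a recommended value.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def ingest_in_batches(
    documents: Iterable[T],
    send_batch: Callable[[List[T]], object],  # hypothetical: your bulk request
    batch_size: int = 500,  # arbitrary assumption, tune for your workload
) -> int:
    """Call `send_batch` once per chunk and return the number of calls made."""
    calls = 0
    for batch in chunked(documents, batch_size):
        send_batch(batch)
        calls += 1
    return calls
```

With 1,050 documents and a batch size of 500, this makes three calls instead of 1,050 single-document requests.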

Data sources & poisoning

The URL upload is a convenient way to index many documents at once. It can also reduce your egress, since Memoire downloads the documents directly from the source instead of you having to send the data over.

However, this method can open a vector for data poisoning (i.e., bad quality data, false answers, false documents, etc). This is why it is important to verify where those documents are coming from. And please, PLEASE, never allow random users to upload their own documents unless you have strong moderation and filtering countermeasures in place.
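One possible way to check where documents come from is to filter URLs against an allowlist of hosts before submitting them for indexing. The sketch below is only an illustration under assumptions: the host names, `ALLOWED_HOSTS`, `is_trusted_source` and `filter_trusted` are hypothetical and not part of Memoire. A host check also says nothing about the quality of the content itself.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the hosts you actually control or trust.
ALLOWED_HOSTS = frozenset({"docs.example.com", "cdn.example.com"})


def is_trusted_source(url: str, allowed_hosts: Iterable[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly in the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # already lowercased by urllib
    except ValueError:
        # Malformed URLs (e.g., a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    return host is not None and host in set(allowed_hosts)


def filter_trusted(urls: Iterable[str]) -> List[str]:
    """Keep only the URLs that pass the allowlist check."""
    return [u for u in urls if is_trusted_source(u)]
```

Matching the exact host name, rather than a substring or prefix, avoids lookalike hosts such as `docs.example.com.evil.net`.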

The biggest problem in computing is between the computer and the chair.