Data types

AIMP currently supports eight types of fields, which are used to define data mappings for indexing. Note that the whole document are stored as is regardless of its existence in index mappings, but being stored does not necessarily mean they are searchable.

text

A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that they are passed through an analyzer to convert the string into a list of individual terms before being indexed. The analysis process allows AIMP to search for individual words within each full text field. text fields are best suited for unstructured but human-readable content. If you need to index structured content such as email addresses, hostnames, status codes, or tags, it is likely that you should rather use a keyword field. AIMP supports four analyzers for tokenization: standard(default), korean, japanese, english. You can specify multiple analyzers to a single text field to improve search performance.

. (dot) character cannot be used as a field name and you can only add mappings after creating an index. Modifying or deleting existing mappings is not supported.

keyword

keyword field type is used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags. Keyword fields are often used in sorting, aggregations, and term-level queries, such as term. Note that AIMP does not index any string longer than 4096.

long

A signed 64-bit integer with a minimum value of

-2^{63}

and a maximum value of

2^{63}-1

. long fields are optimized for scoring, sorting, and range queries.

double

A double-precision 64-bit IEEE 754 floating point number, restricted to finite values. double fields are optimized for scoring, sorting, and range queries.

boolean

boolean fields accept JSON true and false values, but can also accept strings which are interpreted as either true or false.

datetime

Date and time in RFC 3339 format. datetime fields are optimized for sorting and range queries, or to search for a specific time.

import pytz
from datetime import datetime

dt = datetime.now()
print(dt.astimezone(pytz.UTC).isoformat(timespec="seconds"))

# 2024-11-05T14:27:56+00:00

object

JSON documents are hierarchical in nature: the document may contain inner objects which, in turn, may contain inner objects themselves. Internally, this document is indexed as a simple, flat list of key-value pairs. The fields within the object, which can be of any data type, including object. objectMapping should be specified in order to index the fields inside the object.

{
	"content": {
    "type": "text",
    "analyzers": ["english", "korean", "japanese"]
  },
  "metadata": {
    "type": "object",
    "objectMapping": {
      "url": {
        "type": "keyword"
      },
      "author": {
        "type": "keyword"
      }
    }
  }
}

vector

The vector field type stores dense vectors of numeric values. vector fields are primarily used for k-nearest neighbor (kNN) search. The dense_vector type does not support aggregations or sorting. You add a dense_vector field as an array of numeric values A k-nearest neighbor (kNN) search finds the k nearest vectors to a query vector, as measured by a similarity metric. AIMP supports four similarity metrics including l2_norm(euclidean), dot_product, cosine, max_inner_product. You can define the vector similarity to use in kNN search.

An example index mappings

{
  "index_id": {
    "type": "keyword"
  },
  "parent_id": {
    "type": "keyword"
  },
  "summary": {
    "type": "text",
    "analyzers": ["english", "korean", "japanese"]
  },
  "summary_model": {
    "type": "keyword"
  },
  "node_type": {
    "type": "keyword"
  },
  "embed_model": {
    "type": "keyword"
  },
  "content_type": {
    "type": "keyword"
  },
  "content": {
    "type": "text",
    "analyzers": ["english", "korean", "japanese"]
  },
  "text_embedding": {
    "dimensions": 512,
    "similarity": "cosine",
    "type": "vector"
  },
  "created_at": {
    "type": "datetime"
  },
  "updated_at": {
    "type": "datetime"
  },
  "node_info": {
    "type": "object",
    "objectMapping": {
      "next": {
        "type": "keyword"
      },
      "prev": {
        "type": "keyword"
      }
    }
  },
  "metadata": {
    "type": "object",
    "objectMapping": {
      "robots_allowed": {
        "type": "boolean"
      },
      "failed_crawl": {
        "type": "boolean"
      },
      "md5_digest": {
        "type": "keyword"
      },
      "url": {
        "type": "keyword"
},
      "og_image": {
        "type": "keyword"
      },
      "related_images": {
        "type": "keyword"
      }
    }
  }
}

Get started

Indexes

Data

Projects

text

keyword

long

double

boolean

datetime

object

vector

An example index mappings

Get started

Indexes

Data

Projects

​text

​keyword

​long

​double

​boolean

​datetime

​object

​vector

​An example index mappings

text

keyword

long

double

boolean

datetime

object

vector

An example index mappings