Weaviate is a cloud-native, modular, real-time vector search engine


Demo of Weaviate

Weaviate GraphQL demo on a news article dataset, showing the Transformers module, GraphQL usage, semantic search, _additional{} features, Q&A, and the Aggregate{} function. You can try the demo on this dataset in the GUI here: semantic search, Q&A, Aggregate.

Description

Weaviate is a cloud-native, real-time vector search engine (aka neural search engine or deep search engine). There are modules for specific use cases such as semantic search, plugins to integrate Weaviate in any application of your choice, and a console to visualize your data.

GraphQL - RESTful - vector search engine - vector database - neural search engine - semantic search - HNSW - deep search - machine learning - kNN

Features

Weaviate makes it easy to use state-of-the-art AI models while giving you the scalability, ease of use, safety and cost-effectiveness of a purpose-built vector database. Most notably:

  • Fast queries
    Weaviate typically performs a 10-NN search out of millions of objects in well under 100 ms.

  • Any media type with Weaviate Modules
    Use state-of-the-art AI model inference (e.g. Transformers) for text, images, etc. at search and query time, letting Weaviate manage the process of vectorizing your data for you, or import your own vectors.

  • Combine vector and scalar search
    Weaviate allows for efficient combined vector and scalar searches, e.g. “articles related to the COVID-19 pandemic published within the past 7 days” (see the sketch after this list). Weaviate stores both your objects and their vectors and makes sure the retrieval of both is always efficient. There is no need for third-party object storage.

  • Real-time and persistent
    Weaviate lets you search through your data even while it is being imported or updated. In addition, every write is recorded in a Write-Ahead Log (WAL), so writes are persisted immediately, even if a crash occurs.

  • Horizontal Scalability
    Scale Weaviate for your exact needs, e.g. High-Availability, maximum ingestion, largest possible dataset size, maximum queries per second, etc. (Currently under development, ETA Fall 2021)

  • Cost-Effectiveness
    Very large datasets do not need to be kept entirely in memory in Weaviate. At the same time available memory can be used to increase the speed of queries. This allows for a conscious speed/cost trade-off to suit every use case.

  • Graph-like connections between objects
    Make arbitrary connections between your objects in a graph-like fashion to resemble real-life connections between your data points. Traverse those connections using GraphQL.
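
To make the combined vector + scalar search bullet above concrete, here is a minimal sketch using the Python client, assuming a local Weaviate with an "Article" class that has "title" and "publicationDate" properties (names are illustrative):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    result = (
        client.query
        .get("Article", ["title", "publicationDate"])
        # vector part: semantic similarity to the query text
        .with_near_text({"concepts": ["COVID-19 pandemic"]})
        # scalar part: only articles published within the requested window
        .with_where({
            "path": ["publicationDate"],
            "operator": "GreaterThan",
            "valueDate": "2021-01-01T00:00:00Z",
        })
        .with_limit(10)
        .do()
    )
    print(result["data"]["Get"]["Article"])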

Documentation

You can find detailed documentation in the developers section of our website or directly go to one of the docs using the links in the list below.

Additional reading

Examples

You can find code examples here.

Support

Contributing

Owner
SeMI Technologies
SeMI Technologies creates database software like the Weaviate vector search engine
Comments
  • Vectorization mask for classes

    Vectorization mask for classes

    Currently, all string/text values as well as the class name and property names are considered in the vectorization. However, not all property names and values have to be important for the context. Take the following meta class of a table as an example:

    {
        "class": "Column",
        "description": "",
        "properties": [
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["int"],
                "keywords": [],
                "name": "index"
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["text"],
                "keywords": [],
                "name": "name"
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["string"],
                "keywords": [],
                "name": "dataType"
            }
        ]
    }
    

    In this case the vector would be created based on the words Column, index, name, data, type, and the values of the properties. However, the class name and property names shift the vector into the context of tables, while the context should be based solely on the column name and the dataType values.

    Proposal:

    • If nothing is specified the vector gets created automatically based on all information in the class
    • The user can explicitly mask information away from the vectorization in the schema:
    {
        "class": "Column",
        "description": "",
        "vectorizeClassName": false,
        "properties": [
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["int"],
                "keywords": [],
                "name": "index"
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["text"],
                "keywords": [],
                "name": "name",
                "vectorizePropertyName": false,
                "vectorizePropertyValue": true
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["string"],
                "keywords": [],
                "name": "dataType",
                "vectorizePropertyName": false,
                "vectorizePropertyValue": true
            }
        ]
    }
    
  • Hybrid Search (combining Vector + Sparse search)

    Hybrid Search (combining Vector + Sparse search)

    WHAT

    Combining vector search with sparse (e.g. BM25) search in one query

    WHY

    To bridge the gap between sparse search and vector search.

    Longer why

    Vector search (using dense vectors, computed by ML models) works well in-domain, but has poor performance out-of-domain. BM25 search works well out-of-domain, because it uses sparse methods (keyword matching), but can't perform context-based search. Combining both methods will improve search results out-of-domain.

    HOW

    • This issue and implementation depend on issue https://github.com/semi-technologies/weaviate/issues/2133
    • Do both a dense and BM25 search using a query (in parallel)
    • You should be able to define a function to combine the results into 1 result list, using the scores of data in both candidate lists.
    • A default function can be defined in the Weaviate setup, but can be overwritten in the GraphQL query.
    • BM25 score is unbounded. Score normalization or scaling is not a good idea in general, because you lose information on how good the results are textually. However, when combining BM25 with dense search, some form of normalization may be handy, so we can choose to offer normalization methods anyway. We can explain strategies in the documentation (see the sketch after this list). Possible strategies are:
      • minmax - To leave the distribution intact, the best normalization method is a minmax approach, which takes the minimum and maximum BM25 scores into account. Taking the maximum BM25 score of a particular query as the maximum in the formula is not a good idea, because then that result is set to the maximum score regardless of its actual score. So a (theoretical) maximum needs to be defined, although BM25 is unbounded. This can be achieved by running a number of different queries and recording the maximum value. Since this is quite complex to implement as a feature, we can start by offering a setting with a default value, which the user can change at runtime. An example can be: min(x/10, 1) if the guessed maximum is 10.
      • Arctangent - An arctan scales values in a logarithmic manner. A function to scale scores into (-1, 1) (practically (0, 1), because x will be a positive score) is: 2/pi*arctan(x) (see https://www.mdpi.com/2227-7390/10/8/1335)
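
    A minimal sketch of the two normalization strategies and a weighted-sum fusion, assuming scores are plain floats keyed by object id (all names here are illustrative, not Weaviate internals):

    import math

    def minmax_norm(score: float, guessed_max: float = 10.0) -> float:
        # Cap at 1.0 because BM25 is unbounded and guessed_max is only a guess.
        return min(score / guessed_max, 1.0)

    def arctan_norm(score: float) -> float:
        # 2/pi * arctan(x) maps positive scores into (0, 1) logarithmically.
        return 2.0 / math.pi * math.atan(score)

    def weighted_sum(sparse: dict, dense: dict, w_sparse: float = 0.2, w_dense: float = 0.8) -> list:
        # Fuse two candidate lists; objects missing from one list score 0 there.
        ids = set(sparse) | set(dense)
        fused = {i: w_sparse * sparse.get(i, 0.0) + w_dense * dense.get(i, 0.0) for i in ids}
        return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

    # Example: BM25 scores normalized with minmax; dense certainties used as-is.
    bm25 = {"doc1": 7.3, "doc2": 2.1}
    dense = {"doc1": 0.82, "doc3": 0.71}
    print(weighted_sum({k: minmax_norm(v) for k, v in bm25.items()}, dense))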

    Design - Weaviate setup

    Requirements:

    • Dense retrieval and sparse retrieval independently (with the same or different query)
    • Combine the result of both methods using a scoring function

    Schema

    The settings should be configured in the schema, per class:

    {
      "class": "string",
      "vectorIndexType": "hnsw",                
      "vectorIndexConfig": {
        ...                                     
      },
      "vectorizer": "text2vec-transformers",
      "moduleConfig": {
        "text2vec-transformers": {  
          "vectorizeClassName": true            
        }
      },                       
      "sparseIndexType": "bm25",                
      "sparseIndexConfig": {                   
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        }
      },
      "properties": [                            
        {
          "name": "string",                     
          "description": "string",              
          "dataType": [                         
            "string"
          ],
          "sparseIndexConfig": {                
            "bm25": {
              "b": 0.75,
              "k1": 1.2, 
            }
          },
          "indexInverted": true                 
        }
      ]
    }
    

    Docker-compose

    In case we need to let Weaviate know on startup whether to enable sparse search, we can introduce an env var like ENABLE_SPARSE_INDEX:

    ---
    version: '3.4'
    services:
      weaviate:
        command:
        - --host
        - 0.0.0.0
        - --port
        - '8080'
        - --scheme
        - http
        image: semitechnologies/weaviate:1.14.1
        ports:
        - 8080:8080
        restart: on-failure:0
        environment:
          QUERY_DEFAULTS_LIMIT: 25
          AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
          PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
          DEFAULT_VECTORIZER_MODULE: text2vec-transformers
          ENABLE_MODULES: text2vec-transformers
          CLUSTER_HOSTNAME: 'node1'
          ENABLE_SPARSE_INDEX: 'true' # <== NEW. Which method (e.g. bm25) can be specified in the schema. Not sure if this variable is needed in the docker-compose actually.
      t2v-transformers:
        image: semitechnologies/transformers-inference:sentence-transformers-msmarco-distilroberta-base-v2
        environment:
          ENABLE_CUDA: 0 # set to 1 to enable
    ...
    

    Design - GraphQL Queries

    The current API is documented in this comment.

    Below is the more elaborate original proposal.

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              nearText: {               # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7   
              },
              weight: 0.8
            }],
            type: Sum                   # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score    # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
    

    Multiple sparse searches should also be supported, like:

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query 2",
                properties: ["title"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                },
                limit: 100
              },
              weight: 0.2
            }, {
              nearText: {              # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7,
                limit: 100   
              },
              weight: 0.6
            }],
            type: Sum                  # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score        # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
    
  • Add _snippets in GraphQL and REST API

    Add _snippets in GraphQL and REST API

    When an object is indexed on a larger text item (e.g., a paragraph like in the news article demo), certain search terms can be found in specific sentences.

    The idea is to add the start and end point of the most important part of the text corpus to the __meta end-point as a potentialAnswer, which can be enabled or disabled by setting a distanceToAnswer. This can work both for explore filters and for where filters.

    Idea

    I was searching for something on Wikipedia with the search term: Is herbalife a pyramid scheme? and got this response.

    Because Google isn't giving the actual answer but rather the location of the answer, we should be able to calculate something similar.

    [Screenshot of the Google search result, 2020-06-05]

    Explore example

    {
      Get{
        Things{
          Article(
            explore: {
              concepts: ["I want a spare rib"],
              certainty: 0.7,
              moveAwayFrom: {
                concepts: ["bacon"],
                force: 0.45
              }
            }
          ){
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result where the start and end give the starting and ending position and in which property the answer / most important part can be found.

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple 
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Where example

    {
      Get {
        Things {
          Article(where: {
                path: ["name"],
                operator: Like,
                valueString: "New *"
            }) {
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result where the start and end give the starting and ending position and in which property the answer / most important part can be found.

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple properties might be indexed
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Suggested (first) implementation

    1. Results are returned as in the current implementation.
    2. The vectorization of the query is used to find the closest matching word in a sentence. *
    3. When the closest word is found, the start and end points are set at the beginning and end of that sentence.
    4. The distanceToAnswer is the minimal distance; if it is not set, no start and end points will be available. If multiple sentences make the cut, they will all be part of the array (a rough sketch follows below).

    * There might be potential to also do this on groups of words or complete sentences.
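
    A rough Python sketch of steps 2-4, assuming we already have a vector for the query and one vector per word from the same model Weaviate uses (helper names are hypothetical, not Weaviate internals):

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def potential_answer(text, word_vectors, query_vector, distance_to_answer=0.5):
        # word_vectors: list of (word, start_offset, end_offset, vector) tuples.
        best = min(word_vectors, key=lambda w: 1.0 - cosine(w[3], query_vector))
        distance = 1.0 - cosine(best[3], query_vector)
        if distance > distance_to_answer:
            return None  # no start/end points if nothing is close enough
        # Expand to the enclosing sentence: previous and next sentence boundary.
        start = text.rfind(".", 0, best[1]) + 1
        end = text.find(".", best[2])
        end = len(text) if end == -1 else end + 1
        return {"start": start, "end": end, "distanceToQuery": distance}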

    Related

    #1136 #1139 #1155 #1156

  • Add geotype to datatypes

    Add geotype to datatypes

    Todos

    • [x] design decisions
      • [x] name of the field
        • current proposals: geoCoordinate, geoLocation, geoPoint
          • my personal favorite being geoCoordinate
          • cc @laura-ham @bobvanluijt
      • [x] design of the where filter
        • @laura-ham suggestions?
    • [x] spike out happy path in janusgraph only
      • [x] index creation
      • [x] adding property
      • [x] searching by property within range
    • [x] add new data type on import, goal: an import with geoCoordinates field succeeds
      • [x] allow in schema creation
      • [x] allow in class instance creation
      • [x] validation?
        • [x] add basic validation
        • [x] refactor validateSchemaInBody (it's way too long and extremely difficult to read/extend)
      • [x] janus graph create vertex
    • [x] include on simple read queries
      • [x] Local Get
      • [x] Network Get
    • [x] filter by property
      • [x] Local Filters
        • [x] extract filter from graphql
        • [x] set required validations, so that required fields cannot be omitted
        • [x] apply filter in connectors (Janusgraph)
      • ~~Network Filters~~
        • nothing to do here, they use the same code as local filters
    • [x] deal with property in GetMeta and Aggregate
      • proposal for now to simply not support those fields there
      • long-term?
    • [x] rename according to latest decisions
      • [x] pluralize name geoCoordinate -> geoCoordinates
      • [x] restructure where filter
        • [x] WithinRange -> WithinGeoRange
        • [x] valueRange -> valueGeoRange
        • [x] wrap distance and geoCoordinates in separate objects
    • [x] Update docs (@laura-ham volunteered to help out here)
    • [x] e2e/acceptance test

    Original content below; for the most up-to-date summary see the comments below.

    Abstract Examples

    • A location in a 2-dimensional space, i.e. x=3, y=5
    • Coordinates that point to a location on a world map

    Research Questions

    • Are geo types always two-dimensional, or can they have more dimensions?
      • Part 1: Do we have use cases for more than 2D?
      • Part 2: Is there technical support for more than 2D?
    • Optimal way to query: e.g. "within 500 m of coordinates (x, y)" mixes the concepts of metric distance and coordinates; what API do we want? (See the sketch at the end of this issue.)

    Features we'd need

    • Import/update geo coordinates
    • use geo in where filters (see research question)
    • do we want to allow aggregation functions like Aggregate or GetMeta
      • if so, what should they look like?
      • if so, is there technical support in our current stack?
      • if so, right from the start or later?

    cc @bobvanluijt @laura-ham
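
    For reference, a minimal sketch of the filter shape after the renames above (WithinGeoRange / valueGeoRange), via the Python client; the class and property names are made up:

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    where_filter = {
        "path": ["location"],  # a geoCoordinates property
        "operator": "WithinGeoRange",
        "valueGeoRange": {
            "geoCoordinates": {"latitude": 52.366667, "longitude": 4.9},
            "distance": {"max": 2000},  # meters
        },
    }

    result = client.query.get("City", ["name"]).with_where(where_filter).do()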

  • Dropping 'things' or 'actions' from the filter.

    Dropping 'things' or 'actions' from the filter.

    In the section on filters in the GraphQL documentation, the paths of the filters are prefixed with "things" and "actions".

    This is superfluous information. The class names cannot overlap in any case.

    Current query:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Things", "Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Things", "Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
    

    After removing the kind of class:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
    
  • batch_create does not report on 'store is read-only' errors

    batch_create does not report on 'store is read-only' errors

    When using the Python client with client.batch, after having crossed the DISK_USE_READONLY_PERCENTAGE threshold, my ingest fails when using data_object.create but seemingly succeeds when using batches. However, the ingest that "succeeded" using batch did not contain any items when running a Get query, even though they had been processed by my t2v-transformers container.

    There might be other errors that are not covered by batch.create; I only found out through a helpful comment in the Weaviate Slack after running into my issue. This also seems to have been the issue with https://github.com/semi-technologies/weaviate/issues/1929. A sketch of an error-checking callback follows below.
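
    A minimal sketch of surfacing such errors with the v3 Python client's result callback (class and property names are illustrative); without a check like this, per-object failures such as 'store is read-only' pass silently:

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    def check_batch_result(results):
        # Each item mirrors the REST batch response; errors hide in item["result"].
        for item in results or []:
            errors = (item.get("result") or {}).get("errors")
            if errors:
                print(f"object {item.get('id')}: {errors}")

    client.batch.configure(callback=check_batch_result)
    with client.batch as batch:
        batch.add_data_object({"title": "hello"}, "Article")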

  • (DOC) REST API authentication to a WCS cluster

    (DOC) REST API authentication to a WCS cluster

    Hi,

    I'm trying to connect to an authentication-enabled WCS cluster with the REST API (for WooCommerce owners who want to use WCS hosting rather than installing Weaviate).

    Could you point me to the documentation? I received a 404 link after the cluster was created: https://www.semi.technology/developers/weaviate/v1.8.0/configuration/authentication

    I noticed the Enterprise token, but I do not think it is what I need. I suspect https://www.semi.technology/developers/weaviate/current/configuration/authentication.html only covers self-hosted setups.

  • Docker-compose DB fails:

    Docker-compose DB fails: "Could not create SSTable component"

    Weaviate doesn't start with Docker Compose and stays in a failing loop:

    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:30:53,471 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:03,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:03,472 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:03,472 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:13,472 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:13,510 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:13,510 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt because it doesn't exist.
    
  • Can Aggregate depend on Get?

    Can Aggregate depend on Get?

    I noticed that the Aggregate function does not depend on the Get query. For instance, in the following query, the facets are always the same, whatever the 'wpsolr_type' condition:

    {
      results: Get {
        WeaviateWooCommerce(
          limit: 10
          where: {path: ["wpsolr_type"], operator: Equal, valueString: "post"}
        ) {
          wpsolr_title
          wpsolr_product_cat_str
        }
      }
      facets: Aggregate {
        WeaviateWooCommerce(groupBy: ["wpsolr_product_cat_str"]) {
          meta {
            count
          }
          groupedBy {
            value
          }
        }
      }
    }
    
    

    I'd like to get facets related to the results instead.

    Is it possible?
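
    A possible workaround sketch, assuming Aggregate accepts the same where filter as Get (field names taken from the question above), sent as a raw GraphQL query from the Python client:

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    query = """
    {
      facets: Aggregate {
        WeaviateWooCommerce(
          groupBy: ["wpsolr_product_cat_str"]
          where: {path: ["wpsolr_type"], operator: Equal, valueString: "post"}
        ) {
          meta { count }
          groupedBy { value }
        }
      }
    }
    """
    print(client.query.raw(query))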

  • Suggestion: auto cut-off Explore search results

    Suggestion: auto cut-off Explore search results

    Problem & current behaviour

    If the Explore filter in a GraphQL Get query is used, it is unclear how to set and control the certainty (or distance) parameter. This parameter controls which results to return, but with the current design the user does not know what to set it to in order to get optimal results.

    You don't want to see 'bad' results among the results, but you don't know beforehand where the certainty cut-off point lies. Additionally, we observed that users prefer not to see any results if there are no good results at all.

    Proposed solution

    Automatically find a cut-off threshold for which results to show. This threshold can be calculated by e.g. an elbow in distances between each result and the query, or between a cluster of results and the query.

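    A minimal sketch of a largest-gap ("elbow") cut-off over a sorted result list, assuming each result carries a distance to the query (names and the min_gap threshold are illustrative):

    def elbow_cutoff(distances, min_gap=0.05):
        # distances: ascending distances of the returned results to the query.
        if len(distances) < 2:
            return len(distances)
        gaps = [b - a for a, b in zip(distances, distances[1:])]
        i = max(range(len(gaps)), key=gaps.__getitem__)
        # Only cut if the biggest gap is pronounced enough; otherwise keep all.
        return i + 1 if gaps[i] >= min_gap else len(distances)

    results = [0.08, 0.09, 0.11, 0.31, 0.33]  # clear elbow after the third result
    print(elbow_cutoff(results))  # -> 3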

    Questions

    1. Should the user have the option to set whether they want to enable this auto function on their query?
    2. What value to set the (relative) cut-off point to? (=how big should the gap between the points or clusters relatively be?)
  • SUGGESTION: Dump vectors

    SUGGESTION: Dump vectors

    Some data scientists might want to leverage the vectorization mechanism in Weaviate to train new models. The result would be similar to /things and /things/{UUID} but with a focus on a matrix to download all objects.

    RESTful URL suggestion: /c11y/vectors and /c11y/vectors/{UUID}.

    c11y/vectors/{UUID}

    returns:

    {
        "type": "thing",
        "vector": [
            0,
            0,
            0,
            0,
            //etc
        ]
    }
    

    c11y/vectors?page=0

    returns:

    {
        "result": {
            "UUID_1": {
                "type": "thing",
                "vector": [ //Array is same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            },
            "UUID_1": {
                "type": "thing",
                "vectors": [ //Array is same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            }
        },
        "pages": 100 // total pages
    }
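
    A client-side sketch of consuming the proposed endpoint; note that /c11y/vectors and the "pages" field are hypothetical, this just mirrors the response shape suggested above:

    import requests

    BASE = "http://localhost:8080/v1"

    def dump_all_vectors():
        vectors, page, pages = {}, 0, 1
        while page < pages:
            resp = requests.get(f"{BASE}/c11y/vectors", params={"page": page}).json()
            pages = resp["pages"]  # total number of pages, per the proposal
            for uuid, obj in resp["result"].items():
                vectors[uuid] = obj["vector"]
            page += 1
        return vectors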
    
  • Push docker container to github registry

    Push docker container to github registry

    What's being changed:

    This should enable docker image downloads from other repos without docker logins

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
  • Roaring allow list rebased

    Roaring allow list rebased

    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
  • Support hybrid search in Aggregate

    Support hybrid search in Aggregate

    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.

    Closes #2482

  • creationTimeUnix is being updated during upsert

    creationTimeUnix is being updated during upsert

    I've noticed that creationTimeUnix is being updated (and equals lastUpdateTimeUnix) when I upsert a record (batch write with the same id). I had hoped that I could determine whether the record had been upserted by comparing those two in the batch result.

    I've tested this with two documents, one was already there, the other one was being inserted: this was the result:

    [
        {
            # This one should have been updated
            'class': 'P_8062f70f_d2c7_4b9b_b0b7_1c587ed49c2d',
            'creationTimeUnix': 1672845088023,
            'id': 'cebcaf20-2a4b-5a42-93b9-d662c02928bc',
            'lastUpdateTimeUnix': 1672845088023,
            'properties': {'_creationTimeUnix': 1672845088023, '_lastUpdateTimeUnix': 1672845088023, ...},
            'vector': [...],
            'deprecations': None,
            'result': {}
        },
        {
            # This one should have been inserted
            'class': 'P_8062f70f_d2c7_4b9b_b0b7_1c587ed49c2d',
            'creationTimeUnix': 1672845088023,
            'id': 'bb323fed-3026-500b-8d61-e686b75aa19e',
            'lastUpdateTimeUnix': 1672845088023,
            'properties': {'_creationTimeUnix': 1672845088023, '_lastUpdateTimeUnix': 1672845088023, ...},
            'vector': [...],
            'deprecations': None,
            'result': {}
        },
    ]
    

    According to @byronvoorbach it should work as expected: the upserted document should have only its lastUpdateTimeUnix updated. Currently, there seems to be no direct way to determine how many documents were updated, inserted, upserted, and so on...
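
    For illustration, a sketch of the check the reporter expected to work: an upserted (updated) object would keep its original creationTimeUnix, so the two timestamps would differ (timestamps below are made up):

    def was_updated(obj: dict) -> bool:
        # True for an update (timestamps differ), False for a fresh insert.
        return obj["creationTimeUnix"] != obj["lastUpdateTimeUnix"]

    batch_result = [
        {"id": "cebcaf20-...", "creationTimeUnix": 1672840000000, "lastUpdateTimeUnix": 1672845088023},
        {"id": "bb323fed-...", "creationTimeUnix": 1672845088023, "lastUpdateTimeUnix": 1672845088023},
    ]
    for obj in batch_result:
        print(obj["id"], "updated" if was_updated(obj) else "inserted")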

  • Improve decoding values in LSM SetDecoder by not maintaining the values' original order

    Improve decoding values in LSM SetDecoder by not maintaining the values' original order

    Access to the map has been reduced to a minimum

    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.