Weaviate is a cloud-native, modular, real-time vector search engine

Demo of Weaviate

Weaviate GraphQL demo on a news article dataset, showing the Transformers module, GraphQL usage, semantic search, _additional{} features, Q&A, and the Aggregate{} function. You can try the demo on this dataset in the GUI here: semantic search, Q&A, Aggregate.

Description

Weaviate is a cloud-native, real-time vector search engine (aka neural search engine or deep search engine). There are modules for specific use cases such as semantic search, plugins to integrate Weaviate in any application of your choice, and a console to visualize your data.

GraphQL - RESTful - vector search engine - vector database - neural search engine - semantic search - HNSW - deep search - machine learning - kNN

Features

Weaviate makes it easy to use state-of-the-art AI models while giving you the scalability, ease of use, safety and cost-effectiveness of a purpose-built vector database. Most notably:

  • Fast queries
    Weaviate typically performs a 10-NN (nearest-neighbor) search over millions of objects in considerably less than 100 ms.

  • Any media type with Weaviate Modules
    Use state-of-the-art AI model inference (e.g. Transformers) for text, images, etc. at search and query time to let Weaviate manage the process of vectorizing your data for you - or import your own vectors.

  • Combine vector and scalar search
    Weaviate allows for efficient combined vector and scalar searches, e.g. "articles related to the COVID-19 pandemic published within the past 7 days" (see the sketch after this list). Weaviate stores both your objects and their vectors and makes sure the retrieval of both is always efficient. There is no need for a third-party object storage.

  • Real-time and persistent
    Weaviate lets you search through your data even while it is being imported or updated. In addition, every write goes to a Write-Ahead Log (WAL), so writes are persisted immediately - even when a crash occurs.

  • Horizontal Scalability
    Scale Weaviate for your exact needs, e.g. High-Availability, maximum ingestion, largest possible dataset size, maximum queries per second, etc. (Currently under development, ETA Fall 2021)

  • Cost-Effectiveness
    Very large datasets do not need to be kept entirely in memory in Weaviate. At the same time, available memory can be used to increase the speed of queries. This allows for a conscious speed/cost trade-off to suit every use case.

  • Graph-like connections between objects
    Make arbitrary connections between your objects in a graph-like fashion to resemble real-life connections between your data points. Traverse those connections using GraphQL.
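
A sketch of the combined vector and scalar search mentioned above, using the Python client (the Article class, the publicationDate property, and the local endpoint are illustrative assumptions):

    import datetime

    import weaviate

    client = weaviate.Client("http://localhost:8080")  # assumed local instance

    # "articles related to the COVID-19 pandemic published within the past 7 days"
    week_ago = (
        datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=7)
    ).isoformat()

    result = (
        client.query.get("Article", ["title", "url"])
        .with_near_text({"concepts": ["COVID-19 pandemic"]})  # vector search
        .with_where({                                         # scalar filter
            "path": ["publicationDate"],
            "operator": "GreaterThan",
            "valueDate": week_ago,
        })
        .with_limit(10)
        .do()
    )
    print(result)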

Documentation

You can find detailed documentation in the developers section of our website, or go directly to one of the docs using the links in the list below.

Additional reading

Examples

You can find code examples here.

Support

Contributing

Owner
SeMI Technologies
SeMI Technologies creates database software like the Weaviate vector search engine
Comments
  • Vectorization mask for classes

    Currently all string/text values, as well as the class name and property names, are considered in the vectorization. However, not all property names and values are important for the context. Take the following meta class of a table as an example:

    {
      "class": "Column",
      "description": "",
      "properties": [
        {
          "cardinality": "atMostOne",
          "description": "",
          "dataType": ["int"],
          "keywords": [],
          "name": "index"
        },
        {
          "cardinality": "atMostOne",
          "description": "",
          "dataType": ["text"],
          "keywords": [],
          "name": "name"
        },
        {
          "cardinality": "atMostOne",
          "description": "",
          "dataType": ["string"],
          "keywords": [],
          "name": "dataType"
        }
      ]
    }
    

    In this case the vector would be created based on Column, index, name, data, type, and the values of the properties. However, the class name and property names shift the vector into the context of tables, while the context should be based solely on the name and dataType values of the column.

    Proposal:

    • If nothing is specified the vector gets created automatically based on all information in the class
    • The user can explicitly mask information away from the vectorization in the schema:
    {
      "class": "Column",
      "description": "",
      "vectorizeClassName": false,
      "properties": [
        {
          "cardinality": "atMostOne",
          "description": "",
          "dataType": ["int"],
          "keywords": [],
          "name": "index"
        },
        {
          "cardinality": "atMostOne",
          "description": "",
          "dataType": ["text"],
          "keywords": [],
          "name": "name",
          "vectorizePropertyName": false,
          "vectorizePropertyValue": true
        },
        {
          "cardinality": "atMostOne",
          "description": "",
          "dataType": ["string"],
          "keywords": [],
          "name": "dataType",
          "vectorizePropertyName": false,
          "vectorizePropertyValue": true
        }
      ]
    }
    
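    A minimal sketch of how such masking could work conceptually - illustrative Python, not actual Weaviate internals; absent flags default to today's behaviour:

    TEXT_TYPES = {"text", "string"}

    def build_vectorization_text(class_def, obj):
        """Collect only the names/values the vectorize* flags allow;
        as today, only string/text values are considered."""
        parts = []
        if class_def.get("vectorizeClassName", True):
            parts.append(class_def["class"])
        for prop in class_def["properties"]:
            if prop.get("vectorizePropertyName", True):
                parts.append(prop["name"])
            if (prop.get("vectorizePropertyValue", True)
                    and prop["dataType"][0] in TEXT_TYPES
                    and prop["name"] in obj):
                parts.append(str(obj[prop["name"]]))
        return " ".join(parts)

    # With the masked schema above, the class name and the masked property
    # names no longer contribute, so the vector context comes mostly from
    # the column's own name and dataType values.
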
  • Hybrid Search (combining Vector + Sparse search)

    WHAT

    Combining vector search with sparse (e.g. BM25) search in one query

    WHY

    To bridge the gap between sparse search and vector search.

    Longer why

    Vector search (using dense vectors, computed by ML models) works well in-domain, but has poor performance out-of-domain. BM25 search works well out-of-domain, because it uses sparse methods (keyword matching), but can't perform context-based search. Combining both methods will improve search results out-of-domain.

    HOW

    • This issue and implementation depend on issue https://github.com/semi-technologies/weaviate/issues/2133
    • Do both a dense and BM25 search using a query (in parallel)
    • You should be able to define a function to combine the results into one result list, using the scores of the data in both candidate lists.
    • A default function can be defined in the Weaviate setup, but can be overwritten in the GraphQL query.
    • BM25 score is unbounded. Score normalization or scaling is not a good idea in general, because you lose information on how good the results are textually. However, when combining BM25 with dense search, some form of normalization may be handy, so we can choose to offer normalization methods anyway. We can explain the strategies in the documentation (see the sketch after this list). Possible strategies are:
      • minmax - To leave the distribution intact, the best normalization method is a minmax approach, which takes the minimum and maximum BM25 scores into account. Taking the maximum BM25 score of a particular query as the maximum in the formula is not a good idea, because then that result gets the maximum score regardless of its actual score. So a (theoretical) maximum needs to be defined, even though BM25 is unbounded. This can be achieved by running a number of different queries and recording the maximum value. Since this is quite complex to implement as a feature, we can start by offering a setting with a default value, which the user can change at runtime. An example can be: min(x/10, 1) if the guessed maximum is 10.
      • Arctangent - An arctan compresses scores in a saturating manner. A function that scales scores into (-1, 1) (practically into (0, 1), because x will be a positive score) is 2/pi*arctan(x) (see https://www.mdpi.com/2227-7390/10/8/1335)
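
    A sketch of these normalization strategies and the weighted combination in plain Python (names and defaults are illustrative):

    import math

    def minmax(score, guessed_max=10.0):
        # Saturates at 1 for scores above the guessed (theoretical) maximum.
        return min(score / guessed_max, 1.0)

    def arctan(score):
        # (2/pi)*arctan(x) maps positive, unbounded BM25 scores into (0, 1).
        return 2.0 / math.pi * math.atan(score)

    def hybrid_score(dense, bm25, w_dense=0.8, w_sparse=0.2, normalize=minmax):
        # Weighted sum of the dense certainty (already in 0..1) and the
        # normalized BM25 score.
        return w_dense * dense + w_sparse * normalize(bm25)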

    Design - Weaviate setup

    Requirements:

    • Dense retrieval and sparse retrieval independently (with the same or different query)
    • Combine the result of both methods using a scoring function

    Schema: the settings should be configured in the schema, per class:

    {
      "class": "string",
      "vectorIndexType": "hnsw",                
      "vectorIndexConfig": {
        ...                                     
      },
      "vectorizer": "text2vec-transformers",
      "moduleConfig": {
        "text2vec-transformers": {  
          "vectorizeClassName": true            
        }
      },                       
      "sparseIndexType": "bm25",                
      "sparseIndexConfig": {                   
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        }
      },
      "properties": [                            
        {
          "name": "string",                     
          "description": "string",              
          "dataType": [                         
            "string"
          ],
          "sparseIndexConfig": {                
            "bm25": {
              "b": 0.75,
              "k1": 1.2, 
            }
          },
          "indexInverted": true                 
        }
      ]
    }
    

    Docker-compose: in case we need to let Weaviate know on startup whether to enable sparse search, we can introduce an env var like ENABLE_SPARSE_INDEX:

    ---
    version: '3.4'
    services:
      weaviate:
        command:
        - --host
        - 0.0.0.0
        - --port
        - '8080'
        - --scheme
        - http
        image: semitechnologies/weaviate:1.14.1
        ports:
        - 8080:8080
        restart: on-failure:0
        environment:
          QUERY_DEFAULTS_LIMIT: 25
          AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
          PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
          DEFAULT_VECTORIZER_MODULE: text2vec-transformers
          ENABLE_MODULES: text2vec-transformers
          CLUSTER_HOSTNAME: 'node1'
          ENABLE_SPARSE_INDEX: 'true' # <== NEW. Which method (e.g. bm25) can be specified in the schema. Not sure if this variable is needed in the docker-compose actually.
      t2v-transformers:
        image: semitechnologies/transformers-inference:sentence-transformers-msmarco-distilroberta-base-v2
        environment:
          ENABLE_CUDA: 0 # set to 1 to enable
    ...
    

    Design - GraphQL Queries

    The current API is documented in this comment.

    Below is the more elaborate original proposal.

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              nearText: {               # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7   
              },
              weight: 0.8
            }],
            type: Sum                   # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score    # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
    

    Multiple sparse searches should also be supported, like:

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query 2",
                properties: ["title"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                },
                limit: 100
              },
              weight: 0.2
            }, {
              nearText: {              # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7,
                limit: 100   
              },
              weight: 0.6
            }],
            type: Sum                  # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score        # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
    
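    For the type: RRF option above, here is a minimal sketch of Reciprocal Rank Fusion; it only looks at ranks, which is why the weights are discarded (k=60 is the constant commonly used in the literature):

    def rrf(ranked_lists, k=60):
        """Fuse several ranked candidate lists by summing reciprocal ranks."""
        scores = {}
        for results in ranked_lists:
            for rank, doc_id in enumerate(results, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # e.g. fuse a BM25 candidate list with a dense candidate list:
    fused = rrf([["a", "b", "c"], ["c", "a", "d"]])
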
  • Add _snippets in GraphQL and REST API

    When an object is indexed on a larger text item (e.g., a paragraph, as in the news article demo), certain search terms can be found in specific sentences.

    The idea is to add the start and end point of the most important part of the text corpus to the __meta end-point as a potentialAnswer, which can be enabled or disabled by setting a distanceToAnswer. This can work both for explore filters and for where filters.

    Idea

    I was searching for something on Wikipedia under the search term: Is herbalife a pyramid scheme? and got this response.

    Because Google isn't giving the actual answer but a location for the answer, we should be able to calculate something similar.

    (Screenshot: Google search result pointing to the location of the answer.)

    Explore example

    {
      Get{
        Things{
          Article(
            explore: {
              concepts: ["I want a spare rib"],
              certainty: 0.7,
              moveAwayFrom: {
                concepts: ["bacon"],
                force: 0.45
              }
            }
          ){
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result, where start and end give the starting and ending positions, and property indicates in which property the answer (the most important part) can be found.

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple 
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Where example

    {
      Get {
        Things {
          Article(where: {
                path: ["name"],
                operator: Like,
                valueString: "New *"
            }) {
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result, where start and end give the starting and ending positions, and property indicates in which property the answer (the most important part) can be found.

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple properties might be indexed
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Suggested (first) implementation

    1. Results are returned like the current implementation.
    2. The vectorization of the query is used to find the closest match of a word in a sentence. *
    3. When the closest word is found, the start and endpoint are found at the beginning and end of the sentence.
    4. The distanceToAnswer is the minimal distance; if it is not set, no start and end points will be available. If multiple sentences make the mark, they will all be part of the array.

    * There might be potential to also do this on groups of words or complete sentences.
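
    A rough sketch of steps 2-4 in plain Python (the per-word vectors and the distance function are assumed to come from the same vectorizer as the query):

    def potential_answer(query_vec, words, text, distance, distance_to_answer=0.5):
        """words: list of (token, char_offset, vector) tuples for `text`.
        Returns the sentence boundaries around the closest-matching word,
        or None if no word beats the distanceToAnswer threshold."""
        token, offset, vec = min(words, key=lambda w: distance(query_vec, w[2]))
        d = distance(query_vec, vec)
        if d > distance_to_answer:
            return None
        start = text.rfind(".", 0, offset) + 1  # start of the enclosing sentence
        end = text.find(".", offset)
        end = len(text) if end == -1 else end + 1
        return {"start": start, "end": end, "distanceToQuery": d}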

    Related

    #1136 #1139 #1155 #1156

  • Add geotype to datatypes

    Todos

    • [x] design decisions
      • [x] name of the field
        • current proposals: geoCoordinate, geoLocation, geoPoint
          • my personal favorite being geoCoordinate
          • cc @laura-ham @bobvanluijt
      • [x] design of the where filter
        • @laura-ham suggestions?
    • [x] spike out happy path in janusgraph only
      • [x] index creation
      • [x] adding property
      • [x] searching by property within range
    • [x] add new data type on import, goal: an import with geoCoordinates field succeeds
      • [x] allow in schema creation
      • [x] allow in class instance creation
      • [x] validation?
        • [x] add basic validation
        • [x] refactor validateSchemaInBody (it's way too long and extremely difficult to read/extend)
      • [x] janus graph create vertex
    • [x] include on simple read queries
      • [x] Local Get
      • [x] Network Get
    • [x] filter by property
      • [x] Local Filters
        • [x] extract filter from graphql
        • [x] set required validations, so that required fields cannot be omitted
        • [x] apply filter in connectors (Janusgraph)
      • ~~Network Filters~~
        • nothing to do here, they use the same code as local filters
    • [x] deal with property in GetMeta and Aggregate
      • proposal for now to simply not support those fields there
      • long-term?
    • [x] rename according to latest decisions
      • [x] pluralize name geoCoordinate -> geoCoordinates
      • [x] restructure where filter
        • [x] WithinRange -> WithinGeoRange
        • [x] valueRange -> valueGeoRange
        • [x] wrap distance and geoCoordinates in separate objects
    • [x] Update docs (@laura-ham volunteered to help out here)
    • [x] e2e/acceptance test
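
    For reference, the where-filter shape that follows from these naming decisions, shown via the Python client (the Publisher class and location property are illustrative):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    geo_filter = {
        "path": ["location"],
        "operator": "WithinGeoRange",
        "valueGeoRange": {
            "geoCoordinates": {"latitude": 52.37, "longitude": 4.90},
            "distance": {"max": 2000.0},  # meters
        },
    }
    result = client.query.get("Publisher", ["name"]).with_where(geo_filter).do()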

    Original Content below: for most up-to-date summary see comments below

    Abstract Examples

    • A location in a 2-dimensional space, i.e. x=3, y=5
    • Coordinates that point to a location on a world map

    Research Questions

    • Are geo types always two-dimensional or can they also be higher-dimensional?
      • Part 1: Do we have use cases for more than 2d?
      • Part 2: Is there technical support for more than 2d?
    • Optimal way to query: e.g. "within 500 m of coordinates x,y" mixes the concepts of metric distance and coordinates; what API do we want?

    Features we'd need

    • Import/update geo coordinates
    • use geo in where filters (see research question)
    • do we want to allow aggregation functions like Aggregate or GetMeta
      • if so, what should they look like?
      • if so, is there technical support in our current stack?
      • if so, right from the start or later?

    cc @bobvanluijt @laura-ham

  • Dropping 'things' or 'actions' from the filter.

    In the section on filters in the GraphQL documentation, the paths of the filters are prefixed with "things" and "actions".

    This is superfluous information. The class names cannot overlap in any case.

    Current query:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Things", "Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Things", "Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
    

    After removing the kind of class:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
    
  • batch_create does not report on 'store is read-only' errors

    When using the Python client with client.batch after having crossed the DISK_USE_READONLY_PERCENTAGE threshold, my ingest fails when using data_object.create but seemingly succeeds when using batches. However, the ingest that "succeeded" using batch did not contain any items when running a Get query, even though they had been processed by my t2v-transformers container.

    There might be other errors that are not covered by batch.create; I only found out through a helpful comment in the Weaviate Slack after running into this issue. This also seems to have been the problem in https://github.com/semi-technologies/weaviate/issues/1929 .
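
    Until the client surfaces these errors, a workaround is to inspect the per-object results that /v1/batch/objects returns, since the batch call itself responds with 200 even when individual objects fail. A sketch against the REST API (class and endpoint are illustrative):

    import requests

    resp = requests.post(
        "http://localhost:8080/v1/batch/objects",
        json={"objects": [{"class": "Article", "properties": {"title": "hello"}}]},
    )
    for obj in resp.json():
        errors = (obj.get("result") or {}).get("errors")
        if errors:
            print(obj.get("id"), errors)  # e.g. the 'store is read-only' error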

  • (DOC) REST API authentication to a WCS cluster

    Hi,

    I'm trying to connect to an authentication enabled WCS cluster, with the REST API (for WooCommerce owners who want to use WCS hosting rather than installing Weaviate).

    Could you point me to the documentation? I received a 404 link after the cluster was created: https://www.semi.technology/developers/weaviate/v1.8.0/configuration/authentication

    I noticed the Enterprise token, but I do not think it is what I need. https://www.semi.technology/developers/weaviate/current/configuration/authentication.html is only about self-hosted setups, I suspect.
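
    In the meantime, here is a sketch of what the Python client automates under the hood: when authentication is enabled, Weaviate advertises its OIDC provider at /v1/.well-known/openid-configuration, and a bearer token obtained from that provider is sent with every request. The password grant shown is an assumption; the exact flow depends on how the WCS identity provider is configured:

    import requests

    base = "https://my-cluster.semi.network"  # your WCS endpoint

    oidc = requests.get(f"{base}/v1/.well-known/openid-configuration").json()
    provider = requests.get(oidc["href"]).json()  # the provider's OpenID config

    token = requests.post(
        provider["token_endpoint"],
        data={
            "grant_type": "password",
            "client_id": oidc["clientId"],
            "username": "you@example.com",
            "password": "...",
        },
    ).json()["access_token"]

    schema = requests.get(
        f"{base}/v1/schema", headers={"Authorization": f"Bearer {token}"}
    ).json()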

  • Docker-compose DB fails: "Could not create SSTable component"

    Weaviate doesn't start with Docker-compose and keeps restarting in a failing loop:

    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:30:53,471 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:03,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:03,472 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:03,472 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:13,472 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:13,510 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:13,510 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt because it doesn't exist.
    
  • Can Aggregate depend on Get?

    I noticed that the Aggregate function does not depend on the Get query. For instance, in the following query, the facets are always the same, whatever the 'wpsolr_type' condition.

    {
      results: Get {
        WeaviateWooCommerce(
          limit: 10
          where: {path: ["wpsolr_type"], operator: Equal, valueString: "post"}
        ) {
          wpsolr_title
          wpsolr_product_cat_str
        }
      }
      facets: Aggregate {
        WeaviateWooCommerce(groupBy: ["wpsolr_product_cat_str"]) {
          meta {
            count
          }
          groupedBy {
            value
          }
        }
      }
    }
    
    

    I'd like to get facets related to the results instead.

    Is it possible?
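
    One way to scope the facets to the filtered set is to repeat the condition in the Aggregate query itself, since Aggregate accepts a where filter just like Get (a sketch via the Python client's raw query; support may depend on your Weaviate version):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    result = client.query.raw("""
    {
      Aggregate {
        WeaviateWooCommerce(
          where: {path: ["wpsolr_type"], operator: Equal, valueString: "post"}
          groupBy: ["wpsolr_product_cat_str"]
        ) {
          meta { count }
          groupedBy { value }
        }
      }
    }
    """)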

  • Suggestion: auto cut-off Explore search results

    Problem & current behaviour

    If the Explore filter in a GraphQL Get query is used, it is unclear how to set and control the certainty (or distance) parameter. This parameter controls which results to return, but with the current design, the user does not know what to set this parameter to in order to get optimal results.

    You don't want to see 'bad' results amongst the results, but you don't know beforehand where the cut-off point of the certainty is. Additionally, we observed that the user prefers not to see any results if there are no good results at all.

    Proposed solution

    Automatically find a cut-off threshold for which results to show. This threshold can be calculated by e.g. an elbow in distances between each result and the query, or between a cluster of results and the query.

    (Sketch: an elbow in the result-distance distribution marking the proposed cut-off point.)

    Questions

    1. Should the user have the option to set whether they want to enable this auto function on their query?
    2. What value should the (relative) cut-off point be set to? (i.e., how big should the gap between the points or clusters be, relatively?)
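
    A minimal sketch of the elbow idea in plain Python (min_gap is the relative-gap knob from question 2):

    def auto_cutoff(distances, min_gap=0.05):
        """distances: result-to-query distances in ascending order.
        Cut at the largest gap between consecutive results (the 'elbow'),
        but only if that gap is at least min_gap; otherwise keep everything."""
        if len(distances) < 2:
            return len(distances)
        gaps = [b - a for a, b in zip(distances, distances[1:])]
        i = max(range(len(gaps)), key=gaps.__getitem__)
        return i + 1 if gaps[i] >= min_gap else len(distances)

    # keep the first auto_cutoff(...) results, drop the rest
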
  • SUGGESTION: Dump vectors

    Some data scientists might want to leverage the vectorization mechanism in Weaviate to train new models. The result would be similar to /things and /things/{UUID}, but focused on downloading the vector matrix for all objects.

    RESTful URL suggestion: /c11y/vectors and /c11y/vectors/{UUID}.

    c11y/vectors/{UUID}

    returns:

    {
        "type": "thing",
        "vector": [
            0,
            0,
            0,
            0,
            //etc
        ]
    }
    

    c11y/vectors?page=0

    returns:

    {
        "result": {
            "UUID_1": {
                "type": "thing",
                "vector": [ // Array is the same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            },
            "UUID_2": {
                "type": "thing",
                "vector": [ // Array is the same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            }
        },
        "pages": 100 // total pages
    }
    
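    Client-side, downloading the full matrix through this proposed (not yet existing) endpoint could then look like the following sketch:

    import requests

    vectors, page = {}, 0
    while True:
        resp = requests.get(
            "http://localhost:8080/c11y/vectors", params={"page": page}
        ).json()
        for uuid, entry in resp["result"].items():
            vectors[uuid] = entry["vector"]
        page += 1
        if page >= resp["pages"]:
            break
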
  • Support for asymmetric embedding retrieval techniques

    Many of the current SOTA retrieval methods, like Instructor, use asymmetric embeddings, where the query Q and the document D use different embedding methods/prefixes (albeit with the same number of dimensions). How should we handle that? At the application level, or possibly at the API level?
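
    One application-level approach: embed documents and queries with different prefixes yourself, hand Weaviate the resulting vectors, and search with nearVector. A sketch with a stubbed encoder and illustrative prefixes (assumes a Document class configured with vectorizer "none"):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    encode = lambda text: [0.0] * 768  # stand-in for an asymmetric model

    DOC_PREFIX = "Represent the document for retrieval: "
    QUERY_PREFIX = "Represent the question for retrieving documents: "

    client.data_object.create(
        {"text": "some document"}, "Document",
        vector=encode(DOC_PREFIX + "some document"),
    )

    result = (
        client.query.get("Document", ["text"])
        .with_near_vector({"vector": encode(QUERY_PREFIX + "my question")})
        .do()
    )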

  • Return vectorized query with results

    Summary

    When a request is made, the vectorized query cannot be extracted for storage and analysis afterward. While a workaround for some modules is to query the inference engine directly (e.g. POST /vectors), that method is inconsistent across third-party providers, and double-querying can be costly and clunky.

    Proposal

    An _additional property named query_vector that returns the vectorized query with each result.

    Issues

    • [ ] Is there a better name than query_vector?
    • [ ] Are there other forms of the query that may be useful to return, such as before and after to/from movement?
    • [ ] Should it be returned with each result or just once globally? Performance may be a concern, but the flexibility of per-result vectors may be useful in the future.
  • Remove references to contextionary in response from root endpoint

    What's being changed:

    Remove references to text2vec-contextionary in root endpoint

    Fixes #2400

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
  • Typo/confusing text in error message

    When you are trying to query over a class that does not exist, you get a very confusing message: Cannot query field "P_91d24398_6d5d_45d9_ba31_470959c8902e_topics" on type "GetObjectsObj".

    This makes no sense for a query like this: {Get{P_91d24398_6d5d_45d9_ba31_470959c8902e_topics(where: {operator: And operands: [{path: ["topic_id"] operator: Equal valueInt: 3}]} limit: 10000 ){documents{ ... on P_91d24398_6d5d_45d9_ba31_470959c8902e { labels } } }}}

    Because obviously P_91d24398_6d5d_45d9_ba31_470959c8902e_topics is a class here, GetObjectsObj is probably the name of a method, and the field is not even mentioned there...

  • Updating OpenAI module config

    As you probably know, on 16 December 2022 OpenAI released a new embedding model, cheaper and more powerful. On my Weaviate cloud space I have a class for which I want to change the embedding model to the new one. I tried the request below, but it doesn't lead anywhere.

    curl --location --request PUT 'https://[my-workspace].semi.network/v1/schema/Posts/' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "class": "Posts",
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            "text2vec-openai": {
                "model": "ada",
                "modelVersion": "002",
                "type": "text"
            }
        }
    }'
    

    The response was

    {
        "error": [
            {
                "message": "properties cannot be updated through updating the class. Use the add property feature (e.g. \"POST /v1/schema/{className}/properties\") to add additional properties"
            }
        ]
    }
    
  • GH-2488 - Add support for new davinci model version

    What's being changed:

    Adds support for versions 002 and 003 to the OpenAI text2vec module, and support for 003 to the OpenAI QA module.

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.