Weaviate is a cloud-native, modular, real-time vector search engine


Demo of Weaviate

Weaviate GraphQL demo on a news article dataset, showing the Transformers module, GraphQL usage, semantic search, _additional{} features, Q&A, and the Aggregate{} function. You can try the demo on this dataset in the GUI here: semantic search, Q&A, Aggregate.

Description

Weaviate is a cloud-native, real-time vector search engine (aka neural search engine or deep search engine). There are modules for specific use cases such as semantic search, plugins to integrate Weaviate in any application of your choice, and a console to visualize your data.

GraphQL - RESTful - vector search engine - vector database - neural search engine - semantic search - HNSW - deep search - machine learning - kNN

Features

Weaviate makes it easy to use state-of-the-art AI models while giving you the scalability, ease of use, safety and cost-effectiveness of a purpose-built vector database. Most notably:

  • Fast queries
    Weaviate typically performs a 10-NN search out of millions of objects in well under 100 ms.

  • Any media type with Weaviate Modules
    Use state-of-the-art AI model inference (e.g. Transformers) for text, images, etc. at search and query time, letting Weaviate manage the process of vectorizing your data for you, or import your own vectors.

  • Combine vector and scalar search
    Weaviate allows for efficient combined vector and scalar searches, e.g. “articles related to the COVID-19 pandemic published within the past 7 days” (see the sketch after this list). Weaviate stores both your objects and their vectors and makes sure the retrieval of both is always efficient. There is no need for third-party object storage.

  • Real-time and persistent
    Weaviate lets you search through your data even while it is being imported or updated. In addition, every write is recorded in a Write-Ahead Log (WAL), so writes are persisted immediately, even if a crash occurs.

  • Horizontal Scalability
    Scale Weaviate for your exact needs, e.g. High-Availability, maximum ingestion, largest possible dataset size, maximum queries per second, etc. (Currently under development, ETA Fall 2021)

  • Cost-Effectiveness
    Very large datasets do not need to be kept entirely in memory in Weaviate. At the same time available memory can be used to increase the speed of queries. This allows for a conscious speed/cost trade-off to suit every use case.

  • Graph-like connections between objects
    Make arbitrary connections between your objects in a graph-like fashion to resemble real-life connections between your data points. Traverse those connections using GraphQL.
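
To make the combined vector + scalar search bullet above concrete, here is a minimal sketch using the Python client, assuming a local Weaviate with an "Article" class that has "title" and "publicationDate" properties (names are illustrative):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    result = (
        client.query
        .get("Article", ["title", "publicationDate"])
        # vector part: semantic similarity to the query text
        .with_near_text({"concepts": ["COVID-19 pandemic"]})
        # scalar part: only articles published within the requested window
        .with_where({
            "path": ["publicationDate"],
            "operator": "GreaterThan",
            "valueDate": "2021-01-01T00:00:00Z",
        })
        .with_limit(10)
        .do()
    )
    print(result["data"]["Get"]["Article"])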

Documentation

You can find detailed documentation in the developers section of our website or directly go to one of the docs using the links in the list below.

Additional reading

Examples

You can find code examples here.

Support

Contributing

Owner
SeMI Technologies
SeMI Technologies creates database software like the Weaviate vector search engine
Comments
  • Vectorization mask for classes

    Vectorization mask for classes

    Currently, all string/text values as well as the class name and property names are considered in the vectorization. However, not all property names and values have to be important for the context. Take the following meta class of a table as an example:

    {
        "class": "Column",
        "description": "",
        "properties": [
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["int"],
                "keywords": [],
                "name": "index"
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["text"],
                "keywords": [],
                "name": "name"
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["string"],
                "keywords": [],
                "name": "dataType"
            }
        ]
    }
    

    In this case the vector would be created based on the words Column, index, name, data, type, and the values of the properties. However, the class name and property names shift the vector into the context of tables, while the context should be based solely on the column name and the dataType values.

    Proposal:

    • If nothing is specified the vector gets created automatically based on all information in the class
    • The user can explicitly mask information away from the vectorization in the schema:
    {
        "class": "Column",
        "description": "",
        "vectorizeClassName": false,
        "properties": [
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["int"],
                "keywords": [],
                "name": "index"
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["text"],
                "keywords": [],
                "name": "name",
                "vectorizePropertyName": false,
                "vectorizePropertyValue": true
            },
            {
                "cardinality": "atMostOne",
                "description": "",
                "dataType": ["string"],
                "keywords": [],
                "name": "dataType",
                "vectorizePropertyName": false,
                "vectorizePropertyValue": true
            }
        ]
    }
    
  • Hybrid Search (combining Vector + Sparse search)

    Hybrid Search (combining Vector + Sparse search)

    WHAT

    Combining vector search with sparse (e.g. BM25) search in one query

    WHY

    To bridge the gap between sparse search and vector search.

    Longer why

    Vector search (using dense vectors, computed by ML models) works well in-domain, but has poor performance out-of-domain. BM25 search works well out-of-domain, because it uses sparse methods (keyword matching), but can't perform context-based search. Combining both methods will improve search results out-of-domain.

    HOW

    • This issue and implementation depend on issue https://github.com/semi-technologies/weaviate/issues/2133
    • Do both a dense and BM25 search using a query (in parallel)
    • You should be able to define a function to combine the results into 1 result list, using the scores of data in both candidate lists.
    • A default function can be defined in the Weaviate setup, but can be overwritten in the GraphQL query.
    • BM25 score is unbounded. Score normalization or scaling is not a good idea in general, because you lose information on how good the results are textually. However, when combining BM25 with dense search, some form of normalization may be handy, so we can choose to offer normalization methods anyway. We can explain strategies in the documentation (see the sketch after this list). Possible strategies are:
      • minmax - To leave the distribution intact, the best normalization method is a minmax approach, which takes the minimum and maximum BM25 scores into account. Taking the maximum BM25 score of a particular query as the maximum in the formula is not a good idea, because then that result is set to the maximum score regardless of its actual score. So a (theoretical) maximum needs to be defined, although BM25 is unbounded. This can be achieved by running a number of different queries and recording the maximum value. Since this is quite complex to implement as a feature, we can start by offering a setting with a default value, which the user can change at runtime. An example can be: min(x/10, 1) if the guessed maximum is 10.
      • Arctangent - An arctan scales values in a logarithmic manner. A function to scale scores into (-1, 1) (practically (0, 1), because x will be a positive score) is: 2/pi*arctan(x) (see https://www.mdpi.com/2227-7390/10/8/1335)
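
    A minimal sketch of the two normalization strategies and a weighted-sum fusion, assuming scores are plain floats keyed by object id (all names here are illustrative, not Weaviate internals):

    import math

    def minmax_norm(score: float, guessed_max: float = 10.0) -> float:
        # Cap at 1.0 because BM25 is unbounded and guessed_max is only a guess.
        return min(score / guessed_max, 1.0)

    def arctan_norm(score: float) -> float:
        # 2/pi * arctan(x) maps positive scores into (0, 1) logarithmically.
        return 2.0 / math.pi * math.atan(score)

    def weighted_sum(sparse: dict, dense: dict, w_sparse: float = 0.2, w_dense: float = 0.8) -> list:
        # Fuse two candidate lists; objects missing from one list score 0 there.
        ids = set(sparse) | set(dense)
        fused = {i: w_sparse * sparse.get(i, 0.0) + w_dense * dense.get(i, 0.0) for i in ids}
        return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

    # Example: BM25 scores normalized with minmax; dense certainties used as-is.
    bm25 = {"doc1": 7.3, "doc2": 2.1}
    dense = {"doc1": 0.82, "doc3": 0.71}
    print(weighted_sum({k: minmax_norm(v) for k, v in bm25.items()}, dense))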

    Design - Weaviate setup

    Requirements:

    • Dense retrieval and sparse retrieval independently (with the same or different query)
    • Combine the result of both methods using a scoring function

    Schema

    The settings should be configured in the schema, per class:

    {
      "class": "string",
      "vectorIndexType": "hnsw",                
      "vectorIndexConfig": {
        ...                                     
      },
      "vectorizer": "text2vec-transformers",
      "moduleConfig": {
        "text2vec-transformers": {  
          "vectorizeClassName": true            
        }
      },                       
      "sparseIndexType": "bm25",                
      "sparseIndexConfig": {                   
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        }
      },
      "properties": [                            
        {
          "name": "string",                     
          "description": "string",              
          "dataType": [                         
            "string"
          ],
          "sparseIndexConfig": {                
            "bm25": {
              "b": 0.75,
              "k1": 1.2, 
            }
          },
          "indexInverted": true                 
        }
      ]
    }
    

    Docker-compose

    In case we need to let Weaviate know on startup whether to enable sparse search, we can introduce an env var like ENABLE_SPARSE_INDEX:

    ---
    version: '3.4'
    services:
      weaviate:
        command:
        - --host
        - 0.0.0.0
        - --port
        - '8080'
        - --scheme
        - http
        image: semitechnologies/weaviate:1.14.1
        ports:
        - 8080:8080
        restart: on-failure:0
        environment:
          QUERY_DEFAULTS_LIMIT: 25
          AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
          PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
          DEFAULT_VECTORIZER_MODULE: text2vec-transformers
          ENABLE_MODULES: text2vec-transformers
          CLUSTER_HOSTNAME: 'node1'
          ENABLE_SPARSE_INDEX: 'true' # <== NEW. Which method (e.g. bm25) can be specified in the schema. Not sure if this variable is needed in the docker-compose actually.
      t2v-transformers:
        image: semitechnologies/transformers-inference:sentence-transformers-msmarco-distilroberta-base-v2
        environment:
          ENABLE_CUDA: 0 # set to 1 to enable
    ...
    

    Design - GraphQL Queries

    The current API is documented in this comment.

    Below is the more elaborate original proposal.

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              nearText: {               # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7   
              },
              weight: 0.8
            }],
            type: Sum                   # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score    # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
    

    Multiple sparse searches should also be supported, like:

    {
      Get {
        Paper (
          hybridSearch: {               # NEW
            operands: [{
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query",
                properties: ["abstract"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                }
              },
              weight: 0.2
            }, {
              sparseSearch: {           # inherits all fields from sparseSearch. alternative name: sparse
                function: bm25,
                query: "my query 2",
                properties: ["title"],
                normalization: {
                  minmax: {
                    max: 10
                  }
                },
                limit: 100
              },
              weight: 0.2
            }, {
              nearText: {              # inherits all fields from nearText. alternative name: dense or denseSearch
                concepts: ["my query"],
                certainty: 0.7,
                limit: 100   
              },
              weight: 0.6
            }],
            type: Sum                  # or Average, or RRF (with RRF, weights will be discarded)
          }
        ) {
          title
          abstract
          _additional {
            score        # NEW - because distance is not a good word, it really is a ranking score instead of distance between vectors
          }
        }
      }
    }
    
  • Add _snippets in GraphQL and REST API

    Add _snippets in GraphQL and REST API

    When an object is indexed on a larger text item (e.g., a paragraph like in the news article demo), certain search terms can be found in specific sentences.

    The idea is to add the start and end point of the most important part of the text corpus to the __meta end-point as a potentialAnswer, which can be enabled or disabled by setting a distanceToAnswer. This can work both for explore filters and for where filters.

    Idea

    I was searching for something on Wikipedia with the search term: Is herbalife a pyramid scheme? and got this response.

    Because Google isn't giving the actual answer but rather the location of the answer, we should be able to calculate something similar.

    [Screenshot of the Google search result, 2020-06-05]

    Explore example

    {
      Get{
        Things{
          Article(
            explore: {
              concepts: ["I want a spare rib"],
              certainty: 0.7,
              moveAwayFrom: {
                concepts: ["bacon"],
                force: 0.45
              }
            }
          ){
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result where the start and end give the starting and ending position and in which property the answer / most important part can be found.

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple 
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Where example

    {
      Get {
        Things {
          Article(where: {
                path: ["name"],
                operator: Like,
                valueString: "New *"
            }) {
            name
            __meta {
              potentialAnswer(
                distanceToAnswer: 0.5 # <== optional
              ) {
                start
                end
                property
                distanceToQuery
              }
            }
          }
        }
      }
    }
    

    Result where the start and end give the starting and ending position and in which property the answer / most important part can be found.

    {
        "data": {
            "Get": {
                "Things": {
                    "Article": [
                        {
                            "name": "Bacon ipsum dolor amet tri-tip hamburger leberkas short ribs chicken turkey sirloin tenderloin shoulder pig bresaola. Pastrami ham hock meatball rump ribeye cupim, capicola venison burgdoggen brisket meatloaf. Turducken t-bone landjaeger pork chop, bresaola pig prosciutto pastrami sausage pancetta capicola short ribs hamburger tail spare ribs. Jerky kevin doner cupim pork belly picanha, pancetta capicola pork loin alcatra corned beef shank. Bacon chislic landjaeger doner corned beef, hamburger beef ribs filet mignon turducken tri-tip andouille pastrami chuck pork loin capicola. Prosciutto shankle chislic, shoulder tri-tip turducken meatball ham pork loin fatback hamburger pork chop bacon pork belly. Kevin sausage salami spare ribs tenderloin t-bone meatball picanha flank jowl pork chop tail turducken tri-tip.",
                            "__meta": [ // <== array because multiple results are possible and/or multiple properties might be indexed
                                {
                                    "property": "name",
                                    "distanceToQuery": 0.0, // <== distance to the query
                                    "start": 26,// <== just a random example
                                    "end": 130  // <== just a random example
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "errors": null
    }
    

    Suggested (first) implementation

    1. Results are returned as in the current implementation.
    2. The vectorization of the query is used to find the closest matching word in a sentence. *
    3. When the closest word is found, the start and end points are set at the beginning and end of that sentence.
    4. The distanceToAnswer is the minimal distance; if it is not set, no start and end points will be available. If multiple sentences make the cut, they will all be part of the array (a rough sketch follows below).

    * There might be potential to also do this on groups of words or complete sentences.
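
    A rough Python sketch of steps 2-4, assuming we already have a vector for the query and one vector per word from the same model Weaviate uses (helper names are hypothetical, not Weaviate internals):

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def potential_answer(text, word_vectors, query_vector, distance_to_answer=0.5):
        # word_vectors: list of (word, start_offset, end_offset, vector) tuples.
        best = min(word_vectors, key=lambda w: 1.0 - cosine(w[3], query_vector))
        distance = 1.0 - cosine(best[3], query_vector)
        if distance > distance_to_answer:
            return None  # no start/end points if nothing is close enough
        # Expand to the enclosing sentence: previous and next sentence boundary.
        start = text.rfind(".", 0, best[1]) + 1
        end = text.find(".", best[2])
        end = len(text) if end == -1 else end + 1
        return {"start": start, "end": end, "distanceToQuery": distance}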

    Related

    #1136 #1139 #1155 #1156

  • Add geotype to datatypes

    Add geotype to datatypes

    Todos

    • [x] design decisions
      • [x] name of the field
        • current proposals: geoCoordinate, geoLocation, geoPoint
          • my personal favorite being geoCoordinate
          • cc @laura-ham @bobvanluijt
      • [x] design of the where filter
        • @laura-ham suggestions?
    • [x] spike out happy path in janusgraph only
      • [x] index creation
      • [x] adding property
      • [x] searching by property within range
    • [x] add new data type on import, goal: an import with geoCoordinates field succeeds
      • [x] allow in schema creation
      • [x] allow in class instance creation
      • [x] validation?
        • [x] add basic validation
        • [x] refactor validateSchemaInBody (it's way too long and extremely difficult to read/extend)
      • [x] janus graph create vertex
    • [x] include on simple read queries
      • [x] Local Get
      • [x] Network Get
    • [x] filter by property
      • [x] Local Filters
        • [x] extract filter from graphql
        • [x] set required validations, so that required fields cannot be omitted
        • [x] apply filter in connectors (Janusgraph)
      • ~~Network Filters~~
        • nothing to do here, they use the same code as local filters
    • [x] deal with property in GetMeta and Aggregate
      • proposal for now to simply not support those fields there
      • long-term?
    • [x] rename according to latest decisions
      • [x] pluralize name geoCoordinate -> geoCoordinates
      • [x] restructure where filter
        • [x] WithinRange -> WithinGeoRange
        • [x] valueRange -> valueGeoRange
        • [x] wrap distance and geoCoordinates in separate objects
    • [x] Update docs (@laura-ham volunteered to help out here)
    • [x] e2e/acceptance test

    Original content below; for the most up-to-date summary see the comments below.

    Abstract Examples

    • A location in a 2-dimensional space, i.e. x=3, y=5
    • Coordinates that point to a location on a world map

    Research Questions

    • Are geo types always two-dimensional, or can they have more dimensions?
      • Part 1: Do we have use cases for more than 2D?
      • Part 2: Is there technical support for more than 2D?
    • Optimal way to query: e.g. "within 500 m of coordinates (x, y)" mixes the concepts of metric distance and coordinates; what API do we want? (See the sketch at the end of this issue.)

    Features we'd need

    • Import/update geo coordinates
    • use geo in where filters (see research question)
    • do we want to allow aggregation functions like Aggregate or GetMeta
      • if so, what should they look like?
      • if so, is there technical support in our current stack?
      • if so, right from the start or later?

    cc @bobvanluijt @laura-ham
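
    For reference, a minimal sketch of the filter shape after the renames above (WithinGeoRange / valueGeoRange), via the Python client; the class and property names are made up:

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    where_filter = {
        "path": ["location"],  # a geoCoordinates property
        "operator": "WithinGeoRange",
        "valueGeoRange": {
            "geoCoordinates": {"latitude": 52.366667, "longitude": 4.9},
            "distance": {"max": 2000},  # meters
        },
    }

    result = client.query.get("City", ["name"]).with_where(where_filter).do()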

  • Dropping 'things' or 'actions' from the filter.

    Dropping 'things' or 'actions' from the filter.

    In the section on filters in the GraphQL documentation, the paths of the filters are prefixed with "things" and "actions".

    This is superfluous information. The class names cannot overlap in any case.

    Current query:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Things", "Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Things", "Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
    

    After removing the kind of class:

     Get(where:{
          operator: And,
          operands: [{
            path: ["Animal", "age"],
            operator: LessThan
            valueInt: 5
          }, {
            path: ["Animal", "inZoo", "Zoo", "name"],
            operator: Equal,
            valueString: "London Zoo"
          }]
        }) { ... }
    
  • batch_create does not report on 'store is read-only' errors

    batch_create does not report on 'store is read-only' errors

    When using the Python client with client.batch, after having crossed the DISK_USE_READONLY_PERCENTAGE threshold, my ingest fails when using data_object.create but seemingly succeeds when using batches. However, the ingest that "succeeded" using batch did not contain any items when running a Get query, even though they had been processed by my t2v-transformers container.

    There might be other errors that are not covered by batch.create; I only found out through a helpful comment in the Weaviate Slack after running into my issue. This also seems to have been the issue with https://github.com/semi-technologies/weaviate/issues/1929. A sketch of an error-checking callback follows below.
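
    A minimal sketch of surfacing such errors with the v3 Python client's result callback (class and property names are illustrative); without a check like this, per-object failures such as 'store is read-only' pass silently:

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    def check_batch_result(results):
        # Each item mirrors the REST batch response; errors hide in item["result"].
        for item in results or []:
            errors = (item.get("result") or {}).get("errors")
            if errors:
                print(f"object {item.get('id')}: {errors}")

    client.batch.configure(callback=check_batch_result)
    with client.batch as batch:
        batch.add_data_object({"title": "hello"}, "Article")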

  • (DOC) REST API authentication to a WCS cluster

    (DOC) REST API authentication to a WCS cluster

    Hi,

    I'm trying to connect to an authentication-enabled WCS cluster with the REST API (for WooCommerce owners who want to use WCS hosting rather than installing Weaviate).

    Could you point me to the documentation? I received a 404 link after the cluster was created: https://www.semi.technology/developers/weaviate/v1.8.0/configuration/authentication

    I noticed the Enterprise token, but I do not think it is what I need. I suspect https://www.semi.technology/developers/weaviate/current/configuration/authentication.html only covers self-hosted setups.

  • Docker-compose DB fails:

    Docker-compose DB fails: "Could not create SSTable component"

    Weaviate doesn't start with Docker Compose and stays in a failing loop:

    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:30:53,471 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:30:53,471 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-114-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:03,471 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:03,472 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:03,472 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-116-TOC.txt because it doesn't exist.
    db_1        | ERROR 2019-02-27 15:31:13,472 [shard 0] sstable - Could not create SSTable component /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt.tmp. Found exception: std::system_error (error system:2, No such file or directory)
    db_1        | ERROR 2019-02-27 15:31:13,510 [shard 0] database - failed to write sstable /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-Data.db: std::system_error (error system:2, No such file or directory)
    db_1        | WARN  2019-02-27 15:31:13,510 [shard 0] sstable - Unable to delete /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/system_schema-keyspaces-ka-118-TOC.txt because it doesn't exist.
    
  • Can Aggregate depend on Get?

    Can Aggregate depend on Get?

    I noticed that the Aggregate function does not depend on the Get query. For instance, in the following query, the facets are always the same, whatever the 'wpsolr_type' condition:

    {
      results: Get {
        WeaviateWooCommerce(
          limit: 10
          where: {path: ["wpsolr_type"], operator: Equal, valueString: "post"}
        ) {
          wpsolr_title
          wpsolr_product_cat_str
        }
      }
      facets: Aggregate {
        WeaviateWooCommerce(groupBy: ["wpsolr_product_cat_str"]) {
          meta {
            count
          }
          groupedBy {
            value
          }
        }
      }
    }
    
    

    I'd like to get facets related to the results instead.

    Is it possible?
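
    A possible workaround sketch, assuming Aggregate accepts the same where filter as Get (field names taken from the question above), sent as a raw GraphQL query from the Python client:

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    query = """
    {
      facets: Aggregate {
        WeaviateWooCommerce(
          groupBy: ["wpsolr_product_cat_str"]
          where: {path: ["wpsolr_type"], operator: Equal, valueString: "post"}
        ) {
          meta { count }
          groupedBy { value }
        }
      }
    }
    """
    print(client.query.raw(query))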

  • Suggestion: auto cut-off Explore search results

    Suggestion: auto cut-off Explore search results

    Problem & current behaviour

    If the Explore filter in a GraphQL Get query is used, it is unclear how to set and control the certainty (or distance) parameter. This parameter controls which results to return, but with the current design the user does not know what to set it to in order to get optimal results.

    You don't want to see 'bad' results among the results, but you don't know beforehand where the certainty cut-off point lies. Additionally, we observed that users prefer not to see any results if there are no good results at all.

    Proposed solution

    Automatically find a cut-off threshold for which results to show. This threshold can be calculated by e.g. an elbow in distances between each result and the query, or between a cluster of results and the query.

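    A minimal sketch of a largest-gap ("elbow") cut-off over a sorted result list, assuming each result carries a distance to the query (names and the min_gap threshold are illustrative):

    def elbow_cutoff(distances, min_gap=0.05):
        # distances: ascending distances of the returned results to the query.
        if len(distances) < 2:
            return len(distances)
        gaps = [b - a for a, b in zip(distances, distances[1:])]
        i = max(range(len(gaps)), key=gaps.__getitem__)
        # Only cut if the biggest gap is pronounced enough; otherwise keep all.
        return i + 1 if gaps[i] >= min_gap else len(distances)

    results = [0.08, 0.09, 0.11, 0.31, 0.33]  # clear elbow after the third result
    print(elbow_cutoff(results))  # -> 3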

    Questions

    1. Should the user have the option to set whether they want to enable this auto function on their query?
    2. What value to set the (relative) cut-off point to? (=how big should the gap between the points or clusters relatively be?)
  • SUGGESTION: Dump vectors

    SUGGESTION: Dump vectors

    Some data scientists might want to leverage the vectorization mechanism in Weaviate to train new models. The result would be similar to /things and /things/{UUID} but with a focus on a matrix to download all objects.

    RESTful URL suggestion: /c11y/vectors and /c11y/vectors/{UUID}.

    c11y/vectors/{UUID}

    returns:

    {
        "type": "thing",
        "vector": [
            0,
            0,
            0,
            0,
            //etc
        ]
    }
    

    c11y/vectors?page=0

    returns:

    {
        "result": {
            "UUID_1": {
                "type": "thing",
                "vector": [ //Array is same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            },
            "UUID_1": {
                "type": "thing",
                "vectors": [ //Array is same object as c11y/vectors/{UUID}
                    0,
                    0,
                    0,
                    0,
                    //etc
                ]
            }
        },
        "pages": 100 // total pages
    }
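
    A client-side sketch of consuming the proposed endpoint; note that /c11y/vectors and the "pages" field are hypothetical, this just mirrors the response shape suggested above:

    import requests

    BASE = "http://localhost:8080/v1"

    def dump_all_vectors():
        vectors, page, pages = {}, 0, 1
        while page < pages:
            resp = requests.get(f"{BASE}/c11y/vectors", params={"page": page}).json()
            pages = resp["pages"]  # total number of pages, per the proposal
            for uuid, obj in resp["result"].items():
                vectors[uuid] = obj["vector"]
            page += 1
        return vectors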
    
  • Push docker container to github registry

    Push docker container to github registry

    What's being changed:

    This should enable docker image downloads from other repos without docker logins

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
  • Roaring allow list rebased

    Roaring allow list rebased

    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.
  • Support hybrid search in Aggregate

    Support hybrid search in Aggregate

    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.

    Closes #2482

  • creationTimeUnix is being updated during upsert

    creationTimeUnix is being updated during upsert

    I've noticed that creationTimeUnix is being updated (and equals lastUpdateTimeUnix) when I upsert a record (batch write with the same id). I had hoped that I could determine whether the record had been upserted by comparing those two in the batch result.

    I've tested this with two documents, one was already there, the other one was being inserted: this was the result:

    [
        {
            # This one should have been updated
            'class': 'P_8062f70f_d2c7_4b9b_b0b7_1c587ed49c2d',
            'creationTimeUnix': 1672845088023,
            'id': 'cebcaf20-2a4b-5a42-93b9-d662c02928bc',
            'lastUpdateTimeUnix': 1672845088023,
            'properties': {'_creationTimeUnix': 1672845088023, '_lastUpdateTimeUnix': 1672845088023, ...},
            'vector': [...],
            'deprecations': None,
            'result': {}
        },
        {
            # This one should have been inserted
            'class': 'P_8062f70f_d2c7_4b9b_b0b7_1c587ed49c2d',
            'creationTimeUnix': 1672845088023,
            'id': 'bb323fed-3026-500b-8d61-e686b75aa19e',
            'lastUpdateTimeUnix': 1672845088023,
            'properties': {'_creationTimeUnix': 1672845088023, '_lastUpdateTimeUnix': 1672845088023, ...},
            'vector': [...],
            'deprecations': None,
            'result': {}
        },
    ]
    

    According to @byronvoorbach it should work as expected: the upserted document should have only its lastUpdateTimeUnix updated. Currently, there seems to be no direct way to determine how many documents were updated, inserted, upserted, and so on...
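
    For illustration, a sketch of the check the reporter expected to work: an upserted (updated) object would keep its original creationTimeUnix, so the two timestamps would differ (timestamps below are made up):

    def was_updated(obj: dict) -> bool:
        # True for an update (timestamps differ), False for a fresh insert.
        return obj["creationTimeUnix"] != obj["lastUpdateTimeUnix"]

    batch_result = [
        {"id": "cebcaf20-...", "creationTimeUnix": 1672840000000, "lastUpdateTimeUnix": 1672845088023},
        {"id": "bb323fed-...", "creationTimeUnix": 1672845088023, "lastUpdateTimeUnix": 1672845088023},
    ]
    for obj in batch_result:
        print(obj["id"], "updated" if was_updated(obj) else "inserted")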

  • Improve decoding values in LSM SetDecoder by not maintaining the values' original order

    Improve decoding values in LSM SetDecoder by not maintaining the values' original order

    Access to the map has been reduced to a minimum

    What's being changed:

    Review checklist

    • [ ] Documentation has been updated, if necessary. Link to changed documentation:
    • [ ] Chaos pipeline run or not necessary. Link to pipeline:
    • [ ] All new code is covered by tests where it is reasonable.
    • [ ] Performance tests have been run or not necessary.