SeaweedFS is a distributed storage system for blobs, objects, files, and data lake, to store and serve billions of files fast! Blob store has O(1) disk seek, local tiering, cloud tiering. Filer supports cross-cluster active-active replication, Kubernetes, POSIX, S3 API, encryption, Erasure Coding for warm storage, FUSE mount, Hadoop, WebDAV.

SeaweedFS

Sponsor SeaweedFS via Patreon

SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible entirely thanks to the support of these awesome backers. If you'd like to grow SeaweedFS even stronger, please consider joining our sponsors on Patreon.

Your support will be really appreciated by me and other supporters!

Gold Sponsors

shuguang


Quick Start with single binary

  • Download the latest binary from https://github.com/chrislusf/seaweedfs/releases and unzip a single binary file weed or weed.exe
  • Run weed server -dir=/some/data/dir -s3 to start one master, one volume server, one filer, and one S3 gateway.

Also, to increase capacity, just add more volume servers by running weed volume -dir="/some/data/dir2" -mserver="<master_host>:9333" -port=8081 locally, or on a different machine, or on thousands of machines. That is it!

Quick Start for S3 API on Docker

docker run -p 8333:8333 chrislusf/seaweedfs server -s3

Introduction

SeaweedFS is a simple and highly scalable distributed file system. There are two objectives:

  1. to store billions of files!
  2. to serve the files fast!

SeaweedFS started as an Object Store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages volumes on volume servers, and these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation).

There are only 40 bytes of disk storage overhead for each file's metadata. It is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases.

SeaweedFS started by implementing Facebook's Haystack design paper. Also, SeaweedFS implements erasure coding with ideas from f4: Facebook’s Warm BLOB Storage System, and has a lot of similarities with Facebook’s Tectonic Filesystem.

On top of the object store, an optional Filer can support directories and POSIX attributes. Filer is a separate linearly-scalable stateless server with customizable metadata stores, e.g., MySQL, Postgres, Redis, Cassandra, HBase, MongoDB, Elasticsearch, LevelDB, RocksDB, SQLite, MemSQL, TiDB, etcd, CockroachDB, etc.

For any distributed key-value store, the large values can be offloaded to SeaweedFS. With the fast access speed and linearly scalable capacity, SeaweedFS can work as a distributed Key-Large-Value store.

SeaweedFS can transparently integrate with the cloud. With hot data on local cluster, and warm data on the cloud with O(1) access time, SeaweedFS can achieve both fast local access time and elastic cloud storage capacity. What's more, the cloud storage access API cost is minimized. Faster and Cheaper than direct cloud storage!

Additional Features

  • Can choose no replication or different replication levels, rack and data center aware.
  • Automatic master servers failover - no single point of failure (SPOF).
  • Automatic Gzip compression depending on file mime type.
  • Automatic compaction to reclaim disk space after deletion or update.
  • Automatic entry TTL expiration.
  • Any server with some disk space can add to the total storage space.
  • Adding/Removing servers does not cause any data re-balancing unless triggered by admin commands.
  • Optional picture resizing.
  • Support ETag, Accept-Ranges, Last-Modified, etc.
  • Support in-memory/leveldb/readonly mode tuning for memory/performance balance.
  • Support rebalancing the writable and readonly volumes.
  • Customizable Multiple Storage Tiers: Customizable storage disk types to balance performance and cost.
  • Transparent cloud integration: unlimited capacity via tiered cloud storage for warm data.
  • Erasure Coding for warm storage: Rack-Aware 10.4 erasure coding reduces storage cost and increases availability.

Filer Features

Kubernetes

Example: Using Seaweed Object Store

By default, the master node runs on port 9333, and the volume nodes run on port 8080. Let's start one master node, and two volume nodes on ports 8080 and 8081. Ideally, they should be started from different machines. We'll use localhost as an example.

SeaweedFS uses HTTP REST operations to read, write, and delete. The responses are in JSON or JSONP format.

Start Master Server

> ./weed master

Start Volume Servers

> weed volume -dir="/tmp/data1" -max=5  -mserver="localhost:9333" -port=8080 &
> weed volume -dir="/tmp/data2" -max=10 -mserver="localhost:9333" -port=8081 &

Write File

To upload a file: first, send an HTTP POST, PUT, or GET request to /dir/assign to get an fid and a volume server url:

> curl http://localhost:9333/dir/assign
{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}

Second, to store the file content, send an HTTP multi-part POST request to url + '/' + fid from the response:

> curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"name":"myphoto.jpg","size":43234,"eTag":"1cc0118e"}

To update, send another POST request with updated file content.

For deletion, send an HTTP DELETE request to the same url + '/' + fid URL:

> curl -X DELETE http://127.0.0.1:8080/3,01637037d6
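
The step from the assign response to the upload target can be sketched in Python. This is a minimal illustration, assuming the assign response has the JSON shape shown above; upload_url_from_assign is a hypothetical helper name, not part of SeaweedFS. It only builds the URL; the multi-part POST itself is left to your HTTP client.

```python
import json


def upload_url_from_assign(assign_response: str) -> str:
    """Build the volume server URL that the file content is POSTed to.

    assign_response is the JSON text returned by /dir/assign.
    """
    reply = json.loads(assign_response)
    # url + '/' + fid, as described in the text above
    return "http://{}/{}".format(reply["url"], reply["fid"])


# Sample response copied from the example above.
sample_assign = '{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}'
print(upload_url_from_assign(sample_assign))
```

The same URL is the target for updates (another POST) and for deletion (DELETE).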

Save File Id

Now, you can save the fid, 3,01637037d6 in this case, to a database field.

The number 3 at the start represents a volume id. After the comma, it's one file key, 01, and a file cookie, 637037d6.

The volume id is an unsigned 32-bit integer. The file key is an unsigned 64-bit integer. The file cookie is an unsigned 32-bit integer, used to prevent URL guessing.

The file key and file cookie are both coded in hex. You can store the <volume id, file key, file cookie> tuple in your own format, or simply store the fid as a string.

If stored as a string, in theory, you would need 8+1+16+8=33 bytes. A char(33) would be enough, if not more than enough, since most uses will not need 2^32 volumes.

If space is really a concern, you can store the file id in your own format. You would need one 4-byte integer for volume id, 8-byte long number for file key, and a 4-byte integer for the file cookie. So 16 bytes are more than enough.
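
As a rough sketch of the compact format, the fid string can be split into its three parts and packed into 16 bytes. The helper names parse_fid and pack_fid are hypothetical, the volume id is assumed to be written in decimal, and the big-endian byte order is an arbitrary choice for this example.

```python
import struct


def parse_fid(fid: str):
    """Split an fid such as "3,01637037d6" into (volume id, file key, file cookie)."""
    volume_part, key_cookie = fid.split(",")
    volume_id = int(volume_part)          # assumption: volume id is decimal
    cookie = int(key_cookie[-8:], 16)     # last 8 hex digits: 32-bit cookie
    file_key = int(key_cookie[:-8], 16)   # remaining hex digits: 64-bit key
    return volume_id, file_key, cookie


def pack_fid(fid: str) -> bytes:
    """Pack the tuple into 4 + 8 + 4 = 16 bytes."""
    return struct.pack(">IQI", *parse_fid(fid))


print(parse_fid("3,01637037d6"))
print(len(pack_fid("3,01637037d6")))
```

For the fid above this yields volume id 3, file key 1, and file cookie 0x637037d6.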

Read File

Here is an example of how to render the URL.

First look up the volume server's URLs by the file's volumeId:

> curl http://localhost:9333/dir/lookup?volumeId=3
{"volumeId":"3","locations":[{"publicUrl":"localhost:8080","url":"localhost:8080"}]}

Since (usually) there are not too many volume servers, and volumes don't move often, you can cache the results most of the time. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read.

Now you can take the public url, render the url or directly read from the volume server via url:

 http://localhost:8080/3,01637037d6.jpg

Notice we add a file extension ".jpg" here. It's optional and just one way for the client to specify the file content type.

If you want a nicer URL, you can use one of these alternative URL formats:

 http://localhost:8080/3/01637037d6/my_preferred_name.jpg
 http://localhost:8080/3/01637037d6.jpg
 http://localhost:8080/3,01637037d6.jpg
 http://localhost:8080/3/01637037d6
 http://localhost:8080/3,01637037d6

If you want to get a scaled version of an image, you can add some params:

http://localhost:8080/3/01637037d6.jpg?height=200&width=200
http://localhost:8080/3/01637037d6.jpg?height=200&width=200&mode=fit
http://localhost:8080/3/01637037d6.jpg?height=200&width=200&mode=fill
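
The read path can be sketched the same way. This is an illustration only, assuming the lookup response has the JSON shape shown above; read_url is a hypothetical helper, and it uses the volume-id/key-cookie URL form with an optional extension and optional resize parameters.

```python
import json
import random
from urllib.parse import urlencode


def read_url(lookup_response: str, fid: str, ext: str = "", **resize) -> str:
    """Pick one replica location at random and render a read URL for fid."""
    locations = json.loads(lookup_response)["locations"]
    location = random.choice(locations)   # any replica can serve the read
    volume_id, key_cookie = fid.split(",")
    url = "http://{}/{}/{}{}".format(location["publicUrl"], volume_id, key_cookie, ext)
    if resize:
        # e.g. height=200, width=200, mode="fit"
        url += "?" + urlencode(resize)
    return url


# Sample response copied from the lookup example above.
sample_lookup = '{"volumeId":"3","locations":[{"publicUrl":"localhost:8080","url":"localhost:8080"}]}'
print(read_url(sample_lookup, "3,01637037d6", ".jpg", height=200, width=200))
```

In a real client the lookup result would normally be cached per volume id, since volumes rarely move.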

Rack-Aware and Data Center-Aware Replication

SeaweedFS applies the replication strategy at a volume level. So, when you are getting a file id, you can specify the replication strategy. For example:

curl http://localhost:9333/dir/assign?replication=001

The replication parameter options are:

000: no replication
001: replicate once on the same rack
010: replicate once on a different rack, but same data center
100: replicate once on a different data center
200: replicate twice on two different data centers
110: replicate once on a different rack, and once on a different data center

More details about replication can be found on the wiki.

You can also set the default replication strategy when starting the master server.
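
One way to read the three-digit codes listed above is as counts of extra copies at each level: other data centers, other racks in the same data center, and other servers on the same rack. The sketch below follows that reading; describe_replication is a hypothetical helper, and the interpretation is inferred from the option list rather than taken from the SeaweedFS source.

```python
def describe_replication(code: str) -> dict:
    """Decode a replication code such as "110" into per-level copy counts."""
    if len(code) != 3 or not code.isdigit():
        raise ValueError("expected three digits, e.g. '001'")
    other_dc, other_rack, same_rack = (int(c) for c in code)
    return {
        "other_data_centers": other_dc,
        "other_racks": other_rack,
        "same_rack": same_rack,
        # the original copy plus every extra replica
        "total_copies": 1 + other_dc + other_rack + same_rack,
    }


print(describe_replication("110"))
```

Under this reading, 000 keeps a single copy and 110 keeps three.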

Allocate File Key on Specific Data Center

Volume servers can be started with a specific data center name:

 weed volume -dir=/tmp/1 -port=8080 -dataCenter=dc1
 weed volume -dir=/tmp/2 -port=8081 -dataCenter=dc2

When requesting a file key, an optional "dataCenter" parameter can limit the assigned volume to the specific data center. For example, this specifies that the assigned volume should be limited to 'dc1':

 http://localhost:9333/dir/assign?dataCenter=dc1
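
A small sketch of building the assign URL with the optional parameters mentioned so far. The helper name assign_url is hypothetical; only the replication and dataCenter parameters shown in this document are used.

```python
from urllib.parse import urlencode


def assign_url(master: str, replication: str = "", data_center: str = "") -> str:
    """Build a /dir/assign URL for the given master host:port."""
    params = {}
    if replication:
        params["replication"] = replication
    if data_center:
        params["dataCenter"] = data_center
    query = "?" + urlencode(params) if params else ""
    return "http://{}/dir/assign{}".format(master, query)


print(assign_url("localhost:9333", data_center="dc1"))
```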

Other Features

Architecture

Distributed file systems usually split each file into chunks. A central master keeps a mapping from filenames and chunk indices to chunk handles, and also tracks which chunks each chunk server has.

The main drawback is that the central master can't handle many small files efficiently. And since all read requests need to go through the chunk master, it might not scale well for many concurrent users.

Instead of managing chunks, SeaweedFS manages data volumes in the master server. Each data volume is 32GB in size, and can hold a lot of files. And each storage node can have many data volumes. So the master node only needs to store the metadata about the volumes, which is a fairly small amount of data and is generally stable.

The actual file metadata is stored in each volume on volume servers. Since each volume server only manages metadata of files on its own disk, with only 16 bytes for each file, all file access can read file metadata just from memory and only needs one disk operation to actually read file data.

For comparison, consider that an xfs inode structure in Linux is 536 bytes.

Master Server and Volume Server

The architecture is fairly simple. The actual data is stored in volumes on storage nodes. One volume server can have multiple volumes, and supports both read and write access with basic authentication.

All volumes are managed by a master server. The master server contains the volume id to volume server mapping. This is fairly static information, and can be easily cached.

On each write request, the master server also generates a file key, which is a growing 64-bit unsigned integer. Since write requests are not generally as frequent as read requests, one master server should be able to handle the concurrency well.

Write and Read files

When a client sends a write request, the master server returns (volume id, file key, file cookie, volume node url) for the file. The client then contacts the volume node and POSTs the file content.

When a client needs to read a file based on (volume id, file key, file cookie), it asks the master server by the volume id for the (volume node url, volume node public url), or retrieves this from a cache. Then the client can GET the content, or just render the URL on web pages and let browsers fetch the content.

Please see the example for details on the write-read process.

Storage Size

In the current implementation, each volume can hold 32 gibibytes (32GiB or 8x2^32 bytes). This is because we align content to 8 bytes. We can easily increase this to 64GiB, or 128GiB, or more, by changing 2 lines of code, at the cost of some wasted padding space due to alignment.

There can be up to 2^32 (about 4 billion) volumes. So the total system size is 8 x 2^32 bytes x 2^32 volumes, which is 128 exbibytes (128EiB or 2^67 bytes).

Each individual file size is limited to the volume size.
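
The arithmetic above can be checked with a few lines of Python. The constant names are made up for this sketch; the values (8-byte alignment, 32-bit offsets, 32-bit volume ids) come from the text.

```python
ALIGNMENT = 8          # bytes; content is aligned to 8 bytes
OFFSET_BITS = 32       # offsets are stored in units of ALIGNMENT
VOLUME_ID_BITS = 32    # volume id is an unsigned 32-bit integer

max_volume_bytes = ALIGNMENT * 2**OFFSET_BITS
max_volumes = 2**VOLUME_ID_BITS
max_total_bytes = max_volume_bytes * max_volumes


def padding(size: int) -> int:
    """Unused bytes needed to round size up to the next multiple of ALIGNMENT."""
    return (-size) % ALIGNMENT


print(max_volume_bytes // 2**30, "GiB per volume")
print(max_total_bytes // 2**60, "EiB in total")
print(padding(43234), "padding bytes for a 43234-byte file")
```

Doubling ALIGNMENT doubles the volume size limit, at the cost of more padding per file.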

Saving memory

All file meta information stored on a volume server is readable from memory without disk access. Each file takes just a 16-byte map entry of <64bit key, 32bit offset, 32bit size>. Of course, each map entry has its own space cost for the map. But usually the disk space runs out before the memory does.
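
To make the 16-byte figure concrete, here is a sketch that lays out one entry with Python's struct module and estimates the raw index size for a given file count. It illustrates the entry size only and is not the actual in-memory layout used by SeaweedFS; the little-endian byte order and the names ENTRY and index_bytes are assumptions of this example.

```python
import struct

# 64-bit key, 32-bit offset, 32-bit size, with no padding between fields
ENTRY = struct.Struct("<QII")


def index_bytes(file_count: int) -> int:
    """Raw bytes of map entries for file_count files, excluding map overhead."""
    return file_count * ENTRY.size


print(ENTRY.size)
print(index_bytes(1_000_000))
```

By this estimate, one million files need about 16MB of raw entries before the map's own overhead.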

Tiered Storage to the cloud

The local volume servers are much faster, while cloud storages have elastic capacity and are actually more cost-efficient if not accessed often (usually free to upload, but relatively costly to access). With the append-only structure and O(1) access time, SeaweedFS can take advantage of both local and cloud storage by offloading the warm data to the cloud.

Usually hot data are fresh and warm data are old. SeaweedFS puts the newly created volumes on local servers, and optionally uploads the older volumes to the cloud. If the older data are accessed less often, this effectively gives you unlimited capacity with limited local servers, while staying fast for new data.

With the O(1) access time, the network latency cost is kept to a minimum.

If the hot/warm data is split as 20/80, with 20 servers, you can achieve storage capacity of 100 servers. That's a cost saving of 80%! Or you can repurpose the 80 servers to store new data also, and get 5X storage throughput.
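
The 20/80 example works out as follows. This is plain arithmetic under the stated assumption that local servers hold only the hot share of the data; the function name is hypothetical.

```python
def equivalent_servers(local_servers: int, hot_percent: int) -> int:
    """Servers needed to hold all data locally, if local ones hold only the hot share."""
    return local_servers * 100 // hot_percent


total = equivalent_servers(20, 20)
saved_percent = 100 - 20 * 100 // total
print(total, saved_percent)
```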

Compared to Other File Systems

Most other distributed file systems seem more complicated than necessary.

SeaweedFS is meant to be fast and simple, in both setup and operation. If you do not understand how it works when you reach here, we've failed! Please raise an issue with any questions or update this file with clarifications.

SeaweedFS is constantly moving forward. Same with other systems. These comparisons can be outdated quickly. Please help to keep them updated.

Compared to HDFS

HDFS uses the chunk approach for each file, and is ideal for storing large files.

SeaweedFS is ideal for serving relatively smaller files quickly and concurrently.

SeaweedFS can also store extra large files by splitting them into manageable data chunks, and storing the file ids of the data chunks in a meta chunk. This is managed by the "weed upload/download" tool, and the weed master or volume servers are agnostic about it.

Compared to GlusterFS, Ceph

The architectures are mostly the same. SeaweedFS aims to store and read files fast, with a simple and flat architecture. The main differences are:

  • SeaweedFS optimizes for small files, ensuring O(1) disk seek operation, and can also handle large files.
  • SeaweedFS statically assigns a volume id for a file. Locating file content becomes just a lookup of the volume id, which can be easily cached.
  • SeaweedFS Filer metadata store can be any well-known and proven data store, e.g., Redis, Cassandra, HBase, MongoDB, Elasticsearch, MySQL, Postgres, SQLite, MemSQL, TiDB, CockroachDB, etcd, etc., and is easy to customize.
  • SeaweedFS Volume server also communicates directly with clients via HTTP, supporting range queries, direct uploads, etc.

| System | File Metadata | File Content Read | POSIX | REST API | Optimized for large number of small files |
|---|---|---|---|---|---|
| SeaweedFS | lookup volume id, cacheable | O(1) disk seek | | Yes | Yes |
| SeaweedFS Filer | Linearly Scalable, Customizable | O(1) disk seek | FUSE | Yes | Yes |
| GlusterFS | hashing | | FUSE, NFS | | |
| Ceph | hashing + rules | | FUSE | Yes | |
| MooseFS | in memory | | FUSE | | No |
| MinIO | separate meta file for each file | | | Yes | No |

Compared to GlusterFS

GlusterFS stores files, both directories and content, in configurable volumes called "bricks".

GlusterFS hashes the path and filename into ids, which are assigned to virtual volumes and then mapped to "bricks".

Compared to MooseFS

MooseFS chooses to neglect the small file issue. From the MooseFS 3.0 manual, "even a small file will occupy 64KiB plus additionally 4KiB of checksums and 1KiB for the header", because it "was initially designed for keeping large amounts (like several thousands) of very big files".

MooseFS Master Server keeps all meta data in memory. Same issue as HDFS namenode.

Compared to Ceph

Ceph can be set up similarly to SeaweedFS as a key->blob store. It is much more complicated, with the need to support layers on top of it. Here is a more detailed comparison.

SeaweedFS has a centralized master group to look up free volumes, while Ceph uses hashing and metadata servers to locate its objects. Having a centralized master makes it easy to code and manage.

Like SeaweedFS, Ceph is built on top of an object store, RADOS. Ceph is rather complicated, with mixed reviews.

Ceph uses CRUSH hashing to automatically manage the data placement, which is efficient to locate the data. But the data has to be placed according to the CRUSH algorithm. Any wrong configuration would cause data loss. Topology changes, such as adding new servers to increase capacity, will cause data migration with high IO cost to fit the CRUSH algorithm. SeaweedFS places data by assigning them to any writable volumes. If writes to one volume fail, just pick another volume to write. Adding more volumes is also as simple as it can be.

SeaweedFS is optimized for small files. Small files are stored as one continuous block of content, with at most 8 unused bytes between files. Small file access is O(1) disk read.

SeaweedFS Filer uses off-the-shelf stores, such as MySQL, Postgres, SQLite, MongoDB, Redis, Elasticsearch, Cassandra, HBase, MemSQL, TiDB, CockroachDB, and etcd, to manage file directories. These stores are proven, scalable, and easier to manage.

| SeaweedFS | comparable to Ceph | advantage |
|---|---|---|
| Master | MDS | simpler |
| Volume | OSD | optimized for small files |
| Filer | Ceph FS | linearly scalable, Customizable, O(1) or O(logN) |

Compared to MinIO

MinIO follows AWS S3 closely and is ideal for testing the S3 API. It has a good UI, policies, versioning, etc. SeaweedFS is trying to catch up here. It is also possible to put MinIO as a gateway in front of SeaweedFS later.

MinIO metadata are stored in simple files. Each file write incurs extra writes to the corresponding meta file.

MinIO does not have optimization for lots of small files (LOSF). The files are simply stored as is on local disks. The extra meta file and the shards for erasure coding only amplify the LOSF problem.

MinIO needs multiple disk I/O operations to read one file. SeaweedFS has O(1) disk reads, even for erasure coded files.

MinIO has full-time erasure coding. SeaweedFS uses replication on hot data for faster speed and optionally applies erasure coding on warm data.

MinIO does not have POSIX-like API support.

MinIO has specific requirements on storage layout. It is not flexible to adjust capacity. In SeaweedFS, just start one volume server pointing to the master. That's all.

Dev Plan

  • More tools and documentation, on how to manage and scale the system.
  • Read and write stream data.
  • Support structured data.

This is a super exciting project! And we need helpers and support!

Installation Guide

Installation guide for users who are not familiar with golang

Step 1: install go on your machine and setup the environment by following the instructions at:

https://golang.org/doc/install

Make sure you set up your $GOPATH.

Step 2: checkout this repo:

git clone https://github.com/chrislusf/seaweedfs.git

Step 3: download, compile, and install the project by executing the following command:

make install

Once this is done, you will find the executable "weed" in your $GOPATH/bin directory.

Disk Related Topics

Hard Drive Performance

When testing read performance on SeaweedFS, it basically becomes a performance test of your hard drive's random read speed. Hard drives usually get 100MB/s~200MB/s.

Solid State Disk

To modify or delete small files, SSD must delete a whole block at a time, and move content in existing blocks to a new block. SSD is fast when brand new, but will get fragmented over time and you have to garbage collect, compacting blocks. SeaweedFS is friendly to SSD since it is append-only. Deletion and compaction are done on volume level in the background, not slowing reading and not causing fragmentation.

Benchmark

My own unscientific single-machine results on a MacBook with a solid state disk, CPU: 1 Intel Core i7 2.6GHz.

Write 1 million 1KB files:

Concurrency Level:      16
Time taken for tests:   66.753 seconds
Complete requests:      1048576
Failed requests:        0
Total transferred:      1106789009 bytes
Requests per second:    15708.23 [#/sec]
Transfer rate:          16191.69 [Kbytes/sec]

Connection Times (ms)
              min      avg        max      std
Total:        0.3      1.0       84.3      0.9

Percentage of the requests served within a certain time (ms)
   50%      0.8 ms
   66%      1.0 ms
   75%      1.1 ms
   80%      1.2 ms
   90%      1.4 ms
   95%      1.7 ms
   98%      2.1 ms
   99%      2.6 ms
  100%     84.3 ms

Randomly read 1 million files:

Concurrency Level:      16
Time taken for tests:   22.301 seconds
Complete requests:      1048576
Failed requests:        0
Total transferred:      1106812873 bytes
Requests per second:    47019.38 [#/sec]
Transfer rate:          48467.57 [Kbytes/sec]

Connection Times (ms)
              min      avg        max      std
Total:        0.0      0.3       54.1      0.2

Percentage of the requests served within a certain time (ms)
   50%      0.3 ms
   90%      0.4 ms
   98%      0.6 ms
   99%      0.7 ms
  100%     54.1 ms

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The text of this page is available for modification and reuse under the terms of the Creative Commons Attribution-Sharealike 3.0 Unported License and the GNU Free Documentation License (unversioned, with no invariant sections, front-cover texts, or back-cover texts).

Stargazers over time

Owner
Chris Lu
https://github.com/chrislusf/seaweedfs SeaweedFS the distributed file system and object store for billions of small files ...
Comments
  • Lots of

    There are lots of "volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305" errors in the log in our PROD environment.

    Any tips for finding the problem?

    When users access some of the files (10TB in total), they get a 404 error. Checking the logs shows:

    I0223 06:37:10 12031 topology_vacuum.go:90] check vacuum on collection:web volume:769
    I0223 06:37:10 12031 topology_vacuum.go:90] check vacuum on collection:web volume:770
    I0223 06:37:10 12031 topology_vacuum.go:90] check vacuum on collection:web volume:771
    I0223 06:37:10 12031 topology_vacuum.go:90] check vacuum on collection:web volume:765
    I0223 06:37:10 12031 topology_vacuum.go:90] check vacuum on collection:web volume:766
    I0223 06:37:38 12067 volume_server_handlers_read.go:69] read error: Not Found /661,33289a3ecd5edd_1
    I0223 06:37:38 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 192.1
    68.254.8:37448 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:37:38 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 127.0
    .0.1:43628 agent Dalvik/2.1.0 (Linux; U; Android 6.0.1; OPPO A57 Build/MMB29M)
    I0223 06:38:26 12067 volume_server_handlers_read.go:69] read error: Not Found /767,2b718ceb08e455_1
    I0223 06:38:26 12067 volume_server_handlers_read.go:75] request /767,2b718ceb08e455 with unmaching cookie seen: 3943228501 expected: 3983295080 from 192.1
    68.254.8:37652 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:38:26 12067 volume_server_handlers_read.go:75] request /767,2b718ceb08e455 with unmaching cookie seen: 3943228501 expected: 3983295080 from 127.0
    .0.1:43766 agent Mozilla/5.0 (Linux; Android 6.0.1; vivo Y53 Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQ
    Browser/6.2 TBS/043906 Mobile Safari/537.36 MicroMessenger/6.6.3.1240(0x26060339) NetType/4G Language/zh_CN
    I0223 06:39:02 12067 volume_server_handlers_read.go:69] read error: Not Found /661,33289a3ecd5edd_1
    I0223 06:39:02 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 192.1
    68.254.8:37816 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:39:02 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 127.0
    .0.1:43874 agent Dalvik/2.1.0 (Linux; U; Android 6.0.1; OPPO A57 Build/MMB29M)
    I0223 06:40:00 12067 volume_server_handlers_read.go:75] request /765/36b2ca065ca535.png with unmaching cookie seen: 106734901 expected: 3831334744 from 12
    7.0.0.1:44042 agent Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Version/11.0 Mobile/15D100 Safari/604
    .1
    I0223 06:41:07 12067 volume_server_handlers_read.go:75] request /765/2b717f6f406291.png with unmaching cookie seen: 1866490513 expected: 2413760694 from 1
    27.0.0.1:44234 agent Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_5 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Mobile/15D60 MicroMessenger/6.6.3 Ne
    tType/WIFI Language/zh_CN
    I0223 06:41:37 12067 volume_server_handlers_read.go:69] read error: Not Found /661,33289a3ecd5edd_1
    I0223 06:41:37 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 192.1
    68.254.8:38434 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:41:37 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 127.0
    .0.1:44318 agent CH999/2.9.9 CFNetwork/889.9 Darwin/17.2.0
    I0223 06:42:11 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:44422 agent Mozilla/5.0 (Linux; Android 7.0; SM-G9508 Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQ
    QBrowser/6.2 TBS/043906 Mobile Safari/537.36 MicroMessenger/6.6.3.1260(0x26060339) NetType/WIFI Language/zh_CN
    I0223 06:42:11 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:44422 agent Mozilla/5.0 (Linux; Android 7.0; SM-G9508 Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQ
    QBrowser/6.2 TBS/043906 Mobile Safari/537.36 MicroMessenger/6.6.3.1260(0x26060339) NetType/WIFI Language/zh_CN
    I0223 06:42:31 12067 volume_server_handlers_read.go:75] request /765/36b2ca065ca535.png with unmaching cookie seen: 106734901 expected: 3831334744 from 12
    7.0.0.1:44480 agent Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Version/11.0 Mobile/15D100 Safari/604
    .1
    I0223 06:42:34 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:44480 agent Mozilla/5.0 (Linux; Android 7.0; SM-G9508 Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQ
    QBrowser/6.2 TBS/043906 Mobile Safari/537.36 MicroMessenger/6.6.3.1260(0x26060339) NetType/WIFI Language/zh_CN
    I0223 06:42:34 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:44480 agent Mozilla/5.0 (Linux; Android 7.0; SM-G9508 Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQ
    QBrowser/6.2 TBS/043906 Mobile Safari/537.36 MicroMessenger/6.6.3.1260(0x26060339) NetType/WIFI Language/zh_CN
    I0223 06:44:28 12067 volume_server_handlers_read.go:75] request /656/375439177051ee.jpg with unmaching cookie seen: 393236974 expected: 3021359726 from 12
    7.0.0.1:44804 agent Mozilla/5.0 (Linux; Android 6.0; PLK-TL01H Build/HONORPLK-TL01H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.0.0 Mobile Safari/
    537.36
    I0223 06:44:44 12067 volume_server_handlers_read.go:75] request /661/33289a3ecd5edd.jpg with unmaching cookie seen: 1053646557 expected: 1674074305 from 1
    27.0.0.1:44804 agent Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 9ji/2.5.0/iPhone 6s
    I0223 06:47:22 12067 volume_server_handlers_read.go:75] request /657/38381fbac1a584.jpg with unmaching cookie seen: 3133252996 expected: 1543146961 from 1
    27.0.0.1:45338 agent Mozilla/5.0 (Linux; U; Android 7.1.2; zh-CN; MI 5X Build/N2G47H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.
    108 UCBrowser/11.8.8.968 Mobile Safari/537.36
    I0223 06:47:42 12067 volume_server_handlers_read.go:75] request /765/2b717f6f406291.png with unmaching cookie seen: 1866490513 expected: 2413760694 from 1
    27.0.0.1:45390 agent Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Mobile/14G60 MicroMessenger/6.6.3 Ne
    tType/WIFI Language/zh_CN
    I0223 06:47:44 12067 volume_server_handlers_read.go:75] request /655/3469e5e28fd023.jpg with unmaching cookie seen: 3801075747 expected: 234264574 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (iPhone 6sp; CPU iPhone OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 MQQBrowser/8.0.2 Mobil
    e/15B87 Safari/8536.25 MttCustomUA/2 QBWebViewType/1 WKType/1
    I0223 06:47:52 12067 volume_server_handlers_read.go:75] request /657/38381fbac1a584.jpg with unmaching cookie seen: 3133252996 expected: 1543146961 from 1
    27.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.1.2; zh-CN; MI 5X Build/N2G47H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.
    108 UCBrowser/11.8.8.968 Mobile Safari/537.36
    I0223 06:47:52 12067 volume_server_handlers_read.go:75] request /655/3469e5e28fd023.jpg with unmaching cookie seen: 3801075747 expected: 234264574 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (iPhone 6sp; CPU iPhone OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 MQQBrowser/8.0.2 Mobil
    e/15B87 Safari/8536.25 MttCustomUA/2 QBWebViewType/1 WKType/1
    I0223 06:48:04 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:04 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:11 12067 volume_server_handlers_read.go:75] request /661/33289a3ecd5edd.jpg with unmaching cookie seen: 1053646557 expected: 1674074305 from 1
    27.0.0.1:45390 agent Mozilla/5.0 (Linux; Android 6.0.1; OPPO R9s Plus Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/48.0.256
    4.116 Mobile Safari/537.36 T7/10.3 baiduboxapp/10.3.6.13 (Baidu; P1 6.0.1)
    I0223 06:48:12 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:12 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:22 12067 volume_server_handlers_read.go:75] request /661/33289a3ecd5edd.jpg with unmaching cookie seen: 1053646557 expected: 1674074305 from 1
    27.0.0.1:45390 agent Mozilla/5.0 (Linux; Android 6.0.1; OPPO R9s Plus Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/48.0.256
    4.116 Mobile Safari/537.36 T7/10.3 baiduboxapp/10.3.6.13 (Baidu; P1 6.0.1)
    I0223 06:48:22 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:22 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:32 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:32 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:41 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:48:41 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:45390 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:49:16 12067 volume_server_handlers_read.go:69] read error: Not Found /661,33289a3ecd5edd_1
    I0223 06:49:16 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 192.1
    68.254.8:40632 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:49:16 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 127.0
    .0.1:45652 agent Dalvik/2.1.0 (Linux; U; Android 8.0.0; MHA-AL00 Build/HUAWEIMHA-AL00)
    I0223 06:49:23 12067 volume_server_handlers_read.go:69] read error: Not Found /661,33289a3ecd5edd_1
    I0223 06:49:23 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 192.1
    68.254.8:40642 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:49:23 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 127.0
    .0.1:45652 agent Dalvik/2.1.0 (Linux; U; Android 8.0.0; MHA-AL00 Build/HUAWEIMHA-AL00)
    I0223 06:49:36 12067 volume_server_handlers_read.go:75] request /659/334073b80964a9.jpg with unmaching cookie seen: 3087623337 expected: 387060166 from 12
    7.0.0.1:45706 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:49:36 12067 volume_server_handlers_read.go:75] request /660/31c4f2ae71121f.jpg with unmaching cookie seen: 2926645791 expected: 269826002 from 12
    7.0.0.1:45706 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-Hans-CN; SM-G9500 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.
    2987.108 Quark/2.4.1.985 Mobile Safari/537.36
    I0223 06:49:43 12067 volume_server_handlers_read.go:75] request /659/2af2e996acd9ce.jpg with unmaching cookie seen: 2527910350 expected: 810789430 from 12
    7.0.0.1:45706 agent Mozilla/5.0 (Linux; U; Android 7.0; zh-cn; MI 5s Plus Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.278
    5.146 Mobile Safari/537.36 XiaoMi/MiuiBrowser/9.4.11
    I0223 06:49:51 12067 volume_server_handlers_read.go:69] read error: Not Found /661,33289a3ecd5edd_1
    I0223 06:49:51 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 192.1
    68.254.8:40772 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:49:51 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 127.0
    .0.1:45706 agent Dalvik/2.1.0 (Linux; U; Android 8.0.0; MHA-AL00 Build/HUAWEIMHA-AL00)
    I0223 06:49:52 12067 volume_server_handlers_read.go:69] read error: Not Found /661,33289a3ecd5edd_1
    I0223 06:49:52 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 192.1
    68.254.8:40776 agent Apache-HttpClient/4.5.3 (Java/1.8.0_131)
    I0223 06:49:52 12067 volume_server_handlers_read.go:75] request /661,33289a3ecd5edd with unmaching cookie seen: 1053646557 expected: 1674074305 from 127.0
    .0.1:45706 agent Dalvik/2.1.0 (Linux; U; Android 8.0.0; MHA-AL00 Build/HUAWEIMHA-AL00)
    I0223 06:50:16 12067 volume_server_handlers_read.go:75] request /655/2f47f8dc2842e1.jpg with unmaching cookie seen: 3693626081 expected: 590474922 from 12
    7.0.0.1:45834 agent Dalvik/2.1.0 (Linux; U; Android 7.1.2; M6 Note Build/N2G47H)
    I0223 06:50:21 12067 volume_server_handlers_read.go:69] read error: Not Found /655,29509f2e850bb0_1
    I0223 06:50:21 12067 volume_server_handlers_read.go:75] request /655,29509f2e850bb0 with unmaching cookie seen: 780471216 expected: 773671554 from 192.168
    [[email protected] seaweedfs]# 
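    The "seen" value in these messages can be checked against the requested fid. The sketch below assumes the fid layout described in the SeaweedFS README (volume id, then a hex file key followed by an 8-hex-digit cookie); the function name `parse_mismatch` is hypothetical, and the sample line is one entry from the log above with its wrapped halves re-joined.

```python
import re

# Matches the cookie-mismatch message format seen in the log above.
# The fid may be written as /<vid>/<key+cookie>.ext or /<vid>,<key+cookie>.
MISMATCH = re.compile(
    r"request /(?P<vid>\d+)[/,](?P<key_cookie>[0-9a-f]+)\S* "
    r"with unmaching cookie seen: (?P<seen>\d+) expected: (?P<expected>\d+)"
)

def parse_mismatch(line):
    """Extract volume id, file key, and cookies from one mismatch line."""
    m = MISMATCH.search(line)
    if m is None:
        return None
    key_cookie = m.group("key_cookie")
    return {
        "volume_id": int(m.group("vid")),
        "file_key": key_cookie[:-8],                # everything before the cookie
        "cookie_in_url": int(key_cookie[-8:], 16),  # last 8 hex digits
        "seen": int(m.group("seen")),
        "expected": int(m.group("expected")),
    }

line = (
    "I0223 06:47:52 12067 volume_server_handlers_read.go:75] request "
    "/657/38381fbac1a584.jpg with unmaching cookie seen: 3133252996 "
    "expected: 1543146961 from 127.0.0.1:45390"
)
info = parse_mismatch(line)
print(info)
```

    For this line, `cookie_in_url` equals `seen` while `expected` differs, which suggests the client asked for the file with a cookie other than the one stored on the volume server.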
    
    
  • Possible performance problems on some platforms

    As I said, we are developing a .NET client for SeaweedFS, but we see pretty low performance on our test deployment, so we ran some tests.

    We tried several test environments, but the results are almost the same everywhere. We tried launching everything on localhost with:

    .\weed.exe master -mdir="D:\SeaweedFs\Master"
    .\weed.exe volume -mserver="localhost:9333" -dir="D:\SeaweedFs\VData2" -port=9101
    .\weed.exe volume -mserver="localhost:9333" -dir="D:\SeaweedFs\VData2" -port=9102
    .\weed.exe volume -mserver="localhost:9333" -dir="D:\SeaweedFs\VData2" -port=9103

    We also tried a similar setup on 3 Windows computers (one master and volume, two just volumes) and a similar setup on 7 Linux computers (one master and volume, two just volumes),

    and we tried benchmarking it:

    The results are:

    Concurrency Level:      16
    Time taken for tests:   486.478 seconds
    Complete requests:      1048576
    Failed requests:        0
    Total transferred:      1106793636 bytes
    Requests per second:    2155.44 [#/sec]
    Transfer rate:          2221.79 [Kbytes/sec]

    Connection Times (ms)
                  min      avg      max      std
    Total:        1.4      7.4     94.5      2.7

    Percentage of the requests served within a certain time (ms)
       50%      6.9 ms
       66%      8.0 ms
       75%      8.5 ms
       80%      9.0 ms
       90%     11.0 ms
       95%     12.5 ms
       98%     14.5 ms
       99%     16.0 ms
      100%     94.5 ms

    And the read test:

    Concurrency Level:      16
    Time taken for tests:   237.319 seconds
    Complete requests:      1048576
    Failed requests:        0
    Total transferred:      1106781421 bytes
    Requests per second:    4418.42 [#/sec]
    Transfer rate:          4554.38 [Kbytes/sec]

    We have a major problem on the 7-machine Linux setup, where the write speed was ~320 items/sec and the read speed was ~700 items/sec. They are 2-core machines, but we did some monitoring and the CPU, HDD, RAM, and network are all idle. There is no obvious reason for such slow performance, yet it is that slow. This does not correspond to the benchmark presented here on a single computer (not even close), so we must be doing something wrong.

    We tried many values of -c and -size in the benchmark, and we also experimented with -max on the volumes, but the results are almost the same.

    We also noticed that pretty much every volume except one does nothing at all during the benchmark, which, if I understand correctly, is a bad thing because the volume server should be chosen at random. We are using the latest 0.70 version. We can reproduce these results almost every time. Is there some kind of utility to see what is keeping SeaweedFS from going faster?
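    The write benchmark figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes that a concurrency level of 16 means 16 requests in flight at a time; the variable names are illustrative only.

```python
# Figures copied from the write benchmark output quoted above.
complete_requests = 1_048_576
seconds = 486.478
total_bytes = 1_106_793_636
concurrency = 16
avg_latency_ms = 7.4

# Reported throughput is simply completed requests over elapsed time.
requests_per_second = complete_requests / seconds

# Average bytes transferred per request.
avg_bytes_per_request = total_bytes / complete_requests

# Little's law: with a fixed number of in-flight requests, throughput
# cannot exceed concurrency divided by the average latency.
latency_bound_rps = concurrency / (avg_latency_ms / 1000)

print(f"{requests_per_second:.2f} req/s measured")
print(f"{avg_bytes_per_request:.1f} bytes per request")
print(f"{latency_bound_rps:.0f} req/s ceiling at this latency")
```

    The measured rate sits just under the latency ceiling, which would be consistent with the observation that CPU, disk, and network are idle: under these assumptions the run is limited by per-request latency at a fixed concurrency rather than by a saturated resource.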

  • Slow performance in replication mode 010 while executing volume.fix.replication

    Hello,

    We have a cluster in replication mode 010 (before that there was one rack with 000, which we decided to expand). At the moment there are about 150,000 volumes, with a limit of 1000 megabytes. We are now going through the volume.fix.replication stage to replicate the data from the first rack to the second. We have huge problems with the speed of data access, especially during replication. Also, under a large data flow on the second rack, errors occur:

    requests.exceptions.ReadTimeout: HTTPConnectionPool(host='node2.example.com', port=11009): Read timed out

    Our cluster:

    • 3 master nodes
    • 2 racks of 18 RAID arrays (12 terabytes each), each 70% full on the first rack

    The masters are launched with the command:

    weed master -defaultReplication="010" -port="9333" -peers=node1.example.com:9333,node2.example.com:9333,node3.example.com:9333 -volumeSizeLimitMB="1024" -ip="node2.example.com" -metrics.address="node2.example.com:9091"

    The volume servers are launched with the command:

    weed volume -dataCenter="netrack" -rack="node2" -port="11001" -port.public="11001" -mserver="node1.example.com:9333,node2.example.com:9333,node3.example.com:9333" -ip="node2.example.com" -max=10000 -dir=/mnt/disk1/swfs

    PS: At the moment, about 40% is replicated and it is impossible to keep working.
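    For readers unfamiliar with the three-digit replication codes used in this report, here is a minimal sketch of how they are read, assuming the scheme documented in the SeaweedFS wiki (first digit: copies in other data centers, second: copies on other racks in the same data center, third: copies on other servers in the same rack). The class and function names are hypothetical.

```python
from typing import NamedTuple

class ReplicaPlacement(NamedTuple):
    other_data_centers: int   # first digit
    other_racks: int          # second digit
    same_rack_servers: int    # third digit

    @property
    def total_copies(self) -> int:
        # The original copy plus every extra replica.
        return 1 + sum(self)

def parse_replication(code: str) -> ReplicaPlacement:
    """Parse a replication string such as "010" into its three counts."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected three digits, got {code!r}")
    return ReplicaPlacement(*(int(c) for c in code))

for code in ("000", "010"):
    placement = parse_replication(code)
    print(code, placement, "copies:", placement.total_copies)
```

    Under this reading, moving from 000 to 010 means every existing volume needs one additional copy on a different rack, which is the work volume.fix.replication is doing here.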

  • Lots of "volume_server_handlers.go:75] read error: /5,1001e1b02c1b01" errors in our production server log files

    Here is the deploy structure:

    10.252.130.159:9333(master)         10.252.130.159:5088(haproxy)
                                       /                                      \ 
                        10.252.133.22:5083 (volume1)               10.252.135.207:5084(volume2)  
    
    Replication Strategy: 001
    

    So clients always get the content from the volume servers via haproxy.

    But after running for a few days, we sometimes get a 404 error from 10.252.130.159:5088 for a specific fid, such as 5,1001e1b02c1b01.

    We checked and found that one of the volume servers (a different, random server for each affected fid) always outputs the following logs:

    I0303 15:44:23 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:45:14 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:52:52 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:52:52 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:53:05 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:53:06 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:53:07 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:53:07 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    I0303 15:53:18 21828 volume_server_handlers.go:75] read error: <nil> /5,1001e1b02c1b01
    

    Does this mean that the file for [fid:5,1001e1b02c1b01] has been damaged?

    And how can we fix this?

    How could this have happened? Any suggestions?
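    One way to narrow this down is to bypass haproxy and request the fid from each replica directly. The sketch below only builds the per-replica URLs from a master `/dir/lookup?volumeId=...` reply; the reply shape follows the example in the SeaweedFS README, the sample JSON is hypothetical (assembled from the addresses in the deploy structure above), and `replica_urls` is an illustrative helper, not part of SeaweedFS.

```python
import json

def replica_urls(fid, lookup_json):
    """Build one direct URL per replica from a master /dir/lookup reply."""
    reply = json.loads(lookup_json)
    return [f"http://{loc['url']}/{fid}" for loc in reply.get("locations", [])]

# Hypothetical reply for volume 5, using the two volume servers shown above.
sample_reply = json.dumps({
    "volumeId": "5",
    "locations": [
        {"url": "10.252.133.22:5083", "publicUrl": "10.252.133.22:5083"},
        {"url": "10.252.135.207:5084", "publicUrl": "10.252.135.207:5084"},
    ],
})

urls = replica_urls("5,1001e1b02c1b01", sample_reply)
for url in urls:
    print(url)
```

    Fetching each printed URL separately would show whether only one replica returns 404 for the fid, which is what the log output above hints at.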

  • Hot, warm, cold storage [feature]

    I'm currently running a setup with 3 dedicated masters/filers and 4 volume servers with a 10 TB HDD each. Our app writes data to daily buckets. When the workload is mostly write-only, everything performs quite well. But if random reading kicks in, performance seems to suffer a lot. I'm currently investigating which side is suffering, our app or the storage. But I want to clear this up for myself: is it possible to organize hot, warm, and cold tiers inside one cluster?

    I mean, create new buckets on hot storage, for example NVMe SSD based volume servers, and later move them with a call to less frequently accessed HDD volume servers (warm). I've read about cloud tier uploads, so that would work for the cold phase, I guess. But what about the hot-to-warm transition?

    Or maybe I'm missing something and have simply misconfigured something, so that I could really speed up my cluster without any extra abstractions?

  • brain split happened when network interrupts between dc

    weed version: 1.15

    1. weed master: deployed in 3 DCs, with 3 nodes in dc1, 2 in dc2, and 2 in dc3.
    2. volume server: 3 nodes in dc1, 3 in dc2, and 3 in dc3.
    3. The network between dc3 and dc2 was interrupted, and so was the network between dc3 and dc1, but the network between dc1 and dc2 stayed OK, so dc3 became an isolated network island.
    4. The original leader was in dc1, but when the network issue happened, a second leader was elected in dc3 and dc3's volume servers connected to the new dc3 leader. This formed two clusters: dc1 and dc2 as one cluster, and dc3 as another.
    5. When the network issue between dc3 and dc1/dc2 was resolved, there were still two leaders in the whole cluster until dc3's leader was restarted.
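    To make the expectation explicit, the following sketch works out which side of the partition could elect a leader if the masters used a plain majority quorum over all configured masters. It assumes the master counts listed in step 1 (3 + 2 + 2); the helper names are illustrative.

```python
def majority(total_voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return total_voters // 2 + 1

# Master counts per data center, as listed in step 1 of the report.
masters_per_dc = {"dc1": 3, "dc2": 2, "dc3": 2}
total_masters = sum(masters_per_dc.values())
quorum = majority(total_masters)

def has_quorum(reachable_dcs) -> bool:
    """True if the masters in the given data centers can form a majority."""
    return sum(masters_per_dc[dc] for dc in reachable_dcs) >= quorum

print("quorum size:", quorum)
print("dc1 + dc2 can elect:", has_quorum(["dc1", "dc2"]))
print("dc3 alone can elect:", has_quorum(["dc3"]))
```

    Under that assumption only the dc1 + dc2 side holds a majority, so a second leader appearing in the isolated dc3 is the unexpected part of this report.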
  • [bug:filer] Continues to stick not to the leader raft.Server: Not current leader (critical)

    Describe the bug: After shutting down one volume server, only one filer lost the leader.

    Jan 13, 2022 @ 09:10:35.681 | I0113 04:10:35     1 common.go:69] response method:PUT URL:/buckets/reports/report_631214642010_e674c402-c87e-4443-b942-d39d6417225c.pdf with httpStatus:500 and JSON:{"error":"rpc error: code = Unknown desc = raft.Server: Not current leader"}
    Jan 13, 2022 @ 09:10:35.681 | E0113 04:10:35     1 s3api_object_handlers.go:421] upload to filer error: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 09:10:35.681 | E0113 04:10:34     1 filer_server_handlers_write.go:43] failing to assign a file id: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 09:10:35.681 | E0113 04:10:35     1 filer_server_handlers_write.go:43] failing to assign a file id: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 09:10:35.681 | E0113 04:10:35     1 filer_server_handlers_write_upload.go:172] upload error: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 09:10:34.680 | E0113 04:10:34     1 filer_server_handlers_write_upload.go:172] upload error: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 09:10:34.680 | I0113 04:10:34     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    

    System Setup weed version

    version 30GB 2.85 ea8e4ec2 linux amd64
    

    Additional context

    logs

    
    Jan 13, 2022 @ 05:02:29.653 | E0113 00:01:24     1 filer_grpc_server_sub_meta.go:133] processed to 2022-01-13 00:01:23.092106505 +0000 UTC: rpc error: code = Unavailable desc = transport is closing
    Jan 13, 2022 @ 05:02:29.653 | I0113 00:01:23     1 filer_grpc_server_sub_meta.go:226] => client filer:10.106.65.20:[email protected]:47818: rpc error: code = Unavailable desc = transport is closing
    Jan 13, 2022 @ 05:02:29.653 | I0113 00:02:18     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:29.653 | I0113 00:01:23     1 filer_grpc_server_sub_meta.go:226] => client filer:10.106.65.121:[email protected]:53630: rpc error: code = Unavailable desc = transport is closing
    Jan 13, 2022 @ 05:02:29.653 | E0113 00:01:24     1 filer_grpc_server_sub_meta.go:133] processed to 2022-01-13 00:01:23.092106505 +0000 UTC: rpc error: code = Unavailable desc = transport is closing
    Jan 13, 2022 @ 05:02:29.653 | I0113 00:01:24     1 filer_grpc_server_sub_meta.go:255] - listener filer:10.106.65.121:[email protected]:53630
    Jan 13, 2022 @ 05:02:29.653 | I0113 00:01:24     1 filer_grpc_server_sub_meta.go:255] - listener filer:10.106.65.20:[email protected]:47818
    Jan 13, 2022 @ 05:02:30.493 | I0113 00:02:19     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:30.493 | I0113 00:02:19     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:30.494 | I0113 00:02:22     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:30.494 | I0113 00:02:21     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:30.494 | I0113 00:02:23     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:30.494 | I0113 00:02:20     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:30.494 | I0113 00:02:22     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
    Jan 13, 2022 @ 05:02:30.494 | I0113 00:02:24     1 filer_notify.go:103] log write failed /topics/.system/log/2022-01-13/00-01.cec7f54d: AssignVolume: rpc error: code = Unknown desc = raft.Server: Not current leader
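    To see which code paths report the "Not current leader" error, the log lines can be grouped by their source location. This is a rough sketch that assumes the glog-style header visible in the logs above (level letter, MMDD, time, an id column, then `file.go:line]`); the sample lines are copied from this report and `count_by_source` is a hypothetical helper.

```python
import re
from collections import Counter

# glog-style header as it appears above: level, MMDD, HH:MM:SS,
# a process/thread id column, then "file.go:line]" and the message.
GLOG = re.compile(
    r"(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<proc>\d+) (?P<source>[\w.]+:\d+)\] (?P<message>.*)"
)

def count_by_source(lines, needle):
    """Count lines whose message contains `needle`, keyed by file:line."""
    counts = Counter()
    for line in lines:
        m = GLOG.search(line)
        if m and needle in m.group("message"):
            counts[m.group("source")] += 1
    return counts

sample = [
    "E0113 04:10:35     1 s3api_object_handlers.go:421] upload to filer error: rpc error: code = Unknown desc = raft.Server: Not current leader",
    "E0113 04:10:34     1 filer_server_handlers_write.go:43] failing to assign a file id: rpc error: code = Unknown desc = raft.Server: Not current leader",
    "E0113 04:10:35     1 filer_server_handlers_write.go:43] failing to assign a file id: rpc error: code = Unknown desc = raft.Server: Not current leader",
    "E0113 00:01:24     1 filer_grpc_server_sub_meta.go:133] processed to 2022-01-13 00:01:23.092106505 +0000 UTC: rpc error: code = Unavailable desc = transport is closing",
]

counts = count_by_source(sample, "Not current leader")
for source, n in counts.most_common():
    print(n, source)
```

    Because the pattern is searched rather than anchored, it also works on lines that still carry the Kibana timestamp prefix.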
    
    
  • Filer hangs or restarts on deleting large buckets

    Describe the bug: I'm running a single-master, single-filer setup with an S3 gateway. The filer's store is leveldb2.

    I store lots of small files (up to 1MB) in separate per-day buckets. Files are stored in nested directories (not one directory for everything). It works pretty well unless I try to drop a bucket, either through the API or with bucket.delete. The filer might just hang, with other components emitting rpc error: code = Unavailable desc = transport is closing, or it simply restarts. Deleting collections is very fast, though. Am I missing something that would let me painlessly delete a whole bucket in one operation? Or should I move to another filer store?

    System Setup

    • Debian 10
    • version 30GB 2.16 6912bf9 linux amd64
  • [Emergency] Cluster failed after upgrading weedfs to version 2.62

    Hi, I upgraded the masters and volume servers from v2.40 8000G to 2.62 8000G, but the cluster doesn't work. Log on the masters:

    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 1180 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 308 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 1182 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 309 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 1179 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 1181 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 307 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 1177 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 220 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 312 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 310 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 311 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 219 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 222 becomes crowded
    Aug 10 15:06:27 weed-master-1 seaweedfs-master[1145]: I0810 15:06:27  1145 volume_layout.go:425] Volume 1178 becomes crowded
    Aug 10 15:06:33 weed-master-1 snapd[1002]: stateengine.go:150: state ensure error: Get https://api.snapcraft.io/api/v1/snaps/sections: dial tcp: lookup api.snapcraft.io on 172.30.100.3:53: server misbehaving
    

    Other master:

    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=370 with httpStatus:404 and JSON:{"volumeId":"370","error":"volume id 370 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=1067 with httpStatus:404 and JSON:{"volumeId":"1067","error":"volume id 1067 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=1234 with httpStatus:404 and JSON:{"volumeId":"1234","error":"volume id 1234 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=1236 with httpStatus:404 and JSON:{"volumeId":"1236","error":"volume id 1236 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=609 with httpStatus:404 and JSON:{"volumeId":"609","error":"volume id 609 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=24 with httpStatus:404 and JSON:{"volumeId":"24","error":"volume id 24 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=83 with httpStatus:404 and JSON:{"volumeId":"83","error":"volume id 83 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=320 with httpStatus:404 and JSON:{"volumeId":"320","error":"volume id 320 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=1356 with httpStatus:404 and JSON:{"volumeId":"1356","error":"volume id 1356 not found"}
    Aug 10 15:06:20 weed-master-2 seaweedfs-master[1154]: I0810 15:06:20  1154 common.go:50] response method:GET URL:/dir/lookup?volumeId=1352 with httpStatus:404 and JSON:{"volumeId":"1352","error":"volume id 1352 not found"}
    
    

    Master03:

    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 volume_layout.go:376] Volume 210 has 0 replica, less than required 2
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 topology_event_handling.go:79] Removing Volume 387 from the dead volume server weed-volume-012:8086
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 volume_layout.go:376] Volume 387 has 0 replica, less than required 2
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 topology_event_handling.go:79] Removing Volume 537 from the dead volume server weed-volume-012:8086
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 volume_layout.go:376] Volume 537 has 0 replica, less than required 2
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 topology_event_handling.go:79] Removing Volume 1362 from the dead volume server weed-volume-012:8086
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 volume_layout.go:376] Volume 1362 has 0 replica, less than required 2
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 topology_event_handling.go:79] Removing Volume 759 from the dead volume server weed-volume-012:8086
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 volume_layout.go:376] Volume 759 has 1 replica, less than required 2
    Aug 10 15:06:18 weed-master-3 seaweedfs-master[1149]: I0810 15:06:18  1149 topology_event_handling.go:79] Removing Volume 801 from the dead volume server weed-volume-012:8086
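    A quick way to gauge the damage from a master log like the one above is to list the volumes it reports as under-replicated. The sketch below relies only on the message format visible in this log; the sample lines are taken from it with the journald prefix dropped, and `under_replicated` is a hypothetical helper.

```python
import re

# Message format taken from the Master03 log above.
UNDER_REPLICATED = re.compile(
    r"Volume (?P<vid>\d+) has (?P<have>\d+) replica, "
    r"less than required (?P<need>\d+)"
)

def under_replicated(lines):
    """Return {volume_id: (replicas_present, replicas_required)}."""
    found = {}
    for line in lines:
        m = UNDER_REPLICATED.search(line)
        if m:
            found[int(m.group("vid"))] = (int(m.group("have")), int(m.group("need")))
    return found

sample = [
    "I0810 15:06:18  1149 volume_layout.go:376] Volume 210 has 0 replica, less than required 2",
    "I0810 15:06:18  1149 topology_event_handling.go:79] Removing Volume 387 from the dead volume server weed-volume-012:8086",
    "I0810 15:06:18  1149 volume_layout.go:376] Volume 387 has 0 replica, less than required 2",
    "I0810 15:06:18  1149 volume_layout.go:376] Volume 759 has 1 replica, less than required 2",
]

report = under_replicated(sample)
# Volumes with no replica left at all, from the master's point of view.
unavailable = sorted(vid for vid, (have, _) in report.items() if have == 0)
print("no replicas:", unavailable)
print("degraded:", sorted(vid for vid, (have, _) in report.items() if have > 0))
```

    Volumes with zero replicas are the ones that the other master's "volume id ... not found" lookups would be expected to fail on, assuming both masters see the same topology.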
    
    
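  • Migration problem

    I am migrating between 2 servers. The migration does not succeed and I need guidance. The end goal is to migrate the data in mysql to redis. I followed https://github.com/chrislusf/seaweedfs/wiki/Async-Replication-to-another-Filer#replicate-existing-files

    The setup is as follows:

    Host            Database
    192.168.20.51   mysql
    192.168.20.55   redis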

    Configuration on 192.168.20.51, where I started one master, three volumes, and one filer:

    nohup /home/seaweedfs/weed master -mdir=/home/seaweedfs/data/master -port=9333 -ip 192.168.20.51 -defaultReplication=010 &
    /home/seaweedfs/weed volume -dir=/home/seaweedfs/volume/volume1 -max=100 -mserver=192.168.20.51:9333 -port=9001 -ip=192.168.20.51 -dataCenter=dc1 -rack=rack1 & >> /home/seaweedfs/logs/vol1.log &
    /home/seaweedfs/weed volume -dir=/home/seaweedfs/volume/volume2 -max=100 -mserver=192.168.20.51:9333 -port=9002 -ip=192.168.20.51 -dataCenter=dc1 -rack=rack2 & >> /home/seaweedfs/logs/vol2.log &
    /home/seaweedfs/weed volume -dir=/home/seaweedfs/volume/volume3 -max=100 -mserver=192.168.20.51:9333 -port=9003 -ip=192.168.20.51 -dataCenter=dc1 -rack=rack3 & >> /home/seaweedfs/logs/vol3.log &
    /home/seaweedfs/weed filer -master=192.168.20.51:9333 -ip=192.168.20.51 & >> /home/seaweedfs/logs/filer.log &

    After cd /etc/seaweedfs, the configuration is as follows:

    cat filer.toml
    [mysql]
    enabled = true
    hostname = "192.168.20.51"
    port = 3306
    username = "root"
    password = "123456"
    database = "filer" # create or use an existing database
    connection_max_idle = 2
    connection_max_open = 100

    cat notification.toml
    [notification.kafka]
    enabled = true
    hosts = [ "localhost:9092" ]
    topic = "seaweedfs_filer"

    cat replication.toml
    [source.filer]
    enabled = true
    grpcAddress = "192.168.20.51:18888"
    directory = "/"

    [sink.filer]
    enabled = true
    grpcAddress = "192.168.20.55:18888"
    directory = "/"
    replication = ""
    collection = ""
    ttlSec = 0

    Configuration on 192.168.20.55, where I started one master, three volumes, and one filer:

    nohup /home/seaweedfs/weed master -mdir=/home/seaweedfs/data/master -port=9333 -ip 192.168.20.55 -defaultReplication=010 &
    /home/seaweedfs/weed volume -dir=/home/seaweedfs/volume/volume1 -max=100 -mserver=192.168.20.55:9333 -port=9001 -ip=192.168.20.55 -dataCenter=dc1 -rack=rack1 & >> /home/seaweedfs/logs/vol1.log &
    /home/seaweedfs/weed volume -dir=/home/seaweedfs/volume/volume2 -max=100 -mserver=192.168.20.55:9333 -port=9002 -ip=192.168.20.55 -dataCenter=dc1 -rack=rack2 & >> /home/seaweedfs/logs/vol2.log &
    /home/seaweedfs/weed volume -dir=/home/seaweedfs/volume/volume3 -max=100 -mserver=192.168.20.55:9333 -port=9003 -ip=192.168.20.55 -dataCenter=dc1 -rack=rack3 & >> /home/seaweedfs/logs/vol3.log &
    /home/seaweedfs/weed filer -master=192.168.20.55:9333 -ip=192.168.20.55 & >> /home/seaweedfs/logs/filer.log &

    cd /etc/seaweedfs
    cat filer.toml
    [redis]
    enabled = true
    address = "localhost:6379"
    password = ""
    db = 0

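    Then, on 192.168.20.51, I 1) started kafka, 2) started weed filer.replicate, and 3) started weed filer. All of these succeeded. Finally, after running echo 'fs.meta.notify' | weed shell, I got the following errors every time:

    image

    The final result was: image image

    The directories were all copied over, so why was the data under those directories not replicated, and why was the data in the volumes not migrated either?

    When I upload data again, however, both sides sync very quickly: image image

    Summary: after configuring both sides, newly uploaded data syncs, but the older existing data does not. Why is that? Please advise.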

  • Webdav server stops responding after saving some files

    The WebDAV server stops responding after a file with certain extensions is uploaded.

    My configuration: I have 3 nodes in the cluster. Every node runs 2 Docker containers:

    • official cockroachdb container for filer metadata (v.20.1.0)
    • my custom seaweedfs container with:
    • weed master -ip=b-img02 -mdir=/opt/seaweedfs/data/master -peers=b-img00:9333,b-img01:9333,b-img02:9333 -defaultReplication=200
    • weed volume -ip=b-img02 -mserver=b-img00:9333,b-img01:9333,b-img02:9333 -dataCenter=b-img02 -rack=rack -dir=/opt/seaweedfs/data/
    • weed filer -collection=filer -defaultReplicaPlacement=200
    • weed webdav

    Weed version: 30GB 1.79

    Screenshot_20200603_140300

    Everything works fine until I try to upload a file to the filer.

    In the examples below, I create a test file, upload it to the filer, and download it from WebDAV.

    Example #1: everything works fine

    echo ok > data.html
    
    curl -F [email protected] http://b-img00:8888/
    >> {"name":"data.html","size":3,"fid":"7,08ba03a642ff84","url":"http://b-img02:8080/7,08ba03a642ff84"}
    
    curl -vvv http://b-img00:7333/data.html
    *   Trying x.x.x.x...
    * TCP_NODELAY set
    * Connected to b-img00 (x.x.x.x) port 7333 (#0)
    > GET /data.html HTTP/1.1
    > Host: b-img00:7333
    > User-Agent: curl/7.58.0
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Accept-Ranges: bytes
    < Content-Length: 3
    < Content-Type: text/html; charset=utf-8
    < Etag: "161503d94245de003"
    < Last-Modified: Wed, 03 Jun 2020 11:04:35 GMT
    < Date: Wed, 03 Jun 2020 11:04:57 GMT
    <
    ok
    * Connection #0 to host b-img00 left intact
    
    curl -X DELETE -vvv http://b-img00:7333/data.html
    *   Trying x.x.x.x...
    * TCP_NODELAY set
    * Connected to b-img00 (x.x.x.x) port 7333 (#0)
    > DELETE /data.html HTTP/1.1
    > Host: b-img00:7333
    > User-Agent: curl/7.58.0
    > Accept: */*
    >
    < HTTP/1.1 204 No Content
    < Date: Wed, 03 Jun 2020 11:08:15 GMT
    <
    * Connection #0 to host b-img00 left intact
    

    OK. Example #2: I change the file extension from .html to .txt. After the file is uploaded, WebDAV stops serving the folder that contains it (the file can still be downloaded via the filer). Other folders in WebDAV work fine.

    echo ok > data.txt
    
    curl -F file=@data.txt http://b-img00:8888/
    
    curl -vvv http://b-img00:7333/data.txt
    *   Trying xx.xx.xx.xx...
    * TCP_NODELAY set
    * Connected to b-img00 (xx.xx.xx.xx) port 7333 (#0)
    > GET /data.txt HTTP/1.1
    > Host: b-img00:7333
    > User-Agent: curl/7.58.0
    > Accept: */*
    >
    (( I waited some time, but no response was received ))
    ^C
    
    curl -vvv http://b-img00:8888/data.txt
    *   Trying xx.xx.xx.xx...
    * TCP_NODELAY set
    * Connected to b-img00 (xx.xx.xx.xx) port 8888 (#0)
    > GET /data.txt HTTP/1.1
    > Host: b-img00:8888
    > User-Agent: curl/7.58.0
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Accept-Ranges: bytes
    < Content-Disposition: inline; filename="data.txt"
    < Content-Length: 3
    < Content-Type: text/plain
    < Etag: "908e596f"
    < Last-Modified: Wed, 03 Jun 2020 11:09:35 GMT
    < Date: Wed, 03 Jun 2020 11:10:06 GMT
    <
    ok
    * Connection #0 to host b-img00 left intact
    
    curl -X DELETE http://b-img00:8888/data.txt
    (ok)
    
  • [volume.balance] panic: runtime error: index out of range [-1]

    > volume.balance
    Running in simulation mode. Use "-force" option to apply the changes.
    > volume.balance -dataCenter lol
    Running in simulation mode. Use "-force" option to apply the changes.
    panic: runtime error: index out of range [-1]
    
    goroutine 1 [running]:
    github.com/chrislusf/seaweedfs/weed/shell.balanceSelectedVolume(0xc000700000?, {0x0, 0x0}, 0x0?, {0x0, 0x0, 0x800?}, 0xc0005bd990, 0x26f7708, 0x0)
    	/Users/tochka/GolandProjects/seaweedfs/weed/shell/command_volume_balance.go:262 +0x76c
    github.com/chrislusf/seaweedfs/weed/shell.balanceVolumeServersByDiskType(0x0?, {0x0, 0x0}, 0x13?, {0x0?, 0x0, 0x0}, 0xc000101c00?, {0xc000bc6c60, 0x8}, ...)
    	/Users/tochka/GolandProjects/seaweedfs/weed/shell/command_volume_balance.go:135 +0x16c
    github.com/chrislusf/seaweedfs/weed/shell.balanceVolumeServers(0x42?, {0xc000f7b190?, 0x1, 0x1?}, 0x40d625?, {0x0, 0x0, 0x0}, 0x38?, {0xc000bc6c60, ...}, ...)
    	/Users/tochka/GolandProjects/seaweedfs/weed/shell/command_volume_balance.go:115 +0xfc
    github.com/chrislusf/seaweedfs/weed/shell.(*commandVolumeBalance).Do(0xc00031b7eb?, {0xc000789540, 0x2, 0x2}, 0xc0005aa120?, {0x2bca780, 0xc00000e018})
    	/Users/tochka/GolandProjects/seaweedfs/weed/shell/command_volume_balance.go:95 +0x672
    github.com/chrislusf/seaweedfs/weed/shell.processEachCmd(0xc0005b60a0?, {0xc00031b7d0, 0x1f}, 0xc0005aa120?)
    	/Users/tochka/GolandProjects/seaweedfs/weed/shell/shell_liner.go:135 +0x3a7
    github.com/chrislusf/seaweedfs/weed/shell.RunShell({0xc0006b7f30, {0x2bc8b00, 0xc00011b758}, {0x0, 0x0}, 0x0, 0xc0006b7f40, {0xc0000546a8, 0x14}, {0x2bb6a64, ...}})
    	/Users/tochka/GolandProjects/seaweedfs/weed/shell/shell_liner.go:104 +0x548
    github.com/chrislusf/seaweedfs/weed/command.runShell(0x3e17b38?, {0xc00004c050?, 0x0?, 0x0?})
    	/Users/tochka/GolandProjects/seaweedfs/weed/command/shell.go:60 +0x2f8
    main.main()
    	/Users/tochka/GolandProjects/seaweedfs/weed/weed.go:81 +0x383
    
  • [volumeServer.evacuate] evacuate with balance

    evacuate moves all volumes to a single server, and not to the emptiest one

    balance

    error: tail volume 560 from fast-volume-1.s3-fast-volume:8080 to fast-volume-7.s3-fast-volume:8080: rpc error: code = Unknown desc = streamFollow: fail to locate by appendAtNs 1656413146000000000: read entry 56: EOF
    
  • [filer] dedup subscribers doesn't work

    https://github.com/chrislusf/seaweedfs/blob/b9f7b6fb9a486a9509db7ca800c9f876911413ad/weed/server/filer_grpc_server_sub_meta.go#L260-L269

    The alreadyKnown returned by the function addClient is always false. Nothing was put into fs.knownListeners.

    The following code snippet does not work: https://github.com/chrislusf/seaweedfs/blob/b9f7b6fb9a486a9509db7ca800c9f876911413ad/weed/server/filer_grpc_server_sub_meta.go#L25-L30

    https://github.com/chrislusf/seaweedfs/blob/b9f7b6fb9a486a9509db7ca800c9f876911413ad/weed/server/filer_grpc_server_sub_meta.go#L93-L96
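
The intended behaviour can be sketched in isolation. The snippet below is a minimal illustration and not the SeaweedFS code: `listenerRegistry`, `clientKey`, `addClient`, and `deleteClient` are hypothetical stand-ins that only mirror the names mentioned above. It assumes a subscriber is identified by a name plus a numeric id. The point is that `addClient` has to store the key under a lock, otherwise `alreadyKnown` can never become true.

```go
package main

import (
	"fmt"
	"sync"
)

// listenerRegistry is a hypothetical stand-in for the filer's set of known
// subscribers; it is not the SeaweedFS type.
type listenerRegistry struct {
	mu    sync.Mutex
	known map[string]struct{}
}

func newListenerRegistry() *listenerRegistry {
	return &listenerRegistry{known: make(map[string]struct{})}
}

// clientKey builds the map key; the name/id pair is an assumption.
func clientKey(name string, id int32) string {
	return fmt.Sprintf("%s/%d", name, id)
}

// addClient registers the subscriber and reports whether it was already present.
func (r *listenerRegistry) addClient(name string, id int32) (alreadyKnown bool) {
	key := clientKey(name, id)
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.known[key]; ok {
		return true
	}
	r.known[key] = struct{}{} // without this store, alreadyKnown is always false
	return false
}

// deleteClient forgets the subscriber so it can subscribe again later.
func (r *listenerRegistry) deleteClient(name string, id int32) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.known, clientKey(name, id))
}

func main() {
	r := newListenerRegistry()
	fmt.Println(r.addClient("filer.sync", 1)) // false: first subscription
	fmt.Println(r.addClient("filer.sync", 1)) // true: duplicate detected
	r.deleteClient("filer.sync", 1)
	fmt.Println(r.addClient("filer.sync", 1)) // false: it was removed in between
}
```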

  • Replication volumes distribute on a same machine

    Improvement: It seems we could use the same distribution logic as Erasure Coding (https://github.com/chrislusf/seaweedfs/wiki/Erasure-Coding-for-warm-storage#architecture) to achieve data security.

    @chrislusf, if you agree, I'd be glad to implement it.

    Screenshots weed

  • feat(filer.sync): add parallel synchronization function

    What problem are we solving?

    Synchronizing through a single FilerSink is too slow. This PR synchronizes through multiple FilerSinks in parallel,
    which increases the processing speed.

    How are we solving the problem?

    (1) Split the data while preserving execution order. Events are split by pathname: a tree structure is used to extract the paths affected by each event, and each affected set is handled by its own worker thread, which ensures the correct send order. An operation on a top-level path pulls all of its children into the same set, so the speedup in that case is relatively small.
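
A simpler way to get a comparable ordering guarantee is to hash each event's top-level directory to a fixed worker, so that all events under one subtree are handled by a single goroutine in arrival order. The sketch below uses that swapped-in technique rather than the tree structure described in this PR; `event`, `topLevelDir`, `workerFor`, and `dispatch` are hypothetical names, and nothing here is taken from the PR's code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
	"sync"
)

// event is a hypothetical metadata event; only the path and order matter here.
type event struct {
	Path string
	Seq  int
}

// topLevelDir returns the first path component, e.g. "/a/b/c" -> "a".
func topLevelDir(path string) string {
	trimmed := strings.TrimPrefix(path, "/")
	if i := strings.Index(trimmed, "/"); i >= 0 {
		return trimmed[:i]
	}
	return trimmed
}

// workerFor maps a path to a worker index so that every event under the same
// top-level directory always lands on the same worker.
func workerFor(path string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(topLevelDir(path)))
	return int(h.Sum32() % uint32(workers))
}

// dispatch fans events out to per-worker queues and returns, for each worker,
// the events in the order that worker handled them.
func dispatch(events []event, workers int) [][]event {
	queues := make([]chan event, workers)
	handled := make([][]event, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		queues[w] = make(chan event, len(events))
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for ev := range queues[w] {
				// A real sink would replicate the entry here.
				handled[w] = append(handled[w], ev)
			}
		}(w)
	}
	for _, ev := range events {
		queues[workerFor(ev.Path, workers)] <- ev
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
	return handled
}

func main() {
	events := []event{
		{Path: "/a/1", Seq: 0},
		{Path: "/b/1", Seq: 1},
		{Path: "/a/2", Seq: 2},
		{Path: "/a/1", Seq: 3},
	}
	for w, evs := range dispatch(events, 4) {
		fmt.Println("worker", w, evs)
	}
}
```

This sketch does not handle operations that span two top-level directories (for example a rename from one to another); those would still need to be serialized across workers.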

    (2) Multiple sending threads. Multiple FilerSinks are created during initialization and assigned to the worker threads. Below are the results of a unit test that creates many 1 KB files.

    Add 100 folders, each containing 100 1 KB files. Since this ran on a single stand-alone machine, the results are somewhat affected.
    sync cost: 190s
    async cost: 165s [parallel number: 10, batch: 500, period:15s]
    async cost: 140s [parallel number: 10, batch: 1000, period:10s]
    async cost: 140s [parallel number: 20, batch: 1000, period:20s]
    

    (3) Related parameters

    • a.parallelNum, b.parallelNum: Number of synchronization workers. The default value is 1, which uses the original logic.
    • a.parallelBatchSize, b.parallelBatchSize: Number of events handled at a time. This parameter takes effect only if parallelNum is greater than 1.
    • parallelMaxInterval: Maximum processing interval.

    (4) Bug fix: nanosecond timestamp display. The value of lastLogTsNs should come from time.Now().UnixNano() instead of time.Now().Nanosecond().
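
The difference between the two calls is easy to see on a fixed instant. In Go's `time` package, `Nanosecond()` returns only the offset within the current second (0 to 999999999), while `UnixNano()` returns the nanoseconds elapsed since the Unix epoch. The helper names `sampleInstant` and `timestamps` below are made up for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// sampleInstant returns a fixed point in time: 1656413146 seconds and
// 123 nanoseconds after the Unix epoch.
func sampleInstant() time.Time {
	return time.Unix(1656413146, 123)
}

// timestamps returns both values for the same instant so they can be compared.
func timestamps(t time.Time) (withinSecond int, sinceEpoch int64) {
	return t.Nanosecond(), t.UnixNano()
}

func main() {
	within, epoch := timestamps(sampleInstant())
	fmt.Println(within) // 123: only the sub-second part, useless as a log position
	fmt.Println(epoch)  // 1656413146000000123: a full timestamp
}
```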

    Checks

    • [x] I have added unit tests if possible.
    • [ ] I will add related wiki document changes and link to this PR after merging.
  • Volume server doesn't handle errors for GenerateDirUuid method.

    Describe the bug: If the volume server doesn't have permissions for the data directory, it connects to the master with an empty volume UUID rather than crashing with a helpful error message.

    Not a blocking issue, but would improve the onboarding/config experience to get better feedback instead of confusing errors elsewhere.

    System Setup

    mkdir ./restricted_folder
    chmod 600 ./restricted_folder
    chown root:root ./restricted_folder
    
    # as an unprivileged user:
    weed master
    weed volume -mserver=127.0.0.1:9333 -dir=./restricted_folder
    
    • OS version: Ubuntu server 22.04
    • output of weed version: version 30GB 3.11 d4ef06cdcf320f8b8b17279586e0738894869eff linux amd64
    • if using filer, show the content of filer.toml: N/A

    Expected behavior: The volume server crashes with a helpful message (e.g. data directory not accessible).

    Screenshots: The volume server connects to the master despite detecting permission issues:

    Jun 27 07:03:32 seaweedtest systemd[1]: Starting Seaweed Volume...
    Jun 27 07:03:32 seaweedtest systemd[1]: Started Seaweed Volume.
    Jun 27 07:03:32 seaweedtest weed[45598]: I0627 07:03:32 45598 file_util.go:23] Folder /home/seaweed/cdn Permission: -rw-------
    Jun 27 07:03:32 seaweedtest weed[45598]: W0627 07:03:32 45598 disk_location.go:53] failed to read uuid from /home/seaweed/cdn/vol_dir.uuid : open /home/seaweed/cdn/vol_dir.uuid: permission denied
    Jun 27 07:03:32 seaweedtest weed[45598]: I0627 07:03:32 45598 disk_location.go:209] Store started on dir: /home/seaweed/cdn with 0 volumes max 20
    Jun 27 07:03:32 seaweedtest weed[45598]: I0627 07:03:32 45598 disk_location.go:212] Store started on dir: /home/seaweed/cdn with 0 ec shards
    Jun 27 07:03:32 seaweedtest weed[45598]: I0627 07:03:32 45598 volume.go:364] Start Seaweed volume server 30GB 3.11 d4ef06cdcf320f8b8b17279586e0738894869eff at <IP_REMOVED>:8080
    Jun 27 07:03:32 seaweedtest weed[45598]: I0627 07:03:32 45598 volume_grpc_client_to_master.go:52] Volume server start with seed master nodes: [<IP_REMOVED>:9333]
    Jun 27 07:03:32 seaweedtest weed[45598]: I0627 07:03:32 45598 volume_grpc_client_to_master.go:109] Heartbeat to: <IP_REMOVED>:9333
    

    The master server shows the volume server as recognized, with an empty UUID and without any errors:

    I0627 06:51:44     1 node.go:223] topo adds child O
    I0627 06:51:44     1 node.go:223] topo:O adds child 1
    I0627 06:51:44     1 node.go:223] topo:O:1 adds child <IP_REMOVED>:8080
    I0627 06:51:44     1 node.go:223] topo:O:1:<IP_REMOVED>:8080 adds child
    I0627 06:51:44     1 master_grpc_server.go:112] added volume server 0: <IP_REMOVED>:8080 []
    I0627 06:51:44     1 master_grpc_server.go:48] found new uuid:<IP_REMOVED>:8080 [] , map[<IP_REMOVED>:8080:[]]
    

    If a second volume server with a similar issue connects:

    Jun 27 07:12:39 test-ansible weed[59529]: I0627 07:12:39 59529 volume_grpc_client_to_master.go:52] Volume server start with seed master nodes: [<IP_REMOVED>:9333]
    Jun 27 07:12:39 test-ansible weed[59529]: I0627 07:12:39 59529 volume_grpc_client_to_master.go:109] Heartbeat to: <IP_REMOVED>:9333
    Jun 27 07:12:39 test-ansible weed[59529]: E0627 07:12:39 59529 volume_grpc_client_to_master.go:130] Shut down Volume Server due to duplicate volume directories: [/home/seaweed/cdn]
    

    Additional context

    At disk_location.go:68, the error returned by GenerateDirUuid is not checked.

    At master_grpc_server.go:35, the UUID sent by the client is not validated to be non-empty or, better yet, to be a well-formed UUID (i.e. 36 characters: hex digits with dashes).
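
Both checks can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the SeaweedFS implementation: `validateDirUUID` and `readDirUUID` are hypothetical helpers, and only the file name `vol_dir.uuid` is taken from the log output above. The idea is to return an error instead of an empty string when the directory is unreadable, and to reject anything that is not a 36-character hex-and-dash UUID.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// uuidPattern matches the canonical 36-character form: 8-4-4-4-12 hex digits.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// validateDirUUID rejects empty or malformed identifiers (hypothetical helper).
func validateDirUUID(id string) error {
	if id == "" {
		return errors.New("empty directory uuid")
	}
	if !uuidPattern.MatchString(id) {
		return fmt.Errorf("malformed directory uuid %q", id)
	}
	return nil
}

// readDirUUID reads <dir>/vol_dir.uuid and returns an error, rather than an
// empty string, when the file is unreadable or its content is invalid.
func readDirUUID(dir string) (string, error) {
	data, err := os.ReadFile(filepath.Join(dir, "vol_dir.uuid"))
	if err != nil {
		return "", fmt.Errorf("data directory %s not accessible: %w", dir, err)
	}
	id := strings.TrimSpace(string(data))
	if err := validateDirUUID(id); err != nil {
		return "", err
	}
	return id, nil
}

func main() {
	fmt.Println(validateDirUUID(""))
	fmt.Println(validateDirUUID("123e4567-e89b-12d3-a456-426614174000"))
	// A server would treat this error as fatal at startup instead of
	// continuing with an empty UUID.
	_, err := readDirUUID("./restricted_folder")
	fmt.Println(err)
}
```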

Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.

Gleam Gleam is a high performance and efficient distributed execution system, and also simple, generic, flexible and easy to customize. Gleam is built

Jun 28, 2022
Golang implementation of distributed mutex on Azure lease blobs

Distributed Mutex on Azure Lease Blobs This package implements distributed lock available for multiple processes. Possible use-cases include exclusive

Jan 7, 2022
JuiceFS is a distributed POSIX file system built on top of Redis and S3.

JuiceFS is a high-performance POSIX file system released under GNU Affero General Public License v3.0. It is specially optimized for the cloud-native

Jul 4, 2022
Distributed reliable key-value store for the most critical data of a distributed system

etcd Note: The main branch may be in an unstable or even broken state during development. For stable versions, see releases. etcd is a distributed rel

Jun 30, 2022
Distributed disk storage database based on Raft and Redis protocol.

IceFireDB Distributed disk storage system based on Raft and RESP protocol. High performance Distributed consistency Reliable LSM disk storage Cold and

Jun 27, 2022
Lockgate is a cross-platform locking library for Go with distributed locks using Kubernetes or lockgate HTTP lock server as well as the OS file locks support.

Lockgate Lockgate is a locking library for Go. Classical interface: 2 types of locks: shared and exclusive; 2 modes of locking: blocking and non-block

Jun 16, 2022
Distributed-Services - Distributed Systems with Golang to consequently build a fully-fledged distributed service

Distributed-Services This project is essentially a result of my attempt to under

Jun 1, 2022
A distributed systems library for Kubernetes deployments built on top of spindle and Cloud Spanner.

hedge A library built on top of spindle and Cloud Spanner that provides rudimentary distributed computing facilities to Kubernetes deployments. Featur

Jan 4, 2022
BlobStore is a highly reliable,highly available and ultra-large scale distributed storage system

BlobStore Overview Documents Build BlobStore Deploy BlobStore Manage BlobStore License Overview BlobStore is a highly reliable,highly available and ul

Jun 30, 2022
A distributed MySQL binlog storage system built on Raft

What is kingbus? Kingbus is a distributed MySQL binlog store based on raft. Kingbus can act as a slave to the real master and as a master to the sl

Jun 16, 2022
A distributed key-value storage system developed by Alibaba Group

Product Overview Tair is fast-access memory (MDB)/persistent (LDB) storage service. Using a high-performance and high-availability distributed cluster

Jun 20, 2022
gathering distributed key-value datastores to become a cluster

go-ds-cluster gathering distributed key-value datastores to become a cluster About The Project This project is going to implement go-datastore in a fo

May 31, 2022
This is a comprehensive system that simulate multiple servers’ consensus behavior at local machine using multi-process deployment.

Raft simulator with Golang This project is a simulator for the Raft consensus protocol. It uses HTTP for inter-server communication, and a job schedul

Jan 30, 2022
Distributed lock manager. Warning: very hard to use it properly. Not because it's broken, but because distributed systems are hard. If in doubt, do not use this.

What Dlock is a distributed lock manager [1]. It is designed after flock utility but for multiple machines. When client disconnects, all his locks are

Dec 24, 2019
An implementation of a distributed KV store backed by Raft tolerant of node failures and network partitions 🚣

barge A simple implementation of a consistent, distributed Key:Value store which uses the Raft Consensus Algorithm. This project launches a cluster of

Nov 24, 2021
Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.

Dapr is a portable, serverless, event-driven runtime that makes it easy for developers to build resilient, stateless and stateful microservices that run on the cloud and edge and embraces the diversity of languages and developer frameworks.

Jun 28, 2022
A distributed locking library built on top of Cloud Spanner and TrueTime.

A distributed locking library built on top of Cloud Spanner and TrueTime.

Jun 8, 2022
CockroachDB - the open source, cloud-native distributed SQL database.

CockroachDB is a cloud-native distributed SQL database designed to build, scale, and manage modern, data-intensive applications. What is CockroachDB?

Jun 26, 2022
💡 A Distributed and High-Performance Monitoring System. The next generation of Open-Falcon

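Introduction to Nightingale: Nightingale is a distributed, highly available operations monitoring system. Its standout feature is hybrid-cloud support: it covers traditional physical and virtual machine scenarios as well as K8S container scenarios. Nightingale is also more than monitoring; it includes some CMDB and operations automation capabilities, and many companies build their own operations platforms on top of it. The open source modules are also part of the commercial version, so reliability is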

Jun 30, 2022