JuiceFS is a distributed POSIX file system built on top of Redis and S3.

JuiceFS Logo

Build Status Join Slack Go Report 中文手册

JuiceFS is a high-performance POSIX file system released under the GNU Affero General Public License v3.0. It is specially optimized for cloud-native environments. When you use JuiceFS to store data, the data itself is persisted in object storage (e.g. Amazon S3), while the corresponding metadata can be persisted in various database engines such as Redis, MySQL, and SQLite, depending on the needs of the scenario.

JuiceFS simply and conveniently connects massive cloud storage directly to big data, machine learning, artificial intelligence, and other application platforms that are already in production. Without modifying any code, you can use massive cloud storage as efficiently as local storage.

📺 Video: What is JuiceFS?

Highlighted Features

  1. Fully POSIX-compatible: Use it like a local file system; it integrates seamlessly with existing applications without intruding on business logic.
  2. Fully Hadoop-compatible: The JuiceFS Hadoop Java SDK is compatible with Hadoop 2.x and Hadoop 3.x, as well as a variety of components in the Hadoop ecosystem.
  3. S3-compatible: The JuiceFS S3 Gateway provides an S3-compatible interface.
  4. Cloud Native: JuiceFS provides a Kubernetes CSI driver for using JuiceFS in Kubernetes.
  5. Sharing: JuiceFS is a shared file storage that can be read and written by thousands of clients.
  6. Strong Consistency: Confirmed modifications are immediately visible on all hosts that mount the same file system.
  7. Outstanding Performance: Latency can be as low as a few milliseconds, and throughput scales to nearly unlimited levels. Test results
  8. Data Encryption: Supports data encryption in transit and at rest; read the guide for more information.
  9. Global File Locks: JuiceFS supports both BSD locks (flock) and POSIX record locks (fcntl).
  10. Data Compression: JuiceFS supports using LZ4 or Zstandard to compress all your data.

Architecture | Getting Started | Advanced Topics | POSIX Compatibility | Performance Benchmark | Supported Object Storage | Who is using | Roadmap | Reporting Issues | Contributing | Community | Usage Tracking | License | Credits | FAQ


Architecture

JuiceFS consists of three parts:

  1. JuiceFS Client: Coordinates the object storage and the metadata engine, and implements file system interfaces such as POSIX, Hadoop, Kubernetes, and the S3 gateway.
  2. Data Storage: Stores the data itself; supports local disk and object storage.
  3. Metadata Engine: Stores the metadata corresponding to the data; supports multiple engines such as Redis, MySQL, and SQLite.

JuiceFS Architecture

JuiceFS relies on Redis to store file system metadata. Redis is a fast, open-source, in-memory key-value data store, which makes it well suited to storing metadata. All file data is stored in object storage through the JuiceFS client. Learn more

JuiceFS Storage Format

Any file stored in JuiceFS is split into fixed-size "Chunks", with a default upper limit of 64 MiB. Each Chunk is composed of one or more "Slices". The length of a Slice is not fixed; it depends on the way the file is written. Each Slice is further split into fixed-size "Blocks", 4 MiB by default. Finally, these Blocks are stored in the object storage. At the same time, JuiceFS stores each file and its Chunks, Slices, Blocks and other metadata in the metadata engine. Learn more
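
As a rough illustration of the sizes above, the sketch below (not JuiceFS code) maps a byte offset in a file to the chunk and block that would contain it, using the default 64 MiB chunk and 4 MiB block sizes; it deliberately ignores Slices, whose lengths depend on how the file was written.

package main

import "fmt"

// Default sizes from the storage format described above. These constants
// are for illustration only; the real values are configurable at format time.
const (
    chunkSize = 64 << 20 // 64 MiB per chunk
    blockSize = 4 << 20  // 4 MiB per block
)

// locate maps a byte offset in a file to the chunk and block that would
// contain it, plus the offset inside that block.
func locate(offset int64) (chunk, block, offsetInBlock int64) {
    chunk = offset / chunkSize
    inChunk := offset % chunkSize
    block = inChunk / blockSize
    offsetInBlock = inChunk % blockSize
    return
}

func main() {
    for _, off := range []int64{0, 5 << 20, 100 << 20} {
        c, b, o := locate(off)
        fmt.Printf("offset %d -> chunk %d, block %d, offset in block %d\n", off, c, b, o)
    }
}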

How JuiceFS stores your files

Using JuiceFS, files are eventually split into Chunks, Slices and Blocks and stored in object storage. As a result, the source files stored in JuiceFS cannot be found in the file browser of the object storage platform; there is only a chunks directory and a bunch of numerically named directories and files in the bucket. Don't panic, this is the secret of JuiceFS's high performance!

Getting Started

To create a JuiceFS file system, you need the following three things:

  1. A Redis database for metadata storage
  2. Object storage for storing data blocks
  3. The JuiceFS client

Please refer to Quick Start Guide to start using JuiceFS immediately!

Command Reference

See the command reference for all subcommands and their options.

Kubernetes

Using JuiceFS on Kubernetes is easy; give it a try.

Hadoop Java SDK

If you want to use JuiceFS in Hadoop, check the Hadoop Java SDK.

Advanced Topics

Please refer to JuiceFS User Manual for more information.

POSIX Compatibility

JuiceFS passed all of the 8813 tests in the latest pjdfstest.

All tests successful.

Test Summary Report
-------------------
/root/soft/pjdfstest/tests/chown/00.t          (Wstat: 0 Tests: 1323 Failed: 0)
  TODO passed:   693, 697, 708-709, 714-715, 729, 733
Files=235, Tests=8813, 233 wallclock secs ( 2.77 usr  0.38 sys +  2.57 cusr  3.93 csys =  9.65 CPU)
Result: PASS

Besides the things covered by pjdfstest, JuiceFS provides:

  • Close-to-open consistency. Once a file is closed, subsequent opens and reads are guaranteed to see the data written before the close. Within the same mount point, reads see all written data immediately (see the sketch after this list).
  • Rename and all other metadata operations are atomic, guaranteed by Redis transactions.
  • Open files remain accessible after unlink from the same mount point.
  • Mmap is supported (tested with FSx).
  • Fallocate with punch hole support.
  • Extended attributes (xattr).
  • BSD locks (flock).
  • POSIX record locks (fcntl).
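
To make the close-to-open guarantee concrete, here is a minimal sketch in Go, assuming a JuiceFS volume mounted at the hypothetical path /jfs: once the writer has closed the file, any open performed afterwards, even from another host mounting the same volume, is guaranteed to return the written data.

package main

import (
    "fmt"
    "os"
)

func main() {
    // Writer side: write the file and close it. os.WriteFile closes the file
    // before returning, so the data is committed for close-to-open consistency.
    if err := os.WriteFile("/jfs/hello.txt", []byte("hello juicefs\n"), 0644); err != nil {
        panic(err)
    }

    // Reader side: this open happens after the close above, so it is
    // guaranteed to see the data, even if it runs on a different client
    // that mounts the same file system.
    data, err := os.ReadFile("/jfs/hello.txt")
    if err != nil {
        panic(err)
    }
    fmt.Print(string(data))
}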

Performance Benchmark

Basic benchmark

JuiceFS provides a subcommand to run a few basic benchmarks to understand how it works in your environment:

JuiceFS Bench

Throughput

We performed a sequential read/write benchmark on JuiceFS, EFS and S3FS with fio; here is the result:

Sequential Read Write Benchmark

It shows that JuiceFS can provide 10X more throughput than the other two; read more details.

Metadata IOPS

We performed a simple metadata benchmark on JuiceFS, EFS and S3FS with mdtest; here is the result:

Metadata Benchmark

It shows that JuiceFS can provide significantly more metadata IOPS than the other two; read more details.

Analyze performance

There is a virtual file called .accesslog in the root of JuiceFS that shows all the operations and the time they take, for example:

$ cat /jfs/.accesslog
2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>
2021.01.15 08:26:11.003473 [uid:0,gid:0,pid:4403] write (17675,198,997439): OK <0.000014>
2021.01.15 08:26:11.003616 [uid:0,gid:0,pid:4403] write (17666,390,951582): OK <0.000006>

The last number on each line is the time (in seconds) the current operation took. You can use this directly to debug and analyze performance issues, or try ./juicefs profile /jfs to monitor real-time statistics. Please run ./juicefs profile -h or refer to here to learn more about this subcommand.
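
If you prefer to crunch the access log yourself, here is a minimal sketch (not a JuiceFS tool) that reads lines in the format shown above from standard input and prints the average latency per operation. You could feed it a captured slice of the log, for example by piping the output of head -n 10000 /jfs/.accesslog into it.

package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    sum := map[string]float64{}
    count := map[string]int{}

    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        line := scanner.Text()
        // The operation name is the first token after "] ", and the elapsed
        // time is the number inside the trailing angle brackets, e.g.
        // "[uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>".
        i := strings.Index(line, "] ")
        j := strings.LastIndex(line, "<")
        if i < 0 || j < 0 || !strings.HasSuffix(line, ">") {
            continue
        }
        fields := strings.Fields(line[i+2:])
        if len(fields) == 0 {
            continue
        }
        elapsed, err := strconv.ParseFloat(line[j+1:len(line)-1], 64)
        if err != nil {
            continue
        }
        sum[fields[0]] += elapsed
        count[fields[0]]++
    }
    for op, total := range sum {
        fmt.Printf("%-10s count=%d avg=%.6fs\n", op, count[op], total/float64(count[op]))
    }
}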

Supported Object Storage

  • Amazon S3
  • Google Cloud Storage
  • Azure Blob Storage
  • Alibaba Cloud Object Storage Service (OSS)
  • Tencent Cloud Object Storage (COS)
  • QingStor Object Storage
  • Ceph RGW
  • MinIO
  • Local disk
  • Redis

JuiceFS supports almost all object storage services. Learn more.

Who is using

JuiceFS is considered beta quality and the storage format is not stabilized yet. If you want to use it in a production environment, please do a careful and serious evaluation first. If you are interested in it, please test it as soon as possible and give us feedback.

You are welcome to tell us about your experience after using JuiceFS and share it with everyone. We have also collected a summary list in ADOPTERS.md, which also includes other open source projects used with JuiceFS.

Roadmap

  • Stabilize storage format
  • Support FoundationDB as meta engine
  • User and group quotas
  • Directory quotas
  • Snapshot
  • Write once read many (WORM)
  • Trash

Reporting Issues

We use GitHub Issues to track community-reported issues. You can also contact the community to get answers.

Contributing

Thank you for your contribution! Please refer to the CONTRIBUTING.md for more information.

Community

Welcome to join the Discussions and the Slack channel to connect with JuiceFS team members and other users.

Usage Tracking

JuiceFS collects anonymous usage data by default. It only collects core metrics (e.g. version number); no user information or other sensitive data is collected. You can review the related code here.

This data helps us understand how the community is using this project. You can easily disable reporting with the command-line option --no-usage-report:

$ ./juicefs mount --no-usage-report

License

JuiceFS is open-sourced under GNU AGPL v3.0, see LICENSE.

Credits

The design of JuiceFS was inspired by Google File System, HDFS and MooseFS, thanks to their great work.

FAQ

Why doesn't JuiceFS support XXX object storage?

JuiceFS already supports many object storage services; please check the list first. If your object storage is compatible with S3, you can treat it as S3. Otherwise, try reporting an issue.

Can I use Redis cluster?

The simple answer is no. JuiceFS uses transactions to guarantee the atomicity of metadata operations, which are not well supported in Redis Cluster mode. Sentinel or another HA solution for Redis is needed.

See "Redis Best Practices" for more information.

What's the difference between JuiceFS and XXX?

See "Comparison with Others" for more information.

For more FAQs, please see the full list.

Owner
Juicedata, Inc
Builds the best file system for cloud.
Comments
  • [MariaDB] Error 1366: Incorrect string value

    What happened:

    While rsyncing from a local disk to a JuiceFS mount, it suddenly (after several hours) stopped with

    Failed to sync with 11 errors: last error was: open /mnt/juicefs/folder/file.xls: input/output error

    In the JuiceFS log, I can find these:

    juicefs[187516] <ERROR>: error: Error 1366: Incorrect string value: '\xE9sa sa...' for column `jfsdata`.`jfs_edge`.`name` at row 1
    goroutine 43510381 [running]:
    runtime/debug.Stack()
            /usr/local/go/src/runtime/debug/stack.go:24 +0x65
    github.com/juicedata/juicefs/pkg/meta.errno({0x2df8860, 0xc0251d34a0})
            /go/src/github.com/juicedata/juicefs/pkg/meta/utils.go:76 +0xc5
    github.com/juicedata/juicefs/pkg/meta.(*dbMeta).doMknod(0xc0000e0c40, {0x7fca7e165300, 0xc00ed28040}, 0x399f6f, {0xc025268160, 0x1f}, 0x1, 0x1b4, 0x0, 0x0, ...)
            /go/src/github.com/juicedata/juicefs/pkg/meta/sql.go:1043 +0x29e
    github.com/juicedata/juicefs/pkg/meta.(*baseMeta).Mknod(0xc0000e0c40, {0x7fca7e165300, 0xc00ed28040}, 0x399f6f, {0xc025268160, 0x1f}, 0xc0, 0x7b66, 0x7fca, 0x0, ...)
            /go/src/github.com/juicedata/juicefs/pkg/meta/base.go:594 +0x275
    github.com/juicedata/juicefs/pkg/meta.(*baseMeta).Create(0xc0000e0c40, {0x7fca7e165300, 0xc00ed28040}, 0x26b5620, {0xc025268160, 0x2847500}, 0x8040, 0xed2, 0x8241, 0xc025267828, ...)
            /go/src/github.com/juicedata/juicefs/pkg/meta/base.go:601 +0x109
    github.com/juicedata/juicefs/pkg/vfs.(*VFS).Create(0xc000140640, {0x2e90348, 0xc00ed28040}, 0x399f6f, {0xc025268160, 0x1f}, 0x81b4, 0x22a4, 0xc0)
            /go/src/github.com/juicedata/juicefs/pkg/vfs/vfs.go:357 +0x256
    github.com/juicedata/juicefs/pkg/fuse.(*fileSystem).Create(0xc000153900, 0xc024980101, 0xc022a48a98, {0xc025268160, 0x1f}, 0xc022a48a08)
            /go/src/github.com/juicedata/juicefs/pkg/fuse/fuse.go:221 +0xcd
    github.com/hanwen/go-fuse/v2/fuse.doCreate(0xc022a48900, 0xc022a48900)
            /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/opcode.go:163 +0x68
    github.com/hanwen/go-fuse/v2/fuse.(*Server).handleRequest(0xc00179c000, 0xc022a48900)
            /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/server.go:483 +0x1f3
    github.com/hanwen/go-fuse/v2/fuse.(*Server).loop(0xc00179c000, 0x20)
            /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/server.go:456 +0x110
    created by github.com/hanwen/go-fuse/v2/fuse.(*Server).readRequest
            /go/pkg/mod/github.com/juicedata/go-fuse/[email protected]/fuse/server.go:323 +0x534
     [utils.go:76]
    

    Nothing was logged at MariaDB side.

    Environment:

    • juicefs version 1.0.0-beta2+2022-03-04T03:00:41Z.9e26080
    • Ubuntu 20.04
    • MariaDB 10.4
  • failed to create fs on oos

    What happened: I can't create a file system on OOS.
    What you expected to happen: I can create a file system on OOS.
    How to reproduce it (as minimally and precisely as possible):

    juicefs format --storage oos --bucket https://cyn.oos-hz.ctyunapi.cn \
    --access-key xxxxxx \
    --secret-key xxxxxxx \
    redis://:[email protected]:6379/1 \
    myjfs

    Anything else we need to know?

    Environment:

    • JuiceFS version (use juicefs --version) or Hadoop Java SDK version: juicefs version 1.0.0-beta3+2022-05-05.0fb9155
    • Cloud provider or hardware configuration running JuiceFS:
    • OS (e.g cat /etc/os-release): Centos 7
    • Kernel (e.g. uname -a): Linux ecs-df87 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
    • Object storage (cloud provider and region, or self maintained): oos
    • Metadata engine info (version, cloud provider managed or self maintained): redis:6
    • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
    • Others:
  • OVH S3 compatible storage AuthorizationHeaderMalformed the region 'us-east-1' is wrong; expecting 'gra'

    Hi, I tried to configure JuiceFS with an S3-compatible vendor, OVH.

    juicefs format --storage s3 \
        --bucket https://s3.gra.io.cloud.ovh.net/mybucket \
        --access-key $ACCESS_KEY \
        --secret-key $SECRET_KEY \
        "redis://127.0.0.1:9190/1" \
        datavault
    

    with the response:

    2022/12/27 01:06:21.060580 juicefs[97465] <INFO>: Meta address: redis://127.0.0.1:9190/1 [interface.go:402]
    2022/12/27 01:06:21.080483 juicefs[97465] <INFO>: Ping redis: 1.546708ms [redis.go:2878]
    2022/12/27 01:06:21.083500 juicefs[97465] <INFO>: Data use s3://mybucket/datavault/ [format.go:435]
    2022/12/27 01:06:33.842261 juicefs[97465] <FATAL>: Storage s3://mybucket/datavault/ is not configured correctly: Failed to create bucket s3://mybucket/datavault/: AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'gra'
            status code: 400, request id: tx25660a4d66ac46a994f94-0063aa3702, host id: tx25660a4d66ac46a994f94-0063aa3702, previous error: AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'gra'
            status code: 400, request id: tx9552f7f3bed04a04bed96-0063aa3702, host id: tx9552f7f3bed04a04bed96-0063aa3702 [format.go:438]
    

    juicefs version 1.0.2+2022-10-13.514ef03 on macOS 11.4

    I can confirm that rclone is working perfectly

  • Performance degenerates a lot when reading from multiple threads compared with a single thread (when running Clickhouse)

    What happened:

    Hi folks,

    We are trying to run the ClickHouse benchmark on JuiceFS (with OSS as the underlying object storage). Under settings where JuiceFS has already cached the whole file to the local disk, we notice a huge performance gap (compared with running the benchmark on a local SSD) when executing ClickHouse with 4 threads, but such degradation doesn't happen if we limit ClickHouse to 1 thread.

    More specifically, we are running the ClickHouse benchmark with scale factor 1000, playing the 29th query (the involved table Referer is around 24 GiB, and the query is a full table scan), and giving ClickHouse a 100 GiB local SSD as the cache directory.

    After several runs to make sure the involved files are fully cached locally by JuiceFS, we notice the following performance numbers:

    | threads | ssd runtime (seconds) | juicefs runtime (seconds) |
    |:-------:|:---------------------:|:-------------------------:|
    |    4    |          24           |            56             |
    |    1    |          88           |           100             |

    You can see that JuiceFS suffers much more performance degradation when the workload executes in a multi-threaded fashion. Is that behaviour expected for JuiceFS?

    Thanks!

    What you expected to happen:

    The performance gap shouldn't be this large with 4 threads.

    How to reproduce it (as minimally and precisely as possible):

    Playing the clickhouse benchmark inside a juicefs mounted directory.

    Anything else we need to know?

    Environment:

    • JuiceFS version (use juicefs --version) or Hadoop Java SDK version: juicefs version 1.0.0-beta2+2022-03-04T03:00:41Z.9e26080
    • Cloud provider or hardware configuration running JuiceFS: aliyun ecs.i3g.2xlarge, (local ssd instance with 4 physical cores and 32Gi memory)
    • OS (e.g cat /etc/os-release): Ubuntu 20.04.3 LTS
    • Kernel (e.g. uname -a): Linux mk1 5.4.0-100-generic #113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    • Object storage (cloud provider and region, or self maintained): OSS
    • Metadata engine info (version, cloud provider managed or self maintained): redis
    • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage): localhost, redis and juicefs are deployed on the same instance
    • Others: clickhouse latest version
  • Out of Memory 0.13.1

    What happened: JuiceFS eats every last bit of memory + swap, then gets killed by the OOM killer.

    What you expected to happen: shouldn't it use less memory?

    How to reproduce it (as minimally and precisely as possible): store 8 files of 104 GB each in JuiceFS, do random reads on them, and watch how memory consistently climbs until exhaustion.

    Anything else we need to know?

    Environment:

    • JuiceFS version (use ./juicefs --version) or Hadoop Java SDK version: juicefs version 0.13.1 (2021-05-27T08:14:30Z 1737d4e)
    • Cloud provider or hardware configuration running JuiceFS: 4 core, 16gb ram, backblaze b2
    • OS (e.g: cat /etc/os-release): ubuntu 20.04.2 LTS (Focal Fossa)
    • Kernel (e.g. uname -a): 5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    • Object storage (cloud provider and region): Backblaze B2 Europe
    • Redis info (version, cloud provider managed or self maintained): self maintained latest, from Docker
    • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): juicefs and redis are locally, 5gbit to backblaze b2
    • Others:
  • Performance 3x ~ 8x slower than s5cmd (for large files)

    While comparing basic read/write operations, it appears that s5cmd is 3x ~ 8x faster than JuiceFS.

    What happened:

    #### WRITE IO ####

    $ time cp 1gb_file.txt /mnt/juicefs0/
    real	0m50.859s
    user	0m0.016s
    sys	0m1.365s
    
    $ time s5cmd cp 1gb_file.txt s3://bucket/path/
    real	0m20.614s
    user	0m9.411s
    sys	0m3.232s
    

    #### READ IO ####

    $ time cp /mnt/juicefs0/1gb_file.txt .
    real	0m45.539s
    user	0m0.014s
    sys	0m1.578s
    
    $ time s5cmd cp s3://bucket/path/1gb_file.txt .
    real	0m6.074s
    user	0m1.186s
    sys	0m2.504s
    

    Environment:

    • JuiceFS version or Hadoop Java SDK version: juicefs version 0.12.1 (2021-04-15T08:18:25Z 7b4df23)
    • Cloud provider or hardware configuration running JuiceFS: Linode 1 GB VM
    • OS: Fedora 33 (Server Edition)
    • Kernel: Linux 5.11.12-200.fc33.x86_64 #1 SMP Thu Apr 8 02:34:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    • Object storage: Linode
    • Redis info: Redis 6.2.1
    • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): redis (local), S3 Object Storage (Linode)
  • Errors on WebDAV operations, but files are created

    JuiceFS (1.0.0-rc1) outputs a 401, but the bucket is created (the folder 'jfs' is created at the root path) on the server, so credentials are OK. Happy to provide more debug info if needed (note: I have no admin access on the WebDAV server).

    # juicefs format --storage webdav --bucket https://web.dav.server/ --access-key 307399 --secret-key tTxX12NPMyy --trash-days 0 redis://127.0.0.1:6379/1 jfs
    
    2022/06/17 11:45:19.373300 juicefs[799909] <INFO>: Meta address: redis://127.0.0.1:6379/1 [interface.go:397]
    2022/06/17 11:45:19.375116 juicefs[799909] <INFO>: Ping redis: 82.354µs [redis.go:2869]
    2022/06/17 11:45:19.375437 juicefs[799909] <INFO>: Data use webdav://web.dav.server/jfs/ [format.go:420]
    2022/06/17 11:45:19.529334 juicefs[799909] <WARNING>: List storage webdav://web.dav.server/jfs/ failed: 401 Unauthorized: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>401 Unauthorized</title>
    </head><body>
    <h1>Unauthorized</h1>
    <p>This server could not verify that you
    are authorized to access the document
    requested.  Either you supplied the wrong
    credentials (e.g., bad password), or your
    browser doesn't understand how to supply
    the credentials required.</p>
    </body></html> [format.go:438]
    2022/06/17 11:45:19.535569 juicefs[799909] <INFO>: Volume is formatted as {Name:jfs UUID:05363857-2fce-42dc-94e2-4d41c33172d0 Storage:webdav Bucket:https://web.dav.server/ AccessKey:307399 SecretKey:removed BlockSize:4096 Compression:none Shards:0 HashPrefix:false Capacity:0 Inodes:0 EncryptKey: KeyEncrypted:true TrashDays:0 MetaVersion:1 MinClientVersion: MaxClientVersion:} [format.go:458]
    

    or trying to destroy it:

    # juicefs destroy redis://127.0.0.1:6379/1 05363857-2fce-42dc-94e2-4d41c33172d0
    
    2022/06/17 11:50:24.072353 juicefs[800639] <INFO>: Meta address: redis://127.0.0.1:6379/1 [interface.go:397]
    2022/06/17 11:50:24.073182 juicefs[800639] <INFO>: Ping redis: 67.778µs [redis.go:2869]
     volume name: jfs
     volume UUID: 05363857-2fce-42dc-94e2-4d41c33172d0
    data storage: webdav://web.dav.server/jfs/
      used bytes: 0
     used inodes: 0
    WARNING: The target volume will be destoried permanently, including:
    WARNING: 1. ALL objects in the data storage: webdav://web.dav.server/jfs/
    WARNING: 2. ALL entries in the metadata engine: redis://127.0.0.1:6379/1
    Proceed anyway? [y/N]: y
    2022/06/17 11:50:25.693697 juicefs[800639] <FATAL>: list all objects: 401 Unauthorized: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>401 Unauthorized</title>
    </head><body>
    <h1>Unauthorized</h1>
    <p>This server could not verify that you
    are authorized to access the document
    requested.  Either you supplied the wrong
    credentials (e.g., bad password), or your
    browser doesn't understand how to supply
    the credentials required.</p>
    </body></html> [destroy.go:158]
    

    On file operations, files are created (the chunks folder is populated), but this is logged:

    2022/06/17 12:02:54.141831 juicefs[802424] <INFO>: Mounting volume jfs at /mnt/jfs ... [mount_unix.go:181]
    2022/06/17 12:04:31.053268 juicefs[802424] <WARNING>: Upload chunks/0/0/7_0_140: 403 Forbidden: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>403 Forbidden</title>
    </head><body>
    <h1>Forbidden</h1>
    <p>You don't have permission to access this resource.</p>
    </body></html> (try 1) [cached_store.go:462]
    

    Accessing WebDAV directly (outside JuiceFS), we can see correct file operations.

  • freezing when tikv is down

    With 7 pd-server nodes, killing 2 pd-server nodes causes JuiceFS to freeze.

    [2021/09/02 15:57:58.152 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.35:20160] [forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:57:59.021 +08:00] [ERROR] [client.go:599] ["[pd] getTS error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"] [stack="github.com/tikv/pd/client.(*client).handleDispatcher\n\t/root/hanson/go/pkg/mod/github.com/tikv/[email protected]/client/client.go:599"] [2021/09/02 15:57:59.022 +08:00] [ERROR] [pd.go:234] ["updateTS error"] [txnScope=global] [error="rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"] [errorVerbose="rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster\ngithub.com/tikv/pd/client.(*client).processTSORequests\n\t/root/hanson/go/pkg/mod/github.com/tikv/[email protected]/client/client.go:717\ngithub.com/tikv/pd/client.(*client).handleDispatcher\n\t/root/hanson/go/pkg/mod/github.com/tikv/[email protected]/client/client.go:587\nruntime.goexit\n\t/snap/go/7954/src/runtime/asm_amd64.s:1371\ngithub.com/tikv/pd/client.(*tsoRequest).Wait\n\t/root/hanson/go/pkg/mod/github.com/tikv/[email protected]/client/client.go:913\ngithub.com/tikv/pd/client.(*client).GetTS\n\t/root/hanson/go/pkg/mod/github.com/tikv/[email protected]/client/client.go:933\ngithub.com/tikv/client-go/v2/util.InterceptedPDClient.GetTS\n\t/root/hanson/go/pkg/mod/github.com/tikv/client-go/[email protected]/util/pd_interceptor.go:79\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).getTimestamp\n\t/root/hanson/go/pkg/mod/github.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:141\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS.func1\n\t/root/hanson/go/pkg/mod/github.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:232\nsync.(*Map).Range\n\t/snap/go/7954/src/sync/map.go:345\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS\n\t/root/hanson/go/pkg/mod/github.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:230\nruntime.goexit\n\t/snap/go/7954/src/runtime/asm_amd64.s:1371"] [stack="github.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS.func1\n\t/root/hanson/go/pkg/mod/github.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:234\nsync.(*Map).Range\n\t/snap/go/7954/src/sync/map.go:345\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS\n\t/root/hanson/go/pkg/mod/github.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:230"] [2021/09/02 15:57:59.317 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.36:20160] [forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:58:00.608 +08:00] [WARN] [prewrite.go:198] ["slow prewrite request"] [startTS=427443780657872897] [region="{ region id: 4669, ver: 35, confVer: 1007 }"] [attempts=280] [2021/09/02 15:58:04.317 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.36:20160] [forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:58:09.318 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.36:20160] [forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:58:14.319 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.36:20160] 
[forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:58:15.831 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.35:20160] [forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:58:20.832 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.35:20160] [forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:58:25.834 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.36:20160] [forwardedHost=] [error="context deadline exceeded"] [2021/09/02 15:58:30.834 +08:00] [WARN] [client_batch.go:497] ["init create streaming fail"] [target=10.188.19.36:20160] [forwardedHost=] [error="context deadline exceeded"]

    What happened:

    What you expected to happen:

    How to reproduce it (as minimally and precisely as possible):

    Anything else we need to know?

    Environment:

    • JuiceFS version (use ./juicefs --version) or Hadoop Java SDK version:
    • Cloud provider or hardware configuration running JuiceFS:
    • OS (e.g: cat /etc/os-release):
    • Kernel (e.g. uname -a):
    • Object storage (cloud provider and region):
    • Redis info (version, cloud provider managed or self maintained):
    • Network connectivity (JuiceFS to Redis, JuiceFS to object storage):
    • Others:
  • Deleted tag cause misbehaviour of `go get`

    What happened:

    go get: added github.com/juicedata/juicefs v0.14.0
    

    What you expected to happen:

    go get: added github.com/juicedata/juicefs v0.13.1
    

    How to reproduce it (as minimally and precisely as possible):

    go get github.com/juicedata/juicefs
    

    Anything else we need to know? it looks like you probably deleted that tag, which is not supported by proxy.golang.org (see the FAQ)

    Also, the current latest tag, v0.14-dev, is not valid semver, so it is being ignored and mangled by the toolchain (try v0.14.0-alpha):

    [user@localhost ~]$ go get github.com/juicedata/[email protected]
    go: downloading github.com/juicedata/juicefs v0.13.2-0.20210527090717-42ac85ce406c
    go get: downgraded github.com/juicedata/juicefs v0.14.0 => v0.13.2-0.20210527090717-42ac85ce406c
    
  • Mounting a fs with postgres + search_path throws database is not formatted, please run `juicefs format`

    What happened: Mounting an S3-compatible file system with Postgres + schema as meta storage throws "database is not formatted". Executing without setting search_path, and thus using public, works fine.

    What you expected to happen: The fs to be mounted error free

    How to reproduce it (as minimally and precisely as possible):

    juicefs format --storage s3 --bucket https://sos-de-muc-1.exo.io/testjuice-pgsql   --access-key xxx  --secret-key xxx postgres://user:pass@dburl/db?search_path=schema pgjuice
    Volume is formatted as {
      "Name": "pgjuice",
      "UUID": "33e5fe46-0fff-4a06-83c0-64251ef64df4",
      "Storage": "s3",
      "Bucket": "https://sos-de-muc-1.exo.io/testjuice-pgsql",
      "AccessKey": "vvvv",
      "SecretKey": "removed",
      "BlockSize": 4096,
      "Compression": "none",
      "KeyEncrypted": true,
      "TrashDays": 1,
      "MetaVersion": 1
    } [format.go:472]
    
    juicefs mount --background postgres://user:pass@dburl/db?search_path=schema /mnt/pgjuice
    2022/10/22 10:07:12.533714 juicefs[32846] <INFO>: Meta address: postgres://xxxx?search_path=schema [interface.go:402]
    2022/10/22 10:07:12.545518 juicefs[32846] <WARNING>: The latency to database is too high: 11.463676ms [sql.go:203]
    2022/10/22 10:07:12.548892 juicefs[32846] <FATAL>: load setting: database is not formatted, please run `juicefs format ...` first [main.go:31]
    

    Anything else we need to know?

    Environment:

    • juicefs version 1.0.2+2022-10-13.514ef03
    • Debian 11
    • Kernel (e.g. uname -a): 5.10.0-10-amd64 # 1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux
    • Object storage (cloud provider and region, or self maintained): Managed Exoscale SOS Germany Munich
    • Metadata engine info (version, cloud provider managed or self maintained): Managed Postgres 14 on Exoscale
  • Metadata Dump doesn't complete on MySQL

    What happened:

    Dump is incomplete.

    What you expected to happen:

    Dump completes successfully.

    How to reproduce it (as minimally and precisely as possible):

    Observe the dumped entries.

    juicefs dump "mysql://mount:Passw0d@(10.1.0.9:3306)/mount" /tmp/juicefs.dump
    
    2022/05/26 15:13:07.548210 juicefs[1785019] <WARNING>: no found chunk target for inode 854691 indx 0 [sql.go:2747]
    2022/05/26 15:13:07.548245 juicefs[1785019] <WARNING>: no found chunk target for inode 854709 indx 0 [sql.go:2747]
    2022/05/26 15:13:07.548264 juicefs[1785019] <WARNING>: no found chunk target for inode 854726 indx 0 [sql.go:2747]
    2022/05/26 15:13:07.548288 juicefs[1785019] <WARNING>: no found chunk target for inode 4756339 indx 0 [sql.go:2747]
     Snapshot keys count: 4727518 / 4727518 [==============================================================]  done
    Dumped entries count: 1370 / 1370 [==============================================================]  done
    

    Anything else we need to know?

    Environment:

    # juicefs --version
    juicefs version 1.0.0-dev+2022-05-26.54ddc5c4
    
    {
      "Setting": {
        "Name": "mount",
        "UUID": "5cd22d3c-12d6-4be4-9a52-19e753c416e9",
        "Storage": "s3",
        "Bucket": "https://data-jfs%d.s3.de",
        "AccessKey": "d61bc82480ab49fb8d",
        "SecretKey": "removed",
        "BlockSize": 4096,
        "Compression": "zstd",
        "Shards": 512,
        "HashPrefix": false,
        "Capacity": 0,
        "Inodes": 0,
        "EncryptKey": "/YShECK6Tirb0uHljlK8PIJ12C4Fj2idW5hbzARwYaGDGoSU>
        "KeyEncrypted": true,
        "TrashDays": 90,
        "MetaVersion": 1,
        "MinClientVersion": "",
        "MaxClientVersion": ""
      },
      "Counters": {
        "usedSpace": 4568750424064,
        "usedInodes": 4723758,
        "nextInodes": 4923302,
        "nextChunk": 4062001,
        "nextSession": 137,
        "nextTrash": 667
      },
    }
    
  • move troubleshooting content into troubleshooting.md, and improve them

    • remove warmup content (already covered in command_reference.md) and leave references instead
    • add writeback caveats
    • improve kernel metadata cache descriptions
  • Easily Reduce ListObjects Sorting Incompatibilities

    What would you like to be added: Lift "lexicographically sorted" requirement for metadata backup cleanup, gc, fsck, and destroy. It appears to me as though backup-cleanup already sorts the result itself, and that gc, fsck, and destroy do not actually require sorted results to function correctly. Only sync seems to actually care and depend on sorted results. Further, if that interpretation is correct, it also seems that lifting the restriction on these commands (all except sync) can easily be done by adding a simple "fail if unsorted" boolean as input to sync.ListAll(), checking that flag before triggering the "keys out of order" error in sync.ListAll(), and setting that boolean to false for backup/gc/fsck/destroy (true for sync).

    Why is this needed: JuiceFS currently does not support Storj DCS and Cloudflare R2 object stores for several useful features (metadata backup, gc, fsck, destroy, sync), unless using an intermediate gateway that sorts results. Most of these limitations appear superficial and very easily supported per description above. It appears that only sync() would require any non-trivial effort to make compatible, which could be deferred to the future.

  • Writeback cache synchronization issue

    What happened: JuiceFS appears to have a writeback cache synchronization issue. Example below may be exacerbated by slow IO and intolerance for misbehavior of the storage provider (STORJ), but it seems to me that the root cause is within JuiceFS.

    See chunk 11008_0_4194304 in attached log; several parallel upload attempts (resulting in server errors), continuing to try to upload after file has been deleted (at time 21:32:04), and other general signs of missing synchronization. Also appears (not observable in log below) as though --upload-delay is not being fully obeyed (uploads begin within about one second of file creation). This example is very repeatable. This problem does not occur without --writeback in the mount command.

    What you expected to happen: One concurrent upload attempt per chunk, and no attempt to continue uploading significantly after chunk has been deleted locally.

    How to reproduce it (as minimally and precisely as possible): Mount command: juicefs mount --no-usage-report --cache-size 512 --writeback -o allow_other --upload-delay 5 --backup-meta 0 --max-uploads 5 --verbose sqlite3://storj-test.db /jfs

    Test scenario: create 10 4MB files in rapid succession: for i in {1..10}; do dd if=/dev/urandom bs=1M count=4 of=./test${i}; done

    Environment:

    • JuiceFS version: juicefs version 1.0.3+2022-12-27.ec6c8abd
    • OS: Linux (AlmaLinux 8.7)
    • Kernel: 4.18.0-425.3.1.el8.x86_64
    • Object storage: STORJ
    • Metadata engine info: SQLite

    Log: See juicefs-storj-writeback-issue.log. Reflects that I have placed additional DEBUG statements just before and after s3.go:Put()'s call to s3.PutObject() to help clarify the misbehavior. Redacted my IP with <MY_IP>, juicefs cache path with <JFS_CACHE>

  • spawn background processes with context timeout

    Some background processes may cost more time than 1 period. For example, if a subentry of trash is corrupted and cannot be parsed as yyyy-mm-dd-hh, the doCleanupTrash goroutine will run into deadloop (fixed in https://github.com/juicedata/juicefs/pull/3032, but still exists in release-1.0).
