Hi team,
I hope everyone is doing well.
I sat down to write The Next Big Thing In Syncing Files For Developers and quickly came unstuck trying to find a reliable file-watching library for macOS, so I found myself wondering "how slow can it really be to just walk the folder being synced at a regular interval?"
Heaps slow, as I found out, so after lots of searching I landed upon the blog post that refers to this repo.
My test case was the monorepo I work in: it's 26GB (it has some envs and whatnot in there too at the moment) and contains about 500,000 files.
Here's my output:
$ go build -o bin/syncer cmd/syncer/main.go && bin/syncer -send -localPath ~/Projects/FTP/ims-mono -remotePath scratch/remote -remoteHost localhost
2022/07/10 05:09:47 500638 files in 5.432810501s <-- this is github.com/MichaelTJones/walk; absolutely smashes the CPU because goroutines
2022/07/10 05:10:09 500637 files in 21.849600317s <-- this is recursive os.ReadDir w/ string args
2022/07/10 05:10:30 500637 files in 21.596501124s <-- this is recursive file.ReadDir (like in this repo)
The code I was using is at https://github.com/initialed85/syncer; please forgive the WIP nature of it, any structure fell to pieces when I started having to hack my way around fsnotify problems and then work out walk performance issues.
So I figure either I'm doing something wrong, or the directory structure that this repo was testing against has some peculiarities that favour ReadDir.
Any ideas?