Processing large file - go

not_yet_hit_the_wall

After reading Marcel Lanz's tweet (somebody seems to have liked it, so it showed up in my Twitter home feed) and his blog post (https://marcellanz.com/post/file-read-challenge/ -- read it... it's fun and helpful!), I tried to get his rev9 working and ran it on my newly built mini-ITX PC (Ryzen 5700G with 32GB RAM and an NVMe drive), with no other program open except the terminal, using the latest Go 1.17.

It's faaaaaaast, only 2.5s to process the file. So I gave it a try to see if I could change something to make it run faster: I replaced strings.SplitN with code that allocates less, and saw an improvement.
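To illustrate what "less-alloc" can mean here, below is a minimal sketch, not the actual change from the repository. It assumes pipe-delimited records and pulls out a single field with strings.IndexByte instead of building a []string with strings.SplitN. The helper name nthField, the sample record, and the field position are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// nthField returns the n-th (zero-based) field of line, using sep as the
// delimiter. It only re-slices the input string, so it allocates nothing,
// unlike strings.SplitN, which builds a []string for every line.
// ok is false when line has fewer than n+1 fields.
func nthField(line string, sep byte, n int) (field string, ok bool) {
	// Skip the first n fields by cutting just past each separator.
	for i := 0; i < n; i++ {
		idx := strings.IndexByte(line, sep)
		if idx < 0 {
			return "", false
		}
		line = line[idx+1:]
	}
	// Trim everything after the wanted field, if anything follows it.
	if idx := strings.IndexByte(line, sep); idx >= 0 {
		line = line[:idx]
	}
	return line, true
}

func main() {
	// Hypothetical record; the real input layout may differ.
	line := "id|x|y|z|201701|amount|kind|DOE, JANE|rest"
	name, ok := nthField(line, '|', 7)
	fmt.Println(name, ok)
}
```

The returned field shares memory with the original line, so it stays cheap as long as the line itself is kept alive only as long as needed.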

It's ~2.2s now. It seemed interesting to see if I could make other improvements to the current code. After a while, I got stuck on the mutex implementation in the code, so I decided to write a version with channels to see if it performs better.
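One way a channel-based version can avoid the mutex is sketched below. This is an illustration under assumptions, not the code from the repository: each worker counts into its own private map and sends that map over a channel once, so only the merging loop ever touches the final map. The function name countNames and the chunked [][]string input are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// countNames distributes chunks to a fixed number of workers. Each worker
// counts into a private map and sends it on the partials channel when the
// jobs channel is drained. Only the caller's goroutine writes to the final
// map, so no mutex is needed around it.
func countNames(chunks [][]string, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan []string)
	partials := make(chan map[string]int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := make(map[string]int)
			for chunk := range jobs {
				for _, name := range chunk {
					local[name]++
				}
			}
			partials <- local
		}()
	}

	// Feed the workers, then signal that no more work is coming.
	go func() {
		for _, c := range chunks {
			jobs <- c
		}
		close(jobs)
	}()

	// Close partials once every worker has sent its map.
	go func() {
		wg.Wait()
		close(partials)
	}()

	// Merge the per-worker results in a single goroutine.
	total := make(map[string]int)
	for p := range partials {
		for name, n := range p {
			total[name] += n
		}
	}
	return total
}

func main() {
	chunks := [][]string{{"ann", "bob"}, {"ann"}, {"cid", "ann"}}
	fmt.Println(countNames(chunks, 2))
}
```

The trade-off is a merge step at the end, in exchange for workers that never contend on a shared lock while counting.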

~1.7s now. That made me even more curious whether any more juice could be squeezed out. The blog mentions optimizing the reading process, so I moved away from the standard library's scanner and read the file using a regular Read and bytes.Index.

It went down to ~1.4s... great! Then I remembered that the blog mentions the fastest implementation, so I downloaded sirkon's (https://github.com/sirkon/mineislarger), ran it, and was amazed...

It's ~1s, soooo faaast... I'm glad, because it means more improvement can be made to achieve better performance (by optimizing my code or rewriting it using different methods). So far there are some things I would love to try... I'm sure I have not yet hit the wall.

UPDATE 2021-08-29 02:25 AM (GMT +7, Jakarta-Indonesia Time)

One day after uploading, and after analyzing the possibilities for beating the performance of sirkon's code, which runs in ~1s on my PC (check his code at https://github.com/sirkon/mineislarger; by the way, I haven't analyzed his code, to keep myself looking for a fresh solution), I might finally be hitting the wall... With another method, my code finally runs in less than 800ms... yaaaaayyyyyyyyyyyyyy!!!!

Maybe I'll write the explanation later, but it's actually very simple logic.

UPDATE 2021-08-29 19:25 (GMT +7, Jakarta-Indonesia Time)

After hints from Felix Geisendörfer (@felixge), and after applying some of his suggestions, we got ~700ms.

UPDATE 2021-09-01 12:50 (GMT +7, Jakarta-Indonesia Time)

There is an update, ver5, which is not worth mentioning: the performance change is not significant, or it is even slower... So let's get to ver6, where I removed the sync.Pool, traced/profiled the code, and adjusted some parameters for optimal performance. Now we get under 500ms.

(I forgot to mention the size of the file processed, so here it is.)

[screenshot: size of the processed file]

Owner
Radhika Isswandhana
Mainly writes in Go(lang), Delphi/Free Pascal, Perl, PHP, a bit of Python, C, and a bit of everything else depending on the job