Tools to help with Japanese sentence mining

jp-mining-tools

Syosetu scraper

go run cmd/scrape_syosetu/main.go --help
Usage of /var/folders/fw/0wq08yqd3fgd69t86wv72l040000gn/T/go-build4010932054/b001/exe/main:
  -end int
        the ending chapter (default 10)
  -series string
        the code of a series (default "n6316bn")
  -start int
        the starting chapter (default 1)

Example

# Download the first 50 chapters of Mushoku Tensei
go run cmd/scrape_syosetu/main.go -series n9669bk -end 50
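
The flags above describe an inclusive chapter range for one series. The following is a minimal sketch of how such flags could be parsed and turned into one request URL per chapter. It is not the tool's actual source: the function names parseArgs and chapterURLs are hypothetical, and the https://ncode.syosetu.com/<series>/<chapter>/ URL layout is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs reads the three flags shown in the --help output.
// The defaults mirror that output; the function itself is a hypothetical helper.
func parseArgs(args []string) (series string, start, end int, err error) {
	fs := flag.NewFlagSet("scrape_syosetu", flag.ContinueOnError)
	fs.StringVar(&series, "series", "n6316bn", "the code of a series")
	fs.IntVar(&start, "start", 1, "the starting chapter")
	fs.IntVar(&end, "end", 10, "the ending chapter")
	err = fs.Parse(args)
	return series, start, end, err
}

// chapterURLs builds one URL per chapter in the inclusive range [start, end].
// The URL layout is an assumption, not taken from the tool.
func chapterURLs(series string, start, end int) []string {
	var urls []string
	for ch := start; ch <= end; ch++ {
		urls = append(urls, fmt.Sprintf("https://ncode.syosetu.com/%s/%d/", series, ch))
	}
	return urls
}

func main() {
	series, start, end, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	for _, u := range chapterURLs(series, start, end) {
		fmt.Println(u)
	}
}
```

Run with -series n9669bk -end 50, this sketch prints 50 URLs, one for each of chapters 1 through 50.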

Search for a word in the corpus

go run cmd/find_sentence/main.go -word これから
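
The README does not say how the search works. As an illustration only, a sentence search over downloaded text could split the corpus on Japanese sentence terminators and keep the sentences that contain the word. The helper names splitSentences and findSentences are hypothetical, and the sample corpus is invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSentences cuts text after each of the terminators 。！？ and at newlines.
// This is a naive rule: closing quotes such as 」 after a terminator start a new sentence.
func splitSentences(text string) []string {
	var sentences []string
	var current strings.Builder
	flush := func() {
		if s := strings.TrimSpace(current.String()); s != "" {
			sentences = append(sentences, s)
		}
		current.Reset()
	}
	for _, r := range text {
		if r == '\n' {
			flush()
			continue
		}
		current.WriteRune(r)
		if strings.ContainsRune("。！？", r) {
			flush()
		}
	}
	flush()
	return sentences
}

// findSentences returns every sentence in corpus that contains word.
func findSentences(corpus, word string) []string {
	var matches []string
	for _, s := range splitSentences(corpus) {
		if strings.Contains(s, word) {
			matches = append(matches, s)
		}
	}
	return matches
}

func main() {
	// Invented sample text standing in for scraped chapters.
	corpus := "今日は雨だ。これから出かける。\nこれからどうする？"
	for _, s := range findSentences(corpus, "これから") {
		fmt.Println(s)
	}
}
```

A plain substring match is enough for a fixed expression like これから. Matching inflected forms would need a morphological analyzer, which this sketch leaves out.
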
Owner
Anton Van Eechaute
Full-stack software engineer. Building tadoku.app