⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

html-to-markdown

gopher standing on top of a machine that converts a box of html to blocks of markdown

Convert HTML into Markdown with Go. It uses an HTML parser to avoid regular expressions as much as possible. That prevents some odd edge cases and makes it usable when the input is completely unknown.

Installation

go get github.com/JohannesKaufmann/html-to-markdown

Usage

import md "github.com/JohannesKaufmann/html-to-markdown"

converter := md.NewConverter("", true, nil)

html := `<strong>Important</strong>`

markdown, err := converter.ConvertString(html)
if err != nil {
  log.Fatal(err)
}
fmt.Println("md ->", markdown)

If you are already using goquery you can pass a selection to Convert.

markdown, err := converter.Convert(selec)

Using it on the command line

If you want to use html-to-markdown on the command line without writing any Go code, check out html2md, a CLI wrapper for html-to-markdown that has all of the following options and plugins built in.

Options

The third parameter to md.NewConverter is *md.Options.

For example you can change the character that is around a bold text ("**") to a different one (for example "__") by changing the value of StrongDelimiter.

opt := &md.Options{
  StrongDelimiter: "__", // default: **
  // ...
}
converter := md.NewConverter("", true, opt)

For all available options, see the godocs; for a working example, see the example.

Adding Rules

converter.AddRules(
  md.Rule{
    Filter: []string{"del", "s", "strike"},
    Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
      // You need to return a pointer to a string (md.String is just a helper function).
      // If you return nil the next function for that html element
      // will be picked. For example you could only convert an element
      // if it has a certain class name and fallback if not.
      content = strings.TrimSpace(content)
      return md.String("~" + content + "~")
    },
  },
  // more rules
)

For more information have a look at the example add_rules.

Using Plugins

If you want plugins (GitHub Flavored Markdown features like strikethrough, tables, ...) you can pass them to Use.

import "github.com/JohannesKaufmann/html-to-markdown/plugin"

// Use the `GitHubFlavored` plugin from the `plugin` package.
converter.Use(plugin.GitHubFlavored())

Or, if you only want to use the Strikethrough plugin, you can change the character that marks crossed-out text by setting the first argument to a different value (for example "~~" instead of "~").

converter.Use(plugin.Strikethrough(""))

For more information have a look at the example github_flavored.

Writing Plugins

Have a look at the plugin folder for a reference implementation. The most basic one is Strikethrough.

Other Methods

Godoc

func (c *Converter) Keep(tags ...string) *Converter

Determines which elements are to be kept and rendered as HTML.

func (c *Converter) Remove(tags ...string) *Converter

Determines which elements are to be removed altogether i.e. converted to an empty string.

Issues

If you find HTML snippets (or even full websites) that don't produce the expected results, please open an issue!

Related Projects

Owner
Johannes Kaufmann
Finance and Operations @ Code+Design & Software Engineering Student @ CODE
Comments
  • Mention wrapper program in README.md?

    Hi @JohannesKaufmann

    I love your project so much that I added a wrapper program to it:

    $ html2md -i https://github.com/suntong/lang
    [Homepage](https://github.com/)
    . . . 
    
    
    $ html2md -i https://github.com/suntong/lang -s 'div#readme'   
    ## README.md
    
    # lang -- programming languages demos
    

    Would it be OK that I PR to README.md to mention html2md when it is ready? So far I'm having these planned out:

    $ html2md
    HTML to Markdown
    Version 0.1.0 built on 2020-07-26
    Copyright (C) 2020, Tong Sun
    
    HTML to Markdown converter on command line
    
    Usage:
      html2md [Options...]
    
    Options:
    
      -h, --help                       display help information 
      -i, --in                        *The html/xml file to read from (or stdin) 
      -d, --domain                     Domain of the web page, needed for links when --in is not url 
      -s, --sel                        CSS/goquery selectors [=body]
      -v, --verbose                    Verbose mode (Multiple -v options increase the verbosity.) 
    
          --opt-heading-style          Option HeadingStyle 
          --opt-horizontal-rule        Option HorizontalRule 
          --opt-bullet-list-marker     Option BulletListMarker 
          --opt-code-block-style       Option CodeBlockStyle 
          --opt-fence                  Option Fence 
          --opt-em-delimiter           Option EmDelimiter 
          --opt-strong-delimiter       Option StrongDelimiter 
          --opt-link-style             Option LinkStyle 
          --opt-link-reference-style   Option LinkReferenceStyle 
    
      -A, --plugin-conf-attachment     Plugin ConfluenceAttachments 
      -C, --plugin-conf-code           Plugin ConfluenceCodeBlock 
      -F, --plugin-frontmatter         Plugin FrontMatter 
      -G, --plugin-gfm                 Plugin GitHubFlavored 
      -S, --plugin-strikethrough       Plugin Strikethrough 
      -T, --plugin-table               Plugin Table 
      -L, --plugin-task-list           Plugin TaskListItems 
      -V, --plugin-vimeo               Plugin VimeoEmbed 
      -Y, --plugin-youtube             Plugin YoutubeEmbed 
    

    Thanks

  • New confluence code block parser plugin

    Hi @JohannesKaufmann

    I have the pleasure of working with this library. I had to add a Confluence page parser to extract code blocks, and I thought I'd contribute it back if you like the plugin and think the changes are appropriate.

    Thanks for your great work on this! :)

  • HTML <br> is not supported

    var html =`
    <p>1. xxx <br/>2. xxxx<br/>3. xxx</p><p><span class="img-wrap"><img src="xxx"></span><br>4. golang<br>a. xx<br>b. xx</p>
    `
    
    func Test_md(t *testing.T) {
    	var converter = md.NewConverter("", true, nil)
    	md_str,_ := converter.ConvertString(html)
    	println(md_str)
    }
    

    output

    1\. xxx 2\. xxxx3\. xxx
    
    ![](xxx)4\. golanga. xxb. xx
    

    want

    1. xxx 
    2. xxxx
    3. xxx
    
    ![](xxx)
    4. golang
    a. xx
    b. xx
    
  • Unexpected result with additional rule for custom self-closing tags

    I was following this example to write a rule to process custom <mention> tags in my input: https://github.com/JohannesKaufmann/html-to-markdown/blob/master/examples/custom_tag/main.go

    The result was quite surprising. I'm not sure whether this is a bug, misuse on my part, or a limitation of the library.

    Code:

    package main
    
    import (
    	"fmt"
    	"log"
    
    	md "github.com/JohannesKaufmann/html-to-markdown"
    	"github.com/PuerkitoBio/goquery"
    )
    
    func main() {
    	html := `
    	test
    	
    	<mention user="user1" />
    	<mention user="user2" />
    	<mention user="user3" />
    
    	blabla
    	`
    
    	rule := md.Rule{
    		Filter: []string{"mention"},
    		Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
    			result := "@"
    
    			u, ok := selec.Attr("user")
    			if ok {
    				result += u
    			} else {
    				result += "unknown"
    			}
    
    			return &result
    		},
    	}
    
    	conv := md.NewConverter("", true, nil)
    	conv.AddRules(rule)
    
    	markdown, err := conv.ConvertString(html)
    	if err != nil {
    		log.Fatalln(err)
    	}
    
    	fmt.Println("markdown:\n", markdown)
    }
    

    Expected output:

    markdown:
     test
    	
     @user1
     @user2
     @user3
    
     blabla
    

    Observed output:

    markdown:
     test
    
     @user1
    
    

    Moreover, when I add the following lines to debug what is going on in the Replacement calls, it gets even weirder:

    		Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
    			result := "@"
    
    			u, ok := selec.Attr("user")
    			if ok {
    				result += u
    			} else {
    				result += "unknown"
    			}
    
    			html, err := selec.Html()
    			if err != nil {
    				log.Fatalln(err)
    			}
    
    			fmt.Println("content:", content)
    			fmt.Println("selec:", html)
    			fmt.Println("result:", result)
    
    			return &result
    		},
    

    Output:

    content: 
    
     blabla  
    
    selec:
    
            blabla 
    
    result: @user3 
    content: @user3
    selec:
            <mention user="user3">
    
            blabla
            </mention>
    result: @user2
    content: @user2
    selec:
            <mention user="user2">
            <mention user="user3">
    
            blabla
            </mention></mention>
    result: @user1
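
The debug output above is consistent with how HTML5 parsers treat unknown elements: the trailing "/" in <mention ... /> is ignored, so each <mention> stays open and swallows everything that follows. A minimal workaround sketch, assuming the input can be pre-processed before it reaches the converter, rewrites the self-closing form into an explicit open/close pair. The helper name expandSelfClosing is hypothetical, and the regular expression is deliberately naive (it would misfire on attribute values that contain ">").

```go
package main

import (
	"fmt"
	"regexp"
)

// selfClosingMention matches <mention ... /> and captures the attributes.
var selfClosingMention = regexp.MustCompile(`<mention\b([^>]*?)\s*/>`)

// expandSelfClosing is a hypothetical helper: it rewrites <mention ... />
// into <mention ...></mention> so an HTML5 parser does not nest the tags.
func expandSelfClosing(html string) string {
	return selfClosingMention.ReplaceAllString(html, `<mention$1></mention>`)
}

func main() {
	in := `<mention user="user1" /> <mention user="user2" />`
	fmt.Println(expandSelfClosing(in))
}
```

The pre-processed string can then be passed to ConvertString as in the original program, with the rule for "mention" left unchanged.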
    
  • Nested lists aren't converted correctly

    Describe the bug

    I'm seeing a problem converting nested HTML lists. The problem appears with either ordered (<ol>) or unordered (<ul>) lists.

    HTML Input

    <ol>
    	<li>One</li>
    	<ol>
    		<li>One point one</li>
    		<li>One point two</li>
    	</ol>
    </ol>
    

    Generated Markdown

    1. One
    
    1. One point one
    2. One point two
    

    Expected Markdown

    1. One
        1. One point one
        2. One point two
    

    Additional context

    I see this with the latest version (1.2.1). I'm using the following test code to check this:

    package main
    
    import (
    	"fmt"
    	"log"
    
    	md "github.com/JohannesKaufmann/html-to-markdown"
    )
    
    func main() {
    	converter := md.NewConverter("", true, nil)
    
    	html := `
    <ol>
    	<li>One</li>
    	<ol>
    		<li>One point one</li>
    		<li>One point two</li>
    	</ol>
    </ol>
    `
    
    	markdown, err := converter.ConvertString(html)
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Printf("md ->\n%s\n", markdown)
    }
    
    

    Thanks for the library!

  • Broken output with new lines between tags

    The problem may appear in a wider amount of cases, but what I've got so far is the following:

    There are text posts with links to videos in specific tags

    <video>https://youtu.be/SoMeViD</video>\r\n<video>https://youtu.be/SoMeViD</video>
    

    html-to-markdown doesn't understand them, which is absolutely fine; I just want it to leave them alone for further processing. When there is only one, or when they are separated by other elements, there is no problem at all and everything works perfectly. However, when there are two or more, it results in:

    https://youtu.be/BpDqa2K0hvIhttps://youtu.be/GfE2D62bMTE
    

    Or, if I wanted to make a regular link from it, or embed in iframe I would get this: https://youtu.be/BpDqa2K0hvIhttps://youtu.be/GfE2D62bMTE

    I think in such a case separators between tags, such as a space, \t, &nbsp;, \n, or \r\n, should be kept.

  • 🐛 Bug with square brackets

    Describe the bug

    Found an issue with square brackets in the input which is confusing me. They end up being converted to \$& in the output. This seems to happen whether they are written in the html as [], &lbrack;, or &#91;.

    HTML Input

    <p>first [literal] brackets</p>
    <p>then &#91;one&#93; way to escape</p>
    <p>then &lbrack;another&rbrack; one</p>
    

    Generated Markdown

    first \$&literal\$& brackets
    
    then \$&one\$& way to escape
    
    then \$&another\$& one
    

    Expected Markdown

    first \[literal\] brackets
    
    then &#91;one&#93; way to escape
    
    then &lbrack;another&rbrack; one
    

    Additional context

    I had this issue come up with some options configured, but then went ahead and removed all configuration to test and I'm still seeing it. Is it something on my end I'm doing incorrectly perhaps? I'm not very experienced with golang so it's possible I'm making a silly error.
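
One plausible explanation, offered as an assumption rather than something confirmed from the library source, is a JavaScript-style "$&" placeholder in a regular-expression replacement. In Go's regexp package the whole match is written "${0}"; "$&" is not a valid placeholder, so it is copied into the output literally. The sketch below reproduces the symptom with the standard library only; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// brackets matches a single "[" or "]".
var brackets = regexp.MustCompile(`[\[\]]`)

// escapeJSStyle uses the JavaScript placeholder "$&", which Go does not
// recognise, so the "$&" ends up in the output as raw text.
func escapeJSStyle(s string) string {
	return brackets.ReplaceAllString(s, `\$&`)
}

// escapeGoStyle uses "${0}", Go's placeholder for the whole match.
func escapeGoStyle(s string) string {
	return brackets.ReplaceAllString(s, `\${0}`)
}

func main() {
	in := "first [literal] brackets"
	fmt.Println(escapeJSStyle(in)) // first \$&literal\$& brackets
	fmt.Println(escapeGoStyle(in)) // first \[literal\] brackets
}
```

If this is indeed the cause, the reporter's code is not at fault and the fix belongs in the escaping step.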

  • 🐛 Bug Can not handle img

    Describe the bug A clear and concise description of what the bug is.

    HTML Input

    <figure><img class="lazyload inited loaded" data-src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png" data-width="800" data-height="600" src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png"><figcaption></figcaption></figure>
    

    Generated Markdown

    <img class="lazyload inited loaded" data-src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png" data-width="800" data-height="600" src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png">
    

    Expected Markdown

    nonting
    
  • 🐛 Bug: Support `<tt>` for code next to `<code>` tags

    Describe the bug

    Unfortunately, some sites don't use semantic markup (e.g. http://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/) but instead specify the font directly using tt. Since Markdown draws no distinction between code and text simply formatted in "typewriter style", these should be recognized as well (or at least via a plugin).

    HTML Input

    <tt>Some typewriter text</tt>
    

    Generated Markdown

    Some typewriter text
    

    Expected Markdown

    `Some typewriter text`
    

    Additional context N/A
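
Until such support exists, one possible workaround is to rename the tags before conversion. The sketch below is a naive pre-processing step using only the standard library; the name ttToCode is hypothetical, and it assumes lowercase <tt> tags without attributes.

```go
package main

import (
	"fmt"
	"strings"
)

// ttToCode is a hypothetical pre-processing step: it renames bare <tt> tags
// to <code> before conversion. Tags with attributes are left untouched.
var ttToCode = strings.NewReplacer("<tt>", "<code>", "</tt>", "</code>")

func main() {
	fmt.Println(ttToCode.Replace("<tt>Some typewriter text</tt>"))
}
```

The rewritten HTML can then be handed to the converter, which already handles <code>.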

  • Extra <span> elements in <code> blocks

    Some websites use <code> blocks with <span> elements inside. It seems to be the case when the syntax highlighting is computed server-side, rather than on the browser with some JS library such as prettify.

    To reproduce:

    func main() {
    	converter := md.NewConverter("", true, nil)
    	url := "https://atomizedobjects.com/blog/javascript/how-to-get-the-last-segment-of-a-url-in-javascript"
    	markdown, _ := converter.ConvertURL(url)
    	fmt.Println(markdown)
    }
    

    What I get (scrolling down a bit):

    ``js
    window<span class="token punctuation">.</span>location<span class="token punctuation">.</span>pathname<span class="token punctuation">.</span><span class="token function">split</span><span class="token punctuation">(</span><span class="token string">"/"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">filter</span><span class="token punctuation">(</span><span class="token parameter">entry</span> <span class="token operator">=></span> entry <span class="token operator">!==</span> <span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// ["blog", "javascript", "how-to-get-the-last-segment-of-a-url-in-javascript"]</span>
    ``
    

    What you get if you just remove all <span> elements from the generated markdown:

    window.location.pathname.split("/").filter(entry => entry !== "");
    // ["blog", "javascript", "how-to-get-the-last-segment-of-a-url-in-javascript"]
    

    I know that an easy workaround on my side would be to just clean things up with goquery, but I figured it would be better to have it fixed here directly.

    Thanks!
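
The manual clean-up described above ("just remove all <span> elements from the generated markdown") can be sketched as a post-processing step. This is an illustration with the standard library only, not the library's own fix; stripSpans is a hypothetical name, and the pattern assumes attribute values never contain ">".

```go
package main

import (
	"fmt"
	"regexp"
)

// spanTag matches an opening <span ...> or a closing </span> tag.
var spanTag = regexp.MustCompile(`</?span\b[^>]*>`)

// stripSpans removes span tags from the text but keeps their contents.
func stripSpans(markdown string) string {
	return spanTag.ReplaceAllString(markdown, "")
}

func main() {
	in := `window<span class="token punctuation">.</span>location`
	fmt.Println(stripSpans(in)) // window.location
}
```

Applied to the whole document this would also strip spans outside code blocks, so it is best limited to the code-block text.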

  • 🐛 `&lt;` and `&gt;` should not be converted to `<` and `>`

    Describe the bug

    &lt; and &gt; should not be converted to < and >, it breaks the resulting markdown.

    HTML Input

    &lt;not a tag&gt;
    

    Generated Markdown

    <not a tag>
    

    Expected Markdown

    &lt;not a tag&gt;
    

    Additional context Markdown parsers take <not a tag> as a tag and do not show it. That's not what is in the HTML though.

    Example: https://spec.commonmark.org/dingus/?text=%3Cnot%20a%20tag%3E%0A%0A%26lt%3Bsecond%26gt%3B
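
The expected behaviour can be sketched as re-escaping angle brackets in text content. The helper below is a standard-library illustration with a hypothetical name; it is only safe on the text of individual text nodes, because applying it to a finished Markdown document would also escape legitimate ">" blockquote markers and "<...>" autolinks.

```go
package main

import (
	"fmt"
	"strings"
)

// angleEscaper turns literal angle brackets back into HTML entities.
var angleEscaper = strings.NewReplacer("<", "&lt;", ">", "&gt;")

// escapeAngles is a hypothetical helper meant for text-node content only.
func escapeAngles(text string) string {
	return angleEscaper.Replace(text)
}

func main() {
	fmt.Println(escapeAngles("<not a tag>")) // &lt;not a tag&gt;
}
```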

  • Bump github.com/yuin/goldmark from 1.4.14 to 1.5.3

    Bumps github.com/yuin/goldmark from 1.4.14 to 1.5.3.

  • Potential issue in the Table plugin with the isFirstTbody logic

    Hello, in the table.go plugin there's an issue with the firstSibling logic in the isFirstTbody function.

    func isFirstTbody(s *goquery.Selection) bool {
    	firstSibling := s.Siblings().Eq(0) // TODO: previousSibling
    	if s.Is("tbody") && firstSibling.Length() == 0 {
    		return true
    	}
    	return false
    }

    I'm retrieving tables from Confluence HTML, which uses a tbody > tr > th structure. For some reason firstSibling.Length() is not 0. I haven't figured out why yet, but when I comment that check out, the function seems to do what it's supposed to, although that might introduce a new bug :).

    github.com/JohannesKaufmann/html-to-markdown v1.3.6
    github.com/PuerkitoBio/goquery v1.8.0

  • 🐛 Bug: Support MathJax custom tags

    Describe the bug

    MathJax is a JavaScript library that lets authors add "custom tags" such as $...$ to HTML, which are then turned into e.g. MathML or whatever the browser supports.

    Depending on the Markdown implementation math is either not supported at all -- or directly through the same syntax. Either way, it'd probably make most sense to simply keep $...$ expressions intact and not escape strings contained therein. While a simple filter for that would certainly work, MathJax allows supporting different escape characters than $...$ for inline- and $$...$$ for display-math, e.g., from the article https://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/:

    <script>
    window.MathJax = {
      tex: {
        tags: "ams",
        inlineMath: [ ['$','$'], ['\\(', '\\)'] ],
        displayMath: [ ['$$','$$'] ],
        processEscapes: true,
      },
      options: {
        skipHtmlTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
      },
      loader: {
        load: ['[tex]/amscd']
      }
    };
    </script>
    

    This would necessitate parsing JS though ...

    HTML Input

    some formula: $\lambda$
    

    Generated Markdown

    some formula: $\\lambda$
    

    Expected Markdown

    some formula: $\lambda$
    

    Additional context This filter (or "unfilter") may be only activated, if MathJax is detected, and otherwise disabled. Further, as mentioned earlier, a more sophisticated parsing of the HTML may be used to detect the precise math-HTML tags used or make them configurable at the least.
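
The "simple filter" mentioned above could look roughly like the following post-processing sketch. It assumes only the default $...$ delimiters on a single line and simply collapses doubled backslashes inside them; the name unescapeMath is hypothetical, and configurable delimiters such as \(...\) are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// inlineMath matches a $...$ span that stays on one line.
var inlineMath = regexp.MustCompile(`\$[^$\n]+\$`)

// unescapeMath collapses "\\" back to "\" inside $...$ spans only,
// leaving escapes in the rest of the Markdown untouched.
func unescapeMath(markdown string) string {
	return inlineMath.ReplaceAllStringFunc(markdown, func(m string) string {
		return strings.ReplaceAll(m, `\\`, `\`)
	})
}

func main() {
	fmt.Println(unescapeMath(`some formula: $\\lambda$`)) // some formula: $\lambda$
}
```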

  • 🐛 Bug <br> is converted into two new lines (\n\n)

    Describe the bug

    In my testing I've found that the HTML tag <br /> gets turned into two new lines (\n\n);

    Example:

    (⎈ |local:default)
    prologic@Jamess-iMac
    Mon Aug 02 11:37:55
    ~/tmp/html2md
     (master) 130
    $ ./html2md -i
    Hello<br />World
    Hello
    
    World
    

    HTML Input

    Hello<br />World
    

    Generated Markdown

    Hello
    
    World
    

    Expected Markdown

    Hello
    World
    

    Additional context

    Is there any way to control this behaviour? I get that this might be getting interpreted as a "paragraph", but I would only expect that if there are two <br />(s) or an actual paragraph <p>...</p>. Thanks!

  • Spacing & numbering issues with nested lists

    Describe the bug

    I see a couple issues with nested lists.

    One issue is that there are extra line breaks between list items in nested lists. When I render this in my application, it wraps text with a <p> if there's an extra line break (which has implications for margin/padding).

    Another (small) issue I see is that numbering gets off for numbered lists. I realize this doesn't matter with Markdown, but I thought I'd note it.

    HTML Input

    <p>
      The Corinthos Center for Cancer will be partially closed for remodeling
      starting <strong>4/15/21</strong>. Patients should be redirected as space
      permits in the following order:
    </p>
    <ol>
      <li>Metro Court West.</li>
      <li>Richie General.</li>
      <ol>
        <li>This place is ok.</li>
        <li>Watch out for the doctors.</li>
        <ol>
          <li>They bite.</li>
          <li>But not hard.</li>
        </ol>
      </ol>
      <li>Port Charles Main.</li>
    </ol>
    <p>For further information about appointment changes, contact:</p>
    <ul>
      <li>Dorothy Hardy</li>
      <ul>
        <li><em>Head of Operations</em></li>
        <ul>
          <li><em>Interim</em></li>
        </ul>
      </ul>
      <li>[email protected]</li>
      <li>555-555-5555</li>
    </ul>
    <p>
      <em>The remodel is </em
      ><a href="http://www.google.com/" target="_self"><em>expected</em></a
      ><em> to complete in June 2021.</em>
      <strong><em>Timeframe subject to change</em></strong
      ><em>.</em>
    </p>
    

    Generated Markdown

    The Corinthos Center for Cancer will be partially closed for remodeling
    starting **4/15/21**. Patients should be redirected as space
    permits in the following order:
    
    1. Metro Court West.
    2. Richie General.
    
       1. This place is ok.
       2. Watch out for the doctors.
          1. They bite.
          2. But not hard.
    
    4. Port Charles Main.
    
    For further information about appointment changes, contact:
    
    - Dorothy Hardy
    
      - _Head of Operations_
        - _Interim_
    
    - [email protected]
    - 555-555-5555
    
    _The remodel is_ [_expected_](http://www.google.com/) _to complete in June 2021._ **_Timeframe subject to change_** _._
    

    Note how there are extra line breaks after "2. Richie General.", " 2. But not hard.", "- Dorothy Hardy", and " - Interim".

    Also note how "4. Port Charles Main." should be "3. Port Charles Main.".

    Expected Markdown

    The Corinthos Center for Cancer will be partially closed for remodeling
    starting **4/15/21**. Patients should be redirected as space
    permits in the following order:
    
    1. Metro Court West.
    2. Richie General.
       1. This place is ok.
       2. Watch out for the doctors.
          1. They bite.
          2. But not hard.
    3. Port Charles Main.
    
    For further information about appointment changes, contact:
    
    - Dorothy Hardy
      - _Head of Operations_
        - _Interim_
    - [email protected]
    - 555-555-5555
    
    _The remodel is_ [_expected_](http://www.google.com/) _to complete in June 2021._ **_Timeframe subject to change_** _._
    

    Additional context

    I see this with the latest version (1.3.0). I'm using no plugins.

    Thanks for the utility!

  • Is `Converter` safe for use by multiple goroutines?

    This should be documented. Is it safe to use by multiple goroutines? Am I expected to use one single instance of Converter with same configuration across my app, or to create new in each case? What's the design, what are performance considerations?

    PS: there is sync.RWMutex within Converter struct, so the answer is probably yes, but, again, this should be documented to not guess or reverse engineer.
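
Until this is documented, a conservative pattern is to serialise access yourself. The sketch below wraps any value that has a ConvertString(string) (string, error) method (the signature used in the Usage section) behind a mutex. The types stringConverter, lockedConverter and fakeConverter are hypothetical; fakeConverter only stands in for the real converter so the example runs without the library.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// stringConverter is the subset of the converter's API used here.
type stringConverter interface {
	ConvertString(html string) (string, error)
}

// lockedConverter serialises all calls to the wrapped converter.
type lockedConverter struct {
	mu    sync.Mutex
	inner stringConverter
}

func (l *lockedConverter) ConvertString(html string) (string, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.ConvertString(html)
}

// fakeConverter stands in for the real converter so the sketch runs alone.
type fakeConverter struct{}

func (fakeConverter) ConvertString(html string) (string, error) {
	return strings.ToUpper(html), nil
}

func main() {
	conv := &lockedConverter{inner: fakeConverter{}}
	inputs := []string{"a", "b", "c"}
	results := make([]string, len(inputs))

	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			out, err := conv.ConvertString(in)
			if err != nil {
				return
			}
			results[i] = out // each goroutine writes its own slot
		}(i, in)
	}
	wg.Wait()
	fmt.Println(results) // [A B C]
}
```

If the converter turns out to be safe for concurrent use, the wrapper only costs some parallelism and can be dropped.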

Take screenshots of websites and create PDF from HTML pages using chromium and docker

gochro is a small docker image with chromium installed and a golang based webserver to interact with it. It can be used to take screenshots of w

Nov 23, 2022
🚩 TOC, zero configuration table of content generator for Markdown files, create table of contents from any Markdown file with ease.

toc toc TOC, table of content generator for Markdown files Table of Contents Table of Contents Usage Installation Packages Arch Linux Homebrew Docker

Dec 29, 2022
Markdown - Markdown converter for golang

markdown Talks Join Youtube ❤️ Sponsor Install via nami nami install ma

Jun 2, 2022
Mdfmt - A Markdown formatter that follow the CommonMark. Like gofmt, but for Markdown

Introduction A Markdown formatter that follow the CommonMark. Like gofmt, but fo

Dec 18, 2022
golang program that simply converts html into markdown

Simply converts html to markdown Just a simple project I wrote in golang to convert html to markdown, surprisingly works decent for a lot of websites

Oct 23, 2021
Simple Markdown to Html converter in Go.

Markdown To Html Converter Simple Example package main import ( "github.com/gopherzz/MTDGo/pkg/lexer" "github.com/gopherzz/MTDGo/pkg/parser" "fm

Jan 29, 2022
Golang library for converting Markdown to HTML. Good documentation is included.

md2html is a golang library for converting Markdown to HTML. Install go get github.com/wallblog/md2html Example package main import( "github.com/wa

Jan 11, 2022
Godown - Markdown to HTML converter made with Go

Godown Godown is a tiny-teeny utility that helps you convert your Markdown files

Jan 18, 2022
Convert Microsoft Word Document to Markdown

docx2md Convert Microsoft Word Document to Markdown Usage $ docx2md NewDocument.docx Installation $ go get github.com/mattn/docx2md Supported Styles

Jan 4, 2023
Easily to convert JSON data to Markdown Table

Oct 28, 2022
Convert your markdown files to PDF instantly

Will take a markdown file as input and then create a PDF file with the markdown formatting.

Nov 7, 2022
bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

bluemonday bluemonday is a HTML sanitizer implemented in Go. It is fast and highly configurable. bluemonday takes untrusted user generated content as

Jan 4, 2023
gomtch - find text even if it doesn't want to be found

gomtch - find text even if it doesn't want to be found Do your users have clever ways to hide some terms from you? Sometimes it is hard to find forbid

Sep 28, 2022
Quick and simple parser for PFSense XML configuration files, good for auditing firewall rules

pfcfg-parser version 0.0.1 : 13 January 2022 A quick and simple parser for PFSense XML configuration files to generate a plain text file of the main c

Jan 13, 2022
Parse data and test fixtures from markdown files, and patch them programmatically, too.

go-testmark Do you need test fixtures and example data for your project, in a language agnostic way? Do you want it to be easy to combine with documen

Oct 31, 2022
Glow is a terminal based markdown reader designed from the ground up to bring out the beauty—and power—of the CLI.💅🏻

Glow Render markdown on the CLI, with pizzazz! What is it? Glow is a terminal based markdown reader designed from the ground up to bring out the beaut

Dec 30, 2022
A clean, Markdown-based publishing platform made for writers. Write together, and build a community.

WriteFreely is a clean, minimalist publishing platform made for writers. Start a blog, share knowledge within your organization, or build a community

Jan 4, 2023
Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and struct tags for golang crawler

Pagser Pagser inspired by page parser。 Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and str

Dec 13, 2022
Blackfriday: a markdown processor for Go

Blackfriday Blackfriday is a Markdown processor implemented in Go. It is paranoid about its input (so you can safely feed it user-supplied data), it i

Jan 8, 2023