It appears that reflected function calls are slow. See: https://github.com/golang/go/issues/7818
Need to investigate alternatives to reflected function calls.
Supporting data
Collected profile from benchmarks
$ go test -cpuprofile cpu.prof -memprofile mem.prof -bench . -benchmem
CPU Profile
$ go tool pprof -top -cum cpu.prof
Type: cpu
Time: Sep 11, 2018 at 4:18pm (PDT)
Duration: 1.32mins, Total samples = 285.11s (361.04%)
Showing nodes accounting for 253.95s, 89.07% of 285.11s total
Dropped 410 nodes (cum <= 1.43s)
flat flat% sum% cum cum%
164.30s 57.63% 57.63% 164.31s 57.63% runtime.usleep
0.51s 0.18% 57.81% 162.27s 56.91% runtime.lock
0.02s 0.007% 57.81% 160.68s 56.36% runtime.osyield
0.10s 0.035% 57.85% 94.14s 33.02% runtime.morestack
1.26s 0.44% 58.29% 94.04s 32.98% runtime.newstack
0.23s 0.081% 58.37% 88.93s 31.19% runtime.copystack
0.10s 0.035% 58.41% 88.50s 31.04% runtime.mcall
0.14s 0.049% 58.45% 77.98s 27.35% runtime.goexit0
0.31s 0.11% 58.56% 55.08s 19.32% runtime.stackfree
0.19s 0.067% 58.63% 54.77s 19.21% runtime.stackcacherelease
0.32s 0.11% 58.74% 47.48s 16.65% reflect.Value.Call
3.27s 1.15% 59.89% 47.02s 16.49% reflect.Value.call
0.63s 0.22% 60.11% 46.06s 16.16% runtime.schedule
0.73s 0.26% 60.37% 42.99s 15.08% github.com/dhui/thevent.(*Event).dispatch.func2
0.59s 0.21% 60.57% 42.64s 14.96% runtime.gfput
0 0% 60.57% 34.59s 12.13% testing.(*B).runN
0 0% 60.57% 34.55s 12.12% testing.(*B).launch
0.28s 0.098% 60.67% 34.35s 12.05% github.com/dhui/thevent.(*Event).dispatch
0.17s 0.06% 60.73% 32.13s 11.27% runtime.stackalloc
0.07s 0.025% 60.76% 31.94s 11.20% runtime.stackcacherefill
4.28s 1.50% 62.26% 29.30s 10.28% runtime.gentraceback
0.13s 0.046% 62.30% 27.47s 9.63% runtime.systemstack
5.71s 2.00% 64.31% 22.91s 8.04% reflect.implements
0.08s 0.028% 64.33% 21.90s 7.68% runtime.chansend1
0.09s 0.032% 64.36% 21.82s 7.65% runtime.chansend
0 0% 64.36% 20.17s 7.07% runtime.mstart
0.01s 0.0035% 64.37% 19.07s 6.69% runtime.newproc.func1
0.29s 0.1% 64.47% 19.06s 6.69% runtime.newproc1
0.04s 0.014% 64.48% 17.94s 6.29% runtime.findrunnable
0.03s 0.011% 64.49% 15.51s 5.44% github.com/dhui/thevent_test.BenchmarkEvents.func1.1
0.01s 0.0035% 64.50% 15.47s 5.43% github.com/dhui/thevent.(*Event).Dispatch
1.59s 0.56% 65.06% 15.47s 5.43% runtime.adjustframe
0.02s 0.007% 65.06% 15.05s 5.28% github.com/dhui/thevent_test.BenchmarkEvents.func1.2
0.02s 0.007% 65.07% 15.02s 5.27% github.com/dhui/thevent.(*Event).DispatchWithResults
7.74s 2.71% 67.78% 14.87s 5.22% runtime.pcvalue
0.38s 0.13% 67.92% 14.77s 5.18% reflect.(*rtype).AssignableTo
0.52s 0.18% 68.10% 14.50s 5.09% reflect.Value.assignTo
0 0% 68.10% 14.24s 4.99% runtime.semawakeup
0.38s 0.13% 68.23% 13.82s 4.85% runtime.unlock
0.63s 0.22% 68.45% 12.50s 4.38% runtime.gfget
2.18s 0.76% 69.22% 11.76s 4.12% runtime.getStackMap
0.01s 0.0035% 69.22% 10.18s 3.57% runtime.gfget.func1
0.05s 0.018% 69.24% 10.05s 3.52% runtime.park_m
9.51s 3.34% 72.58% 9.51s 3.34% runtime.pthread_cond_signal
1.61s 0.56% 73.14% 8.74s 3.07% reflect.(*rtype).typeOff
0.43s 0.15% 73.29% 8.50s 2.98% runtime.pcdatavalue
0.62s 0.22% 73.51% 7.44s 2.61% runtime.funcspdelta
1.48s 0.52% 74.03% 7.13s 2.50% reflect.resolveTypeOff
0 0% 74.03% 7.04s 2.47% runtime.semasleep
1.46s 0.51% 74.54% 6.83s 2.40% runtime.mallocgc
2.53s 0.89% 75.43% 6.58s 2.31% reflect.(*rtype).nameOff
5.44s 1.91% 77.34% 6.11s 2.14% runtime.step
5.78s 2.03% 79.36% 5.78s 2.03% runtime.pthread_cond_wait
0.30s 0.11% 79.47% 5.74s 2.01% runtime.runqput
1.34s 0.47% 79.94% 5.65s 1.98% runtime.(*_type).typeOff
0.10s 0.035% 79.97% 5.32s 1.87% runtime.runqputslow
0 0% 79.97% 5.17s 1.81% runtime.stopm
0 0% 79.97% 4.91s 1.72% runtime.notesleep
4.54s 1.59% 81.57% 4.54s 1.59% runtime.pthread_mutex_lock
3.45s 1.21% 82.78% 4.31s 1.51% runtime.resolveTypeOff
0.12s 0.042% 82.82% 4.08s 1.43% reflect.unsafe_New
1.90s 0.67% 83.48% 4.05s 1.42% reflect.resolveNameOff
0.43s 0.15% 83.63% 3.99s 1.40% reflect.directlyAssignable
0 0% 83.63% 3.94s 1.38% runtime.goschedImpl
0.01s 0.0035% 83.64% 3.58s 1.26% runtime.runqsteal
0 0% 83.64% 3.57s 1.25% runtime.gopreempt_m
0.02s 0.007% 83.64% 3.57s 1.25% runtime.runqgrab
1.53s 0.54% 84.18% 3.53s 1.24% reflect.(*rtype).Name
0.09s 0.032% 84.21% 3.40s 1.19% runtime.newproc
0 0% 84.21% 3.30s 1.16% runtime.gcstopm
0.42s 0.15% 84.36% 3.02s 1.06% reflect.funcLayout
0.28s 0.098% 84.46% 2.75s 0.96% runtime.makeslice
0.47s 0.16% 84.62% 2.58s 0.9% sync.(*Map).Load
0.02s 0.007% 84.63% 2.48s 0.87% github.com/dhui/thevent_test.BenchmarkEvents.func1.3
0 0% 84.63% 2.45s 0.86% github.com/dhui/thevent.(*Event).DispatchAsync
1.89s 0.66% 85.29% 2.22s 0.78% runtime.findfunc
2.15s 0.75% 86.05% 2.15s 0.75% runtime.resolveNameOff
0.08s 0.028% 86.08% 2.12s 0.74% github.com/dhui/thevent_test.BenchmarkEvents.func1.4.1
2.01s 0.7% 86.78% 2.11s 0.74% runtime.adjustpointers
0.06s 0.021% 86.80% 2.04s 0.72% runtime.chanrecv
0 0% 86.80% 2.04s 0.72% runtime.chanrecv2
0 0% 86.80% 2.04s 0.72% runtime.gcBgMarkWorker.func2
0.21s 0.074% 86.88% 2.04s 0.72% runtime.gcDrain
0.94s 0.33% 87.20% 2s 0.7% reflect.(*rtype).String
0.72s 0.25% 87.46% 2s 0.7% runtime.mapaccess2
0 0% 87.46% 1.95s 0.68% runtime.gcBgMarkWorker
1.10s 0.39% 87.84% 1.62s 0.57% runtime.heapBitsSetType
0.38s 0.13% 87.98% 1.57s 0.55% runtime.call64
1.56s 0.55% 88.52% 1.56s 0.55% reflect.name.name (inline)
1.54s 0.54% 89.06% 1.54s 0.54% runtime.memmove
0.01s 0.0035% 89.07% 1.52s 0.53% github.com/dhui/thevent_test.BenchmarkEvents.func1.4
0 0% 89.07% 1.44s 0.51% github.com/dhui/thevent.(*Event).DispatchAsyncWithResults
0.01s 0.0035% 89.07% 1.44s 0.51% runtime.gcAssistAlloc
Memory profile
$ go tool pprof -top -cum mem.prof
Type: alloc_space
Time: Sep 11, 2018 at 4:19pm (PDT)
Showing nodes accounting for 7010.28MB, 99.57% of 7040.50MB total
Dropped 29 nodes (cum <= 35.20MB)
flat flat% sum% cum cum%
0 0% 0% 6422.26MB 91.22% reflect.Value.Call
6421.25MB 91.20% 91.20% 6422.26MB 91.22% reflect.Value.call
0 0% 91.20% 3901.35MB 55.41% testing.(*B).runN
0 0% 91.20% 3894.65MB 55.32% testing.(*B).launch
392.02MB 5.57% 96.77% 3856.15MB 54.77% github.com/dhui/thevent.(*Event).dispatch
30.50MB 0.43% 97.21% 3096.12MB 43.98% github.com/dhui/thevent.(*Event).dispatch.func2
7.50MB 0.11% 97.31% 1841.07MB 26.15% github.com/dhui/thevent_test.BenchmarkEvents.func1.1
0 0% 97.31% 1833.57MB 26.04% github.com/dhui/thevent.(*Event).Dispatch
8MB 0.11% 97.43% 1738.57MB 24.69% github.com/dhui/thevent_test.BenchmarkEvents.func1.2
0 0% 97.43% 1730.57MB 24.58% github.com/dhui/thevent.(*Event).DispatchWithResults
12.50MB 0.18% 97.60% 198.01MB 2.81% github.com/dhui/thevent_test.BenchmarkEvents.func1.4
0 0% 97.60% 185.51MB 2.63% github.com/dhui/thevent.(*Event).DispatchAsyncWithResults
10.50MB 0.15% 97.75% 117MB 1.66% github.com/dhui/thevent_test.BenchmarkEvents.func1.3
107.50MB 1.53% 99.28% 107.50MB 1.53% reflect.ValueOf (inline)
0 0% 99.28% 106.50MB 1.51% github.com/dhui/thevent.(*Event).DispatchAsync
0 0% 99.28% 36.24MB 0.51% runtime.mstart
20.50MB 0.29% 99.57% 36.24MB 0.51% runtime.systemstack