Temporal service and CLI



Temporal is a microservice orchestration platform that enables developers to build scalable applications without sacrificing productivity or reliability. The Temporal server executes units of application logic, Workflows, in a resilient manner that automatically handles intermittent failures and retries failed operations.

Temporal is a mature technology, forked from Uber's Cadence. It is developed by Temporal Technologies, a startup founded by the creators of Cadence.


Learn more about Temporal at docs.temporal.io.

Getting Started

Download and Start Temporal Server Locally

Execute the following commands to start a pre-built image along with all the dependencies.

git clone https://github.com/temporalio/docker-compose.git
cd docker-compose
docker-compose up

Refer to the Temporal docker-compose repository for more advanced options.

Run the Samples

Clone or download samples for Go or Java and run them with the local Temporal server. We have a number of HelloWorld type scenarios available, as well as more advanced ones. Note that the sets of samples are currently different between Go and Java.
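
For orientation, here is a minimal sketch of what such a HelloWorld Workflow looks like with the Go SDK; the names SayHello and HelloActivity are illustrative, not taken from the samples.

package helloworld

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// SayHello is a trivial Workflow: it invokes one Activity with a timeout and
// returns its result. Failures and retries are handled by the Temporal server.
func SayHello(ctx workflow.Context, name string) (string, error) {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Second,
	})
	var greeting string
	err := workflow.ExecuteActivity(ctx, HelloActivity, name).Get(ctx, &greeting)
	return greeting, err
}

// HelloActivity is the unit of work the Workflow calls.
func HelloActivity(ctx context.Context, name string) (string, error) {
	return "Hello, " + name + "!", nil
}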

Use CLI

Use Temporal's command line tool tctl to interact with the local Temporal server.

alias tctl="docker exec temporal-admin-tools tctl"
tctl namespace list
tctl workflow list

Use Temporal Web UI

Try the Temporal Web UI by opening http://localhost:8088 to view your sample workflows executing on Temporal.

Repository

This repository contains the source code of the Temporal server. To implement Workflows, Activities, and Workers, use the Go SDK or Java SDK.
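
As a rough sketch of what the Go SDK side looks like, a minimal Worker that connects to the local server started above might look like this; the task queue name is illustrative, and the registration calls refer to the hypothetical SayHello/HelloActivity sketch from Getting Started.

package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Connect to the frontend started by docker-compose (127.0.0.1:7233 by default).
	// Recent Go SDK versions expose client.Dial; older ones use client.NewClient.
	c, err := client.Dial(client.Options{HostPort: client.DefaultHostPort})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// A Worker polls a Task Queue and runs the Workflows/Activities registered on it.
	w := worker.New(c, "hello-task-queue", worker.Options{})
	// Register your Workflow and Activity functions here, e.g. the illustrative
	// SayHello / HelloActivity pair from the Getting Started sketch:
	// w.RegisterWorkflow(helloworld.SayHello)
	// w.RegisterActivity(helloworld.HelloActivity)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to start Worker", err)
	}
}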

Contributing

We'd love your help in making Temporal great. Please review our contribution guide.

If you'd like to work on or propose a new feature, first peruse feature requests and our proposals repo to discover existing active and accepted proposals.

Feel free to join the Temporal community or Slack channel to start a discussion or check if a feature has already been discussed. Once you're sure the proposal is not covered elsewhere, please follow our proposal instructions or submit a feature request.

License

MIT License

Comments
  • add address translation to Cassandra persistence

    add address translation to Cassandra persistence

    What changed?

    This PR adds the possibility to specify a so-called address translator for Cassandra nodes. Address translation is a well-known feature in the DataStax Java driver as well as in the gocql driver.

    Address translation is used when the client cannot reach the IP addresses that the Cassandra gocql driver returns. This happens, for example, when Cassandra nodes run behind some kind of proxy or load balancer, so the internal IP addresses are meaningless to the client. An address translator solves this by rewriting the addresses into ones the client can reach.

    In the Temporal context, we need to run Cassandra nodes in an Amazon PrivateLink setup. This is very much deployment specific, but the basic idea is that you have a Cassandra cluster running in one VPC and the Temporal server in another, and you connect to Cassandra via an exposed Endpoint and Endpoint Service. This is a very handy deployment model for businesses that want to keep the Temporal server in their own VPC/account while consuming a Cassandra cluster as a service, possibly operated by a third party.

    WARNING: I have fixed address translation in the upstream gocql repository in this commit (1) for Cassandra 4.x nodes. For Cassandra 3.x nodes, the port will be static (9042). The gocql driver with these changes was released just yesterday (version v1.2.0). Temporal is currently on version 1.1.0, so in order to leverage this feature, the driver needs to be updated to v1.2.0 first.

    (1) https://github.com/gocql/gocql/pull/1635

    Approach

    Address translators should be pluggable, because the way I translate addresses is not suitable for everybody. I have provided my solution, but it should be very easy to add any other translator and to switch between them via configuration.

    My translator is called a "fixed address translator" because, from the client's point of view, it only ever connects to the very same domain name; the only thing that differs is the port, which identifies the instance (each Cassandra node runs on a different CQL port). That way we can distinguish the nodes behind a load balancer by hitting a different CQL port on the load balancer address.

    This is nothing special; we already provide a fixed address translator for the DataStax Java driver: https://github.com/datastax/java-driver/pull/1597

    The gocql driver already offers an address translator mechanism, but you have to add your own translator implementation in code. Since Temporal is a black box from the user's point of view, we need to implement the address translator in the Temporal codebase and provide a way to plug it into the driver during Temporal's initialisation.

    While looking for a way to achieve this, I noticed you are using an "auth plugin" infrastructure. I basically copied that plugging logic and adapted it for the address translator case; that is what this PR is about.
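
    As a rough, hedged sketch of the general mechanism (not this PR's actual code): gocql exposes an AddressTranslator field on ClusterConfig, and a "fixed" translator can pin the reachable host while keeping the per-node port. The package name, IP, and wiring below are illustrative.

    // Sketch only; the endpoint IP and helper below are hypothetical.
    package cassandra

    import (
        "net"

        "github.com/gocql/gocql"
    )

    // fixedAddressTranslator always returns the same reachable address (e.g. a
    // PrivateLink endpoint) and keeps the per-node CQL port, so nodes behind the
    // load balancer remain distinguishable by port.
    type fixedAddressTranslator struct {
        advertised net.IP
    }

    func (t fixedAddressTranslator) Translate(addr net.IP, port int) (net.IP, int) {
        // Ignore the internal address gossiped by Cassandra and substitute the
        // address that the Temporal side can actually reach.
        return t.advertised, port
    }

    func newCluster(endpoint string) *gocql.ClusterConfig {
        cluster := gocql.NewCluster(endpoint)
        cluster.AddressTranslator = fixedAddressTranslator{advertised: net.ParseIP("10.0.0.10")}
        return cluster
    }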

    How did you test it?

    I used your docker compose setup. The way I tested it was that I ran all containers in one VPC but removed the Cassandra container. I configured compose and its exposed env variables in such a way that it connects to a remote Cassandra cluster running elsewhere, behind a load balancer / endpoint service in Amazon PrivateLink.

    I have, of course, run temporal unit tests.

    Changed docker compose scripts are here:

    https://github.com/instaclustr/temporalio-docker-builds/commit/ed8f5a903cdddf24de0fb3d7217f88d6c5583ee1

    This will likely need to be included into docker builds script as well.

    Docker compose script:

    https://gist.github.com/smiklosovic/ccecfdfd280da5fbe1299eacacea4d24

    Potential risks

    Nothing is going to break when this feature is not used. This PR is a no-op, meaning current deployments are not affected. One needs to specifically enable this feature to see it in action.

    Is hotfix candidate?

    I assume this is a new feature, not a hotfix, and the Temporal community should be informed about it.

  • Error: Add search attribute failed

    Error: Add search attribute failed

    Expected Behavior

    I should be able to add a new search attribute "UserID" of type Keyword to Temporal for filtering.

    Actual Behavior

    I initially receive an error when trying to add the attribute. Upon listing the keys, I see the newly added key, but it has the wrong type (String).

    Steps to Reproduce the Problem

    1. Deploy the latest Temporal 0.26.0 with Elasticsearch enabled.
    2. Add a new search attribute by running the following command: tctl --namespace default adm cl asa --search_attr_key UserID --search_attr_type 1
    3. Observe the following error and stack trace:
    bash-5.0# tctl  --namespace default adm cl asa --search_attr_key UserID --search_attr_type 1
    Are you trying to add key [UserID] with Type [Keyword]? Y/N y
    
    Error: Add search attribute failed.
    Error Details: Failed to update dynamic config, err: unable to update key.
    Stack trace:
    goroutine 1 [running]:
    runtime/debug.Stack(0xd, 0x0, 0x0)
    	/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
    runtime/debug.PrintStack()
    	/usr/local/go/src/runtime/debug/stack.go:16 +0x22
    github.com/temporalio/temporal/tools/cli.printError(0x1c5008f, 0x1c, 0x2003cc0, 0xc00000c300)
    	/temporal/tools/cli/util.go:526 +0x2ad
    github.com/temporalio/temporal/tools/cli.ErrorAndExit(0x1c5008f, 0x1c, 0x2003cc0, 0xc00000c300)
    	/temporal/tools/cli/util.go:537 +0x49
    github.com/temporalio/temporal/tools/cli.AdminAddSearchAttribute(0xc000127b80)
    	/temporal/tools/cli/adminClusterCommands.go:61 +0x50f
    github.com/temporalio/temporal/tools/cli.newAdminClusterCommands.func1(0xc000127b80)
    	/temporal/tools/cli/admin.go:812 +0x2b
    github.com/urfave/cli.HandleAction(0x18cf500, 0x1cc9b88, 0xc000127b80, 0xc000127b80, 0x0)
    	/go/pkg/mod/github.com/urfave/[email protected]/app.go:528 +0x7c
    github.com/urfave/cli.Command.Run(0x1c3692d, 0xf, 0x0, 0x0, 0xc0006998c0, 0x1, 0x1, 0x1c4d6ff, 0x1a, 0x0, ...)
    	/go/pkg/mod/github.com/urfave/[email protected]/command.go:174 +0x57a
    github.com/urfave/cli.(*App).RunAsSubcommand(0xc0004d9c00, 0xc000127760, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/[email protected]/app.go:407 +0x915
    github.com/urfave/cli.Command.startApp(0x1c2a872, 0x7, 0x0, 0x0, 0xc0006999f0, 0x1, 0x1, 0x1c551de, 0x1e, 0x0, ...)
    	/go/pkg/mod/github.com/urfave/[email protected]/command.go:373 +0x845
    github.com/urfave/cli.Command.Run(0x1c2a872, 0x7, 0x0, 0x0, 0xc0006999f0, 0x1, 0x1, 0x1c551de, 0x1e, 0x0, ...)
    	/go/pkg/mod/github.com/urfave/[email protected]/command.go:102 +0xa2b
    github.com/urfave/cli.(*App).RunAsSubcommand(0xc0004d9a40, 0xc000127600, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/[email protected]/app.go:407 +0x915
    github.com/urfave/cli.Command.startApp(0x1c27ae6, 0x5, 0x0, 0x0, 0xc000699970, 0x1, 0x1, 0x1c3df67, 0x13, 0x0, ...)
    	/go/pkg/mod/github.com/urfave/[email protected]/command.go:373 +0x845
    github.com/urfave/cli.Command.Run(0x1c27ae6, 0x5, 0x0, 0x0, 0xc000699970, 0x1, 0x1, 0x1c3df67, 0x13, 0x0, ...)
    	/go/pkg/mod/github.com/urfave/[email protected]/command.go:102 +0xa2b
    github.com/urfave/cli.(*App).Run(0xc0004d96c0, 0xc00010c000, 0xa, 0xa, 0x0, 0x0)
    	/go/pkg/mod/github.com/urfave/[email protected]/app.go:279 +0x7c7
    main.main()
    	/temporal/cmd/tools/cli/main.go:37 +0x4b
    
    4. List the search attributes and notice the wrong type:
    bash-5.0#  tctl --namespace default cl get-search-attr
    +-----------------------+------------+
    |          KEY          | VALUE TYPE |
    +-----------------------+------------+
    | BinaryChecksums       | Keyword    |
    | CloseTime             | Int        |
    | CustomBoolField       | Bool       |
    | CustomDatetimeField   | Datetime   |
    | CustomDoubleField     | Double     |
    | CustomIntField        | Int        |
    | CustomKeywordField    | Keyword    |
    | CustomStringField     | String     |
    | ExecutionStatus       | Int        |
    | ExecutionTime         | Int        |
    | HistoryLength         | Int        |
    | NamespaceId           | Keyword    |
    | RunId                 | Keyword    |
    | StartTime             | Int        |
    | TaskQueue             | Keyword    |
    | TemporalChangeVersion | Keyword    |
    | UserID                | String     |
    | WorkflowId            | Keyword    |
    | WorkflowType          | Keyword    |
    +-----------------------+------------+
    

    Additional Questions

    Now that I have an incorrect attribute added, is there any way to edit or remove it? Does the search attribute need to be registered explicitly by the CLI before it can be used elsewhere (go-sdk)? The docs mention something here but it's still a bit unclear: https://docs.temporal.io/docs/learn-workflow-filtering#search-attributes-go-client-usage
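
    For reference, here is a minimal sketch of the Go SDK side, assuming the "UserID" Keyword attribute has already been registered on the cluster; the client, workflow function, ID, and task queue names below are illustrative, and the API shown is the current Go SDK rather than the pre-1.0 SDK this version ships with.

    package main

    import (
        "context"
        "log"

        "go.temporal.io/sdk/client"
    )

    // startWithSearchAttribute starts a workflow tagged with the pre-registered
    // "UserID" search attribute so it can be filtered in visibility queries.
    func startWithSearchAttribute(c client.Client, workflowFn interface{}) {
        opts := client.StartWorkflowOptions{
            ID:               "order-12345", // illustrative
            TaskQueue:        "orders",      // illustrative
            SearchAttributes: map[string]interface{}{"UserID": "user-42"},
        }
        if _, err := c.ExecuteWorkflow(context.Background(), opts, workflowFn); err != nil {
            log.Fatalln("unable to start workflow", err)
        }
    }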

    Specifications

    • Version: 0.26.0
    • Platform: Ubuntu 18.04
  • Repeated "New timer generated is less than read level" in logs after adding node to cluster

    Repeated "New timer generated is less than read level" in logs after adding node to cluster

    Expected Behavior

    After adding a second node to a newly-bootstrapped one-node temporal cluster, logs should be generally quiet.

    Actual Behavior

    Instead, after adding the second node, I see "New timer generated is less than read level" repeated once per second in the logs of the original node (logs of second node are quiet). Full example log line:

    {"level":"warn","ts":"2020-06-01T18:08:52.273-0700","msg":"New timer generated is less than read level","service":"history","shard-id":111,"address":"[redacted]:7234","wf-namespace-id":"892c1d1f-e202-49dc-bc77-97268a84c9ef","wf-id":"eae6d2bc-89fe-4ac5-a544-a8087e3deede","timestamp":"2020-06-01T17:14:41.689-0700","cursor-timestamp":"2020-06-01T18:08:53.265-0700","shard-update":"ShardAllocateTimerBeforeRead","logging-call-at":"shardContext.go:1130"}
    

    Note that the IP address in the address field refers to the same node that is generating the log.

    Steps to Reproduce the Problem

    1. Set up a fresh Cassandra keyspace and ES index.
    2. Boot a single Temporal node and create a test namespace.
    3. Start a workflow/activity worker and start a workflow execution to test that it works.
    4. Boot a second Temporal node and wait for it to finish joining the cluster.
    5. Repeated log message should start.

    I am not certain that step 3 is required, but this is what I did, so I'm detailing it here.

    Specifications

    • Version: 0.23.1
    • Platform: Amazon Linux 2018.03 + Docker 18.09.9

    More Information

    After running tctl admin cluster describe against both Temporal nodes, I can see that they both have the same view of the cluster, and that they both see all four services on both nodes.

  • Prevent incorrect service discovery with multiple Temporal clusters

    Prevent incorrect service discovery with multiple Temporal clusters

    Is your feature request related to a problem? Please describe.

    I run multiple separate Temporal clusters within a single k8s cluster. Each Temporal cluster has its own separate set of frontend, history, and matching services as well as persistence. Let's say I am running two Temporal clusters called "A" and "B" in a single k8s cluster. Note that in my setup, there are no networking restrictions on pods within the k8s cluster -- any pod may connect to any other pod if the IP address is known.

    I recently encountered a problem where it appeared that a frontend service from Temporal cluster A was talking to a matching service from Temporal cluster B. This happened during a time where the pods in both of the Temporal clusters were getting cycled a lot due to some AZ balancing automation. It also happens that this particular k8s cluster is configured in such a way that pod IP address reuse is more likely than usual.

    Both Temporal cluster A and B are running 3 matching nodes each. However, I saw this log line on Temporal cluster A's frontend service:

    {"level":"info","ts":"2021-01-27T00:34:15.414Z","msg":"Current reachable members","service":"frontend","component":"service-resolver","service":"matching","addresses":"[100.123.207.80:7235 100.123.65.65:7235 100.123.120.28:7235 100.123.60.187:7235 100.123.17.255:7235 100.123.203.172:7235]","logging-call-at":"rpServiceResolver.go:266"}
    

    This is saying that Temporal cluster A's frontend service is seeing 6 matching nodes, three from A and three from B. Yikes.

    I believe what led to this is something like:

    1. A matching pod in cluster A gets replaced, releasing its IP address. This IP address remains in cluster A's cluster_metadata table.
    2. A matching pod is created in cluster B re-using this IP address.
    3. An event occurs that causes a frontend in cluster A to re-read the node membership for its matching nodes. It finds the original matching node's IP address still in the table and can still connect to it, even though it is actually now a matching node in cluster B.
    4. Through this matching node in cluster B, the other cluster B matching nodes are located.

    My fix for this is to make sure that each Temporal cluster has its own set of membership ports for each service. This would have prevented the discovery process in cluster A from seeing the pods in cluster B since it would be trying to connect on a different port.

    Describe the solution you'd like

    It may be possible to prevent this by including a check that a given node is indeed part of the correct cluster before adding it to the ring.

    Describe alternatives you've considered

    I don't believe our k8s environment has an easy way to prevent this using networking restrictions.

  • Support for pluggable authorizers

    Support for pluggable authorizers

    Is your feature request related to a problem? Please describe. I'm under the impression that the authorization interface requires the user to recompile Temporal in order to add an implementation.

    https://github.com/temporalio/temporal/blob/master/common/authorization/authorizer.go

    Describe the solution you'd like I would like to implement a generic authorizer that loads a plugin using the golang plugin interface, or a generic authorizer that calls an external Open Policy Agent server. This way I can plug in authorization rules without having to rebuild the Temporal code.

    What do you think? I'm available to contribute a merge request if you think it could be a valuable change. I would also include in the merge request support for checking the TaskQueue (extending the Attributes struct).
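
    To make the idea concrete, here is a minimal deny-by-default sketch against the interface referenced above; the exact types (Attributes, Result, decision constants) have changed across Temporal versions, so everything here is illustrative rather than the real API.

    package authorization

    import "context"

    // opaAuthorizer sketches an Authorizer that could delegate to an external
    // policy engine; here it just applies an illustrative namespace rule.
    type opaAuthorizer struct{}

    func (a *opaAuthorizer) Authorize(ctx context.Context, attributes *Attributes) (Result, error) {
        // Illustrative rule: only allow calls against the "default" namespace.
        if attributes.Namespace == "default" {
            return Result{Decision: DecisionAllow}, nil
        }
        return Result{Decision: DecisionDeny}, nil
    }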

  • Add RDS IAM auth plugin for SQL drivers

    Add RDS IAM auth plugin for SQL drivers

    Signed-off-by: Alexander Mays [email protected]

    What changed?

    • a new environment variable to the CLI: SQL_AUTH_PLUGIN=rds-iam-auth
    • a new flag arg to the CLI: --sql-auth-plugin rds-iam-auth
    • 2 new docker template variables: authPlugin: {{ default .Env.SQL_AUTH_PLUGIN "" }} and authPlugin: {{ default .Env.SQL_VIS_AUTH_PLUGIN "" }}
    • a new SQL configuration attribute authPlugin:
          sql:
            pluginName: "mysql"
            databaseName: "temporal"
            connectAddr: "127.0.0.1:3306"
            connectProtocol: "tcp"
            user: "temporal"
            password: "temporal"
            maxConns: 20
            maxIdleConns: 20
            maxConnLifetime: "1h"
            authPlugin:
              plugin: "rds-iam-auth"
              timeout: "10s"
    

    Why? Amazon RDS supports implicit auth using the execution role of an underlying compute resource. Some best practices steer users towards configuring RDS databases without a password and relying entirely on IAM auth.

    • https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.html
    • https://community.temporal.io/t/aws-rds-postgresql-iam-authentication/3168

    The AWS SDK can derive credentials from an execution role (or other CredentialProviders) and then generate a one-time token (password) every time a new connection is created.
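
    As a rough sketch of that token flow (not this PR's code), using the AWS SDK for Go v1; the package name, endpoint, region, and user are illustrative.

    package rdsiamauth

    import (
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/rds/rdsutils"
    )

    // GetToken mints a short-lived token usable as the SQL password. Credentials
    // come from the default chain (env vars, instance profile / execution role),
    // so no static database password has to be configured.
    func GetToken(endpoint, region, dbUser string) (string, error) {
        sess, err := session.NewSession()
        if err != nil {
            return "", err
        }
        return rdsutils.BuildAuthToken(endpoint, region, dbUser, sess.Config.Credentials)
    }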

    How did you test it?

    • Added 3 new unit test cases to test the basic call sites
    • Verified temporal-sql-tool [create|setup|update] against a production instance of MySQL 8.0.23 and Postgres 13.4
    • Verified temporal-server [start] against a production instance of MySQL 8.0.23

    Potential risks

    • The plugin is opt-in. Default behavior should be unaffected.
    • RDS IAM integration adds overhead for creating a new connection on the client, as well as on the database server. Amazon recommends keeping new connection requests under 200/s otherwise the database can begin throttling and even drop existing connections. I did not witness any connection churn during my tests, but I am a new Temporal user so I might not be aware of behavior at scale. The AWS Guide linked above has more information on the limitations and considerations.

    Is hotfix candidate? No

  • CronSchedule workflow stop

    CronSchedule workflow stop

    I run a cron job. After running for a period of time, the worker cannot get the corresponding task.

    The last two tasks are:

    lastRunTime_exclude               thisRunTime_include            WorkflowID                                              RunID
    2021-08-18 07:58:02.448          2021-08-18 07:59:03.117         temporal_cron1_687ddd07-6e99-4c1b-96ed-f321979a0c85   8db0f3db-96e4-4908-be09-bc7a401c4315
    2021-08-18 07:57:00.896          2021-08-18 07:58:02.448         temporal_cron1_687ddd07-6e99-4c1b-96ed-f321979a0c85   27a63c33-cd8b-4ec5-b32f-73563c6ceddd
    

    the history of temporal_cron1_687ddd07-6e99-4c1b-96ed-f321979a0c85 is :

      1  WorkflowExecutionStarted  {WorkflowType:{Name:SampleCronWorkflow}, ParentInitiatedEventId:0,
                                    TaskQueue:{Name:cron, Kind:Normal}, WorkflowExecutionTimeout:0s,
                                    WorkflowRunTimeout:0s, WorkflowTaskTimeout:2m0s,
                                    ContinuedExecutionRunId:8db0f3db-96e4-4908-be09-bc7a401c4315,
                                    Initiator:CronSchedule,
                                    LastCompletionResult:[{"RunTime":"2021-08-18T07:59:03.117003609Z"}],
                                    OriginalExecutionRunId:a6ff393b-9cfe-45aa-a37b-04d787884f0b,
                                    FirstExecutionRunId:a2781a9b-b867-4099-800d-5c3f2f2bdabf,
                                    Attempt:1, CronSchedule:* * * * *, FirstWorkflowTaskBackoff:55s,
                                    PrevAutoResetPoints:{Points:[{BinaryChecksum:bbf635f6a52ea2050d3cb1d49f888c14,
                                    RunId:a2781a9b-b867-4099-800d-5c3f2f2bdabf, FirstWorkflowTaskCompletedId:4,
                                    CreateTime:2021-08-18 07:46:01.656435253 +0000 UTC, ExpireTime:2021-08-21
                                    07:46:03.999988743 +0000 UTC, Resettable:true}]}}
      2  WorkflowTaskScheduled     {TaskQueue:{Name:cron,
                                    Kind:Normal},
                                    StartToCloseTimeout:2m0s,
                                    Attempt:1}
    

    the history of runid 8db0f3db-96e4-4908-be09-bc7a401c4315 is :

      1  WorkflowExecutionStarted         {WorkflowType:{Name:SampleCronWorkflow}, ParentInitiatedEventId:0,
                                           TaskQueue:{Name:cron, Kind:Normal}, WorkflowExecutionTimeout:0s,
                                           WorkflowRunTimeout:0s, WorkflowTaskTimeout:2m0s,
                                           ContinuedExecutionRunId:27a63c33-cd8b-4ec5-b32f-73563c6ceddd,
                                           Initiator:CronSchedule,
                                           LastCompletionResult:[{"RunTime":"2021-08-18T07:58:02.448290516Z"}],
                                           OriginalExecutionRunId:8db0f3db-96e4-4908-be09-bc7a401c4315,
                                           FirstExecutionRunId:a2781a9b-b867-4099-800d-5c3f2f2bdabf,
                                           Attempt:1, CronSchedule:* * * * *, FirstWorkflowTaskBackoff:56s,
                                           PrevAutoResetPoints:{Points:[{BinaryChecksum:bbf635f6a52ea2050d3cb1d49f888c14,
                                           RunId:a2781a9b-b867-4099-800d-5c3f2f2bdabf, FirstWorkflowTaskCompletedId:4,
                                           CreateTime:2021-08-18 07:46:01.656435253 +0000 UTC, ExpireTime:2021-08-21
                                           07:46:03.999988743 +0000 UTC, Resettable:true}]}}
       2  WorkflowTaskScheduled            {TaskQueue:{Name:cron,
                                           Kind:Normal},
                                           StartToCloseTimeout:2m0s,
                                           Attempt:1}
       3  WorkflowTaskStarted              {ScheduledEventId:2,
                                           Identity:15035@xxxxxxx@,
                                           RequestId:ebe2102d-6c2b-4c3d-bb43-1aa09f908167}
       4  WorkflowTaskCompleted            {ScheduledEventId:2, StartedEventId:3,
                                           Identity:15035@xxxxxxx@,
                                           BinaryChecksum:bbf635f6a52ea2050d3cb1d49f888c14}
       5  ActivityTaskScheduled            {ActivityId:5, ActivityType:{Name:DoSomething},
                                           TaskQueue:{Name:cron, Kind:Normal},
                                           Input:["2021-08-18T07:58:02.448290516Z",
                                           "2021-08-18T07:59:03.117003609Z",
                                           "8db0f3db-96e4-4908-be09-bc7a401c4315"],
                                           ScheduleToCloseTimeout:0s,
                                           ScheduleToStartTimeout:0s,
                                           StartToCloseTimeout:10s, HeartbeatTimeout:0s,
                                           WorkflowTaskCompletedEventId:4,
                                           RetryPolicy:{InitialInterval:1s,
                                           BackoffCoefficient:2, MaximumInterval:1m40s,
                                           MaximumAttempts:2, NonRetryableErrorTypes:[]}}
       6  ActivityTaskStarted              {ScheduledEventId:5,
                                           Identity:15035@xxxxxxx@,
                                           RequestId:ee1b2a8d-93b5-442b-a5d5-55d995441598,
                                           Attempt:1}
       7  ActivityTaskCompleted            {ScheduledEventId:5, StartedEventId:6,
                                           Identity:15035@xxxxxxx@}
       8  WorkflowTaskScheduled            {TaskQueue:{Name:xxxxxxx:e0290273-0ed3-4916-95ed-8e8c06ce5983,
                                           Kind:Sticky}, StartToCloseTimeout:2m0s, Attempt:1}
       9  WorkflowTaskStarted              {ScheduledEventId:8,
                                           Identity:15035@xxxxxxx@,
                                           RequestId:53f5ec3e-0cde-4afc-921c-b388db156fba}
      10  WorkflowTaskCompleted            {ScheduledEventId:8, StartedEventId:9,
                                           Identity:15035@xxxxxxx@,
                                           BinaryChecksum:bbf635f6a52ea2050d3cb1d49f888c14}
      11  WorkflowExecutionContinuedAsNew  {NewExecutionRunId:a6ff393b-9cfe-45aa-a37b-04d787884f0b,
                                           WorkflowType:{Name:SampleCronWorkflow},
                                           TaskQueue:{Name:cron, Kind:Normal}, WorkflowRunTimeout:0s,
                                           WorkflowTaskTimeout:2m0s, WorkflowTaskCompletedEventId:10,
                                           BackoffStartInterval:55s, Initiator:CronSchedule,
                                           LastCompletionResult:[{"RunTime":"2021-08-18T07:59:03.117003609Z"}]}
    

    Updated relevant information

    Steps to Reproduce the Problem

    Just run workflows with CronSchedule. In my case, I ran ten of them, and after a while only 7 were left.

    Specifications

    • Version: v1.11.2
    • Platform: linux
    • Storage: MySQL
  • Refactor Dockerfiles to better support ARM

    Refactor Dockerfiles to better support ARM

    What changed?

    • Use ARG for base docker images for easier debug/test.
    • Separate base-builder and base-ci-builder. The latter has some tools which are not available on ARM yet and are required only by CI builds; base-builder has the bare minimum required to build the production version and has full ARM support.
    • Replaced the jwilder/dockerize binary with a build from source from a git clone. It is worth looking at a replacement or removal, but this preserves the existing use of dockerize templates for now.
    • Add corresponding Makefile targets to build with docker buildx.
    • Update docker/base-images/README.md.
    • Partial "fix" for #1305.

    Build base-server for both arm64 and x86 container images with this command:

    docker buildx build  \
        --build-arg TARGET=auto-setup \
        --platform linux/arm64,linux/amd64 \
        -f base-server.Dockerfile \
        .
    

    or

    make base-server-x DOCKER_IMAGE_TAG=1.3.0
    

    Why? Ability to run Temporal on AWS Graviton2 instances.

    How did you test it? Build locally for different platforms.

    Potential risks Might break x86 docker images.

    Is hotfix candidate? No.

  • Fix workflow ID reuse when running on ScyllaDB

    Fix workflow ID reuse when running on ScyllaDB

    What changed?

    Fix an issue where the history service would get stuck in a loop when reusing workflow IDs on top of ScyllaDB.

    Why?

    Closes https://github.com/temporalio/temporal/issues/2683.

    How did you test it? First reproduced the issue by running Temporal on top of Scylla locally; then verified that the change fixed the issue. Ran Cassandra persistence tests to ensure that this change didn't break anything else.

    Potential risks As noted in the linked issue:

    Not sure if there are conditions when Cassandra might also return nullable rows.

    If there are cases where we should be handling nil rows as real errors, this change would break that.

    Is hotfix candidate? Yes.

  • add optional explicit ForceTLS option to ClientTLS config

    add optional explicit ForceTLS option to ClientTLS config

    What changed? Add an optional useTls config property under publicClient to override whether or not TLS is used when connecting to the endpoint, regardless of the tls.frontend.server configuration.

    Why? To fix issue 2035, which I encountered while setting up Temporal under ECS in AWS with a frontend ALB load balancer terminating TLS.

    How did you test it? Tested locally and in my nonproduction environment.

    Potential risks Worker role may fail to start if property is incorrectly set. Not setting the property at all results in the status quo behavior where TLS is based on configuration present in tls.frontend.server.

    Is hotfix candidate? No

    Note that I'm a newb to golang and am more than open to recommendations here.

  • Inject or plug in a customized Authorizer without having to build the temporal image by users

    Inject or plug in a customized Authorizer without having to build the temporal image by users

    Is your feature request related to a problem? Please describe. We need some namespace level authorization. The best bet currently is https://github.com/temporalio/temporal/blob/release/v1.8.x/cmd/server/main.go#L154. However, we'd have to implement our authorizer in temporal's repo and build the docker image from that.

    Describe the solution you'd like Ideally, as a customer, we don't have to build our own image. We can just use images like temporalio/auto-setup:1.8.2 and inject or plug in our own implementation of the authorizer somehow.

    Describe alternatives you've considered We're currently modifying the Temporal open source project and building our own image.

    Additional context

  • Fix datarace in forwarder_test

    Fix datarace in forwarder_test

    What changed? A loop variable is captured in a goroutine here, so the test is non-deterministic. A golangci-lint linter pointed this out to me.
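
    For context, a generic sketch of the pattern (not the actual forwarder_test code): before Go 1.22 the loop variable is shared across iterations, so each goroutine may observe a later iteration's value; passing the value in as an argument gives every goroutine its own copy.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        taskQueues := []string{"a", "b", "c"}
        var wg sync.WaitGroup

        // Racy (pre-Go 1.22): every goroutine captures the same loop variable,
        // so it may print whichever value the variable holds when it runs.
        for _, tq := range taskQueues {
            wg.Add(1)
            go func() {
                defer wg.Done()
                fmt.Println("racy:", tq)
            }()
        }
        wg.Wait()

        // Fixed: pass the value as an argument so each goroutine gets its own copy.
        for _, tq := range taskQueues {
            wg.Add(1)
            go func(tq string) {
                defer wg.Done()
                fmt.Println("fixed:", tq)
            }(tq)
        }
        wg.Wait()
    }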

    Why? Ensure this test is working correctly.

    How did you test it? Reran the linter.

    Potential risks None, just a test

    Is hotfix candidate? No

  • schema version compatibility check failed: unable to read DB schema version keyspace/database:

    schema version compatibility check failed: unable to read DB schema version keyspace/database:

    I am trying to deploy Temporal in k8s using Cloud SQL (a GCP managed service), but I am getting the error below:

    sql schema version compatibility check failed: unable to read DB schema version keyspace/database: temporal error: Error 1146: Table 'temporal.schema_version' doesn't exist
    [Fx] ERROR Failed to start: could not build arguments for function "go.uber.org/fx".(*App).constructCustomLogger.func2 (/go/pkg/mod/go.uber.org/[email protected]/app.go:415): failed to build fxevent.Logger: could not build arguments for function "go.temporal.io/server/temporal".glob..func8 (/home/builder/temporal/temporal/fx.go:916): failed to build log.Logger: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider (/home/builder/temporal/temporal/fx.go:163): sql schema version compatibility check failed: unable to read DB schema version keyspace/database: temporal error: Error 1146: Table 'temporal.schema_version' doesn't exist
    Unable to start server. Error: could not build arguments for function "go.uber.org/fx".(*App).constructCustomLogger.func2 (/go/pkg/mod/go.uber.org/[email protected]/app.go:415): failed to build fxevent.Logger: could not build arguments for function "go.temporal.io/server/temporal".glob..func8 (/home/builder/temporal/temporal/fx.go:916): failed to build log.Logger: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider (/home/builder/temporal/temporal/fx.go:163): sql schema version compatibility check failed: unable to read DB schema version keyspace/database: temporal error: Error 1146: Table 'temporal.schema_version' doesn't exist

  • Separate integration tests and unit tests

    Separate integration tests and unit tests

    What changed?

    1. Refactor dependencies of integration tests.
    2. Remove DB dependencies from the unit tests target.

    Why? Unit tests should not have dependencies other than testing and mocking libs.
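
    One common way to draw that line in Go (a general sketch, not necessarily what this PR does) is to guard DB-backed tests with a build tag so the default test target never touches a database; the package and test names below are illustrative.

    //go:build integration
    // +build integration

    package persistence_test

    import "testing"

    // Runs only with `go test -tags integration ./...`; a plain `go test ./...`
    // skips this file entirely, so the unit test target keeps no DB dependency.
    func TestCassandraRoundTrip(t *testing.T) {
        t.Skip("illustrative placeholder for a DB-backed test")
    }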

    How did you test it? Existing tests.

    Potential risks

    Is hotfix candidate?

  • Handle wrapped errors in task executable

    Handle wrapped errors in task executable

    What changed?

    • Check wrapped errors when handling task execution error

    Why?

    • So that we can wrap errors in task execution logic and log more informative error messages.

    How did you test it?

    • Existing test

    Potential risks

    • If there's any existing error wrapping in task processing, the error handling logic may go wrong. But I searched for the two ways of wrapping errors (see the sketch below):
      • Unwrap() error method: no results in our code base.
      • %w: either not related to task processing, or the wrapped error is not one of the errors or error types we check for in the task executable.
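
    A generic sketch of the difference this change makes (the error value and messages are illustrative): a direct comparison misses wrapped errors, while errors.Is / errors.As unwrap them.

    package main

    import (
        "errors"
        "fmt"
    )

    var errTaskNotReady = errors.New("task not ready")

    func execute() error {
        // Somewhere in task execution the underlying error is wrapped with %w.
        return fmt.Errorf("executing transfer task: %w", errTaskNotReady)
    }

    func main() {
        err := execute()
        fmt.Println(err == errTaskNotReady)          // false: direct comparison misses the wrap
        fmt.Println(errors.Is(err, errTaskNotReady)) // true: errors.Is walks the wrap chain
    }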

    Is hotfix candidate?

    • No.
  • CRON scheduled workflow with ContinueAsNew

    CRON scheduled workflow with ContinueAsNew

    Expected Behavior

    Hi, I'm trying to use a CRON-scheduled workflow together with the Continue-As-New feature. When I trigger a CRON workflow and, inside it, trigger Continue-As-New a few times, I expect it to schedule a new CRON run after the workflow is finished.

    According to https://github.com/temporalio/temporal/issues/2146 this should work, but it does not for me.
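
    For reference, the pattern being exercised, sketched with the Go SDK rather than the PHP SDK used in the repro; the package and workflow body are illustrative. The run continues-as-new a couple of times, and once a run returns without continuing, the server is expected to schedule the next CRON run.

    package periodic

    import "go.temporal.io/sdk/workflow"

    // PeriodicWorkflow is started with StartWorkflowOptions{CronSchedule: "* * * * *"}.
    // It continues-as-new twice within a cron run and then returns, at which point
    // the next cron run should be scheduled.
    func PeriodicWorkflow(ctx workflow.Context, iteration int) error {
        // ... one slice of work per run would go here ...
        if iteration < 2 {
            return workflow.NewContinueAsNewError(ctx, PeriodicWorkflow, iteration+1)
        }
        return nil
    }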

    Actual Behavior

    The workflow just ends and no new CRON run is scheduled

    Steps to Reproduce the Problem

    1. Clone https://github.com/benkelukas/temporal-cron-continue-as-new
    2. Run docker-compose build
    3. Run docker-compose up
    4. Run docker-compose exec app php app.php periodic:start
    5. Wait for 2 Continue-As-New triggers
    6. Observe Workflow being completed and no new CRON run scheduled

    Specifications

    • Version: 1.19.0
    • Platform: MacOS