Show HN: DBOS Java – Postgres-Backed Durable Workflows

github.com

110 points by KraftyOne 2 days ago

Hi HN - I’m Peter, here with Harry (devhawk), and we’re building DBOS Java, an open-source Java library for durable workflows, backed by Postgres.

https://github.com/dbos-inc/dbos-transact-java

Essentially, DBOS helps you write long-lived, reliable code that can survive failures, restarts, and crashes without losing state or duplicating work. As your workflows run, it checkpoints each step they take in a Postgres database. When a process stops (fails, restarts, or crashes), your program can recover from those checkpoints to restore its exact state and continue from where it left off, as if nothing happened.

In practice, this makes it easier to build reliable systems for use cases like AI agents, payments, data synchronization, or anything that takes hours, days, or weeks to complete. Rather than bolting on ad-hoc retry logic and database checkpoints, durable workflows give you one consistent model for ensuring your programs can recover from any failure from exactly where they left off.

This library contains all you need to add durable workflows to your program: there's no separate service or orchestrator or any external dependencies except Postgres. Because it's just a library, you can incrementally add it to your projects, and it works out of the box with frameworks like Spring. And because it's built on Postgres, it natively supports all the tooling you're familiar with (backups, GUIs, CLI tools) and works with any Postgres provider.
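
Conceptually, a workflow is just a method whose steps are checkpointed as they complete. Here's a rough sketch of the shape it takes -- treat the annotation names below as placeholders and see the quickstart for the real API:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Placeholder annotations standing in for the library's real ones.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Workflow {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Step {}

    public class OrderPipeline {

        // Each step's output is checkpointed in Postgres when it completes.
        @Step
        String chargePayment(String orderId) {
            // ... call the payment provider ...
            return "payment-for-" + orderId;
        }

        @Step
        void sendConfirmation(String orderId, String paymentId) {
            // ... send the confirmation email ...
        }

        // On a crash or restart, the workflow re-runs: steps that already have
        // checkpoints are skipped (their saved results are reused) and execution
        // continues from the first step that never finished.
        @Workflow
        public void processOrder(String orderId) {
            String paymentId = chargePayment(orderId);
            sendConfirmation(orderId, paymentId);
        }
    }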

If you want to try it out, check out the quickstart:

https://docs.dbos.dev/quickstart?language=java

We'd love to hear what you think! We’ll be in the comments for the rest of the day to answer any questions.

reecardo 2 days ago

We are increasingly using Temporal with Temporal Cloud, and soon Nexus, to manage numerous workflows. I'm curious what kind of observability is available for DBOS and how much of it you get for "free". The reason we ended up on Temporal was not that previous job systems were unreliable; it was simply that nobody wanted to go dig through a database to find out what happened with their job, and nobody had the time/energy to build a UI just for that purpose.

  • KraftyOne 2 days ago

    There's an observability and workflow management UI: https://docs.dbos.dev/java/tutorials/workflow-management

    You can view your workflows and queues, search/filter them by any number of criteria, visualize graphs of workflow steps, cancel workflows, resume workflows, restart workflows from a specific step--everything you'd want.

    Currently, this is available as a managed offering (Conductor - https://docs.dbos.dev/production/self-hosting/conductor), but we're also releasing a self-hostable version of it soon.

    • ChromaticPanic 2 days ago

      Recently added DBOS Python to one of my FOSS projects. It's awesome, and I'm looking forward to seeing what that self-hostable observability will look like. So far I've been looking at the DB and OTEL logs to debug during development.

    • purrcat259 2 days ago

      I was super interested in DBOS but I had to back out when I found out that the observability isn't self-hostable yet :(, so I'm chuffed to hear it's coming!

      What's the best way to hear about it when it lands? Maybe a newsletter I can register for or something?

exabrial 2 days ago

The Java version looks pretty danged cool, paging through some of the code.

I was trying to think of a use case for this and was reminded of a Sun Microsystems demo I saw (yes, I'm that old). They paused a JVM and "slept" it, then spun it back up on another machine almost instantly, all over the network. It was a pretty cool party trick, but then they did it when an HTTP request came in (serverless was invented a LONG time ago!). I kinda wonder if this could be used for that?

  • KraftyOne 2 days ago

    Haha yes, one thing you can use this for is "long waits" or "long sleeps" where a program waits hours or days or weeks for a notification (potentially through server restarts, etc) then wakes up as soon as the notification arrives or a timeout is reached. More info in the docs: https://docs.dbos.dev/java/tutorials/workflow-communication
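
    As a rough sketch of the shape this takes (the DurableContext interface below is a placeholder standing in for the library's actual sleep/receive calls, which are covered in the docs linked above):

        import java.time.Duration;
        import java.util.Optional;

        public class LongWait {

            // Placeholder for the library's durable receive/sleep primitives;
            // the workflow-communication docs above describe the real API.
            interface DurableContext {
                // Block until a message arrives on the topic or the timeout elapses.
                Optional<String> recv(String topic, Duration timeout);
            }

            // Wait up to 30 days for an approval message. The wait itself is
            // checkpointed, so if the process restarts on day 12, recovery resumes
            // the remaining wait instead of starting over.
            public void awaitApproval(DurableContext ctx, String requestId) {
                Optional<String> approval = ctx.recv("approval:" + requestId, Duration.ofDays(30));
                if (approval.isEmpty()) {
                    // ... timed out: escalate or cancel the request ...
                }
            }
        }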

  • jedberg 2 days ago

    > I was reminded of a Sun Microsystems demo (yes I'm that old)

    I wonder if my Solaris 7 cert is still good...

ibgeek 2 days ago

I really wish you guys would change the name since the product has moved so far away from the goals and concepts in the original publication. :). I love the product and what you are doing -- it's definitely needed and valuable.

  • jedberg 2 days ago

    We’ve considered it, but at this point we’re kind of stuck. We’d have to rebuild our brand from scratch. But also, a name almost never makes or breaks a company. :)

    Also we have a backronym now: Durable Backends, Observable and Simple.

hazn a day ago

I remember reading that restate.dev uses a 'push'-based workflow model and therefore works well with serverless workflows: https://news.ycombinator.com/item?id=40660568

What's your take on these two topics, i.e. pull vs. push, and working well with serverless workflows?

  • KraftyOne a day ago

    The Restate model depends on a long-running external orchestrator to do the "pushing". However, that comes with downsides--you have to operate that orchestrator and its data store in production (and it's a single point of failure) and you have to rearchitect your application around it.

    DBOS implements a simpler library-based architecture where each of your processes independently "pulls" from a database queue. To make this work in a serverless setting, we recommend using a cron job to periodically launch serverless workers that run for as long as there are workflows in the queue. If a worker times out, the next one will automatically resume its workflows from their last completed step. This GitHub discussion has more details: https://github.com/dbos-inc/dbos-transact-ts/issues/1115

ivanr 2 days ago

Hello Peter. Thank you for your work. I really like this approach. I too have been following Temporal and I like it, but I don't think it's a good match for simpler systems.

I've been reading the DBOS Java documentation and have some questions, if you don't mind:

- Versioning; from the looks of it, it's either automatically derived from the source code (presumably bytecode?), or explicitly set for the entire application? This would be too coarse. I don't see auto-configuration working, as a small bug fix would invalidate the version. (Could be less of a problem if you're looking only at method signatures... perhaps add to the documentation?) Similarly, for the explicit setting, a change to the version number would invalidate all existing workflows, which would be cumbersome.

Have you considered relying on serialVersionUID? Or at least allowing explicit configuration on a per workflow basis? Or a fallback method to be invoked if the class signatures change in a backward incompatible way?

Overall, it looks like DBOS is fairly easy to pick up, but having a good story for workflow evolution is going to be essential for adoption. For example, if I have long-running workflows... do I have to keep the old code running until all old instances complete? Is that the idea?

- Transactional use; would it be possible to create a new workflow instance transactionally? If I am using the same database for DBOS and my application, I'd expect my app to do some work and create some jobs for later, reusing my transaction. Similarly, maybe when the jobs are running, I'd perhaps want to use the same transaction? As in, the work is done and then DBOS commits?

I know using the same transaction for both purposes could be tricky. I have, in fact, used two separate transactions for job handling in the past. Some guidance in the documentation would be very useful to have.

Thank you.

  • KraftyOne a day ago

    Thanks for the great questions!

    1. Yes, currently versioning is either automatically derived from a bytecode hash or manually set. The intended upgrade mechanism is blue-green deployments where old workflows drain on old code versions while new workflows start on new code versions (you can also manually fork workflows to new code versions if they're compatible). Docs: https://docs.dbos.dev/java/tutorials/workflow-tutorial#workf...

    That said, we're working on improvements here--we want it to be easier to upgrade your workflows without blue-green deployments to simplify operating really long-running (weeks-months) workflows.

    2. Could I ask what the intended use case is? DBOS workflow creation is idempotent and workflows always recover from the last completed step, so a workflow that goes "database transaction" -> "enqueue child workflow" offers the same atomicity guarantees as a workflow that does "enqueue child workflow as part of database transaction", but the former is (as you say) much simpler to implement.
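
    In sketch form, the pattern is just two steps inside one workflow (the annotation and method names here are placeholders, not the library's actual API):

        public class RequestHandler {

            // Placeholder annotations standing in for the real library API.
            @interface Workflow {}
            @interface Step {}

            @Step
            void writeApplicationData(String requestId) {
                // Run your database transaction against the application DB and commit.
            }

            @Step
            void enqueueChildWorkflow(String requestId) {
                // Enqueue the long-running child workflow.
            }

            // If the process dies after the transaction commits but before the child
            // is enqueued, recovery sees the checkpoint for writeApplicationData,
            // skips it, and runs enqueueChildWorkflow -- so the pair behaves
            // atomically without sharing a single database transaction.
            @Workflow
            public void handleRequest(String requestId) {
                writeApplicationData(requestId);
                enqueueChildWorkflow(requestId);
            }
        }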

    • ivanr a day ago

      > versioning

      Here's an example of a common long-running workflow: SaaS trials. Upon trial start, create a workflow to send the customer onboarding messages, possibly inspecting the account state to influence what is sent, and finally close the account if it hasn't converted. This will usually take 14-30 days, but could go on for months if manually extended (as many organisations are very slow to move).

      I think an "escape hatch" would be useful here. On version mismatch, invoke a special function that is given access to the workflow state and let it update the state to align it with the most recent code version.

      > transactional enqueuing

      A workflow that goes "database transaction" -> "enqueue child workflow" is not safe if the connection with DBOS is lost before the second step completes safely. This would make DBOS unreliable. Doing it the other way round can work provided each workflow checks that the "object" to which it's connected exists. That would deal with the situation where a workflow is created, but the transaction rolls back.

      If both the app and DBOS work off the same database connection, you can offer exactly-once guarantees. Otherwise, the guarantees will depend on who commits first.

      Personally, I would prefer all work to be done from the same connection so that I can have the benefit of transactions. To me, that's the main draw of DBOS :)

      • KraftyOne a day ago

        > versioning

        Yes, something like that is needed; we're working on building a good interface for it.

        > transactional enqueueing

        But it is safe as long as it's done inside a DBOS workflow. If the connection is lost (process crashes, etc.) after the transaction but before the child is enqueued, the workflow will recover from the last completed step (the transaction) and then proceed to enqueue the child. That's part of the power of workflows--they provide atomicity across transactions.

        • ivanr a day ago

          > > transactional enqueueing

          > But it is safe as long as it's done inside a DBOS workflow.

          Yes, but I was talking about the point at which a new workflow is created. If my transaction completes but DBOS disappears before the necessary workflows are created, I'll have a problem.

          Taking my trial example, the onboarding workflow won't have been created and then perhaps the account will continue to run indefinitely, free of charge to the user.

          • KraftyOne a day ago

            Looking at the trial example, the way I would solve it is that when the user clicks "start trial", that starts a small synchronous workflow that first creates a database entry, then enqueues the onboarding workflow, then returns. DBOS workflows are fast enough to use interactively for critical tasks like this.

rileymichael 2 days ago

glad to see the java sdk released, i've been following it for a while.

one of the rough edges i've noticed w/DBOS is workflows that span multiple services. all of the examples are contained in a single application and thus use a single dbos 'system db' instance. if you have multiple services (as you often do in the real world) that need to participate in a workflow, you really can't. you need to break them into multiple workflows and enqueue them in each service by creating an instance of the dbos client pointed at the other service's system db. aside from the obvious overhead of fragmenting a workflow into multiple workflows (and that you have to push to the service instead of a worker pulling the step), it means that every service needs to be aware of, and have access to, every other service's system db. also worth noting that sharing a single system db between services was not advised when i asked.

(docs for the above: https://docs.dbos.dev/architecture#using-dbos-in-a-distribut...)

  • jedberg 2 days ago

    The pattern I would recommend in such cases is having one service be responsible for the overall workflow and then call the other services as steps.

    So if you were for example running a website and wanted to have a "cancellation" flow, you'd have the cancellation service with the workflow inside of it, which would have all the steps defined, like

    1) disable user account

    2) mark user data as archived

    3) cancel recurring payments

    And then each step would call the service that actually does that work, using an idempotency key. Each service might have its own internal workflows to accomplish each task. In this case, step 1 would call the accounts service, step 2 would call the storage service, and step 3 would call the payment service.

    But then you have a clean reusable interface in each service, as well as a single service responsible for the workflow.
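
    A rough sketch of that shape (the annotations and service-client interfaces below are placeholders, not the actual library API):

        public class CancellationWorkflow {

            // Placeholder annotations and service clients standing in for the real
            // library API and your own RPC clients.
            @interface Workflow {}
            @interface Step {}

            interface AccountsClient { void disableAccount(String userId, String idempotencyKey); }
            interface StorageClient  { void archiveUserData(String userId, String idempotencyKey); }
            interface PaymentsClient { void cancelRecurring(String userId, String idempotencyKey); }

            private final AccountsClient accounts;
            private final StorageClient storage;
            private final PaymentsClient payments;

            public CancellationWorkflow(AccountsClient accounts, StorageClient storage, PaymentsClient payments) {
                this.accounts = accounts;
                this.storage = storage;
                this.payments = payments;
            }

            @Step void disableAccount(String userId, String key) { accounts.disableAccount(userId, key); }
            @Step void archiveData(String userId, String key)    { storage.archiveUserData(userId, key); }
            @Step void cancelPayments(String userId, String key) { payments.cancelRecurring(userId, key); }

            // One service owns the overall workflow; each checkpointed step calls
            // another service with an idempotency key, so a retried step doesn't
            // repeat work that already completed downstream.
            @Workflow
            public void cancelUser(String userId, String idempotencyKey) {
                disableAccount(userId, idempotencyKey);
                archiveData(userId, idempotencyKey);
                cancelPayments(userId, idempotencyKey);
            }
        }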

    • rileymichael 2 days ago

      my original comment wasn't clear, but that's effectively what i settled on: launching workflows (within steps) via the dbos client. keeping that an implementation detail in each service is probably better though, and it solves the db awareness issue; just need to do the endpoint/rpc plumbing. thanks!

  • KraftyOne 2 days ago

    Thanks for the great feedback! Yeah, for isolation we recommend each service have its own system database and communicate via clients (so service A starts a workflow in service B by creating a client and calling "client.enqueue").

    How could we make this experience better while keeping DBOS a simple library? One improvement that comes to mind is to add an "application name" field to the workflows table so that multiple applications could share a system database. Then one application could directly enqueue a workflow to another application by specifying its name, and workflow observability tooling would work cross-application.

    • rileymichael 2 days ago

      yeah, i think that's a step in the right direction, but ultimately, as long as the workflow executor needs to know 'who runs this step', there will always be some friction compared to other systems like temporal.

layer8 2 days ago

It’s not clear what part of the functionality is specific to Postgres, and why. In particular in the Java world, you would expect any JDBC backend to be able to do the job.

  • KraftyOne 2 days ago

    We actually wrote a blog post recently on why we chose Postgres! https://www.dbos.dev/blog/why-postgres-durable-execution

    There's no technical reason why this couldn't be done with another database, and we may add support for more in the future (DBOS Python already supports SQLite), but we're not working on it right now.

prasadaditya 2 days ago

Hey! Not a DBOS Java question, but I stumbled on this and am looking into it for the Python client. Wondering what the support looks like for integration with gevent?

  • KraftyOne 2 days ago

    DBOS Python works with gevent out of the box (sync DBOS uses the Python threading APIs and psycopg3, which gevent monkeypatches).

    Have you run into any issues using DBOS Python with gevent? Please let us know!

  • stankygenki 2 days ago

    Upvote on this! Currently building out a project using Flask and gevent, and would love to use DBOS Python with it.

    • jedberg 2 days ago

      See the sibling comment, it should work out of the box!

xnzm1001 2 days ago

I like the DBOS architecture much more than Temporal's since it's much easier to operate with a small team.

But DBOS doesn't have an open-source release of the web UI, which is the most critical part of a workflow management tool.

Since the competitor is fully open source, I don't think DBOS will have a chance.

  • jscheel 2 days ago

    We are a happy user of DBOS. I’ve been building out a lightweight TUI for managing our DBOS application internally, since we have workflows with tens of thousands of steps. I know the team is working on improving conductor for this use-case, but our internal TUI handles it pretty well for now. Hoping to open source it, but the code is a wreck atm. Anyways, I say all this to say, DBOS has a good client that can communicate with the instance easily, so building out a UI that fits your specific needs should be fairly simple.

  • KraftyOne 2 days ago

    We'll be releasing self-hostable Conductor (the web UI) soon--stay tuned!

    • xnzm1001 2 days ago

      If that comes, it will be truly great.

    • mazeez a day ago

      looking forward to it!

rfonseca 2 days ago

Peter, how does this compare to Azure Durable Functions? (Say, for the sake of argument, that you are comparing the Python version of both) Are there things that fundamentally you can do in one and not in the other?

  • KraftyOne 2 days ago

    The main difference is that this is a library you can install and use in any application anywhere, while Durable Functions is (as I understand it) primarily for orchestrating serverless functions in Azure.

    • rfonseca 2 days ago

      (disclaimer: I work at Microsoft, but am not directly involved with Durable Functions)

      Being a library is a pretty interesting feature! Correct, Durable Functions allows you to write task-parallel orchestrations of task-parallel 'activities' (which are stateless functions), and these orchestrations are fully persistent and resilient, like DBOS executions. It also has the concept of 'Entities', which are named objects (of a type you define) that "live forever", and serialize all method invocations, which are the only way to change their private state. These are also persistent. The Netherite paper [1], section 2, describes this model well.

      So, there seems to be a pretty close correspondence between DBOS steps and DF activities, and between workflows and orchestrations. I don't know what the correspondence to DF entities is in the DBOS model.

      [1] https://www.microsoft.com/en-us/research/wp-content/uploads/...

      • KraftyOne a day ago

        Yes, I agree the correspondence is close; the primary difference is in form factor, not in the underlying guarantees (but form factor matters! Building this as a library was technically tricky, but it unlocks a lot of use cases). Reading the Orleans and early Durable Functions papers in grad school (and many of your papers) was definitely helpful in our journey.

nogridbag 2 days ago

Are there any plans for supporting other databases? Our company primarily uses and has experience managing MySQL deployments. I evaluated Temporal some time ago as it seemed like a good fit for what we're building so I'm watching this closely. Thanks!

  • KraftyOne 2 days ago

    Our primary focus is Postgres. DBOS Python recently added SQLite support, and we'll add it to the other languages if it proves popular, but there are no current plans for MySQL.

    That said, while DBOS requires Postgres for its own checkpoints, it can be (and often is) used alongside other databases like MySQL for application data.

  • qianli_cs 2 days ago

    To build on what Peter said, we need to stay focused and make one backend solid before expanding. But architecturally, nothing prevents us from supporting more engines going forward.

lukaszkorecki 2 days ago

Looks great; shame that, due to the annotation-based API, it's gonna be a pain to use in Clojure.

  • KraftyOne 2 days ago

    One thing we're looking at right now is what it would take to support Clojure or Kotlin.

    • jillesvangurp a day ago

      A fully declarative approach is the way to go for both, IMHO. I'm more familiar with Kotlin than Clojure, but its coroutines framework and structured concurrency are well aligned with something like this. Essentially, what you are doing is a kind of stateful structured concurrency, where state is preserved in a DB and is resilient against sub-task failures, nodes dying, etc.

      This reminds me of a product called Restate. I talked to some people at that company a while ago. Their solution is built in Rust, I think, but they have clients for all sorts of platforms, including Kotlin. Cool company, and distributed workflow engines feel like a good match for agentic / long-running workflows.

      There are lots of other solutions in this space. I believe an ex-Red Hat person is working on rebooting a workflow engine called Kogito, based on something that originally lived under their umbrella.

      There's a long history of very enterprisey business process management stuff here. Lots of potential for overengineered solutions.

      I once got sucked into a Spring Batch-centric project and it was hopelessly overengineered for the requirements. Gave me a proper headache. Nothing was simple. Everything was littered with magic annotations causing all sorts of weird side effects. That's why I prefer declarative approaches with simple functions, which is what Kotlin's syntax enables relative to Java. You can do the same technically in Java, but it quickly becomes an unreadable mess of function chaining.

rtcoms 14 hours ago

Do you have any plans to support Ruby?

  • qianli_cs 4 hours ago

    Yeah, in the long term we plan to support more languages, but we aren't actively building new ones right now.

stevefan1999 2 days ago

Thanks Peter, but I wonder: will there be .NET support? Temporal includes one, which is a big selling point for us .NET developers now that MassTransit has gone commercial.

  • KraftyOne 2 days ago

    .NET support is something we're considering, but aren't actively building right now.