I made the first version of this back in 2010, when Pearl's work on causal inference started impacting Epidemiology. A friend was an Epidemiologist and she told me about an MS-DOS program she was using to do something with graphs (https://pubmed.ncbi.nlm.nih.gov/20010223/); she found it painfully slow and wondered if I could "make it more user-friendly".

I was doing my PhD in algorithms at the time and was intrigued when I started reading Greenland, Pearl, and Robins (https://pubmed.ncbi.nlm.nih.gov/9888278/) and then Pearl's "Causality". I soon found that it was not at all obvious how you could speed up that MS-DOS program, and it led to a paper at UAI in 2011 (https://arxiv.org/abs/1202.3764). I made dagitty as a demonstration that you could actually use the algorithms we developed in that paper, and it took off from there -- it started with 10 users per day, growing to the hundreds and thousands as causal inference became more popular.

It's now a bit dated, and I don't have as much time anymore to keep it "fresh" as I would like. But I am still grateful and amazed at how many people I got to know due to this. Highlights included collaborating with Pearl himself on a solution manual for his book "Causal Inference: A Primer" when it first came out, and so many e-mails I got out of the blue from users all over the world. Just last summer I stayed at the house of the author of one of the built-in examples in dagitty.

As these 14 years flew by, I am now happy to play a small part in supporting the next generation of causal inference software -- if you're interested in causal inference, be sure to check out pgmpy.org, a Python library for Bayesian networks that includes several causal inference functions (https://arxiv.org/abs/2304.08639). Ankur, the author, did his PhD with me and will soon defend his thesis!

Also, R users, be sure to check out ggdag, a great package by Malcolm Barrett that wraps dagitty functionality in a much nicer and tidyverse-compatible way.

Nice to see this still going! We used dagitty in a grad school stats class back in 2013. To the instructor's credit, we spent the first few weeks thinking about causal models before we got into any actual stats. (Put differently, a DAG is a nonparametric structural equation model [0], and the rest of the stats class was about different ways to parametrize those models.)
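
To make the "DAG as a nonparametric structural equation model" point concrete, here is a minimal sketch using only the Python standard library. The three-node graph, the coefficients, and the names `parents`, `linear`, and `sample` are all made up for illustration. The DAG only fixes which parents each variable may depend on; the dictionary of functions is one possible (linear) parametrization among many.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG: Z -> X, Z -> Y, X -> Y, written as node -> list of parents.
parents = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}

# One possible parametrization (linear with additive noise). The DAG itself
# only says that each node is *some* function of its parents and its own noise.
linear = {
    "Z": lambda pa, u: u,
    "X": lambda pa, u: 0.8 * pa["Z"] + u,
    "Y": lambda pa, u: 0.5 * pa["X"] - 0.3 * pa["Z"] + u,
}

def sample(parents, functions, rng):
    """Draw one joint sample by evaluating nodes in topological order."""
    values = {}
    for node in TopologicalSorter(parents).static_order():
        pa = {p: values[p] for p in parents[node]}
        noise = rng.gauss(0, 1)  # independent noise term for this node
        values[node] = functions[node](pa, noise)
    return values

row = sample(parents, linear, random.Random(0))
```

Swapping `linear` for any other set of functions with the same parent sets gives a different model with the same DAG.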

I hate to ask this question.... but I've moved to a python shop after working in the tidyverse for years, and am unimpressed with the DAG visualization capabilities. Does anyone have any recommendations for 1,000 plus node DAGs?

I still miss R and tidy quite a bit, but polars at least gets closer.

Any of the python network science libraries can handle a 1000 node directed graph no problem.

Networkx visualizations are ugly out of the box but you can make the network look however you want. The best out of the box visualizations I think are a matter of taste and use case. Same with the layouts.

In a more abstract sense, I think it is hard to keep a 1000 node network visualization from being a useless hairball unless the network is quite sparse.
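
On layouts: for a DAG, one option is to place each node on a layer given by its depth, so that every edge points downward. The sketch below is plain standard-library Python, not the API of any network library, and the function names and the random test graph are made up for illustration. The resulting dict of node -> (x, y) is the kind of thing you can hand to a drawing routine that accepts explicit positions (for example the `pos` argument of networkx's `nx.draw`).

```python
import random
from collections import defaultdict, deque

def layer_dag(edges):
    """Map each node to the length of the longest path reaching it from a root."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    layer = {n: 0 for n in nodes if indegree[n] == 0}  # roots sit on layer 0
    queue = deque(layer)
    processed = 0
    while queue:  # Kahn's algorithm: a node is dequeued after all its parents
        node = queue.popleft()
        processed += 1
        for child in children[node]:
            layer[child] = max(layer.get(child, 0), layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if processed != len(nodes):
        raise ValueError("graph contains a cycle")
    return layer

def layered_positions(edges):
    """Return node -> (x, y) with one row per layer and roots at the top."""
    by_layer = defaultdict(list)
    for node, depth in layer_dag(edges).items():
        by_layer[depth].append(node)
    pos = {}
    for depth, members in by_layer.items():
        for x, node in enumerate(sorted(members)):
            pos[node] = (x, -depth)
    return pos

# Hypothetical sparse 1000-node DAG: every node j > 0 gets one earlier parent.
rng = random.Random(0)
edges = [(rng.randrange(j), j) for j in range(1, 1000)]
pos = layered_positions(edges)
```

This does nothing about edge crossings within a layer, so it helps mainly when the graph is sparse to begin with.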

If you mean with do-calculus though I really have no idea.

I work on a graph-based library and regularly generate DAGs for analysis and debugging. I have been using graphviz/dot but it's just so damn frustrating. You have to jump through hoops to get the layout right. It would be nice if something as ubiquitous as graphviz had a dedicated rendering engine for DAGs which did moderately sane things like place root and tail nodes on the same rank without requiring me to figure out which nodes those are and manually position them.
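
For what it's worth, the "figure out which nodes those are" step can be scripted. Below is a minimal sketch in plain Python that finds the roots (no incoming edges) and tails (no outgoing edges) from an edge list and writes DOT source that pins each group to one rank using Graphviz's `rank=min` and `rank=max` subgraph attribute. The function name and the example edges are made up for illustration, and it assumes node names contain no double quotes.

```python
def dot_with_ranked_ends(edges):
    """Return DOT source with all roots on the top rank and all tails on the bottom."""
    nodes, has_parent, has_child = set(), set(), set()
    for src, dst in edges:
        nodes.update((src, dst))
        has_child.add(src)
        has_parent.add(dst)
    roots = sorted(nodes - has_parent)  # nodes with no incoming edges
    tails = sorted(nodes - has_child)   # nodes with no outgoing edges

    def quoted(names):
        return "; ".join(f'"{name}"' for name in names)

    lines = ["digraph G {"]
    if roots:
        lines.append(f"  {{ rank=min; {quoted(roots)}; }}")
    if tails:
        lines.append(f"  {{ rank=max; {quoted(tails)}; }}")
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical example graph: two roots (a, b) and two tails (d, e).
dot_source = dot_with_ranked_ends(
    [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
)
```

The resulting string can be saved to a file and rendered with `dot` as usual.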

Very cool to see this here. Johannes Textor was my professor for Bayesian Networks and Causal Inference when I studied at the Radboud university in Nijmegen. He is an awesome and down to earth guy, and he was very happy about and open to feedback.

Do good ol' structural equation models count? Because I know quite a few colleagues doing research on patient experiences in healthcare, who do psychometric studies on patient-reported surveys of their experiences (patient-reported outcome measures).

[0] Pearl 2021: https://ftp.cs.ucla.edu/pub/stat_ser/r370.pdf

I would find a python port useful, as R is more of a special use case in my own workflows, but my use case shouldn't deter the authors.

Tails: Nodes with no dependents.

I will use it when I have a chance.

I really like the "How to" menu. I would recommend making it a little more prominent on first use, or pointing it out once so users know it's there.

Congrats!