-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: Add extra flags for Engine & CLI tests #7416
ci: Add extra flags for Engine & CLI tests #7416
Conversation
I'm not sure we should set failfast by default - this is very helpful when making large refactorings, and trying to observe all the genuine failures. It's annoying in the case of flakes, or going for a fully green run, but getting all the failures is genuinely useful a lot of the time in my experience. |
// Default timeout to 30m | ||
// No test suite should take more than 30 minutes to run | ||
if timeout == "" { | ||
timeout = "30m" | ||
} | ||
args = append(args, fmt.Sprintf("-timeout=%s", timeout)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use // +default
for this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did this, we would have the default in 3 places:
All()
Important()
Custom()
Since the race
flag uses the same approach - default in code rather than // +default
- I think that we should leave this as is. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, alternatively, we should refactor all of these into one function - then which tests to run should probably be flags on in.
Not a blocker though, happy as is.
Makes sense. Changing the default to |
`--parallel` now defaults to the number of CPUs. Before, this was hard-coded to `16` which was set the number of CPUs in the larger GitHub Runners, the `ubuntu-22.04-16c-64g-600gb` instance size specifically. If this new default value does not work well on the larger CI runners (they have 48CPUs & we run multiple pipelines on them at the same time), we can specify this value explicitly in the workflows. `--timeout` defaults to `30m`. This was changed not that long ago - the value was set to 1h when using race. The bottom line is that none of our test suites should take more than 10 minutes. This is something that we are actively working towards. For now, until the pipeline becomes more reliable, we will stick with the current 30m default. When running locally with fewer CPUs, or when hunting deadlocks, experience recommends a value of 180m. `--failfast` is a new one. It sets `--max-fails=1` in `gotestsum`. FTR: https://github.com/gotestyourself/gotestsum/blob/e9677fb405a5c0dc0b9d7217a7c73c3a20c5cb66/cmd/testdata/gotestsum-help-text#L20 Signed-off-by: Gerhard Lazu <gerhard@dagger.io>
2aabe60
to
bc6ad3e
Compare
If it defaults to 64 CPUs, as it did here, it's more likely to fail: https://dagger.cloud/dagger/traces/8a089c6b4c74959d5a3aef1d8aeded74#b2d066943ad6dbc4:L1051 Signed-off-by: Gerhard Lazu <gerhard@dagger.io>
// Default timeout to 30m | ||
// No test suite should take more than 30 minutes to run | ||
if timeout == "" { | ||
timeout = "30m" | ||
} | ||
args = append(args, fmt.Sprintf("-timeout=%s", timeout)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, alternatively, we should refactor all of these into one function - then which tests to run should probably be flags on in.
Not a blocker though, happy as is.
@@ -25,7 +25,7 @@ jobs: | |||
uses: ./.github/workflows/_dagger_call.yml | |||
secrets: inherit | |||
with: | |||
function: test all --race=true | |||
function: test all --race=true --parallel=16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different from NumCPU
s? I think we can leave this unset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are explicitly setting this to 16
in CI.
The new CI runners usually have 48 and even 64 CPUs which makes tests fail. Before setting this, checks failed.
* ci: Add extra flags for Engine & CLI tests `--parallel` now defaults to the number of CPUs. Before, this was hard-coded to `16` which was set the number of CPUs in the larger GitHub Runners, the `ubuntu-22.04-16c-64g-600gb` instance size specifically. We still want `16` in CI - we set this explicitly in the GitHub workflows - but are likely to want less locally where some of us only have `8` or `10` CPUs. `--timeout` defaults to `30m`. This was changed not that long ago - the value was set to 1h when using race. The bottom line is that none of our test suites should take more than 10 minutes. This is something that we are actively working towards. For now, until the pipeline becomes more reliable, we will stick with the current 30m default. When running locally with fewer CPUs, or when hunting deadlocks, experience recommends a value of 180m. `--failfast` is a new one. It sets `--max-fails=1` in `gotestsum`. FTR: https://github.com/gotestyourself/gotestsum/blob/e9677fb405a5c0dc0b9d7217a7c73c3a20c5cb66/cmd/testdata/gotestsum-help-text#L20 Signed-off-by: Gerhard Lazu <gerhard@dagger.io> * ci: Default Engine & CLI test parallel to 16 in CI If it defaults to 64 CPUs, as it did here, it's more likely to fail: https://dagger.cloud/dagger/traces/8a089c6b4c74959d5a3aef1d8aeded74#b2d066943ad6dbc4:L1051 Signed-off-by: Gerhard Lazu <gerhard@dagger.io> --------- Signed-off-by: Gerhard Lazu <gerhard@dagger.io>
* ci: Add extra flags for Engine & CLI tests `--parallel` now defaults to the number of CPUs. Before, this was hard-coded to `16` which was set the number of CPUs in the larger GitHub Runners, the `ubuntu-22.04-16c-64g-600gb` instance size specifically. We still want `16` in CI - we set this explicitly in the GitHub workflows - but are likely to want less locally where some of us only have `8` or `10` CPUs. `--timeout` defaults to `30m`. This was changed not that long ago - the value was set to 1h when using race. The bottom line is that none of our test suites should take more than 10 minutes. This is something that we are actively working towards. For now, until the pipeline becomes more reliable, we will stick with the current 30m default. When running locally with fewer CPUs, or when hunting deadlocks, experience recommends a value of 180m. `--failfast` is a new one. It sets `--max-fails=1` in `gotestsum`. FTR: https://github.com/gotestyourself/gotestsum/blob/e9677fb405a5c0dc0b9d7217a7c73c3a20c5cb66/cmd/testdata/gotestsum-help-text#L20 Signed-off-by: Gerhard Lazu <gerhard@dagger.io> * ci: Default Engine & CLI test parallel to 16 in CI If it defaults to 64 CPUs, as it did here, it's more likely to fail: https://dagger.cloud/dagger/traces/8a089c6b4c74959d5a3aef1d8aeded74#b2d066943ad6dbc4:L1051 Signed-off-by: Gerhard Lazu <gerhard@dagger.io> --------- Signed-off-by: Gerhard Lazu <gerhard@dagger.io> Signed-off-by: Vikram Vaswani <vikram@dagger.io>
--parallel
now defaults to the number of CPUs. Before, this was hard-coded to16
which was set the number of CPUs in the larger GitHub Runners, theubuntu-22.04-16c-64g-600gb
instance size specifically. We still want16
in CI - we set this explicitly in the GitHub workflows - but are likely to want less locally where some of us only have8
or10
CPUs.--timeout
defaults to30m
. This was changed not that long ago - the value was set to1h
when usingrace
. The bottom line is that none of our test suites should take more than 10 minutes. This is something that we are actively working towards. For now, until the pipeline becomes more reliable, we will stick with the current30m
default. When running locally with fewer CPUs, or when hunting deadlocks, experience recommends a value of 180m.--failfast
is a new one. It sets--max-fails=1
ingotestsum
. FTR:https://github.com/gotestyourself/gotestsum/blob/e9677fb405a5c0dc0b9d7217a7c73c3a20c5cb66/cmd/testdata/gotestsum-help-text#L20
Part of:
dagger call --source=.:default test all
#7387 (comment)FWIW, I am using the following shell function with this change: