-
-
Notifications
You must be signed in to change notification settings - Fork 28.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Queued script runs executing a command lead to a disallowed recursion #117745
Comments
Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration ( Code owner commandsCode owners of
(message by CodeOwnersMention) script documentation |
Hey there @Danielhiversen, @felipediel, @L-I-Am, @eifinger, mind taking a look at this issue as it has been labeled with an integration ( Code owner commandsCode owners of
(message by CodeOwnersMention) broadlink documentation |
The integration causing the issue should be: Scripts (documentation, issues) |
Hi! I've got the same problem. I use a template to control Modbus lights and covers. In these templates, for the action to open/close/turn on/off, I'm using a script that takes the Modbus register number as an argument. The script is in "queued" mode. Everything worked flawlessly until the 2024.03 or 04 update (I'm not sure which one). Using a single cover or light still works without any problems, but when it is run in an automation for more than one device, I get an error: "Disallowed recursion detected." Only the first cover/light is affected (e.g., the automation to close all covers closes only one). What is more suspicious is that when I try to close/open a group of covers or activate a light scene using the dedicated UI dialog, everything works. But placing the same group/scene in an automation fails with "Disallowed recursion detected." |
@bdraco is this issue possibly connected to the recent changes making script executions more efficient? |
The only recent change to the recursion detection code was in #116247. You could try reverting that PR, but that doesn't seem likely to be the cause since the effective change there was only to remove duplicate code. |
It looks like the code is nearly the same since it was first added in 65fbcfa so I'm not sure whats going on here |
If we take the sleep out, it will probably fix this issue diff --git a/homeassistant/helpers/script.py b/homeassistant/helpers/script.py
index c268a21758f..080cc3a1339 100644
@@ -1736,10 +1722,6 @@ class Script:
# runs before sleeping as otherwise if two runs are started at the exact
# same time they will cancel each other out.
self._log("Restarting")
- # Important: yield to the event loop to allow the script to start in case
- # the script is restarting itself so it ends up in the script stack and
- # the recursion check above will prevent the script from running.
- await asyncio.sleep(0)
await self.async_stop(update_state=False, spare=run)
if started_action:
diff --git a/tests/components/automation/test_init.py b/tests/components/automation/test_init.py
index edf0eff878b..9286740dea4 100644
--- a/tests/components/automation/test_init.py
+++ b/tests/components/automation/test_init.py
@@ -2733,6 +2733,7 @@ async def test_recursive_automation_starting_script(
{"platform": "event", "event_type": "trigger_automation"},
],
"action": [
+ {"delay": "0.1"},
{"service": "test.automation_started"},
{"service": "script.script1"},
],
@@ -2811,8 +2812,8 @@ async def test_recursive_automation(
{"platform": "event", "event_type": "trigger_automation"},
],
"action": [
- {"event": "trigger_automation"},
{"service": "test.automation_done"},
+ {"event": "trigger_automation"},
],
}
}, but than we will loose some more loop protection and these tests will fail
|
there is a race in the _QueuedScriptRun as well .. but not sure its related to this issue |
There was a race here. We can also use the lock as a context manager which simplifies the code quite a bit found while looking at #117745
I fixed that race in #118002 |
That race is very narrow so I didn't think it was the issue, but after looking at it again, I can't see what else it could be.
I had to close that PR as it will have other side effects. Best to wait for more information since I can't replicate the problem yet. |
Also, I can't replicate the problem, and since you haven't posted the script in question (or I've missed it somehow), I'm blocked from continuing to investigate until I can see the script and try to write a test for it. |
Here is the script.
I've made a bisect and found the first commit in which the bug seems to have been introduced:
|
That commit changed the timing to execute the script right away instead of delaying it until the next run of the event loop. Unfortunately that means we have a race somewhere in the script code that has gone unnoticed. I haven't been able to find it. |
I guess it could be in broadlink as well and that's why I can't replicate |
Thinking about this some more, before that change we would catch extra runs too late. It could mean there is a problem with locking. It's also possible you have an other automation that is unexpected calling the queued script at the same time. |
I think I'm going to have to get a broadlink device test with, since I spent an hour and a half poking at this and haven't been able to get it to happen |
Some more info to triangulate this issue:
|
Maybe we have a contextvar leak in the code somewhere.... This one is a bit of a stumper |
Hi! Yes, I have the same problem. I don't use broadlink integration. I get this error in a script that is using Modbus integration.
This scrip is used in a template:
When in automation I turn on/off more then one light (it does not matter if there is a group or a scene of lights or lights are added one by one) I get the error "Disallowed recursion detected". |
Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration ( Code owner commandsCode owners of
(message by CodeOwnersMention) automation documentation |
I'm having no luck finding the issue. We could revert bdede0e and it will fix it for this case, but it means we are sweeping the problem under the rug instead of fixing the root cause and it will probably come up again the next time we change something else. |
At this point I'm ~7+ hours into investigating this without a viable test case and I think I'm going to have to throw in the towel as I can't find the root cause. I think we need to wait for someone else to take a look who will have another perspective on how to locate the issue. Hopefully its something obvious to someone else and I've made a silly mistake writing the test case or I'm thinking about this the wrong way. |
Test is still successful on 2024.5.5 as well |
I've been working under the assumption this is a bug in the script engine, but maybe its not and it actually is the script calling itself indirectly. |
I have been investigating #117745 for a few hours and have not been able to replicate the problem or write a failing test case for it. The logging does not show the stack so it is difficult to figure out where the recursion is coming from.
#118151 will log the stack when this hit the disallowed recursion so we can tell if the script is getting called in a loop somehow or there is an actual bug in the script code or the problem is somewhere else |
I just found #117251 which has a smaller reproducer. I'll try to write a test with that |
I was able to write a test from the reproducer in that issue #118153 |
That is looking more likely based on #117251 (comment) I'm going to close this issue as a duplicate of #117251 since we have a reproducer in there. Please continue in that issue as its the same problem. |
The problem
I have several blinds that are controlled via a Broadlink RM4 pro remote. I'm sending base64 codes via script, as suggested in the documentation, and can control the blinds individually without issue.
I've used the template platform to define open, close and stop cover commands and grouped the blinds using the "group" platform (see yaml snippets below). I could then open, close and stop blind groups. The send_command script is executed in queued mode.
This worked fine until HA 2024.3 and is broken now in HA 2024.5. Only one blind in the group is following the command (see also error message in the logs reported below).
What version of Home Assistant Core has the issue?
core-2024.5.3
What was the last working version of Home Assistant Core?
core-2024.3
What type of installation are you running?
Home Assistant Supervised
Integration causing the issue
Broadlink
Link to integration documentation on our website
https://www.home-assistant.io/integrations/broadlink/
Diagnostics information
home-assistant_broadlink_2024-05-19T20-01-20.921Z.log
Example YAML snippet
Anything in the logs that might be useful for us?
Additional information
If I run the script in parallel mode, not all blinds seem to receive the command. I receive then the following error messages in the logs:
The text was updated successfully, but these errors were encountered: