[sumo-user] Unspecified Fatal Error and blank trace logging outputs

[sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
Hi everyone

I am running traffic light control experiments in the Bologna (joined) scenario, and from time to time I encounter an unspecified fatal error (shown below).

I'm trying to debug it, but:
- Logging the commands generates blank output files (even with traceGetters enabled):

        trace_file_path = ROOT_DIR + '/' + self.path_to_log + '/' + 'trace_file_log.txt'
        traci.start(sumo_cmd_str, label=self.execution_name, traceFile=trace_file_path, traceGetters=True)


A trivial trace (logging) example works fine, though (a minimal version of such a test is sketched after this list).

- Debugging the TraCI sessions interactively is not viable, since I cannot tell when the error is going to occur (I have to run the scenario 1,600 times in total per experiment).
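
For reference, the kind of trivial trace test that does work is roughly this (the config file name and the step count are placeholders, not my actual setup):

    import traci

    # standalone sanity check of traceFile/traceGetters; "trivial.sumocfg" is a placeholder config
    traci.start(["sumo", "-c", "trivial.sumocfg"],
                traceFile="trace_file_log.txt", traceGetters=True)
    for _ in range(100):
        traci.simulationStep()
        traci.vehicle.getIDList()  # a getter call, so traceGetters has something to record
    traci.close()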



I also updated SUMO to the latest nightly build, but without success.

Is there anything I can try? I'm out of options here.

Thank you in advance


Sincerely,

Marcelo d'Almeida


Error:
Process Process-1:22:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "traffic-light-optimization/algorithm/frap_pub/pipeline.py", line 104, in generator_wrapper
    generator.generate()
  File "traffic-light-optimization/algorithm/frap_pub/generator.py", line 121, in generate
    next_state, reward, done, steps_iterated, next_action = self.env.step(action_list)
  File "traffic-light-optimization/algorithm/frap_pub/sumo_env.py", line 514, in step
    self._inner_step(action)
  File "traffic-light-optimization/algorithm/frap_pub/sumo_env.py", line 559, in _inner_step
    traci_connection.simulationStep()
  File "sumo-git/tools/traci/connection.py", line 302, in simulationStep
    result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
  File "sumo-git/tools/traci/connection.py", line 180, in _sendCmd
    return self._sendExact()
  File "sumo-git/tools/traci/connection.py", line 90, in _sendExact
    raise FatalTraCIError("connection closed by SUMO")
traci.exceptions.FatalTraCIError: connection closed by SUMO

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Harald Schaefer-2

Hi Marcelo,

What you can try is to enable core dumps in your shell:

    ulimit -c unlimited

Then run your test series.

The core file might be very large, depending on your scenario size.

At the end you should have a file named core in your current working directory.

You can examine this file with

    gdb <path to sumo-bin> core

and then type, e.g., bt.

The stack trace might help the developers of SUMO.
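
Putting those steps together, the workflow is roughly as follows (run_experiments.py is just a placeholder for whatever triggers the crash, and the binary path should be the one you actually start):

    ulimit -c unlimited           # allow core dumps in this shell
    python run_experiments.py     # placeholder: the test series that eventually crashes SUMO
    gdb sumo-git/bin/sumo core    # open the core with the same binary that produced it
    (gdb) bt                      # print the backtrace at the crash site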

Greetings, Harald

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Jakob Erdmann
Could it be that multiple processes are writing to the same traceFile?
I recommend investigation on this front because reproducing the crash in isolation will probably be necessary to fix it.
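
One quick way to rule that out is to make the trace path unique per worker process, roughly like this (sketch only; ROOT_DIR, path_to_log, sumo_cmd_str and execution_name are the variables from the original snippet):

    import os
    from multiprocessing import current_process
    import traci

    # give every worker its own trace file so that two processes can never share one
    trace_file_path = os.path.join(ROOT_DIR, self.path_to_log,
                                   'trace_file_log.%s.txt' % current_process().name)
    traci.start(sumo_cmd_str, label=self.execution_name,
                traceFile=trace_file_path, traceGetters=True)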

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
"Could it be that multiple processes are writing to the same traceFile?
I recommend investigation on this front because reproducing the crash in isolation will probably be necessary to fix it."

Unfortunately, no. self.path_to_log points to each execution's own directory. I can check the files, and they are all empty.

"what you can try is to enable core dumps in your shell"

Thank you, I'm going to try this.


Sincerely,

Marcelo d'Almeida


Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
This is what I found:


Screenshot from 2021-03-02 14-05-24.png

(base) marcelo@Lenovo-Legion-5-15IMH05H:~/code/temp$ gdb ../sumo/bin/sumo core
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ../sumo/bin/sumo...
(No debugging symbols found in ../sumo/bin/sumo)
[New LWP 2143]
[New LWP 2144]
[New LWP 2145]
[New LWP 2147]
[New LWP 2146]
Core was generated by `sumo-git/bin/sumo -n /app/scenario/experimental/Bologna_small-0.29.0/joined/joi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055f07528f7a6 in ?? ()
[Current thread is 1 (LWP 2143)]
(gdb) bt
#0  0x000055f07528f7a6 in ?? ()
#1  0x3fde3c2e82e54800 in ?? ()
#2  0x4023ebf47ba9bb80 in ?? ()
#3  0x000055f075b70740 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb)


Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Jakob Erdmann
Unfortunately, this dump is not very helpful. I'm not sure why that is, because live gdb sessions of the release build usually include at least method names. You could try to build the debug version and trigger the crash with that.
Another suggestion would be to try and trigger the crash without the use of multiprocessing (and also to check whether this fixes traceFile generation).
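
A rough sketch of the debug-build steps (assuming a cmake-based checkout of the SUMO sources; adjust paths to your setup):

    cd sumo-git
    mkdir -p build/debug && cd build/debug
    cmake -DCMAKE_BUILD_TYPE=Debug ../..   # configure a debug build
    make -j"$(nproc)"                      # debug binaries are typically named with a "D" suffix, e.g. sumoD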

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Harald Schaefer-2

Hi Marcelo,

the name of the binary reported by gdb and the name you gave as an argument to gdb do not match:

You called gdb with ../sumo/bin/sumo

but the core was created by sumo-git/bin/sumo
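
You can check which executable a core belongs to before loading it, for example with:

    file core                     # the output names the executable that produced the core
    gdb sumo-git/bin/sumo core    # then load the core with exactly that binary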

Greetings, Harald

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
The problem was that neither the remote server nor my Docker image had gdb installed, so I pulled the core file to my computer.

I have now created a new image with the exact environment and gdb installed to test it. Inspecting sumo-git, I couldn't find any sumo binary, only sumoD.

Screenshot from 2021-03-02 15-00-52.png

I tried to run gdb with sumoD anyway, but it didn't make any difference (besides the warning and successfully reading the symbols).
Screenshot from 2021-03-02 14-59-41.png
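
For the record, making gdb part of the image only needs one extra package install during the image build (sketch, assuming a Debian/Ubuntu-based base image):

    apt-get update && apt-get install -y gdb    # e.g. as a RUN step in the Dockerfile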




Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
I think I forgot to undo the debug flag in the last build... be right back.

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
Now I received the following warning:

"warning: exec file is newer than core file."

I included gdb in the requirements, and the SUMO installation step came after that point...

I guess I need to retry from scratch with a new core file and a prepared environment.


Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Harald Schaefer-2

Hi Marcelo,

In your example, sumo runs with 5 threads (lightweight processes, LWPs); google for gdb and threads.

What is the output of info threads?

You can switch between the threads by typing

    thread n

You should go to the "right" thread and execute bt there.

I think you must ensure that the core file was generated by the same binary that you load into gdb.
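
For example, inside gdb (the thread number is just illustrative):

    (gdb) info threads           # list all threads of the dumped process
    (gdb) thread 2               # switch to thread number 2
    (gdb) bt                     # backtrace of the selected thread
    (gdb) thread apply all bt    # or: print backtraces for every thread at once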

Harald


Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida

Here it is


(gdb) thread apply all bt

Thread 5 (Thread 0x7fb486f1e700 (LWP 12551)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311b860) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311b778, cond=0x56197311b838) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311b838, mutex=0x56197311b778) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311b760) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb486f1e700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fb47ef1c700 (LWP 12553)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311be00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311bd18, cond=0x56197311bdd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311bdd8, mutex=0x56197311bd18) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311bd00) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb47ef1c700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fb48af1f700 (LWP 12550)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197301a230) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197301a148, cond=0x56197301a208) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197301a208, mutex=0x56197301a148) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197301a130) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb48af1f700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fb482f1d700 (LWP 12552)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311bb30) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311ba48, cond=0x56197311bb08) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311bb08, mutex=0x56197311ba48) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311ba30) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb482f1d700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fb4974cd780 (LWP 12544)):
#0  0x0000561970425dcc in MSVehicle::getBoundingBox (this=0x0) at /app/sumo-git/src/microsim/MSVehicle.cpp:5925
#1  0x00005619704c23f5 in MSLane::detectCollisions (this=0x561972d88020, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSLane.cpp:1358
#2  0x00005619704a44af in MSEdgeControl::detectCollisions (this=0x561973113e10, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSEdgeControl.cpp:339
#3  0x000056197037d531 in MSNet::simulationStep (this=0x561972b2fd20) at /app/sumo-git/src/microsim/MSNet.cpp:636
#4  0x000056197037a18b in MSNet::simulate (this=0x561972b2fd20, start=0, stop=-1000) at /app/sumo-git/src/microsim/MSNet.cpp:378
#5  0x0000561970376ab8 in main (argc=31, argv=0x7ffcfc014bb8) at /app/sumo-git/src/sumo_main.cpp:98
(gdb)




Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Jakob Erdmann
Parallelization is a typical source of bugs that are triggered only rarely and seemingly at random. Please try running without setting any parallelization options and check whether the issue persists.
What sumo options were you using?
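For reference, a minimal single-process check (no multiprocessing on the Python side, no extra sumo threads) could look roughly like the sketch below; the file names and the number of steps are placeholders, and --device.rerouting.threads is left at 0 so that no extra rerouting threads are started:

    # single-process sketch: drive the scenario directly, without multiprocessing
    import traci

    sumo_cmd = [
        "sumo",
        "-n", "joined_buslanes.net.xml",               # network file (placeholder path)
        "-r", "joined.rou.xml,joined_busses.rou.xml",  # route files (placeholder paths)
        "--device.rerouting.threads", "0",             # no extra rerouting threads
    ]

    traci.start(sumo_cmd, traceFile="trace_file_log.txt", traceGetters=True)
    for _ in range(3600):                              # arbitrary number of steps
        traci.simulationStep()
    traci.close()

If the crash (and the empty trace file) still shows up in this setup, multiprocessing can be ruled out as the cause.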


Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
"Parallelization is a typical source of bugs that are only triggered seldomly and seemingly at random. Please try running without setting any parallelization options and check if the issue persists."

Yes, I just started running it. It takes ~4x longer than the multiprocess version, so it's going to take a while.
I'm also going to check the trace file as you suggested.


"What sumo options were you using?"

['/home/marcelo/code/sumo/bin/sumo',
'-n', '/home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/joined_buslanes.net.xml',
'-r', '/home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/joined.rou.xml, /home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/joined_busses.rou.xml',
'--log', '/home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/output/FRAP/right_on_red/Bologna_small__joined/Bologna_small__joined___03_03_09_42_34__a1474f4c-b923-45c9-9222-aab9da8ebf8e/Bologna_small__joined___03_03_09_42_34__a1474f4c-b923-45c9-9222-aab9da8ebf8e_train_generator_1_round_0__f82d0b56-9594-4b58-853e-7ebe617bc9e4.out.txt',
'--duration-log.statistics', 'True',
'--time-to-teleport', '-1',
'--collision.stoptime', '10',
'--collision.mingap-factor', '0',
'--collision.action', 'warn',
'--collision.check-junctions', 'True',
'--device.rerouting.threads', '4',
'--save-state.rng', 'True',
'--ignore-junction-blocker', '10',
'-a', '/home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/joined_bus_stops.add.xml, /home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/joined_detectors.add.xml, /home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/joined_vtypes.add.xml, /home/marcelo/code/urban-semaphore-optimization/scenario/experimental/Bologna_small-0.29.0/joined/joined_tls.add.xml',
'--step-length', '1']


Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Harald Schaefer-2
In reply to this post by Jakob Erdmann

Hi Marcelo, hi Jakob,

Thanks for the backtraces (they look good).

The problem in this scenario is that MSVehicle::getBoundingBox() is called on a null object (this=0x0) from this loop:

        for (AnyVehicleIterator veh = anyVehiclesBegin(); veh != anyVehiclesEnd(); ++veh) {
            MSVehicle* collider = const_cast<MSVehicle*>(*veh);
            //std::cout << "   collider " << collider->getID() << "\n";
            PositionVector colliderBoundary = collider->getBoundingBox();

Thread 1 (Thread 0x7fb4974cd780 (LWP 12544)):
#0  0x0000561970425dcc in MSVehicle::getBoundingBox (this=0x0) at /app/sumo-git/src/microsim/MSVehicle.cpp:5925
#1  0x00005619704c23f5 in MSLane::detectCollisions (this=0x561972d88020, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSLane.cpp:1358

Regards, Harald

Am 03.03.21 um 13:35 schrieb Jakob Erdmann:
Parallelization is a typical source of bugs that are only triggered seldomly and seemingly at random. Please try running without setting any parallelization options and check if the issue persists.
What sumo options were you using?

Am Mi., 3. März 2021 um 12:19 Uhr schrieb Marcelo Andrade Rodrigues D Almeida <[hidden email]>:

Here it is


Screenshot from 2021-03-03 08-16-46.png
(gdb) thread apply all bt

Thread 5 (Thread 0x7fb486f1e700 (LWP 12551)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311b860) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311b778, cond=0x56197311b838) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311b838, mutex=0x56197311b778) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311b760) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb486f1e700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fb47ef1c700 (LWP 12553)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311be00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311bd18, cond=0x56197311bdd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311bdd8, mutex=0x56197311bd18) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311bd00) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb47ef1c700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fb48af1f700 (LWP 12550)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197301a230) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197301a148, cond=0x56197301a208) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197301a208, mutex=0x56197301a148) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197301a130) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb48af1f700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fb482f1d700 (LWP 12552)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311bb30) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311ba48, cond=0x56197311bb08) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311bb08, mutex=0x56197311ba48) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311ba30) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb482f1d700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fb4974cd780 (LWP 12544)):
#0  0x0000561970425dcc in MSVehicle::getBoundingBox (this=0x0) at /app/sumo-git/src/microsim/MSVehicle.cpp:5925
#1  0x00005619704c23f5 in MSLane::detectCollisions (this=0x561972d88020, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSLane.cpp:1358
#2  0x00005619704a44af in MSEdgeControl::detectCollisions (this=0x561973113e10, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSEdgeControl.cpp:339
#3  0x000056197037d531 in MSNet::simulationStep (this=0x561972b2fd20) at /app/sumo-git/src/microsim/MSNet.cpp:636
#4  0x000056197037a18b in MSNet::simulate (this=0x561972b2fd20, start=0, stop=-1000) at /app/sumo-git/src/microsim/MSNet.cpp:378
#5  0x0000561970376ab8 in main (argc=31, argv=0x7ffcfc014bb8) at /app/sumo-git/src/sumo_main.cpp:98
(gdb)



On Tue, Mar 2, 2021 at 3:33 PM Harald Schaefer <[hidden email]> wrote:

Hi Marcelo,

sumo runs in your example in 5 threads (or light weight processes LWP), google for gdb and threads

What is the output of info threads?

You can toggle between the threads by typing

thread n

You should go to the "right" thread and execute bt there

I think you must ensure that the core file is generated by the same binary which is used for gdb

Harald

Am 02.03.21 um 19:18 schrieb Marcelo Andrade Rodrigues D Almeida:
Now I received

"warning: exec file is newer than core file."

I included gdb in the requirements and the sumo installation was after this point...

I guess I need to retry from scratch with a new core file and a prepared environment


On Tue, Mar 2, 2021 at 3:10 PM Marcelo Andrade Rodrigues D Almeida <[hidden email]> wrote:
I think I forgot to undo the debug flag in the last build...  be right back

On Tue, Mar 2, 2021 at 3:04 PM Marcelo Andrade Rodrigues D Almeida <[hidden email]> wrote:
The problem was that neither the remote server nor my docker image had gdb installed, so I pulled the core file to my computer.

I created a new image now with the exact environment and gdb installed to test it. Inspecting sumo-git, I couldn't find any sumo binary, only sumoD

Screenshot from 2021-03-02 15-00-52.png (attachment)

I tried to run with sumoD anyway, but it didn't make any difference (besides the warning and successfully reading the symbols)
Screenshot from 2021-03-02 14-59-41.png (attachment)




On Tue, Mar 2, 2021 at 2:25 PM Harald Schaefer <[hidden email]> wrote:

Hi Marcelo,

the name of the binary reported by gdb and the name you gave as an argument to gdb do not match:

You called gdb with ../sumo/bin/sumo

but the core was created by sumo-git/bin/sumo

Greetings, Harald

On 02.03.21 at 18:18, Jakob Erdmann wrote:
Unfortunately, this dump is not very helpful. I'm not sure why that is because live gdb sessions of the release-build usually include at least method names. You could try to build the debug version and trigger the crash with that.
Another suggestion would be to try and trigger the crash without the use of multiprocessing (and also to check whether this fixes traceFile generation).

On Tue., 2 March 2021 at 18:07, Marcelo Andrade Rodrigues D Almeida <[hidden email]> wrote:
This is what I found


Screenshot from 2021-03-02 14-05-24.png (attachment)

(base) marcelo@Lenovo-Legion-5-15IMH05H:~/code/temp$ gdb ../sumo/bin/sumo core
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ../sumo/bin/sumo...
(No debugging symbols found in ../sumo/bin/sumo)
[New LWP 2143]
[New LWP 2144]
[New LWP 2145]
[New LWP 2147]
[New LWP 2146]
Core was generated by `sumo-git/bin/sumo -n /app/scenario/experimental/Bologna_small-0.29.0/joined/joi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055f07528f7a6 in ?? ()
[Current thread is 1 (LWP 2143)]
(gdb) bt
#0  0x000055f07528f7a6 in ?? ()
#1  0x3fde3c2e82e54800 in ?? ()
#2  0x4023ebf47ba9bb80 in ?? ()
#3  0x000055f075b70740 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb)


On Tue, Mar 2, 2021 at 8:59 AM Marcelo Andrade Rodrigues D Almeida <[hidden email]> wrote:
"Could it be that multiple processes are writing to the same traceFile?
I recommend investigation on this front because reproducing the crash in isolation will probably be necessary to fix it."

Unfortunately no. self.path_to_log points to each execution's own path. I can check the files and they are all empty

"what you can try is to enable core dumps in your shell"

Thank you, I'm going to try this


Sincerely,

Marcelo d'Almeida



_______________________________________________
sumo-user mailing list
[hidden email]
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user
Reply | Threaded
Open this post in threaded view
|

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Jakob Erdmann
Hi Harald,
the loop with the AnyVehicleIterator should never yield null pointers. Hence the real bug is somewhere else.
The 4 worker threads in the stacktrace are due to the option --device.rerouting.threads 4, which doesn't really help to explain this (parallel routing typically doesn't cause premature vehicle deletion).
Had the threads come from the option --threads, that would have been a likely cause of the issue, since we have far fewer tests for it.

Nevertheless, @Marcelo: please try running without the option --device.rerouting.threads and see if you can still trigger the crash.
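For illustration, a minimal sketch of how the option could be stripped from a command list before starting TraCI; sumo_cmd_str and the file names below are placeholders, not the actual experiment configuration:

    import traci

    # Hypothetical command list; only the handling of the rerouting option matters here.
    sumo_cmd_str = ['sumo', '-n', 'joined.net.xml', '-r', 'joined.rou.xml',
                    '--device.rerouting.threads', '4']

    def drop_option(cmd, name):
        # Return a copy of cmd without `name` and the value that follows it.
        cleaned, skip = [], False
        for arg in cmd:
            if skip:
                skip = False
                continue
            if arg == name:
                skip = True
                continue
            cleaned.append(arg)
        return cleaned

    traci.start(drop_option(sumo_cmd_str, '--device.rerouting.threads'))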

Either way, I will probably need a traci-traceFile to fix this.

regards,
Jakob

On Wed., 3 March 2021 at 13:55, Harald Schaefer <[hidden email]> wrote:

Hi Marcelo, hi Jakob,

thanks for the backtraces (looks good)

The problem in this scenario is that MSVehicle::getBoundingBox is called on a null object (this=0x0) from this loop:

        for (AnyVehicleIterator veh = anyVehiclesBegin(); veh != anyVehiclesEnd(); ++veh) {
            MSVehicle* collider = const_cast<MSVehicle*>(*veh);
            //std::cout << "   collider " << collider->getID() << "\n";
            PositionVector colliderBoundary = collider->getBoundingBox();

Thread 1 (Thread 0x7fb4974cd780 (LWP 12544)):
#0  0x0000561970425dcc in MSVehicle::getBoundingBox (this=0x0) at /app/sumo-git/src/microsim/MSVehicle.cpp:5925
#1  0x00005619704c23f5 in MSLane::detectCollisions (this=0x561972d88020, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSLane.cpp:1358

Regards, Harald



_______________________________________________
sumo-user mailing list
[hidden email]
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user
Reply | Threaded
Open this post in threaded view
|

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
I'm redoing the tests with traceGetters set to False to reduce the (huge) file size. Also, I had to restart the tests because someone or something turned off the remote machine overnight.


What I could find so far:

I could retrieve a trace file from the remote server (the huge one), and I found something very odd.

In my trivial test, I found a regular trace file:

"traci.start(['/home/marcelo/code/sumo/bin/sumo-gui', '-n', '/home/marcelo/temp2/temp/temp/temp/regular-intersection__right_on_red.net.xml', '-r', '/home/marcelo/temp2/temp/temp/temp/regular-intersection.rou.xml', '--start', 'True'], port=None, label='default')
traci.trafficlight.setPhase('gneJ0', 0)
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.trafficlight.setPhase('gneJ0', 0)
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
traci.trafficlight.setPhase('gneJ0', 0)
"

In my actual experiment (with multi_processing set to False), all 'traci.simulationStep()' commands are gone (see file attached for complete trace):
"
traci.trafficlight.setRedYellowGreenState('231', 'rrrrGGggrrrGGGgyyyyrrrrGGGGggrrrrrrrrrrrrGGg')
traci.trafficlight.setRedYellowGreenState('231', 'rrrrGGggrrryyyyyyyyrrrrGGGGggrrrrrrrrrrrrGGg')
traci.trafficlight.setRedYellowGreenState('233', 'rryyyyrrrryy')
traci.trafficlight.setRedYellowGreenState('282', 'rrryyyrrryyy')
traci.trafficlight.setRedYellowGreenState('221', 'yyyrrrryyyyyyyrrrrryyy')
traci.trafficlight.setRedYellowGreenState('220', 'GGGrrrrryyyrr')
traci.trafficlight.setRedYellowGreenState('209', 'ryrGGrr')
traci.trafficlight.setRedYellowGreenState('210', 'rrrGGGGGrrrrrrrryyyy')
traci.trafficlight.setRedYellowGreenState('273', 'yyyrrrryy')
traci.vehicle.subscribe('Prati_Capraia_100_70', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Borgo_20_56', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Malvasia_100_70', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Pertini_20_159', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Costa_700_126', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Pepoli_10_199', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Gandhi_40_219', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Audinot_3_16', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Pepoli_10_200', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.vehicle.subscribe('Silvani_7_145', [66, 64, 122, 86, 183, 76, 72, 68, 81, 71, 77, 67, 181])
traci.trafficlight.setRedYellowGreenState('231', 'rrrrGGggrrryyyyrrrrGGrrGGGGggrrrrrrrrrrrrGGg')
traci.trafficlight.setRedYellowGreenState('231', 'rrrrGGggrrrrrrrrrrrGGrrGGGGggrrrrrrrrrrrrGGg')
traci.trafficlight.setRedYellowGreenState('282', 'GGgrrrGGgrrr')
traci.trafficlight.setRedYellowGreenState('220', 'GGGrrrrGrrrrr')
traci.trafficlight.setRedYellowGreenState('209', 'GrGGGrr')
traci.trafficlight.setRedYellowGreenState('210', 'rrrGGGGGrrrrrrGGrrrr')
traci.trafficlight.setRedYellowGreenState('273', 'rrrGGGGrr')
traci.trafficlight.setRedYellowGreenState('233', 'GGrrrrGGGgrr')
traci.trafficlight.setRedYellowGreenState('221', 'rrrGGGGrrrrrrrGGGGGrrr')
"

This was the reported crash from this execution:
#0  0x000055d660c7cf86 in MSVehicle::getBoundingBox() const ()
#1  0x000055d660cfa5b1 in MSLane::detectCollisions(long long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#2  0x000055d660cd8b54 in MSEdgeControl::detectCollisions(long long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#3  0x000055d660c34294 in MSNet::simulationStep() ()
#4  0x000055d660c344a6 in MSNet::simulate(long long, long long) ()
#5  0x000055d660c1c37d in main ()

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

What I could find about the trace file generation problem:

The problem is that (without multiprocessing) traci discards the trace file info of the first run and only keeps the traces of the following runs.
When running with multiprocessing, every traci simulation is handled as a first run (i.e., a new process) and everything is thrown away.

It doesn't matter whether it is a regular or a debug build.

I don't know why the first run is discarded. I'll keep looking.
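Roughly, a stripped-down sketch of a script that should show the behaviour described above (the net and route files are placeholders for a small test network, not the Bologna scenario):

    import traci

    NET = 'example.net.xml'   # placeholder network
    ROU = 'example.rou.xml'   # placeholder routes

    for run in range(2):
        # One trace file per run, so the first and second run can be compared afterwards.
        traci.start(['sumo', '-n', NET, '-r', ROU],
                    label='run_%d' % run,
                    traceFile='trace_run_%d.txt' % run,
                    traceGetters=False)
        for _ in range(10):
            traci.simulationStep()
        traci.close()

    # With the behaviour described above, trace_run_0.txt would stay empty
    # while trace_run_1.txt would contain the expected commands.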


I'll post any new information here.



_______________________________________________
sumo-user mailing list
[hidden email]
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user

trace_file_log.txt (308K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
Just to clarify

The attached file is from one of the finished tests of the last batch (traceGetters set to False).

But both batches (this one and the previous one with traceGetters set to True) show zero simulationStep entries in the trace files.




_______________________________________________
sumo-user mailing list
[hidden email]
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user
Reply | Threaded
Open this post in threaded view
|

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Marcelo Andrade Rodrigues D Almeida
I could reproduce the trace file problem

See attached files






_______________________________________________
sumo-user mailing list
[hidden email]
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user

example.py (3K) Download Attachment
temp.zip (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Jakob Erdmann
Thanks for the example. The problem was due to https://github.com/eclipse/sumo/issues/8320
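A quick way to check whether a given SUMO/TraCI build still shows the problem (file names and the step count are placeholders):

    import traci

    traci.start(['sumo', '-n', 'example.net.xml', '-r', 'example.rou.xml'],
                traceFile='trace_check.txt')
    for _ in range(5):
        traci.simulationStep()
    traci.close()

    with open('trace_check.txt') as f:
        steps = sum(1 for line in f if 'simulationStep' in line)
    # A fixed build should record all five simulationStep calls here.
    print('simulationStep calls recorded:', steps)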


_______________________________________________
sumo-user mailing list
[hidden email]
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user
12