2022-01-31 17:54:41

Hey folks, this is kind of niche, but a lot of you have more Windows development experience than I do and I'm at a loss as to what to try next.

Some of you know that I'm having a hard time getting useful panic stacktraces out of Rust, Sentry, and possibly Windows. I include that last bit because it's possible this is OS-specific, and filtering Sentry reports by OS isn't something I've figured out how to do yet. All my filenames/line numbers are <unknown> in any stacktraces I receive over Sentry, despite them seeming to be intact for Linux users who run from a console and copy logs out of the terminal. Before I throw in the towel, integrate a file logger, and deal with the eternal stream of people who will report crashes without sending the log along, I'd like to take one last stab at debugging this. Here's what I've done so far:

My Cargo.toml includes:

[profile.release]
debug = true
codegen-units = 1
lto = true

I created a small sample project, built in release mode. I started with nothing but a panic. I added my game engine. I initialized it. I changed the project to build as a Rust library, and then called into that library from a second binary. I stripped my release profile down to `debug = true` in case LTO and other optimizations were messing up stacktraces. Each and every time I rebuilt in release mode, caused the panic, and saw it in Sentry. Each time it included filenames/line numbers as it should.

I added code to System Fault to trigger a panic after 30 seconds. I ran it first via `cargo run --release`. Next I built a release package like those I distribute, unarchived it, and ran it from a terminal. Then I opened the directory in Explorer and clicked on the executable in case something about the terminal session was making it work. Each time, filenames/line numbers showed up in the crash.

The only thing that's common here is my system. So there are two possibilities:

  • There's some path the code takes that changes whether line numbers make it into stacktraces. I can't think of what this could possibly be--I'm including it to cover all possible scenarios.

  • Something about how the executable is packed means it will always register correct filenames/line numbers on my system, but whatever tables are included aren't valid on other systems for some reason. Again, can't think of what might cause this, particularly as I ran identical builds. I'm also statically linking the MSVC runtime now, though these panics didn't report correctly even before I took that step.

Any ideas? Historically I've used Sentry in something like a Docker container, where I've got a whole lot of control over the runtime environment. Also, all the crashes appear to be in my engine. They're not buried in some DLL that might differ across Windows releases. Even if I were to fix any engine crashes, I'd need their coordinates so I could trace down what's wrong.

I've opened an issue on the Sentry Rust bindings, so hopefully someone there can help. But I thought I'd try here in case there's some obvious Windows-specific thing I'm not doing. It'd be great if my extensive attempts to trigger some of these actually worked, but I guess that'd make my life too easy. And who the hell expects to have an easy life anymore?

Thanks for any pointers.

2022-01-31 18:25:41

On Windows, symbols are generated in a pdb file and not attached to your executable.  You will need to submit your pdb file to Sentry so that they can symbolicate, assuming this is possible.  I'm reasonably sure that the difference between your system and the system of other people is that your system has the .pdb file in the expected place, though I would expect that Sentry wouldn't be smart enough to pick this up if you're shipping to a Sentry instance not on your machine.  That said, we do client-side unwinding, and it's possible Sentry now also has that (but I didn't think so, I thought it was one of our competitive advantages).

I'd suggest Backtrace.io but I don't think we have a working Rust binding because there's actually minimal value in it from a business perspective, and internally I think we see less than 1 panic every couple months in prod so we haven't needed it internally either.  Most Rust errors aren't panics in our experience, unless you write your code in a style which uses panics, and also you probably want panic = "abort" anyway for a few different reasons not least of which is that the default panic behavior causes a bunch of oddness when some of your threads panic but others are still chugging along.  Whether the Sentry bindings can handle that at all though, I don't know; it disables panic handlers, and typically you need a monitoring parent process to catch aborts (we do that via some magic where you call our library and our library does some subprocess magic. So it's possible. But idk how good Sentry is here).

My Blog
Twitter: @ajhicks1992

2022-01-31 20:05:23

Is the PDB a requirement, full stop? There's definitely no .pdb in my release archive, which makes me wonder if I'm including symbols correctly and the issue is a parsing one on Sentry's end. Not sure if `debug = true` correctly creates the .pdb, or if it's embedding tables in such a way that doesn't require a separate file.

panic = "abort" is something I've definitely considered trying. Maybe I'll ship that in the next version and see if it helps. It occurs to me that many of my attempts to artificially induce stacktraces have been fairly simple systems with little or no need to access ECS state, so maybe Bevy schedules them in some way that makes them easier for Sentry to parse. I'll try tucking a crash into a more complex system and seeing if that helps. Thanks.

2022-01-31 20:31:41

I have not yet had to deal with PDBs in Rust, but nothing on Windows has good support for DWARF even though it is technically possible to embed it, so you might want to ask on the Rust Discord or something.  If you aren't using the MSVC target you should; MinGW on Windows leads to all sorts of weirdness.  But I would be beyond astonished if Rust isn't putting a PDB somewhere, and if it's not then it is possible your config has a problem or something.

The thing about panics is that in Rust threads panic but this doesn't necessarily bring down your entire application.  Unless you e.g. have a mutex or something which will force the panic to "move across" the thread boundary, you can easily end up with some threads which aren't running anymore even though they should be, and the rest of your application tries to chug along even though it can't.  And then you are very sad once you lose a day or two to finding out that's what happened.  Given the choice between panic = "abort" and being able to ship errors somewhere, I'd personally choose panic = "abort" so that the crash is really obvious and not weird heisenbugs.  Obviously you ideally want both, but silent partial failures are no fun.
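
To make the partial-failure scenario concrete, here is a minimal sketch using only the standard library. The function name `worker_panic_is_contained` is made up for illustration, and the sketch assumes the default `panic = "unwind"` strategy; with `panic = "abort"` the process would terminate at the `panic!` instead of returning.

```rust
use std::thread;

/// Spawns a worker that panics, then reports whether the spawning thread
/// was still alive to observe the failure.
fn worker_panic_is_contained() -> bool {
    let handle = thread::spawn(|| {
        panic!("worker failed");
    });
    // join() yields Err when the worker panicked. Nothing forces the caller
    // to join or check, which is how a dead thread can go unnoticed.
    handle.join().is_err()
}
```

Calling `worker_panic_is_contained()` returns `true` under the unwind strategy: the worker is gone, but the caller keeps running.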

I'm surprised you're even getting panics outside tests though.  I guess it's just an artifact of how you're writing Rust.  One of the reasons we haven't really written Rust bindings is that in our experience it's rare to panic and you can't otherwise easily detect/get a stack trace from an `Err` variant, but maybe I should re-evaluate how much other people are seeing it and take it to work as a thing we might want to rethink.  But if I was doing what you're doing, I'd have written my Rust to not panic whenever possible, and only die for truly 100% fatal things (you can, e.g., surface errors in message boxes or whatever in many cases).

My Blog
Twitter: @ajhicks1992

2022-01-31 21:34:10

Thanks. *I'm* actually not panicking. The Bevy developers have essentially decided that any attempt to write to component data for a non-existent entity is an unrecoverable error vs. "Hey, I just blew that enemy up, so it's OK for all these other systems to fail if they can't access its data." There's no error-handling or escape hatch, just a fuckin' panic, and folks telling me this is on me while not giving me very effective tools to do something other than stop the world. I'd change course if I wasn't already entrenched. Hopefully this gets a bit easier to work with soon--I'm probably one of the heavier Bevy users right now, primarily because it lacks features that'd make it appealing for non-audio-only versions of the kinds of games I write.
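
For illustration only, the kind of escape hatch being asked for might look something like the sketch below. This is not Bevy's API: `World`, `Health`, `NoSuchEntity`, and `try_damage` are hypothetical names in a toy `HashMap`-backed store, used just to show a write to a missing entity surfacing as an `Err` instead of a panic.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; none of these types come from Bevy.
type Entity = u32;

#[derive(Debug, PartialEq)]
struct NoSuchEntity(Entity);

struct Health(i32);

struct World {
    health: HashMap<Entity, Health>,
}

impl World {
    /// Applies damage and returns the remaining health, or an error
    /// (rather than a panic) when the entity has already been despawned.
    fn try_damage(&mut self, entity: Entity, amount: i32) -> Result<i32, NoSuchEntity> {
        let health = self.health.get_mut(&entity).ok_or(NoSuchEntity(entity))?;
        health.0 -= amount;
        Ok(health.0)
    }
}
```

A system written against something like this could ignore `Err(NoSuchEntity(_))` for an enemy that was just blown up and carry on with the rest of the frame.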

Anyway, I'll try switching to abort or tracking down a pdb. Is it supposed to live alongside the executable, or is it possible to somehow embed the path? I guess it's possible the executable on my machine is referencing the pdb from target/ regardless of where I copy it, though I'm clutching at straws at this point.

2022-01-31 21:50:19

Usually the PDB is next to the executable, but it doesn't have to be, and someone on the Rust Discord can probably just be like "and here is the solution to all your problems".

Shame about Bevy.  I've had to write my own ECS for Ammo for other reasons--most ECSes are terrible about "this entity will be around for literally 3 months"--but I'm glad I didn't buy in.  I hate libraries which just panic like that.

My Blog
Twitter: @ajhicks1992

2022-01-31 23:30:27

Yeah, Bevy shouldn't panic like that; they should just allow those systems to fail. That's quite sad. I really liked Bevy when I last looked into it.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2022-02-01 00:33:54

Looks like my PDBs are giant, on the order of 300 MB. I couldn't ship these even if they did open up stacktraces, unless these include literally every symbol statically linked, including vcruntime140.dll, and I can treeshake out whatever I don't care about somehow.

2022-02-01 00:59:22

You don't ship them to your clients. You upload them to Sentry, unless Sentry dropped the ball on their Rust stuff in ways that would be astonishingly bad.  This is like Linux: stack trace by default is just some addresses, debugger/etc translates via symbols.  But the symbols are outside the executable by default, rather than you needing to strip them before releasing.  It should only matter that both those files are on the machine translating the dump.
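
As a rough picture of what "translates via symbols" means, here is a toy sketch. `SymbolTable`, the function names, and the addresses are all hypothetical; real PDB or DWARF readers also handle line tables, inlining, and module load addresses, none of which is modelled here.

```rust
use std::collections::BTreeMap;

/// Maps each function's start address to its name, standing in for the
/// information a symbol file provides.
struct SymbolTable {
    starts: BTreeMap<u64, String>,
}

impl SymbolTable {
    fn new(entries: &[(u64, &str)]) -> Self {
        let starts = entries
            .iter()
            .map(|&(addr, name)| (addr, name.to_string()))
            .collect();
        SymbolTable { starts }
    }

    /// Resolves a raw address to "name+offset" using the nearest symbol
    /// that starts at or below it; without a match the frame stays unknown.
    fn symbolicate(&self, address: u64) -> String {
        match self.starts.range(..=address).next_back() {
            Some((start, name)) => format!("{}+{:#x}", name, address - start),
            None => "<unknown>".to_string(),
        }
    }
}
```

The crash report only needs to carry the raw addresses; whichever machine holds the table (here, the `SymbolTable`) can turn them into names afterwards.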

My Blog
Twitter: @ajhicks1992

2022-02-01 01:27:39

Got it, thanks. It looks like there's a Sentry CLI that can upload something called Debugging Information Files that look like the compiled equivalent of sourcemaps. I'll probably give that a shot tomorrow.

Is it possible to determine where an executable might be looking for its associated PDB file? It sounds like it can be colocated with the executable--I'm just trying to figure out how stacktraces work fine for me regardless of where my executable is or what I've named it. I suspect rustc may be hard-coding a path at compile time--I'm just wondering if there's any way to investigate that on my own.
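
One way to investigate, sketched under assumptions: MSVC-linked executables normally carry a CodeView debug record that begins with the bytes `RSDS`, followed by a 16-byte GUID, a 4-byte age, and the null-terminated PDB path the linker recorded. The hypothetical helper `embedded_pdb_path` below does a naive scan for the first such signature; a robust tool would walk the PE debug directory instead, since the byte sequence could in principle appear elsewhere in the file.

```rust
/// Scans raw executable bytes for a CodeView "RSDS" record and returns the
/// PDB path stored in it, if one is found.
fn embedded_pdb_path(image: &[u8]) -> Option<String> {
    const SIGNATURE: &[u8] = b"RSDS";
    const HEADER_LEN: usize = 4 + 16 + 4; // signature + GUID + age

    // Naive search: take the first occurrence of the signature.
    let start = image.windows(SIGNATURE.len()).position(|w| w == SIGNATURE)?;
    let path_bytes = image.get(start + HEADER_LEN..)?;
    // The path runs up to (not including) the terminating zero byte.
    let end = path_bytes.iter().position(|&b| b == 0)?;
    Some(String::from_utf8_lossy(&path_bytes[..end]).into_owned())
}
```

Feeding it the contents of a release executable (for example via `std::fs::read`) would show which path was baked in at link time. If that path points into `target/` on the build machine, that would be consistent with stacktraces resolving there no matter where the executable is copied.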

Thanks again, folks.

2022-02-01 01:34:17

There is, but I don't remember how this works and it's involved.  Uploading debugging information is what you're looking for, yes.  Can only tell you how ours works but there'll be buttons in there somewhere for native stuff, if they properly support it.  Ymmv though, you're talking to someone working at a company that keeps getting clients from Sentry because we do better at native than they do and they haven't caught up to us yet.  They're a very JS/Python/etc focused solution, and have only recently tried to pivot.

My Blog
Twitter: @ajhicks1992

2022-02-01 02:21:48

Cool, well, maybe I'll look into you folks when I'm done dealing with this bullshit. If I've gotta pay for this then I will, but at this point Sentry brings zero value in their free trial, so I'm not inclined to buy. And I've already got a panic-handler in debug builds, so I don't imagine it'd be hard massaging Rust backtraces into whatever format your API takes.
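
A minimal sketch of that kind of panic handler, using only the standard library. `install_recording_hook` and the shared `sink` are hypothetical names, and nothing here reflects Sentry's or Backtrace's actual API; the collected strings would still need converting to whatever format a service expects. Note that the file and line come from the panic's own location data, which is compiled into the binary, whereas resolving the frames of a full backtrace is the part that depends on symbols.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

/// Installs a process-wide hook that records "file:line: message" for each
/// panic into `sink`, replacing any previously installed hook.
fn install_recording_hook(sink: Arc<Mutex<Vec<String>>>) {
    panic::set_hook(Box::new(move |info| {
        let location = info
            .location()
            .map(|l| format!("{}:{}", l.file(), l.line()))
            .unwrap_or_else(|| "<unknown>".to_string());
        // The payload is usually a &str or a String, depending on how
        // panic! was invoked.
        let message = if let Some(s) = info.payload().downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = info.payload().downcast_ref::<String>() {
            s.clone()
        } else {
            "<non-string payload>".to_string()
        };
        if let Ok(mut reports) = sink.lock() {
            reports.push(format!("{}: {}", location, message));
        }
    }));
}
```

After calling `install_recording_hook(sink.clone())`, any panic in the process appends one line to `sink`, which a reporter thread or shutdown routine could then ship somewhere.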

Thanks again everyone.

2022-02-01 03:05:09

Yeah... but we don't have Rust bindings. So lol.  And tbh our frontend accessibility leaves much to be desired, not that Sentry was better last time I tried it.

My Blog
Twitter: @ajhicks1992

2022-02-01 05:27:45 (edited by Ethin 2022-02-01 05:28:45)

@13, I mean, the bindings can be worked around through library calls (dlopen, dlsym, etc. and Windows-specific friends). It's hacky, sure, and relies on the DLL interface not changing, but it should work until you guys do get Rust bindings. Or if it's not a DLL but uses HTTP it shouldn't (I don't think) be difficult to make bindings for that via something like Serde, so the non-existent bindings really aren't much of a problem. The inaccessibility of the UI (or if parts of it are accessible but poorly done) is more of an issue though.
I'm making assumptions, mind, so if your system uses a proprietary protocol or something, that's a totally different story.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2022-02-01 05:49:47

It's not proprietary; it's in fact open source on GitHub. But iirc it's not as simple as just a C call, because you have to do a thing with processes that I don't remember the details of at the moment, and there is also a chance that work needs to be done on the backend (though I don't think so; pretty sure Rust works if you get it that far).  In order to capture everything, you have to do a number of weird things, but I don't work closely on this part of the stack and can't quote them offhand.

My Blog
Twitter: @ajhicks1992