Hello,
This is a continuation of the discussion from F36 Change: GNU Toolchain Update.
Uninitialized variables are a big problem. They can be sources of information exposure if parts of a buffer are not initialized. They can also cause unexpected execution paths if an attacker can groom the memory to a value of their choosing. If the variable is a pointer to the heap, this can cause free to corrupt memory under certain circumstances. If the uninitialized memory is part of user input, this can lead to improper input validation. This is not hypothetical. All of these come from a paper doing an empirical study of Android flaws. [1] The data used in the paper is here. [2]
Part of the problem is that compilers and static analysis tools can't always find them. I created a test program that has 8 uses of uninitialized variables. GCC 11 misses all of them. GCC 12 finds 2. Clang 13 finds 1. cppcheck finds 2 or 3, but complains so much you'd think it found them all. Valgrind finds 2. Flexelint, a commercial linter, finds 1.
Since tools can't always find them, the only option we have right now is to force initialization to something the attacker cannot control. Kees Cook started a discussion on the LLVM developers mailing list a while back. He makes a very clear argument. I would just be repeating his points, so please read the original discussion here (also read the replies):
https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
He talks about -ftrivial-auto-var-init=zero being used for production builds and -ftrivial-auto-var-init=<pattern> being used for debug builds. The use is not just the kernel. Consider a server that returns data across the network to a client. It could possibly leak crypto keys or passwords if the returned data structure has uninitialized memory.
For more background, the creator of this technology for LLVM presented a talk about this feature at a past LLVM developer conference:
https://www.youtube.com/watch?v=I-XUHPimq3o
He said this would have prevented over 900 fixed CVEs in Chrome and 12% of all Android CVEs.
From deep inside the LLVM thread above, comes this nugget: --- To add in, we (Microsoft) currently use zero initialization technology in Visual Studio in a large amount of production code we ship to customers (all kernel components, a number of user-mode components). This code is both C and C++.
We already have had multiple vulnerabilities killed because we shipped this technology in production. We received bug reports with repros that worked on older versions of Windows without the mitigation and new versions of Windows that do have it. The new versions don't repro, the old ones do. ---
Microsoft is also digging in to uninitialized variables. They have a lengthy blog post that talks about extending this to heap memory. [3]
I think turning this on across all compilations would be an important step forward. We could wipe out an entire class of bugs in one fell swoop. But then, what about heap allocations? calloc has existed for a long time. It might be worthwhile to have a CFLAG that can tell glibc (or other allocators) to substitute something like calloc for malloc.
Cheers, -Steve
[1] - https://picture.iczhiku.com/resource/paper/shkeTWJEaFUuWCMc.pdf
[2] - http://ml-papers.gitlab.io/android.vulnerabilities-2017/appendix/MSR2017/vulnerabilitiesList.html
[3] - https://msrc-blog.microsoft.com/2020/07/02/solving-uninitialized-kernel-pool...
On Fri, Jan 21 2022 at 01:04:51 PM -0500, Steve Grubb sgrubb@redhat.com wrote:
He said this would have prevented over 900 fixed CVEs in Chrome and 12% of all Android CVEs.
I believe it. This should be treated like any other security hardening flag.
Michael
On Fri, Jan 21, 2022 at 08:26:00PM -0800, John Reiser wrote:
It might be worthwhile to have a CFLAG that can tell glibc (or other allocators) to substitute something like calloc for malloc.
The environment variable MALLOC_PERTURB_ has been used by glibc malloc for over 15 years.
Just a note that glibc 2.34 (annoyingly) removed this feature, replacing it with a much more complex mechanism. Here's my patch to fix nbdkit:
https://gitlab.com/nbdkit/nbdkit/-/commit/362e0fdcae37db876e13b944102a5c152e...
You'll want to look at the "mentioned in commit" updates on that page too since the initial fix required further tweaking.
Rich.
On Friday, January 21, 2022 11:26:00 PM EST John Reiser wrote:
It might be worthwhile to have a CFLAG that can tell glibc (or other allocators) to substitute something like calloc for malloc.
The environment variable MALLOC_PERTURB_ has been used by glibc malloc for over 15 years.
Fair point. Looking over the man page, it says: "Setting MALLOC_PERTURB_ to zero disables the feature." I was hoping memory could be wiped to 0, which is the safer choice based on the literature I've read. But, yeah, an environment variable would do as a way to measure impact.
-Steve
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
Such problems must be fixed by upstream developers, not by volunteers [package maintainers].
Most upstreams will close such bug reports with "Fedora specific issue" reason, as it was with GLIB assertions. I've manually fixed many of them in my packages.
On Sat, Jan 22, 2022 at 12:36:01PM +0100, Vitaly Zaitsev via devel wrote:
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
Such problems must be fixed by upstream developers, not by volunteers [package maintainers].
Most upstreams will close such bug reports with "Fedora specific issue" reason, as it was with GLIB assertions. I've manually fixed many of them in my packages.
I'm curious here how you think programs could crash with the proposed changes.
For auto (i.e. stack-allocated) variables, as far as any program is concerned this isn't a change at all - after all, an uninitialized stack variable could start off as zero just as easily as any other value.
However I could see there might possibly be a problem with large stack frames (eg. ones containing multi-page buffers). These would now be written to, rather than left untouched. So stack canaries and guard pages might be touched rather than ignored. I think this would highlight an actual bug in the program.
In all the C programs I'm involved in we have long added GCC warning flags to limit frame sizes and prohibit VLAs, just because you can never really be sure how big the stack is.
There is a concern, I think, with the proposed "always calloc" change, since that would change both the performance and, more critically, the allocation pattern of the program. A program that mallocs a huge amount of RAM -- assuming vm.overcommit_memory has been changed from its default (eg. 0 -> 1) -- would suddenly be committing all of that memory. And generally programs would run a little slower, and some a lot slower.
Rich.
On Sat, 22 Jan 2022 at 14:58, Richard W.M. Jones wrote:
On Sat, Jan 22, 2022 at 12:36:01PM +0100, Vitaly Zaitsev via devel wrote:
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
Such problems must be fixed by upstream developers, not by volunteers [package maintainers].
Most upstreams will close such bug reports with "Fedora specific issue" reason, as it was with GLIB assertions. I've manually fixed many of them in my packages.
I'm curious here how you think programs could crash with the proposed changes.
Vitaly, it looks like you didn't respond to this. I'm also curious why this change would lead to crashes. Are we missing something?
* Jonathan Wakely:
Vitaly, it looks like you didn't respond to this. I'm also curious why this change would lead to crashes. Are we missing something?
I've seen cases where access to uninitialized data was fine as long as the memory location was never zero, something that was always true for how GCC compiled the program at the time.
But I must say that I find the other direction more likely (as in, the program is fine because it works correctly on Fedora).
Thanks, Florian
On Tue, 1 Feb 2022 at 12:03, Florian Weimer fweimer@redhat.com wrote:
- Jonathan Wakely:
Vitaly, it looks like you didn't respond to this. I'm also curious why this change would lead to crashes. Are we missing something?
I've seen cases where access to uninitialized data was fine as long as the memory location was never zero, something that was always true for how GCC compiled the program at the time.
Ah, so uninitialized pointers that were non-zero, and so reading from some arbitrary mapped page. If the pointer gets initialized to zero reading from it would be a segfault, because the zero page isn't mapped.
That seems like an improvement, and worth finding and fixing the code. "Maintainers should not have to fix bugs in their packages" seems like a totally bogus argument to me.
But I must say that I find the other direction more likely (as in, the program is fine because it works correctly on Fedora).
On Saturday, January 22, 2022 6:36:01 AM EST Vitaly Zaitsev via devel wrote:
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
As much as I don't want this to cause unnecessary work for anyone, I also don't want to see preventable exploits happen. Nearly all major software vendors are doing this.
I mentioned in the original proposal that I have a test program with 8 test cases. This is it in case anyone wants to try it out:
#include <stdio.h>

struct test { int one; int two; };

void func2(const struct test *t)
{
    if (t->one == 0)
        printf("init func2\n");

    if (t->two == 0)  // Uninitialized 1
        printf("uninit func2\n");
}

void func1(struct test *t)
{
    t->one = 1;
    // two is uninitialized
    func2(t);
}

int func3(int num)
{
    if (num)  // Uninitialized 2
        return num;
    else
        return 0;
}

void func4(int *a, int max)
{
    int i;

    // skip the first
    for (i=1; i<max; i++)
        a[i] = 0;
}

void func5(const int *a, int max)
{
    int i;
    for (i=0; i<max; i++) {
        if (a[i])  // Uninitialized 3
            printf("func5: %d\n", i);
    }
}

int func6(const int *num)
{
    if (*num)  // Uninitialized 4
        return *num;
    else
        return 0;
}

int j;
int func7(void)
{
    return j;  // Uninitialized 5
}

void func8(const int *a, int max)
{
    int i;
    for (i=0; i<max; i++) {
        if (a[i])  // Uninitialized 6
            printf("func8: %d\n", i);
    }
}

enum {RED, AMBER, GREEN, BLACK};

int main(void)
{
    struct test t;
    int num;
    int arry[10];
    int go;
    int color = BLACK;

    func1(&t);
    func3(num);
    func4(arry, 10);
    func5(arry, 10);
    func6(&num);

    printf("num: %d\n", num);  // Uninitialized 7
    printf("func7: %d\n", func7());
    arry[0] = func7();  // copy uninitialized j into arry[0]
    func8(arry, 10);

    switch (color) {
    case RED:
    case AMBER:
        go = 0;
        break;
    case GREEN:
        go = 1;
        break;
    }

    printf("go :%d\n", go);  // Uninitialized 8

    return 0;
}
Detection results:
gcc11: 0
gcc11+fanalyzer: 0
gcc12: 2
gcc12+fanalyzer: 3
cppcheck: 2 (but describes different aspects of the same problems)
gcc11+asan: 0
gcc11+ubsan: 0
clang13: 1
valgrind+clang: 0
valgrind+gcc: 2
Flexelint: 1
splint: 2
The one surprising result is that valgrind's results differ by the compiler choice.
-Steve
On 1/22/22 15:00, Steve Grubb wrote:
On Saturday, January 22, 2022 6:36:01 AM EST Vitaly Zaitsev via devel wrote:
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
As much as I don't want this to cause unnecessary work for anyone, I also don't want to see preventable exploits happen. Nearly all major software vendors are doing this.
I strongly support this change. The security benefits far outweigh any risks.
On Sat, Jan 22, 2022 at 03:00:14PM -0500, Steve Grubb wrote:
On Saturday, January 22, 2022 6:36:01 AM EST Vitaly Zaitsev via devel wrote:
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
As much as I don't want this to cause unnecessary work for anyone, I also don't want to see preventable exploits happen. Nearly all major software vendors are doing this.
I mentioned in the original proposal that I have a test program with 8 test cases. This is it in case anyone wants to try it out:
[test program elided; quoted in full above]
Detection results:
gcc11: 0
gcc11+fanalyzer: 0
gcc12: 2
gcc12+fanalyzer: 3
cppcheck: 2 (but describes different aspects of the same problems)
gcc11+asan: 0
gcc11+ubsan: 0
clang13: 1
valgrind+clang: 0
valgrind+gcc: 2
Flexelint: 1
splint: 2
The one surprising result is that valgrind's results differ by the compiler choice.
valgrind seems to do better for me than described. It spots errors in these places:
test.c:13 (Uninitialised 1)
test.c:25 (Uninitialised 2)
test.c:43 (Uninitialised 3)
test.c:50 (Uninitialised 4)
test.c:87 (Uninitialized 7)
test.c:102 (Uninitialized 8)
So I'd say it's getting 6/8.
I did:
$ gcc -g test.c -o test
$ valgrind ./test
using:
gcc-11.2.1-1.fc35.x86_64
valgrind-3.17.0-3.fc35.x86_64
Rich.
On Sat, 2022-01-22 at 20:49 +0000, Richard W.M. Jones wrote:
On Sat, Jan 22, 2022 at 03:00:14PM -0500, Steve Grubb wrote:
On Saturday, January 22, 2022 6:36:01 AM EST Vitaly Zaitsev via devel wrote:
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
As much as I don't want this to cause unnecessary work for anyone, I also don't want to see preventable exploits happen. Nearly all major software vendors are doing this.
I mentioned in the original proposal that I have a test program with 8 test cases. This is it in case anyone wants to try it out:
[test program and detection results elided; quoted in full above]
valgrind seems to do better for me than described. It spots errors in these places:
test.c:13 (Uninitialised 1)
test.c:25 (Uninitialised 2)
test.c:43 (Uninitialised 3)
test.c:50 (Uninitialised 4)
test.c:87 (Uninitialized 7)
test.c:102 (Uninitialized 8)
So I'd say it's getting 6/8.
I did:
$ gcc -g test.c -o test
$ valgrind ./test
Right. That is because:
return j; // Uninitialized 5
isn't actually a use of uninitialized memory (it just returns j), and j is a global variable; globals are already initialized to zero by default.
And so also this:
arry[0] = func7(); // copy uninitialized j into arry[0]
Is actually assigning an initialized (zero) value.
So I would give valgrind a 6/6 (100%) score :)
If you compile with -O2 gcc will only leave 2 actual uses of uninitialized values in the code, so valgrind can also only spot 2. The compiler (correctly) determines all other code isn't actually used and so optimizes it out.
Cheers,
Mark
On 1/22/22 10:05 PM, Mark Wielaard wrote:
So I would give valgrind a 6/6 (100%) score :)
But if the compiler starts copying zeros on uninitialized memory, valgrind loses any ability to detect the defect in the code.
Regards.
Hi,
On Thu, Jan 27, 2022 at 10:41:36AM +0100, Roberto Ragusa wrote:
On 1/22/22 10:05 PM, Mark Wielaard wrote:
So I would give valgrind a 6/6 (100%) score :)
But if the compiler starts copying zeros on uninitialized memory, valgrind loses any ability to detect the defect in the code.
Yes. So that is the compromise. You'll always get initialized zeros for local variables, so any usage is defined (though probably buggy). But some of the tools, like valgrind memcheck, will be unable to detect the buggy code.
If you believe the tools we have are pretty bad to begin with and/or not actually used by people to find such bugs then this is a good compromise. If you believe the tools are pretty good for detecting these issues (and I believe they are, the example given was just unfortunate because some of the issues weren't actually bad code and some others were rightfully optimized out, so would never trigger), then it is a bad compromise. But we definitely need to encourage people to use the tools more.
Cheers,
Mark
On Thu, Jan 27 2022 at 11:37:29 AM +0100, Mark Wielaard mark@klomp.org wrote:
If you believe the tools are pretty good for detecting these issues (and I believe they are, the example given was just unfortunate because some of the issues weren't actually bad code and some others were rightfully optimized out, so would never trigger), then it is a bad compromise.
I don't agree. valgrind could detect disaster 100% of the time on certain malicious input, but that does us no good unless somebody tests that same malicious input before the bad guys do. The certainty of knowing that all stack memory is initialized to 0 when our users run the code is way more valuable than the chance that good guys with fuzzers will find a programming error before the bad guys with fuzzers do.
A couple other considerations:
* The vast majority of uninitialized stack memory bugs are "forgot to initialize to NULL or 0," so the effects of these bugs are obviated, and finding them becomes academic rather than interesting.
* If you really want to be able to find these bugs using valgrind, recompiling the software for testing purposes is an option. That's not an option for our users, who expect Fedora production builds to be as secure as possible.
Michael
Hello Mark,
On Thursday, January 27, 2022 5:37:29 AM EST Mark Wielaard wrote:
On Thu, Jan 27, 2022 at 10:41:36AM +0100, Roberto Ragusa wrote:
On 1/22/22 10:05 PM, Mark Wielaard wrote:
So I would give valgrind a 6/6 (100%) score :)
But if the compiler starts copying zeros on uninitialized memory, valgrind loses any ability to detect the defect in the code.
Yes. So that is the compromise. You'll always get initialized zeros for local variables, so any usage is defined (though probably buggy). But some of the tools, like valgrind memcheck, will be unable to detect the buggy code.
I think people doing debug probably have a special set of compile flags. I do.
If you believe the tools we have are pretty bad to begin with and/or not actually used by people to find such bugs then this is a good compromise.
Nobody has said the tools are "bad". But a couple of things have been discovered. One is that the number of warnings and detections depends on not optimizing. That is in conflict with detecting the run-time problems FORTIFY_SOURCE would pick up, because it only works when optimization is on.
Valgrind is a valuable tool. I use it all the time to find where something segfaults. (I was active on that mailing list around 2003.) I also use it in conjunction with radamsa for fuzzing sometimes. But using it to find uninitialized variables means that you have to traverse the path that leads to the problem. That is not trivial for any medium to large project. So the need really falls to static analysis and compiler warnings.
If you believe the tools are pretty good for detecting these issues (and I believe they are, the example given was just unfortunate because some of the issues weren't actually bad code and some others were rightfully optimized out, so would never trigger), then it is a bad compromise. But we definitely need to encourage people to use the tools more.
Yes. But at this moment, we need a safety net until detection gets better. The bugs I put into the test program come from observing a lot of kernel CVEs that I have to create assurance arguments for. They are generally caught by fuzzing. And many are non-obvious because initialization happens in one function, the use happens in another function, and the variable is stored in yet another.
To get detection better, one needs a curated set of bugs to test against. NIST's SARD dataset helps tool developers do just that. But it doesn't have every conceivable variation. The tools are steadily getting better, but I think we need the safety net in the meantime.
Best Regards, -Steve
On Thu, Jan 27, 2022 at 11:37:29AM +0100, Mark Wielaard wrote:
Hi,
On Thu, Jan 27, 2022 at 10:41:36AM +0100, Roberto Ragusa wrote:
On 1/22/22 10:05 PM, Mark Wielaard wrote:
So I would give valgrind a 6/6 (100%) score :)
But if the compiler starts copying zeros on uninitialized memory, valgrind loses any ability to detect the defect in the code.
Yes. So that is the compromise. You'll always get initialized zeros for local variables, so any usage is defined (though probably buggy). But some of the tools, like valgrind memcheck, will be unable to detect the buggy code.
If you believe the tools we have are pretty bad to begin with and/or not actually used by people to find such bugs then this is a good compromise. If you believe the tools are pretty good for detecting these issues (and I believe they are, the example given was just unfortunate because some of the issues weren't actually bad code and some others were rightfully optimized out, so would never trigger), then it is a bad compromise. But we definitely need to encourage people to use the tools more.
It isn't about whether the tools are good or bad. We have many great tools, valgrind included. It's about the practicality of using the tools to achieve the same end goal - no use of uninitialized memory.
Valgrind is only a win if it is practical to actually exercise every code path in question to detect the latent problems.
Unit testing coverage for most projects is nowhere near good enough to exercise all codepaths. Nor is any other functional/integration testing that is done, if any.
Error code paths are a particular problem since they're rarely exercised by normal users, and difficult to exercise in tests without clever fault injection techniques.
If you have unlimited developer resources, you might write tests to exercise every relevant code path and put this all through valgrind and become confident you don't have uninitialized variables.
Even then, valgrind isn't practical for some apps because of the performance penalty it imposes.
So awesome as valgrind is, from a real world practicality POV, it is a clear win to let the compiler default initialize every variable to zero. It will eliminate a whole class of bugs that will never be reliably identified by the developers, allowing them to spend more time dealing with more important classes of bugs or work on features or whatever else deserves their very limited time.
Regards, Daniel
On Thu, Jan 27, 2022 at 10:41:36AM +0100, Roberto Ragusa wrote:
On 1/22/22 10:05 PM, Mark Wielaard wrote:
So I would give valgrind a 6/6 (100%) score :)
But if the compiler starts copying zeros on uninitialized memory, valgrind loses any ability to detect the defect in the code.
For me, when using valgrind or a fuzzer I'm using upstream, local builds without the Fedora flags, often adding custom flags for ASAN and that kind of thing. IOW I'm treating valgrind and fuzzing as different environments from Fedora "production".
Rich.
On Sat, 2022-01-22 at 15:00 -0500, Steve Grubb wrote:
On Saturday, January 22, 2022 6:36:01 AM EST Vitaly Zaitsev via devel wrote:
On 21/01/2022 19:04, Steve Grubb wrote:
Uninitialized variables are a big problem.
Yes, but as a package maintainer, I don't want to deal with dozens of crashes after this change.
As much as I don't want this to cause unnecessary work for anyone, I also don't want to see preventable exploits happen. Nearly all major software vendors are doing this.
I mentioned in the original proposal that I have a test program with 8 test cases. This is it in case anyone wants to try it out:
[test program elided; quoted in full above]
Detection results:
gcc11: 0
gcc11+fanalyzer: 0
gcc12: 2
gcc12+fanalyzer: 3
Steve, thanks for putting together these cases.
I've filed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104224 against the gcc analyzer upstream to help me track improving the analyzer on this.
OK if I go ahead and slurp this into the upstream gcc testsuite?
What optimization level were you running -fanalyzer at? (Unfortunately the analyzer is currently affected by that; I'm thinking of moving the analysis pass much earlier so that it isn't).
Running this on Compiler Explorer with gcc trunk with -fanalyzer (no optimizations) is: https://godbolt.org/z/T17TbqYdx
Dave
cppcheck: 2 (but describes different aspects of the same problems)
gcc11+asan: 0
gcc11+ubsan: 0
clang13: 1
valgrind+clang: 0
valgrind+gcc: 2
Flexelint: 1
splint: 2
The one surprising result is that valgrind's results differ by the compiler choice.
-Steve
Hello Dave,
On Tuesday, January 25, 2022 9:29:53 AM EST David Malcolm wrote:
Steve, thanks for putting together these cases.
I've filed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104224 against the gcc analyzer upstream to help me track improving the analyzer on this.
OK if I go ahead and slurp this into the upstream gcc testsuite?
Sure, go ahead. But I guess the global variable test cases can be removed since the standard says they should be initialized to 0. (I thought only static got that treatment but found out otherwise.)
What optimization level were you running -fanalyzer at?
I was doing all tests at -O2 since that is how things go through the build system. And based on other reports in this thread, it really affects the detection of tools - which is a pity, since a compile at -O2 is what we get via gating tests.
(Unfortunately the analyzer is currently affected by that; I'm thinking of moving the analysis pass much earlier so that it isn't).
Running this on Compiler Explorer with gcc trunk with -fanalyzer (no optimizations) is: https://godbolt.org/z/T17TbqYdx
Interesting. I was also using it with -fanalyzer-verbosity=0 to try to get a concise warning without following all the branches it used to decide.
Also, the tests that I wrote are a starting point. I didn't try anything heap related, uninitialized values as parameters to syscalls, shift operations, floating point calculations, or mathematical expressions. I also did not touch on C++ at all. To really quantify how the various tools stack up (and hopefully to improve detection), we need a good, curated collection of bugs.
Best Regards, -Steve
On Tue, 2022-01-25 at 11:47 -0500, Steve Grubb wrote:
Hello Dave,
On Tuesday, January 25, 2022 9:29:53 AM EST David Malcolm wrote:
Steve, thanks for putting together these cases.
I've filed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104224 against the gcc analyzer upstream to help me track improving the analyzer on this.
OK if I go ahead and slurp this into the upstream gcc testsuite?
Sure, go ahead.
Thanks; done:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9ff3e2368d86c1bf7d1e8876a1...

In doing so I noticed that the analyzer was failing to check for uninit in some special-cases (for stdio functions), and so it was failing to warn for the:

    printf("go :%d\n", go);

case. The patch above fixes that.
But I guess the global variable test cases can be removed since the standard says they should be initialized to 0. (I thought only static got that treatment but found out otherwise.)
I kept them, as it's valuable to check that we don't issue a false positive for such code (expressed via the "dg-bogus" directives in the above patch).
What optimization level were you running -fanalyzer at?
I was doing all tests at -O2 since that is how things go through the build system. And based on other reports in this thread, it really affects the detection of tools - which is a pity, since a compile at -O2 is what we get via gating tests.
(Unfortunately the analyzer is currently affected by that; I'm thinking of moving the analysis pass much earlier so that it isn't).
Running this on Compiler Explorer with gcc trunk with -fanalyzer (no optimizations): https://godbolt.org/z/T17TbqYdx
Interesting. I was also using it with -fanalyzer-verbosity=0 to try to get a concise warning without following all the branches it used to decide.
I'm experimenting with patches to avoid reporting on unrelated control flow, to see if this will make the default verbosity level more readable. I've also added events describing where the uninit region of memory is created (stack vs heap), in: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=00e7d024afb80e95fb36d74b1c...
which will hopefully be live in Compiler Explorer within the next 24 hours.
Also, the tests that I wrote are a starting point. I didn't try anything heap related,
For the stack, consider alloca as well as local vars. For the heap, consider malloc (no init) vs calloc (zero-init).
nor uses as parameters to syscalls, in shift operations, floating-point calculations, or mathematical expressions.
I believe -fanalyzer handles all of these cases [1]; I'll add explicit test coverage for them to the GCC testsuite.
I also did not touch on C++ at all. To really quantify how the various tools stack up (and hopefully to improve detection), we need a good, curated collection of bugs.
FWIW I've been experimenting with the Juliet test suite from SARD: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102471 and I've fixed at least one -fanalyzer bug found by running that suite (a leak false-positive).
I'm very interested in any more feedback or ideas you (or others) have on this (including ideas for test cases).
Thanks Dave
[1] Implementation-wise, all of these go through a check for uninit values in region_model::get_rvalue; GCC 12's -fanalyzer tracks initialization at the per-bit level.
Hi Steve,
On Fri, 2022-01-21 at 13:04 -0500, Steve Grubb wrote:
This is a continuation of the discussion from F36 Change: GNU Toolchain Update.
Uninitialized variables are a big problem. They can be sources of information exposure if parts of a buffer are not initialized. They can also cause unexpected execution paths if the attacker can groom the memory to a value of their choosing. If the variable is a pointer to heap, this can cause free to corrupt memory under certain circumstances. If the uninitialized memory is part of user input, this can lead to improper input validation. This is not hypothetical. All of these come from a paper doing an empirical study of Android flaws. [1] The data used in the paper is here. [2]
Part of the problem is that compilers and static analysis tools can't always find them. I created a test program that has 8 uses of uninitialized variables. GCC 11 misses all of them. GCC 12 finds 2. Clang 13 finds 1. cppcheck finds 2 or 3, but does so much complaining you'd think it had found them all. Valgrind finds 2. Flexelint, a commercial linter, finds 1.
Since tools can't always find them, the only option we have right now is to force initialization to something the attacker cannot control.
Although I see how having a deterministic bug (use of a known zero value) is better than having a nondeterministic bug (use of an undefined value), it is still a bug. And I think you don't give the tools we have much credit.
In my experience almost all such bugs will be found with a combination of gcc -fsanitize=undefined and valgrind memcheck. Your test program didn't trigger most of them because gcc, when optimizing, is smart enough to simply not emit unused code.
Of course gcc -fsanitize=undefined cannot be used on production code. And while valgrind memcheck can be used on production code, it is often fairly slow (and arguably now it is too late, because already in production).
Although I am not against trying to turn nondeterministic bugs into deterministic ones and getting rid of more undefined code, I am slightly worried it means those bugs will be harder to find in production. I also really hope we encourage people to use the various tools to find those bugs before they get into production. They really aren't as bad at finding these issues as you make them out to be.
Cheers,
Mark
Mark Wielaard mark@klomp.org writes:
Although I am not against trying to turn nondeterministic bugs into deterministic ones and getting rid of more undefined code, I am slightly worried it means those bugs will be harder to find in production. I also really hope we encourage people to use the various tools to find those bugs before they get into production. They really aren't as bad at finding these issues as you make them out to be.
Right. Gfortran has -finit-local-zero (part of "All the world's a VAX") and other -finit- options, and notes that they silence -Wuninitialized.
I don't have measurements -- does someone else? -- but I suspect this at least is something you don't want in performance-critical code of the sort I work with.
Of course gcc -fsanitize=undefined cannot be used on production code.
Why not? Will it find too many errors?
This discussion is at least 5 years old:
https://seclists.org/oss-sec/2016/q1/363
I don't know if the problems have been addressed or if new problems have popped up. The short of it, if you don't read the link above, is that you can use the _OPTIONS environment variable with a setuid application and clobber any file on the file system.
-Steve
On Fri, 28 Jan 2022 at 16:40, Steve Grubb wrote:
Of course gcc -fsanitize=undefined cannot be used on production code.
Why not? Will it find too many errors?
This discussion is at least 5 years old:
https://seclists.org/oss-sec/2016/q1/363
I don't know if the problems have been addressed or if new problems have popped up. The short of it, if you don't read the link above, is that you can use the _OPTIONS environment variable with a setuid application and clobber any file on the file system.
(That's about ASan, but UBSAN_OPTIONS will do the same.)
It's worth noting that -fsanitize=undefined -fsanitize-undefined-trap-on-error doesn't use UBSAN_OPTIONS and doesn't require libubsan.so. With the trap-on-error option you just get a crash instead of a user-friendly description of the error, but it does still check for UB and halt the process when it's detected.
On Fri, Jan 21, 2022 at 01:04:51PM -0500, Steve Grubb wrote:
Hello,
This is a continuation of the discussion from F36 Change: GNU Toolchain Update.
snip.
He talks about -ftrivial-auto-var-init=zero being used for production builds and -ftrivial-auto-var-init=<pattern> being used for debug builds. The use is not just the kernel. Consider a server that returns data across the network to a client. It could possibly leak crypto keys or passwords if the returned data structure has uninitialized memory.
snip
I think this would be an important step forward to turn this on across all compilations. We could wipe out an entire class of bugs in one fell swoop.
Fast-forward a few months and I see GCC 12.1 is released now with -ftrivial-auto-var-init option support [2].
Are you going to take this idea forward and make a formal change proposal for Fedora to set -ftrivial-auto-var-init=zero by default in its default RPM build flags ?
With regards, Daniel
[1] https://gcc.gnu.org/gcc-12/changes.html [2] https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Optimize-Options.html#index-ft...
Hello,
On Monday, May 9, 2022 5:10:07 AM EDT Daniel P. Berrangé wrote:
On Fri, Jan 21, 2022 at 01:04:51PM -0500, Steve Grubb wrote:
This is a continuation of the discussion from F36 Change: GNU Toolchain Update.
snip.
He talks about -ftrivial-auto-var-init=zero being used for production builds and -ftrivial-auto-var-init=<pattern> being used for debug builds. The use is not just the kernel. Consider a server that returns data across the network to a client. It could possibly leak crypto keys or passwords if the returned data structure has uninitialized memory.
snip
I think this would be an important step forward to turn this on across all compilations. We could wipe out an entire class of bugs in one fell swoop.
Fast-forward a few months and I see GCC 12.1 is released now with -ftrivial-auto-var-init option support [2].
Are you going to take this idea forward and make a formal change proposal for Fedora to set -ftrivial-auto-var-init=zero by default in its default RPM build flags ?
I would like to see this happen, but I have not yet tested anything with the flag added. I was under the impression from someone on the gcc team that they wanted to look into this after 12.1 was out and the fallout from it had settled. Maybe now is the time to start looking into it?
I'd need someone from the gcc team to partner on this as I don't have permissions to actually do this.
Best Regards, -Steve
[1] https://gcc.gnu.org/gcc-12/changes.html [2] https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Optimize-Options.html#index-%3... ftrivial-auto-var-init
On Monday, May 9, 2022 5:10:07 AM EDT Daniel P. Berrangé wrote:
On Fri, Jan 21, 2022 at 01:04:51PM -0500, Steve Grubb wrote:
This is a continuation of the discussion from F36 Change: GNU Toolchain Update.
snip.
He talks about -ftrivial-auto-var-init=zero being used for production builds and -ftrivial-auto-var-init=<pattern> being used for debug builds. The use is not just the kernel. Consider a server that returns data across the network to a client. It could possibly leak crypto keys or passwords if the returned data structure has uninitialized memory.
snip
I think this would be an important step forward to turn this on across all compilations. We could wipe out an entire class of bugs in one fell swoop.
Fast-forward a few months and I see GCC 12.1 is released now with -ftrivial-auto-var-init option support [2].
Are you going to take this idea forward and make a formal change proposal for Fedora to set -ftrivial-auto-var-init=zero by default in its default RPM build flags ?
I've connected with the gcc folks and we will file a proposal in the near future.
-Steve
Hi Steve,
On Wed, 2022-05-11 at 22:35 -0400, Steve Grubb wrote:
On Monday, May 9, 2022 5:10:07 AM EDT Daniel P. Berrangé wrote:
Are you going to take this idea forward and make a formal change proposal for Fedora to set -ftrivial-auto-var-init=zero by default in its default RPM build flags ?
I've connected with the gcc folks and we will file a proposal in the near future.
I am not a fan, because I think this mainly hides bugs, and also because the original change proposal made it sound like we don't have any other way to find and fix these kinds of bugs. Yet a little analysis of your examples showed we can find and fix 100% of these issues with the existing gcc and analysis tools.
So my counter proposal would probably be to enforce -Werror and running all package test-suites under valgrind. But maybe others won't like that "solution".
If you do propose this again, could you at least make clear it's another tool in the toolbox, not a replacement, and that the other tools do work and are trusted (if you pay attention to them)? Then at least we could have an honest discussion about why (and in which circumstances) each of the tools might or might not work/catch an issue, and whether we may/can/should require packagers to pay more attention to compiler warnings and/or running analysis tools over the sources they package.
Thanks,
Mark
On Mon, May 16, 2022 at 7:18 PM Mark Wielaard mjw@fedoraproject.org wrote:
Hi Steve,
On Wed, 2022-05-11 at 22:35 -0400, Steve Grubb wrote:
On Monday, May 9, 2022 5:10:07 AM EDT Daniel P. Berrangé wrote:
Are you going to take this idea forward and make a formal change proposal for Fedora to set -ftrivial-auto-var-init=zero by default in its default RPM build flags ?
I've connected with the gcc folks and we will file a proposal in the near future.
I am not a fan, because I think this mainly hides bugs, and also because the original change proposal made it sound like we don't have any other way to find and fix these kinds of bugs. Yet a little analysis of your examples showed we can find and fix 100% of these issues with the existing gcc and analysis tools.
So my counter proposal would probably be to enforce -Werror and running all package test-suites under valgrind. But maybe others won't like that "solution".
<why_not_both.gif>
OK, running all package testsuites under valgrind may be overkill but we should certainly build towards coverage that resembles that.
If you do propose this again, could you at least make clear it's another tool in the toolbox, not a replacement, and that the other tools do work and are trusted (if you pay attention to them)? Then at least we could have an honest discussion about why (and in which circumstances) each of the tools might or might not work/catch an issue, and whether we may/can/should require packagers to pay more attention to compiler warnings and/or running analysis tools over the sources they package.
We should capture the impact of this on analysis tools and perhaps document ways to achieve the full benefit of valgrind when debugging packages. The ideal would be a rawhide-debug (or f37-build-debug, etc) target that disables trivial-auto-var-init and maybe also some other flags to improve debuggability, but that could be a separate proposal.
Thanks, Siddhesh
siddhesh wrote:
[...] The ideal would be a rawhide-debug (or f37-build-debug, etc) target that disables trivial-auto-var-init and maybe also some other flags to improve debuggability, but that could be a separate proposal.
We don't build "debug" variants of the distro RPMs for a variety of reasons. Other than the obvious (extra QA, no user base, etc.), it confuses means and ends. People want to run the real binaries and debug them if necessary. Debuggability is not the 'ends', and having 'debug' binaries is not a means toward debugging the real ones.
- FChE
On 5/11/22 19:35, Steve Grubb wrote:
On Monday, May 9, 2022 5:10:07 AM EDT Daniel P. Berrangé wrote:
[snip]
Fast-forward a few months and I see GCC 12.1 is released now with -ftrivial-auto-var-init option support [2].
Are you going to take this idea forward and make a formal change proposal for Fedora to set -ftrivial-auto-var-init=zero by default in its default RPM build flags ?
I've connected with the gcc folks and we will file a proposal in the near future.
Zero is the worst possible auto-init value. It will hide the most bugs. Zero is unrecognizable as an auto-init value by hardware, code, and humans.
The auto-init value should be recognizable instantly in all circumstances. The value which comes to mind is 0x81 in every byte. This is the "core constant" used by the Michigan Terminal System beginning in 1967 (yes, 55 years ago!) on IBM S/360-67 and S/370 hardware; see https://en.wikipedia.org/wiki/MTS_system_architecture.
As a signed integer the value is negative, which trips assumptions of "positive or zero" for loop counters or array indices. As a virtual address, (void *)0x818181..81 triggers a fault, thus catching uninit pointers. 0x81 stands out in a hex dump. Programmers quickly learn the 32-bit integer value "-2122219135".
Please do not use zero as the default auto-init value. Use 0x818181..81 instead.
On Mon, May 16 2022 at 07:23:20 AM -0700, John Reiser jreiser@bitwagon.com wrote:
Zero is the worst possible auto-init value. It will hide the most bugs.
That's true, but using zero also converts code execution vulnerabilities into denial of service vulnerabilities. Dereference a NULL pointer and you get a non-exploitable crash. Dereference 0x81818181 and you have a much more serious problem at a predictable location.
The goal of this change is to mitigate security bugs, and using a nonzero value does not accomplish that goal.
Michael
On 5/16/22 07:33, Michael Catanzaro wrote:
On Mon, May 16 2022 at 07:23:20 AM -0700, John Reiser jreiser@bitwagon.com wrote:
Zero is the worst possible auto-init value. It will hide the most bugs.
That's true, but using zero also converts code execution vulnerabilities into denial of service vulnerabilities. Dereference a NULL pointer and you get a non-exploitable crash. Dereference 0x81818181 and you have a much more serious problem at a predictable location.
The goal of this change is to mitigate security bugs, and using a nonzero value does not accomplish that goal.
Today, x86_64 Linux does not allow 0x8181...81 to be mapped in a user process (except for i686 software running under an x86_64 kernel, which may be prevented via a configuration choice), so the addressing fault for 0x8181...81 is just as fatal as for zero. SIGSEGV is generated in both cases, and handled independently of the address value.
On 16/05/2022 16:23, John Reiser wrote:
Please do not use zero as the default auto-init value. Use 0x818181..81 instead.
Fedora only cares about security issues, not about various logical errors. Setting uninitialized variables to 0 is fine.
Also I think that every such implicit assignment should raise a warning.
On Mon, May 16, 2022 at 07:23:20AM -0700, John Reiser wrote:
On 5/11/22 19:35, Steve Grubb wrote:
On Monday, May 9, 2022 5:10:07 AM EDT Daniel P. Berrangé wrote:
[snip]
Fast-forward a few months and I see GCC 12.1 is released now with -ftrivial-auto-var-init option support [2].
Are you going to take this idea forward and make a formal change proposal for Fedora to set -ftrivial-auto-var-init=zero by default in its default RPM build flags ?
I've connected with the gcc folks and we will file a proposal in the near future.
Zero is the worst possible auto-init value. It will hide the most bugs. Zero is unrecognizable as an auto-init value by hardware, code, and humans.
I guess it depends on the point of view: it effectively turns it into a non-bug in the vast majority of cases. Zero-initialized is what the programmer would have intended a large percentage of the time, and is thus likely to result in safe behaviour. Calling free() on a NULL pointer is a no-op. Dereferencing a NULL pointer is an immediate crash, and easy to trace backwards from; whether it was an explicit or implicit NULL doesn't really matter.
For non-pointer values zero is usually good: as a loop bound it will terminate quickly, and as a data size it won't result in huge allocations. The most common case where zero is not good is UNIX file descriptors, where -1 is the better value. Still, it will be a net win in general IMHO.
Static analysis tools and GCC warnings can still be used at the same time as the automatic zero-init, so maintainers can still get warned about as many of the problems as the tools are able to identify. The remaining cases are made safer by the auto zero-init, often eliminating the crash entirely, which is good for users too.
The auto-init value should be recognizable instantly in all circumstances. The value which comes to mind is 0x81 in every byte. This is the "core constant" used by the Michigan Terminal System beginning in 1967 (yes, 55 years ago!) on IBM S/360-67 and S/370 hardware; see https://en.wikipedia.org/wiki/MTS_system_architecture.
As a signed integer the value is negative, which trips assumptions of "positive or zero" for loop counters or array indices. As a virtual address, (void *)0x818181..81 triggers a fault, thus catching uninit pointers. 0x81 stands out in a hex dump. Programmers quickly learn the 32-bit integer value "-2122219135".
Please do not use zero as the default auto-init value. Use 0x818181..81 instead.
Using a large value like that is not good for non-pointer variables. It would result in loop iterations burning CPU cycles, malloc allocations being enormous, out-of-bounds array accesses, and other bad behaviour.
With regards, Daniel
devel@lists.stg.fedoraproject.org