As we start a new year, I'm thinking about data retention in general. :)
In my experience, it's pretty rare on an end-user laptop or desktop system for logs from much more than the previous boot to be interesting. Maybe I occasionally want to look back a little while to see if a problem just started. It's exceedingly rare that I need (or want) to look back more than a month.
Right now, we don't set MaxRetentionSec, so journal expiry on Workstation is entirely based on disk usage.
Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
Matthew Miller wrote:
> Right now, we don't set MaxRetentionSec, so journal expiry on Workstation is entirely based on disk usage.
> Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
I don't think we should be destroying data by default. There should be no expiry by default.
Kevin Kofler
On Sat, Jan 02, 2021 at 12:53:52AM +0100, Kevin Kofler via devel wrote:
>> Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
> I don't think we should be destroying data by default. There should be no expiry by default.
"Destroying data" seems a bit dramatic. We clean up temp files and so on all the time. And the logs, of course, get trimmed if disk space is tight, so there's already that. Keeping data forever just because isn't necessarily inherently better. At $formeremployer, we had a policy of not keeping anything longer than a year unless mandated to for regulatory reasons.
On Fri, Jan 1, 2021 at 7:02 PM Matthew Miller mattdm@fedoraproject.org wrote:
> On Sat, Jan 02, 2021 at 12:53:52AM +0100, Kevin Kofler via devel wrote:
>>> Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
>> I don't think we should be destroying data by default. There should be no expiry by default.
> "Destroying data" seems a bit dramatic. We clean up temp files and so on all the time. And the logs, of course, get trimmed if disk space is tight, so there's already that. Keeping data forever just because isn't necessarily inherently better. At $formeremployer, we had a policy of not keeping anything longer than a year unless mandated to for regulatory reasons.
There are religious wars about this over in the source control world. To some, the history is more important than the current state of the system. Convincing some advocates of this view that the history itself is also a program, amenable to editing and modification, is sometimes quite an adventure.
On Fri, Jan 1, 2021 at 4:54 PM Kevin Kofler via devel devel@lists.fedoraproject.org wrote:
> Matthew Miller wrote:
>> Right now, we don't set MaxRetentionSec, so journal expiry on Workstation is entirely based on disk usage.
>> Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
> I don't think we should be destroying data by default. There should be no expiry by default.
Indirectly they already expire by default. It's just a different expiration date for everyone, because the current policy depends on a combination of free space remaining and maximum number of files.
Once upon a time, Chris Murphy lists@colorremedies.com said:
> On Fri, Jan 1, 2021 at 4:54 PM Kevin Kofler via devel devel@lists.fedoraproject.org wrote:
>> I don't think we should be destroying data by default. There should be no expiry by default.
> Indirectly they already expire by default. It's just a different expiration date for everyone, because the current policy depends on a combination of free space remaining and maximum number of files.
For systems with rsyslog installed, there's also already a default from logrotate, so setting a default expiration has plenty of precedent. And having a defined period would be MUCH better for consistency. I would add a note, wherever that default is set, that low disk space could mean a shorter retention time, though.
Matthew Miller mattdm@fedoraproject.org writes:
> As we start a new year, I'm thinking about data retention in general. :)
> In my experience, it's pretty rare on an end-user laptop or desktop system for logs from much more than the previous boot to be interesting. Maybe I occasionally want to look back a little while to see if a problem just started. It's exceedingly rare that I need (or want) to look back more than a month.
> Right now, we don't set MaxRetentionSec, so journal expiry on Workstation is entirely based on disk usage.
> Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
That sounds very reasonable to me. As long as we document this properly, anyone who wants to keep logs forever can change the setting themselves pretty easily.
Cheers,
Dan
Matthew Miller mattdm@fedoraproject.org writes:
> Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
If there are non-negligible speed impacts from large logs, this seems like a problem with systemd-journald - plain text doesn't have this problem.
Thanks, --Robbie
On Fri, 2021-01-01 at 12:15 -0500, Matthew Miller wrote:
> As we start a new year, I'm thinking about data retention in general. :)
> In my experience, it's pretty rare on an end-user laptop or desktop system for logs from much more than the previous boot to be interesting. Maybe I occasionally want to look back a little while to see if a problem just started. It's exceedingly rare that I need (or want) to look back more than a month.
> Right now, we don't set MaxRetentionSec, so journal expiry on Workstation is entirely based on disk usage.
> Logs can accidentally contain sensitive data, and it's just plain faster to work with them when there's less. I propose we set this to something like six months by default.
Hi,
On RHEL/CentOS 7, I notice that the logrotate default is 4 weeks; I guess this is because it is the Fedora default. So we should keep in mind that the default we choose for Fedora will one day be the default for RHEL.
I use 52 weeks on my machines, or even 104 weeks if they are important, because, especially for security issues, we need to dig back more than 2 or 3 months, and for statistics such as watching disk usage, etc. Like logrotate, I'd like to have journal retention of one year (52 weeks).
Best regards,
On Sat, Jan 23, 2021 at 07:02:17PM +0000, Sérgio Basto wrote:
> I use 52 weeks on my machines, or even 104 weeks if they are important, because, especially for security issues, we need to dig back more than 2 or 3 months, and for statistics such as watching disk usage, etc. Like logrotate, I'd like to have journal retention of one year (52 weeks).
This might be a case where we'd want different defaults for different Editions and Spins.
Once upon a time, Sérgio Basto sergio@serjux.com said:
> I use 52 weeks on my machines, or even 104 weeks if they are important, because, especially for security issues, we need to dig back more than 2 or 3 months, and for statistics such as watching disk usage, etc. Like logrotate, I'd like to have journal retention of one year (52 weeks).
For server setups (where longer logs might be important), I send logs to a remote system. I have systems that generate a gig of compressed logs daily; I don't set up my VMs with an extra 50-100G of disk for local logs.
Log rotation is something that can have vastly different requirements for different environments - I don't think it is up to a distribution to try to determine that and match them. Stick with one more or less sane default and apply it to everything (e.g. logrotate and journal retention should have similar settings). Beyond that, just document how to adjust them. Don't try to out-smart or second-guess environments.
On Sat, Jan 23, 2021 at 02:32:15PM -0600, Chris Adams wrote:
> Log rotation is something that can have vastly different requirements for different environments - I don't think it is up to a distribution to try to determine that and match them. Stick with one more or less sane default and apply it to everything (e.g. logrotate and journal retention should have similar settings). Beyond that, just document how to adjust them. Don't try to out-smart or second-guess environments.
Sure, it should be easy to change, but that doesn't mean we shouldn't also look at defaults. Again, particularly on desktops, where having years of logs is probably more of a problem than a benefit.
On Sat, Jan 23, 2021 at 3:39 pm, Matthew Miller mattdm@fedoraproject.org wrote:
> Sure, it should be easy to change, but that doesn't mean we shouldn't also look at defaults. Again, particularly on desktops, where having years of logs is probably more of a problem than a benefit.
One year sounds like a good default to me.
Hi,
The Workstation Working Group discussed this at a recent meeting and reached no decision. There's general agreement that a time-based limit should apply rather than a disk usage limit, to make retention more predictable. I think this can work by default for all Fedora editions and spins. Consider this a pre-change proposal for Fedora 35.
Summary of the current behaviors:
* Fedora Server, minimally configured out of the box: the current 4G limit translates into ~60 months of journal retention.
* Fedora Workstation, out of the box: the same 4G limit translates into ~12 months of journal retention.
* In practice, a maximum disk usage limit implies a wide range of retention times, depending on the configuration.
* rsyslog+logrotate are included in the editions and spins that use @standard, which is pretty much all of them except Workstation edition. Logrotate's default policy is 4 weeks of retention before logs are purged. RHEL has used the same default since at least RHEL 7.
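As a rough illustration of why a size cap produces such different time windows, the ~60 and ~12 month figures above can be converted into implied journal growth rates. This is a back-of-the-envelope sketch. It assumes a steady write rate and treats "4G" as 4 GiB, and the helper names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def implied_monthly_growth(cap_bytes: float, retention_months: float) -> float:
    """Bytes written per month if a full cap covers `retention_months` (steady rate assumed)."""
    return cap_bytes / retention_months


def implied_retention_months(cap_bytes: float, bytes_per_month: float) -> float:
    """Months of history that fit under a size cap at a steady write rate."""
    return cap_bytes / bytes_per_month


CAP = 4 * GIB
server_rate = implied_monthly_growth(CAP, 60)       # ~68 MiB/month
workstation_rate = implied_monthly_growth(CAP, 12)  # ~341 MiB/month

if __name__ == "__main__":
    print(f"Server:      {server_rate / MIB:.1f} MiB/month")
    print(f"Workstation: {workstation_rate / MIB:.1f} MiB/month")
    # A hypothetical machine logging twice as much as Workstation gets only ~6 months.
    print(f"{implied_retention_months(CAP, 2 * workstation_rate):.1f} months")
```

The cap is identical in every case. Only the write rate changes, and the retention window changes with it. This is the unpredictability that a time-based limit is meant to remove.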
About journald.conf's MaxRetentionSec=, the man page says: "This controls whether journal files containing entries older than the specified time span are deleted." That is, once a journal file contains an entry older than the specified time, the entire file is purged.
Journal files can grow up to 128M; however, MaxFileSec= defaults to 1 month, so a single journal file shouldn't contain more than one month of entries. A single change to 'MaxRetentionSec=6month', combined with the other existing defaults, means a maximum of 5-6 months of journal entries.
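Values such as '6month' and '1week' are systemd time spans. The minimal Python sketch below shows how such spans map to seconds. It assumes the unit lengths documented in systemd.time(7): a month is 1/12 of a 365.25-day year (30.4375 days, which the man page rounds to 30.44). The sketch handles only a subset of the suffixes systemd accepts, and it is not journald's own parser. `parse_timespan` is a hypothetical name.

```python
import re

# Seconds per unit, following systemd.time(7). Only a subset of the accepted
# suffixes is included. Note that "m" means minutes and "M" means months.
UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
    "w": 604800, "week": 604800, "weeks": 604800,
    "M": 2629800, "month": 2629800, "months": 2629800,   # 365.25 days / 12
    "y": 31557600, "year": 31557600, "years": 31557600,  # 365.25 days
}

_TOKEN = re.compile(r"(\d+)([A-Za-z]+)")


def parse_timespan(text: str) -> int:
    """Convert a span such as '6month' or '1w 2d' to whole seconds (sketch only)."""
    compact = text.replace(" ", "")
    tokens = _TOKEN.findall(compact)
    # Reject anything the number+unit tokens do not fully account for, e.g. '6'.
    if not tokens or "".join(n + u for n, u in tokens) != compact:
        raise ValueError(f"unsupported time span: {text!r}")
    total = 0
    for number, unit in tokens:
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unknown unit {unit!r} in {text!r}")
        total += int(number) * UNIT_SECONDS[unit]
    return total


if __name__ == "__main__":
    for span in ("1month", "6month", "5week", "1week"):
        seconds = parse_timespan(span)
        print(f"{span:>7} = {seconds} s = {seconds / 86400:g} days")
```

systemd itself also treats a bare number as seconds for these settings. This sketch deliberately rejects bare numbers so that a missing unit is caught.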
To make systemd-journald behave more like the rsyslog+logrotate default, we could set 'MaxRetentionSec=5week' and 'MaxFileSec=1week'. That would permit a float of 4-5 weeks, tending to be closer to 4 weeks.
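The '5-6 months' and '4-5 weeks' figures follow from deleting whole files rather than individual entries. Below is a minimal model of that reasoning. It makes four assumptions: logging is continuous, files rotate exactly every MaxFileSec=, a file is deleted as soon as its oldest entry is older than MaxRetentionSec= (the man-page reading quoted above, not journald's actual code), and times are integer days with a month simplified to 30 days. The function names are hypothetical.

```python
def retained_window(now: int, max_retention: int, max_file: int) -> int:
    """How far back (in the input units) the journal still reaches at time `now`."""
    # Model: file k holds entries in [k*max_file, (k+1)*max_file).
    # A file is deleted once its oldest entry (k*max_file) is older than max_retention.
    cutoff = now - max_retention
    oldest_kept = -(-cutoff // max_file)  # ceil(cutoff / max_file) for integers
    return now - oldest_kept * max_file


def window_range(max_retention: int, max_file: int) -> tuple[int, int]:
    """Smallest and largest retained window over one full rotation period."""
    # The window repeats every max_file; start late enough that deletions have begun.
    start = max_retention + max_file
    spans = [retained_window(t, max_retention, max_file)
             for t in range(start, start + max_file)]
    return min(spans), max(spans)


if __name__ == "__main__":
    lo, hi = window_range(max_retention=180, max_file=30)  # 6month / 1month
    print(f"MaxRetentionSec=6month, MaxFileSec=1month: {lo}-{hi} days")
    lo, hi = window_range(max_retention=35, max_file=7)    # 5week / 1week
    print(f"MaxRetentionSec=5week, MaxFileSec=1week: {lo}-{hi} days")
```

Under these assumptions, the history kept always lies just above (MaxRetentionSec minus MaxFileSec) and at most MaxRetentionSec. Shrinking MaxFileSec= therefore narrows the float.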
My take is that most folks are probably a bit surprised by the 4-week logrotate default and would prefer a longer retention time. And yet they still want something more consistent and shorter than the present defaults. While there are other possibilities, 6 months is both more consistent and shorter than today's behavior, and longer than the 4-week logrotate default.
Per the language in the man pages, my working assumption is that all Max limits are in effect: if any Max value is exceeded, vacuuming is triggered. If anyone knows differently, now would be a good time to say so!
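The working assumption in the paragraph above (every Max limit applies, and exceeding any one of them triggers vacuuming) can be sketched as a loop that drops the oldest archived file while any limit is still exceeded. The limits, field names and units are hypothetical stand-ins for journald.conf's size, file-count and retention settings. This is not journald's implementation.

```python
from dataclasses import dataclass


@dataclass
class JournalFile:
    oldest_entry: int  # time of the file's first entry (hypothetical unit: days)
    size: int          # bytes on disk


def vacuum(files: list[JournalFile], now: int, max_use: int,
           max_files: int, max_retention: int) -> list[JournalFile]:
    """Return the files that survive if ANY exceeded limit triggers deletion."""
    kept = sorted(files, key=lambda f: f.oldest_entry)  # oldest first; input untouched

    def over_a_limit() -> bool:
        if not kept:
            return False
        return (sum(f.size for f in kept) > max_use                # disk usage cap
                or len(kept) > max_files                           # file count cap
                or now - kept[0].oldest_entry > max_retention)     # age cap

    while over_a_limit():
        kept.pop(0)  # always delete the oldest file first
    return kept


if __name__ == "__main__":
    # Ten hypothetical 100-byte files, one started every 10 days; "now" is day 100.
    files = [JournalFile(oldest_entry=d, size=100) for d in range(0, 100, 10)]
    print(len(vacuum(files, 100, max_use=10_000, max_files=100, max_retention=55)))  # 5: age wins
    print(len(vacuum(files, 100, max_use=300, max_files=100, max_retention=1000)))   # 3: size wins
    print(len(vacuum(files, 100, max_use=10_000, max_files=2, max_retention=1000)))  # 2: count wins
```

Whichever limit is tightest for a given machine decides how much history survives. This is why today's disk-based expiry yields a different effective retention on every system.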
-- Chris Murphy
devel@lists.stg.fedoraproject.org