THE DOCUMENTATION DOCUMENTATION
People always ask me how to maintain a complex IT system without getting into trouble. There are some basic rules:
Don't test new things in a production environment.
Write clean scripts instead of doing complex things manually.
Always comment your code.
Create documentation of what you did.
This text is my README on how to document your operating system.
0. Why documentation?
1. What should I?
1.1 Practice versus Requirements
1.2 Package selection pitfalls
1.3 Changes to System Files
1.4 Non-persistent Changes
1.5 DON'T SAVE SENSITIVE DATA
1.6 Whatever you tend to forget
2. Types
2.1 File-centric
2.2 Task-centric
2.3 Timestamp-centric
3. Methods
3.1 One big file
3.2 Folders with files
3.3 Database
3.4 Table of Contents
3.5 Categories
3.6 Syntax
4. Tools
4.1 (e)grep
4.2 cat
4.3 sed and awk
4.4 vim or emacs
4.5 Zim
4.6 Be creative!
5. Pitfalls
5.1 Not enough data
5.2 Too much information
5.3 Flat syntax
5.4 Too complex
0. Why documentation?
Every time you change an entity of a complex system, you do this with a certain result in mind. If you suddenly are after another result, you have to adapt your changes accordingly.
When you build a house, you arrange the power lines in a certain fashion that suits your needs. Those needs might change in a year (or ten), or something might break. You have to decide whether to rebuild the whole house or to alter your cable configuration. Imagine yourself ripping all walls open, just to see which cord goes where. You will, of course, take notes about what's where, so you save yourself some time in the future.
So, if you have to study the whole system every time you apply changes, you will end up looking at the same unchanged entities over and over again. To prevent you from doing this, you have to memorize the changes to all entities of the system.
Some things are easily memorized. If you break up with your partner, you will have little to no trouble remembering the next day that you are not a couple anymore. There's no need to take a note, as the system changes dramatically enough to reflect the absence of the entity "partner", as the parent entity "couple" is gone as well (might depend on your reasons to break up).
Now we build our house. Is the exact circuit configuration relevant for your everyday life? Would you notice if your power lines were put under your feet in the floor instead of the lower half of the wall, as long as the outlet was at the same position? It wouldn't matter until you try to drill some holes in the wall to install a cupboard and ask yourself whether it's safe or not. The "cord" entity wouldn't have any impact, unless you try adding the "hole" entity.
Let's say you built your house yourself. You installed all the cables on your own. You know perfectly well where everything is, approximately until the day a new "partner" entity is installed, filling up your space with "personal.preferences.partner". You'll forget eventually.
This is why people keep documentation. It can save a lot of time and trouble.
1. What should I?
Everything that is not obvious should be documented.
1.1 Practice versus Requirements
Let's move closer to software systems, especially your Linux computer. Or better: my Linux computer. There are certain things I know by heart. I know how to operate my package manager (pacman). I know that everything that is not yet installed either goes with -S or -U, I know that all the local stuff goes with -R, -Q or -D. I know how to chain commands in pacman and I know the limitations. In fact, I know all those commands and switches. The most important thing is that I know where to find the documentation, it's only a "man pacman" away. Should I ever forget how to synchronize the package cache, I read the man page.
Debian is different. We have apt-cache, apt-get, aptitude and dpkg (and some dpkg-* commands). There is also dselect, which might or might not be deprecated (one never knows). As I don't use Debian at home, I have very little practice, but I know where to find most of the information. "man [command]", the Debian wiki, the Ubuntu wiki and so on will help me out. Sometimes I don't have time to read all the documentation at work, so I search for the switches and commands in the morning and take them down on a so-called "cheat sheet", a scrap of paper with some commands on it.
1.2 Package selection pitfalls
Imagine you just killed your Linux partition, your last backup is a year old and you need your computer to do something important. Wouldn't it be nice to know what exactly you had installed?
Most package managers allow you to save a list of all installed packages. You'll be able to reinstall all of them by feeding the package manager with that list. This is a good start, but it might mess up your system, because around 50-90% of all packages are just dependencies. If your package manager distinguishes explicitly installed packages from dependencies, you'll be in a dirty administration hell: everything on the list is now marked as explicitly installed, so one of the most useful features of your package manager (cleaning up dependencies nobody needs anymore) stops working and the leftovers only eat space.
Some package managers allow you to filter the explicit packages (like pacman -Qe). This is better. The dependencies will be pulled from the repositories automatically. There is only one problem: The packages installed by the OS installer will be reinstalled this way as well. This might or might not be a problem for most users, but remember that administration hell is a real place, where most of you will go because of things like this.
You probably have installed packages from outside the main repositories. If those were source packages, your package manager will most likely complain and then terminate.
Another point is optional dependencies. I once installed avidemux and dvdrip in Ubuntu. It took another 20 packages to make those two programs do what I want, as all those extra packages were optional. There was no hint, I figured this out by myself. The next time I wanted to set up those programs, I would have been standing in the dark without my documentation, as some of those packages had obscure names.
This is why I keep a list of all package names I feed the package manager with. I keep a second list (more or less) of all non-repo packages I manually install (AUR, PPA) and I also take down where it's from. This sounds like a lot of work, I'll explain methods later.
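Keeping that list can be as cheap as one extra command per install. The following is only a minimal sketch: the function names `note_pkgs` and `list_pkgs` and the file `~/doc/packages.list` are made up for illustration, and the functions only record names, they never call the package manager themselves.

```shell
#!/bin/sh
# Hypothetical location of the package list; override with PKG_LIST=...
PKG_LIST="${PKG_LIST:-$HOME/doc/packages.list}"

# note_pkgs SOURCE PKG...
# Appends one line per package: timestamp, source (repo, AUR, PPA, ...), name.
note_pkgs() {
    src="$1"
    shift
    mkdir -p "$(dirname "$PKG_LIST")"
    for pkg in "$@"; do
        printf '%s %s %s\n' "$(date +%Y%m%d.%H%M)" "$src" "$pkg" >> "$PKG_LIST"
    done
}

# list_pkgs SOURCE
# Prints the unique package names recorded for one source, sorted.
list_pkgs() {
    awk -v src="$1" '$2 == src { print $3 }' "$PKG_LIST" | sort -u
}
```

In daily use this could look like `note_pkgs repo avidemux dvdrip` right before the actual install, and after a crash something like `pacman -S --needed $(list_pkgs repo)` would feed the names back in. The non-repo entries (`list_pkgs AUR`) still have to be handled by hand, which is exactly why they live under their own source tag.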
1.3 Changes to System Files
It's kind of obvious that changes to files in /etc should be documented. There are so many files and so many things that can go wrong if you add or remove something. If you have to set a special flag in "/etc/xorg/xorg.conf.d/1001-rubbertoy" to save your marriage and that brat of cousin-son rm's it, you'll start a riot in the whole village.
I generally take notes of everything I change in /etc. I also write down when I create symlinks that might interfere with package management (some cowboy tricks like symlinking exo-open to xdg-open, because they both suck).
I generally write down every change to files in these folders:
/usr
/etc
/boot
/opt
/lib*
If a GUI or a tool does the change, make sure you take a note (like adduser basically fiddling with /etc/{passwd,shadow,group}).
1.4 Non-persistent Changes
Did you manually mount something that will be lost after a reboot? Take a note, you'll be happy to know that next time you reboot and your backup script fails, because you didn't add it to the fstab (for whatever reasons).
1.5 DON'T SAVE SENSITIVE DATA
Passwords and the like have no place in a general documentation file!
1.6 Whatever you tend to forget
It's your computer. Whatever you think you should write down... do it! After reading this document, you should be able to decide on your own, though (especially section 5. will be your friend).
2. Types
The best way to keep good documentation is neither the shortest nor the most detailed; it's the one that suits your needs. I will demonstrate different methods and explain the benefits and disadvantages.
2.1 File-centric
The file-centric documentation has a list of files being changed in whatever process. The list of changes is documented, as well as the reasons for those changes. A time stamp is never a bad idea either.
EXAMPLE:
/etc/rc.conf
20111112.1520
1. Added hwclock to the DAEMONS=() array, because localtime is now deprecated. See NEWSITEM 123
20111130.1530
2. Removed networkmanager from the DAEMONS=() array and added wicd
/etc/xorg/xorg.conf.d/1001-rubbertoy
20111112.1630
Added Section:
Section "Climax"
Option "DOIT"
EndSection
# Cousin-Son rm'd the file, now the cows are nervous.
20111112.1640
Added to Section "Climax"
Option "Autoincrease"
Advantages: You can rebuild all the changes to all config files very fast. If you stick with a certain syntax, you might even be able to automate the process. More about that later.
Disadvantages: You need to know which file goes with what task.
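If the file-centric log follows the layout of the example (a line holding nothing but an absolute path opens a block, everything up to the next such line belongs to it), pulling out the history of one file is a few lines of awk. This is a rough sketch under that assumption; `changes_for` is a made-up name, and any note line that itself starts with "/" would be mistaken for a new header.

```shell
#!/bin/sh
# changes_for PATH LOGFILE
# Prints the block that starts at the header line PATH and ends
# right before the next header line (any line starting with "/").
changes_for() {
    awk -v target="$1" '
        /^\// { printing = ($0 == target) }   # a header switches output on or off
        printing { print }
    ' "$2"
}
```

`changes_for /etc/rc.conf ~/doc/system.log` would then print only the rc.conf entries, in the order they were written down.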
2.2 Task-centric
The task-centric documentation lists different concepts, thoughts and, well, tasks in an abstract way.
EXAMPLE:
#automatic display standby#
activate DPMS
#/etc/X11/xorg.conf.d/20-nvidia.conf#
Section "Device"
...
Option "DPMS" "True"
...
EndSection
#manual display standby#
> $ xset dpms force off
Timestamps here would be nice as well.
Advantages: You don't need to know what files you have to change and you can describe abstract concepts without cluttering it over different entries.
Disadvantage: If you restore your system from a crash, you'll end up jumping back and forth through all the config files, instead of changing one file at a time.
2.3 Timestamp-centric
The timestamp-centric documentation allows you to identify corresponding actions at a certain time. It's very useful if you have to give your boss some sort of an activity log or if information becomes deprecated after a certain time. I think I don't need an example, you all know how a diary works.
Most system-generated logfiles look like this, as it is usually important to know the order in which events occurred that you did not cause yourself.
The disadvantage of this type is that you can't read the final state of an entity directly from the log. You'll have to replay all the steps chronologically (or extrapolate a prototype) to get to the final result.
Advantage: Every date you look at is a fine snapshot of a past state of the system. It comes in handy when debugging.
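For the record, a diary-style log needs very little machinery. A minimal sketch, assuming a hypothetical log file `~/doc/system.log` and the made-up helper names `note` and `notes_on`; the timestamp format is the YYYYMMDD.HHMM one used in the examples above.

```shell
#!/bin/sh
# Hypothetical log location; override with DOC_LOG=...
DOC_LOG="${DOC_LOG:-$HOME/doc/system.log}"

# note TEXT...
# Appends "YYYYMMDD.HHMM TEXT" as a single line to the log.
note() {
    mkdir -p "$(dirname "$DOC_LOG")"
    printf '%s %s\n' "$(date +%Y%m%d.%H%M)" "$*" >> "$DOC_LOG"
}

# notes_on DAY
# Prints every entry of one day, DAY given as YYYYMMDD.
notes_on() {
    grep "^$1\." "$DOC_LOG"
}
```

`note mounted /dev/sdb1 on /mnt/backup by hand, not in fstab` takes a few seconds, and `notes_on 20111112` answers the "what did I do that day" question when the boss or a broken backup script asks.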
3. Methods
3.1 One big file
This means you use a single file to record all your system changes.
Advantage: "grep whatever logfile" every time you need to know something. It also reduces the clutter of having many files. A text file can be read even if you don't have X installed, simply cat, grep, vim it. If you're clever, you can turn your logfile into a script :-)
Disadvantage:
- *scroll* *scroll* *scroll* *scroll* Oh here it is! Oh, it's not... *scroll* *scroll*
- You can't add links and anchors (unless you use something like HTML or XML)
3.2 Folders with files
By creating one file per topic and by adhering to a sane naming convention, you can create a set of log files similar to the files found in /var/log.
Advantage: You will not really care about a table of contents, as a sane folder structure will be very tidy.
Disadvantage: Searching for specific information will require some scripting. This method is also very good at making you lazy: you might end up throwing files into the folder to "sort them later". "Do X later" is actually the first branding you'll get on your ass in "administration hell".
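The scripting for the search part can stay small. A sketch, assuming the notes live below a hypothetical folder `~/doc` and that your grep supports the recursive `-r` switch (GNU and BSD grep do); `docgrep` and `doctopics` are made-up names.

```shell
#!/bin/sh
# Hypothetical documentation folder; override with DOC_DIR=...
DOC_DIR="${DOC_DIR:-$HOME/doc}"

# docgrep PATTERN
# Case-insensitive search through every file below DOC_DIR.
# Output format: file:line number:matching line
docgrep() {
    grep -rin -- "$1" "$DOC_DIR"
}

# doctopics
# Lists all topic files, which doubles as a crude table of contents.
doctopics() {
    find "$DOC_DIR" -type f | sort
}
```

`docgrep hwclock` then tells you both which topic file mentions the daemon and on which line.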
3.3 Database
You can use a database (any SQL or even dbase DB will do the trick) to save your knowledge.
Advantages: You don't need to stick with a certain type (see 2. Types), you just make sure everything you enter is in the right field. You can create output and even rearrange the context.
Disadvantage:
- Output is complicated, as you need to know how to formulate a search query.
- You need special software to input and read your data. This is especially painful if you have to chroot into your system from a live environment. Do this only if you keep your documentation on a separate system, or if the data is not crucial for the base system operation.
3.4 Table of Contents
Keep your own ToC. Make sure titles in the ToC correspond to titles in the text, or you'll end up reading instead of grepping. If you use a folder+files oriented method, you might or might not need a ToC.
3.5 Categories
Make sure you don't just make a list of entities, sort them a little. Make it count!
Base
- stuff like fstab, shadow, menu.lst
- Changed an HDD
- switched from sysV to systemd
- ...
Package management
- installed AUR helper
- installed checkinstall
- changed dpkg log path
- ...
Multimedia
- added xray filter mplayer
- installed audacious
3.6 Syntax
Create a special syntax for your documentation. Use certain strings for certain meanings. I have my own complete syntax, but I won't share it here, because a) you should use what works best for you (not for me) and b) it contains more problems and pitfalls, than it solves problems :-D
EXAMPLES
#Timestamps#
20110320.1530 ← This is a timestamp. You'll be able to use a regular expression to search for it, like [0-9]{8}\.[0-9]{4}.
#T#20110320.1530 ← This is also a timestamp. You'll be able to use simple search or grep without regex to find all instances of #T#, a string that is unlikely to appear anywhere except in its new role as a timestamp marker.
## Multimedia ← a category
# Play dirty videos ← title of an abstract problem
- $mplayer dirti.vid ← what actually happened on the system
# Cascaded
## Multimedia
# Play dirty videos #T#20110320.1530
- $echo A timestamp is added to the name of the entity, grepping for the name or the timestamp will give you both
/EXAMPLES
Make sure you stick with your own syntax. If you ever want to change the syntax (to add flexibility), you can replace/sed existing entries (this is where most people will dive into regex eventually).
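Both halves of that advice fit into two small functions. This is a sketch that assumes the YYYYMMDD.HHMM timestamps and the #T# marker from the examples; the function names are made up, and `grep -o` and `sed -E` are assumed to be available (they are in the GNU and BSD versions).

```shell
#!/bin/sh
# timestamps LOGFILE
# Prints every YYYYMMDD.HHMM stamp found in the log, one per line.
timestamps() {
    grep -Eo '[0-9]{8}\.[0-9]{4}' "$1"
}

# tag_timestamps LOGFILE
# Syntax migration: prefixes bare stamps at the start of a line with #T#.
# Writes the converted log to stdout, the original file stays untouched.
tag_timestamps() {
    sed -E 's/^([0-9]{8}\.[0-9]{4})/#T#\1/' "$1"
}
```

Writing the converted log to a new file first (`tag_timestamps old.log > new.log`) and comparing both with diff is safer than editing in place, at least until you trust your regex.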
4. Tools
All you need is a text editor, but the more complex your log is, the more you might need help.
4.1 (e)grep
If you don't use a GUI text editor like gedit or geany, you might want to search your logfiles in a terminal. Learn how to use grep. → man grep
4.2 cat
While looking at a wall of text can be boring, cat -n gives you line numbering. Just pipe something to cat -n.
4.3 sed and awk
If you stick with a certain syntax, you can extract package lists and changes to some /etc files with a script. sed and awk will be your friends, unless you're willing to learn Perl or Python. Remember: Your logfile might be your hero if your system crashes and you find yourself in runlevel 1, trying to fix it. sed and awk should be on every Linux desktop, so at least learn the basics.
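As an illustration of such an extraction, here is a sketch that assumes commands are logged the way the syntax examples in 3.6 do it, as lines starting with "- $", and that installs were written down literally as `- $pacman -S name...`. The function name `logged_pkgs` is made up, and installs logged in any other form are silently skipped.

```shell
#!/bin/sh
# logged_pkgs LOGFILE
# Collects the package names from all "- $pacman -S ..." lines,
# sorted and without duplicates.
logged_pkgs() {
    awk '$1 == "-" && $2 == "$pacman" && $3 == "-S" {
             for (i = 4; i <= NF; i++) print $i
         }' "$1" | sort -u
}
```

Everything that is not a pacman install line (categories, titles, other commands) is ignored, which is the whole point of having a syntax that a script can tell apart.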
4.4 vim or emacs
In a minimal configuration, you'll need a versatile text editor to read and change your documentation. Both bring in a lot of little helpers, you can even configure the syntax highlighting to know your own log syntax (might be overkill on the average desktop).
4.5 Zim
Zim is a GUI documentation system. It's basically an implementation of "Folders with files" (see 3.2), coming with all kinds of data types (multimedia of all sorts), its own kind of syntax and even a calendar (see 2.3 Timestamp-centric).
4.6 Be creative!
Write your own scripts, find your own tools. I only mentioned what I know and what I like. It's your computer, don't forget that.
5. Pitfalls
No matter how smart you think you are, you'll end up wondering what happened at some point.
5.1 Not enough data
Like I mentioned earlier, if you only take down minimal information (like a list of all installed packages), you'll end up having to find out where that information belongs. Saying that you have to "install the driver for the wifi device" might not be enough, given the fact that it's not in the official repositories, two of the five possibilities are deprecated and two of the other three won't work with your RT kernel. Save yourself some time and write down the name of the driver.
5.2 Too much information
Keep your diary separate from your system documentation. You don't want to read about your favorite feline friend's digestion while trying to recover your RAID array.
5.3 Flat syntax
Let's say you kept one system over 5 years, no fresh install in half a decade (a not so uncommon Arch and Debian phenomenon, happens to servers every day). Your logfile should have tens of thousands of lines. Now your boss asks you to migrate the system from Suse 7 to a recent RHEL. What you need now is scripts. If they can't tell a package name apart from information about your cat's digestion or a timestamp, you'll end up in "administration hell".
5.4 Too complex
I once kept a log in Zim. Since Zim is so versatile, I tried to use every single feature. I had a package list, I had abstract categories, I had a calendar. Every time I installed a package, I created an abstract entry and linked it back to the calendar and the package list. I was amazed how cool the system was, so I added complexity. In the end, adding a package took me 4 seconds in a terminal window, calling pacman, and 2 minutes in Zim, documenting my work. This might be okay if you live alone with a cat, no friends and your door blocked by ten metric tons of rocks, but it's unacceptable as a professional. First it kept me from making any changes to my system at all (OMFG! I'LL HAVE TO DOCUMENT THAT! Time for a coffee...), then I simply stopped taking any notes. After two months of not taking any notes, my system looked like the "seventh plane of torments" (an advanced level of "administration hell"), was b0rked, broke on an update (damn you, dbus!) and was broken beyond repair - in a reasonable time, I mean.
A rule of thumb: If you need a documentation file for your documentation file, you're doing it wrong.
Also: If your documentation looks like C-code, you're doing it wrong.
Also++: If you can chmod +x your documentation and bash can parse it, it's not a documentation, it's a post install script.
More: If your inode table is full, it's too many files...
/doku
That's it. Don't be that guy. Be cool. Take notes and create backups.
I'm so meta, even this acronym
awesome. really, thanks a lot for this. can't believe you just wrote down all of that from (seemingly) the top of your head. just read through it all and will have to again sometime when i am actually setting something like this up. even though i am only administrating my own little system, this will be very useful sorting out the complicated mess that Linux can be sometimes.
you might want to read it through yourself and make some corrections in grammar and unclear sentences here and there, but that is no surprise as you just typed this out in one go. overall, it is very good.
Yeah, you're right, typing such a wall of text in a small edit field of a message board leads to errors. I'll go over it later, now I'll be busy installing my new sound system.
I'm so meta, even this acronym
Awebb, this is really excellent stuff!
Ever thought of publishing your thoughts to a wider audience? A blog doesn't necessarily have to be updated twice a day. Just when the mood grabbed you...
John
--------------------
( a boring Japan blog , and idle twitterings )
can't believe you just wrote down all of that from (seemingly) the top of your head.
Of course, this begs the philosophical question as to whether Awebb made documentation for the documentation on documenting documentation, but I fear we would open a dangerous hole there.

Great work Awebb, we should all follow these lessons! I do this thing haphazardly, like cp-ing /etc to my 10 GB backup partition when I use fsarchiver, for something like a dual backup solution. It mostly works well, but I have run into situations where I actually had no idea what I had to restore from backup on a fresh install, because some /etc/init.d file I changed months earlier was completely lost in my mind, and doing a total cp of /etc to the new system just wasn't an option anymore, due to system changes and differences.
Ever thought of publishing your thoughts to a wider audience?
Not really. I find blogs a bit autistic, I prefer communities like this one.
I'm so meta, even this acronym
Copyright © 2012 CrunchBang Linux.