Cleaning up version-controlled directories

By Ghost on Sunday 01 June 2008 18:26
Categories: General tricks, Linux

When writing LaTeX documents, you often end up with all kinds of auxiliary files. These files are generated while compiling a DVI or PDF document and can generally be discarded afterwards. Most of my TeX documents are under version control, so it is possible to get a list of the files that are (and are not) under this control. With a single Bash command, you can abuse Subversion to determine which files you want to delete:

rm -i `svn status | awk '/^\?/ {print $2}'`


Perhaps daunting at first, this command can be broken apart into understandable parts. The first part (rm -i `<command>`) is a command substitution: the shell runs the command between the backticks and passes its output, split on whitespace, to rm as a list of arguments.
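To see what the backticks actually hand to rm, the sketch below replaces rm with a small helper, count_args (a hypothetical name used only for illustration), that prints how many arguments it received. It also shows the equivalent $(...) syntax and what happens to a filename that contains a space.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for rm: report how many arguments were received.
count_args() {
  echo "$#"
}

# Backticks (as in the article) and $(...) are both command substitution:
# the inner command runs first, and its unquoted output is split on
# whitespace into separate arguments.
count_args `printf 'data.aux\ncontents.toc\n'`   # prints 2
count_args $(printf 'data.aux\ncontents.toc\n')  # prints 2

# The same splitting turns a name containing a space into two arguments.
count_args $(printf 'my notes.aux\n')            # prints 2
```

The splitting is harmless for typical LaTeX output such as data.aux, but it means the command as written cannot cope with filenames that contain spaces.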

The second part is a pipeline consisting of an svn command and an awk command. The svn command recursively generates a table listing the status of every added, modified or unversioned file in the directory tree:
A	testjuh.txt
?	data.tex
?	data.aux
M	contents.tex
?	contents.toc
?	contents.toc.old
?	contents.aux
?	randomtext.cpp

Lines starting with a question mark denote files that are not under version control (A marks a file scheduled for addition, M a modified file).

This list is processed by awk, and here things get interesting. awk is a small programming language for processing text: it reads its input line by line, can select lines with regular expressions, and splits each line into whitespace-separated fields ($1, $2, ...). That makes it especially well suited to processing tables like this one.

In this case, I have told awk to print the value in the second column (the filename) for every row that matches the regular expression ^\?. This regular expression matches only rows that start with a question mark (the backslash is needed because ? is itself a special character in regular expressions), so the command generates a list of the files that are not under version control. For the example above, that is:
data.tex
data.aux
contents.toc
contents.toc.old
contents.aux
randomtext.cpp
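Because awk only sees text, the filter can be tried without a working copy. The sketch below feeds the example table above into the same awk program. The helper sample_status is hypothetical, and it assumes spaces as the column separator; awk's default field splitting treats spaces and tabs alike, so either layout gives the same result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the example svn status output from the article.
# In a real working copy you would pipe `svn status` into awk instead.
sample_status() {
  cat <<'EOF'
A       testjuh.txt
?       data.tex
?       data.aux
M       contents.tex
?       contents.toc
?       contents.toc.old
?       contents.aux
?       randomtext.cpp
EOF
}

# /^\?/ selects lines that start with '?' (unversioned files);
# {print $2} prints the second whitespace-separated field (the filename).
sample_status | awk '/^\?/ {print $2}'
```

This prints the six filenames listed above, one per line, and nothing is deleted.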


This list is then handed to the rm command, which deletes the unversioned files.

Beware!
You have probably noticed that two apparently important files show up in the list of files scheduled for deletion: data.tex and randomtext.cpp. If you forget to put critical files under version control (or to exclude them explicitly by setting the svn:ignore property), you risk losing them with this command. That's why it is best to keep the -i option on the rm command as a precaution: it makes rm ask for confirmation before deleting each file.
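A cautious workflow is to look at the list before deleting anything. The sketch below is one way to do that, not the command used in this post. The hypothetical helper list_unversioned uses sed to strip only the leading '?' and the whitespace after it, so filenames containing spaces stay intact (awk's $2 would cut them off at the first space). The hypothetical helper delete_unversioned then runs rm -i once per file, reading the confirmation from the terminal because standard input is already occupied by the list of filenames. It assumes no filename starts with whitespace.

```shell
#!/usr/bin/env bash
# Print the paths of unversioned files from svn-status-style input on stdin.
# Only the leading '?' and the whitespace after it are removed, so names
# containing spaces survive. (Assumes no name itself starts with whitespace.)
list_unversioned() {
  sed -n 's/^?[[:space:]]*//p'
}

# Ask before deleting each path read from stdin. The confirmation is read
# from /dev/tty, because stdin is already taken by the list of names.
delete_unversioned() {
  local f
  while IFS= read -r f; do
    rm -i -- "$f" </dev/tty
  done
}

# Usage in a working copy: preview first, then delete.
#   svn status | list_unversioned
#   svn status | list_unversioned | delete_unversioned
```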

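Also note that this command differs from svn revert -R. That command restores modified files in the working copy to their last committed state and unschedules files that were added but not yet committed; it leaves unversioned files (including those unscheduled additions) in place, so it would not remove any of the files listed above.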

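Update: As the commenter below already implies, the command is a bit hard to read. Using the xargs program, the command line can be simplified a little:

svn status | awk '/^\?/ {print $2}' | xargs -o rm -i

The -o option matters here: xargs (at least the GNU and BSD versions) normally connects the standard input of the command it runs to /dev/null, so rm -i would get no answer to its prompts and delete nothing. With -o, available in recent versions of both, xargs reopens the terminal as standard input, so that rm can ask for confirmation as intended.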


Comments


By T.net user GX, Tuesday 03 June 2008 16:43

Or, you do this:

svn status | grep ^? | xargs rm

;)

By T.net user Ghost, Tuesday 03 June 2008 17:02

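Nope, I believe that is not entirely correct. Your solution does not only pass the unversioned files to rm, but also the status indicators themselves. xargs splits its input on whitespace (both newlines and spaces), so every line from svn status becomes two arguments: a '?' and the filename. Since xargs runs rm directly rather than through a shell, the '?' is not expanded as a wildcard; rm receives it literally and either complains that there is no file called '?' or, if such a file exists, deletes it.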

Of course, this is no problem when you do not have files like that, but I think it should be noted that your solution has some side effects ;).
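The behaviour described in this reply can be checked without risking any files. The sketch below runs GX's pipeline on sample lines, but with printf in place of rm (wrapped in a hypothetical helper, show_xargs_split), so each argument xargs would pass along is printed in brackets.

```shell
#!/usr/bin/env bash
# GX's pipeline with printf instead of rm: every argument that xargs
# would pass to rm is printed as [arg]. Nothing is deleted.
show_xargs_split() {
  grep '^?' | xargs printf '[%s]\n'
}

printf 'M       contents.tex\n?       data.aux\n' | show_xargs_split
# Prints:
# [?]
# [data.aux]
```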
