3.3. Manipulating files (continued)

3.3.3. Finding files

3.3.3.1. Using shell features

In the example on moving files we already saw how the shell can manipulate multiple files at once. In that example, the shell works out automatically what the user means by the requirements between the square brackets, [ and ]. The shell can substitute ranges of numbers and upper or lower case characters alike. It also substitutes any number of characters for an asterisk, and exactly one character for a question mark.

All sorts of substitutions can be used simultaneously; the shell is very logical about it. The Bash shell, for instance, has no problem with expressions like ls dirname/*/*/*[2-3].
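As a rough illustration of these substitutions, the sketch below builds a scratch directory and lets the shell expand three patterns in it. The file names (report1.txt and so on) are made up for this example only.

```shell
# Scratch directory with hypothetical file names.
globdir=$(mktemp -d)
touch "$globdir/report1.txt" "$globdir/report2.txt" "$globdir/report3.txt" \
      "$globdir/report10.txt" "$globdir/notes.txt"

# Asterisk: any number of characters, including none.
all_reports=$(cd "$globdir" && echo report*)

# Question mark: exactly one character, so report10.txt is left out.
single_digit=$(cd "$globdir" && echo report?.txt)

# Square brackets: exactly one character taken from the range 2-3.
two_or_three=$(cd "$globdir" && echo report[2-3].txt)

echo "asterisk:      $all_reports"
echo "question mark: $single_digit"
echo "brackets:      $two_or_three"

rm -r "$globdir"
```

The order in which the shell lists the matches depends on the locale, so the first line may be sorted differently from one system to another.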

In other shells, the asterisk is commonly used to minimize the efforts of typing: people would enter cd dir* instead of cd directory. In Bash however, this is not necessary because the GNU shell has a feature called file name completion. It means that you can type the first few characters of a command (anywhere) or a file (in the current directory) and if no confusion is possible, the shell will find out what you mean. For example in a directory containing many files, you can check if there are any files beginning with the letter A just by typing ls A and pressing the Tab key twice, rather than pressing Enter. If there is only one file starting with A, this file will be shown as the argument to ls (or any shell command, for that matter) immediately.

3.3.3.2. Which

A very simple way of looking up executable files is using the which command to look in the directories listed in the user's search path. which will not find ordinary files or files located outside the user's search path. The which command is useful when troubleshooting Command not Found problems. In the example below, user tina can't use the acroread program, while her colleague has no troubles whatsoever on the same system. The problem is similar to the PATH problem in the previous part: Tina's colleague tells her that he can see the required program in /opt/acroread/bin, but this directory is not in her path:

tina:~> which acroread
/usr/bin/which: no acroread in (/bin:/usr/bin:/usr/bin/X11)

The problem can be solved by giving the full path to the command to run, or by re-exporting the content of the PATH variable:

tina:~> export PATH=$PATH:/opt/acroread/bin

tina:~> echo $PATH
/bin:/usr/bin:/usr/bin/X11:/opt/acroread/bin

Using the which command with the -a switch will find all instances of the command:

tina:~> which -a make
/usr/bin/make
/usr/bin/X11/make
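The effect of the search path can be tried out safely with a throwaway directory. The sketch below is only an illustration: demo-tool is a hypothetical script standing in for acroread, and the shell built-in command -v is used for the lookup instead of which, since it is available in any POSIX shell.

```shell
# Scratch directory standing in for /opt/acroread/bin.
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo-tool"\n' > "$tooldir/demo-tool"
chmod +x "$tooldir/demo-tool"

# Before: the directory is not in the search path, so the lookup fails.
if command -v demo-tool >/dev/null 2>&1; then
    before="found"
else
    before="not found"
fi

# Append the directory to the search path, as tina did.
PATH=$PATH:$tooldir
export PATH

# After: the command resolves to its full path and can be run by name.
after=$(command -v demo-tool)
greeting=$(demo-tool)

echo "before: $before"
echo "after:  $after"
echo "output: $greeting"

rm -r "$tooldir"
```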

3.3.3.3. Find and locate

These are the real tools, used when searching other paths besides those listed in the search path. The UNIX find tool is very powerful but uses a somewhat complex syntax; GNU find, however, deals better with the difficult syntax. This command not only allows you to search file names, it can also accept file size, date of last change and other file properties as criteria for a search. The most common use is for finding file names:

find path -name searchstring

This can be interpreted as “Look in all files and subdirectories contained in a given path, and print the names of the files whose name matches the search string” (the name is matched, not the content).

peter:~> find /boot -name menu.lst
/boot/grub/menu.lst

Another application of find is for searching files of a certain size, as in the example below, where user peter wants to find all files in the current directory or one of its subdirectories, that are bigger than 5 MB:

peter:~> find . -size +5M
./psychotic_chaos.mp3
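Several criteria can be combined in one find command. The following is a small sketch on a scratch directory; the file names are invented, and the k suffix for -size is assumed to be supported, as it is in GNU and BSD find.

```shell
finddir=$(mktemp -d)
mkdir "$finddir/sub"

# One 2 KiB file and two small ones (hypothetical names).
dd if=/dev/zero of="$finddir/sub/big.dat" bs=1024 count=2 2>/dev/null
echo "small" > "$finddir/small.dat"
echo "text"  > "$finddir/sub/readme.txt"

# Regular files bigger than 1 KiB.
big_files=$(cd "$finddir" && find . -type f -size +1k)

# Regular files whose name ends in .dat, in any subdirectory.
dat_files=$(cd "$finddir" && find . -type f -name "*.dat" | sort)

# Directories only.
dirs=$(cd "$finddir" && find . -type d | sort)

echo "$big_files"
echo "$dat_files"
echo "$dirs"

rm -r "$finddir"
```

Note that the names are printed relative to the starting point given to find, so a search that starts in . yields names beginning with ./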

If you dig in the man pages, you will see that find can also perform operations on the found files. A common example is removing files. It is best to first run the command without the -exec option to check that the correct files are selected; after that, the command can be rerun to delete the selected files. Below, we search for files ending in .tmp and remove them:

peter:~> find . -name "*.tmp" -exec rm {} \;

peter:~>

Optimize! This command will call rm once for every file that matches the requirements. In the worst case, this might be thousands or millions of times, which is quite a load on your system. A more realistic way of working would be to use a pipe (|) and the xargs tool with rm as an argument. This way, rm is only started when the command line is full, instead of once for every file. See Chapter 5 for more on using I/O redirection to ease everyday tasks.
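A minimal sketch of the batched approach is given below, run against a scratch directory with made-up file names. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), which keeps file names containing spaces intact. The second variant, -exec ... {} +, lets find do the batching itself.

```shell
tmpwork=$(mktemp -d)
touch "$tmpwork/a.tmp" "$tmpwork/b.tmp" "$tmpwork/name with space.tmp" \
      "$tmpwork/keep.txt"

# Dry run first: only list what would be removed.
find "$tmpwork" -name "*.tmp" -print

# Batch the removals through xargs; names are separated by NUL bytes.
# rm -f keeps rm quiet if xargs starts it without any file names.
find "$tmpwork" -name "*.tmp" -print0 | xargs -0 rm -f

# Alternative without a pipe: the + makes find pass many names to one rm.
touch "$tmpwork/c.tmp"
find "$tmpwork" -name "*.tmp" -exec rm {} +

remaining=$(ls "$tmpwork")
echo "left over: $remaining"

rm -r "$tmpwork"
```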

Later on (in 1999 according to the man pages, after 20 years of find), locate was developed. This program is easier to use, but more restricted than find, since its output is based on a file index database that is updated only once every day. On the other hand, a search in the locate database uses fewer resources than find and therefore shows the results nearly instantly.

Most Linux distributions use slocate these days, security enhanced locate, the modern version of locate that prevents users from getting output they have no right to read. The files in root's home directory are one example: these are not normally accessible to the public. A user who wants to find someone who knows about the C shell may issue the command locate .cshrc, to display all users who have a customized configuration file for the C shell. Supposing the users root and jenny are running C shell, then only the file /home/jenny/.cshrc will be displayed, and not the one in root's home directory. On most systems, locate is a symbolic link to the slocate program:

billy:~> ls -l /usr/bin/locate
lrwxrwxrwx 1 root slocate  7 Oct 28 14:18 /usr/bin/locate -> slocate*

User tina could have used locate (instead of her previous attempt using which) to find the application she wanted:

tina:~> locate acroread
/usr/share/icons/hicolor/16x16/apps/acroread.png
/usr/share/icons/hicolor/32x32/apps/acroread.png
/usr/share/icons/locolor/16x16/apps/acroread.png
/usr/share/icons/locolor/32x32/apps/acroread.png
/usr/local/bin/acroread
/usr/local/Acrobat4/Reader/intellinux/bin/acroread
/usr/local/Acrobat4/bin/acroread

Directories whose path doesn't contain the name bin are unlikely to hold the program, since by convention they don't contain executable files. That leaves three possibilities. The file in /usr/local/bin is the one tina would have wanted: it is a link to the shell script that starts the actual program:

tina:~> file /usr/local/bin/acroread
/usr/local/bin/acroread: symbolic link to ../Acrobat4/bin/acroread

tina:~> file /usr/local/Acrobat4/bin/acroread
/usr/local/Acrobat4/bin/acroread: Bourne shell script text executable

tina:~> file /usr/local/Acrobat4/Reader/intellinux/bin/acroread
/usr/local/Acrobat4/Reader/intellinux/bin/acroread: ELF 32-bit LSB 
executable, Intel 80386, version 1, dynamically linked (uses 
shared libs), not stripped

To keep the path as short as possible, so the system doesn't have to search too long every time a user wants to execute a command, we add /usr/local/bin to the path and not the other directories. Those only contain the binary files of one specific program, while /usr/local/bin contains other useful programs as well. To allow the binaries in the other /usr/local directories to run, symbolic links to them are placed in /usr/local/bin.

Again, a description of the full features of find and locate can be found in the Info pages.

3.3.3.4. The grep command

3.3.3.4.1. General line filtering

A simple but powerful program, grep is used for filtering input lines and sending the lines that match a given pattern to the output. There are literally thousands of applications for the grep program. In the example below, jerry uses grep to see how he used find earlier:

jerry:~> grep find .bash_history
find . -name userinfo
man find
find ../ -name common.cfg
  • Search history
    • Also useful in these cases is the search function in bash, activated by pressing Ctrl and R at the same time, as in the example where we want to check how we ran that last find again:
    • thomas ~> ^R 
      (reverse-i-search)`find': find /home/thomas -name "*.xml"
      
    • Type your search string at the search prompt. The more characters you type, the more restricted the search gets. This reads the command history for this shell session (which is written to .bash_history in your home directory when you quit that session). The most recent occurrence of your search string is shown. If you want to see previous commands containing the same string, type Ctrl+R again.
    • See the Info pages on bash for more.

All UNIXes with just a little bit of decency have an online dictionary. So does Linux. The dictionary is a list of known words in a file named words, located in /usr/share/dict. To quickly check the correct spelling of a word, no graphical application is needed:

william:~> grep pinguin /usr/share/dict/words

william:~> grep penguin /usr/share/dict/words
penguin
penguins

Who is the owner of that home directory next to mine? Hey, there's his telephone number!

lisa:~> grep gdbruyne /etc/passwd
gdbruyne:x:981:981:Guy Debruyne, tel 203234:/home/gdbruyne:/bin/bash

And what was the E-mail address of Arno again?

serge:~/mail> grep -i arno *
sent-mail: To: <Arno.Hintjens@celeb.com>
sent-mail: On Mon, 24 Dec 2001, Arno.Hintjens@celeb.com wrote:
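A few grep options come up again and again. The sketch below tries them on a small word list written to a temporary file; the list is invented for the example and only stands in for a file such as /usr/share/dict/words.

```shell
greptmp=$(mktemp)
cat > "$greptmp" <<'EOF'
Penguin
penguin
penguins
pelican
EOF

exact=$(grep -x penguin "$greptmp")         # -x: the whole line must match
anycase=$(grep -i -c penguin "$greptmp")    # -i: ignore case, -c: count matching lines
numbered=$(grep -n pelican "$greptmp")      # -n: prefix each match with its line number
inverted=$(grep -v -i penguin "$greptmp")   # -v: print the lines that do NOT match

echo "$exact"
echo "$anycase"
echo "$numbered"
echo "$inverted"

rm "$greptmp"
```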

find and locate are often used in combination with grep to define some serious queries. For more information, see Chapter 5 on I/O redirection.
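Two typical combinations are sketched here on a scratch directory with invented file names: filtering the list of names that find prints, and letting find hand the selected files to grep so that their content is searched.

```shell
searchdir=$(mktemp -d)
mkdir "$searchdir/conf"
echo "color=blue" > "$searchdir/conf/app.cfg"
echo "color=red"  > "$searchdir/conf/other.cfg"
echo "color=blue" > "$searchdir/notes.txt"

# grep filters the file NAMES printed by find.
cfg_names=$(cd "$searchdir" && find . -type f | grep '\.cfg$' | sort)

# find selects the files, grep searches their CONTENT;
# -l prints only the names of files that contain a match.
blue_cfgs=$(cd "$searchdir" && find . -type f -name "*.cfg" -exec grep -l "blue" {} +)

echo "$cfg_names"
echo "$blue_cfgs"

rm -r "$searchdir"
```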

3.3.3.4.2. Special characters

Characters that have a special meaning to the shell have to be escaped. The escape character in Bash is the backslash, as in most shells; it takes away the special meaning of the following character. The shell knows about several special characters, among the most common “*”, “?”, “[”, “$” and the space character. A full list can be found in the Info pages and documentation for your shell.

For instance, suppose you want to display a file named * instead of all the files in a directory. You would have to use:

less \*

The same goes for filenames containing a space:

cat This\ File
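Quotes can be used instead of backslashes and have the same effect on these characters. The sketch below creates the two awkward file names in a scratch directory (plus a third, hypothetical file named plain) and compares escaped and unescaped use.

```shell
quotedir=$(mktemp -d)
echo "star"  > "$quotedir/*"
echo "space" > "$quotedir/This File"
echo "other" > "$quotedir/plain"

# Three equivalent ways to take away the special meaning.
with_backslash=$(cd "$quotedir" && cat \*)
with_single=$(cd "$quotedir" && cat '*')
with_double=$(cd "$quotedir" && cat "This File")

# Unescaped, the asterisk expands to every file in the directory,
# so cat prints one line for each of the three files.
unescaped_lines=$(( $(cd "$quotedir" && cat * | wc -l) ))

echo "$with_backslash $with_single $with_double $unescaped_lines"

rm -r "$quotedir"
```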

3.3.4. More ways to view file content

3.3.4.1. General

Apart from cat, which really doesn't do much more than sending files to the standard output, there are other tools to view file content.

The easiest way of course would be to use graphical tools instead of command line tools. In the introduction we already saw a glimpse of an office application, OpenOffice. Other examples are the GIMP (start up with gimp from the command line), the GNU Image Manipulation Program; xpdf to view Portable Document Format files (PDF); GhostView (gv) for viewing PostScript files; Mozilla/Firefox, lynx and links (two text mode browsers), Konqueror, Opera and many others for web content; XMMS, CDplay and others for multimedia file content; AbiWord, Gnumeric, KOffice etc. for all kinds of office applications and so on. There are thousands of Linux applications; to list them all would take days.

Instead we keep concentrating on shell- or text-mode applications, which form the basics for all other applications. These commands work best in a text environment on files containing text. When in doubt, check first using the file command.

So let's see what text tools we have that are useful to look inside files.

  • Font problems
    • Plain text tools such as the ones we will now be discussing often have problems with plain text files because of the character encoding used in those files. Special characters, such as accented alphabetical characters, Chinese characters and other characters from languages using different character sets than the default en_US encoding, are then displayed the wrong way or replaced by unreadable rubbish. These problems are discussed in Section 7.4.

3.3.4.2. less is more

Undoubtedly you will hear someone say this phrase sooner or later when working in a UNIX environment. A little bit of the UNIX history of pagers explains this:

  • First there was cat. Output was streamed in an uncontrollable way.
  • Then there was pg, which may still be found on older UNIXes. This command puts text to the output one page at a time.
  • The more program was a revised version of pg. This command is still available on every Linux system.
  • less is the GNU version of more and has extra features allowing highlighting of search strings, scrolling back etc. The syntax is very simple: less file

More information is located in the Info pages.

You already know about pagers by now, because they are used for viewing the man pages.

3.3.4.3. Head and tail

These two commands display the n first/last lines of a file respectively. To see the last ten commands entered:

tony:~> tail -n 10 .bash_history
locate configure | grep bin
man bash
cd
xawtv &
grep usable /usr/share/dict/words 
grep advisable /usr/share/dict/words 
info quota
man quota
echo $PATH
frm

head works similarly, showing the first lines in a file. The tail command has a handy feature to continuously show the last lines of a file as the file is changing. This -f option is often used by system administrators to check on log files. For example, you can use “tail -f /var/log/syslog” in a terminal to watch what happens as you plug in a USB device. More information is located in the system documentation files.
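The line counts are easiest to see on a file whose content is known. The sketch below writes ten numbered lines to a temporary file (a stand-in for a real log file) and picks lines from both ends with the -n option.

```shell
logfile=$(mktemp)

# Write ten numbered lines to the file.
i=1
while [ "$i" -le 10 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$logfile"

first_two=$(head -n 2 "$logfile")     # the first two lines
last_two=$(tail -n 2 "$logfile")      # the last two lines
from_ninth=$(tail -n +9 "$logfile")   # everything from line 9 onwards

echo "$first_two"
echo "$last_two"
echo "$from_ninth"

rm "$logfile"
```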

3.3.5. Linking files

3.3.5.1. Link types

Since we know more about files and their representation in the file system, understanding links (or shortcuts) is a piece of cake. A link is nothing more than a way of matching two or more file names to the same set of file data. There are two ways to achieve this:

  • Hard link: Associate two or more filenames with the same inode. Hard links share the same data blocks on the hard disk, while they continue to behave as independent files.
    There is an immediate disadvantage: hard links can't span partitions, because inode numbers are unique only within a given partition.
  • Soft link or symbolic link (or for short: symlink): a small file that is a pointer to another file. A symbolic link contains the path to the target file instead of a physical location on the hard disk. Since the link refers to its target by path rather than by inode number, soft links can span across partitions.

The two link types behave similarly, but are not the same, as illustrated in the scheme below:

Figure 3-2. Hard and soft link mechanism

Here's another good reference: Q & A: The difference between hard and soft links (http://linuxgazette.net/105/pitcher.html)

3.3.5.1.1 Soft link Details

Note that removing the target file for a symbolic link makes the link useless.

3.3.5.1.2 Hard link Details

Each regular file is in principle a hard link. Hard links cannot span across partitions, since they refer to inodes, and inode numbers are unique only within a given partition. The number of hard links that exist for a file is displayed by the ls command. The rm command actually removes the hard link, not the file itself. Thus if one hard link to a file is deleted, the others continue to work. Only when the hard link count drops to zero will the inode itself be freed.

3.3.5.1.3 Example: Hard links and Soft links

Soft link
stw@laptop:~/LBook$ ln -s file slink
stw@laptop:~/LBook$ ls -l
lrwxrwxrwx 1 stw stw 4 2006-11-08 20:08 slink -> file 
stw@laptop:~/LBook$ ln file hlink
ln: accessing `file': No such file or directory
  • A soft link (“slink”) to “file” is created. Note that this target does not exist yet. ls may show the broken “slink” in red.
  • Attempt to make a hard link (“hlink”) fails since no target (“file”) exists yet.
Hard link
stw@laptop:~/LBook$ echo "Hello Linuxbasics.org" > file
stw@laptop:~/LBook$ ls -l
-rw-r--r-- 1 stw stw 22 2006-11-08 20:12 file
lrwxrwxrwx 1 stw stw  4 2006-11-08 20:08 slink -> file
stw@laptop:~/LBook$ ln file hlink
stw@laptop:~/LBook$ ls -l
-rw-r--r-- 2 stw stw 22 2006-11-08 20:13 file
-rw-r--r-- 2 stw stw 22 2006-11-08 20:13 hlink
lrwxrwxrwx 1 stw stw  4 2006-11-08 20:08 slink -> file
  • The actual data is written to a file named “file”.
  • A hard link named “hlink” is created, pointing to the same data as “file”.
  • The link count displayed by “ls” (the number directly after the permissions) is increased.
  • The color of “slink” may change from red to cyan, showing that it is no longer broken.
Usage
stw@laptop:~/LBook$ cat file
Hello Linuxbasics.org
stw@laptop:~/LBook$ cat hlink
Hello Linuxbasics.org
stw@laptop:~/LBook$ cat slink
Hello Linuxbasics.org
  • All three “locations” act on the same data.
Breaking the soft link
stw@laptop:~/LBook$ rm file
stw@laptop:~/LBook$ ls -l
-rw-r--r-- 1 stw stw 22 2006-11-08 20:13 hlink
lrwxrwxrwx 1 stw stw  4 2006-11-08 20:08 slink -> file
stw@laptop:~/LBook$ cat slink
cat: slink: No such file or directory
stw@laptop:~/LBook$ cat hlink
Hello Linuxbasics.org
  • Deleting the link-target breaks the soft link. Again the link may show as red or blinking red.
  • The hard link is still good.
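The same experiment can be repeated as a script, with the inode numbers made visible. This is only a sketch in a scratch directory, reusing the names file, hlink and slink from the example; ls -i prints the inode number in front of the file name.

```shell
linkdir=$(mktemp -d)
echo "Hello" > "$linkdir/file"
ln "$linkdir/file" "$linkdir/hlink"    # hard link: a second name for the same inode
ln -s file "$linkdir/slink"            # soft link: a small file holding the path "file"

# Both hard-linked names report the same inode number.
inode_file=$(ls -i "$linkdir/file" | awk '{print $1}')
inode_hlink=$(ls -i "$linkdir/hlink" | awk '{print $1}')

# The second column of ls -l is the hard link count.
link_count=$(ls -l "$linkdir/file" | awk '{print $2}')

# Remove the original name: the data stays reachable through hlink,
# but slink now points to a path that no longer exists.
rm "$linkdir/file"
hlink_content=$(cat "$linkdir/hlink")
if [ -L "$linkdir/slink" ] && [ ! -e "$linkdir/slink" ]; then
    slink_state="broken"
else
    slink_state="ok"
fi

echo "link count: $link_count, hlink: $hlink_content, slink: $slink_state"

rm -r "$linkdir"
```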

3.3.5.1.4 User-space Links

It may be argued that there is a third kind of link, the user-space link, which is similar to a shortcut in MS Windows. These are files containing meta-data which can only be interpreted by the graphical file manager. To the kernel and the shell these are just normal files. They may end in a .desktop or .lnk suffix; an example can be found in ~/.gnome-desktop:

[dupont@boulot .gnome-desktop]$ cat La\ Maison\ Dupont
[Desktop Entry]
Encoding=Legacy-Mixed
Name=La Maison Dupont
Type=X-nautilus-home
X-Nautilus-Icon=temp-home
URL=file:///home/dupont

This example is from a KDE desktop:

[lena@venus Desktop]$ cat camera
[Desktop Entry]
Dev=/dev/sda1
FSType=auto
Icon=memory
MountPoint=/mnt/camera
Type=FSDevice
X-KDE-Dynamic-Device=true

Creating this kind of link is easy enough using the features of your graphical environment. Should you need help, your system documentation should be your first resort.

In the next section, we will study the creation of UNIX-style symbolic links using the command line.

3.3.5.2. Creating symbolic links

Symbolic links are particularly interesting for beginning users: they are easy to spot in a directory listing and you don't need to worry about partitions.

The command to make links is ln. In order to create symlinks, you need to use the -s option:

ln -s targetfile linkname

In the example below, user freddy creates a link in a subdirectory of his home directory to a directory on another part of the system:

freddy:~/music> ln -s /opt/mp3/Queen/ Queen

freddy:~/music> ls -l
lrwxrwxrwx  1 freddy  freddy  15 Jan 22 11:07 Queen -> /opt/mp3/Queen/

Symbolic links are always very small files, since they hold only the path to their target. A hard link has the same size as the original file, because it is simply another name for the same data.

Symbolic links are widely used. They can save disk space, stand in for a copy of a file when a new program expects it in another location, and fix scripts that suddenly have to run in a new environment; in general they can save a lot of work. A system admin may decide to move the home directories of the users to a new location, disk2 for instance. If he wants everything to keep working as before, including the home directory paths recorded in /etc/passwd, the least-effort solution is to create a symlink from /home to the new location /disk2/home.
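The relocation of /home can be rehearsed without root privileges inside a scratch directory. In this sketch the directory names under the scratch directory, the user freddy and the file inbox are stand-ins chosen for the example, and readlink is assumed to be available, as it is on current Linux systems.

```shell
base=$(mktemp -d)
mkdir -p "$base/home/freddy" "$base/disk2"
echo "mail" > "$base/home/freddy/inbox"

# Move the directory tree and leave a symlink behind at the old path.
mv "$base/home" "$base/disk2/home"
ln -s "$base/disk2/home" "$base/home"

# Anything that still uses the old path keeps working.
via_old_path=$(cat "$base/home/freddy/inbox")

# readlink prints the path stored in the symlink.
link_target=$(readlink "$base/home")

echo "$via_old_path"
echo "$link_target"

rm -r "$base"
```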



/home/www/LinuxBasics.org/data/pages/course/book/sect_03_03_03.txt · Last modified: 2008/07/20 21:08 (external edit)