Stream text processing


cat

Concatenates one or more files and prints them to standard output; with a single file, it simply prints that file.

Some useful options:

  • -n numbers all lines
  • -b numbers only non-blank lines
  • -s squeezes consecutive blank lines into a single blank line
  • -A shows non-printing characters (tabs as ^I, line ends as $)
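
The numbering and squeezing options can be compared on a small file. This is only a sketch: demo.txt and the variable names are made up for illustration.

```shell
# demo.txt is a hypothetical sample: two words separated by two blank lines.
tmpdir=$(mktemp -d)
printf 'alpha\n\n\nbeta\n' > "$tmpdir/demo.txt"

numbered_all=$(cat -n "$tmpdir/demo.txt")       # numbers all 4 lines
numbered_nonblank=$(cat -b "$tmpdir/demo.txt")  # numbers only alpha and beta
squeezed=$(cat -s "$tmpdir/demo.txt")           # the two blank lines become one

printf '%s\n' "$numbered_all" "$numbered_nonblank" "$squeezed"
rm -r "$tmpdir"
```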

tac (the opposite of cat)

Prints a text file in reverse line order, from the last line to the first.

head

Prints the first 10 lines of the file by default.

Some useful options:

  • -n<number> shows the first <number> lines
  • -<number> is an older shorthand for -n<number>
  • -c<number> shows the first <number> bytes
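
A minimal sketch of these options, assuming a hypothetical 20-line file numbers.txt generated with seq:

```shell
tmpdir=$(mktemp -d)
seq 1 20 > "$tmpdir/numbers.txt"

default_head=$(head "$tmpdir/numbers.txt")      # first 10 lines
first_three=$(head -n 3 "$tmpdir/numbers.txt")  # lines 1, 2 and 3
# First 4 bytes are "1\n2\n"; $(...) strips the trailing newline.
first_bytes=$(head -c 4 "$tmpdir/numbers.txt")

printf '%s\n' "$first_three"
rm -r "$tmpdir"
```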

tail

Prints the last 10 lines of the file by default.

Some useful options:

  • -n<number> shows the last <number> lines
  • -<number> is an older shorthand for -n<number>
  • -c<number> shows the last <number> bytes
  • -f prints the last lines and keeps watching the file for new input
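
The same kind of sketch for tail, again on a made-up numbers.txt. The -f call is left commented out because it runs until interrupted.

```shell
tmpdir=$(mktemp -d)
seq 1 20 > "$tmpdir/numbers.txt"

last_three=$(tail -n 3 "$tmpdir/numbers.txt")  # lines 18, 19 and 20
# Last 3 bytes are "20\n"; $(...) strips the trailing newline.
last_bytes=$(tail -c 3 "$tmpdir/numbers.txt")

printf '%s\n' "$last_three"

# -f keeps the file open and prints lines as they are appended.
# Stop it with Ctrl+C.
# tail -f "$tmpdir/numbers.txt"
rm -r "$tmpdir"
```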

less

Displays a file one page at a time. You can navigate with the arrow keys and Page Up/Page Down, and quit with q.

Type / followed by a pattern to search forward. Press n to jump to the next occurrence and N to jump back to the previous one.

wc

Counts the lines, words and bytes in a file. Some useful options:

  • -l shows the number of lines
  • -c or --bytes shows the number of bytes
  • -m or --chars shows the number of characters
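
Bytes and characters only differ when the text contains multi-byte characters. A small sketch, assuming a hypothetical sample.txt that contains one two-byte UTF-8 character:

```shell
tmpdir=$(mktemp -d)
# \303\251 is the two-byte UTF-8 encoding of "é".
printf 'h\303\251llo world\nsecond line\n' > "$tmpdir/sample.txt"

# Reading from stdin keeps the file name out of the output.
lines=$(wc -l < "$tmpdir/sample.txt")
bytes=$(wc -c < "$tmpdir/sample.txt")
# In a UTF-8 locale this is one less than the byte count, because "é"
# counts once; in the C locale it equals the byte count.
chars=$(wc -m < "$tmpdir/sample.txt")

printf 'lines=%s bytes=%s chars=%s\n' "$lines" "$bytes" "$chars"
rm -r "$tmpdir"
```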

nl

Similar to cat -b: numbers the lines of a file, skipping empty lines.
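
A quick sketch of the default behaviour, using a made-up demo.txt with blank lines between the words:

```shell
tmpdir=$(mktemp -d)
printf 'alpha\n\nbeta\n\ngamma\n' > "$tmpdir/demo.txt"

# Only alpha, beta and gamma get a number; the blank lines are kept
# in the output but left unnumbered.
numbered=$(nl "$tmpdir/demo.txt")
printf '%s\n' "$numbered"

# Count the output lines that carry a number.
numbered_count=$(printf '%s\n' "$numbered" | grep -c '[0-9]')
rm -r "$tmpdir"
```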

sort

Sorts the lines in alphabetical order. Some useful options:

  • -r reverses the sort order
  • -k<number> sorts by the given field, e.g. -k2 uses a key that starts at the second field (fields are separated by spaces/tabs)
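
A sketch of the three orderings on a hypothetical people.txt, where the first field is a number and the second a name:

```shell
tmpdir=$(mktemp -d)
printf '2 alice\n3 bob\n1 carol\n' > "$tmpdir/people.txt"

by_line=$(sort "$tmpdir/people.txt")      # whole line: 1 carol, 2 alice, 3 bob
reversed=$(sort -r "$tmpdir/people.txt")  # whole line, descending
by_name=$(sort -k2 "$tmpdir/people.txt")  # key starts at field 2: alice, bob, carol

printf '%s\n' "$by_name"
rm -r "$tmpdir"
```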

uniq

Collapses adjacent duplicate lines into one. Only consecutive repeats are removed: if two identical lines are not next to each other, both are shown, so the input is usually sorted first. Some useful options:

  • -d shows only the duplicated lines
  • -c prefixes each line with the number of times it occurred
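
The "adjacent only" rule is easiest to see on a file where a repeated line shows up again later. A sketch with a made-up fruit.txt:

```shell
tmpdir=$(mktemp -d)
printf 'apple\napple\nbanana\napple\n' > "$tmpdir/fruit.txt"

adjacent_only=$(uniq "$tmpdir/fruit.txt")   # apple, banana, apple
duplicates=$(uniq -d "$tmpdir/fruit.txt")   # apple (the adjacent pair)
counts=$(uniq -c "$tmpdir/fruit.txt")       # 2 apple, 1 banana, 1 apple
deduped=$(sort "$tmpdir/fruit.txt" | uniq)  # apple, banana

printf '%s\n' "$counts"
rm -r "$tmpdir"
```

Sorting first puts all identical lines next to each other, which is why sort | uniq removes every duplicate.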

od (octal dump)

Dumps the file's contents in octal by default. Some useful options:

  • -tx or -t x2 shows the contents in hexadecimal (x2 groups the bytes in 2-byte units)
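
A short sketch of a hexadecimal dump. It uses -An (hide the address column) and -t x1 (one byte per unit), two standard od options not listed above, so that the result does not depend on the machine's byte order.

```shell
# Dump the three bytes "A", "B" and newline as single hex bytes.
hex_bytes=$(printf 'AB\n' | od -An -t x1)
printf '%s\n' "$hex_bytes"

# Default output for comparison: octal, in 2-byte units, with an address
# column. The digits here depend on the machine's byte order.
printf 'AB\n' | od
```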

join

Joins two files on a common key field, which by default is the first field. Both files should be sorted on that field.

(e.g.)

sh-5.2$ cat users 
1 user1
2 user2
3 user3
sh-5.2$ cat rating 
1 4.9
2 4.7
3 4.5
sh-5.2$ join users rating 
1 user1 4.9
2 user2 4.7
3 user3 4.5

Some useful options:

  • -j<number> specifies which field is used as the key in both files
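
A sketch of -j with the key in the second column. The files names.txt and scores.txt are hypothetical variants of the users and rating files, with the columns swapped.

```shell
tmpdir=$(mktemp -d)
printf 'user1 1\nuser2 2\nuser3 3\n' > "$tmpdir/names.txt"
printf '4.9 1\n4.7 2\n4.5 3\n' > "$tmpdir/scores.txt"

# Use field 2 of both files as the key. The key is printed first,
# followed by the remaining fields of each file.
joined=$(join -j 2 "$tmpdir/names.txt" "$tmpdir/scores.txt")

printf '%s\n' "$joined"
rm -r "$tmpdir"
```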

paste

Merges the corresponding lines of each file side by side, separated by a tab.

(e.g.)

sh-5.2$ cat users 
1 user1
2 user2
3 user3
sh-5.2$ cat rating 
1 4.9
2 4.7
3 4.5
sh-5.2$ paste users rating 
1 user1	1 4.9
2 user2	2 4.7
3 user3	3 4.5

split

Splits a file into multiple files. By default each piece has 1000 lines and the pieces are named xaa, xab, xac and so on. Some useful options:

  • -l<number> or -<number> sets how many lines each output file has.
  • -b<number> sets how many bytes each output file has.

(e.g.)

sh-5.2# ls -l
total 56
-rw-r--r-- 1 root root 56827 Aug  6 19:33 test_file
sh-5.2# wc -l test_file 
591 test_file
sh-5.2# split -l300 test_file 
sh-5.2# ls -l
total 116
-rw-r--r-- 1 root root 56827 Aug  6 19:33 test_file
-rw-r--r-- 1 root root 27759 Aug  6 19:37 xaa
-rw-r--r-- 1 root root 29068 Aug  6 19:37 xab
sh-5.2# split -l300 test_file renamed_output_
sh-5.2# ls -l
total 176
-rw-r--r-- 1 root root 27759 Aug  6 19:37 renamed_output_aa
-rw-r--r-- 1 root root 29068 Aug  6 19:37 renamed_output_ab
-rw-r--r-- 1 root root 56827 Aug  6 19:33 test_file
-rw-r--r-- 1 root root 27759 Aug  6 19:37 xaa
-rw-r--r-- 1 root root 29068 Aug  6 19:37 xab
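
Splitting by bytes works the same way. A sketch with a made-up 10-byte file data.txt and a chunk_ prefix, which also shows that concatenating the pieces rebuilds the original:

```shell
tmpdir=$(mktemp -d)
printf '0123456789' > "$tmpdir/data.txt"   # exactly 10 bytes

# 4-byte pieces: chunk_aa and chunk_ab hold 4 bytes, chunk_ac the last 2.
split -b4 "$tmpdir/data.txt" "$tmpdir/chunk_"

piece_count=$(ls "$tmpdir" | grep -c '^chunk_')
last_size=$(wc -c < "$tmpdir/chunk_ac")

# The glob expands in name order, so cat puts the pieces back in sequence.
rebuilt=$(cat "$tmpdir"/chunk_*)

printf '%s\n' "$piece_count" "$last_size" "$rebuilt"
rm -r "$tmpdir"
```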

tr

Translates (replaces) characters from the input with other characters, or deletes them. It reads only from stdin.

(e.g.)

sh-5.2$ cat users 
1 user1
2 user2
3 user3
sh-5.2$ cat users | tr '[:lower:]' '[:upper:]'
1 USER1
2 USER2
3 USER3
sh-5.2$ cat users | tr ' ' '_'
1_user1
2_user2
3_user3
sh-5.2$ cat users | tr e J
1 usJr1
2 usJr2
3 usJr3
sh-5.2$ cat users | tr -d u
1 ser1
2 ser2
3 ser3

Some useful options:

  • -d deletes every occurrence of the given characters.

cut

sed