GNU+Linux command memo

RipGrep (rg) faster than rgrep

Simpler and faster alternatives to grep and find.

1. Introduction

1.1. version : en

The post of the week will please Alain Leaper, the author of the Memos.

It is about something new in the command-line interface (CLI) universe, a rare event according to Alain: his argument for getting people interested in the command line is that a major advantage of the Unix command-line ecosystem is its stability (for more than 40 years already).

"It’s worth learning, as it has not been superseded in all this time". A bit like sharks, which have not evolved for millions of years because they are at an evolutionary peak.

But with some improvements in the Linux kernel and the arrival of the Rust language, the desire to re-examine the performance of Unix commands has returned to some developers, who now offer us new commands!

I already made a few lists of them :

This week, let’s take a look at RipGrep, which turns out to be up to 8x faster than the common GNU rgrep (one of the most useful commands when you tinker with configuration files or write programs).

1.2. version : fr

Le billet de la semaine fera plaisir à Alain Leaper, l’auteur des mémos.

En effet, il s’agit d’une nouveauté dans l’univers de la ligne de commande, chose relativement rare si l’on en croit Alain qui prend comme argument, pour s’intéresser à la ligne de commande, la stabilité (depuis près de 40 ans) de l’éco-système des commandes Unix.

"Ça vaut donc le coup de se pencher dessus, car c’est stable et inégalé depuis tout ce temps", un peu comme le requin qui n’évolue plus depuis des millions d’années car il est à un sommet de l’évolution… (surtout le requin marteau)

Toutefois, suite à certaines améliorations du noyau Linux, puis avec l’apparition du langage Rust, l’envie de se repencher sur les performances des commandes Unix est revenue chez des développeurs, qui nous gratifient du coup de nouvelles commandes !

J’ai déjà dressé deux petites listes sur le sujet :

Cette semaine, zoom sur RipGrep, qui se révèle jusqu’à 8x plus rapide que rgrep (l’une des commandes les plus utiles quand on cherche à trifouiller des fichiers de configuration, ou quand on fait de la programmation).

2. rg (8x faster) and ag (4x faster) alternatives to rgrep

$ rgrep $needle (1)
1 without common aliases, a full version would be : grep -r -i -n --color
$ ag $needle (1)
$ rg $needle (2)
1 ag is in the Debian package silversearcher-ag
2 rg is in the Debian package ripgrep
$ time rgrep -n --color import /usr
25,22s user 15,68s system 35% cpu 1:55,44 total
$ time ag import /usr
5,96s user 13,89s system 66% cpu 29,933 total (1)
1 here ag is 4x faster than rgrep.
$ time rg import /usr
2,94s user 7,32s system 63% cpu 16,158 total (1)
1 here rg is 8x faster than rgrep
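
The ratios can be recomputed from the wall-clock totals quoted above (1:55,44 is 115,44 seconds). This is only a small sketch: the function name speedup is a hypothetical helper, and the decimal commas of the time output are rewritten as dots so that awk can parse them. On this single run the figures come out a little below the rounded "4x" and "8x".

```shell
# Hypothetical helper: how many times faster is "other" than "base"?
# Arguments are wall-clock totals in seconds, written with a decimal dot.
speedup() {
    LC_ALL=C awk -v base="$1" -v other="$2" \
        'BEGIN { printf "%.1f\n", base / other }'
}

speedup 115.44 29.933   # rgrep vs ag -> 3.9
speedup 115.44 16.158   # rgrep vs rg -> 7.1
```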

3. rg and ag alternatives to egrep to test regex

$ echo "texxxt" | egrep --color "xx*"

→ 1:texxxt

$ echo "texxxt" | ag "xx*" (1)
1 the --numbers option would print the line numbers of the matches

→ texxxt

$ echo "texxxt" | rg "xx*" (1)
1 the -n option would print the line numbers of the matches

→ texxxt

$ time py -c "import os; [os.popen('echo \'texxxt\' | egrep --color \'xx*\'').read() for _ in range(1000)]"
1,97s user 0,66s system 120% cpu 2,179s total
$ time py -c "import os; [os.popen('echo \'texxxt\' | ag \'xx*\'').read() for _ in range(1000)]"
3,12s user 0,87s system 113% cpu 3,523 total
$ time py -c "import os; [os.popen('echo \'texxxt\' | rg \'xx*\'').read() for _ in range(1000)]"
2,99s user 1,16s system 112% cpu 3,679 total

Here the traditional grep is the fastest: it takes about 38% less time than ag and about 41% less than rg.
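
The Python one-liners above only serve to repeat the same tiny pipeline 1000 times. A plain shell loop can do the same job. The sketch below assumes a POSIX shell, and repeat_match is a hypothetical name. It feeds the test string to any command given as arguments and prints how many runs matched.

```shell
# Hypothetical helper: run "printf texxxt | <command>" N times
# and print the number of runs in which the command found a match.
repeat_match() {
    n=$1; shift            # remaining arguments are the command to run
    hits=0; i=0
    while [ "$i" -lt "$n" ]; do
        if printf 'texxxt\n' | "$@" >/dev/null; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    echo "$hits"
}
```

It could be timed with, for instance, time repeat_match 1000 grep -E 'xx*' and time repeat_match 1000 rg 'xx*'. On such a small input the measurement is probably dominated by process start-up rather than by the regex engine itself.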

4. rg (2,5x faster) and ag alternatives to find

$ find . -name "*.jpg" (1)
$ find . -regextype posix-extended -iregex '.*jpe?g' (2)
1 time find . -name "*.jpg" | wc -l : 13071 found files ; 0,154s total (1,12e-05s / file)
2 19275 found files ; 0,190s total (0,985e-05s / file)
$ ag -g ".*jpe?g" (1)
1 time ag -g ".*jpe?g" | wc -l : 19449 found files ; 0,194s total (0,997e-05s / file)
$ rg --files --iglob "*jp*g"  (1) (2)
1 time rg --files --iglob "*jp*g" | wc -l : 19448 found files ; 0,088s total (0,452e-05s / file)
2 Unfortunately, rg selects the files to search by name with git-style "glob" patterns instead of regular expressions.

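ag and rg find 174 and 173 more files respectively than find with the regex, because of badly encoded file names such as �t� 2002 006.jpg that find skipped.

Per file, ag takes 89% of the time used by find -name "*.jpg" (0,997e-05s against 1,12e-05s).

Per file, rg takes 40% of the time used by find, which makes it 2,5x faster.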
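
The glob *jp*g is looser than the regex .*jpe?g. The sketch below illustrates the difference on a few invented file names, using only the shell's own case globbing and grep -E. Both function names are hypothetical, and the glob side only approximates rg's --iglob on plain file names without directories.

```shell
# Names matched by the glob *jp*g, ignoring case (approximation of --iglob).
match_glob() {
    for name in "$@"; do
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        case $lower in
            *jp*g) printf '%s\n' "$name" ;;
        esac
    done
}

# Names matched by the anchored extended regex, ignoring case
# (similar in spirit to find -iregex, which matches the whole path).
match_regex() {
    printf '%s\n' "$@" | grep -E -i '^.*jpe?g$'
}

match_glob  a.jpg b.JPEG c.jpng jpeg-notes.log d.png
match_regex a.jpg b.JPEG c.jpng jpeg-notes.log d.png
```

The glob also keeps c.jpng and jpeg-notes.log, which the regex rejects. When a regex on file names is really needed with rg, one possible workaround is to filter the file list with a second rg, as in rg --files | rg -i 'jpe?g$'.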

5. rg and ag alternatives to find / xargs

$ find . -regextype posix-extended -iregex '.*jpe?g' -print0 | xargs -0 -n 1 -P 4 jpegoptim -pt
$ ag -g ".*jpe?g" -0 | xargs -0 -n 1 -P 4 jpegoptim -pt
$ rg --files --iglob "*jp*g" -0 | xargs -0 -n 1 -P 4 jpegoptim -pt
$ rg . -i -g "*wav" --pre flac  (1) (2)
1 rg is multithreaded by default
2 --pre only accepts an executable name; to pass arguments, the command must be put in a shell script : rg . -i -g "*wav" --pre (with the line flac -d "$1" in an executable
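
Since --pre takes a bare executable name, the wrapper script mentioned in the note has to be created first. The sketch below shows one way to generate it. make_pre_wrapper and the script name used afterwards are hypothetical, and arguments containing spaces are not handled.

```shell
# Hypothetical helper: write an executable wrapper script.
#   $1      path of the script to create
#   $2...   command (with its arguments) to run on each file
make_pre_wrapper() {
    script=$1; shift
    {
        printf '#!/bin/sh\n'
        # Inside the generated script, "$1" is the file path passed by rg.
        printf 'exec %s "$1"\n' "$*"
    } > "$script"
    chmod +x "$script"
}
```

With a hypothetical script name, the usage would be make_pre_wrapper ./pre-flac.sh flac -d, followed by rg . -i -g "*wav" --pre ./pre-flac.sh.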

6. rg (6x faster) and ag (2x faster) alternatives to find / xargs / grep

$ find /usr -name "*.py" -print0 | xargs -0 grep -i -n --color $needle (1)
$ find /usr -regextype posix-extended -iregex '.*\.py$' -print0 | xargs -0 grep -i -n --color $needle (2)
1 If the needle is the string "import" : time … | wc -l → 94809 ; 14,168s total (14,94e-05s / file)
2 → 94809 ; 14,994s total (15,81e-05s / file)
$ ag --py $needle (1)
$ ag -G ".*\.py$" $needle (2)
1 time ag --py import /usr | wc -l : 94026 ; 7,376s total (7,84e-05s / file)
2 94026 ; 7,459s total (7,93e-05s / file)
$ rg -t py $needle (1)
$ rg -g "*.py"  $needle (2)
$ rg -n -g "*.py" -i $needle (3)
1 time rg -t py import /usr | wc -l : 88900 ; 2,595s total (2,92e-05s / file)
2 88900 ; 2,566s total (2,88e-05s / file)
3 94026 ; 2,503s total (2,66e-05s / file)

Here ag is roughly 2x faster than find / xargs / grep. The difference in the number of results is due to the fact that ag did not follow links.

Here rg is 3x faster than ag, and therefore about 6x faster than find / xargs / grep. The difference in the number of results is due to the fact that rg is case-sensitive by default. Invoked with the -i flag, rg becomes case-insensitive and then finds 94026 occurrences, like ag. I also added the -n flag to get line numbers in the rg output, for a fairer speed comparison and an easier output comparison.
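
The effect of case sensitivity on the counts can be reproduced on a tiny invented sample. The sketch uses plain grep, which is also case-sensitive unless -i is given. The three function names and the sample lines are hypothetical.

```shell
# Four invented lines: one lowercase "import", two in other cases, one without.
sample() {
    printf '%s\n' 'import os' 'IMPORT = 1' '# Import later' 'x = 2'
}

# Count matching lines with and without -i.
count_sensitive()   { sample | grep -c 'import'; }
count_insensitive() { sample | grep -c -i 'import'; }

count_sensitive     # -> 1
count_insensitive   # -> 3
```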

7. Information about the test protocol

Each measurement was run several times to check that the results were stable, but only a single run is shown here.
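
Repeating a measurement can be scripted. The sketch below is not the protocol actually used for this memo: bench_runs is a hypothetical helper, and it assumes GNU date (for the %N nanoseconds field) and awk.

```shell
# Hypothetical helper: run a command several times and print
# one wall-clock duration per run.
bench_runs() {
    runs=$1; shift                     # remaining arguments: the command
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1           # output is discarded, only time matters
        end=$(date +%s.%N)
        LC_ALL=C awk -v s="$start" -v e="$end" -v i="$i" \
            'BEGIN { printf "run %d: %.3fs\n", i, e - s }'
        i=$((i + 1))
    done
}
```

For example, bench_runs 5 rg import /usr would print five lines. The first run may be slower than the following ones when the files are not yet in the page cache.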

The grep and find used here are the default ones from Debian Stable 9.6 (packages grep and findutils), that is grep (GNU grep) 2.27 and find (GNU findutils) 4.7.0-git.

The file system used for the tests is Ext4.

More info about the machine used for the test here : Memo_10.