Searching vs Filing

Nov 28, 2021 · 1725 words · 9 minute read

Search, and ye shall find? Or teach a man to file, and he can find files by himself for a lifetime?

Apparently kids these days can’t find anything for themselves in directories anymore, and worse, don’t even know what a directory is, and searching is to blame. But what’s the right answer?

It’s both, and it’s always been both. The two methods of navigating and finding files can appear to overlap heavily, even though, depending on the situation, one of them usually has a clear advantage. It also takes more mental effort to switch ‘modes’ than it does to stretch your current way of doing things, even if that’s way past the point of it being the best way.

Before moving on it’s worth remembering that a directory structure, and putting the file in a specific ‘location’ is an abstraction created by the computer and operating system. Assuming you are putting the files on the same physical volume, where within that volume they go has no relation to where in the directory structure they sit. In that sense the hard drive is much more like one bucket in which everything goes, with the file system and operating system creating the illusion that it’s all neatly organised into folders. So if you left folders behind and just dumped everything into one place, and then searched for it, it would just be another abstraction, no better or worse from the computer’s perspective.

Searching and search history

Searching for files based on their name and contents is not new. On UNIX-like systems find has existed since around 1971, and allows you to search for files based on their names and other properties1. grep has existed since 1973, and allows for searching text content using regular expressions.

Both of these methods have limitations. The first is that they are relatively slow: they have to visit every file in the directories they search, so normally you would only use them on a specific directory, and not on a whole computer. The second is that grep only works well with plain text files, and can’t usefully search the contents of binary files like spreadsheets or PDFs.
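To make the cost of this kind of search concrete, here is a minimal Python sketch of the same two ideas. It is not how find or grep are implemented; the function names find_by_name and grep_text are made up for this illustration, and the sketch assumes files are UTF-8 text and simply skips anything that isn't.

```python
import fnmatch
import os
import re


def find_by_name(root, pattern):
    """Walk every directory under root; return paths whose file name matches a glob."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def grep_text(root, regex):
    """Open every file under root; return (path, line number, line) for matching lines."""
    compiled = re.compile(regex)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if compiled.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except (UnicodeDecodeError, OSError):
                # Not readable as plain text (for example a binary file): skip it.
                continue
    return sorted(hits)
```

Both functions touch every file under the starting directory on every call, which is why this approach gets slow on a whole disk, and why the content search has nothing useful to say about binary formats.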

Speed can be improved by having a pre-built database of all the files, which is how locate (since 1982) works. More modern desktop search implementations, like macOS’s Spotlight, work in a similar way. They build and maintain a database of files and their properties, keywords etc. Searching this database is much faster than looking at all the files on the disk, and the database can be updated quietly in the background.
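The idea of a pre-built database can be sketched in a few lines of Python. This is only an illustration of the principle, not the format or behaviour of locate or Spotlight; build_index and search_index are hypothetical names, and the "database" here is just an in-memory dictionary.

```python
import os


def build_index(root):
    """Walk the tree once and record each file's path, keyed by lower-cased file name."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.setdefault(name.lower(), []).append(os.path.join(dirpath, name))
    return index


def search_index(index, term):
    """Return paths whose file name contains term, using only the index, not the disk."""
    term = term.lower()
    results = []
    for name, paths in index.items():
        if term in name:
            results.extend(paths)
    return sorted(results)
```

The slow walk happens once in build_index; every later search_index call only looks at the dictionary. The trade-off is that the index only knows what the disk looked like when it was built, so it has to be refreshed to stay accurate.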

When people talk about searching now, they normally think of something like a web search, but file search isn’t the same.

The key difference is that web search is normally for something where you’re willing to accept a range of answers, and there’s fuzziness about whether the exact thing you are searching for even exists. If you search ‘chocolate cake recipe’ - you’ll probably accept a range of answers. If you’re looking for a specific file, you will only accept one answer. With web search, if the thing you’re searching for exists but the search results don’t show it, you don’t even know you’re missing it. If you’re hunting for a specific file, you know it exists, even if the search results don’t show it.

Benefits of file structure

So perhaps organising things in a directory structure and a hierarchy seems like a lot of effort, even if it’s time you’ll save later when looking. But you get lots of advantages from doing so. You can still search it later if you really want, and it’ll be easier to search.

A file hierarchy does require you to develop some kind of taxonomy, but it matters much less which way the files are organised than that they are organised at all. One limitation is that you can only split on one criterion at each level, so depending on what groupings are important to you that’s certainly a consideration, but in many cases the hierarchy could be regarded as a series of tags. The following are equivalent in most situations: ~/documents/$my_address/bills/phone/september 2021.pdf or ~/documents/bills/phone/$my_address/september 2021.pdf. You could almost regard each level as a tag, and on the way you are creating groups. In a more search focused world you might end up naming the file $my_address phone bill september 2021.pdf with a very similar effect, except you lose the grouping advantage.
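The "each level is a tag" view can be shown with a small Python sketch. The helper tags_from_path is a made-up name, and the literal folder name my_address stands in for the $my_address placeholder used above.

```python
from pathlib import PurePosixPath


def tags_from_path(path, root):
    """Treat every folder between root and the file as a tag; return (tags, file name)."""
    relative = PurePosixPath(path).relative_to(root)
    # All parts except the last are folders; their order is deliberately ignored.
    return frozenset(relative.parts[:-1]), relative.name


first = tags_from_path("documents/my_address/bills/phone/september 2021.pdf", "documents")
second = tags_from_path("documents/bills/phone/my_address/september 2021.pdf", "documents")
same_tags = first == second
```

Because the folders are collected into a set, the two orderings compare as equal, which is the sense in which the two layouts are equivalent. What the set throws away is exactly what the hierarchy adds: the order of the levels, and therefore the groups you pass through on the way down.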

You can also group files together this way which share almost no other similarities, e.g. are different types and don’t have anything in common in their names. If you rely on search results, these get very convoluted very quickly if you’re trying to see more than one type with different names at the same time.

These levels also give you context: you can see related files and information you might have forgotten about, or couldn’t remember the name of. If you can find your way to the relevant folder, you don’t have to burden your memory with remembering all the different fragments of file names to later search for.

Along with the context you get another key advantage, and that is discoverability. This is especially true in shared environments. You can find files and folders relevant to what you want without someone telling you what they are all called. You can also find related files on the way. Looking for “budget 2022.xlsx”? If it’s in a project folder for budgets you might see “budget 2021.xlsx”, which could be helpful. Otherwise you wouldn’t be aware that it even existed2.

In shared environments having some folder structure might be essential for access rights, but it also forces users to mostly do some kind of sorting. If a file has to go somewhere hopefully it’ll go into vaguely the right location. As above the folders can act a bit like enforced tags. Even if you replaced folders with tags (like Gmail) they would always be optional.

If you’re working in a multi-lingual shared environment, then searching isn’t going to get you very far either. In almost no situation currently is a search term going to be correctly translated into multiple languages with the same meaning and find similar files.

Folder structures are also very portable. You can zip them up and move them to another environment. Relying on different search tools will lead to different results in different places. Google Drive search and Apple Spotlight don’t give the same results in most cases.

Lastly, if you feel searching is quicker because you don’t have to keep moving a mouse around and clicking on folders, then autocompletion on the command line will be a revelation. This is a much easier and faster way to move through files than constantly clicking on folder symbols.

There is some argument that something like tags or collections allow the same file to exist in multiple contexts at the same time, and don’t force you to choose. If this is a very important feature then links (shortcuts) can be used from one possible location to another, or if you’re hard linking in Linux, then both locations have equal weight and validity.
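Here is a short Python sketch of the hard link case, assuming a file system that supports hard links (most Linux and macOS file systems do). The function name file_in_two_folders is hypothetical; os.link is the standard library call that creates the link.

```python
import os


def file_in_two_folders(original, second_location):
    """Hard-link original to second_location; return True if both names are one file."""
    os.link(original, second_location)
    # The same inode number means neither name is "the copy": they are equal entries.
    return os.stat(original).st_ino == os.stat(second_location).st_ino
```

After the call, deleting either name leaves the file reachable through the other, which is what "equal weight and validity" means in practice. A symbolic link or shortcut behaves differently: it only points at the original name, and breaks if that original is removed.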

Then why search for files?

So, if putting things in folders is so amazing for all the reasons listed above, why are people searching for them?

Here I can think of two main reasons. Firstly, they’ve never needed to do anything else; search has always been ‘good enough’. Especially if you don’t work in a shared environment, you don’t deal with files over a long period of time and you’re the only one editing and naming them, then you can search. Under these circumstances you only need to remember a small number of files at any one time, and none that you don’t know of are going to be created. Additionally a ‘recent files’ collection may be all that you need. Until you reach university or a place of work, it’s unlikely that you’re going to need to go back more than a year of schoolwork to find something relevant.

The second is that if you spend most of your time in some proprietary application, be it Facebook, Twitter, Instagram or whatever, you don’t have the choice. Search is the only option given to you to find anything3, and if you don’t have an alternative, you can’t learn the alternative. Here it may also be very difficult, especially with overlapping privacy and access restrictions, to give you a coherent directory structure of things to look through.

For finding things on the web, and other unstructured situations where you can accept a range of answers, search is great.

File search is a bit more limited, but in some cases it can be the fastest or best way:

  • The file you’re looking for has an easy to remember and almost unique name. If I want to listen to a particular song in my music collection, ⌘ + Space and then typing the song name or artist is often fastest.
  • You want to find files based on a property not covered by your taxonomy, e.g. size, type, date modified.
  • You know keywords within a file, but not its name or location.
  • You’re stuck in an application where you don’t have the choice.
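The second case in the list, finding files by a property your folders don't encode, might look like this in Python. It is a rough sketch: find_by_property and its parameters are invented for this example, and it reads size and modification time with os.stat.

```python
import os
import time


def find_by_property(root, min_size=0, modified_within_days=None, extension=None):
    """Return files under root that pass the size, age and extension filters."""
    now = time.time()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # Unreadable entry (for example a broken link): skip it.
                continue
            if info.st_size < min_size:
                continue
            if modified_within_days is not None:
                if now - info.st_mtime > modified_within_days * 86400:
                    continue
            if extension is not None and not name.lower().endswith(extension.lower()):
                continue
            matches.append(path)
    return sorted(matches)
```

A call such as find_by_property("documents", min_size=1_000_000, extension=".pdf") cuts across whatever folder structure exists, which is exactly the kind of question a hierarchy can't answer on its own.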

Conclusion

It’s no huge revelation, but spending a little time on understanding how you do something, and why you do it that way can help make your life easier. Even if you don’t change it. I certainly think it’s much clearer to me when to search, when to walk through the file system, and also why I’m so disappointed with the results that file searches, like Google Drive’s, give me.


  1. find can do much more, like launching other programs to work through the results of the search, but that’s a different topic. ↩︎

  2. This is making the pretty big assumption that your colleagues, family or whomever you’re sharing some collective file system with organise things in some vaguely sane way. Hell is other people’s files. ↩︎

  3. If that even works. In my experience the search in Facebook is so laughably bad it almost seems intentional. Try searching for a link in a comment on a friend’s post. You get endless suggestions of new people with similar names to befriend, but can’t just search their activity. It’s so bad, it almost feels as if it’s done on purpose, perhaps to stop people easily digging up historic embarrassments? ↩︎