Searching a big text file in PowerShell
Where I work, we use simple text files as menus for our web applications. These menus may reference hundreds of jobs per month, and span up to 84 months (7 years) in some cases.
Looking up lines in these files can be very time-consuming. Since I’m writing scripts that have to fit around our processing automation, I usually can’t sort the files (or alter them in any other way, in fact) to make it possible to do some sort of binary search on them.
Doing a linear search might be workable if the lines I wanted were near the beginning of the files, but I usually want the information near the end (since new lines are appended to the end of the file). Also, the file server where the local copy of these files resides has historically been very dodgy, and communicating with it over our LAN has caused problems with normal processing. So I wouldn’t want to add to that load by calling Get-Content in my PowerShell script and passing the data line by line to the search logic.
Mind you, that server has been replaced recently, and the new one works much better (I think they upgraded from Windows Server 2003 to Server 2012, skipping right over Server 2008 – which gives you an idea how long we suffered with the previous incarnation).
Still, it’s better not to flog the network with more traffic than necessary, and in any case old habits die hard. So how do we make searching a plain text file across the LAN more efficient?
Well, for starters, I copy the entire file into a variable on my PC, so the whole thing is stored in RAM. We have ordinary desktop PCs, nothing high-end, but just about any PC you’ll find in a business environment these days will have at least 4 GB of RAM. A menu file with hundreds of thousands of lines will have an actual size on disk of just a few MB – so grab the entire file.
$big_menu = Get-Content \\Serv1\jobs\cust37\menufiles\MEMBERS.TXT
Since $big_menu is an array of strings, it’s not out of the question to do a brute-force linear search for one or two lines. If the lines you want are most likely to be at the end of the file, it’s possible to step backwards through an array of strings in a way that isn’t feasible when you’re reading the file line-by-line from disk.
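Stepping backwards through the array might look something like the sketch below. The sample lines, the pipe-delimited layout, and the job number being searched for are all made up for illustration; a real menu file will have its own format and need its own matching test.

```powershell
# Invented sample data standing in for the contents of $big_menu;
# the pipe-delimited layout is an assumption, not the real menu format.
$big_menu = @(
    'J1001|2014-01|January statements',
    'J1002|2014-02|February statements',
    'J1003|2014-03|March statements',
    'J1002|2014-04|February statements (rerun)'
)

$wanted = 'J1002'      # job number to look for (made up)
$found_line = $null

# Walk from the last element toward the first and stop at the first hit,
# which is the most recently appended line for that job.
for ($i = $big_menu.Count - 1; $i -ge 0; $i--) {
    if ($big_menu[$i].StartsWith("$wanted|")) {
        $found_line = $big_menu[$i]
        break
    }
}

$found_line
```

Because the loop breaks on the first match, a line near the end of the file is found after only a few comparisons, however long the file is.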
Or if you know that, say, 117 jobs were processed today, and you just want the last 117 lines of the menu, you can create a cut-down version of the menu using the Select-Object cmdlet:
$todays_menu = $big_menu | Select-Object -Last 117
If the line(s) you need might be anywhere in the file, or if you’re trying to determine if a job number that ought to appear only once actually appears more than once, you’ll probably want to create an associative array, also known as a hash (a hashtable, in PowerShell terms).
If you have a PowerShell function that can extract, say, a job number from an individual menu line, you can hash the entire file using just a couple of PowerShell commands:
$big_menu_hash = @{} # create an empty hash
$big_menu | ForEach-Object { $big_menu_hash[(Get-JobNumber $_)] = $_ } # parentheses make the function call a valid index expression
(If a job number appears more than once in the file, the later instance overwrites the entry for the earlier one in $big_menu_hash, so you need a bit more logic than this to account for multiple appearances of a job number.)
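One way to handle repeated job numbers is to store a list of lines under each key instead of a single line. This is only a sketch: the Get-JobNumber function here is a hypothetical stand-in that assumes the job number is the first pipe-delimited field, and the sample lines are invented.

```powershell
# Hypothetical stand-in for the real extraction function; it assumes the
# job number is the first pipe-delimited field of a menu line.
function Get-JobNumber {
    param([string]$Line)
    ($Line -split '\|')[0]
}

# Invented sample data standing in for $big_menu.
$big_menu = @(
    'J1001|2014-01|January statements',
    'J1002|2014-02|February statements',
    'J1002|2014-04|February statements (rerun)'
)

$big_menu_hash = @{}
foreach ($line in $big_menu) {
    $job = Get-JobNumber $line
    if (-not $big_menu_hash.ContainsKey($job)) {
        # First time this job number is seen: start a new list for it.
        $big_menu_hash[$job] = New-Object System.Collections.ArrayList
    }
    # Add() returns the new index, so discard it to keep the output clean.
    [void]$big_menu_hash[$job].Add($line)
}

# Job numbers that appear more than once in the file.
$duplicates = $big_menu_hash.Keys | Where-Object { $big_menu_hash[$_].Count -gt 1 }
$duplicates
```

With this layout, $big_menu_hash['J1002'] returns every line for that job in file order, so the last element of the list is the most recent entry.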
The Out-GridView cmdlet
Maybe you don’t need to search the entire file. For example, you might just want to grab a couple of lines (perhaps for testing before running against the entire file). Is there an alternative to opening the file in a text editor and copying the line(s) you want to another text editor window?
If you’re using PowerShell 3.0 or later, you have access to the Out-GridView cmdlet’s -PassThru parameter. Just pipe the variable to Out-GridView, use Ctrl-click to select the rows you want, and click OK; the selected rows are then passed back down the pipeline:
$selected_menu_lines = $todays_menu | Out-Gridview -PassThru
Of course, you aren’t limited to piping variables to Out-Gridview – you can make a selection from a text file on disk directly:
$selected_menu_lines2 = Get-Content \\serv2\cust42\menu\MENU.TXT | Out-Gridview -PassThru
I hadn’t used it myself before today, and I was just blown away by how well it worked!