SiteMod. Website-wide file-modification tool
24th July, 2018
SiteMod is a Python tool from the AAIMI Project to perform server-wide modifications on web pages. This allows you to modify text and code on hundreds of web pages at a time to deal with unexpected changes to your website.
Let's say you write a lot of articles, and for some topics you like to include helpful links to a specific external website to help your readers. Now imagine that website shuts down, and you have an unknown number of articles with links helpfully pointing to a site that doesn't exist. If you only have a few articles you could easily open each file and change the link to an alternative site. But what if you have hundreds of articles?
Another scenario would be a product re-branding. If you decide to rename a product you feature heavily on your website, it can take hours of work combing through each page, and you are almost certain to miss a few instances.
The first task I set AaimiSiteMod was to modify links on Anth's Computer Cave as part of the transition to HTTPS. The program crawled every page in the Cave and converted every old 'http://anthscomputercave.com' link to 'https://anthscomputercave.com'. Although there were only one or two of these old http hard links on each page, changing these manually on 120 pages would have taken me hours and invited typos and other mistakes. AaimiSiteMod ripped through the hundred-odd pages in a couple of seconds.
Sure, changing 'http' to 'https' on specific links is fairly basic, but you can find and substitute complex strings, even entire blocks of code in some cases, like nav panels and forms.
First, the scary Warning
This is a REALLY DANGEROUS PROGRAM!!
Like any program that automatically modifies files, SiteMod can cause major harm if used incorrectly. If you don't have a full backup of your website, you should create one before using this program. Test SiteMod on a copy of your website files before using it on your actual website.
I can't help you if this program eats your website.
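If you want to script the backup step, here is a minimal sketch. It is not part of SiteMod; the backup_site function and the folder names are my own illustrative assumptions, and the demo at the bottom only touches throwaway temporary folders.

```python
import os
import shutil
import tempfile
import time

def backup_site(web_root, backup_parent):
    """Copy the whole web root into a new time-stamped folder and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    destination = os.path.join(backup_parent, "site-backup-" + stamp)
    # copytree raises an error if the destination already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(web_root, destination)
    return destination

# Demo on throwaway folders so nothing real is touched.
with tempfile.TemporaryDirectory() as scratch:
    web_root = os.path.join(scratch, "public_html")
    os.makedirs(web_root)
    with open(os.path.join(web_root, "index.html"), "w") as page:
        page.write("<p>Hello</p>\n")
    copy_path = backup_site(web_root, os.path.join(scratch, "backups"))
    backup_ok = os.path.isfile(os.path.join(copy_path, "index.html"))

print(backup_ok)
```

On a real server you would call backup_site with your own web-root path and a backup folder outside the web root.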
That's the scary stuff out of the way. If you're still game you can download SiteMod here.
Extract the sitemod zip folder and place the file called find_replace.py in the web-root directory on your web-server.
You'll need to configure find_replace.py.
The auto_write variable on line 77 determines whether the program will automatically write the changes to file. By default it is set to no, which means it will prompt for confirmation before applying changes to each page.
You should definitely leave this precaution in place until you have tested your SiteMod configuration. If the first few file mods work, you can then type 'all' at the prompt (without the quotes), instead of 'y', and the program will automatically modify all other remaining files.
Single or bulk mode
You can modify a single file or automatically scan all files on the server. As with auto_write, I recommend running in single run-mode on one file first, so you can check that the program is making the correct alterations.
You set the run mode with the run_mode variable on line 69. Set to 'bulk', the program will scan and modify all your files recursively. Set to 'single', it will scan just one file.
For single run-mode you also need to add the full path and filename to the target_file variable on line 71.
You need to tell the program which file-type you wish to modify. I've used it on .html, .txt, .js and .php files so far.
Set the file_type variable on line 37.
There are three operation types.
String mode will find and replace a string on a single line. You can replace a partial string within the line, or the entire line. This mode will replace all instances of the target string.
Multi mode will replace a block of lines, beginning with a target starting string and ending on a target end string. Use this mode when the final line of the block does not match any other line in the block, like the block pictured below.
If there are multiple matching strings within the block you'll need to use multi number mode. Consider the block of lines below.
There is a line inside the block identical to the last line we want to remove. If I asked the program to stop on the div line it would stop on the second-last line instead of the last.
Rather than looking for an end string, multi number mode replaces a set number of lines after the target starting string.
Whichever method you choose, you set the operation type on line 67 of sitemod.py.
Set to 'string' for string mode, 'multi_end' for multi line mode and 'multi_number' for multi line by number mode.
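To summarise the settings so far, here is a rough sketch of how they fit together. This is not the script's actual code: I've collected the variable names from this article into a dictionary purely for illustration, and the check_settings function, the example path and the exact format of the file_type value are assumptions.

```python
# Hypothetical settings using the variable names described in this article.
# The values and their exact format are assumptions, not copied from the script.
settings = {
    "file_type": ".html",         # assumed format; the real script may differ
    "operation_type": "string",   # 'string', 'multi_end' or 'multi_number'
    "run_mode": "single",         # 'single' or 'bulk'
    "target_file": "/home/yourName/public_html/index.html",  # illustrative path
    "auto_write": "no",           # prompt before every write
}

def check_settings(cfg):
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    if cfg["run_mode"] not in ("single", "bulk"):
        problems.append("run_mode must be 'single' or 'bulk'")
    if cfg["operation_type"] not in ("string", "multi_end", "multi_number"):
        problems.append(
            "operation_type must be 'string', 'multi_end' or 'multi_number'")
    if cfg["run_mode"] == "single" and not cfg.get("target_file"):
        problems.append("single run_mode needs a target_file")
    return problems

print(check_settings(settings))
```

A quick check like this before a run catches a mistyped mode name before any file is opened.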
Replace a single string or line
We'll start with the most basic operation type and replace a string within a single line. I'll use an example I mentioned earlier, changing 'http' links to 'https'.
I've set the run_mode variable to single and the target_file variable as this webpage. The operation_type variable is set to string.
The target_string variable on line 48 is the exact string you wish to replace if found.
The condition variable on line 50 is an optional second string that, if enabled, must also be present in a line before SiteMod will replace the target string in that line.
The replacement_string variable on line 45 is the text that will replace the target string. Notice that by default the replacement variable uses triple quotes. This means you can use multiple lines of text for the replacement.
Using my example of changing all HTTP anthscomputercave.com links to HTTPS, I used "http://anthscomputercave.com" as the target_string variable and "https://anthscomputercave.com" as the replacement_string.
For this example I didn't need to use the condition variable, but in some cases it can really help.
For example, say you have one phone number for all inquiries from customers, and there are various lines scattered around your web pages like "Call this phone number for support", and "call this same number for a quote", etc. Now imagine you wish to use a new, dedicated phone number for support queries, and keep the existing phone number just for quotation queries. In that case you would use your old number as the target_string, your new number as the replacement string, and set the condition as "support". This will leave all instances of the existing phone number untouched except those that have "support" on the same line. The support instances would change to the new number.
Now you have configured your variables and strings, let's take AaimiSiteMod for a spin.
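The logic of string mode with a condition can be sketched in a few lines. This is my own simplified illustration, not the code inside SiteMod: the replace_in_line function is hypothetical, and the phone numbers are made up.

```python
def replace_in_line(line, target_string, replacement_string, condition=None):
    """Replace every instance of target_string in one line.

    If a condition is given, the line is only changed when the
    condition string also appears somewhere in that same line.
    """
    if target_string not in line:
        return line
    if condition is not None and condition not in line:
        return line
    return line.replace(target_string, replacement_string)

# The HTTPS example: no condition needed.
link = '<a href="http://anthscomputercave.com/a.html">'
print(replace_in_line(link, "http://anthscomputercave.com",
                      "https://anthscomputercave.com"))

# The phone-number example: only lines mentioning "support" change.
for text in ("Call 555-0100 for support", "Call 555-0100 for a quote"):
    print(replace_in_line(text, "555-0100", "555-0199", condition="support"))
```

Note that the comparison in this sketch is case-sensitive, so a line containing "Support" with a capital S would be left alone.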
In a terminal, navigate to the root directory of your website and type: python sitemod.py
If you have the auto_write variable set to "no", the first time SiteMod finds your target_string inside a page it will prompt you for confirmation before it overwrites the original file. Enter "y" at the prompt to replace the original file.
If you are in bulk run_mode the program will then move on to the next file. If you are certain your configuration is correct you can instead enter "all" at the prompt to turn on auto_write, and SiteMod will modify all remaining pages with no further input from you.
To abort the modifications for a file, type "n" to skip that file. To cancel all operations, type "q".
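The four prompt answers behave roughly like the sketch below. The handle_answer and simulate functions are hypothetical names of my own, and the answers are fed in from a list rather than typed, so treat this as an illustration of the flow and not the script's real prompt code.

```python
def handle_answer(answer, auto_write):
    """Map one prompt answer to (action, auto_write)."""
    answer = answer.strip().lower()
    if answer == "y":
        return "write", auto_write
    if answer == "all":
        return "write", True       # write this file and stop asking
    if answer == "q":
        return "quit", auto_write
    # 'n', or anything unrecognised in this sketch: leave the file untouched
    return "skip", auto_write

def simulate(files, answers):
    """Return the files that would be overwritten for a series of answers."""
    auto_write = False
    written = []
    answers = iter(answers)
    for name in files:
        if auto_write:
            written.append(name)
            continue
        action, auto_write = handle_answer(next(answers), auto_write)
        if action == "quit":
            break
        if action == "write":
            written.append(name)
    return written

pages = ["a.html", "b.html", "c.html", "d.html"]
print(simulate(pages, ["y", "n", "all"]))
```

Here the first page is written, the second is skipped, and "all" on the third writes it and every page after it without asking again.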
Replace multiple line block containing unique end string
In the second example we'll replace an entire block of code.
It would be nice if we could just copy the entire block into the target_string variable like before, but that generally won't work. This is because whichever text or code editor created the original file will have added various hidden characters between lines. Each editor will use different characters, so it is not practical to try to allow for them all. Instead, we need to target the first line of the block, then determine the last line.
Notice that the last line of the code, the div closing tag, is not featured anywhere else in the block, so we can use the standard multi-line operation type. Set the operation_type variable on line 68 to 'multi_end'.
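Line endings are one common example of these hidden characters. The short demonstration below uses a made-up three-line block, not one from my site: the same text saved with Windows line endings no longer matches a block pasted with Unix line endings, yet the individual lines still match.

```python
# The same three lines, as saved by two different editors.
block_in_file = "<div>\r\n  <p>Old</p>\r\n</div>\r\n"   # Windows line endings
pasted_target = "<div>\n  <p>Old</p>\n</div>\n"         # Unix line endings

# Searching for the whole pasted block fails...
whole_block_found = pasted_target in block_in_file
print(whole_block_found)

# ...but comparing one line at a time still works.
lines = block_in_file.splitlines()
first_line_found = lines[0] == "<div>"
print(first_line_found)
```

Tabs versus spaces for indentation cause the same kind of mismatch, which is why matching on a single start line and a single end line is more dependable.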
This time instead of setting the target_string variable we set the start_string on line 55 to the first line of our block and the end_string variable on line 57 to the last line of the block.
I've used my three new lines as the replacement_string on line 45.
Note that you need to use a newline character (\n) at the end of the string when you are replacing entire lines, or the following line will be appended to your last replacement line.
This time when I run the program I get 'Found', and the program displays the block of text it thinks it should exclude.
SiteMod has found a start point and the end point and given me the all-clear. I can see it has found the exact block I wanted to replace. This gives me the confidence to press y at the confirmation prompt and overwrite the original file.
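For the curious, the idea behind the multi_end operation can be sketched like this. It is a simplified illustration under my own assumptions, not SiteMod's implementation: the replace_block_to_end function is hypothetical, and the sample page is made up.

```python
def replace_block_to_end(lines, start_string, end_string, replacement_string):
    """Replace the lines from the first start_string match through the next
    end_string match (inclusive).

    lines must keep their line endings. Returns (new_lines, excluded_block),
    or (lines, None) if no complete block was found.
    """
    start = None
    for i, line in enumerate(lines):
        if start is None:
            if start_string in line:
                start = i
        elif end_string in line:
            excluded = lines[start:i + 1]
            return lines[:start] + [replacement_string] + lines[i + 1:], excluded
    return lines, None

page = ('<body>\n<div class="nav">\n<a href="a.html">A</a>\n</div>\n'
        '<p>Text</p>\n').splitlines(keepends=True)

# The replacement ends with \n so the next line does not join onto it.
new_nav = '<nav>\n<a href="a.html">A</a>\n</nav>\n'
new_page, excluded = replace_block_to_end(page, '<div class="nav">', '</div>', new_nav)

print("".join(excluded))   # the block shown for confirmation
print("".join(new_page))
```

Returning the excluded block separately is what makes a confirmation step possible: you can show it on screen before anything is written.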
Multi line by number
Now we'll use the multi line by number operation_type. The operation_type variable is now 'multi_number'.
This time we'll run the program in bulk mode to scan every page on Anth's Computer Cave looking for our target block of lines. I've set the run_mode variable on line 69 to 'bulk'.
As with the multi_end operation, you set the start_string variable to the first line of your block, but this time you don't set an end_string.
Instead you change the second index of the excluded_count array on line 59 to the number of lines you wish to exclude. Here's our block of code with two matching div tags.
There are 6 lines of code so my excluded_count array looks like this: excluded_count = [0, 6].
You could run the program now, but there is one more option you may wish to set. So the program can tell if it is on the right track, you can set a secondary string. This will generally be the last unique line in the block. In my case that is the fourth line in the block, the 'more contents' line. I use that for the secondary_string variable on line 62. On line 64 I set the secondary_string_line_num variable to 2. This is the number of lines from the end of the block.
If the program has found the start line and read the designated secondary line number it will compare that line to the secondary string. If they do not match, the operation will abort for that file, and the program will move on to the next file.
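Here is a minimal sketch of the multi_number idea, including the secondary-string check. It is my own illustration, not the script's code. The replace_block_by_count function is hypothetical, it takes a plain line count where the script uses the excluded_count array, the sample block is made up, and I've assumed that a secondary_string_line_num of 2 in a 6-line block points at the fourth line.

```python
def replace_block_by_count(lines, start_string, line_count, replacement_string,
                           secondary_string=None, secondary_string_line_num=0):
    """Replace line_count lines starting at the first start_string match.

    Returns (new_lines, excluded_block), or (lines, None) if the block is
    not found, is too short, or fails the secondary-string check.
    """
    for start, line in enumerate(lines):
        if start_string in line:
            break
    else:
        return lines, None                      # start_string never found

    block = lines[start:start + line_count]
    if len(block) < line_count:
        return lines, None                      # file ends before the block does

    if secondary_string is not None:
        # Counted back from the end of the block (assumed interpretation).
        index = line_count - secondary_string_line_num - 1
        if not 0 <= index < line_count or secondary_string not in block[index]:
            return lines, None                  # wrong block: abort for this file

    return lines[:start] + [replacement_string] + lines[start + line_count:], block

page = ('<div class="panel">\n<div class="inner">\n<p>contents</p>\n'
        '<p>more contents</p>\n</div>\n</div>\n<p>After</p>\n').splitlines(keepends=True)

new_page, excluded = replace_block_by_count(
    page, '<div class="panel">', 6, '<p>New panel</p>\n',
    secondary_string='more contents', secondary_string_line_num=2)

print("".join(new_page))
```

If the secondary string does not turn up where expected, the function hands back the original lines unchanged, which mirrors the abort-and-move-on behaviour described above.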
Here are my variables for the sample target.
When we run the program this time it will first make a list of every file of the target file-type in the current directory and all subfolders.
If you wish to exclude a subfolder from the operation you can add it to the excluded folders array on line 34. You need to use the full path, including the trailing slash. For example /home/yourName/public_html/folder/
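Building the file list with exclusions can be sketched as follows. Again, this is an assumption-laden illustration and not the script's own code: list_target_files is a hypothetical function, the file_type format is assumed, and the demo builds a tiny throwaway site in a temporary folder.

```python
import os
import tempfile

def list_target_files(root, file_type, excluded_folders=()):
    """List files under root ending in file_type, skipping excluded folders.

    excluded_folders holds full paths with a trailing slash.
    """
    found = []
    for folder, subfolders, filenames in os.walk(root):
        folder_path = os.path.join(folder, "")   # adds the trailing slash
        if any(folder_path.startswith(skip) for skip in excluded_folders):
            subfolders[:] = []                   # do not descend any further
            continue
        for name in filenames:
            if name.endswith(file_type):
                found.append(os.path.join(folder, name))
    return sorted(found)

# Demo on a small throwaway site.
with tempfile.TemporaryDirectory() as root:
    for relative in ("index.html", "notes.txt", "articles/one.html", "drafts/two.html"):
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    excluded = [os.path.join(root, "drafts", "")]
    found = [os.path.relpath(p, root)
             for p in list_target_files(root, ".html", excluded)]

print(found)
```

The trailing slash matters in this sketch: without it, excluding a folder called drafts would also exclude a sibling folder called drafts2.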
I run the program and within seconds it has found my start_string, hidden amongst 167 webpages. It has checked and confirmed the secondary string.
Once again it has displayed the entire block of lines to be replaced so I can see that everything is correct, and type y.
SiteMod tells me it has read 169 pages and found, then modified one page.
Next for SiteMod
That's about it for using this module, but there are more SiteMod modules on the way.
Later this week we will release a brand new version of our site search system.
Expect to see selective bulk-file-renaming options and a link-checker to rid your website of broken links. Other features will include website-optimization functions, such as minifying code and automatically adjusting images.