
Free Download & Tutorial: How to Convert 1000s of HTML Files from ANSI to UTF-8 or Vice Versa in One Fell Swoop! (Version 3)

By Howard Charles Best, March 16, 2009

(LLBest.com)

Updated: August 30, 2012

On this web page:

A. Introduction

B. Assumptions

C. Get the Free Download

D. Main Features

E. Demo #1: ANSI to UTF-8

F. Demo #2: UTF-8 to ANSI

G. The Tutorial

A. Introduction

The advantages of converting your HTML files from ANSI to UTF-8 are as follows:

1. UTF-8 files are more “What you see is what you get” (WYSIWYG). For example, instead of &#9834; and &copy;, you will see ♪ and ©.

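2. UTF-8 files are usually more compact, because each multi-character HTML code is replaced by a single character: &#9834; takes 7 bytes, whereas ♪ takes only 3 bytes in UTF-8. Therefore they take up less space on your local hard drive and upload faster but, more importantly for your website visitors, they load faster.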
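The size difference is easy to check. The short Python snippet below (a sketch for illustration, not part of the download) counts the bytes used by each HTML code and by the same character stored as UTF-8. Plain ASCII text takes the same space in ANSI and in UTF-8, so the savings come only from the codes that get replaced.

```python
def byte_sizes(code: str, char: str) -> tuple:
    """Return (bytes used by the HTML code, bytes used by the UTF-8 character)."""
    return len(code.encode("ascii")), len(char.encode("utf-8"))


# The two examples from this page: the music note (U+266A) and the copyright sign (U+00A9).
for code, char in [("&#9834;", "\u266a"), ("&copy;", "\u00a9")]:
    code_size, char_size = byte_sizes(code, char)
    print(f"{code}: {code_size} bytes as a code, {char_size} bytes as UTF-8")
```

It reports 7 versus 3 bytes for the music note and 6 versus 2 bytes for the copyright sign.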

The free download contains the following 3 files:

1. _ConvertHTMLfilesFromANSItoUTF8_v3.bat.

2. _ConvertHTMLfilesFromUTF8toANSI_v3.bat.

3. _Test.htm.

_ConvertHTMLfilesFromANSItoUTF8_v3.bat is a Perl program which converts ANSI (or Unicode) format HTML files to the UTF-8 format. In order to accomplish this, 3 types of changes are made to each of the HTML files:

1. The file format is changed from ANSI (or Unicode) to UTF-8 (with signature).

2. The following meta tag is inserted immediately after the <head> tag:

3. HTML special character codes are converted to the actual Unicode characters. For example, &#9834; is converted to ♪ and &copy; is converted to ©. This is done to all files, even those that were already Unicode or UTF-8 files.
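For readers who want to see the three changes spelled out, here is a minimal Python sketch of them; it is not the Perl code inside the .bat file. It assumes that “ANSI” means Windows-1252, handles ANSI input only (not Unicode input), and takes the meta tag as a parameter, since the exact tag the .bat file adds is not reproduced here. It also deliberately leaves &lt;, &gt;, &amp;, &quot; and &#39; alone, because turning those into real characters could change the meaning of the HTML markup; how the original program treats those codes is not described, so this is a choice of the sketch. The function names are hypothetical.

```python
import html
import re
from html.entities import html5

# Named (&copy;), decimal (&#9834;) and hexadecimal (&#x266A;) character codes.
CHAR_CODE_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Turning codes for these into real characters could change the HTML markup.
MARKUP_CHARS = {"<", ">", "&", '"', "'"}

# The first <head> tag, with or without attributes (but not <header>).
HEAD_TAG_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def decode_char_codes(text):
    """Change 3: replace HTML character codes with the actual characters."""
    def replace(match):
        code, name = match.group(0), match.group(1)
        if not name.startswith("#") and name + ";" not in html5:
            return code  # unknown named code: leave it as it is
        char = html.unescape(code)
        return code if char in MARKUP_CHARS else char
    return CHAR_CODE_RE.sub(replace, text)


def add_meta_after_head(text, meta_tag):
    """Change 2: insert meta_tag on a new line right after the first <head> tag."""
    return HEAD_TAG_RE.sub(lambda m: m.group(0) + "\n" + meta_tag, text, count=1)


def convert_file_to_utf8(path, meta_tag):
    """All 3 changes for one file, assuming it is ANSI (Windows-1252)."""
    # newline="" keeps the file's original line endings untouched.
    with open(path, encoding="cp1252", newline="") as f:
        text = f.read()
    text = decode_char_codes(add_meta_after_head(text, meta_tag))
    # Change 1: "utf-8-sig" writes the UTF-8 signature (BOM) at the start.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
```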

_ConvertHTMLfilesFromUTF8toANSI_v3.bat is a second Perl program, which reverses the above 3 types of changes:

1. The file format is changed from UTF-8 (or Unicode) to ANSI.

2. If it exists, the following meta tag is deleted:

3. All special Unicode characters, if any are found, are converted to their corresponding HTML character codes. For example, ♪ is converted to &#9834; and © is converted to &copy;.
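The reverse direction can also be sketched in Python; again, this is an illustration, not the Perl code in the .bat file. This minimal version assumes Windows-1252 as the ANSI code page, treats every non-ASCII character as a “special” character, and uses the HTML 4 entity name when one exists (so © becomes &copy;) and a decimal code otherwise (so ♪ becomes &#9834;), which matches the two examples above. The meta tag to delete is passed in as a parameter and is removed only if it appears exactly as given. The function names are hypothetical.

```python
import re
from html.entities import codepoint2name


def encode_special_chars(text):
    """Change 3: replace each non-ASCII character with an HTML character code."""
    def code_for(char):
        name = codepoint2name.get(ord(char))  # e.g. 169 -> "copy"
        return f"&{name};" if name else f"&#{ord(char)};"
    return "".join(code_for(c) if ord(c) > 127 else c for c in text)


def remove_meta_tag(text, meta_tag):
    """Change 2: delete meta_tag (and the line break before it) if it exists."""
    return re.sub(r"(\r?\n)?" + re.escape(meta_tag), "", text, count=1)


def convert_file_to_ansi(path, meta_tag):
    """All 3 changes for one file, assuming it is UTF-8 (with or without signature)."""
    # "utf-8-sig" reads the file and drops the signature (BOM) if present.
    with open(path, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    text = encode_special_chars(remove_meta_tag(text, meta_tag))
    # Change 1: only ASCII is left now, which is valid ANSI (Windows-1252).
    with open(path, "w", encoding="cp1252", newline="") as f:
        f.write(text)
```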

_Test.htm is an extremely short, simple HTML file used to test the 2 Perl programs.

Caution: Be sure to keep unconverted backup copies of all of the HTML files that you convert. The backups should be in a separate folder, or, better yet, on a different drive.

B. Assumptions

This tutorial assumes the following 3 things:

1. That extensions for known file types are not hidden: a) Click Start. b) Click Computer. c) Click Organize. d) Click Folder and search options. e) Click the View tab. f) Make sure that Hide extensions for known file types is unchecked. g) Click OK. h) Close the Computer window.

2. That you have Notepad2 or a shortcut to it on your desktop.

3. That you have installed Perl: a) Go to http://www.activestate.com/Products/ActivePerl/. b) Download the free version of ActivePerl for Windows. c) Install it, making sure that Add Perl to the PATH environment variable is checked (the default).
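To confirm that the Add Perl to the PATH environment variable option took effect, you can type perl -v at a command prompt. The same check as a small Python sketch (the function name is only for illustration) looks up perl on the PATH and, if found, returns the first line of its version message:

```python
import shutil
import subprocess


def perl_version():
    """Return the first line of `perl -v`, or None if perl is not on the PATH."""
    perl = shutil.which("perl")  # searches the directories in the PATH variable
    if perl is None:
        return None
    result = subprocess.run([perl, "-v"], capture_output=True, text=True)
    # Skip the blank line that perl -v prints before the version line.
    return next((line for line in result.stdout.splitlines() if line.strip()), "")


version = perl_version()
print(version if version is not None else "perl was not found on the PATH")
```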

C. Get the Free Download

1. Download _HTMLtoUTF8_v3.zip (5.68 KB)

2. Unzip _HTMLtoUTF8_v3.zip to a folder named _HTMLtoUTF8_v3.

D. Main Features

1. Multiple HTML files can be converted in one fell swoop.

2. The files may be a mixture of ANSI, Unicode, and UTF-8 format.

3. Unicode or UTF-8 files can be converted to standard ANSI HTML file format.

4. A log file is created which contains detailed statistics on all of the changes.

5. Changed and unchanged files may be easily separated via sorting by Date modified.
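Feature 5 presumably works because a file that gets rewritten receives a new Date modified, while a file that is left alone keeps its old one. Here is a minimal Python sketch of the same separation, assuming you note the time (for example with time.time()) just before running the .bat file; the function name is hypothetical.

```python
import os


def split_by_date_modified(folder, since):
    """Split the files in folder into (changed, unchanged) name lists.

    A file counts as changed if its Date modified is at or after `since`,
    a time.time() value noted just before the conversion was run.
    Both lists are sorted newest first.
    """
    with os.scandir(folder) as entries:
        files = [(e.stat().st_mtime, e.name) for e in entries if e.is_file()]
    files.sort(reverse=True)
    changed = [name for mtime, name in files if mtime >= since]
    unchanged = [name for mtime, name in files if mtime < since]
    return changed, unchanged
```

For example, set start = time.time() just before double-clicking the .bat file, then call split_by_date_modified(r"C:\Temp2", start).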

E. Demo #1: ANSI to UTF-8

1. In the _HTMLtoUTF8_v3 folder created above, the _Test.htm file contains the following:

2. Drag and drop the _Test.htm file onto the Notepad2 icon and click File / Encoding. Then you will see that it is an ANSI file:

Screen capture of Notepad2 showing ANSI file

3. Double-click the _ConvertHTMLfilesFromANSItoUTF8_v3.bat file’s icon. If Perl was installed correctly, here’s what you will see:

Screen capture of conversion to UTF-8

A log file, _ConvertHTMLfilesFromANSItoUTF8_v3.log, is also created. It contains the following:

4. Drag and drop the _Test.htm file onto the Notepad2 icon again and again click File / Encoding. Now you will see that it has been converted to a UTF-8 file with signature:

Screen capture of Notepad2 showing UTF-8 file

Here is what the newly created UTF-8 version of the _Test.htm file looks like:

Please note that for this short example the file has actually gotten larger, because of the added meta tag and the UTF-8 signature. For a normal-sized web page, especially one with lots of special characters, converting it to UTF-8 will usually make it smaller.
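Notepad2’s File / Encoding check can be roughly imitated in a few lines of Python. The sketch below is only a heuristic, not how Notepad2 actually decides, and its labels simply follow the wording of this page: it looks for the 3-byte UTF-8 signature (EF BB BF) or a UTF-16 byte order mark (what Windows tools usually call “Unicode”) first, and then tries to decode the bytes. That 3-byte signature is also part of why the short _Test.htm file grew. The function name is hypothetical.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Rough guess at the encoding of a file's raw bytes."""
    if data.startswith(codecs.BOM_UTF8):  # EF BB BF
        return "UTF-8 with signature"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "Unicode"
    try:
        data.decode("ascii")
        return "ANSI"  # pure ASCII is identical in ANSI and UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"  # valid UTF-8, but without a signature
    except UnicodeDecodeError:
        return "ANSI"  # not valid UTF-8, so assume an ANSI code page
```

For example, guess_encoding(open("_Test.htm", "rb").read()) should report ANSI before Demo #1 and UTF-8 with signature after it.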

F. Demo #2: UTF-8 to ANSI

1. Assuming that you’ve already completed Demo #1 above, double-click the _ConvertHTMLfilesFromUTF8toANSI_v3.bat file’s icon. Here is what you should see:

Screen capture of conversion to ANSI

A log file, _ConvertHTMLfilesFromUTF8toANSI_v3.log, is also created. It contains the following:

Now the _Test.htm file is back to the way it was originally:

2. Drag and drop the _Test.htm file onto the Notepad2 icon once more and click File / Encoding. You will see that it is back to being an ANSI file:

Screen capture of Notepad2 showing ANSI file

G. The Tutorial

To convert a batch of HTML files:

1. Be sure to keep backup copies of all of the files to be converted until you are sure that the conversion was done correctly.

2. Copy the HTML files to be converted to a temporary folder such as C:\Temp2.

3. Copy the appropriate .bat file to that same folder.

4. Double-click the .bat file’s icon. If all goes according to plan, all of the HTML files will now be converted automatically!
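The four steps above can also be sketched in Python as a single function. This is a hypothetical illustration, not the .bat files’ actual logic: the per-file conversion is passed in as a function working on bytes, only files whose names match a regular expression are touched, files whose contents do not change are not rewritten (so they keep their old Date modified), and a simple log listing the changed and unchanged files is written. The log file name conversion.log is made up for the example. As in step 1, run it only on copies.

```python
import os
import re


def convert_folder(folder, convert, file_pattern, log_name="conversion.log"):
    """Convert every matching file in folder and log the results.

    convert: a function that takes a file's bytes and returns the new bytes.
    file_pattern: a regular expression matched against each file name.
    Files whose contents do not change are not rewritten, so they keep
    their old Date modified. Returns (changed, unchanged) file names.
    """
    pattern = re.compile(file_pattern, re.IGNORECASE)
    changed, unchanged = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or not pattern.search(name):
            continue
        with open(path, "rb") as f:
            old = f.read()
        new = convert(old)
        if new == old:
            unchanged.append(name)
            continue
        with open(path, "wb") as f:
            f.write(new)
        changed.append(name)
    # A simple log of what happened (the log file name is hypothetical).
    with open(os.path.join(folder, log_name), "w", encoding="utf-8") as log:
        log.write(f"Changed: {len(changed)}\n")
        log.writelines(f"  {name}\n" for name in changed)
        log.write(f"Unchanged: {len(unchanged)}\n")
        log.writelines(f"  {name}\n" for name in unchanged)
    return changed, unchanged
```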

Note: Near the beginning of both the _ConvertHTMLfilesFromANSItoUTF8_v3.bat and the _ConvertHTMLfilesFromUTF8toANSI_v3.bat files are the following lines:

The above will match .htm, .html, .php, .php3, etc. files. It may be changed to suit your needs. For example, if all that you want to modify are .txt files, then it could be changed as follows:
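The actual Perl lines are not reproduced here, so as a hypothetical illustration only, here are two Python regular expressions that make the same kind of selection: one for .htm, .html, .php, .php3 and similar extensions, and one for .txt files only. The names WEB_FILES, TEXT_FILES and select_files are made up for the example.

```python
import re

# Hypothetical stand-ins for the file pattern near the top of the .bat files.
WEB_FILES = re.compile(r"\.(?:html?|php\d*)$", re.IGNORECASE)  # .htm, .html, .php, .php3, ...
TEXT_FILES = re.compile(r"\.txt$", re.IGNORECASE)              # .txt files only


def select_files(names, pattern):
    """Return the file names whose extension matches pattern."""
    return [name for name in names if pattern.search(name)]
```

Because both patterns are anchored with $, a backup such as page.htm.bak is not selected.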

Note: Another way to convert HTML from ANSI to UTF-8 and vice versa, albeit one file at a time, is to do it online at http://llbest.com/?P=6x.


This web page URL: http://LLBest.com/?P=5h