Thursday, March 5, 2009

Convert Word .doc files to .html using ASP.NET

Programmatically converting Word DOCs to HTML

This article describes how to use ASP.NET 2.0 to convert documents in Word .doc format into .html documents. The conversion itself is done by MS Word's own built-in HTML export, driven through Word's COM interface.

The reason for doing this was as follows: I wanted to allow users to upload files to my intranet through their browser and make them available for other people to read. But if they uploaded Word documents, only people with Word installed would be able to view them, which causes problems for Mac and Linux users. So I wanted my server to convert each .doc file into an .html file automatically, at the moment it is uploaded. There was no way I was going to reverse-engineer the Word format and work out how to convert it to HTML myself, so instead I used the facility built into MS Word that does this for you. Give it a Word document and it will save an .html file plus a separate folder containing all the necessary images, all correctly linked from the HTML. Admittedly, the resulting HTML is full of Word-specific markup, but it works, and works very nicely.

How to do it

First, MS Word must be installed on the server where this ASP.NET page will run. Then add a reference to the Word library to your ASP.NET project, so that Visual Studio knows where to find it. To do this:

  1. In Solution Explorer, right-click on your project root and select "Add Reference".
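  2. Go to the COM tab and find Microsoft Word 11.0 Object Library (this is the Word 2003 library; the version number will differ if another version of Word is installed).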
  3. Click on it and then click OK.

Once you have done this, you will be able to use the "Word" namespace in your project.

To test it, make a sample web page, perhaps called test.aspx, and put a FileUpload, a Button and a Label on it. The FileUpload control selects the file to upload, clicking the Button starts the process, and the Label displays a success message.

The complete code for the upload routine is here:

protected void Button1_Click(object sender, EventArgs e)
{
    if (FileUpload1.HasFile)
    {
        // When we click Button1, the file we specify is uploaded to a temporary
        // folder, then converted into an html document...
        string folder_to_save_in = @"c:\temp\documents\";
        // Path.GetFileName strips any client-side folder path from the upload name.
        string fileName = System.IO.Path.GetFileName(FileUpload1.FileName);
        string filePath = System.IO.Path.Combine(folder_to_save_in, fileName);
        // This bit does the actual file upload:
        FileUpload1.SaveAs(filePath);

        // Here we set up a Word Application...
        Word.ApplicationClass wordApplication = new Word.ApplicationClass();

        // Opening a Word doc requires many parameters, but we leave most of them blank...
        object o_nullobject = System.Reflection.Missing.Value;
        // Used when quitting, so Word never waits on a "save changes?" prompt.
        object o_nosave = Word.WdSaveOptions.wdDoNotSaveChanges;
        try
        {
            object o_filePath = filePath;
            Word.Document doc = wordApplication.Documents.Open(ref o_filePath,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject);

            // Here we save it in html format, next to the original.
            // Path.ChangeExtension swaps only the final extension, so names such as
            // "my.doc.notes.doc" or "report.docx" are handled correctly too.
            string newfilename = System.IO.Path.ChangeExtension(filePath, ".html");
            object o_newfilename = newfilename;
            object o_format = Word.WdSaveFormat.wdFormatHTML;
            object o_encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
            object o_endings = Word.WdLineEndingType.wdCRLF;
            // Once again, we leave many of the parameters blank.
            // See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vbawd11/html/womthSaveAs1_HV05213080.asp
            // for full list of parameters.
            doc.SaveAs(ref o_newfilename, ref o_format, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_encoding, ref o_nullobject,
                ref o_nullobject, ref o_endings, ref o_nullobject);

            // Close the document...
            doc.Close(ref o_nullobject, ref o_nullobject, ref o_nullobject);

            // Report success only once the conversion has actually finished.
            Label1.Text = "Uploaded successfully!";
        }
        finally
        {
            // Always shut Word down, even if Open or SaveAs threw an exception;
            // otherwise an orphaned WINWORD.EXE process is left running on the server.
            wordApplication.Quit(ref o_nosave, ref o_nullobject, ref o_nullobject);
        }
    }
}

And that is really all there is to it. When you browse to a file and click the upload button, the file is uploaded to your server and stored in the temp folder. The .doc file is then opened in Word and saved again with SaveAs. This puts the new .html file in the same temp folder, and its images in a subfolder named after the .html file with _files appended (for example, report.html gets a report_files folder).
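The naming rules described above can be captured in a small helper, which is handy if the page later needs to link to the converted file or clean up its images. This is a minimal sketch, not part of the original code: the class ConversionPaths and its methods are hypothetical names, it assumes .doc and .docx are the only extensions you want to hand to Word, and it assumes an English-language copy of Word, which uses the _files suffix for the image folder (localized versions of Word may use a different suffix).

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original article.
// Derives the file names involved when Word saves a document as HTML.
public static class ConversionPaths
{
    // Assumption: these are the only extensions we are willing to hand to Word.
    private static readonly string[] WordExtensions = { ".doc", ".docx" };

    // True if the file name ends in one of the accepted extensions (case-insensitive).
    public static bool IsWordFile(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        foreach (string allowed in WordExtensions)
        {
            if (string.Equals(extension, allowed, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }

    // "report.doc" -> "report.html". Only the final extension is replaced.
    public static string GetHtmlPath(string documentPath)
    {
        return Path.ChangeExtension(documentPath, ".html");
    }

    // "report.html" -> "report_files", in the same directory as the HTML file.
    // Assumption: an English-language Word, which uses the "_files" suffix.
    public static string GetImagesFolder(string htmlPath)
    {
        string directory = Path.GetDirectoryName(htmlPath) ?? string.Empty;
        string folderName = Path.GetFileNameWithoutExtension(htmlPath) + "_files";
        return Path.Combine(directory, folderName);
    }
}
```

Using Path.ChangeExtension rather than string.Replace matters because Replace rewrites every occurrence of ".doc" in the name: "my.doc.notes.doc" would become "my.html.notes.html", and "report.docx" would become "report.htmlx". In the click handler, IsWordFile could be used to reject a non-Word upload before Word is ever started.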
