More File Handling and bits of XML, CSS and XSLT!


Let us make XML files out of the two text files of the Arabic and English Quran that you saw earlier.

First, we will open the file, read it line by line, and convert each line from this format:

112|1|Say, "He is Allah , [who is] One,
112|2|Allah , the Eternal Refuge.
112|3|He neither begets nor is born,
112|4|Nor is there to Him any equivalent."
to this XML format:
<?xml version="1.0" encoding="utf-8"?>
<quran>
<sura no="112">
<verse no="1">Say, "He is Allah , [who is] One,</verse>
<verse no="2">Allah , the Eternal Refuge.</verse>
<verse no="3">He neither begets nor is born,</verse>
<verse no="4">Nor is there to Him any equivalent."</verse>
</sura>
</quran>
We will chop this single big Quran file into 114 files, each containing one surah. We will place all these files in a premade folder called qxmlen. Here is the code.
     1  #!/usr/bin/perl
     2  open(FH, 'saheeh.tab') || die "can't open file..\n";
     3  $oldsura="1";
     4  open(OUT,'>qxmlen/1.xml') || die "can't create qxmlen/1.xml\n";
     5  print OUT "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
     6  print OUT "<quran>\n";
     7  print OUT "<sura no=\"1\">\n";
     8  while($line=<FH>){
     9   ($sura,$v,$aya) = split('\|', $line);
    10   if ($oldsura ne $sura){
    11      print OUT "</sura>\n</quran>\n";
    12      close (OUT);
    13      open(OUT,">qxmlen/$sura.xml") || die "can't create qxmlen/$sura.xml\n";
    14      print OUT "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
    15      print OUT "<quran>\n";
    16      print OUT "<sura no=\"$sura\">\n";
    17      $oldsura = $sura;}
    18   chomp($aya);
    19   print OUT "<verse no=\"$v\">";
    20   print OUT "$aya";
    21   print OUT "</verse>\n";
    22  }
    23  print OUT "</sura>\n</quran>";
    24  close (OUT);
    25  close(FH);
Line 4: we create the first output file for writing, indicated by the '>' sign.
Lines 5-7: the first three lines of the first file are hard-coded.
Line 8: we loop over each line of the input file; the current line is stored in the scalar $line.
Line 9: we split the line around the '|' delimiter and store the three components as $sura, $v and $aya.
Lines 10-17: we enter this if block only when a new surah starts; the old file OUT needs to be closed and a new file needs to be created.
Lines 18-21: we strip the trailing newline from the verse text and write it out as a verse element.
Lines 23-25: after the loop, we close the last surah and both filehandles.
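
One thing the script above does not handle is verse text containing characters that are special in XML, such as '&' or '<'. The following is only a sketch of how the per-line conversion could guard against that. The helper names xml_escape and line_to_verse are made up for this illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special inside XML element content.
# Double quotes are left alone: they need no escaping in element text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first, or we would re-escape our own output
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Turn one "sura|verse|text" line into a pair: (sura number, <verse> element).
sub line_to_verse {
    my ($line) = @_;
    chomp $line;
    # The limit of 3 keeps any further '|' characters inside the verse text.
    my ($sura, $v, $aya) = split /\|/, $line, 3;
    return ($sura, sprintf('<verse no="%s">%s</verse>', $v, xml_escape($aya)));
}

my ($sura, $xml) = line_to_verse("112|3|He neither begets nor is born,\n");
print "$sura: $xml\n";
```

Inside the while loop, the three print statements for a verse could then be replaced by a single print of the element returned by line_to_verse.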

In this way we have created XML representations of all 114 surahs. You can see them in the folder: http://www.textminingthequran.com/data/qxmlen/. I am not going to write another XML tutorial because the folks at W3Schools have already written wonderful ones. Here I am only going to show you how to present and format your XML output using a CSS stylesheet. The details can be found in the W3Schools tutorial.

If you open a raw XML file, the presentation might not look appealing to a human reader, like the picture below:

Now I link a CSS stylesheet from the XML file, at line 2 below:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="quran.css"?>
<quran>
<sura no="112">
<verse no="1">Say, "He is Allah , [who is] One,</verse>
<verse no="2">Allah , the Eternal Refuge.</verse>
<verse no="3">He neither begets nor is born,</verse>
<verse no="4">Nor is there to Him any equivalent."</verse>
</sura>
</quran>
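
Adding that second line to 114 files by hand would be tedious. As a rough sketch, a small Perl helper could insert the processing instruction right after the XML declaration. The function name add_stylesheet is an assumption made for this example, and it works on a string holding the whole file content.

```perl
use strict;
use warnings;

# Insert an xml-stylesheet processing instruction right after the XML
# declaration. $type is e.g. "text/css"; $href is the stylesheet file name.
sub add_stylesheet {
    my ($xml, $type, $href) = @_;
    my $pi = qq{<?xml-stylesheet type="$type" href="$href"?>};
    # Leave the document alone if it already links a stylesheet.
    return $xml if $xml =~ /<\?xml-stylesheet\b/;
    # Put the instruction on its own line after the <?xml ...?> declaration.
    $xml =~ s/\A(<\?xml\s[^>]*\?>[ \t]*\r?\n?)/$1$pi\n/
        or $xml = "$pi\n$xml";   # no declaration found: put it first
    return $xml;
}

my $doc = qq{<?xml version="1.0" encoding="utf-8"?>\n<quran>\n</quran>\n};
print add_stylesheet($doc, 'text/css', 'quran.css');
```

To use it on the generated files, you would read each file into a string, pass it through add_stylesheet, and write the result back.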
And this quran.css is as follows:
quran
{
background-color: #ffffff;
width: 100%;
}
sura
{
margin-left: 10px;
color: #FF0000;
}
verse
{
display:block;
color: #0000FF;
font-size: 12pt;
}
Again, you can go through a CSS tutorial at W3Schools. Now the file looks better:

Now let us aim for more control over the XML elements and attributes through an XSLT transformation. The following XSLT stylesheet, stored in a file quran.xsl, presents the verses in tabular form. See the W3Schools tutorial.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
  <h2>Quran::Sura No. <xsl:value-of select="quran/sura/@no"/></h2>
  <table border="1">
    <tr bgcolor="#9acd32">
      <th>No</th>
      <th>Verse</th>
    </tr>
    <xsl:for-each select="quran/sura/verse">
    <tr>
      <td><xsl:value-of select="@no"/></td>
      <td><xsl:value-of select="."/></td>
    </tr>
    </xsl:for-each>
  </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
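
To make the logic of the template above concrete, here is a rough Perl equivalent of what it does: take the sura number for the heading, then emit one table row per verse. This is only an illustration, not how a browser applies XSLT. The function name sura_to_html is hypothetical, and the regular expressions assume the simple one-verse-per-line XML generated earlier.

```perl
use strict;
use warnings;

# Build the same kind of HTML table as the template, from XML in the simple
# layout generated earlier. Regexes are used only because that layout is
# known; a real XML parser is the safer choice in general.
sub sura_to_html {
    my ($xml) = @_;
    my ($no) = $xml =~ /<sura no="(\d+)">/;   # like select="quran/sura/@no"
    my $html = "<html>\n<body>\n<h2>Quran::Sura No. $no</h2>\n"
             . qq{<table border="1">\n}
             . qq{<tr bgcolor="#9acd32"><th>No</th><th>Verse</th></tr>\n};
    # Like xsl:for-each over quran/sura/verse: one table row per verse.
    while ($xml =~ /<verse no="(\d+)">(.*?)<\/verse>/g) {
        $html .= "<tr><td>$1</td><td>$2</td></tr>\n";
    }
    return $html . "</table>\n</body>\n</html>\n";
}

my $sample = qq{<quran>\n<sura no="112">\n}
           . qq{<verse no="3">He neither begets nor is born,</verse>\n}
           . qq{</sura>\n</quran>\n};
print sura_to_html($sample);
```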
We reference quran.xsl from the 111.xml file as follows:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="quran.xsl"?>
<quran>
<sura no="111">
<verse no="1">May the hands of Abu Lahab be ruined, and ruined is he.</verse>
<verse no="2">His wealth will not avail him or that which he gained.</verse>
<verse no="3">He will [enter to] burn in a Fire of [blazing] flame</verse>
<verse no="4">And his wife [as well] - the carrier of firewood.</verse>
<verse no="5">Around her neck is a rope of [twisted] fiber.</verse>
</sura>
</quran>
And the result is as follows:

To wrap up: we learned how to read and write files using Perl. We created XML files and saw how to view them using CSS and XSL stylesheets, with help from some of the tutorials at W3Schools.


tutorial@textminingthequran.com