Tags Conversion
Tags Conversion is a feature requested by an AncientGreek user. The idea is to be able to convert text within tags to and from any Ancient Greek legacy encoding (such as Beta Code or SPIonic).
A tag may be inserted anywhere in the main body of a document, and it may enclose any number of characters: from just a few characters of a line to a whole page containing several paragraphs.
AncientGreek will ask you to enter the tag that encloses the text you want to convert, help you navigate to it, and convert it to or from any of its supported Ancient Greek legacy encodings.
Working with HTML code^
If you want to convert text written in HTML, you have to transfer the HTML code to a text document before being able to use this feature.
Generally, no matter what program you use to create or edit a web page, you have to access the HTML code, copy it and paste it into a new LibreOffice / OpenOffice text document. Perform all conversions there and, when done, copy and paste the HTML code back into the original program.
If, for example, you are already working on an HTML document within LibreOffice / OpenOffice, you should open the HTML code view. To do that, save the document and go to the menu "View / HTML source". Then copy the HTML code and paste it into a new text document. Perform any desired conversions within this new text document and copy and paste the code back into the HTML view of your original document.
The quotation mark issue^
When working with HTML code within a LibreOffice / OpenOffice text document, you may encounter the "quotation mark issue".
HTML extensively uses the quotation mark in its tags, so chances are you will do so too. LibreOffice / OpenOffice may (depending on the font used) replace the quotation mark with a number of different characters, such as the left and right double quotation marks, to name the most frequently used.
The problem is that these substitute characters break HTML code, so your browser will misbehave when trying to render it.
AncientGreek provides a solution: a macro that changes all these substitute characters back to the valid HTML quotation mark. This macro is executed whenever the Tags Conversion dialog performs a document search, fixing the HTML code before any conversion is made.
If you want to execute it manually, just to be on the safe side, go to the "AncientGreek / Legacy Encodings" menu and click the "Fix quotation mark" entry. This is highly recommended before copying HTML code to be pasted into an HTML editor (LibreOffice / OpenOffice HTML view included), especially if you have edited the code by hand.
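The effect of the quotation mark fix can be illustrated with a minimal Python sketch. It is not the actual AncientGreek macro; the function name and the list of substitute characters are assumptions made for illustration only.

```python
# Hypothetical set of characters that may replace the plain quotation mark.
# The real macro may handle a different set; this list is an assumption.
QUOTE_SUBSTITUTES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u201e": '"',  # DOUBLE LOW-9 QUOTATION MARK
    "\u201f": '"',  # DOUBLE HIGH-REVERSED-9 QUOTATION MARK
}
_TABLE = str.maketrans(QUOTE_SUBSTITUTES)


def fix_quotation_marks(text: str) -> str:
    """Replace typographic double quotes with the plain quotation mark."""
    return text.translate(_TABLE)


broken = "<p class=\u201cancientgreek\u201d>"
print(fix_quotation_marks(broken))
```

After the fix, the attribute value is delimited by the plain quotation mark again, which is what a browser expects.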
AncientGreek Tags Conversion Implementation^
Having said all that about tags, conversions and HTML code, we can now start examining how all this is implemented.
First we will talk about what tags are and how AncientGreek understands and uses them. Then we will present the actual dialog that finds and converts text within tags.
What is a tag?^
People usually think of HTML code when talking about tags. HTML is the basic way of writing a web page. A piece of HTML code would be:
<p class="ancientgreek">ἥκω <b>Διὸς</b> παῖς τήνδε Θηβαίων χθόνα</p>
In this example, "<p class="ancientgreek">" and "</p>" are the starting and ending tag of the "P (paragraph) HTML Tag". Both tags (starting and ending) start with a LESS-THAN SIGN ("<") character, and end with a GREATER-THAN SIGN (">") character.
So, a tag is a type of a marker, a way to enclose some text (or other things such as images, etc.), and a way to instruct the web browser as to what to do with it; it might request that the text should be in bold, or shown in red color, or right aligned, to mention just a few simple instructions.
It is obvious that a tag is actually a pair of tags: the starting and the ending tag. When a browser renders a web page, it tries to match known starting and ending tags, in order to apply special formatting to the text within, as instructed by the starting tag.
If such a matching is not found, the tag is called an "Open Tag" (or an "Orphan Tag"). Each browser deals with open tags differently, but they all try their best to be as close as possible to what the author of the web page tried to do.
As you can see, a second pair of tags exists in our piece of code; it's the "<b>" and "</b>" part. This is actually the "B (Bold) HTML Tag", which instructs the browser to render the text within in bold. This pair of tags is called a nested or inline tag (or pair of tags if you prefer), because it exists within another tag (pair of tags). As a result, the word "Διὸς" is rendered in bold.
Finally, our example would render as shown below:
ἥκω Διὸς παῖς τήνδε Θηβαίων χθόνα
AncientGreek Tags^
AncientGreek support for tags is very basic. It can recognize the existence of a tag, but does not know and does not care about what type of a tag it is.
An "AncientGreek Tag" should obey the following scheme:
Starting char.-text1-Ending char.-Three Dots-Starting char.-text2-Ending char.
It should be noted that:
- The Starting char. may be any of the following characters:
- LESS-THAN SIGN ("<")
- LEFT SQUARE BRACKET ("[")
- LEFT PARENTHESIS ("(")
- The Ending char. may be any of the following characters:
- GREATER-THAN SIGN (">")
- RIGHT SQUARE BRACKET ("]")
- RIGHT PARENTHESIS (")")
- text1 and text2 may contain any number of characters, as long as
- text1 is not equal to text2.
- text1 and text2 do not contain the Starting char. or the Ending char.
- Three dots ("...") must be placed between the Starting tag and the Ending tag.
Obeying this scheme, the following tags are valid "AncientGreek Tags":
<p class="ancientgreek">...</p>
[beta-code-text]...[end-beta-code-text]
[xixixi]...[xaxaxa]
(abc)...(def)
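The scheme above can be checked with a short Python sketch. This is only an illustration, not AncientGreek's own validator. It assumes that text1 and text2 must contain at least one character, that none of the six delimiter characters may appear inside them, and that the Starting and Ending characters need not be of the same kind; the function name is hypothetical.

```python
import re

# One half of an "AncientGreek Tag": Starting char., text, Ending char.
# Assumption: the text must not contain any of the six delimiter characters.
_PART = r"([<\[(]([^<>\[\]()]+)[>\])])"
_TAG_RE = re.compile(_PART + r"\.\.\." + _PART)


def parse_ancientgreek_tag(tag: str):
    """Return (starting_tag, ending_tag) if `tag` obeys the scheme, else None."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    starting_tag, text1, ending_tag, text2 = match.groups()
    if text1 == text2:  # text1 must not be equal to text2
        return None
    return starting_tag, ending_tag


for candidate in ('<p class="ancientgreek">...</p>', "(abc)...(def)", "(abc)...(abc)"):
    print(candidate, "->", parse_ancientgreek_tag(candidate))
```

The first two candidates are accepted and split into their starting and ending tags, while the last one is rejected because text1 equals text2.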
The Tags Conversion dialog^
The Tags Conversion dialog opens from the "AncientGreek / Legacy Encodings / Tags Conversion" menu. When one clicks this menu entry the following dialog appears.
The Tags Conversion dialog on Mac OS.
The dialog is modal (it blocks access to the document), which means that you will have a general overview of the selected text, but if the text is too long, you won't be able to scroll up and down. It also means that if you need to make any changes to the document, you have to close the dialog and re-open it when editing is done.
The dialog can be divided into the following sections:
- Tag section^
This is where the "AncientGreek Tag" is entered.
There are two ways to insert an "AncientGreek Tag": a) by typing it and b) by using one of the existing tags in the drop-down list.
One would start by typing a new "AncientGreek Tag", which is validated as it is typed. When a valid tag is found, a document search is performed automatically. If a match is found, the "AncientGreek Tag" is added to the list for future reference.
If the drop-down list is not empty, a tag can be selected from it. Selecting a tag will automatically start a document search.
- Select Conversion section^
This is where the type of conversion is selected. AncientGreek currently supports the following conversions:
Beta Code to Unicode.
Unicode to Beta Code.
When selected, the following options are enabled:
- Capital Letters Only (TLG)
Enabling this option will make the converter produce capital ASCII characters, complying with TLG (Thesaurus Linguae Graecae). Otherwise, small ASCII characters will be used.
- Use simple sigma
This option controls the conversion of the Final Sigma (ς) and the Lunate Sigma Symbol (ϲ) when converting text to Beta Code. Both of these letters can be converted to "S", but according to the TLG® Beta Code Manual 2013, the Final Sigma (ς) can also be converted to "S2" and the Lunate Sigma Symbol (ϲ) to "S3" ("*S3" for capital). So, when this option is enabled, the simple conversion ("S") is used; when it is disabled, the specific codes ("S2" and "S3" respectively) are used.
- Exclude lunate sigma
Enabling the previous option will result in eliminating any lunate sigma found in the document. In order to prevent this from happening, one can enable this option.
- Simplify stand-alone diacritics
Enabling this option will vastly increase the readability of raw Beta Code encoded text. All stand-alone diacritics will be encoded using their simple form; for example, Smooth breathing (psili), which is often used as an apostrophe too, would normally be converted to "%30", so the phrase "ἀλλ᾽ ὦ φίλη Λάκαινα" would become "A)LL%30 W)= FI/LH *LA/KAINA". When this option is enabled, this would become "A)LL) W)= FI/LH *LA/KAINA".
The values set using this dialog will be valid for the current session only. If you want to set these parameters permanently, please use the AncientGreek Configuration dialog.
SPIonic to Unicode.
Unicode to SPIonic.
Sgreek to Unicode.
Unicode to Sgreek.
LaserGreek to Unicode.
Unicode to LaserGreek.
GreekKeys to Unicode.
Unicode to GreekKeys.
Ismini to Unicode.
Unicode to Ismini.
WinGreek to Unicode.
If WinGreek configuration has not been performed, this conversion will not be available.
Unicode to WinGreek.
If WinGreek configuration has not been performed, this conversion will not be available.
Son of WinGreek to Unicode.
Unicode to Son of WinGreek.
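The effect of the "Capital Letters Only (TLG)" and "Use simple sigma" options of the Unicode to Beta Code conversion can be sketched in Python as follows. This is an illustration under stated assumptions, not AncientGreek's converter: the function name is hypothetical, the "Exclude lunate sigma" option is not modeled, and the code "*S" for a capital lunate sigma in simple mode is an assumption.

```python
FINAL_SIGMA = "\u03c2"           # ς
LUNATE_SIGMA = "\u03f2"          # ϲ
CAPITAL_LUNATE_SIGMA = "\u03f9"  # Ϲ


def sigma_to_beta(ch: str, simple_sigma: bool = True, capitals: bool = True) -> str:
    """Return the Beta Code of a final or lunate sigma under the given options."""
    if ch == FINAL_SIGMA:
        code = "S" if simple_sigma else "S2"
    elif ch == LUNATE_SIGMA:
        code = "S" if simple_sigma else "S3"
    elif ch == CAPITAL_LUNATE_SIGMA:
        # "*S" in simple mode is an assumption; the text only gives "*S3".
        code = "*S" if simple_sigma else "*S3"
    else:
        raise ValueError(f"not a final or lunate sigma: {ch!r}")
    # "Capital Letters Only (TLG)" selects capital or small ASCII letters.
    return code if capitals else code.lower()


print(sigma_to_beta(FINAL_SIGMA))                       # simple sigma
print(sigma_to_beta(FINAL_SIGMA, simple_sigma=False))   # specific code
print(sigma_to_beta(LUNATE_SIGMA, simple_sigma=False, capitals=False))
```

Keeping the two options as independent parameters mirrors the dialog, where each one can be toggled on its own.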
- Actions section^
This section contains buttons for navigating between tags, open tags and tag pairs, and for converting text.
1. Navigation buttons^
Navigates you to the next/previous starting/ending tag. When a starting/ending tag is selected, replacing is inhibited.
Navigates you to the next/previous open starting/ending tag (if open tags are found). These two buttons are enabled when the "Only valid pair tags" option is unchecked, so replacing is inhibited.
Navigates you to the next/previous tag pair, effectively selecting the text to convert.
2. Conversion buttons^
These two buttons are enabled only when the "Only valid pair tags" option is checked; when it is unchecked, replacing is inhibited.
Converts the selected tag. When the conversion is done, the converted tag is removed from the tags list and the next tag is selected. If no more tags are available, the message "Nothing found..." is displayed, which means that the dialog can now be closed.
Using this button, one could easily convert consecutive tag pairs to different encodings.
Converts all tags and closes the dialog.
3. Icons Selection button^
Changes the active icon set.
- Other Options^
The following options are also available:
1. Only valid pair tags^
When this option is enabled, only "valid" tag pairs are taken into account and both conversion buttons get enabled, so that text conversion is possible.
When this option is disabled, both "valid" and "invalid" tag pairs are taken into account, but both conversion buttons get disabled, so that text conversion is not possible; only navigation between tags, open tags and tag pairs is available.
The question is: what is a valid tag pair?
Consider the following HTML code:
<p class="anc">Some Text</p>
Some more text
</p>
This code actually introduces an HTML error. The "<p class="anc">" and "</p>" tags are a valid HTML tag pair (or simply put: a valid HTML tag), but the "</p>" tag which follows, is an "orphan" or an "open" tag.
When this text is matched against an "AncientGreek tag" (in this case: "<p class="anc">...</p>"), it will lead to the extraction of two selections, as shown in the following image:
AncientGreek considers the first selection (the one to the left) to be "valid" and the second (the one to the right) to be "invalid", since it contains an "orphan" or an "open" tag.
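One way to picture this distinction is the following Python sketch. It is an interpretation, not AncientGreek's actual search: it assumes that every ending tag found after a starting tag (and before the next starting tag) yields a selection, and that a selection is "valid" only if no other ending tag lies inside it. The function name is hypothetical.

```python
def find_selections(text: str, start_tag: str, end_tag: str):
    """Return (selection, is_valid) pairs for the given starting/ending tags."""
    selections = []
    pos = text.find(start_tag)
    while pos != -1:
        # Ending tags are only considered up to the next starting tag.
        next_start = text.find(start_tag, pos + len(start_tag))
        limit = next_start if next_start != -1 else len(text)
        is_first = True
        end = text.find(end_tag, pos + len(start_tag))
        while end != -1 and end + len(end_tag) <= limit:
            # Only the first ending tag closes the pair cleanly; later ones
            # produce selections that contain an orphan ending tag.
            selections.append((text[pos:end + len(end_tag)], is_first))
            is_first = False
            end = text.find(end_tag, end + len(end_tag))
        pos = next_start
    return selections


html = '<p class="anc">Some Text</p>\nSome more text\n</p>'
for selection, is_valid in find_selections(html, '<p class="anc">', "</p>"):
    print("valid" if is_valid else "invalid", repr(selection))
```

For the HTML code above, the sketch yields two selections: the first ends at the first "</p>" and is valid, and the second extends to the orphan "</p>" and is invalid.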
2. Remove tag after conversion^
When this option is enabled, both the Starting and Ending Tag will be removed, after performing the conversion. This would be the case when not working with HTML code.
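The difference between keeping and removing the tags can be sketched in Python as follows. The function name and parameters are hypothetical, and `str.upper` merely stands in for a real encoding converter; the sketch also assumes that all tag pairs in the text are valid.

```python
from typing import Callable


def convert_tagged(text: str, start_tag: str, end_tag: str,
                   convert: Callable[[str], str],
                   remove_tags: bool = False) -> str:
    """Convert the text enclosed by each tag pair, optionally dropping the tags."""
    out = []
    pos = 0
    while True:
        start = text.find(start_tag, pos)
        if start == -1:
            break
        inner_start = start + len(start_tag)
        end = text.find(end_tag, inner_start)
        if end == -1:
            break  # open starting tag: leave the rest of the text untouched
        converted = convert(text[inner_start:end])
        out.append(text[pos:start])
        out.append(converted if remove_tags else start_tag + converted + end_tag)
        pos = end + len(end_tag)
    out.append(text[pos:])
    return "".join(out)


sample = "before [beta]mh=nin[/beta] after"
print(convert_tagged(sample, "[beta]", "[/beta]", str.upper))
print(convert_tagged(sample, "[beta]", "[/beta]", str.upper, remove_tags=True))
```

With HTML code one would normally keep the tags, since they carry formatting; with ad hoc markers such as "[beta]" and "[/beta]", removing them leaves only the converted text in the document.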
Handling open inline tags^
As we have already seen, a tag pair may contain other tags which are called "inline" or "nested" tags.
AncientGreek can handle such tags (tag pairs), but will refuse to perform any kind of text conversion when an "open inline tag" exists within the outer tag pair.
Let's see such a tag:
<p class="ancientgreek">πάντες ἄνθρωποι τοῦ εἰδέναι <b>ὀρέγονται<</b> φύσει.</p>
Here we have:
- Outer tag pair: <p class="ancientgreek"> , </p>
- Inline tag pair: <b> , </b>
- Open inline tag: <
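Detecting such a condition can be sketched in Python with a single left-to-right scan. This is only an illustration of the idea, not AncientGreek's own check; the function name is hypothetical, and the sketch looks at one Starting/Ending character pair at a time ("<" and ">" by default).

```python
def find_open_inline(inner: str, start_char: str = "<", end_char: str = ">"):
    """Return the positions of unmatched starting/ending characters in `inner`."""
    orphans = []
    open_at = None  # position of a starting char still waiting for its ending char
    for i, ch in enumerate(inner):
        if ch == start_char:
            if open_at is not None:
                orphans.append(open_at)  # previous starting char was never closed
            open_at = i
        elif ch == end_char:
            if open_at is None:
                orphans.append(i)        # ending char without a starting char
            open_at = None
    if open_at is not None:
        orphans.append(open_at)          # starting char left open at the end
    return orphans


inner = "πάντες ἄνθρωποι τοῦ εἰδέναι <b>ὀρέγονται<</b> φύσει."
for position in find_open_inline(inner):
    print("open inline tag at position", position, repr(inner[position]))
```

For the text inside the outer tag pair above, the scan reports a single position: the stray "<" that directly precedes "</b>".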
In case such a condition is detected, AncientGreek will display the "Open inline tag selection dialog", shown below.
The "Open inline tag selection dialog" on Mac OS
The user can choose either to let AncientGreek help resolve the problem, or to solve it manually, in which case the whole tag conversion operation stops, so that manual editing is possible.
If the user selects to let AncientGreek help resolve the problem, the "Tags editor" dialog will open:
The "Tags editor" dialog on Mac OS
The dialog offers the following functions:
Go to Previous/Next open inline text.
Shortcuts: Ctrl-PgUp / Ctrl-PgDn (Win & Linux) - ⌥↑ / ⌥↓ (Mac OS)
Go to Previous/Next Starting/Ending Char.
Shortcuts: Ctrl-< / Ctrl-> (Win & Linux) - ⌥← / ⌥→ (Mac OS)
Reload original content. This will discard any changes you have made.
Shortcuts: Ctrl-R (Win & Linux) - ⌘R (Mac OS)
Decrease/Increase font size.
Shortcuts: Ctrl-- / Ctrl-+ (Win & Linux) - ⌘↓ / ⌘↑ (Mac OS)
Using these functions, you should be able to correct (remove or complete) any open inline tag. Tag validation is performed as you type, and the result is displayed in red at the top of the window, along with the number of remaining open inline tags.
When all open inline tags are corrected, the "OK" button will be enabled, so you can accept all the changes, exit the dialog and return to the "Tags Conversion" dialog.
At any time you can click "Cancel" and exit the dialog, discarding any changes made. This will take you back to the "Open inline tag selection" dialog.
Inserting tags^
AncientGreek also provides a way to insert a tag into the actual document, so that the text enclosed by the tag can later be handled as desired. The idea is to be able to select a tag to insert and then select the parts of the document around which that tag should be inserted.
This can be done using the "Insert Tag" dialog, shown below.
The "Insert Tag" dialog on Mac OS
The dialog can be opened either from the "AncientGreek" main dialog, or from the menu entry "AncientGreek / Legacy Encodings / Insert Tag".
The dialog will remain open until "Close" is clicked, so that inserting the tag around multiple selections is possible.
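What inserting a tag around a selection amounts to can be sketched in Python as follows. The function name is hypothetical, the selection is given as character offsets, and the sketch assumes that the first "..." in the tag is the separator between the starting and the ending tag.

```python
def insert_tag(text: str, sel_start: int, sel_end: int, tag: str) -> str:
    """Wrap text[sel_start:sel_end] in the two halves of an "AncientGreek Tag"."""
    # Assumption: the first "..." separates the starting tag from the ending tag.
    start_tag, separator, end_tag = tag.partition("...")
    if not separator or not start_tag or not end_tag:
        raise ValueError(f"not a tag of the form start...end: {tag!r}")
    return (text[:sel_start] + start_tag + text[sel_start:sel_end]
            + end_tag + text[sel_end:])


document = "ab MH=NIN cd"
print(insert_tag(document, 3, 9, "[beta]...[/beta]"))
```

Calling the function once per selection corresponds to keeping the dialog open and inserting the same tag around several parts of the document.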