Code to support the text kind of value.
- §1. Block Format
- §2. Extent Of Long Block
- §3. Character Set
- §4. Debugging
- §5. Creation
- §6. Copy
- §7. Copy Short Block
- §8. Transmutation
- §9. Mutability
- §10. Casting
- §11. Data Conversion
- §12. Z Version
- §13. Glulx Version
- §14. Comparison
- §15. Hashing
- §16. Printing
- §17. Capitalised printing
- §18. Serialisation
- §19. Unserialisation
- §20. Substitution
- §21. Perishability
- §22. Blobs
- §23. Blob Access
- §24. Get Blob
- §25. Replace Blob
- §26. Replace Text
- §27. Character Length
- §28. Get Character
- §29. Casing
- §30. Change Case
- §31. Concatenation
§1. Block Format. The short block for a text is two words long: the first word selects which form of storage will be used to represent the content, and the second word is a reference to that content. This reference is an I6 String or Routine in all cases except one, when it's a pointer to a long block containing a null-terminated array of characters, like a C string.
Clearly we need PACKED_TEXT_STORAGE and UNPACKED_TEXT_STORAGE to distinguish between the two basic methods of text storage, roughly equivalent to the pre-2013 kinds "text" and "indexed text". But why do we need four?
CONSTANT_PACKED_TEXT_STORAGE is easy to explain: the BlkValue routines normally detect constants using metadata in their long blocks, but of course that won't work for values which haven't got any long blocks. We use this instead. We don't need a CONSTANT_UNPACKED_TEXT_STORAGE because I7 never compiles constant text in unpacked form.
The surprising one is CONSTANT_PERISHABLE_TEXT_STORAGE. This is a constant created by the I7 compiler which is marked as being tricky because its value is a text substitution containing references to local variables. Unlike other text substitutions, this can't meaningfully be stored away to be expanded later: it must be expanded into unpacked text before it perishes.
Constant CONSTANT_PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 1; Constant CONSTANT_PERISHABLE_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 2; Constant PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + 3; Constant UNPACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_LONGBLOCK + 4;
§2. Extent Of Long Block. When there's a long block, we need enough of the entries to store the number of characters, plus one for the null terminator. This is called only in the course of a deep copy, so, not very often.
[ TEXT_TY_LongBlockSize arg1 x; x = BlkValueSeekZeroEntry(arg1); if (x < 0) return -1; should not happen, of course return x+1; ];
§3. Character Set. On the Z-machine, we use the 8-bit ZSCII character set, stored in bytes; on Glulx, we use full 32-bit Unicode, stored in words. (Until May 2023, 16-bit half-words were used, but that restricted the range of Unicode points so as to omit such essentials as "Smiling Cat Face With Heart-Shaped Eyes".)
The Z-machine does have very partial Unicode support, but not in a way that can help us here. It is capable of printing a wide range of Unicode characters, and on a good interpreter with a good font (such as Zoom for Mac OS X, using the Lucida Grande font) can produce many thousands of glyphs. But it is not capable of printing those characters into memory rather than the screen, an essential technique for texts: it can only write each character to a single byte, and it does so in ZSCII. That forces our hand when it comes to choosing the character set here. For Unicode, use Glulx.
#Iftrue CHARSIZE == 1; Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE; #Ifnot; Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE + BLK_FLAG_WORD; #Endif; CHARSIZE
§4. Debugging. This shows the various forms a text's short block can take:
[ TEXT_TY_Debug txt; switch (txt-->0) { CONSTANT_PACKED_TEXT_STORAGE: print " = cp~", (PrintI6Text) txt-->1, "~"; CONSTANT_PERISHABLE_TEXT_STORAGE: print " = cp~", (PrintI6Text) txt-->1, "~"; PACKED_TEXT_STORAGE: print " = p~", (PrintI6Text) txt-->1, "~"; UNPACKED_TEXT_STORAGE: print " = ~", (TEXT_TY_Say) txt, "~"; default: print " broken?"; } ];
§5. Creation. A newly created text is a two-word short block with no long block, like this:
Array ThisIsAText --> PACKED_TEXT_STORAGE EMPTY_TEXT_PACKED;
[ TEXT_TY_Create kind_id sb_address short_block; short_block = CreatePVShortBlock(sb_address, kind_id); short_block-->0 = PACKED_TEXT_STORAGE; short_block-->1 = EMPTY_TEXT_PACKED; return short_block; ];
[ TEXT_TY_Copy to_bv from_bv kind recycling; CopyPVRawData(to_bv, from_bv, kind, recycling); ];
§7. Copy Short Block. When a short block for a constant is copied, the new copy isn't a constant any more.
[ TEXT_TY_CopyShortBlock to_bv from_bv; CopyPVShortBlock(to_bv, from_bv, 2); if (to_bv-->0 & BLK_BVBITMAP_CONSTANTMASK) to_bv-->0 = PACKED_TEXT_STORAGE; ];
§8. Transmutation. What happens if a text is stored in packed form, but we need to access or change its individual characters? The answer is that we have to "transmute" it into long block form. Sometimes this is a permanent change, but often it's only temporary, and will soon be followed by an un-transmutation.
[ TEXT_TY_Transmute txt; TEXT_TY_Temporarily_Transmute(txt); ]; [ TEXT_TY_Temporarily_Transmute txt x; if ((txt) && (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0)) { x = txt-->1; The old value was a packed string txt-->0 = UNPACKED_TEXT_STORAGE; txt-->1 = FlexAllocate(32, TEXT_TY, TEXT_TY_Storage_Flags); if (x ~= EMPTY_TEXT_PACKED) TEXT_TY_CastPrimitive(txt, false, x); return x; } return 0; ]; [ TEXT_TY_Untransmute txt pk cp x; if ((pk) && (txt-->0 == UNPACKED_TEXT_STORAGE)) { x = txt-->1; The old value was an unpacked string FlexFree(x); txt-->0 = cp; txt-->1 = pk; The value earlier returned by TEXT_TY_Temporarily_Transmute } return txt; ];
§9. Mutability. That neatly handles the question of how to make a text mutable. (Note that constants are never created in unpacked form.)
[ TEXT_TY_Mutable txt; if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) { TEXT_TY_Transmute(txt); rfalse; } rtrue; Tell BlockValue there's a long block pointer ];
§10. Casting. In general computing, "casting" is the process of translating data in one type into semantically equivalent data in another: the only interesting cast here is that a snippet can be turned into a text.
[ TEXT_TY_Cast to_txt from_kind from_value; if (from_kind == TEXT_TY) { CopyPV(to_txt, from_value); } else if (from_kind == SNIPPET_TY) { TEXT_TY_Transmute(to_txt); TEXT_TY_CastPrimitive(to_txt, true, from_value); } ]; [ SNIPPET_TY_to_TEXT_TY to_txt snippet; return CastPV(to_txt, SNIPPET_TY, snippet); ];
§11. Data Conversion. We use a single routine to handle two kinds of format translation: a packed I6 string into an unpacked text, or a snippet into an unpacked text.
In each case, what we do is simply to print out the value we have, but with the output stream set to memory rather than the screen. That gives us the character by character version, neatly laid out in an array, and all we have to do is to copy it into the text and add a null termination byte.
What complicates things is that the two virtual machines handle printing to memory quite differently, and that the original text has unpredictable length. We are going to try printing it into the array TEXT_TY_Buffers, but what if the text is too big? Disastrously, the Z-machine simply writes on in memory, corrupting all subsequent arrays and almost certainly causing the story file to crash soon after. There is nothing we can do to predict or avoid this, or to repair the damage: this is why the Inform documentation warns users to be wary of using text with large strings in the Z-machine, and advises the use of Glulx instead. Glulx does handle overruns safely, and indeed allows us to dynamically allocate memory as necessary so that we can always avoid overruns entirely.
Constant TEXT_TY_BufferSize = BasicInformKit`TEXT_BUFFER_SIZE_CFGV + 3; Constant TEXT_TY_NoBuffers = 2; #Iftrue CHARSIZE == 1; Array TEXT_TY_Buffers -> TEXT_TY_BufferSize*TEXT_TY_NoBuffers; Where characters are bytes #Ifnot; Array TEXT_TY_Buffers --> (TEXT_TY_BufferSize+2)*TEXT_TY_NoBuffers; Where characters are words #Endif; Global RawBufferAddress = TEXT_TY_Buffers; Global RawBufferSize = TEXT_TY_BufferSize; Global TEXT_TY_CastPrimitiveNesting = 0;
§12. Z Version. The two versions of this routine, one for each virtual machine, are in all important respects the same, but there are enough fiddly differences that it's clearer to give two definitions, so:
#ifdef TARGET_ZCODE; [ TEXT_TY_CastPrimitive to_txt from_snippet from_value len news buffer; SuspendRTP(); buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*TEXT_TY_BufferSize; TEXT_TY_CastPrimitiveNesting++; if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers) FlexError("ran out with too many simultaneous text conversions"); @push say__p; @push say__pc; ClearParagraphing(6); @output_stream 3 buffer; if (from_value) { if (from_snippet) print (PrintSnippet) from_value; else print (PrintI6Text) from_value; } @output_stream -3; @pull say__pc; @pull say__p; ResumeRTP(); len = buffer-->0; if (len > RawBufferSize-1) len = RawBufferSize-1; buffer->(len+2) = 0; TEXT_TY_CastPrimitiveNesting--; WritePVFieldsFromByteArray(to_txt, buffer+2, len+1); ];
#ifnot; TARGET_ZCODE [ TEXT_TY_CastPrimitive to_txt from_snippet from_value len i stream saved_stream news buffer buffer_size memory_to_free results; buffer_size = (TEXT_TY_BufferSize + 2)*WORDSIZE; RawBufferSize = TEXT_TY_BufferSize; buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*buffer_size; TEXT_TY_CastPrimitiveNesting++; if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers) { buffer = VM_AllocateMemory(buffer_size); memory_to_free = buffer; if (buffer == 0) FlexError("ran out with too many simultaneous text conversions"); } SuspendRTP(); .RetryWithLargerBuffer; saved_stream = glk_stream_get_current(); stream = glk_stream_open_memory_uni(buffer, RawBufferSize, filemode_Write, 0); glk_stream_set_current(stream); @push say__p; @push say__pc; ClearParagraphing(7); if (from_snippet) print (PrintSnippet) from_value; else print (PrintI6Text) from_value; @pull say__pc; @pull say__p; results = buffer + buffer_size - 2*WORDSIZE; glk_stream_close(stream, results); if (saved_stream) glk_stream_set_current(saved_stream); ResumeRTP(); len = results-->1; if (len > RawBufferSize-1) { Glulx had to truncate text output because the buffer ran out: len is the number of characters which it tried to print news = RawBufferSize; while (news < len) news=news*2; i = VM_AllocateMemory(news*WORDSIZE); if (i ~= 0) { if (memory_to_free) VM_FreeMemory(memory_to_free); memory_to_free = i; buffer = i; RawBufferSize = news; buffer_size = (RawBufferSize + 2)*WORDSIZE; jump RetryWithLargerBuffer; } Memory allocation refused: all we can do is to truncate the text len = RawBufferSize-1; } buffer-->(len) = 0; TEXT_TY_CastPrimitiveNesting--; WritePVFieldsFromWordArray(to_txt, buffer, len+1); if (memory_to_free) VM_FreeMemory(memory_to_free); ]; #endif;
§14. Comparison. This is more or less strcmp, the traditional C library routine for comparing strings, but it does pose a few interesting questions. The answers are:
- (a) Two different unexpanded texts with substitutions are never equal, so "[X]" and "[Y]" aren't equal as texts even if X and Y are equal.
- (b) Otherwise we test the current value of the text as expanded, so "[X]" and "17" can be equal as texts if X is 17.
[ TEXT_TY_Compare left_txt right_txt rv; @push say__comp; say__comp = true; rv = TEXT_TY_Compare_Inner(left_txt, right_txt); @pull say__comp; return rv; ]; [ TEXT_TY_Compare_Inner left_txt right_txt pos ch1 ch2 capacity_left capacity_right fl fr cl cr cpl cpr; if (left_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fl = true; if (right_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fr = true; if (fl && fr) { if ((left_txt-->1 ofclass String) && (right_txt-->1 ofclass String)) return left_txt-->1 - right_txt-->1; if ((left_txt-->1 ofclass Routine) && (right_txt-->1 ofclass Routine)) if (left_txt-->1 == right_txt-->1) return 0; cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt); cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt); } else if (fl) { cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt); } else if (fr) { cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt); } if ((cl) || (cr)) { pos = TEXT_TY_Compare(left_txt, right_txt); TEXT_TY_Untransmute(left_txt, cl, cpl); TEXT_TY_Untransmute(right_txt, cr, cpr); return pos; } capacity_left = PVFieldCapacity(left_txt); capacity_right = PVFieldCapacity(right_txt); for (pos=0:(pos<capacity_left) && (pos<capacity_right):pos++) { ch1 = PVField(left_txt, pos); ch2 = PVField(right_txt, pos); if (ch1 ~= ch2) return ch1-ch2; if (ch1 == 0) return 0; } if (pos == capacity_left) return -1; return 1; ]; [ TEXT_TY_Distinguish left_txt right_txt; if (TEXT_TY_Compare(left_txt, right_txt) == 0) rfalse; rtrue; ];
§15. Hashing. This calculates a hash value for the string, using Bernstein's algorithm.
[ TEXT_TY_Hash txt rv len i p cp; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); rv = 0; len = PVFieldCapacity(txt); for (i=0: i<len: i++) rv = rv * 33 + PVField(txt, i); TEXT_TY_Untransmute(txt, p, cp); return rv; ];
[ TEXT_TY_Say txt ch i dsize; if (txt==0) rfalse; if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) return PrintI6Text(txt-->1); dsize = PVFieldCapacity(txt); for (i=0: i<dsize: i++) { ch = PVField(txt, i); if (ch == 0) break; print (char) ch; } if (i == 0) rfalse; rtrue; ];
§17. Capitalised printing. It turns out to be useful to have a variation on this:
[ TEXT_TY_Say_Capitalised txt mod rc; mod = CreatePV(TEXT_TY); TEXT_TY_SubstitutedForm(mod, txt); if (TEXT_TY_CharacterLength(mod) > 0) { WritePVField(mod, 0, CharToCase(PVField(mod, 0), 1)); TEXT_TY_Say(mod); rc = true; say__p = 1; } DestroyPV(mod); return rc; ];
§18. Serialisation. Here we print a serialised form of a text which can later be used to reconstruct the original text. The printing is apparently to the screen, but in fact always takes place when the output stream is a file.
The format chosen is a letter "S" for string, then a comma-separated list of decimal character codes, ending with the null terminator, and followed by a semicolon: thus S65,66,67,0; is the serialised form of the text "ABC".
[ TEXT_TY_Serialise txt len pos ch p cp; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); len = PVFieldCapacity(txt); print "S"; for (pos=0: pos<=len: pos++) { if (pos == len) ch = 0; else ch = PVField(txt, pos); if (ch == 0) { print "0;"; break; } else { print ch, ","; } } TEXT_TY_Untransmute(txt, p, cp); ];
§19. Unserialisation. If that's the word: the reverse process, in which we read a stream of characters from a file and reconstruct the text which gave rise to them.
[ TEXT_TY_Unserialise txt auxf ch i v dg pos tsize p; if (ch == -1) rtrue; TEXT_TY_Transmute(txt); tsize = PVFieldCapacity(txt); while (ch ~= 32 or 9 or 10 or 13 or 0 or -1) { ch = FileIO_GetC(auxf); if (ch == ',' or ';') { if (pos+1 >= tsize) { if (SetPVFieldCapacity(txt, 2*pos) == false) break; tsize = PVFieldCapacity(txt); } WritePVField(txt, pos++, v); v = 0; if (ch == ';') break; } else { dg = ch - '0'; v = v*10 + dg; } } WritePVField(txt, pos, 0); return txt; ];
[ TEXT_TY_SubstitutedForm to txt; if (txt) { CopyPV(to, txt); TEXT_TY_Transmute(to); } return to; ]; [ TEXT_TY_IsSubstituted txt; if ((txt) && (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) && (txt-->1 ofclass Routine)) rfalse; rtrue; ];
§21. Perishability. As noted above, a perishable constant is one which must be expanded before the values it refers to vanish from existence.
[ TEXT_TY_ExpandIfPerishable to from; if ((from) && (from-->0 == CONSTANT_PERISHABLE_TEXT_STORAGE)) return TEXT_TY_SubstitutedForm(to, from); return from; ];
§22. Blobs. That completes the compulsory services required for this KOV to function: from here on, the remaining routines provide definitions of text-related phrases in the Standard Rules.
What are the basic operations of text-handling? Clearly we want to be able to search, and replace, but that is left for the segment "RegExp.i6t" to handle. More basically we would like to be able to read and write characters from the text. But texts in I7 tend to be of natural language, rather than containing arbitrary material — that's indeed why we call them texts rather than strings. This means they are likely to be punctuated sequences of words, divided up perhaps into sentences and even paragraphs.
So we provide facilities which regard a text as being an array of "blobs", where a "blob" is a unit of text. The user can choose whether to see it as an array of characters, or words (of three different sorts: see the Inform documentation for details), or paragraphs, or lines.
Constant CHR_BLOB = 1; Construe as an array of characters Constant WORD_BLOB = 2; Of words Constant PWORD_BLOB = 3; Of punctuated words Constant UWORD_BLOB = 4; Of unpunctuated words Constant PARA_BLOB = 5; Of paragraphs Constant LINE_BLOB = 6; Of lines Constant REGEXP_BLOB = 7; Not a blob type as such, but needed as a distinct value
§23. Blob Access. The following routine runs a small finite-state-machine to count the number of blobs in a text, using any of the above blob types (except REGEXP_BLOB, which is used for other purposes). If the optional arguments ctxt and wanted are supplied, it also copies the text of blob number wanted (counting upwards from 1 at the start of the text) into the text ctxt. If the further optional argument rtxt is supplied, then ctxt is instead written with the original text txt as it would read if the blob in question were replaced with the text in rtxt.
Constant WS_BRM = 1; Constant SKIPPED_BRM = 2; Constant ACCEPTED_BRM = 3; Constant ACCEPTEDP_BRM = 4; Constant ACCEPTEDN_BRM = 5; Constant ACCEPTEDPN_BRM = 6; [ TEXT_TY_BlobAccess txt blobtype ctxt wanted rtxt p1 p2 cp1 cp2 r; if (txt==0) return 0; if (blobtype == CHR_BLOB) return TEXT_TY_CharacterLength(txt); cp1 = txt-->0; p1 = TEXT_TY_Temporarily_Transmute(txt); cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt); TEXT_TY_Transmute(ctxt); if (ctxt) BlkMakeMutable(ctxt); r = TEXT_TY_BlobAccessI(txt, blobtype, ctxt, wanted, rtxt); TEXT_TY_Untransmute(txt, p1, cp1); TEXT_TY_Untransmute(rtxt, p2, cp2); return r; ]; [ TEXT_TY_BlobAccessI txt blobtype ctxt wanted rtxt brm oldbrm ch i dsize blobcount gp cl j; dsize = PVFieldCapacity(txt); if ((rtxt) && (ctxt == 0)) "*** rtxt without ctxt ***"; brm = WS_BRM; for (i=0:i<dsize:i++) { ch = PVField(txt, i); if (ch == 0) break; oldbrm = brm; if (ch == 10 or 13 or 32 or 9) { if (oldbrm ~= WS_BRM) { gp = 0; for (j=i:j<dsize:j++) { ch = PVField(txt, j); if (ch == 0) { brm = WS_BRM; break; } if (ch == 10 or 13) { gp++; continue; } if (ch ~= 32 or 9) break; } ch = PVField(txt, i); if (j == dsize) brm = WS_BRM; switch (blobtype) { PARA_BLOB: if (gp >= 2) brm = WS_BRM; LINE_BLOB: if (gp >= 1) brm = WS_BRM; default: brm = WS_BRM; } } } else { gp = false; if ((blobtype == WORD_BLOB or PWORD_BLOB or UWORD_BLOB) && (ch == '.' or ',' or '!' or '?' or '-' or '/' or '"' or ':' or ';' or '(' or ')' or '[' or ']' or '{' or '}')) gp = true; switch (oldbrm) { WS_BRM: brm = ACCEPTED_BRM; if (blobtype == WORD_BLOB) { if (gp) brm = SKIPPED_BRM; } if (blobtype == PWORD_BLOB) { if (gp) brm = ACCEPTEDP_BRM; } SKIPPED_BRM: if (blobtype == WORD_BLOB) { if (gp == false) brm = ACCEPTED_BRM; } ACCEPTED_BRM: if (blobtype == WORD_BLOB) { if (gp) brm = SKIPPED_BRM; } if (blobtype == PWORD_BLOB) { if (gp) brm = ACCEPTEDP_BRM; } ACCEPTEDP_BRM: if (blobtype == PWORD_BLOB) { if (gp == false) brm = ACCEPTED_BRM; else { if ((ch == PVField(txt, i-1)) && (ch == '-' or '.')) blobcount--; blobcount++; } } ACCEPTEDN_BRM: if (blobtype == WORD_BLOB) { if (gp) brm = SKIPPED_BRM; } if (blobtype == PWORD_BLOB) { if (gp) brm = ACCEPTEDP_BRM; } ACCEPTEDPN_BRM: if (blobtype == PWORD_BLOB) { if (gp == false) brm = ACCEPTED_BRM; else { if ((ch == PVField(txt, i-1)) && (ch == '-' or '.')) blobcount--; blobcount++; } } } } if (brm == ACCEPTED_BRM or ACCEPTEDP_BRM) { if (oldbrm ~= brm) blobcount++; if ((ctxt) && (blobcount == wanted)) { if (rtxt) { if (cl+1 >= PVFieldCapacity(ctxt)) { if (SetPVFieldCapacity(ctxt, 2*cl) == false) break; } WritePVField(ctxt, cl, 0); TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB); cl = TEXT_TY_CharacterLength(ctxt); if (brm == ACCEPTED_BRM) brm = ACCEPTEDN_BRM; if (brm == ACCEPTEDP_BRM) brm = ACCEPTEDPN_BRM; } else { if (cl+1 >= PVFieldCapacity(ctxt)) { if (SetPVFieldCapacity(ctxt, 2*cl) == false) break; } WritePVField(ctxt, cl++, ch); } } else { if (rtxt) { if (cl+1 >= PVFieldCapacity(ctxt)) { if (SetPVFieldCapacity(ctxt, 2*cl) == false) break; } WritePVField(ctxt, cl++, ch); } } } else { if ((rtxt) && (brm ~= ACCEPTEDN_BRM or ACCEPTEDPN_BRM)) { if (cl+1 >= PVFieldCapacity(ctxt)) { if (SetPVFieldCapacity(ctxt, 2*cl) == false) break; } WritePVField(ctxt, cl++, ch); } } } if (ctxt) { if (cl+1 >= PVFieldCapacity(ctxt)) { SetPVFieldCapacity(ctxt, 2*cl); } WritePVField(ctxt, cl++, 0); } return blobcount; ];
§24. Get Blob. The front end which uses the above routine to read a blob. (Note that, for efficiency's sake, we read characters more directly.)
[ TEXT_TY_GetBlob ctxt txt wanted blobtype; if (txt==0) return; if (blobtype == CHR_BLOB) return TEXT_TY_GetCharacter(ctxt, txt, wanted); TEXT_TY_BlobAccess(txt, blobtype, ctxt, wanted); return ctxt; ];
§25. Replace Blob. The front end which uses the above routine to replace a blob. (Once again, characters are handled directly to avoid incurring all that overhead.)
[ TEXT_TY_ReplaceBlob blobtype txt wanted rtxt ctxt ilen rlen i p cp; TEXT_TY_Transmute(txt); cp = rtxt-->0; p = TEXT_TY_Temporarily_Transmute(rtxt); if (blobtype == CHR_BLOB) { ilen = TEXT_TY_CharacterLength(txt); rlen = TEXT_TY_CharacterLength(rtxt); wanted--; if ((wanted >= 0) && (wanted<ilen)) { if (rlen == 1) { WritePVField(txt, wanted, PVField(rtxt, 0)); } else { ctxt = CreatePV(TEXT_TY); TEXT_TY_Transmute(ctxt); if (SetPVFieldCapacity(ctxt, ilen+rlen+1)) { for (i=0:i<wanted:i++) WritePVField(ctxt, i, PVField(txt, i)); for (i=0:i<rlen:i++) WritePVField(ctxt, wanted+i, PVField(rtxt, i)); for (i=wanted+1:i<ilen:i++) WritePVField(ctxt, rlen+i-1, PVField(txt, i)); WritePVField(ctxt, rlen+ilen, 0); CopyPV(txt, ctxt); } DestroyPV(ctxt); } } } else { ctxt = CreatePV(TEXT_TY); TEXT_TY_BlobAccess(txt, blobtype, ctxt, wanted, rtxt); CopyPV(txt, ctxt); DestroyPV(ctxt); } TEXT_TY_Untransmute(rtxt, p, cp); ];
§26. Replace Text. This is the general routine which searches for any instance of ftxt, as a blob, in txt, and replaces it with the text rtxt. It works on any of the above blob-types, but two cases are special: first, if the blob-type is CHR_BLOB, then it can do more than search and replace for any instance of a single character: it can search and replace any instance of a substring, so that ftxt is not required to be only a single character. Second, if the blob-type is the special value REGEXP_BLOB then ftxt is interpreted as a regular expression rather than something literal to find: see "RegExp.i6t" for what happens next.
[ TEXT_TY_ReplaceText blobtype txt ftxt rtxt r p1 p2 cp1 cp2; TEXT_TY_Transmute(txt); cp1 = ftxt-->0; p1 = TEXT_TY_Temporarily_Transmute(ftxt); cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt); r = TEXT_TY_ReplaceTextI(blobtype, txt, ftxt, rtxt); TEXT_TY_Untransmute(ftxt, p1, cp1); TEXT_TY_Untransmute(rtxt, p2, cp2); return r; ]; [ TEXT_TY_ReplaceTextI blobtype txt ftxt rtxt ctxt csize ilen flen i cl mpos ch chm whitespace punctuation; if (blobtype == REGEXP_BLOB or CHR_BLOB) return TEXT_TY_Replace_RE(blobtype, txt, ftxt, rtxt); ilen = TEXT_TY_CharacterLength(txt); flen = TEXT_TY_CharacterLength(ftxt); ctxt = CreatePV(TEXT_TY); TEXT_TY_Transmute(ctxt); csize = PVFieldCapacity(ctxt); mpos = 0; whitespace = true; punctuation = false; for (i=0:i<=ilen:i++) { ch = PVField(txt, i); .MoreMatching; chm = PVField(ftxt, mpos++); if (mpos == 1) { switch (blobtype) { WORD_BLOB: if ((whitespace == false) && (punctuation == false)) chm = -1; } } whitespace = false; if (ch == 10 or 13 or 32 or 9) whitespace = true; punctuation = false; if (ch == '.' or ',' or '!' or '?' or '-' or '/' or '"' or ':' or ';' or '(' or ')' or '[' or ']' or '{' or '}') { if (blobtype == WORD_BLOB) chm = -1; punctuation = true; } if (ch == chm) { if (mpos == flen) { if (i == ilen) chm = 0; else chm = PVField(txt, i+1); if ((blobtype == CHR_BLOB) || (chm == 0 or 10 or 13 or 32 or 9) || (chm == '.' or ',' or '!' or '?' or '-' or '/' or '"' or ':' or ';' or '(' or ')' or '[' or ']' or '{' or '}')) { mpos = 0; cl = cl - (flen-1); WritePVField(ctxt, cl, 0); TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB); csize = PVFieldCapacity(ctxt); cl = TEXT_TY_CharacterLength(ctxt); continue; } } } else { mpos = 0; } if (cl+1 >= csize) { if (SetPVFieldCapacity(ctxt, 2*cl) == false) break; csize = PVFieldCapacity(ctxt); } WritePVField(ctxt, cl++, ch); } CopyPV(txt, ctxt); DestroyPV(ctxt); ];
§27. Character Length. When accessing at the character-by-character level, things are much easier and we needn't go through any finite state machine palaver.
[ TEXT_TY_CharacterLength txt ch i dsize p cp r; if (txt==0) return 0; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); dsize = PVFieldCapacity(txt); r = dsize; for (i=0:i<dsize:i++) { ch = PVField(txt, i); if (ch == 0) { r = i; break; } } TEXT_TY_Untransmute(txt, p, cp); return r; ]; [ TEXT_TY_Empty txt; if (txt==0) rtrue; if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) { if (txt-->1 == EMPTY_TEXT_PACKED) rtrue; rfalse; } if (TEXT_TY_CharacterLength(txt) == 0) rtrue; rfalse; ];
§28. Get Character. Characters in a text are numbered upwards from 1 by the users of this routine: which is why we subtract 1 when reading the array in the block-value, which counts from 0.
[ TEXT_TY_GetCharacter ctxt txt i ch p cp; if (txt==0) return 0; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); TEXT_TY_Transmute(ctxt); if ((i<=0) || (i>TEXT_TY_CharacterLength(txt))) ch = 0; else ch = PVField(txt, i-1); WritePVField(ctxt, 0, ch); WritePVField(ctxt, 1, 0); TEXT_TY_Untransmute(txt, p, cp); return ctxt; ];
§29. Casing. In many programming languages, characters are a distinct data type from strings, but not in I7. To I7, a character is simply a text which happens to have length 1 — this has its inefficiencies, but is conceptually easy for the user.
TEXT_TY_CharactersOfCase(txt, case) determines whether all the characters in txt are letters of the given casing: 0 for lower case, 1 for upper case. In the case of ZSCII, this is done correctly handling all of the European accented letters; in the case of Unicode, it follows the Unicode standard.
Note that there is no requirement for txt to be only a single character long.
[ TEXT_TY_CharactersOfCase txt case i ch len p cp r; if (txt==0) return 0; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); len = TEXT_TY_CharacterLength(txt); r = true; for (i=0:i<len:i++) { ch = PVField(txt, i); if ((ch) && (CharIsOfCase(ch, case) == false)) { r = false; break; } } TEXT_TY_Untransmute(txt, p, cp); return r; ];
§30. Change Case. We set ctxt to the text in txt, except that all the letters are converted to the case given (0 for lower, 1 for upper). The definition of what is a "letter", what case it has and what the other-case form is are as specified in the ZSCII and Unicode standards.
[ TEXT_TY_CharactersToCase ctxt txt case i ch len bnd pk cp; if (txt==0) return 0; cp = txt-->0; pk = TEXT_TY_Temporarily_Transmute(txt); TEXT_TY_Transmute(ctxt); len = TEXT_TY_CharacterLength(txt); if (SetPVFieldCapacity(ctxt, len+1)) { bnd = 1; for (i=0:i<len:i++) { ch = PVField(txt, i); if (case < 2) { WritePVField(ctxt, i, CharToCase(ch, case)); } else { WritePVField(ctxt, i, CharToCase(ch, bnd)); if (case == 2) { bnd = 0; if (ch == 0 or 10 or 13 or 32 or 9 or '.' or ',' or '!' or '?' or '-' or '/' or '"' or ':' or ';' or '(' or ')' or '[' or ']' or '{' or '}') bnd = 1; } if (case == 3) { if (ch ~= 0 or 10 or 13 or 32 or 9) { if (bnd == 1) bnd = 0; else { if (ch == '.' or '!' or '?') bnd = 1; } } } } } WritePVField(ctxt, len, 0); } TEXT_TY_Untransmute(txt, pk, cp); return ctxt; ];
§31. Concatenation. To concatenate two texts is to place one after the other: thus "green" concatenated with "horn" makes "greenhorn". In this routine, from_txt would be "horn", and is added at the end of to_txt, which is returned in its expanded state.
When the blob type is REGEXP_BLOB, the routine is used not for simple concatenation but to handle the concatenations occurring when a regular expression search-and-replace is going on: see "RegExp.i6t".
[ TEXT_TY_Concatenate to_txt from_txt blobtype ref_txt p cp r; if (to_txt==0) rfalse; if (from_txt==0) return to_txt; TEXT_TY_Transmute(to_txt); cp = from_txt-->0; p = TEXT_TY_Temporarily_Transmute(from_txt); r = TEXT_TY_ConcatenateI(to_txt, from_txt, blobtype, ref_txt); TEXT_TY_Untransmute(from_txt, p, cp); return r; ]; [ TEXT_TY_ConcatenateI to_txt from_txt blobtype ref_txt pos len ch i tosize x y case; switch(blobtype) { CHR_BLOB, 0: pos = TEXT_TY_CharacterLength(to_txt); len = TEXT_TY_CharacterLength(from_txt); if (SetPVFieldCapacity(to_txt, pos+len+1) == false) return to_txt; for (i=0:i<len:i++) { ch = PVField(from_txt, i); WritePVField(to_txt, i+pos, ch); } WritePVField(to_txt, len+pos, 0); return to_txt; REGEXP_BLOB: return TEXT_TY_RE_Concatenate(to_txt, from_txt, blobtype, ref_txt); } print "*** TEXT_TY_Concatenate used on impossible blob type ***^"; rfalse; ];