isalpha() and isdigit()




    When isalpha() is asked to process characters
    above ASCII 127 (characters that include what
    Americans are pleased to call "foreign"
    characters), it goofs badly ... though
    I believe it has improved since it was
    first released.

    Here's what it produces, above ASCII
    127. Characters within []s should
    not be there. Characters within
    ][s should be there and are not.

    [ƒ florin]
    [“ opening curly quote]
    ÁÃ
    ]ÀÂÄÅÆ[
    Ç
    ÈÉË
    ]ÊŒ[
    Í
    ]ÌÎÏ[
    Ñ
    Ó
    ]ÒÔÕÖØ[
    [× multiplication sign]
    ]Š[
    ÙÛ
    ]ÚÜ[
    Ý
    ]Ÿ[
    [ß beta]
    áâãå
    ]àäæ[
    ç
    éë
    ]èêœ[
    ï
    ]ìíî[
    ñ
    óõ
    ]óôöø[
    ]š[
    [÷ division sign]
    ùûü
    ]ú[
    ý
    ]ÿ[
    [þ ASCII 254, 'thorn' ... I think]
    ]Žž[ ; cap and lc Z with inverted circumflex
    ; they print in MSWord and browsers, but
    ; not in MvTools, Notepad,

    So here's a more reliable function. It runs just a
    hair slower than the built-in isalpha(). On my
    machine, for 30,000 iterations, the function body
    needs 7.0 (+/- 1) seconds; isalpha() needs
    6.5 (+/- 1) seconds.


    <MvFUNCTION NAME = "isalphabeta" PARAMETERS = "Char">
    <MvASSIGN NAME = "l.Allowed" VALUE =
    "{'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÁÃÀÂÄÅÆÇÈÉËÊŒÍÌÎÏÑÓÒÔÕÖØŠÙÛÚÜÝŸáâãåàäæçéëèêœïìíîñóõóôöøšùûüúýÿŽž'}">
    <MvFUNCRETURN VALUE = "{l.Char IN l.Allowed}">
    </MvFUNCTION>

    On the other hand, Miva's isdigit() returns true for
    the following, in addition to 0123456789:

    ASCII: 132 = „
    ASCII: 148 = ”
    ASCII: 195 = Ã
    ASCII: 197 = Å
    ASCII: 199 = Ç
    ASCII: 200 = È
    ASCII: 201 = É
    ASCII: 209 = Ñ
    ASCII: 211 = Ó
    ASCII: 213 = Õ
    ASCII: 221 = Ý
    ASCII: 223 = ß
    ASCII: 225 = á
    ASCII: 226 = â
    ASCII: 227 = ã
    ASCII: 233 = é
    ASCII: 235 = ë
    ASCII: 237 = í
    ASCII: 245 = õ
    ASCII: 247 = ÷
    ASCII: 249 = ù

    So, unless you want to invent a
    base-48 numbering system, and need some
    of these characters for your notation,
    you're better off with:

    <MvFUNCTION NAME = "IsADigit" PARAMETERS = "Char">
    <MvASSIGN NAME = "l.Allowed" VALUE = "{'0123456789'}">
    <MvFUNCRETURN VALUE = "{l.Char IN l.Allowed}">
    </MvFUNCTION>

    The running times are about the same as for
    the alpha functions, despite the much shorter
    string assignment in IsADigit().
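
    A listing like the isdigit() one above could be regenerated with a
    short loop over the upper half of the character table. This is only
    a sketch: it assumes the asciichar() built-in is available in your
    Miva version, and g.code and g.char are names made up for the
    example.

    ```mivascript
    <MvCOMMENT> Print every character from 128 to 255 that isdigit() accepts. </MvCOMMENT>
    <MvASSIGN NAME = "g.code" VALUE = "{128}">
    <MvWHILE EXPR = "{g.code LE 255}">
      <MvASSIGN NAME = "g.char" VALUE = "{asciichar(g.code)}">
      <MvIF EXPR = "{isdigit(g.char)}">
        <MvEVAL EXPR = "{'ASCII: ' $ g.code $ ' = ' $ g.char $ asciichar(10)}">
      </MvIF>
      <MvASSIGN NAME = "g.code" VALUE = "{g.code + 1}">
    </MvWHILE>
    ```

    Swapping isdigit() for isalpha() in the MvIF would produce the
    corresponding alpha listing.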

    Jack

    --
    MvTools, a Miva/HTML editor -- FREE, at
    http://papercitysoftware.com


    #2



    Jack,

    the big problem I see with your functions is that they only work
    on a single character whereas isdigit() and isalpha() work on the
    entire contents of a variable or literal.

    Example:

    <MvEVAL EXPR = "{
    isdigit('13579')
    $ isdigit('13f79')
    $ isalpha('aeiou')
    $ isalpha('ae5ou')
    }">

    Returns 1010 AKA True, False, True, False.

    But with your functions in the test script the translated code of:

    <MvEVAL EXPR = "{
    isadigit('13579')
    $ isadigit('13f79')
    $ isalphabeta('aeiou')
    $ isalphabeta('ae5ou')
    }">

    Returns 0000 AKA False, False, False, False.

    It is obvious to me from the parameter names you chose that
    you intended your function to work on a single character, but as
    such it is an inadequate replacement for the built-in functions.

    To do what they do and what you want at the same time, you
    would have to substring() through the variable passed to the
    function and then do your check again in a loop. This would be
    a lot slower naturally.

    Of course with the VM you could add an external function but I
    will restrict my comments to pure Miva script.

    Hmmnnn.

    I think it is safe to say that to accomplish what you want on
    small strings using your functions rewritten to loop through the
    string would be slower than isalpha() and isdigit() but not so
    horribly slower that they are unusable. For really large
    strings we are safe assuming that they would be really slow but I
    wonder how fast or slow isalpha() and isdigit() are on really
    long strings.

    I also wonder if a loop that uses multiple calls to glosub() to
    remove the acceptable list of characters from the string and then
    checks to see if anything is left would be faster on really large
    strings.

    I then wonder if a bit of research could determine what the
    average length of the string would need to be to make it more
    efficient to use the latter method than the former.

    If any of that proves true or of value you could do something
    that I will express in pseudo code.

    l.threshold = length beyond which we think glosubs would be faster
    l.len = len(l.string)

    if l.len lt 1
        It has no length, thus it can't be
        valid alpha, digit or whatever.
        return 0
    /if

    if l.len eq 1
        Do your IN check and return accordingly.
    /if

    if l.len gt 1 and l.len lt l.threshold
        loop through the string doing your
        check; if it ever proves false,
        return false, and if you get to
        the end without a false, return true.
    /if

    if l.len ge l.threshold
        loop through list of valid characters
            glosub() them out of the string
        /loop
        if new length is gt 0
            return 0
        else
            return 1
        /if
    /if
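
    The glosub() branch of the pseudocode above might look roughly like
    this for the digit case. It is a sketch, not tested code:
    IsAllDigits, l.Rest and l.ctr are names invented for the example,
    and it assumes glosub(string, search, replace) accepts an empty
    replacement string.

    ```mivascript
    <MvFUNCTION NAME = "IsAllDigits" PARAMETERS = "String">
      <MvCOMMENT> An empty string cannot be a valid run of digits. </MvCOMMENT>
      <MvIF EXPR = "{len(l.String) LT 1}">
        <MvFUNCRETURN VALUE = "{0}">
      </MvIF>
      <MvASSIGN NAME = "l.Allowed" VALUE = "{'0123456789'}">
      <MvASSIGN NAME = "l.Rest" VALUE = "{l.String}">
      <MvASSIGN NAME = "l.ctr" VALUE = "{1}">
      <MvCOMMENT> Strip each allowed character out of the working copy. </MvCOMMENT>
      <MvWHILE EXPR = "{l.ctr LE len(l.Allowed)}">
        <MvASSIGN NAME = "l.Rest" VALUE = "{glosub(l.Rest, substring(l.Allowed, l.ctr, 1), '')}">
        <MvASSIGN NAME = "l.ctr" VALUE = "{l.ctr + 1}">
      </MvWHILE>
      <MvCOMMENT> Anything left over was not an allowed character. </MvCOMMENT>
      <MvFUNCRETURN VALUE = "{len(l.Rest) EQ 0}">
    </MvFUNCTION>
    ```

    Under those assumptions, IsAllDigits('13579') would return 1 and
    IsAllDigits('13f79') would return 0. The loop runs once per allowed
    character rather than once per input character, which is why it
    might pay off only on long strings.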

    Now the big question: is all the code to do the checks going to
    slow things down so much that the function becomes too slow to
    even do short strings?

    I guess the answer depends on how much you really need that
    new functionality dealing with higher ASCII values.

    -Jeff Huber

    ----- Original Message -----
    From: R. Jackson Wilson
    To: Miva Users Forum
    Sent: Wednesday, June 05, 2002 10:43 AM
    Subject: [meu] isalpha() and isdigit()



      #3



      Hi, Jeff:

      As you correctly guessed, I meant the functions
      as a failsafe for folks who do a lot of work
      in "foreign" -- i.e., Latin -- alphabets.

      I obviously thought about doing whole strings,
      but substring() is so bloody slow (in all
      programming languages, I think) that I was
      discouraged.

      I didn't try the VM yet, to see whether it would
      make the parsing loop acceptably fast.

      Maybe I will. I'm in fiddlin' mode.

      Jack

      --
      MvTools, a Miva/HTML editor -- FREE, at
      http://papercitysoftware.com

        #4



        There is another problem - foreign languages rarely use the same
        character set. For Czech alone, I know of about a dozen different
        character sets, of which the most popular are Windows-1250, CP852,
        ISO-8859-2, UTF-8, Kamenicky, MacOS-CentralEurope, KOI8-CS, ICL,
        East8, Cork and, partially, CP437 or ISO-8859-1.

        The situation is the same not only for many other eastern and
        western languages, but also for English. Different operating
        systems, different browsers and their different versions use
        different encodings - from plain ASCII, CP437, CP850 and
        ISO-8859-1 (Latin I) to UTF-8 and many OS-dependent oddities.

        So in addition to parsing the strings, you would need to define the
        right mapping for the character set you have declared in the HTML
        header with a META tag.

        As for looping through the string, I think that a "relatively" quick
        solution would be to use the new glosub_array() function. Something
        along these lines:

        g.iso8859_1[] = 'ÁÃÀÂÄÅÆÇÈÉËÊÍÌÎÏ...áãà...'
        g.ascii_map[] = 'AAAAAAACEEEEIIII...aaa...'

        and then

        isalpha(glosub_array(string, g.iso8859_1, g.ascii_map))
        or
        isdigit(glosub_array(string, g.iso8859_1, g.ascii_map))

        No idea if it is fast enough to use extensively, but probably faster
        than substring().

        Ivo
        http://mivo.truxoft.com







          #5




          On Wed, 05 Jun 2002 16:50:58 -0400, R. Jackson Wilson wrote:

          >Hi, Jeff:
          >
          >As you correctly guessed, I meant the functions
          >as a failsafe for folks who do a lot of work
          >in "foreign" -- i.e., Latin -- alphabets.
          >
          >I obviously thought about doing whole strings,
          >but substring() is so bloody slow (in all
          >programming languages, I think) that I was
          >discouraged.
          >
          >I didn't try the VM yet, to see whether it would
          >make the parsing loop acceptably fast.
          >
          Jack, I just ran isalpha through my compiler speed testbed.
          (It was one of my trial functions, but I hadn't tried it on the
          finished product yet.)

          The results (<MvASSIGN NAME="result" VALUE="{isalpha(g.string)}">):

          String length    Time
                      1    0.73
                      6    1.12
                     25    2.17
                    100    6.80
                    500   30.00

          So, as expected, the function's running time is roughly linear in
          the length of its argument. The numbers are approximately the time
          for a single function call on my box (Athlon 800), in microseconds.
          As I mentioned the other day, forget to add the scope to the argument
          and you lose big time: the time for a single-character check blew out
          to 4.5 when I removed the scope from the argument.
          And the function is "smart". When I replaced an all-alpha 500-char
          string with one whose second character was a digit, the time dropped
          to 1.08.
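
          To make the scope point concrete, the two calls being compared
          would look something like the pair below. This is an
          illustrative sketch only; the sample value and the g.result
          name are assumptions, not the actual testbed code.

          ```mivascript
          <MvASSIGN NAME = "g.string" VALUE = "{'a'}">

          <MvCOMMENT> Scoped argument: the g. prefix names the variable's scope outright. </MvCOMMENT>
          <MvASSIGN NAME = "g.result" VALUE = "{isalpha(g.string)}">

          <MvCOMMENT> Unscoped argument: same answer, but the variable must first be looked up. </MvCOMMENT>
          <MvASSIGN NAME = "g.result" VALUE = "{isalpha(string)}">
          ```

          Both forms give the same result; going by the timings above,
          the difference is purely the cost of resolving the bare name.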


          [ dramatic design --- for web | print | theatre ]
          [ po box 3263 | christchurch 8015 | new zealand ]
          [ tel +64-3-360-2200 | cel +64-27-221-5053 ]
          [ fax +64-3-360-2201 | e [email protected] ]
          [ web http://www.dramatic.co.nz ]



            #6



            Hi, Jeff:

            Nice to be talking again!

            > the big problem I see with your functions is that they only work
            > on a single character whereas isdigit() and isalpha() work on the
            > entire contents of a variable or literal.

            And that, of course, depends on who and where you are.
            If the strings you are dealing with are all directly
            typeable from a standard keyboard, then isalpha is fine.

            If, on the other hand, you deal with a lot of strings
            that have "foreign" characters, then isalpha() is pretty
            worthless.

            Anyhow, just to give some specific shape to the issue, I
            ran up a compiled version that deals with strings rather
            than single characters.

            I did it in a straightahead sort of way for a couple of
            reasons 1) even MY penchant for fiddlin' has its limits
            and 2) since isalpha treats any string with a space in it
            as non-alpha, the real-world use of an IsAlphaBeta() function
            would probably be on reasonably short strings of characters.

            Anyway, compiled it's not a disaster: roughly one ms to deal with
            a string of 64 characters whose first member is a digit. (For
            a couple of reasons, the function works on the string back-to-
            front.)

            I guess where I'd come down is this: if you have to process strings
            that may well contain "foreign" characters, and you need to know
            if a string is an alphabeta thing, then a function whose body looks
            roughly like this:

            <MvASSIGN NAME = "l.Allowed" VALUE =
            "{'ÁÃÀÂÄÅÆÇÈÉËÊŒÍÌÎÏÑÓÒÔÕÖØŠÙÛÚÜÝŸáâãåàäæçéëèêœïìíîñóõóôöøšùûüúýÿŽž'}">
            <MvASSIGN NAME = "l.Length" VALUE = "{len(l.String)}">
            <MvASSIGN NAME = "l.ctr" VALUE = "{l.Length}">
            <MvASSIGN NAME = "l.Result" VALUE = "{1}">
            <MvCOMMENT> Walk the string back to front, one character at a time. </MvCOMMENT>
            <MvWHILE EXPR = "{l.ctr}">
              <MvASSIGN NAME = "l.Char" VALUE = "{substring(l.String, l.ctr, 1)}">
              <MvCOMMENT> Plain a-z and A-Z pass at once; anything else must be in l.Allowed. </MvCOMMENT>
              <MvIF EXPR = "{NOT((l.Char GE 'a' AND l.Char LE 'z') OR (l.Char GE 'A' AND l.Char LE 'Z'))}">
                <MvIF EXPR = "{NOT(l.Char IN l.Allowed)}">
                  <MvFUNCRETURN VALUE = "{0}">
                </MvIF>
              </MvIF>
              <MvASSIGN NAME = "l.ctr" VALUE = "{l.ctr - 1}">
            </MvWHILE>
            <MvFUNCRETURN VALUE = "{l.Result}">

            could be useful. But it will be a bit slow uncompiled,
            roughly .03 seconds on a 64-char string, if it returns
            true.

            Ivo: I am painfully aware of the complexity of the
            character-set problem. My starting point was that
            isalpha() is not adequate for the languages I happen
            to be able to recognize (though not read). So my
            thought was to see if an alternate function could be
            made that would handle in a "relatively" acceptable time
            (most) of the ASCII characters that show up in (most)
            Western languages in the most common character set.
            In fact, I'll be happy if it can handle my only other
            truly working language, Portuguese. Such a function
            is weak, but it's stronger (and slower) than isalpha().

            Jack

            MvTools, a Miva/HTML editor -- FREE, at
            http://papercitysoftware.com

              #7



              From: R. Jackson Wilson


              > Ivo: I am painfully aware of the complexity of the
              > character-set problem. My starting point was that
              > isalpha() is not adequate for the languages I happen
              > to be able to recognize (though not read).

              Did you check the one-liner I posted?

              isalpha(glosub_array(string, g.iso8859_1, g.ascii_map))

              Well, I know, it works only in v4, but maybe you can wait a few more
              days :-)

              Additionally, this way it is very easy to create multiple mappings -
              instead of using simple arrays (g.iso8859_1, g.ascii_map) as above, you
              create two-dimensional arrays containing mappings of several character
              sets.

              I bet that this code is much faster than any (even compiled) MvWHILE
              loop.

              Ivo
              http://mivo.truxoft.com


                #8



                From: R. Jackson Wilson

                [SNIP]
                > made that would handle in a "relatively" acceptable time
                > (most) of the ASCII characters that show up in (most)
                > Western languages in the most common character set.

                This is exactly the problem: character sets are very much platform
                dependent. You do much better by forcing the client to use the
                character set you choose. I recommend ISO-8859-1 (Latin I). Simply put
                the following line into your HTML header:

                <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

                ISO-8859-1 is today probably the most compatible charset - all modern
                browsers, on different operating systems and in different language
                versions, can usually display it without problems. I strongly
                discourage using Windows charsets. Even though Windows is the most
                common OS, there are far too many incompatibilities among its
                different versions and language editions.

                Ivo
                http://mivo.truxoft.com




                  #9




                  On Thu, 6 Jun 2002 03:57:39 +0200, Ivo Truxa wrote:

                  >From: R. Jackson Wilson
                  >
                  >[SNIP]
                  >> made that would handle in a "relatively" acceptable time
                  >> (most) of the ASCII characters that show up in (most)
                  >> Western languages in the most common character set.
                  >
                  >This is exactly the problem, the character sets are very much platform
                  >depending. You do much better when forcing the client to use the
                  >character you choose. I recommend the ISO-8859-1 (Latin I). Simply put
                  >the following line into your HTML header:
                  >
                  ><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
                  >
                  >ISO-8859-1 is today probably the most compatible charset - all modern
                  >browsers, on different OS and in different language versions can usually
                  >display it without problems. I strongly discourage from using Windows
                  >charsets. Even if Windows are most common OS, there are far too many
                  >incompatibilities among different versions and language versions of
                  >Windows.
                  >
                  To become properly Unicode-aware, Miva would need the ability to
                  recognize a Unicode character entity such as &####; (#= hex digit) as a
                  single character, including use as a token delimiter in gettoken etc.

                  [ dramatic design --- for web | print | theatre ]
                  [ po box 3263 | christchurch 8015 | new zealand ]
                  [ tel +64-3-360-2200 | cel +64-27-221-5053 ]
                  [ fax +64-3-360-2201 | e [email protected] ]
                  [ web http://www.dramatic.co.nz ]




                    #10



                    Hi Jack,

                    Jack wrote:
                    <snip>
                    and 2) since isalpha treats any string with a space in it
                    as non-alpha, the real-world use of an IsAlphaBeta() function
                    would probably be on reasonably short strings of characters.
                    <snip to end>

                    I hadn't considered the issue with spaces and isalpha().

                    Spaces can sure be a pain in the butt in Miva script.

                    I'm sure you remember our many tests, trials and tribulations
                    all the way back to htmlscript days with spaces as they relate
                    to zeros and null, and just recently I found out that index
                    expressions whose values end in a space cause problems.

                    But to be fair to isalpha() I guess it is reasonable for it to say
                    that a space is not an alpha character.

                    Darn spaces :)

                    -Jeff Huber

                    P.S. I think it is nice to be talking again as well.
                    jh




                      #11



                      From: Dramatic Design

                      > To become properly Unicode-aware, Miva would need the ability to
                      > recognize a Unicode character entity such as &####; (#= hex digit) as a
                      > single character, including use as a token delimiter in gettoken etc.

                      That's true, but the character sets mentioned are not Unicode. Handling
                      Unicode means quite a lot of overhead, and today you'll still not find
                      many applications that are Unicode-aware.

                      Time for everybody to learn Czech - it is enough with all these odd
                      character sets and languages like English and others; let's make them
                      illegal.

                      Ivo
                      http://mivo.truxoft.com





                        #12



                        Jeff said:

                        > I'm sure you remember our many tests, trials and tribulations
                        > all the way back to htmlscript days with spaces as relates to
                        > zeros and null

                        I'll never forget those wonderful nights and days. I
                        have an archive squirreled away someplace that contains
                        more than a hundred exchanges during just a few days.

                        The problem we discovered was that htmlscript was
                        (for the purposes of the EQ operator) treating
                        spaces, nulls, and zeroes as equal. Somewhere along
                        the way, they fixed the worst aspect of the problem,
                        and now spaces are OK. Zero and null are still equal.

                        Once was:

                        Space EQ 0   -> 1 ;aargh!
                        Space EQ ''  -> 1 ;aargh!
                        Space EQ ' ' -> 1 ;ok, obviously

                        '' EQ 0      -> 1 ;has to be watched
                        '' EQ ''     -> 1 ;ok, obviously
                        '' EQ ' '    -> 1 ;aargh!

                        0 EQ 0       -> 1 ;obviously
                        0 EQ ''      -> 1 ;has to be watched
                        0 EQ ' '     -> 0 ;aargh!

                        The current situation is:

                        Space EQ 0   -> 0 ;all good
                        Space EQ ''  -> 0
                        Space EQ ' ' -> 1

                        '' EQ 0      -> 1 ;has to be watched
                        '' EQ ''     -> 1 ;good
                        '' EQ ' '    -> 0 ;good

                        0 EQ 0       -> 1 ;good
                        0 EQ ''      -> 1 ;has to be watched
                        0 EQ ' '     -> 0 ;good
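
                        Since '' EQ 0 still evaluates to 1, one way to tell an
                        empty value from a zero is to test its length rather
                        than compare it. A minimal sketch; IsEmpty is a name
                        invented for the example, not a built-in.

                        ```mivascript
                        <MvFUNCTION NAME = "IsEmpty" PARAMETERS = "Value">
                          <MvCOMMENT> An empty string has length 0; the value 0 has length 1. </MvCOMMENT>
                          <MvFUNCRETURN VALUE = "{len(l.Value) EQ 0}">
                        </MvFUNCTION>
                        ```

                        With this, a caller can check IsEmpty(l.Field) first and
                        only then fall back on EQ for the numeric comparison.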

                        As for isalpha(), I should perhaps explain that my
                        interest in its shortcomings grows out of an effort to
                        reproduce FMT in a FMT(String, Pattern) function. I just
                        want the function to do a better job than FMT
                        when it deals with certain non-standard-English
                        characters.

                        For example, 'élève' FMT '0a' returns null.
                        My FMT('élève', '0a') returns élève.

                        That's enough to please me. What I want to end up
                        with is a compiled, stand-alone function, FMT(String, Pattern),
                        that will do everything that FMT does ... plus a bit
                        more. I've introduced two new limiters, '<<' and '>>',
                        that can be used with patterns. The '<<' limiter says,
                        make a string that includes ONLY the specified character
                        types. So FMT('ABCdefG', '<<[A]') would yield
                        'ABCG'. FMT('ABC(d)/ef', '<<[aS]') would yield
                        '(d)/ef', and so on. The '>>' limiter works in the reverse
                        way. FMT('ABCdefG', '>>[A]') will return 'def', and
                        FMT('ABC(d)/ef', '>>[Sa]') will return 'ABC'.
                        I always missed such limiters in the old FMT. They
                        are essentially strippers. '<<[pattern]' says "strip
                        everything BUT characters of the type given in the
                        pattern." '>>' says "strip all characters of the
                        type given in the pattern." (C++ coders will recognize
                        the '<<' and '>>' as mimicking set operators that add
                        to or remove elements from a set.)
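
                        The '<<' idea can be sketched as a plain character
                        filter. This is not the code in FMT.mvc, which isn't
                        shown here; KeepOnly and its parameters are
                        hypothetical, and the caller passes the allowed
                        characters as a literal string instead of a pattern
                        such as '[A]'.

                        ```mivascript
                        <MvFUNCTION NAME = "KeepOnly" PARAMETERS = "String, Allowed">
                          <MvASSIGN NAME = "l.Out" VALUE = "{''}">
                          <MvASSIGN NAME = "l.ctr" VALUE = "{1}">
                          <MvWHILE EXPR = "{l.ctr LE len(l.String)}">
                            <MvASSIGN NAME = "l.Char" VALUE = "{substring(l.String, l.ctr, 1)}">
                            <MvCOMMENT> Copy the character only if it is in the allowed set. </MvCOMMENT>
                            <MvIF EXPR = "{l.Char IN l.Allowed}">
                              <MvASSIGN NAME = "l.Out" VALUE = "{l.Out $ l.Char}">
                            </MvIF>
                            <MvASSIGN NAME = "l.ctr" VALUE = "{l.ctr + 1}">
                          </MvWHILE>
                          <MvFUNCRETURN VALUE = "{l.Out}">
                        </MvFUNCTION>
                        ```

                        Called as KeepOnly('ABCdefG', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
                        this would give 'ABCG', the same result as the
                        '<<[A]' example. Wrapping the IN test in NOT() turns
                        it into the '>>' behaviour.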

                        I can't attach FMT.mvc, I guess. But I am yearning for
                        people to test it and show me where it breaks. So if
                        anyone is interested, please let me know and I'll ship
                        you the function.

                        Ivo said:

                        > Time for everybody to learn Czech - it is enough with all these odd
                        > character sets and languages like English and others; lets make them
                        > illegal.

                        No doubt a fine idea. But I suspect it won't fly, at
                        least in the short run. It does prompt me to ramble,
                        however.

                        <MvRAMBLE>

                        It is interesting to think about why certain
                        languages achieve predominance at certain
                        junctures -- Latin at a certain point in
                        European history, or English in today's
                        world. It's a kind of joke, and a revealing
                        joke, on the idea of the "survival of the fittest."

                        There was nothing "fitter" about Latin that might
                        have enabled it to win a competitive struggle with
                        other languages in the Mediterranean and West-
                        European world. Similarly, there is nothing about
                        English that makes it better suited to be a
                        "world" language than, say, Czech. Czech and English
                        probably do their linguistic jobs equally well (or
                        equally poorly). It's obviously a question
                        of money and power. English is a "fitter" language
                        than Czech for the simple reason that native speakers
                        of English have more money and power than native speakers
                        of Czech. (_Some_ native speakers of English. I am one
                        of them, but I have neither money nor power ... though
                        I guess by world standards, I have a lot of both.)

                        But consider the basic idea of the "survival of the
                        fittest." What was the meaning of "fittest"? Most
                        likely to survive. So what was the meaning of the
                        "survival of the fittest"? The survival of those most
                        likely to survive. To his great credit, Darwin understood
                        that this was not an explanatory principle, but a mere
                        trivial description: survival is a selective process.

                        Back to language. What determines whether a language
                        survives and prospers has nothing to do with features
                        inherent to the language, but instead with whether the
                        people who use the language survive and prosper. As for
                        English, its time has come because the time of its native
                        users has come -- for better or worse. As for Czech, only
                        time will tell -- for better or worse.
                        </MvRAMBLE>

                        Jack

                        --
                        MvTools, a Miva/HTML editor -- FREE, at
                        http://papercitysoftware.com

                          #13




                          Jack Said:

                          >I'll never forget those wonderful nights and days. I
                          >have an archive squirreled away someplace that contains
                          >more than a hundred exchanges during just a few days.

                          Ah the good old days when one could diable late into the
                          night with nary a care. :)

                          Jack also said:
                          >As for isalpha(), I should perhaps explain that my
                          >interest in its shortcomings grows out of an effort to
                          >reproduce FMT in a FMT(String, Pattern) function.

                          Jack I remember your fondness for FMT over these years
                          and think you are handling its demise quite gracefully. I
                          always liked FMT well enough, but really longed for, and
                          am still holding out some small hope for, regular expressions
                          to be supported someday in some manner as a core part of
                          Miva script. That would be wonderful.

                          -Jeff Huber



