#2 · 06-05-02, 12:37 PM

isalpha() and isdigit()

Jack,

the big problem I see with your functions is that they only work
on a single character whereas isdigit() and isalpha() work on the
entire contents of a variable or literal.

Example:

<MvEVAL EXPR = "{
isdigit('13579')
$ isdigit('13f79')
$ isalpha('aeiou')
$ isalpha('ae5ou')
}">

Returns 1010 AKA True, False, True, False.

But with your functions in the test script the translated code of:

<MvEVAL EXPR = "{
isadigit('13579')
$ isadigit('13f79')
$ isalphabeta('aeiou')
$ isalphabeta('ae5ou')
}">

Returns 0000 AKA False, False, False, False.

It is obvious to me from the parameter names you chose that
you intended your functions to work on a single character, but as
such they are an inadequate replacement for the built-in functions.

To do what they do and what you want at the same time, you
would have to substring() through the variable passed to the
function and then do your check again in a loop. This would be
a lot slower naturally.

Of course with the VM you could add an external function but I
will restrict my comments to pure Miva script.

Hmmnnn.

I think it is safe to say that to accomplish what you want on
small strings using your functions rewritten to loop through the
string would be slower than isalpha() and isdigit() but not so
horribly slower that they are unusable. For really large
strings we are safe assuming that they would be really slow but I
wonder how fast or slow isalpha() and isdigit() are on really
long strings.

I also wonder if a loop that uses multiple calls to glosub() to
remove the acceptable list of characters from the string and then
checks to see if anything is left would be faster on really large
strings.
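
A minimal sketch of that glosub() idea in Miva script, for the digit case
only. The function name IsAllDigitsByStrip and the variables l.Rest and
l.Pos are hypothetical; only glosub(), substring() and len() come from the
discussion, and none of this has been timed.

```mivascript
<MvFUNCTION NAME = "IsAllDigitsByStrip" PARAMETERS = "String">
  <MvCOMMENT> An empty string cannot be all digits. </MvCOMMENT>
  <MvIF EXPR = "{len(l.String) LT 1}">
    <MvFUNCRETURN VALUE = "{0}">
  </MvIF>
  <MvASSIGN NAME = "l.Allowed" VALUE = "{'0123456789'}">
  <MvASSIGN NAME = "l.Rest" VALUE = "{l.String}">
  <MvASSIGN NAME = "l.Pos" VALUE = "{1}">
  <MvCOMMENT> One glosub() call per allowed character, however long the input is. </MvCOMMENT>
  <MvWHILE EXPR = "{l.Pos LE len(l.Allowed)}">
    <MvASSIGN NAME = "l.Rest" VALUE = "{glosub(l.Rest, substring(l.Allowed, l.Pos, 1), '')}">
    <MvASSIGN NAME = "l.Pos" VALUE = "{l.Pos + 1}">
  </MvWHILE>
  <MvCOMMENT> Anything left over was not an allowed character. </MvCOMMENT>
  <MvFUNCRETURN VALUE = "{len(l.Rest) EQ 0}">
</MvFUNCTION>
```

The loop count here depends on the size of the allowed list, not on the
length of the input, which is why it might pay off only on long strings.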

I then wonder if a bit of research could determine what the
average length of the string would need to be to make it more
efficient to use the latter method than the former.

If any of that proves true or of value you could do something
that I will express in pseudo code.

l.threshold = length that is so long we think glosubs would be faster.
l.len = len( l.string )

if l.len lt 1
    It has no length thus can't be
    valid alpha, digit or whatever.
    return 0
/if

if l.len eq 1
    Do your IN check and return accordingly.
/if

if l.len gt 1 and l.len lt l.threshold
    loop through the string doing your
    check and if ever it proves false
    then return false and if you get to
    the end without a false return true.
/if

if l.len ge l.threshold
    loop through list of valid characters
        glosub() them out of the string
    /loop
    if new length is gt 0
        return 0
    else
        return 1
    /if
/if

Now the big question: is all the code to do the checks going to
slow things down so much that the function becomes too slow to
even do short strings?

I guess the answer depends on how much you really need that
new functionality dealing with higher ASCII values.

-Jeff Huber

----- Original Message -----
From: R. Jackson Wilson <[email protected]>
To: Miva Users Forum <[email protected]>
Sent: Wednesday, June 05, 2002 10:43 AM
Subject: [meu] isalpha() and isdigit()

When isalpha() is asked to process characters
above ASCII 127 (characters that include what
Americans are pleased to call "foreign"
characters), it goofs badly ... though
I believe it has improved since it was
first released.

Here's what it produces, above ASCII
127. Characters within []s should
not be there. Characters within
][s should be there and are not.

[ƒ florin]
[" opening curly quote]
ÁÃ
]ÀÂÄÅÆ[
Ç
ÈÉË
]ÊŒ[
Í
]ÌÎÏ[
Ñ
Ó
]ÒÔÕÖØ[
[× multiplication sign]
]Š[
ÙÛ
]ÚÜ[
Ý
]Ÿ[
[ß beta]
áâãå
]àäæ[
ç
éë
]èêœ[
ï
]ìíî[
ñ
óõ
]óôöø[
]š[
[÷ division sign]
ùûü
]ú[
ý
]ÿ[
[þ ASCII 222, 'thorn' ... I think]
]Žž[ ; cap and lc Z with inverted circumflex
; they print in MSWord and browsers, but
; not in MvTools, Notepad,

So here's a more reliable function. It runs just a
hair slower than the built-in isalpha(). On my
machine, for 30,000 iterations, the function body
needs 7.0 (+/- 1) seconds; isalpha() needs
6.5 (+/- 1) seconds.

<MvFUNCTION NAME = "isalphabeta" PARAMETERS = "Char">
<MvASSIGN NAME = "l.Allowed" VALUE =
"{'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÁÃÀÂÄÅÆÇÈÉËÊŒÍÌÎÏÑÓÒÔÕÖØŠÙÛÚÜÝŸáâãåàäæçéëèêœïìíîñóõóôöøšùûüúýÿŽž'}">
<MvFUNCRETURN VALUE = "{l.Char IN l.Allowed}">
</MvFUNCTION>

On the other hand, Miva's isdigit() returns true for
the following, in addition to 0123456789:

ASCII: 132 = „
ASCII: 148 = ”
ASCII: 195 = Ã
ASCII: 197 = Å
ASCII: 199 = Ç
ASCII: 200 = È
ASCII: 201 = É
ASCII: 209 = Ñ
ASCII: 211 = Ó
ASCII: 213 = Õ
ASCII: 221 = Ý
ASCII: 223 = ß
ASCII: 225 = á
ASCII: 226 = â
ASCII: 227 = ã
ASCII: 233 = é
ASCII: 235 = ë
ASCII: 237 = í
ASCII: 245 = õ
ASCII: 247 = ÷
ASCII: 249 = ù

So, unless you want to invent a
base-48 numbering system, and need some
of these characters for your notation,
you're better off with:

<MvFUNCTION NAME = "IsADigit" PARAMETERS = "Char">
<MvASSIGN NAME = "l.Allowed" VALUE = "{'0123456789'}">
<MvFUNCRETURN VALUE = "{l.Char IN l.Allowed}">
</MvFUNCTION>

The running times are about the same as for
the alpha functions, despite the much shorter
string assignment in IsADigit().

Jack

--
MvTools, a Miva/HTML editor -- FREE, at
http://papercitysoftware.com

#3 · 06-05-02, 12:50 PM

isalpha() and isdigit()

Hi, Jeff:

As you correctly guessed, I meant the functions
as a failsafe for folks who do a lot of work
in "foreign" -- i.e., Latin -- alphabets.

I obviously thought about doing whole strings,
but substring() is so bloody slow (in all
programming languages, I think) that I was
discouraged.

I didn't try the VM yet, to see whether it would
make the parsing loop acceptably fast.

Maybe I will. I'm in fiddlin' mode.

Jack


#4 · 06-05-02, 01:32 PM

isalpha() and isdigit()

There is another problem: foreign languages rarely use the same
character set. For the Czech language alone, for example, I know of about
a dozen different character sets, of which the most popular are
Windows-1250, CP852, ISO-8859-2, UTF-8, Kamenicky, MacOS-CentralEurope,
KOI8-CS, ICL, East8, Cork and, partially, CP437 or ISO-8859-1.

The same situation applies not only to many other eastern and western
languages, but also to English. Different OSes, different browsers and
their different versions use different encodings, ranging from plain
ASCII, CP437, CP850 and ISO-8859-1 (Latin 1) to UTF-8 and many
OS-dependent oddities.

So in addition to parsing the strings, you would need to define the
right mapping for the character set you have declared in the HTML
header with a META tag.

As for looping through the string, I think a "relatively" quick
solution would be to use the new glosub_array() function. Something
like this:

g.iso8859_1[] = 'ÁÃÀÂÄÅÆÇÈÉËÊÍÌÎÏ...áãà...'
g.ascii_map[] = 'AAAAAAACEEEEIIII...aaa...'

and then

isalpha(glosub_array(string, g.iso8859_1, g.ascii_map))
or
isdigit(glosub_array(string, g.iso8859_1, g.ascii_map))

No idea if it is fast enough to use extensively, but probably faster
than substring().

Ivo
http://mivo.truxoft.com


#5 · 06-05-02, 02:02 PM

isalpha() and isdigit()

On Wed, 05 Jun 2002 16:50:58 -0400, R. Jackson Wilson wrote:

>Hi, Jeff:
>
>As you correctly guessed, I meant the functions
>as a failsafe for folks who do a lot of work
>in "foreign" -- i.e., Latin -- alphabets.
>
>I obviously thought about doing whole strings,
>but substring() is so bloody slow (in all
>programming languages, I think) that I was
>discouraged.
>
>I didn't try the VM yet, to see whether it would
>make the parsing loop acceptably fast.
>
Jack, I just ran isalpha through my compiler speed testbed
(It was one of my trial functions, but I hadn't tried it on the
finished product yet.)

The results (<MvASSIGN NAME="result" VALUE="{isalpha(g.string)}">):

String length    Time
            1    0.73
            6    1.12
           25    2.17
          100    6.80
          500   30.00

So as expected, the function speed is pretty well linear to the
length of its argument. The numbers roughly equal time for a single
function call on my box (Athlon 800) in microseconds.
As I mentioned the other day, forget to add the scope to the argument and
you lose bigtime. The speed for a single-character check blew out to 4.5 when
I removed the scope from the argument.
And the function is "smart". When I replaced an all-alpha 500 char string with
one where the second character was a digit, the time dropped to 1.08.
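
To make the scoping remark concrete, here is a small sketch. The first call
names its argument with the g. scope, as in the timing test; the second
leaves the scope off, which is the slower form described above. The
variable g.string and its value are purely illustrative.

```mivascript
<MvASSIGN NAME = "g.string" VALUE = "{'abcdef'}">
<MvCOMMENT> Scoped argument: the form used for the timings above. </MvCOMMENT>
<MvASSIGN NAME = "g.result" VALUE = "{isalpha(g.string)}">
<MvCOMMENT> Unscoped argument: the engine must first work out which scope "string" lives in. </MvCOMMENT>
<MvASSIGN NAME = "g.result" VALUE = "{isalpha(string)}">
```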

[ dramatic design --- for web | print | theatre ]
[ po box 3263 | christchurch 8015 | new zealand ]
[ tel +64-3-360-2200 | cel +64-27-221-5053 ]
[ fax +64-3-360-2201 | e [email protected] ]
[ web http://www.dramatic.co.nz ]


#6 · 06-05-02, 05:23 PM

isalpha() and isdigit()

Hi, Jeff:

Nice to be talking again!

> the big problem I see with your functions is that they only work
> on a single character whereas isdigit() and isalpha() work on the
> entire contents of a variable or literal.

And that, of course, depends on who and where you are.
If the strings you are dealing with are all directly
typeable from a standard keyboard, then isalpha is fine.

If, on the other hand, you deal with a lot of strings
that have "foreign" characters, then isalpha() is pretty
worthless.

Anyhow, just to give some specific shape to the issue, I
ran up a compiled version that deals with strings rather
than single characters.

I did it in a straightahead sort of way for a couple of
reasons 1) even MY penchant for fiddlin' has its limits
and 2) since isalpha treats any string with a space in it
as non-alpha, the real-world use of an IsAlphaBeta() function
would probably be on reasonably short strings of characters.

Anyway, compiled it's not a disaster: roughly one ms to deal with
a string of 64 characters whose first member is a digit. (For
a couple of reasons, the function works on the string back-to-
front.)

I guess where I'd come down is this: if you have to process strings
that may well contain "foreign" characters, and you need to know
if a string is an alphabeta thing, then a function whose body looks
roughly like this:

<MvASSIGN NAME = "l.Allowed" VALUE =
"{'ÁÃÀÂÄÅÆÇÈÉËÊŒÍÌÎÏÑÓÒÔÕÖØŠÙÛÚÜÝŸáâãåàäæçéëèêœïìíîñóõóôöøšùûüúýÿŽž'}">
<MvASSIGN NAME = "l.Length" VALUE = "{len(l.String)}">
<MvCOMMENT> Walk the string back to front, starting at the last character. </MvCOMMENT>
<MvASSIGN NAME = "l.ctr" VALUE = "{l.Length}">
<MvASSIGN NAME = "l.Result" VALUE = "{1}">
<MvWHILE EXPR = "{l.ctr}">
  <MvASSIGN NAME = "l.Char" VALUE = "{substring(l.String, l.ctr, 1)}">
  <MvCOMMENT> Plain a-z and A-Z pass at once; anything else must be in l.Allowed. </MvCOMMENT>
  <MvIF EXPR = "{NOT((l.Char GE 'a' AND l.Char LE 'z') OR (l.Char GE 'A' AND l.Char LE 'Z'))}">
    <MvIF EXPR = "{NOT(l.Char IN l.Allowed)}">
      <MvFUNCRETURN VALUE = "{0}">
      <MvWHILESTOP>
    </MvIF>
  </MvIF>
  <MvASSIGN NAME = "l.ctr" VALUE = "{l.ctr - 1}">
</MvWHILE>
<MvFUNCRETURN VALUE = "{l.Result}">

could be useful. But it will be a bit slow uncompiled,
roughly .03 seconds on a 64-char string, if it returns
true.

Ivo: I am painfully aware of the complexity of the
character-set problem. My starting point was that
isalpha() is not adequate for the languages I happen
to be able to recognize (though not read). So my
thought was to see if an alternate function could be
made that would handle in a "relatively" acceptable time
(most) of the ASCII characters that show up in (most)
Western languages in the most common character set.
In fact, I'll be happy if it can handle my only other
truly working language, Portuguese. Such a function
is weak, but it's stronger (and slower) than isalpha().

Jack


#7 · 06-05-02, 05:34 PM

isalpha() and isdigit()

From: R. Jackson Wilson

> Ivo: I am painfully aware of the complexity of the
> character-set problem. My starting point was that
> isalpha() is not adequate for the languages I happen
> to be able to recognize (though not read).

Did you check the one-liner I posted?

isalpha(glosub_array(string, g.iso8859_1, g.ascii_map))

Well, I know, it works only in v4, but maybe you can wait a few more days
:-)

Additionally, this way it is very easy to create multiple mappings:
instead of using simple arrays (g.iso8859_1, g.ascii_map) as above, you
create two-dimensional arrays containing the mappings of several character
sets.

I bet that this code is much faster than any (even compiled) MvWHILE
loop.

Ivo
http://mivo.truxoft.com

#8 · 06-05-02, 05:57 PM

isalpha() and isdigit()

From: R. Jackson Wilson

[SNIP]
> made that would handle in a "relatively" acceptable time
> (most) of the ASCII characters that show up in (most)
> Western languages in the most common character set.

This is exactly the problem: character sets are very much
platform-dependent. You do much better by forcing the client to use the
character set you choose. I recommend ISO-8859-1 (Latin 1). Simply put
the following line into your HTML header:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

ISO-8859-1 is today probably the most compatible charset: all modern
browsers, on different OSes and in different language versions, can usually
display it without problems. I strongly discourage using Windows
charsets. Even though Windows is the most common OS, there are far too many
incompatibilities among the different versions and language versions of
Windows.

Ivo
http://mivo.truxoft.com

#9 · 06-05-02, 09:40 PM

isalpha() and isdigit()

On Thu, 6 Jun 2002 03:57:39 +0200, Ivo Truxa wrote:

>From: R. Jackson Wilson
>
>[SNIP]
>> made that would handle in a "relatively" acceptable time
>> (most) of the ASCII characters that show up in (most)
>> Western languages in the most common character set.
>
>This is exactly the problem, the character sets are very much platform
>depending. You do much better when forcing the client to use the
>character you choose. I recommend the ISO-8859-1 (Latin I). Simply put
>the following line into your HTML header:
>
><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
>
>ISO-8859-1 is today probably the most compatible charset - all modern
>browsers, on different OS and in different language versions can usually
>display it without problems. I strongly discourage from using Windows
>charsets. Even if Windows are most common OS, there are far too many
>incompatibilities among different versions and language versions of
>Windows.
>
To become properly Unicode-aware, Miva would need the ability to
recognize a Unicode character entity such as &####; (# = hex digit) as a
single character, including use as a token delimiter in gettoken etc.

[ dramatic design --- for web | print | theatre ]
[ po box 3263 | christchurch 8015 | new zealand ]
[ tel +64-3-360-2200 | cel +64-27-221-5053 ]
[ fax +64-3-360-2201 | e [email protected] ]
[ web http://www.dramatic.co.nz ]


#10 · 06-05-02, 11:51 PM

isalpha() and isdigit()

Hi Jack,

Jack wrote:
<snip>
and 2) since isalpha treats any string with a space in it
as non-alpha, the real-world use of an IsAlphaBeta() function
would probably be on reasonably short strings of characters.
<snip to end>

I hadn't considered the issue with spaces and isalpha().

Spaces can sure be a pain in the butt in Miva script.

I'm sure you remember our many tests, trials and tribulations
all the way back to htmlscript days with spaces as they relate to
zeros and null. And just recently I found out that index
expressions that have values that end in a space cause
problems.

But to be fair to isalpha() I guess it is reasonable for it to say
that a space is not an alpha character.

Darn spaces :)

-Jeff Huber

P.S. I think it is nice to be talking again as well.
jh

#11 · 06-06-02, 03:32 AM

isalpha() and isdigit()

From: Dramatic Design [mailto:[email protected]]

> To become properly Unicode-aware, Miva would need the ability to
> rcognize an unicode character entity such as &####; (#= hex digit) as a
> single character, including use as a token delimiter in gettoken etc.

That's true, but the character sets mentioned are not Unicode. Handling
Unicode means quite a lot of overhead, and today you'll still not find
many applications that are Unicode-aware.

Time for everybody to learn Czech - it is enough with all these odd
character sets and languages like English and others; let's make them
illegal.

Ivo
http://mivo.truxoft.com

#12 · 06-06-02, 04:19 PM

isalpha() and isdigit()

Jeff said:

> I'm sure you remember our many tests, trials and tribulations
> all the way back to htmlscript days with spaces as relates to
> zeros and null

I'll never forget those wonderful nights and days. I
have an archive squirreled away someplace that contains
more than a hundred exchanges during just a few days.

The problem we discovered was that htmlscript was
(for the purposes of the EQ operator) treating
spaces, nulls, and zeroes as equal. Somewhere along
the way, they fixed the worst aspect of the problem,
and now spaces are OK. Zero and null are still equal.

Once was:

Space EQ 0    -> 1  ;aargh!
Space EQ ''   -> 1  ;aargh!
Space EQ ' '  -> 1  ;ok, obviously

''    EQ 0    -> 1  ;has to be watched
''    EQ ''   -> 1  ;ok, obviously
''    EQ ' '  -> 1  ;aargh!

0     EQ 0    -> 1  ;obviously
0     EQ ''   -> 1  ;has to be watched
0     EQ ' '  -> 0  ;aargh!

The current situation is:

Space EQ 0    -> 0  ;all good
Space EQ ''   -> 0
Space EQ ' '  -> 1

''    EQ 0    -> 1  ;has to be watched
''    EQ ''   -> 1  ;good
''    EQ ' '  -> 0  ;good

0     EQ 0    -> 1  ;good
0     EQ ''   -> 1  ;has to be watched
0     EQ ' '  -> 0  ;good
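
For anyone who wants to re-run these comparisons on their own engine, a
minimal sketch follows. The variable names g.Space and g.Null are
hypothetical, and no expected output is claimed here, since the whole point
is that the results have differed between versions.

```mivascript
<MvASSIGN NAME = "g.Space" VALUE = "{' '}">
<MvASSIGN NAME = "g.Null" VALUE = "{''}">
<MvCOMMENT> Each MvEVAL prints one row of the table: three results, comma-separated. </MvCOMMENT>
<MvEVAL EXPR = "{(g.Space EQ 0) $ ',' $ (g.Space EQ '') $ ',' $ (g.Space EQ ' ')}"><br>
<MvEVAL EXPR = "{(g.Null EQ 0) $ ',' $ (g.Null EQ '') $ ',' $ (g.Null EQ ' ')}"><br>
<MvEVAL EXPR = "{(0 EQ 0) $ ',' $ (0 EQ '') $ ',' $ (0 EQ ' ')}"><br>
```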

As for isalpha(), I should perhaps explain that my
interest in its shortcomings grows out of an effort to
reproduce FMT in a FMT(String, Pattern) function. I just
want the function to do a better job than FMT
when it deals with certain non-standard-English
characters.

For example, 'élève' FMT '0a' returns null.
My FMT('élève', '0a') returns élève.

That's enough to please me. What I want to end up
with is a compiled, stand-alone function, FMT(String, Pattern),
that will do everything that FMT does ... plus a bit
more. I've introduced two new limiters, '<<' and '>>',
that can be used with patterns. The '<<' limiter says,
make a string that includes ONLY the specified character
types. So FMT('ABCdefG', '<<[A]') would yield
'ABCG'. FMT('ABC(d)/ef', '<<[aS]') would yield
'(d)/ef', and so on. The '>>' limiter works in the reverse
way. FMT('ABCdefG', '>>[A]') will return 'def', and
FMT('ABC(d)/ef', '>>[Sa]') will return 'ABC'.
I always missed such limiters in the old FMT. They
are essentially strippers. '<<[pattern]' says "strip
everything BUT characters of the type given in the
pattern." '>>' says "strip all characters of the
type given in the pattern." (C++ coders will recognize
the '<<' and '>>' as mimicking set operators that add
to or remove elements from a set.)
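
The two strippers can be sketched as a single loop. This is not the FMT
function itself, which is not shown in this thread: the name FilterChars
and its parameters are hypothetical, and it takes an explicit list of
characters instead of parsing a pattern such as '[A]' or '[aS]'. Keep = 1
plays the role of '<<' and Keep = 0 the role of '>>'.

```mivascript
<MvFUNCTION NAME = "FilterChars" PARAMETERS = "String, Chars, Keep">
  <MvASSIGN NAME = "l.Out" VALUE = "{''}">
  <MvASSIGN NAME = "l.Length" VALUE = "{len(l.String)}">
  <MvASSIGN NAME = "l.ctr" VALUE = "{1}">
  <MvWHILE EXPR = "{l.ctr LE l.Length}">
    <MvASSIGN NAME = "l.Char" VALUE = "{substring(l.String, l.ctr, 1)}">
    <MvCOMMENT> l.Found is 1 when the character is in the list, 0 otherwise. </MvCOMMENT>
    <MvASSIGN NAME = "l.Found" VALUE = "{(l.Char IN l.Chars) GT 0}">
    <MvCOMMENT> Keep = 1 copies listed characters; Keep = 0 copies all the others. </MvCOMMENT>
    <MvIF EXPR = "{l.Found EQ l.Keep}">
      <MvASSIGN NAME = "l.Out" VALUE = "{l.Out $ l.Char}">
    </MvIF>
    <MvASSIGN NAME = "l.ctr" VALUE = "{l.ctr + 1}">
  </MvWHILE>
  <MvFUNCRETURN VALUE = "{l.Out}">
</MvFUNCTION>
```

With the capital letters A through Z as the list, FilterChars('ABCdefG',
..., 1) should mirror the '<<[A]' example and give 'ABCG', and the same
call with Keep = 0 should mirror '>>[A]' and give 'def'.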

I can't attach FMT.mvc, I guess. But I am yearning for
people to test it and show me where it breaks. So if
anyone is interested, please let me know and I'll ship
you the function.

Ivo said:

> Time for everybody to learn Czech - it is enough with all these odd
> character sets and languages like English and others; lets make them
> illegal.

No doubt a fine idea. But I suspect it won't fly, at
least in the short run. It does prompt me to ramble,
however.

<MvRAMBLE>

It is interesting to think about why certain
languages achieve predominance at certain
junctures -- Latin at a certain point in
European history, or English in today's
world. It's a kind of joke, and a revealing
joke, on the idea of the "survival of the fittest."

There was nothing "fitter" about Latin that might
have enabled it to win a competitive struggle with
other languages in the Mediterranean and West-
European world. Similarly, there is nothing about
English that makes it better suited to be a
"world" language than, say, Czech. Czech and English
probably do their linguistic jobs equally well (or
equally poorly). It's obviously a question
of money and power. English is a "fitter" language
than Czech for the simple reason that native speakers
of English have more money and power than native speakers
of Czech. (_Some_ native speakers of English. I am one
of them, but I have neither money nor power ... though
I guess by world standards, I have a lot of both.)

But consider the basic idea of the "survival of the
fittest." What was the meaning of "fittest"? Most
likely to survive. So what was the meaning of the
"survival of the fittest"? The survival of those most
likely to survive. To his great credit, Darwin understood
that this was not an explanatory principle, but a mere
trivial description: survival is a selective process.

Back to language. What determines whether a language
survives and prospers has nothing to do with features
inherent to the language, but instead with whether the
people who use the language survive and prosper. As for
English, its time has come because the time of its native
users has come -- for better or worse. As for Czech, only
time will tell -- for better or worse.
</MvRAMBLE>

Jack


#13 · 06-06-02, 10:27 PM

isalpha() and isdigit()

Jack Said:

>I'll never forget those wonderful nights and days. I
>have an archive squirreled away someplace that contains
>more than a hundred exchanges during just a few days.

Ah the good old days when one could diable late into the
night with nary a care. :)

Jack also said:
>As for isalpha(), I should perhaps explain that my
>interest in its shortcomings grows out of an effort to
>reproduce FMT in a FMT(String, Pattern) function.

Jack I remember your fondness for FMT over these years
and think you are handling its demise quite gracefully. I
always liked FMT well enough, but really longed for, and
am still holding out some small hope for, regular expressions
to be supported someday in some manner as a core part of
Miva script. That would be wonderful.

-Jeff Huber
