REBOL 3.0

Comments on: Solving the DIR? problem.

Carl Sassenrath, CTO
REBOL Technologies
20-Aug-2009 18:22 GMT

Article #0238

I need your ideas on how best to solve the DIR? problem (bug #602).

The problem, simply stated: given a file name (or path), how do we determine whether it is a directory?

The current algorithm is:

  1. Check if the name ends with "/" - if so, it refers to a directory.
  2. If it does not end with "/", check if the file exists, and if so, determine if it is a file or directory.

So the algorithm mixes a lexical check of the file name with an actual check of the local file system.
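A minimal sketch of that hybrid check, for illustration only. The name dir-hybrid? is hypothetical, and the sketch assumes an R3 build in which EXISTS? returns the word dir for an existing directory and none when nothing exists; the native DIR? is not necessarily implemented this way.

```rebol
; Hypothetical helper mirroring the two steps above (not the real DIR? source).
; Assumes EXISTS? returns the word DIR for an existing directory.
dir-hybrid?: func [
    "True if PATH ends in a slash or names an existing directory."
    path [file!]
][
    either #"/" = last path [
        true                    ; step 1: lexical check only, no file system access
    ][
        'dir = exists? path     ; step 2: ask the local file system
    ]
]
```

Because step 1 never touches the disk, dir-hybrid? %notes.txt/ is true even when notes.txt is a regular file locally.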

There are two main problems with this:

  1. If the file name ends with "/", we just assume that it is a directory reference. However, locally a file of that name (without the /) may be present.
  2. It would be nice if the mechanism worked for URLs as well, so that when processing something like an HTTP request we can use DIR? along the way. This is problematic if we check local storage, which in many cases is unrelated to the URL itself.

Although it would be nice to believe that there is a simple solution here, the compound behavior of the current DIR? function might cause more confusion than it's worth.

However, it would be nice to have a way to easily check file and URL paths. I often write:

if #"/" = last file-path [...]

and I'd prefer to write something a lot clearer, perhaps:

if dir-name? file-path [...]

or:

if is-dir? file-path [...]

Please let me know your comments soon. I realize it's still summer vacation, but dust that sand off your keyboard and post a comment.

16 Comments

Ben
20-Aug-2009 16:22:59
I suppose I can start:

Seems like four different checks are being performed and it's not currently clear which one.

1,2. string-check: does the {%file/path/string} refer to a file or directory? i.e. what did the programmer/user mean?

3,4. physical-device-check: does the {%file/path/string} exist on the physical device as a directory or file?

RobertS
20-Aug-2009 17:56:11
The URI case is one thing, but even actual URLs can be odd in practice. For example, at the www.cl1p.net web clipboard service, if you go to www.clip1.net/rebol, then enter the address www.clip1.net/rebol/dir/ and save a note, you get the same behavior as if you had entered www.clip1.net/rebol/dir instead. Subsequent navigation to www.clip1.net/rebol/dir/ gives the same result as www.clip1.net/rebol/dir, which is just that content. Yet it is easy to imagine that behavior changing at the Python-based www.cl1p.COM when it comes online. So it seems a challenge ...

RobertS
20-Aug-2009 17:59:50
excuse my typos above pls: those URLs above should all be cl1p with a "one" digit and not an #"i" as in cl1p.net

Should we check if the URL is in a scheme such as FTP where a closing slash after domain must be meaningful?

Sunanda
20-Aug-2009 18:06:06
An existing R3 change over R2 is that EXISTS? returns one of [DIR FILE NONE] rather than [TRUE FALSE]:

exists? %r3-a78.exe
== file

exists? %.
== dir

exists? %no-such-file.txt
== none

That covers the case for actual, existing files. It also suggests to me that DIR? can be relegated to a simple "does the name appear to match the template for a folder name?" check, i.e. does it end in a "/"?

URLs are more problematical as they are assumed to be hierarchically marked by "/" but that does not need to map to any file system. Best avoid the issue and not permit them in DIR?

Brian Hawley
21-Aug-2009 0:17:14
I like the algorithm (different from yours):
  1. If the string is of the file! or url! type, check if the file exists, and if so, determine if it is a file or directory.
  2. If the file doesn't exist, or the string type doesn't refer to a file, then check if the name ends with "/" - if so, it refers to a directory.

We can already tell if a string has a / on its end, and we can tell whether a file exists (barring bug#606). What we don't have (other than DIR?) is an efficient way to do both.
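A rough sketch of that ordering, under the assumption that R3's EXISTS? returns dir, file, or none. The name dir-check? is made up for illustration, and url! values are deliberately routed to the lexical fallback here so that the sketch never touches the network; the algorithm as described would query them as well.

```rebol
; Hypothetical sketch: existence first, trailing slash only as a fallback.
; Assumes EXISTS? returns the word DIR or FILE, or NONE if nothing is found.
dir-check?: func [
    "True if PATH is an existing directory, or does not exist but ends in a slash."
    path [file! url! string!]
    /local kind
][
    kind: all [file? path  exists? path]   ; dir, file, or none
    either kind [
        'dir = kind             ; something exists: trust the file system
    ][
        #"/" = last path        ; nothing found, or not a file!: lexical check
    ]
]
```

With this ordering, a trailing slash only matters when the file system has nothing to say about the name.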

Endo
21-Aug-2009 3:09:43
I prefer to split them,

is-dir? and is-file? check the syntax only (end with /), file-exists? and dir-exists? (or simply exists?) check for existence.

It is not clear whether Dir? returning true means that the directory exists or not. And it is different from file?, because file? checks the datatype, not existence, but there is no dir! datatype.

As Sunanda says, Exists? covers all of this. So Dir? can be an alias for Exists?

-pekr-
21-Aug-2009 6:56:46
As for the blog content, I don't understand why funcs like 'is-dir? would be needed. Or rather: why not simply 'dir? and 'file?, with refinements if needed? I am also not sure we necessarily have to mix dirs/files with URLs. Not the same concept, imo ...

Kaj
21-Aug-2009 9:57:21
DIR? works for FTP in R2. I would really like to keep that functionality.

Ratio
21-Aug-2009 14:57:33
From a user's point of view:

1. Files are not directories and directories are not files. They are totally different things.

2. Syntax (slash or not) is an internal issue. The "system" should cut or append slashes as needed, so users are not forced to append or omit them. What users want can easily be checked and corrected by the functions they call (using dirize or "filize" respectively).

3. Functions for files and dirs must have different names! Except for the func exists?, this seems to be the case in REBOL, but 'file? also returns true for directories (while 'dir? works on dirs only), and 'exists? returns true for files AND dirs if they exist. This situation should be considered buggy.

In other languages I had similar problems. So I had to write special functions to avoid them. In REBOL we have the chance to avoid this at the root.

Proposal:

1. Introduce, as Carl is suggesting, new funcs is-dir? and is-file? to check for the existence of dirs or files exclusively, regardless of "slash syntax" (possibly we can even forget the %-type ;-) ).

2. Functions dir? and file? should be changed so that they perform a check of syntax/type only. At the moment dir? checks for the existence of dirs, while file? checks syntax without distinguishing between files and dirs; this is not only confusing, it might be dangerous.

3. Return values should in all cases be TRUE/FALSE; only in case of (access) errors NONE or, even better, the OS error code.

4. The func exists? mixes files and dirs, is superseded by (1), and could be marked as obsolete.

btw: in Win32 everything that ends with "\" is a "path". Directories, just like files, have no slash at the end (exception: the root, C:\ or \). In REBOL it seems to be the opposite: slash for dirs.

URLs might be just another kind of path/directory name? Otherwise there should also be a special set of url functions.

Ratio

Maxim Olivier-Adlhoch
21-Aug-2009 16:50:59
In EVERY application which does file management (and I have done many, with the remark engine as the basis), the current hybrid functionality of 'DIR? is USELESS and painful.

how it should be:

dir-path?: func [file-path [file! url! string!]][
    #"/" = last file-path
]

file-path?: func [file-path [file! url! string!]][ not dir-path? file-path ]

Note that we can use string! here too, since many times the source dir info isn't in REBOL data format, but is taken from an input field or from user data in OS format.

And with the newer (BETTER) return value of 'EXISTS? we should trash the 'DIR? func entirely. It does only half of each of the two things you need (so it's only .25 really useful ;-).

Ratio
22-Aug-2009 20:58:52
Directories, files, syntax checks and existence checks must never be mixed!

For example, these existence checks that I am using work as users normally expect.

Copy to the console and try them; see the comments inside:

; ------- user functions ----------

is-dir?: make function! [
    path "Accepts almost all, string, word, url, number..."
][
    NEW-exists? path
]

is-file?: make function! [
    path "Accepts almost all, string, word, url, number..."
][
    NEW-exists?/file path
]

;--------- main but more internal: 2 helper functions ------------

NEW-exists?: make function! [
    [catch]
    path "almost any-type due to 'to-reb-syntax below"
    /file
    /local info check
][
    path: either file [to-reb-syntax/file path] [to-reb-syntax path] ;<<----
    check: either file ['file] ['directory]
    info: throw-on-error [info? path]
    either none? info [false] [info/type = check]
]

;---- converts almost any input to rebol ------

to-reb-syntax: make function! [
    "Converts almost all to rebol dir or (option) file"
    path
    /file
][
    path: to-rebol-file to string! path ; <<-----
    either file [
        if #"/" = last path [path: head clear back tail path]  ; filize
    ][
        if #"/" <> last path [path: join path #"/"]  ; dirize
    ]
    path
]

No syntax checks.

For more flexibility we have gentle, user-friendly adjustments only ('to-reb-syntax).

Comments/improvements welcome (I'm an experienced programmer but new to REBOL).

Ratio

Ratio
22-Aug-2009 21:13:25
...and here a proposal for syntax checks. Never to be mixed with existence checks!

Main issue is 'OS-illegal?

; --------- syntax checks ---------------

valid-file-name?: make function! [
    "File syntax check"
    path [file! string! url!]
][
    if #"/" = last path [return false]
    not OS-illegal? path
]

valid-dir-name?: make function! [
    "Dir syntax check"
    path [file! string! url!]
][
    if not #"/" = last path [return false]
    not OS-illegal? path
]

; ---------------------------

OS-illegal?: make function! [
    "Seek illegal characters within path, returns true if illegals found"
    path
    /local illegal collect result ch
][
    unless 3 = fourth system/version [return false] ; windows, others I do not know
    illegal: "<>|+æ=" ; string should be checked and available in REBOL system
    collect: copy ""
    foreach ch illegal [
        if find path ch [collect: join collect ch]
    ]
    result: either empty? collect [false] [true]
    if result [
        alert rejoin ["Sorry! ^/" path " contains illegal characters: " collect]
    ]
    result
]

Very sorry for bad display of the code.

Ratio

Ashley
24-Aug-2009 1:16:39
Would subtypes based on syntax alone be the answer?

>> type? %files
== file!
>> type? %files/
== file-dir!
>> type? url
== url!
>> type? url/
== url-dir!
>> to file-dir! %a
== %a/

This wouldn't be correct in 100% of cases, but would probably satisfy the majority of usage cases (much like email!, and the comma for number! types).

Ben
24-Aug-2009 12:16:34
or perhaps more general in nature:

target %file/path/object

target? file/path/object
>> URL | FILE | DIR | none | ??

target-syntax? {%file/path/object}
>> URL | FILE | DIR | none | ??

Ratio
24-Aug-2009 13:53:08
I must be silly, but I can't understand that even in REBOL we have the "common" problems regarding files and dirs.

What users want to address (file or dir) is in a good system clearly determined by the functions they call and therefore should never depend on last slashes users append or omit (allows btw a user friendly behavior by handling the crucial "slash problem" internally).

...if, yes IF the functions were strictly separated into file and directory functions, at least via options (working example proposed above - wildcard handling should be added).

'Make-dir and 'write (a file) are different things even in REBOL, aren't they ?

Same applies to reading, and all other topics (existence checks, renamings etc.)

If anybody knows of an exception I would be happy to learn about it. ;-)

Hi Ashley, got my mail? Asked you a question about rebGUI. Today keyboard handling is an orphaned issue (also in REBOL) which imho needs more attention. Advanced users prefer hotkeys for almost everything.

Ratio

Anton Rolls
27-Aug-2009 9:41:10
I think mixing a filesystem check with a last slash check just makes things complex and confusing. I've been using this:

if #"/" = last file-path [...]

...and I will continue to do so, because its meaning is clear and likely to continue working on R2 code and R3 code into the future. All those other functions come and go.

I typically avoid the filesystem check on local files because of performance. But when I need it, I want it to do like R3 EXISTS?, not confusingly mixing in any last slash checking.

So my point of view is very close to Sunanda's, except I might suggest just removing the DIR? function altogether - less is more.

Updated 19-Apr-2024 - Copyright REBOL Technologies - REBOL.net