Re: [Scheme-reports] Is bytevector a token?

Takashi Kato 29 Oct 2013 08:52 UTC
I think it's impossible to put it into <token> as long as <bytevector> contains ')'. (Again, correct me if I'm wrong.)
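
To make the split concrete, here is a minimal sketch in Python, assuming a heavily simplified grammar (unsigned decimal integers and whitespace only, no comments). The names tokenize and parse_bytevector are hypothetical and do not come from R7RS or any implementation. The lexer emits '#u8(' as one token, and the parser collects bytes until it reaches ')':

```python
import re

# Simplified token set (an assumption for this sketch): '#u8(', '(', ')',
# and unsigned decimal integers. Comments and other datum types are omitted.
TOKEN_RE = re.compile(r"\s*(#u8\(|\(|\)|[0-9]+)")


def tokenize(text):
    """Lexer: split text into a flat list of token strings."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_bytevector(tokens):
    """Parser: build a bytes object from tokens that start with '#u8('."""
    if not tokens or tokens[0] != "#u8(":
        raise SyntaxError("expected '#u8('")
    values = []
    for token in tokens[1:]:
        if token == ")":
            # The closing ')' is a separate token; the parser decides
            # that it ends the bytevector.
            return bytes(values)
        if not token.isdigit() or int(token) > 255:
            raise SyntaxError(f"not a byte: {token}")
        values.append(int(token))
    raise SyntaxError("missing ')'")


print(tokenize("#u8(1 2 3)"))
print(parse_bytevector(tokenize("#u8(1 2 3)")))
```

In this arrangement the lexer never needs to know where the bytevector ends; only the parser tracks the matching ')'.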

> On the other hand the rule <bytevector> is placed under section 7.1.1
> (Lexical structure), not under 7.1.2 (External representations). I
> think this is not related to the complexity of the lexer implementation.
Yes, you're right.

On Tuesday, 29 October 2013, 9:23, Yuichi Nishiwaki <yuichi.nishiwaki@gmail.com> wrote:
Hi,

> As I understand it (correct me if I'm wrong), <token> is meant to be the smallest unit that a lexer reads, and <bytevector> is a compound datum that a parser needs to construct. Suppose your reader gets #u8(1 2 3) as its input; then your lexer first needs to return a token to let your parser know what type of datum needs to be constructed.

Even in that case, the lexer is still capable of scanning bytevectors,
though the implementation will be much more complex than otherwise.
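
As a rough illustration of that extra complexity, here is a minimal Python sketch of a lexer routine that scans a whole bytevector as a single unit. It assumes a simplified grammar: unsigned decimal bytes, whitespace, and ';' line comments only (block comments and datum comments are left out). The names skip_space and scan_bytevector are hypothetical:

```python
def skip_space(text, pos):
    """Skip whitespace and ';' line comments; return the new offset."""
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
        elif text[pos] == ";":
            # A line comment runs up to (not including) the newline.
            while pos < len(text) and text[pos] != "\n":
                pos += 1
        else:
            break
    return pos


def scan_bytevector(text, pos=0):
    """Scan one whole bytevector at pos; return (bytes, end offset)."""
    if not text.startswith("#u8(", pos):
        raise SyntaxError("expected '#u8('")
    pos += 4
    values = []
    while True:
        # The lexer has to handle intertoken space itself here,
        # work that a parser would otherwise get for free.
        pos = skip_space(text, pos)
        if pos >= len(text):
            raise SyntaxError("missing ')'")
        if text[pos] == ")":
            return bytes(values), pos + 1
        start = pos
        while pos < len(text) and text[pos] in "0123456789":
            pos += 1
        if start == pos or int(text[start:pos]) > 255:
            raise SyntaxError(f"not a byte at offset {start}")
        values.append(int(text[start:pos]))


print(scan_bytevector("#u8(1 ; one\n 2 3) rest"))
```

Even in this reduced form, the routine repeats the whitespace, comment, and number scanning that the ordinary token loop already performs.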

> Now, first the lexer returns #u8(, and then the parser understands that this is a bytevector. However, if <bytevector> is in <token>, then your lexer needs to read the input as a bytevector. Inside the bytevector there are delimiters and tokens, so the lexer needs to read your input recursively. I'm not good with theory, but something seems wrong.

I'm really wondering which of '#u8(' and <bytevector> is a token.
Judging from the definition of the <token> rule, '#u8(' is a token. On
the other hand, the rule <bytevector> is placed under section 7.1.1
(Lexical structure), not under 7.1.2 (External representations). I
think this is not related to the complexity of the lexer
implementation.

-- Yuichi Nishiwaki

2013/10/29 Takashi Kato <ktakashi@ymail.com>:
> I think it's on purpose.
>
> As I understand it (correct me if I'm wrong), <token> is meant to be the smallest unit that a lexer reads, and <bytevector> is a compound datum that a parser needs to construct. Suppose your reader gets #u8(1 2 3) as its input; then your lexer first needs to return a token to let your parser know what type of datum needs to be constructed. Now, first the lexer returns #u8(, and then the parser understands that this is a bytevector. However, if <bytevector> is in <token>, then your lexer needs to read the input as a bytevector. Inside the bytevector there are delimiters and tokens, so the lexer needs to read your input recursively. I'm not good with theory, but something seems wrong.
>
>
> Hope it can help you.
>
> _/_/
> Takashi Kato
> E-mail: ktakashi@ymail.com
>
> On Tuesday, 29 October 2013, 5:48, Yuichi Nishiwaki <yuichi.nishiwaki@gmail.com> wrote:
> Hi, all. I'm very excited to see the final R7RS draft published. Thank
> you all for the great work.
> Reading the final draft, I have one question about the formal syntax
> definition (7.1.1). <bytevector> is not listed in the <token> line. Is it
> on purpose, or just an omission?
>
> -- Yuichi Nishiwaki
>
> _______________________________________________
> Scheme-reports mailing list
> Scheme-reports@scheme-reports.org
> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
