From bhern at netscape.com Tue Oct 21 11:56:50 1997
From: bhern at netscape.com (Brian Hernacki)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Dealing with internationalization in NNTP
Message-ID: <344CFAF2.C3B136CD@netscape.com>

Now it seems that this conversation has been started a couple of times,
but has never resolved anything. So I'm going to try again. I think it
would be a mistake to expend all this effort to "bring NNTP up-to-date"
and to not solidly deal with localization and internationalization
issues. And I'm not convinced we have done that yet.

A couple of issues I'd like to start with are:

o the charset

The current 977bis draft includes the CHARSET command to allow the
client and server to negotiate a charset. A couple folks (including me)
have posted as to why we think this is a bad idea. I would much rather
see us define something like UTF8 as the default charset used. While I
heard from people who agreed with me on this, I didn't hear any
objections. Is this OK? Do people think this would be a bad thing? If
not should we change the draft?

o a language specification/negotiation extension

While having the charset well known is great, it still doesn't help the
Japanese user who gets error messages popping up in English. What I'd
like to see is something akin to IMAP's LANGUAGE extension. This allows
client and server to negotiate a non-default language to use for things
like error messages, newsgroup descriptions, etc. Thoughts?

--brian

From sob at academ.com Wed Oct 22 01:03:29 1997
From: sob at academ.com (Stan Barber)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Dealing with internationalization in NNTP
Message-ID: <199710220603.BAA13034@academ.com>

> Now it seems that this conversation has been started a couple of times,
> but has never resolved anything. So I'm going to try again.
> I think it would be a mistake to expend all this effort to "bring NNTP
> up-to-date" and to not solidly deal with localization and
> internationalization issues. And I'm not convinced we have done that yet.
>
> A couple of issues I'd like to start with are:
>
> o the charset
>
> The current 977bis draft includes the CHARSET command to allow the
> client and server to negotiate a charset. A couple folks (including me)
> have posted as to why we think this is a bad idea. I would much rather
> see us define something like UTF8 as the default charset used. While I
> heard from people who agreed with me on this, I didn't hear any
> objections. Is this OK? Do people think this would be a bad thing? If
> not should we change the draft?

I don't have any opinion, but I know we must address it to make the ADs
happy. If someone would put together some specific diffs to the current
document and post them here to deal with this, folks can comment on them
and I can get them in the next draft. There was a gentleman in the
usenet-format group that forwarded some text from one of their drafts
that looked pretty good to me, but I don't recall seeing any comments on
that from anyone. Did anyone besides me see it?

>
> o a language specification/negotiation extension
>
> While having the charset well known is great, it still doesn't help the
> Japanese user who gets error messages popping up in English. What I'd
> like to see is something akin to IMAP's LANGUAGE extension. This allows
> client and server to negotiate a non-default language to use for things
> like error messages, newsgroup descriptions, etc. Thoughts?

I think this is a great candidate for an extension. I don't know that it
is required in the basic specification.

--
Stan   | Academ Consulting Services        |internet: sob@academ.com
Olan   | For more info on academ, see this |uucp: {mcsun|amdahl}!academ!sob
Barber | URL- http://www.academ.com/academ |Opinions expressed are only mine.
From mduerst at ifi.unizh.ch Wed Oct 22 12:24:06 1997
From: mduerst at ifi.unizh.ch (=?iso-8859-1?Q?Martin_J=2E_D=FCrst?=)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Please use UTF-8, not UTF8
In-Reply-To: <1203.234T18T1550736@thule.no>
Message-ID:

On 22 Oct 1997, Petter Nilsen wrote:

> In article <344CFAF2.C3B136CD@netscape.com>, bhern@netscape.com (Brian
> Hernacki) wrote:
>
> > see us define something like UTF8 as the default charset used. While I
> > heard from people who agreed with me on this, I didn't hear any
> > objections. Is this OK? Do people think this would be a bad thing? If
> > not should we change the draft?
>
> UTF8 sounds fine.

Please always use UTF-8 and not UTF8. UTF-8 is the correct MIME
charset value; UTF8 only risks confusion. [The only place where
I have seen UTF8 instead of UTF-8 is VRML 2.0, but that's not an
IETF standard, and other values are not allowed anyway.]

Regards, Martin.

From mduerst at ifi.unizh.ch Wed Oct 22 12:38:19 1997
From: mduerst at ifi.unizh.ch (=?iso-8859-1?Q?Martin_J=2E_D=FCrst?=)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Re: ietf-nntp Dealing with internationalization in NNTP
In-Reply-To: <344CFAF2.C3B136CD@netscape.com>
Message-ID:

On Tue, 21 Oct 1997, Brian Hernacki wrote:

> Now it seems that this conversation has been started a couple of times,
> but has never resolved anything. So I'm going to try again. I think it
> would be a mistake to expend all this effort to "bring NNTP up-to-date"
> and to not solidly deal with localization and internationalization
> issues.

Very good idea.

> A couple of issues I'd like to start with are:
>
> o the charset
>
> The current 977bis draft includes the CHARSET command to allow the
> client and server to negotiate a charset. A couple folks (including me)
> have posted as to why we think this is a bad idea. I would much rather
> see us define something like UTF8 as the default charset used.
> While I heard from people who agreed with me on this, I didn't hear any
> objections. Is this OK? Do people think this would be a bad thing? If
> not should we change the draft?

As far as I understand, this only refers to protocol elements
(parameters) in the NNTP protocol itself, not to the news articles
themselves? I don't have much of an idea of what protocol elements
are used and which of these need or may benefit from internationalization
or localization (by the way, these words can be shortened to i18n
and l10n). Also, I think there is some overlap with usenet, e.g.
in respect to newsgroup names.

I would definitely prefer to use UTF-8 only for these things, but
UTF-8 should be prescribed in a way that doesn't completely
forbid local existing customs. For example, an NNTP server
should not refuse a command with a newsgroup name or something
else just because it does not meet the syntactic constraints
that an UTF-8 octet sequence does.

> o a language specification/negotiation extension
>
> While having the charset well known is great, it still doesn't help the
> Japanese user who gets error messages popping up in English. What I'd
> like to see is something akin to IMAP's LANGUAGE extension. This allows
> client and server to negotiate a non-default language to use for things
> like error messages, newsgroup descriptions, etc. Thoughts?

The IMAP model can definitely be used. For more information, please
also see Harald Alvestrand's draft about the IETF charset policy,
which contains quite a few recommendations about this topic (e.g.
default language stuff which can be highly political,...).

Other things I could imagine, at least in theory, are a facility
that a news article can be posted in various languages, and the
NNTP server only returns one of them based on language preferences
(which might not be the same as those for error messages), or
a facility to specify some localized conventions for sorting
newsgroups or article subjects on the server prior to requesting
a certain subrange of newsgroups or article subjects (if such
a facility is present currently in NNTP, anyway).

Well, just brainstorming.

Regards, Martin.

From bhern at netscape.com Wed Oct 22 08:49:35 1997
From: bhern at netscape.com (Brian Hernacki)
Date: Sun Dec 7 20:19:21 2003
Subject: ietf-nntp Re: nntp-extensions Dealing with internationalization in NNTP
References: <199710220603.BAA13034@academ.com>
Message-ID: <344E208F.1EDCC213@netscape.com>

Stan Barber wrote:

> I think this is a great candidate for an extension. I don't know that it
> is required in the basic specification.

I'd agree. I wanted this thread to be cross posted to both lists since
internationalization affects both the base protocol (charset) and the
extensions.

--brian

From bhern at netscape.com Wed Oct 22 09:00:42 1997
From: bhern at netscape.com (Brian Hernacki)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Re: ietf-nntp Dealing with internationalization in NNTP
References:
Message-ID: <344E232A.714CE49B@netscape.com>

Martin J. Dürst wrote:

> > A couple of issues I'd like to start with are:
> >
> > o the charset
> >
> > The current 977bis draft includes the CHARSET command to allow the
> > client and server to negotiate a charset. A couple folks (including me)
> > have posted as to why we think this is a bad idea. I would much rather
> > see us define something like UTF8 as the default charset used. While I
> > heard from people who agreed with me on this, I didn't hear any
> > objections. Is this OK? Do people think this would be a bad thing? If
> > not should we change the draft?
>
> As far as I understand, this only refers to protocol elements
> (parameters) in the NNTP protocol itself, not to the news articles
> themselves? I don't have much of an idea of what protocol elements
> are used and which of these need or may benefit from internationalization
> or localization (by the way, these words can be shortened to i18n
> and l10n). Also, I think there is some overlap with usenet, e.g.
> in respect to newsgroup names.

It affects things like error strings and search tokens which are
strictly protocol. I'll check the usenet-format stuff to see how they
are handling NG names.

> I would definitely prefer to use UTF-8 only for these things, but
> UTF-8 should be prescribed in a way that doesn't completely
> forbid local existing customs. For example, an NNTP server
> should not refuse a command with a newsgroup name or something
> else just because it does not meet the syntactic constraints
> that an UTF-8 octet sequence does.

The current standard is ASCII only so I don't see a backwards
compatibility problem. Are you suggesting a future revision might want to
use a different local encoding (like SJIS)? I'd be inclined to have
977bis forbid that. It would just generate the kind of interoperability
problems we're trying to solve.

> > o a language specification/negotiation extension
> The IMAP model can definitely be used. For more information, please
> also see Harald Alvestrand's draft about the IETF charset policy,
> which contains quite a few recommendations about this topic (e.g.
> default language stuff which can be highly political,...).

Yeah. I kept up on that. I don't really have too much of a problem with
it. The biggest constraint is that any new IETF standard must "deal"
with charset in a definitive way. If we make the requirement that NNTP
is UTF-8 we cover that.

--brian

From mduerst at ifi.unizh.ch Wed Oct 22 20:22:43 1997
From: mduerst at ifi.unizh.ch (=?iso-8859-1?Q?Martin_J=2E_D=FCrst?=)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Re: ietf-nntp Dealing with internationalization in NNTP
In-Reply-To: <344E232A.714CE49B@netscape.com>
Message-ID:

On Wed, 22 Oct 1997, Brian Hernacki wrote:

> Martin J. Dürst wrote:
> > > A couple of issues I'd like to start with are:
> > >
> > > o the charset
> > >
> > > The current 977bis draft includes the CHARSET command to allow the
> > > client and server to negotiate a charset. A couple folks (including me)
> > > have posted as to why we think this is a bad idea. I would much rather
> > > see us define something like UTF8 as the default charset used. While I
> > > heard from people who agreed with me on this, I didn't hear any
> > > objections. Is this OK? Do people think this would be a bad thing? If
> > > not should we change the draft?
> >
> > As far as I understand, this only refers to protocol elements
> > (parameters) in the NNTP protocol itself, not to the news articles
> > themselves? I don't have much of an idea of what protocol elements
> > are used and which of these need or may benefit from internationalization
> > or localization (by the way, these words can be shortened to i18n
> > and l10n). Also, I think there is some overlap with usenet, e.g.
> > in respect to newsgroup names.
>
> It affects things like error strings and search tokens which are
> strictly protocol. I'll check the usenet-format stuff to see how they
> are handling NG names.

There is a tendency to go towards UTF-8, too, so this fits well.
But there are also local experiments with iso-8859-1. That's one
of the reasons why I would advocate: As clearly UTF-8 only as
possible, but don't make software break if it's something else.
Similar things were done in FTP, but it is true that there,
there was much more of a legacy problem than with NG names.

> > I would definitely prefer to use UTF-8 only for these things, but
> > UTF-8 should be prescribed in a way that doesn't completely
> > forbid local existing customs. For example, an NNTP server
> > should not refuse a command with a newsgroup name or something
> > else just because it does not meet the syntactic constraints
> > that an UTF-8 octet sequence does.
>
> The current standard is ASCII only so I don't see a backwards
> compatibility problem.

It's not only what the standard says. The FTP standard also was
ASCII only, but people used all kinds of other things for filenames.
The solution there was basically to say that it's okay to use
other things among "private parties", but for the Internet as a
whole, it's UTF-8. Such a policy was facilitated by the fact that
it's fairly easy and safe to identify UTF-8 (for some details,
see my report at
http://www.ifi.unizh.ch/mml/mduerst/papers.html#IUC11-UTF-8).

> Are you suggesting a future revision might want to
> use a different local encoding (like SJIS)?

It's more iso-8859-1 than SJIS; in Japan, there is also EUC and JIS,
which keeps them pretty much to ASCII for such things.
Explicitly allowing it would make things much more complicated
(CHARSET,...). But to some extent, it might be tolerated.

> I'd be inclined to have
> 977bis forbid that. It would just generate the kind of interoperability
> problems we're trying to solve.

Of course, interoperability is a core point. Interoperability in
NNTP and for NG names clearly means something different (long-term,
very wide-reaching) than for FTP (short-term, point-to-point).
Also, there might be fewer faits accomplis in NNTP than in FTP.
You and others are probably in a much better position to judge
that than me.

> > > o a language specification/negotiation extension
> > The IMAP model can definitely be used.
> > For more information, please
> > also see Harald Alvestrand's draft about the IETF charset policy,
> > which contains quite a few recommendations about this topic (e.g.
> > default language stuff which can be highly political,...).
>
> Yeah. I kept up on that. I don't really have too much of a problem with
> it. The biggest constraint is that any new IETF standard must "deal"
> with charset in a definitive way. If we make the requirement that NNTP
> is UTF-8 we cover that.

Well, I think it really shouldn't be seen as "we have to do something,
so that it looks as if we did something". If after serious
considerations, we came to the conclusion that for NNTP, nothing
in that direction at all is needed, then I think that should be
fine. But at present, I don't see the arguments for such a procedure.

Regards, Martin.

From bhern at netscape.com Wed Oct 22 13:41:27 1997
From: bhern at netscape.com (Brian Hernacki)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Re: ietf-nntp Dealing with internationalization in NNTP
References:
Message-ID: <344E64F7.A4938ACA@netscape.com>

Martin J. Dürst wrote:

> > > I would definitely prefer to use UTF-8 only for these things, but
> > > UTF-8 should be prescribed in a way that doesn't completely
> > > forbid local existing customs. For example, an NNTP server
> > > should not refuse a command with a newsgroup name or something
> > > else just because it does not meet the syntactic constraints
> > > that an UTF-8 octet sequence does.
> >
> > The current standard is ASCII only so I don't see a backwards
> > compatibility problem.
>
> It's not only what the standard says. The FTP standard also was
> ASCII only, but people used all kinds of other things for filenames.
> The solution there was basically to say that it's okay to use
> other things among "private parties", but for the Internet as a
> whole, it's UTF-8.
> Such a policy was facilitated by the fact that
> it's fairly easy and safe to identify UTF-8 (for some details,
> see my report at
> http://www.ifi.unizh.ch/mml/mduerst/papers.html#IUC11-UTF-8).

I'd be curious to know how many existing implementations use non-ASCII
charsets for things like NG names. I'd be surprised if the number was
very high.

What I imagined is that folks who want to do local experimental
versions on non-UTF-8 charsets could continue to do so. They just
couldn't claim to be 977bis compliant. Just like they can't claim to be
977 compliant now. All this means is they cannot assume compatibility
with people who are compliant. I don't think we need to say this
explicitly since that's the whole idea of a standard.

--brian

From bhern at netscape.com Wed Oct 29 21:07:42 1997
From: bhern at netscape.com (Brian Hernacki)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Renaming the SEARCH extensions
References:
Message-ID: <3458161E.E09C8260@netscape.com>

I know people are going to hate this idea but I want to change the
extension name for full text search.

Right now the extension name is SEARCH. Unfortunately this conflicts
with an existing experimental extension name already used... by
Netscape. I realize this is something we probably should have avoided in
the first place, but what's done is done.

In order to avoid the numerous problems having different implementations
of searching colliding, I'd like to propose that the final extension
name for search be FTSEARCH (full text search).

Comments?

--brian

From bhern at netscape.com Wed Oct 29 21:11:51 1997
From: bhern at netscape.com (Brian Hernacki)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Re: ietf-nntp Dealing with internationalization in NNTP
References:
Message-ID: <34581717.3AE6E2AF@netscape.com>

OK. I have not heard any serious complaints about UTF-8.
Martin Durst brought up the possibility of conflicts with local versions
of NG names but I think we can avoid that with some language about using
local charsets being OK, but not expecting them to interoperate well.

Does anyone object to asking Stan to add language to this effect
(advocating UTF-8 as the NNTP charset) to the 977bis document?

I'm going to work up a language negotiation extension and submit it to
the extensions list to cover the second issue.

--brian

From sob at academ.com Thu Oct 30 00:43:45 1997
From: sob at academ.com (Stan Barber)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Renaming the SEARCH extensions
Message-ID: <199710300643.AAA02514@academ.com>

As far as I am concerned nothing is written in stone yet on the
extensions. However, I do expect that we will do some stone carving in
December, so it's a good idea to resolve this stuff before then.

--
Stan   | Academ Consulting Services        |internet: sob@academ.com
Olan   | For more info on academ, see this |uucp: {mcsun|amdahl}!academ!sob
Barber | URL- http://www.academ.com/academ |Opinions expressed are only mine.

From sob at academ.com Thu Oct 30 00:44:49 1997
From: sob at academ.com (Stan Barber)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Re: ietf-nntp Dealing with internationalization in NNTP
Message-ID: <199710300644.AAA02550@academ.com>

This sounds good to me. I expect to churn out another revision of the
bis document in the next 10 or so days. I'd like to make this change and
deal with anything else folks have talked about here in the last two
months.

--
Stan   | Academ Consulting Services        |internet: sob@academ.com
Olan   | For more info on academ, see this |uucp: {mcsun|amdahl}!academ!sob
Barber | URL- http://www.academ.com/academ |Opinions expressed are only mine.
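
[Illustration, not part of the original thread.] The policy discussed in
the messages above (prefer UTF-8, rely on the fact that UTF-8 is fairly
easy to identify, and do not refuse a command merely because its octets
are not valid UTF-8) might be sketched roughly as follows. This is a
minimal sketch under stated assumptions: the function names are
hypothetical, nothing here comes from the 977bis draft, and the choice
of iso-8859-1 as the fallback is only an assumption based on the "local
experiments" mentioned in the thread.

```python
def looks_like_utf8(octets: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence.

    Hypothetical helper: strict decoding raises on malformed input,
    which is what makes UTF-8 easy to tell apart from legacy charsets.
    """
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_group_name(octets: bytes) -> str:
    """Decode a newsgroup name, tolerating a local charset.

    Hypothetical helper: UTF-8 is preferred; anything else is treated
    as iso-8859-1 (an assumed fallback) instead of being refused.
    """
    if looks_like_utf8(octets):
        return octets.decode("utf-8")
    # iso-8859-1 maps every octet to a character, so this never fails.
    return octets.decode("iso-8859-1")
```

Plain ASCII names are valid UTF-8, so existing newsgroup names take the
first branch unchanged. A server following Brian's stricter reading
could instead reject input for which looks_like_utf8 returns False.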

From natba at MICROSOFT.com Fri Oct 31 10:48:07 1997
From: natba at MICROSOFT.com (Nat Ballou)
Date: Sun Dec 7 20:19:21 2003
Subject: nntp-extensions Renaming the SEARCH extensions
Message-ID: <199710311851.KAA21322@imail2.microsoft.com>

Brian,

We spoke about this earlier ...

It turns out that Microsoft will ship in the next week an implementation
of the search draft that will use the extension name SEARCH. This
release has been out in beta for several months using the SEARCH
extension name, since this has been the extension name used in all
versions of the search draft.

As I'm sure there will be other changes to the search draft both as a
result of RFC977bis work as well as discussions of the search draft on
the full mailing list, I view this as a minor issue. Specifically, both
Netscape and Microsoft will likely have to release another update to
cover changes as a result of these discussions.

Thanks,
Nat

-----Original Message-----
From: Brian Hernacki
To: nntp-extensions@academ.com
Date: Wednesday, October 29, 1997 10:22 PM
Subject: nntp-extensions Renaming the SEARCH extensions

>I know people are going to hate this idea but I want to change the
>extension name for full text search.
>
>Right now the extension name is SEARCH. Unfortunately this conflicts
>with an existing experimental extension name already used... by
>Netscape. I realize this is something we probably should have avoided in
>the first place, but what's done is done.
>
>In order to avoid the numerous problems having different implementations
>of searching colliding, I'd like to propose that the final extension
>name for search be FTSEARCH (full text search).
>
>Comments?
>
>--brian