
NNTPChan

From InstallGentoo Wiki
CLEANUP CANDIDATE
Relevant discussion may be found on the talk page. Reason: Copied from wiki.bibanon.org


Exported from: https://scalar.vector.im/etherpad/p/!ouONyEOxwYkSvOaljT_matrix.org_bibanon-scratchpad

https://github.com/majestrate/nntpchan/ https://2hu-ch.org/thread-6a5153ae138db794d7a0ed33224812ab803480ea.html

A chan designed to be fully decentralized by using the NNTP service, akin to USENET but leveraging cryptographic signing. It could be a great way to run a freeform anonymous board where hosting costs are easily federated and smaller chans can function as one.

See matrix/riot and pawoo/mastodon for successful examples of federation.

The linked thread also has a very good discussion of IPFS and its downsides: https://2hu-ch.org/img/FB2UGBGOTHQLWC74BC2I3V4E7DQPQMP2LSAVYNZ6GAVJENC6J4QHZEGJM4FCKRX4YD5SMTFQ2EIJIIGSPF5VMLHAYJKEG76J2WI74YA=.png. What uguu (the author) said is that IPFS is too slow for propagation. If we used IPFS instead, an image posted on one board would not show up on another until a while later, so you'd see the post/message but a blank image until propagation happens. However, I think he means the case where you use your own low-bandwidth home server to serve IPFS images. IPFS does not remove the need for good servers, since the seed node needs good upload speeds. As you saw with sunako's pictures (linked above), images we know were never uploaded before showed up quite quickly, right? That's because his server acts as the main node and is already pretty fast without IPFS, so you get that speed too.

brief history

started on anonet2 in may 2013 as a self-healing imageboard plugin/fun hack for inn2 called overchan. it was written in python2, single threaded, and used 5 separate sqlite dbs: the second worst code i have ever seen (no offense sfor). rewritten in go in january 2015. the old daemon was killed off during a network-wide flag day in 2016, when signed messages were changed to use <LF> instead of <CR><LF>. there exist several alternative implementations that are compliant with the protocol conventions, including one maintained by ivo written in C# (i think?) and a proprietary haskell version written by lulzprincess (i think?).

realistically it's just usenet with a webui and different text format conventions i.e. post cites and board references, interoperation with mainline usenet is possible but not desired.

federation and moderation strategies

The most important key to the success of an NNTPChan-based network is how many nodes exist in the active network and how well it is moderated.

Moderators appear to be identified and authorized across servers using public key infrastructure, so that server nodes can allow moderator actions to be conducted. (correct) Theoretically if the server node admin accedes to our moderator cabal's right to moderate, they just import our keys and filtering/deletion actions much like you would an adblock list: thus allowing the server admins to donate space while delegating the moderation to another group (correct)

Thus (perhaps an Oath of Fealty), the moderators rather than the servers become the community leaders, allowing federation among servers that accede to these moderators' authority. The moderators' authority lies in allowing a node to federate with them. The node operators are responsible for maintaining the keys, and noncompliant nodes face defederation. The node operators are (ideally) the conduit that allows the moderators to moderate the network and the posters to post content.

current moderation setup

nntpchan uses an open, transparent, cryptographically signed mod system. anyone can make a moderation suggestion, but the daemon only executes actions from trusted identities. the idea is that over time new mods can be found by their merit alone, not via nepotism. currently nntpchan has just 1 newsgroup for all moderation; this will probably change as the volume grows. as of today (Nov 2 2017), per-board moderation scope is enabled but not yet used on the mainline network.

  • ctl is used as a meta board for control messages, sort of like 4chan's /j/ except open to anyone, transparent, and all posts MUST be signed

key features/antifeatures:

  • moderator privs can only be modified by a server admin
  • moderator privs operate at one of 2 scopes, global OR on 1 newsgroup, 1 moderator can moderate more than 1 newsgroup but that needs to be set manually
  • historically, everyone who had been granted trusted mod access to a server was a global mod by default; as of Nov 2 2017 this has changed to default to per-board mods and/or globals. this policy is set per server and varies.
  • server admin is god on his box, what he says is absolute.
  • If you don't like it create a new node, if you do like it make a new node anyways.
  • E.g. the server admin can choose not to allow certain moderators on their node, and can also choose to neglect maintenance of the node.
  • The responsibility thus falls on the cabal to remove absentee or noncompliant server admins... see mastodon to observe how this works in practice
  • server admins are assumed to be a cabal that works together, not one whose members plot against one another
  • This is similar to Mastodon, whereby the nodes have to agree on who to federate with. However, often the result is that the mods end up with the same political alignment...
  • Thus, we have to consider the federation cabal as more of a formal organization than a freely joinable network like email (disagree in principle: it should be agile and open but not easy to spam; in practice this may be required)
  • However, the Bibliotheca Anonoma and the archivers already follow such an arrangement, where we own both desuarchive and rbt.asia and have agreements with 4plebs
  • globally agreed hard limit on content, i.e. no cp/dox, if you do not follow the hard limit or you peer with someone that doesn't we will not peer with you.
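
The two moderator scopes described in the list (global, or an admin-maintained set of newsgroups) could be modeled as below. `Scope`, `Permissions`, and `CanModerate` are hypothetical names for illustration. How a real node stores this is not described on this page.

```go
package main

import "fmt"

// Scope describes what one moderator key may act on: either everything, or
// an explicit set of newsgroups the admin listed by hand.
type Scope struct {
	Global bool
	Groups map[string]bool // e.g. "overchan.test"
}

// Permissions maps a moderator's hex public key to its scope. Only the
// server admin edits this map; moderators cannot grant themselves anything.
type Permissions map[string]Scope

// CanModerate reports whether key may act on an article in newsgroup.
func (p Permissions) CanModerate(key, newsgroup string) bool {
	s, ok := p[key]
	if !ok {
		return false // not a moderator on this box at all
	}
	return s.Global || s.Groups[newsgroup]
}

func main() {
	perms := Permissions{
		"aa11": {Global: true},
		"bb22": {Groups: map[string]bool{"overchan.test": true, "overchan.tech": true}},
	}
	fmt.Println(perms.CanModerate("aa11", "overchan.random")) // true: global mod
	fmt.Println(perms.CanModerate("bb22", "overchan.tech"))   // true: listed board
	fmt.Println(perms.CanModerate("bb22", "overchan.random")) // false: not listed
}
```

Because each node keeps its own map, "server admin is god on his box" falls out naturally. The same key can be global on one node, per-board on another, and absent on a third.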

Moderation needs: Text vs Images

  • Images - Require strong moderation to keep CP out, due to its contraband status across the world.
  • This is conducive to formal organizations of moderator cabals rather than free form nodes, since we would have to kick out noncompliant or absentee nodes that allow CP to enter anyway...
  • Moderation policies or filters will have to be delineated based on where their servers are hosted.
  • On Mastodon, due to differing legal and cultural regulations, Pawoo and related Japanese artist community nodes were blocked by English language nodes despite being the largest communities that use Mastodon.
  • http://www.ethanzuckerman.com/blog/2017/08/18/mastodon-is-big-in-japan-the-reason-why-is-uncomfortable/
  • E.g. Russia makes homosexual and loli content illegal, but Japan allows both, yet bans uncensored porn.
  • The Bibliotheca Anonoma and its federation will pursue a policy similar to that of 4chan, under the laws of the state of California: no CP, DMCA removal on request, only spam hiding and no other text deletion.
  • Text - Allows for much looser regulation since there are very few situations where text is illegal in the US
  • Probably the only thing we need to do is moderate the removal of spam. Spam text could be hidden for a retention period of a few months and moved to a different table to reduce clogging (note: a clear, explicit, and concise definition of what spam is and is not must be provided, otherwise people will complain)
  • The First Amendment bars government censorship of political expression, although corporations can still moderate the content of their discussions and ban members at will
  • Hate speech and incitement fall under a very narrowly defined clause, where a place and time must be specified
  • The one possibility where text could be considered contraband is in the very narrow clause where 1. human trafficking is involved and 2. the website was built for that purpose or the admins willingly facilitate it, such as at Backpage

All posts on this site are the responsibility of the individual poster and not the administration, pursuant to 47 U.S.C. § 230. To make a DMCA request or report illegal content, please contact the administration.

Maybe a workaround is to use imgur, or to have an oligarchical moderation cabal only for publicly viewable images? (e.g. only display images from certain trusted, moderated sources on the public net, but allow any source on tor) (how would CORS work?)

spam definitions

  • Floods of random/repeated text - when it's designed to reduce the quality or reliability of the network, or is the result of malware (corneila's spam of 4chan post caches)
  • garbled text posting - if it carries no meaning, e.g. "132dh329d2-id2-334325{PL{PL{P:", it has no place in a discussion, unless it's creative and interesting, which is entirely subjective
  • unwarranted advertisements of off topic sites - think random anontalk linkspam on 4chan, not recommendations of websites in a thread

thoughts on above: how can mods determine the intent of a poster? :p

Spam Detection without IP Bans

The problem on NNTPChan is similar to that of email: since IP bans are difficult, heuristic spam solutions have had to be made.

what we have inside tor is probably the best you can get
federated inside tor
servers don't see ips, users can't find servers, servers can't see servers
the biggest drawback is no ip bans
but we've dealt with it for long enough that we're used to it
antonizoon
so how do you handle that
__uguu__ (IRC)
brute force man power
:p
no other way
antonizoon
how about data mining
such as the already prevalent spam detection systems
for email
__uguu__ (IRC)
how do you see that working?
antonizoon
well it works for email spam
with 1% false positive
it's already a very mature field at least in that narrow focus
__uguu__ (IRC)
i was legit considering just bolting on spamassassin
antonizoon
what stopped you
__uguu__ (IRC)
just having to do that
and then training it
mostly training
antonizoon
well hey
we got tons of 4chan archived data uploaded to the internet archive to form a dataset
__uguu__ (IRC)
ohay
nice
antonizoon
that can be used to establish expected behavior
__uguu__ (IRC)
that would work
for ham at least
what about for spam?
antonizoon
https://archive.org/details/archive-moe-files-a
so at least these sort of past discussions can be used to establish "expected" behavior
as for spam how applicable would existing detection sets be for your use case
they usually detect stuff like floods of random trash or nigerian scam emails
and they might be a bit tuned for emails, but there must be precedent for their application in discussions
__uguu__ (IRC)
we use the same format as email
multipart mime
antonizoon
perfect
__uguu__ (IRC)
that is why i thought of it
i'll check out what hooks SA has
found a go library 
this will be simple
possibly

Spamassassin

uguu is testing SpamAssassin on their NNTPChan nodes. Email spam filtering is a good fit because NNTPChan articles use multipart MIME, the same format as modern email.

  • If it's detected as spam, it doesn't go into a newsgroup or federate.
  • it also checks for dkim, but NNTPChan doesn't use DKIM, so disable this option.
  • Some types of writing such as Cyrillic will be detected as spam by default, so make sure to allow non-latin charsets.
  • it might be possible to integrate it into the mod system so that moderators can train the system to detect manually discovered examples of spam, or review detected spam. This will likely reduce the majority of spam and thus moderator time used processing them, and thus make it easier for people to run their own nodes. Note that DMCA and CP content will still need to be discovered manually and reviewed on a case by case basis.

current federation peering policies

as of nov 2 2017 there are 2 modes of operation in post sync: push and pull. push is immediate but has some realistic technical limits related to throughput; filtering is possible but rather inefficient. push uses TAKETHIS message-id@server<CR><LF> followed by the full multipart mime message, terminated by <CR><LF>.<CR><LF>. after the entire article is received, the server replies with an accepted/deferred/rejected status line.

pull is periodic and parallelizable, and allows the recipient to choose what to download and when. this mode of operation is high latency but preferred, as it slows the post rate down to a nice slow pace.

for archival it makes sense to have a full mesh topology for peering, so that posts propagate immediately without middle nodes routing them. since all archivers trust each other, this is also probably okay. that is, all servers talk to all other servers directly, without a middle man. this won't scale to larger numbers of servers, but for current purposes it is probably fine.
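
The scaling limit is easy to quantify. In a full mesh of n servers, each server keeps n - 1 peerings, so the number of distinct links grows quadratically:

```latex
L(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
L(5) = 10, \quad L(20) = 190, \quad L(100) = 4950
```

A handful of archivers is cheap, but a hundred open nodes would mean thousands of peering pairs, each with its own certificate exchange.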

to peer with another server, each side needs to exchange tls certificates. right now the tls certs are able to sign subkeys, but that functionality isn't used at the moment; effectively, treat the tls cert as a public key block. tls uses a fixed cipher suite, which will probably be upgraded to chacha20poly1305 for performance reasons. right now it's RSA 4096, AES-256-GCM, SHA-384 (?)

nntp peering will not work behind a reverse proxy cdn like CF (especially if you use their HTTPS certificates); it must go over a tor hidden service, a direct ip, or a vpn. (wouldn't peering be fine if server-to-server communication came out of the server directly, and only the frontend was behind cloudflare?) (yes, the main caveat is that the dns entries would be mismatched and tls verification would fail unless you use /etc/hosts or similar.) nntp tls certs are pinned and do not use any 3rd party CA; in a way, treat them as gpg pubkeys.
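
Pinning instead of CA verification can be sketched with Go's crypto/tls. The sketch turns off CA and hostname checks and adds a `VerifyPeerCertificate` callback that accepts only a leaf certificate whose SHA-256 fingerprint matches a value exchanged out of band. This is a sketch under that assumption, not NNTPChan's peering code. `Fingerprint`, `PinnedConfig`, and the peer address in the comment are hypothetical. Because the sketch never checks the hostname, mismatched DNS entries would not affect it.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"crypto/x509"
	"encoding/hex"
	"errors"
	"fmt"
)

// Fingerprint returns the hex SHA-256 of a DER-encoded certificate: the value
// two node operators would swap out of band, like exchanging gpg key ids.
func Fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// PinnedConfig returns a client TLS config that trusts no CA and accepts only
// a peer whose leaf certificate has the given fingerprint.
func PinnedConfig(pin string) *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		// Turn off CA and hostname verification; the callback below does
		// the only check that matters for a pinned key.
		InsecureSkipVerify: true,
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			if len(rawCerts) == 0 {
				return errors.New("peer presented no certificate")
			}
			if Fingerprint(rawCerts[0]) != pin {
				return errors.New("peer certificate does not match pin")
			}
			return nil
		},
	}
}

func main() {
	// In a node: conn, err := tls.Dial("tcp", peerAddr, PinnedConfig(pin)),
	// where peerAddr and pin come from the peering exchange (hypothetical).
	der := []byte("stand-in for a DER certificate")
	cfg := PinnedConfig(Fingerprint(der))
	fmt.Println(cfg.VerifyPeerCertificate([][]byte{der}, nil))                    // <nil>
	fmt.Println(cfg.VerifyPeerCertificate([][]byte{[]byte("someone else")}, nil)) // mismatch error
}
```

Leaving `InsecureSkipVerify` on without the callback would accept any certificate, so the callback is the whole security story here.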

URL Scheme

all posts are currently addressable via hexencode(sha1(message-id))

i.e. [email protected] is post 12d6aed8880b33f42cf22888dc0e29d3851a0a08

if you reference a reply, it's accessed via the OP's thread with a url fragment containing the post id:

  • the op: http://21.3.37.5/t/12d6aed8880b33f42cf22888dc0e29d3851a0a08/#12d6aed8880b33f42cf22888dc0e29d3851a0a08
  • a reply to the op: http://21.3.37.5/t/12d6aed8880b33f42cf22888dc0e29d3851a0a08/#15ae678050097629f3dfab5a79f9758a949932b3
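
Both the post id and the in-thread link can be computed as below. The sketch assumes the hash is taken over the Message-ID string exactly as given; whether the angle brackets are included is not stated on this page. `PostID` and `ThreadURL` are hypothetical helper names.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// PostID returns hexencode(sha1(message-id)). The string is hashed exactly as
// passed in; whether NNTPChan includes the angle brackets is an assumption.
func PostID(messageID string) string {
	sum := sha1.Sum([]byte(messageID))
	return hex.EncodeToString(sum[:])
}

// ThreadURL links to a post inside its thread: <base>/t/<op id>/#<post id>.
// For the OP itself, both ids are the same.
func ThreadURL(base, opMessageID, postMessageID string) string {
	return fmt.Sprintf("%s/t/%s/#%s", base, PostID(opMessageID), PostID(postMessageID))
}

func main() {
	fmt.Println(PostID("abc")) // a9993e364706816aba3e25717850c26c9cd0d89d
	op := "<op@example.tld>"    // hypothetical message-ids
	reply := "<reply@example.tld>"
	fmt.Println(ThreadURL("http://21.3.37.5", op, op))
	fmt.Println(ThreadURL("http://21.3.37.5", op, reply))
}
```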

currently there is no formally predefined url scheme, but in-network citations to posts use a truncated message-id hash

12d6aed888 would point to the post with that hash prefix that is nearest to it in keyspace. in the event of a collision the newest message SHOULD be used. 12d6aed8880b33f42cf22888dc0e29d3851a0a08 12d6aed8880b33f42cf228 12d6aed8880b33f4 all refer to the same post given there are no hash collisions.

board cites would be: >>>/overchan.board.name.here/

board urls are typically referenced via https://site.tld/b/overchan.board.name.here/

  • board pages are /b/overchan.board.name.here/pageno/
  • json endpoints: for all pages, append /json to the end of the url path
  • rss endpoints: for all pages, append /rss.xml to the end of the url path
  • threads are accessed via /t/opmessageidhash
  • overboard at /o/, paginated at /o/pageno/

posting is done via an http POST to /post/overchan.board.name.here; a captcha is required on all posting endpoints
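
The route layout can be captured in a few helpers. `Routes`, `JSON`, and `RSS` are hypothetical. Reading "append /json to the end of the url path" as replacing any trailing slash with `/json` is our assumption. The captcha on `/post/` is not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// Routes builds frontend URLs; Base is the site root without a trailing
// slash, e.g. "https://site.tld".
type Routes struct{ Base string }

func (r Routes) Board(group string) string { return fmt.Sprintf("%s/b/%s/", r.Base, group) }

func (r Routes) BoardPage(group string, page int) string {
	return fmt.Sprintf("%s/b/%s/%d/", r.Base, group, page)
}

func (r Routes) Thread(opHash string) string { return fmt.Sprintf("%s/t/%s/", r.Base, opHash) }

func (r Routes) Overboard() string { return r.Base + "/o/" }

func (r Routes) OverboardPage(page int) string { return fmt.Sprintf("%s/o/%d/", r.Base, page) }

// Post is the POST target for new posts on a board.
func (r Routes) Post(group string) string { return fmt.Sprintf("%s/post/%s", r.Base, group) }

// JSON and RSS turn any page URL into its feed endpoint.
func JSON(pageURL string) string { return strings.TrimSuffix(pageURL, "/") + "/json" }

func RSS(pageURL string) string { return strings.TrimSuffix(pageURL, "/") + "/rss.xml" }

func main() {
	r := Routes{Base: "https://site.tld"}
	fmt.Println(r.Board("overchan.test"))              // https://site.tld/b/overchan.test/
	fmt.Println(JSON(r.BoardPage("overchan.test", 2))) // https://site.tld/b/overchan.test/2/json
	fmt.Println(RSS(r.Overboard()))                    // https://site.tld/o/rss.xml
}
```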

note that post cites can go to another board without notice so cross board post citing is not specified, it is not applicable

base32encode(blake2b(message-id)) would be the new post cite scheme if any upgrades to post cite algorithm were implemented and deployed

Case Studies

Slamspeech.ano - Textboard only NNTPChan

Since text faces significantly fewer regulatory restrictions than images, which can end up being contraband, a textboard can be a viable way to allow freer federation than image-heavy boards would.

This site was a textboard that dropped attachments off posts for display from places it federated with, significantly reducing the moderation burden. It eventually disappeared though. (please come back lulzprincess we miss you ;~;)

https://wiki.bibanon.org/4chan/Successor