2022-10-30

Getting acquainted with Mastodon -- Instances

So Elon Musk had to buy Twitter after all. I took this as an opportunity to look at Mastodon, a decentralised alternative, over the weekend. TL;DR: super!

What I had to understand first was the concept of an “instance”. My first impression is that a Mastodon server is comparable to an email server. Imagine that, in order to write emails, you had to create an account with one single company in the US and agree to its terms of service. Every email you send would go through that company. Not a good idea; that just seems wrong. Fortunately, email works differently: I can create an account with a server of my choice, in Germany for example with Posteo.de, web.de or gmx.de, and I can send my messages anywhere, to any email server in the world.

This decentralised approach now also works for short messages, via Mastodon. I can choose my server, or my “instance”, as it is called here. But my messages can be read by all Mastodon users, no matter which instance they use. I find that convincing.

Which instance is the right one for me? Who offers me a Mastodon account now?

My research this weekend turned up 55 potential providers. I collected these manually; I did not find a central overview of providers. (EDIT: As is sometimes the case, after writing this I found the link to the Fediverse Observer. There is even an API there. I'll take a closer look at that another time.)

Using the Mastodon API, it is possible to retrieve information from any server. This is easily done via the msocial package that I just uploaded to GitHub. It uses the native pipe operator introduced with R 4.1. The remotes package is needed for installation; the packages data.table and knitr are needed for this vignette.

# install.packages("remotes")
# remotes::install_github("kweinert/msocial")
library(msocial)
library(data.table)
library(knitr)

The information is retrieved from the instances as follows.

stats <- get_instances() |>
  lapply(get_instance_stats) |>
  rbindlist()

The code is given mainly for transparency. The result is much more important:

instance registration user_count status_count country_code
https://mastodon.social TRUE 814812 39787754 DE
https://mastodon.cloud TRUE 220222 4424023 US
https://mstdn.social TRUE 76258 5503036 IN
https://fosstodon.org TRUE 24187 1031281 FR
https://mastodon.xyz FALSE 23870 1749182 US
https://mastodon.technology FALSE 23837 1334756 US
https://mstdn.io FALSE 18173 2549736 US
https://qoto.org TRUE 17741 821107 GB
https://social.tchncs.de TRUE 17621 1648232 DE
https://pixelfed.de TRUE 13569 99447 DE
https://octodon.social FALSE 11923 2117274 NA
https://chaos.social FALSE 9198 3025391 IN
https://det.social TRUE 8053 35106 DE
https://mastodon.fun TRUE 7709 761778 US
https://norden.social TRUE 6258 424112 GR
https://hostux.social TRUE 5584 365994 LU
https://meow.social TRUE 5329 620144 GB
https://vis.social TRUE 5181 67908 FR
https://scholar.social FALSE 5144 299049 FR
https://climatejustice.social FALSE 4312 66041 US
https://aus.social TRUE 4209 234313 DE
https://linuxrocks.online TRUE 3845 185353 US
https://social.targaryen.house FALSE 3807 109173 US
https://sueden.social TRUE 2730 14666 US
https://mastodon.partipirate.org TRUE 2417 31210 FR
https://oc.todon.fr TRUE 2377 116484 GB
https://mastodon.gougere.fr TRUE 1809 207780 GB
https://awoo.space FALSE 1743 615982 NL
https://aleph.land TRUE 1542 215486 FR
https://animalliberation.social TRUE 1182 3023 US
https://dresden.network TRUE 1141 46729 NA
https://icosahedron.website FALSE 818 374745 DE
https://im-in.space TRUE 788 65204 DE
https://graz.social TRUE 712 24131 DE
https://maly.io TRUE 653 133470 DE
https://Bonn.social TRUE 597 114461 DE
https://scicomm.xyz TRUE 563 18918 GB
https://xoxo.zone FALSE 497 59844 US
https://oldbytes.space TRUE 382 96653 DE
https://berlin.social TRUE 346 5423 DE
https://functional.cafe TRUE 321 132400 DE
https://social.wxcafe.net TRUE 289 148851 DE
https://fediscience.org TRUE 289 11322 DE
https://mst3k.interlinked.me TRUE 228 343658 CA
https://eupublic.social TRUE 137 19777 US
https://masto.raildecake.fr FALSE 136 5736 FR
https://dads.cool TRUE 84 309546 FR
https://feuerwehr.social TRUE 72 1173 UA
https://eupolicy.social TRUE 65 622 DE
https://mastodon.indie.host TRUE 24 11558 DE
https://mastodon.land TRUE 13 4683 US
https://social.imirhil.fr FALSE 3 47042 GB
https://social.diskseven.com FALSE 1 75273 GB
https://share.elouworld.org FALSE 1 1988 FR
https://social.lkw.tf FALSE 1 6 FR
https://social.ballpointcarrot.net FALSE 1 409 US
https://manx.social TRUE 1 59 CA

The largest instances contain “mastodon” in their address and are rather generic communities. Other names suggest a technical / open source software focus (fosstodon, linuxrocks.online, functional.cafe). There are politically oriented communities (mastodon.partipirate.org, eupublic.social). Other servers have a geographical focus (graz.social, bonn.social, berlin.social, dresden.network, norden.social, sueden.social, aus.social).
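The stats table can of course be sliced further. Here is a small sketch in base R, with a few rows copied from the table above (the same filter would be a one-liner with data.table as well):

```r
# a small excerpt of the results table (rows copied from above)
stats <- data.frame(
  instance = c("https://mastodon.social", "https://fosstodon.org",
               "https://berlin.social", "https://chaos.social"),
  registration = c(TRUE, TRUE, TRUE, FALSE),
  user_count = c(814812L, 24187L, 346L, 9198L)
)

# instances open for registration, largest first
open_instances <- stats[stats$registration, ]
open_instances <- open_instances[order(-open_instances$user_count), ]
open_instances$instance
# [1] "https://mastodon.social" "https://fosstodon.org"   "https://berlin.social"
```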

I find qoto.org interesting; it also integrates other services such as GitLab and offers a group concept.

Some interesting communities are closed, e.g. scholar.social or chaos.social.

The API does not reveal the country an instance is hosted in. To find out the countries, I used the packages iptools and rgeolocate:

get_instance_countrycode <- function(instance) {
    stopifnot(length(instance)==1) # not vectorized
    ip <- iptools::hostname_to_ip(gsub("^https://", "", instance))[[1]]
    if(length(ip)>1) ip <- ip[1]
    if(ip=="Not resolved") return(NA)
    fn <- system.file("extdata","GeoLite2-Country.mmdb", package = "rgeolocate")
    rgeolocate::maxmind(ip, fn)[,"country_code"]
}

I don't really trust the results. For example, aus.social, which presumably is hosted in Australia, is geolocated to DE.

table(stats$country_code)
## 
## CA DE FR GB GR IN LU NL UA US 
##  2 17  9  7  1  2  1  1  1 14

I decided on berlin.social, so the local timeline makes sense for me. I have already found some R users and am now following them. I have also found two interesting toots that will keep me busy in the short to medium term. All in all, a good start.

2022-08-27

Book Review: Siegfried Schrotta (ed.), Systemic Consensus Building (German)

This is an exciting idea, based among other things on measuring resistance to proposals rather than approval of them.

Chapter 2 compares this method with the traditional majority principle. Chapter 4 proposes concrete metrics. Chapter 5 reminds me in parts of Design Thinking. Chapter 19 makes reference to Plato's aporia, a state of perplexity or tension. Chapter 27 has additional, very practical ideas.

I can imagine that today there is even more scientific evidence for the effectiveness of this method. For example, Sarah Brosnan's finding that while we have no inborn sense of fairness, we do have an inborn sense of unfairness when we experience it. 

Book Review: M. Scott Peck, The Different Drum

I have not read the book from cover to cover. But maybe it is a book where you look at the table of contents and pick out one or two chapters.

Community building is not easy and the book helps pave the way. 

It would be nice if there were a newer book; a 35-year-old one sometimes seems a bit dated with its references to the Cold War. Community building is gaining importance again, after Corona etc.


Book Review: Alice Munro, Dear Life / Too Much Happiness

From what I remember, Jonathan Franzen is a fan of Alice Munro. If you don't have enough time to read a novel, read a story by Alice Munro; that is his advice as I understood it. And that is what I did, and enjoyed. A story reads in three hours, ideal for a slow afternoon in the park at the weekend.

There is probably a deep analysis, or even several, of each story. I am not going to attempt to analyse the stories in this review. What I like, what I admire, is how Munro manages to pull me out of the role of reader. The stories touch me.

Some stories I have read several times: "Train", "Dimensions", "In Sight of the Lake".

These books will certainly stay on my shelf and I will pull them out from time to time.

2022-04-18

Play & Analyse Wordle Games

So now I, too, have written an R package with functions that make playing Wordle easy.

English and German Wordle Games are supported.

Installation

You will need the statistical software environment R. See here for installation notes.

To install this github repository, run the following code at the R console:

install.packages("remotes")
library(remotes)
install_github("kweinert/wordlegame")

That's basically it! If you have installed the package tinytest, you can optionally check whether the installation worked:

library(tinytest)
test_package("wordlegame")

Play Wordle

To use the tool while playing Wordle, the following steps are necessary. First, you set up a "knowledge model" that stores all permissible words and, later, the findings from your guessing attempts:

library(wordlegame)
kn <- knowledge("en") # 'de' is also supported

The wordlists of permissible words are taken from github (en, de).

Now you can use this object to output one or more suggestions for your first guess attempt. For this purpose, there is the function suggest_guess, which takes as arguments the knowledge object, the current round (between 1 and 6) and the number of words to be output:

suggest_guess(kn, num_guess=1, n=10)
#[1] "ables" "spire" "rones" "maise" "skean" "sorda" "cries" "tines" "togae"
#[10] "safer"

Wordle gives you feedback on your guess attempt. This feedback can be passed on to the knowledge object. Wordle feedback uses colours that need to be translated into letter codes. There are three codes:

  • green means: the letter is in the correct position. This is to be coded as "t" (true).
  • beige means: the letter occurs, but in a different position. This is to be coded as "p" (position).
  • grey means: the letter does not occur. This is to be coded as "f" (false).
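This colour-to-code translation can also be written as a tiny helper function (a hypothetical convenience, not part of the package):

```r
# hypothetical helper: translate Wordle colour feedback into the
# letter codes expected by learn()
feedback_to_codes <- function(colours) {
  map <- c(green = "t", beige = "p", grey = "f")
  paste(map[colours], collapse = "")
}

feedback_to_codes(c("grey", "beige", "beige", "green", "beige"))
# [1] "fpptf"
```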

So if your guess attempt is e.g. "safer" and the feedback is "grey, beige, beige, green, beige", then this translates into:

kn <- learn(kn, "safer", "fpptf")

and you can use suggest_guess again to get new suggestions:

suggest_guess(kn, num_guess=2, n=10)
# 5 fits: fubar, iftar, friar, filar, flair
#[1] "filar" "flair" "friar" "iftar" "fubar"

And so on.

Some Tricks

Popularity

Many of the words in the word lists are rare. It is plausible to assume that these are unlikely to be the solution. To estimate the popularity of words, the function popularity can be used:

popularity(c("fubar", "filar", "friar", "iftar", "flair"))
#   fubar    filar    friar    iftar    flair 
# 1216001   434000  3212094  2630000 13500000

Here we can see that 'flair' is by far the most popular word and thus a good candidate.
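Picking the most popular candidate from such a result is a one-liner in base R (a sketch using the numbers shown above):

```r
# popularity values as returned above
pop <- c(fubar = 1216001, filar = 434000, friar = 3212094,
         iftar = 2630000, flair = 13500000)

# candidate with the highest popularity
names(which.max(pop))
# [1] "flair"
```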

The idea for the popularity function came from Kework K. Kalustian -- kudos.

Non-Strict Candidates

Sometimes the guessing attempts reduce the permissible words to relatively few words that are, at the same time, quite similar. Here is an example:

kn <- knowledge("en")
kn <- learn(kn, "safer", "fffpf")
kn <- learn(kn, "glide", "ttfft")

In this example, after two guesses, only 6 words are possible: glute, glume, gloze, glebe, globe, glove. Now there is the possibility of choosing one of these words and relying on luck. Or we can strategically choose a word that, while certainly not the solution, effectively narrows down the permissible words. The function suggest_guess has the parameter fitting_only. If this is FALSE, then non-fitting words are also suggested. This allows the second strategy to be implemented:

suggest_guess(kn, num_guess=3, n=10, fitting_only=FALSE)
# [1] "cobza" "bloat" "vocab" "above" "tabun" "novum" "combs" "baton" "embox"
# [10] "bokeh"
kn <- learn(kn, "above", "fptft")
# 1 fits: globe

The parameter fitting_only is only evaluated in rounds 2 to 5. If it is not explicitly set, a heuristic is applied: if there are fewer than 100 permissible words, non-fitting candidates are also included in the consideration; otherwise not.

Evaluating Strategies By Simulations

The most fun is the search for an algorithm that quickly and reliably finds a solution to the puzzles. In my search for a strategy, I came up with four approaches:

  • Probability: Take the words currently allowed and determine which letter/position combinations occur particularly frequently. Then find a word that best fits this probability distribution.
  • Contrasts: Take the currently permissible words and form all two-way combinations from them. For each combination of two, determine the letters that appear in only one of the two words. These so-called contrast letters are good for separating the two words. Now find a word that contains as many contrast letters as possible.
  • Answer entropy: For one word w and the currently allowed words, determine the answer that Wordle would return. These answers form a probability distribution on the space of possible return values, given the word w. Calculate the entropy of these distributions for each admissible word w and take the word with the highest entropy.
  • Full entropy: For each word w and the currently admissible words, determine the answer that Wordle would return. Now additionally determine the allowed words for each possible Wordle pattern. These two pieces of information, frequency of the answer pattern and admissible words, form a probability distribution on the Cartesian product of the answer patterns and the admissible words, given the word w. Calculate the entropy of these distributions for each admissible word w and take the word with the highest entropy.
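The core of the two entropy-based approaches is the entropy of a reply distribution. As a standalone sketch (simplified; not the package's actual implementation):

```r
# Shannon entropy (in bits) of a distribution given observed counts
entropy <- function(counts) {
  p <- counts / sum(counts)
  -sum(p * log2(p))
}

# hypothetical reply patterns Wordle would return for one candidate word,
# one per currently admissible solution word
replies <- c("fpptf", "ttfff", "fpptf", "fffff", "ttfff", "fpptf")

# the more evenly the replies are spread, the more informative the guess
entropy(table(replies))
# about 1.46 bits
```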

As can be seen, the strategy can become arbitrarily complicated. Unfortunately, so can the computation time: the above approaches would take too long for my patience and the computing power available to me. Therefore, I limited the number of permissible words considered to a maximum of 50 (parameter sample_size in suggest_guess).

To see how good the strategies are, there are some helper functions in the package. With sim_wordle, a single game is simulated; with distr_wordle, several games are simulated. The function compare_methods calls distr_wordle for the above methods and returns the result as a data.frame.

Here is the result of 200 simulations for each method except 'full_entropy', which takes too long.

method         n_runs  duration  avg_guess  fails
prob              200     53.87   4.431818     24
contrasts         200     88.15   4.699422     27
reply_entropy     200     66.68   4.469613     19

In my opinion there is much room for improvement. Unfortunately, I no longer have the time.

To invent your own strategies, you need to fork the repository and change the function suggest_guess.

Further Readings

Searching Twitter for "#rstats" and "wordle" reveals a lot of other information on the subject.


2022-03-03

Reviewing my First Shiny Project (1/n) – Buttons

My first serious Shiny project is finished: 4517 lines of code (excluding comments)! Now I’m taking the time to go through the code again and reflect. What has turned out well and can be continued in the future? What are the problem areas that need to be reworked?

I share my thoughts here in a series of articles, also to sort them out for myself and not forget them. This first article is about buttons.

Vertical alignment

Often my button is placed to the right of an input box, slider or similar element. The other input elements usually have a label, the button does not, and it is then vertically shifted:

[Screenshot: vertically shifted button]

On stackoverflow I asked how to fix this and quickly got a solution:

shiny::column(2, 
    shiny::actionButton(ns("dbconnect"), "Connect!"),
    style = "margin-top:25px;" ## <-- !
)

Warnings and Error Messages in Shiny

I want the code that actually does something (hereafter: working code) to be separate from Shiny. That means I don’t want any calls to showNotification in my working code. However, I still want to use the classic signal functions of R – message, warning and stop – in my working functions.

On Stackoverflow I found a solution for this and defined it as the function exec_safely. The function is stored here. It is a decorator in which an expression is executed with tryCatch and withCallingHandlers.
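A minimal sketch of such a decorator (my own reconstruction; the linked code may differ, and the injectable notify argument is my addition for testability):

```r
# decorator: evaluate working code, routing warnings and errors to the UI
# instead of letting them escape; notify defaults to shiny's notification
# function but can be swapped out (e.g. for tests)
exec_safely <- function(session, expr, notify = shiny::showNotification) {
  withCallingHandlers(
    tryCatch(expr, error = function(e) {
      notify(conditionMessage(e), type = "error")
    }),
    warning = function(w) {
      notify(conditionMessage(w), type = "warning")
      invokeRestart("muffleWarning")  # warning handled, do not re-raise
    }
  )
}
```

With this, the working code keeps using plain warning() and stop(); only the decorator knows about Shiny.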

Now when I write an event handler for a button, I use exec_safely:

rv <- shiny::reactiveValues()
shiny::observeEvent(input$btn, exec_safely(session, {
  rv[["log_x"]] <- log(as.numeric(input$x))
}))

The sample app demonstrates the behaviour. To do this, the code at the link above must be read in, e.g. via source("https://pastebin.com/3RURswkd"). This defines both the functions exec_safely and demo_exec_safely. The app is started via demo_exec_safely().

Negative values should produce a warning, non-numeric values an error, and both should be displayed in the web interface.

Plot! Button

In my app there are several tabs that visualise data from a database. The visualisation is parameterised. The parameters either concern the data (e.g. a time period) or are display options (e.g. the font size). Accordingly, the renderPlot function can be understood as a function with two arguments: do_plot(dset, plot_options).

It has proved useful to include a “Plot!” button that recalculates the dset variable:

rv <- shiny::reactiveValues()
shiny::observeEvent(input$btn, {
  rv[["dset"]] <- calc_dset() 
})

The function calc_dset is often a reactive expression that depends on one or more interactive input parameters and is time-consuming.

At the above link, a function demo_confirm_plot is also defined, which demonstrates both the confirmation dialogue and the Plot! button.

Note that the parameter n can be changed without triggering a re-plot; re-plotting only occurs after the “Plot!” button is pressed. This allows the user to set the data parameters at leisure (especially if there are multiple parameters) and to decide when to perform the expensive operation.

If, on the other hand, the parameter ‘col’ is changed, the plot is immediately regenerated, because no data has to be recalculated.

Summary

Regarding the implementation of buttons in my first larger Shiny project, I am satisfied. There are still some places in the code where I proceeded differently than described here; these still need to be adjusted and updated.

Share your thoughts on this article on Twitter!