Learning Languages While Being Privacy-Conscious

Let’s start with a bit of background. I am currently trilingual, attempting to go quad-lingual. My native tongue is Polish, and my second language, the one I’ve been learning for the longest time, is English. It’s been part of the curriculum from as early as the 1st grade of elementary school. In school I picked up the grammar and a good bit of vocabulary, but I didn’t care much for it or use it. Where I really broke through was when I had things to do online that weren’t available in Polish. Playing games and watching videos, I had to put the stuff I’d learned into practice. I remember how, around 2011, many of the kids in class would proudly say words like “fancy” and “charcoal”, which we learned from playing Minecraft. Over the years, I developed my writing, reading and listening very well, but my speaking was underwhelming until I got a group of international friends to chat and play with.

While in school, I also had some German and Russian. I retained some extremely basic vocabulary (Auf Wiedersehen, здравствуйте, etc.) and learned to read the Cyrillic alphabet, but due to lack of practice or interest I never developed past that. Then came Swedish, a language which I never heard uttered until I was around 12, and which I soon after had to start using on a daily basis. I learned this one in school, both during Swedish classes and during other classes, which were also conducted in Swedish. Unlike with English, I didn’t get to ignore speaking, as it was a necessary part of everyday life. After some years, I ended up being a fluent speaker. I’m far from perfect – especially in casual speech with slang and such – but I understand it well enough to listen to podcasts, write documents, read books and talk to others.

Lastly, there’s Japanese. It’s a language I’ve been sort of getting interested in learning for a little over half a year, mostly due to manga and anime (a bit cliché, but hey, whether it’s out of necessity or for pastime entertainment, learning is still learning). More specifically, there’s lots of content that I like that is sparsely (if at all) available in English, so if I ever want to see it translated, the best bet is to do it myself. While I’ve had the interest for some time, I hadn’t done much about it until roughly 2 months ago. That’s when I started learning Hiragana, and by now I also know Katakana, some Kanji and a bit of basic grammar, which lets me construct basic sentences (おはよう、私の名前はチルノ!見て! 日本語です!). Most of the text in the next section focuses on how I’m learning this language in a privacy-respecting manner.

Imagine that you want to learn a language, but you also care about your privacy. These two don’t seem very connected, but there is in fact a fair bit of overlap. Take Duolingo, for example. It’s a very popular service (supposedly) for learning new languages. Looking at its privacy policy, it collects things like mouse movements, IP address, viewfinder size, and referral links. This information, on top of being used by Duolingo, is shared with third parties like Google and Facebook, which anyone invested in privacy knows to avoid unless necessary.

A more traditional approach would be to go to school. However, that becomes a more problematic prospect with each passing year. With the pandemic encouraging distance learning, many classrooms have moved on to digital spaces like Zoom meetings and Microsoft Teams groups, which introduce their own layers of privacy-invasiveness. Even before that, many a teacher would be keen on using services like Padlet, which also collect arguably-useless data (signal strength…?) and often share it with the likes of Google.

Besides, not everyone has the time, the ability or the funds to get into a “real” school. So let’s shift our attention back to individualised learning, and see how we can replace services like Duolingo, YouTube and a variety of apps with more privacy-friendly options.

Well… alright. The above introduction makes this post seem like a jack-of-all-trades, but it actually isn’t. What services are available to you depends on what languages you know and what languages you want to learn. I cannot possibly know all the combinations (how would a Vietnamese person learn Arabic? I have no idea), so instead I will simply share examples of what I do in my own studies, as possible inspiration for others.

I Don't Need Big Data to be a Weeb

Learning a language comes in two parts – grammar and vocabulary. To get the hang of grammar, arguably the best way is to just get a book, ideally a physical one. Books have the advantage of being literally just some paper, which you can’t stuff full of fingerprinting JavaScript. When getting a physical book, you should consider going to a local book store and paying with cash. If you live in a smaller city, where the book store might not have whatever you’re looking for, consider contacting them first. I did so, and it turned out I can just request some books to come with the next stock delivery, which I can then go and pick up. In my case, the book I’m using to learn Japanese is called Genki. I found it by looking at MIT OpenCourseWare, which has a bunch of free lesson guidelines for Japanese, and uses that book as the base. I personally don’t use MIT’s guidelines, but they’re there if you’d like to follow them yourself. Unfortunately, my book is in a digital format (the physical copies cost way too much), but I bring it into the physical world by printing out the workbook as I get through it, which is a nice way to spend a bit less time glued to a monitor.

Vocabulary is the most individualised part of learning, as the kind of words you learn and retain are entirely dependent on your interests. I don’t treat it as a thing you sit down to learn, but as something that will just come to you with time, as you do other things. But regardless of your interests, one thing’s for certain – you’ll need a dictionary every once in a while. For Japanese, I found Jisho to be a fantastic resource (the website’s name, 辞書, literally means dictionary). It is free and open source, released under the CC-BY-SA license. To my knowledge, the website collects no data at all. It does feature code from Google (mostly in the form of Google Analytics), but you can disable that with uBlock Origin, and everything will keep working fine. It is functional in GNU IceCat, meaning you can use it without any JavaScript. Alternatively, though I haven’t used it personally, you could try using Wiktionary. As part of the Wikimedia Foundation, it has a somewhat extensive privacy policy that I haven’t dug through, but – like with Jisho – you can use it with Tor and with no JavaScript, which reduces possible tracking to a minimum. Jisho has the advantage over Wiktionary in that, instead of individual words, you can input entire sentences and it’ll translate each word individually (so no worries over machine translation being bad), while also showing you the sentence structure – what’s a verb, what’s a noun, what’s a particle, etc.

Unlike learning languages like German or French, learning Japanese requires the use of a non-Latin writing system, or rather three of them: Hiragana, Katakana and Kanji. Regardless of what you want to use the language for, you’ll need to learn all three (the third only to some extent) sooner or later, which means a whole lot of remembering. For that, I use the excellent mobile app called Kakugo. You can get it on F-Droid, and it is free and open source. It has everything I need – hiragana to romaji, katakana to romaji, kanji to meaning, drawing practice, etc. – and it lets me adjust what gets shown with respect to my current skill level. Mobile apps are something I’m often wary of, but this one, besides being FOSS, never connects to the internet, and thus no meaningful tracking can be done. And it’s worth having it on the phone, as it lets me practise the language in spare moments throughout the day.

That’s that for learning a language, but unless I want it to end up like my German and Russian, I ought to practise it as well. Practising a language comes in four parts – reading, writing, listening and speaking. Reading and listening can be done entirely through just doing things you enjoy – maybe it’s watching movies, maybe it’s reading stories or whatever, so there’s no specific privacy-saving advice I can give. Writing is not something you need any internet for – just a pen and a piece of paper is enough to write, as long as you can find something to write about. Speaking is definitely the hardest part to do on your own, but I personally don’t care much for it, so it doesn’t matter much to me if I can’t speak very well. The nice thing is that, if something’s important to you, you likely already have ways to practise it. For me, the most important part of Japanese is reading, because there’s lots of stuff I want to read, which I can consequently use for practice. If you just moved to Japan, the most important part might be speaking, but then it’s no problem finding practice – just go out and talk to people.

Lastly, there’s Google Translate. I know, I know, it’s about as far from privacy-respecting as you can go, but it has some features that are really tough to beat. Multiple rows of the input field can be surprisingly useful (Jisho only gives you one line to type in), and instant translation lets me detect and correct mistakes as I go much better than a dictionary would. Besides, I need something to make sure what I'm writing makes sense. If I want to know whether the sentence 二十三時に寝ます means what I think it means (“[I] go to sleep at 23:00”), I cannot use a dictionary. It’ll tell me that 二十三 means 23, that 寝ます is the present affirmative form of the verb “to sleep”, and so on, but I already know that – that’s the knowledge I needed to construct the sentence, to begin with. What I need to know is whether the sentence as a whole is correct. If I cannot rely on a school teacher or a friend to check for errors, a machine translator will have to do.

To minimize the problems associated with using Google services, I access the site exclusively through Tor, and I have no account to associate traffic with. I also caught wind of projects like the Simple Web Translator, made by Metalune, which is a front-end for Google Translate, parsing requests and getting back results without any use of JavaScript. From the little testing that I did, it seems to work well enough, so I might try using it more in the future.

A Bit of Surströmming to Finish Things Off

(I never ate surströmming. I bought it once but I was too intimidated to eat it, so it just stayed in the fridge until the expiration date went by and I had to throw it out for some (un)fortunate recycling worker to enjoy… Imagine, a fish that’s rotten twice over. How does that even work?)

Although I am fluent in Swedish and English, I sometimes still need to use a dictionary or a translator. In those cases, instead of using tracking-laden sites like Google Translate or Merriam-Webster (the latter of which blocks you if you use Tor), I use Folkets Lexikon and Wiktionary. Wiktionary is a good general-purpose replacement for Merriam-Webster and other proprietary dictionaries, and it’s especially useful for me for Swedish, because it includes tables of all the different forms that a word can take. The content on Wiktionary is typically licensed under Creative Commons or the GNU Free Documentation License, which I always like to see. Folkets Lexikon, a project from the KTH Royal Institute of Technology, is a very simple yet fully-featured dictionary for translating between Swedish and English, with all the content also under Creative Commons, and it even has links to download the entire databases for free.

Well, that’s about it as far as my experience with learning languages goes. There are of course more things you can do that I haven’t brought up, simply because they’re not too relevant for me. If you use YouTube a lot to listen and learn, you could use any of the good front-ends for it, like FreeTube. If you watch lots of movies, consider getting them from places other than Netflix. If there’s something you do that you’d like to be better for privacy, do yourself a favour and research alternatives. Just because you’re not aware of them now doesn’t mean they do not exist. Lots of people make great content, and if you find it useful, do the world a favour and support the creators.

