Viking 7B: open LLM for the Nordic languages trained on AMD GPUs

(silo.ai)

113 points | by reqo 72 days ago

13 comments

jug 72 days ago
If you're interested in this, don't miss AI Sweden's GPT-SW3 @ 126M to 40B trained on Nordic languages (not Finnish) and English. It's funded by the Swedish government and partners, and freely available with a pretty lively Discord for ongoing AI research focusing on the Nordic languages. I think Viking is called "first" because it includes Finnish, because otherwise, GPT-SW3 was released earlier.
https://huggingface.co/AI-Sweden-Models
- lostmsu 72 days ago
  Why do they do training from scratch instead of starting off LLAMA 3 or something else?
smokracek 72 days ago
First thing I notice is that Finnish is part of a completely different language family from the other Nordic languages and English (Uralic vs. Indo-European). I wonder to what extent this affects the effectiveness of their low-resource training. Finnish is highly agglutinative, adding prefixes and suffixes to modify a root. My (amateur) take is that the tokenization and attention patterns may differ a lot? Would love to see more educated people than I discuss this.
- ghnws 72 days ago
  Then again, the culture of Finland is very similar to that of the other Nordics, which looks to be one of the reasons for the project.
- sandworm101 72 days ago
  >> to what extent this affects the effectiveness of
  The correct use of those words demonstrates that you are either not an AI, all of them being trained on so much bad language, or are an AI from a more perfect future.
- anewhnaccount3 72 days ago
  Finnish is not so different despite having a different lineage. Even if we talk about morphology, sometimes it's simply that e.g. prepositions are affixed to the end of a word; big whoop. There are many dimensions to language variation. Finnish has a long history of contact with Scandi languages and a lot of borrowed words and logic. It would be good to have Estonian and possibly Baltic languages too.
  ETA: It is different, of course, just perhaps not as much as people sometimes try to say. You can definitely ruffle some feathers with this one, given that the uniqueness of Finnish is pretty central to Finnish nationalism.
  - Tor3 72 days ago
    As someone who grew up in relatively close contact with Finnish, I can assure you that there's no real common ground between the Scandinavian languages (Swedish, Danish, Norwegian) and Finnish. There are loan words, but they are few and far between and in any case do not make for any mutual understanding. I've been to Finland so much that I would really like to learn to at least understand the language, instead of relying on memorized names of foodstuffs and the like. Just have to tackle Japanese first... (and I consider that one an easier operation)
    - TrackerFF 72 days ago
      We do have one small language, the Kven language (https://en.wikipedia.org/wiki/Kven_language), which is a sort of "Finnish structure, but with lots of borrowed Norwegian words". But for all intents and purposes, it very much sounds like Finnish.
      It basically sounds like the language of a Finnish person who has lived their whole life in Norway and then starts to mix words because they forgot the Finnish ones.
      But that's about it. I know there are some other dialects, too, but these are all very small-scale languages that are either extinct or will be extinct in some decades.
      Much easier for Finns to learn Swedish than for Swedes to learn Finnish, IMO. I speak Norwegian and Finnish (lived in Finland when I was young).
    - anewhnaccount3 71 days ago
      Loan words from Scandi are more common than you think. E.g. "Hei" is a common greeting, "tykätä" is a common verb. For nouns there is even a whole paradigm for loans, a large number of which are Scandi. They are not necessarily easy to recognise since they undergo sound changes, e.g. plaasteri becomes laasteri.
      - akx 71 days ago
        Loan words, yes, but that has very little to do with the grammar and structure of the language. "Jag tycker om dig" [sv] translates to "Tykkään sinusta" [fi], which isn't anywhere near the Scandic.
        Also, it's "laastari", not "laasteri", so uh.
          - anewhnaccount3 58 days ago
            Oh well, if I made a spelling mistake that obviously invalidates my whole point. Thank you for teaching me that Finnish is a very special language -- just like the Finns -- such an amazing and unique people ;)
larodi 72 days ago
The fact that it was trained on an HPC system whose waste heat covers about 20% of a city's district heating is absolutely wild, and on par with how wild it is to have an English/Nordic model.
“Further emphasizing digital sovereignty, Viking is trained on the EuroHPC supercomputer LUMI, utilizing up to 4096 AMD MI-250X GPUs. LUMI is not only Europe’s most powerful supercomputer and the 5th most powerful in the world, but also the 3rd greenest supercomputer among the top 500 supercomputers. LUMI’s energy consumption is covered with power produced 100% with hydroelectricity, and the waste heat of LUMI will account for about 20 percent of the district heating in the surrounding city of Kajaani.”
ganzuul 72 days ago
Great talking points. These are highly relevant subjects and I'm delighted we in the Nordics are keeping up with current developments. This work is important for preserving our culture.
I hope to see this used to generate a customized curriculum for each neurodiverse child so that we can live in a more equitable society.
- nwoli 72 days ago
  No offense but this reply is giving me such “generated by an LLM” vibes, I’m curious if it is
  - ganzuul 72 days ago
    Who knows? Maybe I am an AI set to break encryption and I'm just hallucinating this, a training environment.
bangaladore 72 days ago
I have had this question. How much better would common LLMs (Llama, GPT-N) be if they were only trained on one language? I have to assume they would perform better, but I might be wrong.
- coffeebeqn 72 days ago
  Perform better how? Knowing more languages gives you more data and different points of view rather than just using the English corpus and culture. When I ask ChatGPT for a translation, it seems to understand the meaning behind the words and finds the closest thing in the other language. The datasets seem to merge in some way.
  - ClarityJones 72 days ago
    Fair, but there may be overhead that doesn't need to exist. Certainly - for the limited compute my brain can accomplish - I could gain a deeper understanding of physics, if I focused on learning physics and didn't also have to simultaneously learn French.
    - staticman2 72 days ago
      Wouldn't a better metaphor be if a child growing up in a bilingual household would be worse at physics as an adult? My guess would be growing up bilingual would have no impact.
      - fikama 71 days ago
        This hypothetical kid would have the same size of brain/number of neurons anyway. In the case of LLMs, one could create a model that is smaller thanks to not including knowledge about unnecessary languages. A problem, though, could be the lack of training data that exists only in other languages.
    - olddustytrail 72 days ago
      In the short term. In the longer term you'll understand concepts better when you're multilingual.
    - NhanH 72 days ago
      Humans are not limited by the computational power of the brain (or rather, it is not the limitation we encounter). We are limited by time and the fact that our machinery degrades with time (aging).
- fermuch 72 days ago
  Just like adding code to textual models helps the model develop its reasoning capabilities, it seems like adding more languages helps in other areas too. What is needed is more good quality data to train on...
  - nickpsecurity 72 days ago
    We also see humans get worse at specific things when they learn too much in general. There is a cut-off point to how many concepts we can learn with what skill. To be most effective, we have to specialize in the right things while continuing to acquire generalist knowledge. It’s a balancing act.
    These architectures are less capable than brains in many ways. So, we should expect them to have such trade-offs. An efficient one should work fine on English, mathematical notation, and a programming language. Maybe samples of others that illustrate unique concepts. I’m also curious how many languages or concepts you can add to a given architecture before its effectiveness starts dropping.
  - worldsayshi 72 days ago
    I guess you mean non-textual data then, because the amount of text data they are being trained on ought to be enough for AGI by now?
    Some kind of diminishing returns asymptote from text volume alone must have been hit a long time ago.
    - imtringued 72 days ago
      It's not the amount that is wrong, it's how the model is trained. The model is trained for zero and few shot tasks. It is not surprising that it is performing well when you ask for that.
  - darby_eight 72 days ago
    > its reasoning capabilities
    To be clear, LLMs are not capable of reasoning.
    - whimsicalism 72 days ago
      imo this is an uninteresting debate over semantics/metaphysics
      - ganzuul 72 days ago
        Would you say a deontologist reasons? Evolution survives, but does it reason?
        Is it reasonable to show interest in something you call uninteresting?
        Was Gödel a reasonable man, starving to death in fear of being poisoned?
- richdougherty 72 days ago
  I can't track down the citation (either Google or DeepMind, I think), but I remember reading research from a year or two ago on how adding extra languages (French, German) improved English language performance. There may have also been an investigation into multimodality, which found that adding vision or audio helped with text as well.
- phlip9 72 days ago
  Interesting thought. Maybe an LLM would build deeper insight with only one training language. On the other hand, the model might overfit with just one language -- maybe multilingual models generalize better?
- whimsicalism 72 days ago
  they would perform worse, i promise you
  - ClarityJones 72 days ago
    I think this makes sense to the extent that an understanding of the differences between languages helps separate out language from the underlying meaning. However... the models that are used to receive input (i.e. translate from language), to learn / understand, and to output information (i.e. re-encode into language) do not all have to be the same.
  - rangerelf 72 days ago
    "I promise you"?
    This is Hackernews, I would have expected data, not promises.
matsemann 72 days ago
Would an LLM trained on a smaller language have better cultural awareness etc than one trained in English? Because English is written all over the world by all kinds of people, an English LLM will average that (and for instance feel a bit off for an American). But a Norwegian LLM for instance, trained on a language mostly written by Norwegians, would that feel more natural to me in comparison?
jarbus 72 days ago
Would love to know more about their experience training on AMD GPUs. Was it just as seamless as using Cuda?
- ganzuul 72 days ago
  > To leverage the capabilities of MI250X, ROCm enables the use of GPU matrix cores through its rocBLAS and MIOpen library implementations that, in turn, are leveraged by PyTorch.
  - https://aclanthology.org/2023.emnlp-main.164.pdf
  https://github.com/TurkuNLP/
- imtringued 72 days ago
  They probably got a lot of hand holding from AMD.
  - KeplerBoy 72 days ago
    Having access to enterprise GPUs on one of the biggest HPC systems in Europe is probably enough.
    AMD's bad rep in AI is mostly due to flaky support of its consumer GPUs.
Bedon292 72 days ago
I cannot seem to find a link to the actual model from this page or anywhere on the website. This appears to be it: https://huggingface.co/LumiOpen/Viking-7B
halgir 72 days ago
> extends to include Danish, Finnish, Norwegian, Icelandic, Swedish
* cries in Faroese *
dmichulke 72 days ago
Is there something similar for Romance or Germanic languages?
And how did they decide that, e.g., German or Dutch would make the model worse?
- frodo8sam 72 days ago
  I don't think they decided that; they included Finnish, which is completely unrelated to the other Nordic languages. If they had just picked related languages for cross-learning, including Dutch or German would indeed have made more sense.
  - chymist 72 days ago
    The company is based in Finland, they started with Finnish
    - frodo8sam 72 days ago
      I understand. I'm not saying they did something wrong, just pointing out that the languages were selected not because they belong to the same family but rather to serve a certain region.
  - coffeebeqn 72 days ago
    The root is unrelated but Finnish has certainly been shaped by Swedish and Russian and most recently English languages in the last 200 years
  - Jensson 72 days ago
    Including Finnish was probably just a political choice, since Finland and Sweden are very close politically, much closer than Germany or other areas with more similar languages.
    - Asraelite 72 days ago
      This was done by a Finnish company and university. They would've included Finnish even without any political motivation.
- KeplerBoy 72 days ago
  The Nordic languages are Germanic except for Finnish, but yeah, Finnish is an exception and I'd expect most small LLMs to struggle with it.
melenaboija 72 days ago
Although it is not Nordic, they are also not including Basque, which I guess could also be considered a European low-resource language.
- ghnws 72 days ago
  I got the impression they are focusing on the Nordic culture as much as the languages.
  > Silo AI and TurkuNLP are dedicated to developing models that not only excel in linguistic performance and inclusivity but are also attuned to local values and cultures.
ChrisArchitect 72 days ago
double slash in the shared link probably not ideal (though inconsequential)
https://www.silo.ai/blog/viking-7b-the-first-open-llm-for-th...