Trying LINE Corporation's Japanese Language Model with Ruby

Introduction

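I wanted to try LINE Corporation's Japanese language model on my own macOS machine, so, drawing on my experience building Ruby bindings for llama.cpp, I wrote a simple client for GPT-NeoX models.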

Preparing the model

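First, convert LINE's language model to ggml format: clone ggml and run its Python conversion script. This may be specific to my environment, but in addition to the packages in requirements.txt, I also needed protobuf 3.20 (protobuf~=3.20.0).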

$ git clone https://github.com/ggerganov/ggml.git
$ cd ggml
$ pip install -U protobuf~=3.20.0
$ python -m pip install -r requirements.txt

LINE's language models are available on Hugging Face. Since I ultimately wanted to try something chat-like, I fetched the instruction-tuned model.

$ git lfs install
$ git clone https://huggingface.co/line-corporation/japanese-large-lm-3.6b-instruction-sft

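Run the conversion script on the downloaded model. The trailing 1 asks for float16 output (0 would produce float32), and the resulting ggml-model-f16.bin is the model in ggml format.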

$ python examples/gpt-neox/convert-h5-to-ggml.py japanese-large-lm-3.6b-instruction-sft 1
$ ls japanese-large-lm-3.6b-instruction-sft/ggml-model-f16.bin
japanese-large-lm-3.6b-instruction-sft/ggml-model-f16.bin
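To sanity-check the converted file before loading all of it, you can read its header directly. This is a minimal sketch, assuming the layout that ggml's GPT-NeoX example reads at load time: a 32-bit magic 0x67676d6c followed by eight little-endian int32 values in the same order as the gpt_neox_model_load log lines in the next section, with qntvr derived as ftype / 1000. read_gpt_neox_header is a hypothetical helper, not part of GPTNeoXClient.

```ruby
# Hypothetical helper (not part of GPTNeoXClient): reads the hyperparameter
# header that ggml's GPT-NeoX example is assumed to expect at the start of the file.
GGML_MAGIC = 0x67676d6c   # "ggml" in hex, written as a little-endian 32-bit int
QNT_VERSION_FACTOR = 1000 # assumed: stored ftype = qntvr * 1000 + base ftype
HEADER_FIELDS = %i[n_vocab n_ctx n_embd n_head n_layer n_rot par_res ftype].freeze
HEADER_BYTES = 4 * (1 + HEADER_FIELDS.size)

def read_gpt_neox_header(path)
  raw = File.binread(path, HEADER_BYTES)
  raise ArgumentError, "#{path}: file too short" if raw.nil? || raw.bytesize < HEADER_BYTES

  # V = 32-bit unsigned little-endian, l< = 32-bit signed little-endian.
  magic, *values = raw.unpack("Vl<#{HEADER_FIELDS.size}")
  raise ArgumentError, format('%s: bad magic 0x%08x', path, magic) unless magic == GGML_MAGIC

  header = HEADER_FIELDS.zip(values).to_h
  # Split the quantization version out of ftype, as the log's qntvr line suggests.
  header[:qntvr] = header[:ftype] / QNT_VERSION_FACTOR
  header[:ftype] %= QNT_VERSION_FACTOR
  header
end
```

Calling p read_gpt_neox_header('/path/to/japanese-large-lm-3.6b-instruction-sft/ggml-model-f16.bin') should, if the layout assumption holds, print the same values as the load log (n_vocab 51200, n_layer 30, ftype 1, and so on).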

Trying it in IRB

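LINE's language model uses an architecture called GPT-NeoX (so does rinna's Japanese language model). ggml's examples include one that loads a GPT-NeoX model and generates completions, and I made Ruby bindings for it: GPTNeoXClient. It can be installed with the gem command.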

$ gem install gpt_neox_client

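With everything in place, let's give it a quick try in IRB. I used the example from the model's Hugging Face repository that asks (in Japanese) for all the prefectures of Shikoku. Inference basically runs on the CPU, so it takes a while before any output appears.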

irb(main):001:0> require 'gpt_neox_client'
=> true
irb(main):002:0> client = GPTNeoXClient.new(path: '/path/to/japanese-large-lm-3.6b-instruction-sft/ggml-model-f16.bin', seed: 123_456_789, n_threads: 8)
gpt_neox_model_load: loading model from '/path/to/japanese-large-lm-3.6b-instruction-sft/ggml-model-f16.bin' - please wait ...
gpt_neox_model_load: n_vocab = 51200
gpt_neox_model_load: n_ctx   = 2048
gpt_neox_model_load: n_embd  = 3072
gpt_neox_model_load: n_head  = 32
gpt_neox_model_load: n_layer = 30
gpt_neox_model_load: n_rot   = 96
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype   = 1
gpt_neox_model_load: qntvr   = 0
gpt_neox_model_load: ggml ctx size = 9604.72 MB
gpt_neox_model_load: memory_size =   720.00 MB, n_mem = 61440
gpt_neox_model_load: ............................................. done
gpt_neox_model_load: model size =  7084.59 MB / num tensors = 364
=>
#<GPTNeoXClient:0x00000001081e38f0
...
irb(main):003:1* client.completions(
irb(main):004:1*   'ユーザー:四国の県名を全て列挙してください。<0x0A>システム:',
irb(main):005:1*   top_p: 0.9,
irb(main):006:1*   top_k: 1,
irb(main):007:1*   temperature: 0.7
irb(main):008:0> )
=> "ユーザー:四国の県名を全て列挙してください。<0x0A>システム:徳島県、香川県、愛媛県、高知県</s>"
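The numbers in the load log can be reproduced by hand, which is a handy way to confirm the conversion produced what you expect. The sketch below assumes the standard GPT-NeoX layout (per layer: fused QKV, attention output, and a 4x MLP, each with a bias, plus two LayerNorms; a separate input embedding and output head) and assumes that with ftype = 1 the ggml example stores 2-D weight matrices as f16 and biases and LayerNorm parameters as f32. gpt_neox_sizes is a hypothetical helper.

```ruby
# Hyperparameters copied from the gpt_neox_model_load log above.
HPARAMS = { n_vocab: 51_200, n_ctx: 2048, n_embd: 3072, n_layer: 30 }.freeze
MB = 1024.0 * 1024.0

# Hypothetical helper: tensor count and byte sizes for a GPT-NeoX model in ggml f16.
def gpt_neox_sizes(n_vocab:, n_ctx:, n_embd:, n_layer:)
  # Per layer, 2-D weights: QKV (3*E*E) + attn out (E*E) + MLP up (4*E*E) + MLP down (4*E*E).
  layer_weights = 12 * n_embd * n_embd
  # Per layer, 1-D params: 2 LayerNorms (gain+bias = 4E) + biases 3E + E + 4E + E = 13E.
  layer_small = 13 * n_embd
  # Input embedding and output head are separate n_vocab x n_embd matrices (f16).
  f16_elems = 2 * n_vocab * n_embd + n_layer * layer_weights
  # All 1-D params, including the final LayerNorm gain and bias, are f32.
  f32_elems = n_layer * layer_small + 2 * n_embd
  n_mem = n_layer * n_ctx
  {
    params: f16_elems + f32_elems,
    # 12 tensors per layer + embedding, final LayerNorm gain/bias, output head.
    num_tensors: 12 * n_layer + 4,
    model_size_mb: (2 * f16_elems + 4 * f32_elems) / MB,
    n_mem: n_mem,
    # K and V caches, each n_embd * n_mem elements stored as f16 (2 bytes).
    memory_size_mb: 2 * 2 * n_embd * n_mem / MB
  }
end

sizes = gpt_neox_sizes(**HPARAMS)
puts format('params      = %d', sizes[:params])
puts format('num tensors = %d', sizes[:num_tensors])
puts format('model size  = %.2f MB', sizes[:model_size_mb])
puts format('n_mem       = %d', sizes[:n_mem])
puts format('memory_size = %.2f MB', sizes[:memory_size_mb])
```

Under these assumptions it prints 364 tensors, a model size of 7084.59 MB, n_mem = 61440, and a memory_size of 720.00 MB, all matching the log, with 3,713,163,264 parameters in total. memory_size is the key/value cache: one n_embd-sized key and value per layer per context position, so it grows linearly with n_ctx.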

Making it chat-like

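Now that it works, let's make it conversational. Reading input with readline and passing it to the completions method is enough to get something that feels like a chat. The returned string starts with the prompt itself, and in the output <0x0A> represents a newline and </s> marks the end of the text, so the script strips the prompt, turns <0x0A> into a newline, and removes </s>.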
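That clean-up can also be pulled out into a small function and tested without loading the model. This is a sketch, not part of GPTNeoXClient; clean_completion and the two constants are hypothetical names. It uses delete_prefix and delete_suffix instead of gsub, so only the echoed prompt at the very start and the end marker at the very end are removed, which is slightly stricter than the gsub chain in the script below.

```ruby
# Hypothetical helper mirroring the post-processing in the chat script.
NEWLINE_TOKEN = '<0x0A>' # appears in the output where a line break belongs
EOS_TOKEN = '</s>'       # end-of-text marker at the end of a completion

def clean_completion(raw, prompt)
  text = raw.delete_prefix(prompt)      # drop the echoed prompt (only at the start)
  text = text.gsub(NEWLINE_TOKEN, "\n") # turn the newline token into a real newline
  text.delete_suffix(EOS_TOKEN)         # drop the end marker (only at the end)
end

prompt = 'ユーザー:四国の県名を全て列挙してください。<0x0A>システム:'
puts clean_completion("#{prompt}徳島県、香川県、愛媛県、高知県</s>", prompt)
# prints 徳島県、香川県、愛媛県、高知県
```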

require 'gpt_neox_client'
require 'readline'

MODEL_PATH = '/path/to/japanese-large-lm-3.6b-instruction-sft/ggml-model-f16.bin'

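# seed: random seed for sampling; n_threads: number of CPU threads used for inference.
client = GPTNeoXClient.new(path: MODEL_PATH, seed: 123_456_789, n_threads: 8)

puts '---'

# Readline.readline returns nil on Ctrl-D, which ends the loop; true adds each input to the history.
while (buf = Readline.readline('ユーザー: ', true))
  # Same prompt format as the IRB example; <0x0A> stands for a newline.
  prompt = "ユーザー:#{buf}<0x0A>システム:"
  # The result starts with the prompt itself: strip it, turn <0x0A> into real
  # newlines, and remove the </s> end-of-text marker.
  result = client.completions(
    prompt,
    top_p: 0.9,
    top_k: 1,
    temperature: 0.7
  ).gsub(prompt, '').gsub('<0x0A>', "\n").gsub('</s>', '')
  puts "システム: #{result}"
end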
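One detail shared by the IRB call and this script: with top_k: 1 the sampler is left with a single candidate, so generation is effectively greedy, and top_p, temperature, and seed should not change the output (assuming the client applies top-k in the usual way). The plain-Ruby sketch below makes that concrete. sample_top_k_top_p is a hypothetical function, not GPTNeoXClient's code, although to my understanding ggml's example sampler applies the same steps in the same order (temperature, top-k, softmax, top-p).

```ruby
# Generic sketch of temperature + top-k + top-p (nucleus) sampling.
# Hypothetical helper for illustration; this is not GPTNeoXClient's code.
def sample_top_k_top_p(logits, top_k:, top_p:, temperature:, rng: Random.new)
  raise ArgumentError, 'top_k must be >= 1' if top_k < 1
  raise ArgumentError, 'logits must not be empty' if logits.empty?

  # 1. Scale by 1/temperature, then keep the top_k highest-scoring token ids,
  #    ordered from most to least likely.
  scaled = logits.each_with_index.map { |logit, id| [logit / temperature, id] }
  candidates = scaled.sort_by { |logit, _id| -logit }.first(top_k)

  # 2. Softmax over the survivors (subtracting the max keeps Math.exp from overflowing).
  max_logit = candidates.first[0]
  weights = candidates.map { |logit, _id| Math.exp(logit - max_logit) }
  total = weights.sum
  probs = weights.map { |w| w / total }

  # 3. Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
  kept = 0
  cumulative = 0.0
  probs.each do |prob|
    kept += 1
    cumulative += prob
    break if cumulative >= top_p
  end

  # 4. Draw one id in proportion to the kept probabilities (implicitly renormalized).
  r = rng.rand * probs.first(kept).sum
  candidates.first(kept).each_with_index do |(_logit, id), i|
    r -= probs[i]
    return id if r <= 0
  end
  candidates[kept - 1][1] # fallback for floating-point rounding
end

logits = [1.0, 3.0, 2.5, -1.0]
greedy = Array.new(5) do |seed|
  sample_top_k_top_p(logits, top_k: 1, top_p: 0.9, temperature: 0.7, rng: Random.new(seed))
end
p greedy # prints [1, 1, 1, 1, 1]
```

With top_k: 1 every seed picks id 1, the highest logit. To get more varied answers, top_k has to be raised so that top_p and temperature actually take effect.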

I ran it and asked (in Japanese) how to live a better life.

$ ruby chat.rb
gpt_neox_model_load: loading model from '/path/to/japanese-large-lm-3.6b-instruction-sft/ggml-model-f16.bin' - please wait ...
gpt_neox_model_load: n_vocab = 51200
gpt_neox_model_load: n_ctx   = 2048
gpt_neox_model_load: n_embd  = 3072
gpt_neox_model_load: n_head  = 32
gpt_neox_model_load: n_layer = 30
gpt_neox_model_load: n_rot   = 96
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype   = 1
gpt_neox_model_load: qntvr   = 0
gpt_neox_model_load: ggml ctx size = 9604.72 MB
gpt_neox_model_load: memory_size =   720.00 MB, n_mem = 61440
gpt_neox_model_load: ............................................. done
gpt_neox_model_load: model size =  7084.59 MB / num tensors = 364
---
ユーザー: よりよい人生を送る方法を教えてください。
システム: より良い人生を生きる方法は次のとおりです。

1.あなたのニーズと欲求に優先順位を付けます。
2.あなたのために働くためにあなたのライフスタイルを適応させます。
3.あなたの健康と幸福を優先します。
4.あなたの家族、友人、コミュニティとの前向きな関係を維持します。
5.あなたの価値観と優先順位に従って決定を下します。
6.あなたの目標と目的に集中してください。
7.あなたの目標と目的が達成されるまで、あなたのプロセスを継続的に改善します。

これらの7つのステップに従うことで、あなたはより良い人生を生きることができます。
ユーザー:

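The answer, a seven-step list about priorities, health, relationships, and goals, read like a summary of Essentialism and Effortless. Pretty nice.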

Essentialism (Japanese edition: エッセンシャル思考 最少の時間で成果を最大にする), Greg McKeown, Kanki Publishing

Effortless (Japanese edition: エフォートレス思考 努力を最小化して成果を最大化する), Greg McKeown, Kanki Publishing

Wrapping up

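I built a Ruby client for GPT-NeoX models in ggml format and used it to try LINE's Japanese language model. A slick Rails demo would probably have been cooler, but I was happy to stop here. GPTNeoXClient is a simple client: all it can do is load a GPT-NeoX model in ggml format and generate completions. I'll improve it when I feel like it. By the way, compiling the ggml code failed on Windows in GitHub Actions, so it probably doesn't work on Windows.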

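LINE's language model is released under the Apache License 2.0, which is wonderfully generous! You can feel their determination to contribute to the development of large language models in Japan.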

洋食の日記

The blog of someone who regrets not starting out in the polite "desu/masu" style instead of the plain "da/dearu" style