Processing Documents with the Elasticsearch Ingest Attachment Processor Plugin

1. Environment

  • Elasticsearch 7.1.1

2. Installation

./bin/elasticsearch-plugin install ingest-attachment

Run the install command on every node in the cluster, then restart each node.
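
After the restart it can help to confirm which nodes actually loaded the plugin. The following is a minimal sketch, assuming the cluster is reachable at the address used throughout this article; `ES_URL`, `plugin_nodes`, and `list_plugin_nodes` are illustrative names, not part of Elasticsearch. It relies on the `_cat/plugins` API and assumes node names contain no spaces.

```shell
# Assumption: any node of the cluster answers at this address.
ES_URL="${ES_URL:-http://10.47.0.96:9200}"

# Read "name component" lines on stdin and print the names of the
# nodes that report the ingest-attachment plugin.
plugin_nodes() {
  awk '$2 == "ingest-attachment" { print $1 }'
}

# Ask the cluster for its plugin list and filter it.
list_plugin_nodes() {
  curl -s "$ES_URL/_cat/plugins?h=name,component" | plugin_nodes
}
```

Calling `list_plugin_nodes` should print one line per node that has the plugin; a node missing from that list still needs the install step.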

3. Processing a single attachment

3.1 Create the index

Create an index for the single-attachment example:

curl -XPUT 'http://10.47.0.96:9200/data_archives_attachment'

3.2 Create the ingest pipeline

curl -H  'Content-Type: application/json' -XPUT 'http://10.47.0.96:9200/_ingest/pipeline/single_attachment' -d'
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field": "data",
        "indexed_chars" : -1,
        "ignore_missing" : true
      }
    }
  ]
}'
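
Before indexing real documents, the pipeline can be tried out with the ingest `_simulate` API, which runs the processors and returns the result without storing anything. This is a sketch under the same address assumption as the rest of the article; `ES_URL`, `simulate_body`, and `simulate_single_attachment` are hypothetical helper names.

```shell
# Assumption: any node of the cluster answers at this address.
ES_URL="${ES_URL:-http://10.47.0.96:9200}"

# Print a _simulate request body that wraps one base64 payload
# (first argument) in the "data" field the pipeline reads.
simulate_body() {
  printf '{"docs":[{"_source":{"data":"%s"}}]}' "$1"
}

# Run the payload through the single_attachment pipeline without indexing it.
simulate_single_attachment() {
  curl -s -H 'Content-Type: application/json' \
    -XPOST "$ES_URL/_ingest/pipeline/single_attachment/_simulate?pretty" \
    -d "$(simulate_body "$1")"
}
```

For example, `simulate_single_attachment 'dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo='` sends the same payload that section 3.3 indexes.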

3.3 Index a document

curl -H  'Content-Type: application/json' -XPUT 'http://10.47.0.96:9200/data_archives_attachment/_doc/1?pipeline=single_attachment' -d'
{
  "filename" : "ipsum.txt",
  "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
}'
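
The `data` field holds the file contents encoded as base64. The sketch below shows one way to build such a request body from a file on disk; `encode_attachment` and `attachment_body` are illustrative helpers, and the code assumes the file name contains no characters that need JSON escaping.

```shell
# Base64-encode stdin on a single line. Some base64 implementations
# wrap long output, so any newlines are stripped.
encode_attachment() {
  base64 | tr -d '\n'
}

# Print the JSON body for one file (path given as the first argument).
# Assumption: the file name needs no JSON escaping.
attachment_body() {
  printf '{"filename":"%s","data":"%s"}' \
    "$(basename "$1")" "$(encode_attachment < "$1")"
}
```

The body can then be piped to curl, for example `attachment_body ipsum.txt | curl -H 'Content-Type: application/json' -XPUT 'http://10.47.0.96:9200/data_archives_attachment/_doc/1?pipeline=single_attachment' --data-binary @-`.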

3.4 Retrieve the indexed document

curl -H  'Content-Type: application/json' -XGET 'http://10.47.0.96:9200/data_archives_attachment/_doc/1?pretty'

The response is:

{
  "_index" : "data_archives_attachment",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "filename" : "ipsum.txt",
    "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=",
    "attachment" : {
      "content_type" : "text/plain; charset=ISO-8859-1",
      "language" : "en",
      "content" : "this is\njust some text",
      "content_length" : 24
    }
  }
}
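
In the response, `_source.data` still holds the original base64 payload, while `attachment.content` holds the text the processor extracted. To compare the two, the payload can be decoded locally. This is a small sketch; `decode_attachment` is an illustrative name, and `base64 -d` is assumed to be available (older macOS versions use `-D` instead).

```shell
# Decode a base64 string (first argument) to its raw bytes on stdout.
decode_attachment() {
  printf '%s' "$1" | base64 -d
}

# Example call: decode_attachment 'dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo='
```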

4. Processing multiple attachments

4.1 Create the index

Create an index for the multiple-attachment example:

curl -XPUT 'http://10.47.0.96:9200/data_archives_multi_attachment'

4.2 Create the ingest pipeline

curl -H  'Content-Type: application/json' -XPUT 'http://10.47.0.96:9200/_ingest/pipeline/multi_attachment' -d'
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data",
            "indexed_chars" : -1,
            "ignore_missing" : true
          }
        }
      }
    }
  ]
}'

4.3 Index a document

curl -H  'Content-Type: application/json' -XPUT 'http://10.47.0.96:9200/data_archives_multi_attachment/_doc/1?pipeline=multi_attachment' -d'
{
  "attachments" : [
    {
      "filename" : "ipsum.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}'
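
The `foreach` processor expects an `attachments` array in which every element carries its own `data` field. A minimal sketch for building that body from several files follows; `encode_attachment` and `attachments_body` are hypothetical helpers, and file names are assumed to need no JSON escaping.

```shell
# Base64-encode stdin on a single line.
encode_attachment() {
  base64 | tr -d '\n'
}

# Print {"attachments":[...]} with one entry per file argument.
# Assumption: file names need no JSON escaping.
attachments_body() {
  sep=''
  printf '{"attachments":['
  for f in "$@"; do
    printf '%s{"filename":"%s","data":"%s"}' \
      "$sep" "$(basename "$f")" "$(encode_attachment < "$f")"
    sep=','
  done
  printf ']}'
}
```

The output of `attachments_body ipsum.txt test.txt` can be piped to the curl command above with `--data-binary @-` in place of the inline `-d` body.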

4.4 Retrieve the indexed document

curl -H  'Content-Type: application/json' -XGET 'http://10.47.0.96:9200/data_archives_multi_attachment/_doc/1?pretty'

The response is:

{
  "_index" : "data_archives_multi_attachment",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachments" : [
      {
        "filename" : "ipsum.txt",
        "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "this is\njust some text",
          "content_length" : 24
        }
      },
      {
        "filename" : "test.txt",
        "data" : "VGhpcyBpcyBhIHRlc3QK",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "This is a test",
          "content_length" : 16
        }
      }
    ]
  }
}

When reposting, please credit the source: 四个空格 » Processing Documents with the Elasticsearch Ingest Attachment Processor Plugin