2019-02-24

Reading a file in Python into a list of strings without newline characters

Reading a file without newlines

You can read the whole file and split lines using str.splitlines:

temp = file.read().splitlines()

Or you can strip the newline by hand:

temp = [line[:-1] for line in file]

Note: this last solution only works if the file ends with a newline; otherwise the last line will lose its final character.
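To see the difference between the two approaches, here is a minimal sketch that runs them against a temporary file that does not end with a newline. The helper names read_lines and read_lines_rstrip and the sample file are made up for illustration; rstrip("\n") is swapped in as a safer variant of the manual slice.

```python
import os
import tempfile


def read_lines(path):
    """Read the whole file and split it into lines without newline characters."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def read_lines_rstrip(path):
    """Strip only a trailing newline from each line, so a last line without one stays intact."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Hypothetical sample file whose last line has no trailing newline.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("alpha\nbeta\ngamma")

    print(read_lines(path))         # ['alpha', 'beta', 'gamma']
    print(read_lines_rstrip(path))  # ['alpha', 'beta', 'gamma']
    with open(path, encoding="utf-8") as f:
        # The manual slice drops the last character of the final line.
        print([line[:-1] for line in f])  # ['alpha', 'beta', 'gamm']
```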

from https://stackoverflow.com/questions/12330522/reading-a-file-without-newlines

2019-02-24

Programming

Including other configuration files in Nginx

/usr/local/nginx/conf/nginx.conf

#user  nobody;
worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    log_format mylog_format '$remote_addr|$remote_user|$time_local'
                       '$request|$status|$bytes_sent'
                       '$http_referer|$http_user_agent|$gzip_ratio'
                       '#$bytes_sent|$connection|$connection_requests|$msec|$pipe|$request_length|$request_time|$status|$time_iso8601|$time_local';

    access_log logs/access.log mylog_format;

    server {
        listen 80;
        location / {
            proxy_pass http://127.0.0.1:8181;
        }
    }

server {
    listen 443;
    client_max_body_size 5M;
    server_name api.dict123.cn;
    ssl on;
    ssl_certificate   cert/1638702783821.pem;
    ssl_certificate_key  cert/1638702783821.key;
    ssl_session_timeout 5m;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    location / {
        proxy_pass http://132.232.74.114:80;
    }
}

include /usr/local/nginx/conf/test1.conf;

}

If test1.conf does not specify a server_name, running nginx -s reload prints the following warning:

nginx: [warn] conflicting server name "" on 0.0.0.0:80, ignored

In that case, add a server_name directive to test1.conf, as follows:

/usr/local/nginx/conf/test1.conf

server {
    listen 80;
    server_name hohode.com;
    location /test1 {
        alias /data/pserver/;
        autoindex on;
        autoindex_exact_size off;
        autoindex_localtime on;
        charset utf-8,gbk;
    }
}
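The warning appears because two server blocks listen on the same port and both fall back to the empty server name. As a rough illustration (this is not how nginx itself is implemented), the check can be sketched in Python. The dictionaries below are a hand-written summary of the server blocks in this post, and find_conflicts is a hypothetical helper.

```python
from collections import Counter


def find_conflicts(servers):
    """Return the (listen, server_name) pairs declared by more than one server block."""
    # A server block without a server_name directive uses the empty name "".
    keys = [(s["listen"], s.get("server_name", "")) for s in servers]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Hand-written summary of the configuration before server_name was added to test1.conf.
before = [
    {"listen": "80"},                                    # proxy server in nginx.conf
    {"listen": "443", "server_name": "api.dict123.cn"},  # HTTPS server in nginx.conf
    {"listen": "80"},                                    # test1.conf without server_name
]

# The same configuration after adding server_name to test1.conf.
after = [
    {"listen": "80"},
    {"listen": "443", "server_name": "api.dict123.cn"},
    {"listen": "80", "server_name": "hohode.com"},
]

print(find_conflicts(before))  # [('80', '')]
print(find_conflicts(after))   # []
```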

2019-01-23

Programming

Removing a field from Elasticsearch documents

from https://cinhtau.net/2017/09/01/remove-field-from-elasticsearch-document/

POST /test_index/log/_update_by_query
{
  "script": {
    "inline": "ctx._source.remove('source_type')",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "source_type"
          }
        }
      ]
    }
  }
}
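The same request can be assembled from a script. The following is a minimal sketch using only the Python standard library; the function names and the base_url argument are assumptions for illustration, and the body simply mirrors the request above. The field name is inserted into the script text as-is, so this sketch only suits simple field names without quotes.

```python
import json
import urllib.request


def build_remove_field_request(field):
    """Build the _update_by_query body that removes `field` from every document that has it."""
    return {
        "script": {
            "inline": f"ctx._source.remove('{field}')",
            "lang": "painless",
        },
        "query": {"bool": {"must": [{"exists": {"field": field}}]}},
    }


def remove_field(base_url, index, doc_type, field):
    """POST the body to the cluster. base_url is a hypothetical address such as http://localhost:9200."""
    url = f"{base_url}/{index}/{doc_type}/_update_by_query"
    data = json.dumps(build_remove_field_request(field)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Print the body only; calling remove_field requires a reachable cluster.
print(json.dumps(build_remove_field_request("source_type"), indent=2))
```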

2019-01-23

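Programming

Setting up Hive on Spark on CDH5

To set up Hive on Spark, refer to this article: https://www.jianshu.com/p/f4f058d0b0a4

First, run hadoop classpath to look up the Hadoop classpath.

Add the following to the current user's .bashrc file:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

First set the Chrome browser language to English,
then change the CDH configuration as described at the link below:
https://stackoverflow.com/questions/29800239/how-to-set-up-dynamic-allocation-on-cloudera-5-in-yarn

http://xn--jlq582ax31c.xn--fiqs8s/post/242
Modify the parameters listed at this link in the same way as above.

Open the Hive command line,
then run a SQL statement such as: select count(1) from eqxdb.mall_tag;
If a "Query Hive on Spark job" message appears and the result is correct, the configuration succeeded.
Alternatively, run:
hive --hiveconf hive.root.logger=DEBUG,console -e "select count(1) from eqxdb.mall_tag;"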

2019-01-23

Programming

Why do I have to re-run source /etc/profile in every new terminal for it to take effect?

# JDK 8
export JAVA_HOME="/usr/java/java8"
# Maven
export M2_HOME="/opt/idea-IU-162.1121.32/plugins/maven/lib/maven3"
# PATH
export PATH="$JAVA_HOME/bin:$M2_HOME/bin:$PATH"

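This is the configuration at the end of my /etc/profile. The JDK works fine: even without running source, echo $JAVA_HOME prints the value. The problem is that to use mvn I have to run source every time. The Maven I use is the one bundled with IDEA.

Answer 1: You can also put these lines in ~/.bashrc, or add this one line to ~/.bashrc:

source /etc/profile

(I used this approach and it worked.)
Answer 2: You can put these commands in /etc/bash and they will run automatically.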
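The first answer can be applied with a small script. A likely reason it works is that bash reads /etc/profile only for login shells, while interactive non-login shells (such as a new terminal tab) read ~/.bashrc. The sketch below appends the line only if it is not already present; ensure_source_line is a hypothetical helper, and the demo writes to a temporary file rather than a real ~/.bashrc.

```python
import os
import tempfile

SOURCE_LINE = "source /etc/profile"


def ensure_source_line(bashrc_path, line=SOURCE_LINE):
    """Append `line` to the file unless it is already there. Return True if the file changed."""
    content = ""
    if os.path.exists(bashrc_path):
        with open(bashrc_path, encoding="utf-8") as f:
            content = f.read()
    if any(existing.strip() == line for existing in content.splitlines()):
        return False
    with open(bashrc_path, "a", encoding="utf-8") as f:
        # Start on a fresh line in case the file does not end with a newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for ~/.bashrc so the demo does not touch a real shell configuration.
    demo_path = os.path.join(tmp, "bashrc")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("alias ll='ls -l'")
    print(ensure_source_line(demo_path))  # True: the line was appended
    print(ensure_source_line(demo_path))  # False: the line is already present
```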

from https://blog.csdn.net/lwplvx/article/details/79192182

2019-01-20

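Programming

Editing files on a remote server in VSCode with Remote VSCode

Local configuration

1 Find the Remote VSCode extension in the VSCode extensions panel and install it.

2 Configure the ssh config file. In a terminal, enter:

vim ~/.ssh/config

Paste in the following (remember to remove the comments):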

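Host dl_aws   # dl_aws is just a name you choose, much like a person's name
    HostName 18.212.101.21   # address of the remote host
    User ubuntu    # user name on the remote host; log in to the remote to confirm it
    ForwardAgent yes     # yes is fine here
    RemoteForward 52698 127.0.0.1:52698   # the first 52698 is the port on the remote; the second part is the local address and port that receives the connection. Set this port to the same value as the port in step 2 so that VSCode can upload and download files
    IdentityFile /Users/xxx/.ssh/id_rsa.pub  # the key file, comparable to the ssh public key you normally use with git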

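Inserting data into Hive with dynamic partitions

When inserting data into a partitioned Hive table, creating many partitions (for example, one per value of some column) means copying, pasting, and editing many SQL statements, which is inefficient. Because Hive is a batch-processing system, it provides dynamic partitioning, which infers the partition names from the position of the columns in the query and creates the partitions accordingly.

1. Create a table partitioned by a single column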

hive>
   create table dpartition(id int, name string)
   partitioned by (ct string);
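To make the idea of inferring partitions by position concrete, here is a small Python illustration. It is not Hive code: it only mimics how a dynamic-partition insert takes the partition value from the trailing column of each selected row. The sample rows and the function route_rows_by_last_column are made up for this sketch; the column layout (id, name, ct) follows the dpartition table above.

```python
from collections import defaultdict


def route_rows_by_last_column(rows, partition_column="ct"):
    """Group rows by their last value, which plays the role of the dynamic partition column."""
    partitions = defaultdict(list)
    for *data, partition_value in rows:
        # The trailing value names the partition; the remaining values are the stored row.
        partitions[f"{partition_column}={partition_value}"].append(tuple(data))
    return dict(partitions)


# Hypothetical query result with columns (id, name, ct).
rows = [
    (1, "tom", "beijing"),
    (2, "amy", "shanghai"),
    (3, "bob", "beijing"),
]

for partition, members in sorted(route_rows_by_last_column(rows).items()):
    print(partition, members)
# ct=beijing [(1, 'tom'), (3, 'bob')]
# ct=shanghai [(2, 'amy')]
```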

Create Separate Columns From Array Column in Spark Dataframe in Scala

scala> import org.apache.spark.sql.Column
scala> val df = Seq((Array(3,5,25), 3),(Array(2,7,15),4),(Array(1,10,12),2)).toDF("column1", "column2")
df: org.apache.spark.sql.DataFrame = [column1: array<int>, column2: int]

scala> def getColAtIndex(id:Int): Column = col(s"column1")(id).as(s"column1_${id+1}")
getColAtIndex: (id: Int)org.apache.spark.sql.Column

scala> val columns: IndexedSeq[Column] = (0 to 2).map(getColAtIndex) :+ col("column2") //Here, instead of 2, you can give the value of n
columns: IndexedSeq[org.apache.spark.sql.Column] = Vector(column1[0] AS `column1_1`, column1[1] AS `column1_2`, column1[2] AS `column1_3`, column2)

scala> df.select(columns: _*).show
+---------+---------+---------+-------+
|column1_1|column1_2|column1_3|column2|
+---------+---------+---------+-------+
|        3|        5|       25|      3|
|        2|        7|       15|      4|
|        1|       10|       12|      2|
+---------+---------+---------+-------+