2021-11-23

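ClickHouse New Features Survey: Release v21.11, 2021-11-09

Backward-incompatible changes

The order of the json_path and json arguments in SQL/JSON functions has changed.
The write_final_mark setting for MergeTree tables has been removed; the value is now always true. All tables are compatible with the new version and no other settings are required.
The bayesAB function has been removed.
This item is relevant only if you have already started using the clickhouse-keeper feature. ClickHouse Keeper snapshots are now compressed with the ZSTD codec by default instead of LZ4 block compression. This can be turned off with the compress_snapshots_with_zstd_format setting (keep the value identical on all replicas in the cluster). The change is backward compatible in most cases; an incompatibility can occur only when a new node sends a snapshot (which may happen during recovery) to an old node that cannot read ZSTD-format snapshots.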

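Tornado error: TypeError: Object of type datetime is not JSON serializable

When using Tornado to query data from MySQL and return it to the front end with self.write(result), a TypeError: Object of type datetime is not JSON serializable error is raised if the data contains date/time fields such as a timestamp column.

The cause is easy to find in the Tornado source. self.write() is a method in web.py, and it contains the following code: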

if isinstance(chunk, dict):
    chunk = escape.json_encode(chunk)
    self.set_header("Content-Type", "application/json; charset=UTF-8")

escape.json_encode(chunk) in turn calls:
json.dumps(value).replace("</", "<\\/")

So the fix is to change how json.dumps handles date and time fields. First define an encoder class:

import json
from decimal import Decimal
from datetime import datetime,date

class DateEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            return o.strftime('%Y-%m-%d %H:%M:%S')
        elif isinstance(o, date):
            return o.strftime("%Y-%m-%d")
        elif isinstance(o, Decimal):
            return float(o)
        else:
            return json.JSONEncoder.default(self, o)

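Then change the implementation of def json_encode(value: Any) -> str: to

return json.dumps(value, cls=DateEncoder).replace("</", "<\\/")

and restart the service.

This approach looks workable, but it hides a serious pitfall: it causes parameter passing to fail in many API calls, so it is not recommended. If the field types really need to be converted, do the conversion in your own application code.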
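One way to do the conversion in application code, without touching Tornado's source, is sketched below. It serializes the query result to a string first and only then hands it to the handler. The names json_default and to_json, and the sample row, are hypothetical and only for illustration.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def json_default(o):
    """Fallback converter for values that json.dumps cannot serialize itself."""
    # datetime must be checked before date, because datetime is a subclass of date
    if isinstance(o, datetime):
        return o.strftime("%Y-%m-%d %H:%M:%S")
    if isinstance(o, date):
        return o.strftime("%Y-%m-%d")
    if isinstance(o, Decimal):
        return float(o)
    raise TypeError("Object of type %s is not JSON serializable" % type(o).__name__)


def to_json(result):
    """Serialize a query result to a JSON string in application code."""
    return json.dumps(result, default=json_default)


# Hypothetical row, shaped like something a MySQL query might return
row = {"id": 1, "created_at": datetime(2021, 11, 23, 10, 30, 0), "price": Decimal("9.5")}
body = to_json(row)
```

In a handler, the resulting string could then be sent with self.set_header("Content-Type", "application/json; charset=UTF-8") followed by self.write(body). Because self.write() receives a str rather than a dict, Tornado's own json_encode is never involved.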

2021-11-04

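Programming

Adding a directory to Python's system path

Add the directory containing the current file to Python's system path, that is, to the sys.path variable: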

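import sys
from os.path import abspath, dirname

# Absolute path of the directory that contains this file
filepath = abspath(dirname(__file__))
print("filepath={}".format(filepath))
# Insert at the front so modules in this directory take precedence
sys.path.insert(0, filepath)
print(sys.path)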
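If the same snippet runs more than once (for example in several modules), the directory ends up in sys.path repeatedly. A small variant that avoids this, assuming a hypothetical helper name add_to_sys_path, might look like:

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path unless it is already listed."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved
```

Inside a script it would typically be called as add_to_sys_path(Path(__file__).parent).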

Reference: https://python3-cookbook.readthedocs.io/zh_CN/latest/c10/p09_add_directories_to_sys_path.html

2021-11-02

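Programming

Solving Axios CORS cross-origin problems

By default, axios does not send cookies when the front end calls a backend service on another origin. To send them, the following setting is needed: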

axios.defaults.withCredentials = true

The backend service needs matching changes as well. For Tornado, for example, set:

self.set_header("Access-Control-Allow-Credentials", "true")
self.set_header("Access-Control-Allow-Origin", origin)
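The snippet above does not say where origin comes from. For credentialed requests browsers reject a wildcard "*" in Access-Control-Allow-Origin, so the value has to be the concrete origin of the request. A minimal sketch that echoes the origin only when it is on an allowlist is shown below; cors_headers and ALLOWED_ORIGINS are hypothetical names, and the allowlist itself is an assumption that is not part of the original setup.

```python
from typing import Dict, Iterable, Optional


def cors_headers(origin: Optional[str], allowed_origins: Iterable[str]) -> Dict[str, str]:
    """Return CORS headers for a credentialed request, or {} if the origin is not allowed."""
    if origin is None or origin not in set(allowed_origins):
        return {}
    return {
        # The concrete origin is echoed back; "*" is not accepted with credentials
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
    }


# Hypothetical allowlist, for illustration only
ALLOWED_ORIGINS = {"https://app.example.com"}
headers = cors_headers("https://app.example.com", ALLOWED_ORIGINS)
```

In a Tornado handler the origin could be read with self.request.headers.get("Origin"), and each returned pair applied with self.set_header(name, value).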

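If you would rather not set this in the application, adjust the nginx configuration instead, for example by adding the following to a location block: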

add_header 'Access-Control-Allow-Credentials' "true";
add_header 'Access-Control-Allow-Origin' "$http_origin";
add_header 'Access-Control-Allow-Headers' '*';
add_header 'Access-Control-Max-Age' 1000;
add_header 'Access-Control-Allow-Methods' 'POST, GET, OPTIONS';
add_header 'Access-Control-Allow-Headers' 'authorization, Authorization, Content-Type, Access-Control-Allow-Origin, Access-Control-Allow-Headers, X-Requested-By, Access-Control-Allow-Methods';

2021-11-01

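Programming

Packaging a Maven project

There are several ways to package a Maven project; the one I use most is the assembly plugin.

1. First, add the following plugins to the project's pom.xml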

    ...
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
    
</project>

2. Build and package with the following command

mvn assembly:assembly

2021-10-26

Programming

Logging cookie values in Nginx

1. Logging all cookies

log_format data_log '$remote_addr - $remote_user [$time_local] '
                '"$request" $status $body_bytes_sent '
                '"$http_referer" "$http_user_agent" $cookie_C_I $http_x_forwarded_for '
                '"$http_cookie" '
                '"$upstream_addr" "$upstream_status" "$upstream_response_time" "$request_time"';

$http_cookie gives the full Cookie header of the request.

2. Logging a single cookie

Logging a single cookie is just as simple: prefix the cookie key with $cookie_. For example, a cookie whose key is _tracker_user_id_ is available in Nginx as $cookie__tracker_user_id_.

log_format data_log '$remote_addr - $remote_user [$time_local] '
                '"$request" $status $body_bytes_sent '
                '"$http_referer" "$http_user_agent" $cookie_C_I $http_x_forwarded_for '
                '"$cookie__tracker_user_id_" '
                '"$upstream_addr" "$upstream_status" "$upstream_response_time" "$request_time"';
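When only the full $http_cookie value was logged, a single cookie can still be pulled out afterwards while processing the log. The following is a rough sketch using Python's standard http.cookies module; cookie_from_header is a hypothetical helper name, the sample header value is made up, and the parser may stop silently at malformed cookies.

```python
from http.cookies import SimpleCookie
from typing import Optional


def cookie_from_header(cookie_header: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from a raw Cookie header, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None


# Hypothetical logged value of $http_cookie
logged = "C_I=abc; _tracker_user_id_=u123"
user_id = cookie_from_header(logged, "_tracker_user_id_")
```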

2021-09-06

Programming

Using CMAK for Kafka (formerly Kafka Manager)

latest_version

Download the archive

wget https://github.com/yahoo/CMAK/releases/download/3.0.0.5/cmak-3.0.0.5.zip

Unzip CMAK

unzip cmak-3.0.0.5.zip

Edit conf/application.conf

cmak.zkhosts="hadoop101.eqxiu.com:2181" (change this to your actual ZooKeeper address)

Download OpenJDK 11

wget https://download.java.net/openjdk/jdk11/ri/openjdk-11+28_linux-x64_bin.tar.gz

Extract OpenJDK 11

tar -zxvf openjdk-11+28_linux-x64_bin.tar.gz

Edit the bin/cmak startup script

Add the JAVA_HOME path at the top of the file, for example:

JAVA_HOME=/data/software/jdk-11

Start CMAK

./bin/cmak -Dhttp.port=10010

Open it in a browser

http://your-ip-address:10010

References
https://github.com/yahoo/CMAK
https://cloud.tencent.com/developer/article/1651137

2021-08-31

Programming

How to connect to Hive from Python

Install dependencies

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

Example Python script


from pyhive import hive

# Connection settings for HiveServer2
HOST = "127.0.0.1"
PORT = 10000
USERNAME = "hadoop"
DATABASE = "default"

conn = hive.Connection(host=HOST, port=PORT, username=USERNAME, database=DATABASE)

cursor = conn.cursor()
#cursor.execute("INSERT INTO TABLE test_out(name,count,time) SELECT name,count(1),to_date(time) FROM test GROUP BY name,to_date(time)")
cursor.execute("SELECT * FROM test")
for result in cursor.fetchall():
    # Print the third column of each row
    print(result[2])

# Release the cursor and the connection when done
cursor.close()
conn.close()

Reference: https://segmentfault.com/a/1190000022358127

2021-08-16

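Programming

Customizing the Tornado request log format

The request log contained a lot of 404 HEAD requests. To filter them out, I changed Tornado's request-logging logic.
The code is as follows:

Custom request-logging logic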

import datetime

import tornado.web


# Custom request-log handler
def log_request(handler):
    # Every status code is logged at INFO level
    log_method = tornado.web.access_log.info
    # Request duration in milliseconds
    request_time = 1000.0 * handler.request.request_time()

    request_status = handler.get_status()
    request_method = handler.request.method

    if request_status == 404 and request_method == 'HEAD':
        # Do not log 404 HEAD requests
        return
    log_method("%s|%d|%s|%.2f|%s", datetime.datetime.now(), request_status,
               handler._request_summary(), request_time,
               handler.request.headers.get("User-Agent", ""))

Set it as the log_function entry of Application.settings

app.settings["log_function"] = log_request

The resulting log format:

[Figure: output log format]
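To make the filter rule and the line layout easier to check without a running server, the two pieces can be separated into plain functions. This is only an illustrative sketch: should_skip and format_access_line are hypothetical names, and the sample values are made up rather than taken from a real log.

```python
import datetime


def should_skip(status: int, method: str) -> bool:
    """True for the requests that the custom logger drops (404 responses to HEAD)."""
    return status == 404 and method == "HEAD"


def format_access_line(now, status, summary, request_time_ms, user_agent):
    """Build one log line with the same pipe-separated layout as log_request."""
    return "%s|%d|%s|%.2f|%s" % (now, status, summary, request_time_ms, user_agent)


# Made-up sample values
line = format_access_line(
    datetime.datetime(2021, 8, 16, 12, 0, 0),
    200,
    "GET /p.gif (127.0.0.1)",
    3.14159,
    "curl/7.68.0",
)
```

The fields are, in order: timestamp, status code, request summary, duration in milliseconds, and User-Agent.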

Reference: https://blog.wencan.org/2017/02/07/tornado-log-format/

2021-08-11

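Programming

Forwarding in Nginx based on query parameters

One point to be clear about first: an Nginx location cannot match the query parameters after the question mark. See https://www.zhihu.com/question/50190510

So the decision is made with an if condition inside the location: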

location /p.gif {
        if ($args ~ "getip") {
            add_header Content-Type "text/plain;charset=utf-8";
            return 200 '$proxy_add_x_forwarded_for';
        }
        proxy_pass         http://big-da/log-server/push;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
}

For the URL matching rules, see https://www.cnblogs.com/woshimrf/p/nginx-config-location.html

location [=|~|~*|^~|@] /uri/ {
  ...
} 
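= : exact match against the URI that follows
~ : regular-expression match, case-sensitive
~* : regular-expression match, case-insensitive
^~ : plain prefix match; if this location is the best prefix match, it is used and regular-expression locations are not checked. Typically used for directories
@ : defines a named location, used for internal redirects such as error_page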
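The way these modifiers interact can be modelled in a few lines of Python. This is a simplified sketch of the selection order, not Nginx's actual implementation: it ignores named (@) locations and nested locations, and select_location and the sample configuration are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# (modifier, pattern); modifier is "=", "~", "~*", "^~", or "" for a plain prefix
Location = Tuple[str, str]


def select_location(url: str, locations: List[Location]) -> Optional[Location]:
    """Pick the location that would handle `url` under the simplified rules."""
    # Matching only ever sees the path, never the query string
    uri = url.split("?", 1)[0]

    # 1. An exact "=" match wins immediately
    for loc in locations:
        if loc[0] == "=" and loc[1] == uri:
            return loc

    # 2. Find the longest matching prefix location ("" or "^~")
    best_prefix = None
    for loc in locations:
        if loc[0] in ("", "^~") and uri.startswith(loc[1]):
            if best_prefix is None or len(loc[1]) > len(best_prefix[1]):
                best_prefix = loc

    # 3. "^~" on the longest prefix skips the regular-expression check
    if best_prefix is not None and best_prefix[0] == "^~":
        return best_prefix

    # 4. Otherwise the first matching regular expression, in config order, wins
    for loc in locations:
        if loc[0] == "~" and re.search(loc[1], uri):
            return loc
        if loc[0] == "~*" and re.search(loc[1], uri, re.IGNORECASE):
            return loc

    # 5. Fall back to the longest prefix match (None if nothing matched)
    return best_prefix


# Hypothetical configuration, for illustration only
LOCATIONS = [
    ("=", "/"),
    ("", "/"),
    ("^~", "/static/"),
    ("~*", r"\.(gif|jpg)$"),
    ("", "/p.gif"),
]
```

With this configuration, /p.gif?getip and /p.gif select the same location, which is why the query string has to be tested with if ($args ...) inside the location.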