2018-04-20

Official ClickHouse getting-started resources

tutorial : https://clickhouse.yandex/tutorial.html

demo : https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime/

stackoverflow : https://stackoverflow.com/questions/tagged/clickhouse

2018-04-18

Single-node ClickHouse installation

References
https://blog.csdn.net/JIANG123456T/article/details/77674857
http://www.clickhouse.com.cn/topic/5a366e97828d76d75ab5d5a0

I. Environment

1. CentOS 7.3
2. RPM package download
http://repo.red-soft.biz/repos/clickhouse/stable/el7/

Last Block Does Not Have Enough Number of Replicas

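[Solution]

You can raise the number of retries by adjusting dfs.client.block.write.locateFollowingBlock.retries. With the value set to 6, the client sleeps 400 ms, 800 ms, 1600 ms, 3200 ms, 6400 ms and 12800 ms between attempts, so close() can take up to about 25.2 seconds (the sum of those waits) to return.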
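The waits listed above double on every retry, so the total sleep for a given retry count can be checked with a small sketch. The class and method names are hypothetical and not part of Hadoop, and the 400 ms starting value is taken from the list above rather than read from any Hadoop configuration.

```java
// Hypothetical helper, not part of Hadoop: it only adds up the waits listed above.
class RetryBackoff {
    // Sum of the sleeps when the first wait is firstSleepMs and each later wait doubles.
    static long totalSleepMillis(int retries, long firstSleepMs) {
        long total = 0;
        long sleep = firstSleepMs;
        for (int i = 0; i < retries; i++) {
            total += sleep;
            sleep *= 2;
        }
        return total;
    }

    public static void main(String[] args) {
        // 6 retries starting at 400 ms: 400 + 800 + 1600 + 3200 + 6400 + 12800
        System.out.println(totalSleepMillis(6, 400) + " ms"); // prints "25200 ms"
    }
}
```

This counts only the sleeps between attempts; the time spent in the calls themselves comes on top.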

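However, dfs.client.block.write.locateFollowingBlock.retries is not exposed in the open-source configuration, and tuning it only works around the problem: under very heavy CPU load the problem can still occur.

We recommend lowering task concurrency or capping CPU usage to reduce the pressure on network transfers, so that the DataNodes (DN) can report block status to the NameNode (NN) in time.

Conclusion:

Reduce the system load. The cluster was heavily loaded when the problem occurred: all 32 CPU cores (100%) were allocated to MR tasks. Keep at least 20% of the CPU free.

Root-cause analysis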

2018-04-16

Programming

Installing brew on a Mac

Run the following on the command line:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Reference: https://www.jianshu.com/p/4e80b42823d5

2018-04-16

Programming

bash: pip: command not found on a Mac

Running

pip install configparser

fails with: bash: pip: command not found

Fix: run

sudo easy_install pip

Enter your password, then run

sudo pip install configparser

Note:
From now on, run pip with sudo.

2018-04-13

Programming

Get TopN of All Groups After Group by Using Spark DataFrame

https://stackoverflow.com/questions/33655467/get-topn-of-all-groups-after-group-by-using-spark-dataframe

You can use the rank window function as follows:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, desc}

val n: Int = ???

// Window definition
val w = Window.partitionBy($"user").orderBy(desc("rating"))

// Filter
df.withColumn("rank", rank.over(w)).where($"rank" <= n)

If you don't care about ties, you can replace rank with row_number (named rowNumber before Spark 1.6).
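To make the difference in tie handling concrete, here is a plain-Java sketch that does not use Spark. It covers a single partition (one user's ratings) and reimplements the two numbering schemes by hand; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of the semantics only; this is not Spark code.
class TopNPerGroup {
    // rank-style: tied ratings share a rank and the next rank skips ahead (1, 2, 2, 4, ...).
    static List<Integer> topNByRank(List<Integer> ratings, int n) {
        List<Integer> sorted = new ArrayList<>(ratings);
        sorted.sort(Comparator.reverseOrder());
        List<Integer> kept = new ArrayList<>();
        int rank = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // A new rank starts only when the rating differs from the previous one.
            if (i == 0 || !sorted.get(i).equals(sorted.get(i - 1))) {
                rank = i + 1;
            }
            if (rank <= n) {
                kept.add(sorted.get(i));
            }
        }
        return kept;
    }

    // row_number-style: every row gets a distinct position, so at most n rows survive.
    static List<Integer> topNByRowNumber(List<Integer> ratings, int n) {
        List<Integer> sorted = new ArrayList<>(ratings);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Integer> ratings = Arrays.asList(5, 4, 4, 3); // one user's ratings
        System.out.println(topNByRank(ratings, 2));      // [5, 4, 4]
        System.out.println(topNByRowNumber(ratings, 2)); // [5, 4]
    }
}
```

With rank-style numbering a tie at the cutoff returns more than n rows; with row-number-style numbering the result is capped at n, and which of the tied rows is kept is arbitrary.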

2018-04-13

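Programming

Logging in to a jumpserver bastion host with iTerm2 on a Mac

Profile -> Open Profiles… -> Edit Profiles…
Click the + button in the lower-left corner
Enter a Profile Name, for example jumper

On the right, under Command, select Command and enter

ssh -i /Users/yourname/Documents/yourname.pem yourname@12.26.20.16

Click the Advanced tab at the upper right, then click the Edit button under Triggers
In the Triggers window that opens, click the + button in the lower-left corner
In Regular Expression, enter Enter*
For Action, choose Send Text…
In Parameters, enter your bastion host login password
Close all windows
In an iTerm2 window, right-click and choose New Tab, select the jumper profile you just created, and press Enter to log in.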

2018-04-11

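Programming

Spring Boot notes

1. Configuration classes, Controllers and similar classes must be placed in the same package as the startup class annotated with @SpringBootApplication, or in one of its sub-packages.
For example, the startup class: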

package com.maple;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class MapleApplication {

    public static void main(String[] args) {
        SpringApplication.run(MapleApplication.class, args);
    }
}

Configuration class:

package com.maple;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

// Annotating the class with @Component lets other beans inject it with @Autowired.
@Component
@ConfigurationProperties(prefix = "hive")
public class HiveConf {
    // Instance fields: Spring binds hive.url, hive.user and hive.password through the setters.
    private String url;
    private String user;
    private String password;

    public String getUrl() {
        System.out.println("the url is " + url);
        return url;
    }
    public void setUrl(String url) {
        System.out.println("the url is " + url);
        this.url = url;
    }
    public String getUser() {
        return user;
    }
    public void setUser(String user) {
        this.user = user;
    }
    public String getPassword() {
        return password;
    }
    public void setPassword(String password) {
        this.password = password;
    }
}

2018-04-11

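Programming

Parquet schema merging

Like ProtocolBuffer, Avro, and Thrift, Parquet also supports schema evolution. Users can start with a simple schema and gradually add more columns to it as needed. In this way, users may end up with multiple Parquet files that have different but mutually compatible schemas. The Parquet data source can now detect this case automatically and merge the schemas of all these files.

Because schema merging is a relatively expensive operation and is not needed in most cases, it has been turned off by default since Spark 1.5.0. You can enable it in either of the following ways:

1. Set the data source option mergeSchema to true when reading Parquet files (as in the example below), or
2. Set the global SQL option spark.sql.parquet.mergeSchema to true.
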
import spark.implicits._

// Create a simple DataFrame, store into a partition directory
val squaresDF = spark.sparkContext.makeRDD(1 to 5).map(i => (i, i * i)).toDF("value", "square")
squaresDF.write.parquet("data/test_table/key=1")
  
// Create another DataFrame in a new partition directory,
// adding a new column and dropping an existing column
val cubesDF = spark.sparkContext.makeRDD(6 to 10).map(i => (i, i * i * i)).toDF("value", "cube")
cubesDF.write.parquet("data/test_table/key=2")
  
// Read the partitioned table
val mergedDF = spark.read.option("mergeSchema", "true").parquet("data/test_table")
mergedDF.printSchema()
  
// The final schema consists of all 3 columns in the Parquet files together
// with the partitioning column appeared in the partition directory paths
// root
//  |-- value: int (nullable = true)
//  |-- square: int (nullable = true)
//  |-- cube: int (nullable = true)
//  |-- key: int (nullable = true)

2018-04-11

Programming

Garbled Chinese text when inserting data into Hive with insert into values

When inserting the data, re-encode the Chinese text as ISO-8859-1:
new String("测试数据".getBytes(), "ISO-8859-1");
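The one-liner above can be unpacked into a small round-trip sketch. It assumes the platform default charset used by getBytes() is UTF-8 and makes that explicit; the class and method names are hypothetical, and the string literal is written with Unicode escapes only so that the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the re-encoding step; assumes the source bytes are UTF-8.
class Latin1Wrap {
    // Reinterpret the UTF-8 bytes of the text as ISO-8859-1 characters, one char per byte.
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse step: recover the original bytes and decode them as UTF-8 again.
    static String unwrap(String wrapped) {
        return new String(wrapped.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u6d4b\u8bd5\u6570\u636e"; // "测试数据"
        String wrapped = wrap(text);
        System.out.println(wrapped.length());           // 12: four characters, three UTF-8 bytes each
        System.out.println(unwrap(wrapped).equals(text)); // true
    }
}
```

Because ISO-8859-1 maps every byte value to exactly one character, the wrapping is lossless and the original text can always be recovered from the bytes.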