Guru: 2019

Wednesday, December 18, 2019

How to use Docker, GPU on Linux

Ubuntu 18.04

1. Install Docker: https://docs.docker.com/install/

$ sudo apt install docker.io

2. Install NVIDIA docker support: https://github.com/NVIDIA/nvidia-docker

# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker

3. Grant permission

$ sudo setfacl -m user:pengy6:rw /var/run/docker.sock

4. Download a TensorFlow or PyTorch Docker image

$ docker pull tensorflow/tensorflow:1.15.0-gpu

$ docker pull pytorch/pytorch

5. Start a PyTorch Docker container

$ docker run -it --rm pytorch/pytorch bash

6. Mount a local folder (options go before the image name)

$ docker run --mount type=bind,source=source,target=target -it --rm pytorch/pytorch bash
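
As a rough sketch of step 6, the command line can be assembled in Python. The function name `docker_run_command` and the example paths are hypothetical. The sketch assumes that a bind mount needs an absolute host path, so a relative `source` is resolved first.

```python
import os


def docker_run_command(image, source, target, shell="bash"):
    """Build a `docker run` argument list that bind-mounts source at target."""
    # Resolve a relative host path to an absolute one before mounting.
    host_path = os.path.abspath(source)
    return [
        "docker", "run",
        "--mount", "type=bind,source={},target={}".format(host_path, target),
        "-it", "--rm",
        image, shell,
    ]
```

For example, `docker_run_command("pytorch/pytorch", "data", "/workspace/data")` returns a list that can be passed to `subprocess.call`.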

Tuesday, December 10, 2019

Using BibTeX in MS Word 2015 (Mac OS)

BibTeX is reference management software for formatting lists of references. It makes it easy to cite sources in a consistent manner. However, BibTeX is typically used together with the LaTeX document preparation system.

On the other hand, Microsoft Word is still the most commonly used word processor, and it is what my group uses to share documents. Thus, I use BibTeX to manage the bibliography and MS Word to write documents.

One of the best-known reference managers integrated into Word is EndNote, but unfortunately it is very expensive and not open source. On Windows there is an awesome plug-in called “Bib4Word”, but it is not usable on Mac OS.

This post describes how to use BibTeX in MS Word on a Mac for free.

To organize BibTeX entries, I use JabRef, a free and open-source reference manager. Please follow these steps:

In JabRef, export the bibliography in MS Word 2008 XML format.
Name the file Sources.xml (case sensitive).
In Mac OS with MS Word 2015, go to ~/Library/Containers/com.microsoft.word/Data/Library/Application Support/Microsoft/Office.
Rename the original Sources.xml file to Sources.xml.bak.
Copy the generated Sources.xml into this folder.
Restart MS Word.

Now you can choose citations from the list.
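
The backup-and-copy steps above can also be scripted. This is only a sketch: the function name `install_sources` is hypothetical, and it assumes the JabRef export has already been saved somewhere as an XML file.

```python
import shutil
from pathlib import Path


def install_sources(generated, office_dir):
    """Back up Word's Sources.xml, then copy the JabRef export into place."""
    office_dir = Path(office_dir)
    current = office_dir / "Sources.xml"
    backup = office_dir / "Sources.xml.bak"
    # Keep the original list so it can be restored later; an existing
    # backup is never overwritten.
    if current.exists() and not backup.exists():
        shutil.move(str(current), str(backup))
    shutil.copyfile(str(generated), str(current))
    return current
```

Call it with the path of the exported file and the Office folder named in the steps, then restart MS Word.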

Build FreeRDP with smartcard on Ubuntu

Compilation instruction

Besides the suggested base dependencies, also install libpcsclite-dev

sudo apt install libpcsclite-dev

cmake -DWITH_PCSC=ON -DWITH_SSE2=ON .

Then follow the Build section in the compilation instruction.

Java data structure to use C implementation of word2vec

Data structure to use C implementation of word2vec. https://github.com/yfpeng/pengyifan-word2vec

Getting started

<dependency>
  <groupId>com.pengyifan.word2vec</groupId>
  <artifactId>pengyifan-word2vec</artifactId>
  <version>0.0.1</version>
</dependency>

<repositories>
    <repository>
        <id>oss-sonatype</id>
        <name>oss-sonatype</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

...

<dependency>
  <groupId>com.pengyifan.word2vec</groupId>
  <artifactId>pengyifan-word2vec</artifactId>
  <version>0.0.1-SNAPSHOT</version>
</dependency>

Webpage

The official word2vec webpage is available with all up-to-date instructions and code.

https://code.google.com/p/word2vec/

How to install pip and create virtualenv on Windows without administrative permission

Since Python 2.7.9, pip has shipped with Python. But I still could not find it after our administrator installed the latest Python on my PC. How can you install pip and then create a virtualenv on Windows without administrative permission?

Download get-pip.py.
Run python get-pip.py --user. It will install pip locally.
Install virtualenv by running python -m pip install --user virtualenv. This installs the virtualenv package for the current user only.
Run python -m virtualenv ENV to create a new virtual environment. ENV is the directory in which the new virtual environment is placed.
Activate the environment by running ENV\Scripts\activate.bat.

Now you are in a new virtualenv that is isolated from the Python that was used to create it.
You can then install Python packages by running pip install ... as usual.
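
The install steps can be collected into a small script, sketched below. The function names are hypothetical, and the script assumes get-pip.py is already in the current directory. Activation still has to be done by hand, because it changes the environment of the shell you are working in.

```python
import subprocess
import sys


def virtualenv_steps(env_dir, python=sys.executable):
    """Return the install commands from the steps above, in order."""
    return [
        [python, "get-pip.py", "--user"],
        [python, "-m", "pip", "install", "--user", "virtualenv"],
        [python, "-m", "virtualenv", env_dir],
    ]


def run_steps(steps):
    """Run each command in turn and stop at the first failure."""
    for step in steps:
        subprocess.check_call(step)
```

Running `run_steps(virtualenv_steps("ENV"))` performs the three installs with the interpreter that runs the script.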

Install brat on Ubuntu (Apache2)

Install brat

Install Apache2
Download and unzip brat v1.3
Move the folder to /var/www/brat
In /var/www/brat run ‘./install’. Follow the instructions to set the username, password and email.
In /etc/apache2/sites-available/000-default.conf add

Alias /brat "/var/www/brat"

<Directory "/var/www/brat">
    Options +ExecCGI
    AddHandler cgi-script .cgi
    # AddHandler fastcgi-script fcgi
    AllowOverride Options Indexes FileInfo Limit
    AddType application/xhtml+xml .xhtml
    AddType font/ttf .ttf
</Directory>

Reload Apache2:

sudo service apache2 reload

Enable FastCGI

Install FastCGI by running ‘sudo apt-get install libapache2-mod-fastcgi’
Change /etc/apache2/sites-available/000-default.conf

 
# AddHandler cgi-script .cgi

AddHandler fastcgi-script fcgi

How to use xRDP for remote access to Ubuntu

By Yifan Peng | February 14, 201

Since the current desktop manager of Ubuntu does not work with xRDP, an alternative desktop manager needs to be installed. In this post, we will use XFCE.

Install xRDP and XFCE

sudo apt-get install xrdp xfce4 xfce4-terminal gnome-icon-theme-full tango-icon-theme

Configure xRDP

First, create an .xsession file in the home directory.

echo xfce4-session >~/.xsession

Then edit the startup file for xRDP /etc/xrdp/startwm.sh.

#!/bin/sh

if [ -r /etc/default/locale ]; then
  . /etc/default/locale
  export LANG LANGUAGE
fi

startxfce4

Restart xRDP

sudo service xrdp restart

Test xRDP

As an example, on Windows, start the Remote Desktop client (mstsc.exe) and enter the IP address of the Ubuntu machine. You will then see the login screen. After you enter your Ubuntu username and password and click “OK”, a window shows the login progress. Finally, you’ll have access to your Ubuntu machine.

Saturday, December 7, 2019

How to delete all .svn in a folder

By Yifan Peng | September 20, 2013

If you need to remove all .svn folders in a project, run

find . -name ".svn" -exec rm -rf {} \;

or more precisely, run

find . -type d -name '.svn' -print -exec rm -rf {} \;

The command starts from the current directory and searches the sub-folders recursively. It also works for other names if you replace “.svn”. But if the name is too general, especially if you use a pattern, you may end up deleting a bunch of folders and files you did not intend to. A safer way is to run find without -exec first and verify that everything it lists is really what you intend to remove.
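
The "look first, delete second" approach can be sketched in Python as two separate functions. The names `find_dirs` and `remove_dirs` are hypothetical; the point is that nothing is deleted until the list has been inspected.

```python
import os
import shutil


def find_dirs(root, name=".svn"):
    """Return every directory called `name` under `root`, without deleting."""
    matches = []
    for current, dirs, _files in os.walk(root):
        if name in dirs:
            matches.append(os.path.join(current, name))
            # Do not descend into a directory that is about to be removed.
            dirs.remove(name)
    return matches


def remove_dirs(paths):
    """Delete the directories returned by find_dirs."""
    for path in paths:
        shutil.rmtree(path)
```

Print the result of `find_dirs(".")` and check it before passing it to `remove_dirs`.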

How to convert video using MEncoder

By Yifan Peng | September 26, 2013

MEncoder is a free command-line video decoding, encoding, and filtering tool. As a by-product of MPlayer, it can convert all formats that MPlayer can understand.

install mencoder
```
sudo apt-get install mencoder
```
avi → mp4
```
mencoder 1.avi -o 1.mp4 -oac copy -ovc lavc -lavcopts vcodec=mpeg1video -of mpeg
```
- -o: output file name.
- -oac: audio codec for encoding. copy does not re-encode; it just copies the compressed frames.
- -ovc: video codec for encoding. lavc means using one of libavcodec’s video codecs.
- -lavcopts: options for libavcodec. Here we choose the MPEG-1 video codec.
- -of: output container format. mpeg means MPEG-1 and MPEG-2 PS.
mov → avi
```
mencoder -oac mp3lame -ovc x264 1.mov -o 1.avi
```
- -oac: mp3lame means encoding to VBR, ABR or CBR MP3 with LAME.
- -ovc: x264 means using x264, an encoder for MPEG-4 Advanced Video Coding (AVC), also known as H.264.

Install Source Code Pro on Ubuntu

By Yifan Peng | September 28, 2013

Source Code Pro is a free set of monospace fonts created by Adobe. It is very suitable for text editors and terminal windows. The fonts are also available on Google Web Fonts.

download: https://sourceforge.net/projects/sourcecodepro.adobe/files/
unzip file
mkdir -p ~/.fonts
move OpenType fonts (.otf) or TrueType files (.ttf) to ~/.fonts
sudo fc-cache

Find receipts in Gmail

By Yifan Peng | October 8, 2013

I just found that Gmail can automatically detect receipts from PayPal, Google Play and Google Checkout, order confirmations from eBay, Amazon and other shopping sites. Try to search label:^smartlabel_receipt in your Gmail!!!

How To Run Android 4.0 In Virtualbox (Linux)

By Yifan Peng | October 9, 2013

Sometimes I need to test my app or try some new apps on the main Android devices available on the market. Of course I don’t want to install them on the cellphone I use for work. In such cases, I prefer to install a virtual machine on my PC and test whatever I like on it. In this post, I describe a way to run Android in VirtualBox.

Choose an Android 4.0.3_r1 image (20120518 build)
Install (import) the package into VirtualBox
Install Android SDK
- Download Android SDK
- Extract the file to “$ANDROID”, and go to $ANDROID/tools
- run ./android and install Android SDK Platform-tools
Set up NAT
- Open VirtualBox
- Go to buildroid VM’s Settings → Network
- Under Adapter 1 tab, choose Attached to: NAT.
- Expand Advanced and click Port Forwarding button.
- Click Insert new rule button, input 5555 into Host Port and Guest Port.
- Click OK to return to VirtualBox.
Install Google play
- Download Google apps
- go to $ANDROID/platform-tools, and run

adb connect localhost

adb push /path/to/buildroid-gapps-ics-20120317-signed.tgz /sdcard/

adb shell

su

mount -o remount,rw /system

tar -xvzf /sdcard/buildroid-gapps-ics-20120317-signed.tgz

mount -o remount,ro /system

reboot

Share folders via Samba without a password

By Yifan Peng | September 8, 2013

When switching from Windows to Linux, one of the first things you may want to do is share folders on Linux with Windows, which makes transferring files easy. On Linux, this is not always trivial. On Ubuntu, we need to install Samba to share files and folders. Here are the instructions:

install the samba package:
sudo apt-get install samba
edit the configuration file /etc/samba/smb.conf.
change the security setting: security = share.

add new section:

[share] 
comment = Ubuntu File Server Share 
path = /srv/samba/share 
browsable = yes 
guest ok = yes 
guest only = yes
read only = no
writable = True 
create mask = 0755

create directory and change the permission:

sudo mkdir -p /srv/samba/share
sudo chown nobody:nogroup /srv/samba/share/

restart the samba services to enable the new configuration:
```
sudo service smbd restart
sudo service nmbd restart
```

Friday, December 6, 2019

Mount Google Filestore on a Google Cloud VM

sudo apt-get -y install nfs-common
sudo mkdir /mnt/test
sudo mount xxx.xxx.xxx.xxx:/vol1 /mnt/test
sudo chmod go+rw /mnt/test

Permanent mount

sudo nano /etc/fstab
add xxx.xxx.xxx.xxx:/vol1 /mnt/test nfs defaults 0 0

Thursday, December 5, 2019

How to adjust height of an old Steelcase chair (454311M)?

By Yifan Peng | September 8, 2013

The chairs do not have the pneumatic mechanism to adjust the height of the chair. The height is adjusted by turning the chair in circles around the base. The base post contains a screw mechanism that gives 2-3″ of adjustment.

Book review: Introduction to Machine Learning (2ed)

Introduction to Machine Learning (2ed), by Ethem Alpaydin, MIT Press, 2010. ISBN 0-262-01243-X.

This book gives students, researchers, and developers a comprehensive introduction to machine learning techniques. It is structured primarily as a coursebook and is a valuable textbook for graduate or undergraduate teaching. It is also a good resource for self-study by researchers and developers, provided they are familiar with AI and advanced mathematics.

The book begins with an introductory chapter, followed by 18 chapters plus an appendix. Each chapter presents a stand-alone topic, beginning with a brief introduction and ending with notes. Readers can therefore quickly get an overview of the topic and see possible directions for further development in the area. The book covers a variety of machine learning techniques: supervised and unsupervised learning, and parametric and nonparametric methods. These are followed by methods for assessing and comparing classification algorithms, combining multiple learners, and reinforcement learning.

As a book dealing with machine learning, it presents a varied collection of programming techniques; however, some topics are not well organized. In Bayesian decision theory, the case of multiple inputs and outputs is a complex area, but the chapter only briefly mentions the preliminaries. In Chapter 3, the naive Bayes classifier is also a really important classifier, but it is not given the space it deserves. Chapter 6 has a section on linear discriminant analysis (6.6), which would be better placed in Chapter 10.

In general, the examples at the end of each chapter are often too simple. They miss the mathematical problems that only appear when the examples are relatively large. Furthermore, it is disappointing that almost all of the references listed at the end of each chapter date from 2000 or earlier, although the book was published in 2010.

(Mitchell, 1997) is the closest substitute for this book, and worth keeping in your library. (Russell and Norvig, 2003) covers important aspects of machine learning as well as many related concepts such as knowledge representation and different search heuristics. Although much of this content was covered earlier by (Mitchell, 1997), and to a lesser extent by (Russell and Norvig, 2003), this book still stands out.

References

Mitchell, T. M. (1997). Machine Learning. McGraw-Hill, New York.
Russell, S. J. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach (2ed). Prentice Hall.

Journal Article Review

By Yifan Peng | September 8, 2013

This is the review of the current state of research in my domain with generalized observations and suggestions for research.

Computational Linguistics: This journal’s scope is quite wide, but it is frequently cited in papers related to my research.
Knowledge-Based Systems: Focuses on systems that use knowledge-based techniques to support human decision-making, learning and action.
Bioinformatics: I will focus on the data and text mining part.
- Genome analysis
- Sequence analysis
- Phylogenetics
- Structural Bioinformatics
- Gene Expression
- Genetics and Population Analysis
- Systems Biology
- Data and Text Mining
- Databases and Ontologies
Nucleic Acids Research: Though computational biology is within its scope, I didn’t find any papers about BioNLP in the current volume.
Information Processing & Management: Basic and applied research in information science: use of information; information retrieval (IR); knowledge organization and distribution.
Artificial Intelligence in Medicine: The information system issues seem related to my research application:
- medical knowledge engineering
- knowledge-based and agent-based systems
- computational intelligence in bio- and clinical medicine
- intelligent medical information systems
BMC Bioinformatics: This is the ideal journal. Some papers focus on the BioNLP field. Memo: articles have a “background” section instead of an “introduction”.
Machine Learning Journal: Learning Problems and Learning Methods are closely related.
Journal of Machine Learning Research: Learning Problems and Learning Methods are closely related.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IEEE/ACM TCBB): Though they anticipate that publications will represent a mixture of fundamental methodological, experimental and implementation issues, and serious application of methods, among the more specific topics of interest I can find only things like protein analysis and DNA folding, but no NLP issues.
Computer Speech and Language: Algorithms and models for speech recognition and synthesis are not related to my area, but the natural language processing techniques are helpful.
Journal of Molecular Biology: Computational biology, but no focus on NLP.
Journal of Artificial Intelligence: General journal, suitable for any topic in AI.

Best Markdown Editors for Windows, Linux, and the web

Markdown is a lightweight markup language, allowing people “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”. An excellent Markdown Syntax Guide is by Daring Fireball. Sites such as GitHub, reddit, Diaspora, Stack Overflow, OpenStreetMap, and SourceForge use Markdown to facilitate discussion between users. GitHub uses “GitHub Flavored Markdown” (GFM) for messages, issues, and comments. It differs from standard Markdown (SM) in a few significant ways and adds some additional functionality.

As Markdown becomes more popular, new tools have been developed to cater to writing in it. This post won’t enumerate all or even most of the tools available. Instead, I will just list a few of my personal favorite Markdown editors on different platforms.

MarkdownPad is a full-featured Markdown editor for Windows. It supports instant HTML preview, spell check, custom CSS, etc.

ReText is very similar to MarkdownPad, but it works on Linux. It is written in Python using the Qt libraries, and is therefore able to run on other platforms too. It supports full Markdown syntax, live preview, HTML/PDF/ODT/Google Docs export, etc. To install the latest version on Ubuntu, use

>> sudo add-apt-repository ppa:mitya57/retext-beta
>> sudo apt-get update
>> sudo apt-get install retext

Markable is a remarkable online Markdown editor. It supports syntax highlighting, live preview, and file sharing with Dropbox and Evernote. For small tasks, it is my favorite editor.

Eclipse, gedit, Sublime, etc. all provide plugins that add support for Markdown. I will gradually add the links after I REALLY use them:

for Eclipse: http://www.winterwell.com/software/markdown-editor.php
for gedit: http://www.jpfleury.net/en/software/gedit-markdown.php
for Sublime: http://www.macstories.net/roundups/sublime-text-2-and-markdown-tips-tricks-and-links/

D-link DIR-628 connection dropouts

Problems

I am using a D-link DIR-628 and experiencing regular wireless signal dropouts. They occur on multiple wireless devices in my home, including a MacBook, an iPad, and an iPhone. The devices have trouble connecting to the router when I come back home, and they frequently lose the connection thereafter.

I am sure the incoming internet connection is not the problem, because my desktop, which is connected via Ethernet, never drops out. BTW, I am using a 5GHz wireless network, so there shouldn’t be many conflicts.

Suggestions

Ensure DNS IP addresses are being filled in under Setup/Internet/Manual.
Turn off ALL QoS GameFuel options under Advanced/QoS or Gamefuel.
Turn off Advanced DNS Services if you have this option under Setup/Internet/Manual.
Turn on DNS Relay under Setup/Networking.
Set up DHCP reserved IP addresses for all devices on the router under Setup/Networking.
Set Firewall settings to Endpoint Independent for TCP and UDP.

How to fix Python SSL CERTIFICATE_VERIFY_FAILED

Here I explain how to fix Python SSL errors when downloading web pages over https in Python (e.g. by using urllib, urllib2, httplib or requests). The error looks like this (possibly with a line number different from 590):

self._sslobj.do_handshake()

 SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

Server certificate verification by default was introduced to Python recently (in 2.7.9). It protects against man-in-the-middle attacks and assures the client that the server is indeed who it claims to be.

As a quick (and insecure) fix, you can turn certificate verification off by setting the PYTHONHTTPSVERIFY environment variable to 0. For example, run

export PYTHONHTTPSVERIFY=0

python your_script

or, in a single command,

PYTHONHTTPSVERIFY=0 python your_script

Alternatively, you can add this to your code before doing the https request

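import os, ssl

# Disable verification only when PYTHONHTTPSVERIFY is unset and this
# Python version provides the unverified-context helper.
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
        getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context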
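
A narrower variant of the same insecure workaround is to skip verification for a single request instead of the whole process. This is a sketch that assumes Python 3 and the standard `urllib.request` module; the helper names `unverified_context` and `fetch_insecure` are hypothetical.

```python
import ssl
import urllib.request


def unverified_context():
    """Return an SSL context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    # check_hostname must be switched off before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_insecure(url, timeout=10):
    """Download `url` without verifying the server certificate (insecure)."""
    context = unverified_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()
```

Other https requests in the same program keep the default, verified behaviour.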