git commit -am "informative message"
## history of a file, with the diffs
git log -p -- file
## show a file as it was at a specific commit
git show COMMIT:file
## check out an old commit (detached HEAD)
git checkout COMMIT
## undo the last commit but keep its changes
git reset HEAD~
and then, for the real commit, git commit -c ORIG_HEAD (it reuses the old commit message as a starting point).
## add all untracked files (from the short status)
git st -s | grep '??' | cut -f2 -d ' ' | xargs git add
## unstage a file, or everything
git restore --staged FILE
git reset
git remote add mine git@github.com:jmonlong/REPO.git
git config credential.helper store
git config --global alias.co checkout
git config --global alias.ci commit
git config --global alias.st status
git config --global alias.br branch
git config --global user.email '<EMAIL>'
git config --global user.name 'Jean Monlong'
## list local branches, or all branches including remotes
git branch
git branch -a
## remove stale remote-tracking branches
git remote prune origin
## create and switch to a new branch
git checkout -b hotfix
## set the upstream of the current branch
git branch -u origin/hotfix
## check out a remote branch and track it
git co -t origin/hotfix
git merge hotfix
## delete the branch locally, then on the remote
git branch -d hotfix
git push origin :hotfix
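To go the other way and publish a new local branch on the remote, setting it as upstream at the same time:
git push -u origin hotfix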
git clone --recursive https://github.com/XXX/XXX.git
git submodule update --init --recursive
Fetch new submodule commits:
## in the submodule
git fetch
git checkout <COMMIT>
## in the main repo
git add .
git commit -m "updated the submodule"
git push
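A related shortcut, assuming the submodules are set to track a branch: update them all to the latest commit of that branch.
git submodule update --remote --recursive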
To check all the git repositories in subfolders, I have an alias calling the following commands:
WD=`pwd`
for ff in `find . -maxdepth 5 -name .git`
do
    GDIR=`dirname $ff`
    echo $GDIR
    cd $WD/$GDIR
    ## short status, and whether the branch is ahead of its remote
    git st -s
    git st | grep ahead
done
cd $WD
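One way to wire it up (a sketch; the script path and alias name are hypothetical): save the loop as a script and point an alias at it in ~/.bashrc.
alias gitsta='bash ~/scripts/git-status-all.sh'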
A few tricks I’d like to remember or try soon:
{ } in expand (escape with {{ }} to NOT define wildcards).
{data,\d+} to constrain a wildcard with a regex, or using wildcard_constraints (within a rule or globally).
temp() files are deleted when not needed by any rules anymore.
touch() for flag files.
remote() for files in remote storage.
Configuration files, used as config["samples"]. Can also be set on the command line: snakemake --config yourparam=1.5.
For sample metadata, a tabular configuration file can also be used (loaded with Pandas).
It’s also possible to define a separate config file for the cluster configuration (e.g. resources for each rule).
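For example, with the (older, now deprecated in favor of profiles) --cluster-config option, a hypothetical cluster.yaml defining mem/time per rule could be used like this on a SLURM cluster:
## cluster.yaml would contain e.g. a __default__ entry and per-rule mem/time values
snakemake --jobs 50 \
    --cluster-config cluster.yaml \
    --cluster "sbatch --mem={cluster.mem} --time={cluster.time}"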
To use more flexible inputs, use a function and unpack:
def input_func(wildcards):
    return {'file1': '{wildcards.val}.txt'.format(wildcards=wildcards)}

rule myrule:
    input:
        unpack(input_func)
    output:
        "output.{val}.txt"
    shell: " ... {input.file1} ..."
## rename the sample in the VCF header
echo SAMPLENAME > temp.txt
bcftools reheader -s temp.txt input.vcf.gz > output.vcf.gz
## remove all INFO fields and all FORMAT fields except GT
bcftools annotate -x INFO,^FORMAT/GT input.vcf.gz
## keep variants with at least one non-reference allele in SAMPLENAME
bcftools view -c 1 -s SAMPLENAME input.vcf.gz
## (re)compute the AN/AC/AF INFO fields
bcftools +fill-tags input.vcf.gz -Oz -o output.vcf.gz --threads 4 -- -t AN,AC,AF
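These can be combined; a minimal sketch (NEWSAMPLE and the file names are placeholders) that renames the sample, keeps only the GT field, recomputes the allele counts/frequency, and indexes the result:
echo NEWSAMPLE > sample.txt
bcftools reheader -s sample.txt input.vcf.gz | \
    bcftools annotate -x INFO,^FORMAT/GT -Oz -o renamed.gt.vcf.gz -
bcftools +fill-tags renamed.gt.vcf.gz -Oz -o output.vcf.gz -- -t AN,AC,AF
bcftools index -t output.vcf.gz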
Lesson at programminghistorian.org
jq 'select(.field==value)'
jq '{id: .id, title: .title}'
jq -r '.array | @tsv'
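A worked example on a small, made-up ex.json (an array of objects) to illustrate the patterns above:
echo '[{"id":1,"title":"first","field":"keep"},{"id":2,"title":"second","field":"skip"}]' > ex.json
## keep only the objects where field is "keep"
jq '.[] | select(.field=="keep")' ex.json
## build smaller objects with just id and title
jq '.[] | {id: .id, title: .title}' ex.json
## one TSV line per object (-r outputs raw text instead of JSON strings)
jq -r '.[] | [.id, .title] | @tsv' ex.json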
vd can read many file formats, including TSV, CSV, JSON. I use it to explore TSV files as a more powerful less. It’s great to format wide columns but also to quickly explore summary stats of the table.
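For example (assuming a recent VisiData; it also reads from a pipe if the filetype is given):
vd data.tsv
zcat data.tsv.gz | vd -f tsv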
Keybindings:
Ctrl-H or z? triggers the help page.
_ expand/contract column. z_ <N> set current column width to N.
/ regex search in current column. g/ regex search in all columns. n/N move to next/previous match.
[/] sort ascending/descending by current column.
| select by regexp in current column. , select rows matching current cell in current column.
z" copy selected rows to new sheet.
F toggle a frequency table/histogram of the current column. Enter to focus on a subset defined by a row in the frequency table.
I toggle Describe sheet with summary statistics for each column.
. toggle dot plot (set the columns to plot as numeric with # and a key column with ! first).

rsync is not completely intuitive to me.
Here are some of the commands I could make work.
To recursively sync all the files that match the patterns in rsyncIncludes.txt:
rsync -r --include='*/' --include-from=rsyncIncludes.txt --exclude='*' --prune-empty-dirs SRC DEST
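The include file lists one pattern per line; a hypothetical rsyncIncludes.txt could be:
cat > rsyncIncludes.txt <<'EOF'
*.R
*.Rmd
config*.yaml
EOF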
To recursively sync all the files that match the patterns in rsyncIncludes.txt EXCEPT some with a specific pattern. Practical example: all the R scripts but not the ones created by BatchJobs in *-files directories:
rsync -r --exclude="*-files" --include='*/' --include='*.R' --exclude='*' --prune-empty-dirs SRC DEST
WORKDIR /root sets the working directory.
COPY PopSV_1.0.tar.gz ./ copies a file in the instance. The / is important!
RUN to run commands when building the image.
To build the image, run in the folder with the Dockerfile:
docker build -t jmonlong/popsv-docker .
Ignore (big) files from the build context using a .dockerignore file.
To set up a time zone in an Ubuntu-based container, install tzdata in the Dockerfile:
RUN apt-get -y update && DEBIAN_FRONTEND=noninteractive apt-get -y install tzdata
ENV TZ=America/Los_Angeles
The time zone can also be changed at run time using -e TZ=America/Los_Angeles.
List of time zone codes
To build smaller images, use fewer layers and a smaller base image (e.g. Alpine images).
To launch an interactive instance with a shared folder:
docker run -t -i -v /home/ubuntu/analysis1:/root/analysis1 jmonlong/popsv-docker
-t and -i are used for an interactive run.
-v links a folder in the host with a folder in the image. Both must be absolute paths.
bash as the command to force an interactive session, or --entrypoint /bin/bash if the image uses an ENTRYPOINT.
-u `id -u $USER` to run the commands as the current user (see the example below).
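Putting these options together, a sketch of an interactive session as the current user (using the image and paths from the example above; --entrypoint is only needed if the image defines one):
docker run -ti -u `id -u $USER` \
    -v /home/ubuntu/analysis1:/root/analysis1 \
    --entrypoint /bin/bash jmonlong/popsv-docker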
On Mac OS, I had some problems with Docker stopping because of memory issues. I fixed it by changing:
docker-machine stop
VBoxManage modifyvm default --cpus 3
VBoxManage modifyvm default --memory 8192
docker-machine start
To remove all containers and images:
docker rm -vf $(docker ps -a -q)
docker rmi -f $(docker images -a -q)
To clean cache too:
docker system prune -a
A few useful commands/tricks/reminders for WDL (see the full specs).
size(file_name, 'G') to get the size in Gb (e.g. for dynamic disk space allocation). Used with ceil/round to get a round number.
&& and || for logical and/or in expressions.
~{true="yes" false="no" boolean_value} to inject different strings in the command section, depending on the value of a Boolean.
~{sep="," array_value} to "join" an array into a string.
Int value = if othervalue < 5 then 1 else 4 for a conditional (if/else) assignment.
glob("*.bam") to catch output files for example.
String out_prefix = basename(in_gam_file, ".gam") to get the basename of a file and strip a suffix too.
String out_prefix = sub(sub(sub(basename(in_gam_file), "\\.gz$", ""), "\\.gaf$", ""), "\\.gam$", "") another way for when the extension is variable.
File sel_file = select_first([OPTIONAL_INPUT_FILE, task.output_file]) to select the first defined arg in an array (e.g. when a task can recreate an optional input).
flatten([array1, [new_element]]) to add new_element to an existing array1.
Array[Pair[File,File]] paired_files_list = zip(file_list_1, file_list_2), e.g. to scatter across two lists. In the scatter, access with .left/.right.
set -eux -o pipefail to stop the job at the first error, even in a pipe.
If an input is a (long) array, use it through a file with one element per line (see the WDL spec):
input {
    Array[File] in_vcf_list
}
command <<<
    while read invcf
    do
        command $invcf
    done < ~{write_lines(in_vcf_list)}
    ## or
    command -f ~{write_lines(in_vcf_list)}
>>>
$@ the target.
$< the first prerequisite.
$^ all prerequisites.
$(@D) the directory part of $@ (works the same with <, ^, etc).
$(@F) the filename part of $@ (works the same with <, ^, etc).
$(notdir src/foo.c) returns foo.c.
$(addsuffix .c,foo) returns foo.c.
$(basename dir/foo.test.c) returns dir/foo.test.
objects := $(wildcard *.o) to list files in a variable.
$(patsubst %.c,%.o,file.c) to substitute file extensions.
$(subst from,to,text) replaces all occurrences of from to to in text.
$(word n,text) returns the n-th word in text.
Shell function: $(shell cat foo) runs a shell command.
Basic commands:
cd/ls to change directory/list files in the remote location.
get to copy a file from the remote location to the local machine.
put to copy a file from the local machine to the remote location.
delete to remove a file in the remote location.
lcd to change directory in the local machine.
help to list FTP commands.
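The session can also be scripted in batch mode; a minimal sketch with made-up host and file names (batch mode disables password prompts, so key-based authentication is assumed):
sftp -b - user@remote.host <<'EOF'
cd /remote/dir
get results.txt
put notes.txt
EOF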
Convert an SVG to PDF with Inkscape:
inkscape --file=in.svg --export-area-drawing --without-gui --export-pdf=out.pdf
Convert an M4A audio file to MP3 with ffmpeg:
ffmpeg -i in.m4a -acodec mp3 -ac 2 -ab 192k out.mp3
Convert a webpage to PDF with pandoc:
pandoc -o out.pdf --include-in-header h.tex URL
where h.tex could contain LaTeX package declarations like \usepackage{fullpage}.
To convert a PDF to EPS, I ended up using Inkscape in command-line mode. The result is not so bad (better than the pdf2eps etc).
inkscape document.pdf --export-eps=document.eps
Apparently, pdftops is even better.
pdftops -eps document.pdf
To produce a PDF/A-compliant document, in the end I had to use Acrobat Reader Pro… Still, converting the PDF using the following commands beforehand helped (otherwise Acrobat Reader Pro couldn’t convert it):
gs -dPDFA=1 -dBATCH -dNOPAUSE -dEmbedAllFonts=true -dSubsetFonts=false -dHaveTrueTypes=true -dPDFSETTINGS=/prepress -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=mainPDFA.pdf main.pdf
On the other hand, going through a .ps stage, as recommended here, produced a smaller PDF that was directly PDF/A compliant (no need for Acrobat Reader Pro) but lost all cross-reference links :(
pdftops main.pdf main.ps
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=mainPDFA.pdf main.ps
To check for PDF/A compliance I used this online validator or Acrobat Reader Pro.
Another way to check for problems is to look at the emb column of pdffonts main.pdf
(should be all embedded) and the type column of pdfimages -list main.pdf
(should be all image).
Note: this is based only on my one-time experience with the PDF of my thesis.
On Ubuntu, I digitally add signatures using Xournal. It’s ugly but the output PDF is left unchanged and it’s easy to place the signature (from a .png file for example) and export to PDF.
Some articles have diagonal watermarks behind the text. It makes it more difficult to select and copy text. Personally, I find it annoying when I want to highlight/annotate a PDF.
Sometimes it’s as easy as uncompressing the PDF and replacing the text of the watermark. From StackExchange:
pdftk original.pdf output uncompressed.pdf uncompress
sed -e "s/watermarktextstring/ /" uncompressed.pdf > unwatermarked.pdf
pdftk unwatermarked.pdf output fixed.pdf compress
Sometimes, the watermark text is not there in the PDF (e.g. the letters are split or the font uses unicodes or something).
That’s my experience with Nature papers with the annoyingly big “ACCELERATED ARTICLE PREVIEW” watermark.
In this situation, I had to look at the uncompressed PDF (e.g. cat uncompress.pdf | less) to guess the block with the watermark and then mess it up (e.g. its text matrix field Tm):
sed -e "s/33.94110107 33.94110107 -33.94110107 33.94110107 5.8685999 122.48609924 Tm/0 0 0 0 0 0 Tm/g" < uncompress.pdf > unwatermarked.pdf
for a block that looked like:
...
BT
0 0 0 rg
/GS2 gs
/C2_1 1 Tf
0 Tc
0 Tw
33.94110107 33.94110107 -33.94110107 33.94110107 5.8685999 122.48609924 Tm
(^@$)Tj
.695 0 Td
(^@&)Tj
.722 0 Td
(^@&)Tj
.695 0 Td
(^@\()Tj
.61199998 0 Td
(^@/)Tj
.61199998 0 Td
(^@\()Tj
.695 0 Td
(^@5)Tj
.695 0 Td
(^@$)Tj
.639 0 Td
(^@7)Tj
.639 0 Td
(^@\()Tj
.695 0 Td
(^@')Tj
.361 -.85799998 Td
(^@^C)Tj
...