git commit -am "informative message"
git log -p -- file
git show COMMIT:file
git checkout COMMIT
git reset HEAD~ and then for the real commit git commit -c ORIG_HEAD.
git st -s | grep '??' | cut -f2 -d ' ' | xargs git add
git restore --staged FILE
git reset
git remote add mine git@github.com:jmonlong/REPO.git
git config credential.helper store
git config --global alias.co checkout
git config --global alias.ci commit
git config --global alias.st status
git config --global alias.br branch
git config --global user.email '<EMAIL>'
git config --global user.name 'Jean Monlong'
git branch
git branch -a
git remote prune origin
git checkout -b hotfix
git branch -u origin/hotfix
git co -t origin/hotfix
git merge hotfix
git branch -d hotfix
git push origin :hotfix
git clone --recursive https://github.com/XXX/XXX.git
git submodule update --init --recursive
Fetch new submodule commits:
## in the submodule
git fetch
git checkout <COMMIT>
## in the main repo
git add .
git commit -m "updated the submodule"
git push
I have an alias calling the following commands:
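WD=$(pwd)
# look for git repositories up to 5 levels below the current directory
for ff in $(find . -maxdepth 5 -name .git)
do
    GDIR=$(dirname "$ff")
    echo "$GDIR"
    # skip the repository if the directory can't be entered
    cd "$WD/$GDIR" || continue
    # uncommitted changes (st is the status alias defined above)
    git st -s
    # commits not pushed yet
    git st | grep ahead
done
cd "$WD"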
A few tricks I’d like to remember or try soon:
{ } in expand or NOT to define wildcards.
{data,\d+} or using wildcard_constraints (within a rule or globally) to constrain a wildcard with a regular expression.
Files marked with temp() are deleted once no rule needs them anymore.
touch() for flag files.
remote()
Configuration files used as config["samples"].
Can also be used as snakemake --config yourparam=1.5.
For sample metadata, a tabular configuration can also be loaded with Pandas.
It’s also possible to define a separate config file for the cluster configuration (e.g. resources for each rule).
To use more flexible inputs, use a function and unpack:
def input_func(wildcards):
    return {'file1': '{wildcards.val}.txt'.format(wildcards=wildcards)}

rule myrule:
    input:
        unpack(input_func)
    output:
        "output.{val}.txt"
    shell: " ... {input.file1} ..."
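To see what the input function hands back to the rule, it can be called by hand outside of Snakemake. This is only a sketch: `SimpleNamespace` is a stand-in (my assumption) for the wildcards object that Snakemake passes, and the value `A` is made up.

```python
from types import SimpleNamespace


def input_func(wildcards):
    # Same body as the function used in the rule above
    return {'file1': '{wildcards.val}.txt'.format(wildcards=wildcards)}


# Hypothetical stand-in for the wildcards Snakemake would build
# when asked to produce output.A.txt
fake_wildcards = SimpleNamespace(val='A')

inputs = input_func(fake_wildcards)
print(inputs)
```

The keys of the returned dictionary are what `unpack` turns into named inputs, which is why the shell command can refer to `{input.file1}`.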
echo SAMPLENAME > temp.txt
bcftools reheader -s temp.txt input.vcf.gz > output.vcf.gz
bcftools annotate -x INFO,^FORMAT/GT input.vcf.gz
bcftools view -c 1 -s SAMPLENAME input.vcf.gz
bcftools +fill-tags input.vcf.gz -Oz -o output.vcf.gz --threads 4 -- -t AN,AC,AF
bcftools view -i "STRLEN(REF)<30 & MAX(STRLEN(ALT))<30" input.vcf
Lesson at programminghistorian.org
jq 'select(.field==value)'
jq '{id: .id, title: .title}'
jq '.array | @tsv'
vd can read many file formats, including TSV, CSV, JSON. I use it to explore TSV files as a more powerful less. It’s great to format wide columns but also to quickly explore summary stats of the table.
Keybindings:
Ctrl-H or z? triggers the help page
_ expand/contract column.
z_ <N> set current column width to N.
/ regex search in current column
g/ regex search in all columns
n/N move to next/previous match
[/] sort ascending/descending by current column
| select by regexp in current column
, select rows matching current cell in current column
z| select by Python expression
z" copy selected rows to new sheet
F toggle a frequency table/histogram of the current column.
Enter to focus on a subset defined by a row in frequency table.
O)I toggle Describe sheet with summary statistics for each column.
. toggle dot plot.
#.! first.
To keep the same color palette/theme as my terminal, I set the theme to asciimono in my ~/.visidatarc (see manual):
options.theme = "asciimono"
rsync is not completely intuitive to me.
Here are some of the commands I could make work.
To recursively sync all the files that match the patterns in rsyncIncludes.txt:
rsync -r --include='*/' --include-from=rsyncIncludes.txt --exclude='*' --prune-empty-dirs SRC DEST
To recursively sync all the files that match some patterns EXCEPT the ones in directories with a specific pattern.
Practical example: all the R scripts but not the ones created by BatchJobs in *-files directories:
rsync -r --exclude="*-files" --include='*/' --include='*.R' --exclude='*' --prune-empty-dirs SRC DEST
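What makes these commands unintuitive is that the order of the --include/--exclude options matters: rsync checks each file or directory against the rules in order, the first matching rule decides, and an excluded directory is never entered. Below is a rough Python model of that behaviour for the second command. It is a simplification under my own assumptions (patterns are matched against one path component at a time with `fnmatch`; anchored patterns, `**` and --prune-empty-dirs are ignored), and the paths are invented.

```python
from fnmatch import fnmatch


def is_transferred(path, rules):
    """Simplified first-match-wins model of rsync filter rules."""
    parts = path.split('/')
    for i, name in enumerate(parts):
        is_dir = i < len(parts) - 1  # every component but the last is a directory
        for action, pattern in rules:
            # A pattern ending with '/' only applies to directories
            if pattern.endswith('/') and not is_dir:
                continue
            if fnmatch(name, pattern.rstrip('/')):
                if action == 'exclude':
                    return False  # excluded, nothing below it is visited
                break  # included, go on with the next path component
    return True


# Rules in the same order as in the command above
rules = [
    ('exclude', '*-files'),
    ('include', '*/'),
    ('include', '*.R'),
    ('exclude', '*'),
]

for path in ['analysis/run.R', 'analysis/notes.txt', 'jobs-files/1/script.R']:
    print(path, is_transferred(path, rules))
```

In this model, moving the `*-files` exclusion after `--include='*/'` would let the BatchJobs directories through, because `*/` would match them first.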
WORKDIR /root sets the working directory.
COPY PopSV_1.0.tar.gz ./ copies a file in the instance. The / is important!
RUN.
To run in the folder with the Dockerfile:
docker build -t jmonlong/popsv-docker .
Ignore (big) files from the build context using a .dockerignore file.
To set up a time zone in an Ubuntu-based container, use tzdata in the Dockerfile:
RUN apt-get -y update && DEBIAN_FRONTEND=noninteractive apt-get -y install tzdata
ENV TZ=America/Los_Angeles
The time zone can also be changed in the run command using -e TZ=America/Los_Angeles.
List of time zone codes
To build smaller images, use fewer layers and a smaller base image (e.g. Alpine images).
To launch an interactive instance with a shared folder:
docker run -t -i -v /home/ubuntu/analysis1:/root/analysis1 jmonlong/popsv-docker
-t and -i are used for interactive run.
-v links a folder in the host with a folder in the image. It must be absolute paths.
bash as the command to force interactive, or --entrypoint /bin/bash if the image uses an ENTRYPOINT.
-u `id -u $USER`.
In Mac OS, I had some problems with the docker stopping because of memory issues. I fixed it by changing:
docker-machine stop
VBoxManage modifyvm default --cpus 3
VBoxManage modifyvm default --memory 8192
docker-machine start
To remove all containers and images:
docker rm -vf $(docker ps -a -q)
docker rmi -f $(docker images -a -q)
To clean cache too:
docker system prune -a
A few useful commands/tricks/reminders for WDL (see full specs).
size(file_name, 'G') to get the size in GB (e.g. for dynamic disk space allocation). Used with ceil/round to get a round number.
and=&&, or=||
~{true="yes" false="no" boolean_value} to inject different strings in the command section, depending on the value of a Boolean.
~{sep="," array_value} to “join” an array into a string.
Int value = if othervalue < 5 then 1 else 4
glob("*.bam") to catch output files for example.
String out_prefix = basename(in_gam_file, ".gam") to get the basename of a file and strip a suffix too.
String out_prefix = sub(sub(sub(basename(in_gam_file), "\\.gz$", ""), "\\.gaf$", ""), "\\.gam$", "") another way for when the extension is variable.
File sel_file = select_first([OPTIONAL_INPUT_FILE, task.output_file]) to select the first defined arg in an array (e.g. when a task can recreate an optional input).
flatten([array1, [new_element]]) to add new_element to an existing array1.
Array[Pair[File,File]] paired_files_list = zip(file_list_1, file_list_2), e.g. to scatter across two lists. In the scatter, access with .left/.right.
set -eux -o pipefail to stop the job at the first error, even in a pipe.
If an input is a (long) array, use it through a file with one element per line (see in WDL Spec):
input {
    Array[File] in_vcf_list
}
command <<<
    while read invcf
    do
        command $invcf
    done < ~{write_lines(in_vcf_list)}
    ## or
    command -f ~{write_lines(in_vcf_list)}
>>>
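As a rough illustration of the idea, here is a Python sketch in which `write_lines` is my own stand-in for the WDL function of the same name (not the WDL implementation): it writes one array element per line to a temporary file and returns the path. The file names are hypothetical.

```python
import tempfile


def write_lines(items):
    """Stand-in for WDL's write_lines(): one element per line, returns the file path."""
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as fh:
        for item in items:
            fh.write(f"{item}\n")
        return fh.name


in_vcf_list = ['a.vcf.gz', 'b.vcf.gz', 'c.vcf.gz']  # hypothetical inputs
list_path = write_lines(in_vcf_list)

# Counterpart of the `while read invcf` loop in the command section
with open(list_path) as fh:
    read_back = [line.rstrip('\n') for line in fh]
print(read_back)
```

The command then only sees a single file path, however long the array is.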
$@ the target
$< the first prerequisite
$^ all prerequisites
$(@D) the directory part of $@ (works the same with <,^, etc).
$(@F) the filename part of $@ (works the same with <,^, etc).
$(notdir src/foo.c) returns foo.c.
$(addsuffix .c,foo) returns foo.c.
$(basename dir/foo.test.c) returns dir/foo.test.
objects := $(wildcard *.o) to list files in a variable.
$(patsubst %.c,%.o,file.c) to substitute file extensions
$(subst from,to,text) replaces all occurrences of from with to in text.
$(word n,text) returns the n-th word in text.
Shell function: $(shell cat foo) runs a shell command.
Basic commands:
cd/ls to change directory/list files in the remote location
get to copy a file from the remote location to the local machine
put to copy a file from the local machine to the remote location
delete to remove a file in the remote location
lcd to change directory in the local machine
help to list FTP commands
Settings, General, Config Editor (at the bottom), Set intl.date_time.pattern_override.date_short to a string yyyy-MM-dd.
A
Ctrl+N
Ctrl+Shift+A
Ctrl+Enter
Ctrl+R or Ctrl+Shift+R
Ctrl+L
Left/Right
Ctrl+K
Ctrl+Shift+F
B/F
inkscape --file=in.svg --export-area-drawing --without-gui --export-pdf=out.pdf
pandoc -o out.pdf --include-in-header h.tex URL where h.tex could contain LaTeX packages declarations like \usepackage{fullpage}.
pandoc -o out.pdf --pdf-engine=weasyprint in.docx with WeasyPrint (installable through pip)
ffmpeg -i in.m4a -acodec mp3 -ac 2 -ab 192k out.mp3
For example, to keep my tablet from lagging.
ffmpeg -y -i large.mkv -vf "scale=-1:720" -c:v libx264 -crf 23 smaller.mkv
Change 720 to the desired resolution.
Increase the -crf value for a smaller file size (and lower video quality).
I ended up using Inkscape in command-line mode. The result is not so bad (better than the pdf2eps etc).
inkscape document.pdf --export-eps=document.eps
Apparently, pdftops is even better.
pdftops -eps document.pdf
In the end I had to use Acrobat Reader Pro… Still, converting the PDF using the following commands beforehand helped (otherwise Acrobat Reader Pro couldn’t convert it):
gs -dPDFA=1 -dBATCH -dNOPAUSE -dEmbedAllFonts=true -dSubsetFonts=false -dHaveTrueTypes=true -dPDFSETTINGS=/prepress -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=mainPDFA.pdf main.pdf
On the other hand, going through a .ps stage as recommended here produced a smaller PDF that was directly PDF/A compliant (no need for Acrobat Reader Pro) but lost all cross-reference links :(
pdftops main.pdf main.ps
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=mainPDFA.pdf main.ps
To check for PDF/A compliance I used this online validator or Acrobat Reader Pro.
Another way to check for problems is to look at the emb column of pdffonts main.pdf (should be all embedded) and the type column of pdfimages -list main.pdf (should be all image).
Note: this is based only on my one-time experience with the PDF of my thesis.
On Ubuntu, I digitally add signatures using Xournal. It’s ugly but the output PDF is left unchanged and it’s easy to place the signature (from a .png file for example) and export to PDF.
Some articles have diagonal watermarks behind the text. It makes it more difficult to select and copy text. Personally, it’s annoying when I want to highlight/annotate a PDF.
Sometimes it’s as easy as uncompressing the PDF and replacing the text of the watermark. From StackExchange:
pdftk original.pdf output uncompressed.pdf uncompress
sed -e "s/watermarktextstring/ /" uncompressed.pdf > unwatermarked.pdf
pdftk unwatermarked.pdf output fixed.pdf compress
Sometimes, the watermark text is not there in the PDF (e.g. the letters are split or the font uses unicodes or something).
That’s my experience with Nature papers with the annoyingly big “ACCELERATED ARTICLE PREVIEW” watermark.
In this situation, I had to look at the uncompressed PDF (e.g. less uncompressed.pdf) to guess the block with the watermark and then mess it up (e.g. its text matrix field Tm):
sed -e "s/33.94110107 33.94110107 -33.94110107 33.94110107 5.8685999 122.48609924 Tm/0 0 0 0 0 0 Tm/g" < uncompressed.pdf > unwatermarked.pdf
for a block that looked like:
...
BT
0 0 0 rg
/GS2 gs
/C2_1 1 Tf
0 Tc
0 Tw
33.94110107 33.94110107 -33.94110107 33.94110107 5.8685999 122.48609924 Tm
(^@$)Tj
.695 0 Td
(^@&)Tj
.722 0 Td
(^@&)Tj
.695 0 Td
(^@\()Tj
.61199998 0 Td
(^@/)Tj
.61199998 0 Td
(^@\()Tj
.695 0 Td
(^@5)Tj
.695 0 Td
(^@$)Tj
.639 0 Td
(^@7)Tj
.639 0 Td
(^@\()Tj
.695 0 Td
(^@')Tj
.361 -.85799998 Td
(^@^C)Tj
...
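The same replacement can be sketched in Python, which makes it easier to work on the PDF as raw bytes and to match the numbers literally (in the sed pattern the dots match any character). The function name is mine, and the sample bytes are just a shortened copy of the block above, not a real PDF.

```python
import re


def blank_text_matrix(pdf_bytes, matrix):
    """Replace one specific `a b c d e f Tm` operator with an all-zero matrix."""
    target = re.escape(matrix.encode('ascii')) + rb' Tm'
    return re.sub(target, b'0 0 0 0 0 0 Tm', pdf_bytes)


# Matrix values copied from the watermark block above
watermark_matrix = ('33.94110107 33.94110107 -33.94110107 33.94110107 '
                    '5.8685999 122.48609924')

# Shortened stand-in for the content of the uncompressed PDF
sample = (b'BT\n/C2_1 1 Tf\n'
          b'33.94110107 33.94110107 -33.94110107 33.94110107 '
          b'5.8685999 122.48609924 Tm\n'
          b'(^@$)Tj\nET\n')

fixed = blank_text_matrix(sample, watermark_matrix)
print(fixed.decode('ascii'))
```

On a real file, the bytes would be read from the uncompressed PDF opened in binary mode and written back to a new file before compressing it again with pdftk.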